Movie Posters’ Classification into Multiple Genres

Abstract: Our project classifies a movie into its three most probable genres, chosen from a predefined set of 25 genres, based on a single image: the movie poster. We use Convolutional Neural Networks (CNNs) because they are well suited to extracting features and visual information from images. Because a movie usually belongs to more than one genre, the task is better described as a multi-label classification problem than as a multi-class classification problem, in which each input is assigned to exactly one class. We present a comparative study of different architectures and tune them to obtain the best result as measured by accuracy. We apply techniques such as data augmentation and L2 regularization to determine which of the tested models performs best.


I. INTRODUCTION
In this work we use multi-label classification and Convolutional Neural Networks for genre-based classification of movie posters. Multi-label classification originated from the study of text categorisation, where each document may belong to several predefined topics simultaneously. Multi-label classification of textual data is an important problem, with examples ranging from news articles to emails. In multi-label classification, each training instance is annotated with a set of labels drawn from a fixed group of labels, and the model must predict the correct label set for an unseen instance from the training instances it has been trained on. In machine learning, Convolutional Neural Networks (CNNs or ConvNets) are feed-forward neural networks that are widely used for image classification and recognition because of their high accuracy. They were proposed by Yann LeCun in the late 1990s, inspired by the way humans visually recognize objects. A CNN follows a hierarchical, funnel-like design: successive layers extract progressively higher-level features, and the network ends in a fully connected layer, in which every neuron is connected to every neuron of the previous layer and from which the output is produced.
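The difference between multi-class and multi-label targets can be illustrated with a small sketch. The five-genre vocabulary and the helper name `multi_hot` are assumptions made for illustration only; they are not taken from the project's code.

```python
# Hypothetical genre vocabulary (a small subset, for illustration only).
GENRES = ["Action", "Comedy", "Drama", "Horror", "Romance"]

def multi_hot(movie_genres, vocabulary=GENRES):
    """Return a 0/1 vector with a 1 for every genre the movie belongs to."""
    present = set(movie_genres)
    unknown = present - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown genres: {sorted(unknown)}")
    return [1 if g in present else 0 for g in vocabulary]

# Multi-class: exactly one label per instance (a one-hot vector).
print(multi_hot(["Drama"]))                       # [0, 0, 1, 0, 0]
# Multi-label: any number of labels per instance (a multi-hot vector).
print(multi_hot(["Comedy", "Romance", "Drama"]))  # [0, 1, 1, 0, 1]
```

With this encoding, a multi-label model predicts one independent probability per position of the vector rather than a single winning class.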

A. Objective
The objective of this project is to use a CNN to categorize a movie into predetermined genres based on its poster. This classification would enable users to make informed decisions by providing the genres of a movie based on the features present in its poster. Proper classification will also make it easier to organize movies according to a user's previous preferences. Humans can recognize certain elements in a poster and use them to infer the genre; to automate this process, a CNN can learn from the posters in the dataset and assign the relevant genres to a movie.

B. Motivation
The motivation behind this problem statement is to help users judge whether they would be interested in a particular movie, by classifying the movie into multiple genres using features identified in its poster. This is helpful because a movie's poster may be released up to a year in advance, whereas trailers are released much later. The model looks for features that are predominantly present in posters of a specific genre, so it can serve as a tool for notifying users about upcoming movies that they might be interested in.
II. LITERATURE SURVEY
1) Binary classification and multi-class classification have been widely discussed, but newer technology and research have led to the development of multi-label classification. This has also exposed problems specific to multi-label classification that were not encountered in binary or multi-class classification.
7) A 2020 study proposed a method, called the Gram layer, for extracting the most useful information and characteristics from movie posters within a convolutional neural network (CNN) to aid the classification of movies into genres. It first extracts style features by applying the Gram matrix to the feature map of the movie poster. The result is used as a style weight, and the existing feature map is used together with this style information for inference. For their study, a total of 20,764 movie posters were collected and 12 genres were defined. [7]
8) An extensive survey of modern architectures of deep convolutional neural networks was carried out by Asifullah Khan, Anabia Sohail, Umme Zahoora and Aqsa Saeed Qureshi (2020). While it has become common knowledge in the deep learning community that deeper neural networks result in higher accuracy, greater depth has also given rise to problems such as the vanishing gradient problem, in which additional layers cause the gradient of the loss function to approach zero, making the network difficult to train. [8]
9) The latest approach to solving multi-label classification problems comes from the work of Emanuel Ben-Baruch, Tal Ridnik, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter and Lihi Zelnik-Manor (2020). Their study discusses the most common problem associated with multi-label classification: the high imbalance between positive and negative labels for a single input (picture). This problem, dubbed the positive-negative imbalance, dominates the optimization process and, if not addressed, can result in under-emphasizing gradients from positive labels, which in turn lowers accuracy. [9]

A. Limitations of Existing System
1) Purvi Prajapati, Amit Thakkar and Amit Ganatra (2012) [1] compared problem transformation and algorithm adaptation methods for solving multi-label classification problems. One of the challenges they identify in this field is the preprocessing of data that is required before machine learning algorithms are applied. These techniques include, but are not limited to, pruning, feature selection and handling of missing values, and are intended to improve performance on such problems.
2) According to the comparative study by Ivasic-Kos, Pobar and Ipsic [2], the Naïve Bayes classifier yielded the best result when judged by F1 score, with an F1 score of 0.38. Their use of the GIST image descriptor together with classic machine learning algorithms can be considered outdated by modern standards, as neural network architectures have been shown to outperform such algorithms on large datasets, especially those involving image classification.
3) Chu and Guo's [3] study showcases the effectiveness of incorporating object information in neural networks using the pre-trained YOLO v2 object detection module, which works quite well in detecting animals, which often appear in animation movies. Although their improvement over previous work is promising, the neural network architecture they used (AlexNet) is an early one (2012) that is no longer in common use.
4) Although the work of Jonatas Wehrmann and Rodrigo C. Barros (2017) [4] showed very promising results, especially when compared to low-level features, and improved accuracy on certain genres, their use of movie trailers as the dataset is not only computationally heavy but also redundant, because by the time the trailers are released the genres of the movies are already very clear.
5) Sung and Chokshi's [5] study sought the optimal model for identifying genres through a comparative study of the then-promising new models, among which DenseNet-169 produced the best results. These results can still be improved. The authors suggested that improvement could come through various techniques, including the use of an object detection algorithm such as YOLO, collection of better data, and application of other convolutional neural network (CNN) architectures to the problem.
6) Barney and Kaya (2019) [6] used the largest dataset, which after preprocessing yielded a total of 35,000 movies belonging to 20 different genres. A possible improvement is to use more balanced data. As the dataset used for the model is imbalanced, data augmentation can be used to increase the amount of data the model can train on. A mixture of oversampling and undersampling can be applied, respectively, to the classes that have less data and the classes that have a greater number of entries.

III. GOALS
• The project aims to classify movies into the appropriate genres based only on the movie poster, using multi-label classification and deep learning, in order to provide an automated way of categorizing movies into specific genres that accommodates the viewing preferences of users consuming the content.
• The Convolutional Neural Network has been identified as the deep learning model for processing the images of movie posters.
• A dataset of 7242 movies has been used for this problem, with the movies spread across 25 different genres, viz. Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, Horror, Music, Musical, Mystery, News, Reality-TV, Romance, Sci-Fi, Short, Sport, Thriller, War, Western.
• We propose to classify each movie poster into specific genres from the 25 predefined genres. Each movie is assigned its top three genres based on the predicted probability of it belonging to each genre. This classification would enable the user to make an informed decision about whether to watch the movie when it is released. It is useful because the posters for various movies are released months before the trailers.
• The method adopted for this project is deep learning.
We use a CNN, which is widely used for processing images and recognizing patterns in them. The first component of a CNN, the convolution layer, performs feature extraction: it specifies a number of filters (feature detectors), each of which produces a feature map by sliding over the image and writing a higher value wherever the image matches the filter. The second component is max pooling, which reduces the size of the feature map while preserving the information about the features. The third component, flattening, transforms the two-dimensional feature map into a vector that can be fed into a fully connected layer. The fourth and final component before the output layer is the fully connected layer. In this layer, as the name suggests, every neuron is connected to every neuron of the previous layer.
Since each poster can be assigned genres from a total of 25, the output layer has 25 nodes.
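The four components described above can be traced on a toy input. The following is a minimal pure-Python sketch, not the project's TensorFlow model: the 5x5 image, the edge filter, the placeholder weights and the ReLU and sigmoid activations are all assumptions chosen to keep the arithmetic easy to follow.

```python
import math

def conv2d(image, kernel):
    """Valid (no padding), stride-1 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise ReLU activation (assumed here; the text does not name one)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete windows at the edges are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def flatten(fmap):
    """Turn a 2-D feature map into a 1-D vector."""
    return [v for row in fmap for v in row]

def dense_sigmoid(vector, weights, biases):
    """Fully connected layer: every output unit sees every input value."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, vector)) + b)))
        for row, b in zip(weights, biases)
    ]

# Toy 5x5 "image" containing a vertical edge, and a hypothetical edge filter.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d(image, kernel))  # 4x4, high values where the edge is
pooled = max_pool(feature_map)             # 2x2
features = flatten(pooled)                 # vector of length 4

NUM_GENRES = 25                            # one output node per genre
weights = [[0.1] * len(features) for _ in range(NUM_GENRES)]  # placeholder values
biases = [0.0] * NUM_GENRES
scores = dense_sigmoid(features, weights, biases)

print(feature_map[0])  # [0.0, 2, 0.0, 0.0]
print(pooled)          # [[2, 0.0], [2, 0.0]]
print(len(scores))     # 25
```

A sigmoid is used on each of the 25 outputs, rather than a softmax across them, so that several genres can receive a high score at once, which is what a multi-label problem requires.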

A. Scope
• Movie poster classification is a multi-label classification problem. Given a large movie dataset, the aim is to classify movies into multiple genres based only on their poster images.
• For movie-goers, posters are among the very first impressions of a movie and give an idea of its content and of the genres it could belong to. Humans can use cues such as color, objects and facial expressions to quickly determine the genre (horror, comedy, animation, etc.).
• We can assume that a poster possesses some characteristics that can be utilized by deep learning algorithms to predict its genres.

IV. DESIGN OF THE SYSTEM
The system has been developed using TensorFlow. The Convolutional Neural Network has been identified as the deep learning model for processing the images of movie posters. A dataset of 7242 movies has been used for this problem, with the movies spread across 25 different genres, viz. Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, Horror, Music, Musical, Mystery, News, Reality-TV, Romance, Sci-Fi, Short, Sport, Thriller, War, Western.
The system takes an image of a movie poster as input and passes it through a CNN model to infer the genres of the movie.
[Figure 1]
The baseline model, Model 1 [Figure 4], and the best-performing model, Model 2 [Figure 5], are compared through graphs of cross-entropy loss and accuracy. The inference from the model is given below, where the three most probable genres are predicted. These genres are compared with the actual listed genres of the movie.
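The step of picking the three most probable genres and comparing them with the listed genres can be sketched as follows. The probabilities, the six-genre subset and the helper names `top_k_genres` and `overlap` are hypothetical and serve only to illustrate the idea; they are not the model's actual outputs.

```python
def top_k_genres(probabilities, genres, k=3):
    """Return the k genres with the highest predicted probability, best first."""
    if len(probabilities) != len(genres):
        raise ValueError("one probability per genre is required")
    ranked = sorted(zip(genres, probabilities), key=lambda gp: gp[1], reverse=True)
    return [genre for genre, _ in ranked[:k]]

def overlap(predicted, actual):
    """Fraction of the actual genres that appear among the predicted ones."""
    actual = set(actual)
    return len(actual & set(predicted)) / len(actual) if actual else 0.0

# Hypothetical per-genre outputs for a six-genre subset (illustrative values).
genres = ["Action", "Comedy", "Drama", "Horror", "Romance", "Thriller"]
probs = [0.10, 0.72, 0.64, 0.03, 0.55, 0.08]

predicted = top_k_genres(probs, genres)
print(predicted)                                  # ['Comedy', 'Drama', 'Romance']
print(overlap(predicted, ["Comedy", "Romance"]))  # 1.0
```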

VII. CONCLUSION
From the results shown, and on comparing the accuracies of the different models, we can see that deeper and more complex neural network architectures tend to give significantly better results than simpler architectures. Beyond a certain point, however, accuracy plateaus, and even different regularization techniques do not improve it, although they do reduce the loss. The regularization techniques also helped prevent overfitting of the model. The dataset, as it stands, is not very robust and is heavily skewed toward the Drama, Comedy and Romance genres, especially Comedy (2900 entries) and Drama (3619 entries) in a dataset of 7242. As previous works show, dataset imbalance is a common problem. Because our problem is a multi-label classification problem, classic over-sampling or under-sampling of the dataset, which works well for multi-class problems, will not give the desired results here. In future, a well-balanced dataset is therefore expected to give much better results with our model.
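The imbalance figures quoted above, and the reason naive over-sampling is awkward for multi-label data, can be illustrated with a short sketch. The two genre counts and the dataset size come from the text; the toy dataset and the helper names are hypothetical.

```python
from collections import Counter

# Share of the dataset covered by the two largest genres (counts from the text).
TOTAL_MOVIES = 7242
for genre, count in {"Comedy": 2900, "Drama": 3619}.items():
    print(f"{genre}: {count / TOTAL_MOVIES:.1%} of posters")
# Comedy: 40.0% of posters
# Drama: 50.0% of posters

def label_counts(dataset):
    """Count how many instances carry each label."""
    return Counter(label for labels in dataset for label in labels)

def oversample(dataset, rare_label, copies):
    """Naively add `copies` duplicates of every instance carrying rare_label."""
    extra = [labels for labels in dataset if rare_label in labels] * copies
    return dataset + extra

# Hypothetical toy dataset: each entry is the label set of one poster.
toy = [
    {"Drama"}, {"Drama", "Comedy"}, {"Drama", "Romance"},
    {"Comedy"}, {"Drama", "Western"},
]
before = label_counts(toy)
after = label_counts(oversample(toy, "Western", copies=3))
print(before["Western"], before["Drama"])  # 1 4
print(after["Western"], after["Drama"])    # 4 7
```

In the toy data, duplicating the single Western poster also duplicates its Drama label, so the majority genre grows along with the minority genre. This is one way to see why resampling that works for single-label data does not straightforwardly balance a multi-label dataset.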