
Predicting Movie Ratings With Multimodal Data: Yichen Yang Ruoyun Ma Min Haeng Cho

This document discusses predicting movie ratings using multimodal data from posters, synopses and other metadata available prior to theatrical release. The authors apply various machine learning algorithms including linear regression, decision trees, random forests, support vector regression and neural networks to predict ratings from visual features extracted from posters and textual features from synopses. They conduct an ablation study to evaluate the predictive power of different features. The goal is to accurately predict movie ratings before release to provide recommendations to consumers and inform film promotion strategies for producers.


Predicting Movie Ratings with Multimodal Data

Yichen Yang Ruoyun Ma Min Haeng Cho



1. Introduction

Can we predict the success of a movie based on information about the film available prior to theatrical release? To answer this question, we use open-source multimodal data released by IMDB and TMDB, such as movie posters and synopses. By applying image processing to visual data extracted from posters and natural language processing to textual data, we aim to predict movie ratings.

Although predicting movie ratings has been an active research area, a rather unexplored avenue of research in this application is the exclusive use of information available prior to theatrical release as features to predict movie ratings. Moviegoers base their decisions whether to watch a pre-released movie on limited information about the film, such as posters, synopsis, genre, cast and crew. Therefore, it behooves researchers to ask whether it is possible to predict movie ratings by only using information available prior to theatrical release as features, and if so, which features have the strongest predictive power. To predict movie ratings, we conduct an ablation study of various visual and textual features and evaluate their contribution to prediction accuracy. Specifically, we use linear and ridge regression, decision trees and random forest, SVR and neural networks as learning algorithms.

Our motivation for exploring this question is twofold. First, we aim to provide consumers with film recommendations by predicting movie ratings accurately prior to theatrical release. Second, we hope to offer insight on the determining factors of film ratings that will guide producers through film promotion.

2. Related Work

Previous research has shown remarkable interest and advances in predicting movie ratings. As part of the 2009 Netflix Prize competition, an open contest for the best algorithm to predict user ratings for films streamed on the platform, researchers utilized singular value decomposition (SVD) to predict users’ movie ratings based on their previous ratings history [1]. By utilizing a Bayesian approach, they effectively mitigated overfitting in SVD. In a different context, researchers utilized both surface and textual features, such as the number of tweets as well as YouTube comments for each film, to predict ratings [2]. Despite strong predictive performance, the model is limited: it cannot predict movie ratings prior to release, since features from social media are extracted only after theatrical release. Others developed a supervised latent Dirichlet allocation (LDA) model, shown to have more predictive power than unsupervised LDA, to predict ratings from movie reviews [3]. Others used data mining to build a model on interesting relations between different attributes and assign a weight to each feature in every movie, enhancing prediction accuracy [4]. Most recently, researchers extracted and utilized visual features from movie trailers to predict ratings [5]. Although results are preliminary, the approach is novel and we thus adopt this method to extract visual features from posters. Indeed, research in predicting movie ratings using state-of-the-art techniques and various features has been active and ongoing.

3. Dataset and Features

We utilized open-source data, “Movie Genre from its Poster Dataset” [6] and “The Movie Dataset” [7], from Kaggle. We selected posters, synopses, cast, crew, runtime and genre as input features and IMDB film ratings as the prediction objective.

We split features into three categories: images (posters), text (synopses), and others (cast, crew, genre, and runtime). For posters, we transformed the pixel dimensions from 900x600 to the input resolution 224x224 for ResNet34. Besides feeding pixels into our model, we manually extracted 13 visual features. Specifically, we considered posters in both RGB format and HSB format, and extracted the mean and standard deviation of red, green, blue, hue, saturation and brightness (as suggested by [5]), as well as the number of human faces using OpenCV (as suggested by [8]). For synopses, we used spaCy [9] for tokenization and only kept words that appeared in at least 20 movies. For cast and crew, we extracted the main director and top three leading actors for each movie, and kept directors and actors involved in at least 5 movies in our dataset. The result is shown in Table 1.

Table 1. Processed Data by Groups
Data        Type          Dimension   Example
Genre       Categorical   23          Action
Runtime     Numerical     1           100 (minutes)
Actors      Categorical   1603        Robert Downey Jr.
Director    Categorical   492         Steven Spielberg
Poster      Numerical     13          Number of faces = 1
Synopses    Categorical   3884        “innocence”

After filtering out movies released before 1980 and movies whose original language is not English, we had 19429 movies in our sample. We then did a train-validation-test split at the 70%-15%-15% ratio, with 13600 training, 2914 validation and 2915 test data points. We standardized all the numerical features, performed feature selection and hyperparameter tuning using the validation set, and did model selection with the test set.

4. Methods

4.1. Linear Regression Models

For our baseline algorithm, we used linear regression, which models the relationship between independent and dependent variables by fitting a linear function that minimizes the sum of the squares of the differences between predicted and actual values. It is among the most common, fundamental and simple methods used to solve regression problems.

To mitigate overfitting that might arise from fitting a simple regression model to our data, we also used ridge regression. This regularization method adds an $\ell_2$-norm penalty on the parameters to the cost function to penalize large regression coefficients. In general, ridge regression helps with multicollinearity, or high intercorrelations among features, and can improve model prediction accuracy.

4.2. Decision Trees and Random Forest

Since a nonlinear model could provide a better fit to our data, we also experimented with decision trees and random forest, an ensemble learning method that aggregates outputs from a multitude of decision trees. A decision tree contains internal nodes, which correspond to input features, and leaves, which represent the output value reached by following a certain path from the root. Random forest uses a bagging strategy and, when selecting the best feature for each node of a decision tree, considers only a random subset of the input features. It is extensively used for non-linear problems due to its strong stability and efficient reduction of overfitting.

4.3. Support Vector Regression

Support Vector Machines (SVMs) are commonly used as baselines in many machine learning problems. SVM defines a margin from the hyperplane, where points inside the margin incur loss, while Support Vector Regression (SVR) defines a margin of ε-distance from the hyperplane such that data points inside the boundary are error-free. For our project, we used the radial basis function as the kernel.

4.4. Neural Networks

To effectively capture information from the multimodal data, we used convolutional neural networks (CNNs), which condense the patterns in the original inputs into smaller representations, as our primary tool to analyze posters and to perform natural language processing on synopses. We embedded the words of each synopsis and applied convolutional kernels to the embeddings. We also applied a residual neural network, ResNet34 [10], known for mitigating the problem of vanishing gradients via shortcuts between layers, to poster data.

4.5. Feature Importance

To evaluate the predictive power of different features, we employed permutation feature importance (FI) [11], a technique that measures the importance of each feature. The intuition is that if a feature is not useful for predicting an outcome, then permuting its values will not result in a significant reduction in a model’s performance. For each feature, permutation FI is defined by:

$$\mathrm{FI} = \mathrm{err}_{\mathrm{perm}} - \mathrm{err}_{\mathrm{orig}}$$

where $\mathrm{err}_{\mathrm{orig}}$ is the baseline model error with all the original features, and $\mathrm{err}_{\mathrm{perm}}$ is the model error when a certain feature is permuted.

Also known as Random Forest FI, mean decrease impurity (MDI) is another FI metric in a random forest model that computes the extent to which each feature decreases the weighted impurity on average while training the model. We evaluated each individual feature and each group of features on the train set using MDI FI and permutation FI, respectively.

5. Experiments and Results

5.1. Metrics

We used the Mean Square Error (MSE) and $R^2$ as our metrics. MSE was also used as the training loss, and $R^2$ represents the proportion of the variance in film ratings explained by the features.

5.2. Individual Models

We first assessed individual contributions of the three feature categories (text, images and others) in predicting film ratings. When we trained word embedding and a CNN model (kernel sizes 2, 3 and 4, with 100 kernels each) on the synopses as our only feature, we had an $R^2$ of 0.136. This implies that synopses can be used to predict movie ratings. However, when we trained a ResNet34 model only on 224x224 posters, we either had overfitting (small regularization) or saw no significant improvement over simply predicting movie ratings using the mean (big regularization). Thus, a complicated model is not suitable for predicting movie ratings with posters. We therefore manually extracted 13 visual features instead of directly feeding the posters to our model. Combining these 13 visual features into a linear regression model, we had an $R^2$ of 0.015, similar to the result in [5]. Thus, visual features in movie posters have some, albeit limited, power to predict movie ratings. Table 2 displays the results of all the individual models.

Table 2. Preliminary results on text and image data
Data                             Method                 Valid MSE         Valid $R^2$
Overview Only                    Word Embedding + CNN   1.2900            0.136
Poster Only                      ResNet34               Tend to overfit   Tend to overfit
Extracted Features from Poster   Linear Regression      1.4699            0.015

5.3. Combined Models

We set the results from linear regression as our baseline, and used validation data to tune hyperparameters. Additionally, we used L2 regularization on linear regression, and results show that the best alpha is 10. For decision tree regression, the optimal maximum depth was 8. For random forest regression, we set the maximum depth to be 32 and used 100 trees. For support vector regression, the optimal penalty parameter was 1 and the optimal ε was 0.1. Finally, we combined results from the CNN for the textual features and non-textual features into 5 fully connected layers for our final neural network. The regularization techniques we used for the neural network are dropout, L2 penalty, and early stopping. The results are shown in Table 3.

Table 3. Model MSE and $R^2$
Method                      Test MSE   Test $R^2$
Linear Regression           0.9302     0.3745
Ridge Regression            0.8775     0.4099
Decision Tree Regression    0.8959     0.3975
Random Forest Regression    0.8546     0.4253
Support Vector Regression   0.8542     0.4256
Neural Network              0.8765     0.4109

We found that all three feature categories contribute to the prediction of IMDB scores. Our current results show that random forest regression and support vector regression perform best, each explaining about 42 percent of the variance in IMDB movie scores.

6. Discussion

6.1. Best Model

Adding L2 regularization improved regression performance. SVR and random forest regression have similar $R^2$, but SVR is significantly slower and harder to interpret. Considering accuracy, efficiency and interpretability, random forest was our best model.

Compared to the previous related work discussed in Section 2, which used movie trailers and genre to predict film ratings and achieved an MSE of 0.88 [5], we have a smaller MSE, likely because our model had more features. However, compared to the social media model, which had an MSE of 0.2 [2], our model performed worse. This could be because we exclusively used as features information available before theatrical release, whereas the social media model used features such as social media comments, which can only be retrieved after the movie release.

6.2. Effect of Parameter Tuning

We discuss the effect of two key parameters in the random forest model: the number of trees and their maximum depth. Figure 1 shows that as the number of trees (n_estimator) increased (max_depth = 32), the train and validation (dev) sets showed the same trend in $R^2$ and MSE. The performance curves also flattened out after 20 trees.

Figure 2 (number of trees = 100) shows that the performance of the train set improved as the maximum depth increased, while the validation set line remained almost unchanged. When we reduced the number of trees, the validation set and train set lines moved in opposite directions. The fewer trees we used, the sharper the change in $R^2$ and MSE for the validation set. That is, the bagging strategy with 100 trees in the forest effectively mitigates or even eliminates high variance.

Due to a trade-off between training time and model performance, 100 and 32 are reasonable values for the number of trees and maximum depth, respectively.

[Figure 1. Model performance of different number of trees: $R^2$ and MSE on the train and dev sets versus n_estimator.]

[Figure 2. Model performance of different max depth: $R^2$ and MSE on the train and dev sets versus max_depth.]

6.3. Feature Contributions

For Random Forest FI (Figure 3), certain genres, such as documentary, horror and drama, have high FI. Runtime is the second most important feature that can be used to predict movie ratings. Furthermore, manually extracted visual features are also important, including standard deviations for RGB and HSB channels. By contrast, none of the features associated with actors, directors or synopses appear in the 20 most important features.

[Figure 3. Feature importance using Random Forest.]

Table 4 shows the top 5 permutation features in descending order for four feature groups. FI determines the magnitude, not the direction, of effect on movie ratings. Since random forest FI considers the importance of each individual feature, it assigns low scores to sparse feature columns. Thus, we used permutation FI to determine the predictive power of feature groups by permuting all feature columns for every feature group and calculating the difference between the new and original full-model MSE (Figure 4).

As with Random Forest FI, genre, runtime and manually extracted visual features had the highest FI. By contrast, actors, directors and synopses had low to negligible importance.
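The group-level permutation FI described in this section, using the definition from Section 4.5 (FI = err_perm − err_orig with MSE as the error), can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the toy model, the column layout and the choice to shuffle all columns of a group with the same row permutation are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def group_permutation_fi(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: List[float],
    group: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> float:
    """Average of err_perm - err_orig over n_repeats shuffles.

    All columns listed in `group` are permuted together, i.e. each row
    receives the group values of one other (randomly chosen) row.
    This joint shuffle is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    err_orig = mse(y, [predict(row) for row in X])
    total = 0.0
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        X_perm = [list(row) for row in X]  # copy so X stays untouched
        for dst, src in enumerate(order):
            for col in group:
                X_perm[dst][col] = X[src][col]
        total += mse(y, [predict(row) for row in X_perm]) - err_orig
    return total / n_repeats


# Hypothetical columns: [runtime, actor_flag, genre_horror, genre_documentary]
def toy_model(row: List[float]) -> float:
    # Stand-in for a fitted model; it ignores the actor column on purpose.
    return 5.0 + 0.01 * row[0] - 1.5 * row[2] + 1.0 * row[3]


X = [
    [90.0, 1.0, 1.0, 0.0], [95.0, 0.0, 1.0, 0.0],
    [100.0, 1.0, 1.0, 0.0], [105.0, 0.0, 1.0, 0.0],
    [110.0, 1.0, 0.0, 1.0], [120.0, 0.0, 0.0, 1.0],
    [130.0, 1.0, 0.0, 1.0], [140.0, 0.0, 0.0, 1.0],
]
y = [toy_model(row) for row in X]  # made-up targets that the toy model fits exactly

fi_genre = group_permutation_fi(toy_model, X, y, group=[2, 3])
fi_actor = group_permutation_fi(toy_model, X, y, group=[1])
print(f"genre group FI: {fi_genre:.3f}, actor group FI: {fi_actor:.3f}")
```

Because the toy model never reads the actor column, shuffling it leaves every prediction unchanged and its FI is zero, whereas shuffling the genre columns raises the MSE. This mirrors the intuition behind the group comparison, but the numbers themselves carry no meaning for the real dataset.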

Table 4. Feature Importance Ranking (Top 5)
Poster       Genre         Actor                 Director
hue sd       documentary   Stephen Baldwin       Tyler Perry
hue          horror        Thomas Kretschmann    Woody Allen
saturation   drama         Lauren Bacall         Craig Moss
blue sd      animation     Manisha Koirala       Uwe Boll
green sd     action        Angie Everhart        Steven R. Monroe

[Figure 4. Permutation feature importance]

[Figure 5. Permutation feature importance of nested random forest]

Table 5. Comparison of Regression and Classification
Method                        Test MSE   Test $R^2$
Classification (5 classes)    0.8722     0.4135
Classification (10 classes)   0.8649     0.4184
Classification (20 classes)   0.8608     0.4212
Regression                    0.8546     0.4253
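Table 5 compares regression with classification on ratings discretized into 5, 10 and 20 classes. A minimal sketch of such a uniform discretization, with each class represented by the centre of its interval, is given below. The rating range of 0 to 10, the function names and the sample ratings are assumptions for illustration; the source does not state the exact interval boundaries.

```python
from typing import Dict, List, Tuple


def discretize(rating: float, n_classes: int,
               lo: float = 0.0, hi: float = 10.0) -> Tuple[int, float]:
    """Map a rating to (class index, central value of that class).

    The range [lo, hi] is split into n_classes equal-width intervals.
    The range 0-10 is an assumption, not taken from the paper.
    """
    width = (hi - lo) / n_classes
    # A rating equal to `hi` would fall into class n_classes, so clamp it.
    k = min(int((rating - lo) / width), n_classes - 1)
    centre = lo + (k + 0.5) * width
    return k, centre


def quantization_mse(ratings: List[float], n_classes: int) -> float:
    """MSE between the ratings and the centres of their own classes.

    This is the error a classifier would incur even if it always
    predicted the correct class.
    """
    return sum((r - discretize(r, n_classes)[1]) ** 2 for r in ratings) / len(ratings)


ratings = [6.4, 7.1, 3.8, 8.9]  # made-up example ratings
errors: Dict[int, float] = {n: quantization_mse(ratings, n) for n in (5, 10, 20)}
for n, err in errors.items():
    print(f"{n:2d} classes: quantization MSE = {err:.4f}")
```

On these made-up ratings the quantization error shrinks as the number of classes grows, which is one plausible reading of why finer discretization moves the classification results in Table 5 closer to the regression result.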

6.4. Nested Model

Although a CNN with synopses as the sole feature can reach an $R^2$ of 13.5%, synopses alone have negligible FI in random forest. A possible explanation is that representing synopses with a vocabulary dictionary fails to capture sequential information in synopses.

We used a CNN to extract 300 features from synopses and fed them into the final random forest model. Yet our resulting nested model was 1% lower in $R^2$. This could be because CNN models tend to overfit, causing our nested model to overfit. After using the features extracted by the CNN, synopses and posters had higher and lower permutation FI, respectively (Figure 5).

6.5. Classification Perspective

In addition to regression, we framed the film rating prediction task as a classification problem. We discretized movie ratings uniformly into 5, 10 and 20 classes, and chose a central value for each interval to represent each class. We then applied a random forest classification model to the data using the same parameter settings as in regression.

Test MSE and $R^2$ are shown in Table 5. The higher the number of classes, the higher the $R^2$ and the lower the MSE on the test data. We also compared other classification algorithms with their corresponding regression models (e.g. SVM with SVR, logistic regression with linear regression). For all the models, regression outperformed classification, presumably because discretizing the ratings discards some of the information in the original data.

7. Conclusion and Future Work

All three feature categories contribute to the prediction of movie ratings. By feeding information available prior to theatrical release into a random forest model, we could explain 42% of the variance in English-language movie ratings, better than other models that depend only on information prior to theatrical release. Among all the features, genre had the highest predictive power for film ratings. Certain combinations of genre like horror and action could lead to bad ratings, while others like documentary and history could lead to good ratings. Thus, production companies and consumers should consider various combinations of genres for movie production and choice.

Adding new features, such as movie trailers, could help improve our film rating prediction. Another area for future research is the development of a sophisticated nested model to better combine information from all three feature categories.

8. Contributions
Yichen Yang and Ruoyun Ma co-wrote the report and ran experiments. Min Haeng Cho performed the literature review and co-wrote and finalized the report. We all created the poster together. The code is available on GitHub at https://github.com/JeffJeffy/CS229Project.

References
[1] Y. J. Lim and Y. W. Teh, “Variational Bayesian approach to movie rating prediction,” in Proceedings of KDD Cup and Workshop, vol. 7, 2007, pp. 15–21.
[2] A. Oghina, M. Breuss, M. Tsagkias, and M. De Rijke, “Predicting IMDB movie ratings using social media,” in European Conference on Information Retrieval. Springer, 2012, pp. 503–507.
[3] J. D. McAuliffe and D. M. Blei, “Supervised topic models,” in Advances in Neural Information Processing Systems, 2008, pp. 121–128.
[4] J. Ahmad, P. Duraisamy, A. Yousef, and B. Buckles, “Movie success prediction using data mining,” in Institute of Electrical and Electronics Engineers, 2017.
[5] F. B. Moghaddam, M. Elahi, R. Hosseini, C. Trattner, and M. Tkalčič, “Predicting movie popularity and ratings with visual features,” in 2019 14th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP). IEEE, 2019, pp. 1–6.
[6] KaggleInc, “Movie genre from its poster,” https://www.kaggle.com/neha1703/movie-genre-from-its-poster.
[7] Kaggle, “The movies dataset,” https://www.kaggle.com/rounakbanik/the-movies-dataset.
[8] C. Sun, “Predict movie rating,” https://nycdatascience.com/blog/student-works/web-scraping/movie-rating-prediction/, 2016.
[9] M. Honnibal and I. Montani, “spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing,” 2017, to appear.
[10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[11] A. Fisher, C. Rudin, and F. Dominici, “Model class reliance: Variable importance measures for any machine learning model class, from the “Rashomon” perspective,” arXiv preprint arXiv:1801.01489, 2018.
