Project: Predicting Box Office Revenues: A Report Submitted To

1) The project aims to build a machine learning model to predict box office revenues from factors such as popularity, budget, runtime, genre, production company and release date, using a dataset of over 4,000 movies. 2) Exploratory data analysis found that higher revenues are associated with higher popularity, budget and runtime, the action and sci-fi genres, major production companies, and release in summer or December. 3) A support vector machine model was built and predicted revenues with 97.2% accuracy, outperforming a random forest model at 89.56% accuracy, based on metrics such as MAE, MSE and RMSE. 4) The trained SVM model was used to predict revenues for the test data.

Uploaded by

Battagiri Sai Jyothi


Project: Predicting box office revenues

A report submitted to

Prof. Ujjwal Das

In the partial fulfilment of the course

Advanced Methods for Data Analysis (AMDA)

By
Faisal Abrar - 2011072
Sai Jyothi - 2011299
Soumava Ghosh - 2011247
Vivek Sreenivasan - 2011278

On
19-12-2020

Business Problem
The business problem we focus on is predicting box office revenue. Technological change has
modernized the entertainment industry, and consumer preferences have shifted accordingly,
increasing the need for managers to estimate the revenue of a film before its release. The cost of
making a film has also risen over the years. Because ROI and box office revenue are critical
measures of a production house's success, the industry is investing in more accurate predictions
to make informed decisions about movie scheduling, advertising strategy and so on.

The project aims to build a machine learning model that predicts box office revenue from factors
such as popularity, budget, runtime, genre, production company and release date.

Data Set Used

The dataset contains 4398 movies released worldwide, with details including movie information,
overviews and credits. It is extracted from the TMDB API and is certified by TMDB. The API also
provides data on many other movies, actors and actresses, crew members and TV shows.

Variables Used

id, belongs_to_collection, budget, genre, homepage, imdb_id, original_language, original_title,
overview, popularity, poster_path, production_companies, production_countries, release_date,
run_time, spoken_languages, status, tagline, title, keywords, cast, crew, revenue

Exploratory Data Analysis

The complete analysis for this dataset was done in R. In the first step of our study, exploratory
data analysis, we tried to identify the variables that affect revenue and to understand their
relationships with this dependent variable. Some of the graphs are shown below; they were
plotted with the ggplot2 and ggExtra packages.

The first three plots show the relationship of three variables, namely popularity, budget and
runtime, with the revenue earned.

As can be seen from the graphs, all three have a positive relationship with revenue, i.e., higher
values of each are associated with higher revenue. This is plausible: a larger budget allows better
production quality and a stronger cast, which attract audiences, and greater popularity (marketing
and advertising of the movie) leads to a better reception. Runtime shows a similar trend, but its
relationship with revenue is weaker than that of budget or popularity.
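The report reads these relationships from scatter plots. As a hedged illustration of how they could be quantified, the sketch below computes the Pearson correlation of each feature with revenue. The field names and toy values are hypothetical, not taken from the TMDB data, and the original analysis was done in R. Pearson correlation measures linear association only, not causation, and a skewed variable such as revenue is often log-transformed before this kind of check.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy rows, NOT values from the TMDB dataset.
movies = [
    {"budget": 10e6, "popularity": 5.0, "runtime": 95, "revenue": 25e6},
    {"budget": 50e6, "popularity": 12.0, "runtime": 120, "revenue": 140e6},
    {"budget": 150e6, "popularity": 30.0, "runtime": 110, "revenue": 600e6},
    {"budget": 2e6, "popularity": 1.5, "runtime": 100, "revenue": 3e6},
]

revenue = [m["revenue"] for m in movies]
for feature in ("popularity", "budget", "runtime"):
    r = pearson([m[feature] for m in movies], revenue)
    print(f"{feature:>10}: r = {r:+.3f}")
```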

Next, we examined the impact of genre on revenue by plotting the number of movies in each genre
and the median revenue per genre.

Movies in the action and science fiction genres have a higher median revenue. However, genres
with few movies, such as Foreign and History, do not give a reliable picture of their impact on
revenue because of the small sample size.
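The small-sample caveat can be made explicit by reporting each genre's count next to its median and hiding genres below a minimum count. The sketch below is a hypothetical Python illustration (the report used R and does not state any threshold): `main_genre`, the toy rows and `min_count=2` are all assumptions.

```python
from collections import defaultdict
from statistics import median


def median_revenue_by_genre(movies, min_count=2):
    """Return {genre: (count, median revenue)} for genres with at least min_count films."""
    revenues = defaultdict(list)
    for m in movies:
        revenues[m["main_genre"]].append(m["revenue"])
    return {
        genre: (len(vals), median(vals))
        for genre, vals in revenues.items()
        if len(vals) >= min_count  # drop genres too rare to judge reliably
    }


# Hypothetical toy rows, NOT values from the TMDB dataset.
movies = [
    {"main_genre": "Action", "revenue": 300e6},
    {"main_genre": "Action", "revenue": 120e6},
    {"main_genre": "Action", "revenue": 80e6},
    {"main_genre": "Drama", "revenue": 20e6},
    {"main_genre": "Drama", "revenue": 35e6},
    {"main_genre": "History", "revenue": 500e6},  # a single film: filtered out
]

result = median_revenue_by_genre(movies, min_count=2)
print(result)
```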

Next, we examined the effect of the production house on revenue.

Big, popular production houses earn much higher revenues than small production companies.

In the next set of plots, we examined the effect of the movie's release time (year, quarter,
month, week and day) on the revenue earned.
From these plots, it can be seen that:
1. Average revenue has increased over the years.
2. Movies released in June, July and December earn much higher revenues than those released
in other months. A possible reason is that many big movies target a summer release (June and
July), while some major blockbusters aim for a December release to capitalize on the winter
holiday season.
3. Movies released on Wednesdays appear to earn more than those released on other days of
the week.
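Grouping by release time amounts to deriving calendar fields from the release date and averaging revenue within each group. A minimal Python sketch, assuming ISO-formatted dates (YYYY-MM-DD); the real `release_date` column may use another format and need `datetime.strptime`. The rows are hypothetical, and the report's own plots were made in R.

```python
from collections import defaultdict
from datetime import date


def mean_revenue_by(movies, key):
    """Group movies by key(release date) and return the mean revenue of each group."""
    groups = defaultdict(list)
    for m in movies:
        # Assumes ISO dates (YYYY-MM-DD); other formats need datetime.strptime.
        released = date.fromisoformat(m["release_date"])
        groups[key(released)].append(m["revenue"])
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Hypothetical toy rows, NOT values from the TMDB dataset.
movies = [
    {"release_date": "2015-06-17", "revenue": 400e6},  # Wednesday, June
    {"release_date": "2015-12-18", "revenue": 900e6},  # Friday, December
    {"release_date": "2015-03-06", "revenue": 50e6},   # Friday, March
]

by_month = mean_revenue_by(movies, key=lambda d: d.month)        # 1 = January
by_weekday = mean_revenue_by(movies, key=lambda d: d.weekday())  # 0 = Monday
by_year = mean_revenue_by(movies, key=lambda d: d.year)
print(by_month, by_weekday, by_year)
```

Numeric month and weekday keys are used rather than `strftime` names so the output does not depend on the system locale.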

In the next set of plots, we counted the number of entries each movie has for the following
variables: 1. Genres, 2. Production Companies, 3. Production Countries, 4. Spoken Languages and
5. Keywords, and created a correlation matrix of these counts with revenue.
We made the following observations:
1. The more genres a film has, the higher its median revenue. Similarly, the more production
companies a film has, up to six, the higher its revenue.
2. Beyond six production companies, the outcomes appear more erratic; the smaller sample
sizes for such films could account for this.
3. There does not appear to be a clear correlation between the number of production
countries and revenue. Likewise, there is no discernible trend in the number of spoken
languages.
4. Films with more keywords tend to have greater revenue.
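Each of these observations comes from reducing a list-valued variable to its length and then looking at revenue for each length. A hypothetical Python sketch of that step follows. It assumes each field has already been parsed into a Python list (or `None` when missing); the field name `genre` follows the report's variable list, and the toy rows are invented.

```python
from collections import defaultdict
from statistics import median


def median_revenue_by_count(movies, field):
    """Median revenue for each distinct number of entries in a list-valued field."""
    groups = defaultdict(list)
    for m in movies:
        # A missing value (None) is treated as an empty list, i.e. zero entries.
        groups[len(m[field] or [])].append(m["revenue"])
    # Sort by entry count so the trend reads from left to right.
    return dict(sorted((n, median(v)) for n, v in groups.items()))


# Hypothetical toy rows, NOT values from the TMDB dataset.
movies = [
    {"genre": ["Drama"], "revenue": 10e6},
    {"genre": ["Drama"], "revenue": 30e6},
    {"genre": ["Action", "Adventure"], "revenue": 90e6},
    {"genre": ["Action", "Adventure", "Science Fiction"], "revenue": 250e6},
    {"genre": None, "revenue": 1e6},
]

result = median_revenue_by_count(movies, "genre")
print(result)
```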

Data Processing
1. We converted the “belongs_to_collection” variable into a categorical variable with two
values, “Collection” and “No Collection”.
2. We extracted the main genre from the “genre” column to create a new variable named
“main genre”.
3. We extracted the first (main) id from the “production_companies” column to create a new
variable named “prod comp id”.
4. Next, we extracted the production company name to create a new variable named “top
prod comp”, categorizing companies with fewer than 60 movies as “others”.
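The four processing steps above can be sketched in Python. This is a hypothetical rendering, not the authors' R code, and several details are assumptions. `genre` is taken to be a parsed list of names. `production_companies` is taken to be a parsed list of `{"id", "name"}` dicts. "Main" is read as first-listed. Company film counts use only each movie's first-listed company. The snake_case output names stand in for “main genre”, “prod comp id” and “top prod comp”.

```python
from collections import Counter


def add_derived_features(movies, min_films=60):
    """Add the four derived variables to each movie dict (in place) and return the list."""
    # How many films list each company first (an assumed reading of "less than 60 movies").
    company_counts = Counter(
        m["production_companies"][0]["name"] for m in movies if m["production_companies"]
    )
    for m in movies:
        # Step 1: binary collection flag.
        m["collection"] = "Collection" if m["belongs_to_collection"] else "No Collection"
        # Step 2: the first-listed genre is treated as the main genre.
        m["main_genre"] = m["genre"][0] if m["genre"] else None
        # Step 3: id of the first-listed production company.
        first = m["production_companies"][0] if m["production_companies"] else None
        m["prod_comp_id"] = first["id"] if first else None
        # Step 4: keep frequent companies by name, lump the rest into "others".
        name = first["name"] if first else None
        m["top_prod_comp"] = name if name and company_counts[name] >= min_films else "others"
    return movies


# Hypothetical toy rows; min_films is lowered to 2 so the lumping is visible.
movies = add_derived_features([
    {"belongs_to_collection": {"name": "X Collection"}, "genre": ["Action", "Adventure"],
     "production_companies": [{"id": 1, "name": "Studio A"}]},
    {"belongs_to_collection": None, "genre": ["Drama"],
     "production_companies": [{"id": 1, "name": "Studio A"}, {"id": 2, "name": "Studio B"}]},
    {"belongs_to_collection": None, "genre": [],
     "production_companies": [{"id": 2, "name": "Studio B"}]},
    {"belongs_to_collection": None, "genre": ["Comedy"], "production_companies": []},
], min_films=2)
```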

Models

Support Vector Machine (SVM)
SVM is a supervised machine learning algorithm that can be used for both classification and
regression. Ours is a regression problem, so we used the regression form of SVM (epsilon support
vector regression) to predict the revenue earned. We used the e1071 package to build the support
vector machine.
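Two ingredients define the model reported below (type eps-regression, radial kernel). One is the epsilon-insensitive loss, under which prediction errors smaller than epsilon cost nothing. The other is the radial basis function (RBF) kernel, which measures similarity between films. The SVR objective trades this loss against model complexity, weighted by the cost parameter. A minimal Python sketch of both functions follows. The values of `gamma` and `epsilon` are illustrative; e1071's documented defaults are epsilon = 0.1 and gamma = 1/(number of features), and the report does not state which values were used.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube around y_true, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


print(rbf_kernel([1.0, 2.0], [1.0, 2.5], gamma=0.5))  # similar films -> close to 1
print(eps_insensitive_loss(1.0, 1.05))                # inside the tube -> 0.0
print(eps_insensitive_loss(1.0, 1.5))                 # outside the tube -> about 0.4
```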

Data Preparation
After removing all rows with null values from full_dataset, we split the data into training and
test sets. We then examined the variables to select those most important for predicting revenue.
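The preparation step (drop incomplete rows, then split) can be sketched in Python as follows. The report does not state its split ratio or random seed, so `test_fraction` and `seed` here are hypothetical choices, as are the toy rows. The fixed seed only makes the split reproducible.

```python
import random


def drop_missing(rows, columns):
    """Keep only rows in which none of the given columns is None."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a local RNG leaves the global state untouched
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy rows: every fifth movie has a missing budget.
rows = [
    {"id": i, "budget": None if i % 5 == 0 else i * 1e6, "revenue": i * 2e6}
    for i in range(1, 21)
]
clean = drop_missing(rows, ["budget", "revenue"])
train, test = train_test_split(clean, test_fraction=0.25, seed=0)
print(len(clean), len(train), len(test))
```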

About the model and test data
We fitted the model with the svm() function and generated predictions with the predict()
function. The fitted model is of SVM type eps-regression with a radial kernel and has 2441
support vectors in total.
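Once fitted, a kernel SVR predicts with a weighted sum of kernel evaluations against its support vectors plus an intercept, f(x) = Σᵢ cᵢ K(svᵢ, x) + b. So each prediction from the reported model involves 2441 kernel evaluations. The Python sketch below shows this with two made-up support vectors and coefficients. It ignores the input and output scaling that e1071 applies by default and does not reproduce e1071's internal sign conventions.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, coefs, intercept, gamma):
    """Kernel expansion over the support vectors: sum_i coef_i * K(sv_i, x) + intercept."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coefs)) + intercept


# Made-up support vectors and coefficients, for illustration only.
support_vectors = [[0.0], [1.0]]
coefs = [2.0, -1.0]
print(svr_predict([0.0], support_vectors, coefs, intercept=0.5, gamma=1.0))
```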

Next, we predicted revenue for the test data and plotted the results for visual comparison. The
plot shows that revenue is distributed unevenly across movies.

Accuracy
We calculated the MAE, MSE, RMSE and R-squared, along with accuracy. The accuracy turned
out to be about 97.2%.
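For reference, the four standard regression metrics named above can be computed in a few lines of Python. The report does not say how its "accuracy" figure was derived for a regression model, so this sketch covers only MAE, MSE, RMSE and R-squared. The toy vectors are invented.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)     # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical actual and predicted values, NOT results from the report.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

RMSE is in the same units as revenue, which makes it easier to interpret than MSE. R-squared compares the model against always predicting the mean.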

Comparison
In addition to the SVM, we used a random forest, since it also supports regression. The
summary of the random forest is given below. The forest has 501 trees, and its accuracy turned
out to be 89.56%. Comparing accuracy, MSE and MAE across the two models, the SVM is the
better one.

Prediction
Having created and trained the model, we used the SVM to predict the revenue of each movie in
the test data, identified by its movie id. We saved the predicted data to a CSV file; a glimpse of
the data is given below.
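Writing the predictions out is a simple id-to-value export. A Python sketch using the standard `csv` module follows. The column names `id` and `revenue`, the example ids and the two-decimal formatting are assumptions, since the report's actual output is not reproduced here. For a real file, pass `open("predictions.csv", "w", newline="")` instead of the in-memory buffer.

```python
import csv
import io


def write_predictions(ids, predictions, out):
    """Write one (movie id, predicted revenue) row per movie to a CSV stream."""
    writer = csv.writer(out)
    writer.writerow(["id", "revenue"])  # hypothetical column names
    for movie_id, pred in zip(ids, predictions):
        writer.writerow([movie_id, f"{pred:.2f}"])


# Hypothetical ids and predictions, written to an in-memory buffer.
buf = io.StringIO()
write_predictions([3001, 3002], [1.5e7, 2.25e6], buf)
print(buf.getvalue())
```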

Conclusion
In future, this dataset could also be used with machine learning algorithms to predict movie
ratings, along with revenues, from the cast and crew.
