Data Storytelling Final Project

The document outlines a data analysis project for Spectacular Studios, focusing on factors influencing movie attendance, such as genres, actors, directors, and reviews. It recommends producing films that align with the top features of the highest-rated and most profitable movies based on IMDb scores. Limitations of the analysis include data collection biases and the need for updated datasets to accurately reflect audience preferences and trends.

Build a Data Story

Final Project
By: Dorothy Kunth

As a data science consultant for a large production company, Spectacular Studios, my tasks are to define a problem compelling enough for the executive team to act on, and to develop the storyline of what the data can be expected to tell us from the analysis.

Dataset: movie metadata from The Movies Dataset, available on Kaggle
https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset
The factors that influence people's decision to see a movie
Executive Summary
The factors that influence people's decision to see a movie are genres, actors, directors, plot summary, word-of-mouth advertising, IMDb scores, professional critics' reviews, IMDb user reviews, popularity, languages and production countries.

We recommend that, over the next three years, Spectacular Studios consider producing movie projects that share the top 5 features of the top 250 movies based on IMDb score and of the top 250 movies based on estimated profit.
Overview of Analysis
What are the key factors that influence people's decision to see the movies in the top 250 based on IMDb score and in the top 250 based on estimated profit?

How do these factors affect the recommendation to produce highly popular and financially successful movies in the next three years?

We focused our analysis on drilling down into these workstreams:

1. Whether genres influence people to watch a movie.
2. Whether popularity influences people to watch a movie.
3. Whether original languages influence people to watch a movie.
4. Whether production countries influence people to watch a movie.
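The two top-250 sets that every workstream draws on could be built along the following lines. This is a minimal sketch, not the project's actual code: it assumes estimated profit is revenue minus budget (the slides do not define it), the `budget` field and both helper names are hypothetical, and the sample records are made up for illustration.

```python
from operator import itemgetter


def add_estimated_profit(movies):
    """Return copies of the records with an added 'estimated_profit' field.

    Assumption: estimated profit = revenue - budget (not stated in the slides).
    """
    return [{**m, "estimated_profit": m["revenue"] - m["budget"]} for m in movies]


def top_n(movies, field, n=250):
    """Return the n records with the largest value of `field`."""
    return sorted(movies, key=itemgetter(field), reverse=True)[:n]


# Made-up illustrative records, not rows from the dataset.
sample = [
    {"title": "A", "vote_average": 8.5, "revenue": 100, "budget": 60},
    {"title": "B", "vote_average": 7.0, "revenue": 900, "budget": 200},
    {"title": "C", "vote_average": 9.1, "revenue": 50, "budget": 40},
]

scored = add_estimated_profit(sample)
top_by_score = [m["title"] for m in top_n(scored, "vote_average", n=2)]
top_by_profit = [m["title"] for m in top_n(scored, "estimated_profit", n=2)]
print(top_by_score)   # ['C', 'A']
print(top_by_profit)  # ['B', 'A']
```

With the full dataset, the same two calls with n=250 would give the two groups compared throughout the analysis.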
Genre: The top 5 genres from the top 250 movies based on IMDb score
Drama on the top spot!

Drama takes the top spot, accounting for 8.4% of the top 250 movies based on IMDb score. As shown in the graph and the table below, the remaining 4 are cross-genre (hybrid) combinations of Drama and other genres.
Genre: The top 5 genres from the top 250 movies based on Estimated Profit (US$)

The top spot, with 7.2% of the top 250 movies based on estimated profit, goes to a hybrid genre of Action, Adventure and Science Fiction. As shown in the graph and table, the remaining 4 are also cross-genres.
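For scale, 8.4% of 250 movies is 21 movies and 7.2% is 18. Shares like these can be obtained by counting each distinct genre combination within a top-250 set. The sketch below assumes each movie's genres are already available as a list of names; the function name and the sample data are hypothetical.

```python
from collections import Counter


def top_genre_shares(genre_lists, k=5):
    """Return the k most common genre combinations with their share of all movies."""
    total = len(genre_lists)
    if total == 0:
        return []
    # A tuple keeps the combination hashable, so hybrids count as their own category.
    combos = Counter(tuple(genres) for genres in genre_lists)
    return [(combo, count / total) for combo, count in combos.most_common(k)]


# Made-up illustrative data: one genre list per movie.
sample = [
    ["Drama"],
    ["Drama"],
    ["Drama"],
    ["Drama", "Crime"],
    ["Drama", "Crime"],
    ["Action", "Adventure", "Science Fiction"],
]

shares = top_genre_shares(sample, k=2)
for combo, share in shares:
    print(", ".join(combo), f"{share:.1%}")
```

Counting combinations rather than individual genres is what makes a hybrid such as Action, Adventure and Science Fiction appear as a single entry in the top 5.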
Popularity: The top 5 popular movies from the top 250 movies based on IMDb score
Popularity: Comparison of Top 5 movies based on Popularity and IMDb Score

Figure panels: Top 5 movies based on Popularity; Top 5 movies based on IMDb Score
Popularity: The top 5 popular movies from the top 250 movies based on Estimated Profit (US$)
Among the top 250 movies based on estimated profit, Minions has the highest popularity score. Big Hero 6 ranks 4th in popularity in this group, whereas it is the most popular movie among the top 250 based on IMDb score.
Popularity: Comparison of Top 5 movies based on Popularity and Estimated Profit (US$)
Figure panels: Top 5 movies based on Popularity; Top 5 movies based on Estimated Profit
Original Language: The Top 5 Languages from the top 250 movies based on IMDb score

Of the 90 languages represented in the dataset, English-language movies are, as expected, the most watched and most liked, making up 84% of the top 250 movies. Japanese and Italian come a very distant second and third, respectively.
Original Language: The Top 4 Languages from the top 250 movies based on Estimated Profit (US$)

Among the top 250 movies based on estimated profit, English-language movies are the most financially successful.
Production Country: The Top 5 Countries from the top 250 movies based on IMDb score

Almost 55% of the top 250 movies based on IMDb score were produced in the US, followed by international co-productions between Great Britain and the US.
Production Country: The Top 5 Countries from the top 250 movies based on Estimated Profit (US$)

70% of the top 250 movies based on estimated profit were produced in the US. The next 4 are international co-productions between the US and Great Britain, New Zealand, Germany and China, respectively.
Summary: Top 5 features from the top 250 movies based on IMDb score

The features of the top 5 popular movies


Summary: Top 5 features from the top 250 movies based on Estimated Profit
(US$)

The features of the top 5 popular movies


Limitations and Biases
Data Collection:
1. An up-to-date dataset is vital, above all for the IMDb scores and popularity scores, which change constantly and are recalculated on a weekly basis.

2. IMDb's 83 million registered users do not fully represent the world's total moviegoing audience: not all moviegoers are registered IMDb users.

3. Other factors that affect people's decision to see a movie, namely professional critics' reviews, IMDb user reviews, actors, directors, plot summary and word-of-mouth advertising, are not present in the feature set. Factors such as plot summary and word-of-mouth advertising are also not possible to measure.
Limitations and Biases
Data Preprocessing:
1. The genres are stored as a stringified list of dictionaries that lists all the genres and hybrid genres of each movie, which has about 5-6 genres. During preprocessing, the genres were converted into a list of at most 3 genres.
2. The production countries are stored as a stringified list of the countries where each movie was produced. Some movies are international co-productions between 5-6 countries. During preprocessing, the production countries were converted into a list of at most 3 countries.
3. Less than 1% of values are missing in the following features: vote_average (6), vote_count (6), revenue (6), popularity (5), language (11) and production countries (3). Out of the 45466 records, missing values amount to well under 1% (the largest count, 11, is about 0.02%), so they were ignored.
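The conversion described in items 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the project's actual preprocessing code: it assumes each dictionary stores the label under a 'name' key, the helper name `parse_names` is hypothetical, and the sample string, including its id values, is made up.

```python
import ast


def parse_names(raw, limit=3):
    """Parse a stringified list of dicts and keep at most `limit` names.

    Assumption: each dict holds its label under the key 'name'.
    Missing or malformed values yield an empty list.
    """
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(items, list):
        return []
    names = [d["name"] for d in items if isinstance(d, dict) and "name" in d]
    return names[:limit]


# Made-up example in the "stringified list of dictionaries" shape.
raw_genres = (
    "[{'id': 1, 'name': 'Drama'}, {'id': 2, 'name': 'Crime'}, "
    "{'id': 3, 'name': 'Thriller'}, {'id': 4, 'name': 'Mystery'}]"
)
print(parse_names(raw_genres))  # ['Drama', 'Crime', 'Thriller']
print(parse_names(""))          # []
```

`ast.literal_eval` is used here rather than `eval` because it only accepts Python literals, which is the safer choice for strings read from a data file. Note that truncating to 3 entries keeps whichever names happen to come first in the stored list.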
Limitations and Biases
Insights 1: The popularity distribution of the top 250 movies based on IMDb score is right-skewed, with 8.8% high outliers. However, these outliers must be kept, because removing the observations would significantly affect the analysis.
Limitations and Biases
Insights 2: The popularity distribution of the top 250 movies based on estimated profit is right-skewed, with 8.0% high outliers. However, these outliers must be kept, because removing the observations would significantly affect the analysis.
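For scale, 8.8% of 250 movies is 22 movies and 8.0% is 20. The slides do not say how high outliers were identified; one common convention, assumed here, is the box-plot rule that flags values above Q3 + 1.5 × IQR. The function name and the sample values below are hypothetical.

```python
from statistics import quantiles


def high_outlier_share(values):
    """Share of values above the upper fence Q3 + 1.5 * IQR.

    Assumption: the box-plot (1.5 * IQR) rule; the slides do not name a method.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    upper_fence = q3 + 1.5 * (q3 - q1)
    return sum(v > upper_fence for v in values) / len(values)


# Made-up right-skewed sample: one very large value among small ones.
popularity = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(f"{high_outlier_share(popularity):.1%}")  # 10.0%
```

In this sample the quartiles are 2.75 and 8.25, so the upper fence is 16.5 and only the value 100 is flagged.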
Limitations and Biases

Insights 3
IMDb uses proprietary algorithms that take into account several measures of popularity, the primary one being what people are looking at on IMDb. IMDb records and sums pageviews, which form part of the foundation of the popularity rankings.

In the feature set, popularity is expressed not as a ranking but as a score, which we assumed to be the number of user visits and pageviews, expressed in millions.
Next Steps
1. Identify potential data sources for popularity ranking, professional critics' reviews, IMDb user review ratings, movie actors and directors.
2. Run a follow-up analysis on a more recent dataset, ideally one that is only weeks old.
3. Because the profitability of a film studio depends crucially on picking the right film projects, and box office revenue is highly concentrated in a small number of very successful films, the analysis suggests the following next steps:
● Consider movie projects in the genres or hybrid genres of Drama, Crime, Romance, Comedy, Adventure, Action, Science Fiction, Fantasy and Animation.
● Produce movie projects in the English language.
Thank you!
