IMDB Movie Analysis
Contents:
1. Project Description
2. Tech Stack Used
3. Approach
4. Insights
5. Results and Conclusion
Excel Tasks:
1. Movie Genre Analysis
2. Duration Analysis
3. Language Analysis
4. Director Analysis
5. Budget Analysis
Project Description:
The IMDb Movie Analysis project aims to explore and analyze a comprehensive
dataset of movies available on the IMDb platform.
This dataset contains essential information about movies, including director
names, movie titles, duration, genre, budget, gross earnings, IMDb ratings, and
more.
Through in-depth data analysis using Excel, data visualization, and statistical
techniques, this project seeks to identify the factors and trends that
contribute to a movie's success.
NOTE: Links to the cleaned dataset and the results dataset are provided at the
end of this document.
DATA HANDLING
My Approach:
I went through the dataset to understand every column it provides. It has a
total of 28 columns and 5043 rows, and it contains unwanted columns, null
values and blank rows, so I decided to clean it thoroughly before analysis.
1. First, I deleted the columns that have no relation to this project and do not
provide any valuable insight. That left 9 columns: director's name, duration,
movie title, genre, budget, gross, IMDB rating, language and country.
2. Then, I noticed that many rows had blank cells. To find them I clicked
“Find & Select”, then “Go To Special”, and selected the “Blanks” option, which
highlighted every blank cell. I then pressed “CTRL + -” and selected the
“Entire row” option, which deleted every row containing a blank cell.
3. Finally, I deleted the duplicate rows present in the dataset. This left a
total of 9 columns and 3786 rows.
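The cleaning itself was done in Excel. As a rough cross-check, the same two row-level steps (drop rows with blank cells, then drop duplicates) can be sketched in Python. This is only an illustration under stated assumptions: the column names and sample rows are hypothetical, and stripping whitespace before comparing is my choice, not something Excel does.

```python
# Hypothetical sample rows; the real sheet has 9 columns and thousands of rows.
rows = [
    {"movie_title": "Film A", "director_name": "Director A", "imdb_score": "7.9"},
    {"movie_title": "Film B", "director_name": "", "imdb_score": "6.1"},            # has a blank cell
    {"movie_title": "Film A", "director_name": "Director A", "imdb_score": "7.9"},  # exact duplicate
    {"movie_title": "Film C", "director_name": "Director C", "imdb_score": "7.7"},
]


def clean(rows):
    """Drop rows containing any blank cell, then drop exact duplicates (first one wins)."""
    seen = set()
    cleaned = []
    for row in rows:
        values = tuple(value.strip() for value in row.values())
        if any(value == "" for value in values):
            continue  # mirrors deleting the entire row of each blank cell
        if values in seen:
            continue  # mirrors removing duplicate rows
        seen.add(values)
        cleaned.append(row)
    return cleaned


cleaned = clean(rows)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

Comparing the row count produced this way with the count left in the spreadsheet is a quick way to confirm that no step was skipped.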
DATA ANALYSIS
1) Movie Genre Analysis:
Analyze the distribution of movie genres and their impact on the IMDB score.
Task: Determine the most common genres of movies in the dataset. Then,
for each genre, calculate descriptive statistics (mean, median, mode, range,
variance, standard deviation) of the IMDB scores.
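For readers who want to reproduce this task outside Excel, here is a minimal Python sketch. It assumes a genre cell may list several genres joined by "|" and counts the movie once under each of them; both the separator and the sample data are assumptions. It also uses the sample variance and standard deviation, which may differ from the Excel functions used in the workbook.

```python
from collections import defaultdict
import statistics

# Hypothetical (genres, imdb_score) pairs; "|" as separator is an assumption.
movies = [
    ("Drama|Comedy", 7.0),
    ("Drama", 8.0),
    ("Action|Thriller", 6.0),
    ("Drama|Thriller", 7.0),
]

scores_by_genre = defaultdict(list)
for genres, score in movies:
    for genre in genres.split("|"):
        scores_by_genre[genre].append(score)


def describe(scores):
    """Descriptive statistics for one genre's list of scores."""
    several = len(scores) > 1  # sample variance needs at least two values
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        "variance": statistics.variance(scores) if several else 0.0,
        "stdev": statistics.stdev(scores) if several else 0.0,
    }


summary = {genre: describe(scores) for genre, scores in scores_by_genre.items()}

# Most common genres first.
for genre, stats in sorted(summary.items(), key=lambda item: -item[1]["count"]):
    print(genre, stats)
```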
Results:
2) Movie Duration Analysis:
Analyze the distribution of movie durations and their impact on the IMDB score.
Task: Analyze the distribution of movie durations and identify the relationship
between movie duration and IMDB score.
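The relationship between duration and score can be summarised with a straight-line fit and its R^2, which is what a linear trendline on a scatter chart reports. The sketch below computes an ordinary least-squares fit by hand; the data points are hypothetical and the function name is mine.

```python
import statistics

# Hypothetical (duration in minutes, IMDB score) observations.
durations = [90, 100, 110, 120, 150]
scores = [6.0, 6.5, 6.2, 7.0, 7.8]


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(durations, scores)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.3f}")
```

A positive slope means longer movies tend to score higher, while R^2 gives the share of the variation in scores that the line accounts for.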
Results:
3) Movie Language Analysis:
Situation: Examine the distribution of movies based on their language.
Task: Determine the most common languages used in movies and analyze
their impact on the IMDB score using descriptive statistics.
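A small Python sketch of this task is shown below, with hypothetical data. It reports the number of movies next to each language's average score, since an average based on only one or two movies is much less reliable than one based on hundreds.

```python
from collections import defaultdict
import statistics

# Hypothetical (language, imdb_score) pairs.
movies = [
    ("English", 6.5),
    ("English", 7.5),
    ("French", 7.0),
    ("English", 7.0),
    ("Telugu", 8.4),
]

scores_by_language = defaultdict(list)
for language, score in movies:
    scores_by_language[language].append(score)

# One (language, movie count, mean score) row per language, most common first.
table = sorted(
    (
        (language, len(scores), statistics.fmean(scores))
        for language, scores in scores_by_language.items()
    ),
    key=lambda row: row[1],
    reverse=True,
)

for language, count, mean_score in table:
    print(f"{language:10s} movies={count:3d} mean={mean_score:.2f}")
```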
Results:
4) Movie Director Analysis:
Examine the influence of directors on movie ratings.
Task: Identify the top directors based on their average IMDB score and analyze
their contribution to the success of movies using percentile calculations.
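One way to apply a percentile here is to average the scores per director and keep the directors whose average is at or above a chosen percentile of all director averages. The sketch below uses the 90th percentile, which is an arbitrary choice for illustration, and hypothetical data. The "inclusive" method of statistics.quantiles interpolates between the sorted values and is intended to behave like Excel's PERCENTILE.INC, though I have not verified the two against the workbook.

```python
from collections import defaultdict
import statistics

# Hypothetical (director, imdb_score) pairs.
movies = [
    ("Director A", 8.5),
    ("Director A", 8.7),
    ("Director B", 6.0),
    ("Director C", 7.0),
    ("Director D", 5.5),
    ("Director E", 6.5),
]

scores_by_director = defaultdict(list)
for director, score in movies:
    scores_by_director[director].append(score)

average = {d: statistics.fmean(s) for d, s in scores_by_director.items()}

# quantiles(..., n=10) returns the 10th, 20th, ..., 90th percentile cut points.
cut_points = statistics.quantiles(average.values(), n=10, method="inclusive")
p90 = cut_points[-1]

# Directors at or above the 90th percentile, best average first.
top_directors = sorted(
    (d for d, avg in average.items() if avg >= p90),
    key=lambda d: average[d],
    reverse=True,
)
print(f"90th percentile of director averages: {p90:.2f}")
print(top_directors)
```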
Result:
5) Movie Budget Analysis:
Explore the relationship between movie budgets and their financial success.
Task: Analyze the correlation between movie budgets and gross earnings, and
identify the movies with the highest profit margin.
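Both parts of this task can be sketched in a few lines of Python. The example computes the Pearson correlation coefficient, the quantity Excel's CORREL function returns, and ranks movies by profit, taken here as gross minus budget. The titles and amounts are hypothetical, and treating "profit margin" as an absolute difference rather than a percentage is an assumption.

```python
import math

# Hypothetical (title, budget, gross) rows, amounts in millions.
movies = [
    ("Film A", 100.0, 300.0),
    ("Film B", 50.0, 40.0),
    ("Film C", 10.0, 90.0),
    ("Film D", 200.0, 450.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


budgets = [budget for _, budget, _ in movies]
grosses = [gross for _, _, gross in movies]
r = pearson(budgets, grosses)

# Profit = gross - budget; highest profit first.
by_profit = sorted(movies, key=lambda m: m[2] - m[1], reverse=True)

print(f"correlation(budget, gross) = {r:.3f}")
for title, budget, gross in by_profit:
    print(title, gross - budget)
```

If a relative margin is wanted instead, the sort key could be changed to (gross - budget) / gross, which favours low-budget movies with large returns.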
Results:
Correlation Graph:
Insights
The most common movie genres in the dataset are Drama, Comedy, Thriller and
Action.
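The average movie duration is 109 minutes. The trendline of duration against
IMDB score slopes upward with R^2 = 0.131, so longer movies tend to score
slightly higher, but duration explains only about 13% of the variation in
scores.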
The most common languages used in the movies are English, French, Spanish,
Mandarin and German. I also observed that Telugu and Persian have the highest
average IMDB scores.
The top 10 directors, each with an average IMDB score >= 8.4, are Tony Kaye,
Charles Chaplin, Alfred Hitchcock, Ron Fricke, Damien Chazelle, Majid Majidi,
Sergio Leone, Christopher Nolan, SS Rajamouli and Richard Marquand.
The top 5 movies by profit are Avatar, Jurassic World, Titanic, Star Wars:
Episode IV - A New Hope and E.T. the Extra-Terrestrial. The correlation
between budget and gross is positive.
Conclusion
The Cleaned Dataset:
https://docs.google.com/spreadsheets/d/1QZcrT5BZhKOTA9_pnpaorlPPRI7wW4BCzT_FyVd0YQY/edit?usp=sharing
The Results Dataset:
https://docs.google.com/spreadsheets/d/1X-ak_kajhbePb1_NzvtnA8kmErCSwUN9yEJqvIdsac0/edit?usp=sharing
Thank You