Project 5

The document outlines a project focused on analyzing IMDB movie data to identify factors influencing movie success, defined by high ratings. It includes data cleaning steps and various analytical tasks such as genre analysis, duration analysis, language analysis, director analysis, and budget analysis, each with specific methods and statistical calculations. The project aims to provide insights valuable for movie producers and investors to make informed decisions.

Uploaded by rachanareddy4a6
Copyright © All Rights Reserved

Project 5: IMDB Movie Analysis

DESCRIPTION:

Problem Statement: The dataset provided is related to IMDB movies. A potential problem to investigate could be: "What factors influence the success of a movie on IMDB?" Here, success can be defined by high IMDB ratings. The impact of this problem is significant for movie producers, directors, and investors who want to understand what makes a movie successful, so that they can make informed decisions in their future projects.

Before starting the analysis, we need to clean the data, because it contains blank cells, duplicate values, and data that is not useful for the analysis.

For data cleaning, we first delete the data that is not needed for the later analysis. Next, we select the remaining data and identify the blank cells: on the Home tab, choose Find & Select > Go To Special > Blanks, and then delete the entire row of each blank cell, because a row with a missing value would not be meaningful alongside the rest of the data. After that, we check for duplicates: on the Data tab, the Remove Duplicates function deletes duplicate rows from the data. The remaining data is now ready for the analysis.
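The same two cleaning steps can be cross-checked outside Excel. The following is a minimal Python sketch, not part of the original workflow: it assumes the sheet has been exported to CSV and loaded as a list of dicts (for example with csv.DictReader), and the column names in the example are hypothetical. Excel's Remove Duplicates lets you choose which columns to compare; this sketch compares every column.

```python
# Sketch of the two cleaning steps: drop rows with blank cells, then drop exact duplicates.
# Assumption: each row is a dict, e.g. as produced by csv.DictReader on a CSV export of the sheet.


def drop_rows_with_blanks(rows):
    """Keep only rows in which every cell is filled (the whole row goes if any cell is blank)."""
    return [
        r for r in rows
        if all(v is not None and str(v).strip() != "" for v in r.values())
    ]


def remove_duplicates(rows):
    """Keep the first occurrence of each identical row, comparing all columns."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def clean(rows):
    return remove_duplicates(drop_rows_with_blanks(rows))


if __name__ == "__main__":
    # Hypothetical rows; the column names are illustrative only.
    sample = [
        {"movie_title": "A", "imdb_score": "7.1"},
        {"movie_title": "B", "imdb_score": ""},     # blank cell: row dropped
        {"movie_title": "A", "imdb_score": "7.1"},  # exact duplicate: row dropped
        {"movie_title": "C", "imdb_score": "6.4"},
    ]
    print(clean(sample))
    # [{'movie_title': 'A', 'imdb_score': '7.1'}, {'movie_title': 'C', 'imdb_score': '6.4'}]
```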

Data Analytics Tasks:

You are required to provide a detailed report for the data record below, answering the questions that follow:

TASK A: Movie Genre Analysis: Analyze the distribution of movie genres and their impact on the IMDB score. Determine the most common genres of movies in the dataset. Then, for each genre, calculate descriptive statistics (mean, median, mode, range, variance, standard deviation) of the IMDB scores.

Soln: Before counting the number of movies in each genre, the genre column had to be restructured, because a single cell lists several genres separated by a delimiter. I used the Text to Columns function on the Data tab to split the genres at the delimiter, and then used the COUNTIF function to find the number of movies in each genre.
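The split-then-count idea can be sketched in Python as follows. This is an illustration, not the report's method: the report does not name the delimiter, so the "|" used here is an assumption (GENRE_DELIMITER is a hypothetical setting), and each genre is counted at most once per movie, which is what a per-genre COUNTIF over the split columns gives when a genre does not repeat within a row.

```python
from collections import Counter

# Assumption: genres share one cell separated by "|", e.g. "Action|Adventure".
# The report does not name the delimiter, so adjust GENRE_DELIMITER if needed.
GENRE_DELIMITER = "|"


def split_genres(cell, delimiter=GENRE_DELIMITER):
    """Text to Columns equivalent: split one cell into its individual genres."""
    return [g.strip() for g in cell.split(delimiter) if g.strip()]


def count_genres(genre_cells):
    """COUNTIF equivalent: number of movies that list each genre."""
    counts = Counter()
    for cell in genre_cells:
        for genre in dict.fromkeys(split_genres(cell)):  # count each genre once per movie
            counts[genre] += 1
    return counts


if __name__ == "__main__":
    cells = ["Action|Adventure|Drama", "Drama|Romance", "Comedy|Drama", "Action|Comedy"]
    counts = count_genres(cells)
    print(counts.most_common(1))  # [('Drama', 3)]
```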

After extracting the individual genres from the data, I found that Drama is the most common genre among the movies. I then calculated descriptive statistics of the IMDB scores using AVERAGE, MEDIAN, MODE, MAX, MIN, VAR, and STDEV (the range is MAX minus MIN).
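As a cross-check of the per-genre statistics, here is a small Python sketch using the standard statistics module (Python 3.8 or later assumed). It is not the report's workbook: the input format (a list of genres plus one score per movie) is an assumption. statistics.variance and statistics.stdev are sample statistics, like Excel's VAR and STDEV. One known difference: statistics.mode returns the first of several equally common values, whereas Excel's MODE returns #N/A when no value repeats.

```python
import statistics


def describe(scores):
    """Descriptive statistics for a list of IMDB scores, named after the Excel functions in the report."""
    return {
        "mean": statistics.mean(scores),          # AVERAGE
        "median": statistics.median(scores),      # MEDIAN
        "mode": statistics.mode(scores),          # MODE (first value on ties; Excel gives #N/A if nothing repeats)
        "min": min(scores),                       # MIN
        "max": max(scores),                       # MAX
        "range": max(scores) - min(scores),       # MAX - MIN
        "variance": statistics.variance(scores),  # VAR: sample variance
        "stdev": statistics.stdev(scores),        # STDEV: sample standard deviation
    }


def describe_by_genre(movies):
    """movies: (list_of_genres, imdb_score) pairs; a movie contributes to every genre it lists."""
    by_genre = {}
    for genres, score in movies:
        for genre in genres:
            by_genre.setdefault(genre, []).append(score)
    # Sample variance needs at least two scores, so single-movie genres are skipped.
    return {g: describe(s) for g, s in by_genre.items() if len(s) >= 2}


if __name__ == "__main__":
    # Hypothetical movies, not taken from the dataset.
    movies = [(["Drama"], 7.0), (["Drama", "Romance"], 8.0), (["Drama"], 7.0),
              (["Comedy"], 6.0), (["Comedy", "Drama"], 5.0)]
    for genre, stats in describe_by_genre(movies).items():
        print(genre, stats)
```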

TASK B: Movie Duration Analysis: Analyze the distribution of movie durations and its impact on the IMDB score. Analyze the distribution of movie durations and identify the relationship between movie duration and IMDB score.

Soln: I copied the data into a new sheet and calculated the average, median, and standard deviation of the movie durations. I then plotted a scatter plot of movie duration against the respective IMDB score and added a trendline to it.
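A minimal Python sketch of the same summary statistics, plus preparation of the (duration, score) pairs that go into the scatter plot. The input format is an assumption (durations in minutes, rows as (duration, score) pairs with None for a missing value); the standard deviation is the sample version, matching Excel's STDEV.

```python
import statistics


def duration_summary(durations):
    """Average, median and sample standard deviation of movie durations (minutes)."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "stdev": statistics.stdev(durations),  # sample SD, like Excel's STDEV
    }


def scatter_points(rows):
    """rows: (duration, imdb_score) pairs; drops pairs missing either value and sorts by duration."""
    return sorted((d, s) for d, s in rows if d is not None and s is not None)


if __name__ == "__main__":
    print(duration_summary([90, 100, 110]))  # {'mean': 100, 'median': 100, 'stdev': 10.0}
    print(scatter_points([(120, 7.0), (None, 6.0), (90, 6.5)]))  # [(90, 6.5), (120, 7.0)]
```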

The scatter plot shows that for movie durations between 80 and 150 minutes, the average IMDB score stays roughly the same, while the trendline shows slight exponential growth in score as duration increases.
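The exponential trendline has the form score ≈ c·e^(b·duration). A common way to fit it, and as far as we understand the way spreadsheet exponential trendlines are computed, is ordinary least squares on (duration, ln score). The Python sketch below implements that assumption. A fitted b slightly above 0 would correspond to the slight growth described, and b close to 0 to a flat trend. The example uses synthetic points on an exact curve, not the dataset.

```python
import math


def exponential_trendline(xs, ys):
    """Fit y = c * exp(b * x) by least squares on (x, ln y); returns (c, b).

    All y values must be positive (IMDB scores are). b > 0 means the fitted
    score rises with duration; b close to 0 means an almost flat trend.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxl / sxx               # slope of ln y against x
    a = mean_log - b * mean_x   # intercept of ln y
    return math.exp(a), b


if __name__ == "__main__":
    durations = [80, 100, 120, 150]
    scores = [2 * math.exp(0.01 * d) for d in durations]  # synthetic points on an exact curve
    c, b = exponential_trendline(durations, scores)
    print(round(c, 6), round(b, 6))  # 2.0 0.01
```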

TASK C: Language Analysis: Examine the distribution of movies based on their language. Determine the most common languages used in movies and analyze their impact on the IMDB score using descriptive statistics.

Soln: I used the COUNTIF function to calculate the number of movies in each language, with the formula =COUNTIF($B$2:$D$3884,""&L13&"") for each specific language. The observations: there are 38 languages in total, and 3,708 movies are in English. I then calculated the mean, median, and standard deviation of the IMDB scores.
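The per-language count and statistics can be sketched in Python as below. This is an illustration under stated assumptions, not the report's formula: rows are assumed to be (language, imdb_score) pairs. Because COUNTIF compares text case-insensitively, the sketch case-folds language names, so its keys come out in lower case (e.g. "english").

```python
import statistics
from collections import Counter


def language_counts(languages):
    """Movies per language, like one COUNTIF per language (case-insensitive, blanks skipped)."""
    return Counter(lang.strip().casefold() for lang in languages if lang and lang.strip())


def language_score_stats(rows, language):
    """rows: (language, imdb_score) pairs. Mean, median and sample SD of the scores for one language."""
    target = language.strip().casefold()
    scores = [s for lang, s in rows if lang and lang.strip().casefold() == target]
    if len(scores) < 2:
        raise ValueError("need at least two scores for a standard deviation")
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),  # sample SD, like Excel's STDEV
    }


if __name__ == "__main__":
    # Hypothetical rows, not taken from the dataset.
    rows = [("English", 7.0), ("english", 6.0), ("French", 8.0), ("English", 8.0)]
    print(language_counts(lang for lang, _ in rows).most_common(1))  # [('english', 3)]
    print(language_score_stats(rows, "English"))
```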

TASK D: Director Analysis: Influence of directors on movie ratings. Identify the top directors based on their average IMDB score and analyze their contribution to the success of movies using percentile calculations.

Soln: I built a pivot table of director names and the IMDB scores of their movies, summarized by average, and applied filters to find the directors with the highest average rating. I then used the LARGE function to find the highest average ratings, and the PERCENTRANK function to see how each director's rating ranks relative to the rest. The PERCENTILE function returns the value below which a given percentage of the data falls.
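For readers who want to check the pivot-table results, here is a Python sketch of the same calculations. The function names are hypothetical stand-ins for the Excel functions named in the report. percent_rank_inc uses count(values below x) / (n - 1), which matches PERCENTRANK (PERCENTRANK.INC) for a value that occurs in the data; Excel additionally limits the result to 3 digits by default. percentile_inc uses the inclusive linear interpolation of PERCENTILE (PERCENTILE.INC).

```python
import math
from collections import defaultdict


def director_averages(rows):
    """rows: (director, imdb_score) pairs -> {director: average score}, like a pivot table averaging scores."""
    totals = defaultdict(lambda: [0.0, 0])  # director -> [sum of scores, number of movies]
    for director, score in rows:
        totals[director][0] += score
        totals[director][1] += 1
    return {d: s / n for d, (s, n) in totals.items()}


def large(values, k):
    """LARGE(values, k): the k-th largest value (k = 1 is the maximum)."""
    return sorted(values, reverse=True)[k - 1]


def percent_rank_inc(values, x):
    """Share of the values lying strictly below x, scaled by n - 1; x must occur in values."""
    if x not in values or len(values) < 2:
        raise ValueError("x must be one of at least two values")
    below = sum(1 for v in values if v < x)
    return below / (len(values) - 1)


def percentile_inc(values, p):
    """Value below which fraction p of the data falls, with linear interpolation between ranks."""
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    s = sorted(values)
    h = (len(s) - 1) * p          # fractional position in the sorted list
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


if __name__ == "__main__":
    # Hypothetical directors and scores, not taken from the dataset.
    rows = [("Director A", 8.0), ("Director A", 7.0), ("Director B", 6.0), ("Director C", 9.0)]
    avgs = director_averages(rows)
    values = list(avgs.values())
    print(large(values, 1))                               # 9.0
    print(percent_rank_inc(values, avgs["Director A"]))   # 0.5
    print(percentile_inc(values, 0.9))
```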

TASK E: Budget Analysis: Explore the relationship between movie budgets and their financial success. Analyze the correlation between movie budgets and gross earnings and identify the movies with the highest profit margin.

Soln: To calculate the profit of each movie, we subtract the budget from the gross earnings (Profit = Gross - Budget). Strictly speaking, this gives the absolute profit; the profit margin is the profit divided by the gross earnings. To find the movie with the highest profit, we apply the MAX function to the resulting Profit values. We then compute the correlation between movie budgets and profits with the CORREL function, which shows a relatively weak positive linear relationship.
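A Python sketch of the budget calculations, offered as a cross-check rather than the report's workbook. Movies are assumed to be (title, budget, gross) tuples, and the example figures are hypothetical. correl computes the Pearson correlation coefficient, which is what Excel's CORREL returns. It can be applied to budget against gross earnings (as the task asks) or to budget against profit (as in the solution).

```python
import math


def profit(gross, budget):
    """Profit = gross earnings - budget."""
    return gross - budget


def profit_margin(gross, budget):
    """Profit as a fraction of gross earnings (undefined when gross is 0)."""
    return (gross - budget) / gross


def highest_profit(movies):
    """movies: (title, budget, gross) tuples -> (title, profit) of the most profitable movie, like MAX over Profit."""
    title, budget, gross = max(movies, key=lambda m: profit(m[2], m[1]))
    return title, profit(gross, budget)


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two pairs of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical (title, budget, gross) rows, not figures from the dataset.
    movies = [("A", 100, 250), ("B", 200, 180), ("C", 50, 160)]
    print(highest_profit(movies))  # ('A', 150)
    budgets = [b for _, b, _ in movies]
    print(correl(budgets, [g for _, _, g in movies]))               # budget vs gross
    print(correl(budgets, [profit(g, b) for _, b, g in movies]))   # budget vs profit
```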

Loom Video: https://drive.google.com/file/d/1gWec9iuXerOzKgWa4JDFUBMOZy-c67WQ/view?usp=drive_link
