IMDb+Movie+Assignment Stub

Uploaded by

kxhyccbq8w

# Filtering out the warnings

import warnings

warnings.filterwarnings('ignore')

# Importing the required libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

IMDb Movie Assignment

You have the data for the 100 top-rated movies from the past decade along with various
pieces of information about the movie, its actors, and the voters who have rated these
movies online. In this assignment, you will try to find some interesting insights into these
movies and their voters, using Python.

Task 1: Reading the data

• Subtask 1.1: Read the Movies Data.
Read the movies data file provided and store it in a dataframe movies.
# Read the csv file using 'read_csv'. Please write your dataset location here.

• Subtask 1.2: Inspect the Dataframe

Inspect the dataframe for dimensions, null-values, and summary of different numeric
columns.
# Check the number of rows and columns in the dataframe

# Check the column-wise info of the dataframe

# Check the summary for the numeric columns
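
As a minimal sketch of Task 1, the snippet below reads a CSV and runs the three inspection steps. The file location is not given in the assignment, so a small in-memory sample stands in for it; the sample rows and the `Title` column are invented for illustration only.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real movies file.
# Replace `csv_source` with the path to your own dataset.
csv_source = io.StringIO(
    "Title,budget,gross,IMDb_rating\n"
    "Movie A,100000000,250000000,8.1\n"
    "Movie B,50000000,30000000,7.9\n"
    "Movie C,,120000000,8.4\n"
)

movies = pd.read_csv(csv_source)

print(movies.shape)       # (number of rows, number of columns)
movies.info()             # dtype and non-null count for each column
print(movies.describe())  # summary statistics for the numeric columns
```

`info()` is the quickest way to spot null values here: the missing budget of the third row shows up as a lower non-null count for that column.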

Task 2: Data Analysis

Now that we have loaded the dataset and inspected it, we see that most of the data is in
place. As of now, no data cleaning is required, so let's start with some data manipulation,
analysis, and visualisation to get various insights about the data.
• Subtask 2.1: Reduce those Digits!
The numbers in the budget and gross columns are too large to read comfortably. Let's
convert the unit of the budget and gross columns from $ to million $ first.
# Divide the 'gross' and 'budget' columns by 1000000 to convert '$' to 'million $'
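
One possible way to do the conversion, shown on invented values rather than the real dataset:

```python
import pandas as pd

# Toy values in $, invented for illustration only.
movies = pd.DataFrame({
    "budget": [260_000_000, 30_000_000],
    "gross": [200_000_000, 90_000_000],
})

# Convert both columns from $ to million $ in one step.
movies[["budget", "gross"]] = movies[["budget", "gross"]] / 1_000_000

print(movies)
```

Run this cell only once: running it again would divide the already converted values by a million a second time.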

• Subtask 2.2: Let's Talk Profit!

a. Create a new column called profit which contains the difference of the two
columns: gross and budget.
b. Sort the dataframe using the profit column as reference.
c. Extract the top ten profiting movies in descending order and store them in a
new dataframe - top10.
d. Plot a scatter or a joint plot between the columns budget and profit and
write a few words on what you observed.
e. Extract the movies with a negative profit and store them in a new dataframe -
neg_profit
# Create the new column named 'profit' by subtracting the 'budget' column from the 'gross' column

# Sort the dataframe with the 'profit' column as reference using the 'sort_values' function.
# Make sure to set the argument 'ascending' to 'False'.

# Get the top 10 profitable movies by using position-based indexing. Specify the rows till 10 (0-9).
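
Steps a to c can be sketched as follows. The four rows and the `Title` column are made up for illustration; only `budget`, `gross`, `profit` and `top10` come from the assignment text.

```python
import pandas as pd

# Toy data in million $, invented for illustration only.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "budget": [100.0, 50.0, 200.0, 20.0],
    "gross": [250.0, 30.0, 210.0, 90.0],
})

# a. Profit is gross minus budget.
movies["profit"] = movies["gross"] - movies["budget"]

# b. Sort with the most profitable movie first.
movies = movies.sort_values(by="profit", ascending=False)

# c. Position-based indexing: rows 0 to 9 of the sorted frame.
#    With fewer than 10 rows, iloc simply returns what is available.
top10 = movies.iloc[:10]

print(top10[["Title", "profit"]])
```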

#Plot profit vs budget

The dataset contains the 100 best performing movies from the year 2010 to 2016.
However, the scatter plot tells a different story. You can see that some movies have a
negative profit. Although good movies do sometimes incur losses, there appear to be quite
a few movies with losses here. What could be the reason behind this? Let's take a closer
look by finding the movies with negative profit.
#Find the movies with negative profit
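
A boolean mask is one straightforward way to pull out the loss-making movies. The rows below are invented for illustration:

```python
import pandas as pd

# Toy profit values in million $, invented for illustration only.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "profit": [150.0, -20.0, 10.0, -5.5],
})

# Keep only the rows where profit is below zero.
neg_profit = movies[movies["profit"] < 0]

print(neg_profit)
```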

Checkpoint 1: Can you spot the movie Tangled in the dataset? You may be aware of the
movie 'Tangled'. Although it's one of the highest-grossing movies of all time, it has a negative
profit as per this result. If you cross-check the gross values of this movie (link:
https://www.imdb.com/title/tt0398286/), you can see that the gross in the dataset
accounts only for the domestic gross and not the worldwide gross. This is true for many
other movies in the list as well.

• Subtask 2.3: The General Audience and the Critics

You might have noticed the column MetaCritic in this dataset. Metacritic is a very popular
website where an average score is determined from the scores given by top-rated
critics. You also have another column, IMDb_rating, which tells you the IMDb
rating of a movie. This rating is determined by taking the average of hundreds of thousands of
ratings from the general audience.
As a part of this subtask, you are required to find the highest-rated movies that have
been liked by critics and audiences alike.
1. First, you will notice that the MetaCritic score is on a scale of 100 whereas the
IMDb_rating is on a scale of 10. Convert the MetaCritic column to a scale of
10.
2. Now, to find the movies that have been liked by both critics and audiences
and also have a high rating overall, you need to:
– Create a new column Avg_rating which will hold the average of the
MetaCritic and IMDb_rating columns.
– Retain only the movies in which the absolute difference (using the abs() function)
between the IMDb_rating and MetaCritic columns is less than 0.5. Refer
to this link to see how the abs() function works:
https://www.geeksforgeeks.org/abs-in-python/
– Sort these values in descending order of Avg_rating, retain only the
movies with an average rating of 8 or higher, and store these movies in a
new dataframe UniversalAcclaim.
# Change the scale of MetaCritic

# Find the average ratings

#Sort in descending order of average rating

# Find the movies with an absolute MetaCritic - IMDb_rating difference < 0.5 and also with an
# average rating of >= 8 (sorted in descending order)
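
The steps of this subtask could look roughly like the sketch below. The five rows and the `Title` column are invented; the pandas Series method `.abs()` is used here in place of the built-in `abs()`, which also works on a Series.

```python
import pandas as pd

# Toy ratings, invented for illustration only.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "MetaCritic": [85, 70, 95, 78, 82],
    "IMDb_rating": [8.3, 8.4, 8.0, 7.9, 8.0],
})

# 1. Bring MetaCritic from a 0-100 scale down to a 0-10 scale.
movies["MetaCritic"] = movies["MetaCritic"] / 10

# 2a. Average of the critic score and the audience score.
movies["Avg_rating"] = (movies["MetaCritic"] + movies["IMDb_rating"]) / 2

# 2b. Critics and audience agree when the two scores differ by less than 0.5.
agree = (movies["MetaCritic"] - movies["IMDb_rating"]).abs() < 0.5

# 2c. Keep agreed-upon movies with an average of 8 or more, best first.
UniversalAcclaim = (
    movies[agree & (movies["Avg_rating"] >= 8)]
    .sort_values(by="Avg_rating", ascending=False)
)

print(UniversalAcclaim)
```

Movie D shows why both filters are needed: critics and audience agree on it, but its average falls below 8.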

Checkpoint 2: Can you spot a Star Wars movie in your final dataset?

• Subtask 2.4: Find the Most Popular Trios - I

You're a producer looking to make a blockbuster movie. There will primarily be three lead
roles in your movie and you wish to cast the most popular actors for it. Since you
don't want to take a risk, you will cast a trio that has already acted together in a movie
before. The metric that you've chosen to check the popularity is the Facebook likes of each
of these actors.
The dataframe has three columns to help you with this, viz.
actor_1_facebook_likes, actor_2_facebook_likes, and
actor_3_facebook_likes. Your objective is to find the trios that have the highest number
of Facebook likes combined. That is, the sum of actor_1_facebook_likes,
actor_2_facebook_likes and actor_3_facebook_likes should be maximum. Find
the top 5 popular trios, and output their names in a list.
# Write your code here
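
A minimal sketch of one approach follows. The assignment only names the three likes columns; the name columns `actor_1_name`, `actor_2_name` and `actor_3_name`, the helper column `total_likes`, and all values are assumptions made for illustration.

```python
import pandas as pd

# Toy data; the *_name columns are assumed, not confirmed by the assignment.
movies = pd.DataFrame({
    "actor_1_name": ["P1", "Q1", "R1"],
    "actor_2_name": ["P2", "Q2", "R2"],
    "actor_3_name": ["P3", "Q3", "R3"],
    "actor_1_facebook_likes": [70000, 40000, 1000],
    "actor_2_facebook_likes": [40000, 39000, 900],
    "actor_3_facebook_likes": [50000, 2000, 800],
})

like_cols = [f"actor_{i}_facebook_likes" for i in (1, 2, 3)]
name_cols = [f"actor_{i}_name" for i in (1, 2, 3)]

# Combined likes of the three lead actors of each movie.
movies["total_likes"] = movies[like_cols].sum(axis=1)

# Names of the five trios with the most combined likes, as a list of lists.
top_trios = movies.nlargest(5, "total_likes")[name_cols].values.tolist()

print(top_trios)
```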

• Subtask 2.5: Find the Most Popular Trios - II

In the previous subtask you found the popular trio based on the total number of facebook
likes. Let's add a small condition to it and make sure that all three actors are popular. The
condition is none of the three actors' Facebook likes should be less than half of the
other two. For example, the following is a valid combo:
• actor_1_facebook_likes: 70000
• actor_2_facebook_likes: 40000
• actor_3_facebook_likes: 50000
But the below one is not:
• actor_1_facebook_likes: 70000
• actor_2_facebook_likes: 40000
• actor_3_facebook_likes: 30000
since in this case, actor_3_facebook_likes is 30000, which is less than half of
actor_1_facebook_likes.

Having this condition ensures that you aren't getting any unpopular actor in your trio,
since the total likes calculated in the previous subtask says nothing about the
individual popularity of each actor in the trio.
You can do a manual inspection of the top 5 popular trios you have found in the previous
subtask and check how many of those trios satisfy this condition. Also, which is the most
popular trio after applying the condition above? Write your answers in the markdown cell
provided below.
Write your answers below.
• No. of trios that satisfy the above condition: (your answer here)

• Most popular trio after applying the condition: (your answer here)
Optional: Even though you are finding this out by a manual inspection of the dataframe,
can you also achieve this in code, for example with some if-else statements? You can try
this out in your own time after you are done with the assignment.
# Your answer here (optional and not graded)
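
One way to express the condition in code is sketched below. It rests on the observation that no actor falls below half of another exactly when the smallest of the three values is at least half of the largest, so it uses `min` and `max` rather than a chain of if-else statements. The function name `is_balanced_trio` is hypothetical.

```python
import pandas as pd


def is_balanced_trio(likes_1, likes_2, likes_3):
    """Return True if no actor has fewer than half the likes of another."""
    likes = [likes_1, likes_2, likes_3]
    return min(likes) >= max(likes) / 2


# The two examples from the text above.
print(is_balanced_trio(70000, 40000, 50000))  # valid combo
print(is_balanced_trio(70000, 40000, 30000))  # invalid combo

# The same check applied to every row of a dataframe at once (toy data).
movies = pd.DataFrame({
    "actor_1_facebook_likes": [70000, 70000],
    "actor_2_facebook_likes": [40000, 40000],
    "actor_3_facebook_likes": [50000, 30000],
})
like_cols = list(movies.columns)
balanced = movies[like_cols].min(axis=1) >= movies[like_cols].max(axis=1) / 2
print(balanced.tolist())
```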

• Subtask 2.6: Runtime Analysis

There is a column named Runtime in the dataframe which shows the length of
the movie. It might be interesting to see how this variable is distributed. Plot a histogram
or a seaborn distplot to find the Runtime range most of the movies fall into.
# Runtime histogram/density plot

Checkpoint 3: Most of the movies appear to be close to 2 hours long.

• Subtask 2.7: R-Rated Movies

Although R-rated movies are restricted for the under-18 age group, there are still
vote counts from that age group. Among all the R-rated movies that have been voted on by the
under-18 age group, find the top 10 movies that have the highest number of votes,
i.e. CVotesU18, from the movies dataframe. Store these in a dataframe named PopularR.
# Write your code here
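
A rough sketch of the filter is given below. The assignment does not name the column that holds the certificate, so `content_rating` is an assumption, as are `Title` and all values; only `CVotesU18` and `PopularR` come from the text.

```python
import pandas as pd

# Toy data; `content_rating` is an assumed column name.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "content_rating": ["R", "PG-13", "R", "R"],
    "CVotesU18": [1200, 5000, 300, 2500],
})

# Keep R-rated movies, then take the 10 with the most under-18 votes.
PopularR = movies[movies["content_rating"] == "R"].nlargest(10, "CVotesU18")

print(PopularR[["Title", "CVotesU18"]])
```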

Checkpoint 4: Are these kids watching Deadpool a lot?

Task 3 : Demographic analysis

If you take a look at the last columns in the dataframe, most of these are related to
demographics of the voters (in the last subtask, i.e., 2.7, you made use of one of these columns
- CVotesU18). We also have three genre columns indicating the genres of a particular
movie. We will extensively use these columns for the third and final stage of our
assignment, wherein we will analyse the voters across all demographics and also see how
these vary across various genres. So without further ado, let's get started with
demographic analysis.

• Subtask 3.1 Combine the Dataframe by Genres

There are 3 columns in the dataframe - genre_1, genre_2, and genre_3. As a part of this
subtask, you need to aggregate a few values over these 3 columns.
1. First create a new dataframe df_by_genre that contains genre_1, genre_2, and
genre_3 and all the columns related to CVotes/Votes from the movies dataframe.
There are 47 columns to be extracted in total.
2. Now add a column called cnt to the dataframe df_by_genre and initialize it to
one. You will realise the use of this column by the end of this subtask.
3. Group the dataframe df_by_genre by genre_1, find the sum of all the
numeric columns (cnt and the CVotes- and Votes-related columns), and
store it in a dataframe df_by_g1.
4. Perform the same operation for genre_2 and genre_3 and store the results in dataframes
df_by_g2 and df_by_g3 respectively.
5. Now that you have 3 dataframes obtained by grouping over genre_1, genre_2,
and genre_3 separately, it's time to combine them. For this, add the three
dataframes and store the result in a new dataframe df_add, so that the corresponding
values of Votes/CVotes get added for each genre. There is a function called add() in
pandas which lets you do this. You can refer to this link to see how this function
works:
https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.add.html
6. The column cnt, on aggregation, has kept track of the number of
occurrences of each genre. Subset the genres that have at least 10 movies into a new
dataframe genre_top10 based on the cnt column value.
7. Now take the mean of all the numeric columns by dividing them by the column
cnt and store the result back in the same dataframe. We will be using this dataframe
for further analysis in this task unless it is explicitly mentioned to use the dataframe
movies.
8. Since the number of votes can't be a fraction, typecast all the CVotes-related
columns to integers. Also, round off all the Votes-related columns to two digits
after the decimal point.
# Create the dataframe df_by_genre

# Create a column cnt and initialize it to 1

# Group the movies by individual genres

# Add the grouped data frames and store it in a new data frame

# Extract genres with at least 10 occurrences

# Take the mean for every column by dividing with cnt

# Rounding off the columns of Votes to two decimals

# Converting CVotes to int type
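
The eight steps above can be sketched on a tiny invented dataset. Two columns, `CVotesM` and `VotesM`, stand in for the full set of CVotes/Votes columns, and the threshold is lowered from 10 to 2 (the hypothetical `min_movies`) so that the toy data survives the filter. Note the `fill_value=0` argument to `add()`: without it, a genre missing from any one of the three grouped frames would come out as NaN.

```python
import pandas as pd

# Toy data, invented for illustration only.
movies = pd.DataFrame({
    "genre_1": ["Drama", "Action", "Drama", "Action"],
    "genre_2": ["Romance", "Drama", None, "Sci-Fi"],
    "genre_3": [None, "Sci-Fi", None, None],
    "CVotesM": [100, 200, 300, 400],
    "VotesM": [8.0, 7.5, 8.2, 7.9],
})

# 1-2. Genre columns plus every CVotes*/Votes* column, and a counter column.
vote_cols = [c for c in movies.columns if c.startswith(("CVotes", "Votes"))]
df_by_genre = movies[["genre_1", "genre_2", "genre_3"] + vote_cols].copy()
df_by_genre["cnt"] = 1

# 3-4. Sum the numeric columns per genre, once for each genre column.
num_cols = vote_cols + ["cnt"]
df_by_g1 = df_by_genre.groupby("genre_1")[num_cols].sum()
df_by_g2 = df_by_genre.groupby("genre_2")[num_cols].sum()
df_by_g3 = df_by_genre.groupby("genre_3")[num_cols].sum()

# 5. Add the three frames, treating a missing genre as zero.
df_add = df_by_g1.add(df_by_g2, fill_value=0).add(df_by_g3, fill_value=0)

# 6. Keep genres with enough movies (the assignment uses 10).
min_movies = 2
genre_top10 = df_add[df_add["cnt"] >= min_movies].copy()

# 7. Turn the sums into per-movie means.
genre_top10[vote_cols] = genre_top10[vote_cols].div(genre_top10["cnt"], axis=0)

# 8. Ratings to two decimals, vote counts to integers (astype truncates).
cvote_cols = [c for c in vote_cols if c.startswith("CVotes")]
rating_cols = [c for c in vote_cols if c.startswith("Votes")]
genre_top10[rating_cols] = genre_top10[rating_cols].round(2)
genre_top10[cvote_cols] = genre_top10[cvote_cols].astype(int)

print(genre_top10)
```

Drama appears once under genre_2 and twice under genre_1, so its cnt is 3 and its sums are divided by 3; this is the purpose of the cnt column.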

If you take a look at the final dataframe that you have gotten, you will see that you now
have the complete information about all the demographic (Votes- and CVotes-related)
columns across the top 10 genres. We can use this dataset to extract exciting insights about
the voters!

• Subtask 3.2: Genre Counts!

Now let's derive some insights from this data frame. Make a bar chart plotting different
genres vs cnt using seaborn.
# Countplot for genres

Checkpoint 5: Is the bar for Drama the tallest?

• Subtask 3.3: Gender and Genre

If you have closely looked at the Votes- and CVotes-related columns, you might have
noticed the suffixes F and M indicating Female and Male. Since we have the vote counts for
both males and females, across various age groups, let's now see how the popularity of
genres vary between the two genders in the dataframe.
1. Make the first heatmap to see how the average number of votes of males varies
across the genres. Use seaborn heatmap for this analysis. The X-axis should contain
the four age groups for males, i.e., CVotesU18M, CVotes1829M, CVotes3044M, and
CVotes45AM. The Y-axis will have the genres, and the annotation in the heatmap tells
the average number of votes for that age group of males.

2. Make the second heatmap to see how the average number of votes of females
varies across the genres. Use seaborn heatmap for this analysis. The X-axis should
contain the four age groups for females, i.e., CVotesU18F, CVotes1829F,
CVotes3044F, and CVotes45AF. The Y-axis will have the genres, and the annotation
in the heatmap tells the average number of votes for that age group of females.

3. Make sure that you plot these heatmaps side by side using subplots so that you
can easily compare the two genders and derive insights.

4. Write any three inferences from this plot. You can also make use of the previous bar
plot here for better insights. Refer to this link:
https://seaborn.pydata.org/generated/seaborn.heatmap.html. You might have to
plot something similar to the fifth chart on this page (you have to plot two such
heatmaps side by side).

5. Repeat steps 1 to 4, but now instead of taking the CVotes-related columns, you
need to do the same process for the Votes-related columns. These heatmaps will
show you how the two genders have rated movies across various genres.
You might need the link below for formatting your heatmap.
https://stackoverflow.com/questions/56942670/matplotlib-seaborn-first-and-last-row-cut-in-half-of-heatmap-plot
• Note: Use the genre_top10 dataframe for this subtask
# 1st set of heat maps for CVotes-related columns
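
A minimal sketch of the side-by-side layout follows, using the column names listed in the steps above. The three-genre `genre_top10` frame and its values are invented; the colour map and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

male_cols = ["CVotesU18M", "CVotes1829M", "CVotes3044M", "CVotes45AM"]
female_cols = ["CVotesU18F", "CVotes1829F", "CVotes3044F", "CVotes45AF"]

# Toy averages per genre, invented for illustration only.
genre_top10 = pd.DataFrame(
    [[900, 60000, 50000, 9000, 300, 15000, 11000, 2000],
     [700, 55000, 48000, 9500, 350, 17000, 12000, 2500],
     [1100, 80000, 62000, 10000, 400, 19000, 13000, 2200]],
    index=["Action", "Drama", "Sci-Fi"],
    columns=male_cols + female_cols,
)

# One row, two columns: males on the left, females on the right.
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

sns.heatmap(genre_top10[male_cols], annot=True, fmt="d", cmap="coolwarm", ax=axes[0])
axes[0].set_title("Male voters")

sns.heatmap(genre_top10[female_cols], annot=True, fmt="d", cmap="coolwarm", ax=axes[1])
axes[1].set_title("Female voters")

fig.tight_layout()
```

`fmt="d"` prints the annotations as plain integers, which only works because the CVotes columns were typecast to int. For the Votes-related heatmaps a float format such as `fmt=".2f"` would be needed instead.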

Inferences: A few inferences that can be drawn from the heatmap above are that males have
voted more than females, and that Sci-Fi appears to be most popular among the 18-29 age
group irrespective of gender. What more can you infer from the two heatmaps that
you have plotted? Write your three inferences/observations below:
• Inference 1:
• Inference 2:
• Inference 3:
# 2nd set of heat maps for Votes-related columns

Inferences: Sci-Fi appears to be the highest rated genre in the age group of U18 for both
males and females. Also, females in this age group have rated it a bit higher than the males
in the same age group. What more can you infer from the two heatmaps that you have
plotted? Write your three inferences/observations below:
• Inference 1:
• Inference 2:
• Inference 3:

• Subtask 3.4: US vs non-US Cross Analysis

The dataset contains both the US and non-US movies. Let's analyse how both the US and the
non-US voters have responded to the US and the non-US movies.
1. Create a column IFUS in the dataframe movies. The column IFUS should contain
the value "USA" if the Country of the movie is "USA". For all other countries other
than the USA, IFUS should contain the value non-USA.

2. Now make a boxplot that shows how the number of votes from US voters, i.e.
CVotesUS, varies for the US and non-US movies. Make use of the column IFUS to
make this plot. Similarly, make another subplot that shows how non-US voters have
voted for the US and non-US movies by plotting CVotesnUS for both the US and
non-US movies. Write any two inferences/observations from these plots.

3. Do a similar analysis again, but with the ratings. Make a boxplot that shows how the
ratings from US voters, i.e. VotesUS, vary for the US and non-US movies.
Similarly, make another subplot that shows how VotesnUS varies for the US and
non-US movies. Write any two inferences/observations from these plots.
Note: Use the movies dataframe for this subtask. Make use of this documentation to format your
boxplot - https://seaborn.pydata.org/generated/seaborn.boxplot.html
# Creating IFUS column
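
One way to build the column is with `numpy.where`, sketched here on invented rows:

```python
import numpy as np
import pandas as pd

# Toy countries, invented for illustration only.
movies = pd.DataFrame({"Country": ["USA", "UK", "USA", "India"]})

# "USA" where the movie's country is the USA, "non-USA" everywhere else.
movies["IFUS"] = np.where(movies["Country"] == "USA", "USA", "non-USA")

print(movies)
```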

# Box plot - 1: CVotesUS(y) vs IFUS(x)
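
The two-panel boxplot could be laid out as in the sketch below. The vote counts are invented; the same pattern applies to VotesUS and VotesnUS for the ratings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy vote counts, invented for illustration only.
movies = pd.DataFrame({
    "IFUS": ["USA", "USA", "USA", "non-USA", "non-USA", "non-USA"],
    "CVotesUS": [90000, 120000, 60000, 30000, 45000, 20000],
    "CVotesnUS": [200000, 310000, 150000, 180000, 220000, 90000],
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: votes cast by US voters. Right: votes cast by non-US voters.
sns.boxplot(x="IFUS", y="CVotesUS", data=movies, ax=axes[0])
sns.boxplot(x="IFUS", y="CVotesnUS", data=movies, ax=axes[1])

fig.tight_layout()
```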

Inferences: Write your two inferences/observations below:

• Inference 1:
• Inference 2:
# Box plot - 2: VotesUS(y) vs IFUS(x)

Inferences: Write your two inferences/observations below:

• Inference 1:
• Inference 2:

• Subtask 3.5: Top 1000 Voters Vs Genres

You might have also observed the column CVotes1000. This column represents the top
1000 voters on IMDb and gives the count for the number of these voters who have voted
for a particular movie. Let's see how these top 1000 voters have voted across the genres.
1. Sort the dataframe genre_top10 based on the value of CVotes1000 in descending
order.

2. Make a seaborn barplot for genre vs CVotes1000.

3. Write your inferences. You can also try to relate it with the heatmaps you did in the
previous subtasks.
# Sorting by CVotes1000

# Bar plot
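
A short sketch of the sort followed by the bar plot is given below. The values are invented, and the index name `genre` is an assumption about how the grouped frame is labelled.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy averages, invented for illustration only.
genre_top10 = pd.DataFrame(
    {"CVotes1000": [510, 640, 430]},
    index=pd.Index(["Action", "Sci-Fi", "Romance"], name="genre"),
)

# 1. Highest CVotes1000 first.
genre_top10 = genre_top10.sort_values(by="CVotes1000", ascending=False)

# 2. Move the genre labels into a regular column for plotting.
plot_df = genre_top10.reset_index()

fig, ax = plt.subplots(figsize=(8, 4))
sns.barplot(data=plot_df, x="genre", y="CVotes1000", ax=ax)
```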

Inferences: Write your inferences/observations here.

Checkpoint 6: The genre Romance seems to be the least popular among the top 1000
voters.
With the above subtask, your assignment is over. In your free time, do explore the dataset
further on your own and see what kind of other insights you can get across various other
columns.
