Netflix Business Case Study - Data Exploration and Visualisation.. Sonam Meshram

The document outlines a business case study focused on analyzing Netflix data to generate insights for content production and business growth in various countries. It details the data exploration process, including data cleaning, handling null values, and merging datasets to create a comprehensive view of shows and movies. The final dataset includes information on titles, actors, directors, genres, countries, and other relevant attributes, ready for further analysis.

07/08/2023, 22:21 Netflix Business Case Study - Data Exploration and Visualisation..

Sonam Meshram - Colaboratory

AIM: Analyze the data and generate insights that could help Netflix decide which types of shows/movies to produce and how to grow the business in different countries.


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from google.colab import drive


drive.mount('/content/drive')

Mounted at /content/drive

df=pd.read_csv('/content/drive/MyDrive/Copy of d2beiqkhq929f0.cloudfront.net_public_assets_assets_000_000_940_original_netfli
df.head()

   show_id  type     title                 director         cast                                               country        date_added          ...
0  s1       Movie    Dick Johnson Is Dead  Kirsten Johnson  NaN                                                United States  September 25, 2021  ...
1  s2       TV Show  Blood & Water         NaN              Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...  South Africa   September 24, 2021  ...
2  s3       TV Show  Ganglands             Julien Leclercq  Sami Bouajila, Tracy...                            NaN            September ...       ...

#df=pd.read_csv('/content/drive/MyDrive/bq-results-20230623-060109-1687502380344/d2beiqkhq929f0.cloudfront.net_public_assets_

#df.head()

#length of data
len(df)

8807

#checking data types


df.dtypes

show_id object
type object
title object
director object
cast object
country object
date_added object
release_year int64
rating object
duration object
listed_in object
description object
dtype: object

#number of unique values in our data


for i in df.columns:
    print(i,':',df[i].nunique())

show_id : 8807
type : 2
title : 8807
director : 4528
cast : 7692
country : 748
date_added : 1767
release_year : 74

rating : 17
duration : 220
listed_in : 514
description : 8775

#checking null values in every column of our data


df.isnull().sum()/len(df)*100

show_id 0.000000
type 0.000000
title 0.000000
director 29.908028
cast 9.367549
country 9.435676
date_added 0.113546
release_year 0.000000
rating 0.045418
duration 0.034064
listed_in 0.000000
description 0.000000
dtype: float64

#checking the occurrences of each rating


df['rating'].value_counts()

TV-MA 3207
TV-14 2160
TV-PG 863
R 799
PG-13 490
TV-Y7 334
TV-Y 307
PG 287
TV-G 220
NR 80
G 41
TV-Y7-FV 6
NC-17 3
UR 3
74 min 1
84 min 1
66 min 1
Name: rating, dtype: int64

#unnesting the directors column, i.e creating separate lines for each director
constraint1=df['director'].apply(lambda x: str(x).split(', ')).tolist()
df_new1=pd.DataFrame(constraint1,index=df['title'])
df_new1=df_new1.stack()
df_new1=pd.DataFrame(df_new1.reset_index())
df_new1.rename(columns={0:'Directors'},inplace=True)
df_new1.drop(['level_1'],axis=1,inplace=True)
df_new1.head()

title Directors

0 Dick Johnson Is Dead Kirsten Johnson

1 Blood & Water nan

2 Ganglands Julien Leclercq

3 Jailbirds New Orleans nan

4 Kota Factory nan
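The same unnesting can also be sketched with `str.split` followed by `DataFrame.explode`. This is an alternative to the stack-based cell above, not the notebook's own method, and the toy frame below is made up for illustration (only the column names `title` and `director` come from the data).

```python
import pandas as pd

# Toy frame standing in for the Netflix data (values are illustrative only).
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X, Y", None, "Z"],
})

# Label missing values first, split on ', ', then give each list element its own row.
directors = (
    toy.assign(Directors=toy["director"].fillna("Unknown Director").str.split(", "))
       .explode("Directors")[["title", "Directors"]]
       .reset_index(drop=True)
)
print(directors)
```

Filling the missing value before splitting avoids the literal string 'nan' that `str(x)` produces, so no later replace step is needed.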

#unnesting the cast column, i.e creating separate lines for each cast member
constraint2=df['cast'].apply(lambda x: str(x).split(', ')).tolist()
df_new2=pd.DataFrame(constraint2,index=df['title'])
df_new2=df_new2.stack()
df_new2=pd.DataFrame(df_new2.reset_index())
df_new2.rename(columns={0:'Actors'},inplace=True)
df_new2.drop(['level_1'],axis=1,inplace=True)
df_new2.head()

title Actors

0 Dick Johnson Is Dead nan

1 Blood & Water Ama Qamata

2 Blood & Water Khosi Ngema

3 Blood & Water Gail Mabalane

4 Blood & Water Thabang Molaba

##unnesting the listed_in column, i.e- creating separate lines for each genre in a mo
constraint3=df['listed_in'].apply(lambda x: str(x).split(', ')).tolist()
df_new3=pd.DataFrame(constraint3,index=df['title'])
df_new3=df_new3.stack()
df_new3=pd.DataFrame(df_new3.reset_index())
df_new3.rename(columns={0:'Genre'},inplace=True)
df_new3.drop(['level_1'],axis=1,inplace=True)
df_new3.head()

title Genre

0 Dick Johnson Is Dead Documentaries

1 Blood & Water International TV Shows

2 Blood & Water TV Dramas

3 Blood & Water TV Mysteries

4 Ganglands Crime TV Shows

#unnesting the country column, i.e- creating separate lines for each country in a mo
constraint4=df['country'].apply(lambda x: str(x).split(', ')).tolist()
df_new4=pd.DataFrame(constraint4,index=df['title'])
df_new4=df_new4.stack()
df_new4=pd.DataFrame(df_new4.reset_index())
df_new4.rename(columns={0:'country'},inplace=True)
df_new4.drop(['level_1'],axis=1,inplace=True)
df_new4.head()

title country

0 Dick Johnson Is Dead United States

1 Blood & Water South Africa

2 Ganglands nan

3 Jailbirds New Orleans nan

4 Kota Factory India

#merging the unnested director data with unnested actors data


df_new5=df_new2.merge(df_new1,on=['title'],how='inner')
#merging the above merged data with unnested genre data
df_new6=df_new5.merge(df_new3,on=['title'],how='inner')
#merging the above merged data with unnested country data
df_new=df_new6.merge(df_new4,on=['title'],how='inner')

#replacing 'nan' strings of actor and director by Unknown Actor and Unknown Director
#(assigning the result back instead of calling replace with inplace=True on a column selection)
df_new['Actors']=df_new['Actors'].replace('nan','Unknown Actor')
df_new['Directors']=df_new['Directors'].replace('nan','Unknown Director')
df_new['country']=df_new['country'].replace('nan',np.nan)

df_new.head()

   title                 Actors         Directors         Genre                   country
0  Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States
1  Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa
2  Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa
3  Blood & Water         Ama Qamata     Unknown Director  TV Mysteries            South Africa
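Merging the unnested frames on title multiplies rows: a title with 2 actors and 3 genres ends up with 2 x 3 = 6 rows. A small sketch with made-up values shows the effect, and why the later counts use distinct titles (`nunique`) rather than row counts.

```python
import pandas as pd

# Illustrative frames: title A has 2 actors and 3 genres, title B has 1 of each.
actors = pd.DataFrame({"title": ["A", "A", "B"], "Actors": ["a1", "a2", "b1"]})
genres = pd.DataFrame({"title": ["A", "A", "A", "B"], "Genre": ["g1", "g2", "g3", "g1"]})

merged = actors.merge(genres, on="title", how="inner")

# Rows per title are the product of the per-title counts in each frame.
rows_per_title = merged.groupby("title").size()
print(rows_per_title)

# Counting rows would overstate genre g1; counting distinct titles does not.
titles_per_genre = merged.groupby("Genre")["title"].nunique()
print(titles_per_genre)
```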

#merging our unnested data with the original data


df_final=df_new.merge(df[['show_id', 'type', 'title', 'date_added',
'release_year', 'rating', 'duration']],on=['title'],how='left')
df_final.head()

   title                 Actors         Directors         Genre                   country        show_id  type     date_added          ...
0  Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States  s1       Movie    September 25, 2021  ...
1  Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
2  Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa   s2       TV Show  September 24, 2021  ...

#now checking nulls
df_final.isnull().sum()

title 0
Actors 0
Directors 0
Genre 0
country 11897
show_id 0
type 0
date_added 158
release_year 0
rating 67
duration 3
dtype: int64

In the duration column, the null rows turned out to have their runtime written in the rating column instead (values such as '74 min', which cannot be a rating). So the nulls in the duration column are replaced by the corresponding values from the rating column.

df_final.loc[df_final['duration'].isnull(),'duration']=df_final.loc[df_final['duration'].isnull(),'duration'].fillna(df_final
df_final.loc[df_final['rating'].str.contains('min', na=False),'rating']='NR'
df_final.isnull().sum()

title 0
Actors 0
Directors 0
Genre 0
country 11897
show_id 0
type 0
date_added 158
release_year 0
rating 67
duration 0
dtype: int64
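The fix described above can be sketched on a toy frame. The values are illustrative, and this is one possible way to write the step rather than the exact cell from the notebook.

```python
import pandas as pd

# Illustrative rows: the second one has its runtime stored in 'rating'.
toy = pd.DataFrame({
    "rating": ["TV-MA", "74 min", None],
    "duration": ["90 min", None, "2 Seasons"],
})

# Rows whose 'rating' holds a runtime such as '74 min'.
misplaced = toy["rating"].str.contains("min", na=False)

# Move the runtime into 'duration' only where duration is missing.
fix = misplaced & toy["duration"].isna()
toy.loc[fix, "duration"] = toy.loc[fix, "rating"]

# A runtime is not a rating, so mark those rows (and true nulls) as NR.
toy.loc[misplaced, "rating"] = "NR"
toy["rating"] = toy["rating"].fillna("NR")
print(toy)
```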

#Ratings can't be in min, so such values are set to NR (i.e. Not Rated)


df_final.loc[df_final['rating'].str.contains('min', na=False),'rating']='NR'
df_final['rating']=df_final['rating'].fillna('NR')
pd.set_option('display.max_rows',None)

#just an attempt to observe nulls in date_added column


df_final[df_final['date_added'].isnull()].head()

        title                                        Actors            Directors         Genre             country         show_id  type     ...
136893  A Young Doctor's Notebook and Other Stories  Daniel Radcliffe  Unknown Director  British TV Shows  United Kingdom  s6067    TV Show  ...
...

#the date_added column is imputed on the basis of release year


#(e.g. rows whose release year was 2013). The code below takes the mode of date_added within each
#release-year group and fills the nulls of that group with it
for i in df_final[df_final['date_added'].isnull()]['release_year'].unique():
    imp=df_final[df_final['release_year']==i]['date_added'].mode().values[0]
    df_final.loc[df_final['release_year']==i,'date_added']=df_final.loc[df_final['release_year']==i,'date_added'].fillna(imp)

df_final[df_final['date_added'].isnull()].head()

title Actors Directors Genre country show_id type date_added releas
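A loop-free sketch of the same idea builds a lookup of the most frequent date_added per release year and maps it onto the nulls. The toy values are made up. One difference from the loop above: a release year with no known date at all is simply left null here, whereas `.mode().values[0]` on an empty result would raise an IndexError.

```python
import pandas as pd

# Illustrative data: 1990 has no known date_added at all.
toy = pd.DataFrame({
    "release_year": [2013, 2013, 2013, 2015, 2015, 1990],
    "date_added": ["July 1, 2014", "July 1, 2014", None, "May 5, 2016", None, None],
})

# Most frequent known date per release year.
known = toy.dropna(subset=["date_added"])
mode_by_year = known.groupby("release_year")["date_added"].agg(lambda s: s.mode().iloc[0])

# Fill nulls from the lookup; years missing from the lookup stay null.
toy["date_added"] = toy["date_added"].fillna(toy["release_year"].map(mode_by_year))
print(toy)
```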

for i in df_final[df_final['country'].isnull()]['Directors'].unique():
    if i in df_final[~df_final['country'].isnull()]['Directors'].unique():
        imp=df_final[df_final['Directors']==i]['country'].mode().values[0]
        df_final.loc[df_final['Directors']==i,'country']=df_final.loc[df_final['Directors']==i,'country'].fillna(imp)


So the country column is first imputed on the basis of directors whose other titles have a country given. But some directors occur only once in the data. For those rows I have used Actors as the basis instead, i.e. in which country's titles does this actor mostly appear? For the rows that remain null after both steps, country is filled as Unknown Country.

for i in df_final[df_final['country'].isnull()]['Actors'].unique():
    if i in df_final[~df_final['country'].isnull()]['Actors'].unique():
        imp=df_final[df_final['Actors']==i]['country'].mode().values[0]
        df_final.loc[df_final['Actors']==i,'country']=df_final.loc[df_final['Actors']==i,'country'].fillna(imp)
#If there are still nulls, I just replace them by Unknown Country
df_final['country']=df_final['country'].fillna('Unknown Country')
df_final.isnull().sum()

title 0
Actors 0
Directors 0
Genre 0
country 0
show_id 0
type 0
date_added 0
release_year 0
rating 0
duration 0
dtype: int64
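The director-then-actor fallback can be sketched as one small helper applied twice. The helper name `fill_by_group_mode` and the toy values are hypothetical, chosen only to illustrate the order of the three steps.

```python
import pandas as pd

def fill_by_group_mode(frame, key, target):
    """Fill nulls in `target` with the most frequent known `target` value per `key`."""
    known = frame.dropna(subset=[target])
    mode_by_key = known.groupby(key)[target].agg(lambda s: s.mode().iloc[0])
    return frame[target].fillna(frame[key].map(mode_by_key))

# Illustrative rows: D1 has a known country, D2 and D3 do not.
toy = pd.DataFrame({
    "Directors": ["D1", "D1", "D2", "D3"],
    "Actors":    ["A1", "A2", "A1", "A9"],
    "country":   ["India", None, None, None],
})

toy["country"] = fill_by_group_mode(toy, "Directors", "country")  # row 1 filled via D1
toy["country"] = fill_by_group_mode(toy, "Actors", "country")     # row 2 filled via A1
toy["country"] = toy["country"].fillna("Unknown Country")         # row 3 has no match
print(toy)
```

Keys that never appear with a known country are absent from the lookup, so the explicit membership check used in the loops is not needed in this form.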

df_final.head()

   title                 Actors         Directors         Genre                   country        show_id  type     date_added          ...
0  Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States  s1       Movie    September 25, 2021  ...
1  Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
2  Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa   s2       TV Show  September 24, 2021  ...

df_final['duration'].value_counts()

253 min 21
15 min 20
167 min 20
233 min 18
237 min 18
49 min 16
37 min 16
43 min 16
312 min 15
12 min 14
31 min 13
191 min 13
230 min 12
41 min 11
19 min 8
273 min 7
34 min 6
17 min 5
39 min 5
10 min 4
16 min 4
196 min 4
20 min 4
18 min 4
3 min 4
5 min 3
11 min 2
8 min 2
9 min 2
Name: duration, dtype: int64

#removing mins from data


df_final['duration']=df_final['duration'].str.replace(" min","")
df_final.head()

   title                 Actors         Directors         Genre                   country        show_id  type     date_added          ...
0  Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States  s1       Movie    September 25, 2021  ...
1  Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
2  Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa   s2       TV Show  September 24, 2021  ...

df_final['duration'].unique()

array(['90', '2 Seasons', '1 Season', '91', '125', '9 Seasons', '104',
'127', '4 Seasons', '67', '94', '5 Seasons', '161', '61', '166',
'147', '103', '97', '106', '111', '3 Seasons', '110', '105', '96',
'124', '116', '98', '23', '115', '122', '99', '88', '100',
'6 Seasons', '102', '93', '95', '85', '83', '113', '13', '182',
'48', '145', '87', '92', '80', '117', '128', '119', '143', '114',
'118', '108', '63', '121', '142', '154', '120', '82', '109', '101',
'86', '229', '76', '89', '156', '112', '107', '129', '135', '136',
'165', '150', '133', '70', '84', '140', '78', '7 Seasons', '64',
'59', '139', '69', '148', '189', '141', '130', '138', '81', '132',
'10 Seasons', '123', '65', '68', '66', '62', '74', '131', '39',
'46', '38', '8 Seasons', '17 Seasons', '126', '155', '159', '137',
'12', '273', '36', '34', '77', '60', '49', '58', '72', '204',
'212', '25', '73', '29', '47', '32', '35', '71', '149', '33', '15',
'54', '224', '162', '37', '75', '79', '55', '158', '164', '173',
'181', '185', '21', '24', '51', '151', '42', '22', '134', '177',
'13 Seasons', '52', '14', '53', '8', '57', '28', '50', '9', '26',
'45', '171', '27', '44', '146', '20', '157', '17', '203', '41',
'30', '194', '15 Seasons', '233', '237', '230', '195', '253',
'152', '190', '160', '208', '180', '144', '5', '174', '170', '192',
'209', '187', '172', '16', '186', '11', '193', '176', '56', '169',
'40', '10', '3', '168', '312', '153', '214', '31', '163', '19',
'12 Seasons', '179', '11 Seasons', '43', '200', '196', '167',
'178', '228', '18', '205', '201', '191'], dtype=object)

df_final['duration_copy']=df_final['duration'].copy()
df_final1=df_final.copy()

df_final1.loc[df_final1['duration_copy'].str.contains('Season'),'duration_copy']=0
df_final1['duration_copy']=df_final1['duration_copy'].astype('int')
df_final1.head()

   title                 Actors         Directors         Genre                   country        show_id  type     date_added          ...
0  Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States  s1       Movie    September 25, 2021  ...
1  Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
2  Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa   s2       TV Show  September 24, 2021  ...

df_final1['duration_copy'].describe()

count 201991.000000
mean 77.152789
std 52.269154
min 0.000000
25% 0.000000
50% 95.000000
75% 112.000000
max 312.000000
Name: duration_copy, dtype: float64
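Setting every 'Season' value to 0 pulls the summary statistics down (note the 25% quantile of 0 above). An alternative sketch, assuming the raw duration strings such as '90 min' and '2 Seasons', keeps minutes and seasons in separate columns so TV shows do not count as zero-minute movies. The column names `minutes` and `seasons` are made up for this example.

```python
import pandas as pd

# Illustrative raw duration strings as they appear before ' min' is stripped.
toy = pd.DataFrame({"duration": ["90 min", "2 Seasons", "1 Season", "125 min"]})

# Capture the leading number and the unit as two columns (0 and 1).
parts = toy["duration"].str.extract(r"^(\d+)\s*(min|Seasons?)$")
value = pd.to_numeric(parts[0])
is_movie = parts[1].eq("min")

toy["minutes"] = value.where(is_movie)    # NaN for TV shows
toy["seasons"] = value.where(~is_movie)   # NaN for movies

# The mean now covers movies only, because NaN is skipped.
print(toy["minutes"].mean())
```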

#distplot is deprecated in seaborn; histplot with kde=True draws the histogram plus the density curve
sns.histplot(df_final1['duration_copy'], bins=36, kde=True, stat='density',
             color='darkblue', edgecolor='black',
             line_kws={'linewidth': 4})
plt.show()

df_final1['duration'].value_counts()

49 16
37 16
43 16
312 15
12 14
31 13
191 13
230 12
41 11
19 8
273 7
34 6
17 5
39 5
10 4
16 4
196 4
20 4
18 4
3 4
5 3
11 2
8 2
9 2
Name: duration, dtype: int64

from datetime import datetime


from dateutil.parser import parse
arr=[]
for i in df_final1['date_added'].values:
    dt1=parse(i)
    arr.append(dt1.strftime('%Y-%m-%d'))
df_final1['Modified_Added_date'] =arr
df_final1['Modified_Added_date']=pd.to_datetime(df_final1['Modified_Added_date'])
df_final1['month_added']=df_final1['Modified_Added_date'].dt.month
#Series.dt.week was removed in pandas 2.0; dt.isocalendar().week returns the same ISO week number
df_final1['week_Added']=df_final1['Modified_Added_date'].dt.isocalendar().week
df_final1['year']=df_final1['Modified_Added_date'].dt.year
df_final1.head()

   title                 Actors         Directors         Genre                   country        show_id  type     date_added          ...
0  Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States  s1       Movie    September 25, 2021  ...
1  Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
2  Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa   s2       TV Show  September 24, 2021  ...
3  Blood & Water         Ama Qamata     Unknown Director  TV Mysteries            South Africa   s2       TV Show  September 24, 2021  ...
4  Blood & Water         Khosi Ngema    Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
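The date features can also be derived without a Python loop. The sketch below assumes every date_added value follows the 'Month D, YYYY' pattern seen in the head output, possibly with stray surrounding spaces (an assumption, not something the output confirms); the sample dates are made up. It also shows a property of ISO weeks that matters when reading week-level counts: week 1 and week 53 can contain dates from the neighbouring calendar year.

```python
import pandas as pd

# Illustrative dates; the second one carries a leading space on purpose.
toy = pd.DataFrame({"date_added": ["September 25, 2021", " January 1, 2021", "December 31, 2019"]})

# Parse all values at once with an explicit format.
added = pd.to_datetime(toy["date_added"].str.strip(), format="%B %d, %Y")

toy["month_added"] = added.dt.month
toy["week_added"] = added.dt.isocalendar().week.astype(int)
toy["year"] = added.dt.year
print(toy)
```

Here January 1, 2021 falls in ISO week 53 (of 2020) and December 31, 2019 falls in ISO week 1 (of 2020).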

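#brackets and the content between them are removed from the titles
#(regex=True is required on pandas 2.0+, where str.replace treats the pattern literally by default)


df_final1['title']=df_final1['title'].str.replace(r"\(.*\)","",regex=True)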
df_final1.head()

   title                 Actors         Directors         Genre                   country        show_id  type     date_added          ...
0  Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States  s1       Movie    September 25, 2021  ...
1  Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
2  Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa   s2       TV Show  September 24, 2021  ...
3  Blood & Water         Ama Qamata     Unknown Director  TV Mysteries            South Africa   s2       TV Show  September 24, 2021  ...
4  Blood & Water         Khosi Ngema    Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
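The effect of the `regex` flag on `str.replace` is easy to check on a few made-up titles. With `regex=False` the pattern is searched as literal text and nothing changes. The second pattern below is a slightly stricter variant (an assumption about what is wanted): it removes each bracketed part separately instead of everything between the first '(' and the last ')'.

```python
import pandas as pd

# Hypothetical titles, used only to show the behaviour.
titles = pd.Series(["Dil Se (Hindi Version)", "Plain Title", "A (1) and B (2)"])

# Literal match: no title contains the text '\(.*\)', so nothing is replaced.
literal = titles.str.replace(r"\(.*\)", "", regex=False)

# Regex match: drop each '(...)' group together with the whitespace before it.
cleaned = titles.str.replace(r"\s*\([^)]*\)", "", regex=True).str.strip()
print(cleaned.tolist())
```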

Univariate Analysis in terms of counts of each column

#number of distinct titles on the basis of genre
df_final1.groupby(['Genre']).agg({"title":"nunique"}).sort_values(by=['title'],ascending=False)
TV Comedies 581

Thrillers 573

Crime TV Shows 470

Kids' TV 451

Docuseries 395

Music & Musicals 372

Romantic TV Shows 370

Horror Movies 353

Stand-Up Comedy 343

Reality TV 255

British TV Shows 253

Sci-Fi & Fantasy 243

Sports Movies 219

Anime Series 176

Spanish-Language TV Shows 174

TV Action & Adventure 168

Korean TV Shows 151

Classic Movies 116

LGBTQ Movies 102

TV Mysteries 98

Science & Nature TV 92

TV Sci-Fi & Fantasy 84

TV Horror 75

Anime Features 71

Cult Movies 71

Teen TV Shows 69

Faith & Spirituality 65

TV Thrillers 57

Movies 57
Stand-Up Comedy & Talk Shows 56

Classic & Cult TV 28

TV Shows 16

df_genre=df_final1.groupby(['Genre']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_genre[::-1]['Genre'], df_genre[::-1]['title'],color=['orange'])
plt.xlabel('Frequency of Genres')

plt.ylabel('Genres')
plt.show()

df_final1.groupby(['type']).agg({"title":"nunique"})

title

type

Movie 6115

TV Show 2676

df_type=df_final1.groupby(['type']).agg({"title":"nunique"}).reset_index()
plt.pie(df_type['title'],explode=(0.05,0.05), labels=df_type['type'],colors=['yellow','blue'],autopct='%.1f%%')
plt.show()

#number of distinct titles on the basis of country


df_final1.groupby(['country']).agg({"title":"nunique"})

Senegal 3

Serbia 7

Singapore 41

Slovakia 1

Slovenia 3

Somalia 1

South Africa 65

South Korea 235

Soviet Union 3

Spain 239

Sri Lanka 1

Sudan 1

Sweden 44

Switzerland 19

Syria 3

Taiwan 94

Thailand 74

Turkey 115

Uganda 1

Ukraine 3

United Arab Emirates 38

United Kingdom 829

United Kingdom, 2

United States 4245

United States, 1

Unknown Country 175

Uruguay 14

Vatican City 1

Venezuela 4

Vietnam 7

West Germany 5

Zimbabwe 3


The above dataframe shows a flaw: some country names carry a trailing comma, so entries such as "Cambodia" and "Cambodia,", or "United States" and "United States,", are shown as different countries. They should be treated as the same country.

df_final1['country'] = df_final1['country'].str.replace(',', '')


df_final1.head()

   title                 Actors         Directors         Genre                   country        show_id  type     date_added          ...
0  Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States  s1       Movie    September 25, 2021  ...
1  Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
2  Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa   s2       TV Show  September 24, 2021  ...
3  Blood & Water         Ama Qamata     Unknown Director  TV Mysteries            South Africa   s2       TV Show  September 24, 2021  ...
4  Blood & Water         Khosi Ngema    Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
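A variant of the comma clean-up uses `str.strip` with a set of characters, which removes commas and spaces only at the ends of each value and leaves anything inside the name untouched. The sample values are illustrative; the leading-space case is an assumption about what else might occur in the column.

```python
import pandas as pd

# Illustrative country values with trailing commas and a stray leading space.
countries = pd.Series(["United States", "United States,", "United Kingdom,", " Cambodia"])

# Strip any run of ',' or ' ' from both ends of each value.
cleaned = countries.str.strip(", ")
print(cleaned.value_counts())
```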

#number of distinct titles on the basis of country


df_final1.groupby(['country']).agg({"title":"nunique"})

Samoa 1

Saudi Arabia 14

Senegal 3

Serbia 7

Singapore 41

Slovakia 1

Slovenia 3

Somalia 1

South Africa 65

South Korea 235

Soviet Union 3

Spain 239

Sri Lanka 1

Sudan 1

Sweden 44

Switzerland 19

Syria 3

Taiwan 94

Thailand 74

Turkey 115

Uganda 1

Ukraine 3

United Arab Emirates 38

United Kingdom 831

United States 4246

Unknown Country 175

Uruguay 14

Vatican City 1

Venezuela 4

Vietnam 7

West Germany 5

Zimbabwe 3


df_country=df_final1.groupby(['country']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,10))
plt.barh(df_country[::-1]['country'], df_country[::-1]['title'],color=['blue'])
plt.xlabel('Titles by Countries')
plt.ylabel('Countries')
plt.show()


#number of distinct titles on the basis of rating


df_final1.groupby(['rating']).agg({"title":"nunique"})

title

rating

G 41

NC-17 3

NR 87

PG 287

PG-13 490

R 799

TV-14 2151

TV-G 220

TV-MA 3204

TV-PG 863

TV-Y 305

TV-Y7 334

TV-Y7-FV 6

UR 3

df_rating=df_final1.groupby(['rating']).agg({"title":"nunique"}).reset_index()
plt.figure(figsize=(15,8))
plt.barh(df_rating[::-1]['rating'], df_rating[::-1]['title'],color=['violet'])
plt.xlabel('Frequency by Ratings')
plt.ylabel('Ratings')
plt.show()


Most of the content on Netflix is rated TV-MA (intended for mature audiences), followed by TV-14 (not intended for audiences under 14), TV-PG (parental guidance suggested) and R.

#number of distinct titles on the basis of duration


df_final1.groupby(['duration']).agg({"title":"nunique"})

72 33

73 30

74 32

75 35

76 31

77 30

78 45

79 35

8 1

8 Seasons 17

80 43

81 62

82 52

83 65

84 68

85 73

86 103

87 101

88 116

89 106

9 1

9 Seasons 9

90 152

91 144

92 129

93 146

94 146

95 137

96 130

97 146

98 118

99 118


df_duration=df_final1.groupby(['duration']).agg({"title":"nunique"}).reset_index()
plt.figure(figsize=(15,8))
plt.barh(df_duration[::-1]['duration'], df_duration[::-1]['title'],color=['pink'])
plt.xlabel('Frequency by Duration')
plt.ylabel('Duration')
plt.show()


df_year=df_final1.groupby(['year']).agg({"title":"nunique"}).reset_index()
sns.lineplot(data=df_year, x='year', y='title')
plt.ylabel("Titles Added in the Year")
plt.xlabel("Year")
plt.show()

The amount of content added to Netflix increased continuously from 2008 until 2019, and then started to decrease (probably due to Covid).

#number of distinct titles on the basis of week


df_final1.groupby(['week_Added']).agg({"title":"nunique"})


title

week_Added

1 372

2 108

3 113

4 88

5 208

6 97

7 147

8 110

9 254

10 135

11 163

12 109

13 250

14 173

15 153

16 160

17 154

18 234

19 116

20 130

21 117

22 206

23 151

24 164

25 143

26 269

27 241

28 131
29 140

30 160

31 269

32 118

33 153

34 139

35 265

36 142

37 183

38 139

39 165

40 287

41 116

42 133

43 116

44 318

45 98

46 134

47 120

48 200

49 140

50 189

51 137

52 132

53 104

df_week=df_final1.groupby(['week_Added']).agg({"title":"nunique"}).reset_index()
plt.figure(figsize=(15,8))
sns.lineplot(data=df_week, x='week_Added', y='title')
plt.ylabel("Titles Added in the Week")
plt.xlabel("Week No.")
plt.show()

Most of the content on Netflix is added in the first week of the year, and the weekly counts follow a somewhat cyclical pattern.

df_month=df_final1.groupby(['month_added']).agg({"title":"nunique"}).reset_index()
sns.lineplot(data=df_month, x='month_added', y='title')
plt.ylabel("Titles Added in the Month")
plt.xlabel("Month")
plt.show()


Most of the content on Netflix is added in the first and last months of the year (reinforcing what we observed for the first week in the plot above).

Univariate Analysis separately for shows and movies

df_shows=df_final1[df_final1['type']=='TV Show']
df_movies=df_final1[df_final1['type']=='Movie']

df_genre=df_shows.groupby(['Genre']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_genre[::-1]['Genre'], df_genre[::-1]['title'],color=['orange'])
plt.xlabel('Frequency of Genres')
plt.ylabel('Genres')
plt.show()


International TV Shows, Dramas and Comedies are the most common genres among TV shows on Netflix.

df_genre=df_movies.groupby(['Genre']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_genre[::-1]['Genre'], df_genre[::-1]['title'],color=['orange'])
plt.xlabel('Frequency of Genres')
plt.ylabel('Genres')
plt.show()


International Movies, Dramas and Comedies are the most common genres among movies on Netflix, followed by Documentaries.

df_country=df_shows.groupby(['country']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_country[::-1]['country'], df_country[::-1]['title'],color=['blue'])
plt.xlabel('Titles by Countries')
plt.ylabel('Countries')
plt.show()


df_country=df_movies.groupby(['country']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_country[::-1]['country'], df_country[::-1]['title'],color=['blue'])
plt.xlabel('Titles by Countries')
plt.ylabel('Countries')
plt.show()
