Netflix Business Case Study - Data Exploration and Visualisation.. Sonam Meshram
AIM: Analyze the data and generate insights that could help Netflix decide which types of shows/movies to produce and how to grow
the business in different countries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
Mounted at /content/drive
df=pd.read_csv('/content/drive/MyDrive/Copy of d2beiqkhq929f0.cloudfront.net_public_assets_assets_000_000_940_original_netfli
df.head()
  show_id  type     title                 director         cast                                               country        date_added          ...
0 s1       Movie    Dick Johnson Is Dead  Kirsten Johnson  NaN                                                United States  September 25, 2021  ...
1 s2       TV Show  Blood & Water         NaN              Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...  South Africa   September 24, 2021  ...
2 s3       TV Show  Ganglands             Julien ...       Sami Bouajila, Tracy ...                           NaN            September ...       ...
#df=pd.read_csv('/content/drive/MyDrive/bq-results-20230623-060109-1687502380344/d2beiqkhq929f0.cloudfront.net_public_assets_
#df.head()
#length of data
len(df)
8807
show_id object
type object
title object
director object
cast object
country object
date_added object
release_year int64
rating object
duration object
listed_in object
description object
dtype: object
show_id : 8807
type : 2
title : 8807
director : 4528
cast : 7692
country : 748
date_added : 1767
release_year : 74
https://colab.research.google.com/drive/1HNkyYVTjGFxRFACo3l8GeBdJi5PRKsPT#scrollTo=Udi85PGM5G0g&printMode=true 1/27
07/08/2023, 22:21 Netflix Business Case Study - Data Exploration and Visualisation.. Sonam Meshram - Colaboratory
rating : 17
duration : 220
listed_in : 514
description : 8775
show_id 0.000000
type 0.000000
title 0.000000
director 29.908028
cast 9.367549
country 9.435676
date_added 0.113546
release_year 0.000000
rating 0.045418
duration 0.034064
listed_in 0.000000
description 0.000000
dtype: float64
TV-MA 3207
TV-14 2160
TV-PG 863
R 799
PG-13 490
TV-Y7 334
TV-Y 307
PG 287
TV-G 220
NR 80
G 41
TV-Y7-FV 6
NC-17 3
UR 3
74 min 1
84 min 1
66 min 1
Name: rating, dtype: int64
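The summaries above (column dtypes, distinct counts, percentage of nulls and rating frequencies) appear without the cells that produced them. A minimal sketch of how such summaries are typically computed in pandas, using a tiny made-up frame named `sample` (hypothetical, not the Netflix data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Netflix frame
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, "Y", np.nan],
    "rating": ["TV-MA", "TV-14", "TV-MA", "74 min"],
})

dtypes = sample.dtypes                            # data type of each column
distinct = sample.nunique()                       # distinct non-null values per column
null_pct = sample.isnull().mean() * 100           # percentage of nulls per column
rating_counts = sample["rating"].value_counts()   # frequency of each rating value

print(dtypes)
print(distinct)
print(null_pct)
print(rating_counts)
```

Counting values this way is also what surfaces oddities such as durations ("74 min") sitting in the rating column.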
#unnesting the directors column, i.e creating separate lines for each director
constraint1=df['director'].apply(lambda x: str(x).split(', ')).tolist()
df_new1=pd.DataFrame(constraint1,index=df['title'])
df_new1=df_new1.stack()
df_new1=pd.DataFrame(df_new1.reset_index())
df_new1.rename(columns={0:'Directors'},inplace=True)
df_new1.drop(['level_1'],axis=1,inplace=True)
df_new1.head()
title Directors
#unnesting the cast column, i.e creating separate lines for each cast member
constraint2=df['cast'].apply(lambda x: str(x).split(', ')).tolist()
df_new2=pd.DataFrame(constraint2,index=df['title'])
df_new2=df_new2.stack()
df_new2=pd.DataFrame(df_new2.reset_index())
df_new2.rename(columns={0:'Actors'},inplace=True)
df_new2.drop(['level_1'],axis=1,inplace=True)
df_new2.head()
  title                 Actors
0 Dick Johnson Is Dead  nan
1 Blood & Water         Ama Qamata
2 Blood & Water         Khosi Ngema
3 Blood & Water         Gail Mabalane
4 Blood & Water         Thabang Molaba
#unnesting the listed_in column, i.e- creating separate lines for each genre in a mo
constraint3=df['listed_in'].apply(lambda x: str(x).split(', ')).tolist()
df_new3=pd.DataFrame(constraint3,index=df['title'])
df_new3=df_new3.stack()
df_new3=pd.DataFrame(df_new3.reset_index())
df_new3.rename(columns={0:'Genre'},inplace=True)
df_new3.drop(['level_1'],axis=1,inplace=True)
df_new3.head()
title Genre
#unnesting the country column, i.e- creating separate lines for each country in a mo
constraint4=df['country'].apply(lambda x: str(x).split(', ')).tolist()
df_new4=pd.DataFrame(constraint4,index=df['title'])
df_new4=df_new4.stack()
df_new4=pd.DataFrame(df_new4.reset_index())
df_new4.rename(columns={0:'country'},inplace=True)
df_new4.drop(['level_1'],axis=1,inplace=True)
df_new4.head()
title country
2 Ganglands nan
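The cell that combines the unnested frames into `df_new` is not shown in this export. A minimal sketch of one way to get one row per title, actor and genre combination, assuming the frames are merged on `title`. The toy frame `raw` and the helper `unnest` are hypothetical, and `str.split` with `explode` is swapped in here for the `stack`-based approach used above; missing values are filled with "Unknown" instead of the string 'nan'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with comma-separated columns
raw = pd.DataFrame({
    "title": ["Show A", "Show B"],
    "cast": ["Actor 1, Actor 2", np.nan],
    "listed_in": ["Genre X, Genre Y", "Genre X"],
})

def unnest(frame, column, new_name):
    """Return one row per (title, item) for a comma-separated column."""
    out = frame[["title", column]].copy()
    out[column] = out[column].fillna("Unknown").str.split(", ")
    out = out.explode(column).rename(columns={column: new_name})
    return out.reset_index(drop=True)

actors = unnest(raw, "cast", "Actors")
genres = unnest(raw, "listed_in", "Genre")

# Merging on title yields every actor/genre pairing for each title
combined = actors.merge(genres, on="title", how="inner")
print(combined)
```

Because every actor is paired with every genre of the same title, the row count grows multiplicatively, which is why later counts use distinct titles rather than rows.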
#replacing 'nan' strings: Unknown Actor / Unknown Director for actors and directors, a real NaN for country
df_new['Actors']=df_new['Actors'].replace('nan','Unknown Actor')
df_new['Directors']=df_new['Directors'].replace('nan','Unknown Director')
df_new['country']=df_new['country'].replace('nan',np.nan)
df_new.head()
  title          Actors      Directors         Genre                   country
1 Blood & Water  Ama Qamata  Unknown Director  International TV Shows  South Africa
2 Blood & Water  Ama Qamata  Unknown Director  TV Dramas               South Africa
3 Blood & Water  Ama Qamata  Unknown Director  TV Mysteries            South Africa
  title                 Actors         Directors         Genre                   country        show_id  type     date_added          ...
0 Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries           United States  s1       Movie    September 25, 2021  ...
1 Blood & Water         Ama Qamata     Unknown Director  International TV Shows  South Africa   s2       TV Show  September 24, 2021  ...
2 Blood & Water         Ama Qamata     Unknown Director  TV Dramas               South Africa   s2       TV Show  September 24, 2021  ...
#now checking nulls
df_final.isnull().sum()
title 0
Actors 0
Directors 0
Genre 0
country 11897
show_id 0
type 0
date_added 158
release_year 0
rating 67
duration 3
dtype: int64
In the duration column, the three null rows turned out to have their duration written in the rating column instead (the values '74 min', '84 min' and
'66 min' seen earlier; a rating cannot be expressed in minutes). So the duration nulls are filled from the corresponding rating values, and those ratings are then set to 'NR'.
df_final.loc[df_final['duration'].isnull(),'duration']=df_final.loc[df_final['duration'].isnull(),'duration'].fillna(df_final
df_final.loc[df_final['rating'].str.contains('min', na=False),'rating']='NR'
df_final.isnull().sum()
title 0
Actors 0
Directors 0
Genre 0
country 11897
show_id 0
type 0
date_added 158
release_year 0
rating 67
duration 0
dtype: int64
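The assignment above is cut off in this export. A minimal sketch of the repair it describes, assuming the misplaced values are exactly the ratings that contain 'min'; the toy frame `fix` is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second row has its duration in the rating column
fix = pd.DataFrame({
    "rating": ["TV-MA", "74 min", "R"],
    "duration": ["2 Seasons", np.nan, "90 min"],
})

# Rows where duration is missing and the rating actually holds a duration
misplaced = fix["duration"].isnull() & fix["rating"].str.contains("min", na=False)

# Move the value into duration, then mark the rating as not rated
fix.loc[misplaced, "duration"] = fix.loc[misplaced, "rating"]
fix.loc[misplaced, "rating"] = "NR"
print(fix)
```

Building the mask once and reusing it keeps the two assignments consistent with each other.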
df_final[df_final['date_added'].isnull()].head()
       title                                        Actors            Directors         Genre             country         show_id  type     ...
136893 A Young Doctor's Notebook and Other Stories  Daniel Radcliffe  Unknown Director  British TV Shows  United Kingdom  s6067    TV Show  ...
for i in df_final[df_final['country'].isnull()]['Directors'].unique():
    if i in df_final[~df_final['country'].isnull()]['Directors'].unique():
        imp=df_final[df_final['Directors']==i]['country'].mode().values[0]
        df_final.loc[df_final['Directors']==i,'country']=df_final.loc[df_final['Directors']==i,'country'].fillna(imp)
So we imputed the country column on the basis of directors whose other titles have a known country. But some directors
occur only once in our data. In that scenario, Actors are used as the basis instead: the missing country is filled with the country in which that actor
most frequently appears. For any remaining rows, country is filled as 'Unknown Country'.
for i in df_final[df_final['country'].isnull()]['Actors'].unique():
    if i in df_final[~df_final['country'].isnull()]['Actors'].unique():
        imp=df_final[df_final['Actors']==i]['country'].mode().values[0]
        df_final.loc[df_final['Actors']==i,'country']=df_final.loc[df_final['Actors']==i,'country'].fillna(imp)
#If there are still nulls, I just replace it by Unknown Country
df_final['country'].fillna('Unknown Country',inplace=True)
df_final.isnull().sum()
title 0
Actors 0
Directors 0
Genre 0
country 0
show_id 0
type 0
date_added 0
release_year 0
rating 0
duration 0
dtype: int64
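The director-based and actor-based imputation loops above can also be written without a Python loop. A minimal sketch, assuming the goal is "fill a missing country with the most frequent known country for the same director"; the toy frame `toy` and the name `mode_by_director` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: D1 has a known country elsewhere, D2 does not
toy = pd.DataFrame({
    "Directors": ["D1", "D1", "D1", "D2"],
    "country": ["India", "India", np.nan, np.nan],
})

# Most frequent known country per director (only directors with a known country)
mode_by_director = (
    toy.dropna(subset=["country"])
       .groupby("Directors")["country"]
       .agg(lambda s: s.mode().iloc[0])
)

# Fill missing countries from the lookup; directors without one stay NaN
toy["country"] = toy["country"].fillna(toy["Directors"].map(mode_by_director))

# Anything still missing gets the placeholder used in the text
toy["country"] = toy["country"].fillna("Unknown Country")
print(toy)
```

The same pattern could be repeated with an actor column before the final placeholder fill. On a frame with hundreds of thousands of rows, a single grouped lookup tends to be much faster than filtering the frame once per director.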
df_final.head()
  title                 Actors         Directors        Genre          country        show_id  type   date_added          ...
0 Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson  Documentaries  United States  s1       Movie  September 25, 2021  ...
df_final['duration'].value_counts()
253 min 21
15 min 20
167 min 20
233 min 18
237 min 18
49 min 16
37 min 16
43 min 16
312 min 15
12 min 14
31 min 13
191 min 13
230 min 12
41 min 11
19 min 8
273 min 7
34 min 6
17 min 5
39 min 5
10 min 4
16 min 4
196 min 4
20 min 4
18 min 4
3 min 4
5 min 3
11 min 2
8 min 2
9 min 2
Name: duration, dtype: int64
df_final['duration'].unique()
array(['90', '2 Seasons', '1 Season', '91', '125', '9 Seasons', '104',
'127', '4 Seasons', '67', '94', '5 Seasons', '161', '61', '166',
'147', '103', '97', '106', '111', '3 Seasons', '110', '105', '96',
'124', '116', '98', '23', '115', '122', '99', '88', '100',
'6 Seasons', '102', '93', '95', '85', '83', '113', '13', '182',
'48', '145', '87', '92', '80', '117', '128', '119', '143', '114',
'118', '108', '63', '121', '142', '154', '120', '82', '109', '101',
'86', '229', '76', '89', '156', '112', '107', '129', '135', '136',
'165', '150', '133', '70', '84', '140', '78', '7 Seasons', '64',
'59', '139', '69', '148', '189', '141', '130', '138', '81', '132',
'10 Seasons', '123', '65', '68', '66', '62', '74', '131', '39',
'46', '38', '8 Seasons', '17 Seasons', '126', '155', '159', '137',
'12', '273', '36', '34', '77', '60', '49', '58', '72', '204',
'212', '25', '73', '29', '47', '32', '35', '71', '149', '33', '15',
'54', '224', '162', '37', '75', '79', '55', '158', '164', '173',
'181', '185', '21', '24', '51', '151', '42', '22', '134', '177',
'13 Seasons', '52', '14', '53', '8', '57', '28', '50', '9', '26',
'45', '171', '27', '44', '146', '20', '157', '17', '203', '41',
'30', '194', '15 Seasons', '233', '237', '230', '195', '253',
'152', '190', '160', '208', '180', '144', '5', '174', '170', '192',
'209', '187', '172', '16', '186', '11', '193', '176', '56', '169',
'40', '10', '3', '168', '312', '153', '214', '31', '163', '19',
'12 Seasons', '179', '11 Seasons', '43', '200', '196', '167',
'178', '228', '18', '205', '201', '191'], dtype=object)
df_final['duration_copy']=df_final['duration'].copy()
df_final1=df_final.copy()
df_final1.loc[df_final1['duration_copy'].str.contains('Season'),'duration_copy']=0
df_final1['duration_copy']=df_final1['duration_copy'].astype('int')
df_final1.head()
  title                 Actors         Directors         Genre          country        show_id  type     date_added          ...
0 Dick Johnson Is Dead  Unknown Actor  Kirsten Johnson   Documentaries  United States  s1       Movie    September 25, 2021  ...
2 Blood & Water         Ama Qamata     Unknown Director  TV Dramas      South Africa   s2       TV Show  September 24, 2021  ...
count 201991.000000
mean 77.152789
std 52.269154
min 0.000000
25% 0.000000
50% 95.000000
75% 112.000000
max 312.000000
Name: duration_copy, dtype: float64
df_final1['duration'].value_counts()
49 16
37 16
43 16
312 15
12 14
31 13
191 13
230 12
41 11
19 8
273 7
34 6
17 5
39 5
10 4
16 4
196 4
20 4
18 4
3 4
5 3
11 2
8 2
9 2
Name: duration, dtype: int64
#number of distinct titles on the basis of genre
df_final1.groupby(['Genre']).agg({"title":"nunique"}).sort_values(by=['title'],ascending=False)
TV Comedies 581
Thrillers 573
Kids' TV 451
Docuseries 395
Reality TV 255
TV Mysteries 98
TV Horror 75
Anime Features 71
Cult Movies 71
Teen TV Shows 69
TV Thrillers 57
Movies 57
Stand-Up Comedy & Talk Shows 56
TV Shows 16
df_genre=df_final1.groupby(['Genre']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_genre[::-1]['Genre'], df_genre[::-1]['title'],color=['orange'])
plt.xlabel('Frequency of Genres')
plt.ylabel('Genres')
plt.show()
df_final1.groupby(['type']).agg({"title":"nunique"})
title
type
Movie 6115
TV Show 2676
df_type=df_final1.groupby(['type']).agg({"title":"nunique"}).reset_index()
plt.pie(df_type['title'],explode=(0.05,0.05), labels=df_type['type'],colors=['yellow','blue'],autopct='%.1f%%')
plt.show()
Senegal 3
Serbia 7
Singapore 41
Slovakia 1
Slovenia 3
Somalia 1
South Africa 65
Soviet Union 3
Spain 239
Sri Lanka 1
Sudan 1
Sweden 44
Switzerland 19
Syria 3
Taiwan 94
Thailand 74
Turkey 115
Uganda 1
Ukraine 3
United Kingdom, 2
United States, 1
Uruguay 14
Vatican City 1
Venezuela 4
Vietnam 7
West Germany 5
Zimbabwe 3
The above dataframe shows a flaw: the same country can appear under two spellings, one of them with a trailing comma. For example, 'United States' and
'United States,' (or 'Cambodia' and 'Cambodia,') are counted as different countries. They should have been treated as the same country.
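A minimal sketch of one way to merge such duplicates, assuming the only differences are trailing commas and stray whitespace (the whitespace part is an assumption); the series `countries` is made-up sample data:

```python
import pandas as pd

# Hypothetical sample showing the duplicate spellings
countries = pd.Series(["United States", "United States,", "Cambodia,", " Cambodia"])

# Trim whitespace, drop a trailing comma, then trim again
cleaned = countries.str.strip().str.rstrip(",").str.strip()
print(cleaned.unique())
```

Applied to the country column before grouping, this kind of cleanup makes each country appear only once in the counts.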
Samoa 1
Saudi Arabia 14
Senegal 3
Serbia 7
Singapore 41
Slovakia 1
Slovenia 3
Somalia 1
South Africa 65
Soviet Union 3
Spain 239
Sri Lanka 1
Sudan 1
Sweden 44
Switzerland 19
Syria 3
Taiwan 94
Thailand 74
Turkey 115
Uganda 1
Ukraine 3
Uruguay 14
Vatican City 1
Venezuela 4
Vietnam 7
West Germany 5
Zimbabwe 3
df_country=df_final1.groupby(['country']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,10))
plt.barh(df_country[::-1]['country'], df_country[::-1]['title'],color=['blue'])
plt.xlabel('Titles by Countries')
plt.ylabel('Countries')
plt.show()
title
rating
G 41
NC-17 3
NR 87
PG 287
PG-13 490
R 799
TV-14 2151
TV-G 220
TV-MA 3204
TV-PG 863
TV-Y 305
TV-Y7 334
TV-Y7-FV 6
UR 3
df_rating=df_final1.groupby(['rating']).agg({"title":"nunique"}).reset_index()
plt.figure(figsize=(15,8))
plt.barh(df_rating[::-1]['rating'], df_rating[::-1]['title'],color=['violet'])
plt.xlabel('Frequency by Ratings')
plt.ylabel('Ratings')
plt.show()
Most of the content on Netflix carries one of four ratings: TV-MA (intended for mature audiences), TV-14 (not intended for audiences under 14),
TV-PG (parental guidance suggested) and R.
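To put a number on this, the share of titles covered by the four largest ratings can be worked out from the counts in the table above. A short sketch; the dictionary simply retypes those counts, and the variable names are illustrative:

```python
# Distinct titles per rating, copied from the table above
rating_counts = {
    "G": 41, "NC-17": 3, "NR": 87, "PG": 287, "PG-13": 490, "R": 799,
    "TV-14": 2151, "TV-G": 220, "TV-MA": 3204, "TV-PG": 863,
    "TV-Y": 305, "TV-Y7": 334, "TV-Y7-FV": 6, "UR": 3,
}

total = sum(rating_counts.values())

# Four largest ratings by title count
top_four = sorted(rating_counts, key=rating_counts.get, reverse=True)[:4]
share = 100 * sum(rating_counts[r] for r in top_four) / total

print(top_four, round(share, 1))
```

By these counts, the four ratings together account for roughly 80% of the titles.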
72 33
73 30
74 32
75 35
76 31
77 30
78 45
79 35
8 1
8 Seasons 17
80 43
81 62
82 52
83 65
84 68
85 73
86 103
87 101
88 116
89 106
9 1
9 Seasons 9
90 152
91 144
92 129
93 146
94 146
95 137
96 130
97 146
98 118
99 118
df_duration=df_final1.groupby(['duration']).agg({"title":"nunique"}).reset_index()
plt.figure(figsize=(15,8))
plt.barh(df_duration[::-1]['duration'], df_duration[::-1]['title'],color=['pink'])
plt.xlabel('Frequency by Duration')
plt.ylabel('Duration')
plt.show()
df_year=df_final1.groupby(['year']).agg({"title":"nunique"}).reset_index()
sns.lineplot(data=df_year, x='year', y='title')
plt.ylabel("Movies Released in the Year")
plt.xlabel("Year")
plt.show()
The amount of content on Netflix increased continuously from 2008 to 2019 and then started decreasing (probably due to
Covid).
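The `year`, `week_Added` and `month_added` columns used in the time-based plots are created in cells that are not shown in this export. A minimal sketch of how such columns could be derived from `date_added`, assuming dates formatted like 'September 25, 2021' and ISO week numbering (both assumptions); the frame `dates` is hypothetical:

```python
import pandas as pd

# Hypothetical sample; the second value has a stray leading space
dates = pd.DataFrame({"date_added": ["September 25, 2021", " January 1, 2020"]})

# Strip whitespace, then parse the text dates
parsed = pd.to_datetime(dates["date_added"].str.strip(), format="%B %d, %Y")

dates["year"] = parsed.dt.year
dates["month_added"] = parsed.dt.month
dates["week_Added"] = parsed.dt.isocalendar().week.astype(int)  # ISO week, 1 to 53
print(dates)
```

ISO week numbers run from 1 to 53, which matches the range of `week_Added` values in this notebook.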
title
week_Added
1 372
2 108
3 113
4 88
5 208
6 97
7 147
8 110
9 254
10 135
11 163
12 109
13 250
14 173
15 153
16 160
17 154
18 234
19 116
20 130
21 117
22 206
23 151
24 164
25 143
26 269
27 241
28 131
29 140
30 160
31 269
32 118
33 153
34 139
35 265
36 142
37 183
38 139
39 165
40 287
41 116
42 133
43 116
44 318
45 98
46 134
47 120
48 200
49 140
50 189
51 137
52 132
53 104
df_week=df_final1.groupby(['week_Added']).agg({"title":"nunique"}).reset_index()
plt.figure(figsize=(15,8))
sns.lineplot(data=df_week, x='week_Added', y='title')
plt.ylabel("Movies Released in the Week")
plt.xlabel("Week No.")
plt.show()
The first week of the year sees the most content added to Netflix (372 titles), and additions follow a roughly cyclical pattern, with a peak every four to five weeks.
df_month=df_final1.groupby(['month_added']).agg({"title":"nunique"}).reset_index()
sns.lineplot(data=df_month, x='month_added', y='title')
plt.ylabel("Movies Released in the Month")
plt.xlabel("Month")
plt.show()
Most of the content on Netflix is added in the first and last months of the year (reinforcing what we observed for the first week in the above plot).
df_shows=df_final1[df_final1['type']=='TV Show']
df_movies=df_final1[df_final1['type']=='Movie']
df_genre=df_shows.groupby(['Genre']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_genre[::-1]['Genre'], df_genre[::-1]['title'],color=['orange'])
plt.xlabel('Frequency of Genres')
plt.ylabel('Genres')
plt.show()
International TV Shows, Dramas and Comedies are the most common genres among TV Shows on Netflix.
df_genre=df_movies.groupby(['Genre']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_genre[::-1]['Genre'], df_genre[::-1]['title'],color=['orange'])
plt.xlabel('Frequency of Genres')
plt.ylabel('Genres')
plt.show()
International Movies, Dramas and Comedies are the most common genres among Movies on Netflix, followed by Documentaries.
df_country=df_shows.groupby(['country']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_country[::-1]['country'], df_country[::-1]['title'],color=['blue'])
plt.xlabel('Titles by Countries')
plt.ylabel('Countries')
plt.show()
df_country=df_movies.groupby(['country']).agg({"title":"nunique"}).reset_index().sort_values(by=['title'],ascending=False)
plt.figure(figsize=(15,8))
plt.barh(df_country[::-1]['country'], df_country[::-1]['title'],color=['blue'])
plt.xlabel('Titles by Countries')
plt.ylabel('Countries')
plt.show()