
Netflix Data - Cleaning, Analysis and Visualization - (Data Analyst)

Uploaded by

Vamshi Krishna reddy


Project Title: Netflix Data: Cleaning, Analysis and Visualization

Tools: Python, ML, SQL, Excel

Technologies: Data Analyst & Data Scientist

Project Difficulty Level: Intermediate

Dataset: The dataset is available at the link below and can be downloaded at your convenience.

Click here to download the dataset

About Dataset
Netflix is a popular streaming service that offers a vast catalog of movies, TV shows, and original content. This dataset is a cleaned version of the original, which can be found here. The data consists of content added to Netflix from 2008 to 2021; the oldest title was released in 1925 and the newest in 2021. This dataset will be cleaned with PostgreSQL and visualized with Tableau. Its purpose is to test my data cleaning and visualization skills. The cleaned data can be found below, and the Tableau dashboard can be found here.

Data Cleaning

We are going to:

1. Treat the nulls
2. Treat the duplicates
3. Populate missing rows
4. Drop unneeded columns
5. Split columns

Additional steps and further explanation of the process are given in the code comments.
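The cleaning steps above were originally carried out in PostgreSQL. As a rough pandas equivalent, here is a minimal sketch on hypothetical toy rows. The "Not Given" placeholder matches a value that appears later in the country counts, but borrowing a missing country from another title by the same director is an assumed strategy for step 3, not necessarily the author's:

```python
import pandas as pd

# Hypothetical toy rows shaped like a few columns of the Netflix file
df = pd.DataFrame({
    "show_id": ["s1", "s2", "s2", "s3", "s4"],
    "title": ["Title A", "Title B", "Title B", "Title C", "Title D"],
    "director": ["Director X", None, None, "Director Y", "Director Y"],
    "country": ["United States", "France", "France", None, "Brazil"],
    "listed_in": ["Documentaries", "Crime TV Shows, TV Dramas",
                  "Crime TV Shows, TV Dramas", "Dramas, Comedies", "Comedies"],
    "description": ["text"] * 5,
})

# 1. Treat the nulls: count missing values per column before deciding what to do
null_counts = df.isnull().sum()

# 2. Treat the duplicates: drop exact duplicate rows
df = df.drop_duplicates().reset_index(drop=True)

# 3. Populate missing values (assumed strategy): reuse the country of another
#    title by the same director, then fall back to a "Not Given" placeholder
director_country = (
    df.dropna(subset=["director", "country"])
    .groupby("director")["country"]
    .first()
)
df["country"] = df["country"].fillna(df["director"].map(director_country))
df[["director", "country"]] = df[["director", "country"]].fillna("Not Given")

# 4. Drop unneeded columns (the hypothetical 'description' column here)
df = df.drop(columns=["description"])

# 5. Split columns: keep the first listed genre in its own column
df["main_genre"] = df["listed_in"].str.split(", ").str[0]

print(df)
```

On the real data the same calls apply to the DataFrame loaded from the CSV; which columns count as "unneeded" depends on the analysis.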

Example: the steps to follow

Netflix Data: Cleaning, Analysis, and Visualization (Beginner ML Project)

This project involves loading, cleaning, analyzing, and visualizing data from a Netflix
dataset. We'll use Python libraries like Pandas, Matplotlib, and Seaborn to work
through the project. The goal is to explore the dataset, derive insights, and prepare
for potential machine learning tasks.

Step 1: Import Required Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud

Step 2: Load the Dataset

Assume we have a dataset named netflix_titles.csv.

# Load the dataset

data = pd.read_csv('netflix_titles.csv')

# Display the first few rows of the dataset

print(data.head())

Step 3: Data Cleaning

Identify and handle missing data, correct data types, and drop duplicates.

# Check for missing values

print(data.isnull().sum())

# Drop duplicates if any

data.drop_duplicates(inplace=True)

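# Drop rows with missing critical information
# (note: many titles in the raw file have no director or cast listed, so this can remove a large share of rows)

data.dropna(subset=['director', 'cast', 'country'], inplace=True)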

# Convert 'date_added' to datetime

data['date_added'] = pd.to_datetime(data['date_added'])
# Show data types to confirm changes
print(data.dtypes)
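In the raw netflix_titles.csv, date_added is stored as long-form text such as "September 25, 2021", and some values may carry stray leading whitespace; depending on the pandas version, a bare pd.to_datetime call can then fail. A more defensive sketch, assuming that long-form format (the sample notebook's netflix1.csv uses 9/25/2021 instead, which would need a different format string):

```python
import pandas as pd

# Hypothetical raw values: long-form dates, one with a stray leading space,
# and one missing entry
date_added = pd.Series(["September 25, 2021", " August 4, 2017", None, "January 1, 2020"])

# Strip whitespace, then parse with an explicit format; errors="coerce"
# turns anything that does not match into NaT instead of raising
parsed = pd.to_datetime(date_added.str.strip(), format="%B %d, %Y", errors="coerce")
print(parsed)
```

Rows that come back as NaT can then be inspected or dropped explicitly instead of breaking the whole conversion.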

Step 4: Exploratory Data Analysis (EDA)

1. Content Type Distribution (Movies vs. TV Shows)

# Count the number of Movies and TV Shows
type_counts = data['type'].value_counts()

# Plot the distribution

plt.figure(figsize=(8, 6))
sns.barplot(x=type_counts.index, y=type_counts.values,
palette='Set2')
plt.title('Distribution of Content by Type')
plt.xlabel('Type')
plt.ylabel('Count')
plt.show()
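The bar chart shows absolute counts. To express the same split as percentages, value_counts accepts normalize=True; a small sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical 'type' values; on the real data this would be data['type']
types = pd.Series(["Movie", "Movie", "TV Show", "Movie"])

# normalize=True returns proportions instead of raw counts
type_share = types.value_counts(normalize=True).mul(100).round(1)
print(type_share)
```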

2. Most Common Genres

# Split the 'listed_in' column and count genres

data['genres'] = data['listed_in'].str.split(', ')
genre_counts = data['genres'].explode().value_counts().head(10)
# Plot the most common genres
plt.figure(figsize=(10, 6))
sns.barplot(x=genre_counts.values, y=genre_counts.index,
palette='Set3')
plt.title('Most Common Genres on Netflix')
plt.xlabel('Count')
plt.ylabel('Genre')
plt.show()
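Each listed_in value holds several comma-separated genres, so counting the raw strings would treat every genre combination as a separate category. A toy comparison on hypothetical values, contrasting raw-string counts with split-and-explode counts:

```python
import pandas as pd

# Hypothetical listed_in values for three titles
listed_in = pd.Series([
    "Dramas, International Movies",
    "Comedies, Dramas",
    "Dramas",
])

# Raw strings: every distinct combination is counted separately
combo_counts = listed_in.value_counts()

# Split on ", " and explode: one row per individual genre
genre_counts = listed_in.str.split(", ").explode().value_counts()

print(combo_counts)
print(genre_counts)
```

Series.explode (available since pandas 0.25) gives each list element its own row while repeating the original index.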

3. Content Added Over Time

# Extract year and month from 'date_added'
data['year_added'] = data['date_added'].dt.year
data['month_added'] = data['date_added'].dt.month

# Plot content added over the years

plt.figure(figsize=(12, 6))
sns.countplot(x='year_added', data=data, palette='coolwarm')
plt.title('Content Added Over Time')
plt.xlabel('Year')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

4. Top 10 Directors with the Most Titles

# Count titles by director
top_directors = data['director'].value_counts().head(10)

# Plot top directors

plt.figure(figsize=(10, 6))
sns.barplot(x=top_directors.values, y=top_directors.index,
palette='Blues_d')
plt.title('Top 10 Directors with the Most Titles')
plt.xlabel('Number of Titles')
plt.ylabel('Director')
plt.show()

5. Word Cloud of Movie Titles

# Generate word cloud
movie_titles = data[data['type'] == 'Movie']['title']
wordcloud = WordCloud(width=800, height=400,
background_color='black').generate(' '.join(movie_titles))

# Plot word cloud

plt.figure(figsize=(10, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
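A word cloud sizes each word by how often it occurs in the joined text. The underlying frequencies can be inspected directly; a rough sketch with collections.Counter on hypothetical titles. WordCloud applies its own tokenization and a much larger stop-word list, so its counts will differ somewhat:

```python
import re
from collections import Counter

# Hypothetical movie titles
titles = ["The Last Love", "Love Actually", "The Last Dance"]

# Lower-case each title, pull out words, and drop a small illustrative stop-word list
stop_words = {"the", "a", "an", "of"}
words = [
    word
    for title in titles
    for word in re.findall(r"[a-z']+", title.lower())
    if word not in stop_words
]

# Frequency of each remaining word; larger counts mean larger words in a cloud
word_freq = Counter(words)
print(word_freq.most_common(3))
```

For full control over tokenization, a frequency mapping like word_freq can be passed to WordCloud(...).generate_from_frequencies instead of generate.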
Step 5: Conclusion and Insights

In this project, we:

1. Cleaned the data by handling missing values, removing duplicates, and converting data types.
2. Explored the data through various visualizations such as bar plots and word clouds.
3. Analyzed content trends over time, identified popular genres, and highlighted top directors.

Step 6: Next Steps

1. Feature Engineering: Create new features, such as counting the number of genres per movie or extracting the duration in minutes.
2. Machine Learning: Use the cleaned and processed data to build models for recommendations or trend predictions.
3. Advanced Visualization: Use interactive plots or dashboards for more detailed analysis.
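The feature-engineering step can be sketched in a few lines. This assumes the duration format seen in the sample output ("90 min" for movies, "1 Season" for TV shows) and uses hypothetical rows:

```python
import pandas as pd

# Hypothetical rows mirroring the 'type', 'duration' and 'listed_in' columns
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "duration": ["90 min", "2 Seasons", "125 min"],
    "listed_in": ["Documentaries", "Crime TV Shows, TV Dramas",
                  "Dramas, Independent Movies, International Movies"],
})

# Number of genres per title
df["genre_count"] = df["listed_in"].str.split(", ").str.len()

# Leading integer of 'duration' (minutes for movies, seasons for TV shows)
duration_value = df["duration"].str.extract(r"^(\d+)", expand=False).astype(float)

# Keep each unit in its own column; the other unit's rows become NaN
df["duration_min"] = duration_value.where(df["duration"].str.endswith("min"))
df["seasons"] = duration_value.where(df["duration"].str.contains("Season"))
print(df)
```

Keeping minutes and seasons in separate columns avoids mixing two units in one numeric feature.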

This project is a foundational exercise that introduces essential data analysis techniques, paving the way for more advanced projects.

Sample code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Importing data from CSV and getting information about the data.

In [2]:

data = pd.read_csv("/kaggle/input/netflix-data-cleaning-analysis-and-visualization/netflix1.csv")
data.head()

Out[2]:

   | show_id | type    | title                            | director        | country       | date_added | release_year | rating | duration | listed_in
0  | s1      | Movie   | Dick Johnson Is Dead             | Kirsten Johnson | United States | 9/25/2021  | 2020         | PG-13  | 90 min   | Documentaries
1  | s3      | TV Show | Ganglands                        | Julien Leclercq | France        | 9/24/2021  | 2021         | TV-MA  | 1 Season | Crime TV Shows, International TV Shows, TV Act...
2  | s6      | TV Show | Midnight Mass                    | Mike Flanagan   | United States | 9/24/2021  | 2021         | TV-MA  | 1 Season | TV Dramas, TV Horror, TV Mysteries
3  | s14     | Movie   | Confessions of an Invisible Girl | Bruno Garotti   | Brazil        | 9/22/2021  | 2021         | TV-PG  | 91 min   | Children & Family Movies, Comedies
4  | s8      | Movie   | Sankofa                          | Haile Gerima    | United States | 9/24/2021  | 1993         | TV-MA  | 125 min  | Dramas, Independent Movies, International Movies

In [3]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8790 entries, 0 to 8789
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 show_id 8790 non-null object
1 type 8790 non-null object
2 title 8790 non-null object
3 director 8790 non-null object
4 country 8790 non-null object
5 date_added 8790 non-null object
6 release_year 8790 non-null int64
7 rating 8790 non-null object
8 duration 8790 non-null object
9 listed_in 8790 non-null object
dtypes: int64(1), object(9)
memory usage: 686.8+ KB

In [4]:

data.shape

Out[4]:

(8790, 10)
In [5]:

data=data.drop_duplicates()

Content distribution on Netflix.

In [6]:

data['type'].value_counts()

Out[6]:

type
Movie 6126
TV Show 2664

Name: count, dtype: int64

In [7]:

freq = data['type'].value_counts()

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

# Left panel: number of titles per type; right panel: share of each type
sns.countplot(data=data, x='type', ax=axes[0])
axes[1].pie(freq, labels=freq.index, autopct='%.0f%%')

plt.suptitle('Total Content on Netflix', fontsize=20)
plt.show()

(Figure: Total Content on Netflix)

In [8]:

data.info()

In [9]:

data['rating'].value_counts()

Out[9]:

rating
TV-MA 3205
TV-14 2157
TV-PG 861
R 799
PG-13 490
TV-Y7 333
TV-Y 306
PG 287
TV-G 220
NR 79
G 41
TV-Y7-FV 6
NC-17 3
UR 3

Name: count, dtype: int64

In [10]:

ratings=data['rating'].value_counts().reset_index().sort_values(by='count',
ascending=False)

plt.bar(ratings['rating'], ratings['count'])
plt.xticks(rotation=45, ha='right')
plt.xlabel("Rating Types")
plt.ylabel("Rating Frequency")

plt.suptitle('Rating on Netflix', fontsize=20)

(Figure: Rating on Netflix, bar chart of rating frequencies)

In [11]:

plt.pie(ratings['count'][:8], labels=ratings['rating'][:8], autopct='%.0f%%')

plt.suptitle('Rating on Netflix', fontsize=20)

(Figure: Rating on Netflix, pie chart of the 8 most frequent ratings)
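plt.pie rescales the values it receives so they sum to 100%, so the percentages in the pie above are shares of the top 8 ratings only, not of all titles. One option is to fold the remaining ratings into an "Other" slice; a sketch using the counts from Out[9]:

```python
import pandas as pd

# Rating counts copied from Out[9]
rating_counts = pd.Series({
    "TV-MA": 3205, "TV-14": 2157, "TV-PG": 861, "R": 799, "PG-13": 490,
    "TV-Y7": 333, "TV-Y": 306, "PG": 287, "TV-G": 220, "NR": 79, "G": 41,
    "TV-Y7-FV": 6, "NC-17": 3, "UR": 3,
})

top_n = 8
top = rating_counts.sort_values(ascending=False).head(top_n)
other = rating_counts.drop(top.index).sum()

# Append the folded tail as a single "Other" slice
pie_data = pd.concat([top, pd.Series({"Other": other})])

# Shares of the whole catalog, as the pie would now display them
shares = (pie_data / pie_data.sum() * 100).round(1)
print(shares)

# To plot: plt.pie(pie_data, labels=pie_data.index, autopct='%.0f%%')
```

With this fold, the "Other" slice covers 352 of the 8,790 titles.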

Converting date_added column to datetime.

In [12]:

# Convert the date_added column to datetime

data['date_added']=pd.to_datetime(data['date_added'])

In [13]:

data.describe()

Out[13]:

date_added release_year
count 8790 8790.000000

mean 2019-05-17 21:44:01.638225408 2014.183163

min 2008-01-01 00:00:00 1925.000000

25% 2018-04-06 00:00:00 2013.000000

50% 2019-07-03 00:00:00 2017.000000

75% 2020-08-19 18:00:00 2019.000000

max 2021-09-25 00:00:00 2021.000000

std NaN 8.825466

In [14]:

data['country'].value_counts()

Out[14]:

country
United States 3240
India 1057
United Kingdom 638
Pakistan 421
Not Given 287
...
Iran 1
West Germany 1
Greece 1
Zimbabwe 1
Soviet Union 1

Name: count, Length: 86, dtype: int64
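Out[14] shows "Not Given" among the most frequent countries. It is a placeholder for a missing value, so a top-10 country ranking should exclude it first; a minimal sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical country column including the "Not Given" placeholder
country = pd.Series(["United States", "India", "Not Given", "United States",
                     "United Kingdom", "Not Given", "India", "United States"])

# Exclude the placeholder before ranking
top_countries = country[country != "Not Given"].value_counts().head(10)
print(top_countries)
```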

Monthly releases of Movies and TV shows on Netflix

In [17]:

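# The cells shown never create a 'month' column, so derive it from date_added
monthly_movie_release = data[data['type'] == 'Movie']['date_added'].dt.month.value_counts().sort_index()
monthly_series_release = data[data['type'] == 'TV Show']['date_added'].dt.month.value_counts().sort_index()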

plt.plot(monthly_movie_release.index, monthly_movie_release.values, label='Movies')

plt.plot(monthly_series_release.index, monthly_series_release.values,
label='Series')
plt.xlabel("Months")
plt.ylabel("Frequency of releases")
plt.xticks(range(1, 13), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug',
'Sep', 'Oct', 'Nov', 'Dec'])
plt.legend()
plt.grid(True)
plt.suptitle("Monthly releases of Movies and TV shows on Netflix")
plt.show()
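The two value_counts calls above can also be replaced by a single month-by-type table with pd.crosstab, reindexed so that every month appears even when nothing was added in it. A sketch on hypothetical dates:

```python
import pandas as pd

# Hypothetical sample of date_added values and content types
df = pd.DataFrame({
    "date_added": pd.to_datetime(["2021-01-15", "2021-01-20", "2020-03-02", "2019-12-31"]),
    "type": ["Movie", "TV Show", "Movie", "Movie"],
})

# Rows = month of addition, columns = content type, values = number of titles
monthly = pd.crosstab(df["date_added"].dt.month, df["type"])

# Make sure all 12 months appear, with 0 where nothing was added
monthly = monthly.reindex(range(1, 13), fill_value=0)
print(monthly)
```

On the full data, monthly.plot() would draw one line per content type from this table.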

Yearly releases of Movies and TV Shows on Netflix

In [18]:

# The cells shown never create a 'year' column; it is assumed here to be the year the title was added
yearly_movie_releases = data[data['type'] == 'Movie']['date_added'].dt.year.value_counts().sort_index()
yearly_series_releases = data[data['type'] == 'TV Show']['date_added'].dt.year.value_counts().sort_index()

plt.plot(yearly_movie_releases.index, yearly_movie_releases.values, label='Movies')

plt.plot(yearly_series_releases.index, yearly_series_releases.values, label='TV Shows')
plt.xlabel("Years")
plt.ylabel("Frequency of releases")
plt.grid(True)
plt.suptitle("Yearly releases of Movies and TV Shows on Netflix")
plt.legend()
plt.show()

Top 10 popular movie genres

In [19]:

# Note: grouping by the raw listed_in string counts each genre combination separately
popular_movie_genre = data[data['type'] == 'Movie'].groupby('listed_in').size().sort_values(ascending=False)[:10]
popular_series_genre = data[data['type'] == 'TV Show'].groupby('listed_in').size().sort_values(ascending=False)[:10]

plt.bar(popular_movie_genre.index, popular_movie_genre.values)
plt.xticks(rotation=45, ha='right')
plt.xlabel("Genres")
plt.ylabel("Movies Frequency")
plt.suptitle("Top 10 popular genres for movies on Netflix")
plt.show()
Top 10 TV show genres

In [20]:

plt.bar(popular_series_genre.index, popular_series_genre.values)
plt.xticks(rotation=45, ha='right')
plt.xlabel("Genres")
plt.ylabel("TV Shows Frequency")
plt.suptitle("Top 10 popular genres for TV Shows on Netflix")
plt.show()

Top 14 directors on Netflix by number of movies and shows.

In [21]:

# [1:15] skips the most frequent entry (presumably the 'Not Given' placeholder) and keeps the next 14
directors = data['director'].value_counts().reset_index().sort_values(by='count', ascending=False)[1:15]

plt.bar(directors['director'], directors['count'])
plt.xticks(rotation=45, ha='right')

(Figure: bar chart of titles per director. X-axis labels, in order: Rajiv Chilaka; Alastair Fothergill; Raúl Campos, Jan Suter; Suhas Kadav; Marcus Raboy; Jay Karas; Cathy Garcia-Molina; Youssef Chahine; Jay Chapman; Martin Scorsese; Steven Spielberg; Mark Thornton, Todd Kauffman; Don Michael Paul; David Dhawan)
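Some director entries name several people, for example "Raúl Campos, Jan Suter" in the output above, so value_counts treats the pair as one "director". Splitting on ", " and exploding credits each person separately, and filtering the placeholder by value avoids relying on its position in the ranking. A sketch on hypothetical values, assuming "Not Given" is the placeholder used in the director column as it is in the country column:

```python
import pandas as pd

# Hypothetical director values; "Not Given" is assumed to be the placeholder
director = pd.Series([
    "Director A, Director B",
    "Director A, Director B",
    "Director B",
    "Not Given",
    "Not Given",
    "Not Given",
])

# Split co-directed entries and give each person their own row
people = director.str.split(", ").explode()

# Drop the placeholder by value rather than by position, then rank
top_directors = people[people != "Not Given"].value_counts().head(15)
print(top_directors)
```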


