Netflix Data Analysis Project Report
Overview
This project aims to analyze Netflix’s content library through data cleaning, exploratory data analysis (EDA),
and visualization using SQL and Python. The goal is to derive insights into the trends and patterns in Netflix's
catalog, including the types of content, release trends, and genre distributions.
Table of Contents
Project Overview
Dataset
Tools and Technologies
Data Cleaning
Exploratory Data Analysis (EDA)
Key Insights
Visualizations
Conclusion
Installation
Usage
Dataset
Source: The dataset used in this project is Netflix’s Movies and TV Shows dataset.
Description: The dataset contains information about movies and TV shows available on Netflix, including
columns such as title, type, release_year, date_added, country, duration, and genre.
Tools and Technologies
SQL: Used for initial data cleaning and transformations.
Python: Used for in-depth analysis and visualization.
Pandas: Data manipulation and analysis.
Matplotlib & Seaborn: Data visualization.
Jupyter Notebook: To document the analysis.
Power BI / Tableau: For creating an interactive dashboard to visualize the analysis.
Data Cleaning
The data cleaning process involved:
Removing rows with missing date_added values.
Converting date_added to a proper datetime format.
Extracting month and year from date_added for trend analysis.
Handling null values in columns like country and duration.
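The cleaning steps above can be sketched in Pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function name clean_netflix, the derived columns year_added and month_added, the "Unknown" placeholder, and the "Month day, year" date format are all assumptions made for the example.

```python
import pandas as pd


def clean_netflix(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above to a raw titles frame."""
    out = df.copy()
    # Convert date_added to datetime; values that do not match the
    # assumed "Month day, year" format become NaT.
    out["date_added"] = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    # Remove rows with a missing (or unparseable) date_added.
    out = out.dropna(subset=["date_added"]).copy()
    # Extract year and month for trend analysis (hypothetical column names).
    out["year_added"] = out["date_added"].dt.year
    out["month_added"] = out["date_added"].dt.month
    # Handle nulls in country and duration with an assumed placeholder.
    out["country"] = out["country"].fillna("Unknown")
    out["duration"] = out["duration"].fillna("Unknown")
    return out


# Tiny made-up sample, only to show the behaviour.
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "date_added": ["September 25, 2021", None, " January 1, 2020"],
    "country": ["United States", "India", None],
    "duration": ["90 min", "2 Seasons", None],
})
cleaned = clean_netflix(sample)
```

Filling nulls with a placeholder keeps the rows available for other charts. Dropping them instead would be an equally reasonable choice, depending on the analysis.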
Visualizations
Chart 1: Content type distribution
Content type distribution: The majority of the content on Netflix is movies (69.9%).
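A percentage split like the one above could be computed along these lines. This is a sketch that assumes the type column holds the labels "Movie" and "TV Show". The helper name type_share and the sample data are invented for illustration and do not reproduce the 69.9% figure.

```python
import pandas as pd


def type_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of titles per content type, rounded to 1 decimal."""
    return (df["type"].value_counts(normalize=True) * 100).round(1)


# Made-up sample: 7 movies and 3 TV shows.
sample = pd.DataFrame({"type": ["Movie"] * 7 + ["TV Show"] * 3})
shares = type_share(sample)
```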
Country distribution: The United States has the most titles on Netflix, followed by India and the
United Kingdom.
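One way to count titles per country is sketched below. It assumes that a single country value may list several countries separated by commas, which the report does not state. The helper name country_counts and the sample rows are hypothetical.

```python
import pandas as pd


def country_counts(df: pd.DataFrame) -> pd.Series:
    """Count titles per country, crediting every country listed on a title."""
    countries = (
        df["country"]
        .dropna()
        .str.split(",")   # assumed separator for multi-country titles
        .explode()
        .str.strip()
    )
    # Discard empty fragments left by stray commas.
    countries = countries[countries != ""]
    return countries.value_counts()


# Made-up sample rows.
sample = pd.DataFrame({
    "country": [
        "United States",
        "United States, India",
        "India",
        "United Kingdom",
        None,
        "United States",
    ]
})
counts = country_counts(sample)
```

If each title should count only once, taking the first listed country instead of exploding the list would be the alternative.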
Content growth over time: The number of titles added to Netflix each year has increased steadily
over time.
[Chart: titles added to Netflix per year, 2008 to 2021]
Top Directors
Rajiv Chilaka 19 1
Alastair Fothergill 4 14
Raúl Campos, Jan Suter 18
Suhas Kadav 16
Marcus Raboy 15 1
Jay Karas 14
Cathy Garcia-Molina 13
Martin Scorsese 12
Youssef Chahine 12
Jay Chapman 12
Popular genres: The genre combination "Dramas, International Movies" has the most titles on
Netflix within the period, followed closely by Documentaries.
Top Genres
Dramas, International Movies 362
Documentaries 359
Stand-Up Comedy 334
Comedies, Dramas, International Movies 274
Dramas, Independent Movies, International Movies 252
Kids' TV 219
Children & Family Movies 215
Children & Family Movies, Comedies 201
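The table above counts each full genre string as one category, so "Dramas, International Movies" is counted separately from "Dramas". A sketch of that count is shown next to a per-genre count for comparison. It assumes the genre column stores comma-separated labels. Both helper names and the sample data are invented for illustration.

```python
import pandas as pd


def top_genre_combinations(df: pd.DataFrame, n: int = 8) -> pd.Series:
    """Count titles per exact genre string, as in the table above."""
    return df["genre"].value_counts().head(n)


def top_individual_genres(df: pd.DataFrame, n: int = 8) -> pd.Series:
    """Count titles per single genre after splitting combined strings."""
    genres = df["genre"].dropna().str.split(",").explode().str.strip()
    return genres.value_counts().head(n)


# Made-up sample rows.
sample = pd.DataFrame({
    "genre": [
        "Dramas, International Movies",
        "Dramas, International Movies",
        "Documentaries",
        "Dramas",
    ]
})
combos = top_genre_combinations(sample)
singles = top_individual_genres(sample)
```

The two views can rank genres differently: in this sample the combination count puts "Dramas, International Movies" first, while the per-genre count puts "Dramas" first.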
The most common rating on Netflix is TV-MA. Under the TV Parental Guidelines in the United States,
TV-MA designates content intended for mature audiences.
Top Ratings
More movies than TV shows have been added to Netflix over time.
In 2013, the numbers added were almost the same: 6 movies and 5 TV shows.
In the first 5 years, only movies were added to Netflix.
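A yearly comparison of movies and TV shows like the one described above could be tabulated as in this sketch. It assumes a year_added column has already been derived from date_added. That column name, the helper added_per_year_by_type, and the sample rows are all hypothetical.

```python
import pandas as pd


def added_per_year_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate titles added per year (rows) by content type (columns)."""
    return pd.crosstab(df["year_added"], df["type"])


# Made-up sample rows.
sample = pd.DataFrame({
    "year_added": [2008, 2013, 2013, 2013],
    "type": ["Movie", "Movie", "Movie", "TV Show"],
})
table = added_per_year_by_type(sample)
```

Years in which only one content type was added show a 0 in the other column, which makes "movies only" periods easy to spot.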
Chart 9: Release Year With Highest Content On Netflix
Release years with most content: The release years with the most content on Netflix are 2012 to
2018.
From 2012 to 2018, the number of titles on Netflix rises with each release year, so more recent
releases outnumber older ones. From release year 2019 onward, the count starts dropping. This may
be due to Covid-19, but further analysis is needed to confirm it.
Release Year   Titles
2012           236
2013           286
2014           352
2015           555
2016           901
2017           1030
2018           1146
2019           1030
2020           953
2021           592
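Counts per release year for a chart like Chart 9 could be produced roughly as follows. This is a sketch only: the helper name titles_per_release_year, its start and end parameters, and the sample values are assumptions for illustration.

```python
import pandas as pd


def titles_per_release_year(
    df: pd.DataFrame, start: int = 2012, end: int = 2021
) -> pd.Series:
    """Count titles per release year, restricted to [start, end]."""
    counts = df["release_year"].value_counts().sort_index()
    in_range = (counts.index >= start) & (counts.index <= end)
    return counts[in_range]


# Made-up sample rows.
sample = pd.DataFrame(
    {"release_year": [2011, 2012, 2012, 2018, 2018, 2018, 2019, 2022]}
)
per_year = titles_per_release_year(sample)
peak_year = per_year.idxmax()
```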