Netflix Users Analysis Using Python-1

The document analyzes a Netflix user dataset using Python. It explores attributes like age, gender, subscription plan, revenue, and device to understand user behavior and identify trends. Advanced visualization and analysis are used to provide valuable insights for Netflix.

Uploaded by

deepak Rulez


Netflix Users Analysis Using Python

📝 Project Description:
This project uses Python and its data analysis libraries (pandas, seaborn, and matplotlib) to explore
a dataset of Netflix users. It examines key attributes such as Age, Gender, Subscription Type,
Monthly Revenue, Join Date, Last Payment Date, Country, and Device to understand user behavior and
preferences. The findings are presented as charts, and the analysis looks for trends, patterns, and
correlations within the dataset that may be useful to Netflix and related stakeholders.

Import Libraries
In [1]: import pandas as pd

In [2]: import seaborn as sns
import matplotlib.pyplot as plt

C:\Users\Syed Arif\anaconda3\lib\site-packages\scipy\__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.1
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"

Loading the CSV File

In [3]: df = pd.read_csv(r"C:\Users\Syed Arif\Desktop\Netflix User Base\Netflix Userbase.csv")
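The absolute Windows path above only resolves on the original author's machine. A minimal sketch of a more portable load follows; the helper name load_userbase and the idea that the CSV sits next to the notebook are assumptions, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_userbase(csv_path):
    """Read the user-base CSV, failing early with a clear message if it is missing.

    Hypothetical helper for illustration only.
    """
    path = Path(csv_path)
    if not path.exists():
        raise FileNotFoundError(f"CSV not found: {path.resolve()}")
    return pd.read_csv(path)


# Assumed location: the CSV is stored in the same folder as the notebook.
# df = load_userbase("Netflix Userbase.csv")
```

Keeping the path relative means the notebook can be shared without editing the cell.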

Data Preprocessing

.head()
head() shows the first rows of the dataset (5 rows by default).
In [4]: df.head()

Out[4]:
   User ID Subscription Type  Monthly Revenue Join Date Last Payment Date         Country  Age  Gender      Device Plan Duration
0        1             Basic               10  15-01-22          10-06-23   United States   28    Male  Smartphone       1 Month
1        2           Premium               15  05-09-21          22-06-23          Canada   35  Female      Tablet       1 Month
2        3          Standard               12  28-02-23          27-06-23  United Kingdom   42    Male    Smart TV       1 Month
3        4          Standard               12  10-07-22          26-06-23       Australia   51  Female      Laptop       1 Month
4        5             Basic               10  01-05-23          28-06-23         Germany   33    Male  Smartphone       1 Month

.tail()
tail() shows the last rows of the dataset (5 rows by default).

In [5]: df.tail()

Out[5]:
      User ID Subscription Type  Monthly Revenue Join Date Last Payment Date        Country  Age  Gender    Device Plan Duration
2495     2496           Premium               14  25-07-22          12-07-23          Spain   28  Female  Smart TV       1 Month
2496     2497             Basic               15  04-08-22          14-07-23          Spain   33  Female  Smart TV       1 Month
2497     2498          Standard               12  09-08-22          15-07-23  United States   38    Male    Laptop       1 Month
2498     2499          Standard               13  12-08-22          12-07-23         Canada   48  Female    Tablet       1 Month
2499     2500             Basic               15  13-08-22          12-07-23  United States   35  Female  Smart TV       1 Month
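Both head() and tail() accept a row count. The sketch below rebuilds the first five rows shown above as a small stand-in DataFrame (the name sample is an assumption used only for illustration) and asks for two rows from each end.

```python
import io

import pandas as pd

# The five rows displayed by df.head(), re-entered as CSV text for illustration.
sample_csv = """User ID,Subscription Type,Monthly Revenue,Join Date,Last Payment Date,Country,Age,Gender,Device,Plan Duration
1,Basic,10,15-01-22,10-06-23,United States,28,Male,Smartphone,1 Month
2,Premium,15,05-09-21,22-06-23,Canada,35,Female,Tablet,1 Month
3,Standard,12,28-02-23,27-06-23,United Kingdom,42,Male,Smart TV,1 Month
4,Standard,12,10-07-22,26-06-23,Australia,51,Female,Laptop,1 Month
5,Basic,10,01-05-23,28-06-23,Germany,33,Male,Smartphone,1 Month
"""
sample = pd.read_csv(io.StringIO(sample_csv))

first_two = sample.head(2)  # first two rows instead of the default five
last_two = sample.tail(2)   # last two rows

print(first_two)
print(last_two)
```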

.shape
shape returns the total number of rows and columns in the dataset as a (rows, columns) tuple.

In [6]: df.shape

Out[6]: (2500, 10)
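Because shape is an ordinary tuple, it can be unpacked into two variables. A small sketch on a made-up three-row frame (the variable names are assumptions):

```python
import pandas as pd

# Tiny stand-in frame; the real dataset has shape (2500, 10).
sample = pd.DataFrame({"User ID": [1, 2, 3], "Age": [28, 35, 42]})

n_rows, n_cols = sample.shape  # unpack the (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
```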

.columns
columns lists the name of each column.

In [7]: df.columns

Out[7]: Index(['User ID', 'Subscription Type', 'Monthly Revenue', 'Join Date',
       'Last Payment Date', 'Country', 'Age', 'Gender', 'Device',
       'Plan Duration'],
      dtype='object')

.dtypes
This attribute shows the data type of each column.

In [8]: df.dtypes

Out[8]:
User ID               int64
Subscription Type    object
Monthly Revenue       int64
Join Date            object
Last Payment Date    object
Country              object
Age                   int64
Gender               object
Device               object
Plan Duration        object
dtype: object
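The output above shows that the two date columns are stored as text (object), not as dates. One way to split columns by kind is select_dtypes; the sketch below uses a small made-up frame, and the variable names are assumptions.

```python
import pandas as pd

sample = pd.DataFrame({
    "Monthly Revenue": [10, 15, 12],
    "Join Date": ["15-01-22", "05-09-21", "28-02-23"],  # still plain strings
    "Age": [28, 35, 42],
})

# Columns holding numbers vs. everything else (text, unparsed dates, ...).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```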

.unique()
unique() returns the distinct values of a specific column.

In [9]: df["Country"].unique()

Out[9]: array(['United States', 'Canada', 'United Kingdom', 'Australia',
       'Germany', 'France', 'Brazil', 'Mexico', 'Spain', 'Italy'],
      dtype=object)

.nunique()
nunique() returns the number of unique values in each column of the DataFrame.
In [10]: df.nunique()

Out[10]:
User ID              2500
Subscription Type       3
Monthly Revenue         6
Join Date             300
Last Payment Date      26
Country                10
Age                    26
Gender                  2
Device                  4
Plan Duration           1
dtype: int64
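In the output above, Plan Duration has a single unique value, so it cannot distinguish one user from another. A sketch of how such constant columns could be picked out from the nunique() result, on a small made-up frame (variable names are assumptions):

```python
import pandas as pd

sample = pd.DataFrame({
    "Country": ["United States", "Canada", "Canada"],
    "Plan Duration": ["1 Month", "1 Month", "1 Month"],
})

unique_counts = sample.nunique()
# Columns with exactly one distinct value carry no information for comparisons.
constant_cols = unique_counts[unique_counts == 1].index.tolist()

print(constant_cols)
```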

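.describe()
describe() shows summary statistics for the numeric columns: count, mean, standard deviation,
minimum, quartiles, and maximum. The 50% row is the median.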

In [11]: df.describe()

Out[11]:
          User ID  Monthly Revenue          Age
count  2500.00000      2500.000000  2500.000000
mean   1250.50000        12.508400    38.795600
std     721.83216         1.686851     7.171778
min       1.00000        10.000000    26.000000
25%     625.75000        11.000000    32.000000
50%    1250.50000        12.000000    39.000000
75%    1875.25000        14.000000    45.000000
max    2500.00000        15.000000    51.000000
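The median does not appear under its own name in the describe() output; it is the 50% row. A short check on the five ages from the first rows of the dataset (the variable names are assumptions):

```python
import pandas as pd

ages = pd.Series([28, 35, 42, 51, 33], name="Age")

summary = ages.describe()
median_age = ages.median()

# The "50%" entry of describe() and median() report the same value.
print(summary["50%"], median_age)
```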

.value_counts()
value_counts() shows each unique value together with the number of times it occurs.

In [12]: df["Country"].value_counts()

Out[12]:
United States     451
Spain             451
Canada            317
United Kingdom    183
Australia         183
Germany           183
France            183
Brazil            183
Mexico            183
Italy             183
Name: Country, dtype: int64
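Counts are often easier to compare as shares. In the output above, for example, the United States accounts for 451 of 2,500 rows, or 18.04%. A minimal sketch of getting proportions directly with normalize=True, on a made-up four-row series:

```python
import pandas as pd

countries = pd.Series(["Spain", "Spain", "Canada", "Italy"], name="Country")

# normalize=True returns fractions that sum to 1 instead of raw counts.
shares = countries.value_counts(normalize=True)

print(shares)
```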
.isnull()
isnull() returns a Boolean DataFrame in which True marks a missing (null) value.

In [13]: df.isnull()

Out[13]:
      User ID  Subscription Type  Monthly Revenue  Join Date  Last Payment Date  Country    Age  Gender  Device  Plan Duration
0       False              False            False      False              False    False  False   False   False          False
1       False              False            False      False              False    False  False   False   False          False
2       False              False            False      False              False    False  False   False   False          False
3       False              False            False      False              False    False  False   False   False          False
4       False              False            False      False              False    False  False   False   False          False
...       ...                ...              ...        ...                ...      ...    ...     ...     ...            ...
2495    False              False            False      False              False    False  False   False   False          False
2496    False              False            False      False              False    False  False   False   False          False
2497    False              False            False      False              False    False  False   False   False          False
2498    False              False            False      False              False    False  False   False   False          False
2499    False              False            False      False              False    False  False   False   False          False

2500 rows × 10 columns
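The Boolean table only displays ten of the 2,500 rows, so it cannot confirm by eye that nothing is missing. Summing the mask gives a count per column. The sketch below uses a made-up frame that deliberately contains two missing values; it is not taken from the dataset.

```python
import pandas as pd

# Illustrative frame with one missing value in each column.
sample = pd.DataFrame({
    "Age": [28, None, 42],
    "Gender": ["Male", "Female", None],
})

missing_per_column = sample.isnull().sum()      # True counts as 1
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)
```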

In [14]: sns.heatmap(df.isnull())

Out[14]: <AxesSubplot:>
In [15]: df["Join Date"] = pd.to_datetime(df["Join Date"])
df["Last Payment Date"] = pd.to_datetime(df["Last Payment Date"])
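A caution about this conversion: the raw values look like day-month-year ("15-01-22" cannot be month-first), yet the converted output printed by the next cell shows that some values were read month-first (for example, "10-06-23" became 2023-10-06). The sketch below passes an explicit format under the assumption that every date in the file is DD-MM-YY; that assumption is not confirmed by the source, and applying it would change the Join Month values shown in this notebook.

```python
import pandas as pd

# Three raw Join Date strings from the first rows of the dataset.
raw = pd.Series(["15-01-22", "05-09-21", "10-07-22"], name="Join Date")

# Assumption: all values are day-month-year with a two-digit year.
parsed = pd.to_datetime(raw, format="%d-%m-%y")
months = parsed.dt.month_name().tolist()

print(parsed)
print(months)
```

With an explicit format, every row is read the same way and a value that does not fit raises an error instead of being silently reinterpreted.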

In [16]: # 'Join Date' was already converted to datetime in the previous cell,
# so the month name can be extracted directly.
df['Join Month'] = df['Join Date'].dt.month_name()

# Display the DataFrame with the added 'Join Month' column
print(df)

      User ID Subscription Type  Monthly Revenue  Join Date Last Payment Date  \
0           1             Basic               10 2022-01-15        2023-10-06
1           2           Premium               15 2021-05-09        2023-06-22
2           3          Standard               12 2023-02-28        2023-06-27
3           4          Standard               12 2022-10-07        2023-06-26
4           5             Basic               10 2023-01-05        2023-06-28
...       ...               ...              ...        ...               ...
2495     2496           Premium               14 2022-07-25        2023-12-07
2496     2497             Basic               15 2022-04-08        2023-07-14
2497     2498          Standard               12 2022-09-08        2023-07-15
2498     2499          Standard               13 2022-12-08        2023-12-07
2499     2500             Basic               15 2022-08-13        2023-12-07

             Country  Age  Gender      Device Plan Duration Join Month
0      United States   28    Male  Smartphone       1 Month    January
1             Canada   35  Female      Tablet       1 Month        May
2     United Kingdom   42    Male    Smart TV       1 Month   February
3          Australia   51  Female      Laptop       1 Month    October
4            Germany   33    Male  Smartphone       1 Month    January
...              ...  ...     ...         ...           ...        ...
2495           Spain   28  Female    Smart TV       1 Month       July
2496           Spain   33  Female    Smart TV       1 Month      April
2497   United States   38    Male      Laptop       1 Month  September
2498          Canada   48  Female      Tablet       1 Month   December
2499   United States   35  Female    Smart TV       1 Month     August

[2500 rows x 11 columns]

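Why we use get_continent and get_age_class:

get_continent is a helper function, defined in the next cell, that returns the continent of a given
country. A second helper, get_age_class, returns the age class of a given age.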
In [17]: # Derive two features with helper functions. Note that both assignments
# overwrite the original columns: "Country" now holds the continent and
# "Age" now holds the age class.

def get_continent(country):
    """Return the continent of the given country."""
    if country in {"United States", "Canada", "Mexico"}:
        return "North America"
    if country in {"France", "Germany", "United Kingdom", "Italy", "Spain"}:
        return "Europe"
    if country == "Brazil":
        return "South America"
    if country == "Australia":
        return "Australia"
    return "Africa / Asia"

def get_age_class(age):
    """Return the age class of the given age."""
    if age < 11:
        return "Kid"
    if age < 20:
        return "Teen"
    if age < 40:
        return "Young"
    if age < 70:
        return "Senior"
    return "Elderly"

df["Country"] = df["Country"].apply(get_continent)
df["Age"] = df["Age"].apply(get_age_class)
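The cell above replaces the original Country and Age values, so the raw country names and numeric ages are no longer available afterwards. As an alternative sketch (not the notebook's approach), the same rules can be written with a dictionary lookup and pd.cut, storing the results in new columns. The column names "Continent" and "Age Class" and the variable names are assumptions.

```python
import pandas as pd

sample = pd.DataFrame({
    "Country": ["United States", "Canada", "United Kingdom", "Australia", "Germany"],
    "Age": [28, 35, 42, 51, 33],
})

# Same country-to-continent rules as get_continent, expressed as a lookup table.
continent_by_country = {
    "United States": "North America", "Canada": "North America", "Mexico": "North America",
    "France": "Europe", "Germany": "Europe", "United Kingdom": "Europe",
    "Italy": "Europe", "Spain": "Europe",
    "Brazil": "South America",
    "Australia": "Australia",
}
# Countries missing from the table fall back to the same default label.
sample["Continent"] = sample["Country"].map(continent_by_country).fillna("Africa / Asia")

# Same thresholds as get_age_class: each bin includes its left edge only.
age_bins = [float("-inf"), 11, 20, 40, 70, float("inf")]
age_labels = ["Kid", "Teen", "Young", "Senior", "Elderly"]
sample["Age Class"] = pd.cut(sample["Age"], bins=age_bins, labels=age_labels, right=False)

print(sample)
```

Writing to new columns keeps the numeric ages available for later statistics such as describe().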
In [18]: df

Out[18]:
      User ID Subscription Type  Monthly Revenue  Join Date Last Payment Date        Country     Age  Gender      Device Plan Duration Join Month
0           1             Basic               10 2022-01-15        2023-10-06  North America   Young    Male  Smartphone       1 Month    January
1           2           Premium               15 2021-05-09        2023-06-22  North America   Young  Female      Tablet       1 Month        May
2           3          Standard               12 2023-02-28        2023-06-27         Europe  Senior    Male    Smart TV       1 Month   February
3           4          Standard               12 2022-10-07        2023-06-26      Australia  Senior  Female      Laptop       1 Month    October
4           5             Basic               10 2023-01-05        2023-06-28         Europe   Young    Male  Smartphone       1 Month    January
...       ...               ...              ...        ...               ...            ...     ...     ...         ...           ...        ...
2495     2496           Premium               14 2022-07-25        2023-12-07         Europe   Young  Female    Smart TV       1 Month       July
2496     2497             Basic               15 2022-04-08        2023-07-14         Europe   Young  Female    Smart TV       1 Month      April
2497     2498          Standard               12 2022-09-08        2023-07-15  North America   Young    Male      Laptop       1 Month  September
2498     2499          Standard               13 2022-12-08        2023-12-07  North America  Senior  Female      Tablet       1 Month   December
2499     2500             Basic               15 2022-08-13        2023-12-07  North America   Young  Female    Smart TV       1 Month     August

2500 rows × 11 columns

In [19]: plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='Gender')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.title('Distribution of Gender')
plt.show()
In [20]: plt.figure(figsize=(8, 6))
# 'Country' holds continent names after In [17]; the y-axis shows row counts
sns.countplot(data=df, x='Gender', hue="Country")
plt.xlabel('Gender')
plt.ylabel('Count')
plt.title('Gender-Wise Subscribers by Continent')
plt.show()
In [21]: plt.figure(figsize=(8, 6))
# 'Country' holds continent names after In [17]; the y-axis shows row counts
sns.countplot(data=df, x='Country', hue="Device")
plt.xlabel('Continent')
plt.ylabel('Count')
plt.title('Continent-Wise Device Users')
plt.show()
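The grouped count plots above draw one bar per combination of two columns. The numbers behind such a chart can be tabulated with pd.crosstab. The sketch below uses the first five rows of the dataset after the continent mapping; the variable names are assumptions.

```python
import pandas as pd

# Continent ("Country" after In [17]) and Device for the first five users.
sample = pd.DataFrame({
    "Country": ["North America", "North America", "Europe", "Australia", "Europe"],
    "Device": ["Smartphone", "Tablet", "Smart TV", "Laptop", "Smartphone"],
})

# Rows are continents, columns are devices, cells are user counts.
device_by_continent = pd.crosstab(sample["Country"], sample["Device"])

print(device_by_continent)
```

A table like this makes it possible to quote exact counts alongside the chart.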
In [22]: # Group the data by Device and count the rows in each category
device_counts = df.groupby('Device').size()

# Create a pie chart
plt.figure(figsize=(8, 8))
plt.pie(device_counts, labels=device_counts.index, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Device')
plt.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle.
plt.show()
In [23]: plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='Subscription Type')
plt.xlabel('Subscription Type')
plt.ylabel('Count')
plt.title('Distribution of Subscription Type')
plt.show()
In [26]: df

Out[26]:
      User ID Subscription Type  Monthly Revenue  Join Date Last Payment Date        Country     Age  Gender      Device Plan Duration Join Month
0           1             Basic               10 2022-01-15        2023-10-06  North America   Young    Male  Smartphone       1 Month    January
1           2           Premium               15 2021-05-09        2023-06-22  North America   Young  Female      Tablet       1 Month        May
2           3          Standard               12 2023-02-28        2023-06-27         Europe  Senior    Male    Smart TV       1 Month   February
3           4          Standard               12 2022-10-07        2023-06-26      Australia  Senior  Female      Laptop       1 Month    October
4           5             Basic               10 2023-01-05        2023-06-28         Europe   Young    Male  Smartphone       1 Month    January
...       ...               ...              ...        ...               ...            ...     ...     ...         ...           ...        ...
2495     2496           Premium               14 2022-07-25        2023-12-07         Europe   Young  Female    Smart TV       1 Month       July
2496     2497             Basic               15 2022-04-08        2023-07-14         Europe   Young  Female    Smart TV       1 Month      April
2497     2498          Standard               12 2022-09-08        2023-07-15  North America   Young    Male      Laptop       1 Month  September
2498     2499          Standard               13 2022-12-08        2023-12-07  North America  Senior  Female      Tablet       1 Month   December
2499     2500             Basic               15 2022-08-13        2023-12-07  North America   Young  Female    Smart TV       1 Month     August

2500 rows × 11 columns

In [27]: Joining_Months_Counts = df['Join Month'].value_counts()
Joining_Months_Counts.plot(kind='bar')
plt.xlabel('Join Month')
plt.ylabel('Count')
plt.title('Joining Counts By Months')

Out[27]: Text(0.5, 1.0, 'Joining Counts By Months')
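value_counts() sorts by frequency, so the bars above appear from the most common month to the least common rather than from January to December. A sketch of putting the counts in calendar order before plotting follows; the variable names are assumptions, and calendar.month_name yields English names only under the default locale.

```python
import calendar

import pandas as pd

# Join Month values of the first five users, taken from the printed DataFrame.
join_months = pd.Series(
    ["January", "May", "February", "October", "January"], name="Join Month"
)

# "January" ... "December" (index 0 of calendar.month_name is an empty string).
month_order = list(calendar.month_name)[1:]

# Reorder the counts by calendar month; months with no joins get 0.
counts_by_month = join_months.value_counts().reindex(month_order, fill_value=0)

print(counts_by_month)
```

Calling counts_by_month.plot(kind='bar') would then draw the bars in calendar order.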
