Python Analysis

The document details the analysis of a customer churn dataset containing 7043 entries and 21 columns, including customer demographics and service usage. It includes data cleaning steps, such as converting the 'TotalCharges' column to float and checking for duplicates, as well as visualizations to analyze churn rates by gender and senior citizen status. The analysis uses libraries like pandas, seaborn, and matplotlib for data manipulation and visualization.

Uploaded by

Arsalan Ahmed

# Import libraries and load the data

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('customer churn.csv')
df

     PhoneService     MultipleLines InternetService OnlineSecurity  ...  \
0              No  No phone service             DSL             No  ...
1             Yes                No             DSL            Yes  ...
2             Yes                No             DSL            Yes  ...
3              No  No phone service             DSL            Yes  ...
4             Yes                No     Fiber optic             No  ...
...           ...               ...             ...            ...  ...
7038          Yes               Yes             DSL            Yes  ...
7039          Yes               Yes     Fiber optic             No  ...
7040           No  No phone service             DSL            Yes  ...
7041          Yes               Yes     Fiber optic             No  ...
7042          Yes                No     Fiber optic            Yes  ...

     DeviceProtection TechSupport StreamingTV StreamingMovies        Contract  \
0                  No          No          No              No  Month-to-month
1                 Yes          No          No              No        One year
2                  No          No          No              No  Month-to-month
3                 Yes         Yes          No              No        One year
4                  No          No          No              No  Month-to-month
...               ...         ...         ...             ...             ...
7038              Yes         Yes         Yes             Yes        One year
7039              Yes          No         Yes             Yes        One year
7040               No          No          No              No  Month-to-month
7041               No          No          No              No  Month-to-month
7042              Yes         Yes         Yes             Yes        Two year

     PaperlessBilling              PaymentMethod  MonthlyCharges TotalCharges  \
0                 Yes           Electronic check           29.85        29.85
1                  No               Mailed check           56.95       1889.5
2                 Yes               Mailed check           53.85       108.15
3                  No  Bank transfer (automatic)           42.30      1840.75
4                 Yes           Electronic check           70.70       151.65
...               ...                        ...             ...          ...
7038              Yes               Mailed check           84.80       1990.5
7039              Yes    Credit card (automatic)          103.20       7362.9
7040              Yes           Electronic check           29.60       346.45
7041              Yes               Mailed check           74.40        306.6
7042              Yes  Bank transfer (automatic)          105.65       6844.5

Churn
0 No
1 No
2 Yes
3 No
4 Yes
... ...
7038 No
7039 No
7040 No
7041 Yes
7042 No

[7043 rows x 21 columns]

Data Cleaning and Extracting

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

# Replace blank TotalCharges entries with 0, then convert the column from object (string) to float

df["TotalCharges"] = df["TotalCharges"].replace(" ", "0")
df["TotalCharges"] = df["TotalCharges"].astype("float")
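A more defensive variant of the conversion above, offered as a sketch rather than as the notebook's own method: `pd.to_numeric` with `errors="coerce"` turns any entry that cannot be parsed as a number (not only a single space) into NaN, which can then be counted and filled with 0. The `sample` frame below is made up for illustration and stands in for the real column.

```python
import pandas as pd

# Hypothetical stand-in for the TotalCharges column; the real data comes from the CSV.
sample = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5", ""]})

# Entries that cannot be parsed as numbers become NaN instead of raising an error.
converted = pd.to_numeric(sample["TotalCharges"], errors="coerce")

# Count how many entries were blank or malformed before filling them.
n_missing = int(converted.isna().sum())

# Fill the gaps with 0, matching the notebook's choice for blank charges.
sample["TotalCharges"] = converted.fillna(0.0)
```

Counting the coerced entries first makes it visible how many rows the fill value affects, which is worth knowing before treating 0 as a real charge.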

df.isnull().sum()

customerID 0
gender 0
SeniorCitizen 0
Partner 0
Dependents 0
tenure 0
PhoneService 0
MultipleLines 0
InternetService 0
OnlineSecurity 0
OnlineBackup 0
DeviceProtection 0
TechSupport 0
StreamingTV 0
StreamingMovies 0
Contract 0
PaperlessBilling 0
PaymentMethod 0
MonthlyCharges 0
TotalCharges 0
Churn 0
dtype: int64

df.describe()

SeniorCitizen tenure MonthlyCharges

count 7043.000000 7043.000000 7043.000000
mean 0.162147 32.371149 64.761692
std 0.368612 24.559481 30.090047
min 0.000000 0.000000 18.250000
25% 0.000000 9.000000 35.500000
50% 0.000000 29.000000 70.350000
75% 0.000000 55.000000 89.850000
max 1.000000 72.000000 118.750000

df.duplicated().sum()

# Duplicate check through a column that should be unique, such as "customerID"

df["customerID"].duplicated().sum()

0

# Convert the 0/1 SeniorCitizen flag into "No"/"Yes" labels
def conv(value):
    if value == 1:
        return "Yes"
    else:
        return "No"

df['SeniorCitizen'] = df["SeniorCitizen"].apply(conv)
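The same relabeling can also be written without a helper function. The sketch below uses `Series.map` with a dictionary; the `flags` series is a made-up stand-in for the original 0/1 `SeniorCitizen` column, not data from the notebook.

```python
import pandas as pd

# Hypothetical 0/1 flags standing in for the original SeniorCitizen column.
flags = pd.Series([0, 1, 0, 1], name="SeniorCitizen")

# Series.map looks each value up in the dict; a value missing from the dict would become NaN.
labels = flags.map({1: "Yes", 0: "No"})
```

One difference from `conv`: an unexpected value such as 2 would surface as NaN here, whereas `conv` would silently label it "No".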

df.head(25)

MultipleLines InternetService OnlineSecurity ... \

0 No phone service DSL No ...
1 No DSL Yes ...
2 No DSL Yes ...
3 No phone service DSL Yes ...
4 No Fiber optic No ...
5 Yes Fiber optic No ...
6 Yes Fiber optic No ...
7 No phone service DSL Yes ...
8 Yes Fiber optic No ...
9 No DSL Yes ...
10 No DSL Yes ...
11 No No No internet service ...
12 Yes Fiber optic No ...
13 Yes Fiber optic No ...
14 No Fiber optic Yes ...
15 Yes Fiber optic Yes ...
16 No No No internet service ...
17 Yes Fiber optic Yes ...
18 No DSL No ...
19 No Fiber optic No ...
20 No phone service DSL No ...
21 No No No internet service ...
22 No No No internet service ...
23 Yes DSL No ...
24 No DSL Yes ...

DeviceProtection TechSupport StreamingTV \

0 No No No
1 Yes No No
2 No No No
3 Yes Yes No
4 No No No
5 Yes No Yes
6 No No Yes
7 No No No
8 Yes Yes Yes
9 No No No
10 No No No
11 No internet service No internet service No internet service
12 Yes No Yes
13 Yes No Yes
14 Yes Yes Yes
15 Yes Yes Yes
16 No internet service No internet service No internet service
17 Yes No Yes
18 Yes Yes No
19 Yes No No
20 Yes No No
21 No internet service No internet service No internet service
22 No internet service No internet service No internet service
23 No Yes No
24 No Yes No

StreamingMovies Contract PaperlessBilling \

0 No Month-to-month Yes
1 No One year No
2 No Month-to-month Yes
3 No One year No
4 No Month-to-month Yes
5 Yes Month-to-month Yes
6 No Month-to-month Yes
7 No Month-to-month No
8 Yes Month-to-month Yes
9 No One year No
10 No Month-to-month Yes
11 No internet service Two year No
12 Yes One year No
13 Yes Month-to-month Yes
14 Yes Month-to-month Yes
15 Yes Two year No
16 No internet service One year No
17 Yes Two year No
18 No Month-to-month No
19 Yes Month-to-month Yes
20 Yes Month-to-month Yes
21 No internet service One year No
22 No internet service Month-to-month No
23 No Two year Yes
24 No Month-to-month No

PaymentMethod MonthlyCharges TotalCharges Churn

0 Electronic check 29.85 29.85 No
1 Mailed check 56.95 1889.5 No
2 Mailed check 53.85 108.15 Yes
3 Bank transfer (automatic) 42.30 1840.75 No
4 Electronic check 70.70 151.65 Yes
5 Electronic check 99.65 820.5 Yes
6 Credit card (automatic) 89.10 1949.4 No
7 Mailed check 29.75 301.9 No
8 Electronic check 104.80 3046.05 Yes
9 Bank transfer (automatic) 56.15 3487.95 No
10 Mailed check 49.95 587.45 No
11 Credit card (automatic) 18.95 326.8 No
12 Credit card (automatic) 100.35 5681.1 No
13 Bank transfer (automatic) 103.70 5036.3 Yes
14 Electronic check 105.50 2686.05 No
15 Credit card (automatic) 113.25 7895.15 No
16 Mailed check 20.65 1022.95 No
17 Bank transfer (automatic) 106.70 7382.25 No
18 Credit card (automatic) 55.20 528.35 Yes
19 Electronic check 90.05 1862.9 No
20 Electronic check 39.65 39.65 Yes
21 Bank transfer (automatic) 19.80 202.25 No
22 Mailed check 20.15 20.15 Yes
23 Credit card (automatic) 59.90 3505.1 No
24 Credit card (automatic) 59.60 2970.3 No

[25 rows x 21 columns]

ax = sns.countplot(x='Churn', data=df)
ax.bar_label(ax.containers[0])
plt.title("Count of Customers by Churn")
plt.show()

# Plot the pie chart
plt.figure(figsize=(4, 4))
gb = df.groupby("Churn").agg({'Churn': "count"})
plt.pie(gb['Churn'], labels=gb.index, autopct="%1.2f%%")
plt.title("Percentage of Customers by Churn", fontsize=10)
plt.show()

plt.figure(figsize=(4, 4))
ax = sns.countplot(x="gender", data=df, hue="Churn")
# Label every hue group (one container per Churn value), not only the first
for container in ax.containers:
    ax.bar_label(container)
plt.title("Churn by Gender")
plt.show()

plt.figure(figsize=(4, 4))
ax = sns.countplot(x="SeniorCitizen", data=df, hue="Churn")
for container in ax.containers:
    ax.bar_label(container)
plt.title("Churn by SeniorCitizen")
plt.show()
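The counts and percentages shown in the charts above can also be read off as plain numbers. This is a minimal sketch using `Series.value_counts`; the `churn` series is a tiny made-up example, not the notebook's data.

```python
import pandas as pd

# Hypothetical Churn labels; the notebook would use df["Churn"] instead.
churn = pd.Series(["No", "No", "Yes", "No"], name="Churn")

# Absolute number of customers per Churn value.
counts = churn.value_counts()

# Share of each Churn value as a percentage, rounded to two decimals.
percent = churn.value_counts(normalize=True).mul(100).round(2)
```

Printing `counts` and `percent` next to the plots is a quick way to check that the bar labels and the pie slices agree.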

# From ChatGPT
# Cross-tabulate SeniorCitizen against Churn
cross_tab = pd.crosstab(df['SeniorCitizen'], df['Churn'])

# Calculate row percentages (each SeniorCitizen group sums to 100)
cross_tab_percentage = cross_tab.div(cross_tab.sum(axis=1), axis=0) * 100

# Plot the stacked bar chart
ax = cross_tab_percentage.plot(kind='bar', stacked=True, figsize=(6, 4),
                               color=['lightgreen', 'lightcoral'])

# Add percentage labels at the vertical midpoint of each segment
for i, (index, row) in enumerate(cross_tab_percentage.iterrows()):
    cumulative = 0
    for value in row:
        cumulative += value
        ax.text(i, cumulative - (value / 2), f'{value:.1f}%',
                ha='center', va='center', color='black')

# Add labels and title
plt.xlabel('Senior Citizen')
plt.ylabel('Percentage')
plt.title('Churn Percentage by Senior Citizen')
# The bars follow the crosstab index order ('No', 'Yes'); keep the tick labels horizontal
plt.xticks(ticks=[0, 1], labels=['No', 'Yes'], rotation=0)
plt.legend(title='Churn', bbox_to_anchor=(0.9, 0.9))

# Display the chart
plt.tight_layout()
plt.show()
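The row percentages behind the stacked chart can also be obtained in one step. The sketch below relies on the `normalize="index"` option of `pd.crosstab`, which divides each row by its own total; the `toy` frame is invented for illustration.

```python
import pandas as pd

# Hypothetical data: four non-senior customers (one churned), two seniors (one churned).
toy = pd.DataFrame({
    "SeniorCitizen": ["No", "No", "No", "No", "Yes", "Yes"],
    "Churn":         ["No", "No", "No", "Yes", "No", "Yes"],
})

# normalize="index" scales every row to sum to 1; multiply by 100 for percentages.
pct = pd.crosstab(toy["SeniorCitizen"], toy["Churn"], normalize="index") * 100
```

Each row of `pct` sums to 100, so the values correspond to the segment heights of one stacked bar.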

plt.figure(figsize=(9, 4))
sns.histplot(x="tenure", data=df, bins=72, hue="Churn")
plt.show()

plt.figure(figsize=(5, 6))
ax = sns.countplot(x="Contract", data=df, hue="Churn")
# Label every hue group (one container per Churn value), not only the first
for container in ax.containers:
    ax.bar_label(container)
plt.title("Count of Customers by Contract")
plt.show()

df.columns.values

array(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
       'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
       'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges',
       'TotalCharges', 'Churn'], dtype=object)

columns = ['PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
           'OnlineBackup', 'DeviceProtection', 'TechSupport',
           'StreamingTV', 'StreamingMovies']

# Number of columns for the subplot grid (you can change this)
n_cols = 3
n_rows = (len(columns) + n_cols - 1) // n_cols  # Ceiling division: number of rows needed

# Create subplots
fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, n_rows * 4))  # Adjust figsize as needed

# Flatten the axes array for easy iteration (handles both 1D and 2D arrays)
axes = axes.flatten()

# Iterate over columns and plot count plots
for i, col in enumerate(columns):
    sns.countplot(x=col, data=df, ax=axes[i], hue="Churn")
    axes[i].set_title(f'Count Plot of {col}')
    axes[i].set_xlabel(col)
    axes[i].set_ylabel('Count')

# Remove empty subplots (if any)
for j in range(i + 1, len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout()
plt.show()
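Count plots show absolute numbers, which makes categories of different sizes hard to compare. As a complement, the sketch below computes the churn percentage within each category of a column. The helper name `churn_rate_by` and the `toy` frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def churn_rate_by(frame, column, target="Churn", positive="Yes"):
    """Return the percentage of rows where target == positive, per category of column."""
    # Boolean series: True where the customer churned.
    is_churn = frame[target].eq(positive)
    # The mean of a boolean series is the share of True values in each group.
    return is_churn.groupby(frame[column]).mean().mul(100).round(1)

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "InternetService": ["DSL", "DSL", "Fiber optic", "Fiber optic", "No"],
    "Churn":           ["No", "Yes", "Yes", "Yes", "No"],
})

rates = churn_rate_by(toy, "InternetService")
```

With the real data, the same helper could be called in a loop over the `columns` list to obtain one small table per service column.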

plt.figure(figsize=(9, 6))
ax = sns.countplot(x="PaymentMethod", data=df, hue="Churn")
ax.bar_label(ax.containers[0])
ax.bar_label(ax.containers[1])
plt.title("Count of Customers by PaymentMethod")
plt.show()
