A manager at the bank is concerned that more and more customers are leaving the credit card
services. They would really appreciate it if someone could predict which customers are going to
churn, so they can proactively reach out to those customers, offer them better services, and turn
the customers' decisions in the opposite direction.
Data Description
This dataset consists of 10,000 customers with 18 features. In this analysis we will avoid
using categorical features and work with a subset of 12 numerical features to keep the
pre-processing simple.
We will use NumPy for numerical operations, pandas for dataframes, Matplotlib and Plotly for
plots, and scikit-learn for building machine learning models.
In [2]:
!pip install plotly
In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as ex
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
Because the last two columns of the original dataset contain prediction outputs from a Naive
Bayes classifier, we will drop them for the purpose of our analysis.
In [4]:
df = pd.read_csv('C:/Users/bisha/OneDrive/Bishal/OneDrive/Python learning/Principal
In [5]:
df = df[df.columns[:-2]]  # drop the last two columns (Naive Bayes outputs)
df.head()
   CLIENTNUM     Attrition_Flag  Customer_Age Gender  Dependent_count Education_Level  ...
0  768805383  Existing Customer            45      M                3     High School  ...
1  818770008  Existing Customer            49      F                5        Graduate  ...
2  713982108  Existing Customer            51      M                3        Graduate  ...
3  769911858  Existing Customer            40      F                4     High School  ...
4  709106358  Existing Customer            40      M                3      Uneducated  ...
5 rows × 21 columns
We will look at the customer age distribution using Plotly (which allows for interactive plots).
In [6]:
fig = make_subplots(rows=2,cols=1)
tr1 = go.Box(x=df['Customer_Age'],name='Age Box Plot',boxmean='sd')
tr2 = go.Histogram(x=df['Customer_Age'], name='Age Histogram')
fig.add_trace(tr1,row=1,col=1)
fig.add_trace(tr2,row=2,col=1)
fig.update_layout(height=500, width=600, title_text="Distribution of Customer Ages")
fig.show()
[Figure: Distribution of Customer Ages, box plot (top) and histogram (bottom)]
In [7]:
income = pd.DataFrame(df['Income_Category'].value_counts())
labelincome = df['Income_Category'].unique()
In [8]:
# explore income level and education level
# the tr3/tr4 pie-trace definitions were dropped from the export; reconstructed here
# (assumes the education column is named 'Education_Level')
tr3 = go.Pie(labels=income.index, values=income.iloc[:, 0], name='Income')
tr4 = go.Pie(labels=df['Education_Level'].value_counts().index, values=df['Education_Level'].value_counts().values, name='Education')
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(tr3, row=1, col=1)
fig.add_trace(tr4, row=1, col=2)
fig.update_layout(height=500, width=600, title_text="Distribution of Income and Education")
fig.show()
[Figure: pie charts of the income category and education level distributions]
For prediction purposes, we will encode existing customers as "1" and attrited
customers as "0".
In [9]:
x = df.iloc[:, 9:21]                   # columns 9 to 20: the numerical features
x = StandardScaler().fit_transform(x)  # standardize the features
In [10]:
df['Attrition_Flag'].replace('Existing Customer','1',inplace=True)
df['Attrition_Flag'].replace('Attrited Customer','0',inplace=True)
We will start by using only the first two principal components, and then explore three and four
principal components.
In [11]:
pca = PCA(n_components=2)
PC = pca.fit_transform(x)
principalDF = pd.DataFrame(data=PC, columns=['pc1','pc2'])
finalDf = pd.concat([principalDF, df[['Attrition_Flag']]], axis=1)  # attach the target for later use
finalDf.head()
        pc1       pc2  Attrition_Flag
0  0.276048 -0.617639               1
1 -0.612402  1.430502               1
2 -0.613733  1.098632               1
3 -2.499317  1.781346               1
4 -0.560120  0.924119               1
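As a quick sanity check (not part of the original notebook), we can also look at how much of the total variance the two leading components capture, which helps motivate the later comparison of two, three and four components:
print(pca.explained_variance_ratio_)        # share of variance explained by pc1 and pc2 individually
print(pca.explained_variance_ratio_.sum())  # cumulative share captured by the first two components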
To assess how much weight each feature will carry in the later predictions, we can construct a
loadings table. The loadings show how much each of our original features contributes to
each of the "new features", the principal components.
In [12]:
PCloadings = pca.components_.T * np.sqrt(pca.explained_variance_)
components = df.columns.tolist()
components = components[9:21]
loadingdf = pd.DataFrame(PCloadings,columns=('PC1','PC2'))
loadingdf["variable"]=components
loadingdf
Now we can plot the loadings and see which features have high weightings on principal
components 1 and 2:
In [13]:
fig = ex.scatter(x=loadingdf['PC1'], y=loadingdf['PC2'], text=loadingdf['variable'])
fig.update_traces(textposition='bottom center')
fig.add_shape(type="line", x0=0, y0=-0.5, x1=0, y1=2.5, line=dict(color="RoyalBlue", width=1))
fig.show()
[Figure: loadings plot of the 12 features in the PC1/PC2 plane. Total_Trans_Ct and Total_Trans_Amt sit well above the remaining features (Avg_Utilization_Ratio, Total_Revolving_Bal, Total_Ct_Chng_Q4_Q1, Total_Amt_Chng_Q4_Q1, Months_on_book, Months_Inactive_12_mon, Credit_Limit, Avg_Open_To_Buy, Contacts_Count_12_mon, Total_Relationship_Count), which lie closer to zero on PC2.]
It is clear that "total transaction count" and "total transaction amount" are two heavily weighted
features.
This means that they will play a big role in our next step of prediction (using logistic regression).
If the predictions turn out to be reasonably accurate, we can infer that these two
features are indeed important factors in determining customer churn.
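As a quick check (not in the original notebook), we can also rank the features by the magnitude of their loadings to confirm this reading of the plot:
# sort features by their combined loading magnitude on PC1 and PC2
loadingdf['magnitude'] = np.sqrt(loadingdf['PC1']**2 + loadingdf['PC2']**2)
print(loadingdf.sort_values('magnitude', ascending=False)[['variable', 'PC1', 'PC2']])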
But before moving on to prediction, since dimensionality reduction lets us visualize the
data in two dimensions, we can make a scatter plot of the principal components to
see how the data are distributed.
In [14]:
def myplot(score, coeff, labels=None):
    xs = score[:, 0]
    ys = score[:, 1]
    n = coeff.shape[0]
    scalex = 1.0 / (xs.max() - xs.min())
    scaley = 1.0 / (ys.max() - ys.min())
    colors = {'1': 'pink', '0': 'blue'}
    plt.scatter(xs * scalex, ys * scaley, c=y.apply(lambda x: colors[x]))
    for i in range(n):
        plt.arrow(0, 0, coeff[i, 0], coeff[i, 1], color='r', alpha=0.5)
        if labels is None:
            plt.text(coeff[i, 0] * 1.15, coeff[i, 1] * 1.15, "Var" + str(i + 1), color='g', ha='center', va='center')
        else:
            plt.text(coeff[i, 0] * 1.15, coeff[i, 1] * 1.15, labels[i], color='g', ha='center', va='center')
    plt.xlim(-1, 1)
    plt.ylim(-1, 1)
    plt.xlabel("PC{}".format(1))
    plt.ylabel("PC{}".format(2))

y = df['Attrition_Flag']  # point colours; the definition of 'y' was missing in the export
myplot(PC[:, 0:2], np.transpose(pca.components_[0:2, :]))
plt.show()
1. Split the data set into training set and test set
2. Apply logistic regression to the training set
3. Make prediction on test set
4. Output the prediction score on the test set
In [15]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

Xfinal = finalDf[['pc1','pc2']]
yfinal = finalDf['Attrition_Flag']
# the train/test split was omitted in the export; a 70/30 split is assumed here
X_train, X_test, y_train, y_test = train_test_split(Xfinal, yfinal, test_size=0.3)
logistic = LogisticRegression()
logistic.fit(X=X_train, y=y_train)
logistic.predict(X_test)
score_2 = logistic.score(X_test, y_test)
We also want to try using 3 principal components and 4 principal components to compare their
accuracy:
In [21]:
pca = PCA(n_components=3)
PC = pca.fit_transform(x)
principalDF = pd.DataFrame(data=PC, columns=['pc1','pc2','pc3'])
finalDf = pd.concat([principalDF, df[['Attrition_Flag']]], axis=1)
Xfinal = finalDf[['pc1','pc2','pc3']]
yfinal = finalDf['Attrition_Flag']
# the train/test split was omitted in the export; a 70/30 split is assumed here
X_train, X_test, y_train, y_test = train_test_split(Xfinal, yfinal, test_size=0.3)
logistic = LogisticRegression()
logistic.fit(X=X_train, y=y_train)
logistic.predict(X_test)
score_3 = logistic.score(X_test, y_test)

pca = PCA(n_components=4)
PC = pca.fit_transform(x)
principalDF = pd.DataFrame(data=PC, columns=['pc1','pc2','pc3','pc4'])
finalDf = pd.concat([principalDF, df[['Attrition_Flag']]], axis=1)  # rebuild finalDf with four components
Xfinal = finalDf[['pc1','pc2','pc3','pc4']]
yfinal = finalDf['Attrition_Flag']
X_train, X_test, y_train, y_test = train_test_split(Xfinal, yfinal, test_size=0.3)
logistic = LogisticRegression()
logistic.fit(X=X_train, y=y_train)
logistic.predict(X_test)
score_4 = logistic.score(X_test, y_test)
Finally, we can assess how accurate the predictions made by our models are:
In [23]:
scores=[score_2,score_3,score_4]
scores
In [24]:
ex.bar(y=scores, x=('pc2','pc3','pc4'), range_y=(0.7,0.9), title='PC prediction accuracy')
[Figure: bar chart of prediction accuracy for the 2-, 3- and 4-component models, y-axis from 0.7 to 0.9]
It turns out that three principal components gave the highest score; nevertheless, 84% accuracy is
already achieved with two principal components, which is quite a decent result.
Therefore we can infer that total transaction count and total transaction amount are good
predictors of customer churn, which is also intuitively reasonable when we think about what
factors might signal that a bank customer is about to leave.
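As an optional follow-up (not part of the original notebook), the logistic-regression coefficients, which live in principal-component space, can be folded back through the PCA components to get an effective weight for each standardized original feature; this is one way to double-check that the transaction-related features dominate the prediction:
# PC scores = x_std @ pca.components_.T, so the effective weight of each standardized
# feature in the decision function is pca.components_.T @ logistic.coef_
# (pca and logistic here are the last-fitted, 4-component versions)
feature_weights = pca.components_.T @ logistic.coef_[0]
for name, w in sorted(zip(components, feature_weights), key=lambda t: abs(t[1]), reverse=True):
    print(f"{name:28s} {w: .3f}")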