Predicting Churn

Churn Modelling

Data: Customer Data


Cross-industry standard process for data mining
1. Business Understanding
Customer churn is when an existing customer, user, player, subscriber or any
other kind of returning client stops doing business or ends the relationship
with a company. The churn rate is the percentage of subscribers to a service who
discontinue their subscriptions within a given time period. For a company to
expand its clientele, its growth rate, as measured by the number of new
customers, must exceed its churn rate. This rate is generally expressed as a
percentage.

https://pakman.com/churn-is-the-single-metric-that-determines-the-success-of-your-subscription-service-6e82d9d9ea01
https://www.netigate.net/articles/customer-satisfaction/customer-churn-meaning/
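As a worked example of the definition above (the numbers are illustrative, not
taken from the course data): churn rate = customers lost during the period ÷
customers at the start of the period × 100%. A service that starts a quarter with
2,000 subscribers and loses 100 of them during that quarter therefore has a
quarterly churn rate of 100 / 2,000 = 5%.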
Churn Analysis
In order to build a sustainable business, companies need to focus their
efforts on reducing customer churn.
According to the authors of “Leading on the Edge of Chaos”, reducing
customer churn by 5% can increase profits by 25-125%.
Therefore, to reduce churn, most companies perform customer churn
analysis.
But what is customer churn analysis and what are its benefits?

https://www.gainsight.com/your-success/what-is-customer-churn-analysis/
What are its benefits?
• Converts structured and unstructured data/information into
meaningful insights
• Utilizes these insights to predict customers who are likely to churn
• Identifies the causes for churn and works to resolve those issues
• Engages with customers to foster relationships
• Implements effective programs for customer retention

https://www.gainsight.com/your-success/what-is-customer-churn-analysis/
Cross-industry standard process for data mining
2. Data Understanding
1. Selecting the Data: Customer Data.xlsx
2. Check the header row (attribute names). If there are no headers, remove the
annotation so that RM knows the data starts directly in the first row. All
attributes will then get generic names such as att1, att2, etc.
3. Check the data types. E.g., PostalCode typed as integer is valid, but do we want
RM to perform mathematical operations on PostalCode? The polynominal type is
a better choice.
4. Data exploration using the Statistics View (a pandas sketch follows below).
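For readers who prefer code to the RapidMiner GUI, a rough pandas sketch of the
same checks follows. The file name Customer Data.xlsx comes from step 1 and the
PostalCode column name from step 3; everything else is an assumption and would
need to match the real dataset.

```python
# A rough pandas equivalent of the RapidMiner steps above (a sketch, not the
# course solution).
import pandas as pd

# 1-2. Load the data; header=0 tells pandas the first row holds attribute names.
#      Use header=None instead if the data starts directly in the first row.
df = pd.read_excel("Customer Data.xlsx", header=0)

# 3. Check data types, and treat PostalCode as categorical ("polynominal" in RM)
#    so that no mathematical operations are performed on it.
print(df.dtypes)
df["PostalCode"] = df["PostalCode"].astype("category")

# 4. Data exploration, similar in spirit to RM's Statistics View.
print(df.describe(include="all"))
print(df.isna().sum())
```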
Cross-industry standard process for data mining
Data Preparation
• Issues found (a pandas sketch follows below):
• Missing values: ChurnDate contains a lot of missing values; Age and Gender do
too.
• Range: Customer Age should be between 16 and 100.
• Gender: We have four values!
• Irrelevant attributes: Which ones?
• ID attributes: They can confuse the algorithm, so remove them.
• No label: Make RM ignore the rowNumber attribute by assigning it either the
predefined id role or a custom role, using the Set Role operator.
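A hedged pandas sketch of the same clean-up follows. Column names such as
ChurnDate, Age, Gender and rowNumber are taken from the issue list above;
deriving a label from a missing ChurnDate is an assumption made for the sketch,
and in the course this work is done with RapidMiner operators rather than code.

```python
import pandas as pd

df = pd.read_excel("Customer Data.xlsx")

# Missing values: for this sketch we assume a missing ChurnDate means
# "did not churn", and derive a binary label from it instead of dropping rows.
df["Churn"] = df["ChurnDate"].notna()

# Range check: keep plausible ages only (16 to 100).
df = df[df["Age"].between(16, 100)]

# Gender: inspect the four distinct values and map/clean them as needed.
print(df["Gender"].value_counts(dropna=False))

# ID-like attributes can confuse the learner; drop (or ignore) them.
df = df.drop(columns=["rowNumber"], errors="ignore")
```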
3. Modeling using Machine Learning
Machine learning is an application of artificial intelligence (AI) that
provides systems the ability to automatically learn and improve from
experience without being explicitly programmed.

Machine learning focuses on the development of computer programs
that can access data and use it to learn for themselves.

If you don't use machine learning, what other methods could you use to
develop a model?
K-NN
• When do we use the KNN algorithm?
• KNN can be used for both classification and regression predictive problems.
However, in industry it is more widely used for classification problems. To
evaluate any technique we generally look at three important aspects.
K-NN Algorithm
1. Load the data.
2. Initialise the value of k.
3. To get the predicted class for a test point, iterate over all training data points:
• Calculate the distance between the test point and each row of training data. Here
we can use Euclidean, cosine, etc. as our distance metric; Euclidean is the most
popular choice.
• Sort the calculated distances in ascending order.
• Take the top k rows from the sorted array.
• Find the most frequent class of these rows.
• Return that class as the prediction.
A minimal implementation sketch follows below.

https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
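As a concrete illustration of the steps listed above, here is a minimal NumPy
sketch of the voting procedure. It is not the RapidMiner process used in this
course, and the names and example data (euclidean, knn_predict, the tiny
X_train/y_train arrays) are made up for illustration.

```python
# Minimal k-NN classifier following the steps above (illustrative sketch only).
import numpy as np
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, x_test, k=3):
    # 1. Compute the distance from the test point to every training point.
    distances = [euclidean(x, x_test) for x in X_train]
    # 2. Sort by distance and keep the indices of the k closest points.
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote among the k nearest neighbours.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny made-up example: two features per customer.
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array(["stay", "stay", "churn", "churn"])
print(knn_predict(X_train, y_train, np.array([5.5, 8.5]), k=3))  # -> "churn"
```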
Distance Measurement
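The slide titled 'Distance Measurement' presumably carried a figure of the metric
definitions. As a stand-in, these are the standard textbook formulas for the
metrics named in the algorithm above (not taken from the slides), for two points
x = (x1, ..., xn) and y = (y1, ..., yn):

```latex
% Standard distance/similarity definitions (textbook forms, not from the slides).
d_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
\qquad
d_{\mathrm{Manhattan}}(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert
\qquad
\mathrm{cosine}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\;\sqrt{\sum_{i=1}^{n} y_i^2}}
```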
How KNN Works

Let's take a simple case to understand this algorithm. Suppose we have a spread
of red circles (RC) and green squares (GS), and we intend to find out the class of
a blue star (BS). BS can either be RC or GS and nothing else.

The "K" in the KNN algorithm is the number of nearest neighbors we wish to take
a vote from. Let's say K = 3. We now draw a circle with BS as its centre, just big
enough to enclose only three data points on the plane. Refer to the diagram in
the cited article for more details.

The three closest points to BS are all RC. Hence, with a good confidence level we
can say that BS should belong to the class RC. Here the choice became very
obvious, as all three votes from the closest neighbors went to RC. The choice of
the parameter K is very crucial in this algorithm.
The error rate at K=1 is always zero for the training sample, because the closest
point to any training data point is itself. Hence the prediction is always accurate
with K=1. Still want to use K=1?
At K=1 we are overfitting the boundaries. On held-out data, the error rate
initially decreases as K grows and reaches a minimum; after that point it
increases again with increasing K. To get the optimal value of K, you can
segregate a training set and a validation set from the initial dataset (a sketch
follows below).
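One way to do that outside RapidMiner is sketched below with scikit-learn. X and
y stand for already-prepared features and churn labels, and the split ratio and
candidate range are arbitrary choices made for the example.

```python
# Sketch of choosing K with a train/validation split (illustrative only).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def best_k(X, y, candidates=range(1, 21)):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)
    scores = {}
    for k in candidates:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        scores[k] = model.score(X_val, y_val)  # validation accuracy
    # The best K is where validation accuracy peaks (validation error is minimal).
    return max(scores, key=scores.get), scores
```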
RapidMiner expression for the number of days between LastTransaction and
31/12/2014 (date_diff returns milliseconds, hence the division):

ceil(date_diff(LastTransaction, date_parse("31/12/2014")) / (1000*3600*24))
End notes on K-NN
• The KNN algorithm is one of the simplest classification algorithms, but it
can give highly competitive results.
• The KNN algorithm can also be used for regression problems. The only
difference from the methodology discussed above is using the average of the
nearest neighbors rather than a vote among them.
• A higher value of K creates smoother prediction boundaries and ignores
stray data points.
• A lower K creates a very detailed model that can handle complex data, but it is
prone to errors induced by noisy or unclean data.
• KNN can be coded in a single line in R.
https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
Operators & Concepts Involved
Operators used in the process:
• Retrieve Data
• Set Role
• Cross Validation
• Sample
• Decision Tree
• Apply Model
• Confusion Matrix
Concepts involved:
• Data management
• Setting the class/label
• Technique for evaluating a prediction method
• Machine learning
• Training & testing: splitting the data
Model Validation
Model validation is key!
This cross-validation splits the dataset for training and then for independent
testing.

This splitting is done several times to get a better performance estimate.

Double-click on the operator to take a look at the training itself.
Example
• Double-click the Cross Validation operator.
Many more customers stay than churn (hopefully!).
In order for our model to learn how churners behave, we re-balance
the data to focus on the case we're interested in.
This is like a magnifying glass on churn!
Take a look at the 'Sample' operator.
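In the RapidMiner process this re-balancing is what the Sample operator does. A
rough pandas equivalent, assuming the boolean Churn column from the earlier
data-preparation sketch, might look like this:

```python
# Down-sample the majority class ("stay") so churners are no longer drowned out.
# Assumes a DataFrame df with a boolean Churn column, as sketched earlier,
# and that churners are the minority class.
import pandas as pd

churners = df[df["Churn"]]
stayers = df[~df["Churn"]]

# Keep all churners and an equally sized random sample of stayers.
balanced = pd.concat([
    churners,
    stayers.sample(n=len(churners), random_state=42),
]).sample(frac=1, random_state=42)  # shuffle the combined frame
```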
Decision Tree
Click the Decision Tree operator. Try different values for the parameters, in
particular the 'minimal gain'.

The 'Wisdom of the Crowds' recommendation helps you find reasonable values.
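Outside RapidMiner, a similar experiment can be sketched with scikit-learn. Note
that min_impurity_decrease is only a loose analogue of RM's 'minimal gain'
parameter, and X_train/y_train are assumed to be a DataFrame and label Series
prepared from the balanced data above.

```python
# Illustrative decision-tree training with a pruning threshold (sketch only).
from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(
    criterion="gini",
    max_depth=7,                 # limit tree size
    min_impurity_decrease=0.01,  # splits must gain at least this much (cf. 'minimal gain')
    random_state=42,
)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=list(X_train.columns)))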
Apply Model & Performance (Binomial Classification)
The model trained on the training data is applied to the independent test data
set and the model performance is calculated.

The performance values obtained on the different folds of the cross-validation
are finally averaged to produce an average performance measure as well as a
measure of its dispersion, which gives an estimate of the model's stability when
applied to different data samples.
Outputs
• A tree model (trained on the complete input data) that analyzes
churn behavior and can be applied to any individual customer to
estimate churn probability.
• The original input data
• The estimated (i.e. cross-validated) performance of the model.
10-fold Cross Validation
• Cross-validation is a technique to evaluate predictive models by partitioning the
original sample into a training set to train the model, and a test set to evaluate it.
• In k-fold cross-validation, the original sample is randomly partitioned into k
equal-sized subsamples.
• Of the k subsamples, a single subsample is retained as the validation data for
testing the model, and the remaining k-1 subsamples are used as training data.
• The cross-validation process is then repeated k times (the folds), with each of the
k subsamples used exactly once as the validation data.
• The k results from the folds can then be averaged (or otherwise combined) to
produce a single estimation. The advantage of this method is that all observations
are used for both training and validation, and each observation is used for
validation exactly once. A scikit-learn sketch of 10-fold cross-validation follows
below.
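As an illustration of the procedure described above, here is a scikit-learn sketch
of 10-fold cross-validation. The RapidMiner process uses the Cross Validation
operator instead; X and y are assumed to be the prepared features and churn
labels from the earlier sketches.

```python
# 10-fold cross-validation of a decision tree (illustrative sketch).
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y,
                         cv=cv, scoring="accuracy")

# Average performance plus its dispersion across the 10 folds.
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```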
Accuracy
Understanding Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a
classification model (or "classifier") on a set of test data for which the true
values are known.

The confusion matrix itself is relatively simple to understand, but the related
terminology can be confusing.
What can we learn from these results?

• There are two possible predicted classes: "TRUE" and "FALSE".
• Since we are predicting churn, "TRUE" means the customer churned last year,
and "FALSE" means they did not churn last year.
• The classifier made a total of 9990 predictions (i.e., 9990 customers were
tested for churn).
• Out of those 9990 cases, the classifier predicted "FALSE" 9823 times and
"TRUE" 167 times.
• In reality, 9969 customers in the data have not churned, and 21 have churned.
Basic Terms
• true positives (TP): cases in which we predicted TRUE (they churned), and they
actually did churn.
• true negatives (TN): we predicted FALSE, and they did not churn.
• false positives (FP): we predicted TRUE, but they did not actually churn. (Also
known as a "Type I error.")
• false negatives (FN): we predicted FALSE, but they actually did churn. (Also
known as a "Type II error.")
List of rates that are often computed from a confusion matrix for a binary
classifier
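The table of rates itself is not reproduced here. As a stand-in, the most commonly
quoted rates can be computed directly from the four counts defined under 'Basic
Terms'; these are standard formulas, not values taken from the course slides.

```python
# Common rates derived from a binary confusion matrix (standard formulas).
# Assumes all denominators are non-zero.
def confusion_rates(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,          # overall fraction correct
        "precision": tp / (tp + fp),            # how often a predicted TRUE is right
        "recall (TPR)": tp / (tp + fn),         # fraction of actual churners caught
        "false positive rate": fp / (fp + tn),  # stayers wrongly flagged as churners
        "specificity (TNR)": tn / (tn + fp),    # fraction of stayers correctly kept
    }
```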
Overfitting
In statistics, overfitting is "the production of an analysis that
corresponds too closely or exactly to a particular set of data, and may
therefore fail to fit additional data or predict future observations
reliably".

Wikipedia
