Predicting Churn

Churn Modelling

Data: Customer Data


Cross-industry standard process for data mining
1. Business Understanding
Customer churn is when an existing customer, user, player, subscriber or any
other kind of returning client stops doing business or ends the relationship
with a company. The churn rate is the percentage of subscribers to a service who
discontinue their subscriptions within a given time period. For a company to
expand its clientele, its growth rate, as measured by the number of new
customers, must exceed its churn rate. This rate is generally expressed as a
percentage.

https://pakman.com/churn-is-the-single-metric-that-determines-the-success-of-your-subscription-service-6e82d9d9ea01
https://www.netigate.net/articles/customer-satisfaction/customer-churn-meaning/
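As a worked example of the definition above (the numbers are illustrative, not
taken from the course data): churn rate = customers lost during the period ÷
customers at the start of the period × 100%. A service that starts a quarter with
2,000 subscribers and loses 100 of them during that quarter therefore has a
quarterly churn rate of 100 / 2,000 = 5%.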
Churn Analysis
In order to build a sustainable business, companies need to focus their
efforts on reducing customer churn.
According to the authors of “Leading on the Edge of Chaos”, reducing
customer churn by 5% can increase profits by 25-125%.
Therefore, to reduce churn, most companies perform customer churn
analysis.
But what is customer churn analysis and what are its benefits?

https://www.gainsight.com/your-success/what-is-customer-churn-analysis/
What are its benefits?
• Converts structured and unstructured data/information into
meaningful insights
• Utilizes these insights to predict customers who are likely to churn
• Identifies the causes for churn and works to resolve those issues
• Engages with customers to foster relationships
• Implements effective programs for customer retention

https://www.gainsight.com/your-success/what-is-customer-churn-analysis/
Cross-industry standard process for data mining
2. Data Understanding
1. Selecting the Data: Customer Data.xlsx
2. Check the header row (attribute names). If there are no headers, remove the
annotation so that RM knows the data starts directly in the first row. All
attributes will then get generic names such as att1, att2, etc.
3. Check the data types. E.g., PostalCode typed as integer is valid, but do we want
RM to perform mathematical operations on PostalCode? The polynominal type is
a better choice.
4. Data exploration using the Statistics View (a pandas sketch follows below).
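For readers who prefer code to the RapidMiner GUI, a rough pandas sketch of the
same checks follows. The file name Customer Data.xlsx comes from step 1 and the
PostalCode column name from step 3; everything else is an assumption and would
need to match the real dataset.

```python
# A rough pandas equivalent of the RapidMiner steps above (a sketch, not the
# course solution).
import pandas as pd

# 1-2. Load the data; header=0 tells pandas the first row holds attribute names.
#      Use header=None instead if the data starts directly in the first row.
df = pd.read_excel("Customer Data.xlsx", header=0)

# 3. Check data types, and treat PostalCode as categorical ("polynominal" in RM)
#    so that no mathematical operations are performed on it.
print(df.dtypes)
df["PostalCode"] = df["PostalCode"].astype("category")

# 4. Data exploration, similar in spirit to RM's Statistics View.
print(df.describe(include="all"))
print(df.isna().sum())
```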
Cross-industry standard process for data mining
Data Preparation
• Issues found (a pandas sketch follows below):
• Missing values: ChurnDate contains a lot of missing values; Age and Gender do
too.
• Range: Customer Age should be between 16 and 100.
• Gender: We have four values!
• Irrelevant attributes: Which ones?
• ID attributes: They can confuse the algorithm, so remove them.
• No label: Make RM ignore the rowNumber attribute by assigning it either the
predefined id role or a custom role, using the Set Role operator.
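A hedged pandas sketch of the same clean-up follows. Column names such as
ChurnDate, Age, Gender and rowNumber are taken from the issue list above;
deriving a label from a missing ChurnDate is an assumption made for the sketch,
and in the course this work is done with RapidMiner operators rather than code.

```python
import pandas as pd

df = pd.read_excel("Customer Data.xlsx")

# Missing values: for this sketch we assume a missing ChurnDate means
# "did not churn", and derive a binary label from it instead of dropping rows.
df["Churn"] = df["ChurnDate"].notna()

# Range check: keep plausible ages only (16 to 100).
df = df[df["Age"].between(16, 100)]

# Gender: inspect the four distinct values and map/clean them as needed.
print(df["Gender"].value_counts(dropna=False))

# ID-like attributes can confuse the learner; drop (or ignore) them.
df = df.drop(columns=["rowNumber"], errors="ignore")
```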
3. Modeling using Machine Learning
Machine learning is an application of artificial intelligence (AI) that
provides systems the ability to automatically learn and improve from
experience without being explicitly programmed.

Machine learning focuses on the development of computer programs
that can access data and use it to learn for themselves.

If you don't use machine learning, what other methods could you use to
develop a model?
K-NN
• When do we use the KNN algorithm?
• KNN can be used for both classification and regression predictive problems.
However, in industry it is more widely used for classification problems. To
evaluate any technique we generally look at three important aspects.
K-NN Algorithm
1. Load the data.
2. Initialise the value of k.
3. To get the predicted class for a test point, iterate over all training data points:
• Calculate the distance between the test point and each row of training data. Here
we can use Euclidean, cosine, etc. as our distance metric; Euclidean is the most
popular choice.
• Sort the calculated distances in ascending order.
• Take the top k rows from the sorted array.
• Find the most frequent class of these rows.
• Return that class as the prediction.
A minimal implementation sketch follows below.

https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
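As a concrete illustration of the steps listed above, here is a minimal NumPy
sketch of the voting procedure. It is not the RapidMiner process used in this
course, and the names and example data (euclidean, knn_predict, the tiny
X_train/y_train arrays) are made up for illustration.

```python
# Minimal k-NN classifier following the steps above (illustrative sketch only).
import numpy as np
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, x_test, k=3):
    # 1. Compute the distance from the test point to every training point.
    distances = [euclidean(x, x_test) for x in X_train]
    # 2. Sort by distance and keep the indices of the k closest points.
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote among the k nearest neighbours.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny made-up example: two features per customer.
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array(["stay", "stay", "churn", "churn"])
print(knn_predict(X_train, y_train, np.array([5.5, 8.5]), k=3))  # -> "churn"
```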
Distance Measurement
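The slide titled 'Distance Measurement' presumably carried a figure of the metric
definitions. As a stand-in, these are the standard textbook formulas for the
metrics named in the algorithm above (not taken from the slides), for two points
x = (x1, ..., xn) and y = (y1, ..., yn):

```latex
% Standard distance/similarity definitions (textbook forms, not from the slides).
d_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
\qquad
d_{\mathrm{Manhattan}}(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert
\qquad
\mathrm{cosine}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\;\sqrt{\sum_{i=1}^{n} y_i^2}}
```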
How KNN Works

Let's take a simple case to understand this algorithm. Suppose we have a spread
of red circles (RC) and green squares (GS), and we intend to find out the class of
a blue star (BS). BS can either be RC or GS and nothing else.

The "K" in the KNN algorithm is the number of nearest neighbors we wish to take
a vote from. Let's say K = 3. We now draw a circle with BS as its centre, just big
enough to enclose only three data points on the plane. Refer to the diagram in
the cited article for more details.

The three closest points to BS are all RC. Hence, with a good confidence level we
can say that BS should belong to the class RC. Here the choice became very
obvious, as all three votes from the closest neighbors went to RC. The choice of
the parameter K is very crucial in this algorithm.
The error rate at K=1 is always zero for the training sample, because the closest
point to any training data point is itself. Hence the prediction is always accurate
with K=1. Still want to use K=1?
At K=1 we are overfitting the boundaries. On held-out data, the error rate
initially decreases as K grows and reaches a minimum; after that point it
increases again with increasing K. To get the optimal value of K, you can
segregate a training set and a validation set from the initial dataset (a sketch
follows below).
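One way to do that outside RapidMiner is sketched below with scikit-learn. X and
y stand for already-prepared features and churn labels, and the split ratio and
candidate range are arbitrary choices made for the example.

```python
# Sketch of choosing K with a train/validation split (illustrative only).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def best_k(X, y, candidates=range(1, 21)):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)
    scores = {}
    for k in candidates:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        scores[k] = model.score(X_val, y_val)  # validation accuracy
    # The best K is where validation accuracy peaks (validation error is minimal).
    return max(scores, key=scores.get), scores
```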
RapidMiner expression for the number of days between LastTransaction and
31/12/2014 (date_diff returns milliseconds, hence the division):

ceil(date_diff(LastTransaction, date_parse("31/12/2014")) / (1000*3600*24))
End notes on K-NN
• The KNN algorithm is one of the simplest classification algorithms, but it
can give highly competitive results.
• The KNN algorithm can also be used for regression problems. The only
difference from the methodology discussed above is using the average of the
nearest neighbors rather than a vote among them.
• A higher value of K creates smoother prediction boundaries and ignores
stray data points.
• A lower K creates a very detailed model that can handle complex data, but it is
prone to errors induced by noisy or unclean data.
• KNN can be coded in a single line in R.
https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
Operators & Concepts Involved
Operators used in the process:
• Retrieve Data
• Set Role
• Cross Validation
• Sample
• Decision Tree
• Apply Model
• Confusion Matrix
Concepts involved:
• Data management
• Setting the class/label
• Technique for evaluating a prediction method
• Machine learning
• Training & testing: splitting the data
Model Validation
Model validation is key!
This cross-validation splits the dataset for training and then for independent
testing.

This splitting is done several times to get a better performance estimate.

Double-click on the operator to take a look at the training itself.
Example
• Double-click the Cross Validation operator.
Many more customers stay than churn (hopefully!).
In order for our model to learn how churners behave, we re-balance
the data to focus on the case we're interested in.
This is like a magnifying glass on churn!
Take a look at the 'Sample' operator.
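In the RapidMiner process this re-balancing is what the Sample operator does. A
rough pandas equivalent, assuming the boolean Churn column from the earlier
data-preparation sketch, might look like this:

```python
# Down-sample the majority class ("stay") so churners are no longer drowned out.
# Assumes a DataFrame df with a boolean Churn column, as sketched earlier,
# and that churners are the minority class.
import pandas as pd

churners = df[df["Churn"]]
stayers = df[~df["Churn"]]

# Keep all churners and an equally sized random sample of stayers.
balanced = pd.concat([
    churners,
    stayers.sample(n=len(churners), random_state=42),
]).sample(frac=1, random_state=42)  # shuffle the combined frame
```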
Decision Tree
Click the Decision Tree operator. Try different values for the parameters, in
particular the 'minimal gain'.

The 'Wisdom of the Crowds' recommendation helps you find reasonable values.
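Outside RapidMiner, a similar experiment can be sketched with scikit-learn. Note
that min_impurity_decrease is only a loose analogue of RM's 'minimal gain'
parameter, and X_train/y_train are assumed to be a DataFrame and label Series
prepared from the balanced data above.

```python
# Illustrative decision-tree training with a pruning threshold (sketch only).
from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(
    criterion="gini",
    max_depth=7,                 # limit tree size
    min_impurity_decrease=0.01,  # splits must gain at least this much (cf. 'minimal gain')
    random_state=42,
)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=list(X_train.columns)))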
Apply Model & Performance (Binomial Classification)
The model trained on the training data is applied to the independent test data
set and the model performance is calculated.

The performance values obtained on the different folds of the cross-validation
are finally averaged to produce an average performance measure as well as a
measure of its dispersion, which gives an estimate of the model's stability when
applied to different data samples.
Outputs
• A tree model (trained on the complete input data) that analyzes
churn behavior and can be applied to any individual customer to
estimate churn probability.
• The original input data
• The estimated (i.e. cross-validated) performance of the model.
10-fold Cross Validation
• Cross-validation is a technique to evaluate predictive models by partitioning the
original sample into a training set to train the model, and a test set to evaluate it.
• In k-fold cross-validation, the original sample is randomly partitioned into k
equal-sized subsamples.
• Of the k subsamples, a single subsample is retained as the validation data for
testing the model, and the remaining k-1 subsamples are used as training data.
• The cross-validation process is then repeated k times (the folds), with each of the
k subsamples used exactly once as the validation data.
• The k results from the folds can then be averaged (or otherwise combined) to
produce a single estimation. The advantage of this method is that all observations
are used for both training and validation, and each observation is used for
validation exactly once. A scikit-learn sketch of 10-fold cross-validation follows
below.
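As an illustration of the procedure described above, here is a scikit-learn sketch
of 10-fold cross-validation. The RapidMiner process uses the Cross Validation
operator instead; X and y are assumed to be the prepared features and churn
labels from the earlier sketches.

```python
# 10-fold cross-validation of a decision tree (illustrative sketch).
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y,
                         cv=cv, scoring="accuracy")

# Average performance plus its dispersion across the 10 folds.
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```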
Accuracy
Understanding Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a
classification model (or "classifier") on a set of test data for which the true
values are known.

The confusion matrix itself is relatively simple to understand, but the related
terminology can be confusing.
What can we learn from these results?

• There are two possible predicted classes: "TRUE" and "FALSE".
• Since we are predicting churn, "TRUE" means the customer churned last year,
and "FALSE" means they did not churn last year.
• The classifier made a total of 9990 predictions (i.e., 9990 customers were
tested for churn).
• Out of those 9990 cases, the classifier predicted "FALSE" 9823 times and
"TRUE" 167 times.
• In reality, 9969 customers in the data have not churned, and 21 have churned.
Basic Terms
• true positives (TP): cases in which we predicted TRUE (they churned), and they
actually did churn.
• true negatives (TN): we predicted FALSE, and they did not churn.
• false positives (FP): we predicted TRUE, but they did not actually churn. (Also
known as a "Type I error.")
• false negatives (FN): we predicted FALSE, but they actually did churn. (Also
known as a "Type II error.")
List of rates that are often computed from a confusion matrix for a binary
classifier
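The table of rates itself is not reproduced here. As a stand-in, the most commonly
quoted rates can be computed directly from the four counts defined under 'Basic
Terms'; these are standard formulas, not values taken from the course slides.

```python
# Common rates derived from a binary confusion matrix (standard formulas).
# Assumes all denominators are non-zero.
def confusion_rates(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,          # overall fraction correct
        "precision": tp / (tp + fp),            # how often a predicted TRUE is right
        "recall (TPR)": tp / (tp + fn),         # fraction of actual churners caught
        "false positive rate": fp / (fp + tn),  # stayers wrongly flagged as churners
        "specificity (TNR)": tn / (tn + fp),    # fraction of stayers correctly kept
    }
```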
Overfitting
In statistics, overfitting is "the production of an analysis that
corresponds too closely or exactly to a particular set of data, and may
therefore fail to fit additional data or predict future observations
reliably".

Wikipedia
