Final Churn Prediction

CAPSTONE SYNOPSIS

Course Code: CSE339

Topic: Churn Prediction

TEAM MEMBERS:

S.No  Reg. No.  Name
1     11608080  Aadil Ahmad Malla
2     11602753  Sanghati Ghosh
3     11612899  Gurpreet Singh
4     11617553  Mustahseen Shafi
5     11608072  Danish Farooq

SUBMITTED TO OUR VIRTUOUS MENTOR:

Ms. Sandeep Kaur


Assistant Professor, CSE department
School of Computer Science & Engineering
Lovely Professional University, Phagwara
DECLARATION

We hereby declare that the project work entitled Churn Prediction is an authentic
record of our own work carried out as requirements of Capstone Project for the
award of B.Tech degree in Computer Science and Engineering from Lovely
Professional University, Phagwara, under the guidance of Ms. Sandeep Kaur.
All the information furnished in this capstone project report is based on our own
intensive work and is genuine.

Project Group:

Aadil Ahmad Malla (11608080)
Sanghati Ghosh (11602753)
Gurpreet Singh (11612899)
Mustahseen Shafi (11617553)
Danish Farooq (11608072)

Date: 27-04-2019

CERTIFICATE

This is to certify that the declaration statement made by this group of students is
correct to the best of my knowledge and belief. They have completed this Capstone
Project under my guidance and supervision. The present work is the result of their
original investigation, effort and study. No part of the work has ever been
submitted for any other degree at any University. The Capstone Project is fit for
the submission and partial fulfilment of the conditions for the award of B.Tech
degree in Computer Science and Engineering from Lovely Professional University,
Phagwara.

Date: 27-04-2019

Mentor

Ms. Sandeep Kaur


Assistant Professor, CSE department
School of Computer Science & Engineering
Lovely Professional University,
Phagwara, Punjab

TABLE OF CONTENTS

I. Declaration
II. Certificate
III. Table of contents

1. Introduction
1.1 Description of the project
1.2 Objective of the project
1.3 Scope of the project
1.3.1 Use Case Model (If applicable)
1.4 Outcomes

2. System Description
2.1 Assumptions and Dependencies (If applicable)
2.2 Functional Requirements
2.3 Non-Functional Requirements

3. Design
3.1 Analyzing the problem
3.2 Designing
3.3 ER diagram
3.4 Data Flow Diagram

4. Scheduling and Estimates

1. Introduction:

Customer churn refers to the inclination of a customer to leave a service provider.
Customer churn prediction is the process of identifying customers who are likely to
leave or switch away from their current service provider. The main aim of a churn
prediction model is to identify such customers so that retention strategies can be
targeted at them and the company can maximize its overall revenue.
Telecommunication companies have over the years experienced the highest annual
churn rates, ranging from 20% to 40%. Churn models usually assess all of a
company's customers and aim to predict churn and loyalty behavior based on the
analysis of demographic data, purchase history, service usage and billing data. All
characteristics and transactions are analyzed, ranked and modelled to create
customer or segment loyalty profiles. Based on the resulting loyalty or churn
prediction score, a predetermined marketing and segment strategy can be
recommended for each customer, group of customers or whole customer segment.
The telecommunication sector is facing a severe threat of customer churn.
According to Wei and Chiu (2002), the wireless telecom industry faces the threat of
losing 27% of its customers every year, which would result in huge revenue loss. It
is also widely accepted that acquiring a new customer costs 5 to 10 times more than
retaining an existing one. It has therefore become a common belief that the best
marketing strategy is to retain existing subscribers or, more simply, to avoid
customer churn.

1.1 Description of the project :

Churn prediction involves the systematic analysis of customer data to identify and
analyze patterns and trends in customer loyalty and churn. The detected patterns
and trends can be used by the telecommunication industry to improve customer
relationships and, at the same time, net profit. Identifying churners and
non-churners is a time-consuming and critical task that has to be performed
carefully, as the future growth of the company relies on the result of such an
analysis. The task is considered challenging for two reasons:

(i) the volume of customer information has increased, and
(ii) the available data is inconsistent and incomplete, which makes formal analysis
difficult.

Further, because of its vast size, investigating and analyzing a customer database
takes a long time. As information science and technology progress, sophisticated
data mining and artificial intelligence tools are increasingly accessible to the
telecommunication sector. These techniques, combined with state-of-the-art
computers, can process thousands of instructions in seconds, saving precious time.
In addition, installing and running software often costs less than hiring and
training personnel. Computers are also less prone to errors than human
investigators, especially those who work long hours. What telecom companies
currently need is a tool that helps them understand customer patterns, locate
churners and identify actions that can convert churners into non-churners. This tool
is called the ‘Customer Loyalty Assessment Model and Actionable Knowledge
Discovery System’, and its main goal is to provide timely and pertinent customer
information to decision-makers in a company. The present work focuses on
developing such a system, which the telecom industry can use to easily discover
customer patterns and trends, make forecasts, find relationships and possible
explanations, and identify possible churners. The proposed system uses data mining
techniques in its design and development.

1.2 Objective of the project:

The objective of this analysis is to predict customer churn. This is done by
understanding each customer's usage pattern and whether the customer has churned,
with the aim of reducing the number of churners. For this, a few algorithms are used
to predict a churn score based on the usage pattern. The fields provided in the data
are as follows (Churn is the outcome to be predicted):

• customerID
• gender
• SeniorCitizen
• Partner
• Dependents
• PhoneService
• MultipleLines
• InternetService
• OnlineSecurity
• OnlineBackup
• DeviceProtection
• TechSupport
• StreamingTV
• StreamingMovies
• Contract
• PaperlessBilling
• MonthlyCharges
• TotalCharges
• Churn
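
As a minimal sketch of how a table with these fields might be prepared for modelling
in R, the example below uses three made-up rows (illustrative values only, not
project data) and only a handful of the listed columns. It drops the identifier,
converts categorical fields to factors and discards incomplete rows; whether the
project handles missing values this way is an assumption.

```r
# Hypothetical sample rows using a few of the fields listed above.
customers <- data.frame(
  customerID     = c("C001", "C002", "C003"),
  gender         = c("Female", "Male", "Male"),
  SeniorCitizen  = c(0, 1, 0),
  Contract       = c("Month-to-month", "One year", "Two year"),
  MonthlyCharges = c(70.5, 55.0, 89.9),
  TotalCharges   = c(141.0, 1320.0, NA),  # one missing value on purpose
  Churn          = c("Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# customerID identifies a row but carries no predictive signal, so drop it.
model_data <- customers[, setdiff(names(customers), "customerID")]

# Treat categorical fields, including the target Churn, as factors.
categorical <- c("gender", "Contract", "Churn")
model_data[categorical] <- lapply(model_data[categorical], factor)

# Keep only rows without missing values (an assumed, simple strategy).
model_data <- model_data[complete.cases(model_data), ]

print(model_data)
```

Dropping incomplete rows is the simplest option; imputing the missing values would
be an alternative when many rows are affected.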

1.3 Scope of the project:


The future scope of this project is to push prediction accuracy as close to 100% as
possible. For this we can make use of decision trees and, in particular, a hybrid
classification technique to bring out the existing association between churn
prediction and customer lifetime value. The results and the accuracy can be
improved if more variables are used in the data. The dynamic nature of the industry
has made data mining increasingly significant, and the industry relies heavily on
the predictions that the data helps to make.

The proposed models can be further enhanced if the processes can be parallelized.
This is feasible by identifying operations that are independent of each other and
proposing a parallel architecture to improve performance.

1.4 Outcomes:

• Calculation of the churn rate, the average churn rate and the reasons for churning.
• Identification of the churn pattern of customers, thereby reducing the chances of
future churn.
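
The churn-rate outcomes can be illustrated with a short R sketch. The data below is
made up, and "average churn rate" is assumed here to mean the mean of the monthly
churn rates; the synopsis does not define it. Breaking the rate down by a field such
as Contract is one simple way to look for possible reasons for churning.

```r
# Hypothetical observations: one row per customer per month.
obs <- data.frame(
  month    = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"),
  Contract = c("Month-to-month", "Month-to-month", "One year", "One year",
               "Month-to-month", "Month-to-month", "One year", "One year"),
  Churn    = c("Yes", "No", "No", "No", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Churn rate = share of customers in a group who churned.
churn_rate <- function(churn) mean(churn == "Yes")

# Churn rate per month, and the average of those monthly rates.
monthly_rate <- sapply(split(obs$Churn, obs$month), churn_rate)
average_rate <- mean(monthly_rate)

# Churn rate per contract type, as a first look at what drives churn.
rate_by_contract <- sapply(split(obs$Churn, obs$Contract), churn_rate)

print(monthly_rate)      # Feb 0.50, Jan 0.25
print(average_rate)      # 0.375
print(rate_by_contract)  # Month-to-month 0.75, One year 0.00
```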

2. System Description:

2.2 Functional Requirements :

The following Functional Requirements need to be defined:


1. Interoperability / Open Architecture: There is no standard infrastructure platform.
The key consideration is whether the analytics solution works with multiple
platforms or is a closed add-on to one platform.
2. Machine Learning Methodology: Each Predictive Asset Maintenance solution is
based on a Big Data methodology. Is this a manual process or is Artificial
Intelligence used to automatically select the optimal algorithm for the specific
scenario?
3. User Interface: Mobile application and web portal.

2.3 Non-Functional Requirements:

The following non-functional requirements need to be defined:

1. Scalability: The analytics platform must be applicable to a machine or facility of
any size. The solution must be able to add assets without any incremental
investment in hardware, software or dedicated labour hours.
2. Performance: The objective of an industrial analytics platform is to provide the
production facility with accurate and timely data. Targeted performance
measurements will need to be defined.
3. Portability: We plan to deploy the system as a website. Given the role of web
services in enterprise website integration, the system should be sufficiently
portable.

Other non-functional requirements:

1. Response Time
2. Availability
3. Stability
4. Maintainability
5. Usability

3. Design:

3.1 Analyzing the problem:

Symptoms to keep in mind when analyzing customer churn:

1. Declining Sentiment:

Declining sentiment is a lot like a tickle in the back of your throat. It may seem
like a small issue at the moment, but in the long run it can end up much worse than
some of the other symptoms (think strep throat). The Net Promoter Score (NPS)
estimates the amount of positive sentiment going out about your business by asking
your customers “on a scale from 0-10 how likely are you to recommend this brand
to your friends and family?” Customers are then categorized into detractors,
passives or promoters based on their responses. The more promoters (and the fewer
detractors) a brand has, the higher its NPS will be. High NPS values are correlated
with satisfied customers, and the highest values create brand advocates.
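
The NPS calculation can be sketched in R as follows. The cut-offs used (0-6
detractor, 7-8 passive, 9-10 promoter) are the standard NPS convention rather than
something stated in this synopsis, and the survey responses are made up.

```r
# Net Promoter Score: % promoters minus % detractors, on a -100..100 scale.
nps <- function(scores) {
  stopifnot(all(scores >= 0 & scores <= 10))
  promoters  <- mean(scores >= 9)  # share answering 9 or 10
  detractors <- mean(scores <= 6)  # share answering 0 to 6
  100 * (promoters - detractors)
}

# Hypothetical survey responses on the 0-10 scale.
responses <- c(10, 9, 9, 8, 7, 6, 3, 10, 5, 9)

# Label each respondent using the same cut-offs.
category <- cut(responses,
                breaks = c(-Inf, 6, 8, 10),
                labels = c("detractor", "passive", "promoter"))

print(table(category))  # detractor 3, passive 2, promoter 5
print(nps(responses))   # 20
```

Tracking this value over time, rather than looking at a single reading, is what
would reveal declining sentiment.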

2. Declining Average Order Value:

Average order value (AOV) is one of the most frequently calculated metrics in
modern business and one of the easiest ways to keep a finger on the pulse of your
business. It makes sense to ask how much money your customers are spending with
you, and it is perfectly rational to guide your business strategy by the answer.
Like coughing, average order value is easy to notice because it is a “loud” metric.
In the same way that it is hard to ignore a bad cough, it is pretty hard to miss a
drop in order value. Changes in AOV are very frequently felt in both a brand’s
revenue and profit lines, and as a result they often prompt immediate action. High
average order values mean that customers are spending more per purchase on your
brand and represent a larger financial commitment to your products. This financial
commitment often indicates an emotional investment or satisfaction that drives
larger purchases.
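
A small R sketch of the metric, using made-up order values: AOV is total revenue
divided by the number of orders, which is the mean order value, and the relative
change between periods shows whether it is declining. The column names are
hypothetical.

```r
# Hypothetical orders; month is an ordered factor so Jan comes before Feb.
orders <- data.frame(
  month = factor(c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"),
                 levels = c("Jan", "Feb")),
  value = c(120, 80, 100, 60, 90, 70, 80)
)

# AOV per month = revenue / number of orders = mean order value.
avg_order_value <- sapply(split(orders$value, orders$month), mean)

# Relative change from each month to the next; negative means a decline.
pct_change <- diff(avg_order_value) / head(avg_order_value, -1)

print(avg_order_value)  # Jan 100, Feb 75
print(pct_change)       # Feb -0.25
```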

3.2 Designing:

1. Gather data about your customers (the more the better). It should contain
information about purchasing history, the length and number of interactions of
various types, problems that might have affected your customers, etc.

2. Mark the customers that have churned so that the algorithm knows what value
(label) it should predict.

3. Put the data into a file or a database so that it is available to the machine
learning engine of your choice, and have it use a decision tree algorithm.

4. Have the engine run through the data to build decision trees.

5. Configure hyperparameter tuning so that the ML engine picks the most
important attributes on its own.

6. In the case of churn, only a small fraction of customers may be marked as
churners, which results in the algorithm producing low-quality predictions. In
that case, try a process called “oversampling”: it generates more records
similar to the ones representing churners and gives you a more balanced
dataset.

7. Review the model parameters. For churn models you should optimize for recall
rather than precision: higher recall lets you pick up more potential churners,
and even if you get a “false positive”, the cost of the retention effort should
still be worth not losing a customer.
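
Steps 6 and 7 can be sketched in base R as below. This is an illustration under
stated assumptions, not the project's implementation: the data is synthetic, the
SupportCalls column is hypothetical, the rebalancing is plain random oversampling
(duplicating churner rows), and a logistic regression fitted with glm stands in for
the decision tree so that the example needs no extra packages.

```r
set.seed(42)  # make the synthetic data reproducible

# Synthetic customers: the first 50 of 500 are churners (10%), who tend to
# have higher monthly charges and more support calls (hypothetical column).
n <- 500
churn <- rep(c(1, 0), times = c(50, 450))
data <- data.frame(
  MonthlyCharges = rnorm(n, mean = 60 + 20 * churn, sd = 10),
  SupportCalls   = rpois(n, lambda = 1 + 2 * churn),
  Churn          = churn
)

# Hold out every third row for evaluation before any resampling.
test_idx <- seq(1, n, by = 3)
train <- data[-test_idx, ]
test  <- data[test_idx, ]

# Random oversampling: add resampled churner rows until classes are balanced.
churners <- train[train$Churn == 1, ]
n_extra  <- sum(train$Churn == 0) - nrow(churners)
extra    <- churners[sample(nrow(churners), n_extra, replace = TRUE), ]
balanced <- rbind(train, extra)

# Logistic regression as a stand-in classifier.
model <- glm(Churn ~ MonthlyCharges + SupportCalls,
             data = balanced, family = binomial)

# Predict on the untouched test set and compute recall and precision.
prob <- predict(model, newdata = test, type = "response")
pred <- as.integer(prob >= 0.5)

tp <- sum(pred == 1 & test$Churn == 1)  # churners correctly flagged
fn <- sum(pred == 0 & test$Churn == 1)  # churners missed
fp <- sum(pred == 1 & test$Churn == 0)  # loyal customers flagged

recall    <- tp / (tp + fn)
precision <- tp / (tp + fp)
cat("recall:", recall, "precision:", precision, "\n")
```

Only the training rows are oversampled; the test set keeps its original class
balance so that recall and precision reflect realistic conditions. Lowering the 0.5
threshold would trade precision for higher recall.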

Important notes about your data:

• Make sure to remove personal data from the training data set; you don’t want
it to leak by accident (after all, a cloud ML engine is someone else’s
computer).

• If you can’t remove the data, simply anonymize it. It doesn’t matter to the ML
engine whether it “sees” “john” or “ashgdkahsdk”.

• Make sure to let your customers know that their data will be used for
profiling (there is a lot of fuss about this in regard to GDPR).
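
One simple way to follow the first two notes is sketched below in R, with made-up
rows and hypothetical column names. Direct identifiers are dropped and each
distinct name is replaced by an opaque key. Strictly speaking this is
pseudonymization: anyone holding the name-to-key mapping can reverse it, so the
mapping should stay outside the training environment.

```r
# Hypothetical raw rows containing personal data.
customers <- data.frame(
  name           = c("john", "mary", "john", "alex"),
  email          = c("john@example.com", "mary@example.com",
                     "john@example.com", "alex@example.com"),
  MonthlyCharges = c(70.5, 55.0, 72.0, 89.9),
  Churn          = c("Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Replace each distinct name with an opaque key; repeated names share a key.
key_index <- match(customers$name, unique(customers$name))
customers$customer_key <- sprintf("cust_%03d", key_index)

# Drop the direct identifiers before the data leaves your environment.
training <- customers[, setdiff(names(customers), c("name", "email"))]

print(training$customer_key)  # "cust_001" "cust_002" "cust_001" "cust_003"
```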

3.3 E-R Diagrams :
The ER (Entity-Relationship) model is a high-level conceptual data model diagram.
It is based on the notion of real-world entities and the relationships between
them.
ER modeling helps you to analyze data requirements systematically in order to
produce a well-designed database. It is therefore considered best practice to
complete ER modeling before implementing your database.

[Figure: Entity Relationship Diagram. Labels recoverable from the diagram: Churn
Protection, Churn Reasons, Customers, Conscious Churn, Unconscious, Loss % by
Churn, Average Churn Rate, Market, Profit % by Churn Protection, Average
Retention Rate, Buy, Level of Use of Service, Id, Name, Satisfaction Level, Flag
for Status, Offers.]
3.4 Data flow diagrams:
The DFD (also known as a bubble chart) is a hierarchical graphical model of a
system that shows the different processing activities or functions that the system
performs and the data interchange among these functions. Each function is
considered as a processing station (or process) that consumes some input data and
produces some output data. The system is represented in terms of the input data to
the system, various processing carried out on these data, and the output data
generated by the system.
A DFD model uses a very limited number of primitive symbols, to represent the
functions performed by a system and the data flow among these functions.
Data flow diagram symbol:

[Figure: data flow diagram symbols]

[Figure: Data Flow Diagram. Labels recoverable from the diagram: User/Admin, Email
& Password and Market Analysis (numbered 1.0, 2.0 and 2.0 in the source),
Customer Analysis (2.0.0), Churn Analysis (2.0.1), Level of Satisfaction
(2.0.0.1), Level of Use (2.0.0.2), Average Retention (3.0), Average Churn (3.1).]
4. SCHEDULING AND ESTIMATES :

4.1 Scheduling: Many different estimation techniques are used in project
management across streams such as Engineering and IT. A project usually has six
major constraints (Scope, Time, Cost, Quality, Resources and Risk) that must be
considered in order to estimate it accurately.
• How much work is to be estimated (scope)?
The work will be divided equally among all the team members.
• How will the project be estimated (techniques)?
We will use the following R libraries and models for our project:
library(plyr)
library(corrplot)
library(ggplot2)
library(gridExtra)
library(ggthemes)
library(caret)
library(MASS)
library(randomForest)
library(party)
• How much time will it take to complete the project (schedule)?
We will complete the entire first phase of the project by the end of our
summer vacation and will then work on the additional second phase of
the project.
• Who will be doing the project (resources)?
The project will be done as a team by:
1) Danish Farooq
2) Mustahseen Shafi
3) Gurpreet Singh
4) Aadil Ahmad Malla
5) Sanghati Ghosh

• What budget is required to deliver the project (cost)?
The project does not involve any cost.
• Are there any intermediary dependencies that may delay or impact the
project (risks)?
The main risk is that if a team member gets a placement, the team would
shrink, which could delay the completion of the project.

4.2 Estimates :
There are three major parts to project estimation:
• Effort estimation
The work will be divided equally among all team members.
• Cost estimation
No cash transactions will be involved.
• Resource estimation
The packages and libraries involved are listed above.
