Final Churn Prediction
TEAM MEMBERS:
We hereby declare that the project work entitled Churn Prediction is an authentic
record of our own work carried out as a requirement of the Capstone Project for the
award of the B.Tech degree in Computer Science and Engineering from Lovely
Professional University, Phagwara, under the guidance of Ms. Sandeep Kaur.
All the information furnished in this capstone project report is based on our own
intensive work and is genuine.
Project Group:-
Sanghati Ghosh
11602753
Gurpreet Singh
11612899
Mustahseen Shafi
11617553
Danish Farooq
11608072
Date: 27-04-2019
CERTIFICATE
This is to certify that the declaration statement made by this group of students is
correct to the best of my knowledge and belief. They have completed this Capstone
Project under my guidance and supervision. The present work is the result of their
original investigation, effort and study. No part of the work has ever been
submitted for any other degree at any University. The Capstone Project is fit for
the submission and partial fulfilment of the conditions for the award of B.Tech
degree in Computer Science and Engineering from Lovely Professional University,
Phagwara.
Date: 27-04-2019
Mentor
TABLE OF CONTENTS
I. Declaration
II. Certificate
III. Table of Contents
1. Introduction
1.1 Description of the project
1.2 Objective of the project
1.3 Scope of the project
1.3.1 Use Case Model (If applicable)
1.4 Outcomes
2. System Description
2.1 Assumptions and Dependencies (If applicable)
2.2 Functional Requirements
2.3 Non-Functional Requirements
3. Design
3.1 Analyzing the problem
3.2 Designing
3.3 ER diagram
3.4 Data Flow Diagram
1. Introduction:
(i) customer information volume has increased and
(ii) the available data are inconsistent and incomplete, which makes formal
analysis difficult.
Further, because the customer database is so large, investigating and analyzing it
takes a long time. As information science and technology progress, sophisticated
data mining and artificial intelligence tools are increasingly accessible to the
telecommunication sector. Combined with modern computers, these techniques can
process large volumes of customer records far faster than manual analysis, saving
precious time. In addition, installing and running software often costs less than
hiring and training personnel. Computers are also less prone to errors than human
investigators, especially those who work long hours. Telecom companies currently
need a tool that helps them understand customer patterns, locate churners, and
identify actions that can convert churners to non-churners. This tool is called the
‘Customer Loyalty Assessment Model and Actionable Knowledge Discovery System’, and
its main goal is to provide timely and pertinent customer information to
decision-makers in a company. The present work focuses on developing such a system,
which the telecom industry can use to easily discover customer patterns and trends,
make forecasts, find relationships and possible explanations, and identify possible
churners. The proposed system uses data mining techniques during design and
development.
The objective of this analysis is to predict customer churn. This can be done by
understanding each customer’s usage pattern and whether the customer has churned,
with the aim of reducing the number of churners. To this end, a few algorithms are
used to predict a churn score based on the usage pattern. The fields provided in the
dataset are as follows (Churn is the target to be predicted):
• customerID
• gender
• SeniorCitizen
• Partner
• Dependents
• PhoneService
• MultipleLines
• InternetService
• OnlineSecurity
• OnlineBackup
• DeviceProtection
• TechSupport
• StreamingTV
• StreamingMovies
• Contract
• PaperlessBilling
• MonthlyCharges
• TotalCharges
• Churn
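As a minimal sketch of a first look at such data, assuming it has been loaded into a data frame that uses the field names listed above, the overall churn rate and the churn rate per contract type can be computed in base R. The toy rows and the contract labels below are invented for illustration and are not taken from the project data.

```r
# Hypothetical toy rows using a few of the fields listed above;
# all values are invented purely for illustration.
customers <- data.frame(
  customerID     = c("A1", "A2", "A3", "A4", "A5", "A6"),
  Contract       = c("Month-to-month", "Month-to-month", "One year",
                     "Two year", "Month-to-month", "One year"),
  MonthlyCharges = c(70.5, 89.1, 45.0, 20.3, 99.9, 55.2),
  Churn          = c("Yes", "Yes", "No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Encode the target as 1 (churned) / 0 (stayed).
customers$churned <- as.integer(customers$Churn == "Yes")

# Overall churn rate: share of customers marked as churners.
overall_rate <- mean(customers$churned)

# Churn rate per contract type.
rate_by_contract <- tapply(customers$churned, customers$Contract, mean)

print(overall_rate)
print(rate_by_contract)
```

Note that customerID is only an identifier and would normally be excluded from the inputs of any model.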
The proposed models can be further enhanced if their processing can be parallelized.
This is feasible by identifying operations that are independent of each other and
proposing a parallel architecture to improve performance.
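One way this could look in R, as a sketch under assumptions: independent tasks (here, scoring separate folds of customer rows) are distributed over worker processes with the parallel package that ships with R. The function score_fold, the fold layout and the choice of two workers are hypothetical placeholders, not part of the project.

```r
library(parallel)  # part of the standard R distribution

# Hypothetical independent task: process one fold of customer row indices.
# A real project would fit and evaluate a model here.
score_fold <- function(idx) {
  sum(idx)  # placeholder computation
}

# Split 100 hypothetical row indices into 4 folds that do not depend on each other.
folds <- split(seq_len(100), rep(1:4, length.out = 100))

# Run the folds on two worker processes (the worker count is an assumption).
cl <- makeCluster(2)
fold_scores <- parLapply(cl, folds, score_fold)
stopCluster(cl)

# The same work done sequentially, to confirm that the results agree.
sequential_scores <- lapply(folds, score_fold)
same_result <- isTRUE(all.equal(unname(unlist(fold_scores)),
                                unname(unlist(sequential_scores))))
print(same_result)
```

Because each fold is processed without reference to the others, the parallel and sequential runs should give the same values; only the wall-clock time differs.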
1.4 Outcomes:
2. System Description:
2.3 Non-Functional Requirements:
1. Response Time
2. Availability
3. Stability
4. Maintainability
5. Usability
3. Design:
1. Declining Sentiment:
In our illness analogy, declining sentiment is a lot like a tickle in the back of your
throat. It may seem like a small issue at the moment, but in the long run it can end up
much worse than some of the other symptoms (think strep throat). The Net Promoter
Score (NPS) estimates the amount of positive sentiment about your business by asking
your customers “On a scale from 0 to 10, how likely are you to recommend this brand
to your friends and family?” Customers are then categorized into detractors, passives
or promoters based on their responses. The more promoters (and the fewer detractors)
a brand has, the higher its NPS will be. A high NPS is correlated with satisfied
customers, and the highest scores indicate brand advocates.
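As a hedged worked example: NPS is conventionally computed as the percentage of promoters (scores 9 to 10) minus the percentage of detractors (scores 0 to 6), with scores of 7 to 8 counted as passives. These cut-offs are the standard NPS convention rather than something stated in this report, and the survey responses below are invented for illustration.

```r
# Net Promoter Score: % promoters minus % detractors, on a -100..100 scale.
# Cut-offs follow the usual NPS convention (an assumption, see the text above).
nps_score <- function(scores) {
  promoters  <- sum(scores >= 9)   # scores 9-10
  detractors <- sum(scores <= 6)   # scores 0-6
  100 * (promoters - detractors) / length(scores)
}

# Hypothetical survey responses from ten customers.
responses <- c(10, 9, 9, 8, 7, 6, 3, 10, 5, 9)

# 5 promoters and 3 detractors out of 10 responses.
nps <- nps_score(responses)
print(nps)
```

Tracking this value over successive surveys is what reveals a decline in sentiment; a single reading says little on its own.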
2. Declining Average Order Value:
Average order value (AOV) is one of the most frequently calculated metrics in modern
business and one of the easiest ways to keep a finger on the pulse of your business.
It makes sense to ask how much money your customers are spending with you, and it is
perfectly rational to guide your business strategy based on the answer. Like coughing,
AOV is easy to notice because it is a “loud” metric. In the same way that it is hard
to ignore a bad cough, it is hard to miss a drop in order value. Changes in AOV are
very frequently felt in both a brand’s revenue and profit lines, and as a result they
often prompt immediate action. A high AOV means customers are spending more per
purchase on your brand and represents a larger financial commitment to your products.
This financial commitment often indicates an emotional investment or satisfaction
that drives larger purchases.
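To make the metric concrete, here is a small sketch assuming an order log with one row per order: AOV for a period is total revenue divided by the number of orders, which is the mean order amount. The data frame, its column names and its values are hypothetical.

```r
# Hypothetical order log: one row per order (values invented for illustration).
orders <- data.frame(
  month  = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"),
  amount = c(120, 80, 100, 60, 90, 70, 60)
)

# AOV per month = total revenue / number of orders = mean order amount.
aov <- tapply(orders$amount, orders$month, mean)

# Relative change from January to February (negative means a decline).
change <- (aov[["Feb"]] - aov[["Jan"]]) / aov[["Jan"]]

print(aov)
print(change)
```

In this toy data February has more orders than January but a lower AOV, which is the kind of drop the paragraph above describes as a churn symptom.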
3.2 Designing:
1. Gather data about your customers (the more the better). It should contain
information about purchasing history, the length and number of interactions of
various types, problems that might have affected your customers, etc.
2. Mark the customers that have churned so that the algorithm knows which value
(label) it should predict.
4. Have your engine run through the data to build decision trees.
6. In the case of churn, only a small fraction of customers may be marked as
churners, which causes the algorithm to produce low-quality predictions. In that
case, try oversampling: it generates more records similar to the ones representing
churners and gives you a more balanced dataset.
• Make sure to let your customers know that their data will be used for
profiling (there is a lot of fuss about this with regard to GDPR).
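The rebalancing described in step 6 can be sketched, under assumptions, as random oversampling: existing churner rows are sampled with replacement until both classes are the same size. This is a simpler stand-in for methods that synthesize new, similar records (such as SMOTE), and the data frame and its values are hypothetical.

```r
set.seed(42)  # make the random sampling reproducible

# Hypothetical labelled data: 18 non-churners and only 2 churners.
customers <- data.frame(
  MonthlyCharges = c(seq(20, 105, by = 5), 95, 99),
  Churn          = c(rep("No", 18), rep("Yes", 2)),
  stringsAsFactors = FALSE
)

churners     <- customers[customers$Churn == "Yes", ]
non_churners <- customers[customers$Churn == "No", ]

# Sample churner rows with replacement until the two classes are balanced.
extra_needed <- nrow(non_churners) - nrow(churners)
extra_rows   <- churners[sample(nrow(churners), extra_needed, replace = TRUE), ]

balanced <- rbind(customers, extra_rows)
class_counts <- table(balanced$Churn)
print(class_counts)
```

If the data are split into training and test sets, oversampling should be applied to the training part only, so that duplicated churners do not leak into the evaluation.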
3.3 E-R Diagrams:
The ER (Entity-Relationship) model is a high-level conceptual data model
diagram. It is based on the notion of real-world entities and the relationships
between them.
ER modeling helps you to analyze data requirements systematically to produce a
well-designed database. So, it is considered a best practice to complete ER
modeling before implementing your database.
[Figure: E-R diagram. Recoverable labels: Churn Protection, Churn Reasons, Customers,
Conscious Churn, Unconscious, Loss % by Churn, Average Churn Rate, Market,
Profit % by Churn Protection, Average Retention Rate, Buy, Level of Use of Service,
Id, Customers, Name, Satisfaction Level, Flag for Offers, Status]
3.4 Data flow diagrams:
The DFD (also known as a bubble chart) is a hierarchical graphical model of a
system that shows the different processing activities or functions that the system
performs and the data interchange among these functions. Each function is
considered as a processing station (or process) that consumes some input data and
produces some output data. The system is represented in terms of the input data to
the system, various processing carried out on these data, and the output data
generated by the system.
A DFD model uses a very limited number of primitive symbols to represent the
functions performed by a system and the data flow among these functions.
Data flow diagram symbol:
[Figure: Data Flow Diagram. Recoverable labels: User/Admin (Email & Password) 1.0;
Customer Analysis 2.0.0; Churn Analysis 2.0.1; Level of Satisfaction 2.0.0.1;
Level of Use 2.0.0.2; Average Retention 3.0; Average Churn 3.1]
4. SCHEDULING AND ESTIMATES:
4.1 Scheduling: Many different estimation techniques are used in project management
across streams such as Engineering and IT. A project typically has six major
constraints (Scope, Time, Cost, Quality, Resources and Risk), each of which must be
considered to estimate the project accurately.
• How much work is to be estimated (scope)?
The work will be divided equally among all the team members.
• How to estimate the project (techniques)?
We will use the following R libraries and models for our project:
library(plyr)
library(corrplot)
library(ggplot2)
library(gridExtra)
library(ggthemes)
library(caret)
library(MASS)
library(randomForest)
library(party)
• How much time will it require to complete the project (schedule)?
We will complete the entire first phase of the project by the end of our
summer vacation, and then we will work on the additional second
phase of the project.
• Who will be doing the project (resources)?
The project will be done as a team by
1) Danish Farooq
2) Mustahseen Shafi
3) Gurpreet Singh
4) Aadil Ahmad Malla
5) Sanghati Ghosh
• What is the budget required to deliver the project (cost)?
The project does not incur any monetary cost.
• Are there any intermediary dependencies that may delay or impact the
project (risks)?
The main risk is that if a team member gets a placement, the team would shrink,
which could delay the completion of the project.
4.2 Estimates:
Project estimation has three major parts:
• Effort estimation
The work will be divided equally among all the team members.
• Cost estimation
No cash transactions will be involved.
• Resource estimation
The packages and libraries involved are already mentioned above.