
Sample Report

Uploaded by

Sandeep Chitte
Copyright
© © All Rights Reserved

A Dynamic Churn Prediction Model Using Soft
Computing and Random Forest Based Supervised
Learning

Seminar II report submitted in partial fulfillment of the
requirements for the degree of

Bachelor of Technology
In
Computer Engineering
Submitted by
Gajanan Patil

DEPARTMENT OF COMPUTER ENGINEERING


S.S.V.P.S.’s B.S. DEORE COLLEGE OF ENGINEERING, DHULE
2023-2024

A Dynamic Churn Prediction Model Using Soft
Computing and Random Forest Based Supervised
Learning

Seminar II report submitted in partial fulfillment of the
requirements for the degree of

Bachelor of Technology
In
Computer Engineering
Submitted by
Gajanan Patil
Guided by
Prof. R. V. Patil

DEPARTMENT OF COMPUTER ENGINEERING


S.S.V.P.S.’s B.S. DEORE COLLEGE OF ENGINEERING, DHULE
2023-2024

S.S.V.P.S.’s B.S. DEORE COLLEGE OF ENGINEERING, DHULE

DEPARTMENT OF COMPUTER ENGINEERING

CERTIFICATE

This is to certify that the Seminar II entitled “A Dynamic Churn Prediction Model
using Soft Computing and Random Forest based Supervised Learning” has been
carried out by
Gajanan Patil

under my guidance in partial fulfillment of the degree of Bachelor of Technology in
Computer Engineering of Dr. Babasaheb Ambedkar Technological University, Lonere
(M.S.) during the academic year 2023-24. To the best of my knowledge and belief,
this work has not been submitted elsewhere for the award of any other degree.

Date:
Place: Dhule

Guide: Prof. R. V. Patil
Head: Prof. Dr. B. R. Mandre
Principal: Prof. Dr. Hitendra D. Patil

ACKNOWLEDGEMENT
This Seminar II report has taken its current shape after a lot of hard work and
perseverance, and not by me alone. I would like to express my sincere gratitude for
the assistance and support of the many people who helped make it a success.
I extend my deepest appreciation and gratitude to Prof. R. V. Patil, my guide, for
his guidance and enlightening comments throughout the seminar work. It has been an
altogether different experience to work with him, and I thank him for his helpful
suggestions and numerous discussions. I gladly take this opportunity to thank
Prof. B. R. Mandre (Head of Department, Computer Engineering) and Dr. Hitendra D.
Patil (Principal, SSVPS BSD College of Engineering, Dhule) for providing facilities
during the progress of this work.
I wish to express my sincere thanks to Prof. R. V. Patil for his expert, sincere
and valuable guidance and encouragement.
I am also thankful to all those who helped me, directly or indirectly, to develop
this report and complete it successfully. I would also like to thank all the staff
for their encouragement; they were always prompt in extending a helping hand and
sharing valuable technical knowledge. Special thanks to my family and friends.

Gajanan Patil

ABBREVIATIONS

Abbreviation   Details

CRM            Customer Relationship Management
ML             Machine Learning
DL             Deep Learning
NN             Neural Network
ANN            Artificial Neural Network
NLP            Natural Language Processing
RNN            Recurrent Neural Network
DFD            Data Flow Diagram

Table of Contents
List of Tables
List of Figures
1 INTRODUCTION
1.1 Introduction
1.2 Background
1.3 Churn in Telecom Industry
1.4 Machine Learning Approach
2 LITERATURE REVIEW
2.1 Literature Survey
3 SYSTEM ARCHITECTURE
4 ADVANTAGES
5 DISADVANTAGES
6 APPLICATION
7 CONCLUSION
BIBLIOGRAPHY

FIGURE INDEX
Figure 1.1: Machine learning approach
Figure 1.2: Supervised Learning approach
Figure 1.3: Semi-Supervised Learning approach
Figure 1.4: Un-Supervised Learning approach
Figure 1.5: Reinforcement Learning approach
Figure 3.1: Block diagram of Architecture

TABLE INDEX
Table 1: Summary of Literature survey
Table 2: Testing Parameter for Algorithm
Table 3: Confusion Matrix Analysis
Table 4: Comparative analysis of various classification algorithms

ABSTRACT
Customer churn is a major problem and one of the most important concerns for large
companies. Because churn directly affects revenue, especially in the telecom field,
companies are seeking ways to predict which customers are likely to churn. Finding
the factors that increase customer churn is therefore important for taking the
actions needed to reduce it. The main contribution of our work is a churn prediction
model that helps telecom operators identify the customers most likely to churn. The
model uses machine learning techniques on a big data platform and introduces a new
approach to feature engineering and selection. This work also identifies the churn
factors that are essential for determining the root causes of churn. By knowing the
significant churn factors in customer data, CRM can improve productivity, recommend
relevant promotions to groups of likely churners with similar behavior patterns, and
substantially improve the company's marketing campaigns. The proposed churn
prediction model is evaluated using metrics such as accuracy, precision, recall,
F-measure, and the area under the receiver operating characteristic (ROC) curve.
Furthermore, it explains the factors behind churn through rules generated by the
attribute-selected classifier algorithm.
Key terms: Receiver Operating Characteristic, Deep learning, Convolutional Neural
Network, churn prediction, Feature selection.

Chapter - 1
INTRODUCTION
This chapter introduces the churn prediction model and the different approaches to
building one. It explains churn in the telecom industry and its causes, introduces
machine learning techniques such as supervised, unsupervised, and reinforcement
learning, and analyzes which approach is best suited to churn prediction.

1.1 Introduction
Consumers today go through a complex decision-making process before subscribing to
any one of the numerous telecom service options. The services provided by telecom
vendors are not highly differentiated, and number portability is commonplace. The
mobile telephone industry faces the same churn problem [2] [9] [12], so customer
loyalty becomes an issue. Hence, it is becoming increasingly important for
telecommunications companies to proactively identify the factors that lead customers
to unsubscribe and to take preventive measures to retain them. To calculate the
probable monthly churn rate, start with the number of users who churn in that month.
Divide it by the total number of user days in that month to get the number of churns
per user day, then multiply by the number of days in the month to get the monthly
churn rate. Research conducted over the past few years has found data mining
techniques to be effective in predicting consumer churn [17]. Creating an efficient
churn prediction model requires a lot of work, from determining appropriate predictor
variables (features) from the large volume of available customer data to choosing a
predictive data mining technique suited to the feature set.
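
The monthly churn-rate calculation described in this section can be written out as a
short sketch. The function name and the sample figures are assumptions made for
illustration; they are not taken from any data set used in this report.

```python
def monthly_churn_rate(churned_users: int, user_days: int, days_in_month: int) -> float:
    """Churns per user day, scaled to the length of the month."""
    if user_days <= 0:
        raise ValueError("user_days must be positive")
    churns_per_user_day = churned_users / user_days
    return churns_per_user_day * days_in_month


# Hypothetical month: 50 churners, 30,000 user days, 30 calendar days
rate = monthly_churn_rate(churned_users=50, user_days=30_000, days_in_month=30)
print(f"Monthly churn rate: {rate:.1%}")
```

With these assumed figures, 50 churns over 30,000 user days in a 30-day month
correspond to a monthly churn rate of about 5%.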
A multi-layer perceptron approach to customer churn prediction is used in [14] on
customer-related data such as customer profiles, calling patterns, and demographic
data, in addition to the network data that customers generate. Based on a customer's
history of calling behavior, it is possible to classify whether the customer is
likely to leave. Research over the past decade has found data mining techniques to be
effective in predicting churn, and predictive modeling techniques are also considered
to be more accurate. Churn prediction systems and sentiment analysis use
classification as well as clustering techniques to identify churn customers and the
reasons behind the churning of telecom customers [18].
The telecom industry generates a large amount of data on a daily basis. Mining such
vast data with specific data mining techniques is a tedious task, and predictions
from classical techniques are hard to interpret. Such telecommunication data may
contain signs of churn, and it is necessary to identify such problems. Big companies
implement churn prediction models to detect possible churners before they actually
leave the company [16].

1.2 Background
Research conducted over the past few years has found data mining techniques to be
effective in predicting consumer churn [3]. Creating an efficient churn prediction
model requires a lot of work, from determining appropriate predictor variables
(features) from the large volume of available customer data to choosing a predictive
data mining technique suited to the feature set. Telecom companies collect a large
amount of customer-related data such as customer profiles, calling patterns, and
demographic data, in addition to the network data they generate [4]. Based on a
customer's history of calling behavior, it is possible to classify whether the
customer is likely to leave.

Research over the past decade has found data mining techniques to be effective in
predicting churn, and predictive modeling techniques are also considered to be more
accurate. Churn prediction systems and sentiment analysis use classification as well
as clustering techniques to identify churn customers and the reasons behind the
churning of telecom customers. The telecom industry [7] generates a large amount of
data on a daily basis; mining such vast data with specific data mining techniques is
a tedious task, and predictions from classical techniques are hard to interpret.
Various researchers have already described work on detecting churn in large data sets
by combining static and dynamic approaches, but such systems still face many problems
in actually identifying churn. Such telecommunication data may contain signs of
churn, and it is necessary to identify such problems. Successful identification of
churn from large data makes customer relationship management (CRM) more effective,
using various soft computing techniques, e.g. genetic algorithms, AdaBoost, etc. [9].

In this research we propose churn identification as well as prediction from a
large-scale telecommunication data set, using Natural Language Processing (NLP) and
machine learning techniques to deal with big data [10]. The system first applies a
strategic NLP process consisting of data pre-processing, data normalization, feature
extraction, and feature selection, in that order. The proposed feature extraction
techniques include TF-IDF, Stanford NLP, and occurrence correlation. Machine learning
classification algorithms are then used to train and test the entire module.
Neuro-fuzzy algorithms are used to divide the subscribers into discrete classes based
on their input attributes [11].

Using these classes, the Adaptive Neuro Fuzzy Inference System (ANFIS) is used to
develop a sensitive prediction model for churn management [1].
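
The TF-IDF feature extraction step mentioned in this section can be illustrated with
a minimal sketch. It is not the implementation used in this work: the whitespace
tokenizer, the plain tf × log(N / df) weighting, and the sample customer-care notes
are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical customer-care notes, used only for illustration
notes = [
    "billing complaint slow internet",
    "billing query resolved",
    "slow internet slow response",
]
scores = tf_idf(notes)
```

In the first note, "complaint" occurs in only one document and therefore receives a
higher weight than "billing", which occurs in two. Terms that are rare across the
collection are the ones TF-IDF treats as most informative.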

1.3 Churn in Telecom Industry


Telecommunications companies [7] are usually not the most popular companies with
consumers, yet customer loyalty is the key to profitability in the telecom industry.
People often express frustration with the performance of service providers, whether
because of complicated billing, spam marketing emails, difficult customer service,
internet speed, connectivity, or expensive plans. As a result, it is not surprising
that telecommunications companies have a high customer churn rate. Because telecom
providers manage large fixed infrastructures whose cost must be offset by revenue,
customer churn (attrition) is particularly problematic in this industry [12].
Companies usually have a greater focus on customer acquisition and keep retention
as a secondary priority. However, it can cost five times more to attract a new customer than
it does to retain an existing one. Increasing customer retention rates by 5% can increase
profits by 25% to 95%, according to research done by Bain & Company [2].
Churn, also known as customer attrition, is a metric that counts the customers who
stop doing business with a company or stop using a particular service. By following
this metric, most businesses could only try to understand the reasons behind the
churn numbers and tackle those factors with reactive action plans [10].
The reasons that lead customers to cancel can be numerous: poor service quality,
delays in customer support, prices, new competitors entering the market, and so on.
Usually there is no single reason, but a combination of events that culminates in
customer displeasure. If a company is not able to identify these signals and act
before the customer clicks the cancel button, there is no turning back: the customer
is already gone. The company still has something valuable, however: the data. The
customer has left very good clues about where the service fell short. This data can
be a valuable source of meaningful insights and can be used to train customer churn
models. Learning from the past and having strategic information at hand to improve
future experiences is what machine learning is all about.
The telecommunications segment offers great room for opportunity. The wealth and
volume of customer data that carriers collect can contribute a lot to shifting from a
reactive to a proactive position. The emergence of sophisticated artificial
intelligence and data analytics techniques further helps leverage this rich data to
address churn much more effectively.

What is Churn Rate? What is the cause of customer churn in the Telecom Industry?

The churn rate, also known as the rate of attrition or customer churn, is the rate at which
customers stop doing business with an entity [4]. It is most commonly expressed as the
percentage of service subscribers who discontinue their subscriptions within a given time
period. The churn rate in developing markets ranges from 20% to 70%. In some of these
markets, more than 90% of all mobile subscribers are on prepaid service. Some operators in
developing markets lose in aggregate their entire subscriber base to churn in a year [5].

How can Business Intelligence reduce churn in the telecom industry?

Today, Business Intelligence (BI) helps businesses and organizations ask and answer
questions about their data. It helps companies make better decisions by showing
present and historical data within their business context. Self-service BI tools help
companies understand performance from various angles so that they can take action to
drive better business outcomes on big data [10]. These tools can mine massive
datasets for performance insights relevant to customer churn and then bring them to
the attention of marketers, customer service managers, and executives, who can factor
these findings into subsequent decisions.

Rich customer data availability in Telecom Sector

Telecom providers, both Communication Service Providers (CSPs) and content providers,
have a unique opportunity to access rich customer data that isn't available to many
other industries. This is due to the nature of their products and services and the
visibility they have into the end-to-end supply chain of communication services. They
can see content and service usage through web services and centralized systems [9].
By accessing data from cell towers and deployed infrastructure, companies can add a
location dimension to the data. By reaching into individual consumer devices, these
companies gain visibility into the last mile of the supply chain and can access data
about the types of users and viewers of their services, as well as telemetry on
end-user service performance [12].

Usage of Predictive Analytics to derive actionable insights


Real-time data can be gathered from multiple sources such as call logs, call records,
network performance measurements, and live network data. We can correlate this data
with customer preferences, usage history, complaints or calls received by the call
center, and customer segments from the back-end systems. Using predictive analytics,
we can predict the possible impact of network events on customers.
This actionable insight helps telcos avoid potential risks to the customer
experience. Applying statistical techniques such as Logistic Regression, Market
Basket Analysis, and Exploratory Data Analysis can help identify the best possible
option [10].
Analyzing customer data can provide a comprehensive 360-degree view of the available
information, which helps in offering personalized services that retain customers for
longer. The data about a single customer is interesting but may not be very
actionable [4]. By analyzing the data of all customers, telecom companies can
identify trends and patterns and conduct correlation analysis to understand which
factors drive service usage behavior and influence customer satisfaction.
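
As an illustration of how a predictive statistical model such as logistic regression
turns customer attributes into a churn probability, a minimal sketch follows. It is
not the model built in this work: the two features, the eight sample customers, and
the learning-rate and epoch settings are assumptions chosen only to keep the example
small.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def churn_probability(weights, x):
    """weights[0] is the bias; the remaining entries pair with the features in x."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit the weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = churn_probability(weights, x) - y
            grad[0] += error
            for j, v in enumerate(x):
                grad[j + 1] += error * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical features: [customer service calls, has voice mail plan (0 or 1)]
X = [[0, 1], [1, 1], [1, 0], [2, 0], [4, 0], [5, 0], [5, 1], [6, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w = fit_logistic(X, y)
print(churn_probability(w, [6, 0]), churn_probability(w, [0, 1]))
```

On this toy data the fitted model assigns a high churn probability to a customer with
six service calls and a low one to a customer with none. The per-customer
probabilities, rather than a hard yes/no label, are what make the output actionable
for ranking retention efforts.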

1.4 Machine Learning Approach


In the more general framework of text classification or prediction, some output is
assigned to a given input text for categorization and aggregation. Related tasks
include sequence labeling, which assigns a category to each element of a sequence
(for example PoS tagging, which assigns a part of speech to each word in an input
text), and parsing, which assigns a parse tree to a sentence to describe its
syntactic structure [17]. Other examples include regression, which assigns a
real-valued output to each input. Probabilistic classification is a common subset of
classification: to find the best class for a given case, algorithms of this type use
statistical inference. Unlike other algorithms, which simply output a "best" class,
probabilistic algorithms output the probability that the example belongs to each of
the possible classes. The best class is then usually chosen as the one with the
highest probability. Such an approach has substantial advantages over
non-probabilistic classification models.

Figure 1.1: Machine learning approach
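
The difference between a classifier that outputs only a "best" class and a
probabilistic one can be shown in a few lines. This is a minimal sketch: the class
names and scores are hypothetical, and the scores are simply normalized to sum to 1.

```python
def normalize(scores):
    """Turn non-negative class scores into probabilities that sum to 1."""
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}


def best_class(probabilities):
    """Pick the class with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical class scores for one customer
probs = normalize({"churn": 3.0, "stay": 1.0})
label = best_class(probs)
print(probs, label)
```

Keeping the probabilities (here 0.75 for "churn") rather than only the label allows
a later step to apply its own threshold or to rank customers by risk.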

Machine learning takes the following four forms:

Supervised learning
Supervised learning is the machine learning methodology that operates on labeled data
and learns a mapping from inputs to outputs using training and test instances.
Because the training data is labeled and properly categorized, it is a controlled
process conducted under supervision. The supervised technique (also known as the
probabilistic activation method) uses co-occurrence association rule mining to find
categories, similar to the first method [17].

Figure 1.2: Supervised Learning approach
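
A minimal sketch of supervised learning is given below: a rule is learned from
labeled examples and then applied to unseen inputs. The learner is a decision stump
(a single threshold on one feature), the simplest building block of the decision
trees that a random forest combines. The data and function names are hypothetical.

```python
def fit_stump(values, labels):
    """Pick the threshold on one feature that misclassifies the fewest training rows."""
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(values)):
        errors = sum((v >= threshold) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, value):
    """Predict churn (1) when the feature reaches the learned threshold."""
    return int(value >= threshold)


# Hypothetical labeled data: customer service calls -> churned (1) or stayed (0)
train_calls = [0, 1, 1, 2, 3, 4, 5, 6]
train_churn = [0, 0, 0, 0, 0, 1, 1, 1]
threshold = fit_stump(train_calls, train_churn)
print(threshold, predict(threshold, 5), predict(threshold, 2))
```

On this toy data the learned threshold is four calls, so a new customer with five
calls is predicted to churn and one with two calls is not.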

Semi-Supervised Learning

Semi-supervised learning is a machine learning task in which a small quantity of
labeled data is used together with a larger amount of unlabeled data. Combining
different classifiers is one variation. The objective of semi-supervised learning is
to label the unlabeled data using a model trained on the labeled data set [17].

Figure 1.3: Semi-Supervised Learning approach

Unsupervised Learning

Unsupervised learning is a frequently used machine learning strategy in which
correlations are discovered and grouping (clustering) techniques are used for
unsupervised classification. It operates on unlabeled data: the machine is given only
the input vectors, with no target variable, and must find the clusters on its own.
The suggested unsupervised technique (dubbed the spreading activation method) learns
relevant rules between notional words (the words left in a sentence after deleting
stop words and low-frequency words) and the considered categories, using
co-occurrence association rule mining in a similar fashion [17].
In unsupervised classification, the system operates directly on the given data or
repository, which is neither marked nor labeled, and the process is not supervised.
Since the output variable is unknown, unsupervised learning can handle more complex
tasks than supervised methods.

Figure 1.4: Un-Supervised Learning approach
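
Clustering is the most common form of unsupervised learning. A minimal k-means sketch
is shown below; seeding the centroids with the first k points and the six sample
subscribers are assumptions made to keep the example deterministic and small.

```python
def k_means(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        assignment = [
            min(range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignment


# Hypothetical subscribers: (monthly minutes in hundreds, support calls)
subscribers = [(1.0, 5.0), (6.0, 1.0), (1.2, 6.0), (5.5, 0.0), (0.8, 4.0), (6.2, 1.0)]
centroids, groups = k_means(subscribers, k=2)
print(groups)
```

No labels are supplied, yet the subscribers separate into a low-usage,
high-complaint group and a high-usage, low-complaint group.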

Reinforcement Learning

Reinforcement learning operates on the basis of rewards and penalties: an agent
learns how to benefit from its own actions. In a given context, an action may earn
the agent a reward for the desired behavior or incur a penalty for an error.
The agent learns how to adjust its behavior in a given context. In each case, the
agent must properly analyze the situation and avoid penalties by taking the right
actions.
Figure 1.5: Reinforcement learning approach
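
The reward-and-penalty idea can be sketched with a simple action-value update. This
is a deliberately reduced, bandit-style example and not a full reinforcement learning
algorithm; the actions, rewards, and learning rate are hypothetical.

```python
def update_value(old_value, reward, learning_rate=0.5):
    """Move the estimated value of an action toward the observed reward."""
    return old_value + learning_rate * (reward - old_value)


# Hypothetical retention actions and the feedback observed after each attempt
# (+1 = customer stayed, -1 = customer left); values are illustrative only.
feedback = [("discount", 1), ("no_action", -1), ("discount", 1), ("no_action", -1)]
values = {"discount": 0.0, "no_action": 0.0}
for action, reward in feedback:
    values[action] = update_value(values[action], reward)

best_action = max(values, key=values.get)
print(values, best_action)
```

After the four observations the rewarded action has the higher estimated value, so
the agent prefers it the next time it faces the same context.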

The well-being diagnostic system predicts disease using a neural classification
approach based on the suggested fuzzy theory. It has a sub-component called the
severity section, which is responsible for determining the degree of severity [8]
[11] [18]. The user information is eventually categorized as normal or affected by
the infection. Smart fuzzy rules are used in the expert system to decide on
rehabilitative documents. The experimental findings indicate that the approach
outperforms existing traditional classification mechanisms.

Chapter - 2
LITERATURE SURVEY
This chapter describes various churn prediction techniques and surveys the literature
on churn prediction. A literature review helps to summarize and synthesize the
arguments and ideas of existing knowledge in a particular field without adding any
new contributions.

2.1 Literature Review


Many methods such as machine learning and data mining are used for churn prediction. The
decision-tree algorithm is a reliable method for churn prediction [6]. In addition, a neural
network method [7], data certainty [8], and particle swarm optimization [9] are used for
churn prediction.
The work in [2] presents a new set of features intended to improve the detection of
possible churners. The features are extracted from call details and customer accounts
and are classified as contract-related, call pattern description, and call pattern
change description features. The features are evaluated using two probabilistic data
mining algorithms, Naïve Bayes and Bayesian Network, and the results are compared
with those obtained using the C4.5 decision tree, an algorithm widely used in many
classification and prediction tasks.
The work in [3] first formalizes the time-window selection strategy and reviews the
related literature. Second, it analyzes the gain in churn model accuracy obtained by
extending the length of customer event history from one to seventeen years, using
logistic regression, classification trees, and bagging combined with classification
trees. The practical implication is that analysts can substantially reduce
data-related burdens such as data collection, preparation, and analysis.
According to [4], the most efficient customer engagement strategies should be used to
raise the level of client satisfaction. The study proposes a Multilayer Perceptron
(MLP) neural network to predict customer churn in one of Malaysia's leading
telecommunications firms. The results were compared with those of the most common
churn prediction techniques, Multiple Regression Analysis and Logistic Regression
Analysis. The best-performing network architecture has 14 input nodes, 1 hidden node,
and 1 output node, trained with the Levenberg-Marquardt (LM) learning algorithm.
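
The 14-1-1 architecture described above is small enough to write out as a forward
pass. This is a minimal sketch only: the tanh and sigmoid activations and all weight
values are assumptions, since the text does not state them, and the
Levenberg-Marquardt training step is not shown.

```python
import math


def mlp_forward(x, hidden_weights, hidden_bias, output_weight, output_bias):
    """Forward pass of a 14-1-1 perceptron: tanh hidden unit, sigmoid output unit."""
    if len(x) != len(hidden_weights):
        raise ValueError("input and weight vectors must have the same length")
    hidden = math.tanh(sum(w * v for w, v in zip(hidden_weights, x)) + hidden_bias)
    return 1.0 / (1.0 + math.exp(-(output_weight * hidden + output_bias)))


# Placeholder weights; a trained model would learn them from customer data
n_inputs = 14
hidden_weights = [0.1] * n_inputs
customer = [1.0] * n_inputs  # hypothetical, already normalized features
p_churn = mlp_forward(customer, hidden_weights, hidden_bias=0.0,
                      output_weight=2.0, output_bias=-1.0)
print(round(p_churn, 3))
```

With a single hidden node, the network is only slightly more expressive than logistic
regression, which makes it a natural candidate for comparison with the regression
baselines.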
The work in [5] builds an efficient and concise statistical churn model using a
Partial Least Squares (PLS) approach on data sets with highly correlated variables. A
preliminary analysis shows that the proposed model gives more reliable results than
conventional prediction models and identifies the key variables that help explain
churning behavior. In addition, simple marketing strategies based on network
management, overage management, and complaint handling are presented and discussed.
Burez and Van den Poel [6] study unbalanced data sets in churn prediction models and
compare the performance of random sampling, advanced under-sampling, the gradient
boosting method, and weighted random forests. The models were evaluated using the AUC
and lift metrics. The study shows that under-sampling is preferable to the other
techniques evaluated.
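
Random under-sampling and the AUC metric mentioned above can both be sketched in a
few lines. This is an illustration under stated assumptions, not the procedure of the
cited study: the seed, the eight sample customers, and the scores are hypothetical.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def auc(scores, labels):
    """Probability that a random churner is scored above a random non-churner."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 2 churners among 8 customers
labels = [0, 0, 0, 1, 0, 0, 1, 0]
rows = list(range(len(labels)))
balanced_rows, balanced_labels = undersample(rows, labels)
example_auc = auc([0.1, 0.4, 0.35, 0.8, 0.2, 0.3, 0.38, 0.05], labels)
print(balanced_labels, example_auc)
```

Under-sampling keeps both churners and only two of the six non-churners, so the
training set becomes balanced. AUC is computed on the original, unbalanced labels
because it measures ranking quality and does not depend on the class ratio.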
Brandusoiu [7] describes an advanced data mining method for detecting customer churn
in a large data set. About 3,500 customer records are analyzed on the basis of
incoming and outgoing calls and texts. Several machine learning algorithms were used
for training and testing the classifiers. The system's estimated average accuracy is
about 90 percent over the entire data set.
He et al. [8] developed a predictive model based on a neural network to address
customer churn at a major Chinese telecommunications company with approximately 5.23
million subscribers. The model achieved an average prediction accuracy of 91.1%.
Idris [9] proposed an approach based on genetic programming with AdaBoost to model
the churn problem in telecommunications. The model was tested on two standard data
sets, one from Orange Telecom and the other from cell2cell. The accuracy was 89% on
the cell2cell data set and 63% on the other.
Huang et al. [10] studied customer churn on a big data platform. The researchers
aimed to show that big data significantly improves churn prediction, depending on the
volume, variety, and velocity of the data. A big data platform was needed for feature
engineering on data from the Operation Support and Business Support departments of
China's biggest telecommunications firm. The Random Forest algorithm was used and
evaluated using AUC.
In [11], the input features are clustered with the k-means and fuzzy c-means
clustering algorithms to place subscribers in separate discrete classes. Using these
classes, the Adaptive Neuro Fuzzy Inference System (ANFIS) is applied to build a
predictive model for active churn management. The first prediction step consists of
parallel neuro-fuzzy classification. A fuzzy inference system (FIS) then takes the
outputs of the neuro-fuzzy classifiers as input to decide on churners' activities.
Performance measurements can be used to recognize inefficiency problems. Churn
management metrics are associated with customer service, network services,
operations, and efficiency. GSM number portability is a vital criterion for
identifying churners.
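
Unlike k-means, fuzzy c-means gives each subscriber a degree of membership in every
cluster. The standard membership formula is sketched below for a single
one-dimensional feature. The fuzzifier m = 2, the feature, and the cluster centers
are assumptions for illustration and are not taken from [11].

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1)."""
    distances = [abs(point - c) for c in centers]
    if 0.0 in distances:
        # The point sits exactly on a center: full membership there
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical 1-D feature (monthly support calls) and two cluster centers
memberships = fuzzy_memberships(2.0, centers=[1.0, 5.0])
print(memberships)
```

A subscriber with two support calls is three times closer to the first center than
to the second and therefore receives a membership of about 0.9 in the first cluster
and 0.1 in the second. Graded memberships of this kind are what a fuzzy inference
system can take as input.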
The work in [12] presents a new set of features to improve the detection of potential
churners. The features are derived from call details and customer profiles and are
categorized as contract-related, call pattern description, and call pattern change
description features. The features are evaluated using two probabilistic data mining
algorithms, Naïve Bayes and Bayesian Network, and the results are compared with those
obtained using the C4.5 decision tree, an algorithm commonly used in many
classification and prediction tasks. Such factors, among others, have contributed to
the risk that customers can easily switch to competitors. One technique that can be
used to counter this is to improve churn prediction from the large amounts of data
being extracted.
The work in [13] first formalizes the time-window selection strategy and reviews the
literature. Second, it analyzes the gain in churn model accuracy obtained by
extending the history of customer events from one to seventeen years, using logistic
regression, classification trees, and bagging combined with classification trees. The
practical implication is that analysts can significantly reduce data-related burdens
such as data storage, preparation, and analysis. The amount that customers have to
pay depends on the length of the subscription and the promotional context. The
newspaper company sends customers a letter to remind them that their subscription is
coming to an end and to ask whether they want to renew it, along with instructions on
how to do so. Customers cannot cancel their subscription, and they have a grace
period of four weeks once their subscription has lapsed.
According to [14], the most effective customer retention techniques should be used to
reduce customer churn rates. The research proposes a Multilayer Perceptron (MLP)
neural network to predict customer churn in one of Malaysia's leading
telecommunications firms. The findings were compared with the most common churn
prediction techniques, Multiple Regression Analysis and Logistic Regression Analysis.
The optimal configuration of the neural network contains 14 input nodes, 1 hidden
node, and 1 output node, trained with the Levenberg-Marquardt (LM) learning
algorithm.

Chapter - 3
SYSTEM ARCHITECTURE

This chapter describes the system in detail. It gives detailed information about the
proposed system, including the benefits and architecture of the model. It also
explains the existing churn prediction systems and their limitations and problems.

3.1 System Analysis

Research conducted over the past few years has found data mining techniques [8] [12]
to be effective in predicting consumer churn. Creating an efficient churn prediction
model requires a lot of work, from determining appropriate predictor variables
(features) from the large volume of available customer data to choosing a predictive
data mining technique suited to the feature set. Telecom companies collect a large
amount of customer-related data such as customer profiles, calling patterns, and
demographic data, in addition to the network data they generate. Based on a
customer's history of calling behavior, it is possible to classify whether the
customer is likely to leave.
Research over the past decade has found data mining techniques to be effective in
predicting churn [15], and predictive modeling techniques are also considered to be
more accurate. Churn prediction systems and sentiment analysis use classification as
well as clustering techniques to identify churn customers and the reasons behind the
churning of telecom customers [11]. The telecom industry generates a large amount of
data on a daily basis; mining such vast data with specific data mining techniques is
a tedious task, and predictions from classical techniques are hard to interpret.
Various researchers have already described work on detecting churn in large data sets
by combining static and dynamic approaches, but such systems still face many problems
in actually identifying churn. Such telecommunication data may contain signs of
churn, and it is necessary to identify such problems. Successful identification of
churn from large data makes customer relationship management (CRM) more effective
[10].
In today's computing environment, customers who write comments tend to churn more
frequently, while voice mail plan customers tend to churn less frequently. Customers with
four or more customer service calls churn more than four times as often as other customers.
We calculate the average churn rate during model training using different machine learning
approaches and evaluate it during testing [5].
As suggested in our study, predicting churn accurately is critical to maximizing the
organization's sales. Both the cost of making an excessive retention effort (false
positives) and the cost of losing a customer because the model does not accurately anticipate
churn (false negatives) can be reduced by combining the customer lifetime value with the
churn prediction [19].
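The trade-off between the two costs can be illustrated with a minimal sketch. It is not the method of [19]: the function names, the offer cost, the assumed success rate of a retention offer and the sample customers are all hypothetical values chosen for illustration.

```python
def expected_retention_value(churn_prob, clv, offer_cost, offer_success_rate):
    """Expected gain from sending a retention offer to one customer.

    churn_prob * offer_success_rate * clv is the lifetime value we expect
    to save; offer_cost is paid even if the customer would have stayed
    anyway (the false-positive cost).
    """
    return churn_prob * offer_success_rate * clv - offer_cost


def select_customers(customers, offer_cost=10.0, offer_success_rate=0.3):
    """Return ids of customers for whom an offer has positive expected value."""
    selected = []
    for cust in customers:
        gain = expected_retention_value(
            cust["churn_prob"], cust["clv"], offer_cost, offer_success_rate
        )
        if gain > 0:
            selected.append(cust["id"])
    return selected


# Hypothetical customers: predicted churn probability and lifetime value.
customers = [
    {"id": "A", "churn_prob": 0.80, "clv": 500.0},
    {"id": "B", "churn_prob": 0.80, "clv": 20.0},
    {"id": "C", "churn_prob": 0.05, "clv": 500.0},
    {"id": "D", "churn_prob": 0.40, "clv": 200.0},
]
targeted = select_customers(customers)
```

In this sketch a likely churner with a low lifetime value (B) and a valuable but loyal customer (C) are both skipped, so retention spending concentrates on customers where a missed churn would be costly.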
3.1.1 Existing Algorithms

According to [1], input features are clustered with k-means and fuzzy c-means to place
subscribers in independent, distinct classes. Using these classes, the Adaptive Neuro Fuzzy
Inference System (ANFIS) is applied to construct a predictive model for successful churn
management. The first prediction step starts with parallel neuro-fuzzy classifiers [18]. The
FIS then uses the outputs of the neuro-fuzzy classifiers as input to decide on the behavior
of the churners. Performance metrics can be used to identify issues of inefficiency. Churn
reduction indicators are concerned with the facilities, processes and performance of the
customer support network. Portability of GSM numbers is a critical criterion for determining
churners.
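As a rough illustration of the fuzzy c-means idea mentioned above, the sketch below computes the standard membership update for a single one-dimensional value against fixed cluster centers. It is not the implementation used in [1]; the function name, the fuzzifier m = 2 and the example centers (monthly call minutes) are assumptions.

```python
def fuzzy_memberships(value, centers, m=2.0):
    """Membership of one 1-D value in each cluster (standard FCM update).

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    from the value to center i. The memberships sum to 1.
    """
    distances = [abs(value - c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if 0.0 in distances:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical centers: "light" and "heavy" users by monthly call minutes.
centers = [100.0, 300.0]
memberships = fuzzy_memberships(150.0, centers)
```

Unlike k-means, which assigns each subscriber to exactly one cluster, the value 150 here belongs mostly to the first cluster and partly to the second.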
3.1.2 Limitations of previous algorithm

• The system shows good accuracy on structured datasets only.
• One disadvantage of some soft computing methods is that the complexity of the
algorithm is high when a large number of inputs is fed into the system.

3.1.3 Analysis of the problem

The main goal of the algorithm is to create a system that produces highly flexible results
with high precision, and the machine learning algorithm in use seeks to accomplish the same
thing. The input to the system can be of any size or resolution, and the system does not
depend on the operating system. The dataset is used for both training and testing. The
proposed research work designs and develops an approach for churn prediction using NLP and
machine learning approaches to enhance the system accuracy [8] [17]. Preprocessing is very
important for making the data useful, because noisy data can lead to poor results. The
telecom dataset contains many missing values, incorrect values such as "Null", and imbalanced
attributes. Our dataset has 29 features. We analyzed and filtered the dataset and reduced
the number of features so that it contains only useful ones.
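The cleaning step described above can be sketched as follows. This is only an illustration: the set of placeholder strings, the 50% missing-value threshold, the function names and the sample rows are assumptions, not details of the dataset used in this report.

```python
# Strings treated as missing; "Null" is the placeholder named in the text,
# the others are assumed for illustration.
MISSING_TOKENS = {"", "null", "na", "n/a", "none"}


def clean_value(raw):
    """Map placeholder strings such as 'Null' to None; keep real values."""
    if raw is None:
        return None
    text = str(raw).strip()
    return None if text.lower() in MISSING_TOKENS else text


def drop_sparse_features(rows, max_missing_ratio=0.5):
    """Clean every cell, then drop columns that are missing in too many rows."""
    cleaned = [{k: clean_value(v) for k, v in row.items()} for row in rows]
    columns = list(cleaned[0].keys()) if cleaned else []
    kept = []
    for col in columns:
        missing = sum(1 for row in cleaned if row.get(col) is None)
        if missing / len(cleaned) <= max_missing_ratio:
            kept.append(col)
    return [{col: row.get(col) for col in kept} for row in cleaned]


# Hypothetical rows with "Null" placeholders.
rows = [
    {"customer_id": "1", "day_minutes": "120.5", "fax_plan": "Null"},
    {"customer_id": "2", "day_minutes": "Null", "fax_plan": "Null"},
    {"customer_id": "3", "day_minutes": "98.0", "fax_plan": "yes"},
]
filtered = drop_sparse_features(rows)
```

Here the mostly empty `fax_plan` column is removed, while the single missing `day_minutes` value is kept as `None` for later imputation.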

3.3. Applications

• Churn prediction systems for BPO centers.
• Churn prediction systems for service applications.
• Customer behavior mapping systems using churn prediction.

3.5 System Architecture

The proposed research work designs and develops an approach for churn prediction using NLP
and machine learning approaches to enhance the system accuracy. We then identify changes in
customer behavior patterns during prediction [4]. We also evaluate the factors that most
reduce the accuracy of churn prediction, and finally calculate the churn rate month-wise as
well as day-wise, which is useful for enhancing the service quality of the system. In this
research we propose churn prediction from large-scale data. The system initially deals with a
synthetic telecommunication data set that contains some imbalanced metadata. Data
preprocessing, data normalization, feature extraction and feature selection are applied in
that order [17]. During execution, optimization strategies are used to eliminate redundant
features, which can otherwise generate a high error rate. The proposed system is executed
for training and testing. After both phases are complete, the system reports the
classification accuracy for the entire data set.
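The month-wise churn rate mentioned above can be computed as in the following sketch. The record layout (an observation date plus a churn flag), the function name and the sample records are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def churn_rate_by_month(records):
    """Return {(year, month): churned / total} for the given records.

    Each record is (observation_date, churned), where churned is a bool.
    """
    totals = defaultdict(int)
    churned = defaultdict(int)
    for observed_on, has_churned in records:
        key = (observed_on.year, observed_on.month)
        totals[key] += 1
        if has_churned:
            churned[key] += 1
    return {key: churned[key] / totals[key] for key in sorted(totals)}


# Hypothetical observations.
records = [
    (date(2023, 1, 5), True),
    (date(2023, 1, 9), False),
    (date(2023, 1, 20), False),
    (date(2023, 1, 28), False),
    (date(2023, 2, 3), True),
    (date(2023, 2, 14), False),
]
monthly = churn_rate_by_month(records)
```

Grouping by the full date instead of `(year, month)` would give the day-wise rate in the same way.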

3.5.1 Proposed architecture

Figure 3.1 Block diagram of Architecture

System overview

The aim of this kind of research in the telecommunications industry is to help businesses
make more profit. Predicting churn has become known as one of the most important sources of
income for telecom companies. Therefore, this research aimed to build a system that predicts
customer churn in a telecom company. Such prediction models need to achieve high AUC
values. The sample data was divided into 70% for training and 30% for testing to develop and
evaluate the model [9]. We chose 10-fold cross-validation for validation and for optimizing
hyperparameters. We applied feature engineering, effective feature transformation and a
selection approach to make the features fit for machine learning algorithms. Another problem
was also found: the data was not balanced. Only about 5% of the entries represent customer
churn. This problem was solved by under-sampling or by using tree algorithms that are not
affected by this issue. With these steps, our different classifiers can detect churn in
large data and provide more accurate predictions.
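The 70/30 split and the under-sampling step can be sketched as below. This is a minimal illustration with hypothetical function names and synthetic records; it assumes under-sampling is applied to the training portion only, which the text does not state.

```python
import random


def train_test_split(rows, train_ratio=0.7, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def undersample(rows, label_key="churn", seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Synthetic data: 5 churners among 100 customers, mirroring the ~5% share.
data = [{"id": i, "churn": 1 if i < 5 else 0} for i in range(100)]
train, test = train_test_split(data)
balanced_train = undersample(train)
```

The test set keeps its original class distribution, so metrics such as AUC measured on it reflect the real imbalance.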
This work contributes a supervised approach to extracting aspect categories, selecting
suitable features and avoiding redundancy by measuring the correlation between features. The
results obtained show a comparatively higher F-score for weighted term frequency combined
with the correlation process. In this regard, selecting features using weighted term
frequency is more important [16]. Overlap between features within an aspect category is
avoided by measuring the association between them.
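One common way to measure the association between features is the Pearson correlation coefficient. The sketch below drops any feature that is almost perfectly correlated with one already kept. The 0.95 threshold, the function names and the sample columns are assumptions, and the report does not state which correlation measure was actually used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with a kept one."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns; day_charge is a fixed multiple of day_minutes.
features = {
    "day_minutes": [100.0, 150.0, 200.0, 250.0],
    "day_charge": [17.0, 25.5, 34.0, 42.5],
    "service_calls": [1.0, 4.0, 0.0, 2.0],
}
selected = drop_correlated(features)
```

Because `day_charge` carries the same information as `day_minutes`, only one of the two survives the filter.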
3.5.2 Modules

• Data Acquisition: First, information about different telecom-sector customers is
extracted based on certain parameters.
• Preprocessing: We then apply various preprocessing steps such as lexical analysis,
stop word removal, stemming (Porter's algorithm), index term selection and data cleaning
in order to prepare the dataset.
• Lexical analysis: Lexical analysis separates the input alphabet into
1. Word characters (e.g. the letters a-z), and
2. Word separators (e.g. space, newline, and tab).
• Stop word removal: Stop word removal refers to the removal of words that occur most
frequently in documents.
• Stemming: Stemming replaces all the variants of a word with a single stem word.
Variants include plurals, gerund forms, third person suffixes, past tense suffixes, etc.
• Data Training: We compile synthetic as well as real-time data using online news data
and train any machine learning classifier on it.
• Testing with machine learning: We predict online news using any machine learning
classifier and a weight calculator for real-time or synthetic input data accordingly.
• Analysis: We demonstrate the accuracy of the proposed system and compare it with other
existing systems.
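The text preprocessing modules above (lexical analysis, stop word removal and stemming) can be sketched as a small pipeline. This is only an illustration: the stop word list is a tiny hypothetical sample, and the stemmer is a naive suffix stripper used in place of Porter's algorithm, which has many more rules.

```python
import re

# Tiny illustrative stop word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "and", "or", "of", "to", "in"}

# Naive suffix stripping, NOT the Porter algorithm named in the text.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lexical analysis: keep runs of letters a-z, split on everything else."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very frequent words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the three steps in order and return the index terms."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


terms = preprocess("The customer is calling support and cancelled the plans")
```

The result contains stems such as `call` and `plan`. The crude stem `cancell` shows why a full stemmer like Porter's is preferable in practice.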

Chapter - 4
ADVANTAGES

Accuracy: The proposed system gives high accuracy on real-time data with multiple
classification algorithms.

Identify at-risk customers: For any business that wants to enjoy the benefits of
customer churn prediction, machine learning opens dozens of opportunities. Machine
learning is able to analyze client behavior and measure their probability of churning. In
particular, to precisely identify churn rate, machine learning algorithms can be trained to
learn the behavior patterns of clients/partners who have already canceled their contracts or
any other relationships with a particular company and compare them with the existing ones.
Then correlations between the actions of active and inactive clients are computed. As a
result, the algorithm recognizes the customers that are more likely to leave.

Identify pain points: Different companies lose their clients for different reasons. In most
cases, there are numerous "pain points," which remain unknown for product owners. From
the bad quality and absent features to unpleasant design and poor customer service — there
are a lot of details which you do not take into account that your clients do. Even if your
product is almost perfect, you can still reward your new customers with some attractive
discounts and offers and ignore your loyal ones. When a business applies churn prediction,
machine learning can do analysis and forecasts based not only on customer behavior but also
on the brands.

Identify methods to implement: After the root cause of client churn has been
identified, companies can reconsider and rebuild their products and change their business
strategy accordingly. Transformed data and automated flow can be used in CRM and
marketing automation systems. However, this doesn't mean that using machine learning for
churn prediction is about building a certain model for a certain task. It is more about domain
knowledge and an ability to deliver the best possible solution based on learning data,
processes, and behavior.

Chapter - 5
DISADVANTAGES

Chapter - 6
APPLICATION

Chapter - 7
CONCLUSION

This work mainly focuses on identifying and detecting churn customers in massive
telecommunications data sets and discusses churn prediction systems produced by different
algorithms. Some systems still face problems in converting linguistic data, which can lead
to a high error rate during execution. Many researchers have put forward Natural Language
Processing (NLP) techniques together with various machine learning algorithms; such a
combination is likely to perform well when structuring data. Customer churn is a major
problem and one of the most important concerns for large companies. Because of its direct
effect on revenue, especially in the telecom field, companies are seeking to develop machine
learning algorithms to predict potential customer churn. In this work, Random Forest,
Decision Tree, Bagging and K-nearest neighbor (KNN) classifiers are employed to predict
churn. Among these algorithms, the Random Forest classifier gives the highest accuracy, 95%,
compared with the Decision Tree, Bagging and KNN classifiers, whereas the KNN classifier has
the lowest accuracy, 81%.

BIBLIOGRAPHY
[1] Karahoca, Adem, and Dilek Karahoca. "GSM churn management by using fuzzy c-
means clustering and adaptive neuro fuzzy inference system." Expert Systems with
Applications 38.3 (2011): 1814-1822.
[2] Kirui, Clement, et al. "Predicting customer churn in mobile telephony industry using
probabilistic classifiers in data mining." International Journal of Computer Science
Issues (IJCSI) 10.2 Part 1 (2013): 165.
[3] Ballings, Michel, and Dirk Van den Poel. "Customer event history for churn
prediction: How long is long enough?" Expert Systems with Applications 39.18
(2012): 13517-13522.
[4] Ismail, Mohammad Ridwan, et al. "A multi-layer perceptron approach for customer
churn prediction." International Journal of Multimedia and Ubiquitous Engineering
10.7 (2015): 213-222.
[5] Lee, Hyeseon, et al. "Mining churning behaviors and developing retention strategies
based on a partial least squares (PLS) model." Decision Support Systems 52.1
(2011): 207-216.
[6] Burez J, Van den Poel D. Handling class imbalance in customer churn prediction.
Expert Syst Appl. 2009; 36(3):4626–36.
[7] Brandusoiu I, Toderean Gavril, Ha B. Methods for churn prediction in the prepaid
mobile telecommunications industry. In: International conference on
communications. 2016. p. 97–100.
[8] He Y, He Z, Zhang D. A study on prediction of customer churns in fixed
communication network based on data mining. In: Sixth international conference on
fuzzy systems and knowledge discovery, vol. 1. 2009. p. 92–4.
[9] Idris A, Khan A, Lee YS. Genetic programming and adaboosting based churn prediction
for telecom. In: IEEE international conference on systems, man, and cybernetics (SMC).
2012. p. 1328–32.
[10] Huang F, Zhu M, Yuan K, Deng EO. Telco churn prediction with big data. In: ACM
SIGMOD international conference on management of data. 2015. p. 607–18.
[11] Karahoca, Adem, and Dilek Karahoca. "GSM churn management by using fuzzy c-
means clustering and adaptive neuro fuzzy inference system." Expert Systems with
Applications 38.3 (2011): 1814-1822.
[12] Kirui, Clement, et al. "Predicting customer churn in mobile telephony industry using
probabilistic classifiers in data mining." International Journal of Computer Science
Issues (IJCSI) 10.2 Part 1 (2013): 165.
[13] Ballings, Michel, and Dirk Van den Poel. "Customer event history for churn
prediction: How long is long enough?" Expert Systems with Applications 39.18
(2012): 13517-13522.
[14] Ismail, Mohammad Ridwan, et al. "A multi-layer perceptron approach for customer
churn prediction." International Journal of Multimedia and Ubiquitous Engineering
10.7 (2015): 213-222.
[15] Lee, Hyeseon, et al. "Mining churning behaviors and developing retention strategies
based on a partial least squares (PLS) model." Decision Support Systems 52.1
(2011): 207-216.
[16] Burez, Jonathan, and Dirk Van den Poel. "Handling class imbalance in customer
churn prediction." Expert Systems with Applications 36.3 (2009): 4626-4636.
[17] Schouten, Kim, et al. "Supervised and unsupervised aspect category detection for
sentiment analysis with co-occurrence data." IEEE transactions on cybernetics 48.4
(2017): 1263-1275.
[18] Karahoca, Adem, and Dilek Karahoca. "GSM churn management by using fuzzy c-
means clustering and adaptive neuro fuzzy inference system." Expert Systems with
Applications 38.3 (2011): 1814-1822.
[19] Kamalraj, N., and A. Malathi. "A survey on churn prediction techniques in
communication sector." International Journal of Computer Applications 64.5
(2013): 39-42.
