
International Journal of Intelligent Networks 4 (2023) 145–154


Customer churning analysis using machine learning algorithms


B. Prabadevi *, R. Shalini, B.R. Kavitha
School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India

A R T I C L E  I N F O

Keywords: Stochastic gradient booster; Random forest; K-nearest neighbors; Logistic regression; Customer churn; Machine learning

A B S T R A C T

Businesses must compete fiercely to win new consumers from rival suppliers. Because it directly affects a company's revenue, customer retention is a major topic of analysis, and early detection of churn enables businesses to take proactive measures to keep customers. Firms therefore apply a variety of approaches to identify at-risk clients early through retention initiatives. Consequently, this study aims to recommend the optimum machine-learning strategy for early customer churn prediction. The data used in this investigation covers all customer records going back about nine months before the churn. The goal is to predict existing customers' responses so that they can be retained. The study tested the stochastic gradient booster, random forest, logistic regression, and k-nearest neighbors algorithms, whose accuracies are 83.9%, 82.6%, 82.9%, and 78.1%, respectively. We compare these algorithms and discuss the best of the four from different perspectives.

1. Introduction

The customers' concentration on providers has prompted many new telecom associations to emerge. These new firms usually specialize in providing a specific service or product that the customer cannot find from the incumbent providers, and they can often provide it at a lower price than the incumbents, allowing them to capture a larger market share. The incumbent providers, however, can retain most of the market by pricing their products higher than the new firms. This competition between the incumbent providers and the new firms has caused the rates that the associations charge to change. The associations' rates are often determined by the amount of competition in the market: the more competition, the higher the rates the associations can charge, and in markets with a low level of competition the associations charge lower rates than in markets with a high level of competition. The rates are also affected by the number of services that an association can offer; the more services it offers, the higher the rates it can charge.

Churning, in marketing terms, refers to the number of customers who stop using a particular product, and the churn rate should always be kept low. Customer churning is common with any product for which there are multiple options to solve a single problem. Usually, customers churn when they face difficulties or disappointments in the services rendered by the product. The churn rate is usually measured over a specific period. Any organization's primary motive should be to satisfy customers and retain existing ones; retaining existing customers is as important as gathering new ones. Customer churn prediction is therefore a central issue in the adoption of an industry's product.

Managing customer churn is one of the major challenges companies face, especially those offering subscription-based services. Customer churn, also called customer attrition, is the loss of customers; it is caused by a change in taste, the lack of a proper customer-relationship strategy, a change of residence, and several other reasons. If businesses can effectively predict customer attrition, they can segment the customers who are highly likely to churn and provide better services to them. Hence, a churn prediction model is a necessity in today's digitized economy: with it, an organization can achieve a high customer retention rate and maximize its revenue.

Among the methodologies developed in the literature for predicting customer churn, supervised Machine Learning (ML) techniques are the most widely explored. ML encompasses many algorithms, for instance, Decision Trees, K-Nearest Neighbors (KNN), Linear Regression, Naive Bayes, Neural Networks, Support Vector Machines (SVM), Genetic Programming, and various others. Churn is one of those indispensable issues, and firms have begun to acquire new Business Intelligence (BI) applications that anticipate churning clients. Once an organization knows the proportion of clients who leave for another organization in a given period, it becomes much simpler to come up with a close examination of the causes of the churn

* Corresponding author.
E-mail addresses: [email protected] (B. Prabadevi), [email protected] (R. Shalini), [email protected] (B.R. Kavitha).

https://doi.org/10.1016/j.ijin.2023.05.005
Received 8 September 2022; Received in revised form 17 April 2023; Accepted 21 May 2023
Available online 17 June 2023
2666-6030/© 2023 Published by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

rate and the behaviour of the customers who withdraw and move to a competitor. This helps in planning compelling customer-retention strategies for that organization.

Several existing algorithms, such as Decision Trees, KNN, Linear Regression, Naive Bayes, Neural Networks, SVM, XGBoost, and so forth, are available for customer churn prediction in the market. Evidently, with so many existing algorithms, it is troublesome to pick the most effective one for a given need. In this study we have therefore analyzed several of these existing algorithms together on the same dataset; the one best suited for churn prediction is determined based on the accuracy of these algorithms. Furthermore, these algorithms can improve prediction accuracy by incorporating the target variable's gradient. The study's main objective is to analyze the existing machine learning algorithms for early prediction of customer churn based on previously recorded customer feedback. The potential churn-prediction algorithms used in this study, which identify the most important variables affecting the target variable, are: Stochastic Gradient Booster, Random Forest, K-Nearest Neighbors, and Logistic Regression.

The rest of the paper is organized as follows: section 2 surveys existing research work, section 3 details the proposed system, the dataset, and the modules, section 4 discusses the results, and section 5 concludes the paper.

2. Literature survey

This section briefly surveys research works related to customer churn prediction in various industries, along with their pros and cons.

Omar Adwan et al. used a Multi-Layer Perceptron Neural Network (MLPNN) to model and analyze customer movement to another commercial competitor, which leads to a loss of serious profits [1]. Actual customer data from a large Jordanian telecommunications company was provided for this investigation, and MLP neural networks were used to predict customer churn in the telecommunications industry.

Farhad Shaikh described a churn prediction system that uses classification and clustering techniques to rank churning clients and uncover the reasons behind telecommunication customer churn [2]. They performed churn identification and prediction on a large telecommunications dataset using ML and NLP techniques.

Babu and Ananth proposed that data mining is the procedure of finding the information concealed in huge data collections; it incorporates different strategies and algorithms to perform an effective investigation of datasets, and classification is the technique used to recognize the data and make forecasts about the future [3].

Ismail et al. observed that the telecommunications industry faces critical rivalry between vendors to draw new clients to a provider [4]. The study recommends an MLPNN approach to predict customer churn at one of Malaysia's leading telecommunications organizations. The outcomes are compared against the most well-known churn prediction techniques, such as regression and classification. The experiments show that with the neural-network-based approach, medium-sized NNs were found to perform better in predicting customer churn when different neural network topologies were examined [5].

Kosgey used text summarization for churn prediction to gain a deeper perception of customer churn and showed that the most accurate churn prediction is provided by hybrid models rather than individual algorithms, helping the telecommunications trade understand customer churn needs, improve services, and undo the decision to cancel [6].

Fatih Kayaalp noted that churn analysis is one of the analyses used worldwide in subscription-oriented industries to study client behavior and predict which customers will leave the service of an organization [7]. To keep the review up to date, studies published within the past five years, primarily within the past two years, were included.

Gholamiangonabadi et al. introduced a brand-new approach to measure the churn rate at Iranian banks: initially the data is normalized during data preparation [8]; then the information pool is formed employing a k-medoids method. The results suggest that the MLPNN and SVM models have higher precision and lower cost.

Amuda and Adeyem described a predictive model that uses the multilayer perceptron design of the Artificial Neural Network (ANN) to predict customer churn in a financial institution. The results showed that the developed ANN software performed comparably to the NeuroSolutions Infinity software [9].

Amit surveyed many consumers' data and states that managing customer churn becomes tougher as companies cannot target every customer's needs regarding services; once customers' needs are not fulfilled, they switch from the service supplier [10]. Anam proposed that customer churn is one of the foremost important telecommunications issues [11]; that text provides an outline of various data-mining techniques for churn prediction.

Deepthi Das and Raju Ramakrishna proposed that numerous business and e-commerce specialists use customer churn to spot customers who are ready to change their existing business service or end their subscription term [12]. Recently, corporations in e-commerce, telecommunications, and insurance have come under tremendous pressure.

ULLAH et al. developed a model whose outcomes show that the proposed churn prediction model gave better churn classification using the RF algorithm and customer profiling using k-means clustering [13]. In addition, it also gives the top variables for the rules produced by the chosen attribute-classifier algorithm.

The best predictive model achieved 79% of the geometric mean, and misclassification errors were limited to 0.192 and 0.229 for Type I and Type II errors, respectively. In summary, an interesting MetaCost strategy improved the predictive model without requiring significant processing by changing the original data samples [14].

Edvaldo and Olawande noted that, until recently, conventional AI strategies (for example, MLP and SVM) were effectively used for churn forecasting, however with considerable effort in the design of the training parameters [15]. They predicted event data for customer relationship management in finance using Deep Neural Networks (DNN).

Yahaya and Abisoye proposed that customer churn prediction is a significant issue in the financial industry and has acquired attention over the years [16]. Their experiments show that training performance improved when noise was filtered, while testing performance was impacted by the imbalanced data brought about by the filtering.

Ahmed and Shabib Aftab suggested an approach that saves time and resources in creating top-quality software because only the modules anticipated to be defective are tested [17]. They presented a classification framework that utilizes a multi-filter feature-selection technique, with an MLP used to predict error-prone software modules. According to the outcomes, the proposed structure using the class-balancing method performed well on all the datasets.

Amatare and Adebola Ojo described a Convolutional Neural Network (CNN) model for predicting customer churn in the telecommunications industry; three further models were developed using two MLP models and another CNN model [18]. Accuracy rates for the MLP models MLP1 and MLP2 are 80% and 81%, respectively, while the CNN models CNN1 and CNN2 achieve 81% and 89%, respectively.

M. Feindt et al. proposed that detailed analysis of data plays a vital role in modern analytics; their approach automatically preprocesses input variables and uses advanced regularization and clipping techniques to largely eliminate the danger of overtraining [19]. Sun-Chong et al. consider artificial neural networks as universal function approximators and present neural networks that are appropriate for time-series forecasts [20].


Edwine et al. [21] performed a comparative analysis of customer churn prediction models in the telecom industry. They used three best-fit algorithms, namely KNN, RF, and SVM, along with an optimization algorithm for hyperparameter tuning. They concluded that the basic versions of these algorithms perform worse than the amalgamation (RF with a grid-search optimization algorithm) with a low-ratio undersampling strategy. Similarly, a few more works on improving customer churn prediction are performed in telecom customer segmentation using logistic regression [22] and in suggestions on optimized solutions to the existing machine learning models with a focus on feature reduction (an optimized subset of features to predict the model) [23]. Furthermore, other prediction models that can be suggested for any generic predictive-analytics application are discussed in Refs. [24,25].

It is evident from the survey that machine learning and artificial intelligence play a wide role in customer churn analysis. Furthermore, it has been observed that machine learning algorithms perform better in combination than individually. Deep learning algorithms are preferred only for image-based reviews and sentiment analysis. Optimization of the model through feature engineering is also suggested. Therefore, an analysis of the best-fit algorithms for customer churn prediction using machine learning is performed in this paper to assist readers and researchers. The results of this study provide useful insights for the industry and help it predict customer churn at an early stage and retain customers. Logistic regression interprets the data well; K-Nearest Neighbors offers accurate predictions; Random Forest is preferred as it splits the trees based on a subset of features to improve the bagging; and the Stochastic Gradient Booster is preferred as it examines a random subset of the training samples instead of all of them and helps reach the global minimum faster.

3. Proposed analysis

This section describes the details of the study, the methodologies used, and the modules.

3.1. Study flow

In machine learning, the core idea of optimization is to find the most effective answer for future problems by gaining experience from present examples. Customer churn prediction has been performed using different methods and techniques: data processing, machine learning, and hybrid technologies. Most prior work used decision trees, because they are a recognized way to seek out client churn; however, they do not apply to complicated problems, although studies show that reducing the data improves the accuracy of the decision tree. In some cases, machine learning algorithms are used for client prediction and historical analysis.

The proposed methodology consists of a few phases, as depicted in Fig. 1. The dataset obtained from Kaggle, with 7044 records and 21 attributes, was taken as input. In the first two phases, data preprocessing and analysis are performed. Then the data is split into two parts, a train set and a test set, in the proportion of 70% and 30%, respectively. The most popular predictive models are applied in the prediction stage: Logistic Regression, KNN, stochastic gradient booster, random forest, etc., and ensemble techniques are applied to visualize their impact on model accuracy. Finally, a big-data platform should produce the churn-prediction system curve.

• To create the churn prediction system, a big-data platform should be installed. The Jupyter libraries were picked because they are free and open source.
• The importance of this kind of analysis in the market is to assist organizations in making more profit.
• Overall, the findings suggest that a neural-network learning algorithm may provide a viable alternative to statistical predictive approaches in customer churn prediction.
• The final results of all the algorithms will show which algorithm is best suited for churn prediction.

Fig. 1. System layout.
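The study flow above (ingest the dataset, preprocess, split 70/30, fit the four models, and compare accuracies) can be sketched with scikit-learn as follows. Synthetic data stands in for the Kaggle telco file, so the column structure and the numbers printed are illustrative assumptions, not the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Stand-in for the 7044-row, 21-attribute telco dataset (synthetic,
# so the sketch is self-contained).
X, y = make_classification(n_samples=7044, n_features=20,
                           n_informative=10, random_state=0)

# 70% / 30% train/test split, as described in Section 3.1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

models = {
    "stochastic gradient booster": GradientBoostingClassifier(subsample=0.8,
                                                              random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record its held-out accuracy for comparison.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.3f}")
```

On the real dataset the CSV would be loaded with pandas and the categorical columns encoded first (see Section 4.2); the comparison loop itself is unchanged.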


3.2. Methodologies used

The analysis of customer churning uses the four algorithms listed below.

1. Stochastic gradient booster
2. Random forest model
3. K-Nearest Neighbors
4. Logistic regression model

3.2.1. Stochastic Gradient Booster (SGB)
This variation of boosting is termed stochastic gradient boosting. The SGB architecture is depicted in Fig. 2. At every iteration, a subsample of the training data is chosen randomly (without replacement) from the full training dataset. The randomly chosen subsample is then used to fit the base learner rather than the full sample. A couple of stochastic variants can be used: subsample the rows before building each tree, or subsample the columns before building each tree.

Steps involved in SGB, as specified in [17] (training rule):

1. Start the training procedure.
2. Initialize the weights.
3. Compute the gradient.
4. For simple estimation and straightforwardness, parameters such as biases and weights should be initialized to zero, and the learning rate should be set to a sufficient value.
5. Activate each input unit as follows: xi = si (i = 1 to n).
6. Get the net input from the resulting activations.
7. Apply an appropriate activation function to determine the final output from the result of step 6.

Purpose of the stochastic gradient booster:

• The gradient boosting algorithm may be used to predict continuous target variables (as a regressor) but also categorical target variables (as a classifier).
• As gradient boosting is one of the boosting algorithms, it is used to minimize the bias error of the model.
• When it is used as a regressor, the cost function is Mean Squared Error (MSE); when it is used as a classifier, the cost function is log loss.
• Lots of flexibility: it can optimize various loss functions and provides several hyperparameter tuning options that make the fit very adaptable.
• No data preprocessing is required: it typically works well with categorical and numerical values as is.

Challenges:

• One drawback of boosting is that it is sensitive to outliers, since each classifier is obliged to fix the errors of its predecessors; the technique is therefore too reliant on outliers.
• Another weakness is that the technique is difficult to scale up, because each estimator bases its correctness on the previous predictors, making the methodology hard to streamline.
• The high flexibility results in many parameters that interact and heavily influence the behaviour of the method (number of iterations, tree depth, regularization parameters, and so forth). This requires a large grid search during tuning.
• It is less interpretable, though this can easily be addressed with various tools.

3.2.2. Random forest model (RF)
Random Forest is appropriate for large datasets. The Random Forest architecture is depicted in Fig. 3. It is a technique that combines several classifiers to provide solutions to complicated problems. A random forest algorithm creates decision trees on data samples, gets the prediction from each of them, and eventually selects the best answer by means of voting.

Steps involved in the random forest algorithm:

1. n random records are taken from a data set having k records.
2. An individual decision tree is built for each sample.
3. Each decision tree produces an outcome.

The final outcome is established by majority voting (for classification) or averaging (for regression).
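The row-subsampling idea behind SGB can be illustrated with scikit-learn, where setting `subsample < 1.0` on `GradientBoostingClassifier` makes each tree fit on a random fraction of the training rows (drawn without replacement), i.e. the stochastic variant described above. The data here is synthetic; the hyperparameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# subsample=0.8: each tree is fit on a random 80% of the training rows,
# sampled without replacement -- the "stochastic" part of SGB.
sgb = GradientBoostingClassifier(n_estimators=100, subsample=0.8,
                                 max_depth=3, random_state=1)
sgb.fit(X_train, y_train)
acc = accuracy_score(y_test, sgb.predict(X_test))
print(f"SGB test accuracy: {acc:.3f}")
```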

Fig. 2. Stochastic Gradient Booster Architecture [17].

Fig. 3. Random Forest Architecture [13].
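The three random-forest steps above amount to bagging decision trees and taking a majority vote. A minimal sketch on synthetic data, with the vote reproduced by hand over the fitted trees to mirror the steps (note that scikit-learn's own `predict` averages tree probabilities rather than hard votes, so the two can differ slightly):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=15,
                           n_informative=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=4)

# bootstrap=True: each tree is grown on a random sample of records
# (steps 1-2); the forest combines the trees' outcomes (step 3).
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=4)
rf.fit(X_train, y_train)

# Hard majority vote over the individual trees, mirroring the steps above.
votes = np.stack([tree.predict(X_test) for tree in rf.estimators_])
manual_vote = (votes.mean(axis=0) >= 0.5).astype(int)

acc = rf.score(X_test, y_test)
print(f"RF accuracy: {acc:.3f}")
```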


Challenges:

• Time-consuming process: since random forest algorithms can handle massive datasets, they can offer more accurate predictions; however, they can be slow to process data, as they compute results for every individual decision tree.
• Requires additional resources: since random forests handle larger datasets, they need additional capacity to store that data.
• More complex: the prediction of a single decision tree is easier to interpret than that of a whole forest.

Purpose of random forest:

• Among all the available classification techniques, random forests provide the most remarkable precision.
• The random forest technique can handle huge datasets with variables running into the thousands.
• It will automatically balance datasets when one class is much more frequent than the other classes in the data.
• At the core of this algorithm is a decision tree; random forests therefore share all of its benefits.

3.2.3. KNN
KNN is a supervised machine learning algorithm. The KNN architecture is depicted in Fig. 4. The algorithm can be used to solve both regression and classification problem statements. At the training stage the KNN algorithm merely stores the dataset; when it receives new data, it assigns that data to a class.

KNN algorithm [4]:

1. Load the data.
2. Initialize K. For every sample in the data, compute the distance to the query point and record the distance and index.
3. Sort the collected distances and indices from smallest to largest (in ascending order) by distance.
4. Get the labels of the first k entries.
5. If regression, return the mean of the K labels.
6. If classification, return the mode of the K labels.

Advantages:

• Easy to implement: given the algorithm's simplicity and accuracy, it is one of the first classifiers that a new data scientist learns.
• Adapts easily: as new training samples are added, the algorithm adjusts to account for the new data, since all training data is held in memory.
• Few hyperparameters: KNN just requires a k-value and a distance metric, which is little compared with other ML algorithms.
• High accuracy: it compares well against more complex supervised learning models.
• No assumptions about the data: there is no need to make further assumptions, tune many parameters, or build a model. This makes it valuable for nonlinear data.

Purpose of the KNN algorithm:

• KNN is used in both regression and classification predictive problems. It uses data with many categories to predict the classification of a new sample point.
• KNN is very simple to implement, as the main thing to be computed is the distance between points based on the data's features, and this distance can easily be calculated using distance formulas such as the Euclidean or Manhattan distance.
• As there is no training period, new data can be added at any time, since it will not affect the model.

Challenges:

• Accuracy depends on the quality of the data; with massive data, the prediction stage may be slow.
• Sensitive to the scale of the data and to irrelevant features.
• Requires high memory: it needs to store all of the training data.

3.2.4. Logistic regression model
Logistic regression can classify our observations as a client who "will churn" or "won't churn" from the platform. The architecture is depicted in Fig. 5. The model attempts to figure out the likelihood of belonging to one cluster or the other.

Steps carried out in the LR calculation:

Step 1. Import the necessary libraries
Step 2. Read and understand the data
Step 3. Exploratory data analysis
Step 4. Data preparation
Step 5. Building the logistic regression model
Step 6. Making predictions on the test set
Step 7. Assigning scores according to the predicted likelihood values

Challenges:

• The major challenge of logistic regression is the assumption of linearity between the dependent variable and the independent variables.
• Nonlinear problems cannot be solved with logistic regression, since it has a linear decision surface.

Fig. 4. KNN architecture [4].

Fig. 5. Logistic Regression Architecture [22].
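The KNN steps and the LR workflow above can be sketched together. The data is a synthetic stand-in for the churn labels; standardizing first addresses KNN's sensitivity to feature scale noted among the challenges.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=6, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2)

# KNN is sensitive to feature scale, so standardize before fitting.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# KNN: store the training set, classify new points by the mode (majority
# vote) of the k nearest labels under the Euclidean distance.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train_s, y_train)
knn_acc = knn.score(X_test_s, y_test)

# LR: linear decision surface; predict_proba gives churn likelihoods
# between zero and one (Step 7 above scores by these values).
lr = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
churn_proba = lr.predict_proba(X_test_s)[:, 1]
lr_acc = lr.score(X_test_s, y_test)

print(f"KNN accuracy: {knn_acc:.3f}, LR accuracy: {lr_acc:.3f}")
```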


• Therefore, a transformation of nonlinear features is needed, which may be done by increasing the number of features so that the data becomes linearly separable in higher dimensions.

Purpose of logistic regression:

• Logistic regression predicts the output of a categorical dependent variable, so the result must be a categorical or discrete value: yes or no, 0 or 1, true or false, etc. However, rather than giving exactly zero or one, it gives probabilistic values that lie between zero and one.
• It extends easily to multiple classes (multinomial regression) and offers a natural probabilistic view of class predictions.
• Logistic regression can be used to classify observations using various kinds of data and can easily determine the best variables to use for the classification.
• Logistic regression performs better when the data is linearly separable.
• It does not need many computational resources, and it is highly explainable. There is no downside to scaling the input features; it does not require standardization.
• It is simple to implement and to train a model using logistic regression.
• It provides a measure of how relevant a predictor is (coefficient size) and of its direction of association (positive or negative).

Dataset Description.

The customer churn dataset was downloaded from Kaggle. The customer information contains data about a made-up telco organization that provided home telephone and Internet services to 7044 clients. It demonstrates which clients have left, remained, or signed up for their service. The data were recorded by looking at customers who have already churned (response) and their characteristics/behavior (predictors) before the churn happened.

The dataset includes data about:

1 Clients who left within the last month; this field is called Churn.
2 Services that each client has signed up for: telephone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies.
3 Client account information: how long they have been a client, contract, payment method, paperless billing, monthly charges, and total charges.
4 Demographic data about clients: gender, age range, and whether they have partners and dependents.

The attributes of the dataset, with 7044 records, are given in Table 1.

Table 1
Dataset attributes and description.

S.No | Attribute Name     | Description
1.   | Customer id        | The id of the customer
2.   | Gender             | The customer's gender, whether male or female
3.   | Senior-resident    | Whether the customer is a senior resident or not (1, 0)
4.   | Partner            | Whether the customer has a partner or not (Yes, No)
5.   | Dependents         | Whether the customer has family members as dependants or not (Yes, No)
6.   | Tenure             | Number of months the customer has stayed with the organization
7.   | Phone Service      | Whether the customer has a telephone/mobile connection or not (Yes, No)
8.   | Multiple Lines     | Whether the customer has several telephone lines or not (Yes, No, No telephone service)
9.   | Internet Service   | Customer's internet service provider (DSL, Fiber optic, No)
10.  | Online Security    | Whether the customer has online security or not (Yes, No, No internet access)
11.  | Online Backup      | Whether the customer has online backup or not (Yes, No, No internet access)
12.  | Device Protection  | Whether the customer has device protection or not (Yes, No, No internet access)
13.  | Tech Support       | Whether the customer has technical support or not (Yes, No, No internet access)
14.  | Streaming TV       | Whether the customer has streaming TV or not (Yes, No, No internet access)
15.  | Streaming Movies   | Whether the customer has streaming movies or not (Yes, No, No internet access)
16.  | Contract           | The customer's contract term (Month-to-month, One-year, long-term)
17.  | Paperless Billing  | Whether the customer has paperless billing or not (Yes, No)
18.  | Payment Method     | The customer's payment method
19.  | Total Charges      | The total amount charged
20.  | Monthly Charges    | The amount charged to the customer every month
21.  | Churn              | Whether the customer churned or not (Yes or No)

4. Result and discussion

The results were obtained using Python 3.10.6 by utilizing the Jupyter libraries from Anaconda. The various libraries used include numpy, pandas, matplotlib, and seaborn. The results obtained in comparing the performance of the various algorithms are narrated step by step.

4.1. Test and train dataset split

The customer churn dataset is split into training and testing data in the proportion of 70% and 30%, respectively.

4.2. Dataset and its description before and after datatype conversion

The raw dataset has attributes of different data types, such as objects, so the data is categorized and converted to a feasible type. The dataset description for training with the different models is displayed in Fig. 7a and Fig. 7b. One-hot encoding and label encoders were used to transform the categorical labels to numerical labels and to normalize the labels, as shown in Fig. 7c.

4.3. Prediction of the KNN algorithm

The KNN cross-validation is performed using GridSearchCV for the hyperparameter tuning, and the best KNN results after tuning are reported. The prediction results of the KNN algorithm are depicted in Fig. 8a and b. The best KNN training score was 0.7849459340352903, the test performance was 0.7773049645390071, and the AUROC was 0.7817042078747963.

4.4. Prediction of the logistic regression algorithm

The prediction results of the LR algorithm, applying GridSearchCV for hyperparameter tuning, are displayed in Fig. 9a and 9b. The best LR training score was 0.7975697081209744, the test performance was 0.7829787234042553, and the AUROC was 0.8269877975760328.
70 : 30 ratio respectively. The head of the dataset is shown in in Fig. 6.
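A minimal sketch of the LR tuning with GridSearchCV described in Section 4.4. The synthetic data and the C grid are illustrative assumptions, not the paper's exact setup:

```python
# Sketch (assumed, not the authors' code) of tuning logistic regression
# with GridSearchCV and scoring the tuned model by AUROC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the churn features and labels
X, y = make_classification(n_samples=400, n_features=8, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

# Illustrative regularization grid; the paper does not list its grid
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_tr, y_tr)

# AUROC on the held-out test split, as reported for each model
auroc = roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1])
print(grid.best_params_, round(auroc, 3))
```

The same pattern (fit a search object, then score the refit best estimator on the test split) applies to the other three models.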

B. Prabadevi et al. International Journal of Intelligent Networks 4 (2023) 145–154

Fig. 6. The sample rows of the dataset.
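The 70:30 split described in Section 4.1 can be sketched as follows; the toy data frame and column names (taken from the Kaggle Telco dataset) stand in for the real CSV:

```python
# Assumed sketch of the 70:30 train/test split, not the authors' code.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 28, 62, 13],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 99.65,
                       89.10, 29.75, 104.80, 56.15, 49.95],
    "Churn": [0, 0, 1, 0, 1, 0, 1, 1, 0, 1],
})
X, y = df.drop(columns="Churn"), df["Churn"]

# 70:30 split; stratify keeps the churn ratio similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
print(len(X_train), len(X_test))  # 7 3
```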

Fig. 7a. Dataset description.

Fig. 7b. Dataset samples after conversion.

4.5. Prediction of the random forest algorithm

The prediction results of the RF algorithm obtained by applying RandomizedSearchCV for hyperparameter tuning are displayed in Fig. 10a and 10b. The best RF training score was 0.8038805992445953, the test performance was 0.7872340425531915, and the AUROC was 0.829097309685545.

4.6. Prediction of the stochastic gradient booster algorithm

The prediction results of the SGB algorithm obtained by applying RandomizedSearchCV for hyperparameter tuning are displayed in Fig. 11a and


Fig. 7c. Sample Dataset after Label encoding and One hot encoding.
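The encoding step of Section 4.2 (Fig. 7c) can be sketched as below; the column names follow the Kaggle Telco dataset, and the tiny frame is an assumption, not the authors' code:

```python
# Sketch of label encoding for binary columns and one-hot encoding for
# multi-valued categorical columns.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "PaperlessBilling": ["Yes", "No", "Yes", "No"],
    "Churn": ["Yes", "No", "No", "Yes"],
})

# Label-encode the binary columns (classes are sorted, so No -> 0, Yes -> 1)
for col in ("PaperlessBilling", "Churn"):
    df[col] = LabelEncoder().fit_transform(df[col])

# One-hot encode the multi-valued categorical feature
df = pd.get_dummies(df, columns=["Contract"])
print(sorted(c for c in df.columns if c.startswith("Contract_")))
```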

Fig. 8a. Final accuracy of KNN.
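The KNN cross-validation with GridSearchCV in Section 4.3 can be sketched as follows; the parameter grid and synthetic data are assumptions, since the paper does not list its search space:

```python
# Assumed sketch of KNN hyperparameter tuning with GridSearchCV.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Exhaustive search over a small illustrative grid, 5-fold CV
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9],
                "weights": ["uniform", "distance"]},
    cv=5, scoring="accuracy")
grid.fit(X_tr, y_tr)
print(grid.best_params_, round(grid.score(X_te, y_te), 3))
```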

Fig. 9b. Predicted graph for LR.

Fig. 8b. Predicted graph for KNN.


Fig. 10a. Final accuracy of Random Forest.
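The RF tuning with RandomizedSearchCV of Section 4.5 can be sketched as follows; the search space and synthetic data are illustrative assumptions:

```python
# Assumed sketch of random forest tuning with RandomizedSearchCV, which
# samples a fixed number of candidates instead of trying every combination.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1),
    param_distributions={"n_estimators": [100, 200, 300],
                         "max_depth": [None, 4, 8, 16]},
    n_iter=5, cv=3, random_state=1)
search.fit(X, y)
print(search.best_params_)
```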

11b. The best SGB training score was 0.8067218322921829, the test performance was 0.7914893617021277, and the AUROC was 0.8396754279107219.
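A minimal sketch of the stochastic gradient booster of Section 4.6, assuming sklearn's GradientBoostingClassifier (setting subsample below 1.0 makes the boosting stochastic); the hyperparameter space is illustrative:

```python
# Assumed sketch of stochastic gradient boosting tuned with
# RandomizedSearchCV; subsample=0.8 fits each tree on a random 80% sample.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=2)

search = RandomizedSearchCV(
    GradientBoostingClassifier(subsample=0.8, random_state=2),
    param_distributions={"n_estimators": [50, 100, 200],
                         "learning_rate": [0.05, 0.1, 0.2],
                         "max_depth": [2, 3, 4]},
    n_iter=5, cv=3, random_state=2)
search.fit(X, y)
print(search.best_params_)
```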

4.7. Prediction results of overall compared four algorithms

Fig. 9a. Final accuracy of LR.

The four algorithms taken for the analysis are described in Table 2. The four algorithms under study were compared based on accuracy, as shown in Fig. 12. It is shown that SGB performs better than the other models. The performance metrics comparison of all four models is


Table 2
The difference among Logistic Regression, KNN, Stochastic Gradient Booster, and Random Forest.

Factor | Logistic regression | K-Nearest Neighbors | Stochastic gradient booster | Random forest
Definition | Used to predict a categorical dependent value and solve classification problems | Consumes more cost for training; it is a non-parametric model | Uses the delta rule for training; weights and bias are adjustable | Used in classification and regression problems; suitable for large data when interpretability is not a major concern
Accuracy | 0.826 | 0.781 | 0.839 | 0.829
Advantage | Easier to implement and interpret, and efficient to train | Much faster than other algorithms for training | Its efficiency is one of the major advantages | Provides the highest accuracy, but in this work SGB outperforms it
Remark | Better than others | Gives decent value | Gives the best performance | Second-best performance compared with the other three

Fig. 10b. Predicted graph for Random Forest.

Fig. 11a. Final accuracy of stochastic gradient booster.

Fig. 12. ROC Curve for classification Models.
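ROC curves like those in Fig. 12 can be produced with sklearn's roc_curve and auc; the small score vector below is illustrative, not the paper's data:

```python
# Sketch of computing an ROC curve and its AUC from predicted scores.
import numpy as np
from sklearn.metrics import auc, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1])            # ground-truth labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # P(y = 1) estimates

# FPR/TPR at every decision threshold, then the area under the curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(round(auc(fpr, tpr), 3))  # 0.889
```

Plotting fpr against tpr (e.g. with matplotlib) for each model yields the comparison curves.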

Fig. 11b. Predicted Graph for stochastic gradient booster.

shown in Fig. 13.


The chosen models are examined through ROC and AUC. The ROC curve shows the True Positive Rate against the False Positive Rate of the fitted classifiers across every decision threshold (in the range 0 to 1). The upper right is where the decision threshold is zero: any observation with P(y = 1) ≥ 0 is classified as a "1", and the rest would be classified as "0". Since every observation satisfies this condition, every observation is classified as a "1". Consequently, we correctly classify all true "1"s but incorrectly classify all true "0"s.
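The threshold argument above can be illustrated numerically; the probability vector is a made-up example:

```python
# Numeric illustration: at a decision threshold of 0, every P(y=1) >= 0,
# so everything is predicted "1", giving TPR = 1 and FPR = 1 (the
# upper-right corner of the ROC curve).
import numpy as np

y_true = np.array([1, 0, 1, 0, 0, 1])
p_hat = np.array([0.9, 0.3, 0.6, 0.1, 0.45, 0.2])

y_pred = (p_hat >= 0.0).astype(int)       # threshold = 0 -> all ones
tpr = (y_pred[y_true == 1] == 1).mean()   # true positive rate
fpr = (y_pred[y_true == 0] == 1).mean()   # false positive rate
print(tpr, fpr)  # 1.0 1.0
```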
Therefore, organizations can use SGB to predict customer churning rates better, increasing retention rates and reducing the costs incurred. This model can be integrated into the organization's customer management portal to monitor customers and predict the retention rate. This will help categorize customers as at-risk, so the organization can further improve to satisfy their needs.

Fig. 13. Comparative analysis of KNN, LR, RF, and SGB.

Customer churning for a different context and on different datasets was carried out by considering other factors, such as social network analysis


features. The results obtained on the unbalanced dataset category for RF are 6% less than the proposed model [26]. Furthermore, better accuracy was obtained using a backpropagation network and a decision tree, but this combination of machine learning algorithms was the least tried [27]. Lalwani et al. developed a prediction model by applying machine learning algorithms such as LR, Naïve Bayes, support vector machine, RF, DT, and ensemble methods, with cross-validation for hyperparameter tuning. The highest accuracies of 81.71% and 80.8% were obtained by the AdaBoost and XGBoost classifiers, respectively [28], which is lower than the highest accuracy of the proposed model.

5. Conclusions and future works

The results were examined to observe the performance of the different algorithms on the prepared data for customer churn analysis. It became clear that predicting churn is one of the most essential sources of financial benefit to corporations. Four algorithms were chosen for their variety in this prediction task and were examined by investigating Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC). Comparing the areas under the curves, the Stochastic Gradient Booster model performs excellently, with an AUC of 0.84; the worst model is K-Nearest Neighbors, which still has a fair AUC of 0.781. GridSearchCV increased the time needed for hyperparameter optimization. Furthermore, searching over a bigger hyperparameter space does not ensure that RandomizedSearchCV will choose the ideal hyperparameters.

Further research can concentrate on refined data-side preprocessing and exhaustive hyperparameter standardization to improve model performance. More advanced optimization methods can be used for hyperparameter optimization. Although computationally intensive, a better hyperparameter optimization method would probably increase the classification accuracy of the models to some extent.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Omar Adwan, Hossam Faris, Khalid Jaradat, Osama Harfoushi, Nazeeh Ghatasheh, Predicting customer churn in telecom industry using multilayer perceptron neural networks: modeling and analysis, Life Sci. J. 11 (3) (2014) 75–81.
[2] Mohammad Ridwan Ismail, Mohd Khalid Awang, M. Nordin A. Rahman, Mokhairi Makhtar, A multi-layer perceptron approach for customer churn prediction, International Journal of Multimedia and Ubiquitous Engineering 10 (7) (2015) 213–222.
[3] Farhad Shaikh, Brinda Pardeshi, Ajay Jachak, Akash Bendale, Nandakishor Sonune, Mangal Katkar, Customer churn prediction using NLP and machine learning: an overview, International Journal of Advance Scientific Research and Engineering Trends 6 (2) (2021) 40–45.
[4] Anuj Sharma, Prabin Kumar Panigrahi, A neural network based approach for predicting customer churn in cellular network services, Int. J. Comput. Appl. 27 (11) (2011) 26–31.
[5] Ammara Ahmed, D. Maheswari Linen, A review and analysis of churn prediction methods for customer retention in telecom industries, in: 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), IEEE, 2017, pp. 1–7.
[6] S. Babu, N.R. Ananthanarayanan, V. Ramesh, A study on efficiency of decision tree and multi layer perceptron to predict the customer churn in telecommunication using WEKA, Int. J. Comput. Appl. 140 (4) (2016) 26–30.
[7] Fatih Kayaalp, Review of customer churn analysis studies in telecommunications industry, Karaelmas Science & Engineering Journal 7 (2) (2017).
[8] Davoud Gholamiangonabadi, Jamal Shahrabi, Seyed Mohamad Hosseinioun, Sanaz Nakhodchi, Soma Gholamveisy, Customer churn prediction using a new criterion and data mining; a case study of Iranian banking industry, in: Proceedings of the International Conference on Industrial Engineering and Operations Management, 2019, pp. 5–7.
[9] Kamorudeen A. Amuda, Adesesan B. Adeyemo, Customers churn prediction in financial institution using artificial neural network, 2019, arXiv preprint arXiv:1912.11346.
[10] Anam Bansal, Churn prediction techniques in telecom industry for customer retention: a survey, J. Eng. Sci. 11 (4) (2020) 871–881.
[11] Vrushabh Jinde, Savyanavar Amit, Customer churn prediction system using machine learning, International Journal of Advanced Science and Technology 29 (5) (2020) 7957–7964.
[12] P. Sri Sai Surya, K. Anitha, Comparative analysis of accuracy and prediction of customer loyalty in the telecom industry using novel diverse algorithm, in: 2022 International Conference on Business Analytics for Technology and Security (ICBATS), IEEE, 2022, pp. 1–7.
[13] Irfan Ullah, Basit Raza, Ahmad Kamran Malik, Muhammad Imran, Saif Ul Islam, Sung Won Kim, A churn prediction model using random forest: analysis of machine learning techniques for churn prediction and factor identification in telecom sector, IEEE Access 7 (2019) 60134–60149.
[14] Nazeeh Ghatasheh, Hossam Faris, Ismail AlTaharwa, Yousra Harb, Ayman Harb, Business analytics in telemarketing: cost-sensitive analysis of bank campaigns using artificial neural networks, Appl. Sci. 10 (7) (2020) 2581.
[15] Edvaldo Domingos, Blessing Ojeme, Olawande Daramola, Experimental analysis of hyperparameters for deep learning-based churn prediction in the banking sector, Computation 9 (3) (2021) 34.
[16] Rahmat Yahaya, Opeyemi Aderiike Abisoye, Sulaimon Adebayo Bashir, An enhanced bank customers churn prediction model using a hybrid genetic algorithm and K-means filter and artificial neural network, in: 2020 IEEE 2nd International Conference on Cyberspace (CYBER NIGERIA), IEEE, 2021, pp. 52–58.
[17] Ahmed Iqbal, Shabib Aftab, A classification framework for software defect prediction using multi-filter feature selection technique and MLP, Int. J. Mod. Educ. Comput. Sci. 12 (1) (2020).
[18] Sunday A. Amatare, A.K. Ojo, Predicting customer churn in telecommunication industry using convolutional neural network model, IOSR J. Comput. Eng. 22 (3) (2020) 54–59.
[19] M. Feindt, U. Kerzel, The NeuroBayes neural network package, Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrom. Detect. Assoc. Equip. 559 (1) (2006) 190–194.
[20] Sun-Chong Wang, Artificial neural network, in: Interdisciplinary Computing in Java Programming, Springer, Boston, MA, 2003, pp. 81–100.
[21] Nabahirwa Edwine, Wenjuan Wang, Wei Song, Denis Ssebuggwawo, Detecting the risk of customer churn in telecom sector: a comparative study, Math. Probl Eng. 2022 (2022). Article ID 8534739, 16 pages.
[22] Tianyuan Zhang, Sérgio Moro, Ricardo F. Ramos, A data-driven approach to improve customer churn prediction based on telecom customer segmentation, Future Internet 14 (3) (2022) 94.
[23] Seyed Mohammad Sina Mirabdolbaghi, Babak Amiri, Model optimization analysis of customer churn prediction using machine learning algorithms with focus on feature reductions, Discrete Dynam Nat. Soc. 2022 (2022). Article ID 5134356, 20 pages.
[24] Narges Heidari, Parham Moradi, Abbas Koochari, An attention-based deep learning method for solving the cold-start and sparsity issues of recommender systems, Knowl. Base Syst. 256 (2022) 109835, https://doi.org/10.1016/j.knosys.2022.109835.
[25] Navid Khaledian, Farhad Mardukhi, CFMT: a collaborative filtering approach based on the nonnegative matrix factorization technique and trust relationships, J. Ambient Intell. Hum. Comput. (2022) 1–17.
[26] A.K. Ahmad, A. Jafar, K. Aljoumaa, Customer churn prediction in telecom using machine learning in big data platform, Journal of Big Data 6 (1) (2019) 1–24.
[27] Thanasis Vafeiadis, Konstantinos I. Diamantaras, George Sarigiannidis, K. Ch. Chatzisavvas, A comparison of machine learning techniques for customer churn prediction, Simulat. Model. Pract. Theor. 55 (2015) 1–9.
[28] Praveen Lalwani, Manas Kumar Mishra, Jasroop Singh Chadha, Pratyush Sethi, Customer churn prediction system: a machine learning approach, Computing (2022) 1–24.

