2024 MIT Art, Design and Technology School of Computing International Conference (MITADTSoCiCon)

MIT ADT University, Pune, India. Apr 25-27, 2024

To Design and Implement Application for Bank Customer Churning Rate Prediction and Analysis using Machine Learning Algorithm

979-8-3503-6287-9/24/$31.00 ©2024 IEEE | DOI: 10.1109/MITADTSoCiCon60330.2024.10575438

Vikram Maskale, Vivekanand Vaidya, Yash Patil, Yogesh Bagal, Vidya Dhamdhre
Computer Engineering, G H Raisoni College of Engineering and Management, Pune, India

Abstract— The banking industry faces a constant challenge in retaining customers and mitigating customer churn. To address this issue, this study presents the design and implementation of an innovative application aimed at predicting and analyzing bank customer churning rates. Leveraging state-of-the-art machine learning algorithms, this application offers a comprehensive solution for identifying potential churners and devising effective retention strategies. The first phase of the project involves data collection and preprocessing, where historical customer data, including transaction history, demographics, and behavior patterns, is gathered and cleaned for analysis. Subsequently, various machine learning algorithms, including but not limited to logistic regression, random forests, and gradient boosting, are employed to build predictive models. These models are trained on the historical data and are fine-tuned to achieve optimal predictive accuracy.

Keywords—Churning Analysis, Prediction, Machine learning

I. INTRODUCTION
In the modern banking landscape, customer churn, or the rate at which customers discontinue their financial relationships with banks, has become a pressing concern. With increasing competition and evolving customer preferences, banks are continually challenged to retain their existing customer base [1]. Recognizing the need for proactive strategies to mitigate churn, this research paper focuses on the application of predictive analysis using machine learning algorithms as a powerful tool for addressing this critical issue.

The banking industry relies heavily on customer relationships, as long-term customers contribute significantly to a bank's profitability. In the contemporary business landscape [2], organizational managers increasingly acknowledge the crucial significance of customer retention. Consequently, understanding the drivers of churn and developing effective strategies to retain customers have become paramount for financial institutions. Data analysis involves thoroughly examining data to uncover valuable insights, utilizing techniques such as data mining and machine learning [2].

As banks strive to provide exceptional customer experiences and maintain their market share, the identification and understanding of customer churning patterns have become central to their operations. Identifying potential churners involves categorizing current customers based on similarities. Predicting customer churn is crucial, as retaining existing customers is often more cost-effective than acquiring new ones [3]. This research project embarks on the design and implementation of an innovative application that leverages cutting-edge machine learning algorithms for the prediction and analysis of bank customer churning rates.

The aim of this research project is to develop a robust and practical application that empowers banks to proactively address customer churn. Creating a predictive model to anticipate the future status of customers enables banks to receive early notifications, prompting timely adjustments in services or the introduction of new offerings tailored to each customer's needs [1]. This research paper seeks to contribute valuable insights and practical guidance to financial institutions aiming to harness the potential of machine learning for addressing customer churn.

II. LITERATURE REVIEW
Customer churn analysis in the banking sector is a prominent area of study, and several research efforts have shed light on effective strategies for predicting and mitigating churn. In [4], a commercial bank's consumer dataset comprising 28,382 customer records was analyzed. After rigorous data preprocessing, 5,260 valid records remained for in-depth analysis. The researchers explored the applicability of two Support Vector Machine (SVM) models: the linear SVM and the SVM with a radial basis kernel function. It was observed that, due to the inherent imbalance in the commercial bank client churn dataset, traditional SVM models struggled to provide accurate churn rate predictions, and common assessment metrics could not effectively evaluate model performance. In [3], the authors introduced a novel hybrid model, the logit leaf model, which merges decision trees and logistic regression. This approach aims to address the limitations of each algorithm by leveraging their respective strengths in a unified framework.

This research yielded valuable insights into addressing churn-related challenges. It became apparent that novel
approaches were needed to tackle the data imbalance issue.
Specifically, the utilization of undersampling techniques
significantly improved the predictive performance of the
classification models. Such techniques allowed the SVM
models to better capture the nuances of customer churn,
despite the dataset's skewed nature. Consequently, this
enhanced the overall predictive power of the models and
facilitated more meaningful insights, allowing banks to market
to customers in advance and in time to reduce the loss of bank
funds [5].
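
The effect of undersampling described above can be illustrated with a minimal sketch. It assumes plain random undersampling of the majority class, which may differ from the specific technique used in the cited work; the function name `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the
    smallest class. Returns new (rows, labels) lists."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the rarest class sets the target size for all classes.
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (made up): 8 non-churners (label 0) and 2 churners (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_undersample(rows, labels)
balanced_counts = Counter(bal_labels)
print(balanced_counts)
```

After the call, both classes contain two rows each. The trade-off is that majority-class information is discarded, which is why undersampling is usually applied to the training portion only.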
In another study focused on the banking sector [6], data mining was leveraged to extract valuable information from repositories.

The research examined the behavior of 1,866 customers, exploring how the usage of banking services and products influenced customer loyalty. The findings indicated that customers who engaged with more banking products and services exhibited higher levels of loyalty. This insight led to the recommendation that banks should concentrate their efforts on customers who utilized fewer than three products and tailor their offerings to meet their specific needs. The study employed deep learning techniques, specifically neural networks, for churn prediction using the Alyuda Neuro-Intelligence software package. This approach allowed for more precise churn predictions and, consequently, more targeted strategies for customer retention.

In a separate study [7], telecommunication operators tackled the challenge of predicting customer churn, leveraging machine learning on a big data platform. The dataset was provided by the Syriatel telecom company, making it particularly relevant to their operational context. The study deployed four distinct methodologies, including Decision Trees, Random Forest, Gradient Boosted Machine Trees, and XGBoost, to assess churn likelihood [8]. The Hortonworks Data Platform, coupled with Spark engines, was chosen for data analysis, function development, training, and software testing. To address the imbalanced class issue, the researchers applied oversampling to balance the dataset. Furthermore, hyper-parameter optimization via K-fold cross-validation enhanced the model's performance.

These studies collectively highlight the evolving landscape of customer churn analysis, offering new perspectives and techniques to address the inherent challenges in diverse sectors, ultimately improving customer retention strategies.

III. METHODOLOGY
In the bank customer data, prediction of customer churn with a classification algorithm alone becomes inefficient. In this methodology we have therefore chosen machine learning algorithms that provide better performance, and we have implemented suitable ensemble approaches for the classification models, which show better results. The approach consists of the steps shown in the block diagram in Fig. 1.

Fig. 1. Block Diagram

A. Dataset:
The dataset was obtained from Kaggle and comprises 10,000 records, each featuring 13 distinct attributes. These attributes can be categorized into three primary groups:

Demographic Information: This category includes data related to customers' demographics, such as their city of residence, age, and more.

Customer Bank Relationship: This category encompasses information pertaining to the customers' relationship with the bank, including attributes like the customer's net worth category (customer_nw category).

Personal Information: The dataset also contains personal details of the customers, such as their names and surnames.

The class distribution is imbalanced, which is a significant aspect of the dataset's composition.

B. Exploratory Data Analysis and Data Preprocessing:
The initial steps in data preprocessing involved addressing several crucial aspects of the dataset. These actions were necessary due to the presence of missing values and class imbalances that could potentially skew the analysis, as described in Table I. Additionally, irrelevant features were removed to streamline the dataset, and care was taken that the churner class would not be overwhelmed by the non-churner class.

To gain a deeper understanding of the dataset, Exploratory Data Analysis (EDA) was employed as a foundational approach, also highlighted in Table I. EDA is a technique that offers valuable insights into the data's characteristics. During EDA, various operations were performed, including handling missing values, normalizing data, and scaling the features.

In this process, the Python libraries and utility functions referenced in Table I were used to carry out the necessary data transformations. Furthermore, EDA featured a bivariate analysis, where relationships between pairs of variables were explored, as shown in Table I. This helped uncover correlations and redundancies between features, with techniques like scatter plots, histograms, and heatmaps utilized for visualization.

Additionally, feature selection played a pivotal role, as detailed in Table I. Only the most relevant attributes were chosen, effectively creating a subset of baseline features. Out of the initial 13 features, a refined set of 9 features was selected based on feature selection techniques. This streamlined feature set not only improved the efficiency of subsequent analyses but also enhanced the dataset's overall quality.
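
The bivariate step, in which correlations are used to spot redundant features, can be sketched as follows. This is only an illustration under assumptions: the column names, the toy values, the 0.9 threshold, and the helper `redundant_pairs` are hypothetical and are not taken from the paper's dataset.

```python
import pandas as pd


def redundant_pairs(df, threshold=0.9):
    """Return (col_a, col_b, r) for every pair of numeric columns whose
    absolute Pearson correlation is at least `threshold`."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Toy frame (made up): the third column is a rescaled copy of the second,
# so it carries no additional information.
toy = pd.DataFrame({
    "age": [25, 40, 31, 58, 46, 37],
    "balance": [1000.0, 52000.0, 800.0, 91000.0, 15000.0, 30500.0],
    "balance_in_thousands": [1.0, 52.0, 0.8, 91.0, 15.0, 30.5],
})

pairs = redundant_pairs(toy)
print(pairs)
```

Only the `balance` / `balance_in_thousands` pair is flagged here, so one of the two could be dropped during feature selection. The same correlation matrix is what a heatmap would visualize.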

TABLE I. PYTHON LIBRARIES USED FOR THE EDA AND PREPROCESSING THE DATA

Sr   Library / function   Purpose
1    Pandas               Data manipulation
2    describe()           To see the details of the data
3    Matplotlib           Used for interactive visuals
4    Seaborn              Plotting statistical visuals
5    StandardScaler       Used for feature scaling
6    get_dummies()        Used for one-hot encoding
7    NumPy                Used for mathematical functions
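
A minimal sketch of how the functions listed in Table I might be combined for preprocessing is given below. The toy frame, its column names, and the median fill for missing values are assumptions made for illustration; the paper does not state how missing values were filled.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data (made up) with one missing age and one categorical column.
raw = pd.DataFrame({
    "age": [25, 40, None, 58],
    "city": ["Pune", "Mumbai", "Pune", "Delhi"],
    "balance": [1000.0, 52000.0, 800.0, 91000.0],
    "churn": [0, 1, 0, 1],
})

# Summary statistics, as produced by describe().
print(raw.describe())

# Handle the missing value (assumption: fill with the column median).
clean = raw.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# One-hot encode the categorical column with get_dummies().
encoded = pd.get_dummies(clean, columns=["city"], dtype=int)

# Scale the numeric features to zero mean and unit variance.
num_cols = ["age", "balance"]
encoded[num_cols] = StandardScaler().fit_transform(encoded[num_cols])

print(encoded.head())
```

In a full pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that test data does not influence the scaling parameters.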

C. Model Selection:
The objective is to identify the most suitable algorithms to achieve the best predictive results for our model. To accomplish this, we employ a diverse range of machine learning algorithms, including logistic regression, decision trees, random forests, support vector machines, and the XGBoost algorithm.

The decision tree learning algorithm faces an issue where minor adjustments in the training data result in significant fluctuations in classification performance, rendering it an unstable algorithm. To address this challenge, Breiman introduced random forests [9], offering a solution that mitigates instability through a more robust and diversified approach.

The strategy involves conducting comprehensive experiments with these various algorithms to assess their individual performance and accuracy, employing the most relevant metrics and methodologies. We build and evaluate several models, including Random Forest, Decision Tree, K-Nearest Neighbors, Support Vector Machines, Logistic Regression, Gradient Boosting Classifier, and XGBoost.

Through this systematic approach, we aim to thoroughly evaluate how these algorithms perform when applied to our dataset [9]. This not only enables us to identify which algorithms yield the most reliable and precise results but also allows us to gain insights into the strengths and weaknesses of each approach, ultimately facilitating the selection of the most effective algorithm for customer churn prediction.

D. Model Training and Model Evaluation:
In the model training phase, we began by implementing a train-test data split using the Sklearn library. The dataset was divided into two portions: 70% of the data was allocated for training, while the remaining 30% was reserved for testing. This partitioning strategy was crucial to ensure the effectiveness of our machine learning models.

Subsequently, we constructed our machine learning models using the training data. Each model was then evaluated for its performance. The model accuracies and model precision are presented in Fig. 2 and Fig. 3.

Fig. 3. Model Precision

E. Model Deployment:
The chosen churn prediction model can be deployed in a production environment, which can include integration with the bank's customer management systems. Real-time or batch prediction can be implemented based on the bank's operational needs.

We create the GUI using Python packages such as tkinter and joblib.

IV. IMPLEMENTATION AND RESULT ANALYSIS
This comprehensive model analysis showcases the varying strengths and effectiveness of each algorithm in predicting and managing customer churn. Notably, XGBoost stands out with the highest accuracy, followed closely by SVC and Decision Tree. The precision metrics highlight the models' ability to minimize false positives, providing valuable insights for banks to make informed decisions in retaining their customer base.
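
The 70/30 split and model comparison described in the Model Training and Model Evaluation subsection can be sketched roughly as follows. This is an illustration under assumptions, not the authors' code: the data is synthetic, stratified splitting is an added choice, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example depends on scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank data: 1,000 rows, 9 features,
# roughly 20% positive (churn) labels.
X, y = make_classification(n_samples=1000, n_features=9, n_informative=5,
                           weights=[0.8, 0.2], random_state=42)

# 70% training, 30% testing; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit every model on the training split and score it on the test split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (accuracy_score(y_test, pred),
                     precision_score(y_test, pred, zero_division=0))

for name, (acc, prec) in results.items():
    print(f"{name}: accuracy={acc:.3f} precision={prec:.3f}")
```

The numbers printed for synthetic data say nothing about the results reported in this paper; the sketch only shows the shape of the evaluation loop.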
TABLE II. COMPARISON OF MODEL WITH EXISTING MODELS

Model/Method   Accuracy (%)   Precision (%)
XGB            85.87          83.26
LR [4]         82.4           66.45
DT [2]         85.10          NC
LDT [3]        81.77          44.12
RF [4]         85.2           65.12
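
For reference, the two metrics reported in Table II follow the standard definitions: accuracy is the share of all predictions that are correct, and precision is the share of predicted churners who actually churn. A minimal sketch with made-up labels (the helper names are hypothetical):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy_pct(tp, fp, tn, fn):
    """Accuracy = (TP + TN) / all predictions, as a percentage."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)


def precision_pct(tp, fp):
    """Precision = TP / (TP + FP), as a percentage; 0.0 if nothing was predicted positive."""
    return 100.0 * tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels: 1 = churner, 0 = non-churner.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy_pct(tp, fp, tn, fn)
prec = precision_pct(tp, fp)
print(acc, prec)  # 80.0 75.0
```

On an imbalanced dataset accuracy can stay high even when few churners are identified, which is why precision is reported alongside it.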

In the pursuit of enhancing customer churn prediction, our project implemented a diverse array of machine learning algorithms on a consistent platform. Comparing the results with those of a previous model [4], we observe variations in the performance of key models.

Fig. 2. Model accuracies
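
The deployment step mentions joblib, which is commonly used to save a trained model so that an application can load it later for batch or single-record prediction. The sketch below assumes that usage; the synthetic data, the random forest model, and the file name `churn_model.joblib` are placeholders, and the tkinter front end is omitted.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small placeholder model on synthetic data.
X, y = make_classification(n_samples=200, n_features=9, n_informative=5,
                           random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Persist the fitted model to disk and load it back, as an application would.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "churn_model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# Batch prediction for several customers at once.
batch_pred = restored.predict(X[:5])

# Single-record prediction: estimated probability of the positive class.
single_prob = float(restored.predict_proba(X[:1])[0, 1])

# The restored model should reproduce the original model's predictions.
same = bool((batch_pred == model.predict(X[:5])).all())
print(batch_pred, single_prob, same)
```

A GUI callback would typically call `restored.predict` or `restored.predict_proba` on values entered by the user, after applying the same preprocessing that was used during training.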

V. CONCLUSION
In the banking sector, prioritizing customer engagement and tackling potential churn is crucial. Machine learning emerges as a powerful tool for predicting and addressing customer attrition in the competitive financial landscape.

The methodology outlined for predictive modeling emphasizes a systematic approach, from data collection to model deployment, highlighting the importance of ethical considerations and data privacy.

The study, though based on a smaller dataset, effectively addresses challenges like class imbalance through oversampling techniques. The evaluation of machine learning classifiers pinpoints the superiority of the XGBoost algorithm, emphasizing the need for advanced techniques in customer retention.

Looking ahead, the banking sector should focus on integrating advanced technologies, personalizing customer experiences, and real-time monitoring. Maintaining ethical AI practices, strengthening the feedback loop, ensuring cross-channel consistency, and investing in staff training are vital for a dynamic approach to customer retention.

REFERENCES
[1] D. AL-Najjar, N. Al-Rousan, and H. AL-Najjar, "Machine Learning to Develop Credit Card Customer Churn Prediction," J. Theor. Appl. Electron. Commer. Res., vol. 17, pp. 1529–1542, 2022. https://doi.org/10.3390/jtaer17040077
[2] S. H. Dolatabadi and F. Keynia, "Designing of Customer and Employee Churn Prediction Model Based on Data Mining Method and Neural Predictor," The 2nd International Conference on Computer and Communication Systems, 2022.
[3] A. De Caigny, K. Coussement, and K. W. De Bock, "A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees," European Journal of Operational Research, vol. 269, no. 2, pp. 760–772, Sep. 2018.
[4] I. Kaur and J. Kaur, "Customer Churn Analysis and Prediction in Banking Industry using Machine Learning," 2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC).
[5] Y. Deng, D. Li, L. Yang, J. Tang, and J. Zhao, "Analysis and prediction of bank user churn based on ensemble learning algorithm," 2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA).
[6] K. Coussement, S. Lessmann, and G. Verstraeten, "A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication industry," Decision Support Systems, 2016.
[7] K. Mishra and R. Rani, "Churn prediction in telecommunication using machine learning," International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), Chennai, pp. 2252–2257, 2017.
[8] M. Rahman and V. Kumar, "Machine Learning Based Customer Churn Prediction in Banking," 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2020.
[9] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
