Report
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI - 600 119, MARCH 2022
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Gujjalapati
Yamini (38110178) and Dasam Meenakshi (38110678) who carried out the
project entitled “PREDICTION OF BANK CUSTOMER CHURN USING
MACHINE LEARNING TECHNIQUE” under my supervision from
December 2021 to May 2022.
Internal Guide
Dr. B.U. Anu Barathi M.E., Ph.D.
DECLARATION
We, Gujjalapati Yamini and Dasam Meenakshi, hereby declare that the Project
Report entitled “PREDICTION OF BANK CUSTOMER CHURN USING
MACHINE LEARNING TECHNIQUE”, done by us under the guidance of Dr.
B. U. Anu Barathi, M.E., Ph.D., is submitted in partial fulfillment of the requirements for
the award of the Bachelor of Engineering Degree in Computer Science and
Engineering, 2018-2022.
DATE:
G.Yamini
D. Meenaakshi
ACKNOWLEDGEMENT
I would like to express my sincere and deep sense of gratitude to my Project
Guide, Dr. B. U. Anu Barathi, M.E., Ph.D., whose valuable guidance,
suggestions and constant encouragement paved the way for the successful
completion of my project work.
I wish to express my thanks to all teaching and non-teaching staff members of the
Department of Computer Science and Engineering who were helpful in
many ways for the completion of the project.
Abstract
Nowadays there are many service providers in every business, and customers have no
shortage of options. In the banking sector in particular, people who want to keep their
money safe have plenty of choices. As a result, customer churn and customer loyalty
have become a major problem for most banks. In this paper, a method that predicts
customer churn in banking using machine learning with an ANN is proposed. This
research promotes the exploration of the likelihood of churn based on customer loyalty.
The number of service providers is increasing rapidly in every business. These
days, there is no shortage of options for customers in the banking sector when
choosing where to put their money. As a result, customer churn and engagement
have become one of the top issues for most banks. In this project, a method to
predict customer churn in a bank using machine learning techniques, a branch of
artificial intelligence, is proposed. The research promotes the exploration of the
likelihood of churn by analyzing customer behavior. Customer churn has become a
major problem in all industries, including banking, and banks have always tried to
track customer interaction so that they can detect the customers who are likely to
leave the bank.
TABLE OF CONTENTS

ABSTRACT 5
TABLE OF CONTENTS 6
LIST OF FIGURES 7
LIST OF SYMBOLS 7
01 INTRODUCTION 11
02 LITERATURE SURVEY 15
03 METHODOLOGY 18
3.1 OBJECTIVE 20
3.2 LIST OF MODULES 21
3.3 SYSTEM ARCHITECTURE 23
04 RESULT AND DISCUSSION, PERFORMANCE ANALYSIS 25
4.1 FEATURES 48
4.2 CODE 61
05 CONCLUSION AND FUTURE WORK 85

LIST OF FIGURES

01 SYSTEM ARCHITECTURE 22
02 WORKFLOW DIAGRAM 23
03 ER-DIAGRAM 25
04 MODULE DIAGRAM 29
1. INTRODUCTION
Churn means a customer leaving one company and transferring to another. It causes
not only a loss of income but also other negative effects on operations. Customer
relationship management is therefore very important in banking: by establishing
long-term relationships with customers, banks can grow their customer base. The
service provider's challenge lies in customer behavior and customer expectations. The
current generation is, on the whole, better educated than previous generations, so
customers expect more from policies and have diverse demands for connectivity and
innovation. This advanced knowledge is changing purchase behavior, and it is a big
challenge for today's service providers to think innovatively enough to meet these
expectations.
Private-sector banks need to recognize their customers. Liu and Shih strengthen this
argument by indicating that companies face increasing pressure to develop new and
innovative marketing ideas in order to meet customer expectations and increase
loyalty and retention. For customers, it is very easy to transfer their relationship from
one bank to another. Some customers keep their relationship dormant by leaving
their account inactive, and an inactive account may mean the customer is moving
their relationship to another bank. Banks serve different types of customers. Farmers
are one of the major customer groups; they expect lower monthly charges because
they are financially constrained. Business people are another major and important
group, because most large transactions are usually made by them, and they expect
better service quality. One of the most important categories is middle-class
customers, who in most banks outnumber the other customer types; they expect
lower monthly charges, better service quality, and new policies.
So, maintaining these different types of customers is not easy. Banks need to consider
customers and their needs, delivering reliable service on time and within budget, and
maintaining a good working partnership with them is another significant challenge. If
they fail to resolve these challenges, customers may churn. Recruiting a new customer
is more expensive and harder than keeping an existing one; customer retention, on the
other hand, is usually cheaper because the bank has already gained the confidence
and loyalty of its present customers. So a system that can predict customer churn
effectively at an early stage is very important for any bank. This paper aims at a
framework that can predict customer churn in the banking sector using machine
learning algorithms with an ANN.
Existing System:
Disadvantages:
MACHINE LEARNING
Machine learning is about predicting the future from past data. Machine learning
(ML) is a type of artificial intelligence (AI) that gives computers the ability to learn
without being explicitly programmed. It focuses on the development of computer
programs that can change when exposed to new data; this report covers the basics of
machine learning and the implementation of simple machine learning algorithms in
Python. The process of training and prediction involves specialized algorithms:
training data is fed to an algorithm, and the algorithm uses this training data to make
predictions on new test data. Machine learning can be roughly separated into three
categories: supervised learning, unsupervised learning and reinforcement learning.
In supervised learning the program is given both the input data and the corresponding
labels, and the data has to be labeled by a human being beforehand. Unsupervised
learning has no labels; only the input data is provided to the learning algorithm, which
has to figure out the clustering of the input data on its own. Finally, reinforcement
learning dynamically interacts with its environment and receives positive or negative
feedback to improve its performance.
Data scientists use many different kinds of machine learning algorithms to
discover patterns in data that lead to actionable insights. At a high level, these
algorithms can be classified into two groups based on the way they “learn” about data
to make predictions: supervised and unsupervised learning. Classification is the
process of predicting the class of given data points; classes are sometimes called
targets, labels or categories. Classification predictive modeling is the task of
approximating a mapping function from input variables (X) to discrete output
variables (y). In machine learning and statistics, classification is a supervised learning
approach in which the computer program learns from the data input given to it and
then uses this learning to classify new observations. The data set may simply be
bi-class (for example, identifying whether a person is male or female, or whether a
mail is spam or not) or it may be multi-class. Some examples of classification
problems are speech recognition, handwriting recognition, biometric identification
and document classification.
The majority of practical machine learning uses supervised learning. In supervised
learning you have input variables (X) and an output variable (y), and you use an
algorithm to learn the mapping function from the input to the output, y = f(X). The
goal is to approximate the mapping function so well that when you have new input
data (X) you can predict the output variable (y) for that data. Techniques of
supervised machine learning include logistic regression, multi-class classification,
decision trees and support vector machines. Supervised learning requires that the
data used to train the algorithm is already labeled with the correct answers.
Supervised learning problems can be further grouped into classification problems,
whose goal is the construction of a succinct model that can predict the value of the
dependent attribute from the attribute variables. The difference between classification
and regression is that the dependent attribute is numerical for regression and
categorical for classification. A classification model attempts to draw conclusions
from observed values: given one or more inputs, it tries to predict the value of one or
more outcomes. A classification problem is one where the output variable is a
category, such as “red” or “blue”.
This dataset contains 10000 records of features extracted from bank customer
data, which were classified into 2 classes (a quick inspection sketch follows the list):
● Exit
● Not Exit
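As a quick illustration, the class balance can be inspected with a few lines of pandas (a minimal sketch; the file name bankchurn.csv and the Exited target column follow the coding modules later in this report):

import pandas as pd

# Load the bank customer dataset (file name as used in the coding modules)
data = pd.read_csv("bankchurn.csv")

# The 'Exited' column encodes the two classes: 1 = Exit (churned), 0 = Not Exit
print(data.shape)                                    # expected: 10000 records
print(data["Exited"].value_counts())                 # churned vs. retained counts
print(data["Exited"].value_counts(normalize=True))   # churn rate as a proportion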
Proposed System:
The proposed method is to build a bank customer churn prediction model using
machine learning techniques. To develop an AI-based model, we need data to train it;
here, a bank customer dataset is used for training. To use this dataset, we need to
understand the intents that we are going to train on. An intent is the intention of the
user interacting with a predictive model, or the intention behind each data record that
the model receives from a particular user. Depending on the domain for which you
are developing an AI solution, these intents may vary from one solution to another.
The strategy is to define different intents, make training samples for those intents,
and train the AI model with those samples as training data and the intents as training
categories. The model is built using vectorisation, where vectors are made to
represent the data. By trying different algorithms we can obtain a better AI model and
the best accuracy. After building a model, we evaluate it using different metrics such
as the confusion matrix, precision, recall, sensitivity and F1 score.
2. Literature Survey
A literature review is a body of text that aims to review the critical points of
current knowledge on, and/or methodological approaches to, a particular topic. It is a
secondary source that discusses published information in a particular subject area,
sometimes within a certain time period. Its ultimate goal is to bring the reader up to
date with the current literature on a topic, and it forms the basis for another goal, such
as future research that may be needed in the area; it precedes a research proposal and
may be just a simple summary of sources. Usually it has an organizational pattern and
combines both summary and synthesis.
A summary is a recap of the important information in a source, whereas a
synthesis is a reorganization, a reshuffling, of that information. It might give a new
interpretation of old material, combine new with old interpretations, or trace the
intellectual progression of the field, including major debates. Depending on the
situation, the literature review may evaluate the sources and advise the reader on the
most pertinent or relevant of them.
Analysis is needed to determine whether customers are at risk of leaving or are worth
retaining. From an organizational point of view, gaining new customers is usually
more difficult and more expensive than retaining existing ones, so customer churn
prediction has become popular in the banking industry. By reducing customer churn
or attrition, commercial banks not only gain more profit but also enhance their core
competitiveness among competitors. Although researchers have proposed many
single prediction models and some hybrid models, accuracy is still weak and the
computation time of some algorithms is still high. In this research, a churn prediction
model for classifying bank customers is built using a hybrid of k-means and Support
Vector Machine data mining methods on a bank customer churn dataset, to overcome
the instability and limitations of single prediction models and to predict the churn
trend of high-value users.
Developing a prediction model for customer churn from electronic banking services
using data mining (Abbas Keramati, Hajar Ghaneei and Seyed Mohammad
Mirmohammadi, 2016). Given the importance of customers as the most valuable
assets of organizations, customer retention is an essential, basic requirement for any
organization, and banks are no exception to this rule. The competitive atmosphere
within which electronic banking services are provided by different banks increases
the necessity of customer retention. Methods: Based on existing information
technologies that allow one to collect data from organizations' databases, data mining
provides a powerful tool for extracting knowledge from huge amounts of data. In this
research, the decision tree technique was applied to build a model incorporating this
knowledge. Results: The results describe the characteristics of churned customers.
Conclusions: Bank managers can identify future churners using the results of the
decision tree, and should provide targeted strategies for customers whose features are
becoming more similar to those of churners.
A Critical Examination of Different Models for Customer Churn Prediction using Data
Mining (Seema and Gaurav Gupta, 2019). Due to competition between online retailers,
the need to provide improved customer service has grown rapidly. In addition to the
reduction in sales due to the loss of customers, more investment is needed to attract
new customers. Companies are therefore working continuously to improve their
perceived quality by giving timely, high-quality service to their customers. Customer
churn has become one of the primary challenges that many firms face nowadays.
Several churn prediction models and techniques have previously been proposed in the
literature to predict customer churn in areas such as finance, telecom and banking.
Researchers are also working on customer churn prediction in e-commerce using data
mining and machine learning techniques. In this paper, a comprehensive review of
various models for predicting customer churn in e-commerce with data mining and
machine learning techniques is presented. A critical review of recent research papers
in the field of customer churn prediction in e-commerce using data mining has been
carried out. Thereafter, important inferences and research gaps identified while
studying the literature are presented, and finally the research significance and
concluding remarks are described.
Bank Customer Retention Prediction and Customer Ranking Based on Deep Neural
Networks (Dr. A. P. Jagadeesan, Ph.D., 2020). Retention of customers is a major
concern in any industry, and customer churn is an important metric that gives the hard
truth about the retention percentage of customers. A detailed study of the existing
models for predicting customer churn is made, and a new model based on an artificial
neural network is proposed to find customer churn in the banking domain. The
proposed model is compared with existing machine learning models. Logistic
regression, decision tree and random forest mechanisms are the baseline models used
for comparison; the performance metrics compared are accuracy, precision, recall and
F1 score. It has been observed that the artificial neural network model performs better
than the logistic regression and decision tree models, but when the results are
compared with the random forest model no considerable difference is noted. The
proposed model differs from existing models in that it can rank the customers in the
order in which they would leave the organization.
3. Methodology
This section explains the various works that have been done in order to
predict customer churn. It covers machine learning models. In addition to the
conventional data used for predicting customer churn, some authors have added data
from various other sources, including customers' phone conversations, the websites
and products the customer has viewed, interactive voice data and other financial data.
A binary classification model is used for predicting customer churn. Although a good
improvement is noticed with this model, the data used is not commonly available at
all times. Churn prediction is a binary classification problem; the authors note that,
from the studies, there is no proper means of measuring the certainty of the classifier
employed for churn prediction, and that the accuracy of the classifiers differs for
different zones of the dataset.
Project Goals
● Based on the best accuracy
Objectives
The goal is to develop a machine learning model for bank churn prediction that
compares several supervised classification algorithms, reports the one with the best
accuracy, and can potentially replace or update existing supervised classification
models.
Feasibility study:
Data Wrangling
In this section of the report we load the data, check it for cleanliness, and then
trim and clean the given dataset for analysis, documenting the steps carefully and
justifying each cleaning decision.
Data collection
The data set collected for prediction is split into a training set and a test set,
generally in a 7:3 ratio. The data model is created by applying different algorithms to
the training set, and predictions on the test set are then evaluated for accuracy.
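A minimal sketch of this 7:3 split with scikit-learn (the column names follow the dataset used later in this report; the stratify option is an added assumption that keeps the churn ratio similar in both sets):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("bankchurn.csv").dropna()
X = df.drop(labels="Exited", axis=1)   # predictor variables
y = df["Exited"]                       # target: 1 = churned, 0 = retained

# 70% training data, 30% test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
print(X_train.shape, X_test.shape)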
Preprocessing
The collected data might contain missing values, which can lead to
inconsistency. To obtain better results, the data needs to be preprocessed so as to
improve the efficiency of the algorithm. Outliers have to be removed and variable
conversion needs to be done.
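A hedged sketch of these two steps: missing values are dropped as in the coding modules below, while the 1.5*IQR rule applied to the Age column is only an illustrative choice of outlier filter:

import pandas as pd

data = pd.read_csv("bankchurn.csv")

# Check for missing values, then drop incomplete rows
print(data.isnull().sum())
df = data.dropna()

# Illustrative outlier removal on one numeric column using the 1.5*IQR rule
q1, q3 = df["Age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["Age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(df.shape)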
List of Modules:
Data Pre-processing
Functional Requirements:
The functional requirements specify the requirements for the software product. This
is the first step in the requirements analysis process, and it lists the requirements of
the particular software system. The implementation relies on special libraries such as
scikit-learn, pandas, NumPy, Matplotlib and Seaborn.
Non-Functional Requirements:
1. Problem definition
2. Preparing data
3. Evaluating algorithms
4. Improving results
5. Predicting the result
Environmental Requirements:
1. Software Requirements:
Workflow Diagram
Use case diagrams are used for high-level requirement analysis of a system: when the
requirements of a system are analyzed, the functionalities are captured in use cases.
Use cases are, in essence, the system functionalities written down in an organized
manner.
Class Diagram
A class diagram is basically a graphical representation of the static view of the system
and represents different aspects of the application, so a collection of class diagrams
represents the whole system. The name of the class diagram should be meaningful and
describe the aspect of the system. Each element and its relationships should be
identified in advance, and the responsibility (attributes and methods) of each class
should be clearly identified. For each class, the minimum number of properties should
be specified, because unnecessary properties will make the diagram complicated. Use
notes whenever required to describe some aspect of the diagram; at the end of the
drawing it should be understandable to the developer/coder. Finally, before making
the final version, the diagram should be drawn on plain paper and reworked as many
times as possible to make it correct.
Even after the relational database is rolled out, an ERD can still serve as a reference
point, should any debugging or business process re-engineering be needed later.
Data Pre-processing
Validation techniques in machine learning are used to estimate the error rate of the
machine learning (ML) model, which can be considered close to the true error rate on the
dataset. If the data volume is large enough to be representative of the population, you may
not need validation techniques. However, in real-world scenarios we work with samples of
data that may not be truly representative of the population of the given dataset.
Preprocessing also involves finding missing values and duplicate values and describing the
data types, i.e. whether a variable is a float or an integer. The validation sample is used to
provide an unbiased evaluation of a model fit on the training dataset.
The evaluation becomes more biased as skill on the validation dataset is incorporated
into the model configuration. The validation set is used to evaluate a given model frequently,
and machine learning engineers use this data to fine-tune the model hyperparameters. Data
collection, data analysis, and the process of addressing data content, quality, and structure can
add up to a time-consuming to-do list. During the process of data identification, it helps to
understand your data and its properties; this knowledge will help you choose which algorithm
to use to build your model.
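A minimal sketch of carving a validation set out of the training data for hyper-parameter tuning; the 60/20/20 proportions, the random forest model and the tuned parameter values are illustrative assumptions, not part of the original pipeline:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("bankchurn.csv").dropna()
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])
for col in ("Geography", "Gender"):
    df[col] = LabelEncoder().fit_transform(df[col])
X, y = df.drop(columns="Exited"), df["Exited"]

# Hold out a test set, then carve a validation set out of the remaining data
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=1, stratify=y_trainval)

# Tune one hyper-parameter against the validation set, never against the test set
best_n, best_acc = None, 0.0
for n_trees in (50, 100, 200):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=1).fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_n, best_acc = n_trees, acc
print("Best n_estimators on the validation set:", best_n, round(best_acc, 3))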
A number of different data cleaning tasks can be done using Python's pandas library;
here the focus is on probably the biggest data cleaning task, missing values, so that the data
can be cleaned more quickly and less time is spent cleaning and more time exploring and
modeling.
Some missing values are just simple random mistakes; other times there is a deeper
reason why data is missing. It is important to understand these different types of missing data
from a statistical point of view, because the type of missing data influences how the missing
values should be filled in. We therefore detect missing values, do some basic imputation, and
take a more detailed statistical approach when needed. Before jumping into code, it is
important to understand the sources of missing data. Here are some typical reasons why data
is missing:
● Data was lost while transferring manually from a legacy database.
● Users chose not to fill out a field tied to their beliefs about how the results would be used
or interpreted.
import libraries for access and functional purpose and read the given dataset
show columns
Checking count values of data frame
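The steps named in these captions can be reproduced with a few pandas calls (a minimal sketch using the file name from the coding modules):

import pandas as pd

data = pd.read_csv("bankchurn.csv")   # read the given dataset
print(data.columns)                   # show columns
print(data.count())                   # count of non-null values per column
print(data.dtypes)                    # data type of each column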
Exploratory data analysis and visualization
Visualization can be helpful when exploring and getting to know a dataset, and can help with
identifying patterns, corrupt data, outliers, and much more. With a little domain knowledge,
data visualizations can be used to express and demonstrate key relationships in plots and
charts that are more visceral to stakeholders than measures of association or significance.
Data visualization and exploratory data analysis are whole fields in themselves, and a deeper
dive into some of the books on the subject is recommended.
Sometimes data does not make sense until you look at it in visual form, such as
charts and plots. Being able to quickly visualize data samples is an important skill both in
applied statistics and in applied machine learning. This section covers the many types of plots
you need to know when visualizing data in Python and how to use them to better understand
your own data, for example charting time series data with line plots and categorical quantities
with bar charts.
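A short sketch of the two plot types just mentioned, applied to the churn dataset; the grouping columns are illustrative choices, and age is used as the ordered variable for the line plot since the dataset has no explicit time column:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("bankchurn.csv").dropna()

# Bar chart of a categorical quantity: churned customers per geography
df.groupby("Geography")["Exited"].sum().plot(kind="bar", title="Churned customers by geography")
plt.show()

# Line plot over an ordered variable: average balance by customer age
df.groupby("Age")["Balance"].mean().plot(kind="line", title="Average balance by age")
plt.show()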
MODULE DIAGRAM
visualized data
Pre-processing refers to the transformations applied to the data before feeding it to the
algorithm. Data preprocessing is a technique used to convert raw data into a clean data set; in
other words, whenever data is gathered from different sources it is collected in a raw format
that is not feasible for analysis. To achieve better results from the applied machine learning
model, the data has to be in a proper format. Some machine learning models need information
in a specified format; for example, the random forest algorithm does not support null values,
so to execute the random forest algorithm the null values have to be managed in the original
raw data set. Another aspect is that the data set should be formatted in such a way that more
than one machine learning or deep learning algorithm can be executed on it.
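A minimal sketch of putting the raw data into the form those models expect, mirroring the coding modules below: rows with null values are dropped (random forest does not accept them), identifier columns are removed, and the two categorical columns are label-encoded:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("bankchurn.csv").dropna()                     # remove rows with null values
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])   # identifiers carry no signal

le = LabelEncoder()
for col in ("Geography", "Gender"):
    df[col] = le.fit_transform(df[col]).astype(int)            # categorical text -> integers

print(df.dtypes)   # every column is now numeric and model-ready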
False Positives (FP): a person who will pay is predicted as a defaulter; the actual class is no
and the predicted class is yes. For example, the actual class says this passenger did not
survive, but the predicted class tells you that this passenger will survive.
False Negatives (FN): a person who will default is predicted as a payer; the actual class is yes
but the predicted class is no. For example, the actual class value indicates that this passenger
survived, but the predicted class tells you that the passenger will die.
True Positives (TP): a person who will not pay is correctly predicted as a defaulter. These are
the correctly predicted positive values: the value of the actual class is yes and the value of the
predicted class is also yes. For example, the actual class value indicates that this passenger
survived and the predicted class tells you the same thing.
True Negatives (TN): a person who will pay is correctly predicted as a payer. These are the
correctly predicted negative values: the value of the actual class is no and the value of the
predicted class is also no. For example, the actual class says this passenger did not survive
and the predicted class tells you the same thing.
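These four counts can be read directly off scikit-learn's confusion matrix; note that sklearn lays the matrix out as [[TN, FP], [FN, TP]]. A small self-contained sketch with toy labels:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # actual classes (1 = positive)
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]   # predicted classes

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)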
The next section shows exactly how this can be done in Python with scikit-learn.
The key to a fair comparison of machine learning algorithms is ensuring that each algorithm
is evaluated in the same way on the same data; this can be achieved by forcing each algorithm
to be evaluated on a consistent test harness.
Logistic Regression
Random Forest
Naive Bayes
The K-fold cross-validation procedure is used to evaluate each algorithm, importantly
configured with the same random seed to ensure that the same splits of the training data are
performed and that each algorithm is evaluated in precisely the same way. Before comparing
algorithms, a machine learning model is built using the scikit-learn library. From this package
we use preprocessing, the linear model with the logistic regression method, cross-validation
with the KFold method, the ensemble module with the random forest method, and the tree
module with the decision tree classifier. Additionally, the data is split into a train set and a
test set, and the result is predicted by comparing accuracy.
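A condensed sketch of such a test harness: every model is scored with the same KFold splits, so the comparison is fair (the fold count and seed are illustrative):

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("bankchurn.csv").dropna().drop(columns=["RowNumber", "CustomerId", "Surname"])
for col in ("Geography", "Gender"):
    df[col] = LabelEncoder().fit_transform(df[col])
X, y = df.drop(columns="Exited"), df["Exited"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
}
cv = KFold(n_splits=10, shuffle=True, random_state=7)   # identical splits for every algorithm
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")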
Accuracy: the proportion of the total number of predictions that are correct; in other words,
how often the model correctly predicts defaulters and non-defaulters overall.
Accuracy calculation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy is the most intuitive performance measure, and it is simply the ratio of correctly
predicted observations to the total observations. One may think that if we have high accuracy
then our model is the best. Accuracy is a great measure, but only when you have symmetric
datasets where the numbers of false positives and false negatives are almost the same.
Precision is the ratio of correctly predicted positive observations to the total predicted
positive observations. The question this metric answers is: of all passengers labeled as
survived, how many actually survived? High precision relates to a low false-positive rate.
We obtained a precision of 0.788, which is pretty good.
Recall: the proportion of positive observed values that are correctly predicted (the proportion
of actual defaulters that the model correctly predicts).
Recall (sensitivity) is the ratio of correctly predicted positive observations to all observations
in the actual class "yes".
F1 score is the weighted average of precision and recall; therefore this score takes both false
positives and false negatives into account. Intuitively it is not as easy to understand as
accuracy, but F1 is usually more useful than accuracy, especially if you have an uneven class
distribution. Accuracy works best if false positives and false negatives have a similar cost; if
the cost of false positives and false negatives is very different, it is better to look at both
precision and recall.
General Formula:
F-measure = 2TP / (2TP + FP + FN)
F1 Score Formula:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
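These metrics do not need to be computed by hand; a brief sketch with toy labels shows the scikit-learn helpers used throughout this report:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of precision and recall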
Algorithm Explanation
sklearn:
● In Python, sklearn (scikit-learn) is a machine learning package which includes a lot of ML
algorithms.
● Here, we use some of its modules such as train_test_split, DecisionTreeClassifier,
LogisticRegression and accuracy_score.
NumPy:
● It is a numeric Python module which provides fast maths functions for calculations.
● It is used to read data into NumPy arrays and for manipulation purposes.
Pandas:
● Used to read and write different files.
● Data manipulation can be done easily with data frames.
Matplotlib:
● Data visualization is a useful way to help with identifying patterns in a given dataset.
● Plots and charts of the data can be produced easily.
Logistic Regression
Logistic regression is a statistical method for analysing a data set in which there are one or
more independent variables that determine an outcome. The outcome is measured with a
dichotomous variable (one with only two possible outcomes). The goal of logistic regression
is to find the best-fitting model to describe the relationship between the dichotomous
characteristic of interest (the dependent variable, i.e. the response or outcome variable) and a
set of independent (predictor or explanatory) variables. Logistic regression is a machine
learning classification algorithm that is used to predict the probability of a categorical
dependent variable. In logistic regression, the dependent variable is a binary variable that
contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
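The core of the method is a linear score squashed through the sigmoid function into a probability between 0 and 1. A tiny sketch with purely hypothetical weights and feature values:

import numpy as np

def sigmoid(z):
    # maps any real-valued score to the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -0.5])   # learned weights (hypothetical)
b = -0.2                    # intercept (hypothetical)
x = np.array([1.2, 0.3])    # one customer's standardised feature values (hypothetical)

p_churn = sigmoid(w @ x + b)
print("Predicted probability of churn:", round(float(p_churn), 3))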
Logistic regression assumptions:
For a binary regression, the factor level 1 of the dependent variable should represent the
desired outcome.
The independent variables should be independent of each other; that is, the model should
have little or no multicollinearity.
MODULE DIAGRAM
getting accuracy
Random forests, or random decision forests, are an ensemble learning method for
classification, regression and other tasks that operates by constructing a multitude of decision
trees at training time and outputting the class that is the mode of the classes (classification) or
the mean prediction (regression) of the individual trees. Random decision forests correct for
decision trees' habit of over-fitting to their training set. Random forest is a type of supervised
machine learning algorithm based on ensemble learning. Ensemble learning is a type of
learning where you join different types of algorithms, or the same algorithm multiple times, to
form a more powerful prediction model. The random forest algorithm combines multiple
algorithms of the same type, i.e. multiple decision trees, resulting in a forest of trees, hence the
name "Random Forest". The random forest algorithm can be used for both regression and
classification tasks.
The following are the basic steps involved in performing the random forest algorithm:
Choose the number of trees you want in your algorithm and repeat steps 1 and 2.
In the case of a regression problem, for a new record each tree in the forest predicts a value
for Y (the output), and the final value is calculated by taking the average of all the values
predicted by all the trees in the forest. In the case of a classification problem, each tree in the
forest predicts the category to which the new record belongs, and the new record is finally
assigned to the category that wins the majority vote.
MODULE DIAGRAM
getting accuracy
Decision Tree Classifier
The decision tree is one of the most powerful and popular algorithms. Decision-tree
algorithms fall under the category of supervised learning algorithms, and they work for both
continuous and categorical output variables. Assumptions of the decision tree:
Attributes are assumed to be categorical for information gain; for the Gini index, attributes
are assumed to be continuous.
A decision tree builds classification or regression models in the form of a tree structure.
It breaks a data set down into smaller and smaller subsets while, at the same time, an
associated decision tree is incrementally developed. A decision node has two or more
branches, and a leaf node represents a classification or decision. The topmost decision node in
a tree, which corresponds to the best predictor, is called the root node. Decision trees can
handle both categorical and numerical data. A decision tree utilizes an if-then rule set which
is mutually exclusive and exhaustive for classification. The rules are learned sequentially
using the training data, one at a time; each time a rule is learned, the tuples covered by the
rule are removed. This process is continued on the training set until a termination condition is
met. The tree is constructed in a top-down, recursive, divide-and-conquer manner. All the
attributes should be categorical; otherwise, they should be discretized in advance. Attributes
at the top of the tree have more impact on the classification, and they are identified using the
information gain concept. A decision tree can easily be over-fitted, generating too many
branches, and may reflect anomalies due to noise or outliers.
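A small sketch of the if-then structure a trained tree learns; it uses synthetic toy data in place of the bank dataset, and export_text prints the learned rules:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic binary data standing in for churn / no-churn
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[f"feature_{i}" for i in range(4)]))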
MODULE DIAGRAM
GIVEN INPUT EXPECTED OUTPUT
getting accuracy
The Naive Bayes algorithm is an intuitive method that uses the probabilities of each
attribute belonging to each class to make a prediction. It is the supervised learning
approach you would come up with if you wanted to model a predictive modeling
problem probabilistically. Naive Bayes simplifies the calculation of probabilities by
assuming that the probability of each attribute belonging to a given class value is
independent of all other attributes. This is a strong assumption but results in a fast and
effective method. The probability of a class value given a value of an attribute is called
the conditional probability. By multiplying the conditional probabilities together for
each attribute for a given class value, we obtain the probability of a data instance
belonging to that class. To make a prediction we can calculate probabilities of the
instance belonging to each class and select the class value with the highest probability.
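A tiny worked sketch of that idea, multiplying per-attribute conditional probabilities; every number here is made up purely for illustration:

# Hypothetical probabilities estimated from training data
p_churn = 0.2                    # prior P(churn)
p_stay = 0.8                     # prior P(stay)
p_inactive_given_churn = 0.7     # P(inactive member | churn)
p_inactive_given_stay = 0.3      # P(inactive member | stay)
p_germany_given_churn = 0.4      # P(geography = Germany | churn)
p_germany_given_stay = 0.2       # P(geography = Germany | stay)

# Naive Bayes: multiply the class prior by each attribute's conditional probability
score_churn = p_churn * p_inactive_given_churn * p_germany_given_churn
score_stay = p_stay * p_inactive_given_stay * p_germany_given_stay

# Predict the class with the higher (unnormalised) score
prediction = "churn" if score_churn > score_stay else "stay"
print(score_churn, score_stay, "->", prediction)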
Naive Bayes is a statistical classification technique based on Bayes' theorem. It is one
of the simplest supervised learning algorithms, and it is an accurate and reliable
algorithm: Naive Bayes classifiers have high accuracy and speed on large datasets.
A Naive Bayes classifier assumes that the effect of a particular feature in a class is
independent of other features. For example, a loan applicant may be judged desirable
or not depending on his/her income, previous loan and transaction history, age, and
location. Even if these features are interdependent, they are still considered
independently.
MODULE DIAGRAM
getting accuracy
Deployment
Flask is a micro web framework written in Python. It is classified as a micro-framework
because it does not require particular tools or libraries.
It has no database abstraction layer, form validation, or any other components where
pre-existing third-party libraries provide common functions.
However, Flask supports extensions that can add application features as if they were
implemented in Flask itself.
When Armin Ronacher and Georg Brandl created a bulletin board system written in Python,
the Pocoo projects Werkzeug and Jinja were developed.
In April 2016, the Pocoo team was disbanded and development of Flask and related
libraries passed to the newly formed Pallets project.
Flask has become popular among Python enthusiasts. As of October 2020, it had the
second-most stars on GitHub among Python web-development frameworks, only slightly
behind Django, and it was voted the most popular web framework in the Python Developers
Survey 2018.
The micro-framework Flask is part of the Pallets Projects and is based on several of the
others. Flask is based on Werkzeug and Jinja2, is inspired by the Sinatra Ruby framework,
and is available under the BSD licence. It was developed at Pocoo by Armin Ronacher.
Although Flask is rather young compared to most Python frameworks, it holds great promise
and has already gained popularity among Python web developers. Let's take a closer look at
Flask, the so-called "micro" framework for Python.
MODULE DIAGRAM
predicting output
FEATURES:
Flask was designed to be easy to use and extend. The idea behind Flask is to build a
solid foundation for web applications of different complexity. From then on you are free to
plug in any extensions you think you need. Also you are free to build your own modules.
Flask is great for all kinds of projects. It's especially good for prototyping. Flask depends on
two external libraries: the Jinja2 template engine and the Werkzeug WSGI toolkit.
43
Still, the question remains: why use Flask as your web application framework when we
have the immensely powerful Django, Pyramid, and the web mega-framework TurboGears?
Those are supreme Python web frameworks, but out-of-the-box Flask is pretty impressive too.
Plus, Flask gives you much more control over the development stage of your project. It
follows the principles of minimalism and lets you decide how you will build your application.
● Flask has a lightweight and modular design, so it is easy to transform it into the web
framework you need with a few extensions, without weighing it down.
● ORM-agnostic: you can plug in your favourite ORM, e.g. SQLAlchemy.
● The basic foundation API is nicely shaped and coherent.
● The Flask documentation is comprehensive, full of examples and well structured. You can
even try out some sample applications to really get a feel for Flask.
● It is super easy to deploy Flask in production (Flask is 100% WSGI 1.0 compliant).
● HTTP request handling functionality.
● High flexibility.
The configuration is even more flexible than that of Django, giving you plenty of solutions
for every production need.
To sum up, Flask is one of the most polished and feature-rich micro frameworks
available. Although still young, Flask has a thriving community, first-class extensions, and an
elegant API. Flask comes with all the benefits of fast templates, strong WSGI features,
thorough unit testability at the web application and library level, and extensive documentation.
So next time you are starting a new project where you need some good features and a vast
number of extensions, definitely check out Flask.
Overview of the Python Flask framework: web apps are developed to generate content
based on retrieved data that changes according to a user's interaction with the site. The server
is responsible for querying, retrieving, and updating data; this makes web applications slower
and more complicated to deploy than static websites for simple applications.
Flask is an excellent web development framework for REST API creation. It is built on top of
Python, which makes it possible to use the full range of Python features.
Flask is used for the backend, but it makes use of a templating language called Jinja2, which
is used to create HTML, XML or other markup formats that are returned to the user via an
HTTP request.
Django is considered more popular because it provides many out-of-the-box features and
reduces the time needed to build complex applications. Flask is a good start if you are getting
into web development. Flask is a simple, un-opinionated framework; it does not decide what
your application should look like, developers do.
Flask is a web framework. This means flask provides you with tools, libraries and
technologies that allow you to build a web application. This web application can be some web
pages, a blog, a wiki or go as big as a web-based calendar application or a commercial
website.
Advantages of Flask:
Flask is a web framework for the Python language. Flask provides a library and a collection
of code that can be used to build websites without the need to do everything from scratch,
but Flask still does not use the Model View Controller (MVC) method.
Flask-RESTful is an extension for Flask that provides additional support for building
REST APIs. You will never be disappointed with the time it takes to develop an API.
Flask-RESTful is a lightweight abstraction that works with your existing ORM/libraries and
encourages best practices with minimal setup.
Flask-RESTful adds support for quickly building REST APIs in Python using Flask as the
back-end. It encourages best practices and is very easy to set up, and it is very easy to pick up
if you are already familiar with Flask.
Flask is a web framework for Python, meaning that it provides functionality for building web
applications, including managing HTTP requests and rendering templates; we can also add to
this application to create our API.
2. The easiest way to start using an API is by finding an HTTP client online, like REST-Client,
Postman, or Paw.
3. The next best way to pull data from an API is by building a URL from existing API
documentation.
The Flask object implements a WSGI application and acts as the central object. It is
passed the name of the module or package of the application. Once it is created, it acts as a
central registry for the view functions, the URL rules, the template configuration and much more.
The name of the package is used to resolve resources from inside the package or the
folder the module is contained in, depending on whether the package parameter resolves to an
actual Python package (a folder with an __init__.py file inside) or a standard module (just a
.py file).
Usually you create a Flask instance in your main module or in the __init__.py file of
your package.
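A minimal sketch of creating such a Flask instance in a main module (the route and message are illustrative, not part of the deployed application):

from flask import Flask

app = Flask(__name__)   # the module/package name is used to locate resources

@app.route("/")
def index():
    return "Bank customer churn predictor"

if __name__ == "__main__":
    app.run(debug=True)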
Parameters
● provide_automatic_options (Optional[bool]) – Add the OPTIONS method and
respond to OPTIONS requests automatically.
● options (Any) – Extra options passed to the Rule object.
Return type -- None
after_request(f)
The function is called with the response object and must return a response object. This
allows the functions to modify or replace the response before it is sent.
If a function raises an exception, any remaining after-request functions will not be
called. Therefore, this should not be used for actions that must execute, such as closing
resources; use teardown_request() for that.
Parameters:
f (Callable[[Response], Response])
Return type
Callable[[Response], Response]
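A short illustrative use of this hook, adding a header to every response (the header name is an arbitrary example):

from flask import Flask

app = Flask(__name__)

@app.after_request
def add_header(response):
    # called with the response object; must return a response object
    response.headers["X-App"] = "churn-predictor"
    return response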
after_request_funcs: t.Dict[AppOrBlueprintKey, t.List[AfterRequestCallable]]
A data structure of functions to call at the end of each request, in the format
{scope: [functions]}. The scope key is the name of a blueprint the functions are active
for, or None for all requests.
This data structure is internal. It should not be modified directly and its format
may change at any time.
app_context()
Create an AppContext. Use as a with block to push the context, which will
make current_app point at this application.

with app.app_context():
    init_db()
HTML Introduction
HTML stands for Hyper Text Markup Language. It is used to design web pages using
a markup language. HTML is the combination of Hypertext and Markup language. Hypertext
defines the link between the web pages. A markup language is used to define the text
document within a tag which defines the structure of web pages. This language is used to
annotate (make notes for the computer) text so that a machine can understand it and
manipulate text accordingly. Most markup languages (e.g. HTML) are human-readable. The
language uses tags to define what manipulation has to be done on the text.
<!DOCTYPE html> — This tag specifies the language you will write on the page. In this
case, the language is HTML 5.
<html> — This tag signals that from here on we are going to write in HTML code.
<head> — This is where all the metadata for the page goes — stuff mostly meant for search
engines and other computer programs.
Further Tags
Inside the <head> tag, there is one tag that is always included: <title>, but there are others
that are just as important:
<title>
This is where we insert the page name as it will appear at the top of the browser
window or tab.
<meta>
This is where information about the document is stored: character encoding, name
(page context), description.
Head Tag
<head>
<title>My First Webpage</title>
<meta charset="UTF-8">
<meta name="description" content="This field contains information about your page. It is
usually around two sentences long.">.
<meta name="author" content="Conor Sheils">
</head>
Adding Content
Next, we will make a <body> tag.
The HTML <body> is where we add the content which is designed for viewing by human
eyes.
This includes text, images, tables, forms and everything else that we see on the internet each
day.
▪ <h1>
▪ <h2>
▪ <h3>
▪ <h4>
▪ <h5>
▪ <h6>
As you might have guessed <h1> and <h2> should be used for the most important
titles, while the remaining tags should be used for sub-headings and less important text.
Search engine bots use this order when deciphering which information is most
important on a page.
And hit save. We will save this file as “index.html” in a new folder called “my webpage”.
Adding text to our HTML page is simple using an element opened with the tag <p>
which creates a new paragraph. We place all of our regular text inside the element <p>.
When we write text in HTML, we also have a number of other elements we can use
to control the text or make it appear in a certain way.
Almost everything you click on while surfing the web is a link that takes you to another page
within the website you are visiting or to an external site.
Links are included in an attribute opened by the <a> tag. This element is the first that we’ve
met which uses an attribute and so it looks different to previously mentioned tags.
<a href=http://www.google.com>Google</a>
Image Tag
In today’s modern digital world, images are everything. The <img> tag has everything
you need to display images on your site. Much like the <a> anchor element, <img> also
contains an attribute.
CSS
CSS stands for Cascading Style Sheets. It is the language for describing the
presentation of Web pages, including colours, layout, and fonts, thus making our web pages
presentable to the users. CSS is designed to make style sheets for the web. It is independent of
HTML and can be used with any XML-based markup language. Now let's try to break down
the acronym:
CSS Syntax

selector {
    property1: value;
    property2: value;
    property3: value;
}

For example:

h1 {
    color: red;
    text-align: center;
}

#unique {
    color: green;
}
CSS Comment
CSS How-To
Inline > Internal > External
Inline CSS
Internal CSS
● With the help of the style tag, we can apply styles within the HTML file
● Redundancy is removed
● But the idea of separation of concerns is still lost
● Uniquely applied on a single document
● Example:
<style>
h1 {
    color: red;
}
</style>
External CSS
● With the help of the <link> tag in the head tag, we can apply styles
● A reference is added
● The file is saved with the .css extension
● Redundancy is removed
● The idea of separation of concerns is maintained
● Uniquely applied to each document
● Example:
<head>
<link rel="stylesheet" type="text/css" href="name of the CSS file">
</head>
h1 {
    /* rules for h1 go in the external stylesheet */
}
Coding
Module – 1
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

#Load given dataset
data = pd.read_csv("bankchurn.csv")

Before dropping missing values from the given dataset:
data.head(10)
#shape
data.shape

After dropping missing values from the given dataset:
df = data.dropna()
df.head()
#shape
df.shape

#Checking data type and information about the dataset
df.info()
df.Age.unique()
df.IsActiveMember.unique()
df.Gender.unique()
df.Geography.unique()
df.Surname.unique()
df.HasCrCard.unique()
df.NumOfProducts.unique()
df.Exited.unique()
df.corr()

Before Pre-Processing
df.head()
Module-2
Module 2: Exploratory data analysis, visualization and training a model on the given attributes

#import library packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings('ignore')

data = pd.read_csv("bankchurn.csv")
df = data.dropna()
df.columns

pd.crosstab(df.Gender,df.Exited)
pd.crosstab(df.Balance,df.Exited)
pd.crosstab(df.IsActiveMember,df.Exited)

#figure and axes for the correlation heatmap
fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(df.corr(), ax=ax, annot=True)

Splitting Train/Test:

    return dataframe_by_Group
qul_No_qul_bar_plot(df, 'IsActiveMember')
60
Logistic Regression :

#imports needed for this module
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import cross_val_score

logR = LogisticRegression()
logR.fit(X_train,y_train)
predictR = logR.predict(X_test)
print("")
print('Classification report of Logistic Regression Results:')
print("")
print(classification_report(y_test,predictR))
print("")
cm=confusion_matrix(y_test,predictR)
print('Confusion Matrix result of Logistic Regression is:\n',cm)
print("")
sensitivity = cm[0,0]/(cm[0,0]+cm[0,1])
print('Sensitivity : ', sensitivity )
print("")
specificity = cm[1,1]/(cm[1,0]+cm[1,1])
print('Specificity : ', specificity)
print("")
accuracy = cross_val_score(logR, X, y, scoring='accuracy')
print('Cross validation test results of accuracy:')
print(accuracy)
#get the mean of each fold
print("")
print("Accuracy result of Logistic Regression is:",accuracy.mean() * 100)
LR=accuracy.mean() * 100

def graph():
    import matplotlib.pyplot as plt
    data=[LR]
    alg="Logistic Regression"
    plt.figure(figsize=(5,5))
    b=plt.bar(alg,data,color=("b"))
    plt.title("Accuracy comparison of Bank customer churn",fontsize=15)
    plt.legend(b,data,fontsize=9)

graph()

TP = cm[0][0]
FP = cm[1][0]
FN = cm[1][1]
TN = cm[0][1]
print("True Positive :",TP)
print("True Negative :",TN)
print("False Positive :",FP)
print("False Negative :",FN)
print("")
TPR = TP/(TP+FN)
TNR = TN/(TN+FP)
FPR = FP/(FP+TN)
FNR = FN/(TP+FN)
print("True Positive Rate :",TPR)
print("True Negative Rate :",TNR)
print("False Positive Rate :",FPR)
print("False Negative Rate :",FNR)
print("")
PPV = TP/(TP+FP)
NPV = TN/(TN+FN)
print("Positive Predictive Value :",PPV)
print("Negative predictive value :",NPV)

def plot_confusion_matrix(cm2, title='Confusion matrix-LogisticRegression', cmap=plt.cm.Blues):
    target_names=['Predict','Actual']
    plt.imshow(cm2, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(target_names))
    plt.xticks(tick_marks, target_names, rotation=45)
    plt.yticks(tick_marks, target_names)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

cm2=confusion_matrix(y_test, predictR)
print('Confusion matrix-LogisticRegression:')
print(cm2)
sns.heatmap(cm2/np.sum(cm2), annot=True, fmt ='.2%')
from sklearn.preprocessing import LabelEncoder
var_mod = ['Geography','Gender']
le = LabelEncoder()
for i in var_mod:
    df[i] = le.fit_transform(df[i]).astype(int)

X = df.drop(labels='Exited', axis=1)
#Response variable
y = df.loc[:,'Exited']

#We'll use a test size of 30%. We also stratify the split on the response variable, which is
#very important to do because the churned class is much smaller than the retained class.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

RandomForestClassifier:

#imports needed for this module
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import cross_val_score

rfc = RandomForestClassifier()
rfc.fit(X_train,y_train)
predictR = rfc.predict(X_test)
print("")
print('Classification report of Random Forest Classifier Results:')
print("")
print(classification_report(y_test,predictR))
print("")
cm=confusion_matrix(y_test,predictR)
print('Confusion Matrix result of Random Forest Classifier is:\n',cm)
print("")
sensitivity = cm[0,0]/(cm[0,0]+cm[0,1])
print('Sensitivity : ', sensitivity )
print("")
specificity = cm[1,1]/(cm[1,0]+cm[1,1])
print('Specificity : ', specificity)
print("")
accuracy = cross_val_score(rfc, X, y, scoring='accuracy')
print('Cross validation test results of accuracy:')
print(accuracy)
#get the mean of each fold
print("")
print("Accuracy result of Random Forest Classifier is:",accuracy.mean() * 100)
LR=accuracy.mean() * 100

def graph():
    import matplotlib.pyplot as plt
    data=[LR]
    alg="Random Forest Classifier"
    plt.figure(figsize=(5,5))
    b=plt.bar(alg,data,color=("b"))
    plt.title("Accuracy comparison of Bank customer churn",fontsize=15)
    plt.legend(b,data,fontsize=9)

graph()

TP = cm[0][0]
FP = cm[1][0]
FN = cm[1][1]
TN = cm[0][1]
print("True Positive :",TP)
print("True Negative :",TN)
print("False Positive :",FP)
print("False Negative :",FN)
print("")
TPR = TP/(TP+FN)
TNR = TN/(TN+FP)
FPR = FP/(FP+TN)
FNR = FN/(TP+FN)
print("True Positive Rate :",TPR)
print("True Negative Rate :",TNR)
print("False Positive Rate :",FPR)
print("False Negative Rate :",FNR)
print("")
PPV = TP/(TP+FP)
NPV = TN/(TN+FN)
print("Positive Predictive Value :",PPV)
print("Negative predictive value :",NPV)

def plot_confusion_matrix(cm2, title='Confusion matrix-RandomForestClassifier', cmap=plt.cm.Blues):
    target_names=['Predict','Actual']
    plt.imshow(cm2, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(target_names))
    plt.xticks(tick_marks, target_names, rotation=45)
    plt.yticks(tick_marks, target_names)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

cm2=confusion_matrix(y_test, predictR)
print('Confusion matrix-RandomForestClassifier:')
print(cm2)
sns.heatmap(cm2/np.sum(cm2), annot=True, fmt ='.2%')

import joblib
joblib.dump(rfc,"model.pkl")
Module 5 : Performance measurements of Decision Tree Classifier:

#import library packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings('ignore')

#Load given dataset
data = pd.read_csv("bankchurn.csv")
df = data.dropna()
df.columns

#Evaluate the model's performance on the test set.
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, roc_auc_score
from sklearn.tree import DecisionTreeClassifier

del df["RowNumber"]
del df["CustomerId"]
del df["Surname"]

from sklearn.preprocessing import LabelEncoder
var_mod = ['Geography','Gender']
le = LabelEncoder()
for i in var_mod:
    df[i] = le.fit_transform(df[i]).astype(int)

X = df.drop(labels='Exited', axis=1)
#Response variable
y = df.loc[:,'Exited']

#We'll use a test size of 30%. We also stratify the split on the response variable, which is
#very important to do because the churned class is much smaller than the retained class.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

Decision Tree Classifier:

dtc = DecisionTreeClassifier()
dtc.fit(X_train,y_train)
predictR = dtc.predict(X_test)
print("")
print('Classification report of Decision Tree Classifier Results:')
print("")
print(classification_report(y_test,predictR))
print("")
cm=confusion_matrix(y_test,predictR)
print('Confusion Matrix result of Decision Tree Classifier is:\n',cm)
print("")
sensitivity = cm[0,0]/(cm[0,0]+cm[0,1])
print('Sensitivity : ', sensitivity )
print("")
specificity = cm[1,1]/(cm[1,0]+cm[1,1])
print('Specificity : ', specificity)
print("")

TP = cm[0][0]
FP = cm[1][0]
FN = cm[1][1]
TN = cm[0][1]
print("True Positive :",TP)
print("True Negative :",TN)
print("False Positive :",FP)
print("False Negative :",FN)
print("")
TPR = TP/(TP+FN)
TNR = TN/(TN+FP)
FPR = FP/(FP+TN)
FNR = FN/(TP+FN)
print("True Positive Rate :",TPR)
print("True Negative Rate :",TNR)
print("False Positive Rate :",FPR)
print("False Negative Rate :",FNR)
print("")
PPV = TP/(TP+FP)
NPV = TN/(TN+FN)
print("Positive Predictive Value :",PPV)
print("Negative predictive value :",NPV)

cm2=confusion_matrix(y_test, predictR)
print('Confusion matrix-DecisionTreeClassifier:')
print(cm2)
sns.heatmap(cm2/np.sum(cm2), annot=True, fmt ='.2%')
#import library packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings('ignore')

#Load given dataset
data = pd.read_csv("bankchurn.csv")
df = data.dropna()
df.columns

#Evaluate the model's performance on the test set.
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

del df["RowNumber"]
del df["CustomerId"]
del df["Surname"]

from sklearn.preprocessing import LabelEncoder
var_mod = ['Geography','Gender']
le = LabelEncoder()
for i in var_mod:
    df[i] = le.fit_transform(df[i]).astype(int)

X = df.drop(labels='Exited', axis=1)
#Response variable
y = df.loc[:,'Exited']

#We'll use a test size of 30%. We also stratify the split on the response variable, which is
#very important to do because the churned class is much smaller than the retained class.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

Naive Bayes:

nb = GaussianNB()
nb.fit(X_train,y_train)
predictR = nb.predict(X_test)
print("")
print('Classification report of Naive Bayes Results:')
print("")
print(classification_report(y_test,predictR))
print("")
cm=confusion_matrix(y_test,predictR)
print('Confusion Matrix result of Naive Bayes is:\n',cm)
print("")
sensitivity = cm[0,0]/(cm[0,0]+cm[0,1])
print('Sensitivity : ', sensitivity )
print("")
specificity = cm[1,1]/(cm[1,0]+cm[1,1])
print('Specificity : ', specificity)
print("")
accuracy = cross_val_score(nb, X, y, scoring='accuracy')
print('Cross validation test results of accuracy:')
print(accuracy)
#get the mean of each fold
print("")
print("Accuracy result of Naive Bayes is:",accuracy.mean() * 100)
LR=accuracy.mean() * 100

def graph():
    import matplotlib.pyplot as plt
    data=[LR]
    alg="GaussianNB"
    plt.figure(figsize=(5,5))
    b=plt.bar(alg,data,color=("b"))
    plt.title("Accuracy comparison of Bank customer churn",fontsize=15)
    plt.legend(b,data,fontsize=9)

graph()

TP = cm[0][0]
FP = cm[1][0]
FN = cm[1][1]
TN = cm[0][1]
print("True Positive :",TP)
print("True Negative :",TN)
print("False Positive :",FP)
print("False Negative :",FN)
print("")
TPR = TP/(TP+FN)
TNR = TN/(TN+FP)
FPR = FP/(FP+TN)
FNR = FN/(TP+FN)
print("True Positive Rate :",TPR)
print("True Negative Rate :",TNR)
print("False Positive Rate :",FPR)
print("False Negative Rate :",FNR)
print("")
PPV = TP/(TP+FP)
NPV = TN/(TN+FN)
print("Positive Predictive Value :",PPV)
print("Negative predictive value :",NPV)

cm2=confusion_matrix(y_test, predictR)
print('Confusion matrix-Naive Bayes:')
print(cm2)
sns.heatmap(cm2/np.sum(cm2), annot=True, fmt ='.2%')
Flask deploy

import numpy as np
from flask import Flask, request, render_template
import joblib

app = Flask(__name__)
model = joblib.load('model.pkl')

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    '''
    Read the form values, run the saved model and render the result.
    '''
    int_features = [int(x) for x in request.form.values()]
    final_features = [np.array(int_features)]
    print(final_features)
    prediction = model.predict(final_features)
    output = prediction[0]
    # result messages shown on the page (illustrative wording)
    if output == 1:
        result = 'The customer is likely to exit the bank'
    else:
        result = 'The customer is likely to stay with the bank'
    return render_template('index.html', prediction_text=result)

if __name__ == "__main__":
    app.run(host="localhost", port=6067)
HTML & CSS

<!DOCTYPE html>
<html>
<!--From https://codepen.io/frytyler/pen/EGdtg-->
<head>
<meta charset="UTF-8">
<title>TITLE</title>
<style>
.white{
    color: white;
}
.space{
    margin: 10px 30px;
    padding: 8px 15px;
    background: palegreen;
    width: 500px;
}
.gap{
    padding: 10px 20px;
}
.black{
    padding: 10px 15px;
}
</style>
</head>
<body>
<div>
<!-- Main Input For Receiving Query to our ML -->
<!-- Partial form: Geography, Gender, Tenure, NumOfProducts and IsActiveMember fields;
     the form and select wrappers are assumed and post the values to the Flask /predict route -->
<form action="/predict" method="post">
<div class="row">
<select name="Geography">
<option value=2>GERMANY</option>
</select>
<select name="Gender">
<option value=0>FEMALE</option>
<option value=1>MALE</option>
</select>
<input type="text" name="Tenure" placeholder="Tenure" required="required" /><br>
</div>
<select name="NumOfProducts">
<option value=1>one</option>
<option value=2>two</option>
<option value=3>three</option>
<option value=4>four</option>
</select>
<label class="black" for="">IsActiveMember</label>
<select name="IsActiveMember">
<option value=1>active</option>
</select>
</form>
<br>
<br>
{{ prediction_text }}
</div>
</body>
</html>
5. Conclusion
The analytical process started with data cleaning and processing, missing-value treatment
and exploratory analysis, and finished with model building and evaluation. The model that
achieves the highest accuracy on the public test set is taken as the best. This application can
help to predict bank customer churn, which in turn helps the bank give more support to the
customers who are likely to leave.
Future Work
Bank churn prediction can be connected with real-time AI models.