
Advances in Economics, Business and Management Research, volume 211
Proceedings of the 2022 7th International Conference on Financial Innovation and Economic Development (ICFIED 2022)

Customer Churn Prediction on Credit Card Services using Random Forest Method

Xinyu Miao1,*,†, Haoran Wang2,†
1 Beijing University of Technology, Beijing, 100000, China
2 Hefei University of Technology, Hefei, 230000, China
* Email: [email protected]
† These authors contributed equally.

ABSTRACT
With the continuous development of the Internet, more and more people are spending money online with credit cards; retaining customers in order to maintain profit margins has therefore become very important for many banks. This paper aims to predict credit card customer churn with machine learning methods and to provide feasible solutions to the customer churn issue based on the results. Three models, Random Forest, Logistic Regression and K-Nearest Neighbor (KNN), are applied to a dataset containing more than 10,000 records and 21 features. By tuning hyperparameters and evaluating the models with ROC & AUC and the confusion matrix, we conclude that Random Forest has the best performance, with its accuracy reaching 96.25%. Total transaction amount in the last 12 months, total transaction count in the last 12 months and total revolving balance are the three most important features, with significant impacts on the churn prediction. The results show that the more frequently customers use their credit cards, the less likely they are to leave, and by using this model bank managers can proactively take action against customer churn.

Keywords: credit card, customer churn, random forest, machine learning.

1. INTRODUCTION

With the rise of the Internet in the past decade, great changes have taken place in the market environment and user habits, resulting in sharply increased use of credit cards [1]. The cumulative card issuing volume reached 588 million in 2017, compared to 465 million in 2016, an increase of 26.45% [1]. Since credit cards play an important part in banks' profits, many banks work very hard to offer better services and products. Competition between banks is fierce because products are largely homogeneous, so customers have many options and can compare banks based on their past experiences of being served [2]. Therefore, many banks have begun to realize the importance of customers and to pay attention to customer relationship management (CRM) [3].

Acquiring new users is much more expensive and difficult than avoiding customer loss, given that the cost of selling to new customers is five times the cost of additional sales to existing customers [3]. Customer churn has therefore become one of the main focuses of many banks. This is reflected in the AARRR and HEART frameworks used by many banks [4-5]: both include metrics to support decision making, among them retention, i.e., n-day retention, monthly active users or session frequency. Studies have shown that a bank can increase its profits by 85% when the retention rate increases by 5% [3].

The aim of this paper is to predict customer churn. Once churn is successfully predicted, banks have enough time to proactively take action to retain customers by offering better services or more attractive discounts. This is of great significance, especially today, when a great deal of customer-related data is available and, with the spread of big data, massive user data have become valuable assets for enterprises.

Some papers have discussed customer churn in the past, but most of them failed to use a specific collection of datasets or to apply machine learning models. Machine learning is a branch of computer science, widely used in commercial applications, whose goal is to let computers "learn" without being directly programmed [6].

Copyright © 2022 The Authors. Published by Atlantis Press International B.V.
This is an open access article distributed under the CC BY-NC 4.0 license: http://creativecommons.org/licenses/by-nc/4.0/.

With machine learning approaches, it is possible to process and analyze large amounts of data. Other papers, although they used models to predict the outcome, mainly focus on unsupervised learning, which in general is less reliable and has relatively poor interpretability [7].

In this paper, we obtain credit card holders' information from Kaggle; the dataset has more than 10,000 records and 21 features. We perform exploratory data analysis to inspect the distributions and visualize relationships between features. We then split the dataset into training and testing sets, followed by standardization. Three models, Random Forest, Logistic Regression and KNN, are trained and compared. After that, a method called grid search is used to tune hyperparameters. To select the optimum model, we use the confusion matrix and ROC & AUC for evaluation. We conclude that Random Forest performs best among the three models: it is approximately 5% higher in accuracy and much higher in other metrics such as precision and recall. We rank feature importance based on the Random Forest model and select the top three features, which carry meaningful implications for banks' decision making. Specifically, they are the total transaction amount in the last 12 months, the total transaction count in the last 12 months and the total revolving balance on the credit card. We therefore learn that the more frequently a customer uses his or her credit card, the less likely he or she is to churn, which is intuitive and makes sense in daily life, and bank managers can adjust their card services accordingly.

In the rest of the paper, the methods used are discussed first, followed by an introduction of the dataset, the preprocessing steps and the results of applying the models. Finally, three conclusions are presented, together with some innovations, a note on shortcomings and how we can further improve.

2. METHOD

2.1. Random Forest

Random forest is a supervised learning model. It was proposed by Breiman and Cutler in 2001 and is based on decision trees and ensemble learning [8]. A decision tree can describe complicated relationships between x and y rather than a simple linear relationship, and thus has stronger modeling strength. However, a single tree is very sensitive to the training data and therefore likely to overfit [9]. Ensemble learning solves this problem with a method called bagging: multiple learners are trained, each on a collection of bootstrapped samples drawn randomly from the original dataset with replacement. Bagging decreases variance by introducing randomness into the model framework, making the model more robust and the result more accurate and convincing. Specifically, each tree learns independently from a random sub-dataset and sub-features, and the final outcome is drawn by a deterministic averaging process, in other words, the average of the predictions of the individual trees [10]. A simple example of a random forest is shown in Figure 1.

Figure 1. Description of Random Forest. Notes: X1 to X4 are features and Y is the outcome to be predicted. The original dataset is split into several sub-sets, each containing fewer features and less data. Each sub-set is then used to train one tree, and a deterministic averaging process determines the final result.

2.1.1. Random Forest Algorithm

Input: dataset D with N features and number of trees n.

Output: a random forest.

For i = 1 to n:

Step 1: Draw a bootstrap sample from the original dataset D.

Step 2: Grow a random forest tree on the bootstrap data by repeating the following steps until the minimum node size is reached:

(1) Select a subset of √N features (variables).

(2) For j = 1 to √N, pick the best variable from the √N and split the node into left and right child nodes.

To make a binary classification prediction at a new point x, we use the following formula, where Ĉ_m(x) denotes the class prediction of the m-th random forest tree:

Ĉ_rf(x) = majority vote {Ĉ_m(x)}_1^n   (1)

2.2. Logistic Regression

Logistic Regression is a linear model, which connects the features X_1, …, X_p to the conditional probability P(Y = 1 | X_1, …, X_p) through the formula:

P(Y = 1 | X_1, …, X_p) = exp(β_0 + β_1 X_1 + … + β_p X_p) / (1 + exp(β_0 + β_1 X_1 + … + β_p X_p))   (2)
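The procedure of Section 2.1.1 and the majority vote of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: a depth-1 decision stump stands in for a full tree so the sketch stays self-contained, while real random forests grow much deeper trees.

```python
import numpy as np

# Sketch of Section 2.1.1: bootstrap sampling, per-tree training on a
# sqrt(N) feature subset, and the majority vote of Eq. (1).

def train_stump(X, y, feats):
    best = None                        # (error, feature, threshold, flip)
    for j in feats:
        for t in np.unique(X[:, j]):
            pred = (X[:, j] > t).astype(int)
            err, err_flip = np.mean(pred != y), np.mean(pred == y)
            e, flip = (err, False) if err <= err_flip else (err_flip, True)
            if best is None or e < best[0]:
                best = (e, j, t, flip)
    return best[1:]

def stump_predict(stump, X):
    j, t, flip = stump
    pred = (X[:, j] > t).astype(int)
    return 1 - pred if flip else pred

def forest_fit(X, y, n_trees=15, seed=0):
    rng = np.random.default_rng(seed)
    n, N = X.shape
    k = max(1, int(np.sqrt(N)))                    # sqrt(N) features per tree
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)          # bootstrap sample of D
        cols = rng.choice(N, size=k, replace=False)
        forest.append(train_stump(X[rows], y[rows], cols))
    return forest

def forest_predict(forest, X):
    # Eq. (1): majority vote over the n individual tree predictions.
    votes = np.stack([stump_predict(s, X) for s in forest])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each stump sees a different bootstrap sample and feature subset, the individual predictors disagree, and the vote averages their errors away, which is exactly the variance reduction that bagging provides.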


where Y stands for the binary outcome in which we are interested, X_1, …, X_p are the features, and β_0, β_1, …, β_p are the regression coefficients, derived from the dataset by a method called maximum likelihood [11]. For a new instance, the βs are replaced by their estimated counterparts and the Xs by their realizations; if the probability P is greater than a threshold value, the new instance is assigned to the Y = 1 class, otherwise to Y = 0. Usually the threshold is set to 0.5, giving the so-called Bayes classifier.

2.3. K Nearest Neighbors

K Nearest Neighbors takes less time than the other models and is therefore relatively simple. It makes predictions directly from the training set, by finding the k objects in the dataset closest in distance to the input data, where k is a hyperparameter that can be adjusted to affect classifier performance; it then assigns the class with the maximum vote among these adjacent objects [12].

There are many ways to calculate the distance, for example the Euclidean distance and the Manhattan distance, the former being the most popular. The distance d between two points a and b can be calculated through the formula below:

d(a, b) = √( Σ_{i=1}^{p} (a_i − b_i)² )   (3)

Figure 2 shows the principle of K Nearest Neighbors.

Figure 2. Sketch map of K Nearest Neighbors. Notes: The small circle in the middle stands for the newly input data point to be classified, while the small triangles and squares represent the original data of the two classes; the circle is assigned to one of the types according to the model. If we set k to 3, we look at the inner solid circle and assign the input point to the triangle class; if k is 5, on the other hand, it is assigned to the square class.

3. DATA AND EXPLORATORY DATA ANALYSIS

3.1. Basic Information about the Dataset

We found a relevant dataset on bank customer information on Kaggle, which consists of more than 10,000 records and includes 21 features such as age, income, marital status, credit card limit and so on. 16 of them are numerical while 5 are categorical.

3.2. Exploratory Data Analysis

We conduct exploratory data analysis (EDA) to gain a better understanding of the data by checking for missing and duplicated values, handling outliers, visualizing distributions and plotting graphs of the relationships between the features and our target, which is whether the customer churned. Some important features need illustration.

3.2.1. Type of Card

As the table below shows, the type of card held by the majority of customers is the Blue card, at 93.2%. In Figure 3 we split the data into two parts so that we can clearly visualize the relationship between card type and both existing and churned customers. The two groups follow the same pattern, with the number of Blue card holders far surpassing the others.

Table 1. Proportion of different card categories.

Type | Percentage
Blue | 93.2%
Silver | 5.48%
Gold | 1.15%
Platinum | 0.197%

Figure 3. Relationship between customers and card type.
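The distance of Eq. (3) and the neighbor vote described in Section 2.3 can be sketched as follows. This is a minimal illustration, not the paper's code, with a tiny hypothetical training set.

```python
import numpy as np
from collections import Counter

def euclidean(a, b):
    # Eq. (3): d(a, b) = sqrt(sum_i (a_i - b_i)^2)
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, x_new, k=3):
    # Distance from the new point to every training point.
    dists = [euclidean(x, x_new) for x in X_train]
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of the k neighbors.
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```

As in the Figure 2 example, the predicted class can change with k: a point whose 3 nearest neighbors are triangles but whose 5 nearest neighbors are mostly squares is assigned differently for k = 3 and k = 5.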


This kind of data distribution is not informative, because it is unbalanced and thus we cannot explain the relationship between card type and our target. Therefore, this feature is deleted from further analysis.

3.2.2. Credit Card Limit

The credit card limit is analyzed to see whether there are some extremely large values, or outliers. For example, if some customers had a card limit of 1-2 million, significantly larger than the others, we would need to delete those records. Fortunately, there are no outliers, and all the limits are within a reasonable range.

Figure 4. Distribution of the card limit.

3.2.3. Number of Products Held by Customers

The number of bank products held by customers also has an impact on our research.

Figure 5. Relationship between customers and number of products held.

As can be seen from Figure 5, customers with 4, 5 or 6 products account for most of the existing customers, and their churn rate is also lower than that of customers with only one or two products. This suggests that in future marketing, it may be a good idea for banks to implement bundle marketing.

After conducting exploratory data analysis of the features, we gained deeper insight into our data; some results are shown in the table below.

Table 2. Comments from EDA.

Feature | Notes
Education level | 70.65% of customers gained a high school or higher education.
Marital status | Almost half of the customers are married, and single customers account for up to 40%.
Income level | People with an income of 40k-60k may be potential customers for banks.
Number of products held | Banks could give priority to bundled sales.
Card type | The distribution of card type is too unbalanced to help in predicting whether a customer will churn.

4. PREPROCESSING, APPLYING MODELS AND RESULTS

4.1. Data Preprocessing

4.1.1. Label Encoding

Since the input data of the models need to be numerical, we use label encoding to transform all categorical values into numbers; for example, for the gender feature we use 0 to label female and 1 for male. One-hot encoding is also a popular method, but it would create too many features in this case, making training more difficult and more time-consuming.

4.1.2. Correlation Matrix

The Pearson correlation matrix gives us information about the relationships between features. It can help us do feature selection by removing highly correlated features before the model training step. Here we plot correlation graphs of the categorical features in our dataset.
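The label encoding of Section 4.1.1 and the Pearson correlation check of Section 4.1.2 can be sketched with pandas. The tiny frame below is a hypothetical stand-in for the Kaggle dataset, and the column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical mini-frame standing in for the Kaggle churn data.
df = pd.DataFrame({
    "gender":        ["F", "M", "F", "M", "F", "M"],
    "card_category": ["Blue", "Blue", "Silver", "Blue", "Gold", "Blue"],
    "credit_limit":  [3000, 12000, 7000, 15000, 4000, 9000],
    "churned":       [1, 0, 1, 0, 1, 0],
})

# Label encoding: each category becomes an integer code (here F -> 0, M -> 1),
# keeping one column per feature, unlike one-hot encoding.
for col in ["gender", "card_category"]:
    df[col] = pd.factorize(df[col])[0]

# Pearson correlation matrix between all (now numerical) features.
corr = df.corr(method="pearson")
print(corr["churned"].round(2))
```

`pd.factorize` assigns codes in order of appearance, which is sufficient here; for tree-based models like Random Forest the particular integer mapping does not matter.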


Figure 6. Pearson correlation graph of the first half of the categorical features.

Figure 7. Pearson correlation graph of the second half of the categorical features.

The values on both the vertical and horizontal axes are our features, and the cells show the relationship between the corresponding pair. The color reflects the strength of the correlation: the lighter the color, the stronger the positive correlation between two features, while the darker the color, the stronger the negative correlation.

4.2. Modelling

4.2.1. Testing Models

We use 5-fold cross validation to test the performance of Random Forest, Logistic Regression and KNN, and obtain the evaluation results of the three models below.

Table 3. Accuracy score.

Model | Result of 5-fold cross validation
Random Forest | 0.9610
Logistic Regression | 0.9036
K-Nearest Neighbor | 0.9032

Notes: The accuracy score is the accuracy of the model prediction.

Among them, the model accuracy of Random Forest is the highest at 96.1%, and the other two are very similar at 90.36% and 90.32% respectively. We therefore choose Random Forest to make predictions; however, for comparison purposes, we still train the other models. We split the dataset into training and testing sets, with the former containing 80% of the data, and then use the training set to fit the parameters of the three models.

4.2.2. Optimal Hyperparameters

Hyperparameters are manually set parameters and can exert significant impacts on model performance. If poorly chosen, the model will either show weak predictive strength or overfit. In this paper, we use the grid search method to adjust them. The results are shown in Table 4.

Table 4. Accuracy score.

Model | Result of 5-fold cross validation | After tuning hyperparameters
Random Forest | 0.9610 | 0.9568
Logistic Regression | 0.9036 | 0.9052
K-Nearest Neighbor | 0.9032 | 0.9042

It can be found that the accuracy score on the testing set for Random Forest reached 95.68% after parameter adjustment, still the highest compared with the other two.

4.2.3. Model Comparison

To test the performance of the models, we use the ROC curve, the AUC value and the confusion matrix.

4.2.3.1. Confusion Matrix

The confusion matrix, also called the error matrix, is a standard format for accuracy evaluation and is represented by a matrix of 2 rows and 2 columns, as shown in Figure 8 [13].
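The evaluation pipeline of Sections 4.2.1-4.2.3 (5-fold cross validation, grid search, then the confusion matrix, recall and AUC) can be sketched with scikit-learn. This is an illustrative sketch on synthetic data, not the paper's code; the hyperparameter grid and dataset are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the Kaggle churn dataset (21 features).
X, y = make_classification(n_samples=1000, n_features=21, random_state=0)
# 80/20 train/test split, as in Section 4.2.1.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=0)

rf = RandomForestClassifier(random_state=0)
cv_acc = cross_val_score(rf, X_tr, y_tr, cv=5).mean()     # 5-fold CV accuracy

# Grid search over a small, assumed hyperparameter grid (Section 4.2.2).
grid = GridSearchCV(rf, {"n_estimators": [100, 200], "max_depth": [8, None]}, cv=5)
grid.fit(X_tr, y_tr)
best = grid.best_estimator_

# Evaluation on the held-out 20% (Section 4.2.3): confusion matrix,
# recall as in Eq. (4), and AUC from the predicted churn probabilities.
pred = best.predict(X_te)
proba = best.predict_proba(X_te)[:, 1]
cm = confusion_matrix(y_te, pred)            # 2x2 error matrix
recall = recall_score(y_te, pred)            # TP / (TP + FN)
auc = roc_auc_score(y_te, proba)

# Feature-importance ranking of the kind used later in Section 4.3.
ranking = best.feature_importances_.argsort()[::-1]
print(round(cv_acc, 3), round(recall, 3), round(auc, 3), ranking[:3])
```

`roc_auc_score` scans all decision thresholds internally, so no threshold has to be fixed for the AUC, whereas `predict` uses the default 0.5 cut-off for the confusion matrix and recall.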


Figure 8. Confusion matrix.

Positive and negative refer to the model's verdict on whether the customer churns, while true and false indicate whether the model's prediction is correct. For example, a true positive in the upper left means our model predicts that this customer will leave our services and this is true. The other three cells are defined by the same rule.

We choose the recall ratio to analyze model performance. It measures how many churned customers are successfully predicted:

Recall = True Positive / (True Positive + False Negative)   (4)

Since our goal is to predict as many churning customers as possible, the recall ratio is the most suitable metric here. A large value reflects high prediction accuracy, and the results are shown in Table 5.

4.2.3.2. Receiver Operating Characteristic Curve (ROC Curve)

The threshold is an important hyperparameter behind the ROC curve. It is the cut-off that determines whether a customer is predicted to churn: the model outputs a number in the range 0 to 1, so customers under the threshold are predicted to stay while those above it are predicted to churn.

Since it is a hyperparameter, we can set its value to adjust the strictness of the method. Specifically, the higher the threshold, the harder it is for a customer to be assigned to the churned class, and the higher the accuracy of such an assignment. Points on the ROC curve show the true positive rate and false positive rate achieved by specific decision thresholds; the monotone curve is obtained by scanning all possible decision thresholds, and the area under the curve (AUC) corresponds to the proportion of correctly ordered positive-negative sample pairs [14]. Flach et al. found that if we consider only the optimal threshold, then the higher the AUC, the stronger the predictive ability of the model [15].

Table 5 clearly shows that Random Forest has the highest AUC, which means this model has the best predictive ability.

Table 5. AUC of ROC and recall ratio.

 | Random Forest | Logistic Regression | K-Nearest Neighbor
AUC | 0.9889 | 0.9175 | 0.9057
Recall ratio | 0.9906 | 0.9700 | 0.9806

In conclusion, we have used several different methods to test which model performs best on credit card customer churn. We first use cross validation, and then grid search to adjust the parameters. After that, both the AUC value and the recall ratio prove that Random Forest is the best.

4.3. Feature Selection

We can use Random Forest to rank the importance of features. As the table below shows, the three most important features are the total transaction amount in the past 12 months, the total transaction count in the past 12 months, and the total revolving balance on the credit card.

Table 6. Important features selected by random forest.

Feature | Importance | Feature | Importance
Total transaction amount in the past 12 months | 0.1854 | Unused amount | 0.0310
Total transaction count in the past 12 months | 0.1797 | Number of contacts in the past 12 months | 0.0278
Used amount | 0.1112 | Period of relationship with the bank | 0.0241
Change in transaction count (Q4 over Q1) | 0.1076 | Months inactive in the past 12 months | 0.0241
Number of products held | 0.0686 | Dependents number | 0.0122
Average card utilization ratio | 0.0649 | Education | 0.0105


Amount change | 0.0633 | Income level | 0.0099
Card limit | 0.0319 | Marital status | 0.0086
Age | 0.0315 | Gender | 0.0079

As for the total transaction amount and count, they are very similar: both reflect a customer's usage, because the bill could consist of either several big expenses or frequent small payments. It is quite intuitive that the more a customer uses his or her credit card, the less likely he or she is to leave the bank's services. Through continued use, customers may grow more dependent on the credit card or more satisfied with the services and products, and therefore keep using the card.

Identifying the factors affecting customer churn has always been a popular research topic, because it can help banks better retain existing customers and improve profits. Using logistic regression and decision trees, Abbas et al. found that customer relationship length, customer age, customer gender and the number of mobile banking transactions have an impact on customer churn [16]. Moreover, in the study of Mahdi et al., a neural network model showed that the loss of bank customers was related to their careers [17]. To determine the causes of customer churn in banking and e-banking services, Chiang et al. used association rules and analysis of customer transactions to find the most important churn patterns; their results show that blind promotion is a major cause of customer loss [18].

Those findings differ from ours, and the reason is that the databases, methods and models used in each study are different. Abbas used decision trees while Mahdi used a neural network, whereas our study mainly uses Random Forest, which is a combination of several decision trees. Therefore, each study draws different conclusions.

5. CONCLUSION

This paper aims at predicting the loss of bank credit card customers. We obtained a dataset of more than 10,000 records containing age, salary, marital status, credit card limit, etc., and conducted our analysis and research on it. We first preprocess the dataset, then apply three classification models, namely Random Forest, Logistic Regression and KNN, using 5-fold cross validation. We adjust the hyperparameters of each model to improve accuracy and use ROC & AUC and the confusion matrix to evaluate model performance. Both agree that Random Forest has the strongest predictive ability, and with it we identify the three features that have the greatest impact on our prediction.

In total, three conclusions are obtained from the research. To begin with, the Random Forest model is the best of the three; although it has relatively low computational speed due to its complexity, its performance is approximately 5% higher in accuracy and 2% higher in recall. Secondly, a better combination of hyperparameters can improve a model's performance. Finally, we checked the feature importance of the dataset and found that the total transaction amount in the last 12 months, the total transaction count in the last 12 months and the total revolving balance on the credit card have significant impacts on model forecasting. This shows that the more frequently customers use their credit cards, the less likely they are to leave; bank managers can therefore adjust the credit card service accordingly to fight customer churn and increase the retention rate, and an increased retention rate brings about greater profit growth. By using this model, they have plenty of time to take actions to retain customers, i.e., by running promotions and offering coupons to encourage people to use their credit cards and cultivate usage habits.

There are some deficiencies as well. Firstly, machine learning has numerous classification algorithms, such as neural networks, but we use only a few of them. Next, only one dataset, collected from a specific bank, is used; this may limit our model because it represents just one part of the industry. Lastly, we use a single model to make predictions; in fact, ensemble learning can combine multiple models' advantages and give better performance.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS

Xinyu Miao conducted the research, Liguo Tang analyzed the data, and Haoran Wang wrote the paper. All authors had approved the final version.

REFERENCES

[1] N. X. Hong, and L. Yi, "Standing at the crossroads-credit card," Reporters' Notes, vol. 5, pp. 41-43, 2020. (in Chinese)

[2] R. Rajamohamed, and J. Manokaran, "Improved credit card churn prediction based on rough clustering and supervised learning techniques," Cluster Computing, vol. 21, pp. 65-77, June 2017.

[3] G. L. Nie, W. Rowe, L. L. Zhang, Y. J. Tian, and Y. Shi, "Credit card churn forecasting by logistic regression and decision tree," Expert Systems with Applications, vol. 38, pp. 15273-15285, 2011.

[4] J. Liao, and Y. F. Ruan, "Research on APP Intelligence Promotion Decision Aiding System

Based on Python Data Analysis and AARRR Model," Journal of Physics: Conference Series, vol. 1856, pp. 1-7, 2021.

[5] M. Kehoe, H. B. Taylor, and D. Broderick, "Developing student social skills using restorative practices: a new framework called H.E.A.R.T," Social Psychology of Education, vol. 21, pp. 189-207, 2017.

[6] B. Roscher, B. Bohn, M. F. Duarte, and J. Garcke, "Explainable Machine Learning for Scientific Insights and Discoveries," IEEE Access, vol. 8, pp. 42200-42216, 2020.

[7] Q. F. Bi, K. E. Goodman, J. Kaminsky, and J. Lessler, "What is Machine Learning? A Primer for the Epidemiologist," American Journal of Epidemiology, vol. 188, pp. 2222-2239, October 2019.

[8] Y. A. Amrani, M. Lazaar, and K. E. E. Kadiri, "Random Forest and Support Vector Machine based Hybrid Approach to Sentiment Analysis," Procedia Computer Science, vol. 127, pp. 511-520, 2018.

[9] S. Y. Xuan, G. J. Liu, Z. C. Li, L. T. Zheng, S. Wang, and C. J. Jiang, "Random Forest for Credit Card Fraud Detection," 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), pp. 1-6, 2018.

[10] T. Hengl, M. Nussbaum, M. N. Wright, G. B. M. Heuvelink, and B. Gräler, "Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables," PeerJ, vol. 6, pp. e5518, 2018.

[11] R. Couronné, P. Probst, and A. L. Boulesteix, "Random forest versus logistic regression: a large-scale benchmark experiment," BMC Bioinformatics, vol. 19, pp. 270-283, July 2018.

[12] A. Singh, M. N. Halgamuge, and R. Lakshmiganthan, "Impact of Different Data Types on Classifier Performance of Random Forest, Naïve Bayes, and K-Nearest Neighbors Algorithms," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 8, pp. 1-10, 2017.

[13] N. Yang, Y. Qian, H. S. EL-Mesery, R. Zhang, A. Wang, and J. Tang, "Rapid detection of rice disease using microscopy image identification based on the synergistic judgment of texture and shape features and decision tree–confusion matrix method," Journal of the Science of Food and Agriculture, vol. 99, no. 14, pp. 6589-6600, 2019.

[14] J. H. Orallo, P. Flach, and C. Ferri, "ROC curves in cost space," Machine Learning, vol. 93, no. 1, pp. 71-91, 2013.

[15] P. Flach, J. H. Orallo, and C. Ferri, "A Coherent Interpretation of AUC as a Measure of Aggregated Classification Performance," ICML, pp. 657-664, June 2011.

[16] A. Keramati, H. Ghaneei, and S. M. Mirmohammadi, "Investigating factors affecting customer churn in electronic banking and developing solutions for retention," International Journal of Electronic Banking, vol. 2, no. 3, pp. 185-204, November 2020.

[17] S. H. Iranmanesh, M. Hamid, M. Bastan, G. H. Shakouri, and M. M. Nasiri, "Customer churn prediction using artificial neural network: An analytical CRM application," In Proceedings of the International Conference on Industrial Engineering and Operations Management, Pilsen, Czech Republic, pp. 23-26, July 2019.

[18] D. Chiang, Y. Wang, S. Lee, and C. Lin, "Goal-oriented sequential pattern for network banking churn analysis," Expert Systems with Applications, vol. 25, no. 3, pp. 293-302, 2003.
