Decision Trees: Concepts and Algorithms
ABSTRACT Machine learning (ML) has been instrumental in solving complex problems and significantly
advancing different areas of our lives. Decision tree-based methods have gained significant popularity
among the diverse range of ML algorithms due to their simplicity and interpretability. This paper presents a
comprehensive overview of decision trees, covering their core concepts, algorithms, and applications, tracing their development from the early methods to the recent high-performing ensemble algorithms, and presenting their mathematical formulations and algorithmic representations, which are lacking in the literature and will be beneficial to ML researchers and industry experts. The algorithms covered include classification and regression tree (CART), Iterative Dichotomiser 3
(ID3), C4.5, C5.0, Chi-squared Automatic Interaction Detection (CHAID), conditional inference trees, and
other tree-based ensemble algorithms, such as random forest, gradient-boosted decision trees, and rotation
forest. Their utilisation in recent literature is also discussed, focusing on applications in medical diagnosis
and fraud detection.
INDEX TERMS Algorithms, CART, C4.5, C5.0, Decision tree, Ensemble learning, ID3, Machine learning
generalization. Cañete-Sifuentes et al. [12] reviewed multivariate decision trees (MDT) and compared the performance of several MDT induction classifiers. Anuradha and Gupta [13] presented a review of decision tree classifiers, focusing on a high-level description of key concepts, such as node splitting and tree pruning. Meanwhile, Costa and Pedreira [14] reviewed recent decision tree-based classifier advances. The paper covered three main issues: how decision trees fit the training data, their generalization, and interpretability.

However, most of the existing surveys and reviews of decision trees focus on their applications in specific domains or a high-level overview of the decision tree concept. Therefore, the current literature lacks a comprehensive overview of decision tree algorithms, their early developments, succinct mathematical formulations, and algorithmic representations in a single peer-reviewed paper. It is therefore essential to have a review that fills this gap in view of the continuous use and prevalence of decision tree-based algorithms and their application in today's technological advancements. Hence, in this study, we present a detailed review of decision tree-based algorithms. Specifically, the paper aims to cover the different decision tree algorithms, including ID3, C4.5, C5.0, CART, conditional inference trees, and CHAID, together with other tree-based ensemble algorithms, such as random forest, rotation forest, and gradient boosting decision trees. The paper aims to present their mathematical formulations and algorithmic representations clearly and concisely.

The rest of the paper is structured as follows: Section II presents a comprehensive overview of the decision tree, covering key areas such as splitting criteria and tree pruning methods. Section III discusses different decision tree algorithms, their learning process, splitting criteria, and mathematical formulations. Section IV reviews decision tree applications in recent literature, including applications in medical diagnosis and fraud detection. Section V discusses key findings and future research directions, and Section VI concludes the paper.

II. OVERVIEW OF DECISION TREE
This section provides a comprehensive overview of decision trees, focusing on the main building blocks and splitting criteria. Decision trees, as a concept in ML, have a history that dates back to the mid-20th century. Initial decision tree studies were started by Charles J. Clopper and Egon S. Pearson in 1934, who introduced the concept of binary decision processes [15], [16]. However, the modern implementation of decision trees in the context of ML started decades later. Breiman et al. [17] developed the CART algorithm in 1984, introducing concepts such as the Gini index and binary splitting, which are now widespread in decision tree designs. Quinlan [18] developed ID3, one of the first notable decision tree algorithms, in 1986. Furthermore, Quinlan [19] enhanced ID3, introducing the C4.5 decision tree in 1993. These developments and the integration of decision trees into ensemble methods like random forests and boosting algorithms have solidified their place as fundamental algorithms in machine learning.

The learning procedure of decision trees involves a series of steps where the data is split into homogenous subsets, as shown in Figure 1. The root node, which is the starting point of the tree, represents the entire dataset. The algorithm identifies the feature and the threshold that lead to the best split based on a specific criterion [20]. The process continues recursively, with each subset of the data being further split at each child node. This continues until a stopping criterion is reached, typically when the nodes are pure (i.e., all data points in a node belong to the same class) or when a predefined depth of the tree is reached. The nodes where the tree ends, called leaf nodes or terminal nodes, represent the outcomes or class labels. The decision to split at each node is made using mathematical formulations such as information gain, Gini impurity, or variance reduction.
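To make this learning procedure concrete, the short sketch below is an illustration added to this overview (it assumes the scikit-learn library and its bundled Iris dataset, neither of which appears in the original text); the maximum depth and minimum leaf size act as stopping criteria for the recursive splitting.

```python
# Illustrative only: fit a small classification tree to show recursive
# splitting and stopping criteria. Assumes scikit-learn and the Iris data.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="gini" selects splits by Gini impurity; max_depth and
# min_samples_leaf stop the recursion before the leaves become too small.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3,
                              min_samples_leaf=5, random_state=0)
tree.fit(X, y)

# Each internal node stores the chosen feature and threshold; each leaf
# stores the predicted class label.
print(export_text(tree, feature_names=["sepal_len", "sepal_wid",
                                       "petal_len", "petal_wid"]))
print(tree.predict(X[:3]))
```

The printed rules show, for each internal node, the feature and threshold selected at that split, mirroring the procedure described above.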
Furthermore, the success of decision tree techniques mainly depends on several factors contributing to their performance, interpretability, and applicability to a wide range of problems. These factors include data quality, tree depth, splitting criteria, and the tree pruning method. According to Piramuthu [21], the effectiveness of decision trees is highly dependent on the training data quality. Hence, it is necessary to use clean or preprocessed data free of missing values and outliers, which can significantly enhance the performance of the resulting models. Additionally, feature selection and feature engineering are necessary because inputting relevant and well-transformed features can lead to more efficient and accurate splits.

A. SPLITTING RULES
The term splitting criteria, or splitting rules, describes the methods used to determine where a tree should make a split in its nodes, effectively deciding how to divide the dataset into subsets based on different feature values.
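To illustrate what a splitting rule computes, the following minimal sketch (an added example assuming NumPy and a made-up one-dimensional toy dataset) scores each candidate threshold by the weighted Gini impurity of the two child nodes it would create, the quantity that CART's Gini criterion (Equation 1) seeks to minimise.

```python
# Minimal sketch of a splitting rule: score a candidate split by the
# weighted Gini impurity of the two child nodes it creates.
# Assumes NumPy; the toy feature values and labels are made up.
import numpy as np

def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(feature, labels, threshold):
    """Weighted Gini impurity after splitting on feature <= threshold."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

x = np.array([2.1, 3.5, 1.0, 4.2, 3.9, 0.5])
y = np.array([0, 1, 0, 1, 1, 0])
# The best threshold is the one that minimises the weighted child impurity.
best = min(np.unique(x)[:-1], key=lambda t: split_impurity(x, y, t))
print(best, split_impurity(x, y, best))
```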
2) Information Gain
Information Gain (IG), a criterion used in ID3 and
C4.5, is based on the notion of entropy in informa-
tion theory. Entropy measures the unpredictability
or randomness in a set of data [26]. The IG tech-
nique searches for a split that maximizes the dif-
ference in certainty or decreases uncertainty before
and after the split. It determines the effectiveness
of an attribute in splitting the training data into
homogenous sets. Meanwhile, the entropy (E) of a
set S is given by the formula:
E(S) = -\sum_{i=1}^{n} p_i \log_2(p_i) \quad (2)
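As a brief illustration of Equation 2 and of how the gain of a split is measured in practice, the sketch below (added for this overview; it assumes NumPy, takes p_i to be the proportion of instances of class i in S, and uses a made-up categorical attribute) computes the entropy of a label set and the information gain of splitting on that attribute.

```python
# Minimal sketch of Equation 2 and the information gain idea used by ID3
# and C4.5. Assumes NumPy; the toy attribute/label arrays are illustrative.
import numpy as np

def entropy(labels):
    """E(S) = -sum_i p_i * log2(p_i), with p_i the class proportions in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """Entropy of S minus the weighted entropy of the subsets induced by
    the attribute's values, i.e., the uncertainty removed by the split."""
    total = entropy(labels)
    weighted = 0.0
    for value in np.unique(attribute):
        subset = labels[attribute == value]
        weighted += (len(subset) / len(labels)) * entropy(subset)
    return total - weighted

outlook = np.array(["sunny", "sunny", "rain", "overcast", "rain"])
play    = np.array([0, 0, 1, 1, 1])
print(entropy(play))                    # uncertainty before the split
print(information_gain(outlook, play))  # reduction achieved by 'outlook'
```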
construct a decision tree. It begins with the entire dataset and divides it into subsets based on the attribute that maximizes the Information Gain (Equation 3), intending to efficiently classify the instances at each node of the tree. The ID3 procedure is described in Algorithm 1.

Algorithm 1 ID3 Decision Tree Algorithm
Require: Training data set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}
Ensure: Decision tree T.
1: function ID3(D)
2:   if D is empty then return a terminal node with default class c_default
3:   end if
4:   if all instances in D have the same class label y then return a terminal node with class y
5:   end if
6:   if the attribute set J is empty then return a terminal node with the prevalent class in D
7:   end if
8:   Select the feature f that best splits the data using information gain.
9:   Create a decision node for f.
10:  for each value b_i of f do
11:    Create a branch for b_i.
12:    Let D_i be the subset of D where f = b_i.
13:    Recursively build the subtree for D_i.
14:    Attach the subtree to the branch for b_i.
15:  end for
16:  return the decision node.
17: end function

The algorithm iterates through every unused attribute and calculates the Information Gain for a dataset split by the attribute's possible values. The attribute with the highest Information Gain is chosen to make the decision at the node, and the dataset is partitioned accordingly. This process is repeated recursively for each partitioned subset until one of the stopping criteria is met, such as when no further information can be gained, all instances in a subset belong to the same class, or there are no more attributes left to consider. Lastly, the ID3's limitations include its inability to directly handle continuous variables and its tendency to overfit.
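As a concrete companion to Algorithm 1, the compact sketch below is an added illustration rather than the original implementation: it uses only the Python standard library, handles categorical attributes only, omits pruning, and runs on a made-up toy dataset, building a nested-dictionary tree by recursively choosing the attribute with the highest information gain.

```python
# Compact, illustrative ID3 sketch following the structure of Algorithm 1.
from collections import Counter
from math import log2

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def info_gain(rows, attr, target):
    n = len(rows)
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        remainder += (len(subset) / n) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attrs, target, default=None):
    if not rows:                                   # empty data: default class
        return default
    classes = {r[target] for r in rows}
    if len(classes) == 1:                          # pure node: terminal
        return classes.pop()
    majority = Counter(r[target] for r in rows).most_common(1)[0][0]
    if not attrs:                                  # no attributes left
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    node = {best: {}}
    for v in {r[best] for r in rows}:              # one branch per value
        subset = [r for r in rows if r[best] == v]
        rest = [a for a in attrs if a != best]
        node[best][v] = id3(subset, rest, target, majority)
    return node

data = [
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
print(id3(data, ["outlook", "windy"], "play"))
```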
B. C4.5 AND C5.0
Quinlan [19] proposed C4.5 in 1993 as an extension of the ID3 algorithm; it is designed to handle both continuous and discrete attributes. It introduces the concept of the information gain ratio, described in Equation 4, to select the best attribute to split the dataset at each node, aiming to overcome the bias towards attributes with more levels found in the original Information Gain criterion used by ID3. C5.0 is an improvement over C4.5, also proposed by Quinlan [42], designed to be faster and more memory efficient. It introduces several enhancements, such as advanced pruning methods and the ability to handle more complex types of data. C5.0 maintains the use of the information gain ratio for selecting attributes but optimises the algorithm's execution and the resulting decision tree's size.

C. CLASSIFICATION AND REGRESSION TREES
The CART decision tree was proposed in 1984 by Breiman et al. [43]. Unlike C4.5, CART creates binary trees irrespective of the type of target variables. It uses different splitting criteria for classification and regression tasks. For classification tasks, it uses the Gini index (Equation 1) as a measure to create splits [44], [45]. Meanwhile, it employs variance as the splitting criterion in regression tasks [46], [47]. The variance reduction for a set S when split on attribute A is calculated as:

VR = V(S) - \left[ \frac{|S_{left}|}{|S|} V(S_{left}) + \frac{|S_{right}|}{|S|} V(S_{right}) \right] \quad (6)

where V(S) is the variance of the target variable in set S, and S_{left} and S_{right} are the subsets of S after the split on attribute A. In both cases, the goal is to choose the split that maximizes the respective measure (Gini impurity reduction for classification and variance reduction for regression), leading to the most homogenous subsets possible. The CART algorithm is described in Algorithm 2.

D. CHI-SQUARED AUTOMATIC INTERACTION DETECTION
The CHAID algorithm, developed by Kass [48], performs multi-level splits when computing classification trees. It is particularly robust in the detection of interaction between variables. CHAID can handle more than two categories for each variable, and it uses the Chi-Square (χ²) test of independence as its splitting criterion [49], [50]. This statistical test is applied to assess the relationship between categorical variables. For a given attribute A with different categories and a target class C, the χ² statistic is computed as:

\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

where O_{ij} and E_{ij} are the observed and expected frequencies of attribute category i and class j.
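To illustrate how the χ² test of independence can rank candidate attributes in a CHAID-style split, the sketch below (an added example assuming SciPy and a made-up contingency table of attribute categories against target classes) computes the statistic and its p-value; the attribute with the most significant association would be preferred for the split.

```python
# Illustrative CHAID-style use of the chi-square test of independence.
# Assumes SciPy; the contingency table below is a made-up example of one
# attribute's categories (rows) versus the target classes (columns).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [30, 10],
    [12, 28],
    [20, 20],
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)   # a larger chi2 (smaller p) indicates a stronger
                       # association, i.e., a more attractive split
```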
overfitting. Each tree in the forest is built from a sample drawn with replacement (i.e., a bootstrap sample) from the input data [56]. The basic idea behind this algorithm is to generate a set of trees using different subsets of the input samples and features and then combine their outputs to obtain a final prediction. The Random Forest algorithm uses two main techniques to reduce overfitting and improve accuracy:

• Bootstrap Sampling: By sampling the data with replacement, the algorithm generates multiple training sets that are slightly different from each other. This type of sampling reduces variance and helps prevent overfitting.
• Feature Randomization: Randomly selecting a subset of features for each tree decorrelates the trees and reduces the chance of selecting the same "best" feature for every tree. This improves the diversity and accuracy of the trees.

Algorithm 4 Random Forest Algorithm
1: for t = 1 to T do ▷ Generate T trees
2:   Randomly sample n instances from D with replacement
3:   Randomly select m attributes from the total p attributes (where m << p)
4:   Build a decision tree h_t based on the sampled instances and attributes
5: end for
6: To make predictions for a new instance x:
7: if classification task then
8:   f(x) = \arg\max_c \frac{1}{T} \sum_{t=1}^{T} I\{h_t(x) = c\} ▷ Majority vote across trees
9: else if regression task then
10:  f(x) = \frac{1}{T} \sum_{t=1}^{T} h_t(x) ▷ Average of tree predictions
11: end if
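A minimal sketch of Algorithm 4 follows. It is an added illustration that assumes NumPy, scikit-learn, and the Iris dataset, and it samples one feature subset per tree for simplicity, whereas practical implementations such as scikit-learn's RandomForestClassifier re-sample candidate features at every split.

```python
# Minimal sketch of Algorithm 4: bootstrap sampling plus per-tree feature
# subsets, with a majority vote at prediction time.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
T, m = 25, 2                     # number of trees, attributes per tree
trees, feature_sets = [], []

for _ in range(T):
    rows = rng.integers(0, len(X), size=len(X))            # bootstrap sample
    cols = rng.choice(X.shape[1], size=m, replace=False)   # m << p features
    tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
    trees.append(tree)
    feature_sets.append(cols)

def predict(x):
    votes = [t.predict(x[f].reshape(1, -1))[0] for t, f in zip(trees, feature_sets)]
    return np.bincount(votes).argmax()                      # majority vote

print(predict(X[0]), y[0])
```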
G. GRADIENT BOOSTED DECISION TREES
Gradient Boosted Decision Trees (GBDT) is an ensemble learning method that combines multiple decision trees to create a powerful predictive model [57]. Unlike Random Forest, which builds independent trees in parallel, GBDT uses a sequential approach to build trees that correct the errors of the previous trees [58], [59]. It uses gradient descent to minimize errors. Assuming T is the number of trees, h_t(x) is the prediction of the t-th tree, F_{t-1}(x) is the current model's prediction for x, and L(y, F_{t-1}(x)) is the loss function, the GBDT algorithm works as follows:

1) Initialize the model with a constant value (e.g., the mean of the target variable).
2) For t = 1 to T:
   a) Compute the negative gradient of the loss function with respect to the current model's predictions for each instance in the training data.
   b) Fit a decision tree to the negative gradient values, using the input data as features and the negative gradient values as target variables.
   c) Update the model by adding the new tree, weighted by a learning rate η, to the current model.
3) Make a prediction for a new instance by summing the predictions from the various trees:
   a) For a regression task, the final prediction is the sum of the predictions of all the trees, i.e., f(x) is given by:

      f(x) = \sum_{t=1}^{T} \eta h_t(x) \quad (8)

      where η is the learning rate.
   b) For a classification task, the final prediction is the probability of the positive class, computed by applying a sigmoid function to the sum of the predictions of all the trees:

      f(x) = \frac{1}{1 + e^{-\sum_{t=1}^{T} \eta h_t(x)}} \quad (9)

      where η is the learning rate and e is Euler's number.
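The following sketch, added for illustration, follows these steps for a regression task with the squared-error loss, whose negative gradient is simply the residual y - F(x). It assumes NumPy, scikit-learn, and a synthetic dataset, and, unlike Equation 8, it keeps the constant initialisation from step 1 as part of the final prediction.

```python
# Minimal gradient-boosting sketch for regression with squared error.
# Libraries such as XGBoost or LightGBM implement this far more efficiently.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta, T = 0.1, 100
F = np.full_like(y, y.mean())        # step 1: initialise with a constant
trees = []

for _ in range(T):                   # step 2: sequentially fit the residuals
    residual = y - F                 # negative gradient of the squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    F += eta * tree.predict(X)       # update the model with a shrunk tree
    trees.append(tree)

def predict(X_new):                  # step 3: constant plus weighted trees
    return y.mean() + eta * sum(t.predict(X_new) for t in trees)

print(float(np.mean((predict(X) - y) ** 2)))   # training mean squared error
```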
H. ROTATION FOREST
Rotation forest is a type of decision tree ensemble where each tree is trained on the principal components of a randomly selected subset of features [60], [61]. The core idea behind this algorithm is to train each classifier in the ensemble on a version of the training data that has been transformed to maintain the correlation between the features and introduce diversity among the classifiers. This is achieved through the following steps:

1) For each classifier to be trained, partition the set of features F into k subsets. The partitioning can be random but is done in such a way that each subset contains a different part of the features.
2) For each subset of features, apply PCA to obtain the principal components. This step
transforms the original feature space into a new space that captures the variance in the data more effectively.
3) Combine the principal components from all subsets to form a new set of features for training the classifier. This effectively rotates the axis of the feature space, hence the name Rotation Forest.
4) Train each base classifier on the transformed dataset. Different classifiers can be used, but decision trees are commonly applied.

Given a dataset D with n features, the algorithm partitions the feature set F into k non-overlapping subsets F_1, F_2, ..., F_k. For each subset F_i, PCA is applied to derive a set of principal components PC_i, capturing the main variance directions of the features in F_i. The transformation for a subset F_i can be represented as:

T_i = PCA(F_i) \quad (10)

where T_i is the transformation matrix obtained from PCA on subset F_i. The new feature set for training the j-th classifier, D_j, is obtained by applying the transformation T_i to each subset F_i and concatenating the results:

D_j = \bigoplus_{i=1}^{k} T_i(F_i) \quad (11)

where \bigoplus denotes the concatenation of the transformed feature subsets. The ensemble's final output is typically the majority vote (for classification tasks) of the predictions from all base classifiers.
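As an added illustration of this transformation (Equations 10 and 11), the sketch below assumes NumPy, scikit-learn, and the Iris dataset, and trains only a single base classifier on the rotated features; a full rotation forest would repeat the procedure with a different random partition for each tree and combine the predictions by majority vote.

```python
# Minimal sketch of the rotation-forest transformation: partition the
# features, rotate each subset with PCA, concatenate, and train one tree.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

k = 2                                            # number of feature subsets
perm = rng.permutation(X.shape[1])
subsets = np.array_split(perm, k)                # F_1, ..., F_k

rotated_parts = []
for cols in subsets:
    pca = PCA().fit(X[:, cols])                  # T_i = PCA(F_i)
    rotated_parts.append(pca.transform(X[:, cols]))
X_rot = np.hstack(rotated_parts)                 # concatenate T_i(F_i)

clf = DecisionTreeClassifier(random_state=0).fit(X_rot, y)
print(clf.score(X_rot, y))
```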
A summary of the different tree-based algorithms is tabulated in Table 1, including their advantages and disadvantages.

IV. DECISION TREE APPLICATIONS IN RECENT LITERATURE
Decision trees have gained significant attention in recent literature. This section discusses some popular applications of decision trees in fields such as healthcare and finance.

A. MEDICAL DIAGNOSIS
Healthcare is one of the prominent areas where decision trees have found extensive use. Researchers have utilized decision trees to predict disease diagnosis, treatment outcomes, and patient prognosis. Decision trees are effective in identifying patterns and relationships in medical data, leading to more accurate diagnoses and personalized treatment plans. For example, decision trees have been used to predict the likelihood of a patient developing a specific disease based on their medical history and lifestyle factors [11], [62], [63]. This information can then be used to implement preventive measures and interventions, ultimately improving patient outcomes and reducing healthcare costs.

Pathak et al. [64] proposed a heart disease prediction model using a decision tree. The model was built using a fuzzy rule-based technique combined with a decision tree, achieving an accuracy of 88% when trained on the Cleveland heart disease dataset obtained from the University of California Irvine (UCI) machine learning repository. Similarly, Maji and Arora [65] conducted a study on heart disease prediction using a different dataset from the UCI machine learning repository. The study employed the C4.5 decision tree and a hybrid decision tree made of C4.5 and an artificial neural network (ANN), where the former achieved an accuracy of 76.66% and the latter 78.14%. The study demonstrated the robustness of hybridising decision trees with neural networks.

Ahmad et al. [66] studied the performance of several algorithms using different heart disease datasets, including Cleveland, Switzerland, and Long Beach. The algorithms studied include random forest, decision tree, support vector machine (SVM), k-nearest neighbor (KNN), linear discriminant analysis, and gradient boosting classifier. The study employed sequential feature selection (SFS) to obtain the most significant features, which were then used to train the models. The study concluded that the random forest-SFS and decision tree-SFS achieved the best accuracy. For the Cleveland dataset, the random forest and decision tree obtained accuracies of 100%.

In [67], the authors identified the C4.5 and random forest as potentially robust algorithms for detecting chronic kidney disease (CKD) stages. The study employed a CKD dataset from the UCI machine learning repository, comprising 25 features and 400 samples. The results indicated that the C4.5 achieved an accuracy of 85.5%, outperforming the random forest, which achieved an accuracy of 78.25%.

Decision tree-based methods have also been employed to diagnose COVID-19. Yoo et al. [66] proposed a deep learning-based decision tree model to detect COVID-19 using chest X-ray images. The approach consists of three decision trees trained using deep learning architectures, including a convolutional neural network (CNN). One tree classifies the images as normal or abnormal, another tree
detects tuberculosis indicators in the abnormal images, and the last detects COVID-19. The approach achieved an average accuracy of 95%. Ghiasi and Zendehboudi [68] proposed a decision tree-based ensemble classifier for detecting breast cancer. The study used the well-known Wisconsin Breast Cancer dataset and aimed to build a robust breast cancer detection framework using the random forest and extra trees classifier (ET). The approach resulted in an accuracy of 100%.

Mienye and Sun [69] studied the performance of ML algorithms for heart disease prediction. The study utilized the following algorithms: decision tree, XGBoost, random forest, logistic regression, and naive Bayes. Firstly, the authors employed the Synthetic Minority Oversampling Technique-Edited Nearest Neighbor (SMOTE-ENN) to resample the data and solve the class imbalance problem. Also, the recursive feature elimination technique was employed to identify the most significant attributes to further enhance the classification performance of the models. The results showed that the decision tree, random forest, and XGBoost achieved an accuracy of 87.7%, 93%, and 95.6%, respectively, with the XGBoost obtaining the highest accuracy.

Meanwhile, Adler et al. [70] developed a glaucoma detection method using the random forest ensemble classifier. The study evaluated the performance of ensemble pruning on the imbalanced glaucoma dataset. The ensemble pruning techniques include pruning by prediction accuracy (using the Brier Score strategy), pruning by uncertainty-weighted accuracy (UWA), and pruning by diversity (using the Double-Fault measure). The experimental results indicated that the RF model reached an area under the receiver operating characteristic curve (AUC) of 0.98 for the Brier and double-fault pruning techniques.

Additionally, Mienye et al. [71] employed decision tree, SVM, and logistic regression for CKD detection. The selected algorithms were also used as the base learners in the AdaBoost ensemble. The study reported accuracies of 94% and 100% for the decision tree and the AdaBoost classifier that used a decision tree as a base learner. The study demonstrated the robustness of using a decision tree in the AdaBoost over the SVM and logistic regression. Furthermore, Mienye and Sun [72] studied the impact of cost-sensitive ML in medical diagnosis using the following algorithms: decision tree, random forest, and XGBoost. Cost-sensitive learning involves modifying the algorithm to focus on the minority class samples, thereby enhancing the model's performance on the minority class, which in most ap-
V. DISCUSSIONS AND FUTURE RESEARCH DIRECTIONS
Decision trees have proven to be effective in various domains, including healthcare and finance. However, like any other algorithm, decision trees have their limitations and areas for improvement. In this section, we will explore some potential future research directions in decision trees that can enhance their performance and address their limitations.

Firstly, the handling of missing data is a crucial area of potential improvement for decision trees. Currently, decision trees either ignore instances with missing values or use surrogate splits to make predictions [86], [87]. However, these approaches may not always be optimal and can lead to biased or inaccurate results. Future research could focus on developing more sophisticated methods to handle missing data in decision trees, such as advanced imputation techniques or incorporating uncertainty estimation.

Another future research direction will be enhancing the ability of decision trees to handle high-dimensional data [88]–[90]. Decision trees can struggle when faced with datasets that have a large number of features, as the tree structure becomes complex and prone to overfitting. Future research could explore techniques to improve the scalability and efficiency of decision trees in high-dimensional settings, such as feature selection methods or dimensionality reduction techniques.

Furthermore, while decision trees are known for their interpretability compared to other machine learning algorithms, they can still be difficult to understand and explain, especially when they become large and complex. Future research could investigate methods to simplify decision trees and make them more understandable to non-experts, such as rule extraction algorithms or visualisation techniques. Additionally, decision trees are sensitive to outliers and can easily be influenced by noisy data, leading to inaccurate predictions [91]. It might be worth examining the robustness of decision trees to outliers and noisy data and exploring methods to make decision trees more robust to outliers and noise, such as outlier detection techniques or robust splitting criteria.

Lastly, the application of decision trees in emerging fields and domains is a potential future re-
search direction. Decision trees have been extensively studied and applied in traditional domains such as healthcare, finance, and marketing. However, there are numerous emerging fields where decision trees can potentially make a significant impact. For example, decision trees could be applied in the field of autonomous vehicles to aid in decision-making processes or in the field of natural language processing to improve sentiment analysis and text classification tasks. Future research could explore the potential applications of decision trees in these emerging fields and investigate their effectiveness in solving complex problems.

VI. CONCLUSION
Decision trees have shown great potential and effectiveness in various fields. Their ability to analyse complex data and identify patterns and relationships makes them valuable in the field of machine learning. This paper presented an overview of decision trees, including their early development to the recent high-performing tree-based ensemble methods. The article covers the main decision tree algorithms, such as CART, ID3, C4.5, C5.0, CHAID, and conditional inference trees. Their applications in medical diagnosis, credit risk, and fraud detection were reviewed. This study will be beneficial to ML practitioners and researchers trying to understand decision trees and the widely used tree-based algorithms.

REFERENCES
[1] J. G. Richens, C. M. Lee, and S. Johri, “Improving the accuracy of medical diagnosis with causal machine learning,” Nature Communications, vol. 11, 8 2020.
[2] G. Obaido, F. J. Agbo, C. Alvarado, and S. S. Oyelere, “Analysis of attrition studies within the computer sciences,” IEEE Access, 2023.
[3] S. Ahmed, M. M. Alshater, A. E. Ammari, and H. Hammami, “Artificial intelligence and machine learning in finance: A bibliometric review,” Research in International Business and Finance, vol. 61, p. 101646, 10 2022.
[4] G. Obaido, B. Ogbuokiri, C. W. Chukwu, F. J. Osaye, O. F. Egbelowo, M. I. Uzochukwu, I. D. Mienye, K. Aruleba, M. Primus, and O. Achilonu, “An improved ensemble method for predicting hyperchloremia in adults with diabetic ketoacidosis,” IEEE Access, vol. 12, pp. 9536–9549, 2024.
[5] C. Wang, J. Xu, S. Tan, and L. Yin, “Secure decision tree classification with decentralized authorization and access control,” Computer Standards & Interfaces, vol. 89, p. 103818, 4 2024.
[6] M. M. Rahman and S. A. Nisher, “Predicting average localization error of underwater wireless sensors via decision tree regression and gradient boosted regression,” in Proceedings of International Conference on Information and Communication Technology for Development, pp. 29–41, Springer Nature Singapore, 2023.
[7] T. O’Halloran, G. Obaido, B. Otegbade, and I. D. Mienye, “A deep learning approach for maize lethal necrosis and maize streak virus disease detection,” Machine Learning with Applications, p. 100556, 2024.
[8] R. Rivera-Lopez, J. Canul-Reich, E. Mezura-Montes, and M. A. Cruz-Chávez, “Induction of decision trees as classification models through metaheuristics,” Swarm and Evolutionary Computation, vol. 69, p. 101006, 3 2022.
[9] O. Sagi and L. Rokach, “Explainable decision forest: Transforming a decision forest into an interpretable tree,” Information Fusion, vol. 61, pp. 124–138, 9 2020.
[10] L.-a. Dong, X. Ye, and G. Yang, “Two-stage rule extraction method based on tree ensemble model for interpretable loan evaluation,” Information Sciences, vol. 573, pp. 46–64, 9 2021.
[11] D. Che, Q. Liu, K. Rasheed, and X. Tao, “Decision tree and ensemble learning algorithms with their applications in bioinformatics,” in Advances in Experimental Medicine and Biology, pp. 191–199, Springer New York, 2011.
[12] L. Cañete-Sifuentes, R. Monroy, and M. A. Medina-Perez, “A review and experimental comparison of multivariate decision trees,” IEEE Access, vol. 9, pp. 110451–110479, 2021.
[13] Anuradha and G. Gupta, “A self explanatory review of decision tree classifiers,” 2014 Recent Advances and Innovations in Engineering (ICRAIE), IEEE, 5 2014.
[14] V. G. Costa and C. E. Pedreira, “Recent advances in decision trees: an updated survey,” Artificial Intelligence Review, vol. 56, pp. 4765–4800, 10 2022.
[15] C. Gupta and A. Ramdas, “Distribution-free calibration guarantees for histogram binning without sample splitting,” in International Conference on Machine Learning, pp. 3942–3952, PMLR, 2021.
[16] F. Mazurek, A. Tschand, Y. Wang, M. Pajic, and D. Sorin, “Rigorous evaluation of computer processors with statistical model checking,” MICRO ’23: 56th Annual IEEE/ACM International Symposium on Microarchitecture, ACM, 10 2023.
[17] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, pp. 123–140, 8 1996.
[18] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, pp. 81–106, 3 1986.
[19] J. R. Quinlan, C4.5: Programs for Machine Learning. Elsevier, 2014.
[20] I. D. Mienye, Y. Sun, and Z. Wang, “Prediction performance of improved decision tree-based algorithms: a review,” Procedia Manufacturing, vol. 35, pp. 698–703, 2019.
[21] S. Piramuthu, “Input data for decision trees,” Expert Systems with Applications, vol. 34, pp. 1220–1226, 2 2008.
[22] S. Hwang, H. G. Yeo, and J.-S. Hong, “A new splitting criterion for better interpretable trees,” IEEE Access, vol. 8, pp. 62762–62774, 2020.
[23] J.-S. Hong, J. Lee, and M. K. Sim, “Concise rule induction algorithm based on one-sided maximum decision tree approach,” Expert Systems with Applications, vol. 237, p. 121365, 2024.
[24] D. Bertsimas and J. Dunn, “Optimal classification trees,” Machine Learning, vol. 106, pp. 1039–1082, 4 2017.
[25] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, “The CART decision tree for mining data streams,” Information Sciences, vol. 266, pp. 1–15, 5 2014.
[26] C. J. Mantas, J. Abellán, and J. G. Castellano, “Analysis of credal-C4.5 for classification in noisy domains,” Expert Systems with Applications, vol. 61, pp. 314–326, 11 2016.
[27] G. S. Reddy and S. Chittineni, “Entropy based C4.5-SHO algorithm with information gain optimization in data mining,” PeerJ Computer Science, vol. 7, p. e424, 4 2021.
[28] N. Peker and C. Kubat, “Application of chi-square discretization algorithms to ensemble classification methods,” Expert Systems with Applications, vol. 185, p. 115540, 12 2021.
[29] L. A. Badulescu, “A chi-square based splitting criterion better for the decision tree algorithms,” 2021 25th International Conference on System Theory, Control and Computing (ICSTCC), IEEE, 10 2021.
[30] F. Mahan, M. Mohammadzad, S. M. Rozekhani, and W. Pedrycz, “Chi-MFlexDT: chi-square-based multi flexible fuzzy decision tree for data stream classification,” Applied Soft Computing, vol. 105, p. 107301, 7 2021.
[31] F. J. Mehedi Shamrat, S. Chakraborty, M. M. Billah, P. Das, J. N. Muna, and R. Ranjan, “A comprehensive study on pre-pruning and post-pruning methods of decision tree classification algorithm,” in 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1339–1345, 2021.
[32] Y. Manzali and P. M. E. Far, “A new decision tree pre-pruning method based on nodes probabilities,” in 2022 International Conference on Intelligent Systems and Computer Vision (ISCV), pp. 1–5, 2022.
[33] S. Trabelsi, Z. Elouedi, and K. Mellouli, “Pruning belief decision tree methods in averaging and conjunctive approaches,” International Journal of Approximate Reasoning, vol. 46, pp. 568–595, 12 2007.
[34] T. Lazebnik and S. Bunimovich-Mendrazitsky, “Decision tree post-pruning without loss of accuracy using the SAT-PP algorithm with an empirical evaluation on clinical data,” Data & Knowledge Engineering, vol. 145, p. 102173, 2023.
[35] E. Frantar and D. Alistarh, “SparseGPT: Massive language models can be accurately pruned in one-shot,” in Proceedings of the 40th International Conference on Machine Learning (A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, eds.), vol. 202 of Proceedings of Machine Learning Research, pp. 10323–10337, PMLR, 23–29 Jul 2023.
[36] B. Mahbooba, M. Timilsina, R. Sahal, and M. Serrano, “Explainable artificial intelligence (XAI) to enhance trust management in intrusion detection systems using decision tree model,” Complexity, vol. 2021, pp. 1–11, 2021.
[37] S. J. Oh, B. Schiele, and M. Fritz, “Towards reverse-engineering black-box neural networks,” Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, pp. 121–144, 2019.
[38] E. Zihni, V. I. Madai, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey, “Opening the black box of artificial intelligence for clinical decision support: A study predicting stroke outcome,” PLoS One, vol. 15, no. 4, p. e0231166, 2020.
[39] C. Strobl, A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis, “Conditional variable importance for random forests,” BMC Bioinformatics, vol. 9, pp. 1–11, 2008.
[40] S. Syed Mustapha, “Predictive analysis of students’ learning performance using data mining techniques: A comparative study of feature selection methods,” Applied System Innovation, vol. 6, no. 5, p. 86, 2023.
[41] S. Ben Jabeur, N. Stef, and P. Carmona, “Bankruptcy prediction using the XGBoost algorithm and variable importance feature engineering,” Computational Economics, vol. 61, no. 2, pp. 715–741, 2023.
[42] J. R. Quinlan, “Data mining tools See5 and C5.0,” http://www.rulequest.com/see5-info.html, 2004.
[43] L. Breiman, Classification and Regression Trees. Routledge, 2017.
[44] M.-M. Chen and M.-C. Chen, “Modeling road accident severity with comparisons of logistic regression, decision tree and random forest,” Information, vol. 11, no. 5, 2020.
[45] D.-H. Lee, S.-H. Kim, and K.-J. Kim, “Multistage MR-CART: Multiresponse optimization in a multistage process using a classification and regression tree method,” Computers & Industrial Engineering, vol. 159, p. 107513, 9 2021.
[46] E. Belli and S. Vantini, “Measure inducing classification and regression trees for functional data,” Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 15, no. 5, pp. 553–569, 2022.
[47] H. Ishwaran, “The effect of splitting on random forests,” Machine Learning, vol. 99, pp. 75–118, 2015.
[48] G. V. Kass, “An exploratory technique for investigating large quantities of categorical data,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 29, no. 2, pp. 119–127, 1980.
[49] S. Kushiro, S. Fukui, A. Inui, D. Kobayashi, M. Saita, and T. Naito, “Clinical prediction rule for bacterial arthritis: Chi-squared automatic interaction detector decision tree analysis model,” SAGE Open Medicine, vol. 11, p. 205031212311609, 1 2023.
[50] H. Prasetyono, A. Abdillah, T. Anita, A. Nurfarkhana, and A. Sefudin, “Identification of the decline in learning outcomes in statistics courses using the chi-squared automatic interaction detection method,” Journal of Physics: Conference Series, vol. 1490, p. 012072, 3 2020.
[51] T. Hothorn, K. Hornik, and A. Zeileis, “Unbiased recursive partitioning: A conditional inference framework,” Journal of Computational and Graphical Statistics, vol. 15, no. 3, pp. 651–674, 2006.
[52] N. Levshina, “Conditional inference trees and random forests,” in A Practical Handbook of Corpus Linguistics, pp. 611–643, Springer International Publishing, 2020.
[53] B. Schivinski, “Eliciting brand-related social media engagement: A conditional inference tree framework,” Journal of Business Research, vol. 130, pp. 594–602, 6 2021.
[54] N. Younas, A. Ali, H. Hina, M. Hamraz, Z. Khan, and S. Aldahmani, “Optimal causal decision trees ensemble for improved prediction and causal inference,” IEEE Access, vol. 10, pp. 13000–13011, 2022.
[55] Z. Khan, A. Gul, O. Mahmoud, M. Miftahuddin, A. Perperoglou, W. Adler, and B. Lausen, “An ensemble of optimal trees for class membership probability estimation,” in Analysis of Large and Complex Data, pp. 395–409, Springer, 2016.
[56] I. D. Mienye and Y. Sun, “A survey of ensemble learning: Concepts, algorithms, applications, and prospects,” IEEE Access, vol. 10, pp. 99129–99149, 2022.
[57] Z. Zhang and C. Jung, “GBDT-MO: Gradient-boosted decision trees for multiple outputs,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 7, pp. 3156–3167, 2021.
[58] M.-J. Jun, “A comparison of a gradient boosting decision tree, random forests, and artificial neural networks to model urban land use changes: the case of the Seoul metropolitan area,” International Journal of Geographical Information Science, vol. 35, pp. 2149–2167, 3 2021.
[59] V. A. Dev and M. R. Eden, “Formation lithology classification using scalable gradient boosted decision trees,” Computers & Chemical Engineering, vol. 128, pp. 392–404, 2019.
[60] S. Demir and E. K. Sahin, “Comparison of tree-based machine learning algorithms for predicting liquefaction potential using canonical correlation forest, rotation forest, and random forest based on CPT data,” Soil Dynamics and Earthquake Engineering, vol. 154, p. 107130, 3 2022.
[61] E. K. Sahin, I. Colkesen, and T. Kavzoglu, “A comparative assessment of canonical correlation forest, random forest, rotation forest and logistic regression methods for landslide susceptibility mapping,” Geocarto International, vol. 35, pp. 341–363, 10 2018.
[62] F. L. Seixas, B. Zadrozny, J. Laks, A. Conci, and D. C. M. Saade, “A Bayesian network decision model for supporting the diagnosis of dementia, Alzheimer’s disease and mild cognitive impairment,” Computers in Biology and Medicine, vol. 51, pp. 140–158, 2014.
[63] G. Obaido, B. Ogbuokiri, I. D. Mienye, and S. M. Kasongo, “A voting classifier for mortality prediction post-thoracic surgery,” in International Conference on Intelligent Systems Design and Applications, pp. 263–272, Springer, 2022.
[64] A. K. Pathak and J. Arul Valan, “A predictive model for heart disease diagnosis using fuzzy logic and decision tree,” in Advances in Intelligent Systems and Computing, pp. 131–140, Springer Singapore, 12 2019.
[65] S. Maji and S. Arora, “Decision tree algorithms for prediction of heart disease,” in Information and Communication Technology for Competitive Strategies, pp. 447–454, Springer Singapore, 8 2018.
[66] G. N. Ahmad, S. Ullah, A. Algethami, H. Fatima, and S. M. H. Akhter, “Comparative study of optimum medical diagnosis of human heart disease using machine learning technique with and without sequential feature selection,” IEEE Access, vol. 10, pp. 23808–23828, 2022.
[67] H. Ilyas, S. Ali, M. Ponum, O. Hasan, M. T. Mahmood, M. Iftikhar, and M. H. Malik, “Chronic kidney disease diagnosis using decision tree algorithms,” BMC Nephrology, vol. 22, 8 2021.
[68] M. M. Ghiasi and S. Zendehboudi, “Application of decision tree-based ensemble learning in the classification of breast cancer,” Computers in Biology and Medicine, vol. 128, p. 104089, 1 2021.
[69] I. D. Mienye and Y. Sun, “Effective feature selection for improved prediction of heart disease,” in Pan-African Artificial Intelligence and Smart Systems (T. M. N. Ngatched and I. Woungang, eds.), (Cham), pp. 94–107, Springer International Publishing, 2022.
[70] W. Adler, O. Gefeller, A. Gul, F. K. Horn, Z. Khan, and B. Lausen, “Ensemble pruning for glaucoma detection in an unbalanced data set,” Methods of Information in Medicine, vol. 55, no. 06, pp. 557–563, 2016.
[71] I. D. Mienye, G. Obaido, K. Aruleba, and O. A. Dada, “Enhanced prediction of chronic kidney disease using feature selection and boosted classifiers,” in International Conference on Intelligent Systems Design and Applications, pp. 527–537, Springer, 2021.
[72] I. D. Mienye and Y. Sun, “Performance analysis of cost-sensitive learning methods with application to imbalanced medical data,” Informatics in Medicine Unlocked, vol. 25, p. 100690, 2021.
[73] Z. Khan, A. Gul, A. Perperoglou, M. Miftahuddin, O. Mahmoud, W. Adler, and B. Lausen, “Ensemble of optimal trees, random forest and random projection ensemble classification,” Advances in Data Analysis and Classification, vol. 14, pp. 97–116, 6 2019.
[74] V. García, A. I. Marqués, and J. S. Sánchez, “Exploring the synergetic effects of sample types on the performance of ensembles for credit risk and corporate bankruptcy prediction,” Information Fusion, vol. 47, pp. 88–101, 2019.
[75] N. Arora and P. D. Kaur, “A Bolasso based consistent feature selection enabled random forest classification algorithm: An application to credit risk assessment,” Applied Soft Computing, vol. 86, p. 105936, 2020.
[76] “Enterprise credit risk prediction using supply chain information: A decision tree ensemble model based on the differential sampling rate, synthetic minority oversampling technique and AdaBoost,” Expert Systems, no. 6.
[77] W. Liu, H. Fan, and M. Xia, “Credit scoring based on tree-enhanced gradient boosting decision trees,” Expert Systems with Applications, vol. 189, p. 116034, 2022.
[78] T. M. Alam, K. Shaukat, I. A. Hameed, S. Luo, M. U. Sarwar, S. Shabbir, J. Li, and M. Khushi, “An investigation of credit card default prediction in the imbalanced datasets,” IEEE Access, vol. 8, pp. 201173–201198, 2020.
[79] J. T. Hancock and T. M. Khoshgoftaar, “Gradient boosted decision tree algorithms for medicare fraud detection,” SN Computer Science, vol. 2, 5 2021.
[80] Y. Wang, Y. Zhang, Y. Lu, and X. Yu, “A comparative assessment of credit risk model based on machine learning: a case study of bank loan data,” Procedia Computer Science, vol. 174 (2019 International Conference on Identification, Information and Knowledge in the Internet of Things), pp. 141–149, 2020.
[81] M. Seera, C. P. Lim, A. Kumar, L. Dhamotharan, and K. H. Tan, “An intelligent payment card fraud detection system,” Annals of Operations Research, vol. 334, pp. 445–467, 6 2021.
[82] A. Rawat, S. S. Aswal, S. Gupta, A. P. Singh, S. P. Singh, and K. C. Purohit, “Performance analysis of algorithms for credit card fraud detection,” in 2024 2nd International Conference on Disruptive Technologies (ICDT), pp. 567–570, 2024.
[83] V. R. Adhegaonkar, A. R. Thakur, and N. Varghese, “Advancing credit card fraud detection through explainable machine learning methods,” in 2024 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), pp. 792–796, 2024.
[84] A. H. Nadim, I. M. Sayem, A. Mutsuddy, and M. S. Chowdhury, “Analysis of machine learning techniques for credit card fraud detection,” in 2019 International Conference on Machine Learning and Data Engineering (iCMLDE), pp. 42–47, 2019.
[85] S. Makki, Z. Assaghir, Y. Taher, R. Haque, M.-S. Hacid, and H. Zeineddine, “An experimental study with imbalanced classification approaches for credit card fraud detection,” IEEE Access, vol. 7, pp. 93010–93022, 2019.
[86] S. Nijman, A. Leeuwenberg, I. Beekers, I. Verkouter, J. Jacobs, M. Bots, F. Asselbergs, K. Moons, and T. Debray, “Missing data is poorly handled and reported in prediction model studies using machine learning: a literature review,” Journal of Clinical Epidemiology, vol. 142, pp. 218–229, 2022.
[87] R. V. McCarthy, M. M. McCarthy, W. Ceccucci, and L. Halawi, Predictive Models Using Decision Trees, pp. 123–144. Cham: Springer International Publishing, 2019.
[88] A. Mhasawade, G. Rawal, P. Roje, R. Raut, and A. Devkar, “Comparative study of SVM, KNN and decision tree for diabetic retinopathy detection,” in 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), pp. 166–170, 2023.
[89] T. Wang, R. Gault, and D. Greer, “Cutting down high dimensional data with fuzzy weighted forests (FWF),” in 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–8, 2022.
[90] Z. Azam, M. M. Islam, and M. N. Huda, “Comparative analysis of intrusion detection systems and machine learning-based model analysis through decision tree,” IEEE Access, vol. 11, pp. 80348–80391, 2023.
[91] Y. Xia, “A novel reject inference model using outlier detection and gradient boosting technique in peer-to-peer lending,” IEEE Access, vol. 7, pp. 92893–92907, 2019.

IBOMOIYE DOMOR MIENYE received the B.Eng. degree in electrical and electronic engineering and the M.Sc. degree (cum laude) in computer systems engineering from the University of East London, in 2012 and 2014, respectively, and the PhD degree in electrical and electronic engineering from the University of Johannesburg, South Africa. His research interests include machine learning and deep learning for finance and healthcare applications.