A Modified ID3 Decision Tree Algorithm Based On Cumulative Residual Entropy
A R T I C L E  I N F O

Keywords: Cumulative Residual Entropy (CRE), ID3 decision tree algorithm, Information gain, Machine learning

A B S T R A C T

In this paper, we propose a new modification of the traditional ID3 decision tree algorithm through cumulative residual entropy (CRE). We discuss the principles of decision trees, including entropy and information gain, and introduce the concept, properties, and advantages of CRE as an alternative to Shannon entropy in decision trees. By running the proposed decision tree algorithm (named CRDT) and the ID3 decision tree algorithm on ten real datasets, we evaluate and compare the accuracy and efficiency of the two algorithms using appropriate criteria; the results indicate that the performance of CRDT is considerably more accurate and closer to reality than that of ID3. Furthermore, we compare the performance of three decision tree-based algorithms, CRDT, CART, and Random Forest, via the MSE, RMSE, R-squared and training time criteria. The results show the superiority of the newly proposed model compared to the alternatives.
1. Introduction

Over recent years, the progress of technology and the widespread utilization of computers for data recording have significantly augmented the volume of data generated across various disciplines. Accordingly, researchers leverage these recorded data to advance and refine their undertakings. Consequently, there exists a demand for an intelligent tool that can effectively organize the data and extract pertinent information. This concern has instigated the emergence of scientific disciplines such as data mining. It is evident that as the volume of data expands, the significance of these tools proportionally escalates. Generally, data mining methods can be categorized into two distinct groups, each providing a tailored solution based on the nature of the problem. See, e.g., Han et al. (2012) and Coenen (2011).

An exemplary instance of a data mining method is the decision tree. This method is regarded as a valuable and promising tool for categorizing datasets for forecasting (Quinlan, 1986). The decision tree ultimately presents the identified patterns in the form of rules. Generally, the output of a decision tree can be viewed as a tree structure encompassing a collection of nodes and leaves. In a broad sense, the construction of trees, contingent upon whether the hyperplane of the tree division is parallel to the axes or diagonal, encompasses two subjects: univariate decision trees (UDTs) and multivariate decision trees (MDTs). To construct univariate decision trees, numerous division criteria have been put forth. Noteworthy examples of such measures can be found in the well-known algorithms ID3, C4.5, CART, and CHAID. The nodes within a decision tree can be categorized into two distinct types: decision nodes and leaf nodes. At each decision node, the locally optimal feature is chosen to partition the data into child nodes. This process is iteratively carried out until a leaf node is reached, at which point further partitioning is rendered unfeasible. The selection of the best feature is contingent upon a criterion that evaluates the effectiveness of a segmentation. More details can be found in Maimon and Rokach (2014).

One of the most commonly used measures for dividing data into segments is information gain (IG), which is based on impurity. Decision trees based on IG exhibit exceptional performance when handling datasets with a balanced distribution of classes. However, in the case of an imbalanced dataset, IG tends to favor the majority class due to its reliance on the prior class probabilities. Following Drummond and Holte (2000), this phenomenon is commonly referred to as skew sensitivity. To tackle the issue of imbalanced classes and improve the
performance of decision trees, Akash et al. (2019) introduced a partitioning criterion known as the inter-node Hellinger distance (iHD). This criterion assesses the distance between the parent and child nodes using the Hellinger distance measure.

The commonly employed measure of uncertainty utilized in formulating decision trees, such as information gain, is Shannon's entropy, as established by Quinlan (1986). Highlighting the limitations of decision trees constructed using Shannon's entropy, Maszczyk and Duch (2008) proposed the Renyi and Tsallis entropies as alternatives to enhance the overall performance of the C4.5 decision tree. This particular approach can be applied to any decision tree and can subsequently be considered in the information selection algorithm. Additionally, Sharma et al. (2013) conducted a comparative analysis of the C4.5 decision tree algorithm using various entropies to assess their respective performances.

To enhance the classification precision of the ID3 algorithm, Adewole and Udeh (2018) implemented the above-mentioned algorithm using quadratic entropy in place of Shannon entropy. The results reported by them indicated that the utilization of quadratic entropy in the ID3 algorithm led to a noteworthy improvement in its accuracy when compared to Shannon entropy. Wang et al. (2015) and Wang and Xia (2017) demonstrated that the division criteria employed in the ID3, C4.5 and CART decision tree algorithms can be unified within the Tsallis entropy framework. Furthermore, to augment the performance of the decision tree, they introduced a novel decision tree rooted in Tsallis entropy, dubbed UTCDT.

The ID3 algorithm is prone to favoring attributes with numerous values. Jin et al. (2009) addressed this issue by incorporating an association function (AF) to enhance feature selection. Their experimental findings demonstrate the efficacy of this approach in rectifying the limitations of ID3 and generating more rational and impactful rules. Cheng et al. (1988) introduced GID3, a generalized version of the ID3 algorithm, addressing the issue of overspecialization in ID3. By identifying two causes of overspecialization, the authors developed GID3 and applied it to automate the Reactive Ion Etching (RIE) process in semiconductor manufacturing. The empirical results demonstrate GID3's superiority over ID3 across various performance measures, with minimal increase in computational complexity. Xu et al. (2006) introduced ID3+, an enhanced decision tree algorithm designed to overcome limitations of the traditional ID3 approach. By incorporating techniques such as autonomous backtracking and the handling of unknown attribute values, ID3+ demonstrates improved robustness and efficacy in decision tree learning systems, as evidenced by empirical experiments.

Wang et al. (2017) put forth a two-term Tsallis entropy information metric (TEIM) algorithm that incorporates a novel splitting criterion and a new construction method for decision trees. The novel partitioning criterion is rooted in two-term Tsallis conditional entropy, which outperforms the conventional one-term partitioning criterion. Chaji (2023) introduced a partitioning approach founded on the t-entropy measure. The efficacy of the proposed approach was examined on three data sets, and the outcomes demonstrated that this approach exhibits a more precise performance compared to the renowned Gini index, Tsallis, Shannon, and Rényi methods. Singh and Chhabra (2021) devised a novel partitioning approach that combines the Gini index and entropy to generate a decision tree. This innovative approach has been labeled EGIA.

In this paper, we propose a novel modification of the ID3 decision tree algorithm using cumulative residual entropy (CRE). The main goal of this modification is to circumvent the need to discretize a continuous target variable, which reduces the information. In other words, the new model is designed in such a way that the information in all individual observations of the target variable is used to create a decision tree without discretization. It should be noted that in all generalizations and expansions of the ID3 decision tree algorithm, the problem of discretization and information reduction still exists. Our approach therefore increases the efficiency and accuracy of the tree. The rest of the paper is organized as follows. In Section 2, the basic concepts and definitions of the decision tree are reviewed, together with important and recent decision tree algorithms. The main results of the article are presented in Section 3. The evaluation methods for the presented models are described in Section 4, and in Section 5 a series of real datasets is used to show the capability of the new decision tree.

2. Preliminaries

The decision tree possesses several properties; a brief discussion of them is given below. The reader can consult Maimon and Rokach (2014) for more details.

1- The decision tree partitions the data into distinct groups, ensuring no data is eliminated during the classification process. Moreover, the data set in the parent node remains identical to the aggregate data in the child nodes.
2- Due to its graphical depiction, this method facilitates comprehension of the obtained outcomes for individuals of any background, enhancing the popularity of this approach.
3- A decision tree can be effectively employed for both continuous and discrete data.
4- In supervised learning, the presence of the target variable necessitates the identification of explanatory variables that exert a significant influence on the classification of the predictive model. As pointed out in Han et al. (2012), a decision tree can be employed to ascertain the variables that possess a substantial impact on prediction and classification.

Suppose a dataset comprises a collection of characteristics as explanatory variables and a distinctive label as a target variable. Depending on whether the target variable is continuous or discrete, decision trees can be categorized as follows (Maimon & Rokach, 2014):

1- Classification trees: These trees yield a discrete set as the output.
2- Regression trees: These trees yield a real number as the output.

The structure of a decision tree consists of a root node positioned at the top and leaves situated at the bottom of the tree. The initial dataset is placed in the root node, followed by a test to divide the node. Several methods are available to select the initial test, all with the same objective: they strive to choose the most effective way to separate the target classes. This process continues until the sample arrives at a leaf node. All samples grouped in a leaf are regarded as a distinct class. Thus, decision tree-building algorithms utilize the divide-and-conquer approach to construct the decision tree. See, e.g., Han et al. (2012) for more details.

In decision tree algorithms, it is crucial to consider the following questions:

1- How can the most suitable characteristic be chosen to partition the dataset at each node?
2- How does the algorithm for constructing the tree determine when to stop?

The prevalent test design utilized in classification models involves the completely random partitioning of the dataset into two subsets: training data and testing data. The model is trained using the training data, which consist of a set of input features and corresponding class labels. Subsequently, the class label of a sample from the test set is predicted. The division of the existing dataset into training and test groups is carried out before the creation and evaluation of the tree model (Maimon & Rokach, 2014).

Decision tree algorithms constantly endeavor to select the optimal feature from the available features. As discussed in Han et al. (2012), the most commonly used criteria for feature selection include information gain, the Gini index, the gain ratio, and the likelihood ratio.
Information gain is a well-known criterion utilized in the construction of decision trees. It is expressed based on Shannon entropy. Mathematically, we can write:

InformationGain(A) = Entropy(D) − Entropy_A(D).   (1)

Eq. (1) computes the intended measure for a characteristic, denoted by A, while D signifies the target variable found within the training dataset. The first and second components of the expression above are evaluated as follows:

Entropy(D) = − ∑_{i=1}^{c} p_i × log2(p_i),

Entropy_A(D) = ∑_{j=1}^{ν} (|D_j| / |D|) × Entropy(D_j),

where Entropy(D) is Shannon's entropy of the target variable, c is the number of classes in the training data set, p_i is the probability of a data sample belonging to the i-th class, and Entropy_A(D) is the conditional entropy of the target variable given the explanatory variable A. Here, ν is the number of branches and D_j is the part of the primary data whose value of the characteristic A is v_j. Also, |D| denotes the size of D.
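As a concrete illustration of Eq. (1), the short R sketch below computes the Shannon entropy of a categorical target, the conditional entropy given one categorical feature, and the resulting information gain. The vectors target and feature are hypothetical toy data and the helper names are ours; this is only an illustration of the formulas, not code from the paper.

# Sketch: information gain of a categorical feature (Eq. (1)), toy data only
shannon_entropy <- function(y) {
  p <- table(y) / length(y)                # class proportions p_i
  -sum(p * log2(p))                        # Entropy(D) = -sum p_i log2 p_i
}

information_gain <- function(y, x) {
  w <- table(x) / length(x)                # weights |D_j| / |D|
  cond <- sum(sapply(names(w), function(v) w[[v]] * shannon_entropy(y[x == v])))
  shannon_entropy(y) - cond                # Entropy(D) - Entropy_A(D)
}

# hypothetical toy data
target  <- c("pass", "pass", "fail", "pass", "fail", "fail")
feature <- c("M",    "F",    "M",    "F",    "M",    "F")
information_gain(target, feature)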
2.1. Decision tree algorithms

There exist numerous algorithms for constructing decision trees. In this section, we describe some of the most well-known ones.

2.1.1. The ID3 algorithm
This algorithm, initially introduced by Quinlan (1986), is one of the most prevalent and simplest methods for constructing decision trees. The criterion for feature selection in this algorithm is information gain. The ID3 algorithm terminates either when all remaining samples belong to the same class or when the value of the best information gain criterion has been determined. Notably, this algorithm does not employ a pruning technique and cannot handle quantitative features or incomplete data.

2.1.2. The C4.5 algorithm
As a generalization of the ID3 algorithm, the C4.5 algorithm, introduced by Quinlan (1996), utilizes the gain ratio index to select features for dividing and constructing a decision tree. When the number of samples is less than a specified threshold, the C4.5 algorithm provides some sensible solutions. In contrast to the ID3 algorithm, this algorithm incorporates post-pruning methods. Furthermore, similar to the ID3 algorithm, it can accommodate small amounts of data as input, with the ability to adapt to incomplete data with slight modifications.

2.1.3. The CART algorithm
Breiman (2017) introduced the CART algorithm to establish a connection between regression and classification trees. This algorithm produces a binary decision tree, where each node has two branches based on the splitting criterion. A pruning technique is incorporated into this algorithm. Additionally, the CART algorithm can generate regression trees, where the leaf nodes predict a real number as a class label (Coenen, 2011).

2.1.4. The CHAID algorithm
In the 1970s, applied statistics researchers developed several algorithms for generating and constructing decision trees. Notable examples include AID, MAID, THAID, and CHAID. The CHAID algorithm (Kiss, 2003) was initially designed for nominal variables. This algorithm employs various statistical tests depending on the type of class label. The termination condition for the CHAID algorithm is either reaching a predetermined maximum depth or the number of samples in the current node falling below a specified threshold. It is important to note that the CHAID algorithm employs no heuristic method and cannot handle incomplete values (Han et al., 2012).

3. Main results

In various algorithms, such as ID3 and C4.5, discretization becomes necessary when dealing with continuous variables, leading to the loss of certain information. To circumvent the need for discretizing the target variable and enhance the accuracy of the decision tree, we propose a novel algorithm. This algorithm eliminates the requirement of discretizing the target variable and results in a tree that exhibits improved accuracy and efficiency.

Rao et al. (2004) introduced an alternative metric for measuring uncertainty that can deal with both continuous and discrete variables. To give more details, we provide some background material as follows.

Definition 3.1. Suppose X is a non-negative continuous random variable with survival function F̄(x) = P(X > x). The cumulative residual entropy (CRE) is defined as

ε(X) = − ∫_0^∞ F̄(x) log F̄(x) dx.

Also, if X is a discrete non-negative random variable with survival function F̄(x), where x_0 < x_1 < ⋯ < x_b and b ≤ ∞, then the CRE is defined as follows (Baratpour & Bami, 2012):

ε(X) = − ∑_{i=1}^{b} P(X ≥ x_i) log P(X ≥ x_i) (x_i − x_{i−1}).

Properties of the CRE can be found in Rao (2005), Di Crescenzo and Longobardi (2009), and Navarro et al. (2010).

The main idea of this article is to use the cumulative residual entropy (CRE) instead of the Shannon entropy in the ID3 decision tree algorithm, without the need to discretize the target variable (and thereby lose information). Because the CRE calculation is based on all of the data, the new method is closer to reality and makes sense when employed in real applications. The main reasons to consider the CRE in our research are as follows:

• Although Shannon's entropy has also been extended to continuous random variables through H(F) = − ∫ f(x) log f(x) dx, this entropy can be calculated only when the density function of the data has a closed form. Moreover, estimating the density function is complicated and sometimes deriving it is practically impossible. Also, while the Shannon entropy of a discrete distribution is always positive, the differential entropy of a continuous variable may take negative values. In addition, the differential entropy estimator is inconsistent. To summarize some essential properties of the CRE, we can highlight:
• For continuous and discrete variables, the CRE estimator is consistent.
• CRE is always non-negative.
• The CRE can be easily calculated from the sample data, and the calculations converge asymptotically to the actual values.
• There is no need to estimate the density function to calculate the CRE. As a result, the work can be carried out with high accuracy without losing data (Rao et al., 2004).

A relationship can be established between the CRE and Shannon's entropy through the equilibrium distribution of X, with density function f_e(x) = F̄(x)/μ, where μ = E(X). The concept of the equilibrium distribution, which originated in renewal theory, has been pivotal in reliability theory and various other fields. It serves as a fundamental tool for
So, we have the following estimator of the CRE based on the sample data:

ε̂(X) = − ∑_{i=1}^{n} F̄_n(x_(i)) log F̄_n(x_(i)) (x_(i+1) − x_(i)) = − ∑_{i=1}^{n} (1 − i/n) log(1 − i/n) (x_(i+1) − x_(i)),   (2)

where x_(1) ≤ ⋯ ≤ x_(n) are the ordered sample values and F̄_n is the empirical survival function. Following Kaplan and Meier (1958), another estimator of the CRE can be written based on the Kaplan-Meier estimator of the survival function, given as F̄(t) = ∏_{t_i ≤ t} (n_i − d_i)/n_i.

3.1. New modified ID3 decision tree algorithm

Suppose that, in a scientific investigation, the target variable Y and the k features X_1, …, X_k build the training data set of size n, given as follows:

y_1  x_11  x_21  ⋯  x_k1
y_2  x_12  x_22  ⋯  x_k2
 ⋮     ⋮     ⋮    ⋱    ⋮
y_n  x_1n  x_2n  ⋯  x_kn

To make the model construction precise, suppose that Y is a continuous random variable and the X_i's are qualitative (categorical) random variables. In our proposed decision tree algorithm, which we denote by CRDT, the information gain criterion for X_i is calculated based on the CRE as follows:

InformationGain(X_i) = CRE(Y) − CRE_{X_i}(Y),   (3)

where CRE(Y) = − ∑_{i=1}^{n} P(Y ≥ y_i) log P(Y ≥ y_i) (y_i − y_{i−1}) is the cumulative residual entropy of the target variable in the training data set, and CRE_{X_i}(Y) is the conditional cumulative residual entropy, which, in analogy with Entropy_A(D) in Section 2, is calculated as

CRE_{X_i}(Y) = ∑_{j=1}^{ν} (|D_j| / |D|) × CRE(Y_j),

where ν is the number of levels of X_i, D_j is the subset of the training data whose value of X_i equals the j-th level, and Y_j denotes the target values in D_j (see also the R code in Appendix A).

Step 4. The feature with the maximum information gain is selected as the dividing criterion at the root of the tree, the training data set is split according to the number of levels of the root feature, and the same process of Steps 1 to 4 continues until the final tree is obtained.

4. Assessment

One of the critical challenges in the KDD (Knowledge Discovery in Databases) process lies in formulating effective measures to assess the quality of analytical results. In addition, decision tree performance evaluation represents a fundamental aspect of analysis based on machine learning principles. Although there are several criteria for evaluating the predictive performance of decision trees, additional factors such as the computational complexity and comprehensibility of the resulting tree may also be important. In the following, some of these criteria are used to evaluate the performance of the models. Criteria such as accuracy, the confusion matrix, the F-score and the ROC curve are used to check and evaluate the performance of ID3 and CRDT, and the MSE, RMSE, R-squared and training time criteria are used to check and evaluate the performance of Random Forest, CRDT and CART. To read more about these criteria, refer to Han et al. (2012).

5. Comparisons and application to real datasets

In this section, in two separate subsections, the proposed decision tree (CRDT) is compared with ID3, CART and Random Forest based on numerous real datasets and different evaluation indices.

5.1. Comparing CRDT with ID3

In this section, we conducted an in-depth analysis using ten diverse datasets, each exhibiting variations in record count and field composition. To ensure the reliability and integrity of our findings, we carried out a comprehensive data preparation process, meticulously cleaning the datasets to eliminate any inconsistencies or outliers. Additionally, we employed cross-validation for robust model evaluation. By systematically partitioning the data and training our models
Table 1. The first ten observations of the data.
Student Sex Address Medu Fedu Mjob Paid Romantic Goout Score
1 F C 4 4 housewife No No 4 6
2 F C 1 1 housewife No No 3 6
3 F C 1 1 housewife Yes No 2 10
4 F C 4 2 health Yes Yes 2 15
5 F C 3 3 other Yes No 2 10
6 F C 4 3 services Yes No 2 15
7 M C 2 2 other No No 4 11
8 M C 4 4 other No No 4 6
9 F C 3 2 services Yes No 2 19
10 F C 3 4 other Yes No 1 15
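To make the CRDT split criterion concrete, the R sketch below implements the empirical CRE of Eq. (2) and the CRE-based information gain of Eq. (3), and applies them to the Score and Sex columns of the ten observations in Table 1. The helper names cre_hat and cre_gain are ours; this is an illustrative sketch of the criterion, not the authors' full implementation (which is given in Appendix A).

# Sketch: empirical CRE (Eq. (2)) and CRE-based information gain (Eq. (3));
# helper names are ours, data taken from Table 1
cre_hat <- function(y) {
  y <- sort(y)
  n <- length(y)
  if (n < 2) return(0)                      # a single value carries no residual uncertainty
  S <- 1 - (1:(n - 1)) / n                  # empirical survival function at the order statistics
  -sum(S * log(S) * diff(y))                # - sum F_n log F_n (x_(i+1) - x_(i))
}

cre_gain <- function(y, x) {
  w <- table(x) / length(x)                 # relative frequency of each level of x
  cond <- sum(sapply(names(w), function(v) w[[v]] * cre_hat(y[x == v])))
  cre_hat(y) - cond                         # CRE(Y) - CRE_X(Y)
}

score <- c(6, 6, 10, 15, 10, 15, 11, 6, 19, 15)           # Score column of Table 1
sex   <- c("F","F","F","F","F","F","M","M","F","F")       # Sex column of Table 1
cre_gain(score, sex)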
Fig. 3. ROC curve of ID3 (a) and CRDT (b) (First dataset).
Table 2. Comparing and assessment criteria of the ID3 and CRDT decision tree (First dataset): Accuracy, Precision, Recall, F1-score.
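ROC curves such as those compared in Fig. 3 can be reproduced from class labels and predicted scores; the sketch below uses the pROC package and assumes labels and scores are hypothetical vectors produced by a fitted tree, not the paper's actual outputs.

# Sketch: ROC curve and AUC from hypothetical predictions (pROC package assumed available)
library(pROC)

labels <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)                        # true binary classes (hypothetical)
scores <- c(0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.5, 0.65, 0.3)   # predicted scores (hypothetical)

roc_obj <- roc(response = labels, predictor = scores)   # build the ROC object
plot(roc_obj, main = "ROC curve (sketch)")              # curve comparable in form to Fig. 3
auc(roc_obj)                                            # area under the curve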
• The dataset is divided into two separate groups, namely training and testing. The distribution ratio between these groups is determined according to the amount of data in each data set.
• Steps 1 to 4 described in Section 3.1 are performed on the training part of the data from the previous step.
• Ultimately, the target variable is categorized by the algorithm, resulting in the creation of the final leaves of the decision tree.
• Using the testing part of the data, the predictive performance of the decision trees is evaluated based on the criteria studied in Section 4.

To analyze the proposed datasets, both the ID3 and CRDT decision trees were coded and implemented in R (Appendix A presents the R code of CRDT). The results of these implementations are presented below.
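The classification criteria of Section 4 can be computed directly from a confusion matrix; the R sketch below does so for hypothetical predicted and actual class vectors. It only illustrates the formulas and is not the code used to produce the reported tables.

# Sketch: accuracy, precision, recall and F1-score from a confusion matrix (hypothetical data)
actual    <- factor(c("A", "A", "B", "B", "A", "B", "A", "B"))
predicted <- factor(c("A", "B", "B", "B", "A", "A", "A", "B"), levels = levels(actual))

cm <- table(Predicted = predicted, Actual = actual)   # confusion matrix
accuracy  <- sum(diag(cm)) / sum(cm)
precision <- diag(cm) / rowSums(cm)                   # per-class precision
recall    <- diag(cm) / colSums(cm)                   # per-class recall
f1        <- 2 * precision * recall / (precision + recall)

list(confusion = cm, accuracy = accuracy,
     precision = precision, recall = recall, f1 = f1)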
Fig. 6. ROC curve of ID3 (a) and CRDT (b) (Second dataset).
1- Score: The score ranges from 0 to 20, serving as the target variable
for this study.
2- Sex: The sex variable is a categorized variable set as either female (F)
or male (M).
Fig. 8. The ID3 decision tree (Third dataset).
Fig. 9. ROC curve of ID3 (a) and CRDT (b) (Third dataset).
Table 4. Comparing and assessment criteria of the ID3 and CRDT decision tree (Third dataset): Accuracy, Precision, Recall, F1-score.
Fig. 12. ROC curve of ID3 (a) and CRDT (b) (Fourth dataset).
Table 5. Comparing and assessment criteria of the ID3 and CRDT decision tree (Fourth dataset): Accuracy, Precision, Recall, F1-score.
Fig. 15. ROC curve of ID3 (a) and CRDT (b) (Fifth dataset).
Table 6. Comparing and assessment criteria of the ID3 and CRDT decision tree (Fifth dataset): Accuracy, Precision, Recall, F1-score.
Fig. 18. ROC curve of ID3 (a) and CRDT (b) (Sixth dataset).
Table 7. Comparing and assessment criteria of the ID3 and CRDT decision tree (Sixth dataset): Accuracy, Precision, Recall, F1-score.
indicates the better performance and efficiency of the CRDT decision tree (Figs. 16–18 and Table 7).

5.1.7. Seventh data set
The seventh dataset is the "Energy Efficiency" dataset (Tsanas & Xifara, 2012), which is used to assess the heating load and cooling load requirements of buildings (that is, energy efficiency) as a function of building parameters. This dataset contains 768 samples with 8 features. The similar outputs of the previous datasets are presented below, which indicate the better performance and efficiency of the CRDT decision tree.

5.1.9. Ninth data set
The ninth dataset contains information on births and deaths (Stats, 2024) in New Zealand for the year ending December 2022. The dataset contains 897 records with 4 attributes. The similar outputs of the previous datasets are presented below, which indicate the better performance and efficiency of the CRDT decision tree (Figs. 25–27 and Table 10).

5.1.10. Tenth data set
The tenth dataset contains information about the performance of the insurance agency (Moneystore, 2022). The dataset contains details of an
Fig. 21. ROC curve of ID3 (a) and CRDT (b) (Seventh dataset).
Fig. 24. ROC curve of ID3 (a) and CRDT (b) (Eighth dataset).
Fig. 27. ROC curve of ID3 (a) and CRDT (b) (Ninth dataset).
Table 10. Comparing and assessment criteria of the ID3 and CRDT decision tree (Ninth dataset): Accuracy, Precision, Recall, F1-score.
relationship effectively.

Divergent importance ratings: While the top features were similar, the relative importance ratings of the remaining features often differed between the CRDT and Random Forest models. This suggests that the models may focus on different aspects of the data or capture distinct relationships between features and the target variable.

Complementary insight: By examining differences in feature importance ratings, we can identify areas where the models provide complementary insight. For example, in the lastmat dataset, the CRDT model gives more importance to characteristics related to gender and education, while the Random Forest model emphasizes characteristics related to occupation. Combining these perspectives can lead to a more comprehensive understanding of the underlying drivers of the target variable.

Performance implications: Feature importance analysis correlates with observed model performance metrics, such as MSE, RMSE, and R-squared. For example, in the Concrete-Compressive-Strength dataset, the

Fig. 28. The CRDT decision tree (Tenth dataset).
Fig. 30. ROC curve of ID3 (a) and CRDT (b) (Tenth dataset).
Table 12. Comparing CRDT, CART and Random Forest in five real datasets via MSE, RMSE, R-squared and training time; columns: Dataset name, Type of Tree, MSE, RMSE, R-squared, Training time (seconds).
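The MSE, RMSE, R-squared and training-time figures summarized in Table 12 can be obtained as in the hedged sketch below, which uses the rpart and randomForest packages for CART and Random Forest on a hypothetical data frame dat with numeric target y; the CRDT row would come from the authors' own implementation (Appendix A) and is not reproduced here.

# Sketch: MSE, RMSE, R-squared and training time for CART and Random Forest
# (hypothetical data; CRDT itself is the authors' implementation and is omitted)
library(rpart)
library(randomForest)

set.seed(1)
dat   <- data.frame(x1 = runif(200), x2 = runif(200))
dat$y <- 3 * dat$x1 - 2 * dat$x2 + rnorm(200, sd = 0.3)   # hypothetical regression data

idx   <- sample(nrow(dat), 0.7 * nrow(dat))
train <- dat[idx, ]
test  <- dat[-idx, ]

eval_model <- function(pred, obs) {
  mse <- mean((obs - pred)^2)
  c(MSE = mse, RMSE = sqrt(mse),
    R2 = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2))
}

t_cart <- system.time(fit_cart <- rpart(y ~ ., data = train))          # CART training time
t_rf   <- system.time(fit_rf   <- randomForest(y ~ ., data = train))   # RF training time

rbind(CART = c(eval_model(predict(fit_cart, test), test$y), Time = t_cart[["elapsed"]]),
      RF   = c(eval_model(predict(fit_rf,   test), test$y), Time = t_rf[["elapsed"]]))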
Fig. 31. Feature Importance of CRDT (a) and Random Forest (b) (lastmat dataset).
Fig. 32. Feature Importance of CRDT (a) and Random Forest (b) (Wine-Quality dataset).
Fig. 33. Feature Importance of CRDT (a) and Random Forest (b) (Concrete-Compressive-Strength dataset).
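The Random Forest feature-importance values plotted in Figs. 31–33 can be extracted as in the sketch below (randomForest package, hypothetical data and model); an analogous ranking for CRDT would have to be derived from the authors' own implementation, for example from the total CRE gain contributed by each feature, which we only indicate here as an assumption.

# Sketch: feature importance for the Random Forest side of Figs. 31-33
# (hypothetical data/model; the CRDT importances come from the authors' code)
library(randomForest)

set.seed(2)
df   <- data.frame(x1 = runif(150), x2 = runif(150), x3 = runif(150))
df$y <- 2 * df$x1 + df$x3 + rnorm(150, sd = 0.2)

fit_rf <- randomForest(y ~ ., data = df, importance = TRUE)
importance(fit_rf)       # %IncMSE and IncNodePurity per feature
varImpPlot(fit_rf)       # bar plot comparable in form to panel (b) of Figs. 31-33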
related trees such as CART and Random Forest based on the MSE, RMSE, R-squared and training time criteria in five real datasets, and the results indicated that the performance of CRDT was better than, and sometimes similar to, that of the other trees. Therefore, overall, CRDT is a suitable alternative to ID3 and a strong competitor to other related trees, both in terms of being close to reality and in terms of the various evaluation indicators.

Future research could focus on comparative analyses with other data mining techniques and on further exploration of the scalability and adaptability of the CRDT algorithm to larger datasets. Another future direction is to extend the present model with an algorithm that also avoids the discretization of the feature variables, based on the conditional cumulative residual entropy.

CRediT authorship contribution statement

Somayeh Abolhosseini: Conceptualization, Software, Formal analysis, Validation, Writing – original draft. Mohammad Khorashadizadeh: Conceptualization, Methodology, Formal analysis, Supervision, Validation, Investigation, Writing – review & editing, Project administration. Majid Chahkandi: Conceptualization, Formal analysis, Supervision, Validation. Mousa Golalizadeh: Conceptualization, Formal analysis, Supervision, Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment
Appendix A
#R codes for CRDT (continuation of the listing; the function header, the outer
#loop over the features and the helper NCRE() appear in the part of the listing not shown here)
    CREx <- 0
    tn <- table(d[[j + 1]]) / n          # relative frequencies of the levels of feature j
    k  <- length(table(d[[j + 1]]))      # number of levels of feature j (levels coded 1..k)
    for (i in 1:k) {
      # conditional CRE: frequency-weighted CRE of the target y within level i of feature j
      CREx <- CREx + tn[i] * NCRE(y[d[[j + 1]] == i])
    }
    CREz[j] <- CREx                      # CRE_Xj(Y) for feature j
  }
  return(list(c("CREY=", "CREZi's="), c(CREy, CREz)))
}
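The fragment above calls a helper NCRE() whose definition lies in the part of the listing not reproduced here; a minimal sketch of such a helper, written by us from the estimator in Eq. (2), is given below. The authors' actual definition may differ in details (for example, in the handling of ties).

# Hypothetical reconstruction of the missing NCRE() helper, based on Eq. (2)
NCRE <- function(y) {
  y <- sort(y)
  n <- length(y)
  if (n < 2) return(0)
  S <- 1 - (1:(n - 1)) / n              # empirical survival function at the order statistics
  -sum(S * log(S) * diff(y))            # - sum F_n(x_(i)) log F_n(x_(i)) (x_(i+1) - x_(i))
}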
References

Adewole, A. P., & Udeh, S. N. (2018). The quadratic entropy approach to implement the ID3 decision tree algorithm. Journal of Computer Science and Information Technology, 6(2), 23–29. https://doi.org/10.15640/jcsit.v6n2a3
Akash, P. S., Kadir, M. E., Ali, A. A., & Shoyaib, M. (2019, August). Inter-node Hellinger distance based decision tree. In IJCAI-19 (pp. 1967–1973).
Baratpour, S., & Bami, Z. (2012). On the discrete cumulative residual entropy. Journal of the Iranian Statistical Society, 11(2), 203–215. https://jirss.irstat.ir/article_253690.html
Breiman, L. (2017). Classification and regression trees. Routledge.
Chaji, A. (2023). Introducing a new method for the split criteria of decision trees. Journal of Statistical Sciences, 16(2), 331–348. https://doi.org/10.52547/jss.16.2.331
Chatterjee, A., & Mukherjee, S. P. (2001). Equilibrium distribution - its role in reliability theory. Handbook of Statistics, 20.
Cheng, J., Fayyad, U. M., Irani, K. B., & Qian, Z. (1988). Improved decision trees: A generalized version of ID3. In Machine Learning Proceedings (pp. 100–106). Morgan Kaufmann. https://doi.org/10.1016/B978-0-934613-64-4.50016-5
Coenen, F. (2011). Data mining: Past, present and future. The Knowledge Engineering Review, 26(1), 25–29. https://doi.org/10.1017/S0269888910000378
Cortez, P., & Morais, A. D. J. R. (2007). A data mining approach to predict forest fires using meteorological data.
[dataset] Cortez, P., & Silva, A. M. G. (2008). Using data mining to predict secondary school student performance. https://doi.org/10.24432/C5TG7T
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547–553. https://doi.org/10.1016/j.dss.2009.05.016
Di Crescenzo, A., & Longobardi, M. (2009). On cumulative entropies. Journal of Statistical Planning and Inference, 139(12), 4072–4087. https://doi.org/10.1016/j.jspi.2009.05.038
Drummond, C., & Holte, R. C. (2000, June). Exploiting the cost (in)sensitivity of decision tree splitting criteria. In Proceedings of the Seventeenth International Conference on Machine Learning, Stanford University, California, United States.
Han, J., Kamber, M., & Pei, J. (2012). Data mining: Concepts and techniques (3rd ed.). Morgan Kaufmann.
Jin, C., De-Lin, L., & Fen-Xiang, M. (2009, July). An improved ID3 decision tree algorithm (pp. 127–130). IEEE. https://doi.org/10.1109/ICCSE.2009.5228509
Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481. http://www.jstor.org/stable/2281868
Kiss, F. (2003). Credit scoring processes from a knowledge management perspective. Periodica Polytechnica Social and Management Sciences, 11(1), 95–110. https://www.pp.bme.hu/so/article/view/1683
Maimon, O. Z., & Rokach, L. (2014). Data mining with decision trees: Theory and applications (Vol. 81). World Scientific. (Chapters 1–6).
Maszczyk, T., & Duch, W. (2008). Comparison of Shannon, Renyi and Tsallis entropy used in decision trees. In Artificial Intelligence and Soft Computing - ICAISC 2008: 9th International Conference, Zakopane, Poland, June 22–26, 2008, Proceedings 9 (pp. 643–651). Springer Berlin Heidelberg.
Moneystore. (2022). Agency Performance. Kaggle. https://www.kaggle.com/datasets/moneystore/agencyperformance
Nash, W., Sellers, T., Talbot, S., Cawthorn, A., & Ford, W. (1995). Abalone. UCI Machine Learning Repository. https://doi.org/10.24432/C55C7W
Navarro, J., del Aguila, Y., & Asadi, M. (2010). Some new results on the cumulative residual entropy. Journal of Statistical Planning and Inference, 140(1), 310–322. https://doi.org/10.1016/j.jspi.2009.07.015
Pace, R. K., & Barry, R. (1997). Sparse spatial autoregressions. Statistics & Probability Letters, 33(3), 291–297. https://doi.org/10.1016/S0167-7152(96)00140-X
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77–90. https://doi.org/10.1613/jair.279
Rao, M., Chen, Y., Vemuri, B. C., & Wang, F. (2004). Cumulative residual entropy: A new measure of information. IEEE Transactions on Information Theory, 50(6), 1220–1228. https://doi.org/10.1109/TIT.2004.828057
Rao, M. (2005). More on a new concept of entropy and information. Journal of Theoretical Probability, 18, 967–981.
Rathod, V. (2022). Fish Market. Kaggle. https://www.kaggle.com/datasets/vipullrathod/fish-market
Singh, M., & Chhabra, J. K. (2021). EGIA: A new node splitting method for decision tree generation: Special application in software fault prediction. Materials Today: Proceedings. https://doi.org/10.1016/j.matpr.2021.05.325
[dataset] Stats NZ. (2024). Births and deaths: Year ended December 2022 – CSV. https://www.stats.govt.nz/large-datasets/csv-files-for-download
Sharma, S., Agrawal, J., & Sharma, S. (2013). Classification through machine learning technique: C4.5 algorithm based on various entropies. International Journal of Computer Applications, 82(16), 20–27. https://doi.org/10.5120/14249-2444
Tsanas, A., & Xifara, A. (2012). Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy and Buildings, 49, 560. https://doi.org/10.24432/C51307
Wang, Y., Song, C., & Xia, S. T. (2015). Unifying decision trees split criteria using Tsallis entropy. arXiv preprint arXiv:1511.08136. https://doi.org/10.48550/arXiv.1511.08136
Wang, Y., & Xia, S. T. (2017, March). Unifying attribute splitting criteria of decision trees by Tsallis entropy. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2507–2511). IEEE. https://doi.org/10.1109/ICASSP.2017.7952608
Wang, Y., Xia, S. T., & Wu, J. (2017). A less-greedy two-term Tsallis entropy information metric approach for decision tree classification. Knowledge-Based Systems, 120, 34–42. https://doi.org/10.1016/j.knosys.2016.12.021
Xu, M., Wang, J. L., & Chen, T. (2006). Improved decision tree algorithm: ID3+. In Intelligent Computing in Signal Processing and Pattern Recognition: International Conference on Intelligent Computing, ICIC 2006, Kunming, China, August 16–19, 2006 (pp. 141–149). Springer Berlin Heidelberg.
Yeh, I. C. (1998). Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12), 1797–1808.