Decision tree learning

Decision tree learning is one of the predictive modelling approaches used in statistics, data mining and machine
learning. It uses a decision tree (as a predictive model) to go from observations about an item (represented in the
branches) to conclusions about the item's target value (represented in the leaves). Tree models where the target
variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent
class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the
target variable can take continuous values (typically real numbers) are called regression trees. Decision trees are
among the most popular machine learning algorithms given their intelligibility and simplicity.[1][2]

In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making.
In data mining, a decision tree describes data (but the resulting classification tree can be an input for decision
making). This page deals with decision trees in data mining.

Contents
General
Decision tree types
Metrics
Gini impurity
Information gain
Variance reduction
Uses
Advantages
Limitations
Implementations
Extensions
Decision graphs
Alternative search methods
See also
References
Further reading
External links

General
Decision tree learning is a method commonly used in data mining.[3] The goal is to create a model that predicts the
value of a target variable based on several input variables.

A decision tree is a simple representation for classifying examples. For this section, assume that all of the input
features have finite discrete domains, and there is a single target feature called the "classification". Each element of
the domain of the classification is called a class. A decision tree or a classification tree is a tree in which each
internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with an input feature
are labeled with each of the possible values of that feature, and each arc leads either to a leaf or to a subordinate
decision node on a different input feature. Each leaf of the tree is labeled with a class or a probability distribution
over the classes, signifying that the data set has been classified by the tree into either a specific class, or into a
particular probability distribution (which, if the decision tree is well-constructed, is skewed towards certain subsets
of classes).

A tree is built by splitting the source set, constituting the root node of the tree, into subsets, which constitute the
successor children. The splitting follows a set of splitting rules based on classification features.[4] This process is
repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when
the subset at a node has all the same values of the target variable, or when splitting no longer adds value to the
predictions. This process of top-down induction of decision trees (TDIDT)[5] is an example of a greedy algorithm,
and it is by far the most common strategy for learning decision trees from data.

In data mining, decision trees can also be described as the combination of mathematical and computational
techniques to aid the description, categorization and generalization of a given set of data.

Data comes in records of the form:

$$(\mathbf{x}, Y) = (x_1, x_2, x_3, \ldots, x_k, Y)$$

The dependent variable, $Y$, is the target variable that we are trying to understand, classify or generalize. The
vector $\mathbf{x}$ is composed of the features, $x_1, x_2, x_3$, etc., that are used for that task.

Figure: A tree showing survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard).
The figures under the leaves show the probability of survival and the percentage of observations in the leaf.
Summarizing: your chances of survival were good if you were (i) a female or (ii) a male younger than 9.5 years with
strictly fewer than 3 siblings.
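The top-down induction procedure described above can be sketched in a few lines of Python. This is a minimal ID3-style sketch, not a reference implementation: it assumes categorical features stored as dicts, uses information gain (defined under Metrics) as the splitting rule, and the function names and toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = len(labels)
    children = {}
    for row, y in zip(rows, labels):
        children.setdefault(row[feature], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in children.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """Recursive partitioning: returns a class label (leaf) or a
    (feature, branches, majority_label) tuple (internal node)."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the subset is pure or there is nothing left to split on.
    if len(set(labels)) == 1 or not features:
        return majority
    # Greedy step: choose the locally best feature for this node only.
    gains = {f: information_gain(rows, labels, f) for f in features}
    best = max(gains, key=gains.get)
    if gains[best] <= 0:
        return majority  # splitting no longer adds value
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (best, branches, majority)


def classify(tree, row):
    """Follow the arc matching each tested feature value down to a leaf."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        # Values never seen during training fall back to the node's majority class.
        tree = branches.get(row[feature], majority)
    return tree


# Hypothetical toy records (not the data set discussed under Metrics).
rows = [
    {"outlook": "sunny", "windy": "false"}, {"outlook": "sunny", "windy": "true"},
    {"outlook": "overcast", "windy": "false"}, {"outlook": "overcast", "windy": "true"},
    {"outlook": "rainy", "windy": "false"}, {"outlook": "rainy", "windy": "true"},
]
labels = ["no", "no", "yes", "yes", "yes", "no"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print([classify(tree, r) for r in rows])  # ['no', 'no', 'yes', 'yes', 'yes', 'no']
```

Because each split is chosen only for the node at hand, the procedure is greedy: it never revisits an earlier split.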

Figure: A tree which estimates the probability of kyphosis after surgery, given the age of the patient and the
vertebra at which surgery was started. The same tree is shown in three different ways. Left: the colored leaves show
the probability of kyphosis after surgery, and the percentage of patients in the leaf. Middle: the tree as a perspective
plot. Right: aerial view of the middle plot. The probability of kyphosis after surgery is higher in the darker areas.
(Note: the treatment of kyphosis has advanced considerably since this rather small set of data was collected.)

Decision tree types

Decision trees used in data mining are of two main types:

Classification tree analysis is when the predicted outcome is the class (discrete) to which the data
belongs.
Regression tree analysis is when the predicted outcome can be considered a real number (e.g. the
price of a house, or a patient's length of stay in a hospital).

The term Classification And Regression Tree (CART) analysis is an umbrella term used to refer to both of the
above procedures, first introduced by Breiman et al. in 1984.[6] Trees used for regression and trees used for
classification have some similarities but also some differences, such as the procedure used to determine where to
split.[6]

Some techniques, often called ensemble methods, construct more than one decision tree:

Boosted trees: incrementally building an ensemble by training each new tree to emphasize the
training instances previously mis-modeled. A typical example is AdaBoost. These can be used for
regression-type and classification-type problems.[7][8]
Bootstrap aggregated (or bagged) decision trees, an early ensemble method, build multiple
decision trees by repeatedly resampling training data with replacement, and let the trees vote for a
consensus prediction.[9]
A random forest classifier is a specific type of bootstrap aggregating.
Rotation forest, in which every decision tree is trained by first applying principal component
analysis (PCA) on a random subset of the input features.[10]
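Bagging can be sketched as follows, assuming a hypothetical one-split tree (a decision stump) on a single numeric feature as the base learner; real bagged ensembles usually grow deeper trees. Each stump is fit on a bootstrap sample drawn with replacement, and the ensemble predicts by majority vote. The function names and data are illustrative only.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Hypothetical base learner: a one-split tree (decision stump) on one
    numeric feature. Returns (threshold, left_label, right_label)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def bag(xs, ys, n_trees=25, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return ensemble


def predict_bagged(ensemble, x):
    """Consensus prediction: majority vote over the ensemble's trees."""
    votes = Counter(predict_stump(stump, x) for stump in ensemble)
    return votes.most_common(1)[0][0]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = bag(xs, ys)
print(predict_bagged(ensemble, 1), predict_bagged(ensemble, 8))
```

A random forest would additionally restrict each split to a random subset of the features, which this single-feature sketch cannot show.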

A special case of a decision tree is a decision list,[11] which is a one-sided decision tree, so that every internal node
has exactly 1 leaf node and exactly 1 internal node as a child (except for the bottommost node, whose only child is
a single leaf node). While less expressive, decision lists are arguably easier to understand than general decision
trees due to their added sparsity, permit non-greedy learning methods[12] and monotonic constraints to be
imposed.[13]
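A decision list can be evaluated as an ordered chain of rules: each internal node tests one rule, its leaf child holds that rule's label, and its internal child is the rest of the list. The sketch below assumes hypothetical records with `sex`, `age`, and `sibsp` fields and encodes the Titanic summary from the figure caption earlier as such a list; the labels are illustrative, not outputs of a fitted model.

```python
def make_decision_list(rules, default):
    """Return a classifier that tries rules in order; the first rule whose
    condition holds decides the label, otherwise `default` is returned."""
    def classify(x):
        for condition, label in rules:
            if condition(x):   # internal node: test this rule
                return label   # its leaf child
        return default         # the bottommost leaf
    return classify


# Hypothetical rules paraphrasing the Titanic summary.
# The second rule needs no "male" test because females are caught by the first.
titanic = make_decision_list(
    [
        (lambda p: p["sex"] == "female", "good chance of survival"),
        (lambda p: p["age"] < 9.5 and p["sibsp"] < 3, "good chance of survival"),
    ],
    default="poor chance of survival",
)

print(titanic({"sex": "male", "age": 6, "sibsp": 1}))   # good chance of survival
print(titanic({"sex": "male", "age": 30, "sibsp": 0}))  # poor chance of survival
```

Rule order matters: reordering the list can change predictions, which is what makes a decision list a one-sided tree rather than an unordered rule set.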

Notable decision tree algorithms include:

ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification And Regression Tree)[6]
Chi-square automatic interaction detection (CHAID). Performs multi-level splits when computing
classification trees.[14]
MARS: extends decision trees to handle numerical data better.
Conditional Inference Trees. Statistics-based approach that uses non-parametric tests as splitting
criteria, corrected for multiple testing to avoid overfitting. This approach results in unbiased predictor
selection and does not require pruning.[15][16]

ID3 and CART were invented independently at around the same time (between 1970 and 1980), yet follow a
similar approach for learning a decision tree from training tuples.

It has also been proposed to leverage concepts of fuzzy set theory for the definition of a special version of decision
tree, known as Fuzzy Decision Tree (FDT).[17] In this type of fuzzy classification, generally an input vector is
associated with multiple classes, each with a different confidence value. Boosted ensembles of FDTs have recently
been investigated as well, and they have shown performance comparable to that of other very efficient fuzzy
classifiers.[18]

Metrics
Algorithms for constructing decision trees usually work top-down, by choosing a variable at each step that best
splits the set of items.[19] Different algorithms use different metrics for measuring "best". These generally measure
the homogeneity of the target variable within the subsets. Some examples are given below. These metrics are
applied to each candidate subset, and the resulting values are combined (e.g., averaged) to provide a measure of the
quality of the split.

Gini impurity

Used by the CART (classification and regression tree) algorithm for classification trees, Gini impurity is a measure
of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled
according to the distribution of labels in the subset. The Gini impurity can be computed by summing the probability
$p_i$ of an item with label $i$ being chosen times the probability $\sum_{k \ne i} p_k = 1 - p_i$ of a mistake in
categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category.

The Gini impurity is also an information theoretic measure and corresponds to Tsallis entropy with deformation
coefficient $q = 2$, which in physics is associated with the lack of information in out-of-equilibrium, non-extensive,
dissipative and quantum systems. In the limit $q \to 1$ one recovers the usual Boltzmann-Gibbs or Shannon
entropy. In this sense, the Gini impurity is but a variation of the usual entropy measure for decision trees.

To compute Gini impurity for a set of items with $J$ classes, suppose $i \in \{1, 2, \ldots, J\}$, and let $p_i$ be the
fraction of items labeled with class $i$ in the set. Then

$$I_G(p) = \sum_{i=1}^{J} p_i \sum_{k \ne i} p_k = \sum_{i=1}^{J} p_i (1 - p_i) = \sum_{i=1}^{J} p_i - \sum_{i=1}^{J} p_i^2 = 1 - \sum_{i=1}^{J} p_i^2$$
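The Gini formula can be checked numerically. This sketch (plain Python, hypothetical function names) computes the impurity both from its definition and in the closed form, and compares it with the Tsallis entropy $S_q = \frac{1}{q-1}\left(1 - \sum_i p_i^q\right)$, assuming the Tsallis constant is taken as 1. With $q = 2$ the two coincide, and as $q \to 1$ the Tsallis value approaches the Shannon entropy measured in nats.

```python
import math


def gini(p):
    """Gini impurity 1 - sum_i p_i^2 for class fractions p that sum to 1."""
    return 1.0 - sum(pi * pi for pi in p)


def gini_by_definition(p):
    """Sum over labels of P(item has label i) * P(it is mislabeled) = p_i * (1 - p_i)."""
    return sum(pi * (1.0 - pi) for pi in p)


def tsallis(p, q):
    """Tsallis entropy (1 - sum_i p_i^q) / (q - 1), with the constant set to 1; q != 1."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)


def shannon_nats(p):
    """Shannon entropy with the natural logarithm, the q -> 1 limit of tsallis()."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


p = [0.5, 0.25, 0.25]
print(gini(p), gini_by_definition(p), tsallis(p, 2))  # 0.625 0.625 0.625
print(tsallis(p, 1.0001), shannon_nats(p))            # nearly equal
print(gini([1.0]))                                    # 0.0 for a pure node
```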

Information gain

Used by the ID3, C4.5 and C5.0 tree-generation algorithms. Information gain is based on the concept of entropy
and information content from information theory.

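Entropy is defined as below

$$H(T) = I_E(p_1, p_2, \ldots, p_J) = -\sum_{i=1}^{J} p_i \log_2 p_i$$

where $p_1, p_2, \ldots$ are fractions that add up to 1 and represent the percentage of each class present in the child node
that results from a split in the tree.[20]

The information gain from observing that attribute $A$ takes the value $a$ is the entropy of the parent minus the
entropy of the corresponding child:

$$IG(T, a) = H(T) - H(T \mid A = a)$$

Averaging over the possible values of $A$,

$$E_A[IG(T, a)] = \sum_{a} p(a)\,\bigl(H(T) - H(T \mid A = a)\bigr) = H(T) - H(T \mid A) = I(T; A)$$

That is, the expected information gain is the mutual information: on average, the reduction in the entropy of $T$
obtained by learning the value of $A$ equals $I(T; A)$.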
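The identity $E_A[IG(T, a)] = I(T; A)$ can be checked on any table of joint counts. The sketch below computes the left side as $H(T) - H(T \mid A)$ and the right side directly from the joint distribution; the counts and function names are hypothetical.

```python
import math


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(counts):
    """counts[a][t] = number of items with attribute value a and class t.
    Returns H(T) - sum_a p(a) H(T | A = a)."""
    n = sum(sum(row.values()) for row in counts.values())
    classes = {t for row in counts.values() for t in row}
    h_t = entropy_bits([sum(row.get(t, 0) for row in counts.values()) / n
                        for t in classes])
    h_t_given_a = sum(
        sum(row.values()) / n
        * entropy_bits([c / sum(row.values()) for c in row.values()])
        for row in counts.values())
    return h_t - h_t_given_a


def mutual_information(counts):
    """I(T; A) = sum over (a, t) of p(a, t) log2[p(a, t) / (p(a) p(t))]."""
    n = sum(sum(row.values()) for row in counts.values())
    p_a = {a: sum(row.values()) / n for a, row in counts.items()}
    classes = {t for row in counts.values() for t in row}
    p_t = {t: sum(row.get(t, 0) for row in counts.values()) / n for t in classes}
    return sum(c / n * math.log2((c / n) / (p_a[a] * p_t[t]))
               for a, row in counts.items() for t, c in row.items() if c > 0)


# Hypothetical joint counts for a two-valued attribute and two classes.
counts = {"x": {"yes": 4, "no": 1}, "y": {"yes": 2, "no": 3}}
print(expected_information_gain(counts), mutual_information(counts))
```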

Information gain is used to decide which feature to split on at each step in building the tree. Simplicity is best, so
we want to keep our tree small. To do so, at each step we should choose the split that results in the purest daughter
nodes. A commonly used measure of purity is called information, which is measured in bits. For each node of the
tree, the information value "represents the expected amount of information that would be needed to specify whether
a new instance should be classified yes or no, given that the example reached that node".[20]

Consider an example data set with four attributes: outlook (sunny, overcast, rainy), temperature (hot, mild, cool),
humidity (high, normal), and windy (true, false), with a binary (yes or no) target variable, play, and 14 data points.
To construct a decision tree on this data, we need to compare the information gain of each of four trees, each split
on one of the four features. The split with the highest information gain will be taken as the first split and the
process will continue until all child nodes are pure, or until the information gain is 0.

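The split using the feature windy results in two child nodes, one for a windy value of true and one for a windy
value of false. In this data set, there are six data points with a true windy value, three of which have a play (where
play is the target variable) value of yes and three a play value of no. The eight remaining data points with a
windy value of false contain two no's and six yes's. The information of the windy=true node is calculated using the
entropy equation above. Since there is an equal number of yes's and no's in this node, we have

$$I_E([3, 3]) = -\tfrac{3}{6}\log_2\tfrac{3}{6} - \tfrac{3}{6}\log_2\tfrac{3}{6} = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$$

For the node where windy=false there are eight data points, six yes's and two no's. Thus we have

$$I_E([6, 2]) = -\tfrac{6}{8}\log_2\tfrac{6}{8} - \tfrac{2}{8}\log_2\tfrac{2}{8} = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811$$

To find the information of the split, we take the weighted average of these two numbers based on how many
observations fell into which node.

$$I_E([3, 3], [6, 2]) = \tfrac{6}{14} \cdot 1 + \tfrac{8}{14} \cdot 0.811 \approx 0.892$$

To find the information gain of the split using windy, we must first calculate the information in the data before the
split. The original data contained nine yes's and five no's.

$$I_E([9, 5]) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940$$

Now we can calculate the information gain achieved by splitting on the windy feature.

$$IG(\text{windy}) = I_E([9, 5]) - I_E([3, 3], [6, 2]) \approx 0.940 - 0.892 = 0.048$$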

To build the tree, the information gain of each possible first split would need to be calculated. The best first split is
the one that provides the most information gain. This process is repeated for each impure node until the tree is
complete. This example is adapted from the example appearing in Witten et al.[20]
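The windy calculation can be reproduced with a short script. It uses only the class counts given in the text ([yes, no] per node); the helper names are hypothetical.

```python
import math


def info(counts):
    """Entropy in bits of a node, given its class counts, e.g. [yes, no]."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def split_info(children):
    """Weighted average entropy of the child nodes, weighted by their sizes."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * info(c) for c in children)


before = info([9, 5])                   # whole data set: 9 yes, 5 no
after = split_info([[3, 3], [6, 2]])    # windy=true, windy=false
gain = before - after

print(round(info([3, 3]), 4), round(info([6, 2]), 4))    # 1.0 0.8113
print(round(before, 4), round(after, 4), round(gain, 4)) # 0.9403 0.8922 0.0481
```

Computing the same `gain` for outlook, temperature, and humidity and taking the largest gives the first split.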

Variance reduction

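Introduced in CART,[6] variance reduction is often employed in cases where the target variable is continuous
(regression tree), meaning that use of many other metrics would first require discretization before being applied.
The variance reduction of a node N is defined as the total reduction of the variance of the target variable x due to
the split at this node:

$$I_V(N) = \frac{1}{|S|^2}\sum_{i \in S}\sum_{j \in S}\tfrac{1}{2}(x_i - x_j)^2 - \left(\frac{|S_t|}{|S|}\,\frac{1}{|S_t|^2}\sum_{i \in S_t}\sum_{j \in S_t}\tfrac{1}{2}(x_i - x_j)^2 + \frac{|S_f|}{|S|}\,\frac{1}{|S_f|^2}\sum_{i \in S_f}\sum_{j \in S_f}\tfrac{1}{2}(x_i - x_j)^2\right)$$

where $S$, $S_t$, and $S_f$ are the set of presplit sample indices, the set of sample indices for which the split test is
true, and the set of sample indices for which the split test is false, respectively. Each term of the form
$\frac{1}{|S|^2}\sum_{i \in S}\sum_{j \in S}\tfrac{1}{2}(x_i - x_j)^2$ is the (population) variance of the corresponding set,
written in a form that does not refer to the mean directly; each child's variance is weighted by the fraction
$|S_t|/|S|$ or $|S_f|/|S|$ of the samples it receives.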
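A minimal sketch of variance reduction for one numeric feature, assuming binary splits of the form `feature <= t`. `pairwise_variance` follows the mean-free form described above, children are weighted by the fraction of samples they receive, and `best_threshold` picks the split with the largest reduction. The data and names are hypothetical, not from any library.

```python
import statistics


def pairwise_variance(xs):
    """(1/|S|^2) * sum_i sum_j (x_i - x_j)^2 / 2: the population variance,
    written without the mean. Defined as 0 for an empty set."""
    n = len(xs)
    if n == 0:
        return 0.0
    return sum(0.5 * (xi - xj) ** 2 for xi in xs for xj in xs) / n ** 2


def variance_reduction(xs, split_test):
    """Presplit variance minus the size-weighted variances of the two children.
    split_test[k] is True when sample k goes to S_t, False when it goes to S_f."""
    s_t = [x for x, goes_t in zip(xs, split_test) if goes_t]
    s_f = [x for x, goes_t in zip(xs, split_test) if not goes_t]
    n = len(xs)
    return pairwise_variance(xs) - (len(s_t) / n * pairwise_variance(s_t)
                                     + len(s_f) / n * pairwise_variance(s_f))


def best_threshold(feature, target):
    """Regression-tree split search on one numeric feature: test 'feature <= t'."""
    candidates = sorted(set(feature))[:-1]  # the largest value would leave S_f empty
    return max(candidates,
               key=lambda t: variance_reduction(target, [f <= t for f in feature]))


feature = [1, 2, 3, 4, 5, 6]
target = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
print(pairwise_variance(target), statistics.pvariance(target))  # equal up to rounding
print(best_threshold(feature, target))                          # 3
```

The double loop is quadratic and only mirrors the formula; a practical implementation would use running sums of $x$ and $x^2$ instead.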
Uses

Advantages

Amongst other data mining methods, decision trees have various advantages:

Simple to understand and interpret. People are able to understand decision tree models after a
brief explanation. Trees can also be displayed graphically in a way that is easy for non-experts to
interpret.[21]
Able to handle both numerical and categorical data.[21] Other techniques are usually specialized
in analyzing datasets that have only one type of variable. (For example, relation rules can be used
only with nominal variables while neural networks can be used only with numerical variables or
categoricals converted to 0-1 values.) Early decision trees were only capable of handling categorical
variables, but more recent versions, such as C4.5, do not have this limitation.[2]
Requires little data preparation. Other techniques often require data normalization. Since trees can
handle qualitative predictors, there is no need to create dummy variables.[21]
Uses a white box or open-box[2] model. If a given situation is observable in a model, the
explanation for the condition is easily expressed in Boolean logic. By contrast, in a black box model,
such as an artificial neural network, the explanation for the results is typically difficult to
understand.
Possible to validate a model using statistical tests. That makes it possible to account for the
reliability of the model.
Non-statistical approach that makes no assumptions about the training data or prediction residuals; e.g.,
no distributional, independence, or constant variance assumptions.
Performs well with large datasets. Large amounts of data can be analyzed using standard
computing resources in reasonable time.
Mirrors human decision making more closely than other approaches.[21] This could be useful
when modeling human decisions/behavior.
Robust against collinearity, particularly when boosting is used.
Built-in feature selection. Additional irrelevant features will be used less, so they can be removed
on subsequent runs. The hierarchy of attributes in a decision tree reflects the importance of
attributes.[22] It means that the features at the top are the most informative.[23]
Decision trees can approximate any Boolean function, e.g. XOR.[24]
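For Boolean inputs, a tree that tests every variable in turn represents any Boolean function exactly, with one leaf per input assignment (so up to $2^n$ leaves for $n$ variables). The sketch builds such a tree from a Python function and checks it on XOR; the tuple-and-dict tree encoding and the function names are hypothetical choices for illustration.

```python
def tree_from_function(fn, variables, assignment=None):
    """Build a complete binary tree that tests `variables` in order, one per level.
    Each leaf stores fn's output for the assignment on its root-to-leaf path."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return fn(assignment)
    var = variables[len(assignment)]
    return (var, {v: tree_from_function(fn, variables, {**assignment, var: v})
                  for v in (0, 1)})


def predict(tree, example):
    """Follow the branch for each tested variable until a leaf is reached."""
    while isinstance(tree, tuple):
        var, branches = tree
        tree = branches[example[var]]
    return tree


xor_tree = tree_from_function(lambda e: e["x1"] ^ e["x2"], ["x1", "x2"])
print(xor_tree)  # ('x1', {0: ('x2', {0: 0, 1: 1}), 1: ('x2', {0: 1, 1: 0})})
```

XOR is also a standard example of why greedy learners struggle: neither variable alone has any information gain at the root, even though a depth-2 tree is exact.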

Limitations
Trees can be very non-robust. A small change in the training data can result in a large change in the
tree and consequently the final predictions.[21]
The problem of learning an optimal decision tree is known to be NP-complete under several aspects
of optimality and even for simple concepts.[25][26] Consequently, practical decision-tree learning
algorithms are based on heuristics such as the greedy algorithm where locally optimal decisions are
made at each node. Such algorithms cannot guarantee that they return the globally optimal decision tree. To
reduce the greedy effect of local optimality, some methods such as the dual information distance
(DID) tree were proposed.[27]
Decision-tree learners can create over-complex trees that do not generalize well from the training
data. (This is known as overfitting.[28]) Mechanisms such as pruning are necessary to avoid this
problem (with the exception of some algorithms such as the Conditional Inference approach, which
does not require pruning).[15][16]
For data including categorical variables with different numbers of levels, information gain in decision
trees is biased in favor of attributes with more levels.[29] However, the issue of biased predictor
selection is avoided by the Conditional Inference approach,[15] a two-stage approach,[30] or adaptive
leave-one-out feature selection.[31]
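The bias toward many-valued attributes is easy to reproduce. In this sketch, an identifier-like attribute with a different value for every record splits the data into pure single-record nodes, so its information gain equals the full entropy of the target, far above the gain of windy, even though it is useless for predicting new records. The records are hypothetical, arranged only to match the class counts of the windy example.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the size-weighted entropy after grouping by value."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


# Hypothetical records: 9 "yes" and 5 "no", with the windy column arranged so that
# windy=true holds 3 yes / 3 no and windy=false holds 6 yes / 2 no.
play = ["yes"] * 9 + ["no"] * 5
windy = ["true"] * 3 + ["false"] * 6 + ["true"] * 3 + ["false"] * 2
record_id = list(range(14))  # a 14-level attribute, unique per record

print(round(information_gain(windy, play), 4))      # 0.0481
print(round(information_gain(record_id, play), 4))  # 0.9403
```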

Implementations

Many data mining software packages provide implementations of one or more decision tree algorithms.

Examples include Salford Systems CART (which licensed the proprietary code of the original CART authors),[6]
IBM SPSS Modeler, RapidMiner, SAS Enterprise Miner, MATLAB, R (an open-source software environment for
statistical computing, which includes several CART implementations such as rpart, party and randomForest
packages), Weka (a free and open-source data-mining suite that contains many decision tree algorithms), Orange,
KNIME, Microsoft SQL Server [1] (https://technet.microsoft.com/en-us/library/cc645868.aspx), and scikit-learn (a
free and open-source machine learning library for the Python programming language).

Extensions

Decision graphs

In a decision tree, all paths from the root node to the leaf node proceed by way of conjunction, or AND. In a
decision graph, it is possible to use disjunctions (ORs) to join two or more paths together using minimum message
length (MML).[32] Decision graphs have been further extended to allow for previously unstated new attributes to be
learnt dynamically and used at different places within the graph.[33] The more general coding scheme results in
better predictive accuracy and log-loss probabilistic scoring. In general, decision graphs infer models with fewer
leaves than decision trees.

Alternative search methods

Evolutionary algorithms have been used to avoid locally optimal decisions and search the decision tree space with
little a priori bias.[34][35]

It is also possible for a tree to be sampled using Markov chain Monte Carlo (MCMC).[36]

The tree can be searched for in a bottom-up fashion.[37]

See also
Decision tree pruning
Binary decision diagram
CHAID
CART
ID3 algorithm
C4.5 algorithm
Decision stumps, used in e.g. AdaBoosting
Decision list
Incremental decision tree
Alternating decision tree
Structured data analysis (statistics)
Logistic model tree
Hierarchical clustering

References
1. Wu, Xindong; Kumar, Vipin; Ross Quinlan, J.; Ghosh, Joydeep; Yang, Qiang; Motoda, Hiroshi; McLachlan, Geoffrey J.; Ng, Angus; Liu, Bing; Yu, Philip S.; Zhou, Zhi-Hua (2008-01-01). "Top 10 algorithms in data mining" (https://doi.org/10.1007/s10115-007-0114-2). Knowledge and Information Systems. 14 (1): 1–37. doi:10.1007/s10115-007-0114-2 (https://doi.org/10.1007%2Fs10115-007-0114-2). ISSN 0219-3116 (https://www.worldcat.org/issn/0219-3116).
2. Piryonesi, S. Madeh; El-Diraby, Tamer E. (2020-03-01). "Data Analytics in Asset Management: Cost-Effective Prediction of the Pavement Condition Index" (https://ascelibrary.org/doi/10.1061/%28ASCE%29IS.1943-555X.0000512). Journal of Infrastructure Systems. 26 (1): 04019036. doi:10.1061/(ASCE)IS.1943-555X.0000512 (https://doi.org/10.1061%2F%28ASCE%29IS.1943-555X.0000512).
3. Rokach, Lior; Maimon, O. (2008). Data mining with decision trees: theory and applications. World Scientific Pub Co Inc. ISBN 978-9812771711.
4. Shalev-Shwartz, Shai; Ben-David, Shai (2014). "18. Decision Trees". Understanding Machine Learning (http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning). Cambridge University Press.
5. Quinlan, J. R. (1986). "Induction of decision trees" (https://link.springer.com/content/pdf/10.1007/BF00116251.pdf) (PDF). Machine Learning. 1: 81–106. doi:10.1007/BF00116251 (https://doi.org/10.1007%2FBF00116251).
6. Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8.
7. Friedman, J. H. (1999). Stochastic gradient boosting (https://astro.temple.edu/~msobel/courses_files/StochasticBoosting(gradient).pdf). Stanford University.
8. Hastie, T.; Tibshirani, R.; Friedman, J. H. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer Verlag.
9. Breiman, L. (1996). "Bagging Predictors". Machine Learning. 24 (2): 123–140. doi:10.1007/BF00058655 (https://doi.org/10.1007%2FBF00058655).
10. Rodriguez, J. J.; Kuncheva, L. I.; Alonso, C. J. (2006). "Rotation forest: A new classifier ensemble method". IEEE Transactions on Pattern Analysis and Machine Intelligence. 28 (10): 1619–1630. CiteSeerX 10.1.1.156.8277 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.8277). doi:10.1109/TPAMI.2006.211 (https://doi.org/10.1109%2FTPAMI.2006.211). PMID 16986543 (https://pubmed.ncbi.nlm.nih.gov/16986543).
11. Rivest, Ron (Nov 1987). "Learning Decision Lists" (http://people.csail.mit.edu/rivest/pubs/Riv87b.pdf) (PDF). Machine Learning. 3 (2): 229–246. doi:10.1023/A:1022607331053 (https://doi.org/10.1023%2FA%3A1022607331053).
12. Letham, Ben; Rudin, Cynthia; McCormick, Tyler; Madigan, David (2015). "Interpretable Classifiers Using Rules And Bayesian Analysis: Building A Better Stroke Prediction Model". Annals of Applied Statistics. 9 (3): 1350–1371. arXiv:1511.01644 (https://arxiv.org/abs/1511.01644). doi:10.1214/15-AOAS848 (https://doi.org/10.1214%2F15-AOAS848).
13. Wang, Fulton; Rudin, Cynthia (2015). "Falling Rule Lists" (http://www.jmlr.org/proceedings/papers/v38/wang15a.pdf) (PDF). Journal of Machine Learning Research. 38.
14. Kass, G. V. (1980). "An exploratory technique for investigating large quantities of categorical data". Applied Statistics. 29 (2): 119–127. doi:10.2307/2986296 (https://doi.org/10.2307%2F2986296). JSTOR 2986296 (https://www.jstor.org/stable/2986296).
15. Hothorn, T.; Hornik, K.; Zeileis, A. (2006). "Unbiased Recursive Partitioning: A Conditional Inference Framework". Journal of Computational and Graphical Statistics. 15 (3): 651–674. CiteSeerX 10.1.1.527.2935 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.527.2935). doi:10.1198/106186006X133933 (https://doi.org/10.1198%2F106186006X133933). JSTOR 27594202 (https://www.jstor.org/stable/27594202).
16. Strobl, C.; Malley, J.; Tutz, G. (2009). "An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2927982). Psychological Methods. 14 (4): 323–348. doi:10.1037/a0016973 (https://doi.org/10.1037%2Fa0016973). PMC 2927982 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2927982). PMID 19968396 (https://pubmed.ncbi.nlm.nih.gov/19968396).
17. Janikow, C. Z. (1998). "Fuzzy decision trees: issues and methods" (http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=658573). IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics). 28: 1–14. doi:10.1109/3477.658573 (https://doi.org/10.1109%2F3477.658573).
18. Barsacchi, M.; Bechini, A.; Marcelloni, F. (2020). "An analysis of boosted ensembles of binary fuzzy decision trees" (http://www.sciencedirect.com/science/article/pii/S0957417420302608). Expert Systems with Applications. 154. doi:10.1016/j.eswa.2020.113436 (https://doi.org/10.1016%2Fj.eswa.2020.113436).
19. Rokach, L.; Maimon, O. (2005). "Top-down induction of decision trees classifiers-a survey". IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews. 35 (4): 476–487. CiteSeerX 10.1.1.458.7031 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.458.7031). doi:10.1109/TSMCC.2004.843247 (https://doi.org/10.1109%2FTSMCC.2004.843247).
20. Witten, Ian; Frank, Eibe; Hall, Mark (2011). Data Mining (https://archive.org/details/dataminingpracti00witt_966). Burlington, MA: Morgan Kaufmann. pp. 102 (https://archive.org/details/dataminingpracti00witt_966/page/n136)–103. ISBN 978-0-12-374856-0.
21. James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2015). An Introduction to Statistical Learning (https://archive.org/details/introductiontost00jame). New York: Springer. pp. 315 (https://archive.org/details/introductiontost00jame/page/n323). ISBN 978-1-4614-7137-0.
22. Provost, Foster; Fawcett, Tom (2013). Data science for business: what you need to know about data mining and data-analytic thinking (https://www.worldcat.org/oclc/844460899) (1st ed.). Sebastopol, Calif.: O'Reilly. ISBN 1-4493-6132-3. OCLC 844460899 (https://www.worldcat.org/oclc/844460899).
23. Piryonesi, S. Madeh; El-Diraby, Tamer E. (2020-06-01). "Role of Data Analytics in Infrastructure Asset Management: Overcoming Data Size and Quality Problems" (https://ascelibrary.org/doi/10.1061/JPEODX.0000175). Journal of Transportation Engineering, Part B: Pavements. 146 (2): 04020022. doi:10.1061/JPEODX.0000175 (https://doi.org/10.1061%2FJPEODX.0000175).
24. Mehta, Dinesh; Raghavan, Vijay (2002). "Decision tree approximations of Boolean functions". Theoretical Computer Science. 270 (1–2): 609–623. doi:10.1016/S0304-3975(01)00011-1 (https://doi.org/10.1016%2FS0304-3975%2801%2900011-1).
25. Hyafil, Laurent; Rivest, RL (1976). "Constructing Optimal Binary Decision Trees is NP-complete". Information Processing Letters. 5 (1): 15–17. doi:10.1016/0020-0190(76)90095-8 (https://doi.org/10.1016%2F0020-0190%2876%2990095-8).
26. Murthy, S. (1998). "Automatic construction of decision trees from data: A multidisciplinary survey" (https://cs.nyu.edu/~roweis/csc2515-2006/readings/murthy_dt.pdf). Data Mining and Knowledge Discovery.
27. Ben-Gal, I.; Dana, A.; Shkolnik, N.; Singer (2014). "Efficient Construction of Decision Trees by the Dual Information Distance Method" (http://www.eng.tau.ac.il/~bengal/DID.pdf) (PDF). Quality Technology & Quantitative Management. 11 (1): 133–147.
28. Principles of Data Mining. 2007. doi:10.1007/978-1-84628-766-4 (https://doi.org/10.1007%2F978-1-84628-766-4). ISBN 978-1-84628-765-7.
29. Deng, H.; Runger, G.; Tuv, E. (2011). Bias of importance measures for multi-valued attributes and solutions (https://www.researchgate.net/profile/Houtao_Deng/publication/221079908_Bias_of_Importance_Measures_for_Multi-valued_Attributes_and_Solutions/links/0046351909faa8f0eb000000/Bias-of-Importance-Measures-for-Multi-valued-Attributes-and-Solutions.pdf) (PDF). Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN). pp. 293–300.
30. Brandmaier, Andreas M.; Oertzen, Timo von; McArdle, John J.; Lindenberger, Ulman (2012). "Structural equation model trees" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4386908). Psychological Methods. 18 (1): 71–86. doi:10.1037/a0030001 (https://doi.org/10.1037%2Fa0030001). hdl:11858/00-001M-0000-0024-EA33-9 (https://hdl.handle.net/11858%2F00-001M-0000-0024-EA33-9). PMC 4386908 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4386908). PMID 22984789 (https://pubmed.ncbi.nlm.nih.gov/22984789).
31. Painsky, Amichai; Rosset, Saharon (2017). "Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance". IEEE Transactions on Pattern Analysis and Machine Intelligence. 39 (11): 2142–2153. arXiv:1512.03444 (https://arxiv.org/abs/1512.03444). doi:10.1109/TPAMI.2016.2636831 (https://doi.org/10.1109%2FTPAMI.2016.2636831). PMID 28114007 (https://pubmed.ncbi.nlm.nih.gov/28114007).
32. "CiteSeerX" (http://citeseer.ist.psu.edu/oliver93decision.html).
33. Tan & Dowe (2003) (http://www.csse.monash.edu.au/~dld/Publications/2003/Tan+Dowe2003_MMLDecisionGraphs.pdf).
34. Papagelis, A.; Kalles, D. (2001). "Breeding Decision Trees Using Evolutionary Techniques" (http://www.gatree.com/wordpress/wp-content/uploads/2010/04/BreedinDecisioTreeUsinEvo.pdf) (PDF). Proceedings of the Eighteenth International Conference on Machine Learning, June 28–July 1, 2001. pp. 393–400.
35. Barros, Rodrigo C.; Basgalupp, M. P.; Carvalho, A. C. P. L. F.; Freitas, Alex A. (2012). "A Survey of Evolutionary Algorithms for Decision-Tree Induction". IEEE Transactions on Systems, Man and Cybernetics. Part C: Applications and Reviews. 42 (3): 291–312. CiteSeerX 10.1.1.308.9068 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.308.9068). doi:10.1109/TSMCC.2011.2157494 (https://doi.org/10.1109%2FTSMCC.2011.2157494).
36. Chipman, Hugh A.; George, Edward I.; McCulloch, Robert E. (1998). "Bayesian CART model search". Journal of the American Statistical Association. 93 (443): 935–948. CiteSeerX 10.1.1.211.5573 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.211.5573). doi:10.1080/01621459.1998.10473750 (https://doi.org/10.1080%2F01621459.1998.10473750).
37. Barros, R. C.; Cerri, R.; Jaskowiak, P. A.; Carvalho, A. C. P. L. F. (2011). "A bottom-up oblique decision tree induction algorithm". Proceedings of the 11th International Conference on Intelligent Systems Design and Applications (ISDA 2011). pp. 450–456. doi:10.1109/ISDA.2011.6121697 (https://doi.org/10.1109%2FISDA.2011.6121697). ISBN 978-1-4577-1676-8.

Further reading
James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2017). "Tree-Based Methods" (https://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf#page=317) (PDF). An Introduction to Statistical Learning: with Applications in R. New York: Springer. pp. 303–336. ISBN 978-1-4614-7137-0.

External links
Decision Trees page at aitopics.org (http://aitopics.org/topic/decision-tree-learning), a page with commented links.
Evolutionary Learning of Decision Trees in C++ (https://www.cs.kent.ac.uk/people/staff/mg483/code/evoldectrees/)
A very detailed explanation of information gain as splitting criterion (http://christianherta.de/lehre/dataScience/machineLearning/decision-trees.html)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Decision_tree_learning&oldid=962810381"

This page was last edited on 16 June 2020, at 04:42 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply.

