
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Week 4: Inductive Learning based on Symbolic Representations and Weak Theories

Video 4.3 Decision Tree Learning Algorithms Part 2


ID3 algorithm
• ID3 (Iterative Dichotomiser 3) is a TDIDT (Top Down Induction of Decision Trees) algorithm invented by Ross Quinlan in 1986.
• A TDIDT algorithm returns a single consistent hypothesis and processes all examples as a batch.
• Employs greedy search (local optimization) without backtracking through the space of all possible decision trees.
• Susceptible to the usual risks of hill-climbing without backtracking: it finds a tree with short path lengths, but not necessarily the best tree.
• Selects and orders features recursively according to a statistical measure (information gain), until each training example can be classified unambiguously.
• Inductive biases: Occam's razor and a preference for features with high information gain.
Outline of the ID3 algorithm
The ID3 algorithm starts with the original data set at the root node. On each iteration, it considers every unused feature and calculates the information gain of that feature. It then selects the feature with the largest information gain.

The data set is then partitioned on the selected feature, producing one subset for each value of the chosen feature; each subset is associated with the corresponding branch node. The algorithm then recurses on each subset, considering only features never selected before.

Recursion on a subset may stop in one of these cases:

 If every element in the subset belongs to the same class, the node is turned into a leaf node and labelled with the class of those examples.
 If the subset is empty, a leaf node is created and labelled with the most common class of the examples in the parent node's set.
 If there are no more features to select but the examples still do not all belong to the same class, the node is made a leaf node and labelled with the most common class of the examples in the subset.

Throughout the algorithm, the decision tree is constructed with each non-terminal node representing the selected feature on which the data is split, and each terminal node representing the class label best suited for the final subset of that branch.
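
To make the feature-selection step concrete, here is a minimal Python sketch (not part of the lecture material) of how entropy and information gain can be computed; the function names and the list-of-dicts data layout are assumptions made for this illustration.

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy of a list of class labels: -sum over classes of p * log2(p)
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, feature, target):
    # examples: a list of dicts, e.g. {"Outlook": "Sunny", ..., "Play": "YES"}
    labels = [ex[target] for ex in examples]
    remainder = 0.0
    for value in set(ex[feature] for ex in examples):
        subset = [ex[target] for ex in examples if ex[feature] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder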
ID3 algorithm pseudocode

ID3 (Instances, Classes, Features)

Create a Node for the tree
If all instances belong to the same class, return the single-node tree Node, labelled with that class
If Features is empty, return the single-node tree Node, labelled with the most common class in Instances
Otherwise
Begin
   F ← the feature in Features with maximum information gain
   Decision feature for Node ← F
   For each possible value vi of F
   Begin
      Add a new branch below Node with F = vi
      Let Instances_vi be the subset of Instances with value vi for F
      If Instances_vi is empty
         Then below this branch add a leaf node labelled with the most common class in Instances
         Else below this branch add the subtree ID3 (Instances_vi, Classes, Features − {F})
   End
   Return Node
End
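
The pseudocode translates fairly directly into Python. The sketch below reuses the entropy/information_gain helpers assumed above and represents the tree as a nested dict; it is an illustrative sketch of the recursion, not the course's reference implementation.

from collections import Counter

def id3(examples, target, features):
    labels = [ex[target] for ex in examples]
    # Case 1: all examples belong to the same class -> single-node (leaf) tree
    if len(set(labels)) == 1:
        return labels[0]
    # Case 2: no features left -> leaf labelled with the most common class
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise: split on the feature F with maximum information gain
    best = max(features, key=lambda f: information_gain(examples, f, target))
    node = {best: {}}
    # One branch per observed value of F; because branch values are taken from the
    # examples themselves, the empty-subset case of the pseudocode does not arise here
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        node[best][value] = id3(subset, target, [f for f in features if f != best])
    return node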
Result of running ID3 on the example
[Figure: the decision tree produced by ID3 for the example data set of 14 data items. The root node is split into subsets of 5, 5 and 4 items, which are split further until every leaf node is pure, i.e. contains items of a single class.]
Noise
Non-systematic errors in either the values of features or the class labels are usually referred to as noise.

Two modifications of the basic algorithm are required if tree building is to operate on a noise-affected training set.

(1) The algorithm must be able to work with inadequate features, because noise can cause even the most comprehensive set of features to appear inadequate.

(2) The algorithm must be able to detect when testing further features will not improve the predictive accuracy of the decision tree but rather result in overfitting, and as a consequence take countermeasures such as pruning.
General definition of overfitting
Overfitting is a significant practical difficulty for decision tree models and for many other predictive models. Overfitting happens when the learning algorithm continues to develop hypotheses that reduce the training set error at the cost of an increased test set error.

Consider the average error of a hypothesis h over

• the training data: E_T
• the training data + test data: E_D

Definition

A hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h′ ∈ H such that

E_T(h) < E_T(h′) and E_D(h) > E_D(h′)
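
As an illustrative experiment (not from the lecture), the following sketch, assuming scikit-learn is available and using a synthetic noisy data set, typically shows the pattern the definition describes: an unrestricted tree h reaches lower training error than a shallower tree h′, but higher test error.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (3, None):  # None lets the tree grow until all leaves are pure
    h = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_err = 1 - h.score(X_train, y_train)  # corresponds to E_T(h)
    test_err = 1 - h.score(X_test, y_test)     # estimate of E_D(h)
    print("max_depth =", depth, "E_T =", round(train_err, 3), "E_D (test) =", round(test_err, 3))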


Avoiding overfitting through pruning
Pruning is the major approach to avoiding overfitting. Pruning should reduce the size of the decision tree without reducing predictive accuracy as measured by a cross-validation set.

Pre-pruning
• Stops growing the tree early, before it perfectly classifies the training data set (e.g. when a data split is not statistically significant).
• Criteria for stopping are usually based on a statistical significance test that decides whether expanding a particular node is likely to produce an improvement beyond the training set (e.g. a chi-square test).
• Has the problem of stopping too early, as it is not easy to estimate precisely when to stop growing the tree.

Post-pruning allows the tree to perfectly classify the training set and then prunes the tree by removing sub-trees. Often a distinct subset of the data set (called the validation set) is set aside to evaluate the effect of post-pruning nodes from the tree.
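
As a library-level illustration (an assumption of this note, not something shown in the lecture), scikit-learn's decision trees expose both styles: pre-pruning through growth limits and post-pruning through cost-complexity pruning. The parameter values below are arbitrary.

from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop growing the tree early via limits on depth and split size
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_split=20)

# Post-pruning: grow the full tree, then prune it back with cost-complexity
# pruning (scikit-learn's built-in post-pruning; reduced-error pruning, covered
# on the next slide, is a different post-pruning scheme)
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)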
A simple variant of post-pruning

Reduced-Error Pruning

• Split the data into a training set and a validation set.

• All nodes are iteratively considered for pruning.

• A node is removed if the resulting tree performs no worse than the original on the validation set.

• Pruning a node means removing the whole subtree for which the node is the root, making it a leaf and assigning it the most common class of the associated training instances.

• Pruning continues until further pruning would deteriorate accuracy (a simplified sketch of the procedure follows below).
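
A simplified, bottom-up sketch of reduced-error pruning, written against the nested-dict trees produced by the earlier id3 sketch (so the data layout and helper names are assumptions of this note):

from collections import Counter

def classify(tree, example):
    # Walk the nested-dict tree until a leaf (a class label) is reached
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature].get(example[feature])
    return tree

def accuracy(tree, examples, target):
    return sum(classify(tree, ex) == ex[target] for ex in examples) / len(examples)

def reduced_error_prune(tree, train, validation, target):
    # train/validation hold only the instances that reach this node
    if not isinstance(tree, dict) or not train or not validation:
        return tree
    feature = next(iter(tree))
    # Prune the children first (bottom-up)
    for value, subtree in tree[feature].items():
        tree[feature][value] = reduced_error_prune(
            subtree,
            [ex for ex in train if ex[feature] == value],
            [ex for ex in validation if ex[feature] == value],
            target)
    # Candidate leaf: most common class among the training instances at this node
    leaf = Counter(ex[target] for ex in train).most_common(1)[0][0]
    # Replace the subtree if the leaf does no worse on the local validation instances
    if accuracy(leaf, validation, target) >= accuracy(tree, validation, target):
        return leaf
    return tree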


Alternative TDIDT algorithms – similar to ID3
CLS (Concept Learning System), Hunt
Precursor among TDIDT systems
ID3 (Iterative Dichotomiser 3), Quinlan
The prototypical TDIDT algorithm/system
-----------------------------------------------------------------------------------
C4.5 and C5, follow-ups to ID3, Quinlan
C4.5 (the "default" machine learning algorithm for a period)
C5, commercial version of C4.5
ACLS, Niblett
Assistant, Bratko
CART, Breiman

The later systems extend the ID3 setup in various ways, primarily with extended data types for features, better pruning and better noise handling.
Comparison of three TDIDT systems
Ensemble approaches
Ensemble methods construct more than one decision tree and use the set of trees for joint classification.

Two kinds of approaches, relevant not only for decision trees but for many kinds of classifiers:

Boosting approaches
Boosting is a sequential approach in which a sequence of average-performing classifiers can give boosted overall performance by feeding experience from one classifier to the next.
E.g. AdaBoost is a boosting technique that can be applied to many ML algorithms.

Bagging approaches
Bagging is a parallel approach in which a set of classifiers independently produce partial results that are then combined into a joint overall result.
E.g. the random forest algorithm combines random decision trees with bagging to achieve very high classification accuracy.
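
As an illustration (assuming scikit-learn; the data X_train, y_train is hypothetical), both families are available as off-the-shelf meta-estimators that by default wrap decision trees:

from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier

# Boosting: trains a sequence of weak trees, each one focusing on the examples
# the previous ones got wrong (default base learner: a depth-1 decision tree)
boosted = AdaBoostClassifier(n_estimators=50)

# Bagging: trains many trees in parallel on bootstrap samples of the data
# and combines their votes (default base learner: a full decision tree)
bagged = BaggingClassifier(n_estimators=50)

# boosted.fit(X_train, y_train); bagged.fit(X_train, y_train)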
Random forests
Random forests (or random decision forests) are an ensemble learning method for classification, regression and other tasks.

A random forest operates by constructing a multitude of decision trees at training time and outputs the class that is most common among the individual trees' predictions (classification) or the mean of the individual trees' predictions (regression).

The random forest approach is an alternative remedy for the decision tree problem of overfitting.
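
A minimal usage sketch, again assuming scikit-learn, mirroring the two output rules described above (majority vote for classification, mean prediction for regression):

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: each of the 100 trees votes, the forest returns the most common class
forest_clf = RandomForestClassifier(n_estimators=100)

# Regression: the forest returns the mean of the individual trees' predictions
forest_reg = RandomForestRegressor(n_estimators=100)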
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Thanks for your attention!

The next lecture 4.4 will be on the topic:

Instance Based Learning
