
Machine Learning

Lecture 04
Decision Tree Learning

Dr. Rao Muhammad Adeel Nawab


How to Work

Dr. Rao Muhammad Adeel Nawab 2


Power of Dua

Dr. Rao Muhammad Adeel Nawab 3


Dua – Take Help from Allah before starting any task

Dr. Rao Muhammad Adeel Nawab 4


Course Focus
Mainly get EXCELLENCE in two things
1. Become a great human being
2. Become a great Machine Learning Engineer

To become a great human being


Get sincere with yourself
When you are sincere with yourself, your private life (khalwat) and
public life (jalwat) are the same
Dr. Rao Muhammad Adeel Nawab 5
Lecture Outline
What are Decision Trees?
What problems are appropriate for Decision Trees?
The Basic Decision Tree Learning Algorithm: ID3
Entropy and Information Gain
Inductive Bias in Decision Tree Learning
Refinements to Basic Decision Tree Learning

Reading:
Chapter 3 of Mitchell
Sections 4.3 and 6.1 of Witten and Frank
Dr. Rao Muhammad Adeel Nawab 6
What are Decision Trees?
Decision tree learning is a method for approximating
discrete-valued target functions, in which the learned
function is represented by a decision tree.
Learned trees can also be re-represented as sets of if-then
rules to improve human readability.
One of the most popular inductive inference algorithms
Successfully applied to a broad range of tasks.

Dr. Rao Muhammad Adeel Nawab 7


What are Decision Trees?
Decision trees are trees which classify instances by testing
at each node some attribute of the instance.
Testing starts at the root node and proceeds downwards
to a leaf node, which indicates the classification of the
instance.
Each branch leading out of a node corresponds to a value
of the attribute being tested at that node.

Dr. Rao Muhammad Adeel Nawab 8


Decision Tree for PlayTennis
Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes
Dr. Rao Muhammad Adeel Nawab 9
Decision Tree for PlayTennis
Outlook
  Sunny -> Humidity
             High   -> No
             Normal -> Yes

Each internal node tests an attribute
Each branch corresponds to an attribute value
Each leaf node assigns a classification
Dr. Rao Muhammad Adeel Nawab 10
Decision Tree for PlayTennis
Instance: Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Weak, PlayTennis = ?

Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes
Dr. Rao Muhammad Adeel Nawab 11
Decision Tree for Conjunction
Outlook=Sunny ∧ Wind=Weak

Outlook
  Sunny    -> Wind
                Strong -> No
                Weak   -> Yes
  Overcast -> No
  Rain     -> No

Dr. Rao Muhammad Adeel Nawab 12


Decision Tree for Disjunction
Outlook=Sunny ∨ Wind=Weak

Outlook
  Sunny    -> Yes
  Overcast -> Wind
                Strong -> No
                Weak   -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes
Dr. Rao Muhammad Adeel Nawab 13
Decision Tree for XOR
Outlook=Sunny XOR Wind=Weak

Outlook
  Sunny    -> Wind
                Strong -> Yes
                Weak   -> No
  Overcast -> Wind
                Strong -> No
                Weak   -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes
Dr. Rao Muhammad Adeel Nawab 14


Decision Tree
Decision trees represent disjunctions of conjunctions.

Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes

(Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
Dr. Rao Muhammad Adeel Nawab 15
Decision Tree for PlayTennis
A decision tree to classify days as appropriate for playing tennis might look like:

Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes

〈Outlook = Sunny, Temp = Hot, Humidity = High, Wind = Strong〉 → No
Dr. Rao Muhammad Adeel Nawab 16
What are Decision Trees?
Note that
each path through a decision tree forms a conjunction of
attribute tests
the tree as a whole forms a disjunction of such paths; i.e. a
disjunction of conjunctions of attribute tests
Preceding example could be re-expressed as:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
(∧ = AND, ∨ = OR)
Dr. Rao Muhammad Adeel Nawab 17
What are Decision Trees? (cont)
As a complex rule, such a decision tree could be coded by
hand.
However, the challenge for machine learning is to propose
algorithms for learning decision trees from examples.

Dr. Rao Muhammad Adeel Nawab 18
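To make this concrete, here is a minimal Python sketch (not part of the lecture) of what hand-coding the PlayTennis tree above as nested if-then rules might look like:

```python
# Hypothetical hand-coded version of the PlayTennis decision tree shown earlier.
def play_tennis(outlook, humidity, wind):
    """Return 'Yes' or 'No' by walking the tree from the root downwards."""
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    raise ValueError(f"Unknown Outlook value: {outlook}")

# The instance from the earlier slide: <Sunny, Hot, High, Strong>
print(play_tennis("Sunny", "High", "Strong"))  # prints: No
```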


What Problems are Appropriate for Decision Trees?
There are several varieties of decision tree learning, but in
general decision tree learning is best for problems where:
Instances describable by attribute–value pairs
usually nominal (categorical/enumerated/discrete) attributes
with small number of discrete values, but can be numeric
(ordinal/continuous).

Dr. Rao Muhammad Adeel Nawab 19


What Problems are Appropriate for Decision Trees?
Target function is discrete valued
in PlayTennis example target function is Boolean
easy to extend to target functions with > 2 output values
harder, but possible, to extend to numeric target functions
Disjunctive hypothesis may be required
easy for decision trees to learn disjunctive concepts (note such
concepts were outside the hypothesis space of the Candidate-
Elimination algorithm)

Dr. Rao Muhammad Adeel Nawab 20


What Problems are Appropriate for Decision Trees?
Possibly noisy/incomplete training data
robust to errors in classification of training examples and errors
in attribute values describing these examples
Can be trained on examples where for some instances
some attribute values are unknown/missing.

Dr. Rao Muhammad Adeel Nawab 21


Sample Applications of Decision Trees?
Decision trees have been used for:
(see http://www.rulequest.com/see5-examples.html)
Predicting Magnetic Properties of Crystals
Profiling Higher-Priced Houses in Boston
Detecting Advertisements on the Web
Controlling a Production Process
Diagnosing Hypothyroidism
Assessing Credit Risk
Such problems, in which the task is to classify examples into
one of a discrete set of possible categories, are often referred to
as classification problems.

Dr. Rao Muhammad Adeel Nawab 22


Sample Applications of Decision Trees? (cont)
Assessing Credit Risk
From 490 cases like this, split 44%/56% between
accept/reject, See5 derived twelve rules.
On a further 200 unseen cases, these rules give a
classification accuracy of 83%

Dr. Rao Muhammad Adeel Nawab 24


ID3 Algorithm

Dr. Rao Muhammad Adeel Nawab 25


ID3 Algorithm
ID3 learns decision trees by constructing them top-down,
beginning with the question
which attribute should be tested at the root of the tree?
Each instance attribute is evaluated using a statistical test
to determine how well it alone classifies the training
examples.

Dr. Rao Muhammad Adeel Nawab 26


ID3 Algorithm
The best attribute is selected and used as the test at the
root node of the tree.
A descendant of the root node is then created for each
possible value of this attribute, and the training examples
are sorted to the appropriate descendant node (i.e., down
the branch corresponding to the example's value for this
attribute).

Dr. Rao Muhammad Adeel Nawab 27


ID3 Algorithm
The entire process is then repeated using the training
examples associated with each descendant node to select
the best attribute to test at that point in the tree.
This process continues for each new leaf node until either
of two conditions is met:
every attribute has already been included along this path
through the tree, or
the training examples associated with this leaf node all have
the same target attribute value (i.e., their entropy is zero).
Dr. Rao Muhammad Adeel Nawab 28
ID3 Algorithm
This forms a greedy search for an acceptable decision tree,
in which the algorithm never backtracks to reconsider
earlier choices.

Dr. Rao Muhammad Adeel Nawab 29


The Basic Decision Tree Learning Algorithm:ID3(Cont.)
ID3 algorithm:
ID3(Examples, Target_Attribute, Attributes)
Create Root node for the tree
If all examples +ve, return 1-node tree Root with label=+
If all examples -ve, return 1-node tree Root with label=-
If Attributes=[], return 1-node tree Root with label=most
common value of Target_Attribute in Examples
Otherwise

Dr. Rao Muhammad Adeel Nawab 30


The Basic Decision Tree Learning Algorithm:ID3
Begin
A ← attribute in Attributes that best classifies Examples
The decision attribute for Root ← A
For each possible value vi of A
Add a new branch below Root for test A = vi
Let Examplesvi = subset of Examples with value vi for A
If Examplesvi = []
Then below this new branch add leaf node with label=most common value of
Target_Attribute in Examples
Else below this new branch add subtree
ID3(Examplesvi, Target_Attribute, Attributes –{A})
End
Return Root

Dr. Rao Muhammad Adeel Nawab 31
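The pseudocode above maps fairly directly onto code. Below is a compact Python sketch of ID3 under two simplifying assumptions: examples are plain dictionaries mapping attribute names to values, and branches are created only for attribute values that actually occur in the examples (so the empty-branch case in the pseudocode never arises). The helpers entropy and information_gain anticipate the measures introduced in the next sections.

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the target-label distribution in a list of example dicts."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    """Expected reduction in entropy from partitioning the examples on attribute."""
    total = len(examples)
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == v]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    """Grow a decision tree top-down, as in the pseudocode above."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:            # all examples have the same label: leaf
        return labels[0]
    if not attributes:                   # no attributes left: majority-label leaf
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute that best classifies the examples (highest gain).
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        tree[best][v] = id3(subset, target, [a for a in attributes if a != best])
    return tree
```

Calling id3(examples, "PlayTennis", ["Outlook", "Temp", "Humidity", "Wind"]) on the training table given later in the lecture returns a nested dictionary equivalent to the tree grown in the worked example.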


Which Attribute is the Best Classifier?
In the ID3 algorithm, choosing which attribute to test at the
next node is a crucial step.
Would like to choose that attribute which does best at
separating training examples according to their target
classification.
An attribute which separates training examples into two sets
each of which contains positive/negative examples of the target
attribute in the same ratio as the initial set of examples has not
helped us progress towards a classification.
Dr. Rao Muhammad Adeel Nawab 32
Which Attribute is the Best Classifier?
Suppose we have 14 training examples, 9 +ve and 5 -ve, of days on which
tennis is played.
For each day we have information about the attributes humidity and wind,
as below.
Which attribute is the best classifier?

Dr. Rao Muhammad Adeel Nawab 33


Entropy and Information Gain
A useful measure for picking the best classifier attribute
is information gain.
Information gain measures how well a given attribute
separates training examples with respect to their target
classification.
Information gain is defined in terms of entropy as used in
information theory.

Dr. Rao Muhammad Adeel Nawab 34


Entropy and Information Gain(Cont.)
S is a sample of training examples
p+ is the proportion of positive examples in S
p- is the proportion of negative examples in S
Entropy measures the impurity of S:
Entropy(S) = -p+ log2 p+ - p- log2 p-
or, equivalently,
Entropy(S) = -p⊕ log2 p⊕ - p⊖ log2 p⊖
Dr. Rao Muhammad Adeel Nawab 35
Entropy
For our previous example (14 examples, 9 positive, 5
negative):
Entropy([9+,5−]) = −p⊕ log2 p⊕ − p⊖ log2 p⊖
                 = −(9/14) log2(9/14) − (5/14) log2(5/14)
                 = 0.940

Dr. Rao Muhammad Adeel Nawab 36
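As a quick check of this calculation, a small Python sketch of the two-class entropy function (an illustration, not lecture code):

```python
import math

def entropy(p_pos, p_neg):
    """Two-class entropy; 0 * log2(0) is treated as 0 by convention."""
    # max(0.0, ...) only turns the -0.0 produced by a pure sample into 0.0
    return max(0.0, sum(-p * math.log2(p) for p in (p_pos, p_neg) if p > 0))

print(round(entropy(9 / 14, 5 / 14), 3))  # 0.94  (the [9+, 5-] sample above)
print(entropy(1.0, 0.0))                  # 0.0   (a pure sample)
print(entropy(0.5, 0.5))                  # 1.0
```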


Entropy Cont…
Think of Entropy(S) as expected number of bits needed to
encode class (⊕ or ⊖) of randomly drawn member of S (under
the optimal, shortest-length code)
For Example
If p⊕ = 1 (all instances are positive) then no message need be sent
(receiver knows example will be positive) and Entropy = 0 (“pure sample”)
If p⊕ = .5 then 1 bit need be sent to indicate whether instance is
negative or positive and Entropy = 1
If p⊕ = .8 then less than 1 bit need be sent on average – assign
shorter codes to collections of positive examples and longer ones
to negative ones

Dr. Rao Muhammad Adeel Nawab 37


Entropy Cont…
Why?
Information theory: optimal length code assigns −log2 p bits to a
message having probability p.
So, expected number of bits needed to encode class (⊕ or ⊖)
of a random member of S:
p⊕(−log2 p⊕) + p⊖(−log2 p⊖)

Entropy(S) ≡ −p⊕ log2 p⊕ − p⊖ log2 p⊖

Dr. Rao Muhammad Adeel Nawab 38


Information Gain
Entropy gives a measure of purity/impurity of a set of
examples.
Define information gain as the expected reduction in
entropy resulting from partitioning a set of examples on
the basis of an attribute.
Formally, given a set of examples S and attribute A:

Dr. Rao Muhammad Adeel Nawab 39


Information Gain

Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)

where
Values(A) is the set of values attribute A can take on
Sv is the subset of S for which A has value v
First term in Gain(S,A) is entropy of original set; second
term is expected entropy after partitioning on A = sum of
entropies of each subset Sv weighted by ratio of Sv in S.
Dr. Rao Muhammad Adeel Nawab 40
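A minimal Python sketch of this definition, written in terms of (positive, negative) counts for S and for each subset Sv; the Humidity and Wind numbers anticipate the worked example a few slides ahead:

```python
import math

def entropy(pos, neg):
    """Two-class entropy from raw counts."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, partitions):
    """parent and partitions are (pos, neg) counts for S and for each subset Sv."""
    total = sum(parent)
    weighted = sum(sum(part) / total * entropy(*part) for part in partitions)
    return entropy(*parent) - weighted

# Counts from the PlayTennis example used later in the lecture:
print(round(gain((9, 5), [(3, 4), (6, 1)]), 3))  # Humidity: 0.152 (slides round to 0.151)
print(round(gain((9, 5), [(6, 2), (3, 3)]), 3))  # Wind:     0.048
```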


Training Examples
Day Outlook Temp Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Weak Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Strong Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Dr. Rao Muhammad Adeel Nawab 43
First step: which attribute to test at the root?
Which attribute should be tested at the root?
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
Outlook provides the best prediction for the target
Lets grow the tree:
add to the tree a successor for each possible value of Outlook
partition the training samples according to the value of Outlook
Dr. Rao Muhammad Adeel Nawab 44
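For readers who want to reproduce these numbers, a short sketch that recomputes the four gains directly from the training table above (the table is re-encoded here as tuples; values match the slides up to rounding):

```python
import math
from collections import Counter

# The 14 PlayTennis training examples from the table above (label last).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),     ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Weak", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def gain(rows, attr):
    i = ATTRS[attr]
    rem = 0.0
    for v in {r[i] for r in rows}:
        sub = [r for r in rows if r[i] == v]
        rem += len(sub) / len(rows) * entropy(sub)
    return entropy(rows) - rem

for a in ATTRS:
    print(a, round(gain(DATA, a), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048 (matching the slides up to rounding)
```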
After first step

S = [D1, D2, …, D14], [9+, 5-]

Outlook
  Sunny    -> Ssunny = [D1, D2, D8, D9, D11]  [2+, 3-]  ?
  Overcast -> [D3, D7, D12, D13]              [4+, 0-]  Yes
  Rain     -> [D4, D5, D6, D10, D14]          [3+, 2-]  ?

Which attribute should be tested here?
Ssunny = {D1, D2, D8, D9, D11}
Gain(Ssunny, Humidity) = 0.970 - (3/5)×0.0 - (2/5)×0.0 = 0.970
Gain(Ssunny, Temp)     = 0.970 - (2/5)×0.0 - (2/5)×1.0 - (1/5)×0.0 = 0.570
Gain(Ssunny, Wind)     = 0.970 - (2/5)×1.0 - (3/5)×0.918 = 0.019
45
Second step
Working on Outlook=Sunny node:
Gain(SSunny, Humidity) = 0.970 − 3/5 × 0.0 − 2/5 × 0.0 = 0.970
Gain(SSunny, Wind) = 0.970 − 2/5 × 1.0 − 3/5 × 0.918 = 0.019
Gain(SSunny, Temp) = 0.970 − 2/5 × 0.0 − 2/5 × 1.0 − 1/5 × 0.0 = 0.570
Humidity provides the best prediction for the target
Lets grow the tree:
add to the tree a successor for each possible value of Humidity
partition the training samples according to the value of Humidity

Dr. Rao Muhammad Adeel Nawab 46


Selecting the Next Attribute
S = [9+, 5-], E = 0.940

Humidity                              Wind
  High   -> [3+, 4-]  E = 0.985         Weak   -> [6+, 2-]  E = 0.811
  Normal -> [6+, 1-]  E = 0.592         Strong -> [3+, 3-]  E = 1.0

Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
Gain(S, Wind)     = 0.940 - (8/14)*0.811 - (6/14)*1.0   = 0.048
Dr. Rao Muhammad Adeel Nawab 47
Selecting the Next Attribute
S = [9+, 5-], E = 0.940

Outlook
  Sunny    -> [2+, 3-]  E = 0.971
  Overcast -> [4+, 0-]  E = 0.0
  Rain     -> [3+, 2-]  E = 0.971

Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
Dr. Rao Muhammad Adeel Nawab 48
Second and third steps

Final tree for S is:

Outlook
  Sunny    -> Humidity
                High   -> No    [D1, D2, D8]
                Normal -> Yes   [D9, D11]
  Overcast -> Yes               [D3, D7, D12, D13]
  Rain     -> Wind
                Strong -> No    [D6, D14]
                Weak   -> Yes   [D4, D5, D10]

Dr. Rao Muhammad Adeel Nawab 49
Hypothesis Space Search by ID3

(Figure: partially constructed decision trees over attributes A1–A4, explored during ID3's simple-to-complex search.)

Dr. Rao Muhammad Adeel Nawab 50
Hypothesis Space Search by ID3
ID3 searches a space of hypotheses (set of possible
decision trees) for one fitting the training data.
Search is simple-to-complex, hill-climbing search guided
by the information gain evaluation function.
Hypothesis space of ID3 is complete space of finite,
discrete-valued functions w.r.t available attributes
contrast with incomplete hypothesis spaces, such as
conjunctive hypothesis space

Dr. Rao Muhammad Adeel Nawab 51


Hypothesis Space Search by ID3
ID3 maintains only one hypothesis at any time,
instead of, e.g., all hypotheses consistent with
training examples seen so far
contrast with CANDIDATE-ELIMINATION
means can’t determine how many alternative decision trees
are consistent with data
means can’t ask questions to resolve competing alternatives

Dr. Rao Muhammad Adeel Nawab 52


Hypothesis Space Search by ID3
ID3 performs no backtracking – once an attribute is
selected for testing at a given node, this choice is never
reconsidered.
so, susceptible to converging to locally optimal rather than
globally optimal solutions

Dr. Rao Muhammad Adeel Nawab 53


Hypothesis Space Search by ID3
Uses all training examples at each step to make
statistically-based decision about how to refine current
hypothesis
contrast with CANDIDATE-ELIMINATION or FIND-S – make
decisions incrementally based on single training examples
using statistically-based properties of all examples
(information gain) means technique is robust in the face of
errors in individual examples.

Dr. Rao Muhammad Adeel Nawab 54


Inductive Bias in Decision Tree
Learning

Dr. Rao Muhammad Adeel Nawab 55


Inductive Bias in Decision Tree Learning
Inductive bias: set of assumptions needed in addition to
training data to justify deductively learner’s classification
Given a set of training examples, there may be many
decision trees consistent with them. The inductive bias of ID3
is shown by which of these trees it chooses.
ID3’s search strategy (simple-to-complex, hill climbing)
selects shorter trees over longer ones
selects trees that place attributes with highest information
gain closest to root

Dr. Rao Muhammad Adeel Nawab 56


Inductive Bias in Decision Tree Learning
Inductive bias of ID3 therefore:
Shorter trees are preferred over longer trees. Trees that place high
information gain attributes close to the root are preferred to
those that do not.
Note that one could produce a decision tree learning algorithm
with the simpler bias of always preferring a shorter tree.
How does inductive bias of ID3 compare to that of version
space CANDIDATE-ELIMINATION algorithm?
ID3 incompletely searches a complete hypothesis space
CANDIDATE-ELIMINATION completely searches an incomplete
hypothesis space
Dr. Rao Muhammad Adeel Nawab 57
Inductive Bias in Decision Tree Learning
Can be put differently by saying
inductive bias of ID3 follows from its search strategy (preference
bias or search bias)
inductive bias of CANDIDATE-ELIMINATION follows from its
definition of its search space (restriction bias or language bias).
Note that preference bias only affects the order in which
hypotheses are investigated; restriction bias affects which
hypotheses are investigated.
Generally better to choose algorithm with preference bias rather
than restriction bias – with restriction bias target function may not
be contained in hypothesis space.
Dr. Rao Muhammad Adeel Nawab 58
Inductive Bias in Decision Tree Learning
Note that some algorithms may combine preference and
restriction biases – e.g. checker’s learning program
linear weighted function of fixed set of board features
introduces restriction bias (non-linear potential target
functions excluded)
least mean square parameter tuning introduces preference
bias into search through space of parameter values

Dr. Rao Muhammad Adeel Nawab 59


Inductive Bias in Decision Tree Learning
Is ID3’s inductive bias sound? Why prefer shorter
hypotheses/trees?
One response :
“Occam’s Razor” – prefer simplest hypothesis that fits the data.
This is a general assumption that many natural scientists make.

Dr. Rao Muhammad Adeel Nawab 60


Occam’s Razor
Why prefer short hypotheses?
Argument in favor:
Fewer short hypotheses than long hypotheses
A short hypothesis that fits the data is unlikely to be a coincidence
A long hypothesis that fits the data might be a coincidence
Argument opposed:
There are many ways to define small sets of hypotheses
E.g. All trees with a prime number of nodes that use attributes
beginning with ”Z”
What is so special about small sets based on size of hypothesis?
Dr. Rao Muhammad Adeel Nawab 61
Issues in Decision Tree Learning
Practical issues in learning decision trees include
determining how deeply to grow the decision tree
handling continuous attributes
choosing an appropriate attribute selection measure
handling training data with missing attribute values
handling attributes with differing costs and
improving computational efficiency.

Dr. Rao Muhammad Adeel Nawab 62


Refinements to Basic Decision Tree
Learning

Dr. Rao Muhammad Adeel Nawab 63


Refinements to Basic Decision Tree Learning:
Overfitting Training Data + Tree Pruning
When there is
noise in the data, or
the number of training examples is too small to produce a
representative sample of the true target function,
the simple ID3 algorithm can produce trees that overfit
the training examples.

Dr. Rao Muhammad Adeel Nawab 64


Refinements to Basic Decision Tree Learning:
What impact will this have on our earlier tree?

Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes
Dr. Rao Muhammad Adeel Nawab 65
Refinements to Basic Decision Tree Learning
The addition of this incorrect example will now cause ID3 to
construct a more complex tree.
The new example will be sorted into the second leaf node from
the left in the learned tree, along with the previous positive
examples D9 and D11.
Because the new example is labeled as a negative example, ID3 will
search for further refinements to the tree below this node.
The result will be a tree which performs well on the (errorful)
training examples but less well on new unseen instances.
Dr. Rao Muhammad Adeel Nawab 66
Refinements to Basic Decision Tree Learning
Since we previously had the correct examples:
‹Sunny, Cool, Normal, Weak, PlayTennis = Yes›
‹Sunny, Mild, Normal, Strong, PlayTennis = Yes›
the tree will be elaborated below the right branch of Humidity.
The result will be a tree that performs well on the (errorful) training
examples, but less well on new unseen instances.

Dr. Rao Muhammad Adeel Nawab 67


Refinements: Overfitting Training Data
Adapting to noisy training data is one type of overfitting.
Overfitting can also occur when the number of training
examples is too small to be representative of the true target
function
coincidental regularities may be picked up during training
More precisely:
Definition: Given a hypothesis space H, a hypothesis h ∈ H
overfits the training data if there is another hypothesis h′ ∈ H
such that h has smaller error than h′ over the training data, but
h′ has a smaller error over the entire distribution of instances.

Dr. Rao Muhammad Adeel Nawab 68


Refinements: Overfitting Training Data
Overfitting is a real problem for decision tree learning –
10-25% decrease in accuracy over a range of tasks in one
empirical study
Overfitting is a problem for many other machine learning
methods too

Dr. Rao Muhammad Adeel Nawab 69


Refinements: Overfitting Training Data(Example)
Example of ID3 learning which medical patients have a form of diabetes:

Accuracy of the tree over training examples increases monotonically as the tree grows (to be expected)
Accuracy of the tree over independent test examples increases till about 25 nodes, then decreases

Dr. Rao Muhammad Adeel Nawab 70


Refinements: Avoiding Overfitting
How can overfitting be avoided?
Two general approaches:
stop growing tree before perfectly fitting training data
e.g. when data split is not statistically significant
grow full tree, then prune afterwards
In practice, the second approach has been more successful

Dr. Rao Muhammad Adeel Nawab 71


Refinements: Avoiding Overfitting
For either approach, how can optimal final tree size be
decided?
use a set of examples distinct from training examples to
evaluate quality of tree; or
use all data for training but apply statistical test to decide
whether expanding/pruning a given node is likely to improve
performance over whole instance distribution; or
measure complexity of encoding training examples + decision
tree and stop growing tree when this size is minimised –
minimum description length principle

Dr. Rao Muhammad Adeel Nawab 72


Refinements: Avoiding Overfitting
First approach most common – called training and
validation set approach. Divide available instances into
training set – commonly 2/3 of data
validation set – commonly 1/3 of data
Hope is that random errors and coincidental regularities
learned from training set will not be present in validation
set

Dr. Rao Muhammad Adeel Nawab 73
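A minimal sketch of such a split, assuming the available examples are held in a Python list:

```python
import random

def train_validation_split(examples, train_fraction=2 / 3, seed=0):
    """Shuffle and split examples into a training set and a validation set."""
    shuffled = examples[:]                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Example: 14 items -> 9 for training, 5 for validation.
train, validation = train_validation_split(list(range(14)))
print(len(train), len(validation))  # 9 5
```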


Refinements: Reduced Error Pruning
Assumes data split into training and validation sets.
Proceed as follows:
Train decision tree on training set
Do until further pruning is harmful:
for each decision node evaluate impact on validation set
of removing that node and those below it
remove node that most improves accuracy on validation set

Dr. Rao Muhammad Adeel Nawab 74


Refinements: Reduced Error Pruning
How is impact of removing a node evaluated?
When a decision node is removed the subtree rooted at it
is replaced with a leaf node whose classification is the
most common classification of examples beneath the
decision node

Dr. Rao Muhammad Adeel Nawab 75
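The sketch below conveys the same idea in code, but as a simplified bottom-up variant rather than the exact greedy per-pass procedure on the slide: each subtree (in the nested-dict representation used in the earlier ID3 sketch) is replaced by the majority-class leaf of its training examples whenever that does not hurt accuracy on the validation set. The default label used for unseen attribute values is an arbitrary assumption.

```python
from collections import Counter

def classify(tree, example, default="No"):
    """Walk a nested-dict tree ({attr: {value: subtree}}) down to a leaf label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(example.get(attr), default)
    return tree

def accuracy(tree, examples, target):
    if not examples:
        return 0.0
    return sum(classify(tree, e) == e[target] for e in examples) / len(examples)

def majority_label(examples, target):
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def prune(tree, train, validation, target):
    """Bottom-up reduced-error pruning sketch on a nested-dict tree."""
    if not isinstance(tree, dict) or not train:
        return tree                                   # leaf, or nothing to estimate from
    attr = next(iter(tree))
    for value, subtree in tree[attr].items():
        sub_train = [e for e in train if e.get(attr) == value]
        sub_val = [e for e in validation if e.get(attr) == value]
        tree[attr][value] = prune(subtree, sub_train, sub_val, target)
    # Candidate leaf: most common classification of the training examples at this node.
    leaf = majority_label(train, target)
    if accuracy(leaf, validation, target) >= accuracy(tree, validation, target):
        return leaf                                   # pruning does not hurt: replace subtree
    return tree
```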


Refinements: Reduced Error Pruning (cont…)
To assess value of reduced error pruning, split data into 3
distinct sets:
1. training examples for the original tree
2. validation examples for guiding tree pruning
3. test examples to provide an estimate over future unseen
examples

Dr. Rao Muhammad Adeel Nawab 76


Refinements: Reduced Error Pruning (cont.)
On previous example, reduced error pruning produces this effect:

Drawback: holding data back for a validation set reduces data available for
training
Dr. Rao Muhammad Adeel Nawab 77
Refinements: Rule Post-Pruning
Perhaps most frequently used method (e.g.,C4.5)
Proceed as follows:
1. Convert tree to equivalent set of rules
2. Prune each rule independently of others
3. Sort final rules into desired sequence for use
Convert tree to rules by making the conjunction of
decision nodes along each branch the antecedent of a rule
and each leaf the consequent

Dr. Rao Muhammad Adeel Nawab 78
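A small sketch of step 1 (tree-to-rules conversion), again assuming the nested-dict tree representation used in the earlier sketches:

```python
def tree_to_rules(tree, preconditions=()):
    """Return (preconditions, classification) pairs, one per root-to-leaf path."""
    if not isinstance(tree, dict):                 # leaf: emit one rule
        return [(list(preconditions), tree)]
    attr = next(iter(tree))
    rules = []
    for value, subtree in tree[attr].items():
        rules.extend(tree_to_rules(subtree, preconditions + ((attr, value),)))
    return rules

# The PlayTennis tree in nested-dict form:
play_tennis_tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                                "Overcast": "Yes",
                                "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}
for antecedent, consequent in tree_to_rules(play_tennis_tree):
    body = " AND ".join(f"{a}={v}" for a, v in antecedent)
    print(f"IF {body} THEN PlayTennis={consequent}")
```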




Refinements: Rule Post-Pruning (cont)
To prune rules, remove any precondition (= conjunct in
antecedent) of a rule whose removal does not worsen rule
accuracy
Can estimate rule accuracy
by using a separate validation set
by using the training data, but assuming a statistically-based
pessimistic estimate of rule accuracy (C4.5)

Dr. Rao Muhammad Adeel Nawab 80


Refinements: Rule Post-Pruning (cont)
Three advantages of converting trees to rules before
pruning:
1. converting to rules allows distinguishing different contexts in
which rules are used – treat each path through tree differently
contrast: removing a decision node removes all paths beneath
it
2. removes distinction between testing nodes near root and
those near leaves – avoids need to rearrange tree should
higher nodes be removed
3. rules often easier for people to understand

Dr. Rao Muhammad Adeel Nawab 81


Refinements: Continuous-valued Attributes
Initial definition of ID3 is restricted to discrete-valued
target attributes and
decision node attributes
Can overcome second limitation by dynamically defining
new discrete-valued attributes that partition a continuous
attribute value into a set of discrete intervals

Dr. Rao Muhammad Adeel Nawab 82


Refinements: Continuous-valued Attributes
So, for continuous attribute A dynamically create a new
Boolean attribute Ac that is true if A > c and false otherwise
How do we pick c? → Pick the c that maximises information
gain

Dr. Rao Muhammad Adeel Nawab 83


Refinements: Continuous-valued Attributes
E.g. suppose for PlayTennis example we want Temperature to be a
continuous attribute
Temperature: 40 48 60 72 80 90
PlayTennis: No No Yes Yes Yes No
Sort by temperature and identify candidate thresholds midway
between points where target attribute changes ((60+48)/2) and
((90+80)/2))
Compute information gain for Temperature>54 and Temperature>85
and select the higher (Temperature>54)
Can be extended to split continuous attribute into
> 2 intervals
Dr. Rao Muhammad Adeel Nawab 84
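A short sketch of this threshold-picking step on the Temperature data above:

```python
import math
from collections import Counter

temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

def entropy(ys):
    counts = Counter(ys)
    return -sum(c / len(ys) * math.log2(c / len(ys)) for c in counts.values())

def gain_for_threshold(c):
    """Information gain of the Boolean attribute Temperature > c."""
    below = [l for t, l in zip(temps, labels) if t <= c]
    above = [l for t, l in zip(temps, labels) if t > c]
    remainder = (len(below) * entropy(below) + len(above) * entropy(above)) / len(labels)
    return entropy(labels) - remainder

# Candidate thresholds: midway between adjacent points where the label changes.
candidates = [(temps[i] + temps[i + 1]) / 2
              for i in range(len(temps) - 1) if labels[i] != labels[i + 1]]
print(candidates)                                 # [54.0, 85.0]
best = max(candidates, key=gain_for_threshold)
print(best, round(gain_for_threshold(best), 3))   # Temperature>54 gives the larger gain
```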
Refinements: Alternative Attribute Selection
Measures
Information gain measure favours attributes with many
values over those with few values.
E.g. if we add a Date attribute to the PlayTennis example it
will have a distinct value for each day and will have the
highest information gain.
this is because date perfectly predicts the target attribute for
all training examples
result is a tree of depth 1 that perfectly classifies training
examples but fails on all other data
Dr. Rao Muhammad Adeel Nawab 85
Refinements: Alternative Attribute Selection
Measures
Can avoid this by using other attribute selection measures.
One alternative is gain ratio

Dr. Rao Muhammad Adeel Nawab 86


Refinements: Alternative Attribute Selection
Measures
GainRatio(S, A) ≡ Gain(S, A) / SplitInformation(S, A)
SplitInformation(S, A) ≡ − Σ_{i=1..c} (|Si| / |S|) log2(|Si| / |S|)
where Si is the subset of S for which the c-valued attribute A has
value vi
(Note: SplitInformation is the entropy of S w.r.t. the values of A)
Has effect of penalizing attributes with many, uniformly
distributed values
Experiments with variants of this and other attribute
selection measures have been carried out and are
reported in the machine learning literature

Dr. Rao Muhammad Adeel Nawab 87
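A small sketch of split information and gain ratio, with subset sizes taken from the PlayTennis example (the Date-like case uses one example per value):

```python
import math

def split_information(subset_sizes):
    """Entropy of S with respect to the values of attribute A (|Si| given as sizes)."""
    total = sum(subset_sizes)
    return -sum(s / total * math.log2(s / total) for s in subset_sizes if s > 0)

def gain_ratio(gain, subset_sizes):
    return gain / split_information(subset_sizes)

# Outlook splits the 14 examples into subsets of size 5, 4 and 5:
print(round(split_information([5, 4, 5]), 3))   # ~1.577
print(round(gain_ratio(0.247, [5, 4, 5]), 3))   # ~0.157

# A Date-like attribute (one example per value) has a much larger split
# information, log2(14) ~ 3.807, so its gain is divided by a large denominator.
print(round(split_information([1] * 14), 3))    # 3.807
```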


Refinements: Missing/Unknown Attribute
Values
What if a training example x is missing value for attribute A?
Several alternatives have been explored.
At decision node n where Gain(S,A) is computed
assign most common value of A among other examples sorted to
node n or
assign most common value of A among other examples at n with
same target attribute value as x or
assign probability pi to each possible value vi of A, estimated from
observed frequencies of values of A for examples sorted to node n

Dr. Rao Muhammad Adeel Nawab 88


Refinements: Missing/Unknown Attribute Values

Assign fraction pi of x distributed down each branch in


tree below n (this technique is used in C4.5)
Last technique can be used to classify new examples with
missing attributes (i.e. after learning) in same fashion

Dr. Rao Muhammad Adeel Nawab 89
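A sketch of the first two strategies above (fill in the most common value at the node, optionally restricted to examples with the same target label); the fractional strategy used in C4.5 is not shown:

```python
from collections import Counter

def fill_missing(example, attribute, node_examples, target=None):
    """Return a copy of example with a missing attribute value filled in.

    If target is given, only node examples sharing example's target label are
    used to estimate the most common value (the second strategy on the slide).
    """
    pool = node_examples
    if target is not None:
        pool = [e for e in node_examples if e[target] == example[target]]
    most_common = Counter(e[attribute] for e in pool if attribute in e).most_common(1)[0][0]
    filled = dict(example)
    filled[attribute] = most_common
    return filled

# Illustrative examples sorted to some node n (Humidity is the attribute of interest):
node = [{"Humidity": "High", "PlayTennis": "No"},
        {"Humidity": "High", "PlayTennis": "No"},
        {"Humidity": "Normal", "PlayTennis": "Yes"}]
x = {"PlayTennis": "No"}                                       # Humidity is missing
print(fill_missing(x, "Humidity", node))                       # fills in 'High'
print(fill_missing(x, "Humidity", node, target="PlayTennis"))  # also 'High' here
```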


Refinements: Attributes with Differing Costs
Different attributes may have different costs associated
with acquiring their values
E.g.
in medical diagnosis, different tests, such as blood tests, brain
scans, have different costs
in robotics, positioning a sensing device on a robot so as to take
different measurements requires differing amounts of time (=
cost)

Dr. Rao Muhammad Adeel Nawab 90


Refinements: Attributes with Differing Costs
How to learn a consistent tree with low expected cost?
Various approaches have been explored in which the
attribute selection measure is modified to include a cost
term (one possibility is sketched below).

Dr. Rao Muhammad Adeel Nawab 91
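One family of such measures, discussed in Mitchell's chapter 3, divides (a power of) the information gain by the attribute's cost, for example Gain²(S, A)/Cost(A). The sketch below uses that form with made-up gains and costs purely for illustration:

```python
def cost_sensitive_score(gain, cost):
    """Cost-sensitive selection measure of the form Gain^2 / Cost (illustrative)."""
    return gain ** 2 / cost

# Hypothetical information gains and measurement costs for three medical tests:
attributes = {"BloodTest": (0.20, 1.0), "BrainScan": (0.35, 20.0), "Temperature": (0.10, 0.1)}
scores = {a: cost_sensitive_score(g, c) for a, (g, c) in attributes.items()}
print(scores)
print(max(scores, key=scores.get))  # the cheap Temperature test wins despite its lower gain
```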


Summary
Decision trees classify instances. Testing starts at the root and
proceeds downwards:
Non-leaf nodes test one attribute of the instance and the attribute
value determines which branch is followed.
Leaf nodes are instance classifications.
Decision trees are appropriate for problems where:
instances are describable by attribute–value pairs (typically, but
not necessarily, nominal);
target function is discrete valued (typically, but not necessarily);
disjunctive hypotheses may be required;
training data may be noisy/incomplete.

Dr. Rao Muhammad Adeel Nawab 92


Summary (cont….)
Various algorithms have been proposed to learn decision trees
– ID3 is the classic. ID3:
recursively grows tree from the root picking at each point attribute
which maximises information gain with respect to the training
examples sorted to the current node
recursion stops when all examples down a branch fall into a single
class or all attributes have been tested
ID3 carries out incomplete search of complete hypothesis space
– contrast with CANDIDATE-ELIMINATION which carries out a
complete search of an incomplete hypothesis space.

Dr. Rao Muhammad Adeel Nawab 93


Summary (cont…)
Decision trees exhibit an inductive bias which prefers
shorter trees with high information gain attributes closer
to the root (at least where information gain is used as the
attribute selection criterion, as in ID3)
ID3 searches a complete hypothesis space for discrete-
valued functions, but searches the space incompletely,
using the information gain heuristic

Dr. Rao Muhammad Adeel Nawab 94


Summary (cont…)
Overfitting the training data is an important issue in
decision tree learning.
Noise or coincidental regularities due to small samples
may mean that while growing a tree beyond a certain size
improves its performance on the training data, it worsens
its performance on unseen instances
Overfitting can be addressed by post-pruning the decision
tree in a variety of ways

Dr. Rao Muhammad Adeel Nawab 95


Summary (cont…)
Various other refinements of the basic ID3 algorithm
address issues such as:
handling real-valued attributes
handling training/test instances with missing attribute values
using attribute selection measures other than information gain
allowing costs to be associated with attributes

Dr. Rao Muhammad Adeel Nawab 96


How To Become a Great Human
Being

Dr. Rao Muhammad Adeel Nawab 97


Balanced Life is Ideal Life
Get Excellence in five things
1. Health
2. Spirituality
3. Work
4. Friend
5. Family
A Journey from BEGINNER to EXCELLENCE
You must have a combination of these five things with different
variations. However, the aggregate will be the same.
Dr. Rao Muhammad Adeel Nawab 98
Excellence
1. Health
I can run (or brisk walk) 5 kilometers in one go
I take 7-9 hours sleep per night (TIP: Go to bed at 10pm)
I eat 3 balanced meals daily
2. Spirituality

Dr. Rao Muhammad Adeel Nawab 99


Excellence
3. Work
Become an authority in your field
For example - Dr. Abdul Qadeer Khan Sb is an authority in research
4. Friend
Have a DADDU YAR (a close, trusted friend) in life to vent to on a daily basis
5. Family
1. Take Duas (prayers) of parents and elders by serving and respecting them
2. Your wife/husband should be your best friend
3. Be humble and kind to kids, subordinates and poor people
Dr. Rao Muhammad Adeel Nawab 100
Definition by World Health Organization (WHO)
It is a state of complete
1. physical
2. mental
3. social
wellbeing, and not merely the absence of disease or infirmity.

Dr. Rao Muhammad Adeel Nawab 101


Motivation for Physical Health
CHANGE is never a matter of ABILITY it is always a matter of
MOTIVATION

Man (soul) + Tan (body) ⟶ both need good quality food to remain healthy

Focus on OUTCOMES not ACTIVITIES

Dr. Rao Muhammad Adeel Nawab 102


Daily running and exercise
Motivation for my students and friends
Dr. Rao Muhammad Adeel Nawab 103


How to Spare Time for Health and Fitness

Technology is the biggest addiction after drugs

Trend vs Comfort
Control vs Quit
Make a Schedule with a particular focus on 3 things

1. Get ADEQUATE Sleep


For adults - 7 to 9 hours regular sleep per night.

Research showed that “Amount of Sleep” is an important indicator
of Health and Well Being.

Go to bed for sleep between 9:00 pm to 10:00 pm


Make a Schedule with a particular focus on 3 things

2. Eat a HEALTHY diet


Healthy diet contains mostly fruits and vegetables and includes little to
no processed food and sweetened beverages

The China Study


i. Dry fruits and nuts
ii. Vegetables
iii. Fruits
Make a Schedule with a particular focus on 3 things

3. Exercise REGULARLY
Exercise is any bodily activity that enhances or maintains physical
fitness and overall health and wellness.

I am 55 years old and I can run (or brisk walk) five kilometers in one go
(Prof. Roger Moore, University of Sheffield, UK)

At least have brisk walk of 30 to 60 minutes daily


Key to Success

No Pain No Gain
