
Chapter 3

Decision Trees
What is a Decision Tree?
• Decision trees are powerful and popular tools for
classification and prediction.

• An inductive learning task
  – Use particular facts to make more generalized conclusions
• A predictive model based on a branching series of Boolean tests
  – These smaller Boolean tests are less complex than a one-stage classifier
• Let’s look at a sample decision tree…


2
Predicting Commute Time

Leave At:
  8 AM  → Long
  9 AM  → Accident? (No → Medium, Yes → Long)
  10 AM → Stall?    (No → Short,  Yes → Long)

If we leave at 10 AM and there are no cars stalled on the road, what will our
commute time be?

3
Inductive Learning
• In this decision tree, we made a series of Boolean
decisions and followed the corresponding branch
– Did we leave at 10 AM?
– Did a car stall on the road?
– Is there an accident on the road?

• By answering each of these yes/no questions, we then came to a conclusion on
  how long our commute might take

4
Learning in Decision Trees
• Goal: Build a decision tree for classifying examples as positive or negative
  instances of a concept using supervised learning from a training set.
• A decision tree is a tree where
  – each non-leaf node is associated with an attribute (feature)
  – each leaf node is associated with a classification (+ or -)
  – each arc is associated with one of the possible values of the attribute at
    the node the arc is directed from.
• Generalization: allow for > 2 classes
  – e.g., {sell, hold, buy}

Example tree: Color at the root with branches red, green, blue;
  red   → Size (big → -, small → +)
  green → +
  blue  → Shape (square → Size (big → -, small → +), round → +)
5
Decision Trees as Rules
• We did not have to represent this tree graphically
• We could have represented it as a set of rules. However, this may be much
  harder to read…

6
Decision Tree as a Rule Set

if hour == 8am
  commute time = long
else if hour == 9am
  if accident == yes
    commute time = long
  else
    commute time = medium
else if hour == 10am
  if stall == yes
    commute time = long
  else
    commute time = short

• Notice that not all attributes need to be used in each path of the decision.
• As we will see, some attributes may not even appear in the tree.
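A minimal Python version of this rule set (the function and argument names are
illustrative, not from the slides):

def commute_time(hour, accident, stall):
    # Direct translation of the rule set above.
    if hour == "8am":
        return "long"
    elif hour == "9am":
        return "long" if accident else "medium"
    elif hour == "10am":
        return "long" if stall else "short"

print(commute_time("10am", accident=False, stall=False))   # -> short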

7
How to Create a Decision Tree
• We first make a list of attributes that we
can measure
– These attributes (for now) must be discrete
• We then choose a target attribute that
we want to predict
• Then create an experience table that
lists what we have seen in the past

8
Sample Experience Table
Example  Hour   Weather  Accident  Stall  Commute (target)
D1       8 AM   Sunny    No        No     Long
D2       8 AM   Cloudy   No        Yes    Long
D3       10 AM  Sunny    No        No     Short
D4       9 AM   Rainy    Yes       No     Long
D5       9 AM   Sunny    Yes       Yes    Long
D6       10 AM  Sunny    No        No     Short
D7       10 AM  Cloudy   No        No     Short
D8       9 AM   Rainy    No        No     Medium
D9       9 AM   Sunny    Yes       No     Long
D10      10 AM  Cloudy   Yes       Yes    Long
D11      10 AM  Rainy    No        No     Short
D12      8 AM   Cloudy   Yes       No     Long
D13      9 AM   Sunny    No        No     Medium

Choosing Attributes
• The previous experience table showed 4 attributes: hour, weather, accident
  and stall
• But the decision tree only showed 3
attributes: hour, accident and stall
• Why is that?

10
Choosing Attributes
• Methods for selecting attributes (which will be
described later) show that weather is not a
discriminating attribute
• We use the principle of Occam’s Razor:
– Given a number of competing hypotheses, the
simplest one is preferable

11
Choosing Attributes
• The basic structure of creating a decision tree
is the same for most decision tree algorithms
• The difference lies in how we select the
attributes for the tree
• We will focus on the ID3 algorithm developed
by Ross Quinlan in 1975

12
Decision Tree Algorithms
• The basic idea behind any decision tree algorithm is as follows (a minimal
  sketch in Python appears after this list):
  – Choose the best attribute(s) to split the remaining instances and make that
    attribute a decision node
  – Repeat this process recursively for each child
– Stop when:
• All the instances have the same target attribute value
• There are no more attributes
• There are no more instances
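A minimal Python sketch of this recursion. The names (build_tree,
choose_attribute) are illustrative; the attribute-selection heuristic itself is
introduced on the following slides and is passed in as a function:

from collections import Counter

def build_tree(examples, attributes, target, choose_attribute):
    # examples: list of dicts mapping attribute names to values
    # choose_attribute(examples, attributes, target) picks the split attribute,
    # e.g. by information gain (see the following slides)
    labels = [e[target] for e in examples]
    if not examples or not attributes or len(set(labels)) == 1:
        # leaf: all instances agree, or nothing is left to split on
        return Counter(labels).most_common(1)[0][0] if labels else None
    a = choose_attribute(examples, attributes, target)
    tree = {a: {}}
    for v in set(e[a] for e in examples):
        subset = [e for e in examples if e[a] == v]
        remaining = [x for x in attributes if x != a]
        tree[a][v] = build_tree(subset, remaining, target, choose_attribute)
    return tree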
13
Identifying the Best Attributes
• Refer back to our original decision tree

Leave At:
  8 AM  → Long
  9 AM  → Accident? (No → Medium, Yes → Long)
  10 AM → Stall?    (No → Short,  Yes → Long)

◼ How did we know to split on "leave at" and then on stall and accident, and
  not on weather?
14
Choosing the Best Attribute
• The key problem is choosing which attribute to split a
given set of examples.
• Intuitively: A good attribute splits the examples into
subsets that are (ideally) all positive or all negative.
• Some possibilities are:
– Random: Select any attribute at random
– Least-Values: Choose the attribute with the smallest number
of possible values (fewer branches)
– Most-Values: Choose the attribute with the largest number
of possible values (smaller subsets)
– Max-Gain: Choose the attribute that has the largest expected
information gain, i.e. select attribute that will result in the
smallest expected size of the subtrees rooted at its children.
15
ID3 Heuristic
• To determine the best attribute, we look at
the ID3 heuristic
• ID3 splits attributes based on their entropy.
• Entropy is a measure of disorder…

16
Entropy
• A measure of homogeneity of the set of examples.
• Entropy is minimized when all values of the target attribute are the same.
  – If we know that commute time will always be short, then entropy = 0
  – The entropy is 0 if the outcome is "certain".

17
Entropy
• Entropy is maximized when there is an equal
chance of all values for the target attribute
(i.e. the result is random)
– If commute time = short in 3 instances, medium in
3 instances and long in 3 instances, entropy is
maximized
– The entropy is maximum if we have no knowledge of the system (i.e., every
  outcome is equally likely).

18
Entropy
• Calculation of entropy
– Entropy(S) = ∑(i=1 to l)-|Si|/|S| * log2(|Si|/|S|)
• S = set of examples
• Si = subset of S with value vi under the target attribute
• l = size of the range of the target attribute

19
Entropy - Example
• Given a set S of positive and negative examples of
some target concept (a 2-class problem), the entropy of
set S relative to this binary classification is

E(S) = - p(P)log2 p(P) – p(N)log2 p(N)

• Suppose S has 25 examples, 15 positive and 10 negative [15+, 10-]. Then the
  entropy of S relative to this classification is

  E(S) = −(15/25) log2(15/25) − (10/25) log2(10/25)
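A small Python check of this calculation (a sketch of the two-class entropy;
by convention 0 · log2 0 = 0):

import math

def entropy(pos, neg):
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:                 # convention: 0 * log2(0) = 0
            e -= p * math.log2(p)
    return e

print(entropy(15, 10))   # ~0.971 for the [15+, 10-] set above
print(entropy(25, 0))    # 0.0  -> a "certain" outcome
print(entropy(10, 10))   # 1.0  -> maximum for two equally likely classes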


20
ID3
• ID3 splits on the attribute with the lowest weighted entropy
• We calculate the entropy over all values of an attribute as the weighted sum
  of subset entropies, as follows:
  ∑ (i = 1 to k) (|Si| / |S|) Entropy(Si)
  – where k is the number of values of the attribute we are testing
• Equivalently, we can measure information gain (which is largest when the
  weighted subset entropy is smallest) as follows:
  – Entropy(S) − ∑ (i = 1 to k) (|Si| / |S|) Entropy(Si)

21
Information Gain
• Information gain measures the expected reduction in entropy, or uncertainty:

  Gain(S, A) = Entropy(S) − ∑ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv)

  – Values(A) is the set of all possible values for attribute A, and Sv is the
    subset of S for which attribute A has value v: Sv = {s in S | A(s) = v}
  – the first term in the equation for Gain is just the entropy of the original
    collection S
  – the second term is the expected value of the entropy after S is partitioned
    using attribute A
22
Information Gain - cont
• It measures how well a given attribute
separates the training examples according to
their target classification
• This measure is used to select among the
candidate attributes at each step while
growing the tree
• It is simply the expected reduction in entropy
caused by partitioning the examples according
to this attribute.
23
Training Examples
Day Outlook Temp Humidity Wind Tennis?
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Entropy Example
• Entropy (disorder) is bad, Homogeneity (Information Gain) is
good
• Let S be a set of examples
• Entropy(S) = -P log2(P) - N log2(N)
– P is proportion of pos example
– N is proportion of neg examples
– 0 log 0 == 0

• Example: S has 9 positive and 5 negative examples

  Entropy([9+, 5-]) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
25
Information Gain
• Measure of expected reduction in entropy
• Resulting from splitting along an attribute

Gain(S, A) = Entropy(S) − ∑ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv)

where Entropy(S) = −P log2(P) − N log2(N)

26
Gain of Splitting on Wind
Values(Wind) = {Weak, Strong}
S = [9+, 5-]
Sweak = [6+, 2-]
Sstrong = [3+, 3-]

Gain(S, Wind)
  = Entropy(S) − ∑ (v ∈ {Weak, Strong}) (|Sv| / |S|) Entropy(Sv)
  = Entropy(S) − (8/14) Entropy(Sweak) − (6/14) Entropy(Sstrong)
  = 0.940 − (8/14)(0.811) − (6/14)(1.00)
  = 0.048

Day / Wind / Tennis? (from the training table):
d1 weak no, d2 strong no, d3 weak yes, d4 weak yes, d5 weak yes, d6 strong no,
d7 strong yes, d8 weak no, d9 weak yes, d10 weak yes, d11 strong yes,
d12 strong yes, d13 weak yes, d14 strong no

27
Gain of Splitting on Humidity and Wind
Splitting on Humidity (S = [9+, 5-], E = 0.940):
  High:   [3+, 4-], E = 0.985
  Normal: [6+, 1-], E = 0.592
  Gain(S, Humidity) = 0.940 − (7/14)(0.985) − (7/14)(0.592) = 0.151

Splitting on Wind (S = [9+, 5-], E = 0.940):
  Weak:   [6+, 2-], E = 0.811
  Strong: [3+, 3-], E = 1.0
  Gain(S, Wind) = 0.940 − (8/14)(0.811) − (6/14)(1.0) = 0.048

NB: Humidity provides greater information gain than Wind with respect to the
target classification.
28
Gain of splitting on Outlook
Splitting on Outlook (S = [9+, 5-], E = 0.940):
  Sunny:    [2+, 3-], E = 0.971
  Overcast: [4+, 0-], E = 0.0
  Rain:     [3+, 2-], E = 0.971

Gain(S, Outlook) = 0.940 − (5/14)(0.971) − (4/14)(0.0) − (5/14)(0.971) = 0.247

• The information gain on Temperature is calculated the same way
29
Evaluating Attributes
Gains of the four candidate attributes at the root:
  Gain(S, Outlook)  = 0.247
  Gain(S, Humidity) = 0.151
  Gain(S, Wind)     = 0.048
  Gain(S, Temp)     = 0.029

▪ Outlook provides the best prediction for the target
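These gains can be reproduced with a short Python sketch over the 14 training
examples from the earlier table (the function and variable names are
illustrative):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target="Tennis"):
    # Gain(S, A) = Entropy(S) - sum_v (|Sv| / |S|) * Entropy(Sv)
    base = entropy([r[target] for r in rows])
    subset = lambda v: [r[target] for r in rows if r[attr] == v]
    return base - sum(len(subset(v)) / len(rows) * entropy(subset(v))
                      for v in set(r[attr] for r in rows))

# The 14 training examples (D1..D14) from the table above.
cols = ("Outlook", "Temp", "Humidity", "Wind", "Tennis")
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
S = [dict(zip(cols, row)) for row in data]

for a in ("Outlook", "Humidity", "Wind", "Temp"):
    print(a, round(gain(S, a), 3))
# Outlook 0.247, Humidity 0.152 (0.151 with the slide's rounding),
# Wind 0.048, Temp 0.029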


30
Resulting Tree
▪ Let’s grow the tree:
  ▪ add to the tree a successor for each possible value of Outlook
  ▪ partition the training samples according to the value of Outlook

Outlook:
  Sunny    → Don’t Play [2+, 3-]
  Overcast → Play [4+]
  Rain     → Don’t Play [3+, 2-]
31
Second step
▪ Working on Outlook=Sunny node:
Gain(SSunny, Humidity) = 0.970 − 3/5 × 0.0 − 2/5 × 0.0 = 0.970
Gain(SSunny, Wind)     = 0.970 − 2/5 × 1.0 − 3/5 × 0.918 = 0.019
Gain(SSunny, Temp)     = 0.970 − 2/5 × 0.0 − 2/5 × 1.0 − 1/5 × 0.0 = 0.570
▪ Humidity provides the best prediction for the target
▪ Let’s grow the tree:
▪ add to the tree a successor for each possible value of
Humidity
▪ partition the training samples according to the value of
Humidity
32
One Step Later
Good day for tennis?

Outlook:
  Sunny [2+, 3-] → Humidity:
                     High   → Don’t play [3-]
                     Normal → Play [2+]
  Overcast [4+]  → Play
  Rain           → Don’t Play

33
Recurse Again
Good day for tennis?

Outlook:
  Sunny    → Humidity (High / Low)
  Overcast → Play
  Rain     → recurse on the examples below

Day   Temp  Humid  Wind  Tennis?
d4    m     h      weak  yes
d5    c     n      weak  yes
d6    c     n      s     n
d10   m     n      weak  yes
d14   m     h      s     n

34
One Step Later: Final Tree
Good day for tennis?

Outlook:
  Sunny    → Humidity:
               High   → Don’t play [3-]
               Normal → Play [2+]
  Overcast → Play [4+]
  Rain     → Wind:
               Strong → Don’t play [2-]
               Weak   → Play [3+]

35
Pruning Trees
• There is another technique for reducing the
number of attributes used in a tree - pruning
• Two types of pruning:
– Pre-pruning (forward pruning)
– Post-pruning (backward pruning)

36
Prepruning
• In prepruning, we decide during the building process
when to stop adding attributes (possibly based on
their information gain)

• However, this may be problematic – why?
  – Sometimes attributes individually do not contribute much to a decision,
    but combined they may have a significant impact
37
Postpruning

• Postpruning waits until the full decision tree has been built and then prunes
  the attributes
• Two techniques:
– Subtree Replacement
– Subtree Raising

38
Subtree Replacement
• Entire subtree is replaced by a single leaf node

Before:  A → B;  B → C, 4, 5;  C → 1, 2, 3
After:   A → B;  B → 6, 4, 5

• Node 6 replaced the subtree rooted at C (and its leaves 1, 2, 3)
• Generalizes the tree a little more, but may increase accuracy
39
Subtree Raising
• Entire subtree is raised onto another node

Before:  A → B;  B → C, 4, 5;  C → 1, 2, 3
After:   A → C;  C → 1, 2, 3   (the subtree rooted at C is raised to replace B)
• This process is very time consuming
40
Reduced Error Pruning Example
Outlook:
  Sunny    → Humidity: (High → Don’t play, Low → Play)
  Overcast → Play
  Rain     → Wind:     (Strong → Don’t play, Weak → Play)

Validation set accuracy = 0.75


41
Reduced Error Pruning Example
Outlook:
  Sunny    → Don’t play   (the Humidity subtree has been replaced by a leaf)
  Overcast → Play
  Rain     → Wind: (Strong → Don’t play, Weak → Play)

Validation set accuracy = 0.80

42
Reduced Error Pruning Example
Outlook:
  Sunny    → Humidity: (High → Don’t play, Low → Play)
  Overcast → Play
  Rain     → Play   (the Wind subtree has been replaced by a leaf)

Validation set accuracy = 0.70


43
Reduced Error Pruning Example
Outlook:
  Sunny    → Don’t play
  Overcast → Play
  Rain     → Wind: (Strong → Don’t play, Weak → Play)

Use this as the final tree (it has the highest validation set accuracy, 0.80)
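A sketch of reduced error pruning in Python, assuming the nested-dict tree
representation from the earlier build_tree sketch (classify, internal_nodes,
and the other helper names are illustrative). It repeatedly replaces the
internal node whose removal best preserves validation accuracy with the
majority class of the training examples reaching that node:

import copy
from collections import Counter

def classify(tree, example):
    while isinstance(tree, dict):                 # descend until a leaf label
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

def accuracy(tree, data, target):
    return sum(classify(tree, e) == e[target] for e in data) / len(data)

def internal_nodes(tree, path=()):
    # yield the path ((attr, value), ...) of every internal node
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for value, child in tree[attr].items():
            yield from internal_nodes(child, path + ((attr, value),))

def replace_node(tree, path, leaf):
    # return a copy of the tree with the node at `path` turned into `leaf`
    if not path:
        return leaf
    new = copy.deepcopy(tree)
    node = new
    for attr, value in path[:-1]:
        node = node[attr][value]
    attr, value = path[-1]
    node[attr][value] = leaf
    return new

def reduced_error_prune(tree, train, validation, target):
    improved = True
    while improved:
        improved = False
        best, best_acc = tree, accuracy(tree, validation, target)
        for path in internal_nodes(tree):
            subset = train
            for attr, value in path:              # training examples reaching the node
                subset = [e for e in subset if e[attr] == value]
            if not subset:
                continue
            leaf = Counter(e[target] for e in subset).most_common(1)[0][0]
            candidate = replace_node(tree, path, leaf)
            if accuracy(candidate, validation, target) >= best_acc:
                best = candidate                  # prefer smaller trees on ties
                best_acc = accuracy(candidate, validation, target)
                improved = True
        tree = best
    return tree

This sketch assumes every attribute value seen in the validation set also
appears as a branch in the tree.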

44
Problems with Decision Trees
• While decision trees classify quickly, the time for building a tree may be
  higher than for other types of classifiers

• Decision trees suffer from a problem of errors propagating throughout the tree
– A very serious problem as the number of classes
increases

45
Error Propagation
• Since decision trees work by a series of local
decisions, what happens when one of these
local decisions is wrong?
– Every decision from that point on may be wrong
– We may never return to the correct path of the
tree

46
Problems with ID3
• ID3 is not optimal
– Uses expected entropy reduction, not actual
reduction
• Must use discrete (or discretized) attributes
– What if we left for work at 9:30 AM?
– We could break down the attributes into smaller
values…

47
Problems with ID3
• If we broke down leave time to the minute,
we might get something like this:

Leave At:
  8:02 AM → Long      8:03 AM → Medium     9:05 AM → Short
  9:07 AM → Long      9:09 AM → Long       10:02 AM → Short

• Since entropy is very low for each branch, we end up with n branches and
  n leaves.
• This would not be helpful for predictive modeling.

48
Problems with ID3
• We can use a technique known as discretization
• We choose cut points, such as 9AM for splitting
continuous attributes
• These cut points generally lie in a subset of boundary points, where a
  boundary point is a point between two adjacent instances (in a sorted list)
  that have different target attribute values

49
Problems with ID3
• Consider the attribute commute time

8:00 (L), 8:02 (L), 8:07 (M), 9:00 (S), 9:20 (S), 9:25 (S), 10:00 (S), 10:02 (M)

• When we split at these cut points, we increase the entropy of the resulting
  subsets, so we do not end up with a decision tree that has as many cut points
  as leaves

50
Issues with Decision Trees
• Missing data
• Real-valued attributes
• Many-valued features
• Evaluation
• Overfitting

51
Missing Data 1
Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     ?      weak  yes
d11  m     n      s     yes

Option 1: assign the most common value of the attribute at this node: ? => h
Option 2: assign the most common value among examples of the same class: ? => n

52
Missing Data 2
Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     ?      weak  yes
d11  m     n      s     yes

Among the known values, Humid = h 75% of the time and n 25% of the time, so d9
is split fractionally: the Humid = h subset becomes [0.75+, 3-] and the
Humid = n subset becomes [1.25+, 0-].

• Use these fractional counts in the gain calculations
• Further subdivide if other attributes are missing
• Use the same approach to classify a test example with a missing attribute
  – Classification is the most probable classification
  – Summing over the leaves where it got divided
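A sketch of the fractional-count idea in Python (the (features, label, weight)
representation and the helper name distribute_missing are illustrative, not
from the slides):

from collections import Counter

def distribute_missing(examples, attr):
    # examples: list of (features_dict, label, weight); a missing value is None
    observed = Counter()
    for feats, _, w in examples:
        if feats.get(attr) is not None:
            observed[feats[attr]] += w
    total = sum(observed.values())
    out = []
    for feats, label, w in examples:
        if feats.get(attr) is not None:
            out.append((feats, label, w))
        else:                                  # split into fractional copies
            for value, count in observed.items():
                out.append((dict(feats, **{attr: value}), label, w * count / total))
    return out

# d9's Humid is missing; the four known values are h, h, h, n (75% h, 25% n),
# so d9 contributes 0.75 to the Humid = h subset and 0.25 to the Humid = n
# subset, giving the [0.75+, 3-] and [1.25+, 0-] counts above.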
53
Real-valued Features
• Discretize?
• Threshold split using observed values?
Wind:  8  25  7  6  6  10  12  5  7  7  12  10  7  11
Play:  n  n   y  y  y  n   y   n  y  y  y   y   y  n

Sorted by Wind:
Wind:  25  12  12  11  10  10  8  7  7  7  7  6  6  5
Play:  n   y   y   n   y   n   n  y  y  y  y  y  y  n

Candidate splits:  Wind >= 12: Gain = 0.0004     Wind >= 10: Gain = 0.048
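A quick Python check of the two candidate thresholds on the slide
(threshold_gain is an illustrative helper that evaluates the binary test
value >= t):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def threshold_gain(values, labels, t):
    # information gain of the binary test `value >= t` on a numeric attribute
    left = [l for v, l in zip(values, labels) if v >= t]
    right = [l for v, l in zip(values, labels) if v < t]
    n = len(labels)
    return entropy(labels) - len(left) / n * entropy(left) \
                           - len(right) / n * entropy(right)

wind = [8, 25, 7, 6, 6, 10, 12, 5, 7, 7, 12, 10, 7, 11]
play = list("nnyyynynyyyyyn")
print(round(threshold_gain(wind, play, 12), 4))  # ~0.0005 (the slide rounds to 0.0004)
print(round(threshold_gain(wind, play, 10), 3))  # 0.048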

54
Many-valued Attributes
• Problem:
– If attribute has many values, Gain will select it
– Imagine using Date = June_6_1996
• So many values
– Divides examples into tiny sets
– Sets are likely uniform => high info gain
– Poor predictor
• Penalize these attributes
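One common penalty (the gain ratio used in C4.5) divides the gain by the
"split information", the entropy of the partition sizes themselves; a sketch:

import math

def split_info(subset_sizes):
    # entropy of the partition itself: many tiny subsets -> large value
    total = sum(subset_sizes)
    return -sum((s / total) * math.log2(s / total) for s in subset_sizes if s)

def gain_ratio(gain, subset_sizes):
    si = split_info(subset_sizes)
    return gain / si if si > 0 else 0.0

# A Date-like attribute that puts each of 14 examples in its own subset has
# split_info = log2(14) ~ 3.81, which shrinks even a near-maximal gain:
print(round(gain_ratio(0.940, [1] * 14), 3))   # ~0.247
print(round(gain_ratio(0.247, [5, 4, 5]), 3))  # Outlook: ~0.157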

55
Evaluation
• Training accuracy
  – How many training instances can we correctly classify based on the
    available data?
  – It is high when the tree is deep/large, or when there is little conflict in
    the training instances.
  – However, higher training accuracy does not mean good generalization
• Testing accuracy
  – Given a number of new instances, how many of them can we correctly classify?
  – Cross validation
56
Overfitting Definition
• A decision tree DT is overfit when there exists another tree DT’ such that
  – DT has smaller error than DT’ on the training examples, but
  – DT has bigger error than DT’ on the test examples
• Causes of overfitting
– Noisy data, or
– Training set is too small
• Solutions
– Reduced error pruning
– Early stopping
– Rule post pruning
57
Summary
• Decision trees can be used to help predict the
future
• The trees are easy to understand
• Decision trees work more efficiently with
discrete attributes
• The trees may suffer from error propagation

58
Strengths
• can generate understandable rules
• perform classification without much
computation
• can handle continuous and categorical
variables
• provide a clear indication of which fields are
most important for prediction or classification

59
Weaknesses
• Not suitable for predicting continuous attributes.
• Perform poorly with many classes and small data.
• Computationally expensive to train.
– At each node, each candidate splitting field must be sorted
before its best split can be found.
– In some algorithms, combinations of fields are used and a
search must be made for optimal combining weights.
– Pruning algorithms can also be expensive since many
candidate sub-trees must be formed and compared.
• Do not handle non-rectangular regions well.

60
Assignment 1
• The following attributes of instances are used to determine the best day
for playing tennis
Day Outlook Temp Humid Wind PlayTennis?

d1 sunny hot high weak no


d2 sunny hot high strong no
d3 overcast hot high weak yes
d4 rainy medium high weak yes
d5 rainy cool normal weak yes
d6 rainy cool normal strong no
d7 overcast cool normal strong yes
d8 sunny medium high weak no
d9 sunny cool normal weak yes
d10 rainy medium normal weak yes
d11 sunny medium normal strong yes
d12 overcast medium high strong yes
d13 overcast hot normal weak yes
d14 rainy medium high strong no

• Use Information Gain to select the best attributes, and draw the resultant
decision tree for the above data
NB: Show all the calculations
61
Assignment should be handwritten, scanned, and submitted as PDF.
