Decision Trees
Representation of Concepts
• Concept learning: a concept as a conjunction of attributes
  – e.g., (Sunny AND Hot AND Humid AND Windy) → +
Rectangle learning
• A conjunction corresponds to a single rectangle in attribute space
• A disjunction of conjunctions corresponds to a union of rectangles
[Figure: positive examples enclosed by a single rectangle vs. by a union of rectangles, surrounded by negative examples]
Training Examples
[Table of the 14 PlayTennis training examples D1–D14: attributes Outlook, Temp, Humidity, Wind; label PlayTennis]
• Can be represented by logical formulas
[Figure: decision tree with root Outlook — sunny → Humidity, overcast → Yes, rain → Wind]
Representation in decision trees
Applications of Decision Trees
Decision Trees
[Figure: a given distribution of positive and negative training instances over Attribute 1 and Attribute 2]
Decision Tree Structure
• Decision node = a condition (a test on an attribute)
• Decision leaf = a box: the collection of examples satisfying the conditions along its path
• Alternate splits are possible
[Figure: the (Attribute 1, Attribute 2) plane partitioned at Attribute 1 = 20 and 40 and at Attribute 2 = 30 into boxes of + and − examples, with the corresponding tree]
Decision Tree Construction
• Given a training data set, find the best tree structure
Top-Down Construction
Best attribute to split?
• Compare candidate splits of the training data, e.g. A2 > 30?
• A split can produce pure boxes/nodes (examples of a single class) and mixed boxes/nodes (examples of both classes)
[Figure: the distribution of + and − examples over Attribute 1 and Attribute 2 under different candidate splits, with pure and mixed boxes marked]
Principle of Decision Tree Construction
• In the end we want to form pure leaves
  – Correct classification
• Greedy approach to reach correct classification (a minimal code sketch follows this list)
  1. Initially treat the entire data set as a single box
  2. For each box, choose the split that reduces its impurity (in terms of class labels) by the maximum amount
  3. Split the box with the highest reduction in impurity
  4. Go to Step 2
  5. Stop when all boxes are pure
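A minimal sketch of this greedy idea, under assumptions not stated on the slide: it is the recursive top-down variant that splits every impure box (rather than always splitting the globally best box), rows are dicts of discrete attribute values, and `impurity` is any impurity measure over a list of labels (entropy and the Gini index are defined later in these notes).

```python
from collections import Counter

def majority_label(labels):
    return Counter(labels).most_common(1)[0][0]

def impurity_reduction(rows, labels, attr, impurity):
    """Drop in weighted impurity obtained by splitting on `attr`."""
    n = len(labels)
    after = 0.0
    for value in set(row[attr] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        after += len(subset) / n * impurity(subset)
    return impurity(labels) - after

def build_tree(rows, labels, attrs, impurity):
    # Stop when the box is pure (or no attributes remain): emit a leaf label.
    if len(set(labels)) == 1 or not attrs:
        return majority_label(labels)
    # Greedy step: choose the split with the largest impurity reduction.
    best = max(attrs, key=lambda a: impurity_reduction(rows, labels, a, impurity))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best],
                                       impurity)
    return tree
```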
Choosing Best Attribute?
• Consider 64 examples (29+, 35−) and two candidate binary attributes, A1 and A2, each with values t and f
• For two different ways in which A1 and A2 could partition the examples: which attribute is better?
[Figure: the sample [29+, 35−] split by A1 and by A2 into t and f branches, in two scenarios]
Entropy
• A measure of
  – uncertainty
  – purity
  – information content
• Information theory: an optimal-length code assigns (−log2 p) bits to a message having probability p
• S is a sample of training examples
  – p+ is the proportion of positive examples in S
  – p− is the proportion of negative examples in S
• Entropy of S: the average optimal number of bits to encode information about the certainty/uncertainty of S

  Entropy(S) = −p+ log2 p+ − p− log2 p−

• Can be generalized to more than two values
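A minimal sketch of this entropy computation over a list of class labels; the print line uses the (29+, 35−) sample that appears on the next slides as a check.

```python
import math

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i); works for any number of classes."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

# The 64-example sample (29+, 35-) used on the next slides:
print(entropy(["+"] * 29 + ["-"] * 35))   # ~0.993
```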
Choosing Best Attribute?
• Consider the 64 examples (29+, 35−), E(S) = 0.993, and compute the entropies of the children under each split
• Which one is better?
  – A1:  t → (25+, 5−), E = 0.650;   f → (4+, 30−), E = 0.522
  – A2:  t → (15+, 19−), E = 0.989;  f → (14+, 16−), E = 0.997
• Which is better?
  – A1:  t → (21+, 5−), E = 0.708;   f → (8+, 30−), E = 0.742
  – A2:  t → (18+, 33−), E = 0.937;  f → (11+, 2−), E = 0.619
Information Gain
• Gain(S, A): reduction in entropy after choosing attribute A

  Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

• For the splits above (E(S) = 0.993):
  – A1:  t → (25+, 5−), E = 0.650;   f → (4+, 30−), E = 0.522;   Gain: 0.395
  – A2:  t → (15+, 19−), E = 0.989;  f → (14+, 16−), E = 0.997;  Gain: 0.000
  – A1:  t → (21+, 5−), E = 0.708;   f → (8+, 30−), E = 0.742;   Gain: 0.265
  – A2:  t → (18+, 33−), E = 0.937;  f → (11+, 2−), E = 0.619;   Gain: 0.121
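A minimal sketch of this gain computation, reusing the entropy function sketched earlier; the usage line reproduces the A1 split of (29+, 35−) into (21+, 5−) and (8+, 30−) from the slide.

```python
def information_gain(parent_labels, child_label_lists):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    n = len(parent_labels)
    remainder = sum(len(sv) / n * entropy(sv) for sv in child_label_lists)
    return entropy(parent_labels) - remainder

# A1 splits (29+, 35-) into (21+, 5-) and (8+, 30-):
S = ["+"] * 29 + ["-"] * 35
print(information_gain(S, [["+"] * 21 + ["-"] * 5,
                           ["+"] * 8 + ["-"] * 30]))   # ~0.265
```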
Gain function
• Gain is a measure of how much a split can reduce uncertainty
  – Its value lies between 0 and 1
• What is the significance of a gain of 0?
  – Example: a 50/50 split of +/− both before and after discriminating on the attribute's values
• What is the significance of a gain of 1?
  – Example: going from "perfect uncertainty" to perfect certainty after splitting on a perfectly predictive attribute
• Splitting on high-gain attributes finds "patterns" in the training examples relating to attribute values
  – This moves toward a locally minimal representation of the training examples
Training Examples
[Table of the 14 PlayTennis training examples D1–D14: attributes Outlook, Temp, Humidity, Wind; label PlayTennis]
Sort the Training Examples
• Root: [9+, 5−], examples D1,…,D14; split on Outlook
[Figure: Outlook at the root with branches Sunny → ?, Overcast → Yes, Rain → ?]
• Ssunny = {D1, D2, D8, D9, D11}
  – Gain(Ssunny, Humidity) = 0.970
  – Gain(Ssunny, Temp) = 0.570
  – Gain(Ssunny, Wind) = 0.019
Final Decision Tree for Example
• Outlook = Sunny → test Humidity
  – High → No
  – Normal → Yes
• Outlook = Overcast → Yes
• Outlook = Rain → test Wind
  – Strong → No
  – Weak → Yes
Hypothesis Space Search (ID3)
• Hypothesis space (all possible trees) is complete!
– The target function is contained in it
Hypothesis Space Search in Decision Trees
• Conduct a search of the space of decision trees which
can represent all possible discrete functions.
Restriction bias vs. Preference bias
• Restriction bias (or Language bias)
– Incomplete hypothesis space
• Preference (or search) bias
– Incomplete search strategy
• Candidate Elimination has restriction bias
• ID3 has preference bias
• In most cases, we have both a restriction and a
preference bias.
Inductive Bias in ID3
Overfitting the Data
• Learning a tree that classifies the training data perfectly may not lead to the tree with the best generalization performance
  – There may be noise in the training data that the tree is fitting
  – The algorithm might be making decisions based on very little data
• A hypothesis h is said to overfit the training data if there is another hypothesis h' such that h has smaller error than h' on the training data but larger error than h' on the test data
[Figure: accuracy on training data and on testing data as a function of the complexity of the tree]
Overfitting
[Figure: an overfitted partitioning of the training instances over Attribute 1 and Attribute 2]
When to stop splitting further?
[Figure: training instances over Attribute 1 and Attribute 2]
Overfitting in Decision Trees
• Consider adding a noisy training example (it should be +):
Day Outlook Temp Humidity Wind Tennis?
D15 Sunny Hot Normal Strong No
[Figure: decision tree rooted at Outlook]
Overfitting - Example
[Figure: the tree now includes an extra Wind test (Strong → No, Weak → Yes) to accommodate the noisy example D15]
Avoiding Overfitting
Reduced-Error Pruning
• A post-pruning, cross-validation approach (a minimal code sketch follows below)
  – Partition the training data into a "grow" set and a "validation" set
  – Build a complete tree from the "grow" data
  – Until accuracy on the validation set decreases, do:
      For each non-leaf node in the tree:
        Temporarily prune the tree below it; replace it by a majority vote
        Test the accuracy of the hypothesis on the validation set
      Permanently prune the node giving the greatest increase in accuracy on the validation set
• Problem: uses less data to construct the tree
• Sometimes done at the rule level
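A minimal sketch of this procedure under assumptions not in the slides: trees are stored in a hypothetical Node class whose internal nodes already record the majority label of the grow examples that reached them, so replacing a subtree by a majority vote is a constant-time operation. A prune that leaves validation accuracy unchanged is accepted, matching "until accuracy decreases".

```python
class Node:
    def __init__(self, attr=None, children=None, majority=None, label=None):
        self.attr = attr                  # test attribute (internal nodes)
        self.children = children or {}    # attribute value -> child Node
        self.majority = majority          # majority label of grow examples here
        self.label = label                # set only at leaves

def predict(node, row):
    while node.label is None:
        node = node.children.get(row[node.attr]) or Node(label=node.majority)
    return node.label

def accuracy(root, rows, labels):
    return sum(predict(root, r) == y for r, y in zip(rows, labels)) / len(labels)

def internal_nodes(node):
    if node.label is None:
        yield node
        for child in node.children.values():
            yield from internal_nodes(child)

def reduced_error_prune(root, val_rows, val_labels):
    """Greedily prune while validation accuracy does not decrease."""
    while True:
        base = accuracy(root, val_rows, val_labels)
        best_node, best_acc = None, base
        for node in internal_nodes(root):
            saved = (node.attr, node.children, node.label)
            node.attr, node.children, node.label = None, {}, node.majority  # try pruning
            acc = accuracy(root, val_rows, val_labels)
            node.attr, node.children, node.label = saved                    # undo
            if acc >= best_acc:
                best_node, best_acc = node, acc
        if best_node is None:   # every possible prune would hurt validation accuracy
            return root
        best_node.attr, best_node.children, best_node.label = None, {}, best_node.majority
```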
Rule post-pruning
Example of rule post-pruning
• IF (Outlook = Sunny) ^ (Humidity = High)
– THEN PlayTennis = No
• IF (Outlook = Sunny) ^ (Humidity = Normal)
– THEN PlayTennis = Yes
[Figure: the decision tree rooted at Outlook from which these rules are read off]
Extensions of basic algorithm
Continuous Valued Attributes
• Create a discrete attribute from a continuous variable
  – E.g., define a critical Temperature = 82.5
• Candidate thresholds
  – chosen by the gain function
  – can have more than one threshold
  – typically where values change quickly (see the sketch below)

  Temp:     40   48   60   72   80   90
  Tennis?   N    N    Y    Y    Y    N

  Candidate thresholds: (48+60)/2 = 54 and (80+90)/2 = 85
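A minimal sketch of selecting those candidate thresholds (midpoints between consecutive values where the label changes); the print line reproduces the Temp example above.

```python
def candidate_thresholds(values, labels):
    """Midpoints between consecutive sorted values where the class label changes."""
    pairs = sorted(zip(values, labels))
    return [(v1 + v2) / 2
            for (v1, y1), (v2, y2) in zip(pairs, pairs[1:])
            if y1 != y2]

# The Temp example above:
print(candidate_thresholds([40, 48, 60, 72, 80, 90], list("NNYYYN")))  # [54.0, 85.0]
```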
Attributes with Many Values
• Problem:
  – If an attribute has many values, Gain will tend to select it (why?)
  – E.g., a birthdate attribute with 365 possible values
Attributes with many values
• Problem: Gain will select the attribute with many values
• One approach: use GainRatio instead

  GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

  SplitInformation(S, A) = − Σ_{i=1}^{c} (|S_i| / |S|) · log2(|S_i| / |S|)

  where S_1,…,S_c are the subsets of S induced by the c values of A. SplitInformation is the entropy of the partitioning itself, so it penalizes a higher number of partitions.
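A minimal sketch of these two quantities, reusing the entropy and information_gain sketches from earlier; when SplitInformation is 0 (a single non-empty partition) the ratio is undefined and the sketch simply returns 0.

```python
import math   # entropy() and information_gain() are the sketches given earlier

def split_information(child_label_lists):
    """Entropy of the partitioning itself: -sum_i |S_i|/|S| * log2(|S_i|/|S|)."""
    n = sum(len(s) for s in child_label_lists)
    return -sum(len(s) / n * math.log2(len(s) / n)
                for s in child_label_lists if s)

def gain_ratio(parent_labels, child_label_lists):
    si = split_information(child_label_lists)
    if si == 0:                      # single non-empty partition: ratio undefined
        return 0.0
    return information_gain(parent_labels, child_label_lists) / si
```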
Attributes with Costs
• Consider
  – medical diagnosis: BloodTest has a cost of $150, Pulse a cost of $5
  – robotics: Width-From-1ft has a cost of 23 sec., Width-From-2ft a cost of 10 sec.
• How to learn a consistent tree with low expected cost?
• Replace Gain by a cost-sensitive criterion:
  – Tan and Schlimmer (1990):  Gain²(S, A) / Cost(A)
  – Nunez (1988):  (2^Gain(S, A) − 1) / (Cost(A) + 1)^w,  where w determines the importance of cost
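A minimal sketch of the two criteria as plain functions of a precomputed gain and attribute cost (the names tan_schlimmer and nunez are mine, not from the slide).

```python
def tan_schlimmer(gain, cost):
    """Tan and Schlimmer (1990): Gain^2(S, A) / Cost(A)."""
    return gain ** 2 / cost

def nunez(gain, cost, w=1.0):
    """Nunez (1988): (2^Gain(S, A) - 1) / (Cost(A) + 1)^w."""
    return (2 ** gain - 1) / (cost + 1) ** w
```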
Gini Index
• Another sensible measure of impurity (i and j are classes):

  Gini(S) = Σ_{i ≠ j} p_i · p_j  =  1 − Σ_i p_i²
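A minimal sketch using the equivalent 1 − Σ p_i² form; the print line evaluates it on the (29+, 35−) sample used earlier.

```python
def gini(labels):
    """Gini(S) = 1 - sum_i p_i^2 (equivalently the sum over pairs of distinct classes)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["+"] * 29 + ["-"] * 35))   # ~0.496 for the (29+, 35-) sample
```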
Gini Index for Color
[Figure: example of computing the Gini index when splitting on Color, with branches red, green, and yellow]
Gain of Gini Index
• Analogous to information gain: GiniGain(S, A) = Gini(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Gini(S_v)
Three Impurity Measures
Regression Tree
• Similar to classification
• Use a set of attributes to predict the value (instead
of a class label)
• Instead of computing information gain, compute
the sum of squared errors
• Partition the attribute space into a set of
rectangular subspaces, each with its own predictor
– The simplest predictor is a constant value
Rectilinear Division
• A regression tree is a piecewise constant function of the input attributes
[Figure: the (X1, X2) plane partitioned by thresholds t1–t4 into rectangles r1–r5, and the corresponding tree with one leaf per rectangle]
Growing Regression Trees
• The best split is the one that reduces the variance the most (a minimal sketch follows below):

  I(LS, A) = var_{y | LS}{y} − Σ_a (|LS_a| / |LS|) · var_{y | LS_a}{y}
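A minimal sketch of this variance-reduction score for a candidate partition of the output values.

```python
def variance(ys):
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def variance_reduction(parent_ys, child_y_lists):
    """I(LS, A) = var(parent) - sum_a |LS_a|/|LS| * var(LS_a)."""
    n = len(parent_ys)
    return variance(parent_ys) - sum(len(c) / n * variance(c)
                                     for c in child_y_lists)
```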
Regression Tree Pruning
• Exactly the same algorithms apply: pre-pruning and post-pruning
• In post-pruning, the tree that minimizes the squared error on the validation set (VS) is selected
• In practice, pruning is more important in regression because full trees are much more complex (often every object has a different output value, and hence the full tree has as many leaves as there are objects in the learning sample)
When Are Decision Trees Useful?
• Advantages
– Very fast: can handle very large datasets with many
attributes
– Flexible: several attribute types, classification and
regression problems, missing values…
– Interpretability: provide rules and attribute importance
• Disadvantages
– Instability of the trees (high variance)
– Not always competitive with other algorithms in terms
of accuracy
History of Decision Tree Research
• Hunt and colleagues in Psychology used full-search decision tree methods to model human concept learning in the 1960s
Summary
• Decision trees are practical for concept learning
• Basic information measure and gain function for best-first search of the space of decision trees
• ID3 procedure
  – Search space is complete
  – Preference for shorter trees
• Overfitting is an important issue with various solutions
• Many variations and extensions possible
Software
• In R:
– Packages tree and rpart
• C4.5:
– http://www.cse.unwe.edu.au/~quinlan
• Weka
– http://www.cs.waikato.ac.nz/ml/weka