Unit 3 Decision Trees-3
Representation of Concepts
• Concept learning: a concept is represented as a conjunction of attribute values
– e.g., (Sunny AND Hot AND Humid AND Windy) → +
Rectangle learning
• Conjunctions correspond to a single axis-parallel rectangle enclosing the positive examples
• Disjunctions of conjunctions correspond to a union of such rectangles
[Figure: a single rectangle around one + region vs. a union of rectangles around several + regions, with − examples outside]
Training Examples
• Can be represented by logical formulas
[Figure: decision tree with root Outlook (sunny / overcast / rain); the sunny branch tests Humidity, overcast predicts Yes, and the rain branch tests Wind]
Representation in decision trees
Applications of Decision Trees
Decision Trees
• Given a distribution of training instances, draw axis-parallel lines to separate the instances of each class
[Figure: 2-D plot over Attribute 1 and Attribute 2 with blocks of + and − examples]
Decision Tree Structure
[Figure: the same 2-D plot; axis-parallel splits separate the instances of each class into boxes]
Decision Tree Structure
• Decision node = a condition on an attribute = a box = the collection of examples satisfying the conditions along the path
• Decision leaf = a terminal box assigned a class label
• Alternate splits are possible
[Figure: the 2-D plot with decision nodes at Attribute 1 = 20 and 40 and at Attribute 2 = 30]
Decision Tree Construction
• Given a training data set, find the best tree structure
Top-Down Construction
Best attribute to split?
[Figure: the 2-D plot over Attribute 1 and Attribute 2; candidate axis-parallel splits are compared]
Best attribute to split?
• A split such as A2 > 30? produces one pure box/node and one mixed box/node
• A pure node is already a pure leaf: no further need to split
• A mixed node must be split further
[Figure: the 2-D plot split at Attribute 2 = 30 into a pure box and a mixed box]
Principle of Decision Tree Construction
• Finally we want to form pure leaves
– Correct classification
• Greedy approach to reach correct classification (a minimal sketch follows this list)
1. Initially treat the entire data set as a single box
2. For each box, choose the split that reduces its impurity (in terms of class labels) by the maximum amount
3. Split the box having the highest reduction in impurity
4. Go back to Step 2
5. Stop when all boxes are pure
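As a concrete illustration of this greedy procedure, here is a minimal Python sketch. It is not the exact algorithm behind these slides: it assumes examples are (attribute-dict, label) pairs with discrete attributes, and it uses the majority-class misclassification rate as the impurity measure (entropy, introduced below, could be substituted).

from collections import Counter

def impurity(examples):
    """Fraction of examples in a box that do not belong to its majority class."""
    if not examples:
        return 0.0
    counts = Counter(label for _, label in examples)
    return 1.0 - counts.most_common(1)[0][1] / len(examples)

def best_split(examples, attributes):
    """Return (attribute, impurity reduction) of the best split of one box."""
    base = impurity(examples)
    best_attr, best_reduction = None, 0.0
    for a in attributes:
        children = {}
        for x, y in examples:
            children.setdefault(x[a], []).append((x, y))
        weighted = sum(len(c) / len(examples) * impurity(c) for c in children.values())
        if len(children) > 1 and base - weighted > best_reduction:
            best_attr, best_reduction = a, base - weighted
    return best_attr, best_reduction

def grow_boxes(examples, attributes):
    """Greedy box splitting: repeatedly split the box whose best split reduces
    impurity the most, until every box is pure (or cannot be split further)."""
    boxes = [examples]                                   # Step 1: one box = whole data set
    while True:
        candidates = [(*best_split(b, attributes), i)    # Step 2: best split per impure box
                      for i, b in enumerate(boxes) if impurity(b) > 0]
        candidates = [c for c in candidates if c[0] is not None]
        if not candidates:                               # Step 5: all boxes pure
            return boxes
        attr, _, i = max(candidates, key=lambda c: c[1])  # Step 3: largest reduction wins
        children = {}
        for x, y in boxes.pop(i):
            children.setdefault(x[attr], []).append((x, y))
        boxes.extend(children.values())                  # Step 4: continue with the new boxes

# Hypothetical toy data: each example is ({attribute: value}, label)
data = [({"A1": "t", "A2": "t"}, "+"), ({"A1": "t", "A2": "f"}, "+"),
        ({"A1": "f", "A2": "t"}, "-"), ({"A1": "f", "A2": "f"}, "-")]
print([Counter(y for _, y in box) for box in grow_boxes(data, ["A1", "A2"])])

On this toy data the single initial box is split on A1 into two pure boxes.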
Choosing Best Attribute?
• Consider a set of examples with 29+ and 35−
• Which binary split is better: one on A1 (branches t / f) or one on A2 (branches t / f)?
[Figure: two pairs of candidate splits of the 64 examples, one on A1 and one on A2 in each pair]
Entropy
• A measure for
– uncertainty
– purity
– information content
• Information theory: an optimal-length code assigns (−log2 p) bits to a message having probability p
• S is a sample of training examples
– p+ is the proportion of positive examples in S
– p− is the proportion of negative examples in S
• Entropy of S: the average optimal number of bits needed to encode the class of an example drawn from S
Entropy(S) = p+(−log2 p+) + p−(−log2 p−) = −p+ log2 p+ − p− log2 p−
• Can be generalized to more than two values
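As a quick check of the definition, a small Python helper (the function name and the example counts are ours; 29+/35− matches the sample used on the next slides):

import math

def entropy(pos, neg):
    """Entropy(S) = -p+ log2 p+ - p- log2 p- for a sample with `pos` positive
    and `neg` negative examples (0 log 0 is taken to be 0)."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy(29, 35))                # ~0.993, the E(S) used on the following slides
print(entropy(7, 7), entropy(14, 0))  # 1.0 for a 50/50 sample, 0.0 for a pure sample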
Choosing Best Attribute?
• Consider the examples (29+, 35−), with E(S) = 0.993, and compute the entropies of the children of each candidate split
• Which one is better?
– A1: t → 25+, 5− (E = 0.650); f → 4+, 30− (E = 0.522)
– A2: t → 15+, 19− (E = 0.989); f → 14+, 16− (E = 0.997)
• Which is better?
– A1: t → 21+, 5− (E = 0.708); f → 8+, 30− (E = 0.742)
– A2: t → 18+, 33− (E = 0.937); f → 11+, 2− (E = 0.619)
Information Gain
• Gain(S, A): reduction in entropy after choosing attribute A

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

• For the splits above (E(S) = 0.993):
– A1 with children 25+, 5− (E = 0.650) and 4+, 30− (E = 0.522): Gain = 0.395
– A2 with children 15+, 19− (E = 0.989) and 14+, 16− (E = 0.997): Gain = 0.000
– A1 with children 21+, 5− (E = 0.708) and 8+, 30− (E = 0.742): Gain = 0.265
– A2 with children 18+, 33− (E = 0.937) and 11+, 2− (E = 0.619): Gain = 0.121
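The gains above can be reproduced with a short script; this is a sketch with our own helper names, using the child counts from the slide:

import math

def entropy(pos, neg):
    """Two-class entropy of a sample with pos positive and neg negative examples."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, children):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v).
    parent and each child are (pos, neg) counts."""
    n = sum(parent)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in children)

S = (29, 35)                             # E(S) ~ 0.993
print(gain(S, [(25, 5), (4, 30)]))       # first comparison, split A1 (the larger gain)
print(gain(S, [(15, 19), (14, 16)]))     # first comparison, split A2: ~0.000
print(gain(S, [(21, 5), (8, 30)]))       # second comparison, split A1: ~0.265
print(gain(S, [(18, 33), (11, 2)]))      # second comparison, split A2: ~0.121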
Gain function
• Gain is a measure of how much we can
– Reduce uncertainty
▪ Its value lies between 0 and 1
▪ Significance of a gain of 0: e.g., a 50/50 split of +/− both before and after discriminating on the attribute's values
▪ Significance of a gain of 1: e.g., going from "perfect uncertainty" to perfect certainty after splitting on a perfectly predictive attribute
– Find "patterns" in the training examples relating to attribute values
▪ Move toward a locally minimal representation of the training examples
Training Examples
• After splitting on Outlook, the Overcast branch is already pure (Yes); the Sunny and Rain branches still need to be split
• Ssunny = {D1, D2, D8, D9, D11}
– Gain(Ssunny, Humidity) = 0.970
– Gain(Ssunny, Temp) = 0.570
– Gain(Ssunny, Wind) = 0.019
Final Decision Tree for Example
Outlook
├─ Sunny → Humidity
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
    ├─ Strong → No
    └─ Weak → Yes
Hypothesis Space Search (ID3)
• Hypothesis space (all possible trees) is complete!
– The target function is surely included in it
Hypothesis Space Search in Decision Trees
• Conduct a search of the space of decision trees which
can represent all possible discrete functions.
Restriction bias vs. Preference bias
• Restriction bias (or Language bias)
– Incomplete hypothesis space
• Preference (or search) bias
– Incomplete search strategy
• Candidate Elimination has a restriction bias
• ID3 has a preference bias
• In most cases, we have both a restriction and a
preference bias.
Inductive Bias in ID3
Overfitting the Data
• Learning a tree that classifies the training data perfectly may not lead to the tree with the best generalization performance
– There may be noise in the training data that the tree is fitting
– The algorithm might be making decisions based on very little data
• A hypothesis h is said to overfit the training data if there is another hypothesis h′ such that h has smaller error than h′ on the training data but larger error than h′ on the test data
[Plot: accuracy on training data and on testing data as a function of the complexity of the tree]
Overfitting
[Figure: 2-D plot over Attribute 1 and Attribute 2 illustrating an overfitted partition]
When to stop splitting further?
[Figure: 2-D plot over Attribute 1 and Attribute 2]
Overfitting in Decision Trees
• Consider adding a noisy training example (it should be +):
Day   Outlook   Temp   Humidity   Wind     Tennis?
D15   Sunny     Hot    Normal     Strong   No
[Figure: the decision tree rooted at Outlook]
Overfitting - Example
[Figure: the tree grown after adding D15; an extra Wind test (Strong → No, Weak → Yes) appears]
Avoiding Overfitting
Reduced-Error Pruning
• A post-pruning, cross-validation approach (a minimal sketch follows this list)
– Partition the training data into a "grow" set and a "validation" set
– Build a complete tree from the "grow" data
– Until accuracy on the validation set decreases, do:
    For each non-leaf node in the tree:
        Temporarily prune the tree below it; replace it by a majority-vote leaf
        Test the accuracy of the hypothesis on the validation set
    Permanently prune the node giving the greatest increase in accuracy on the validation set
• Problem: uses less data to construct the tree
• Sometimes done at the rule level
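A minimal sketch of this procedure over a simple nested-dict tree. The tree encoding, the stored majority labels, and the toy validation examples are our own assumptions for illustration, not the representation used in the slides.

from copy import deepcopy

# A node is either a class label (leaf) or a dict with the test attribute,
# its children, and the majority label of the grow-set examples at that node.
tree = {"attr": "Outlook", "majority": "Yes", "children": {
    "Sunny":    {"attr": "Humidity", "majority": "No",
                 "children": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"attr": "Wind", "majority": "Yes",
                 "children": {"Strong": "No", "Weak": "Yes"}},
}}

def predict(node, x):
    while isinstance(node, dict):
        node = node["children"].get(x[node["attr"]], node["majority"])
    return node

def accuracy(node, data):
    return sum(predict(node, x) == y for x, y in data) / len(data)

def internal_nodes(node, path=()):
    """Yield the path (sequence of branch values) to every internal node."""
    if isinstance(node, dict):
        yield path
        for value, child in node["children"].items():
            yield from internal_nodes(child, path + (value,))

def pruned_copy(node, path):
    """Return a copy of the tree with the node at `path` replaced by its majority leaf."""
    new = deepcopy(node)
    parent, last, target = None, None, new
    for value in path:
        parent, last = target, value
        target = target["children"][value]
    leaf = target["majority"]
    if parent is None:
        return leaf
    parent["children"][last] = leaf
    return new

def reduced_error_prune(node, validation):
    """Greedily prune the node whose removal gives the best validation accuracy,
    as long as accuracy does not decrease."""
    while True:
        base = accuracy(node, validation)
        candidates = [pruned_copy(node, p) for p in internal_nodes(node)]
        best = max(candidates, key=lambda t: accuracy(t, validation), default=None)
        if best is None or accuracy(best, validation) < base:
            return node
        node = best

# Hypothetical validation examples: ({attribute: value}, label)
validation = [({"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}, "No"),
              ({"Outlook": "Rain", "Humidity": "High", "Wind": "Strong"}, "Yes"),
              ({"Outlook": "Rain", "Humidity": "Normal", "Wind": "Weak"}, "Yes")]
print(reduced_error_prune(tree, validation))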
Rule post-pruning
Example of rule post-pruning
• IF (Outlook = Sunny) ^ (Humidity = High)
– THEN PlayTennis = No
• IF (Outlook = Sunny) ^ (Humidity = Normal)
– THEN PlayTennis = Yes
[Figure: the decision tree rooted at Outlook from which these rules are read]
Extensions of basic algorithm
Continuous Valued Attributes
• Create a discrete attribute from a continuous variable
– E.g., define a critical Temperature = 82.5
• Candidate thresholds
– chosen by the gain function
– there can be more than one threshold
– typically placed where the class labels change (see the sketch after the table)

Temp     40   48   60   72   80   90
Tennis?  N    N    Y    Y    Y    N

– Candidate thresholds here: (48+60)/2 = 54 and (80+90)/2 = 85
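A small sketch of how the candidate thresholds and their gains could be computed for the Temp example above (helper names are ours):

import math

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def gain_for_threshold(values, labels, t):
    """Information gain of the split value <= t vs. value > t."""
    left = [y for v, y in zip(values, labels) if v <= t]
    right = [y for v, y in zip(values, labels) if v > t]
    n = len(labels)
    return entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)

temp =   [40,  48,  60,  72,  80,  90]
tennis = ["N", "N", "Y", "Y", "Y", "N"]

# Candidate thresholds: midpoints between consecutive values where the class changes
candidates = [(a + b) / 2
              for (a, ya), (b, yb) in zip(zip(temp, tennis), zip(temp[1:], tennis[1:]))
              if ya != yb]
print(candidates)                                           # [54.0, 85.0]
for t in candidates:
    print(t, round(gain_for_threshold(temp, tennis, t), 3)) # 54.0 gives the larger gain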
Attributes with Many Values
• Problem:
– If an attribute has many values, Gain will tend to select it (why?)
– E.g., a birthdate attribute with 365 possible values
Attributes with many values
• Problem: Gain will select the attribute with many values
• One approach: use GainRatio instead

GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

SplitInformation(S, A) = − Σ_{i=1}^{c} (|S_i| / |S|) log2(|S_i| / |S|)

– SplitInformation is the entropy of the partitioning itself, so it penalizes a higher number of partitions
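A small sketch of SplitInformation and GainRatio on subset sizes (function names and the example sizes are ours):

import math

def entropy_from_counts(counts):
    """Entropy of a distribution given by a list of counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def split_information(subset_sizes):
    """SplitInformation(S, A): entropy of the partition sizes themselves."""
    return entropy_from_counts(subset_sizes)

def gain_ratio(gain, subset_sizes):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)."""
    return gain / split_information(subset_sizes)

# An attribute splitting 64 examples into 2 subsets vs. one splitting them into 64 singletons:
print(split_information([30, 34]))   # ~1.0
print(split_information([1] * 64))   # 6.0 -> large penalty for an attribute with many values
print(gain_ratio(0.993, [1] * 64))   # a birthdate-like attribute's gain is divided by 6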
Attributes with Costs
• Consider
– medical diagnosis: BloodTest has a cost of $150, Pulse has a cost of $5
– robotics: Width-From-1ft has a cost of 23 sec., Width-From-2ft a cost of 10 sec.
• How to learn a consistent tree with low expected cost?
• Replace gain by
– Tan and Schlimmer (1990): Gain²(S, A) / Cost(A)
– Nunez (1988): (2^Gain(S, A) − 1) / (Cost(A) + 1)^w, where w ∈ [0, 1] determines the importance of cost
Gini Index for Color
[Figure: a split Color? with branches red, green, and yellow]
Gain of Gini Index
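The Gini index can play the same role as entropy when computing the gain of a split. A minimal sketch, with made-up class counts for a Color? split:

def gini(counts):
    """Gini index of a node with the given class counts: 1 - sum_k p_k^2."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_gain(parent_counts, children_counts):
    """Reduction in Gini impurity obtained by a split (e.g., on Color)."""
    n = sum(parent_counts)
    return gini(parent_counts) - sum(sum(child) / n * gini(child)
                                     for child in children_counts)

# Hypothetical class counts (positive, negative) at the parent and in the
# red / green / yellow branches of a Color? split:
parent = (10, 10)
children = [(6, 1), (2, 5), (2, 4)]
print(gini(parent))                 # 0.5
print(gini_gain(parent, children))  # > 0: the split reduces impurity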
Regression Tree
• Similar to classification
• Use a set of attributes to predict the value (instead
of a class label)
• Instead of computing information gain, compute
the sum of squared errors
• Partition the attribute space into a set of
rectangular subspaces, each with its own predictor
– The simplest predictor is a constant value
Rectilinear Division
• A regression tree is a piecewise constant function of the input attributes
[Figure: the (X1, X2) input space partitioned into rectangles r1–r5 by thresholds t1–t4, and the corresponding tree: root X1 ≤ t1; the left branch tests X2 ≤ t2 (leaves r1, r2); the right branch tests X1 ≤ t3 (leaf r3) and then X2 ≤ t4 (leaves r4, r5)]
Growing Regression Trees
• The best split is the one that reduces the variance the most:

I(LS, A) = var_{y|LS}{y} − Σ_a (|LS_a| / |LS|) · var_{y|LS_a}{y}

where LS is the learning sample reaching the node and LS_a is the subset of LS for which attribute A takes value a
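A minimal sketch of this variance-reduction score for a single discrete attribute (names and the toy learning sample are ours):

def variance(ys):
    """Population variance of a list of output values."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def variance_reduction(examples, attribute):
    """I(LS, A) = var(y over LS) - sum_a |LS_a|/|LS| * var(y over LS_a),
    where examples are ({attribute: value}, y) pairs."""
    ys = [y for _, y in examples]
    subsets = {}
    for x, y in examples:
        subsets.setdefault(x[attribute], []).append(y)
    weighted = sum(len(s) / len(examples) * variance(s) for s in subsets.values())
    return variance(ys) - weighted

# Hypothetical learning sample: a binary attribute that separates low from high outputs
LS = [({"A": "t"}, 1.0), ({"A": "t"}, 1.2), ({"A": "f"}, 5.0), ({"A": "f"}, 5.4)]
print(variance_reduction(LS, "A"))   # large reduction: "A" is a good split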
Regression Tree Pruning
• Exactly the same algorithms apply: pre-pruning and post-pruning
• In post-pruning, the tree that minimizes the squared error on the validation set is selected
• In practice, pruning is more important in regression because full trees are much more complex (often every object has a different output value, so the full tree has as many leaves as there are objects in the learning sample)
When Are Decision Trees Useful?
• Advantages
– Very fast: can handle very large datasets with many
attributes
– Flexible: several attribute types, classification and
regression problems, missing values…
– Interpretability: provide rules and attribute importance
• Disadvantages
– Instability of the trees (high variance)
– Not always competitive with other algorithms in terms
of accuracy
History of Decision Tree Research
• Hunt and colleagues in psychology used full-search decision tree methods to model human concept learning in the 1960s
Summary
• Decision trees are practical for concept learning
• A basic information measure and gain function for best-first search of the space of decision trees
• ID3 procedure
– search space is complete
– Preference for shorter trees
• Overfitting is an important issue with various solutions
• Many variations and extensions possible
Software
• In R:
– Packages tree and rpart
• C4.5:
– http://www.cse.unwe.edu.au/~quinlan
• Weka
– http://www.cs.waikato.ac.nz/ml/weka