Lecture 12 - Decision and Regression Trees

The document discusses decision and regression trees as methods in applied machine learning, highlighting their training processes, splitting criteria, and applications. It explains how decision trees recursively choose attributes to separate classes and how regression trees minimize prediction errors for continuous values. Key concepts include entropy, information gain, and the advantages of using ensembles for improved performance.

Decision and Regression Trees

Applied Machine Learning

Derek Hoiem

[Title image, Dall-E prompt: "A dirt road splits around a large gnarly tree, fractal art"]
Recap of classification and regression
• Nearest neighbor is widely used
– Super-powers: can instantly learn new classes and predict from one or many examples
• Naïve Bayes represents a common assumption as part of density estimation, more typical as part of an approach rather than the final predictor
– Super-powers: Fast estimation from lots of data; not terrible estimation from limited data
• Logistic Regression is widely used
– Super-powers: Effective prediction from high-dimensional features; good confidence estimates
• Linear Regression is widely used
– Super-powers: Can extrapolate, explain relationships, and predict continuous values from many variables
• Almost all algorithms involve nearest neighbor, logistic regression, or linear regression
– The main learning challenge is typically feature learning
• So far, we’ve seen two main choices for how to use features:
1. Nearest neighbor uses all the features jointly to find similar examples
2. Linear models make predictions out of weighted sums of the features

[Scatter plot of ‘x’ and ‘o’ points in the (x1, x2) plane]

• If you wanted to give someone a rule to split the ‘o’ from the ‘x’, what other idea might you try?
If x2 < 0.6 and x2 > 0.2 and x2 < 0.7, ‘o’
Else ‘x’

Can we learn these kinds of rules automatically?
Decision trees
• Training: Iteratively choose the attribute and split value that
best separates the classes for the data in the current node
• Combines feature selection/modeling with prediction

Fig Credit: Zemel, Urtasun, Fidler


Decision Tree Classification

Slide Credit: Zemel, Urtasun, Fidler


Example with discrete inputs

Slide Credit: Zemel, Urtasun, Fidler


Example with discrete inputs

Figure Source: Zemel, Urtasun, Fidler


Decision Trees

Figure Source: Zemel, Urtasun, Fidler


Decision tree algorithm

Training
Recursively, for each node in tree:
1. If labels in the node are mixed:
a. Choose attribute and split values based on data that reaches each node
b. Branch and create 2 (or more) nodes
2. Return

[Worked example, animated over several slides: on the ‘x’/‘o’ scatter plot in the (x1, x2) unit square, splits are chosen one at a time (x2 < 0.6 at the root, then x1 < 0.7 and x2 < 0.8, then x1 < 0.4 and x1 < 0.5) until each leaf contains points of a single class]

Decision tree algorithm

Prediction
1. Check conditions to descend tree
2. Return label of leaf node

[The query point (*) descends the tree according to its feature values and receives the label of the leaf it reaches]
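A minimal sketch of this training and prediction loop, assuming a NumPy feature matrix X, integer 0/1 labels y, and entropy-based split selection; the function and dictionary key names (entropy, best_split, grow, predict, 'leaf', 'threshold', ...) are illustrative, not from the lecture.

import numpy as np

def entropy(y):
    # H(Y) = -sum_c p_c * log2(p_c), computed over the class proportions in y
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    # Try each feature and each observed value as a threshold; keep the split
    # with the highest information gain (largest drop in entropy).
    best_j, best_t, best_gain = None, None, 0.0
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] < t], y[X[:, j] >= t]
            if len(left) == 0 or len(right) == 0:
                continue
            h_split = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = entropy(y) - h_split
            if gain > best_gain:
                best_j, best_t, best_gain = j, t, gain
    return best_j, best_t

def grow(X, y):
    # Step 1: if labels are mixed and a useful split exists, branch and recurse
    j, t = best_split(X, y)
    if j is None:                       # pure node (or no split helps): make a leaf
        return {'leaf': True, 'label': np.bincount(y).argmax()}
    go_left = X[:, j] < t
    return {'leaf': False, 'feature': j, 'threshold': t,
            'left': grow(X[go_left], y[go_left]),
            'right': grow(X[~go_left], y[~go_left])}

def predict(node, x):
    # Prediction: check conditions to descend the tree, return the leaf label
    while not node['leaf']:
        node = node['left'] if x[node['feature']] < node['threshold'] else node['right']
    return node['label']

Here grow returns a nested dict of split tests, and predict(tree, x) walks it exactly as in the two prediction steps above.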
How do you choose what/where to split?

Slide Source: Zemel, Urtasun, Fidler


Quantifying Uncertainty: Coin Flip Example

Slide Source: Zemel, Urtasun, Fidler


Quantifying Uncertainty: Coin Flip Example

Slide Source: Zemel, Urtasun, Fidler


Quantifying Uncertainty: Coin Flip Example
Entropy: H(X) = −Σ_x p(x) log₂ p(x)

Slide Source: Zemel, Urtasun, Fidler
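As a quick numerical check of that formula for the coin-flip example (the coin biases below are illustrative values, not taken from the slide):

import numpy as np

def entropy(probs):
    # H(X) = -sum_x p(x) log2 p(x); terms with p(x) = 0 contribute nothing
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit, maximum uncertainty
print(entropy([0.9, 0.1]))   # biased coin: about 0.47 bits
print(entropy([1.0, 0.0]))   # certain outcome: 0 bits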


Entropy of a Joint Distribution

Slide Source: Zemel, Urtasun, Fidler


Specific Conditional Entropy

Slide Source: Zemel, Urtasun, Fidler


Conditional Entropy

Slide Source: Zemel, Urtasun, Fidler


Conditional Entropy

Slide Source: Zemel, Urtasun, Fidler


Conditional Entropy

Slide Source: Zemel, Urtasun, Fidler


Information Gain

Slide Source: Zemel, Urtasun, Fidler
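Information gain is IG(Y, X) = H(Y) − H(Y | X), where H(Y | X) = Σ_x p(x) H(Y | X = x) is the conditional entropy from the previous slides. A small worked example with a made-up joint distribution p(X, Y); the numbers are illustrative, not from the slides:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Illustrative joint distribution p(X, Y): rows are values of X, columns values of Y
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

p_x = p_xy.sum(axis=1)                      # marginal p(X)
p_y = p_xy.sum(axis=0)                      # marginal p(Y)

H_y = entropy(p_y)                          # uncertainty about Y
# Conditional entropy: H(Y|X) = sum_x p(x) * H(Y | X = x)
H_y_given_x = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))

info_gain = H_y - H_y_given_x               # how much knowing X reduces uncertainty in Y
print(H_y, H_y_given_x, info_gain)

With these numbers, knowing X removes roughly a third of a bit of uncertainty about Y; the attribute with the largest such reduction is the one the tree splits on.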


Constructing decision tree

Training
Recursively, for each node in tree:
1. If labels in the node are mixed:
a. Choose attribute and split values based on data that reaches each node
b. Branch and create 2 (or more) nodes
2. Return

Choosing the attribute and split (step 1a):
1. Measure information gain
• For each discrete attribute: compute information gain of split
• For each continuous attribute: select most informative threshold and compute its information gain. Can be done efficiently based on sorted values (see the sketch below).
2. Select attribute / threshold with highest information gain
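For the continuous-attribute case, the "efficiently based on sorted values" idea can be sketched like this, assuming a single attribute with binary 0/1 labels; the arrays values and labels and the helper names are hypothetical:

import numpy as np

def entropy_from_counts(counts):
    # Entropy of a class distribution given raw class counts
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def best_threshold(values, labels):
    # Sort once, then sweep candidate thresholds while maintaining running
    # class counts for the left side, so each candidate is O(1) to evaluate.
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    n = len(y)
    total = np.bincount(y, minlength=2).astype(float)
    h_parent = entropy_from_counts(total)

    left = np.zeros(2)
    best_gain, best_t = 0.0, None
    for i in range(n - 1):
        left[y[i]] += 1
        if v[i] == v[i + 1]:
            continue                    # only place thresholds between distinct values
        right = total - left
        h_cond = ((i + 1) * entropy_from_counts(left)
                  + (n - i - 1) * entropy_from_counts(right)) / n
        gain = h_parent - h_cond
        if gain > best_gain:
            best_gain, best_t = gain, (v[i] + v[i + 1]) / 2
    return best_t, best_gain

Sorting once costs O(n log n); the sweep then updates class counts incrementally, so every candidate threshold is scored in constant time.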
Pause, stretch, and think: Is it better to split based on type or patrons?

Slide Source: Zemel, Urtasun, Fidler


Slide Source: Zemel, Urtasun, Fidler
What if you need to predict a continuous value?
• Regression Tree
– Same idea, but choose splits to minimize sum squared error:
Σ_{n ∈ node} ( f_node(x_n) − y_n )²
– f_node(x_n) typically returns the mean target value of the data points in the leaf node containing x_n
– What are we minimizing?
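A sketch of that criterion for a single feature, under the assumption that a leaf predicts the mean target of its training points (the names here are illustrative, not the lecture's code):

import numpy as np

def sse(y):
    # Squared error when a node predicts the mean of its training targets
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def split_cost(x, y, threshold):
    # Total squared error of the two children created by splitting at `threshold`
    left, right = y[x < threshold], y[x >= threshold]
    return sse(left) + sse(right)

# The tree greedily picks the feature/threshold with the smallest split_cost.
# This answers "what are we minimizing": the within-leaf squared error, i.e. the
# variance of the targets inside each leaf (times the leaf size).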
Variants
• Different splitting criteria, e.g. Gini index: 1 − Σ_i p_i² (very similar result, a little faster to compute)
• Most commonly, split on one attribute at a time
– In case of continuous vector data, can also split on linear projections of features
• Can stop early
– when leaf node contains fewer than N_min points
– when max tree depth is reached
• Can also predict multiple continuous values or multiple classes
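In scikit-learn these variants correspond to constructor arguments; a minimal example (the parameter values are arbitrary choices, not recommendations):

from sklearn.tree import DecisionTreeClassifier

# criterion="gini" is the default; "entropy" selects information gain instead.
# min_samples_leaf and max_depth implement the early-stopping rules above.
clf = DecisionTreeClassifier(criterion="gini",
                             max_depth=5,
                             min_samples_leaf=20,
                             random_state=0)
# clf.fit(X_train, y_train) and clf.predict(X_test) would then train and predict,
# assuming X_train, y_train, X_test are defined elsewhere.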
Decision Tree vs. 1-NN

[Figure: DT Boundaries vs. 1-NN Boundaries on the same data]

• Both have piecewise-linear decisions
• Decision tree is typically “axis-aligned”
• Decision tree has ability for early stopping to improve generalization
• True power of decision trees arrives with ensembles (lots of small or randomized trees)
Regression Tree for Temperature Prediction
• Min leaf size: 200
• RMSE = 3.42
• R² = 0.88

[Tree diagram: the top splits test yesterday's temperatures in Chicago, Milwaukee, and Grand Rapids]
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.tree import DecisionTreeRegressor

# x_train, y_train, x_val, y_val, feature_to_city, feature_to_day are defined earlier
model = DecisionTreeRegressor(random_state=0, min_samples_leaf=200)
model.fit(x_train, y_train)
y_pred = model.predict(x_val)
tree_rmse = np.sqrt(np.mean((y_pred - y_val)**2))
tree_mae = np.mean(np.abs(y_pred - y_val))
print('Tree: RMSE={}, MAE={}'.format(tree_rmse, tree_mae))
print('R^2: {}'.format(1 - tree_rmse**2 / np.mean((y_val - y_val.mean())**2)))

plt.figure(figsize=(20, 20))
tree.plot_tree(model)
plt.show()

# Report which city/day each of the top split features corresponds to
for f in [334, 372, 405]:
    print('{}: {}, {}'.format(f, feature_to_city[f], feature_to_day[f]))
Classification/Regression Trees Summary
• Key Assumptions
– Samples with similar features have similar predictions
• Model Parameters
– Tree structure with split criteria at each internal node and prediction at each leaf node
• Designs
– Limits on tree growth
– What kinds of splits are considered
– Criterion for choosing attribute/split (e.g. Gini impurity score is another common choice)
• When to Use
– Want an explainable decision function (e.g. for medical diagnosis)
– As part of an ensemble (as we’ll see Thursday)
• When Not to Use
– When you need strong accuracy from a single model: one tree is not a great performer, but a forest is
Compare classifiers

score(y) = wᵀx + b

score(y_n = 1) = wᵀx_n + b
Things to remember
• Decision/regression trees learn to split up the feature space into partitions with similar values

• Entropy is a measure of uncertainty

• Information gain measures how much particular knowledge reduces prediction uncertainty
Thursday
• Ensembles: model averaging and forests
