Machine Learning
Decision Trees
Problem Setting:
Set of possible instances X
Each instance x in X is a feature vector x = ⟨x₁, x₂, …, xₙ⟩
Unknown target function f: X → Y, where Y is discrete-valued
Set of function hypotheses H = {h | h: X → Y}
Each hypothesis h is a decision tree
A tree sorts x to a leaf, which assigns a label y
Decision Tree Learning
Input:
Training examples {⟨x⁽ⁱ⁾, y⁽ⁱ⁾⟩} of the unknown target function f
Output:
Hypothesis ℎ ∈ 𝐻 that best approximates target function 𝑓
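As a concrete instance of this setup, here is a minimal sketch assuming scikit-learn is available; the dataset, feature choice, and hyperparameters below are illustrative, not prescribed by the slides.

# Learn a hypothesis h: X -> Y from training examples {<x^(i), y^(i)>}.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]   # feature vectors: petal length, petal width
y = iris.target        # discrete-valued target Y

h = DecisionTreeClassifier(max_depth=2, random_state=42)  # one hypothesis h in H
h.fit(X, y)            # choose the tree that best approximates f on the training examples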
What we need
Best-split
Measure of homogeneity (Impurity)
Impurity Measures:
Gini Impurity
Entropy
Best Split
Iris Dataset
Example
Making Predictions
Suppose you found a new Iris with petal length 5 and petal width 1.5.
Then the new feature vector is [5, 1.5].
Which class does it belong to (what is y)?
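A minimal sketch of this prediction, again assuming scikit-learn and the Iris petal features (the specific classifier settings are an assumption):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data[:, 2:], iris.target)   # petal length, petal width

x_new = [[5.0, 1.5]]                      # the new Iris
y_pred = tree.predict(x_new)[0]           # the tree sorts x_new to a leaf and returns that leaf's class
print(iris.target_names[y_pred])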
Meaning of the values
G_i = 1 − Σ_k p_{i,k}²
p_{i,k} is the ratio of class-k instances among the training instances in the i-th node.
Ex (the depth-2 Iris petal tree):
Depth-1 left node: 1 − (50/50)² − (0/50)² − (0/50)² = 0
Depth-2 left node: 1 − (0/54)² − (49/54)² − (5/54)² ≈ 0.168
Depth-2 right node: 1 − (0/46)² − (1/46)² − (45/46)² ≈ 0.0425
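A small sketch of both impurity measures listed earlier (Gini and entropy); the per-node class counts below are the ones assumed in the worked example above:

from math import log2

def gini(counts):
    m = sum(counts)
    return 1.0 - sum((c / m) ** 2 for c in counts)

def entropy(counts):
    m = sum(counts)
    return -sum((c / m) * log2(c / m) for c in counts if c > 0)

print(gini([50, 0, 0]))     # depth-1 left node: 0.0
print(gini([0, 49, 5]))     # ~0.168
print(gini([0, 1, 45]))     # ~0.0425
print(entropy([0, 49, 5]))  # the same node measured with entropy instead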
Iris dataset, depth 3
Estimating Class Probabilities
Decision trees can also estimate the probability that an instance belongs to a particular class.
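A minimal sketch, assuming scikit-learn: the estimated probabilities are simply the class ratios in the leaf that the instance falls into.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data[:, 2:], iris.target)

print(tree.predict_proba([[5.0, 1.5]]))  # one probability per class (class ratios in the reached leaf)
print(tree.predict([[5.0, 1.5]]))        # the class with the highest estimated probability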
Algorithm to build a DT
node = Root
Main loop:
1. A ← the best decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort the training examples to the leaf nodes
5. If the training examples are perfectly classified, then STOP; else, iterate over the new leaf nodes
Stop recursing when the maximum depth is reached, or when no split that reduces the impurity can be found.
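A compact sketch of this greedy loop for numeric features and binary splits, choosing the split with the lowest weighted Gini; all names (Node, build_tree, ...) are illustrative, not from a particular library.

from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None      # index of the splitting attribute
    threshold: Optional[float] = None  # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: object = None               # majority class, used when the node is a leaf

def gini(labels):
    m = len(labels)
    return 1.0 - sum((c / m) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    best = (None, None, float("inf"))               # (feature, threshold, cost J)
    for j in range(len(X[0])):
        values = sorted(set(x[j] for x in X))
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if cost < best[2]:
                best = (j, t, cost)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    node = Node(label=Counter(y).most_common(1)[0][0])
    if gini(y) == 0.0 or depth == max_depth:        # perfectly classified or depth limit reached
        return node
    j, t, cost = best_split(X, y)
    if j is None or cost >= gini(y):                # no split reduces the impurity
        return node
    node.feature, node.threshold = j, t
    left = [i for i, xi in enumerate(X) if xi[j] <= t]
    right = [i for i, xi in enumerate(X) if xi[j] > t]
    node.left = build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth)
    node.right = build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth)
    return node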
Example
Example [9+, 5-]
Split on Outlook (branches: Rainy, Overcast, Sunny):
Gini(Rainy) = 1 − (2/5)² − (3/5)² = 0.48
Gini(Overcast) = 1 − (4/4)² − (0/4)² = 0
Gini(Sunny) = 1 − (3/5)² − (2/5)² = 0.48
J(Outlook) = (5/14)·0.48 + (4/14)·0 + (5/14)·0.48 ≈ 0.34
Example [9+, 5-]
Split on Temperature (branches: Hot, Mild, Cool):
Hot: Gini = 0.5, samples = 4, value = [2+, 2-]
Mild: Gini = 0.44, samples = 6, value = [4+, 2-]
Cool: Gini = 0.5, samples = 4, value = [2+, 2-]
J(Temperature) = 0.47
Example [9+, 5-]
Split on Humidity (branches: High, Normal):
High: Gini = 0.49, samples = 7, value = [3+, 4-]
Normal: Gini = 0.25, samples = 7, value = [6+, 1-]
J(Humidity) = 0.37
[9+, 5-]
Split on Wind (branches: True, False):
True: Gini = 0.5, samples = 6, value = [3+, 3-]
False: Gini = 1 − (6/8)² − (2/8)² = 0.375, samples = 8, value = [6+, 2-]
J(Wind) = (6/14)·0.5 + (8/14)·0.375 ≈ 0.43
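The four attribute costs above can be recomputed from the per-branch class counts with a short helper; a minimal sketch (counts taken from the slides, function names illustrative):

def gini(pos, neg):
    m = pos + neg
    return 1.0 - (pos / m) ** 2 - (neg / m) ** 2

def cost(branches):                        # branches: one (pos, neg) pair per attribute value
    m = sum(p + n for p, n in branches)
    return sum((p + n) / m * gini(p, n) for p, n in branches)

print(cost([(2, 3), (4, 0), (3, 2)]))      # Outlook      ~0.34
print(cost([(2, 2), (4, 2), (2, 2)]))      # Temperature  ~0.47
print(cost([(3, 4), (6, 1)]))              # Humidity     ~0.37
print(cost([(3, 3), (6, 2)]))              # Wind         ~0.43
# Outlook has the lowest weighted Gini, so it is chosen as the root split.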
Example
[9+, 5-]: Outlook has the lowest cost J, so it becomes the root split.
Rainy: Gini = 0.48, samples = 5, value = [2+, 3-]
Overcast: pure node, predict Yes
Sunny: Gini = 0.48, samples = 5, value = [3+, 2-]
Example: Continuous Features
Training data: four patients, [2+, 2-] overall, with features Chest Pain, Good Blood Circulation, Blocked Arteries, Weight (continuous) and label Heart Disease.
One example row: Chest Pain = No, Good Blood Circulation = No, Blocked Arteries = No, Weight = 125, Heart Disease = No.
Candidate splits on the categorical features:
Chest Pain: J(Chest Pain) = 0.333
Good Blood Circulation: Yes: Gini = 0.5, samples = 2, value = [1+, 1-]; No: Gini = 0.5, samples = 2, value = [1+, 1-]; J(Good Blood Circulation) = 0.5
Blocked Arteries: Yes: Gini = 0, samples = 2, value = [2+, 0-]; No: Gini = 0, samples = 2, value = [0+, 2-]; J(Blocked Arteries) = 0
Example: Continuous Features
For the continuous Weight feature, sort the samples by weight (125: No, 167: Yes, 180: Yes, 210: No) and take the midpoints between adjacent values as candidate thresholds: 146, 173.5, 195.
Split on Weight ≤ 146, [2+, 2-]:
Yes: Gini = 0, samples = 1, value = [0+, 1-]
No: Gini = 0.444, samples = 3, value = [2+, 1-]
J(Weight ≤ 146) = 0.333
[2+, 2-]
Split on Weight ≤ 173.5:
Yes: Gini = 0.5, samples = 2, value = [1+, 1-]
No: Gini = 0.5, samples = 2, value = [1+, 1-]
J(Weight ≤ 173.5) = 0.5
Split on Weight ≤ 195:
Yes: Gini = 0.444, samples = 3, value = [2+, 1-]
No: Gini = 0, samples = 1, value = [0+, 1-]
J(Weight ≤ 195) = 0.333
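A short sketch of this threshold search (the data are the four weights from the slides; the function names are illustrative):

def gini(labels):
    m = len(labels)
    return 1.0 - sum((labels.count(c) / m) ** 2 for c in set(labels))

weights = [125, 167, 180, 210]
disease = ["No", "Yes", "Yes", "No"]

values = sorted(weights)
for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:   # candidate thresholds 146, 173.5, 195
    left = [y for w, y in zip(weights, disease) if w <= t]
    right = [y for w, y in zip(weights, disease) if w > t]
    cost = (len(left) * gini(left) + len(right) * gini(right)) / len(weights)
    print(t, round(cost, 3))   # 146 -> 0.333, 173.5 -> 0.5, 195 -> 0.333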
Regression
Decision trees can also be used for regression. Instead of trying to split the training set in a way that minimizes impurity, a regression tree splits the training set in a way that minimizes the Mean Squared Error (MSE):
J(i, t) = (m_left / m) · MSE_left + (m_right / m) · MSE_right, where
MSE_node = (1 / m_node) · Σ_{i ∈ node} (ŷ_node − y⁽ⁱ⁾)²
ŷ_node = (1 / m_node) · Σ_{i ∈ node} y⁽ⁱ⁾
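A minimal sketch of this criterion for a single feature and one candidate threshold; the tiny dataset at the end is made up purely for illustration:

def mse(ys):
    y_hat = sum(ys) / len(ys)                   # the node predicts the mean target
    return sum((y_hat - y) ** 2 for y in ys) / len(ys)

def cost(xs, ys, t):
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    m = len(ys)
    return len(left) / m * mse(left) + len(right) / m * mse(right)

xs = [1.0, 2.0, 3.0, 4.0]   # made-up feature values
ys = [1.1, 0.9, 3.0, 3.2]   # made-up targets
print(cost(xs, ys, 2.5))    # low: this split separates the low and high targets
print(cost(xs, ys, 1.5))    # noticeably higher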
Tree Pruning
[Figure: a full tree with internal nodes t1–t6, branch tests labeled 1/0 and leaves labeled +/−, next to the smaller tree obtained after pruning]
Tree Pruning: Algorithm
Tree Pruning: Error Estimate
m: number of overall training samples
E(t) = (m_left / m) · E(t_left) + (m_right / m) · E(t_right)   (estimated error of the subtree rooted at t, weighting each child by its share of the training samples)
D_t = E(t as a leaf) − E(t)   (how much the subtree below t reduces the estimated error; if D_t is small, prune the subtree and turn t into a leaf)
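A hedged sketch of bottom-up pruning based on this estimate; the dict-based tree encoding and the min_gain threshold are assumptions made for illustration, not taken from the slides:

def leaf_error(node):
    # error if the node were turned into a leaf: share of samples outside the majority class
    return 1.0 - max(node["counts"]) / sum(node["counts"])

def subtree_error(node):
    if "left" not in node:                            # already a leaf
        return leaf_error(node)
    m_t = sum(node["counts"])
    return (sum(node["left"]["counts"]) / m_t) * subtree_error(node["left"]) + \
           (sum(node["right"]["counts"]) / m_t) * subtree_error(node["right"])

def prune(node, min_gain=0.01):
    if "left" not in node:
        return node
    node["left"] = prune(node["left"], min_gain)      # prune bottom-up
    node["right"] = prune(node["right"], min_gain)
    d = leaf_error(node) - subtree_error(node)        # D_t from the error estimate above
    if d <= min_gain:                                 # the subtree barely reduces the error
        return {"counts": node["counts"]}             # replace it by a leaf
    return node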