CSE 422 Machine Learning: Tree Based Methods
Fall 2024
Contents
● Tree Based Methods
○ ID3, C4.5, CART
○ Mix of numeric and categorical attributes
○ Missing data
● Pruning
● Visualization
● Rule Generator
https://www.flickr.com/photos/wonderlane/2062184804/
Explainable Rules
https://heartbeat.fritz.ai/understanding-the-mathematics-behind-decision-trees-22d86d55906
Decision Tree Example
Handling Numeric Attributes
[Figure: feature x on the horizontal axis, prediction variable y on the vertical axis, with a split threshold at x = v]
● Suppose y is the prediction variable
● x is the feature
● The decision boundary is at value v
● If x > v, use the samples on the right
● If x ≤ v, use the samples on the left
● For classification, predict by majority vote
● For regression, use a uniform or weighted average
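To make the threshold search concrete, here is a minimal Python sketch (an illustration, not from the slides) that scans candidate values of v and keeps the one minimizing the weighted Gini impurity of the two sides; the function names and the choice of Gini as the criterion are assumptions.

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_numeric_split(x, y):
    # x, y: 1-D NumPy arrays (feature values and class labels).
    # Candidate thresholds are midpoints between consecutive sorted values.
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_v, best_score = None, float("inf")
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue
        v = (x[i] + x[i - 1]) / 2.0
        left, right = y[:i], y[i:]
        # Weighted impurity of the two children
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_v, best_score = v, score
    return best_v, best_score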
Decision Tree
● There are many trees possible
○ Always prefer the shortest one
● What is a good decision tree?
● For numeric attributes, it is important to
decide the value to split
○ binary vs multiway splits
● For categorical variables, the split is over the set of
different values
● How to select between multiple attributes?
● How many attributes should be selected?
○ Single or multiple?
Decision Tree for Regression and Classification
● Classification and Regression Trees
○ Breiman et al., 1984
○ Only Binary Splits
○ Uses Gini “measure of impurity”
● Iterative Dichotomiser 3 (ID3)
○ Ross Quinlan, 1986
○ Uses Information Gain, Greedy Algorithm
● C4.5
○ Ross Quinlan, 1993
○ Improved version over ID3
■ Pruning, attributes with different costs, missing values, and continuous attributes
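As a quick refresher on the two criteria named above, the following Python sketch (illustrative, not from the slides) computes the entropy-based information gain used by ID3/C4.5 and the Gini impurity used by CART:

import numpy as np

def entropy(labels):
    # H(S) = -sum_i p_i * log2(p_i), over class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini(S) = 1 - sum_i p_i^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(feature, labels):
    # Gain(S, A) = H(S) - sum_v (|S_v|/|S|) * H(S_v), over values v of A
    gain = entropy(labels)
    for v in np.unique(feature):
        subset = labels[feature == v]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain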
https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook
Outlook is a nominal feature. It can be sunny, overcast, or rain. Let's summarize the final decisions for the outlook feature.
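Assuming the standard 14-row play-tennis counts used in the linked walkthrough (sunny: 2 yes / 3 no, overcast: 4 yes / 0 no, rain: 3 yes / 2 no), the weighted Gini for the outlook split can be hand-checked:

# Hand check of the Gini index for the outlook split, assuming the
# standard play-tennis counts (sunny 2/3, overcast 4/0, rain 3/2).
def gini(yes, no):
    n = yes + no
    return 1 - (yes / n) ** 2 - (no / n) ** 2

g_sunny = gini(2, 3)      # 0.48
g_overcast = gini(4, 0)   # 0.0
g_rain = gini(3, 2)       # 0.48
weighted = (5 * g_sunny + 4 * g_overcast + 5 * g_rain) / 14
print(round(weighted, 3)) # ~0.343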
Temperature
Temperature is a nominal feature and it can take 3 different values: Cool, Hot, and Mild. Let's summarize the decisions for the temperature feature.
Humidity
Humidity is a binary feature. It can be high or normal.
Wind
Wind is a binary feature, similar to humidity. It can be weak or strong.
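Putting the four features side by side (again assuming the standard play-tennis counts from the linked example), the feature with the lowest weighted Gini wins, which sets up the first split shown next:

# Comparing weighted Gini across the four candidate features, assuming
# the standard 14-row play-tennis counts; each pair is (yes, no).
def gini(yes, no):
    n = yes + no
    return 1 - (yes / n) ** 2 - (no / n) ** 2

def weighted_gini(groups):
    # groups: list of (yes, no) counts, one per feature value
    total = sum(y + n for y, n in groups)
    return sum((y + n) * gini(y, n) for y, n in groups) / total

features = {
    "outlook":     [(2, 3), (4, 0), (3, 2)],   # sunny, overcast, rain
    "temperature": [(2, 2), (4, 2), (3, 1)],   # hot, mild, cool
    "humidity":    [(3, 4), (6, 1)],           # high, normal
    "wind":        [(6, 2), (3, 3)],           # weak, strong
}
for name, groups in features.items():
    print(name, round(weighted_gini(groups), 3))
# outlook scores lowest (~0.343), so it becomes the first split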
The first split
The first split: Outlook
Recursive Partitioning
A sub-dataset (outlook = sunny)
Outlook sunny & Temperature
Outlook sunny & Humidity
Outlook sunny & Wind
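Restricting attention to the five sunny rows (counts assumed from the standard dataset), humidity produces a pure split, which is why it wins the second round:

# Second split under outlook = sunny, assuming the standard five sunny
# rows from the linked example; each pair is (yes, no) per value.
def gini(yes, no):
    n = yes + no
    return 1 - (yes / n) ** 2 - (no / n) ** 2

def weighted_gini(groups):
    total = sum(y + n for y, n in groups)
    return sum((y + n) * gini(y, n) for y, n in groups) / total

sunny = {
    "temperature": [(0, 2), (1, 1), (1, 0)],  # hot, mild, cool
    "humidity":    [(0, 3), (2, 0)],          # high, normal
    "wind":        [(1, 2), (1, 1)],          # weak, strong
}
for name, groups in sunny.items():
    print(name, round(weighted_gini(groups), 3))
# humidity scores 0.0 (a pure split), so it is chosen under sunny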
The second split
The final tree
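Read off as explicit rules, the finished tree (as drawn in the linked example: outlook at the root, humidity under sunny, wind under rain) amounts to the sketch below, which also previews the Rule Generator item from the outline:

# The final tree from the linked example, written as explicit rules.
def predict(outlook, humidity, wind):
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny":
        return "yes" if humidity == "normal" else "no"
    # outlook == "rain"
    return "yes" if wind == "weak" else "no"

print(predict("sunny", "high", "weak"))  # no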
Decision Tree Overfitting
● Pre-Pruning
○ Maximum number of leaf nodes
○ Maximum depth of the tree
○ Minimum number of training instances at a leaf node
● Post-Pruning
○ Another strategy to avoid overfitting in decision trees is to first grow a full tree, and then prune it based on a previously held-out validation dataset.
○ Use statistical tests
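These pre-pruning knobs map directly onto common library hyperparameters; a sketch using scikit-learn (an assumed toolkit here, with a synthetic dataset for illustration):

# Pre-pruning via scikit-learn hyperparameters (assumed toolkit);
# the synthetic dataset is only for illustration.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = DecisionTreeClassifier(
    max_leaf_nodes=20,      # maximum number of leaf nodes
    max_depth=5,            # maximum depth of the tree
    min_samples_leaf=10,    # minimum training instances at a leaf
    random_state=0,
)
clf.fit(X, y)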
Tree Pruning: Validation Set
● Prune using a held-out validation dataset
https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
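One concrete recipe (a sketch using scikit-learn's cost-complexity pruning, an assumed stand-in for the slide's method): grow the full tree, enumerate candidate pruning strengths, and keep whichever scores best on the held-out validation set.

# Post-pruning sketch: grow a full tree, then pick the cost-complexity
# pruning strength (ccp_alpha) that does best on a held-out set.
# scikit-learn is an assumed toolkit; the data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
best = max(
    (DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val),
)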
Detecting Useless Splits
● Try a chi-square test
● Check the statistic to see whether the split achieves any significant gain
● Is the split any different from an arbitrary (random) one?
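For instance (a sketch assuming SciPy; the contingency counts are made up), one can test whether the class distribution differs significantly between a split's children:

# Chi-square test of a split: does the class distribution in the
# children differ from what an arbitrary split would produce?
# SciPy is an assumed dependency; the counts are illustrative.
from scipy.stats import chi2_contingency

# Rows: left/right child; columns: class counts (yes, no)
observed = [[18, 4],
            [6, 12]]
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"p-value = {p_value:.4f}")
# A large p-value means the split looks no better than an arbitrary
# one, i.e. a useless split that a pruner should remove.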
Decision Tree
Pros Cons