CSE 422 Machine Learning

Tree Based Methods

Fall 2024
Contents
● Tree Based Methods
○ ID3, C4.5, CART
○ Mix of numeric and categorical attributes
○ Missing data
● Pruning
● Visualization
● Rule Generator

https://www.flickr.com/photos/wonderlane/2062184804/
Explainable Rules

https://heartbeat.fritz.ai/understanding-the-mathematics-behind-decision-trees-22d86d55906
Decision Tree Example
Handling Numeric Attributes
[Figure: y plotted against x, with a vertical decision boundary at x = v]
● Suppose y is the prediction variable and x is the feature
● The decision boundary is at value v
● If x > v, use the samples on the right; if x ≤ v, use the samples on the left
● For classification, predict by majority vote
● For regression, use a uniform or weighted average
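A minimal sketch (not from the lecture) of how a threshold split on a numeric feature and the two leaf predictions could look; the function names are illustrative only.

```python
import numpy as np

def split_at(x, y, v):
    """Partition targets y according to whether the feature value exceeds v."""
    return y[x <= v], y[x > v]          # left samples, right samples

def leaf_prediction_classification(y_leaf):
    """Classification leaf: predict by majority vote."""
    values, counts = np.unique(y_leaf, return_counts=True)
    return values[np.argmax(counts)]

def leaf_prediction_regression(y_leaf, weights=None):
    """Regression leaf: uniform (or weighted) average of the samples."""
    return np.average(y_leaf, weights=weights)
```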
Decision Tree
● There are many trees possible
○ Always prefer the shortest one
● What is a good decision tree?
● For numeric attributes, it is important to
decide the value to split
○ binary vs multiway splits
● For categorical variables, it is the grouping of the different values
● How to select between multiple attributes?
● How many attributes should be selected?
○ Single or multiple?
Decision Tree for Regression and Classification
● Classification and Regression Trees (CART)
○ Breiman et al., 1984
○ Only binary splits
○ Uses the Gini "measure of impurity"
● Iterative Dichotomiser 3 (ID3)
○ Ross Quinlan, 1986
○ Uses Information Gain, greedy algorithm
● C4.5
○ Ross Quinlan, 1993
○ Improved version of ID3
■ Pruning, attributes with different costs, missing values, continuous attributes

Top 10 algorithms in data mining http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf


Good Split vs Bad Split
● What makes a split good?
● The case for classification
○ Entropy
○ Information Gain
○ Gain Ratio
○ Gini Index
● The case for regression
○ Squared Error
Entropy
● Measure of disorder in a set
● H(S) = −Σᵢ pᵢ log₂ pᵢ, where pᵢ is the proportion of class i in the set S
● Exercise: find the entropy of each of the example sets (rectangles) in the figure
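A small sketch of the entropy computation above; `entropy` is an illustrative helper, not library code.

```python
import numpy as np

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i), with p_i the proportion of class i in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# A pure set has entropy 0; disorder grows as the classes become more mixed.
print(entropy(["yes"] * 4 + ["no"] * 4))  # 1.0    -> a 50/50 set is maximally disordered
print(entropy(["yes"] * 7 + ["no"] * 1))  # ~0.544 -> mostly pure, little disorder
```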
Information Gain
● How much information is gained by a split
● Before the split, the node has entropy H(q)
● After the split, entropy is measured on each resulting subset; the gain is the
parent's entropy minus the weighted average of the subsets' entropies
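A sketch of the gain computation, reusing the entropy() helper from the previous snippet; child_label_subsets holds one label list per branch of the split.

```python
def information_gain(parent_labels, child_label_subsets):
    """IG = H(parent) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(parent_labels)
    weighted_children = sum(len(s) / n * entropy(s) for s in child_label_subsets)
    return entropy(parent_labels) - weighted_children

# Splitting 4 yes / 4 no into two pure halves recovers the full 1 bit of entropy.
parent = ["yes"] * 4 + ["no"] * 4
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))   # 1.0
```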
Gain Ratio
● Information Gain is biased toward attributes with a large number of distinct values
○ E.g., credit card number
● Gain Ratio counteracts this by normalizing Information Gain
● Split Information is the entropy of the partition induced by the attribute
● Gain Ratio = Information Gain / Split Information
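A sketch continuing the previous snippets: split information is the entropy of the partition sizes, and gain ratio divides information gain by it (assuming the split actually partitions the data, so split information is non-zero).

```python
import numpy as np

def split_information(child_label_subsets):
    """SplitInfo = -sum_v (|S_v|/|S|) * log2(|S_v|/|S|)."""
    n = sum(len(s) for s in child_label_subsets)
    fractions = np.array([len(s) / n for s in child_label_subsets])
    return float(-np.sum(fractions * np.log2(fractions)))

def gain_ratio(parent_labels, child_label_subsets):
    """Gain Ratio = Information Gain / Split Information."""
    return information_gain(parent_labels, child_label_subsets) / split_information(child_label_subsets)

# A many-valued split (one sample per branch) gets a large SplitInfo penalty.
parent = ["yes"] * 4 + ["no"] * 4
print(gain_ratio(parent, [[label] for label in parent]))   # 1.0 / 3.0 ≈ 0.33
```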
Gini Index
● Gini impurity measures how often a randomly chosen element from the set would be
incorrectly labeled if it were labeled randomly according to the distribution of
labels in the subset
● Gini(S) = 1 − Σᵢ pᵢ², where pᵢ is the proportion of class i
● Used by CART
● Gain is defined similarly (reduction in the weighted Gini after the split)
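A sketch of Gini impurity and the weighted Gini of a split (CART picks the split that minimizes the weighted child impurity); helper names are illustrative.

```python
import numpy as np

def gini(labels):
    """Gini(S) = 1 - sum_i p_i**2: probability that a random element is mislabeled."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def weighted_gini(child_label_subsets):
    """Impurity of a split: size-weighted average of the children's Gini values."""
    n = sum(len(s) for s in child_label_subsets)
    return sum(len(s) / n * gini(s) for s in child_label_subsets)

print(gini(["yes"] * 4 + ["no"] * 4))   # 0.5 -> maximum impurity for two classes
print(gini(["yes"] * 8))                # 0.0 -> a pure set
```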
An Example
We will work on the same dataset used in the ID3 example: 14 instances of
golf-playing decisions based on outlook, temperature, humidity, and wind.

● We will use the Gini index

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
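Below is a sketch of selecting the first split on this dataset, reusing the gini()/weighted_gini() helpers from the previous snippet. The per-value class counts are the standard ones for this well-known 14-row dataset, as also worked out in the linked tutorial.

```python
# feature -> {value: (num_yes, num_no)} class counts from the 14-row play-golf data
dataset = {
    "outlook":     {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity":    {"high": (3, 4), "normal": (6, 1)},
    "wind":        {"weak": (6, 2), "strong": (3, 3)},
}

for feature, value_counts in dataset.items():
    subsets = [["yes"] * yes + ["no"] * no for yes, no in value_counts.values()]
    print(feature, round(weighted_gini(subsets), 3))

# outlook 0.343, temperature 0.44, humidity 0.367, wind 0.429
# -> outlook has the lowest weighted Gini, so it is chosen for the first split.
```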
Outlook
Outlook is a nominal feature. It can be sunny, overcast, or rain. We summarize
the play decisions for each value of the outlook feature.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Temperature
Temperature is a nominal feature and can take 3 different values: cool, hot,
and mild. Let's summarize the decisions for the temperature feature.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Humidity
Humidity is a binary feature. It can be high or normal.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Wind
Wind is a binary feature, similar to humidity. It can be weak or strong.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split: Outlook

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split: Outlook

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Recursive Partitioning
A sub-dataset

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
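A continuation of the earlier sketch: restrict attention to the outlook = sunny sub-dataset (5 rows) and re-evaluate the remaining features with the same weighted_gini() helper (counts as commonly published for this dataset).

```python
# outlook = sunny sub-dataset: feature -> {value: (num_yes, num_no)}
sunny_subset = {
    "temperature": {"hot": (0, 2), "mild": (1, 1), "cool": (1, 0)},
    "humidity":    {"high": (0, 3), "normal": (2, 0)},
    "wind":        {"weak": (1, 2), "strong": (1, 1)},
}

for feature, value_counts in sunny_subset.items():
    subsets = [["yes"] * yes + ["no"] * no for yes, no in value_counts.values()]
    print(feature, round(weighted_gini(subsets), 3))

# temperature 0.2, humidity 0.0, wind 0.467
# -> humidity yields pure children, so the sunny branch splits on humidity.
```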
Outlook sunny & Temperature

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Humidity

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Wind

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The second split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The second split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The final tree

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Decision Tree Overfitting
● Pre-Pruning
○ Maximum number of leaf nodes
○ Maximum depth of the tree
○ Minimum number of training
instances at a leaf node
● Post-Pruning
○ Another strategy to avoid
overfitting in decision trees is to
first grow a full tree, and then
prune it based on a previously
held-out validation dataset.
○ Or use statistical tests
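As an illustration, scikit-learn exposes pre-pruning through constructor constraints and post-pruning through cost-complexity pruning. A minimal sketch, assuming X_train/y_train and a held-out X_val/y_val already exist; the parameter values are purely illustrative.

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop the tree from growing past fixed limits.
pre_pruned = DecisionTreeClassifier(
    max_leaf_nodes=20,      # maximum number of leaf nodes
    max_depth=5,            # maximum depth of the tree
    min_samples_leaf=10,    # minimum number of training instances at a leaf
).fit(X_train, y_train)

# Post-pruning: grow a full tree, compute the cost-complexity pruning path,
# then keep the pruned tree that scores best on the held-out validation set.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
post_pruned = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val),
)
```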
Tree Pruning: Validation Set
● Prune using a hold out validation dataset

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
Detecting Useless Splits
● Try a chi-square test
● Check the test statistic to see whether the split achieves a statistically
significant gain
● Is the split any better than an arbitrary (random) split?

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
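One way to sketch such a check is with scipy's chi-square test of independence on the contingency table of (child node × class) counts; the counts below are made-up illustrative numbers.

```python
from scipy.stats import chi2_contingency

# Contingency table for a candidate split: rows are the two child nodes,
# columns are the class counts within each child (illustrative numbers).
observed = [[18,  2],    # left child:  18 positive,  2 negative
            [ 5, 15]]    # right child:  5 positive, 15 negative

chi2, p_value, dof, _ = chi2_contingency(observed)
if p_value < 0.05:
    print("split is statistically significant, keep it")
else:
    print("split is no better than an arbitrary one, prune it")
```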
Detecting Useless Splits

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
Decision Tree
Pros
● Interpretable and simple
● Handles all types of data
● Handles missing values
● Less pre-processing required
● Fast computation
● Non-parametric

Cons
● Finding an optimal tree is NP-complete (greedy heuristics are used in practice)
● Not stable
● Often overfits
● High bias
● Not suitable for unstructured data
Multi-Variable Split?
