CSE 422 Machine Learning

Tree Based Methods

Fall 2024
Contents
● Tree Based Methods
○ ID3, C4.5, CART
○ Mix of numeric and categorical attributes
○ Missing data
● Pruning
● Visualization
● Rule Generator

https://www.flickr.com/photos/wonderlane/2062184804/
Explainable Rules

https://heartbeat.fritz.ai/understanding-the-mathematics-behind-decision-trees-22d86d55906
Decision Tree Example
Handling Numeric Attributes
[Figure: y plotted against x, with a vertical decision boundary at x = v]
● Suppose y is the prediction variable and x is the feature
● The decision boundary is at value v
● If x > v, use the samples on the right; if x ≤ v, use the samples on the left
● For classification, predict by majority vote
● For regression, use a uniform or weighted average
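A minimal sketch (not from the lecture) of how a threshold split on a numeric feature and the two leaf predictions could look; the function names are illustrative only.

```python
import numpy as np

def split_at(x, y, v):
    """Partition targets y according to whether the feature value exceeds v."""
    return y[x <= v], y[x > v]          # left samples, right samples

def leaf_prediction_classification(y_leaf):
    """Classification leaf: predict by majority vote."""
    values, counts = np.unique(y_leaf, return_counts=True)
    return values[np.argmax(counts)]

def leaf_prediction_regression(y_leaf, weights=None):
    """Regression leaf: uniform (or weighted) average of the samples."""
    return np.average(y_leaf, weights=weights)
```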
Decision Tree
● There are many trees possible
○ Always prefer the shortest one
● What is a good decision tree?
● For numeric attributes, it is important to
decide the value to split
○ binary vs multiway splits
● For categorical variables, it is the grouping of the different values
● How to select between multiple attributes?
● How many attributes should be selected?
○ Single or multiple?
Decision Tree for Regression and Classification
● Classification and Regression Trees (CART)
○ Breiman et al., 1984
○ Only binary splits
○ Uses the Gini "measure of impurity"
● Iterative Dichotomiser 3 (ID3)
○ Ross Quinlan, 1986
○ Uses Information Gain, greedy algorithm
● C4.5
○ Ross Quinlan, 1993
○ Improved version of ID3
■ Pruning, attributes with different costs, missing values, continuous attributes

Top 10 algorithms in data mining http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf


Good Split vs Bad Split
● What makes a split good?
● The case for classification
○ Entropy
○ Information Gain
○ Gain Ratio
○ Gini Index
● The case for regression
○ Squared Error
Entropy
● Measure of disorder in a set
● H(S) = −Σᵢ pᵢ log₂ pᵢ, where pᵢ is the proportion of class i in the set S
● Exercise: find the entropy of each of the example sets (rectangles) in the figure
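A small sketch of the entropy computation above; `entropy` is an illustrative helper, not library code.

```python
import numpy as np

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i), with p_i the proportion of class i in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# A pure set has entropy 0; disorder grows as the classes become more mixed.
print(entropy(["yes"] * 4 + ["no"] * 4))  # 1.0    -> a 50/50 set is maximally disordered
print(entropy(["yes"] * 7 + ["no"] * 1))  # ~0.544 -> mostly pure, little disorder
```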
Information Gain
● How much information is gained by a split
● Before the split, the node has entropy H(q)
● After the split, entropy is measured on each resulting subset; the gain is the
parent's entropy minus the weighted average of the subsets' entropies
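A sketch of the gain computation, reusing the entropy() helper from the previous snippet; child_label_subsets holds one label list per branch of the split.

```python
def information_gain(parent_labels, child_label_subsets):
    """IG = H(parent) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(parent_labels)
    weighted_children = sum(len(s) / n * entropy(s) for s in child_label_subsets)
    return entropy(parent_labels) - weighted_children

# Splitting 4 yes / 4 no into two pure halves recovers the full 1 bit of entropy.
parent = ["yes"] * 4 + ["no"] * 4
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))   # 1.0
```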
Gain Ratio
● Information Gain is biased toward attributes with a large number of distinct values
○ E.g., credit card number
● Gain Ratio counteracts this by normalizing Information Gain
● Split Information is the entropy of the partition induced by the attribute
● Gain Ratio = Information Gain / Split Information
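A sketch continuing the previous snippets: split information is the entropy of the partition sizes, and gain ratio divides information gain by it (assuming the split actually partitions the data, so split information is non-zero).

```python
import numpy as np

def split_information(child_label_subsets):
    """SplitInfo = -sum_v (|S_v|/|S|) * log2(|S_v|/|S|)."""
    n = sum(len(s) for s in child_label_subsets)
    fractions = np.array([len(s) / n for s in child_label_subsets])
    return float(-np.sum(fractions * np.log2(fractions)))

def gain_ratio(parent_labels, child_label_subsets):
    """Gain Ratio = Information Gain / Split Information."""
    return information_gain(parent_labels, child_label_subsets) / split_information(child_label_subsets)

# A many-valued split (one sample per branch) gets a large SplitInfo penalty.
parent = ["yes"] * 4 + ["no"] * 4
print(gain_ratio(parent, [[label] for label in parent]))   # 1.0 / 3.0 ≈ 0.33
```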
Gini Index
● Gini impurity measures how often a randomly chosen element from the set would be
incorrectly labeled if it were labeled randomly according to the distribution of
labels in the subset
● Gini(S) = 1 − Σᵢ pᵢ², where pᵢ is the proportion of class i
● Used by CART
● Gain is defined similarly (reduction in the weighted Gini after the split)
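A sketch of Gini impurity and the weighted Gini of a split (CART picks the split that minimizes the weighted child impurity); helper names are illustrative.

```python
import numpy as np

def gini(labels):
    """Gini(S) = 1 - sum_i p_i**2: probability that a random element is mislabeled."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def weighted_gini(child_label_subsets):
    """Impurity of a split: size-weighted average of the children's Gini values."""
    n = sum(len(s) for s in child_label_subsets)
    return sum(len(s) / n * gini(s) for s in child_label_subsets)

print(gini(["yes"] * 4 + ["no"] * 4))   # 0.5 -> maximum impurity for two classes
print(gini(["yes"] * 8))                # 0.0 -> a pure set
```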
An Example
We will work on the same dataset used in the ID3 example: 14 instances of
golf-playing decisions based on outlook, temperature, humidity, and wind.

● We will use the Gini index

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
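Below is a sketch of selecting the first split on this dataset, reusing the gini()/weighted_gini() helpers from the previous snippet. The per-value class counts are the standard ones for this well-known 14-row dataset, as also worked out in the linked tutorial.

```python
# feature -> {value: (num_yes, num_no)} class counts from the 14-row play-golf data
dataset = {
    "outlook":     {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity":    {"high": (3, 4), "normal": (6, 1)},
    "wind":        {"weak": (6, 2), "strong": (3, 3)},
}

for feature, value_counts in dataset.items():
    subsets = [["yes"] * yes + ["no"] * no for yes, no in value_counts.values()]
    print(feature, round(weighted_gini(subsets), 3))

# outlook 0.343, temperature 0.44, humidity 0.367, wind 0.429
# -> outlook has the lowest weighted Gini, so it is chosen for the first split.
```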
Outlook
Outlook is a nominal feature. It can be sunny, overcast, or rain. We summarize
the play decisions for each value of the outlook feature.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Temperature
Temperature is a nominal feature and can take 3 different values: cool, hot,
and mild. Let's summarize the decisions for the temperature feature.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Humidity
Humidity is a binary feature. It can be high or normal.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Wind
Wind is a binary feature, similar to humidity. It can be weak or strong.

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split: Outlook

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The first split: Outlook

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Recursive Partitioning
A sub-dataset

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
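A continuation of the earlier sketch: restrict attention to the outlook = sunny sub-dataset (5 rows) and re-evaluate the remaining features with the same weighted_gini() helper (counts as commonly published for this dataset).

```python
# outlook = sunny sub-dataset: feature -> {value: (num_yes, num_no)}
sunny_subset = {
    "temperature": {"hot": (0, 2), "mild": (1, 1), "cool": (1, 0)},
    "humidity":    {"high": (0, 3), "normal": (2, 0)},
    "wind":        {"weak": (1, 2), "strong": (1, 1)},
}

for feature, value_counts in sunny_subset.items():
    subsets = [["yes"] * yes + ["no"] * no for yes, no in value_counts.values()]
    print(feature, round(weighted_gini(subsets), 3))

# temperature 0.2, humidity 0.0, wind 0.467
# -> humidity yields pure children, so the sunny branch splits on humidity.
```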
Outlook sunny & Temperature

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Humidity

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Outlook sunny & Wind

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The second split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The second split

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
The final tree

https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Decision Tree Overfitting
● Pre-Pruning
○ Maximum number of leaf nodes
○ Maximum depth of the tree
○ Minimum number of training
instances at a leaf node
● Post-Pruning
○ Another strategy to avoid
overfitting in decision trees is to
first grow a full tree, and then
prune it based on a previously
held-out validation dataset.
○ Or use statistical tests
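As an illustration, scikit-learn exposes pre-pruning through constructor constraints and post-pruning through cost-complexity pruning. A minimal sketch, assuming X_train/y_train and a held-out X_val/y_val already exist; the parameter values are purely illustrative.

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop the tree from growing past fixed limits.
pre_pruned = DecisionTreeClassifier(
    max_leaf_nodes=20,      # maximum number of leaf nodes
    max_depth=5,            # maximum depth of the tree
    min_samples_leaf=10,    # minimum number of training instances at a leaf
).fit(X_train, y_train)

# Post-pruning: grow a full tree, compute the cost-complexity pruning path,
# then keep the pruned tree that scores best on the held-out validation set.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
post_pruned = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val),
)
```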
Tree Pruning: Validation Set
● Prune using a hold out validation dataset

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
Detecting Useless Splits
● Try a chi-square test
● Check the test statistic to see whether the split achieves a statistically
significant gain
● Is the split any better than an arbitrary (random) split?

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
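One way to sketch such a check is with scipy's chi-square test of independence on the contingency table of (child node × class) counts; the counts below are made-up illustrative numbers.

```python
from scipy.stats import chi2_contingency

# Contingency table for a candidate split: rows are the two child nodes,
# columns are the class counts within each child (illustrative numbers).
observed = [[18,  2],    # left child:  18 positive,  2 negative
            [ 5, 15]]    # right child:  5 positive, 15 negative

chi2, p_value, dof, _ = chi2_contingency(observed)
if p_value < 0.05:
    print("split is statistically significant, keep it")
else:
    print("split is no better than an arbitrary one, prune it")
```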
Detecting Useless Splits

https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15381-s06/www/DTs2.pdf
Decision Tree
Pros
● Interpretable and simple
● Handles all types of data
● Handles missing values
● Less pre-processing required
● Fast computation
● Non-parametric

Cons
● Finding an optimal tree is NP-complete (greedy heuristics are used in practice)
● Not stable
● Often overfits
● High bias
● Not suitable for unstructured data
Multi-Variable Split?
