
Decision Trees

Midsem Exam
Feb 22 (Thu), 6PM, L16,17,18,19,20
Only for registered students (regular + audit)
Assigned seating – will be announced soon
Open notes (handwritten only)
No mobile phones, tablets etc
Bring your institute ID card
If you don’t bring it, you may have to spend precious time during the exam getting verified separately
Syllabus:
All videos, slides, code linked on the course discussion page (link below) till 21 Feb 2024 (Wed)
https://www.cse.iitk.ac.in/users/purushot/courses/ml/2023-24-w/discussion.html
See GitHub for practice questions
Doubt Clearing and Practice Session
Feb 21, 2024 (Wed), 11PM, Online
Exact timing and meeting link TBA
Solve previous years' questions
Clear doubts
Building Decision Trees
[Figure: a decision tree over 2-D data with features X and Y. The root asks "X < 5.5?" (Yes goes left, No goes right); internal nodes ask "Y > 10.5", "Y > 9", "X < 11.5", "X < 8.5", "Y > 2.5", "Y > 5.5" and "X < 12"; the remaining nodes are leaves. An accompanying scatter plot (axes running 1–12 and 1–14) shows the data being partitioned.]
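A minimal sketch (not from the slides; the `Node` layout, field names and the toy labels are my own) of how such a tree can be stored and how a test point is routed to a leaf:

```python
# A node either asks "is x[feature] < threshold?" (internal) or stores a label (leaf)
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # index of the feature tested at this node
        self.threshold = threshold  # go left if x[feature] < threshold, else go right
        self.left = left
        self.right = right
        self.label = label          # prediction stored at a leaf (None for internal nodes)

def predict(node, x):
    """Send a test point down the tree until it reaches a leaf."""
    while node.label is None:       # still at an internal node
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label

# e.g. a tiny tree whose root asks "X < 5.5?" as in the figure above
root = Node(feature=0, threshold=5.5,
            left=Node(label="left class"), right=Node(label="right class"))
print(predict(root, [3.0, 7.0]))    # 3.0 < 5.5, so we land in the left leaf
```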
Decision Trees – all shapes and sizes
This DT is balanced – all leaf nodes are at the same depth from the root. The previous DT was very imbalanced, which is generally considered bad
May prune the tree to make it more shallow as well. Also possible to have a DT with more than 2 children per internal node
Imbalanced DTs may offer very poor prediction accuracy as well as take as long as kNN to make a prediction. Imagine a DT which is just a chain of nodes: with n data points the chain could be n nodes deep, so some predictions will take O(n) time. With a balanced DT, every prediction takes at most O(log n) time
Regression with Decision Trees
To perform real-valued regression, we may simply use the average score of the training points at a leaf node to predict scores for test data points
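A quick illustration of this average-at-the-leaf rule (the helper name `leaf_value` is mine, assuming NumPy is available):

```python
import numpy as np

# a regression leaf just remembers the average target of the training points
# that reached it and predicts that constant for any test point routed there
def leaf_value(targets_at_leaf):
    return float(np.mean(targets_at_leaf))

print(leaf_value([2.0, 3.0, 7.0]))   # 4.0
```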
How to learn a DT?
How many children should a node have?
How to send data points to children?
When to stop splitting and make the node a leaf?
What to do at a leaf?
How many trees to train?

What to do at a leaf?
Can take any (complicated) action at a leaf
Cheapest thing to do would be to store the majority color (label) at a leaf. A slightly more informative (more expensive as well) thing can be to store how many training points of each color (label) reached that leaf
Why not call another machine learning algorithm?
For speed, keep leaf action simple
Simplest action – constant prediction
Such a DT will encode a piecewise constant
prediction function
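A small sketch of the two cheap leaf actions mentioned above (function names are my own):

```python
from collections import Counter

def majority_label(labels_at_leaf):
    """Cheapest leaf action: store (and predict) only the majority label."""
    return Counter(labels_at_leaf).most_common(1)[0][0]

def label_counts(labels_at_leaf):
    """Slightly more informative (and more expensive): store all label counts."""
    return dict(Counter(labels_at_leaf))

leaf = ["red", "red", "blue"]
print(majority_label(leaf))   # 'red'  -> constant prediction
print(label_counts(leaf))     # {'red': 2, 'blue': 1}
```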
How to split a node into children nodes?
In principle there is no restriction (e.g. we could even use a deep net to split a node). However, in practice we use a simple ML algorithm, such as a linear classifier, to split nodes. This is because the usefulness of DTs largely comes from being able to rapidly send a test data point to a leaf
Notice, splitting a node is a classification problem in itself! Binary if two children, multiclass if more than 2 children
Can we use any classification technique to split a node or are there some restrictions?
Oh! So we are using a simple ML technique such as binary classification to learn a DT!
Splitting a Node – some lessons
Various notions of purity exist – entropy and Gini index for classification problems, variance for regression problems
Node splitting algorithm must be fast, else DT predictions will be slow
Making sure that the split is balanced (e.g. roughly half the data points go left and half go right) is also important to ensure that the tree is balanced. However, ensuring balance is often tricky
Often people carefully choose just a single feature and split a node based on that (e.g. age < 25 go left, age ≥ 25 go right). Such "simple classifiers" are often called decision stumps
Pure nodes are very convenient. We can make them leaves right away and not have to worry about splitting them 🙂
A child node is completely pure if it contains training data of only one class.
How do I decide whether to use age or gender? Even if using age, how do I decide whether to threshold at 25 or 65?
Usually, people go over all available features and all possible thresholds (can be slow if not done cleverly) and choose a feature and a threshold for that feature so that the child nodes that are created are as pure as possible
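The purity notions named above, written out in minimal Python (function names are mine, assuming NumPy is available):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Entropy of a set of class labels, in bits."""
    n = len(labels)
    p = np.array([c / n for c in Counter(labels).values()])
    return float(np.sum(p * np.log2(1.0 / p)))

def gini(labels):
    """Gini index: chance that two random draws from the set disagree."""
    n = len(labels)
    p = np.array([c / n for c in Counter(labels).values()])
    return float(1.0 - np.sum(p ** 2))

def variance(targets):
    """Variance of real-valued targets -- the regression analogue of impurity."""
    return float(np.var(targets))

print(entropy(["a", "a", "b", "b"]))   # 1.0 bit: a maximally impure two-class set
print(gini(["a", "a", "a", "a"]))      # 0.0: a pure node
```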
Purifying Decision Stumps
[Figure: two candidate stumps on the same 2-D data – the purest horizontal split and the purest vertical split, each shown with its Left and Right children.]
Search for the purest horizontal split by going over all possible thresholds and checking the purity of the resulting children; do the same in the vertical direction
Two possible splitting directions. Let us choose the one that gives us purer children – the purest vertical split is more pure, so use it!
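A brute-force version of this search, i.e. try every feature and every observed threshold and keep the stump whose children have the lowest weighted entropy (a sketch with names of my own choosing, not the course code):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((c / n) * np.log2(n / c) for c in Counter(labels).values())

def best_stump(X, y):
    """Try every feature and every observed threshold; keep the stump whose
    two children have the lowest weighted entropy (i.e. are purest)."""
    n, d = X.shape
    best_feature, best_threshold, best_score = None, None, float("inf")
    for f in range(d):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] < t], y[X[:, f] >= t]
            if len(left) == 0 or len(right) == 0:
                continue                      # not a real split
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if score < best_score:
                best_feature, best_threshold, best_score = f, t, score
    return best_feature, best_threshold, best_score

# toy data: a single "age" feature, threshold 30 purifies both children
X = np.array([[20.0], [22.0], [30.0], [40.0]])
y = np.array(["young", "young", "old", "old"])
print(best_stump(X, y))   # feature 0, threshold 30.0, weighted entropy 0.0
```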
Node splitting via linear classifiers
One Final Recap
Pruning Strategies
Stop if node is pure or almost pure
Stop if all features exhausted – avoid using a feature twice on a path
Limits depth of tree to d (the number of dimensions)
Can stop if a node is ill-populated i.e. has few training points
Can also (over) grow a tree and then merge nodes to shrink it
Merge two leaves and see if it worsens performance on the
validation set or not – rinse and repeat
Use a validation set to make these decisions (never touch test set)
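A rough sketch of the grow-then-merge idea, assuming leaves store label counts so that two sibling leaves can be merged; the `Node` layout and helper names are my own, not from the slides:

```python
from collections import Counter

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, counts=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.counts = counts                   # Counter of training labels (leaves only)

    def is_leaf(self):
        return self.counts is not None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.counts.most_common(1)[0][0]    # majority label at the leaf

def accuracy(root, X_val, y_val):
    return sum(predict(root, x) == y for x, y in zip(X_val, y_val)) / len(y_val)

def nodes_with_two_leaf_children(node):
    if node is None or node.is_leaf():
        return []
    here = [node] if node.left.is_leaf() and node.right.is_leaf() else []
    return here + nodes_with_two_leaf_children(node.left) + nodes_with_two_leaf_children(node.right)

def prune(root, X_val, y_val):
    """Merge a pair of sibling leaves whenever doing so does not hurt validation
    accuracy; rinse and repeat until no merge survives the check."""
    merged = True
    while merged:
        merged = False
        for node in nodes_with_two_leaf_children(root):
            before = accuracy(root, X_val, y_val)
            left, right = node.left, node.right
            node.counts, node.left, node.right = left.counts + right.counts, None, None
            if accuracy(root, X_val, y_val) < before:   # merge made things worse: undo
                node.counts, node.left, node.right = None, left, right
            else:
                merged = True
```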
Decision Trees - Lessons
Very fast at making predictions (if tree is reasonably balanced)
Can handle discrete data (even non-numeric data) as well – e.g. can have a stump such as: blood group AB or O go left, else go right
SVM, RR etc. have difficulty with such non-numeric and discrete data since it is difficult to define distances and averages with them (however, there are workarounds to do SVM etc. with discrete data as well)
Tons of DT algorithms exist – both classical (ID3, C4.5) as well as recent (GBDT, LPSR, Parabel) – DTs are versatile and very useful
Reason: DT learning is an NP-hard problem 🙁 so everyone relies on heuristics. If you think you have a better way of splitting nodes or handling leaf actions, it may well become yet another DT algorithm
Uncertainty
Playing Hangman: to win at hangman, we must ask questions that eliminate wrong answers quickly
Imagine a language where the letter "#" appears in every word. Guessing # gives us no useful information
[Figure: starting from 4096 candidate words, each good question halves the set – 4096, then 2048 and 2048, then four sets of 1024, … 10 levels later … sets of size 1.]
Similarly, very rare letters are not very good either – they will occasionally help us identify the word very quickly but will mostly cause us to make a mistake.
Uncertainty Reduction – Hangman
Start State: amid, baby, back, bake, bike, book, bump, burn, cave, chip, cook, damp, duck, dump, fade, good, have, high, hook, jazz, jump, kick, maid, many, mind, monk, much, must, paid, pain, park, pick, pine, pipe, pond, pony, pump, push, quick, quid, quit, sail, same, save, sight, size, stay, study, stuff, suffer, sway, tail, twin, wage, wake, wall, warn, wave, weak, wear, whip, wife, will, wind, wine, wing, wipe, wise, wish, with, wood, wound, year
Goal States: singleton sets such as {amid}, {hook}, {sail}, …, {year}
Good Question: splits the words into two roughly equal halves (amid … pony on one side, pump … year on the other)
Bad Question: splits off a single word (amid) and leaves all the remaining words (baby … year) together
Uncertainty Reduction – Classification
I can see that we wish to go from an uncertain start state to goal states where we are certain about a prediction – but how we define a good question is still a bit vague
[Figure: the start state / goal states / good question / bad question picture from the previous slide, redrawn for a classification dataset.]
Entropy is a measure of Uncertainty
(Notions of entropy exist for real-valued cases as well but they involve probability density functions, so we skip those for now)
If we have a set of $n$ words, then that set has an entropy of $\log_2 n$
Larger sets have larger entropy and a set with a single word has entropy $0$. Makes sense, since we have no uncertainty if only a single word is possible
More generally, if there is a set of $n$ elements of $k$ types with $n_i$ elements of type $i$, then its entropy is defined as
$$H = -\sum_{i=1}^{k} p_i \log_2 p_i$$
where $p_i = n_i / n$ is the proportion of elements of type $i$ (or class $i$ in multiclass cases)
The earlier example is a special case where each word is its own "type" i.e., there are $n$ "types" with $n_i = 1$ for all $i$
A pure set (all elements of a single class) has entropy $0$ whereas a set with the same number of elements of each of $k$ classes has entropy $\log_2 k$
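A quick numerical check of these definitions (the helper name is mine; it computes $\sum_i p_i \log_2(1/p_i)$, which is the same quantity as above):

```python
import math
from collections import Counter

def entropy(items):
    """H = sum_i p_i * log2(1/p_i); equals log2(n) when all n items are distinct."""
    n = len(items)
    return sum((c / n) * math.log2(n / c) for c in Counter(items).values())

print(entropy(["amid", "baby", "back", "bake"]))  # 4 distinct words -> log2(4) = 2.0 bits
print(entropy(["A", "A", "A", "A"]))              # pure set -> 0.0
print(entropy(["A", "A", "B", "B"]))              # two equally common classes -> 1.0 bit
```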
What is a good question?
No single criterion – depends on the application
ID3 (Iterative Dichotomiser 3 by Ross Quinlan) suggests that a good question is one that reduces confusion the most i.e., reduces entropy the most
Suppose asking a question splits a set $S$ into subsets $S_1, \ldots, S_k$
Note that $S_i \cap S_j = \emptyset$ if $i \neq j$ and $\bigcup_i S_i = S$
Let us denote $n_i = |S_i|$ and $n = |S|$ – note that $\sum_i n_i = n$
Then the entropy of this collection of sets is defined to be
$$H(S_1, \ldots, S_k) = \sum_{i=1}^{k} \frac{n_i}{n} \cdot H(S_i)$$
Can interpret this as "average" or "weighted" entropy since a fraction $\frac{n_i}{n}$ of the points will land up in the set $S_i$ where the entropy is $H(S_i)$
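A short sketch of this weighted entropy and the resulting information gain for a split of labelled points (function names are mine):

```python
import math
from collections import Counter

def entropy(items):
    n = len(items)
    return sum((c / n) * math.log2(n / c) for c in Counter(items).values())

def weighted_entropy(subsets):
    """H(S_1, ..., S_k) = sum_i (n_i / n) * H(S_i)."""
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * entropy(s) for s in subsets)

def information_gain(parent, subsets):
    return entropy(parent) - weighted_entropy(subsets)

# splitting a perfectly mixed set into two pure halves gains exactly 1 bit
parent = ["A", "A", "B", "B"]
print(information_gain(parent, [["A", "A"], ["B", "B"]]))   # 1.0
```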
A good question for Hangman
The definition of entropy/information and why these were named as such is a bit unclear, but they have several interesting properties e.g., if our first question halves the set of words (1 bit of info) and the next question further quarters the remaining set (2 bits of info), then we have 3 bits of info and our set size has gone down by a power of $2^3 = 8$ i.e., information defined this way can be added up!
Suppose a question splits a set of 4096 words into (2048, 2048)
Old entropy was $\log_2 4096 = 12$ bits
New entropy is $\frac{2048}{4096}\log_2 2048 + \frac{2048}{4096}\log_2 2048 = 11$ bits
Entropy reduced by $1$, so we say we gained $1$ bit of information
Suppose a question splits the set into (1024, 1024, 1024, 1024)
New entropy is $\log_2 1024 = 10$ bits
Gained $2$ bits of information – makes sense – each set is smaller
Suppose a question splits the set into (16, 64, 4016)
New entropy is $\frac{16}{4096}\log_2 16 + \frac{64}{4096}\log_2 64 + \frac{4016}{4096}\log_2 4016 \approx 11.85$ bits
We gained only about $0.15$ bits of information 🙁
Yup! In fact, there is a mathematical proof that the definition of entropy we used is the only definition that satisfies 3 intuitive requirements. Suppose an event occurs with probability $p$ and we wish to measure the information $I(p)$ from that event's occurrence s.t.
1. A sure event conveys no information i.e., $I(1) = 0$
2. The more common the event, the less information it conveys i.e., $I(p) \leq I(q)$ if $p \geq q$
3. The information conveyed by two independent events adds up i.e., $I(pq) = I(p) + I(q)$
I see … the only definition of $I$ that satisfies all three requirements is $I(p) = -\log_b p$ for some base $b$. We then define entropy as $H = -\sum_i p_i \log_b p_i$. If we choose base $2$ we get information in "bits" (binary digits). If we choose base $e$ we get information in "nits" (natural digits) aka nats. If we choose base $10$ we get information in "dits" (decimal digits) aka hartleys
The ID3 Algorithm
Given a test data point, we go down the tree using
the splitting criteria till we reach a leaf where we
use the leaf action to make our prediction

With $S$ as the set of all training points, create a root node $r$ and call train($r$, $S$)
Train(node $n$, set $S$):
If $S$ is sufficiently pure or sufficiently small, make $n$ a leaf, decide a simple leaf action (e.g., most popular class, label popularity vector, etc.) and return
Else, out of the available choices, choose the splitting criterion (e.g. a single feature) that causes maximum information gain i.e., reduces entropy the most
Split along that criterion to get a partition of $S$ into $S_1, \ldots, S_k$ (e.g. $k$ children if that feature takes $k$ distinct values)
Create child nodes $n_1, \ldots, n_k$ and call train($n_i$, $S_i$) for each
There are several augmentations to this algorithm e.g. C4.5, C5.0 that allow handling real-valued features, missing features, boosting etc.
Note: ID3 will not ensure a balanced tree but usually balance is decent
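A compact sketch of the Train() recursion above for purely categorical features (the data layout and names are mine; real implementations such as C4.5 add thresholds for numeric features, pruning, etc.):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def id3(rows, labels, features):
    """rows: list of dicts mapping feature name -> categorical value."""
    # stopping rule: pure node (or no features left) -> leaf with the most popular class
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # choose the feature with maximum information gain (= minimum weighted entropy)
    def weighted_entropy(f):
        total = 0.0
        for v in set(r[f] for r in rows):
            sub = [lab for r, lab in zip(rows, labels) if r[f] == v]
            total += len(sub) / len(rows) * entropy(sub)
        return total
    best = min(features, key=weighted_entropy)
    # split into one child per distinct value of that feature and recurse
    children = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        children[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [f for f in features if f != best])
    return {"feature": best, "children": children}

# toy categorical data (values are made up, not from the slides)
rows = [{"grp": "AB"}, {"grp": "O"}, {"grp": "A"}, {"grp": "B"}]
labels = ["left", "left", "right", "right"]
print(id3(rows, labels, ["grp"]))
```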
Careful use of DTs
DTs can be tweaked to give very high training accuracies
Can badly overfit to training data if grown too large
Choice of decision stumps is critical
PUF problem: a single linear model works
DT will struggle and eventually overfit if we insist that questions used to split the DT nodes use a single feature
However, if we allow node questions to be a general linear model, the root node itself can purify the data completely 🙂
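A hedged illustration of this last point on synthetic data (assuming scikit-learn is available; the data is only a toy stand-in for the PUF responses, labelled by the sign of a single linear function):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the PUF setting: the label is the sign of one linear function
# of many features, so a single linear "question" separates the data exactly,
# while single-feature (axis-aligned) questions each carry very little information.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 20))
w = rng.standard_normal(20)
y = (X @ w > 0).astype(int)
X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

shallow_tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)
linear_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("axis-aligned tree, depth 3:", shallow_tree.score(X_te, y_te))
print("single linear model       :", linear_model.score(X_te, y_te))
# the tree must grow much deeper (and starts to overfit) before it catches up
```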
