Ds 6
Midsem Exam
Feb 22 (Thu), 6PM, L16,17,18,19,20
Only for registered students (regular + audit)
Assigned seating – will be announced soon
Open notes (handwritten only)
No mobile phones, tablets, etc.
Bring your institute ID card
If you don’t bring it, you may have to spend precious time during the exam getting verified separately
Syllabus:
All videos, slides, and code linked on the course discussion page (link below) till 21 Feb 2024 (Wed)
https://www.cse.iitk.ac.in/users/purushot/courses/ml/2023-24-w/discussion.html
See GitHub for practice questions
Doubt Clearing and Practice Session
Feb 21, 2024 (Wed), 11PM, Online
Exact timing and meeting link TBA
Solve previous years’ questions
Clear doubts
Building Decision Trees
[Figure: a decision tree built on 2D data with features X and Y. The root node asks "X < 5.5?"; internal nodes ask further threshold questions such as "Y > 10.5?", "Y > 9?", "X < 11.5?", "X < 8.5?", "Y > 2.5?", "Y > 5.5?", and "X < 12?"; leaf nodes hold the final decisions.]
Decision Trees – all shapes and sizes
This DT is balanced – all leaf nodes are at the same depth from the root
The previous DT was very imbalanced and considered bad in general
May prune the tree to make it more shallow as well. Also possible to have a DT with more than 2 children per internal node
What to do at a leaf?
Uncertainty
“#” appears in every word. Guessing # gives us no useful information
To win at hangman, we must ask questions that eliminate wrong answers quickly
[Figure: a set of 4096 words repeatedly halved by yes/no questions: 4096 at the root, then 2048 and 2048, then four sets of 1024, and so on down to singleton sets of 1 word each.]
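A quick sanity check on the numbers in this figure (the arithmetic is mine, not from the slide): each perfectly halving yes/no question cuts the candidate set in half, so going from 4096 words down to a single word takes $\log_2 4096 = \log_2 2^{12} = 12$ questions, which matches the depth of the halving tree shown.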
Uncertainty Reduction – Hangman
Start State: amid, baby, back, bake, bike, book, bump, burn, cave, chip, cook, damp, duck, dump, fade, good, have, high, hook, jazz, jump, kick, maid, many, mind, monk, much, must, paid, pain, park, pick, pine, pipe, pond, pony, pump, push, quick, quid, quit, sail, same, save, sight, size, stay, study, stuff, suffer, sway, tail, twin, wage, wake, wall, warn, wave, weak, wear, whip, wife, will, wind, wine, wing, wipe, wise, wish, with, wood, wound, year
Good Question: splits the set into two roughly equal halves:
{amid, baby, back, bake, bike, book, bump, burn, cave, chip, cook, damp, duck, dump, fade, good, have, high, hook, jazz, jump, kick, maid, many, mind, monk, much, must, paid, pain, park, pick, pine, pipe, pond, pony}
{pump, push, quick, quid, quit, sail, same, save, sight, size, stay, study, stuff, suffer, sway, tail, twin, wage, wake, wall, warn, wave, weak, wear, whip, wife, will, wind, wine, wing, wipe, wise, wish, with, wood, wound, year}
Bad Question: splits off just a single word:
{amid}
{baby, back, bake, bike, book, bump, burn, cave, chip, cook, damp, duck, dump, fade, good, have, high, hook, jazz, jump, kick, maid, many, mind, monk, much, must, paid, pain, park, pick, pine, pipe, pond, pony, pump, push, quick, quid, quit, sail, same, save, sight, size, stay, study, stuff, suffer, sway, tail, twin, wage, wake, wall, warn, wave, weak, wear, whip, wife, will, wind, wine, wing, wipe, wise, wish, with, wood, wound, year}
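To make the good-vs-bad contrast concrete, here is a minimal sketch (my own illustration, not from the slides; the toy word subset and the two predicates are assumptions chosen for illustration) that partitions the candidate set with two different yes/no questions and compares how much each one shrinks it:

```python
# Sketch: compare two hangman questions by how evenly they split the candidate set.
words = ["amid", "baby", "back", "bake", "bike", "book", "bump", "burn",
         "cave", "chip", "cook", "damp", "duck", "dump", "fade", "good"]  # toy subset

def split(words, question):
    """Partition words by a yes/no question (a boolean predicate)."""
    yes = [w for w in words if question(w)]
    no = [w for w in words if not question(w)]
    return yes, no

# A "good" question splits the set roughly in half ...
good_yes, good_no = split(words, lambda w: "a" in w)
# ... while a "bad" question isolates almost nothing.
bad_yes, bad_no = split(words, lambda w: w == "amid")

print(len(good_yes), len(good_no))  # roughly balanced halves
print(len(bad_yes), len(bad_no))    # 1 vs. everything else
```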
Uncertainty Reduction – Classification
I can see that we wish to go from an uncertain start state to goal states where we are certain about a prediction – but how we define a good question is still a bit vague
[Figure: the classification analogue of the hangman picture: an uncertain Start State, several Goal States where the prediction is certain, and examples of a Good Question and a Bad Question.]
Entropy is a measure of Uncertainty
(Notions of entropy exist for real-valued cases as well, but they involve probability density functions, so skipping for now)
If we have a set of $n$ words, then that set has an entropy of $\log_2 n$
Larger sets have larger entropy, and a set with a single word has entropy $\log_2 1 = 0$
Makes sense since we have no uncertainty if only a single word is possible
More generally, if there is a set of $n$ elements of $k$ types, with $n_i$ elements of type $i$, then its entropy is defined as $\sum_{i=1}^{k} \frac{n_i}{n} \log_2 \frac{n}{n_i}$
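As a concrete illustration of this definition (a minimal sketch, not from the course material; the function name and the example counts are mine), the entropy of a set can be computed directly from the per-type counts:

```python
import math

def entropy(counts):
    """Entropy (in bits) of a set with counts[i] elements of type i."""
    n = sum(counts)
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)

print(entropy([8]))     # 0.0   - a pure set has no uncertainty
print(entropy([4, 4]))  # 1.0   - an even two-way split is maximally uncertain
print(entropy([7, 1]))  # ~0.54 - a lopsided split is almost pure
```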
With $S$ as the set of all train points, create a root node $r$ and call train($r$, $S$)
Train(node $n$, set $S$)
If $S$ is sufficiently pure or sufficiently small, make $n$ a leaf, decide a simple leaf action (e.g., most popular class, label popularity vector, etc.) and return
Else, out of the available choices, choose the splitting criterion (e.g. a single feature) that causes maximum information gain, i.e., reduces entropy the most
Split along that criterion to get a partition $S_1, \ldots, S_k$ of $S$ (e.g. $k$ parts if that feature takes $k$ distinct values)
Create child nodes $n_1, \ldots, n_k$ and call train($n_i$, $S_i$) for each $i$ (see the code sketch below)
There are several augmentations to this algorithm, e.g. C4.5, C5.0, that allow handling real-valued features, missing features, boosting, etc.
Note: ID3 will not ensure a balanced tree but usually balance is decent
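A compact sketch of this procedure for categorical features (a minimal illustration, not the course's code: the dict-based tree representation, the `min_size` stopping rule, and the most-popular-class leaf action are my own simplifications):

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def train(X, y, min_size=2):
    """Grow an ID3-style tree; X is a list of dicts of categorical features, y the labels."""
    # Leaf if the set is pure, small, or unsplittable: use a simple action (most popular class).
    candidates = [f for f in X[0] if len(set(x[f] for x in X)) > 1]
    if len(set(y)) == 1 or len(y) <= min_size or not candidates:
        return {"leaf": Counter(y).most_common(1)[0][0]}
    # Choose the feature whose split has the lowest weighted entropy,
    # i.e., the one with maximum information gain.
    def split_entropy(f):
        return sum(cnt / len(y) *
                   entropy([y[i] for i, x in enumerate(X) if x[f] == v])
                   for v, cnt in Counter(x[f] for x in X).items())
    best = min(candidates, key=split_entropy)
    # One child per distinct value of the chosen feature.
    children = {}
    for v in set(x[best] for x in X):
        idx = [i for i, x in enumerate(X) if x[best] == v]
        children[v] = train([X[i] for i in idx], [y[i] for i in idx], min_size)
    return {"feature": best, "children": children}

def predict(node, x):
    while "leaf" not in node:
        node = node["children"][x[node["feature"]]]
    return node["leaf"]

# Toy usage: the single feature "windy" purifies this data completely.
X = [{"outlook": "sunny", "windy": "no"},  {"outlook": "rain", "windy": "yes"},
     {"outlook": "sunny", "windy": "yes"}, {"outlook": "rain", "windy": "no"}]
y = ["play", "stay", "stay", "play"]
tree = train(X, y, min_size=1)
print(predict(tree, {"outlook": "sunny", "windy": "no"}))  # play
```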
Careful use of DTs
DTs can be tweaked to give very high training accuracies
Can badly overfit to training data if grown too large
Choice of decision stumps is critical
PUF problem: a single linear model works
DT will struggle and eventually overfit if we insist that questions used to split the DT nodes use a single feature
However, if we allow node questions to be a general linear model, the root node itself can purify the data completely (sketched below)
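To make the last point concrete, here is a small sketch (my own construction; the synthetic data, the margin filter, and the perceptron learner are illustrative assumptions, not the PUF setup from the course): on data whose label depends on a linear combination of features, the best axis-aligned root question stays impure, while a learned linear question sign(w·x + b) purifies both children at the root.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: the class depends on x0 + x1, so no single feature separates it.
X = rng.uniform(-1, 1, size=(300, 2))
X = X[np.abs(X[:, 0] + X[:, 1]) > 0.2]          # keep a margin so separation is clean
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

def best_stump_accuracy(X, y):
    """Best training accuracy achievable with one axis-aligned threshold question."""
    best = 0.0
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            pred = np.where(X[:, f] > t, 1, -1)
            best = max(best, np.mean(pred == y), np.mean(-pred == y))
    return best

# A linear-model question sign(w.x + b), learned here with a plain perceptron.
w, b = np.zeros(2), 0.0
for _ in range(100):
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:
            w, b = w + yi * xi, b + yi

print("best single-feature root question:", best_stump_accuracy(X, y))        # < 1.0
print("linear-model root question:", np.mean(np.sign(X @ w + b) == y))        # 1.0
```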