Decision Tree in ML

The document explains the concept of Decision Trees (DT) as a hierarchical structure used for making predictions based on rules derived from training data. It covers the process of constructing DTs, including how to split nodes, assess purity using metrics like entropy and information gain, and strategies to avoid overfitting. Key strengths and weaknesses of DTs are discussed, along with hyperparameters that can be tuned to optimize their performance.


Learning using Decision Trees

Decision Trees
 A Decision Tree (DT) defines a hierarchy of rules to make a prediction
[Figure: an example DT. The root node tests "Body temp." (warm vs. cold); the "cold" branch ends in a leaf node predicting Non-mammal; the "warm" branch leads to an internal node testing "Gives birth" (yes/no), whose leaf nodes predict Mammal (yes) and Non-mammal (no).]
 Root and internal nodes test rules. Leaf nodes make predictions
A decision tree friendly problem
Loan approval prediction
Learning Decision Trees with Supervision
 The basic idea is very simple
 Recursively partition the training data into homogeneous regions
What do you mean by "homogeneous" regions? A homogeneous region will have all (or a majority of) training inputs with the same/similar outputs
Even though the rule within each group is simple, we are able to learn a fairly sophisticated model overall (note that in this example, each rule is a simple horizontal/vertical classifier but the overall decision boundary is rather sophisticated)
 Within each group, fit a simple supervised learner (e.g., predict the majority label of the training inputs in that region)
Decision Trees for Classification
[Figure: a 2-D training set of red and green points (Feature 1 and Feature 2), partitioned by a depth-2 DT. Root: x1 > 3.5? The NO branch tests x2 > 2? (NO: Predict Red, YES: Predict Green); the YES branch tests x2 > 3? (NO: Predict Green, YES: Predict Red). A test input falls into one of the four resulting regions.]
 Remember: the root node contains all training inputs; each leaf node receives a subset of the training inputs
 DT is very efficient at test time: to predict the label of a test point, nearest neighbors would require computing distances from all 48 training inputs, whereas the DT predicts the label by doing just 2 feature-value comparisons! Way faster!
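The following is a small illustrative sketch (my own, not from the slides): it fits a depth-2 scikit-learn decision tree on a synthetic 2-D dataset labelled with axis-aligned rules similar to the figure above, then prints the learned splits. The data, random seed, and feature names are assumptions made for the example.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.uniform(low=[1, 1], high=[6, 5], size=(48, 2))      # 48 synthetic training inputs
# Label points "red" (1) vs "green" (0) using axis-aligned rules like the figure's
red = np.where(X[:, 0] > 3.5, X[:, 1] > 3, X[:, 1] <= 2)
y = red.astype(int)

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf, feature_names=["x1", "x2"]))          # shows the learned axis-aligned splits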
Decision Trees for Classification: Another Example
 Deciding whether to play or not to play Tennis on a Saturday
 Each input (Saturday) has 4 categorical features: Outlook, Temp., Humidity, Wind
 A binary classification problem (play vs no-play)
 Below Left: Training data, Below Right: A decision tree constructed using this data
Example credit: Tom Mitchell

Decision Trees: Some Considerations
 What should be the size/shape of the DT? (Usually, cross-validation can be used to decide the size/shape)
 Number of internal and leaf nodes
 Branching factor of internal nodes
 Depth of the tree

 Split criterion at internal nodes
 Use another classifier?
 Or maybe by doing a simpler test?
 What to do at the leaf node? Some options:
 Make a constant prediction for each test input reaching there (usually, a constant prediction at leaf nodes is used since it will be very fast at that leaf node)
 Use a nearest neighbor based prediction using the training inputs at that leaf node
 Train and predict using some other sophisticated supervised learner on that node
How to Split at Internal Nodes?
 Recall that each internal node receives a subset of all the
training inputs
 Regardless of the criterion, the split should result in as “pure”
groups as possible
 A pure group means that the majority of the inputs have the same
label/output

 For classification problems (discrete outputs), entropy is a commonly used way to measure how pure a group is
Techniques to Split at Internal Nodes
 Each internal node decides which outgoing branch an input should be sent to
 This decision/split can be done in various ways, e.g.,
 Testing the value of a single feature at a time (such an internal node is called a "Decision Stump")
With this approach, all features and all possible values of each feature need to be evaluated for selecting the feature to be tested at each internal node (can be slow but can be made faster using some tricks)
DT methods based on testing a single feature at each internal node are faster and more popular (e.g., ID3, C4.5 algos)
 Learning a classifier (e.g., LwP or some more sophisticated classifier)
DT methods based on learning and using a separate classifier at each internal node are less common, but this approach can be very powerful and is sometimes used
Constructing Decision Trees
Given some training data, what's the "optimal" DT? How to decide which rules to test for and in what order? How to assess the informativeness of a rule?
[Figure: the same 2-D classification example and its depth-2 DT (root: x1 > 3.5?, then x2 > 2? and x2 > 3?, with Red/Green predictions at the leaves).]
Hmm.. so DTs are like the "20 questions" game (ask the most useful questions first)
 The rules are organized in the DT such that the most informative rules are tested first
 Informativeness of a rule is related to the extent of the purity of the split arising due to that rule. More informative rules yield more pure splits
 In general, constructing a DT is an intractable problem (NP-hard)
 Often we can use some "greedy" heuristics to construct a "good" DT
 To do so, we use the training data to figure out which rules should be tested at each node
 The same rules will be applied on the test inputs to route them along the tree until they reach some leaf node where the prediction is made
Decision Tree Construction: An Example
 Let's consider the playing Tennis example
 Assume each internal node will test the value of one of the features
 Question: Why does it make more sense to test the feature "outlook" first?
 Answer: Of all the 4 features, it's the most informative
Entropy and Information Gain
 Assume a set S of labelled inputs from C classes, with p_c denoting the fraction of class c inputs
 Entropy of the set is defined as H(S) = − Σ_c p_c log2 p_c
Uniform sets (all classes roughly equally present) have high entropy; skewed sets have low entropy
 Suppose a rule splits S into two smaller disjoint sets S1 and S2
 Reduction in entropy after the split is called information gain: IG = H(S) − [ (|S1|/|S|) H(S1) + (|S2|/|S|) H(S2) ]
[Figure: two candidate splits of the same set; one split has a low IG (in fact zero IG), the other split has a higher IG.]
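As a small sketch (my own code, using the definitions above), the entropy of a labelled set and the information gain of a binary split can be computed as follows:

import numpy as np

def entropy(labels):
    # H(S) = -sum_c p_c log2 p_c over the class proportions in `labels`
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # IG = H(S) minus the size-weighted average entropy of the two child sets
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted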
Entropy and Information Gain
 Let's use the IG based criterion to construct a DT for the Tennis example
 At the root node, let's compute the IG of each of the 4 features
 Consider the feature "wind". The root contains all examples, S = [9+, 5−]
H(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.94
S_weak = [6+, 2−] ⇒ H(S_weak) = 0.811
S_strong = [3+, 3−] ⇒ H(S_strong) = 1
IG(S, wind) = 0.94 − (8/14) ∗ 0.811 − (6/14) ∗ 1 = 0.048
 Likewise, at the root: IG(S, outlook) = 0.246, IG(S, humidity) = 0.151, IG(S, temp) = 0.029
 Thus we choose the "outlook" feature to be tested at the root node
 Now how to grow the DT, i.e., what to do at the next level? Which feature to test next?
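The computation above can be reproduced with the entropy / information_gain helpers sketched earlier (again my own code; class labels are encoded as 1 = play, 0 = no-play):

S        = [1] * 9 + [0] * 5      # root: [9+, 5-]
S_weak   = [1] * 6 + [0] * 2      # wind = weak:   [6+, 2-]
S_strong = [1] * 3 + [0] * 3      # wind = strong: [3+, 3-]
print(round(entropy(S), 3))                              # ~0.940
print(round(information_gain(S, S_weak, S_strong), 3))   # ~0.048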
Growing the tree
 Proceeding as before, for the level 2 left node, we can verify that IG(S, temp) = 0.570, IG(S, humidity) = 0.970, IG(S, wind) = 0.019
 Thus humidity is chosen as the feature to be tested at the level 2 left node
 No need to expand the middle node (already "pure": all "yes" training examples)
 Can also verify that wind has the largest IG for the right node
 Note: If a feature has already been tested along a path earlier, we don't test it again along that path
When to stop growing the tree?
 Stop expanding a node further (i.e., make it a leaf node) when
 It consists of all training examples having the same label (the node becomes "pure"), OR
 We run out of features to test along the path to that node, OR
 The DT starts to overfit (can be checked by monitoring the validation set accuracy); stopping here helps prevent the tree from growing too much
 Important: No need to obsess too much for purity
 It is okay to have a leaf node that is not fully pure
 For test inputs that reach an impure leaf, we can predict the probability of belonging to each class (e.g., for a leaf containing 3 red and 5 green training inputs, p(red) = 3/8 and p(green) = 5/8)
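A short sketch (my own, with an assumed synthetic dataset) of the "monitor validation accuracy" idea: grow deeper and deeper trees and watch where validation accuracy stops improving even though training accuracy keeps rising.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

for depth in [1, 2, 3, 5, 8, None]:   # None = grow until nodes are pure or features run out
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 3), round(tree.score(X_val, y_val), 3))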
Avoiding Overfitting in DTs
 Desired: a DT that is not too big in size, yet fits the training data reasonably
 Note: An example of a very simple DT is a "decision stump"
 A decision stump only tests the value of a single feature (or a simple rule)
 Not very powerful in itself but often used in large ensembles of decision stumps
 Mainly two approaches to prune a complex DT (either can be done using a validation set)
 Prune while building the tree (stopping early)
 Prune after building the tree (post-pruning)
 Criteria for judging which nodes could potentially be pruned
 Use a validation set (separate from the training set)
 Prune each possible node that doesn't hurt the accuracy on the validation set
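scikit-learn does not provide the exact validation-set node-pruning described above, but its cost-complexity post-pruning can play a similar role, with the pruning strength (ccp_alpha) chosen by validation accuracy. The sketch below is my own; the dataset and split sizes are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pruning strengths computed from the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    acc = tree.score(X_val, y_val)
    if acc >= best_acc:               # among ties, prefer the more aggressive pruning
        best_alpha, best_acc = alpha, acc

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_tr, y_tr)
print(best_alpha, pruned.get_n_leaves(), best_acc)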
Decision Trees: Some Comments
 Gini-index, defined as G(S) = 1 − Σ_c p_c^2, can be an alternative to IG
 For DT regression [1], variance in the outputs can be used to assess purity
 When features are real-valued (no finite possible values to try), things are a bit more tricky
 Can use tests based on thresholding feature values (recall our synthetic data examples)
 Need to be careful w.r.t. number of threshold points, how fine each range is, etc.
 More sophisticated decision rules at the internal nodes can also be used
 Basically, need some rule that splits inputs at an internal node into homogeneous groups
 The rule can even be a machine learning classification algo (e.g., LwP or a deep learner)

[1] Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and Regression Trees.
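A small sketch (my own, assuming the definitions above): using the Gini index to pick the best threshold for a single real-valued feature at an internal node, by trying midpoints between consecutive sorted feature values. The toy data at the end is an assumption for illustration.

import numpy as np

def gini(labels):
    # G(S) = 1 - sum_c p_c^2 over the class proportions in `labels`
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature_values, labels):
    # Return the threshold whose split has the lowest size-weighted child Gini
    order = np.argsort(feature_values)
    x = np.asarray(feature_values)[order]
    y = np.asarray(labels)[order]
    best_t, best_score = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                      # identical values cannot be separated
        t = (x[i] + x[i - 1]) / 2
        left, right = y[:i], y[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = [1.0, 2.0, 3.0, 3.2, 4.0, 4.5, 5.0, 6.0]   # a 1-D real-valued feature
y = [0,   0,   0,   1,   1,   1,   1,   1]     # class labels
print(best_threshold(x, y))                    # (3.1, 0.0): a perfectly pure split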
An Illustration: DT with Real-Valued Features
[Figure: a 2-D dataset with a test example, showing the "best" (purest possible) horizontal split (up vs. down regions) and the "best" (purest possible) vertical split (left vs. right regions).]
 Between the best horizontal vs best vertical split, the vertical split is better (purer), hence we use this rule for the internal node
This illustration's credit: Purushottam Kar
Decision Trees for Regression
 At each leaf we can use any regression model, but we would like a simple one, so let's use a constant prediction based regression model
 Another simple option can be to predict the average output of the training inputs in that region
[Figure: a 1-D regression dataset (x vs. y) partitioned by a DT. Root: x > 4? One branch is a leaf with a constant prediction; the other branch applies a further threshold test on x and leads to two more constant-prediction leaves.]
 To predict the output for a test point, nearest neighbors will require computing distances from all 15 training inputs. The DT predicts the output by doing at most 2 feature-value comparisons! Way faster!
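A brief sketch (my own; the data here is synthetic and only illustrative): a regression tree on a 1-D input learns a piecewise-constant function, each piece being (by default in scikit-learn) the mean output of the training points that fall in that region.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(15, 1))                    # 15 training inputs, one real-valued feature
y = np.where(X[:, 0] > 4, 1.0, 3.0) + rng.normal(scale=0.1, size=15)

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(reg.predict([[2.0], [4.5]]))                     # one constant value per learned region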
Decision Trees: A Summary
Some key strengths:
 Simple and easy to interpret
 Nice example of the "divide and conquer" paradigm in machine learning (thus helping us learn a complex rule as a combination of several simpler rules)
 Easily handle different types of features (real, categorical, etc.)
 Very fast at test time
 Multiple DTs can be combined via ensemble methods: more powerful (e.g., Decision Forests; will see later)
 Used in several real-world ML applications, e.g., recommender systems, gaming (Kinect, human-body pose estimation)
Some key weaknesses:
 Learning the optimal DT is (NP-hard) intractable. Existing algos mostly rely on greedy heuristics
Key Hyperparameters
1. criterion
•What it does: Chooses the function to measure the quality of a split.
•Options:
• "gini" (default) — Uses Gini Impurity.
• "entropy" — Uses Entropy (from Information Gain).
•Impact: Affects how the tree decides where to split.
2. max_depth
•What it does: Sets the maximum depth of the tree.
•Impact: Controls overfitting (deep trees may overfit) and underfitting (shallow trees
may underfit).
3. min_samples_split
•What it does: Minimum number of samples required to split an internal node.
•Default: 2
•Impact: Higher values = less splits = simpler tree.
4. min_samples_leaf
•What it does: Minimum number of samples that must be in a leaf node.
•Default: 1
•Impact: Bigger values = more pruning = prevents very small leaves.
5. max_features
•What it does: Number of features to consider when looking for the best split.
•Options:
• Integer (exact count)
• Float (proportion of total features)
• "sqrt" (square root of total features, common in Random Forests)
• "log2"
• None (use all features)
6. max_leaf_nodes
•What it does: Maximum number of leaf nodes.
•Impact: Limits tree growth and simplifies model.
7. splitter
•What it does: Chooses strategy to split nodes.
•Options:
• "best": Chooses the best split.
• "random": Chooses a random split among the best candidates.
8. class_weight
•What it does: Weights assigned to each class (for handling imbalance).
•Options:
• None: All classes treated equally.
• "balanced": Weights are inversely proportional to class frequencies.

Example: setting several of these hyperparameters when constructing a DecisionTreeClassifier:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="gini",
    max_depth=5,
    min_samples_split=4,
    min_samples_leaf=2,
    max_features="sqrt",
)
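As a hedged usage sketch (my own; the Iris dataset and the particular hyperparameter values are just illustrative choices), such a configured classifier can be fitted and inspected like this:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(
    criterion="entropy",      # information-gain-style splits instead of Gini
    max_depth=3,              # cap the depth to limit overfitting
    min_samples_leaf=5,       # forbid very small leaves
    random_state=0,
).fit(X, y)
print(clf.get_depth(), clf.get_n_leaves(), clf.score(X, y))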
