Lecture 3

MACHINE LEARNING (CS 403/603)

Decision Trees; Overfitting and Underfitting; and Holdout Techniques

Dr. Puneet Gupta


Introduction
Decision Tree (DT)
● A sequence of tests.
● Representation very natural for humans.
● Style of many “How to” manuals and
trouble-shooting procedures.
● Example: Find a number between 1 and
100 by asking questions.
● DT learning is about learning such a tree
from labeled training data, i.e.,
supervised learning.
● Also known as learning by asking
questions.

Image source: New York Times April 16, 2008


Introduction
Aim: Given a person, find out its class. (The figure shows two candidate trees, DT1 and DT2.)
● What are the inputs? Gender = {Male (M), Female (F)} and Height = {1.5, …, 2.5}
● What are the outputs? Class = {Short (S), Medium (M), Tall (T)}
● The first node is called the root node, internal nodes test some attribute, edges are the values of the attributes, and external (leaf) nodes are the outcome of the prediction.
● There can be multiple DTs. Which one will you prefer?
● Optimal DT: Finding an optimal DT is an NP-complete problem. Hence, DT algorithms use heuristic approaches: they typically follow a greedy, top-down, recursive divide-and-conquer strategy to create the DT.
Introduction

[Example DT on our data: the root tests Food (great / mediocre / yuck); the “great” branch tests Speedy (yes / no), the “mediocre” branch tests Price (adequate / high), and the leaves predict yes or no.]
How to perform testing?
Computational complexity: DTs are computationally inexpensive at test time. Once a DT is known, classifying a test record is extremely fast, with a worst-case time complexity of O(d), where d is the maximum depth of the tree.
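The following is a minimal sketch of such a test-time traversal, assuming a nested-dictionary representation of a learned DT; the attribute names and the example tree are illustrative only, not part of the lecture data.

```python
def classify(tree, record):
    """Walk from the root to a leaf; the cost is O(d), where d is the tree depth."""
    while isinstance(tree, dict):               # internal node: {attribute: {value: subtree}}
        attribute, branches = next(iter(tree.items()))
        tree = branches[record[attribute]]      # follow the edge matching the record's value
    return tree                                 # leaf: the predicted class

# Hypothetical tree in the spirit of the Food/Speedy/Price example above
dt = {"Food": {"great": {"Speedy": {"yes": "yes", "no": "no"}},
               "mediocre": {"Price": {"adequate": "yes", "high": "no"}},
               "yuck": "no"}}
print(classify(dt, {"Food": "great", "Speedy": "yes", "Price": "high"}))  # -> yes
```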
Top down Greedy DT: Algorithm
Algorithm
● At the current node, select the best attribute.
● Create a descendant node for each value of that attribute; create the edges; and partition the examples accordingly.
● Repeat the above steps for each successor node until all the examples are classified correctly or there are no attributes left.
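As a rough illustration (not the exact algorithm used in the lecture), the recursion can be sketched as follows; best_attribute is assumed to be a scoring helper, such as the information-gain criterion discussed next.

```python
from collections import Counter

def grow_tree(examples, attributes, best_attribute):
    """Top-down greedy DT construction (sketch).
    examples: list of (feature_dict, label) pairs; attributes: list of attribute names."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not attributes:        # pure node, or no attributes left
        return Counter(labels).most_common(1)[0][0]    # leaf: majority class
    a = best_attribute(examples, attributes)           # greedy choice of the best attribute
    tree = {a: {}}
    for value in {x[a] for x, _ in examples}:          # one edge per observed value
        subset = [(x, y) for x, y in examples if x[a] == value]
        rest = [b for b in attributes if b != a]
        tree[a][value] = grow_tree(subset, rest, best_attribute)
    return tree
```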
Which is the best attribute?

● A statistical property called information gain measures how well a given attribute separates the training examples.
● Information gain uses the notion of entropy.
● Information gain = expected reduction in entropy.
Entropy
Entropy measures the amount of information in a random variable.
What is the entropy of rolling a die with 8 equiprobable faces?
Ans: 3 bits.
In our case, entropy measures the impurity of a collection of examples. It depends on the distribution of the random variable p.
Assume p+ and p– are the proportions of positive and negative examples in S, i.e., the training set.

● Entropy(S) ≡ – p+ log2 p+ – p– log2 p– [convention: 0 log2 0 = 0]


● Entropy ([14+, 0–]) = – 14/14 log2 (14/14) – 0 log2 (0) = 0
● Entropy ([9+, 5–]) = – 9/14 log2 (9/14) – 5/14 log2 (5/14) = 0.94
● Entropy ([7+, 7– ]) = – 7/14 log2 (7/14) – 7/14 log2 (7/14) = 1/2
+ 1/2 = 1 [log21/2 = – 1]
● Note: the log of a number < 1 is negative; since 0 ≤ p ≤ 1, the binary entropy satisfies 0 ≤ entropy ≤ 1.
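A small sketch of the entropy computation, reproducing the three worked values above (binary labels assumed):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a collection with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:                  # convention: 0 * log2(0) = 0
            result -= p * log2(p)
    return result

print(entropy(14, 0))   # 0.0   (pure collection)
print(entropy(9, 5))    # ~0.94 (mixed collection)
print(entropy(7, 7))    # 1.0   (maximally impure)
```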
Information Gain
● Information gain is the expected reduction in entropy caused by partitioning the examples on an attribute. Alternatively, it is the difference in the entropy before and after the split.
● The higher the information gain, the more effective the attribute is in classifying the training data.
● The entropies obtained after splitting are normalized, i.e., weighted by the fraction of examples that fall in each branch.
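A sketch of the information-gain computation, reusing the entropy() function from the sketch above; the (feature_dict, label) data layout and the "yes"/"no" labels are assumptions.

```python
def information_gain(examples, attribute):
    """Expected reduction in entropy from splitting examples on attribute."""
    def ent(subset):
        pos = sum(1 for _, y in subset if y == "yes")
        return entropy(pos, len(subset) - pos)
    gain = ent(examples)                                    # entropy before the split
    for value in {x[attribute] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attribute] == value]
        gain -= len(subset) / len(examples) * ent(subset)   # weighted (normalized) child entropy
    return gain
```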
Top down Greedy DT: Algorithm
Step 1
● Which attribute should be tested at the root?
Gain(S, Outlook) = 0.246; Gain(S, Humidity) = 0.151; Gain(S, Wind) = 0.084; and Gain(S, Temperature) = 0.029
● Outlook provides the best prediction; hence, partition using Outlook.
Step 2
● Working on the Outlook = Sunny node:
Gain(S_Sunny, Humidity) = 0.970 − 3/5 × 0.0 − 2/5 × 0.0 = 0.970
Gain(S_Sunny, Wind) = 0.970 − 2/5 × 1.0 − 3/5 × 0.918 = 0.019
Gain(S_Sunny, Temp.) = 0.970 − 2/5 × 0.0 − 2/5 × 1.0 − 1/5 × 0.0 = 0.570
● Humidity provides the best prediction.
Step 3
● Do nothing, as every remaining node is a leaf node.

[Resulting leaves: Outlook = Sunny, Humidity = High → {D1, D2, D8}: No; Outlook = Sunny, Humidity = Normal → {D9, D11}: Yes; Outlook = Rain, Wind = Weak → {D4, D5, D10}: Yes; Outlook = Rain, Wind = Strong → {D6, D14}: No.]
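The Sunny-branch numbers can be checked with the entropy() sketch above, assuming the standard PlayTennis counts for the five Sunny examples (2 yes / 3 no):

```python
S_sunny = entropy(2, 3)                                                 # ~0.970
gain_humidity = S_sunny - 3/5 * entropy(0, 3) - 2/5 * entropy(2, 0)     # 0.970
gain_wind     = S_sunny - 2/5 * entropy(1, 1) - 3/5 * entropy(1, 2)     # ~0.019
gain_temp     = S_sunny - 2/5 * entropy(0, 2) - 2/5 * entropy(1, 1) - 1/5 * entropy(1, 0)  # ~0.570
print(round(gain_humidity, 3), round(gain_wind, 3), round(gain_temp, 3))
```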
Top down Greedy DT: Error
New training sample
<Outlook=Sunny, Temp=Hot,
Humidity=Normal, Wind=Strong,
PlayTennis=No>

The new noisy example causes splitting of the second leaf node.

Reasons of noise:
● Some values of attributes are incorrect because of
errors in the data acquisition process or the
preprocessing phase
● The classification is wrong because of some error
Overfitting reason 1: When a model is fit too closely to the training data, it starts learning from the noise and inaccurate data entries.
[Figure: a DT on two variables gets distorted by a noise point.]
Overfitting
Overfitting reason 2: If there is a large number of attributes, ML algorithms may find meaningless regularities in the data that are irrelevant to the true, important, distinguishing features. This is worsened by a lack of data points. E.g.,
1) when predicting rain, whether you go out or not is irrelevant;
2) predicting the roll of a die using the day of the week and the colour of the die.
Overfitting means fitting the training set “too well”, so that performance on the test set degrades.
Underfitting refers to a model that can neither model the training data nor generalize to new data.
● As long as the model keeps learning, the error on the training and testing data keeps decreasing.
● If learning goes on too long, overfitting starts due to noise and less relevant attributes. Hence, the performance of the model on the test set decreases.
● For a good model, we stop at a point just before the test error starts increasing, i.e., the point where the model performs well on both the training and unseen testing datasets.
Mitigating Overfitting by Holdout Techniques
The training set is used for creating the model; the test set is used for estimating model performance and should not be used for training.
1) Naive approach: split the total examples into a training set and a test set, and use the full training set.
● Suffers from overfitting.
2) Holdout method: split the total examples into a training set, a validation set and a test set.
● Problematic for small dataset sizes.
● Since it is a single train-and-test experiment, it will be misleading if we choose an “unfortunate” split.
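A minimal sketch of a single holdout split (the 70/30 ratio is an arbitrary choice, not from the lecture):

```python
import random

def holdout_split(examples, test_fraction=0.3, seed=0):
    """Single random train/test split: the basic holdout method."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

train_set, test_set = holdout_split(list(range(100)))
print(len(train_set), len(test_set))              # 70 30
```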
Model Selection
Try different values of K in K-NN or tree depths in DT and look at the performance on the validation set. Select the value that gives the best accuracy on the validation set.
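A hedged sketch of validation-based model selection over tree depths; train_model and accuracy are hypothetical placeholders, not functions defined in the lecture.

```python
def select_model(depths, train_set, validation_set, train_model, accuracy):
    """Return the depth (or K in K-NN) with the best validation accuracy."""
    best_depth, best_acc = None, -1.0
    for depth in depths:
        model = train_model(train_set, max_depth=depth)   # fit on the training set only
        acc = accuracy(model, validation_set)             # evaluate on the held-out validation set
        if acc > best_acc:
            best_depth, best_acc = depth, acc
    return best_depth, best_acc
```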
The limitations of the holdout method can be handled with a family of re-sampling methods, at the expense of higher computational cost:
● Cross-validation using random subsampling, K-Fold Cross-Validation, Leave-One-Out Cross-Validation (LOOCV)
● Bootstrapping
Cross-validation
1) Random Subsampling
● Performs K data splits of the full training set.
● Each data split randomly selects a (fixed) number of examples without replacement as the validation set.
● For each data split i, we retrain the classifier with the remaining examples and then estimate the error Ei on the validation set.
● The true error estimate is obtained as the average of all the estimates.
2) K-Fold Cross-Validation
● Create a K-fold partition of the dataset.
● For each of the K experiments, use K−1 folds for training and the remaining fold for testing.
● It is better than random subsampling as it uses all the examples in the dataset.
3) Leave-One-Out Cross-Validation (LOOCV)
● Set K equal to the number of examples N in K-Fold cross-validation, so each validation fold contains a single example.
● Highly time expensive.

[Figure: in each experiment EXP1 … EXPK a different fold serves as the validation set; the per-experiment errors E1, E2, E3, … are averaged to obtain E_average.]
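A sketch of K-fold cross-validation; train_and_error is a hypothetical placeholder that trains on one set and returns the error on the other.

```python
def k_fold_cv(examples, k, train_and_error):
    """Each fold serves as the validation set exactly once; the errors are averaged."""
    folds = [examples[i::k] for i in range(k)]            # K disjoint partitions
    errors = []
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(train_and_error(training, validation))
    return sum(errors) / k                                # averaged error estimate
```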
Bootstrapping
Step 1: From a dataset with N examples:
1) Randomly select (with replacement) N examples and use this set for training.
2) The remaining examples that were not selected for training are used for testing. The number of such examples is likely to change from fold to fold.
Step 2: Calculate the error.
Step 3: Repeat steps 1 and 2 for a specified number of folds, K.
Step 4: As before, the true error is estimated as the average error rate on the test examples.
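A sketch of the bootstrap procedure described above; train_and_error is again a hypothetical placeholder.

```python
import random

def bootstrap_error(examples, n_rounds, train_and_error, seed=0):
    """Sample N examples with replacement for training; test on the ones left out."""
    rng = random.Random(seed)
    n, errors = len(examples), []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]            # sample indices with replacement
        training = [examples[i] for i in idx]
        out_of_bag = set(range(n)) - set(idx)                 # examples never selected for training
        testing = [examples[i] for i in out_of_bag]
        errors.append(train_and_error(training, testing))
    return sum(errors) / n_rounds                             # averaged error estimate
```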
Mitigating Overfitting by Holdout Techniques: Model Selection
[Figure: each candidate model (Model 1 … Model q) is trained by the ML algorithm on the training set and evaluated on the validation set, yielding Error_1 … Error_q (or average errors); the model with the minimum error is selected as the final model and then assessed on the test set, giving the final error.]
Different ML algorithms are designed by varying hyperparameters.


Summary: DT
Advantages
● Simple and easy to interpret
● Do not make any assumptions about the distribution of the data
● Easily handle different types of features (real, categorical, etc.)
● Very fast at test time

Disadvantages
● Learning the optimal DT is NP-Complete. The existing algorithms
are heuristics, like the one we discussed.
● Can be complex if pruning is avoided

Questions left?
● Continuous valued attributes, ordinal attributes and so on...
● Alternative measures for selecting attributes
● Multi-value split or binary split
● How to perform regression? Use variance instead of entropy (a small sketch follows below).
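For that last point, a minimal sketch of a variance-based split criterion (the regression analogue of information gain), under the same (feature_dict, target) data layout assumed earlier:

```python
def variance(values):
    """Variance of the numeric targets: the impurity measure for regression trees."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(examples, attribute):
    """Drop in variance after splitting on attribute (analogue of information gain)."""
    gain = variance([y for _, y in examples])
    for value in {x[attribute] for x, _ in examples}:
        subset = [y for x, y in examples if x[attribute] == value]
        gain -= len(subset) / len(examples) * variance(subset)
    return gain
```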
Summary: Understanding of Overfitting
and underfitting
● Overfitting: Good performance on training data, poor generalization to other data.
● Underfitting: Poor performance on training data and poor generalization
to other data.

How to avoid Overfitting in ML?


● Early Stopping: Its rules provide guidance on how many iterations can be run before the learner begins to overfit. In the case of a DT, if the gain of the best attribute at a node is below a threshold, stop and make this node a leaf node. Alternatively, one can stop growing the DT if the number of instances at a node is less than some user-specified threshold (a small check along these lines is sketched after this list).
● Pruning: Pruning is extensively used while building DT models. It simply removes the nodes which add little predictive power for the problem at hand; cross-validation can be used to decide which nodes to remove.
● Regularization: It adds to the objective function a cost term for bringing in more features. Hence it tries to push the coefficients of many variables to zero and thereby reduce the cost term. (Will see later...)
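A small sketch of the early-stopping check mentioned in the first bullet; the two thresholds are illustrative defaults, not values given in the lecture.

```python
def should_stop(examples, best_gain, min_gain=0.01, min_instances=5):
    """Stop growing the DT at this node if the best gain or the node size is too small."""
    return best_gain < min_gain or len(examples) < min_instances
```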
Summary: Holdout Techniques
Procedure outline for holdout techniques:
1) Divide the full data into a training, a validation and a test set.
2) Train the model using the training set.
3) Evaluate the model using the validation set.
4) Repeat steps 2 and 3 using different architectures and training parameters.
5) Select the best model using the evaluation results.
6) Assess this final model using the test set.

Note
● If cross-validation or bootstrapping is used, steps 2 and 3 have to be repeated for each of the K folds.
● After assessing the final model on the test set, YOU MUST NOT tune the model any further.


References

1) Tom Mitchell, Machine Learning, McGraw-Hill.
2) Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification.
