

Introduction to Data Science for Civil Engineers
Lecture 3a. Assessing Model Accuracy, Bayes Classifier, and KNN
Fall 2022

Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 1st Edition, 2013; 2nd Edition, 2021) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.

OUTLINE
 Assessing Model Accuracy
 Measuring the Quality of Fit
 The Bias-Variance Trade-off
 The Classification Setting
 Bayes Classifier
 K-Nearest Neighbors (KNN)

MEASURING MODEL ACCURACY

 Many models/approaches can be used to estimate the unknown function 𝑓 in 𝑌 = 𝑓(𝒙) + 𝜀. How do we select among them?
 We want to select the best one. What is "best"?
 In regression problems, one common measure of the accuracy of the fitted function 𝑓̂ is the mean squared error (MSE):

MSE = \frac{1}{n} \sum_{i=1}^{n} \big(Y_i - \hat{Y}_i\big)^2

 where Ŷ_i = 𝑓̂(𝒙_i) is the prediction our approach gives for the i-th 𝑌 observation in our training data.
 Note: this definition of the MSE of a predictor (𝑓̂) differs from the definition of the MSE of an estimator.

Observation by observation, the calculation looks like this:

 #   X_1    X_2   ...  X_p    Observed Y   Predicted Ŷ    Squared error
 1   X_11   X_12  ...  X_1p   Y_1          Ŷ_1            (Y_1 − Ŷ_1)²
 2   X_21   X_22  ...  X_2p   Y_2          Ŷ_2            (Y_2 − Ŷ_2)²
 ...
 n   X_n1   X_n2  ...  X_np   Y_n          Ŷ_n            (Y_n − Ŷ_n)²

Averaging the last column gives the MSE.
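As a minimal sketch (not part of the original slides), here is the MSE computation in Python. The data-generating process, sample size, and coefficients are all invented for illustration; only numpy is assumed.

```python
import numpy as np

# Invented training data: n = 50 observations from Y = f(x) + eps
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)

# Fit a simple linear model as the estimate f_hat
b1, b0 = np.polyfit(x, y, deg=1)   # polyfit returns highest degree first
y_hat = b0 + b1 * x                # Y_hat_i = f_hat(x_i)

# MSE = (1/n) * sum_i (Y_i - Y_hat_i)^2
mse = np.mean((y - y_hat) ** 2)
print(f"Training MSE: {mse:.3f}")
```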

SOME CONSIDERATIONS

 The MSE discussed so far is based on the Training Data, in which the response variable 𝑌 has been observed.
 What is more important is how well the model works on new data, known as Test Data, in which the response variable 𝑌 is unknown.
 There is no guarantee that the method with the smallest training MSE will have the smallest test MSE.

TRAINING MSE VS. TEST MSE

 In general, a more flexible method will fit the training data better.
 More flexible methods are less restrictive in the possible shape of 𝑓̂ as compared to less flexible methods.
 Less flexible methods, however, are easier to interpret.
 There is a trade-off between model flexibility and model interpretability.
 However, the test MSE may in fact be higher for a more flexible method than for a less flexible method like linear regression, as the sketch below illustrates.
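To make the training-vs-test contrast concrete, the following sketch (my own illustration, not from the slides) uses polynomial degree as a stand-in for flexibility; the true function, noise level, and sample sizes are invented. Training MSE falls steadily with degree, while test MSE eventually rises.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(1.5 * x)            # invented "true" relationship

def make_data(n):
    x = rng.uniform(0, 5, size=n)
    return x, f(x) + rng.normal(scale=0.3, size=n)

x_train, y_train = make_data(50)
x_test, y_test = make_data(1000)         # fresh data the model never saw

# Polynomial degree acts as a knob for model flexibility
for degree in [1, 3, 5, 8, 10]:
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```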

EXAMPLE 1: HIGHLY NONLINEAR FUNCTION, HIGH VARIANCE IN Y
EXAMPLE 2: LESS NONLINEAR FUNCTION, HIGH VARIANCE IN Y

Both examples simulate 𝑌 = 𝑓(𝒙) + 𝜀 with a high var(𝜀).

(Figures, one per example. Left panel: the data with three fits. Black: true relationship; Orange: linear regression; Blue: smoothing spline; Green: a more flexible smoothing spline. Right panel: Red: test MSE; Grey: training MSE; Dashed: minimum possible test MSE, i.e., the irreducible error var(𝜀).)

EXAMPLE 3: HIGHLY NONLINEAR FUNCTION, LOW VARIANCE IN Y

This example simulates 𝑌 = 𝑓(𝒙) + 𝜀 with a low var(𝜀).

(Figure. Left panel: Black: truth; Orange: linear regression; Blue: smoothing spline; Green: a more flexible smoothing spline. Right panel: Red: test MSE; Grey: training MSE; Dashed: minimum possible test MSE (irreducible error).)

BIAS-VARIANCE TRADEOFF

 There are always two competing properties (i.e., bias and variance) of statistical learning methods that govern the choice of learning method.
 To reach the minimum test MSE, we need to make a tradeoff between bias and variance.

BIAS OF LEARNING METHODS

 Bias refers to the (model specification) error that is introduced by modeling a complicated function with a much simpler one.
 For example, simple linear regression assumes that there is a linear relationship between 𝑌 and 𝑋. In reality, the relationship may not be exactly linear, so some bias will be present.
 The more flexible/complex a method is, the less bias it will generally have.
 Can bias be reduced by increasing the size of the training data set?

VARIANCE OF LEARNING METHODS

 Variance refers to how much 𝑓̂ (i.e., the estimate of 𝑓) would change if it were estimated using a different training data set (see the simulation sketch below).
 Generally, the more flexible a method is, the more variance it has.
 Can variance be reduced by increasing the size of the training data set?
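The following sketch (an invented illustration, assuming numpy) makes the definition of variance operational: it refits an inflexible and a flexible model on many freshly drawn training sets and measures how much the prediction at one fixed point x0 moves around.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(1.5 * x)            # invented true function
x0 = 2.0                                 # fixed query point

def fit_and_predict(degree):
    # Draw a fresh training set, fit, and return f_hat(x0)
    x = rng.uniform(0, 5, size=50)
    y = f(x) + rng.normal(scale=0.3, size=50)
    return np.polyval(np.polyfit(x, y, deg=degree), x0)

for degree in [1, 10]:                   # inflexible vs. flexible
    preds = np.array([fit_and_predict(degree) for _ in range(500)])
    print(f"degree={degree:2d}  Var(f_hat(x0)) = {preds.var():.4f}")
```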

THE TRADE-OFF: TEST MSE, BIAS AND VARIANCE

 It can be shown that, for any given value of the input variables 𝒙₀, the expected test MSE for a new value of the output variable Y₀ at 𝒙₀ equals

\text{Expected Test MSE} = E\big(Y_0 - \hat{Y}_0\big)^2 = E\big(Y_0 - \hat{f}(\boldsymbol{x}_0)\big)^2 = \text{Bias}^2 + \text{Var} + \sigma^2

 where
   \text{Bias} = \text{Bias}(\hat{Y}_0) = E(\hat{Y}_0) - E(Y_0),
   \text{Var} = \text{Var}(\hat{Y}_0) = E\big(\hat{Y}_0 - E(\hat{Y}_0)\big)^2,
   \sigma^2 = \text{Var}(\varepsilon), \quad Y_0 = E(Y_0) + \varepsilon.

 As a method gets more flexible, the bias will decrease and the variance will increase, but the expected test MSE may increase or decrease.
 The challenge is to find a method for which both the variance and the squared bias are low (i.e., the bias-variance tradeoff). The Monte Carlo sketch below checks this decomposition numerically.
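The decomposition can be verified by simulation. The sketch below (my own illustration, with invented true function, noise level, and flexibility) estimates Bias², Var, and the expected test MSE at a single point x0 and confirms they agree up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(1.5 * x)            # invented true function
sigma = 0.3                              # standard deviation of eps
x0 = 2.0

# Repeatedly draw a training set, fit, and predict at x0
preds = []
for _ in range(2000):
    x = rng.uniform(0, 5, size=50)
    y = f(x) + rng.normal(scale=sigma, size=50)
    preds.append(np.polyval(np.polyfit(x, y, deg=3), x0))
preds = np.array(preds)

bias_sq = (preds.mean() - f(x0)) ** 2    # Bias^2 = (E[f_hat(x0)] - f(x0))^2
var = preds.var()                        # Var = E[(f_hat(x0) - E[f_hat(x0)])^2]
print(f"Bias^2 + Var + sigma^2 = {bias_sq + var + sigma**2:.4f}")

# Direct Monte Carlo estimate of E[(Y0 - Y_hat_0)^2]
y0 = f(x0) + rng.normal(scale=sigma, size=2000)
print(f"E[(Y0 - f_hat(x0))^2]  = {np.mean((y0 - preds) ** 2):.4f}")
```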

THE CLASSIFICATION SETTING

 For regression problems, the MSE is used to assess the accuracy of the statistical learning method.
 For classification problems, the error rate may be used instead:

\text{Error Rate} = \frac{1}{n} \sum_{i=1}^{n} I\big(Y_i \neq \hat{Y}_i\big)

 I(Y_i ≠ Ŷ_i) is an indicator function, which equals 1 if Y_i ≠ Ŷ_i and 0 if Y_i = Ŷ_i.
 Thus the error rate represents the fraction of incorrect classifications, or misclassifications.

BAYES CLASSIFIER

 The Bayes classifier assigns each observation to the most likely class, given its predictor values.
 That is, it assigns an observation with predictor vector 𝒙₀ to the class 𝑗 for which the conditional probability P(Y = j | X = 𝒙₀) is largest.
 The test error rate of the Bayes classifier is called the Bayes error rate.
 In reality, it is impossible to use the Bayes classifier because the conditional probability is unknown. (In a simulation where the distribution is known, it can be evaluated; see the sketch below.)
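In a simulation, the class-conditional distributions can be specified, so the Bayes rule and the indicator-function error rate above can both be evaluated. The setup below is entirely invented (two Gaussian classes with equal priors); the point is the mechanics, not the numbers.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented setting: class 0 has X ~ N(-1, 1), class 1 has X ~ N(+1, 1), equal priors
n = 10_000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)

def normal_pdf(v, mu, sd=1.0):
    return np.exp(-0.5 * ((v - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Bayes rule: assign the class j with the larger posterior P(Y = j | X = x)
post1 = normal_pdf(x, 1.0) / (normal_pdf(x, 1.0) + normal_pdf(x, -1.0))
y_hat = (post1 > 0.5).astype(int)

# Error rate = (1/n) * sum_i I(Y_i != Y_hat_i); here it estimates the Bayes error rate
print(f"Estimated Bayes error rate: {np.mean(y != y_hat):.3f}")
```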

BAYES ERROR RATE

 The Bayes error rate defines the lower bound of the test error rate (i.e., the lowest possible error rate that could be achieved if we knew the probability distribution of the data).
 In real-life problems the Bayes error rate cannot be calculated.
 The Bayes error rate is analogous to the irreducible error discussed earlier.
 On test data, no classifier can achieve a lower error rate than the Bayes error rate.

BAYES OPTIMAL CLASSIFIER

(Figure: simulated two-class data. The purple dashed line is the Bayes decision boundary, representing the points where the conditional probability is 50%.)

K-NEAREST NEIGHBORS (KNN)

 K-Nearest Neighbors is a flexible approach to estimating the Bayes classifier (i.e., to estimating the conditional probability in the Bayes classifier).
 For a given test observation X = 𝒙₀, we find the K neighbors closest to 𝒙₀ in the training data (call this set N₀), and examine their corresponding Y values:

P(Y = j \mid X = \boldsymbol{x}_0) \approx \frac{1}{K} \sum_{i \in N_0} I(Y_i = j)

 For example, if the majority of the Y_i's are orange we predict Ŷ₀ (corresponding to the test observation 𝒙₀) to be orange; otherwise we guess blue.
 The smaller K is, the more flexible the method will be.

KNN EXAMPLE WITH K = 3

Two input variables: X₁ and X₂. Two categories of the output variable: blue and orange.
Question: which category does the black cross belong to?
With K = 3, we find the three points closest to the cross and check which category most of those three points belong to. A minimal implementation appears below.
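A minimal KNN classifier, written from the definition above (an illustration, not the slides' code; data and names are invented, and only numpy is assumed):

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k=3):
    """Classify x0 by majority vote among its k nearest training points."""
    dists = np.sqrt(((X_train - x0) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of k closest points
    votes = y_train[nearest]
    # (1/K) * sum I(Y_i = j) per class j, then take the largest
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# Invented two-class data in the (X1, X2) plane: 1 = "orange", 0 = "blue"
rng = np.random.default_rng(5)
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

print(knn_predict(X, y, x0=np.array([0.5, 0.5]), k=3))   # most likely prints 1
```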

KNN EXAMPLE WITH K = 3

(Figure: the black curve is the KNN decision boundary for K = 3 in the X₁–X₂ plane.)

KNN DISTANCE FUNCTIONS

 The distance between two vectors 𝒙 = (x₁, x₂, …, x_p)′ and 𝒚 = (y₁, y₂, …, y_p)′ can be measured in several ways.
 For continuous variables:
   Euclidean distance = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}
   Manhattan distance = \sum_{i=1}^{p} |x_i - y_i|
 For categorical variables:
   Hamming distance = \sum_{i=1}^{p} I(x_i \neq y_i), where I(x_i ≠ y_i) = 1 if x_i ≠ y_i and 0 otherwise.
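The three distances translate directly into code. A small sketch (invented example vectors, numpy assumed):

```python
import numpy as np

def euclidean(x, y):
    # sqrt(sum_i (x_i - y_i)^2), for continuous variables
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # sum_i |x_i - y_i|, for continuous variables
    return np.sum(np.abs(x - y))

def hamming(x, y):
    # sum_i I(x_i != y_i), for categorical variables
    return np.sum(x != y)

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.0])
print(euclidean(a, b), manhattan(a, b))     # 2.236..., 3.0
c, d = np.array(["red", "big"]), np.array(["red", "small"])
print(hamming(c, d))                        # 1
```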

SIMULATED DATA: K = 10

(Figure: Black curve: KNN decision boundary with K = 10. Purple dashed line: Bayes decision boundary.)

K = 1 AND K = 100

(Figure: purple dashed line: Bayes decision boundary.)
K = 1: the KNN decision boundary is too flexible (low bias but high variance).
K = 100: the KNN decision boundary is close to linear (low variance but high bias).

TRAINING VS. TEST ERROR RATES ON THE SIMULATED DATA

 Training error rates keep decreasing as K decreases, or equivalently as the flexibility increases.
 However, the test error rate decreases initially but then begins to increase.

A FUNDAMENTAL PICTURE

 In general, as model flexibility increases:
 Training errors will always decline.
 But test errors will decline at first (due to the reduction in bias) and then start to increase (due to the increase in variance). The sketch below reproduces this pattern with KNN.
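The pattern can be reproduced with KNN on simulated data. In this sketch (my own construction; the boundary, noise, and sample sizes are invented), training error is lowest at K = 1 (each point is its own nearest neighbor) while test error is typically minimized at an intermediate K.

```python
import numpy as np

rng = np.random.default_rng(6)

def make_data(n):
    X = rng.normal(size=(n, 2))
    # True boundary X1 + X2 = 0, plus label noise
    y = ((X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)) > 0).astype(int)
    return X, y

def knn_error(X_tr, y_tr, X_ev, y_ev, k):
    errors = 0
    for x0, y0 in zip(X_ev, y_ev):
        # Squared distance preserves the nearest-neighbor ordering
        nearest = np.argsort(((X_tr - x0) ** 2).sum(axis=1))[:k]
        errors += (np.bincount(y_tr[nearest]).argmax() != y0)
    return errors / len(y_ev)

X_train, y_train = make_data(200)
X_test, y_test = make_data(1000)
for k in [1, 5, 25, 100]:
    tr = knn_error(X_train, y_train, X_train, y_train, k)
    te = knn_error(X_train, y_train, X_test, y_test, k)
    print(f"K={k:3d}  train error={tr:.3f}  test error={te:.3f}")
```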
