

Introduction to Data Science for Civil Engineers
Lecture 3a. Assessing Model Accuracy, Bayes Classifier, and KNN
Fall 2022

Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 1st Edition, 2013; 2nd Edition, 2021) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.

OUTLINE
 Assessing Model Accuracy
 Measuring the Quality of Fit
 The Bias-Variance Trade-off
 The Classification Setting
 Bayes Classifier
 K-Nearest Neighbors (KNN)

MEASURING MODEL ACCURACY

 Many models/approaches can be used to estimate the unknown function 𝑓 in 𝑌 = 𝑓(𝒙) + 𝜀. How do we select among them?
 We want to select the best one. What is "best"?
 In regression problems, one common measure of the accuracy of the fitted function 𝑓̂ is the mean squared error (MSE):

MSE = \frac{1}{n} \sum_{i=1}^{n} \big(Y_i - \hat{Y}_i\big)^2

 where Ŷ_i = 𝑓̂(𝒙_i) is the prediction our approach gives for the i-th 𝑌 observation in our training data.
 Note: this definition of the MSE of a predictor (𝑓̂) differs from the definition of the MSE of an estimator.

Observation by observation, the calculation looks like this:

 #   X_1    X_2   ...  X_p    Observed Y   Predicted Ŷ    Squared error
 1   X_11   X_12  ...  X_1p   Y_1          Ŷ_1            (Y_1 − Ŷ_1)²
 2   X_21   X_22  ...  X_2p   Y_2          Ŷ_2            (Y_2 − Ŷ_2)²
 ...
 n   X_n1   X_n2  ...  X_np   Y_n          Ŷ_n            (Y_n − Ŷ_n)²

Averaging the last column gives the MSE.
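As a minimal sketch (not part of the original slides), here is the MSE computation in Python. The data-generating process, sample size, and coefficients are all invented for illustration; only numpy is assumed.

```python
import numpy as np

# Invented training data: n = 50 observations from Y = f(x) + eps
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)

# Fit a simple linear model as the estimate f_hat
b1, b0 = np.polyfit(x, y, deg=1)   # polyfit returns highest degree first
y_hat = b0 + b1 * x                # Y_hat_i = f_hat(x_i)

# MSE = (1/n) * sum_i (Y_i - Y_hat_i)^2
mse = np.mean((y - y_hat) ** 2)
print(f"Training MSE: {mse:.3f}")
```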

SOME CONSIDERATIONS

 The MSE discussed so far is based on the Training Data, in which the response variable 𝑌 has been observed.
 What is more important is how well the model works on new data, known as Test Data, in which the response variable 𝑌 is unknown.
 There is no guarantee that the method with the smallest training MSE will have the smallest test MSE.

TRAINING MSE VS. TEST MSE

 In general, a more flexible method will fit the training data better.
 More flexible methods are less restrictive in the possible shape of 𝑓̂ as compared to less flexible methods.
 Less flexible methods, however, are easier to interpret.
 There is a trade-off between model flexibility and model interpretability.
 However, the test MSE may in fact be higher for a more flexible method than for a less flexible method like linear regression, as the sketch below illustrates.
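To make the training-vs-test contrast concrete, the following sketch (my own illustration, not from the slides) uses polynomial degree as a stand-in for flexibility; the true function, noise level, and sample sizes are invented. Training MSE falls steadily with degree, while test MSE eventually rises.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(1.5 * x)            # invented "true" relationship

def make_data(n):
    x = rng.uniform(0, 5, size=n)
    return x, f(x) + rng.normal(scale=0.3, size=n)

x_train, y_train = make_data(50)
x_test, y_test = make_data(1000)         # fresh data the model never saw

# Polynomial degree acts as a knob for model flexibility
for degree in [1, 3, 5, 8, 10]:
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```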

EXAMPLE 1: HIGHLY NONLINEAR FUNCTION, HIGH VARIANCE IN Y
EXAMPLE 2: LESS NONLINEAR FUNCTION, HIGH VARIANCE IN Y

Both examples simulate 𝑌 = 𝑓(𝒙) + 𝜀 with a high var(𝜀).

(Figures, one per example. Left panel: the data with three fits. Black: true relationship; Orange: linear regression; Blue: smoothing spline; Green: a more flexible smoothing spline. Right panel: Red: test MSE; Grey: training MSE; Dashed: minimum possible test MSE, i.e., the irreducible error var(𝜀).)

EXAMPLE 3: HIGHLY NONLINEAR FUNCTION, LOW VARIANCE IN Y

This example simulates 𝑌 = 𝑓(𝒙) + 𝜀 with a low var(𝜀).

(Figure. Left panel: Black: truth; Orange: linear regression; Blue: smoothing spline; Green: a more flexible smoothing spline. Right panel: Red: test MSE; Grey: training MSE; Dashed: minimum possible test MSE (irreducible error).)

BIAS-VARIANCE TRADEOFF

 There are always two competing properties (i.e., bias and variance) of statistical learning methods that govern the choice of learning method.
 To reach the minimum test MSE, we need to make a tradeoff between bias and variance.

BIAS OF LEARNING METHODS

 Bias refers to the (model specification) error that is introduced by modeling a complicated function with a much simpler one.
 For example, simple linear regression assumes that there is a linear relationship between 𝑌 and 𝑋. In reality, the relationship may not be exactly linear, so some bias will be present.
 The more flexible/complex a method is, the less bias it will generally have.
 Can bias be reduced by increasing the size of the training data set?

VARIANCE OF LEARNING METHODS

 Variance refers to how much 𝑓̂ (i.e., the estimate of 𝑓) would change if it were estimated using a different training data set (see the simulation sketch below).
 Generally, the more flexible a method is, the more variance it has.
 Can variance be reduced by increasing the size of the training data set?
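The following sketch (an invented illustration, assuming numpy) makes the definition of variance operational: it refits an inflexible and a flexible model on many freshly drawn training sets and measures how much the prediction at one fixed point x0 moves around.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(1.5 * x)            # invented true function
x0 = 2.0                                 # fixed query point

def fit_and_predict(degree):
    # Draw a fresh training set, fit, and return f_hat(x0)
    x = rng.uniform(0, 5, size=50)
    y = f(x) + rng.normal(scale=0.3, size=50)
    return np.polyval(np.polyfit(x, y, deg=degree), x0)

for degree in [1, 10]:                   # inflexible vs. flexible
    preds = np.array([fit_and_predict(degree) for _ in range(500)])
    print(f"degree={degree:2d}  Var(f_hat(x0)) = {preds.var():.4f}")
```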

THE TRADE-OFF: TEST MSE, BIAS AND VARIANCE

 It can be shown that, for any given value of the input variables 𝒙₀, the expected test MSE for a new value of the output variable Y₀ at 𝒙₀ equals

\text{Expected Test MSE} = E\big(Y_0 - \hat{Y}_0\big)^2 = E\big(Y_0 - \hat{f}(\boldsymbol{x}_0)\big)^2 = \text{Bias}^2 + \text{Var} + \sigma^2

 where
   \text{Bias} = \text{Bias}(\hat{Y}_0) = E(\hat{Y}_0) - E(Y_0),
   \text{Var} = \text{Var}(\hat{Y}_0) = E\big(\hat{Y}_0 - E(\hat{Y}_0)\big)^2,
   \sigma^2 = \text{Var}(\varepsilon), \quad Y_0 = E(Y_0) + \varepsilon.

 As a method gets more flexible, the bias will decrease and the variance will increase, but the expected test MSE may increase or decrease.
 The challenge is to find a method for which both the variance and the squared bias are low (i.e., the bias-variance tradeoff). The Monte Carlo sketch below checks this decomposition numerically.
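The decomposition can be verified by simulation. The sketch below (my own illustration, with invented true function, noise level, and flexibility) estimates Bias², Var, and the expected test MSE at a single point x0 and confirms they agree up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(1.5 * x)            # invented true function
sigma = 0.3                              # standard deviation of eps
x0 = 2.0

# Repeatedly draw a training set, fit, and predict at x0
preds = []
for _ in range(2000):
    x = rng.uniform(0, 5, size=50)
    y = f(x) + rng.normal(scale=sigma, size=50)
    preds.append(np.polyval(np.polyfit(x, y, deg=3), x0))
preds = np.array(preds)

bias_sq = (preds.mean() - f(x0)) ** 2    # Bias^2 = (E[f_hat(x0)] - f(x0))^2
var = preds.var()                        # Var = E[(f_hat(x0) - E[f_hat(x0)])^2]
print(f"Bias^2 + Var + sigma^2 = {bias_sq + var + sigma**2:.4f}")

# Direct Monte Carlo estimate of E[(Y0 - Y_hat_0)^2]
y0 = f(x0) + rng.normal(scale=sigma, size=2000)
print(f"E[(Y0 - f_hat(x0))^2]  = {np.mean((y0 - preds) ** 2):.4f}")
```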

THE CLASSIFICATION SETTING

 For regression problems, the MSE is used to assess the accuracy of the statistical learning method.
 For classification problems, the error rate may be used instead:

\text{Error Rate} = \frac{1}{n} \sum_{i=1}^{n} I\big(Y_i \neq \hat{Y}_i\big)

 I(Y_i ≠ Ŷ_i) is an indicator function, which equals 1 if Y_i ≠ Ŷ_i and 0 if Y_i = Ŷ_i.
 Thus the error rate represents the fraction of incorrect classifications, or misclassifications.

BAYES CLASSIFIER

 The Bayes classifier assigns each observation to the most likely class, given its predictor values.
 That is, it assigns an observation with predictor vector 𝒙₀ to the class 𝑗 for which the conditional probability P(Y = j | X = 𝒙₀) is largest.
 The test error rate of the Bayes classifier is called the Bayes error rate.
 In reality, it is impossible to use the Bayes classifier because the conditional probability is unknown. (In a simulation where the distribution is known, it can be evaluated; see the sketch below.)
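In a simulation, the class-conditional distributions can be specified, so the Bayes rule and the indicator-function error rate above can both be evaluated. The setup below is entirely invented (two Gaussian classes with equal priors); the point is the mechanics, not the numbers.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented setting: class 0 has X ~ N(-1, 1), class 1 has X ~ N(+1, 1), equal priors
n = 10_000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)

def normal_pdf(v, mu, sd=1.0):
    return np.exp(-0.5 * ((v - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Bayes rule: assign the class j with the larger posterior P(Y = j | X = x)
post1 = normal_pdf(x, 1.0) / (normal_pdf(x, 1.0) + normal_pdf(x, -1.0))
y_hat = (post1 > 0.5).astype(int)

# Error rate = (1/n) * sum_i I(Y_i != Y_hat_i); here it estimates the Bayes error rate
print(f"Estimated Bayes error rate: {np.mean(y != y_hat):.3f}")
```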

BAYES ERROR RATE

 The Bayes error rate defines the lower bound of the test error rate (i.e., the lowest possible error rate that could be achieved if we knew the probability distribution of the data).
 In real-life problems the Bayes error rate cannot be calculated.
 The Bayes error rate is analogous to the irreducible error discussed earlier.
 On test data, no classifier can achieve a lower error rate than the Bayes error rate.

BAYES OPTIMAL CLASSIFIER

(Figure: simulated two-class data. The purple dashed line is the Bayes decision boundary, representing the points where the conditional probability is 50%.)

K-NEAREST NEIGHBORS (KNN)

 K-Nearest Neighbors is a flexible approach to estimating the Bayes classifier (i.e., to estimating the conditional probability in the Bayes classifier).
 For a given test observation X = 𝒙₀, we find the K neighbors closest to 𝒙₀ in the training data (call this set N₀), and examine their corresponding Y values:

P(Y = j \mid X = \boldsymbol{x}_0) \approx \frac{1}{K} \sum_{i \in N_0} I(Y_i = j)

 For example, if the majority of the Y_i's are orange we predict Ŷ₀ (corresponding to the test observation 𝒙₀) to be orange; otherwise we guess blue.
 The smaller K is, the more flexible the method will be.

KNN EXAMPLE WITH K = 3

Two input variables: X₁ and X₂. Two categories of the output variable: blue and orange.
Question: which category does the black cross belong to?
With K = 3, we find the three points closest to the cross and check which category most of those three points belong to. A minimal implementation appears below.
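A minimal KNN classifier, written from the definition above (an illustration, not the slides' code; data and names are invented, and only numpy is assumed):

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k=3):
    """Classify x0 by majority vote among its k nearest training points."""
    dists = np.sqrt(((X_train - x0) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of k closest points
    votes = y_train[nearest]
    # (1/K) * sum I(Y_i = j) per class j, then take the largest
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# Invented two-class data in the (X1, X2) plane: 1 = "orange", 0 = "blue"
rng = np.random.default_rng(5)
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

print(knn_predict(X, y, x0=np.array([0.5, 0.5]), k=3))   # most likely prints 1
```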

KNN EXAMPLE WITH K = 3

(Figure: the black curve is the KNN decision boundary for K = 3 in the X₁–X₂ plane.)

KNN DISTANCE FUNCTIONS

 The distance between two vectors 𝒙 = (x₁, x₂, …, x_p)′ and 𝒚 = (y₁, y₂, …, y_p)′ can be measured in several ways.
 For continuous variables:
   Euclidean distance = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}
   Manhattan distance = \sum_{i=1}^{p} |x_i - y_i|
 For categorical variables:
   Hamming distance = \sum_{i=1}^{p} I(x_i \neq y_i), where I(x_i ≠ y_i) = 1 if x_i ≠ y_i and 0 otherwise.
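The three distances translate directly into code. A small sketch (invented example vectors, numpy assumed):

```python
import numpy as np

def euclidean(x, y):
    # sqrt(sum_i (x_i - y_i)^2), for continuous variables
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # sum_i |x_i - y_i|, for continuous variables
    return np.sum(np.abs(x - y))

def hamming(x, y):
    # sum_i I(x_i != y_i), for categorical variables
    return np.sum(x != y)

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.0])
print(euclidean(a, b), manhattan(a, b))     # 2.236..., 3.0
c, d = np.array(["red", "big"]), np.array(["red", "small"])
print(hamming(c, d))                        # 1
```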

SIMULATED DATA: K = 10

(Figure: Black curve: KNN decision boundary with K = 10. Purple dashed line: Bayes decision boundary.)

K = 1 AND K = 100

(Figure: purple dashed line: Bayes decision boundary.)
K = 1: the KNN decision boundary is too flexible (low bias but high variance).
K = 100: the KNN decision boundary is close to linear (low variance but high bias).

TRAINING VS. TEST ERROR RATES ON THE SIMULATED DATA

 Training error rates keep decreasing as K decreases, or equivalently as the flexibility increases.
 However, the test error rate decreases initially but then begins to increase.

A FUNDAMENTAL PICTURE

 In general, as model flexibility increases:
 Training errors will always decline.
 But test errors will decline at first (due to the reduction in bias) and then start to increase (due to the increase in variance). The sketch below reproduces this pattern with KNN.
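The pattern can be reproduced with KNN on simulated data. In this sketch (my own construction; the boundary, noise, and sample sizes are invented), training error is lowest at K = 1 (each point is its own nearest neighbor) while test error is typically minimized at an intermediate K.

```python
import numpy as np

rng = np.random.default_rng(6)

def make_data(n):
    X = rng.normal(size=(n, 2))
    # True boundary X1 + X2 = 0, plus label noise
    y = ((X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)) > 0).astype(int)
    return X, y

def knn_error(X_tr, y_tr, X_ev, y_ev, k):
    errors = 0
    for x0, y0 in zip(X_ev, y_ev):
        # Squared distance preserves the nearest-neighbor ordering
        nearest = np.argsort(((X_tr - x0) ** 2).sum(axis=1))[:k]
        errors += (np.bincount(y_tr[nearest]).argmax() != y0)
    return errors / len(y_ev)

X_train, y_train = make_data(200)
X_test, y_test = make_data(1000)
for k in [1, 5, 25, 100]:
    tr = knn_error(X_train, y_train, X_train, y_train, k)
    te = knn_error(X_train, y_train, X_test, y_test, k)
    print(f"K={k:3d}  train error={tr:.3f}  test error={te:.3f}")
```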
