Machine Learning
Resampling methods are an indispensable tool in modern statistics. They involve repeatedly
drawing samples from a training set and refitting a model of interest on each sample in order
to obtain additional information about the fitted model.
Resampling approaches can be computationally expensive, because they involve fitting the
same statistical method multiple times using different subsets of the training data.
Cross-validation can be used to estimate the test error associated with a given statistical
learning method in order to evaluate its performance.
The bootstrap is used in several contexts, most commonly to provide a measure of accuracy
of a parameter estimate or of a given statistical learning method.
Suppose that we would like to estimate the test error associated with fitting a particular
statistical learning method on a set of samples. The validation set approach involves randomly
dividing the available set of samples into two parts, a training set and a validation set or
hold-out set. The model is fit on the training set, and the fitted model is used to predict the
responses for the observations in the validation set.
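The validation set approach can be sketched in a few lines. The data-generating process, the degree-2 polynomial model, and NumPy's polyfit are illustrative assumptions standing in for "a particular statistical learning method":

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y is a noisy quadratic function of x.
n = 200
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(scale=0.5, size=n)

# Randomly divide the observations into a training set and a validation set.
idx = rng.permutation(n)
train, val = idx[: n // 2], idx[n // 2 :]

# Fit the model on the training set only.
coefs = np.polyfit(x[train], y[train], deg=2)

# Use the fitted model to predict the held-out responses;
# the validation MSE is our estimate of the test error.
y_hat = np.polyval(coefs, x[val])
val_mse = np.mean((y[val] - y_hat) ** 2)
print(val_mse)
```

Re-running this with a different random split generally produces a noticeably different estimate, which is exactly the variability the approach suffers from.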
1. The validation estimate of the test error rate can be highly variable, depending on precisely
which observations are included in the training set and which observations are included in the
validation set.
2. In the validation approach, only a subset of the observations (those included in the
training set rather than in the validation set) is used to fit the model. Since statistical
methods tend to perform worse when trained on fewer observations, this suggests that the
validation set error rate may tend to overestimate the test error rate for the model fit on the
entire data set.
Leave-One-Out Cross-Validation (LOOCV)
Like the validation set approach, LOOCV involves splitting the set of observations into two
parts. However, instead of creating two subsets of comparable size, a single observation (x1, y1)
is used for the validation set, and the remaining observations {(x2, y2), . . . , (xn, yn)} make up
the training set. The statistical learning method is fit on the n − 1 training observations, and a
prediction ŷ1 is made for the excluded observation, using its value x1. Since (x1, y1) was not
used in the fitting process, MSE1 = (y1 − ŷ1)² provides an approximately unbiased estimate
of the test error. But even though MSE1 is unbiased for the test error, it is a poor estimate
because it is highly variable, since it is based upon a single observation (x1, y1).
First, in LOOCV, we repeatedly fit the statistical learning method using training sets that contain
n − 1 observations, almost as many as are in the entire data set. This is in contrast to the
validation set approach, in which the training set is typically around half the size of the original
data set. Consequently, the LOOCV approach tends not to overestimate the test error rate as
much as the validation set approach does.
Second, in contrast to the validation approach which will yield different results when applied
repeatedly due to randomness in the training/validation set splits, performing LOOCV multiple
times will always yield the same results: there is no randomness in the training/validation set
splits.
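The procedure above can be sketched directly; the hypothetical quadratic data and polynomial fit are again illustrative stand-ins for the statistical learning method:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(scale=0.5, size=n)

# LOOCV: fit on the n - 1 remaining observations, predict the one left out.
errors = []
for i in range(n):
    mask = np.arange(n) != i
    coefs = np.polyfit(x[mask], y[mask], deg=2)
    y_hat_i = np.polyval(coefs, x[i])
    errors.append((y[i] - y_hat_i) ** 2)

# The LOOCV estimate is the average of the n squared errors.
cv_n = np.mean(errors)
print(cv_n)
```

Note that the loop involves no random splitting, so repeating LOOCV on the same data always returns the identical estimate.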
An alternative to LOOCV is k-fold CV. This approach involves randomly dividing the set of
observations into k groups, or folds, of approximately equal size. The first fold is treated as a
validation set, and the method is fit on the remaining k − 1 folds. The mean squared error,
MSE1 , is then computed on the observations in the held-out fold. This procedure is repeated
k times; each time, a different group of observations is treated as a validation set. This
process results in k estimates of the test error, MSE1 , MSE2 , . . . , MSEk . The k-fold CV
estimate is computed by averaging these values,
CV(k) = (1/k) (MSE1 + MSE2 + · · · + MSEk).
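A minimal k-fold CV sketch under the same hypothetical setup (the data, the degree-2 polynomial model, and k = 5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 5
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(scale=0.5, size=n)

# Randomly divide the observations into k folds of approximately equal size.
folds = np.array_split(rng.permutation(n), k)

mses = []
for fold in folds:
    # Fit on the remaining k - 1 folds, evaluate on the held-out fold.
    train = np.setdiff1d(np.arange(n), fold)
    coefs = np.polyfit(x[train], y[train], deg=2)
    y_hat = np.polyval(coefs, x[fold])
    mses.append(np.mean((y[fold] - y_hat) ** 2))

# CV(k) is the average of the k fold-wise MSE estimates.
cv_k = np.mean(mses)
print(cv_k)
```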
The expected test MSE, for a given value x0 , can always be decomposed into the sum of three
fundamental quantities: the variance of fˆ (x0 ), the squared bias of fˆ (x0 ) and the variance of
the error terms ϵ. That is,
E(y0 − f̂(x0))² = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ϵ).

Here the notation E(y0 − f̂(x0))² defines the expected test MSE at x0, and refers to the average
test MSE that we would obtain if we repeatedly estimated f using a large number of training
sets, and tested each at x0. The overall expected test MSE can be computed by averaging
E(y0 − f̂(x0))² over all possible values of x0 in the test set.
The relationship between bias, variance, and test set MSE is referred to as the bias-variance
trade-off.
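This decomposition can be checked numerically. The sketch below assumes a true function f(x) = x² with Gaussian noise (illustrative choices), deliberately underfits with a linear model, and compares Var(f̂(x0)) + Bias² + Var(ϵ) against a direct simulation of the expected test MSE at x0:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**2   # assumed true regression function
sigma = 0.5          # noise standard deviation, so Var(eps) = 0.25
x0 = 1.0             # the test point at which we decompose the error
reps, n = 2000, 50

# Repeatedly draw a fresh training set and record the prediction at x0.
preds = np.empty(reps)
for r in range(reps):
    x = rng.uniform(-2, 2, n)
    y = f(x) + rng.normal(scale=sigma, size=n)
    coefs = np.polyfit(x, y, deg=1)   # deliberately underfit: linear model
    preds[r] = np.polyval(coefs, x0)

variance = preds.var()                     # Var(f-hat(x0))
bias_sq = (preds.mean() - f(x0)) ** 2      # squared bias at x0

# Direct estimate of the expected test MSE: a fresh y0 for every fit.
y0 = f(x0) + rng.normal(scale=sigma, size=reps)
emse = np.mean((y0 - preds) ** 2)

# The two quantities should agree up to Monte Carlo error.
print(variance + bias_sq + sigma**2, emse)
```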
We have illustrated the use of cross-validation in the regression setting where the outcome Y
is quantitative, and so have used MSE to quantify test error. But cross-validation can also be
a very useful approach in the classification setting when Y is qualitative. In this setting,
cross-validation works just as described earlier, except that rather than using MSE to quantify
test error, we instead use the number of misclassified observations. For instance, in the
classification setting, the LOOCV error rate takes the form
CV(n) = (1/n) (Err1 + Err2 + · · · + Errn),

where Erri = I(yi ≠ ŷi). The k-fold CV error rate and validation set error rates are defined
analogously.
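A sketch of the LOOCV misclassification rate, using a 1-nearest-neighbour classifier on hypothetical one-dimensional two-class data (both the classifier and the data are illustrative choices, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class data: class 1 observations tend to have larger x.
n = 80
x = rng.normal(size=n) + np.repeat([0.0, 2.0], n // 2)
y = np.repeat([0, 1], n // 2)

# LOOCV error rate: Err_i = I(y_i != yhat_i), averaged over all n splits.
errs = []
for i in range(n):
    mask = np.arange(n) != i
    # 1-NN prediction: the label of the closest remaining observation.
    nearest = np.argmin(np.abs(x[mask] - x[i]))
    y_hat_i = y[mask][nearest]
    errs.append(int(y_hat_i != y[i]))

cv_n = np.mean(errs)
print(cv_n)
```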
Dr. A V Prajeesh August 21, 2022 13 / 23
Bootstrap
The bootstrap is a widely applicable and extremely powerful statistical tool that can
be used to quantify the uncertainty associated with a given estimator or statistical learning
method. As a simple example, the bootstrap can be used to estimate the standard errors of
the coefficients from a linear regression fit. However, the power of the bootstrap lies in the
fact that it can be easily applied to a wide range of statistical learning methods, including
some for which a measure of variability is otherwise difficult to obtain and is not automatically
output by statistical software.
For example, suppose we wish to invest a fraction α of our money in a quantity X and the
remaining 1 − α in a quantity Y, choosing α to minimize the variance of the combined
investment. The minimizing value is

α = (σY² − σXY) / (σX² + σY² − 2σXY),

where σX² = Var(X), σY² = Var(Y), and σXY = Cov(X, Y). Since these quantities are
unknown, we estimate α by plugging in the sample variances and covariance:

α̂ = (σ̂Y² − σ̂XY) / (σ̂X² + σ̂Y² − 2σ̂XY).
Ideally, to quantify the variability of α̂ we would generate many data sets, say 100 or 1,000,
compute α̂ on each, and take the standard deviation of these estimates as the standard error
of α̂. In a real-world situation we cannot draw new samples from the population, so the
bootstrap emulates this process by generating such data sets from the single observed data
set, sampling from it with replacement.
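This idea can be sketched for α̂: resample the observed data with replacement B times, recompute α̂ on each bootstrap sample, and take the standard deviation of the B estimates as the bootstrap standard error. The simulated (X, Y) distribution and B = 1000 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_hat(X, Y):
    # Plug-in estimate of alpha from the sample variances and covariance.
    cov = np.cov(X, Y)
    sx2, sy2, sxy = cov[0, 0], cov[1, 1], cov[0, 1]
    return (sy2 - sxy) / (sx2 + sy2 - 2 * sxy)

# One observed data set of n = 100 (X, Y) pairs (hypothetical returns).
n = 100
mean, covm = [0, 0], [[1.0, 0.5], [0.5, 1.25]]
X, Y = rng.multivariate_normal(mean, covm, size=n).T

# Bootstrap: resample the n observations WITH replacement, B times,
# recomputing alpha-hat on each bootstrap data set.
B = 1000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = alpha_hat(X[idx], Y[idx])

# The standard deviation of the B estimates is the bootstrap SE of alpha-hat.
se_alpha = boot.std(ddof=1)
print(alpha_hat(X, Y), se_alpha)
```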
Figure 2: Validation set approach