Resampling – CV and bootstrapping

Resampling Methods
• Involves repeatedly drawing samples from a training set and refitting a
model of interest on each sample in order to obtain more information
about the fitted model.

• Example: We can estimate the variability of a linear regression fit by repeatedly drawing different samples from the training data, fitting an OLS regression to each new sample, and then examining the extent to which the resulting fits differ (a minimal sketch follows this list).

• Model Assessment: having chosen a final model, estimating its prediction error on new data.

• Model Selection: estimating the performance of different models in order to choose the best one.
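As promised above, a minimal sketch of the OLS-variability example. It is not from the slides: the synthetic data, seed, and variable names are illustrative, and it assumes numpy and scikit-learn are available.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Illustrative training data: y = 2x + noise
n = 200
X = rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + rng.normal(size=n)

# Repeatedly draw samples (with replacement) from the training data,
# refit OLS on each, and examine how much the fitted slopes differ.
slopes = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    slopes.append(LinearRegression().fit(X[idx], y[idx]).coef_[0])

print(f"slope: mean={np.mean(slopes):.3f}, sd={np.std(slopes, ddof=1):.3f}")
```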
Resampling Methods

• Cross-Validation
• Used to estimate test set prediction error rates associated with
a given machine learning method to evaluate its performance,
or to select the appropriate level of model flexibility.

[email protected]
• Bootstrap
DLZNK464L9

• Used most commonly to provide a measure of accuracy of a


parameter estimate of a given machine learning method.

Model Assessment

• The generalization performance of a machine learning method relates to its prediction capability on independent test sets.

• Assessment of this performance is extremely important in practice, since it guides the choice of the machine learning method or model.

• Further, this gives us a measure of the quality of the ultimately chosen model.

Model Assessment

• Test Error
• The average error that results from using a machine learning
method to predict the response on a new observation.
• The prediction error over an independent test sample.
[email protected]
DLZNK464L9
• Training Error
• The average loss over the training sample:

• Note: The training error rate can dramatically underestimate the test
error rate

Model Assessment
● As the model becomes more and more complex, it uses the training data more and is able to adapt to more complicated underlying structures.

● Hence, there is a decrease in bias but an increase in variance.

● However, training error is not a good estimate of the test error.

• Training error consistently decreases with model complexity.

• A model with zero training error is overfit to the training data and will typically generalize poorly.
Model Assessment

● If we are in a data-rich situation, the best approach for both model selection and model assessment is to randomly divide the dataset into three parts: a training set, a validation set, and a test set.

● The training set is used to fit the models. The validation set is used to estimate prediction error for model selection. The test set is used for assessment of the prediction error of the final chosen model.

● A typical split: 50% for training, and 25% each for validation and testing.

Train Validation Test
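A minimal sketch of such a three-way split. It is not from the slides: it assumes scikit-learn, and the 100-row dummy arrays are purely illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for a feature matrix X and response y.
X, y = np.arange(100).reshape(-1, 1), np.arange(100)

# 50% train, then split the remaining half into 25% validation / 25% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 50 25 25
```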

[email protected]
DLZNK464L9 Validation Set Approach

Validation Set Approach

● Suppose that we would like to find a set of variables that gives the lowest validation error rate (an estimate of the test error rate).

● If we have a large data set, we can achieve this goal by randomly splitting the data into separate training and validation data sets.

● Then, we use the training data set to build each possible model and select the model that gives the lowest error rate when applied to the validation data set.

Training Data Validation Data
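A hedged sketch of this selection loop. It is not from the slides: the sine-curve data are synthetic, polynomial degree stands in for the "set of variables", and it assumes scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)  # illustrative data

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=1)

# Build each candidate model on the training set; record its validation MSE.
val_mse = {}
for degree in range(1, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    val_mse[degree] = mean_squared_error(y_val, model.predict(X_val))

# Select the model with the lowest validation error rate.
print("chosen degree:", min(val_mse, key=val_mse.get))
```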


The Validation process

[Figure] A random splitting of the observations, numbered 1, 2, 3, …, n, into two halves (e.g., observations 7, 22, 13, …, 91): the left part is the training set, the right part is the validation set.

Validation Set Approach: Example Results

[email protected]
DLZNK464L9

• Left Panel: Validation error estimates for a single split into


training and validation data sets.
• Right Panel: Validation error estimates for multiple splits; shows
the test error rate is highly variable.
Validation Set Approach: Review

• Advantages:
• Conceptually simple and easy to implement.

• Drawbacks:
• The validation set error rate (MSE) can be highly variable.
• Only a subset of the observations (those in the training set) are used to fit the model.
• Machine learning methods tend to perform worse when trained on fewer observations.
• Thus, the validation set error rate may tend to overestimate the test error rate for the model fit on the entire data set.

Training, validation, test

[email protected]
DLZNK464L9

[email protected]
DLZNK464L9 Cross-Validation Approach

K-Fold Cross-Validation

• Probably the simplest and most widely used method for estimating prediction error.

• This method directly estimates the average prediction error when the machine learning method is applied to an independent test sample.

• Ideally, if we had enough data, we would set aside a validation set (as previously described) and use it to assess the performance of our prediction model.

• To finesse the problem, K-fold cross-validation uses part of the available data to fit the model, and a different part to test it.

K-Fold Cross-Validation

• We use this method because LOOCV (~next few slides!) is computationally intensive.

• We randomly divide the data set into K folds (typically K = 5 or 10).

• The first fold is treated as a validation set, and the method is fit on the remaining K – 1 folds. The MSE is computed on the observations in the held-out fold. The process is repeated K times, taking out a different part each time.

• By averaging the K estimates of the test error, we get an estimated validation (test) error rate for new observations.
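A minimal sketch of this K-fold loop. It is not from the slides: the linear data are synthetic, and it assumes numpy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)  # illustrative data

kf = KFold(n_splits=10, shuffle=True, random_state=2)
fold_mse = []
for train_idx, val_idx in kf.split(X):
    # Fit on the K-1 training folds, compute MSE on the held-out fold.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

# Averaging the K per-fold estimates gives the CV estimate of the test error.
print("10-fold CV estimate of test MSE:", np.mean(fold_mse))
```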
K-Fold Cross-Validation

[email protected]
DLZNK464L9

• 10-fold CV run nine separate times, each with a different random


split of the data into ten parts.

Cross-Validation: Wrong Way

[Figure]

Cross-Validation: Right Way

[Figure]
[email protected]
DLZNK464L9 LOOCV

Leave-One-Out Cross-Validation
• Instead of creating two subsets of comparable size, a single observation is used for the validation set and the remaining n – 1 observations make up the training set.

LOOCV Algorithm:
– Split the entire data set of size n into:
• Blue = training data set
• Beige = validation data set
– Fit the model using the training data set.
– Evaluate the model using the validation set and compute the corresponding MSE.
– Repeat this process n times, producing n squared errors. The average of these n squared errors estimates the test MSE.
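A minimal sketch of this algorithm. It is not from the slides: the synthetic data and seed are illustrative, and it assumes numpy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -0.5]) + rng.normal(size=50)  # illustrative data

sq_errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Train on n-1 observations, predict the single held-out one.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    sq_errors.append((y[test_idx][0] - model.predict(X[test_idx])[0]) ** 2)

# The average of the n squared errors estimates the test MSE.
print("LOOCV estimate of test MSE:", np.mean(sq_errors))
```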

LOOCV

[email protected]
DLZNK464L9

• LOOCV sometimes useful, but typically doesn’t shake up the


data enough.
• The estimates from each fold are highly correlated and hence
their average can have high variance.
• Sometimes a better choice is K = 5 or 10.
• Extremely handy when Proprietary
n isThislow content.
file is ©University
meant forof personal
Arizona. All Rights
use by Reserved. Unauthorized use or distributiononly.
[email protected] prohibited."

Sharing or publishing the contents in part or full is liable for legal action.
Validation Set Approach vs. LOOCV

• LOOCV has far less bias and, therefore, tends not to overestimate the test error rate.

• Performing LOOCV multiple times always yields the same results because there is no randomness in the training/validation set splits.

• With exceptions, LOOCV is computationally intensive because the model has to be fit n times.
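(One notable exception, stated here as a standard result rather than from the slides: for least squares linear or polynomial regression, the LOOCV estimate can be computed from a single fit as \(\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\big(\frac{y_i - \hat{y}_i}{1 - h_i}\big)^2\), where \(h_i\) is the leverage of observation i.)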

Bias-Variance Trade-off for K-Fold Cross-Validation

• Which is better, LOOCV or K-fold CV?
• LOOCV is more computationally intensive than K-fold CV.
• From the perspective of bias reduction, LOOCV is preferred to K-fold CV (when K < n).
• However, LOOCV has higher variance than K-fold CV (when K < n).
• Thus, we see the bias-variance trade-off between the two resampling methods.

• We tend to use K-fold CV with K = 5 or K = 10, as these values have been shown empirically to yield test error rate estimates that suffer neither from excessively high bias nor from very high variance.

K-Fold Cross-Validation vs. LOOCV

[email protected]
DLZNK464L9

• Left Panel: LOOCV Error Curve


• Right Panel: 10-fold CV run nine separate times, each with a
different random split of the data into ten parts.
• Note: LOOCV is a special case of K-fold, where K = n

Cross-Validation on Classification Problems

[email protected]
DLZNK464L9
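In the classification setting, the same CV procedure applies, with a classification loss (e.g., the misclassification rate) replacing the MSE. A minimal sketch, not from the slides, assuming scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=4)  # illustrative

# 10-fold CV scored by accuracy; the CV error rate is 1 - mean accuracy.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10,
                         scoring="accuracy")
print("10-fold CV error rate:", 1 - scores.mean())
```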

[email protected]
DLZNK464L9 Bootstrapping

The Bootstrap

• The bootstrap is a flexible and powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or machine learning method; it is a general tool for assessing statistical accuracy.

• The bootstrap can be used to estimate the standard errors of the coefficients from a linear regression fit, or a confidence interval for a coefficient.

The Bootstrap

• CV uses sampling without replacement:
• The same instance, once selected, cannot be selected again for a particular training/test set.

• The bootstrap uses sampling with replacement to form the training set:
• Sample a dataset of n instances n times with replacement to form a new dataset of n instances.
• Use this data as the training set.
• Use the instances from the original dataset that do not occur in the new training set for testing.

Bootstrap sampling – sampling with replacement
[Figure: n random indices are drawn from 1..n with replacement (e.g., 3, 3, 2, 1, 5, 4, 5, 1, 1, 2), and the examples at the selected indices are copied from the original training set into the bootstrap sample.]

• Most bootstrap samples contain duplicates from the original.

• On average, a bootstrap sample omits ~37% of the original data.
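A minimal sketch of this sampling scheme, not from the slides; numpy only, with a toy 10-element "training set":

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
original = np.arange(n)                 # stands in for the original training set

# Draw n indices with replacement to form the bootstrap training set.
idx = rng.integers(0, n, size=n)
bootstrap_sample = original[idx]        # typically contains duplicates
oob = np.setdiff1d(original, idx)       # out-of-bag instances, used for testing

print("bootstrap sample:", bootstrap_sample)
print("held-out (out-of-bag) test set:", oob)
```

Across many draws, each instance is out-of-bag with probability (1 − 1/n)ⁿ ≈ 1/e ≈ 0.37, which is where the ~37% figure comes from.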
Sharing or publishing the contents in part or full is liable for legal action.
The Bootstrap

• Suppose we have a model fit to a set of training data. We denote the training set by Z = (z1, z2, . . . , zN), where zi = (xi, yi).

• The basic idea is to randomly draw datasets with replacement from the training data, each sample the same size as the original training set.

• This is done B times, producing B bootstrap datasets. Then we refit the model to each of the bootstrap datasets, and examine the behavior of the fits over the B replications.

The Bootstrap: Overview
• S(Z) is any quantity computed from the data Z, for example, the prediction at some input point.

• From the bootstrap sampling we can estimate any aspect of the distribution of S(Z), for example, its variance:

\[\widehat{\mathrm{Var}}[S(Z)] = \frac{1}{B-1}\sum_{b=1}^{B}\big(S(Z^{*b}) - \bar{S}^{*}\big)^2, \qquad \bar{S}^{*} = \frac{1}{B}\sum_{b=1}^{B} S(Z^{*b})\]
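A minimal sketch of this variance estimate. It is not from the slides: the exponential data and the choice of S as the sample median are illustrative; numpy only.

```python
import numpy as np

rng = np.random.default_rng(6)
Z = rng.exponential(size=100)   # illustrative data set Z

def S(sample):
    # Any quantity computed from the data; here, the sample median.
    return np.median(sample)

B = 2000
stats = np.empty(B)
for b in range(B):
    Zb = rng.choice(Z, size=Z.size, replace=True)   # bootstrap dataset Z*b
    stats[b] = S(Zb)

# ddof=1 matches the 1/(B-1) in the variance formula above.
print("bootstrap estimate of Var[S(Z)]:", stats.var(ddof=1))
```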

The Bootstrap: An Example

• A graphical illustration of the bootstrap approach on a small sample containing n = 3 observations.

• Each bootstrap data set contains n observations, sampled with replacement from the original data set.

• Each bootstrap data set is used to obtain an estimate of α.

Sharing or publishing the contents in part or full is liable for legal action.
The Bootstrap: More Details

• In more complex data situations, figuring out the appropriate way to generate bootstrap samples can require some thought.

• For example, if the data is a time series, we cannot simply sample the observations with replacement.

• However, we can instead create blocks of consecutive observations, and sample those with replacement. Then, we paste together the sampled blocks to obtain a bootstrap dataset.
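A minimal sketch of this block bootstrap. It is not from the slides: the series, seed, and block length of 10 are illustrative; numpy only.

```python
import numpy as np

rng = np.random.default_rng(7)
series = rng.normal(size=120).cumsum()   # illustrative time series
block_len = 10

# Cut the series into blocks of consecutive observations.
blocks = [series[i:i + block_len] for i in range(0, len(series), block_len)]

# Sample blocks with replacement and paste them together.
chosen = rng.integers(0, len(blocks), size=len(blocks))
bootstrap_series = np.concatenate([blocks[i] for i in chosen])

print(bootstrap_series.shape)   # same length as the original series
```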

The Bootstrap: More Details

• Although the bootstrap is used primarily to obtain standard errors of an estimate, it can also provide approximate confidence intervals for a population parameter.
• One approach is known as the Bootstrap Percentile confidence interval.

• In cross-validation, each of the K validation folds is distinct from the other K – 1 folds used for training (i.e., there is no overlap).

• To estimate the prediction error using the bootstrap, one approach would be to fit the model in question on a set of bootstrap samples, and then keep track of how well it predicts the original training set.
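A minimal sketch of a percentile interval. It is not from the slides: the data and the choice of the mean as the parameter are illustrative; numpy only.

```python
import numpy as np

rng = np.random.default_rng(9)
Z = rng.exponential(size=100)   # illustrative data

# Bootstrap the statistic, then take the 2.5th and 97.5th percentiles
# of its bootstrap distribution as an approximate 95% CI.
stats = [np.mean(rng.choice(Z, size=Z.size, replace=True)) for _ in range(2000)]
lo, hi = np.percentile(stats, [2.5, 97.5])
print(f"95% bootstrap percentile interval: ({lo:.3f}, {hi:.3f})")
```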

The Bootstrap: More Details

• For each observation, we only keep track of predictions from bootstrap samples not containing that observation.

• The leave-one-out bootstrap estimate of prediction error is defined by:

\[\widehat{\mathrm{Err}}^{(1)} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{|C^{-i}|}\sum_{b \in C^{-i}} L\big(y_i, \hat{f}^{*b}(x_i)\big)\]

• Here C^{-i} is the set of indices of the bootstrap samples b that do not contain observation i, and |C^{-i}| is the number of such samples.

• Note that the leave-one-out bootstrap solves the problem of overfitting, but has a training-set-size bias.
• The ".632 estimator" is designed to alleviate this bias.
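A minimal sketch of this estimator. It is not from the slides: the synthetic linear data, squared-error loss, and B = 200 are illustrative; it assumes numpy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n, B = 80, 200
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)   # illustrative data

# losses[i] collects L(y_i, f*b(x_i)) over the samples b not containing i.
losses = [[] for _ in range(n)]
for b in range(B):
    idx = rng.integers(0, n, size=n)                # bootstrap sample b
    model = LinearRegression().fit(X[idx], y[idx])
    for i in np.setdiff1d(np.arange(n), idx):       # i such that b is in C^{-i}
        losses[i].append((y[i] - model.predict(X[i:i + 1])[0]) ** 2)

# Average the per-observation mean out-of-bag losses over all observations.
err1 = np.mean([np.mean(l) for l in losses if l])
print("leave-one-out bootstrap error estimate:", err1)
```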

