
Lecture 6: The Bootstrap

Reading: Chapter 5

STATS 202: Data mining and analysis

Rajan Patel

Cross-validation vs. the Bootstrap

Cross-validation: provides estimates of the (test) error.

The Bootstrap: provides the (standard) error of estimates.

- One of the most important techniques in all of Statistics.
- Computer intensive method.
- Popularized by Brad Efron, from Stanford.
Standard errors in linear regression
Standard error: the standard deviation of an estimate computed from a sample of size n.
Classical way to compute Standard Errors

Example: Estimate the variance of a sample $x_1, x_2, \ldots, x_n$:

$$\hat\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x)^2.$$

What is the Standard Error of $\hat\sigma^2$?

1. Assume that $x_1, \ldots, x_n$ are normally distributed.
2. Assume that the true variance is close to $\hat\sigma^2$ and the true mean is close to $\bar x$.
3. Then $(n-1)\,\hat\sigma^2/\sigma^2$ has a $\chi^2$ distribution with $n-1$ degrees of freedom.
4. The SD of this sampling distribution is the Standard Error.
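The classical recipe above can be sketched in a few lines. Under normality, $(n-1)\,\hat\sigma^2/\sigma^2 \sim \chi^2_{n-1}$ implies $\mathrm{SD}(\hat\sigma^2) = \sigma^2\sqrt{2/(n-1)}$; plugging in $\hat\sigma^2$ for the unknown $\sigma^2$ gives the standard error. The simulated sample below is illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=100)  # simulated normal sample, n = 100
n = len(x)

var_hat = x.var(ddof=1)  # unbiased sample variance (divides by n - 1)

# (n-1) * var_hat / sigma^2 ~ chi-squared with n-1 df, whose SD is sqrt(2(n-1)),
# so SD(var_hat) = sigma^2 * sqrt(2/(n-1)); plug in var_hat for sigma^2:
se_classical = var_hat * np.sqrt(2.0 / (n - 1))

print(f"variance estimate: {var_hat:.3f}, classical SE: {se_classical:.3f}")
```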
Limitations of the classical approach

This approach has served statisticians well for 90 years; however, what happens if:

- The distributional assumption (for example, $x_1, \ldots, x_n$ being normal) breaks down?
- The estimator does not have a simple form and its sampling distribution cannot be derived analytically?
Example. Investing in two assets
Suppose that X and Y are the returns of two assets.

These returns are observed every day: $(x_1, y_1), \ldots, (x_n, y_n)$.

[Figure: two scatterplots of the daily returns, Y versus X, for two samples.]
Example. Investing in two assets

We have a fixed amount of money to invest and we will invest a fraction $\alpha$ in X and a fraction $(1-\alpha)$ in Y. Therefore, our return will be

$$\alpha X + (1-\alpha)Y.$$

Our goal will be to minimize the variance of our return as a function of $\alpha$. One can show that the optimal $\alpha$ is:

$$\alpha = \frac{\sigma_Y^2 - \mathrm{Cov}(X, Y)}{\sigma_X^2 + \sigma_Y^2 - 2\,\mathrm{Cov}(X, Y)}.$$

Proposal: Use an estimate:

$$\hat\alpha = \frac{\hat\sigma_Y^2 - \widehat{\mathrm{Cov}}(X, Y)}{\hat\sigma_X^2 + \hat\sigma_Y^2 - 2\,\widehat{\mathrm{Cov}}(X, Y)}.$$
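The plug-in estimate $\hat\alpha$ is just a ratio of sample moments. A minimal sketch, using simulated returns with made-up covariance parameters (chosen so the true $\alpha$ works out to 0.6):

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing allocation alpha."""
    c = np.cov(x, y)  # 2x2 sample covariance matrix of the returns
    var_x, var_y, cov_xy = c[0, 0], c[1, 1], c[0, 1]
    return (var_y - cov_xy) / (var_x + var_y - 2.0 * cov_xy)

# Simulated daily returns; with these made-up parameters the true alpha is
# (1.25 - 0.5) / (1.0 + 1.25 - 2 * 0.5) = 0.6.
rng = np.random.default_rng(1)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.25]], size=1000)
print(f"alpha_hat = {alpha_hat(xy[:, 0], xy[:, 1]):.3f}")
```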
Example. Investing in two assets

Suppose we compute the estimate $\hat\alpha = 0.6$ using the samples $(x_1, y_1), \ldots, (x_n, y_n)$.

- How sure can we be of this value?
- If we resampled the observations, would we get a wildly different $\hat\alpha$?

In this thought experiment, we know the actual joint distribution $P(X, Y)$, so we can resample the n observations to our hearts' content.
Resampling the data from the true distribution

[Figure: four scatterplots of Y versus X, each showing a fresh sample of n observations drawn from the true distribution.]
Computing the standard error of $\hat\alpha$

For each resampling of the data,

$$(x_1^{(1)}, \ldots, x_n^{(1)})$$
$$(x_1^{(2)}, \ldots, x_n^{(2)})$$
$$\vdots$$

we can compute a value of the estimate: $\hat\alpha^{(1)}, \hat\alpha^{(2)}, \ldots$

The Standard Error of $\hat\alpha$ is approximated by the standard deviation of these values.
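In this idealized setting the thought experiment can be run directly: draw fresh datasets from the (assumed known) distribution, recompute $\hat\alpha$ on each, and take the SD of the results. The distribution parameters below are made up for illustration:

```python
import numpy as np

def alpha_hat(x, y):
    # Plug-in estimate: (var(Y) - cov) / (var(X) + var(Y) - 2 cov)
    c = np.cov(x, y)
    return (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2.0 * c[0, 1])

rng = np.random.default_rng(2)
mean, cov = [0.0, 0.0], [[1.0, 0.5], [0.5, 1.25]]  # made-up "true" distribution

# Repeatedly draw n = 100 fresh observations from the true P(X, Y).
estimates = []
for _ in range(1000):
    xy = rng.multivariate_normal(mean, cov, size=100)
    estimates.append(alpha_hat(xy[:, 0], xy[:, 1]))

# The SD across resamplings approximates SE(alpha_hat).
se_true = np.std(estimates, ddof=1)
print(f"SE of alpha_hat under the true distribution: {se_true:.3f}")
```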
In reality, we only have n samples

- However, these samples can be used to approximate the joint distribution of X and Y.
- The Bootstrap: Resample from the empirical distribution:

$$\hat P(X, Y) = \frac{1}{n} \sum_{i=1}^{n} \delta_{(x_i, y_i)}.$$

- Equivalently, resample the data by drawing n samples with replacement from the actual observations.

[Figure: scatterplot of the n observed (X, Y) pairs.]
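Sampling from the empirical distribution $\hat P$ amounts to drawing n indices with replacement. A minimal illustration, reusing the three observations from the schematic on the next slide:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.array([4.3, 2.1, 5.3])  # observed X values
y = np.array([2.4, 1.1, 2.8])  # observed Y values
n = len(x)

# One draw from P-hat: n row indices chosen uniformly with replacement.
idx = rng.integers(0, n, size=n)
x_star, y_star = x[idx], y[idx]
print(idx, x_star, y_star)
```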
A schematic of the Bootstrap
From the original data $Z$ (here n = 3 observations), draw B bootstrap samples $Z^{*1}, \ldots, Z^{*B}$ with replacement, and compute an estimate $\hat\alpha^{*b}$ from each.

Original data (Z):

  Obs   X     Y
  1     4.3   2.4
  2     2.1   1.1
  3     5.3   2.8

First bootstrap sample $Z^{*1}$ (rows drawn with replacement), giving $\hat\alpha^{*1}$:

  Obs   X     Y
  3     5.3   2.8
  1     4.3   2.4
  3     5.3   2.8

Second sample $Z^{*2}$, giving $\hat\alpha^{*2}$:

  Obs   X     Y
  2     2.1   1.1
  3     5.3   2.8
  1     4.3   2.4

..., up to $Z^{*B}$, giving $\hat\alpha^{*B}$:

  Obs   X     Y
  2     2.1   1.1
  2     2.1   1.1
  1     4.3   2.4
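The schematic above, as a sketch in code: draw B bootstrap samples of the observed data, compute $\hat\alpha^{*b}$ on each, and take the SD. The dataset is simulated here with made-up parameters, since only a 3-row toy example appears in the slides:

```python
import numpy as np

def alpha_hat(x, y):
    # Plug-in estimate: (var(Y) - cov) / (var(X) + var(Y) - 2 cov)
    c = np.cov(x, y)
    return (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2.0 * c[0, 1])

rng = np.random.default_rng(4)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.25]], size=100)
x, y = xy[:, 0], xy[:, 1]
n, B = len(x), 1000

# Draw B bootstrap samples Z*1, ..., Z*B and compute alpha_hat on each.
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)  # n observations with replacement
    boot[b] = alpha_hat(x[idx], y[idx])

# The SD of the bootstrap estimates approximates SE(alpha_hat).
se_boot = boot.std(ddof=1)
print(f"Bootstrap SE of alpha_hat: {se_boot:.3f}")
```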
Comparing Bootstrap resamplings to resamplings from the true distribution

[Figure: histograms of $\hat\alpha$ over resamplings from the true distribution (left) and over Bootstrap resamplings (center), both on the range 0.3 to 0.9, with side-by-side boxplots labeled "True" and "Bootstrap" (right).]
