Lecture 19–20
The bootstrap, invented by Efron in 1979, is a method for estimating standard errors and computing confidence intervals. Let $\theta = T(F)$ be an interesting parameter of a distribution function $F$, where $T(\cdot)$ is a functional of $F$. One simple example is $T(F) = \int x \, dF(x)$, which is the mean parameter of the distribution $F$. Another example is $T(F) = \int \big(x - \int y \, dF(y)\big)^2 \, dF(x)$, which is the variance parameter.
Let $X_1, \ldots, X_n$ be an i.i.d. sample from $F$, and we use $\hat F_n$ to denote the empirical distribution, which puts mass $1/n$ on each $X_i$. A natural estimator of the mean $\theta = \int x \, dF(x)$ is the sample mean
$$\hat\theta = \bar X_n = \frac{1}{n} \sum_{i=1}^n X_i.$$
Now suppose we want to know the variance of our estimator $\hat\theta$. This variance usually depends on the unknown distribution $F$. For example, when $\hat\theta$ is the sample mean, we have
$$V_F(\hat\theta) = \frac{\sigma^2}{n}, \qquad \text{where } \sigma^2 = \int \Big(x - \int y \, dF(y)\Big)^2 dF(x).$$
The basic bootstrap has two steps:
1. Estimate $V_F(\hat\theta)$ with $V_{\hat F_n}(\hat\theta)$.
2. Approximate $V_{\hat F_n}(\hat\theta)$ using simulation.

Note that for a simple estimator we may be able to calculate $V_{\hat F_n}(\hat\theta)$ directly, without using simulation. For example, when $\hat\theta = \bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$, we have
$$V_{\hat F_n}(\hat\theta) = \frac{1}{n} \hat\sigma^2 = \frac{1}{n} \cdot \frac{1}{n} \sum_{i=1}^n (X_i - \bar X_n)^2.$$
However, in more complicated cases it is not easy to write down a formula for $V_{\hat F_n}(\hat\theta)$, and we need to resort to simulation.
Simulation
Suppose that $Y \sim H$ and we want to estimate $E(h(Y))$. We can draw i.i.d. samples $Y_1, Y_2, \ldots, Y_B$ from $H$ and use the sample mean
$$\frac{1}{B} \sum_{j=1}^B h(Y_j) \approx E(h(Y)) = \int h(y) \, dH(y).$$
In the same way we can approximate the variance of $Y$:
$$\frac{1}{B} \sum_{j=1}^B (Y_j - \bar Y)^2 = \frac{1}{B} \sum_{j=1}^B Y_j^2 - \Big(\frac{1}{B} \sum_{j=1}^B Y_j\Big)^2 \approx V(Y).$$
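To make this concrete, here is a minimal Monte Carlo sketch in Python; the choices $H = \mathrm{Exponential}(1)$ and $h(y) = y^2$ are illustrative assumptions, not from the notes. For this choice $E(h(Y)) = E(Y^2) = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate E(h(Y)) by the sample mean of h(Y_1), ..., h(Y_B).
B = 100_000
Y = rng.exponential(scale=1.0, size=B)  # Y ~ H = Exponential(1)
h = lambda y: y**2                      # h(y) = y^2, so E(h(Y)) = 2
print(np.mean(h(Y)))                    # close to 2 for large B
```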
In the bootstrap we want to estimate $V_{\hat F_n}(\hat\theta)$, which stands for the variance of $\hat\theta$ if the true population distribution were $\hat F_n$; recall that $\hat\theta = T(X_1, X_2, \ldots, X_n)$. Now think of $\hat\theta$ as $Y$ in the above example (i.e., $Y = T(X_1^*, \ldots, X_n^*)$); the distribution $H$ of $Y$ in this case is the distribution of $T(X_1^*, \ldots, X_n^*)$, where $X_1^*, \ldots, X_n^*$ are drawn i.i.d. from $\hat F_n$. This gives the following algorithm:
(1) For $b = 1, 2, \ldots, B$:

1. Draw $X_1^*, \ldots, X_n^* \sim \hat F_n$ (i.e., sample $n$ points with replacement from the original sample).
2. Compute $\hat\theta_b^* = T(X_1^*, \ldots, X_n^*)$.

(2) Compute
$$v_{boot} = \frac{1}{B} \sum_{b=1}^B \Big(\hat\theta_b^* - \frac{1}{B} \sum_{c=1}^B \hat\theta_c^*\Big)^2.$$
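A minimal Python sketch of this algorithm (the function name `bootstrap_variance` and the normal placeholder data are illustrative assumptions); for $\hat\theta = \bar X_n$ the simulated value can be checked against the closed form $\frac{1}{n^2}\sum_i (X_i - \bar X_n)^2$ derived above:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_variance(x, T, B=2000):
    """Approximate V_{F_n}(theta_hat): draw n points with replacement from x
    (an i.i.d. sample from F_n), recompute T on each resample, and return
    the empirical variance v_boot of the B replicates."""
    n = len(x)
    reps = np.array([T(rng.choice(x, size=n, replace=True)) for _ in range(B)])
    return np.mean((reps - reps.mean()) ** 2)

x = rng.normal(size=50)                          # placeholder data
print(bootstrap_variance(x, np.mean))            # simulated v_boot
print(np.sum((x - x.mean()) ** 2) / len(x)**2)   # closed form, should be close
```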
Bagging*: So far, we have investigated the bootstrap as a means to assess estimation accuracy. An interesting question is whether the bootstrap can improve accuracy. Bagging is an attempt to do this. Bagging is an acronym for bootstrap aggregation. The idea is simple. Suppose we are estimating some quantity, e.g., the optimal portfolio weights to achieve an expected return of 0.012. We have one estimate from the original sample, and this estimate is the one ordinarily used. However, we also have $B$ additional estimates, one from each of the bootstrap samples. The bagging estimate is the average of all of these bootstrap estimates, as sketched below.
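A minimal sketch of bagging (the median and the $t$-distributed placeholder data are illustrative choices, not the portfolio example from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def bagging_estimate(x, T, B=1000):
    """Bootstrap aggregation: average T over B bootstrap resamples
    instead of using the single original-sample estimate T(x)."""
    n = len(x)
    return np.mean([T(rng.choice(x, size=n, replace=True)) for _ in range(B)])

x = rng.standard_t(df=5, size=100)      # placeholder data
print(bagging_estimate(x, np.median))   # bagged median; compare np.median(x)
```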
Confidence Interval:
There are different types of bootstrap confidence intervals available in the literature. We are going to discuss only two types.
The first is the percentile interval.

1. Draw $B$ bootstrap resamples from the original sample.
2. Compute the estimate $\hat\theta_b^*$ from the $b$-th resample, $b = 1, \ldots, B$.
3. Sort these $B$ values in increasing order.
4. Let $\hat\theta_{(r)}^*$ be the $r$-th order statistic.
5. Therefore the $100(1-\alpha)\%$ confidence interval will be $[\hat\theta_{(k)}^*, \hat\theta_{(k')}^*]$, where $k = \frac{\alpha}{2}B$ if $\frac{\alpha}{2}B$ is an integer and $k = \lfloor\frac{\alpha}{2}B\rfloor + 1$ if it is not; similarly, $k' = (1-\frac{\alpha}{2})B$ if $(1-\frac{\alpha}{2})B$ is an integer and $k' = \lfloor(1-\frac{\alpha}{2})B\rfloor + 1$ if it is not.
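A minimal Python sketch of the percentile interval (the helper name `percentile_ci` and the exponential placeholder data are illustrative assumptions):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def percentile_ci(x, T, alpha=0.05, B=2000):
    """Percentile interval: sort the B bootstrap replicates and take the
    order statistics at the ranks k and k' defined in step 5 above."""
    n = len(x)
    reps = np.sort([T(rng.choice(x, size=n, replace=True)) for _ in range(B)])
    k = math.ceil(alpha / 2 * B)              # = aB/2, rounded up if fractional
    k_prime = math.ceil((1 - alpha / 2) * B)  # = (1 - a/2)B, rounded up
    return reps[k - 1], reps[k_prime - 1]     # 1-based ranks -> 0-based index

x = rng.exponential(size=80)                  # placeholder data
print(percentile_ci(x, np.mean))
```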
The second is the bootstrap-$t$ interval. The usual $t$-based interval relies on the fact that
$$t = \frac{\bar X - \mu}{s/\sqrt{n}}$$
has a $t$-distribution when $X_1, \ldots, X_n$ are an i.i.d. sample from $N(\mu, \sigma^2)$. But a problem occurs if we are not sampling from a normal distribution, but rather from some other distribution. In that case the following bootstrap confidence interval can be constructed. Let $\bar X_{boot,b}$ and $s_{boot,b}$ be the sample mean and standard deviation of the $b$-th resample, $b = 1, \ldots, B$. Define
$$t_{boot,b} = \frac{\bar X - \bar X_{boot,b}}{s_{boot,b}/\sqrt{n}}.$$
Notice that $t_{boot,b}$ is defined in the same way as $t$ except for two changes. First, $\bar X$ and $s$ in $t$ are replaced by $\bar X_{boot,b}$ and $s_{boot,b}$. Second, $\mu$ in $t$ is replaced by $\bar X$ in $t_{boot,b}$. The last point is a bit subtle, and you should stop to think about it. A resample is taken using the original sample as the population; thus, for the resample, the population mean is $\bar X$! Because the resamples are independent of each other, the collection $t_{boot,1}, \ldots, t_{boot,B}$ can be treated as a random sample from the distribution of the $t$-statistic. After $B$ values of $t_{boot,b}$ have been calculated, one from each resample, we find the $100(\frac{\alpha}{2})\%$ and $100(1-\frac{\alpha}{2})\%$ percentiles of this collection of $t_{boot,b}$ values. Call these percentiles $t_L$ and $t_U$. More specifically, we find $t_L$ and $t_U$ as we described earlier: we sort all $B$ values from smallest to largest, then calculate $B\frac{\alpha}{2}$ and round it to the nearest integer. Suppose the result is $K_L$; then the $K_L$-th sorted value of $t_{boot,b}$ is $t_L$. Similarly, let $K_U$ be $B(1-\frac{\alpha}{2})$ rounded to the nearest integer; then $t_U$ is the $K_U$-th sorted value of $t_{boot,b}$. Finally, we can form the bootstrap confidence interval for $\mu$ as
$$\Big(\bar X + t_L \frac{s}{\sqrt{n}},\ \bar X + t_U \frac{s}{\sqrt{n}}\Big).$$
We get two advantages through the bootstrap:

1. We do not need to know the population distribution.
2. We do not need to calculate the distribution of the $t$-statistic using probability theory.
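A minimal Python sketch of the bootstrap-$t$ interval for the mean (the skewed lognormal placeholder data are an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_t_ci(x, alpha=0.05, B=2000):
    """Bootstrap-t interval for the mean, following the recipe above:
    simulate t_boot,b, take the B*alpha/2-th and B*(1 - alpha/2)-th sorted
    values as t_L and t_U, and plug into (xbar + t_L*s/sqrt(n), xbar + t_U*s/sqrt(n))."""
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    t_boot = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        t_boot[b] = (xbar - xb.mean()) / (xb.std(ddof=1) / np.sqrt(n))
    t_boot.sort()
    K_L = max(1, round(B * alpha / 2))        # rank of t_L (1-based)
    K_U = min(B, round(B * (1 - alpha / 2)))  # rank of t_U
    t_L, t_U = t_boot[K_L - 1], t_boot[K_U - 1]
    return xbar + t_L * s / np.sqrt(n), xbar + t_U * s / np.sqrt(n)

x = rng.lognormal(size=60)                    # skewed, non-normal data
print(bootstrap_t_ci(x))
```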
4.1 Jackknife Estimator:
Jackknifing, which is similar to bootstrapping, is used in statistical inference to estimate the bias and standard error (variance) of a statistic when a random sample of observations is used to calculate it. The basic idea behind the jackknife estimator lies in systematically recomputing the statistic, leaving out one or more observations at a time from the sample. Therefore, in the delete-1 jackknife, the resamples of the sample $(X_1, X_2, X_3)$ are $(X_2, X_3)$, $(X_1, X_3)$, and $(X_1, X_2)$. Suppose $\hat\theta_b$, $b = 1(1)n$, are the estimators based on the jackknife resamples of size $(n-1)$ from the original sample of size $n$. Then the jackknife estimate of $\theta$ can be written as
$$\hat\theta_{avg} = \frac{\sum_{b=1}^n \hat\theta_b}{n},$$
and the jackknife estimate of the standard error of $\hat\theta$ is
$$\widehat{SE}_{jack} = \Big[\frac{n-1}{n} \sum_{b=1}^n (\hat\theta_b - \hat\theta_{avg})^2\Big]^{1/2}.$$
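A minimal Python sketch of the delete-1 jackknife (the helper name `jackknife_se` and the normal placeholder data are illustrative assumptions); for $T$ equal to the mean, $\widehat{SE}_{jack}$ reduces exactly to $s/\sqrt{n}$, which gives a quick check:

```python
import numpy as np

rng = np.random.default_rng(0)

def jackknife_se(x, T):
    """Delete-1 jackknife: recompute T leaving out one observation at a
    time, then plug the n replicates into the SE formula above."""
    n = len(x)
    reps = np.array([T(np.delete(x, b)) for b in range(n)])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

x = rng.normal(size=40)            # placeholder data
print(jackknife_se(x, np.mean))    # equals x.std(ddof=1)/sqrt(n) for the mean
```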