
Constructing Bootstrap Confidence Intervals: Section 3.3

This document provides an overview of how to construct bootstrap confidence intervals. It explains that bootstrapping involves taking many random samples with replacement from an original sample and computing the statistic of interest for each bootstrap sample. This creates a bootstrap distribution that can be used to estimate the standard error and create confidence intervals. Specifically, it describes how the variability of bootstrap statistics estimates the variability of the sampling distribution, allowing the standard deviation of the bootstrap distribution to estimate the standard error needed for confidence intervals.

Uploaded by

Islam Salikhanov
Copyright
© All Rights Reserved

Section 3.3

Constructing Bootstrap
Confidence Intervals

Statistics: Unlocking the Power of Data Lock5


Outline
• Bootstrap samples
• Bootstrap distribution
• Standard error from bootstrap distribution
• 95% confidence interval using SE from bootstrap distribution


Confidence Intervals
[Diagram: Population → many samples → calculate statistic for each sample → Sampling Distribution]
• Standard Error (SE): standard deviation of sampling distribution
• Margin of Error (ME): for a 95% CI, ME = 2×SE
• Confidence Interval: Sample statistic ± ME


Summary
• To create a plausible range of values for a
parameter:
o Take many random samples from the population,
and compute the sample statistic for each sample
o Compute the standard error as the standard
deviation of all these statistics
o Use statistic ± 2 × SE

• One small problem…



Reality

… WE ONLY HAVE ONE SAMPLE!!!!


• How do we know how much sample
statistics vary, if we only have one
sample?!?

BOOTSTRAP!

ONE Reese’s Pieces Sample

Sample: 52/100 orange

p̂ = 0.52

Where might the “true” p be?


“Population”
• Imagine the “population” is many, many
copies of the original sample

• (What do you have to assume?)



Reese’s Pieces “Population”

Sample repeatedly from this “population”


Sampling with Replacement
• To simulate a sampling distribution, we can just
take repeated random samples from this
“population” made up of many copies of the
sample
• In practice, we can’t actually make infinite copies of the sample…
• … but we can get the same result by sampling with replacement from the sample we have (each unit can be selected more than once)

Suppose we have a random sample of 6 people:



Original Sample

A simulated “population” to sample from


Bootstrap Sample: Sample with
replacement from the original sample, using
the same sample size.

[Figure: Original Sample → Bootstrap Sample]

Reese’s Pieces
• How would you take a bootstrap sample from
your sample of Reese’s Pieces?
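One way to answer this in code is sketched below. It assumes the 100 candies are recorded as a list with 52 "orange" entries (the count from the earlier slide) and 48 entries labelled "other"; the label "other", the function names, and the seed are illustrative choices, not part of the slides.

```python
import random

# The one observed sample from the slides: 52 of 100 candies are orange.
# "other" is an illustrative placeholder label for the non-orange candies.
original_sample = ["orange"] * 52 + ["other"] * 48


def bootstrap_sample(sample, rng):
    """Draw len(sample) values from sample, with replacement."""
    return rng.choices(sample, k=len(sample))


def proportion_orange(sample):
    """Proportion of candies in the sample that are orange."""
    return sample.count("orange") / len(sample)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
boot = bootstrap_sample(original_sample, rng)

print("original p-hat: ", proportion_orange(original_sample))
print("bootstrap p-hat:", proportion_orange(boot))
```

With physical candies, the equivalent is to draw one candy, record its color, put it back, and repeat 100 times.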



Bootstrap Sample
Your original sample has data values

18, 19, 19, 20, 21

Is the following a possible bootstrap sample?

18, 19, 20, 21, 22

NO. 22 is not a value from the original sample


Bootstrap Sample
Your original sample has data values

18, 19, 19, 20, 21

Is the following a possible bootstrap sample?

18, 19, 20, 21

NO. Bootstrap samples must be the same size as the original sample


Bootstrap Sample
Your original sample has data values

18, 19, 19, 20, 21

Is the following a possible bootstrap sample?

18, 18, 19, 20, 21


YES. It is the same size and could be obtained by sampling with replacement
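The two rules behind these three answers (same size, and only values that occur in the original sample) can be checked mechanically. The helper below is a hypothetical illustration, not something from the slides.

```python
def is_possible_bootstrap_sample(original, candidate):
    """Return True if candidate could arise by sampling with replacement
    from original, using the same sample size."""
    same_size = len(candidate) == len(original)
    values_from_original = all(value in original for value in candidate)
    return same_size and values_from_original


original = [18, 19, 19, 20, 21]

print(is_possible_bootstrap_sample(original, [18, 19, 20, 21, 22]))  # False: 22 is not in the sample
print(is_possible_bootstrap_sample(original, [18, 19, 20, 21]))      # False: wrong size
print(is_possible_bootstrap_sample(original, [18, 18, 19, 20, 21]))  # True
```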



Bootstrap
A bootstrap sample is a random sample taken with replacement from the original sample, of the same size as the original sample

A bootstrap statistic is the statistic computed on a bootstrap sample

A bootstrap distribution is the distribution of many bootstrap statistics
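These three definitions map directly onto a few lines of code. The sketch below uses the data values 18, 19, 19, 20, 21 from the earlier slides and the mean as the statistic; the function name, the 1,000 repetitions, and the seed are assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_distribution(sample, statistic, n_boot=1000, seed=None):
    """Return a list of n_boot bootstrap statistics for the given sample."""
    rng = random.Random(seed)
    n = len(sample)
    bootstrap_statistics = []
    for _ in range(n_boot):
        # Bootstrap sample: same size as the original, drawn with replacement.
        boot_sample = rng.choices(sample, k=n)
        # Bootstrap statistic: the statistic computed on that bootstrap sample.
        bootstrap_statistics.append(statistic(boot_sample))
    # Bootstrap distribution: the collection of many bootstrap statistics.
    return bootstrap_statistics


original = [18, 19, 19, 20, 21]
boot_means = bootstrap_distribution(original, mean, n_boot=1000, seed=1)
print(len(boot_means), min(boot_means), max(boot_means))
```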



[Diagram: Original Sample (Sample Statistic) → many Bootstrap Samples → a Bootstrap Statistic from each → Bootstrap Distribution]

Bootstrap Distribution
lock5stat.com/statkey/



Why “bootstrap”?

“Pull yourself up by your bootstraps”


• Lift yourself in the air simply by pulling up on
the laces of your boots
• Metaphor for accomplishing an “impossible” task without any outside help

Sampling Distribution

Population (µ)
BUT, in practice we don’t see the “tree” or all of the “seeds”: we only have ONE seed

Bootstrap Distribution
What can we do with just one seed? Grow a NEW tree!
• Bootstrap “Population”
• Estimate the distribution and variability (SE) of x̄’s from the bootstraps

Golden Rule of Bootstrapping

Bootstrap statistics are to the original sample statistic
as
the original sample statistic is to the population parameter


Center
• The sampling distribution is centered around the population parameter
• The bootstrap distribution is centered around the sample statistic
• Luckily, we don’t care about the center… we care about the variability!

Standard Error
• The variability of the bootstrap statistics is similar to the variability of the sample statistics
• The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!
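As a rough illustration, the sketch below estimates the standard error of p̂ for the Reese’s Pieces sample from earlier (52 orange out of 100). Coding orange as 1 and everything else as 0, the 5,000 repetitions, and the seed are assumptions made for this example.

```python
import random
from statistics import stdev

# Original sample: 1 = orange, 0 = not orange (52 of 100 orange).
sample = [1] * 52 + [0] * 48

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
boot_props = []
for _ in range(5000):
    boot = rng.choices(sample, k=len(sample))  # one bootstrap sample
    boot_props.append(sum(boot) / len(boot))   # its bootstrap statistic

# Standard error = standard deviation of the bootstrap distribution.
standard_error = stdev(boot_props)
print(f"bootstrap SE of p-hat: {standard_error:.3f}")
```

The exact value changes a little from run to run (or seed to seed), which is why many bootstrap samples are used.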



Confidence Intervals
[Diagram: Sample → many Bootstrap Samples → calculate statistic for each bootstrap sample → Bootstrap Distribution]
• Standard Error (SE): standard deviation of bootstrap distribution
• Margin of Error (ME): for a 95% CI, ME = 2×SE
• Confidence Interval: Sample statistic ± ME


What about Other Parameters?
Estimate the standard error and/or a confidence interval for...
• proportion (p)
• difference in means (µ1 − µ2)
• difference in proportions (p1 − p2)
• standard deviation (σ)
• correlation (ρ)
• ...

Generate samples with replacement
Calculate sample statistic
Repeat...
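The same three steps work whatever the statistic is. The sketch below applies them to a standard deviation and to a difference in means. The data values are made up purely for illustration, and the function name is hypothetical. For two-group statistics it assumes each group is resampled separately, keeping its own size.

```python
import random
from statistics import mean, stdev


def bootstrap_statistics(groups, statistic, n_boot=2000, seed=0):
    """Resample every group with replacement, compute the statistic, repeat."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_boot):
        # Each group is resampled on its own, keeping its original size.
        resampled = [rng.choices(group, k=len(group)) for group in groups]
        results.append(statistic(*resampled))
    return results


# Hypothetical data, invented only to make the sketch runnable.
group_a = [12, 15, 9, 20, 14, 11, 17, 13]
group_b = [8, 10, 7, 12, 9, 11, 6, 10]

# Standard deviation of one sample.
boot_sds = bootstrap_statistics([group_a], stdev)
print("SE of the standard deviation:", round(stdev(boot_sds), 2))

# Difference in means between two samples.
boot_diffs = bootstrap_statistics(
    [group_a, group_b], lambda a, b: mean(a) - mean(b)
)
print("SE of the difference in means:", round(stdev(boot_diffs), 2))
```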

The Magic of Bootstrapping
• We can use bootstrapping to assess the uncertainty surrounding ANY sample statistic!
• If we have sample data, we can use bootstrapping to create a 95% confidence interval for any parameter!

(well, almost…)


Used Mustangs
What’s the average price of a used Mustang car?

Select a random sample of n = 25 Mustangs from a website (autotrader.com) and record the price (in $1,000’s) for each car.


Sample of Mustangs:
[Dot plot of MustangPrice; horizontal axis: Price, from 0 to 45]

Our best estimate for the average price of used Mustangs is $15,980, but how accurate is that estimate?

BOOTSTRAP!

Original Sample
1. Bootstrap Sample
2. Calculate mean price of bootstrap sample
3. Repeat many times!


Used Mustangs

Standard Error


Used Mustangs
95% CI: $15,980 ± 2 × SE = ($11,624, $20,336)

We are 95% confident that the average price of a used Mustang on autotrader.com is between $11,624 and $20,336
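The standard error itself did not survive in this copy of the slides, but it can be backed out of the reported interval. The arithmetic below assumes the interval was built as statistic ± 2 × SE, the rule used throughout this section.

```latex
\begin{align*}
\text{ME} &= \frac{20{,}336 - 11{,}624}{2} = 4{,}356 \\
\text{SE} &= \frac{\text{ME}}{2} = \frac{4{,}356}{2} = 2{,}178 \\
15{,}980 \pm 2(2{,}178) &= (11{,}624,\ 20{,}336)
\end{align*}
```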

Atlanta Commutes
What’s the mean commute time for workers in
metropolitan Atlanta?

Data: The American Housing Survey (AHS) collected data from Atlanta in 2004


Random Sample of 500 Commutes
[Dot plot of CommuteAtlanta; horizontal axis: Time, from 20 to 180]

Where might the “true” μ be?

WE CAN BOOTSTRAP TO FIND OUT!!!

Original Sample


“Population” = many copies of sample

Sample from this “population”


Atlanta Commutes

95% confidence interval for the average commute time for Atlantans:
29.11 ± 2 × 0.915 = 27.3 to 30.9
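The interval arithmetic is short enough to wrap in a small helper. The function name below is hypothetical; the inputs are the sample mean (29.11) and the bootstrap standard error (0.915) shown on the slide.

```python
def confidence_interval_95(statistic, standard_error):
    """95% CI as statistic ± 2 × SE (the rule used in these slides)."""
    margin_of_error = 2 * standard_error
    return statistic - margin_of_error, statistic + margin_of_error


low, high = confidence_interval_95(29.11, 0.915)
print(f"{low:.1f} to {high:.1f}")  # 27.3 to 30.9
```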

Global Warming
What percentage of Americans believe in global warming?

A survey of 2,251 randomly selected individuals conducted in October 2010 found that 1,328 answered “Yes” to the question

“Is there solid evidence of global warming?”

Give and interpret a 95% CI for the proportion of Americans who believe there is solid evidence of global warming.

Source: “Wide Partisan Divide Over Global Warming”, Pew Research Center, 10/27/10.
http://pewresearch.org/pubs/1780/poll-global-warming-scientists-energy-policies-offshore-drilling-tea-party


Global Warming
www.lock5stat.com/statkey

0.59 ± 2(0.01)
= (0.57, 0.61)
We are 95% sure that the true percentage of all Americans who believe there is solid evidence of global warming is between 57% and 61%
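The slides get this interval from StatKey. As a rough stand-in, the sketch below rebuilds the sample from the reported counts (1,328 "Yes" out of 2,251) and bootstraps the proportion directly. Coding "Yes" as 1, the 1,000 repetitions, and the seed are assumptions made for illustration.

```python
import random
from statistics import stdev

n, yes = 2251, 1328
sample = [1] * yes + [0] * (n - yes)  # 1 = "Yes", 0 = any other answer
p_hat = yes / n

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
boot_props = [sum(rng.choices(sample, k=n)) / n for _ in range(1000)]

# SE = standard deviation of the bootstrap distribution.
se = stdev(boot_props)
low, high = p_hat - 2 * se, p_hat + 2 * se
print(f"p-hat = {p_hat:.2f}, SE = {se:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```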

Global Warming
Does belief in global warming differ by political party?

“Is there solid evidence of global warming?”

The sample proportion answering “yes” was 79% among Democrats and 38% among Republicans.
(exact numbers for each party not given, but assume n = 1000 for each group)

Give a 95% CI for the difference in proportions.

Source: “Wide Partisan Divide Over Global Warming”, Pew Research Center, 10/27/10.
http://pewresearch.org/pubs/1780/poll-global-warming-scientists-energy-policies-offshore-drilling-tea-party

Global Warming
www.lock5stat.com/statkey
0.41 ± 2(0.02)
= (0.37, 0.45)

We are 95% sure that the difference in the proportion of Democrats and Republicans who believe in global warming is between 0.37 and 0.45.
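For a difference in proportions, one reasonable approach is to resample each group separately and take the difference each time. The sketch below follows the slide’s assumption of n = 1000 per party (so 790 and 380 "yes" answers). The helper name, the 1,000 repetitions, and the seed are illustrative choices.

```python
import random
from statistics import stdev


def bootstrap_proportion(sample, rng):
    """Proportion of 1s in one bootstrap sample drawn from sample."""
    return sum(rng.choices(sample, k=len(sample))) / len(sample)


# Assumed group sizes of 1000 each, as stated on the slide.
democrats = [1] * 790 + [0] * 210    # 79% "yes"
republicans = [1] * 380 + [0] * 620  # 38% "yes"
observed_diff = sum(democrats) / 1000 - sum(republicans) / 1000

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
boot_diffs = [
    bootstrap_proportion(democrats, rng) - bootstrap_proportion(republicans, rng)
    for _ in range(1000)
]

se = stdev(boot_diffs)
low, high = observed_diff - 2 * se, observed_diff + 2 * se
print(f"difference = {observed_diff:.2f}, SE = {se:.3f}")
print(f"95% CI = ({low:.2f}, {high:.2f}); contains 0: {low <= 0 <= high}")
```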

Global Warming
Based on the data just analyzed, can you conclude with 95% certainty that the proportion of people believing in global warming differs by political party?

Yes. We are 95% confident that the difference is between 0.37 and 0.45, and this interval does not include 0 (no difference)


Summary
To generate a bootstrap distribution, we
• Generate bootstrap samples by sampling with replacement from the original sample, using the same sample size
• Compute the statistic of interest, a bootstrap statistic, for each of the bootstrap samples
• Collect the statistics for many bootstrap samples to form a bootstrap distribution

If the bootstrap distribution is symmetric and bell-shaped, a 95% CI can be estimated by statistic ± 2 × SE, where SE can be estimated as the standard deviation of a bootstrap distribution
