Introducing Statistical Inference
with Resampling Methods
(Part 1)
Allan Rossman, Cal Poly – San Luis Obispo
Robin Lock, St. Lawrence University
George Cobb (TISE, 2007)
“What we teach is largely the technical
machinery of numerical approximations
based on the normal distribution and its
many subsidiary cogs. This machinery
was once necessary, because the
conceptually simpler alternative based
on permutations was computationally
beyond our reach….
George Cobb (cont)
… Before computers statisticians had
no choice. These days we have no
excuse. Randomization-based
inference makes a direct connection
between data production and the logic
of inference that deserves to be at the
core of every introductory course.”
Overview
We accept Cobb’s argument
But, how do we go about
implementing his suggestion?
What are some questions that need
to be addressed?
Some Key Questions
How should topics be sequenced?
How should we start resampling?
How to handle interval estimation?
One “crank” or two (or more)?
Which statistic(s) to use?
What about technology options?
Format – Back and Forth
Pick a question
One of us responds
The other offers a contrasting answer
Possible rebuttal
Repeat
No break in middle
Leave time for audience questions
Warning: We both talk quickly (hang on!)
Slides will be posted at:
www.rossmanchance.com/jsm2013/
How should topics be sequenced?
What order for various parameters (mean,
proportion, ...) and data scenarios (one
sample, two sample, ...)?
Significance (tests) or estimation (intervals)
first?
When (if ever) should traditional methods
appear?
How should topics be sequenced?
Breadth first
Start with data production
Summarize with statistics and graphs
Interval estimation (via bootstrap)
Significance tests (via randomizations)
Traditional approximations
More advanced inference
How should topics be sequenced?
Data production: experiment, random sample, ...
Data summary: mean, proportion, differences, slope, ...
Interval estimation: bootstrap distribution, standard error, CI, ...
Significance tests: hypotheses, randomization, p-value, ...
Traditional methods: normal, t-intervals and tests
More advanced: ANOVA, two-way tables, regression
How should topics be sequenced?
Depth first:
Study one scenario from beginning to end of
statistical investigation process
Repeat (spiral) through various data scenarios
as the course progresses
1. Ask a research question
2. Design a study and collect data
3. Explore the data
4. Draw inferences
5. Formulate conclusions
6. Look back and ahead
How should topics be sequenced?
One proportion
Descriptive analysis
Simulation-based test
Normal-based approximation
Confidence interval (simulation-, normal-based)
One mean
Two proportions, Two means, Paired data
Many proportions, many means, bivariate data
How should we start resampling?
Give an example of where/how your
students might first see inference
based on resampling methods
How should we start resampling?
From the very beginning of the course
To answer an interesting research question
Example: Do people tend to use “facial
prototypes” when they encounter certain
names?
How should we start resampling?
Which name do you associate with the face
on the left: Bob or Tim?
Winter 2013 students: 46 Tim, 19 Bob
How should we start resampling?
Are you convinced that people have a genuine
tendency to associate “Tim” with the face on the left?
Two possible explanations
People really do have a genuine tendency to associate
“Tim” with the face on the left
People choose randomly (by chance)
How to compare/assess plausibility of these
competing explanations?
Simulate!
How should we start resampling?
Why simulate?
To investigate what could have happened by chance
alone (random choices), and so …
To assess plausibility of “choose randomly”
hypothesis by assessing unlikeliness of observed
result
How to simulate?
Flip a coin! (simplest possible model)
Use technology
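The coin-flipping simulation described above could be sketched as follows. This is a minimal illustration, not the presenters' applet: it assumes the "choose randomly" model means each of the 65 students picks "Tim" with probability 0.5, and the function name, repetition count, and seed are hypothetical choices.

```python
import random

def simulate_null_counts(n_people=65, reps=10_000, seed=1):
    """Simulate how many of n_people pick "Tim" when everyone flips a fair coin."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_people)) for _ in range(reps)]

observed = 46  # Winter 2013 class: 46 of 65 chose Tim
null_counts = simulate_null_counts()

# Proportion of simulated classes with a result at least as extreme as the observed one
p_value = sum(count >= observed for count in null_counts) / len(null_counts)
print(f"Approximate p-value: {p_value:.4f}")
```

A very small proportion here means the observed 46 of 65 would rarely happen by random choice alone.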
How should we start resampling?
Very strong evidence that people do tend to
put Tim on the left
Because the observed result would be very
surprising if people were choosing randomly
How should we start resampling?
Bootstrap interval estimate for a mean
Example: Sample of prices (in $1,000s) for n = 25
Mustangs (cars) from an online car site.
[Dot plot: MustangPrice; horizontal axis: Price, 0 to 45]
\(n = 25\), \(\bar{x} = 15.98\), \(s = 11.11\)
How accurate is this sample mean likely to be?
[Diagram: the original sample (\(\bar{x} = 15.98\)) is resampled to give many
bootstrap samples (one has \(\bar{x} = 17.51\)); each bootstrap sample yields a
bootstrap statistic, and together these statistics form the bootstrap
distribution, centered near the original sample statistic]
We need technology!
StatKey
www.lock5stat.com/statkey
Keep 95% in middle
Chop 2.5% in each tail
We are 95% sure that the mean price for
Mustangs is between $11,930 and $20,238
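The bootstrap percentile interval above could be computed along the following lines. This is only a sketch: the price list is HYPOTHETICAL (the slides give summary statistics, not the raw Mustang data), and the function names, repetition count, and seed are illustrative assumptions rather than how StatKey works internally.

```python
import random
import statistics

def bootstrap_means(sample, reps=5_000, seed=1):
    """Resample the data with replacement (same size) and record each mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps)]

def percentile_interval(values, level=0.95):
    """Keep the middle `level` of the values by chopping equal tails."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper

# HYPOTHETICAL prices in $1,000s (NOT the actual Mustang sample)
prices = [3.0, 4.9, 5.5, 6.0, 7.0, 8.2, 9.7, 10.5, 11.0, 12.0, 12.9, 13.0, 14.0,
          15.5, 16.0, 18.0, 19.5, 21.0, 22.9, 25.0, 27.0, 30.0, 33.0, 37.9, 45.0]

boot = bootstrap_means(prices)
low, high = percentile_interval(boot)
print(f"95% percentile interval: {low:.2f} to {high:.2f} (in $1,000s)")
```

Changing `level` to 0.90 or 0.99 shows how the interval narrows or widens as different amounts are chopped from each tail.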
How to handle interval estimation?
Bootstrap? Traditional formula? Other?
Some combination? In what order?
How to handle interval estimation?
Bootstrap!
Follows naturally
Data → Sample statistic → How accurate?
Same process for most parameters
Statistic ± 2 SE: Good for moving to
traditional margin of error by formula
Percentile: Good to understand varying
confidence level
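As a worked illustration of the Statistic ± 2 SE form, the Mustang summary statistics given earlier (n = 25, sample mean 15.98, s = 11.11) can be plugged in. This sketch assumes the usual formula for the standard error of a mean, s/√n, which the slides do not state explicitly; a bootstrap SE would be used the same way.

```latex
\[
SE = \frac{s}{\sqrt{n}} = \frac{11.11}{\sqrt{25}} = 2.222,
\qquad
\bar{x} \pm 2\,SE = 15.98 \pm 4.444 = (11.536,\ 20.424)
\]
```

In dollars this is roughly $11,536 to $20,424, close to the percentile interval of $11,930 to $20,238 reported from the bootstrap distribution.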
Sampling Distribution
[Diagram: the population is a “tree” centered at \(\mu\); the samples are its “seeds”]
BUT, in practice we don’t see the “tree” or all of the “seeds”: we
only have ONE seed
Bootstrap Distribution
What can we do with just one seed?
Grow a NEW tree!
[Diagram: a bootstrap “population” grown from the one seed, centered at \(\bar{x}\), shown alongside \(\mu\)]
Use bootstrap errors that we CAN see to estimate
sampling errors that we CAN’T see.
Chris Wild - USCOTS 2013
How to handle interval estimation?
At first: plausible values for parameter
Those not rejected by significance test
Those that do not put observed value of statistic in
tail of null distribution
How to handle interval estimation?
Example: Facial prototyping (cont)
Statistic: 46 of 65 (0.708) put Tim on left
Parameter: Long-run probability that a person
would associate “Tim” with face on left
We reject the value 0.5 for this parameter
What about 0.6, 0.7, 0.8, 0.809, …?
Conduct many (simulation-based) tests
Confident that the probability that a student puts
Tim with face on left is between .585 and .809
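The "conduct many tests" idea above can be sketched as a loop over candidate parameter values, keeping those that a simulation-based test does not reject. Several choices here are assumptions, not taken from the slides: the two-sided p-value is computed by doubling the smaller tail, the cutoff is 0.05, the candidate grid has step 0.01, and all names are hypothetical.

```python
import random

def two_sided_p_value(p0, observed=46, n=65, reps=2_000, rng=None):
    """Simulate n choices with success probability p0; return a two-sided p-value."""
    rng = rng or random.Random()
    counts = [sum(rng.random() < p0 for _ in range(n)) for _ in range(reps)]
    lower_tail = sum(c <= observed for c in counts) / reps
    upper_tail = sum(c >= observed for c in counts) / reps
    # Assumed convention: double the smaller tail, capped at 1
    return min(1.0, 2 * min(lower_tail, upper_tail))

rng = random.Random(1)
candidates = [k / 100 for k in range(45, 96)]  # 0.45, 0.46, ..., 0.95
plausible = [p0 for p0 in candidates if two_sided_p_value(p0, rng=rng) >= 0.05]
print(f"Plausible values run from {min(plausible):.2f} to {max(plausible):.2f}")
```

Because each test is simulated, the endpoints can shift slightly from run to run; they should land near the .585 and .809 quoted above.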
How to handle interval estimation?
Then: statistic ± 2 × SE(of statistic)
Where SE could be estimated from simulated null
distribution
Applicable to other parameters
Then theory-based (z, t, …) using technology
By clicking button
Introducing Statistical Inference
with Resampling Methods
(Part 2)
Robin Lock, St. Lawrence University
Allan Rossman, Cal Poly – San Luis Obispo
One Crank or Two?
What’s a crank?
A mechanism for generating
simulated samples by a random
procedure that meets some criteria.
One Crank or Two?
Randomized experiment: Does wearing socks
over shoes increase confidence while walking
down icy incline?
                                     Socks over shoes   Usual footwear
Appeared confident                          10                 8
Did not                                      4                 7
Proportion who appeared confident          .714              .533
How unusual is such an extreme result, if there
were no effect of footwear on confidence?
One Crank or Two?
How to simulate experimental results under
null model of no effect?
Mimic random assignment used in actual
experiment to assign subjects to treatments
By holding both margins fixed (the crank)
             Socks over shoes   Usual footwear   Total
Confident           10                 8           18   (black cards)
Not                  4                 7           11   (red cards)
Total               14                15           29   (29 cards)
One Crank or Two?
Not much evidence of an effect
Observed result not unlikely to occur by chance alone
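The card-shuffling crank for this experiment could be mimicked as follows. It is a minimal sketch that takes the counts from the table (18 black cards for "confident", 11 red, 14 dealt to the socks group) and uses the count of confident subjects in the socks group as the statistic; the function name, repetition count, and seed are hypothetical.

```python
import random

def shuffle_cards(reps=10_000, seed=1):
    """Deal 29 cards (18 black = confident, 11 red = not) into groups of 14 and 15."""
    rng = random.Random(seed)
    deck = ["black"] * 18 + ["red"] * 11
    counts = []
    for _ in range(reps):
        rng.shuffle(deck)
        socks_group = deck[:14]  # first 14 cards play the socks-over-shoes group
        counts.append(socks_group.count("black"))
    return counts

observed = 10  # confident subjects in the socks-over-shoes group
sim_counts = shuffle_cards()

# How often does random assignment alone give 10 or more confident in that group?
p_value = sum(c >= observed for c in sim_counts) / len(sim_counts)
print(f"Approximate p-value: {p_value:.3f}")
```

The result should be somewhere around 0.27, which agrees with the conclusion that the observed difference is not unlikely to occur by chance alone.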
One Crank or Two?
Two cranks
Example: Compare the mean weekly exercise
hours between male & female students
ExerciseHours by Gender
              F          M          Row Summary
Mean          9.4        12.4       10.6
s             7.40736    8.79833    8.04325
Count         30         20         50
One Crank or Two?
Combine samples
Original samples: \(\bar{x}_f = 9.4\), \(\bar{x}_m = 12.4\); combined: \(\bar{x} = 10.6\)
Resample (with replacement): \(\bar{x}_f = 11.5\), \(\bar{x}_m = 10.25\)
\(\bar{x}_f - \bar{x}_m = 1.25\)
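One Crank or Two?
Shift samples
Original samples: \(\bar{x}_f = 9.4\), \(\bar{x}_m = 12.4\); shifted: \(\bar{x}_f = 10.6\), \(\bar{x}_m = 10.6\)
Resample (with replacement): \(\bar{x}_f = 10.3\), \(\bar{x}_m = 8.8\)
\(\bar{x}_f - \bar{x}_m = 1.5\)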
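The two cranks for comparing means, combining the samples versus shifting them, might be sketched as follows. The data are HYPOTHETICAL (the slides show only summary statistics for the exercise hours), and the function names, repetition count, and seed are illustrative assumptions.

```python
import random
import statistics

def combine_crank(group_a, group_b, rng):
    """Pool both samples, then draw new groups of the original sizes with replacement."""
    pooled = group_a + group_b
    new_a = rng.choices(pooled, k=len(group_a))
    new_b = rng.choices(pooled, k=len(group_b))
    return statistics.fmean(new_a) - statistics.fmean(new_b)

def shift_crank(group_a, group_b, rng):
    """Shift each sample to the combined mean, then resample each with replacement."""
    grand = statistics.fmean(group_a + group_b)
    shifted_a = [x - statistics.fmean(group_a) + grand for x in group_a]
    shifted_b = [x - statistics.fmean(group_b) + grand for x in group_b]
    new_a = rng.choices(shifted_a, k=len(group_a))
    new_b = rng.choices(shifted_b, k=len(group_b))
    return statistics.fmean(new_a) - statistics.fmean(new_b)

# HYPOTHETICAL weekly exercise hours (NOT the actual class data)
females = [2, 4, 5, 6, 8, 9, 10, 12, 14, 20]
males = [3, 6, 8, 10, 12, 15, 18, 25]

rng = random.Random(1)
observed = statistics.fmean(females) - statistics.fmean(males)
null_diffs = {}
for crank in (combine_crank, shift_crank):
    diffs = [crank(females, males, rng) for _ in range(5_000)]
    null_diffs[crank.__name__] = diffs
    p_value = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
    print(f"{crank.__name__}: two-sided p-value = {p_value:.3f}")
```

Both cranks produce a null distribution centered near zero. They differ in what is held common: combining forces the two groups to share one whole distribution, while shifting only equalizes the means and lets each group keep its own spread.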
One Crank or Two?
Example: independent random samples
                  1950   2000   Total
Born in CA         219    258     477
Born elsewhere     281    242     523
Total              500    500    1000
How to simulate sample data under the null hypothesis that
the population proportion was the same in both years?
Crank 2: Generate independent random binomials
(fix column margin)
Crank 1: Re-allocate/shuffle as above (fix both
margins, break association)
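The two cranks for this table could be compared with a sketch like the following. It uses the counts from the table; the pooled proportion 477/1000 as the common null value for Crank 2 is an assumption, as are the function names, repetition count, and seed. Because both samples have 500 people, the difference in counts stands in for the difference in proportions.

```python
import random

N_PER_YEAR = 500
OBSERVED_DIFF = 258 - 219  # born in CA: 2000 count minus 1950 count
POOLED = 477 / 1000        # assumed common proportion under the null

def crank_1(rng):
    """Shuffle: fix both margins by re-allocating the 477 'CA' labels among 1000 people."""
    labels = [1] * 477 + [0] * 523
    rng.shuffle(labels)
    ca_1950 = sum(labels[:N_PER_YEAR])
    ca_2000 = 477 - ca_1950
    return ca_2000 - ca_1950

def crank_2(rng):
    """Independent binomials: fix only the column totals (500 per year)."""
    ca_1950 = sum(rng.random() < POOLED for _ in range(N_PER_YEAR))
    ca_2000 = sum(rng.random() < POOLED for _ in range(N_PER_YEAR))
    return ca_2000 - ca_1950

def p_value(crank, reps=2_000, seed=1):
    """Two-sided p-value: share of simulated differences at least as large as observed."""
    rng = random.Random(seed)
    diffs = [crank(rng) for _ in range(reps)]
    return sum(abs(d) >= OBSERVED_DIFF for d in diffs) / reps

p_shuffle = p_value(crank_1)
p_binomial = p_value(crank_2)
print(f"Crank 1 (shuffle): {p_shuffle:.4f}   Crank 2 (binomials): {p_binomial:.4f}")
```

The two p-values typically come out close to each other, which is part of the case for using a single crank in an introductory course.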
One Crank or Two?
For mathematically inclined students: Use
both cranks, and emphasize distinction
between them
Choice of crank reinforces link between data
production process and determination of p-value
and scope of conclusions
For Stat 101 students: Use just one crank
(shuffling to break the association)
Which statistic to use?
Speaking of 2×2 tables ...
What statistic should be used for the
simulated randomization distribution?
With one degree of freedom, there are many
candidates!
Which statistic to use?
#1 – the difference in proportions
\(\hat{p}_1 - \hat{p}_2\)
... since that’s the parameter being estimated
Which statistic to use?
#2 – count in one specific cell
\(X\)
What could be simpler?
Virtually no chance for students to miscalculate,
unlike with \(\hat{p}_1 - \hat{p}_2\)
Easier for students to track via physical simulation
Which statistic to use?
#3 – Chi-square statistic
\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]
Since it’s a neat way to see a \(\chi^2\)-distribution
Which statistic to use?
#4 – Relative risk
\(\hat{p}_1 / \hat{p}_2\)
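To make the four candidates concrete, they can all be computed from the same 2×2 table. The sketch below uses the socks-over-shoes counts from earlier in the talk (10 and 8 confident, 4 and 7 not); the function name and argument layout are hypothetical.

```python
def two_by_two_statistics(a, b, c, d):
    """a, b = successes in groups 1 and 2; c, d = failures in groups 1 and 2."""
    n1, n2 = a + c, b + d
    total = n1 + n2
    p1, p2 = a / n1, b / n2
    chi_square = 0.0
    for observed, row_total, col_total in [
        (a, a + b, n1), (b, a + b, n2), (c, c + d, n1), (d, c + d, n2)
    ]:
        expected = row_total * col_total / total
        chi_square += (observed - expected) ** 2 / expected
    return {
        "difference in proportions": p1 - p2,
        "count in one cell": a,
        "chi-square": chi_square,
        "relative risk": p1 / p2,
    }

stats = two_by_two_statistics(10, 8, 4, 7)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With both margins fixed by shuffling, the count in one cell determines the other three statistics. The one-sided tests based on the count, the difference in proportions, and the relative risk therefore give the same p-value, while chi-square, which ignores direction, gives a two-sided one.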
Which statistic to use?
More complicated scenarios than 2×2 tables
Comparing multiple groups
With categorical or quantitative response variable
Why restrict attention to chi-square or F-statistic?
Let students suggest more intuitive statistics
E.g., mean of (absolute) pairwise differences in group
proportions/means
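The suggested statistic, the mean of absolute pairwise differences in group means, drops into the same shuffling crank. This is a minimal sketch: the three groups are HYPOTHETICAL data for illustration only, and the function names, repetition count, and seed are assumptions.

```python
import random
import statistics
from itertools import combinations

def mean_abs_pairwise_difference(groups):
    """Average of |mean_i - mean_j| over every pair of groups."""
    means = [statistics.fmean(g) for g in groups]
    return statistics.fmean(abs(a - b) for a, b in combinations(means, 2))

def shuffle_groups(groups, rng):
    """Re-allocate all responses at random to groups of the original sizes."""
    pooled = [x for g in groups for x in g]
    rng.shuffle(pooled)
    shuffled, start = [], 0
    for g in groups:
        shuffled.append(pooled[start:start + len(g)])
        start += len(g)
    return shuffled

# HYPOTHETICAL responses for three groups (illustration only)
groups = [[1, 3, 4, 5], [4, 6, 7, 9], [8, 9, 10, 13]]

rng = random.Random(1)
observed = mean_abs_pairwise_difference(groups)
null_stats = [mean_abs_pairwise_difference(shuffle_groups(groups, rng))
              for _ in range(5_000)]
p_value = sum(s >= observed for s in null_stats) / len(null_stats)
print(f"Observed statistic: {observed:.2f}, approximate p-value: {p_value:.4f}")
```

Swapping in a different statistic only requires replacing `mean_abs_pairwise_difference`; the shuffling step is unchanged.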
What about technology options?
One to Many Samples
Three Distributions
Interact with tails
What about technology options?
Rossman/Chance applets
www.rossmanchance.com/iscam2/
ISCAM (Investigating Statistical Concepts,
Applications, and Methods)
www.rossmanchance.com/ISIapplets.html
ISI (Introduction to Statistical Investigations)
StatKey
www.lock5stat.com/statkey
Statistics: Unlocking the Power of Data
www.rossmanchance.com/jsm2013/
lock5stat.com/talks/RossmanLockJSM2013.pptx
Questions?
Thanks!