Stats CH 7 Powerpoint
Survey Sampling
and Inference
1- 7
Statistical Inference
1- 8
A survey asked 1000 US college
students if they preferred to study
alone or with others. 420 said they
preferred to study alone.
The population is all US college students.
The sample is the 1000 students who were
surveyed.
The parameter of interest is p, the proportion of all
US college students who prefer to study alone.
The statistic p̂ = 0.42 is the proportion of the 1000
students surveyed who prefer to study alone.
Statistical inference: We estimate that 42% of all
US college students prefer to study alone.
1- 9
Bias
A method is Biased if it has a
tendency to produce an untrue value.
Sampling Bias results from taking a sample
1 - 10
It is Important to Know:
What percentage of people who were asked
to participate actually did so?
Did the researchers choose people to
1 - 12
Identify the Possible Biases.
Population: All Americans
1 - 13
Simple Random Sampling
Simple Random Sampling, SRS, involves
randomly drawing people from the
population, without replacement.
An SRS attempts to provide a sample that
1 - 16
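Simple random sampling as described above can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 ID numbers, the sample size of 100, and the function name are hypothetical and do not come from the slides. The standard library's random.sample draws without replacement, so nobody can be chosen twice.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k members of the population at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, k)

# Hypothetical population: ID numbers 1 through 10,000.
population_ids = list(range(1, 10_001))
srs = simple_random_sample(population_ids, 100, seed=1)
print(len(srs), len(set(srs)))  # sample size and number of distinct IDs
```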
Key Point p. 305
Population Parameter vs
Sample Statistic
How well does a sample statistic work as an
estimator of the population parameter?
The population parameter (p, µ, σ) is always the same.
The sample statistic (p̂, x̄, s) changes from sample to
sample.
Example: Take samples of size n = 10 from a population that
has p = 25%.
Sample   Population Parameter   Sample Statistic
1        p = 25%                p̂ = 3/10 = 30%
2        p = 25%                p̂ = 1/10 = 10%
3        p = 25%                p̂ = 2/10 = 20%
4        p = 25%                p̂ = 4/10 = 40%
1 - 17
Sampling Distribution
The probability distribution of a statistic (p̂, x̄, s) is
called a sampling distribution.
Example: Take a sample of size n = 10 from a
population that has p = 0.25.
Value of p̂   Probability of Seeing that Value
0%           0.06
10%          0.18
20%          0.28
30%          0.26
40%          0.14
50%          0.06
60%          0.02
70%          ~ 0.00
Total = 1
[Figure: plot of the sampling distribution; horizontal axis: Sample Proportion, 0% to 80%]
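For a sample of size n = 10 from a population with p = 0.25, the sampling distribution of the sample proportion can also be computed exactly from the binomial formula. The sketch below assumes independent draws; the function name is illustrative. Exact values can differ slightly in the last digit from the rounded figures on the slide.

```python
from math import comb

def sampling_distribution(n, p):
    """Exact probability of each possible sample proportion k/n."""
    return {k / n: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

dist = sampling_distribution(10, 0.25)
for p_hat, prob in dist.items():
    print(f"{p_hat:4.0%}  {prob:.2f}")
print("Total =", round(sum(dist.values()), 6))
```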
1 - 18
Key Point p. 306
Bias and Precision
Bias is measured using the center of the
sampling distribution. It is the distance
between the center and the population value.
Precision is a measure of the spread of the
sampling distribution,
1 - 20
Focus on Proportions
Simulations show us:
The bias of p̂ is 0.
1 - 21
Focus on Proportions
Simulations show us:
The shape of the sampling distribution is
more symmetric for larger sample sizes.
[Figure: sampling distributions for n = 10 and n = 100]
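A small simulation can illustrate these claims about sample proportions. The sketch below is an assumption-laden illustration, not the simulation used for the slides: it borrows p = 0.25 from the earlier n = 10 example, and the number of repetitions and the seed are arbitrary. The center of the simulated p̂ values should sit near p (no bias), and the spread should be smaller for the larger sample size.

```python
import random
from statistics import mean, stdev

def simulate_p_hats(p, n, reps, seed=0):
    """Simulate `reps` samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p = 0.25  # population proportion, borrowed from the earlier n = 10 example
for n in (10, 100):
    p_hats = simulate_p_hats(p, n, reps=5000)
    print(f"n = {n}: center = {mean(p_hats):.3f}, spread = {stdev(p_hats):.3f}")
```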
Mean = p
Standard Error of p̂:
$SE = \sqrt{\frac{0.65(1 - 0.65)}{500}} \approx 0.021 = 2.1\%$
Conclusion: If we drew a random sample of 500
women, we would expect about 65% of them to have an
annual exam, give or take 2.1 percentage points.
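The standard error arithmetic for this example can be checked with a short calculation. The function name is an illustrative choice; the formula is the standard error of a sample proportion, sqrt(p(1 - p)/n), with p = 0.65 and n = 500 taken from the example.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

se = standard_error(0.65, 500)
print(f"SE = {se:.3f}")  # about 0.021, i.e. 2.1 percentage points
```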
1 - 27
7.3
The Central Limit
Theorem for Sample
Proportions
Copyright © 2013 Pearson Education, Inc. All rights reserved
The Central Limit Theorem
The Central Limit Theorem (CLT) gives us a very
good approximation of the sampling distribution of
a statistic (examples: p̂, x̄, s) without our needing to
do simulations.
The theorem is named “Central” because the
concept is central to much of modern statistics.
The CLT has several versions. For estimating a
population proportion, the CLT tells us that the
sampling distribution of p̂ is close to Normal.
Some basic conditions must be met!
1 - 29
Requirements for the Central Limit
Theorem for Sample Proportions
Random and Independent: The sample is
collected randomly and the trials are independent of
each other.
Large Sample:
The sample has at least 10 successes, np ≥ 10,
and at least 10 failures, n(1 – p) ≥ 10.
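The Large Sample requirement is simple arithmetic and can be checked in code; the randomness and independence requirement cannot, since it depends on how the data were collected. In the sketch below the function name is hypothetical, and the two test cases reuse numbers that appear elsewhere in these slides (600 mice with p = 0.78, and samples of size 10 with p = 0.25).

```python
def large_sample_ok(n, p, minimum=10):
    """Check the Large Sample condition: np >= 10 and n(1 - p) >= 10."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= minimum and expected_failures >= minimum

print(large_sample_ok(600, 0.78))  # about 468 successes and 132 failures expected
print(large_sample_ok(10, 0.25))   # only 2.5 successes expected
```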
1 - 32
Key Point p. 312
The Central Limit Theorem
for Sample Proportions
The Central Limit Theorem for Sample Proportions:
If the trials are random and independent and the
sample and population sizes are large, then the
sampling distribution of p̂ is approximately Normal
and follows the distribution
$N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$
If you don’t know p, p̂ can be substituted to
estimate the standard error.
1 - 33
Finding Probabilities with
the Central Limit Theorem: ~Example
78% of all laboratory mice can make it through a
certain maze.
If 600 randomly selected mice attempt the maze,
what is the probability that more than 80% of this
sample will make it through the maze?
Note that all requirements are met:
Random Sample
Large Enough Sample:
1 - 34
By CLT, the distribution for all possible sample
proportions (the sampling distribution)
is approximately Normal.
Mean = 0.78
$SE = \sqrt{\frac{0.78 \times 0.22}{600}} \approx 0.017$
Sampling Distribution: N(0.78, 0.017)
1 - 35
StatCrunch:
Stat/Calculators
/Normal
$P(\hat{p} > 0.8) \approx 0.12$
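The same probability can be reproduced without StatCrunch. The sketch below uses Python's standard-library NormalDist as a stand-in for the Normal calculator; it is an alternative tool, not the one the slides use, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.78, 600
se = sqrt(p * (1 - p) / n)                   # about 0.017
sampling_dist = NormalDist(mu=p, sigma=se)   # N(0.78, 0.017)
prob_more_than_80 = 1 - sampling_dist.cdf(0.80)
print(f"P(p-hat > 0.80) = {prob_more_than_80:.2f}")
```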
1 - 36
Key Point p. 316
The Sample Proportion and
the Empirical Rule
If the conditions of a survey sample satisfy
those required by the CLT,
then the probability that a sample proportion
1 - 37
The Sample Proportion and
the Empirical Rule ~Example
No-till was practiced on 23.5 percent of corn acres in
the Basin states in 2005 (source: Nat’l Sustainable Agriculture Coalition).
If we randomly select 1000 1-acre plots of Basin corn,
the sampling distribution for p̂ is N(0.235, 0.0134),
since $SE = \sqrt{\frac{0.235(1 - 0.235)}{1000}} = 0.0134$.
There is about a 95% probability that the sample proportion
that we get will fall between
0.235 ± 2(0.0134) (“give or take”),
that is, between 20.82% and 26.18%.
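The "give or take two standard errors" range can be checked numerically, along with how much of the Normal curve it actually covers. This is a sketch under the CLT's Normal approximation; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.235, 1000
se = sqrt(p * (1 - p) / n)
low, high = p - 2 * se, p + 2 * se           # two standard errors either side
dist = NormalDist(mu=p, sigma=se)
coverage = dist.cdf(high) - dist.cdf(low)    # Normal area within 2 SE of p
print(f"SE = {se:.4f}, range = {low:.2%} to {high:.2%}, coverage = {coverage:.3f}")
```

The coverage comes out slightly above 0.95, which is why the Empirical Rule's "95%" is an approximation.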
1 - 38
Example of a Failure of the CLT
About half a percent of all people in the
world are living with HIV.
You want to find the probability that out of
1 - 41
Example of a Confidence Interval
Using a sample of apples from my tree, I am
95% confident that the proportion of wormy
apples on my whole tree is:
1 - 42
Example of a Confidence Interval
95% Confidence Interval: 0.18 ± 0.02 or 0.16 to 0.20
Our sample had p̂ = 0.18 and SE = 0.01.
Using the CLT, we know that the probability
The symbol for the multiplier is z*.
1 - 48
Formula 7.1 p. 320
Confidence Intervals
(p is not known!)
When we are trying to estimate an unknown p, we
can use p̂ as an estimate for p when calculating the
standard error.
To find confidence intervals for the population
proportion:
$\hat{p} \pm m$, that is, $\hat{p} \pm z^* SE_{est}$, where $SE_{est} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
z* is a multiplier that is chosen to achieve the
desired confidence level (see table p.320).
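The multiplier z* can also be computed rather than looked up. The sketch below assumes z* is the standard Normal value that leaves the chosen confidence level in the middle of the curve; the list of confidence levels is an illustrative choice and is not copied from the textbook table.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """Multiplier z* that leaves the chosen central area under N(0, 1)."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.2f}")
```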
1 - 49
Example
Use a 95% confidence interval to estimate the
proportion of US drivers who admit to texting while
driving.
Is it plausible that half (0.50) of Americans text
while driving?
1. Take a sample:
200 randomly selected American
drivers were asked if they text
while driving.
48 of them admitted that they did.
1 - 50
Example
2. Check conditions for using the CLT:
The drivers were randomly selected.
3. Estimate the standard error, using p̂ = 48/200 = 0.24:
$SE_{est} = \sqrt{\frac{0.24 \times 0.76}{200}} \approx 0.03$
1 - 51
Example
4. Estimate 1.96 SE (z* for 95%) either side of the sample proportion:
0.24 – 1.96 × 0.03 = 0.24 – 0.0588 ≈ 0.18
0.24 + 1.96 × 0.03 = 0.24 + 0.0588 ≈ 0.30
Based upon our sample, we are 95% confident that
the proportion of all US drivers who text is between
0.18 and 0.30.
It is not plausible that half of US
drivers text while driving. Our
confidence interval does not contain
that proportion.
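The steps of this example can be collected into one small function. This is a sketch with illustrative names: it follows the p̂ ± z* SE_est recipe with z* = 1.96 and uses the unrounded standard error, so its endpoints may differ from the hand calculation in the third decimal place.

```python
from math import sqrt

def proportion_ci(successes, n, z_star=1.96):
    """Confidence interval p-hat +/- z* * SE_est for a population proportion."""
    p_hat = successes / n
    se_est = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se_est
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(48, 200)   # 48 of 200 drivers admitted to texting
print(f"95% CI: {low:.2f} to {high:.2f}")
print("Is 0.50 inside the interval?", low <= 0.50 <= high)
```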
1 - 52
Interpreting Confidence Intervals
300 randomly chosen voters were asked if
they favored the bond initiative to fund a new
college sports arena. 120 did support it. The
95% confidence interval is: (0.34, 0.46).
Since a bond initiative requires over 50% of
the votes to pass and 0.50 is above the entire
confidence interval, it is unlikely that the
bond initiative will pass.
by the formula: $n = \left(\frac{z^*}{m}\right)^2 \cdot \frac{1}{4}$
1 - 57
Formula 7.2 p. 324
The Sample Size Formula
to Estimate a Population
Proportion
When using a 95% confidence level, the
formula simplifies: $n = \frac{1}{m^2}$
m is the desired margin of error.
n will be the approximate sample size
needed to get a margin of error m, assuming
a 95% confidence level.
Decreasing the margin of error m will
increase the sample size n!
1 - 59
Example: Required Sample Size
Find a 95% confidence interval for the
proportion of people who are lactose
intolerant.
Use a margin of error of ± 3%.
How many randomly selected people
do you need to survey?
$n = \frac{1}{m^2} = \frac{1}{0.03^2} = \frac{1}{0.0009} \approx 1111.11$
Rounding up, you need to survey 1112 people.
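The sample size calculation can be wrapped in a short helper. The function name is illustrative, and the sketch covers only the simplified 95% version of the formula, n = 1/m², rounding up because a fraction of a person cannot be surveyed.

```python
from math import ceil

def sample_size_95(margin_of_error):
    """Approximate n for a 95% confidence level: n = 1 / m^2, rounded up."""
    return ceil(1 / margin_of_error**2)

print(sample_size_95(0.03))  # margin of error of 3 percentage points
```

Trying a smaller margin of error, such as 0.01, shows how quickly the required sample size grows.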
1 - 60
Chapter 7
Case Study
Answer:
The AMA poll was actually based on an