
Stats CH 7 Powerpoint

Chapter 7 discusses survey sampling and inference, highlighting a case study by the AMA that faced scrutiny regarding its sampling method. It emphasizes the importance of proper sampling techniques and the potential biases that can affect survey results. The chapter also introduces key statistical concepts such as population, parameter, sample, and the Central Limit Theorem for sample proportions.

Chapter 7

Survey Sampling
and Inference

Copyright © 2013 Pearson Education, Inc. All rights reserved


Case Study

• In 2006, the American Medical Association (AMA)
issued a press release (“Sex and Intoxication among
women more common on spring break according to
AMA poll”) in which it concluded, among other
things, that “83% of the [female, college-attending]
respondents agreed spring break trips involve more
or heavier drinking than occurs on campuses and
74% said spring break trips result in increased
sexual activity.”


Case Study

• The authors of the study claim these percentages
reflect the opinions not only of the 644 women who
responded to the survey but of all women who
participate in spring break.
• The AMA’s website claimed the results were based
on “a nationwide random sample of 644 women
who … currently attend college…. The survey has a
margin of error of +/–4 percentage points at a 95%
level of confidence.”


Case Study

• However, some survey specialists were suspicious.
After a specialist corresponded with the AMA, it
changed its website posting to say the results were
based not on a random sample, but instead on “a
nationwide sample of 644 women … who are part
of an online survey panel.” Margin of error is no
longer mentioned.
• What was wrong with the AMA’s spring break
survey?


Case Study

• Disagreement over how to interpret these results
shows just how difficult inference is.
• In this chapter you’ll see why the method used to
collect data is so important to inference, and how
we use probability, under the correct conditions, to
calculate a margin of error to quantify our
uncertainty.
• At the end of this chapter, you’ll see why the AMA
changed its report.


7.1
Learning about the
World through Surveys



Survey Terminology
• The Population is the entire group of people or
objects we wish to study.
• A Parameter is a numerical value that characterizes
some aspect of the population. Examples: p, µ, σ
• A Census is a survey in which every member of the
population is measured.
• A Sample is a small collection of people or objects
taken from the population.
• A Statistic (also called an Estimator) is a number
calculated from the sample data. Examples: p̂, x̄, s

Statistical Inference

• Statistical inference is the art and science of
drawing conclusions about a population on
the basis of observing only a small subset of
that population.
• Statistical inference always involves
uncertainty, so an important component of
this science is measuring our uncertainty.

A survey asked 1000 US college
students if they preferred to study
alone or with others. 420 said they
preferred to study alone.
• The population is all US college students.
• The sample is the 1000 students who were
surveyed.
• The parameter of interest is p, the proportion of all
US college students who prefer to study alone.
• The statistic p̂ = 420/1000 = 0.42 is the proportion of the 1000
students who prefer to study alone.
• Statistical inference: We estimate that 42% of all
US college students prefer to study alone.

Bias
• A method is biased if it has a
tendency to produce an untrue value.
• Sampling Bias results from taking a sample
that is not representative of the population.
  - Convenience sampling, voluntary response
    sampling, non-response
• Measurement Bias comes from asking
questions that do not produce a true answer.
  - Confusing wording, misleading questions
  - Lack of realism, survey influences responses

It is Important to Know:
• What percentage of people who were asked
to participate actually did so?
• Did the researchers choose people to
participate in the survey or did the people
themselves choose to participate?
• Did the researcher leave out whole segments
of the population who are likely to answer
the question differently from the rest of the
population?

Identify the Possible Biases.
Population: All Americans
• A student asked all 2500 of her Facebook friends if
they preferred Facebook to Twitter.
• A researcher asked 500 randomly selected people,
“Are you in favor of the unfair tax burden that the
hard working successful business people have so
that the lazy unemployed can receive a paycheck
without working?”
• On July 4, CNN posted on their website a question
asking visitors if they supported the current US military
operations. 18,943 people responded.

Identify the Possible Biases.
Population: All Americans

• 100 randomly selected Americans were asked by a
researcher, “Do you currently have a sexually
transmitted disease?”
• Gallup randomly selected 1000 phone numbers
from the yellow pages and then called to ask if the
respondents supported government funding of high-speed rail.
• A researcher stood outside a grocery store and asked
250 shoppers, “Do you eat out at a restaurant at
least three times per week?”

Simple Random Sampling
• Simple Random Sampling (SRS) involves
randomly drawing people from the
population, without replacement.
• An SRS attempts to provide a sample that
represents the whole population.
• If a scientific sampling technique is not used,
we cannot learn anything about the
population by looking at the
sample data.

7.2
Measuring the Quality
of a Survey



Estimator and Estimate
Statisticians evaluate the method used for a
survey, not the outcome of a single survey.

We often use the word estimator to mean the
same thing as estimation method.

An estimate, on the other hand, is a number
produced by our estimation method.

Key Point p. 305
Population Parameter vs
Sample Statistic
How well does a sample statistic work as an
estimator of the population parameter?
The population parameter (p, µ, σ) is always the same.
The sample statistic (p̂, x̄, s) changes from sample to
sample.
Example: Take samples of size n = 10 from a population that
has p = 25%.

Sample   Population Parameter   Sample Statistic
1        p = 25%                p̂ = 3/10 = 30%
2        p = 25%                p̂ = 1/10 = 10%
3        p = 25%                p̂ = 2/10 = 20%
4        p = 25%                p̂ = 4/10 = 40%

Sampling Distribution
The probability distribution of a statistic (p̂, x̄, s) is
called a sampling distribution.
Example: Take a sample of size n = 10 from a
population that has p = 0.25.

Value of p̂   Probability of Seeing that Value
0%           0.06
10%          0.18
20%          0.28
30%          0.26
40%          0.14
50%          0.06
60%          0.02
70%          ~ 0.00
Total = 1

[Figure: sampling distribution of the sample proportion; horizontal axis: Sample Proportion, 0% to 80%]

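The table above can be cross-checked against a binomial model. The sketch below assumes each of the n = 10 draws is an independent success with probability p = 0.25, which is only approximately true when sampling without replacement from a large population. The helper name `sampling_distribution` is hypothetical. The slide's values appear to come from a simulation or from rounding, so small differences in the second decimal place are to be expected.

```python
from math import comb

def sampling_distribution(n, p):
    """Return {sample proportion: probability} under a binomial model."""
    return {k / n: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

dist = sampling_distribution(10, 0.25)
for p_hat, prob in dist.items():
    print(f"{p_hat:4.0%}  {prob:.2f}")
print("Total =", round(sum(dist.values()), 6))
```

The probabilities sum to 1 because every possible value of the sample proportion is listed.
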
Key Point p. 306
Bias and Precision
Bias is measured using the center of the
sampling distribution. It is the distance
between the center and the population value.
Precision is a measure of the spread of the
sampling distribution, using the standard
deviation of the sampling distribution,
which is called the standard error.
[Figure: sampling distribution; horizontal axis: Sample Proportion, 0% to 80%]


Bias and Accuracy, Precision and
Standard Error
• Bias is a measure of accuracy.
  - These results are biased
    towards the right.
• Standard Error is a measure of precision.
  - These results are not very
    precise. They have a wide
    spread about the center.

Focus on Proportions
• Simulations show us:
  - The mean of the distribution of all the
    possible sample proportions always equals
    the population proportion.
  - The bias of p̂ is 0.

Focus on Proportions
• Simulations show us:
  - The precision and bias are independent of
    the population size
  - as long as the population size is at least 10 times
    larger than the sample size.
  - An estimator based upon a sample size of
    n = 20 is just as precise in a population of
    1000 as in a population of a million.

Focus on Proportions
• Simulations show us:
  - The standard error SE will be smaller for
    larger sample sizes.
  - Increasing the sample size improves
    precision!

Focus on Proportions
• Simulations show us:
  - The shape of the sampling distribution is
    more symmetric for larger sample sizes.

[Figure: simulated sampling distributions from a population with p = 0.25, for n = 10 and n = 100; horizontal axis: Sample Proportion]

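The simulation claims on these slides can be tried out directly. The following is a minimal sketch, not the simulation used to produce the slides: it assumes a population large enough that each draw is an independent success with probability p = 0.25, and the function name, seed, and number of repetitions are illustrative choices.

```python
import random
from statistics import mean, pstdev

def simulate_sample_proportions(p, n, reps, rng):
    """Draw `reps` samples of size n and return each sample's proportion of successes."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(7)  # fixed seed so the run is repeatable
results = {}
for n in (10, 100):
    p_hats = simulate_sample_proportions(p=0.25, n=n, reps=5000, rng=rng)
    # center and spread of the simulated sampling distribution
    results[n] = (mean(p_hats), pstdev(p_hats))
    print(f"n = {n:3d}: mean of p-hat = {results[n][0]:.3f}, spread = {results[n][1]:.3f}")
```

Both means should land close to 0.25 (no bias), while the spread for n = 100 should be clearly smaller than for n = 10 (better precision).
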
Focus on Proportions
• Where are these sampling distributions
centered, and how spread out are they?
• Can we calculate the average spread?

[Figure: simulated sampling distributions from a population with p = 0.25, for n = 10 and n = 100; horizontal axis: Sample Proportion]

Focus on Proportions
• Formulas for the mean and standard error of
the sampling distribution of p̂:

Mean = p
$SE = \sqrt{\dfrac{p(1 - p)}{n}}$

  - The mean of the sampling distribution is equal to
    the population proportion. The bias of p̂ is 0.
  - If we don’t know p, we may use p̂ as an estimate.
    This gives an estimate for the standard error:
    $SE_{est} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$

Example
• Only 65% of insured women get an
annual exam. Find the mean and
standard error for the sampling
distribution with sample size 500.
  - Mean: p = 65%
  - Standard Error: $SE = \sqrt{\dfrac{0.65(1 - 0.65)}{500}} \approx 0.021 = 2.1\%$
Conclusion: If we drew a random sample of 500
insured women, we would expect 65% of them to have an
annual exam, give or take 2.1 percentage points.

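The arithmetic in this example is easy to check in a few lines. This is a minimal sketch; `standard_error` is a hypothetical helper name, not something defined in the chapter.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

se = standard_error(0.65, 500)
print(f"SE = {se:.4f}")  # about 0.021, i.e. 2.1 percentage points
```
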
7.3
The Central Limit
Theorem for Sample
Proportions

The Central Limit Theorem
• The Central Limit Theorem (CLT) gives us a very
good approximation of the sampling distribution of
a statistic (examples: p̂, x̄, s) without our needing to
do simulations.
• The theorem is named “Central” because the
concept is central to much of modern statistics.
• The CLT has several versions. For estimating a
population proportion, the CLT tells us that the
sampling distribution of p̂ is close to Normal.
• Some basic conditions must be met!

Requirements for the Central Limit
Theorem for Sample Proportions
• Random and Independent: The sample is
collected randomly and the trials are independent of
each other.
• Large Sample:
  - The sample has at least 10 successes, np ≥ 10,
  - and at least 10 failures, n(1 – p) ≥ 10.
  - If you don’t know p, p̂ can be substituted.
• Large Population: If the sample is collected
without replacement, then the population size is at
least 10 times the sample size.

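The two numerical requirements lend themselves to a quick check. The sketch below is illustrative only: `clt_conditions_met` is a hypothetical helper, and treating an unspecified population size as "effectively unlimited" is an assumption. The inputs reuse figures from the worked examples later in this chapter (600 mice with p = 0.78, and 1000 people with p = 0.005).

```python
def clt_conditions_met(n, p, population_size=None):
    """Check the large-sample and large-population conditions.

    Randomness and independence cannot be checked numerically; they depend
    on how the data were collected.
    """
    enough_successes = n * p >= 10
    enough_failures = n * (1 - p) >= 10
    # population_size=None is treated as "effectively unlimited" (assumption)
    big_population = population_size is None or population_size >= 10 * n
    return enough_successes and enough_failures and big_population

print(clt_conditions_met(n=600, p=0.78))    # 468 expected successes, 132 expected failures
print(clt_conditions_met(n=1000, p=0.005))  # only 5 expected successes
```
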
Notes About the Requirements
• Since random sampling is usually impossible
to do, other sampling techniques are often
used instead. We must use methods that give
a fair representation of the population.
• A large enough sample size is absolutely
necessary. We must check this condition
using our sample data.
• Typically the population of interest is very
large, but one should still be aware of this
requirement.

Probability Distributions for
Sample Proportions

Key Point p. 312
The Central Limit Theorem
for Sample Proportions
• The Central Limit Theorem for Sample Proportions:
If the trials are random and independent and the
sample and population sizes are large, then the
sampling distribution of p̂ is approximately Normal
and follows
$N\left(p,\ \sqrt{\dfrac{p(1 - p)}{n}}\right)$
• If you don’t know p, p̂ can be substituted to
estimate the standard error.

Finding Probabilities with
the Central Limit Theorem: Example
• 78% of all laboratory mice can make it through a
certain maze.
• If 600 randomly selected mice attempt the maze,
what is the probability that more than 80% of this
sample will make it through the maze?
• Note that all requirements are met:
  - Random Sample
  - Large Enough Sample:
    # Successes (np) = 600 x 0.78 = 468 ≥ 10
    # Failures (n(1 – p)) = 600 x 0.22 = 132 ≥ 10
  - Large Population Size: All mice in existence.

Finding Probabilities with
the Central Limit Theorem: Example (continued)
• By CLT, the distribution for all possible sample
proportions (the sampling distribution)
is approximately Normal.
  - Mean = 0.78
  - $SE = \sqrt{\dfrac{0.78 \times 0.22}{600}} \approx 0.017$
  - Sampling Distribution: N(0.78, 0.017)

Finding Probabilities with
the Central Limit Theorem: Example (continued)
• StatCrunch: Stat/Calculators/Normal
• $P(\hat{p} > 0.8) \approx 0.12$

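The same probability can be obtained without StatCrunch. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for the Normal calculator; it is one possible way to do the calculation, not the method shown on the slide.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.78, 600
se = sqrt(p * (1 - p) / n)                  # about 0.017
sampling_dist = NormalDist(mu=p, sigma=se)  # N(0.78, 0.017) from the CLT
prob_above_80 = 1 - sampling_dist.cdf(0.80)
print(f"SE = {se:.3f}, P(p-hat > 0.80) = {prob_above_80:.2f}")
```
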
Key Point p. 316
The Sample Proportion and
the Empirical Rule
• If the conditions of a survey sample satisfy
those required by the CLT,
  - then the probability that a sample proportion
    will fall within two standard errors of the
    population proportion p
  - is about 95%.

The Sample Proportion and
the Empirical Rule: Example
• No-till was practiced on 23.5 percent of corn acres in
the Basin states in 2005. (Source: Nat’l Sustainable Agriculture Coalition)
• If we randomly select 1000 1-acre plots of Basin corn,
the sampling distribution for p̂ is N(0.235, 0.0134),
since $SE = \sqrt{\dfrac{0.235(1 - 0.235)}{1000}} \approx 0.0134$.
• There is about a 95% probability that the sample proportion
that we get will fall between
  - 0.235 ± 2(0.0134)
  - 20.82% to 26.18% (“give or take”)

Example of a Failure of the CLT
• About half a percent of all people in the
world are living with HIV.
• You want to find the probability that out of
1000 randomly selected people, at least 1%
of them are living with HIV.
  - Expected “successes”: np = 1000 x 0.005 = 5 < 10
  - The CLT does not apply.
  - Do not use the Normal Distribution
    to calculate this probability.

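When the Normal approximation is ruled out, one alternative is to compute the probability exactly from the binomial model. This goes slightly beyond what the slide says, so treat it as an illustrative sketch: it assumes the 1000 people are independent draws with p = 0.005, and it shows the Normal approximation only to illustrate how far off it is when the conditions fail.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 1000, 0.005
threshold = 10  # "at least 1%" of 1000 people

# Exact: P(X >= 10) = 1 - P(X <= 9) for X ~ Binomial(n, p)
exact = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(threshold))

# Normal approximation, for comparison only; its conditions are NOT met here
approx = 1 - NormalDist(p, sqrt(p * (1 - p) / n)).cdf(threshold / n)

print(f"exact binomial: {exact:.4f}   Normal approximation: {approx:.4f}")
```

The two answers differ by a wide relative margin, which is the practical meaning of "the CLT does not apply".
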
7.4
Estimating the
Population Proportion
with Confidence
Intervals

Confidence Intervals
• A Confidence Interval is used to estimate
an unknown parameter. Examples: p, µ, σ
• It is calculated from a sample.
• A confidence interval provides:
  1. A range of plausible values for the population
     parameter
  2. A confidence level, which expresses our level
     of confidence in this interval.

Example of a Confidence Interval
• Using a sample of apples from my tree, I am
95% confident that the proportion of wormy
apples on my whole tree is:

0.18 ± 0.02 or 0.16 to 0.20

  - 0.18 is the estimate
  - 0.02 is the margin of error, m
  - 0.16 to 0.20 is the confidence interval
• 95% is the confidence level

Example of a Confidence Interval
• Using a sample of apples from my tree, I am
95% confident that the proportion of wormy
apples on my whole tree is:

0.18 ± 0.02 or 0.16 to 0.20

• Where did these numbers come from?

Example of a Confidence Interval
95% Confidence Interval: 0.18 ± 0.02 or 0.16 to 0.20
• Our sample had p̂ = 0.18 and the SE = 0.01
• Using the CLT, we know that the probability
distribution of p̂ is approximately Normal and centered around
the true population proportion p.
• Thus, there is a 95% chance
that our sample p̂ is closer
than 2 SE, or 0.02, away
from the true population p.
[Figure: Normal curve centered at p; horizontal axis in standard errors]

Example of a Confidence Interval
95% Confidence Interval: 0.18 ± 0.02 or 0.16 to 0.20
• So, I am 95% confident that I got a sample that
has a p̂ within 0.02 of the population p.
• Likewise, I am 95% confident that the
population p is within 0.02 of my sample p̂.
• 95% of such samples will
“catch” the true p within
a margin of error of 0.02.
5% of such samples will miss it!
[Figure: Normal curve centered at p; horizontal axis in standard errors]

Confidence
Interval Interpretation
• The proportion of green M&M’s is 0.16. You take several
samples of 80 M&M’s each and come up with the following
95% confidence intervals:
(.14,.18), (.12,.17), (.15,.19), (.11,.15), (.12,.17), (.15,.20),
(.13,.17), (.14,.19), (.13,.18), (.15,.19), (.15,.20), (.14,.19)
• All of the above confidence intervals except (.11,.15)
successfully contain the population proportion.
• With a 95% confidence level, we expect about 19 out of 20 samples
to “catch” the population proportion.
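The "19 out of 20" idea can be explored by simulation. The sketch below is an illustration under stated assumptions, not the source of the intervals listed on the slide: it assumes each M&M is independently green with probability 0.16, builds each interval as p̂ ± 1.96 estimated standard errors, and uses an arbitrary seed and number of repetitions.

```python
import random
from math import sqrt

def interval_95(successes, n):
    """Approximate 95% confidence interval: p-hat plus or minus 1.96 estimated SEs."""
    p_hat = successes / n
    margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

rng = random.Random(1)  # fixed seed so the run is repeatable
p_true, n, reps = 0.16, 80, 2000
hits = 0
for _ in range(reps):
    greens = sum(rng.random() < p_true for _ in range(n))
    low, high = interval_95(greens, n)
    hits += low <= p_true <= high  # True counts as 1 when the interval catches p

capture_rate = hits / reps
print(f"{capture_rate:.1%} of the {reps} intervals contain p = {p_true}")
```

The capture rate should come out in the neighborhood of 95%; it need not be exact, because the Normal approximation is only approximate for samples of this size.
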


Confidence vs. Margin of Error

• The confidence level measures the capture rate for
our method of finding confidence intervals.
• Increasing the level of confidence increases the
margin of error.
• Decreasing the level of confidence decreases the
margin of error.

Confidence vs. Margin of Error

• These are the margins of error for four
commonly used confidence levels:
[Table p. 320: confidence levels and the corresponding margins of error, given as multiples of the standard error]
• The symbol for the multiplier is z*.

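The table itself is not reproduced here, but multipliers of this kind can be computed from the standard Normal distribution. In the sketch below, the four levels (80%, 90%, 95%, 99%) are an assumption about which levels the table lists, and `z_star` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """Multiplier z* such that the middle `confidence_level` of N(0, 1) lies within +/- z*."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

A higher confidence level gives a larger multiplier, and therefore a larger margin of error for the same standard error.
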
Formula 7.1 p. 320
Confidence Intervals
(p is not known!)
• When we are trying to estimate an unknown p, we
can use p̂ as an estimate for p when calculating the
standard error.
• To find confidence intervals for the population
proportion:
  - $\hat{p} \pm m$, where $m = z^* \cdot SE_{est}$ and $SE_{est} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
  - z* is a multiplier that is chosen to achieve the
    desired confidence level (see table p. 320).

Example
• Use a 95% confidence interval to estimate the
proportion of US drivers who admit to texting while
driving.
• Is it plausible that half (0.50) of US drivers text
while driving?

1. Take a sample:
200 randomly selected American
drivers were asked if they text
while driving.
48 of them admitted that they did.

Example
2. Check conditions for using the CLT:
  - The drivers were randomly selected.
  - Successes: 48 ≥ 10, Failures: 152 ≥ 10
  - Population Size: # of US drivers is very large.

3. Calculate the mean and SEest of the sampling
distribution:
  - Mean = p̂ = 48/200 = 0.24
  - $SE_{est} = \sqrt{\dfrac{0.24 \times 0.76}{200}} \approx 0.03$

Example
4. Estimate 1.96 SE (z* for 95%) on either side of the sample proportion:
0.24 – 1.96 x 0.03 = 0.24 – 0.0588 ≈ 0.18
0.24 + 1.96 x 0.03 = 0.24 + 0.0588 ≈ 0.30
• Based upon our sample, we are 95% confident that
the proportion of all US drivers who text is between
0.18 and 0.30.
• It is not plausible that half of US
drivers text while driving. Our
confidence interval does not contain
that proportion.

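Steps 3 and 4 of this example can be reproduced in code. The following is a minimal sketch of Formula 7.1; `proportion_ci` is a hypothetical helper name, and the default multiplier of 1.96 corresponds to the 95% level used in the example.

```python
from math import sqrt

def proportion_ci(successes, n, z_star=1.96):
    """Confidence interval p-hat +/- z* * SE_est for a population proportion."""
    p_hat = successes / n
    se_est = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se_est
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(48, 200)
print(f"95% CI: {low:.2f} to {high:.2f}")
print("0.50 plausible?", low <= 0.50 <= high)
```

Because the code does not round the standard error to 0.03 before multiplying, its endpoints may differ from the slide's in the third decimal place.
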
Interpreting Confidence Intervals
• 300 randomly chosen voters were asked if
they favored the bond initiative to fund a new
college sports arena. 120 did support it. The
95% confidence interval is: (0.34, 0.46).
• Since a bond initiative requires over 50% of
the votes to pass and 0.50 is above the entire
confidence interval, it is unlikely that the
bond initiative will pass.


StatCrunch and Confidence Intervals
• 395 of the 600 randomly surveyed
students purchased an e-text.
Find a 99% confidence interval.
• Stat ► Proportions ► One sample ► with
Summary
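For readers without StatCrunch, the same interval can be worked out by hand from Formula 7.1. The sketch below uses Python's standard library to obtain the 99% multiplier; StatCrunch's own output may differ slightly in rounding, so this is a cross-check rather than a reproduction of that output.

```python
from math import sqrt
from statistics import NormalDist

successes, n, level = 395, 600, 0.99
p_hat = successes / n
z = NormalDist().inv_cdf(0.5 + level / 2)      # about 2.576 for 99%
margin_99 = z * sqrt(p_hat * (1 - p_hat) / n)  # z* times the estimated standard error
print(f"p-hat = {p_hat:.3f}, 99% CI: {p_hat - margin_99:.3f} to {p_hat + margin_99:.3f}")
```
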


Confidence Intervals Summary
• Use a confidence interval to get plausible
bounds on a population proportion.
  - Do not use if $n\hat{p} < 10$ or $n(1 - \hat{p}) < 10$.
  - The sample must be an SRS.
• The confidence level of 95% is standard.
  - A lower level, e.g. 90%, can be used if you need
    a smaller margin of error.
  - A higher level, e.g. 99%, can be used at the
    expense of a higher margin of error.


7.5
Margin of Error and
Sample Size for
Proportions

The Sample Size

• Too small a sample size will result in a
larger margin of error than is wanted.
• Too large a sample size will result in
unnecessary expense of time and
resources.
• The sample size is determined
by the formula: $n = \left(\dfrac{z^*}{m}\right)^2 \cdot \dfrac{1}{4}$

Formula 7.2 p. 324
The Sample Size Formula
to Estimate a Population
Proportion
• When using a 95% confidence level, the
formula simplifies (taking z* ≈ 2): $n = \dfrac{1}{m^2}$
• m is the desired margin of error.
• n will be the approximate sample size
needed to get a margin of error m, assuming
a 95% confidence level.

Formula 7.2 p. 324
The Sample Size Formula
to Estimate a Population
Proportion
• When using a 95% confidence level, the
formula simplifies: $n = \dfrac{1}{m^2}$
• m is the desired margin of error.
• Decreasing the margin of error m will
increase the sample size n!

Example: Required Sample Size
• Find a 95% confidence interval for the
proportion of people who are lactose
intolerant.
• Use a margin of error of ± 3%.
• How many randomly selected people
do you need to survey?

$n = \dfrac{1}{m^2} = \dfrac{1}{0.03^2} = \dfrac{1}{0.0009} \approx 1111.1$
• You need to survey 1112 people.

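The sample size calculation can be sketched as a small function. The factor 1/4 in the general formula is the largest possible value of p̂(1 – p̂), reached at p̂ = 0.5, so the result is a worst-case sample size. `required_sample_size` is a hypothetical helper name; its default multiplier of 2 reflects the rounding that turns the general formula into n = 1/m² at the 95% level.

```python
from math import ceil

def required_sample_size(margin, z_star=2.0):
    """Smallest n whose worst-case margin of error is at most `margin`: n = (z*/m)^2 * 1/4."""
    return ceil((z_star / margin) ** 2 / 4)

print(required_sample_size(0.03))               # 95% level with z* rounded to 2
print(required_sample_size(0.03, z_star=1.96))  # same level with z* = 1.96
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target. Using z* = 1.96 instead of 2 gives a slightly smaller requirement.
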
Chapter 7
Case Study



Case Study

• In 2006, the American Medical Association (AMA)
issued a press release (“Sex and Intoxication among
women more common on spring break according to
AMA poll”) in which it concluded, among other
things, that “83% of the [female, college-attending]
respondents agreed spring break trips involve more
or heavier drinking than occurs on campuses and
74% said spring break trips result in increased
sexual activity.”


Case Study

• However, some survey specialists were suspicious.
After a specialist corresponded with the AMA, it
changed its website posting to say the results were
based not on a random sample, but instead on “a
nationwide sample of 644 women … who are part
of an online survey panel.” Margin of error is no
longer mentioned.


Case Study

• What was wrong with the American Medical
Association’s spring break survey?

Answer:
• The AMA poll was actually based on an
“online survey panel”, which consists of a
group of people who agree to take part in
several different online surveys in exchange
for a small payment.

Case Study

• For such a survey, it is impossible to find a valid confidence
interval for the true proportion of women who “agree that
spring break trips involve more/heavier drinking than occurs
on campus” because (1) our estimates might be biased, and
(2) the true percentage might lie much further from our
estimate than two standard errors.

• For this reason, the AMA ended up removing the margin of error
from its website and no longer claimed that the figures were
a valid inference for all college-aged women.
