12 - Sampling Distribution of Statistics
• A statistic is a number that can be computed from the sample data without making
use of any unknown parameters.
In practice, we often use a statistic to estimate an unknown parameter.
• Different random samples yield different statistics (recall sampling error). We need to be able to
describe the sampling distribution of possible statistic values in order to perform statistical
inference.
• We can think of a statistic as a random variable because it takes numerical values that describe the
outcomes of the random sampling process.
• Therefore, we can examine the probability distribution of a statistic using what we’ve learned so far.
[Figure: a population with unknown parameters, a hypothesis about them, and data collected from a representative sample.]
• St. Andrew’s College received 900 applications from prospective students. The application form
contains a variety of information including:
the individual’s (SAT) score
and whether or not the individual desires on-campus housing
• At a meeting in a few hours, the Director of Admissions would like to announce the following for the
population of 900 applicants:
the average SAT score $\mu$, and
the proportion of applicants that want to live on campus $p$
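• $\bar{x}$ as a point estimator for $\mu$:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{50{,}520}{30} = 1684$$
Note: drawing again would have identified a different sample, which would have resulted in different point estimates (sampling error).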
• $s$ as a point estimator for $\sigma$:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{210{,}512}{29}} = 85.2$$
• $\bar{p}$ as a point estimator for $p$:
$$\bar{p} = \frac{\sum y_i}{n} = \frac{20}{30} = 0.67$$
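The three point estimates above can be reproduced from the sums quoted on the slide. This is a minimal sketch; the variable names are illustrative and not part of the original material.

```python
import math

n = 30                  # sample size
sum_x = 50_520          # sum of the 30 sampled SAT scores
sum_sq_dev = 210_512    # sum of squared deviations from the sample mean
num_yes = 20            # sampled applicants who want on-campus housing

x_bar = sum_x / n                      # point estimate of mu
s = math.sqrt(sum_sq_dev / (n - 1))    # point estimate of sigma (divides by n - 1)
p_bar = num_yes / n                    # point estimate of p

print(f"x_bar = {x_bar:.0f}, s = {s:.1f}, p_bar = {p_bar:.2f}")
```

Note that the sample standard deviation divides by n - 1, while the population standard deviation divides by N.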
Point Estimation
Example: St. Andrew’s College
• Once all the data for the 900 applicants were entered in the college’s database, the
values of the population parameters of interest were calculated.
• The population mean SAT score,
$$\mu = \frac{\sum x_i}{N} = \frac{1{,}527{,}300}{900} = 1697$$
• The population standard deviation for SAT score,
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 87.4$$
• Population proportion wanting on-campus housing,
$$p = \frac{\sum y_i}{N} = \frac{648}{900} = 0.72$$
Summary of Point Estimates
Example: St. Andrew’s College
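Population parameter                                   Parameter value   Point estimator                                    Point estimate
$\mu$ = population mean SAT score                      1697              $\bar{x}$ = sample mean SAT score                  1684
$\sigma$ = population standard deviation, SAT score    87.4              $s$ = sample standard deviation, SAT score         85.2
$p$ = population proportion wanting campus housing     .72               $\bar{p}$ = sample proportion wanting housing      .67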
• A: If we keep on taking larger and larger samples, the statistic $\bar{x}$ is guaranteed to get closer and closer to the parameter $\mu$.
As the sample size increases, $\bar{x}$ gets closer to $\mu$.
• The same can be said for $p$ (the population proportion) and $\bar{p}$ (the sample proportion).
• If we took every one of the possible samples of a certain size, calculated the sample
mean for each, and graphed all of those values, we’d have a sampling distribution.
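The idea of taking every possible sample can be tried directly on a very small population. The sketch below uses a made-up five-value population (an assumption for illustration, not data from the slides), enumerates all samples of each size without replacement, and reports how spread out the sample means are.

```python
from itertools import combinations
from statistics import fmean, pstdev

population = [2, 4, 6, 8, 10]   # hypothetical tiny population
mu = fmean(population)          # population mean

spread_by_n = {}
for n in range(1, len(population) + 1):
    # Every possible sample of size n (without replacement) and its mean.
    sample_means = [fmean(sample) for sample in combinations(population, n)]
    spread_by_n[n] = pstdev(sample_means)   # spread of the sampling distribution
    print(f"n={n}: {len(sample_means)} samples, "
          f"mean of sample means={fmean(sample_means):.2f}, "
          f"spread={spread_by_n[n]:.3f}")
```

For every sample size the sample means average out to the population mean, and their spread shrinks as n grows, which is the sense in which larger samples bring the sample mean closer to the population mean.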
Population: 100 blue balls and 100 red balls.
Define: $p$ = proportion of red balls in the population, and $\bar{p}$ = proportion of red balls in a sample.
$$E(\bar{x}) = \mu$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
$\bar{x}$ is approximately $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
• The central limit theorem allows us to use Normal probability calculations to
answer questions about sample means from many observations even when the
population distribution is not Normal.
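A small simulation can make the central limit theorem concrete. The sketch below assumes a strongly right-skewed exponential population (chosen only for illustration; it is not from the slides), draws many samples of size 40, and compares the center and spread of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(1)          # fixed seed so the run is repeatable
n, reps = 40, 5000      # sample size and number of repeated samples (arbitrary choices)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = fmean(sample_means)        # should be close to mu
spread = pstdev(sample_means)       # should be close to sigma / sqrt(n)
theoretical_se = sigma / sqrt(n)

# Under a Normal curve, roughly 68% of values fall within one standard error of the mean.
share_within_1se = sum(abs(m - mu) <= theoretical_se for m in sample_means) / reps

print(f"mean of sample means = {center:.3f}")
print(f"sd of sample means   = {spread:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
print(f"share within 1 SE    = {share_within_1se:.3f}")
```

Even though individual observations are far from Normal, the sample means behave approximately like a Normal variable centered at the population mean.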
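Because $\bar{x}$ is approximately $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, probabilities about $\bar{x}$ are found by standardizing:
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
For the St. Andrew’s College example, $E(\bar{x}) = \mu = 1697$.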
The sampling distribution of $\bar{x}$
Example: St. Andrew’s College
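• What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within ±10 of the actual population mean $\mu$?
• In other words, what is the probability that $\bar{x}$ will be between 1687 and 1707?
• Step 1: Calculate the z-value at the upper endpoint of the interval, using the standard error $\sigma_{\bar{x}} = 87.4/\sqrt{30} = 15.96$: $z = (1707 - 1697)/15.96 = .63$.
• Step 2: Find the area under the curve to the left of the upper endpoint: $P(z \le .63) = .7357$.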
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .
[Figure: sampling distribution of $\bar{x}$ for SAT scores, with $\sigma_{\bar{x}} = 15.96$; the area to the left of $\bar{x} = 1707$ is .7357.]
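• Step 3: Calculate the z-value at the lower endpoint of the interval: $z = (1687 - 1697)/15.96 = -.63$.
• Step 4: Find the area under the curve to the left of the lower endpoint: $P(z \le -.63) = 1 - .7357 = .2643$.
[Figure: sampling distribution of $\bar{x}$; the area to the left of $\bar{x} = 1687$ is .2643.]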
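• The probability that the sample mean SAT score will be between 1687 and 1707 is:
$$P(1687 \le \bar{x} \le 1707) = P(-.63 \le z \le .63) = .7357 - .2643 = .4714$$
[Figure: sampling distribution of $\bar{x}$; the area between 1687 and 1707, centered at 1697, is .4714.]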
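The table lookup can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library, once with the exact z-value and once with z rounded to two decimals as the table requires; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1697, 87.4, 30
se = sigma / sqrt(n)                    # standard error of the mean, about 15.96

# Exact calculation on the sampling distribution of the sample mean.
sampling_dist = NormalDist(mu, se)
prob_exact = sampling_dist.cdf(1707) - sampling_dist.cdf(1687)

# Table-style calculation: round z to two decimals before looking up the area.
z = round((1707 - mu) / se, 2)          # 0.63
std_normal = NormalDist()
prob_table = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"standard error = {se:.2f}")
print(f"exact probability = {prob_exact:.4f}")
print(f"table-style probability (z = {z}) = {prob_table:.4f}")
```

The two answers differ slightly (roughly .469 versus .471) only because the table method rounds z to two decimals.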
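$$E(\bar{p}) = p$$
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$$
where $\sigma_{\bar{p}}$ is the standard error of the proportion.
$\bar{p}$ is approximately $N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right)$, so
$$Z = \frac{\bar{p} - p}{\sqrt{p(1 - p)/n}}$$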
Scenario 2: Population distribution unknown or not normal, and sample size at least 30
• According to the central limit theorem (CLT), the sampling distribution of $\bar{x}$ can be approximated by a normal distribution whenever the sample size is 30 or more.
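$E(\bar{p}) = p = .72$
• Step 1: Calculate the z-value at the upper endpoint of the interval, $\bar{p} = .77$: $z = (.77 - .72)/.082 = .61$.
• Step 2: Find the area under the curve to the left of the upper endpoint: $P(z \le .61) = .7291$.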
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .
$$\sigma_{\bar{p}} = \sqrt{\frac{.72(1 - .72)}{30}} = 0.082$$
[Figure: sampling distribution of $\bar{p}$; the area to the left of $\bar{p} = .77$ is .7291.]
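• Step 3: Calculate the z-value at the lower endpoint of the interval, $\bar{p} = .67$: $z = (.67 - .72)/.082 = -.61$.
• Step 4: Find the area under the curve to the left of the lower endpoint: $P(z \le -.61) = .2709$.
[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = 0.082$; the area to the left of $\bar{p} = .67$ is .2709.]
• The probability that the sample proportion will be between .67 and .77 is $P(.67 \le \bar{p} \le .77) = .7291 - .2709 = .4582$.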
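The same check works for the sample proportion. This is a minimal sketch using the standard library only; it follows the table method by rounding z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.72, 30
se_p = sqrt(p * (1 - p) / n)             # standard error of the proportion, about 0.082

z_upper = round((0.77 - p) / se_p, 2)    # z at the upper endpoint, 0.61
std_normal = NormalDist()
area_upper = std_normal.cdf(z_upper)     # area to the left of .77
area_lower = std_normal.cdf(-z_upper)    # area to the left of .67 (by symmetry)
prob = area_upper - area_lower

print(f"standard error = {se_p:.3f}")
print(f"area left of .77 = {area_upper:.4f}, area left of .67 = {area_lower:.4f}")
print(f"P(.67 <= p_bar <= .77) = {prob:.4f}")
```

With only 30 applicants per sample, the estimate lands within .05 of the true proportion a bit less than half the time.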
The College Board reported the following mean scores for the three parts of the Scholastic Aptitude Test (SAT). Assume that the population standard deviation on each part of the test is $\sigma$. A simple random sample (SRS) of 90 test takers was drawn.
b) What is the probability that a sample of 90 test takers will provide a sample mean test score within 10 points of the population mean of 502 on the Critical Reading part of the test? Within 10 points means $492 \le \bar{x} \le 512$.
c) What is the probability that a sample of 90 test takers will provide a sample mean test score within 10 points of the population mean of 515 on the Mathematics part of the test? Compare this probability to the value computed in part (b).
Within 10 points means $505 \le \bar{x} \le 525$.
The probabilities are the same for the Mathematics and Critical Reading parts of the SAT because the standard error is the same in both cases. The fact that the means differ does not affect the probability calculations.
d) What is the probability that a sample of 100 test takers will provide a sample mean test score within 10 points of the population mean of 494 on the Writing part of the test? Comment on the differences between this probability and the values computed in parts (b) and (c).
Note that the standard error is smaller because the sample size is larger.
Within 10 points means $484 \le \bar{x} \le 504$.
The probability is larger here than in parts (b) and (c) because the larger sample size has made the standard error smaller.
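The reasoning in these exercises can be sketched in code. The value of the population standard deviation is not shown in the text above, so `SIGMA = 100` below is a purely hypothetical placeholder, and `prob_within` is an illustrative helper name; substitute the value given in the exercise.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(margin, sigma, n):
    """P(sample mean is within +/- margin of the population mean) under the Normal approximation."""
    se = sigma / sqrt(n)      # standard error of the mean
    z = margin / se
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)

SIGMA = 100   # hypothetical placeholder; the exercise text above does not show the value

reading = prob_within(10, SIGMA, 90)     # population mean 502 never enters the calculation
math_part = prob_within(10, SIGMA, 90)   # population mean 515, same sigma and n
writing = prob_within(10, SIGMA, 100)    # larger sample, smaller standard error

print(f"Critical Reading (n=90): {reading:.4f}")
print(f"Mathematics (n=90):      {math_part:.4f}")
print(f"Writing (n=100):         {writing:.4f}")
```

The population mean is not an argument of the function, which is why the Critical Reading and Mathematics probabilities match; only the standard deviation, the sample size, and the margin matter.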