Chapter 07
[Figure: Simple random sample. Samples S1, S2, S3, S4, ..., SN each have the
same chance of selection (1/N).]
SPs and EDs
Simple random sampling is a commonly used SP in which every sample of size 𝑛
has the same chance of being selected. E.g., suppose you want to select a
sample of size 𝑛 = 2 from a population containing 𝑁 = 4 objects. If the four
objects are identified by the symbols 𝑥₁, 𝑥₂, 𝑥₃ and 𝑥₄, there are six
distinct pairs that could be selected.
[Table: Ways of Selecting a Sample of Size 2 from 4 Objects]
If the sample of 𝑛 = 2 observations is selected
so that each of these six samples has the same
chance—one out of six or 1/6—of selection, then
the resulting sample is called a simple random
sample, or just a random sample.
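The six equally likely samples can be listed mechanically. The sketch below is only an illustration: the string labels "x1" to "x4" are hypothetical stand-ins for the four objects, and the enumeration uses Python's standard itertools module.

```python
from itertools import combinations

# Hypothetical labels for the N = 4 objects in the population.
population = ["x1", "x2", "x3", "x4"]
n = 2  # sample size

# Every distinct sample of size n, ignoring order.
samples = list(combinations(population, n))

# Under simple random sampling each sample is equally likely.
prob = 1 / len(samples)

for s in samples:
    print(s, f"chance of selection = 1/{len(samples)}")
```

With 𝑁 = 4 and 𝑛 = 2 the list has six entries, so each sample has chance 1/6 of selection.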
Statistics and Sampling Distributions (SDs)
When we select a random sample from a population, the numerical
descriptive measures we calculate from the sample are called statistics.
Statistics and SDs
Example. A population consists of 𝑁 = 5 numbers: 3, 6, 9, 12, 15. If a
random sample of size 𝑛 = 3 is selected without replacement, find the
sampling distributions for the sample mean 𝑥̄ and the sample median 𝑚.
𝜇 = (3 + 6 + 9 + 12 + 15)/5 = 9 and 𝑀 = 9
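One way to obtain both sampling distributions is to enumerate every possible sample. The following is a minimal sketch, not necessarily the method used in the original slides: it lists all samples of size 3 drawn without replacement, treats them as equally likely, and tabulates the mean and median. The helper name sampling_distribution is an assumption made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median

population = [3, 6, 9, 12, 15]
n = 3

# All samples of size n without replacement; each is equally likely.
samples = list(combinations(population, n))

def sample_mean(sample):
    # Exact arithmetic avoids floating-point keys in the tables.
    return Fraction(sum(sample), len(sample))

def sampling_distribution(statistic):
    """Map each value of the statistic to its probability."""
    counts = Counter(statistic(s) for s in samples)
    return {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}

mean_dist = sampling_distribution(sample_mean)
median_dist = sampling_distribution(median)

print("mean:  ", {int(v): float(p) for v, p in mean_dist.items()})
print("median:", {int(v): float(p) for v, p in median_dist.items()})

# Probability that each estimator misses the target (9) by 3.
miss_mean = mean_dist[6] + mean_dist[12]
miss_median = median_dist[6] + median_dist[12]
print(float(miss_mean), float(miss_median))
```

There are 10 possible samples. The sample mean takes the values 6 through 12, while the sample median takes only the values 6, 9 and 12.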
[Figure: Probability histograms for the sampling distributions of the sample
mean, 𝑥̄, and the sample median, 𝑚.]
Remember that the population mean is 𝜇 = 9 and that the population
median is also 𝑀 = 9, the exact center of both sampling distributions.
If we only had a sample, and didn’t know the values for 𝜇 and 𝑀, we might
consider using their sample equivalents, 𝑥̄ and 𝑚, as estimators. But which of
the two estimators is better?
Both SDs are centered on the “target,” that is, the population mean or
median. The sample median misses the target by 3 when it is either 6 or 12,
which happens .3 + .3 = .6 or 60% of the time.
The sample mean also misses the target by 3 when it is either 6 or 12,
but this only happens .1 + .1 = .2 or 20% of the time.
Eighty percent of the time, the sample mean is closer to its target,
which is the population mean 𝜇.
The Central Limit Theorem (CLT)
This theorem states that sums and means of random samples of
measurements drawn from a population tend to have an approximately
normal distribution.
Suppose you toss a balanced die 𝑛 = 1 time. The random variable 𝑥 is the
number observed on the upper face. This familiar random variable can take six
values, each with probability 1/6. The shape of the distribution is flat—
generally called a discrete uniform distribution—and is symmetric about the
mean 𝜇 = 3.5, with a standard deviation 𝜎 = 1.71.
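The stated mean and standard deviation follow directly from the definitions 𝜇 = ∑𝑥·𝑝(𝑥) and 𝜎² = ∑(𝑥 − 𝜇)²·𝑝(𝑥). A short check, written as a sketch with exact fractions for the intermediate steps:

```python
from fractions import Fraction
from math import sqrt

faces = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)  # each face of a balanced die is equally likely

mu = sum(x * p for x in faces)                    # population mean
variance = sum((x - mu) ** 2 * p for x in faces)  # population variance
sigma = sqrt(variance)                            # population standard deviation

print(float(mu), round(sigma, 2))
```

The variance works out to 35/12, whose square root rounds to 1.71.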
CLT
When all 36 possible averages (for 𝑛 = 2 tosses of the die) are consolidated
into a statistical table, the result is the sampling distribution of
𝑥̄ = ∑𝑥ᵢ/𝑛.
Sampling Distribution of 𝑥̄
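The table can be rebuilt by enumeration. This sketch assumes the 36 averages come from 𝑛 = 2 tosses of the die (6 × 6 equally likely outcomes), and it uses exact fractions for the probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely ordered outcomes of two tosses (assumed n = 2).
outcomes = list(product(faces, repeat=2))

# Count how often each average occurs, then convert counts to probabilities.
counts = Counter(Fraction(a + b, 2) for a, b in outcomes)
dist = {avg: Fraction(c, len(outcomes)) for avg, c in sorted(counts.items())}

for avg, prob in dist.items():
    print(float(avg), prob)
```

The distribution is no longer flat: it peaks at 𝑥̄ = 3.5 and tapers symmetrically toward 1 and 6.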
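When 𝑛 is large, the sampling distribution of 𝑥̄ is approximately normal, with
mean 𝜇 and standard deviation 𝜎/√𝑛.
The approximation becomes more accurate as 𝑛 becomes large.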
When the Sample Size Is Large Enough to Use the Central Limit Theorem
1) If the sampled population is normal, then the sampling distribution of
𝑥̄ will also be normal, no matter what sample size you choose. This result
can be proven theoretically, but it should not be too difficult for you to
accept without proof.
2) When the sampled population is approximately symmetric, the sampling
distribution of 𝑥̄ becomes approximately normal for relatively small values of
𝑛.
3) When the sampled population is skewed, the sample size 𝑛 must be larger,
with 𝑛 at least 30 before the sampling distribution of 𝑥̄ becomes
approximately normal.
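The third guideline can be illustrated with a small simulation. Everything specific here is an assumption chosen for illustration and does not come from the slides: the population is exponential with rate 1 (strongly right-skewed), the number of repetitions is 5000, and the seed is fixed so the run is repeatable. The sketch measures the skewness of the simulated sample means, which should shrink toward 0 (the value for a normal distribution) as 𝑛 grows.

```python
import random

def skewness(values):
    """Moment-based skewness: third central moment over sd cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def simulate_means(n, reps=5000, seed=0):
    """Draw `reps` samples of size n from an exponential(1) population
    (an assumed skewed population) and return their sample means."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

# Skewness of the simulated sampling distribution of the mean.
skew = {n: skewness(simulate_means(n)) for n in (2, 5, 30)}

for n, s in skew.items():
    print(f"n = {n:2d}: skewness of sample means = {s:.2f}")
```

For this particular population the theoretical skewness of 𝑥̄ is 2/√𝑛, so it is still noticeably positive at 𝑛 = 5 and much closer to 0 at 𝑛 = 30.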
The SD of the Sample Mean
Standard Error of the Sample Mean
Example. The duration of Alzheimer’s disease from the time
symptoms first appear until death ranges from 3 to 20 years; the
average is 8 years with a standard deviation of 4 years. The
administrator of a large medical center randomly selects the medical
records of 30 deceased Alzheimer’s patients from the medical center’s
database and records the average duration.
Find the approximate probabilities for these events:
1. The average duration is less than 7 years.
2. The average duration exceeds 7 years.
3. The average duration lies within 1 year of the population mean 𝜇 = 8.
Solution. Sampling Plan: Since the administrator has selected a
random sample from the database at this medical center, he can draw
conclusions only about past, present, or future patients with
Alzheimer’s disease at this medical center.
Because the mean 𝜇 = 8 lies closer to the minimum duration (3 years) than to
the maximum (20 years), the population distribution is probably
skewed to the right, with a few patients living a long time after the onset
of the disease.
Sampling Distribution of 𝑥̄:
Regardless of the shape of the population distribution, the sampling
distribution of 𝑥̄ has mean 𝜇 = 8 and standard deviation
𝜎/√𝑛 = 4/√30 = .73.
In addition, because the sample size is 𝑛 = 30, the CLT ensures the
approximate normality of its sampling distribution.
The probability that 𝑥̄ is less than 7.
To find this area, you need to calculate the value of 𝑧 corresponding to
𝑥̄ = 7:
𝑧 = (𝑥̄ − 𝜇)/(𝜎/√𝑛) = (7 − 8)/.73 = −1.37
𝑃(𝑥̄ < 7) = 𝑃(𝑧 < −1.37) = .0853
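The same normal approximation handles all three events in the example. The sketch below computes the standard normal CDF from math.erf instead of a printed table, so its results differ slightly in the third or fourth decimal place from values obtained with 𝑧 rounded to −1.37. The helper normal_cdf and the variable names are choices made for this illustration.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 8, 4, 30   # values given in the example
se = sigma / sqrt(n)      # standard error of the mean, about .73

# 1. Average duration is less than 7 years.
z = (7 - mu) / se
p_less_7 = normal_cdf(z)

# 2. Average duration exceeds 7 years (complement of event 1).
p_more_7 = 1 - p_less_7

# 3. Average duration lies within 1 year of mu, i.e. between 7 and 9.
p_within_1 = normal_cdf((9 - mu) / se) - normal_cdf((7 - mu) / se)

print(f"SE = {se:.2f}, z = {z:.2f}")
print(f"P(mean < 7)     = {p_less_7:.4f}")
print(f"P(mean > 7)     = {p_more_7:.4f}")
print(f"P(7 < mean < 9) = {p_within_1:.4f}")
```

With the table value 𝑃(𝑧 < −1.37) = .0853, the second probability is 1 − .0853 = .9147 and the third is .9147 − .0853 = .8294; the computed values agree with these to about two decimal places.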