
Chapter 07

Chapter 07 discusses sampling distributions, focusing on sampling plans and experimental designs, particularly simple random sampling. It explains the concept of statistics derived from samples and introduces the Central Limit Theorem, which states that sample means tend to follow a normal distribution as sample size increases. Additionally, it provides examples of calculating probabilities related to sample means and standard errors in the context of Alzheimer's disease duration.

Statistics

Chapter 07 – Sampling Distribution


Sampling Plans (SPs) and Experimental Designs (EDs)
 The way a sample is selected is called the SP or ED.

[Figure: Simple random sample. Each of the population units S1, S2, S3, S4, ..., SN has the same chance of selection (1/N).]
 Simple random sampling is a commonly used SP in which every sample of size
𝑛 has the same chance of being selected. E.g., suppose you want to select a
sample of size 𝑛 = 2 from a population containing 𝑁 = 4 objects. If the four
objects are identified by the symbols 𝑥1, 𝑥2, 𝑥3 and 𝑥4, there are six
distinct pairs that could be selected.

[Table: Ways of Selecting a Sample of Size 2 from 4 Objects]

 If the sample of 𝑛 = 2 observations is selected so that each of these six
samples has the same chance, one out of six or 1/6, of selection, then the
resulting sample is called a simple random sample, or just a random sample.
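The six pairs can be listed mechanically. The following is a minimal Python sketch, assuming the four objects are represented by the labels "x1" to "x4"; the variable names are illustrative and not part of the slides.

```python
from itertools import combinations
import random

objects = ["x1", "x2", "x3", "x4"]  # assumed labels for the N = 4 objects

# Every distinct sample of size n = 2 (order within a pair does not matter).
samples = list(combinations(objects, 2))
for s in samples:
    print(s)

# Under simple random sampling each of these samples is equally likely.
prob_each = 1 / len(samples)

# random.sample draws without replacement, so each pair has the same chance.
one_draw = tuple(random.sample(objects, 2))
print("selected:", one_draw)
```

Running it prints the six pairs followed by one randomly selected pair; the selected pair changes from run to run.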
Statistics and Sampling Distributions (SDs)
 When we select a random sample from a population, the numerical
descriptive measures we calculate from the sample are called statistics.

 Example. A population consists of 𝑁 = 5 numbers: 3, 6, 9, 12, 15. If a
random sample of size 𝑛 = 3 is selected without replacement, find the
sampling distributions for the sample mean 𝑥̄ and the sample median 𝑚.

 Solution. We are sampling from the population that contains five distinct
numbers, and each is equally likely, with probability 𝑝(𝑥) = 1/5. We can
easily find the population mean and median as:

$$\mu = \frac{3 + 6 + 9 + 12 + 15}{5} = 9 \quad \text{and} \quad M = 9$$

[Figure: Probability histogram for the 𝑁 = 5 population values]

 To find the sampling distribution, we need to know what values of 𝑥̄ and 𝑚
can occur when the sample is taken. There are $C_3^5 = 10$ possible random
samples of size 𝑛 = 3 and each is equally likely, with probability 1/10.

[Table: Values of 𝑥̄ and 𝑚 for Simple Random Sampling when 𝑛 = 3 and 𝑁 = 5]

 You will notice that some values of 𝑥̄ are more likely than others because
they occur in more than one sample. For example,

𝑝(𝑥̄ = 8) = 0.2 and 𝑝(𝑚 = 6) = 0.3

[Table: SDs for the Sample Mean and the Sample Median]
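Both sampling distributions can be obtained by listing all ten samples. The sketch below is one way to do that in Python; the helper name `sampling_distribution` is an illustrative assumption, not something defined in the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import mean, median

population = [3, 6, 9, 12, 15]
samples = list(combinations(population, 3))  # the 10 equally likely samples

def sampling_distribution(statistic):
    """Map each possible value of the statistic to its probability."""
    counts = Counter(statistic(s) for s in samples)
    return {value: Fraction(c, len(samples)) for value, c in sorted(counts.items())}

mean_dist = sampling_distribution(mean)
median_dist = sampling_distribution(median)

for value, p in mean_dist.items():
    print("mean", value, float(p))
for value, p in median_dist.items():
    print("median", value, float(p))
```

Exact fractions are used for the probabilities so that the values 0.2 and 0.3 quoted above can be checked without rounding error.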


[Figure: Probability histograms for the sampling distributions of the sample mean, 𝑥̄, and the sample median, 𝑚]
 Remember that the population mean is 𝜇 = 9 and that the population
median is also 𝑀 = 9, the exact center of both sampling distributions.

 If we only had a sample, and didn’t know the values for 𝜇 and 𝑀, we might
consider using their sample equivalents, 𝑥̄ and 𝑚, as estimators. But which of
the two estimators is better?

 Both SDs are centered on the “target,” that is, the population mean or
median. The sample median misses the target by 3 when it is either 6 or 12,
which happens .3 + .3 = .6 or 60% of the time.

 The sample mean also misses the target by 3 when it is either 6 or 12,
but this only happens .1 + .1 = .2 or 20% of the time.

 Eighty percent of the time, the sample mean is closer to its target,
which is the population mean 𝜇.

 Because the sample mean is closer to the population mean more often, we
might prefer to use it as our estimator.
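The comparison of the two estimators can be verified by counting how often each one lands 3 or more units away from the target value 9. This is a small sketch under that reading of "misses the target by 3"; the function name `miss_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, median

population = [3, 6, 9, 12, 15]
target = 9  # population mean and population median coincide in this example
samples = list(combinations(population, 3))

def miss_probability(statistic, distance=3):
    """Probability that the statistic is at least `distance` from the target."""
    misses = sum(1 for s in samples if abs(statistic(s) - target) >= distance)
    return Fraction(misses, len(samples))

p_mean_miss = miss_probability(mean)
p_median_miss = miss_probability(median)
print(float(p_mean_miss), float(p_median_miss))
```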
The Central Limit Theorem (CLT)
 This theorem states that sums and means of random samples of
measurements drawn from a population tend to have an approximately
normal distribution.

 Suppose you toss a balanced die 𝑛 = 1 time. The random variable 𝑥 is the
number observed on the upper face. This familiar random variable can take six
values, each with probability 1/6. The shape of the distribution is flat—
generally called a discrete uniform distribution—and is symmetric about the
mean 𝜇 = 3.5, with a standard deviation 𝜎 = 1.71.
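The values 𝜇 = 3.5 and 𝜎 = 1.71 follow from the definitions of the mean and variance of a discrete random variable. A short Python check, with variable names chosen only for illustration:

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a balanced die is equally likely

mu = sum(x * p for x in faces)                    # population mean
variance = sum((x - mu) ** 2 * p for x in faces)  # population variance
sigma = sqrt(variance)                            # population standard deviation

print(float(mu), float(variance), round(sigma, 2))
```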
[Figure: Probability distribution for 𝑥, the number appearing on a single toss of a die]
 Now, take a sample of size 𝑛 = 2 from this population; that is, toss two dice
and record the sum of the numbers on the two upper faces, i.e., ∑𝑥𝑖 = 𝑥1 + 𝑥2.
There are 36 possible outcomes, each with probability 1/36. The sums are
tabulated, and each of the possible sums is divided by 𝑛 = 2 to obtain an
average.

 When all the 36 possible averages are consolidated into a statistical table,
the result is the sampling distribution of 𝑥̄ = ∑𝑥𝑖/𝑛.

 Notice the dramatic difference in the shape of the sampling distribution. It
is now roughly mound-shaped but still symmetric about the mean 𝜇 = 3.5.
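The tabulation described above can be reproduced by enumerating the 36 outcomes. The following Python sketch builds the sampling distribution of the average of two dice; the names `outcomes` and `xbar_dist` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Divide each sum by n = 2 and count how often each average occurs.
counts = Counter(Fraction(a + b, 2) for a, b in outcomes)
xbar_dist = {avg: Fraction(c, len(outcomes)) for avg, c in sorted(counts.items())}

# The distribution stays centered on the population mean 3.5.
mean_of_xbar = sum(avg * p for avg, p in xbar_dist.items())

for avg, p in xbar_dist.items():
    print(float(avg), p)
```

The printed probabilities rise to a peak at 3.5 and fall off symmetrically, which is the mound shape referred to in the text.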
[Table: The Sum of the Upper Faces of Two Dice]

[Table: Sampling Distribution of 𝑥̄]

[Figure: Sampling Distribution of 𝑥̄ for 𝑛 = 2 dice]

 If random samples of 𝑛 observations are drawn from a non-normal population
with finite mean 𝜇 and standard deviation 𝜎, then, when 𝑛 is large, the
sampling distribution of the sample mean 𝑥̄ is approximately normally
distributed, with mean 𝜇 and standard deviation:

$$\frac{\sigma}{\sqrt{n}}$$

 The approximation becomes more accurate as 𝑛 becomes large.
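The mean and standard deviation in this statement can be checked exactly for the die population used earlier. The sketch below builds the exact distribution of the sum of 𝑛 dice by repeated convolution and compares the standard deviation of 𝑥̄ with 𝜎/√𝑛. It only checks the center and spread, not the normal shape, and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from math import isclose, sqrt

def sum_distribution(n):
    """Exact distribution of the sum of n fair dice (repeated convolution)."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for total, p in dist.items():
            for face in range(1, 7):
                nxt[total + face] += p * Fraction(1, 6)
        dist = dict(nxt)
    return dist

def xbar_mean_and_sd(n):
    """Mean and standard deviation of the sample mean of n dice."""
    dist = sum_distribution(n)
    mean = sum(Fraction(t, n) * p for t, p in dist.items())
    var = sum((Fraction(t, n) - mean) ** 2 * p for t, p in dist.items())
    return mean, sqrt(var)

sigma = sqrt(Fraction(35, 12))  # standard deviation of a single die
for n in (1, 2, 5, 30):
    mean, sd = xbar_mean_and_sd(n)
    print(n, float(mean), round(sd, 4), round(sigma / sqrt(n), 4))
```

For each 𝑛 the last two printed columns agree, and the first column of results stays at 3.5.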
 When the Sample Size Is Large Enough to Use the Central Limit Theorem
1) If the sampled population is normal, then the sampling distribution of
𝑥̄ will also be normal, no matter what sample size you choose. This result
can be proven theoretically, but it should not be too difficult for you to
accept without proof.
2) When the sampled population is approximately symmetric, the sampling
distribution of 𝑥̄ becomes approximately normal for relatively small values of
𝑛.
3) When the sampled population is skewed, the sample size 𝑛 must be larger,
with 𝑛 at least 30, before the sampling distribution of 𝑥̄ becomes
approximately normal.
The SD of the Sample Mean

Standard Error of the Sample Mean

 Example. The duration of Alzheimer’s disease from the time
symptoms first appear until death ranges from 3 to 20 years; the
average is 8 years with a standard deviation of 4 years. The
administrator of a large medical center randomly selects the medical
records of 30 deceased Alzheimer’s patients from the medical center’s
database and records the average duration.
 Find the approximate probabilities for these events:
1. The average duration is less than 7 years.
2. The average duration exceeds 7 years.
3. The average duration lies within 1 year of the population mean 𝜇 = 8.
 Solution. Sampling Plan: Since the administrator has selected a
random sample from the database at this medical center, he can draw
conclusions about only past, present, or future patients with
Alzheimer’s disease at this medical center.

 If, on the other hand, this medical center can be considered representative
of other medical centers in the country, it may be possible to draw more
far-reaching conclusions.
 Population of Interest: What can you say about the shape of the sampled
population? It is not symmetric, because the mean 𝜇 = 8 does not lie halfway
between the maximum and minimum values (the midpoint of 3 and 20 is 11.5).

 Since the mean is closer to the minimum value, the distribution is skewed to
the right, with a few patients living a long time after the onset of the
disease.
 Sampling Distribution of 𝑥̄:
 Regardless of the shape of the population distribution, the sampling
distribution of 𝑥̄ has a mean 𝜇 = 8 and standard deviation

$$\frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{30}} = .73$$

 In addition, because the sample size is 𝑛 = 30, the CLT ensures the
approximate normality of its sampling distribution.
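The standard error quoted here is a one-line calculation. A minimal Python check, using the population values from the example:

```python
from math import sqrt

sigma = 4  # population standard deviation, in years
n = 30     # number of patient records sampled

standard_error = sigma / sqrt(n)
print(round(standard_error, 2))
```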
 The probability that 𝑥̄ is less than 7.
 To find this area, you need to calculate the value of 𝑧 corresponding to
𝑥̄ = 7:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{7 - 8}{.73} = -1.37$$

$$P(\bar{x} < 7) = P(z < -1.37) = .0853$$
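The same probability can be computed without a printed normal table. The sketch below uses `statistics.NormalDist` from the Python standard library. It rounds 𝑧 to two decimals to mimic a table lookup and also computes the unrounded value, which differs slightly in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 4, 30
se = sigma / sqrt(n)  # standard error of the sample mean

z = (7 - mu) / se
z_table = round(z, 2)                # normal tables are indexed by two-decimal z
p_table = NormalDist().cdf(z_table)  # area to the left of z = -1.37
p_exact = NormalDist(mu, se).cdf(7)  # same area without rounding z

print(z_table, round(p_table, 4), p_exact)
```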


 The event that 𝑥̄ exceeds 7 is the complement of the event that 𝑥̄ is less
than 7.

 Thus, the probability that 𝑥̄ exceeds 7 is:

$$P(\bar{x} > 7) = 1 - P(\bar{x} \le 7) = 1 - .0853 = .9147$$

 The probability that 𝑥̄ lies within 1 year of 𝜇 = 8 is the shaded area in
the Figure below.
 The 𝑧-value corresponding to 𝑥̄ = 7 is 𝑧 = −1.37, from part 1, and the
𝑧-value for 𝑥̄ = 9 is:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{9 - 8}{.73} = 1.37$$

$$P(7 < \bar{x} < 9) = P(-1.37 < z < 1.37) = .9147 - .0853 = .8294$$
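Parts 2 and 3 can be checked the same way. This sketch takes the two-decimal value 𝑧 = 1.37 from the solution as given and uses the standard normal distribution from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
z = 1.37                   # two-decimal z-value used in the solution

p_less_than_7 = std_normal.cdf(-z)
p_exceeds_7 = 1 - p_less_than_7                       # complement rule, part 2
p_within_1 = std_normal.cdf(z) - std_normal.cdf(-z)   # area between -z and z, part 3

print(round(p_exceeds_7, 4), round(p_within_1, 4))
```

Small differences in the last decimal place relative to the table-based answers come from the table values themselves being rounded before they are subtracted.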
