Module 5


Applied Statistics I

STAT 151
Module 5

By Haile Gessesse
Winter 2024

5 Sampling Distribution of Statistics
Goal: We discuss the normal probability plot (QQ-plot),
the sampling distribution of sample means, and the
sampling distribution of sample proportions.
5.1 Normal Probability Plot
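• A normal probability plot is a plot used to assess whether
a variable is approximately normally distributed: if the
points fall roughly on a straight line, the data are
consistent with a normal distribution.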

Example: A histogram and normal probability plot of
a sample of 100 male heights.
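The idea behind a normal probability plot can be sketched in a few lines of Python: sort the data, pair each value with the standard normal quantile of its plotting position, and check how close the points are to a straight line. This is a minimal sketch assuming the plotting position (i − 0.5)/n, which is one common convention (software packages differ), and it uses a small hypothetical sample rather than the 100 heights in the example. The helper names `normal_plot_points` and `plot_correlation` are made up for illustration.

```python
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Return (normal score, observed value) pairs for a normal probability plot.

    Assumes the plotting position (i - 0.5) / n; other conventions exist.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(scores, xs))


def plot_correlation(points):
    """Pearson correlation of the plotted points; near 1 means a nearly straight plot."""
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    mz, mx = mean(zs), mean(xs)
    sxy = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (szz * sxx) ** 0.5


# Hypothetical heights in inches (illustration only, not the module's data).
heights = [66.1, 68.4, 69.0, 70.2, 67.5, 71.3, 69.8, 72.0, 68.9, 70.6]
pts = normal_plot_points(heights)
for z, x in pts:
    print(f"{z:6.3f}  {x:5.1f}")
print("correlation:", round(plot_correlation(pts), 3))
```

Plotting the pairs (normal score on one axis, observed value on the other) gives the normal probability plot; a curved pattern instead of a line suggests skewness or heavy tails.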

5.2 Sampling Distribution of the Sample Means
• We are often interested in estimating population
parameters (like µ, σ, p) using sample statistics (like x̄,
s, p̂), called point estimates.
• Sample statistics vary from sample to sample which
leads us to study their distribution.
• The sampling distribution of a statistic is the dis-
tribution of all values of the statistic when all possible
samples of the same size n are taken from the same
population.
• If the mean of the sampling distribution of a statistic
is equal to the true value of the parameter, then the
statistic is called an unbiased estimator.

• The sampling distribution of the sample mean
is the distribution of all possible sample means (or the
distribution of the variable x̄), with all samples having
the same sample size n taken from the same population.
Example: Consider the population consisting of the
values {1, 2, 5}. Thus, µ = (1 + 2 + 5)/3 = 8/3.

(a) List all possible samples of size 2 (selected with
replacement) with their respective sample means and
probabilities.

(b) Find the mean of the sampling distribution of the
sample means.
The mean of x̄ is
$\mu_{\bar x} = \frac{1 + 1.5 + 3 + 1.5 + 2 + 3.5 + 3 + 3.5 + 5}{9} = \frac{24}{9} = \frac{8}{3}$.
Interpretation: We can see that the mean of the sample
means of all possible samples selected with replacement
is the same as the population mean. This is actually
true in general. We say that the sample means tend to
target the population mean.
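The population {1, 2, 5} is small enough to enumerate every sample exactly. The sketch below lists all 9 ordered samples of size 2 drawn with replacement (each with probability 1/9), then checks with exact fractions that the mean of the sample means equals µ = 8/3 and that their variance equals σ²/n, where σ² is the population variance computed by dividing by N = 3. It uses only the standard library (`itertools.product`, `fractions.Fraction`).

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 5]
n = 2

# All ordered samples of size 2 drawn with replacement: 3**2 = 9 samples, each with probability 1/9.
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

for s, xbar in zip(samples, sample_means):
    print(s, float(xbar), "prob 1/9")

mu = Fraction(sum(population), len(population))  # population mean, 8/3
mean_of_means = sum(sample_means) / len(sample_means)

# Population variance (divide by N) and the variance of the 9 equally likely sample means.
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

print(mean_of_means == mu)         # True: the mean of the sample means equals mu
print(var_of_means == sigma2 / n)  # True: the variance of the sample means equals sigma^2 / n
```

Using `Fraction` instead of floats makes both equalities exact rather than approximately true.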
• If all possible random samples of size n are selected
(with replacement) from a population with mean µ and
standard deviation σ, the mean of the sample means is
denoted by $\mu_{\bar x}$, and
$\mu_{\bar x} = \mu$,
and the standard deviation of the sample means is
denoted by $\sigma_{\bar x}$, and
$\sigma_{\bar x} = \frac{\sigma}{\sqrt{n}}$,
which measures the sampling variability.
• Note that if the sample size gets larger, then the sam-
pling variability gets smaller.
• The Central Limit Theorem: For a large sample
size (n ≥ 30), the possible sample means are approxi-
mately normally distributed, regardless of the distribu-
tion of the variable under consideration.
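A small simulation can make these facts concrete. The sketch below assumes a hypothetical, strongly right-skewed population (exponential with mean 10, so σ = 10 as well), draws many samples of each size n, and compares the mean and standard deviation of the simulated sample means with µ and σ/√n. It only checks the mean and spread; judging normality itself would take a histogram or a normal probability plot of the simulated means.

```python
import random
from statistics import mean, pstdev

random.seed(151)  # fixed seed so the run is reproducible

# Hypothetical, strongly right-skewed population: exponential with mean 10,
# for which the standard deviation is also 10.
MU = SIGMA = 10.0


def simulate_sample_means(n, reps=5_000):
    """Draw `reps` random samples of size n and return their sample means."""
    return [sum(random.expovariate(1 / MU) for _ in range(n)) / n for _ in range(reps)]


results = {}
for n in (5, 30, 100):
    means = simulate_sample_means(n)
    results[n] = (mean(means), pstdev(means))
    print(f"n={n:3d}  mean of xbar={results[n][0]:6.3f}  "
          f"sd of xbar={results[n][1]:6.3f}  sigma/sqrt(n)={SIGMA / n ** 0.5:6.3f}")
```

The simulated means stay centred near 10 for every n, while their spread shrinks like σ/√n even though the population is far from normal.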
Summary
• Suppose that a variable x of a population has mean µ
and standard deviation σ. Then, for samples of size n,
– the mean of x̄ is µ, that is, $\mu_{\bar x} = \mu$;
– the standard deviation of x̄ is $\sigma_{\bar x} = \sigma/\sqrt{n}$;
– if x is normally distributed, so is x̄, regardless of
sample size; and
– if the sample size is large (usually n ≥ 30), x̄ is
approximately normally distributed, regardless of the
distribution of x.
That is, $\bar x \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ (approximately) when n ≥ 30.
– As the sample size increases, we would expect samples
to yield more consistent sample means, so the variability
among the sample means is lower. This is because
$\sigma_{\bar x} = \sigma/\sqrt{n}$ decreases as n grows.

Example: IQ scores are normally distributed with mean
100 and standard deviation 16. Find the probability
that
(a) a randomly selected person has an IQ score of at
least 104
(b) a group of 64 people has an average IQ score of at
least 104.
Solution:
(a) Let x = IQ score. Then x ∼ N(100, 16). So, to
compute the probability that a randomly selected
person has an IQ score of at least 104, we compute
P(x ≥ 104). Standardizing (finding the z-score),
$P(x \ge 104) = P\left(z \ge \frac{104 - 100}{16}\right) = P(z \ge 0.25) = 1 - 0.5987 = 0.4013$

(b) The sample mean IQ of a group of 64 individuals
is distributed as x̄ ∼ N(100, 2). That is, the mean IQ is
100 and the standard deviation is
$\sigma/\sqrt{n} = 16/\sqrt{64} = 2$. So, to compute the
probability that a group of 64 people has an average IQ
score of at least 104, we compute P(x̄ ≥ 104).
Standardizing (finding the z-score),
$P(\bar x \ge 104) = P\left(z \ge \frac{104 - 100}{2}\right) = P(z \ge 2) = 1 - 0.9772 = 0.0228$
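The two IQ probabilities can be checked with `statistics.NormalDist` from the Python standard library (3.8+), which evaluates the normal CDF directly instead of reading a z-table. The only modelling input is the one stated in the example: x ∼ N(100, 16) for one person and x̄ ∼ N(100, 16/√64) for the mean of 64 people.

```python
from statistics import NormalDist

MU, SIGMA = 100, 16

# (a) One person: x ~ N(100, 16), so P(x >= 104) = 1 - CDF(104).
p_person = 1 - NormalDist(MU, SIGMA).cdf(104)

# (b) Mean of n = 64 people: x-bar ~ N(100, 16 / sqrt(64)) = N(100, 2).
n = 64
p_group = 1 - NormalDist(MU, SIGMA / n ** 0.5).cdf(104)

print(round(p_person, 4))  # 0.4013
print(round(p_group, 4))   # 0.0228
```

The same cutoff of 104 is 0.25 standard deviations above the mean for an individual but 2 standard errors above the mean for a group average, which is why the second probability is so much smaller.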

Example: Birth weights of male babies have a standard
deviation of 1.33 lb. Determine the percentage of
all samples of 400 male babies that have mean birth
weights within 0.125 lb of the population mean birth
weight of all male babies.
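Solution:
$\mu_{\bar x} = \mu$ and $\sigma_{\bar x} = \sigma/\sqrt{n} = 1.33/\sqrt{400} = 0.0665$.
$z_1 = \frac{\bar x - \mu_{\bar x}}{\sigma_{\bar x}} = \frac{(\mu - 0.125) - \mu}{0.0665} = \frac{-0.125}{0.0665} = -1.88$
$z_2 = \frac{\bar x - \mu_{\bar x}}{\sigma_{\bar x}} = \frac{(\mu + 0.125) - \mu}{0.0665} = \frac{0.125}{0.0665} = 1.88$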
So, we find that the area under the standard normal
curve between -1.88 and 1.88 equals 0.9398. Conse-
quently, 93.98% of all samples of 400 male babies have
mean birth weights within 0.125 lb of the population
mean birth weight of all male babies.
Interpretation: There is about a 94% chance that
the sampling error made in estimating the mean birth
weight of all male babies by that of a sample of 400
male babies will be at most 0.125 lb.
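The birth-weight calculation can be reproduced with the standard library's `NormalDist`. Note that µ itself is never needed: only the margin (0.125 lb) and the standard error σ/√n enter the z-scores. This sketch uses the unrounded z ≈ 1.8797, so the last digit can differ slightly from the table-based 0.9398.

```python
from statistics import NormalDist

SIGMA = 1.33    # lb, population standard deviation of male birth weights
n = 400         # sample size
margin = 0.125  # lb, allowed distance between x-bar and mu

se = SIGMA / n ** 0.5  # standard error sigma / sqrt(n)
z = margin / se        # z-score of the margin (about 1.88)
std_normal = NormalDist()
prob = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"standard error = {se:.4f}, z = {z:.2f}, probability = {prob:.4f}")
```

Because the interval is symmetric around µ, the result could equally be written as 2Φ(z) − 1.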

Example: Data on salaries in the public school system
are published annually in National Survey of Salaries
and Wages in Public Schools by the Education Research
Service. The mean annual salary of (public) classroom
teachers is $49 thousand. Assume a standard deviation
of $9.2 thousand. Do the following for the variable
annual salary of classroom teachers.
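(a) Determine the sampling distribution of the sample
mean for samples of size 64.
Answer: Since the sample size is large (n = 64 ≥ 30),
the sampling distribution of the sample mean is
approximately normal, with mean $\mu_{\bar x} = 49$ thousand dollars
and standard deviation $\sigma_{\bar x} = 9.2/\sqrt{64} = 1.15$ thousand dollars.
(b) Repeat part (a) for samples of size 256.
Answer: Again it is approximately normal with mean 49
thousand dollars, but with smaller variation:
$\sigma_{\bar x} = 9.2/\sqrt{256} = 0.575$ thousand dollars.
(c) Do you need to assume that classroom teacher
salaries are normally distributed to answer parts (a)
and (b)?
Answer: No. Because both sample sizes are large
(n ≥ 30), the central limit theorem gives approximate
normality of x̄ regardless of the distribution of salaries.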
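A short sketch of parts (a) and (b): with µ = 49 and σ = 9.2 (thousands of dollars) taken from the example, the approximate sampling distribution of x̄ for each sample size can be represented as a `statistics.NormalDist` object. Treating it as normal relies on the central limit theorem, since n ≥ 30 in both cases.

```python
from statistics import NormalDist

MU, SIGMA = 49.0, 9.2  # thousands of dollars

# Approximate sampling distribution of x-bar for each sample size (CLT, n >= 30).
sampling = {n: NormalDist(MU, SIGMA / n ** 0.5) for n in (64, 256)}

for n, dist in sampling.items():
    print(f"n={n}: x-bar approx N({dist.mean}, {dist.stdev:.3f})")
```

Quadrupling the sample size from 64 to 256 halves the standard deviation of x̄, as σ/√n predicts.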

• Remark: For any sample of size n from a population, the
sample mean is computed by $\bar x = \frac{\sum x_i}{n}$. So the
sample total can be written as $\sum x_i = n\bar x$.
• So the mean of the sample totals is nµ.
• The standard deviation of the sample totals is $\sqrt{n}\,\sigma$.
• If n ≥ 30, then the distribution of the sample totals is
approximately
$\sum x_i \sim N\left(n\mu, \sqrt{n}\,\sigma\right)$.

Example: Find the mean, standard deviation, and
the distribution of the sample totals of samples of size
36 if the population mean is µ = 12 and population
standard deviation is σ = 5.
Solution:
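The formulas in the remark reduce this example to two multiplications. The sketch below applies them with µ = 12, σ = 5, n = 36; the helper name `sample_total_distribution` is hypothetical. Since n = 36 ≥ 30, the sample totals are approximately normal with the computed mean and standard deviation.

```python
from math import sqrt


def sample_total_distribution(mu, sigma, n):
    """Mean (n * mu) and standard deviation (sqrt(n) * sigma) of the sample total."""
    return n * mu, sqrt(n) * sigma


mean_total, sd_total = sample_total_distribution(mu=12, sigma=5, n=36)
print(mean_total, sd_total)  # 432 30.0
```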

5.3 Sampling Distribution of the Sample Proportions
• Consider a population in which each member either has
or does not have a specified attribute.
• Population proportion (p): The proportion (per-
centage) of the entire population that has the specified
attribute.
• Sample proportion (p̂) : The proportion (percent-
age) of a sample from the population that has the spec-
ified attribute.
• A sample proportion, p̂, is computed by using the
formula
$\hat p = \frac{x}{n}$,
where x denotes the number of members in the sample
that have the specified attribute and, as usual, n
denotes the sample size.
Note: For convenience, x is referred to as the number
of successes and n − x is referred to as the number of
failures.
Example: Many employers are concerned about the
problem of employees who call in sick when they are
not ill. The Hilton Hotels Corporation commissioned
a survey to investigate this issue. One question asked
the respondents whether they call in sick at least once
a year when they simply need time to relax. For brevity,
we use the phrase play hooky to refer to that practice.
The survey polled 1010 randomly selected U.S. employ-
ees. The proportion of the 1010 employees sampled
who play hooky was used to estimate the proportion of
all U.S. employees who play hooky.
(a) If 202 of the 1010 employees sampled play hooky,
then
$\hat p = \frac{x}{n} = \frac{202}{1010} = 0.2$;
that is, 20.0% of the employees sampled play hooky.
This 20% can be used as a point estimate for the
population proportion p.
(b) If 184 of the 1010 employees sampled play hooky,
however, then
$\hat p = \frac{x}{n} = \frac{184}{1010} \approx 0.182$;
that is, 18.2% of the employees sampled play hooky.
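Computing p̂ = x/n is a single division; the point of the two parts is that different samples of the same size give different estimates. This sketch just evaluates the formula for both hypothetical counts from the example; `sample_proportion` is an illustrative helper name.

```python
def sample_proportion(successes, n):
    """p-hat = x / n: the fraction of the sample that has the attribute."""
    return successes / n


n = 1010
for x in (202, 184):
    print(x, round(sample_proportion(x, n), 3))
```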
• The sampling distribution of p̂ for all samples of
size n is stated in short as
$\hat p \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$ (approximately)
whenever n is large (i.e., np ≥ 10 and n(1 − p) ≥ 10).
• The mean of the sample proportions p̂ is p, i.e.,
$\mu_{\hat p} = p$.
• The standard deviation of the sample proportions p̂ is
$\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}}$,
which measures the sampling variability.
• Note that if the sample size gets larger, then the sam-
pling variability gets smaller.
• Moreover, the sample proportions are approximately
normally distributed for large enough n (np ≥ 10 and
n(1 − p) ≥ 10).
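A simulation sketch of these properties, assuming p = 0.2 and samples modelled as n independent yes/no trials (equivalent to sampling with replacement, or from a very large population). It compares the mean and standard deviation of simulated p̂ values with p and √(p(1 − p)/n), and also evaluates the large-sample rule np ≥ 10 and n(1 − p) ≥ 10; the helper names are hypothetical.

```python
import random
from statistics import mean, pstdev

random.seed(5)  # fixed seed so the run is reproducible


def simulate_p_hat(p, n, reps=5_000):
    """Simulate `reps` sample proportions, each from n independent trials with P(success) = p."""
    return [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]


def large_sample_ok(p, n):
    """Rule of thumb from the notes: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


p = 0.2
results = {}
for n in (25, 100, 400):
    p_hats = simulate_p_hat(p, n)
    theory_sd = (p * (1 - p) / n) ** 0.5
    results[n] = (mean(p_hats), pstdev(p_hats), theory_sd)
    print(f"n={n:3d} large-sample rule met: {large_sample_ok(p, n)}  "
          f"mean={results[n][0]:.4f}  sd={results[n][1]:.4f}  sqrt(p(1-p)/n)={theory_sd:.4f}")
```

The mean stays near p for every n, the spread matches √(p(1 − p)/n), and for n = 25 the rule fails (np = 5), signalling that the normal approximation may be poor there.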

Example: Assume that about 20% of students in
Canadian universities are international students. Com-
pute the probability that
(a) out of 100 randomly selected students, at least 10%
are international students.
(b) out of 81 randomly selected students, at most 18 of
them are international students.
Solution:
(a) The sample proportion of international students in
a group of 100 students is distributed as
p̂ ∼ N(0.2, 0.04). That is, the mean of the sample
proportion is 0.2 and the standard deviation is
$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.2(1-0.2)}{100}} = 0.04$.
So, to compute the probability that out of 100 randomly
selected students at least 10% are international
students, we compute P(p̂ ≥ 0.1). Standardizing
(finding the z-score),
$P(\hat p \ge 0.1) = P\left(z \ge \frac{0.1 - 0.2}{0.04}\right) = P(z \ge -2.5) = 1 - P(z < -2.5) = 1 - 0.0062 = 0.9938$

(b) The sample proportion of international students in
a group of 81 students is distributed as
p̂ ∼ N(0.2, 0.4/9). That is, the mean of the sample
proportion is 0.2 and the standard deviation is
$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.2(1-0.2)}{81}} = \frac{0.4}{9}$.
So, to compute the probability that out of 81 randomly
selected students at most 18 are international
students, we compute $P(\hat p \le 18/81) = P(\hat p \le 2/9)$.
Standardizing (finding the z-score),
$P(\hat p \le 2/9) = P\left(z \le \frac{2/9 - 0.2}{0.4/9}\right) = P(z \le 0.5) = 0.6915$
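Both parts can be checked with the normal approximation N(p, √(p(1 − p)/n)) built as a `statistics.NormalDist`. The sketch uses exact CDF values rather than a z-table; the large-sample conditions hold in both parts (np = 20 and 16.2, n(1 − p) = 80 and 64.8). The helper `p_hat_dist` is a hypothetical name.

```python
from statistics import NormalDist


def p_hat_dist(p, n):
    """Normal approximation N(p, sqrt(p(1-p)/n)) for the sample proportion p-hat."""
    return NormalDist(p, (p * (1 - p) / n) ** 0.5)


p = 0.20
# (a) n = 100: P(p-hat >= 0.10)
prob_a = 1 - p_hat_dist(p, 100).cdf(0.10)
# (b) n = 81: P(p-hat <= 18/81)
prob_b = p_hat_dist(p, 81).cdf(18 / 81)

print(round(prob_a, 4), round(prob_b, 4))  # 0.9938 0.6915
```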

Practice: A coke bottler claims that only 5% of the
coke cans are underfilled. A quality control technician
randomly samples 200 cans of coke. What is
the probability that more than 10% of the cans are
underfilled? What is the probability that more than
20 of the cans in this sample are underfilled?

