Chap 1 Sampling Distributions

1.1 Parameters and sample statistics
Definition 1.1.1. A population consists of all the observations from a random variable of interest. Each observation in a population is a value of a random variable $X$ with some probability distribution, $f(x)$.
Example 1.1.1. The observations obtained by measuring the air pollution from any
Example 1.1.2. Consider the lifetimes of a certain storage battery being manufactured for mass distribution in the country. The population in this case would be very large but finite. The battery lifetimes can be modelled by a continuous random variable, perhaps with an exponential or a Normal distribution.
Example 1.1.3. If one is inspecting items coming off an assembly line for defects, then each observation in the population takes the value 0 or 1 of the Bernoulli random variable $X$ with probability distribution
\[
f(x; p) = p^x (1 - p)^{1-x}, \qquad x = 0, 1,
\]
where $X = 0$ indicates that the item is not defective and $X = 1$ indicates a defective item, and $p$ is the probability of any item being defective.
All these probability distributions, of course, are actually families of models, in the sense that each includes one or more parameters. The binomial variable, for instance, is indexed by the probability of success, $p$; the Poisson variable by the rate of occurrence, $\lambda$; the normal distribution is defined by two parameters, $\mu$ and $\sigma$. More formally,
Note that this will usually leave a small number of parameters unspecified at this point, to be estimated from the data.

(c) Observe an often quite small number of actual instances (outcomes of random experiments, or realizations of random variables), the sample, and use the assumed distributional forms to generalize the sample results to the entire assumed population, by estimating the unknown parameters.
If the inferences from the sample to the population are to be valid, the sample needs to be representative of the population. To make this concept precise, consider the following examples:
(a) Are the heights of students in STA211 representative of all the UB students?
(b) Would 10 successive insurance claims be representative of the claims over the
entire year?
For a random sample, the observations $X_1, X_2, \ldots, X_n$ are independent and identically distributed, so the joint distribution function factorizes as
\[
F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F(x_i). \tag{1.1}
\]
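For instance, if $X_1, \ldots, X_n$ is a random sample from an Exponential($\lambda$) population (as in Example 1.1.2), then
\[
F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \left(1 - e^{-\lambda x_i}\right),
\]
and the joint density factorizes in the same way, $f(x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i)$.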
with the sample mean and variance. All of these summaries have the property that
they can be calculated from the sample, without any knowledge of the distribution of
X. Any summary which satisfies this property is called a statistic.
Definition 1.1.4. A statistic is any function of a random sample which does not
depend on any unknown parameters of the distribution of the population random
variable.
Example 1.1.4. A function such as $\sum_{i=1}^{n} X_i^4 \big/ \sum_{i=1}^{n} X_i^2$ would be a statistic, but $\sum_{i=1}^{n} (X_i - \mu)^2$ would generally not be, unless the population mean $\mu$ were known a priori.
Example 1.1.5. The other most commonly used statistics for measuring the centre of the data are the sample mean and median. Let $X_1, X_2, \ldots, X_n$ be a random sample. Then the sample mean is
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i,
\]
and the sample median is the middle value of the ordered sample (the average of the two middle values when $n$ is even).
1.2 Sampling distributions
Let $T = T(X_1, X_2, \ldots, X_n)$ be a statistic, and suppose we observe the realization of the sample. If we are to use this statistic to draw inferences about the distribution of $X$, then it is important to understand how the observed values of $T(x_1, x_2, \ldots, x_n)$ vary from one sample to the next. That is, we need to know the probability distribution of $T(X_1, X_2, \ldots, X_n)$.
Exercise 1
1. Let X ∼ B(1, 0.5), and consider all possible random samples of size 3 on X.
Compute the sample mean for each of the sample and also compute its probability
mass function.
2. A fair die is rolled. Let $X$ be the face value that turns up, and let $X_1, X_2$ be two independent observations of $X$. Compute the PMF of the sample mean. (A computational check of both parts is sketched below.)
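The following is a minimal sketch for that check, assuming Python with only the standard library; it enumerates every possible sample and tabulates the exact PMF of the sample mean, keeping probabilities as fractions.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def sample_mean_pmf(values, probs, n):
    """Enumerate every possible sample of size n from a discrete
    distribution and tabulate the exact PMF of the sample mean."""
    pmf = Counter()
    for idx in product(range(len(values)), repeat=n):
        mean = Fraction(sum(values[i] for i in idx), n)
        prob = Fraction(1)
        for i in idx:
            prob *= probs[i]
        pmf[mean] += prob
    return dict(sorted(pmf.items()))

# Part 1: X ~ B(1, 0.5), all samples of size 3
print(sample_mean_pmf([0, 1], [Fraction(1, 2)] * 2, 3))
# Part 2: X = face value of a fair die, two independent observations
print(sample_mean_pmf([1, 2, 3, 4, 5, 6], [Fraction(1, 6)] * 6, 2))
```

For part 1 this yields probabilities 1/8, 3/8, 3/8, 1/8 for sample means 0, 1/3, 2/3, 1 respectively.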
We first discuss the sampling distribution of the mean from a Normal population, since it can be derived theoretically and is exact.
Theorem 1.2.1. If $\bar{X}$ is the mean of a random sample of size $n$ taken from a normal population with mean $\mu$ and variance $\sigma^2$, then
\[
\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right). \tag{1.2}
\]
Proof. Using MGFs (from STA221), we know that a sum of normal random variables is also normally distributed, so it is only a matter of finding the parameters. Thus,
\[
E(\bar{X}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu
\]
and
\[
V(\bar{X}) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
\]
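As an informal check of this result, one can simulate repeated samples and compare the empirical mean and variance of $\bar{X}$ with $\mu$ and $\sigma^2/n$. A minimal sketch, assuming Python with NumPy; the parameter values are illustrative only:

```python
import numpy as np

# Empirical check that X-bar has mean mu and variance sigma^2 / n.
# Illustrative values only: mu = 800, sigma = 40, n = 16.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 800, 40, 16, 100_000

xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(xbars.mean())   # should be close to mu = 800
print(xbars.var())    # should be close to sigma^2 / n = 100
```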
What if we are sampling from a population with an unknown distribution? The sampling distribution of the mean will still be approximately normal with mean $\mu$ and variance $\sigma^2/n$, provided that the sample size is large enough. This result is an immediate consequence of the Central Limit Theorem (CLT).
Theorem 1.2.2 (Central Limit Theorem). If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^2$, then for sufficiently large $n$,
\[
\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \tag{1.3}
\]
approximately.
The normal approximation for $\bar{X}$ will generally be good if $n \geq 30$, provided the population distribution is not terribly skewed.
Example 1.2.1. An electrical firm manufactures light bulbs that have a length of
life that is approximately normally distributed, with mean equal to 800 hours and a
standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs
will have an average life of less than 775 hours.
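The worked solution is not shown in the extracted notes; the standard computation, standardizing $\bar{X}$, would be
\[
P(\bar{X} < 775) = P\!\left(Z < \frac{775 - 800}{40/\sqrt{16}}\right) = P(Z < -2.5) \approx 0.0062.
\]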
Example 1.2.2. Suppose the mean age of University of Botswana students is 22.3 years and the standard deviation is 4 years. What is the probability that the average age of 64 randomly selected students is greater than 23 years?
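Proceeding in the same way, by the CLT with $n = 64$,
\[
P(\bar{X} > 23) = P\!\left(Z > \frac{23 - 22.3}{4/\sqrt{64}}\right) = P(Z > 1.4) \approx 0.0808.
\]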
Example 1.2.3. Let X denote the number of flaws in a 1 meter length of copper wire.
The probability mass function of X is presented in the following table.
x 0 1 2 3
Suppose 100 wires are sampled from this population. What is the probability that the
average number of flaws per wire in this sample is less than 0.5?
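The probability row of the table is missing above, so the sketch below uses placeholder PMF values purely for illustration; the point is the method: compute the population mean and variance from the PMF, then apply the CLT to $\bar{X}$.

```python
import numpy as np
from scipy.stats import norm

# PLACEHOLDER probabilities: the actual table values are not shown above.
x = np.array([0, 1, 2, 3])
p = np.array([0.48, 0.39, 0.12, 0.01])   # hypothetical PMF, sums to 1

mu = (x * p).sum()                  # population mean
var = ((x - mu) ** 2 * p).sum()     # population variance
n = 100

# CLT: X-bar is approximately N(mu, var / n) for n = 100 wires
print(norm.cdf(0.5, loc=mu, scale=np.sqrt(var / n)))
```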
Recall from STA221 that if $Y_1, Y_2, \ldots, Y_n$ are independently and identically distributed (iid) Bernoulli($p$) random variables, then their sum, $X = Y_1 + Y_2 + \cdots + Y_n$, follows a binomial distribution, $\mathrm{Bin}(n, p)$. Thus $X$ is the sum of the sample observations, and the sample proportion
\[
\hat{p} = \frac{X}{n} = \frac{Y_1 + Y_2 + \cdots + Y_n}{n} \tag{1.4}
\]
is also the sample mean, $\bar{Y}$. Since the mean of a Bernoulli random variable is $p$ and its variance is $p(1-p)$, it then follows from the CLT that, if $n$ is sufficiently large,
\[
\hat{p} \sim N\!\left(p, \frac{p(1-p)}{n}\right) \tag{1.5}
\]
and
\[
X \sim N\!\left(np, np(1-p)\right). \tag{1.6}
\]
The general conditions on $n$ and $p$ for the approximation to be of good quality are $np \geq 5$ and $n(1-p) \geq 5$.
Example 1.2.4. Suppose that at Gaborone Senior School, 10% of the students are over 18 years of age. In a sample of 400 students, what is the probability that more than 110 of them are over 18?
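A sketch of the computation, using the normal approximation to the binomial with a continuity correction (the notes do not specify whether a correction is intended):

```python
from math import sqrt
from scipy.stats import norm

# Normal approximation to Bin(n = 400, p = 0.1): X is approx. N(np, np(1-p)).
n, p = 400, 0.10
mean, sd = n * p, sqrt(n * p * (1 - p))   # mean = 40, sd = 6

# P(X > 110), using a continuity correction at 110.5
print(1 - norm.cdf(110.5, loc=mean, scale=sd))   # effectively 0
```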
(a) In a sample of 250 randomly chosen components, what is the probability that
fewer than 20 of them are defective?
(b) To what value must the probability of a defective component be reduced so that only 1% of lots of 250 components contain 20 or more defectives?
The previous examples have been concerned with the sampling distribution of a single mean. However, many researchers are interested in comparative experiments in which one group is compared with another. For example, sociologists may be interested in a comparative study of male-headed and female-headed families. The basis for such a comparison is $\mu_1 - \mu_2$, the difference in population means.
Suppose that we have two independent populations with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let $\bar{X}_1$ and $\bar{X}_2$ be the corresponding means from random samples of sizes $n_1$ and $n_2$, respectively. Then, according to Theorem 1.2.2, both $\bar{X}_1$ and $\bar{X}_2$ are approximately Normally distributed, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2/n_1$ and $\sigma_2^2/n_2$, respectively. As a result, $\bar{X}_1 - \bar{X}_2$ is also Normally distributed, with mean
\[
\mu_{\bar{X}_1 - \bar{X}_2} = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2 \tag{1.7}
\]
and variance
\[
\sigma^2_{\bar{X}_1 - \bar{X}_2} = V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \tag{1.8}
\]
Similarly, the difference between two sample proportions is also approximately normally distributed. This is because the proportion $X/n$, where $X$ is a Binomial random variable, is actually an average of a set of 0s and 1s, and is therefore, by the CLT, approximately normal for sufficiently large sample sizes. Hence
\[
\hat{P}_1 - \hat{P}_2 \sim N\!\left(p_1 - p_2,\ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right). \tag{1.9}
\]
Example 1.2.6. A random sample of size 25 is taken from a normal population having
a mean of 80 and a standard deviation of 5. A second random sample of size 36 is taken
from a different normal population having a mean of 75 and a standard deviation of 3.
Find the probability that the sample mean computed from the 25 measurements will
exceed the sample mean computed from the 36 measurements by at least 3.4 but less
than 5.9.
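A sketch of the computation for Example 1.2.6, using equations (1.7) and (1.8); Python with SciPy is an assumption, and any normal table would do:

```python
from math import sqrt
from scipy.stats import norm

# X1-bar - X2-bar ~ N(mu1 - mu2, sigma1^2/n1 + sigma2^2/n2)
mu_diff = 80 - 75                        # 5
sd_diff = sqrt(5**2 / 25 + 3**2 / 36)    # sqrt(1.25), about 1.118

# P(3.4 <= X1-bar - X2-bar < 5.9)
print(norm.cdf(5.9, mu_diff, sd_diff) - norm.cdf(3.4, mu_diff, sd_diff))
```

This gives approximately 0.713.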
Example 1.2.7. Two different box-filling machines are used to fill cereal boxes on an assembly line. The critical measurement influenced by these machines is the weight of the product in the boxes. Engineers are quite certain that the variance of the weight of product is $\sigma^2 = 1$ gram$^2$. Experiments are conducted using both machines, with sample
sizes of 36 each. The sample averages for machines A and B are $\bar{x}_A = 4.5$ grams and $\bar{x}_B = 4.7$ grams. Engineers are surprised that the two sample averages for the filling machines are so different.
(b) Do the aforementioned experiments seem to, in any way, strongly support a
conjecture that the population means for the two machines are different? Explain
using your answer in (a).
Let us consider a random sample of size $n$ from a Normal population with mean $\mu$ and variance $\sigma^2$. Instead of finding the sampling distribution of the sample variance directly, we consider the distribution of the statistic $(n-1)S^2/\sigma^2$. Note that this is indeed a statistic, since the variance $\sigma^2$ is considered known here.
Theorem 1.2.3. If $X_1, X_2, \ldots, X_n$ are independent $N(\mu, \sigma^2)$ random variables, then
\[
Y = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2
\]
has a chi-square distribution with $n$ degrees of freedom.
Now consider
\[
\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} \left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum_{i=1}^{n} (X_i - \bar{X}) + \sum_{i=1}^{n} (\bar{X} - \mu)^2.
\]
Note that the cross-product term falls away, since $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$, and that $(\bar{X} - \mu)^2$ is constant in $i$. Thus,
\[
\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2,
\]
so that, dividing through by $\sigma^2$,
\[
\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} + \frac{(\bar{X} - \mu)^2}{\sigma^2/n}.
\]
i. The term $\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$ is exactly $Y$ from Theorem 1.2.3, and therefore has a chi-square distribution with $n$ degrees of freedom.

ii. As for the last term, recall that $\bar{X} \sim N(\mu, \sigma^2/n)$. Therefore, similarly to the term above, $\dfrac{(\bar{X} - \mu)^2}{\sigma^2/n}$ also follows a chi-square distribution, but with one degree of freedom.
It thus follows that $\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} = \dfrac{(n-1)S^2}{\sigma^2}$ has a chi-square distribution with $n - 1$ degrees of freedom. We present the formal theorem below.
Theorem 1.2.4. If $S^2$ is the variance of a random sample of size $n$ taken from a normal population having variance $\sigma^2$, then the statistic
\[
\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}
\]
has a chi-square distribution with $n - 1$ degrees of freedom.
It can be noted that the difference between Theorems 1.2.3 and 1.2.4 is that the population mean is known in the former, while in the latter it is replaced by the sample mean. Thus, when $\mu$ is not known, there is 1 less degree of freedom: when we use the sample data to estimate the population mean $\mu$, we lose 1 degree of freedom in the information available to estimate $\sigma^2$.
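As an informal check of Theorem 1.2.4, one can simulate $(n-1)S^2/\sigma^2$ for normal samples and compare a tail probability against the chi-square distribution. A minimal sketch, assuming Python with NumPy and SciPy; the parameter values are illustrative:

```python
import numpy as np
from scipy.stats import chi2

# Simulate (n-1) S^2 / sigma^2 for normal samples and compare a tail
# probability against the chi-square(n-1) distribution.
rng = np.random.default_rng(1)
mu, sigma, n, reps = 3.0, 1.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
stat = (n - 1) * samples.var(axis=1, ddof=1) / sigma**2

print((stat > 9.488).mean())           # simulated upper-tail probability
print(1 - chi2.cdf(9.488, df=n - 1))   # theoretical value, about 0.05
```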
Example 1.2.8. A manufacturer of car batteries guarantees that the batteries will
last, on average, 3 years with a standard deviation of 1 year. If five of these batteries
have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, should the manufacturer still be
convinced that the batteries have a standard deviation of 1 year? Assume that the
battery lifetime follows a normal distribution.
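A sketch of the computation for Example 1.2.8, assuming Python with NumPy and SciPy:

```python
import numpy as np
from scipy.stats import chi2

# Example 1.2.8: chi-square statistic for the five battery lifetimes.
lifetimes = np.array([1.9, 2.4, 3.0, 3.5, 4.2])
n, sigma2 = len(lifetimes), 1.0

s2 = lifetimes.var(ddof=1)        # sample variance = 0.815
stat = (n - 1) * s2 / sigma2      # (n-1) S^2 / sigma^2 = 3.26

print(stat, chi2.cdf(stat, df=n - 1))   # 3.26 lies near the middle of chi2(4)
```

Since the observed value 3.26 falls well inside the central region of the $\chi^2_4$ distribution, the sample gives no strong reason to doubt the claimed standard deviation of 1 year.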