
CHAPTER 1

Sampling distributions

1.1 Parameters and sample statistics

In applying probability concepts to real world situations, we usually need to know


the distributions of the underlying random variables, such as heights of people in a
population (for a clothing manufacturer), sizes of insurance claims (for the actuary),
concrete lifetimes (for the structural engineer), or number of defective items (for the
quality manager). Each random variable is defined in terms of a population, which
may, in fact, be somewhat hypothetical, and will almost always be very large or even
infinite. The word population here refers to observations relevant to the variable of
interest, whether it be groups of people, animals, or all possible outcomes from a
biological or engineering system.

Definition 1.1.1. A population consists of all the observations from a random vari-
able of interest. Each observation in a population is a value of a random variable X
with some probability distribution, f (x).

Example 1.1.1. The observations obtained by measuring the air pollution from any conceivable position constitute an infinite population, whereas the blood types of students from the University of Botswana constitute a finite population.

Example 1.1.2. Consider the lifetimes of a certain storage battery being manufactured
for mass distribution in the country. The population in this case would be very large
but finite. The values of the battery lifetimes can be modelled by a continuous random
variable, perhaps with an exponential distribution or a Normal distribution.

Example 1.1.3. If one is inspecting items coming off an assembly line for defects,
then each observation in the population would take the value 0 or 1 of the Bernoulli
random variable X with probability distribution

$$f(x; p) = p^x (1 - p)^{1-x}, \qquad x = 0, 1$$

where X = 0 indicates that the item is not defective and X = 1 indicates a defective
item, and p is the probability of any item being defective.

All these probability distributions, of course, are actually families of models in the
sense that each includes one or more parameters. The binomial variable, for instance,
is indexed by the probability of success, $p$; the Poisson variable by the rate of occurrence, $\lambda$;
the normal distribution is defined by two parameters, $\mu$ and $\sigma$. More formally,

Definition 1.1.2. A population parameter, also termed population characteristic,
is a numerical expression summarizing various aspects of the entire population.
The mean and variance of a population are the most common examples of population
parameters.

In practice, we usually cannot determine the distribution of the random variable by
total census or enumeration of the population, and consequently resort to sampling.
Typically, this involves three conceptual steps:

(a) Conceptualize or visualize a convenient hypothetical population defined on possible
values of the random variable itself rather than the real observational units.
For example, the population may be viewed as all non-negative integers rather than
hours of the day or students.

(b) Use a combination of professional judgment and mathematical modeling to postulate
a distributional form on this population, e.g.:

i. Students' final marks assumed $N(\mu, \sigma^2)$ on all real numbers.

ii. Insurance claims assumed to be Gamma$(\alpha, \beta)$ distributed on positive real numbers.

Note that this will usually leave a small number of parameters unspecified at this
point, to be estimated from the data.

(c) Observe an often quite small number of actual instances (outcomes of random
experiments, or realizations of random variables), the sample, and use the assumed
distributional forms to generalize the sample results to the entire assumed
population, by estimating the unknown parameters.

If the inferences from the sample to the population are to be valid, the sample
needs to be representative of the population. To make this concept precise, consider the
following examples:

(a) Are the heights of students in STA211 representative of all the UB students?


(b) Would 10 successive insurance claims be representative of the claims over the
entire year?

The above examples suggest two possible sources of non-representativeness:

(a) observing certain parts of the population preferentially, and

(b) observing outcomes which are not independent of each other.

These ways of sampling may consistently overestimate or underestimate some characteristics
of the population and consequently lead to biased results. One way to
eliminate the possibility of bias in the sampling procedure is to deliberately impose
randomness on the selection process, in a way which ensures some uniformity of
coverage.
We shall refer to the random variable X in which we are interested as the population
random variable, with probability distribution $f(x)$. The set $X_1, X_2, \ldots, X_n$ is
then a sample of size n taken from a population distribution F. The set of n values
$x_1, x_2, \ldots, x_n$ is called a realization of the sample.

Definition 1.1.3. A random sample of size n of a population random variable X is a
collection of n i.i.d. random variables $X_1, X_2, \ldots, X_n$, all having the same distribution
as X.

The joint distribution of the i.i.d. random variables $X_1, X_2, \ldots, X_n$ is given by

$$F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F(x_i) \tag{1.1}$$

The results of a random sample are usually summarized by a small number of
functions of $X_1, X_2, \ldots, X_n$. You should be familiar with the 5-number summaries, and
with the sample mean and variance. All of these summaries have the property that
they can be calculated from the sample, without any knowledge of the distribution of
X. Any summary which satisfies this property is called a statistic.

Definition 1.1.4. A statistic is any function of a random sample which does not
depend on any unknown parameters of the distribution of the population random
variable.

Example 1.1.4. A function such as $\sum_{i=1}^{n} X_i^4 / \sum_{i=1}^{n} X_i^2$ would be a statistic, but
$\sum_{i=1}^{n} (X_i - \mu)^2$ would generally not be, unless the population mean $\mu$ was known a
priori.

Example 1.1.5. The two most commonly used statistics for measuring the center of the
data are the sample mean and median. Let $X_1, X_2, \ldots, X_n$ be a random sample. Then

(a) Sample mean:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Note that the statistic $\bar{X}$ assumes the value $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ when $X_1 = x_1$, $X_2 = x_2$,
and so on.

(b) Sample median: with the sample values arranged in increasing order,

$$\tilde{x} = \begin{cases} x_{(n+1)/2} & \text{when } n \text{ is odd} \\ \tfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) & \text{when } n \text{ is even} \end{cases}$$
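As a concrete illustration of these two statistics, here is a minimal Python sketch using only the standard library (the sample values are arbitrary):

```python
def sample_mean(xs):
    # x-bar = (1/n) * (sum of the observations)
    return sum(xs) / len(xs)

def sample_median(xs):
    # Arrange the realization in increasing order, then apply the
    # odd/even cases from Example 1.1.5(b).
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # x_((n+1)/2), shifted to 0-based index
    return (s[n // 2 - 1] + s[n // 2]) / 2  # average of the two middle values

xs = [1.9, 2.4, 3.0, 3.5, 4.2]              # a realization of a sample of size 5
print(sample_mean(xs), sample_median(xs))   # 3.0 3.0
```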

Let $T(X_1, X_2, \ldots, X_n)$ be a statistic. It is important to realize that $T(X_1, X_2, \ldots, X_n)$
is a random variable which takes on the numerical value $T(x_1, x_2, \ldots, x_n)$ whenever we
observe a realization of the sample. If we are to use this statistic to draw inferences
about the distribution of X, then it is important to understand how observed values
of $T(x_1, x_2, \ldots, x_n)$ vary from one sample to the next. That is, we need to know the
probability distribution of $T(X_1, X_2, \ldots, X_n)$.

Definition 1.1.5. The probability distribution of a statistic, $T = T(X_1, \ldots, X_n)$, is called the sampling distribution of T.

Exercise 1

1. Let $X \sim B(1, 0.5)$, and consider all possible random samples of size 3 on X.
Compute the sample mean for each of the samples and also compute its probability
mass function. (A brute-force check is sketched after this exercise.)

2. A fair die is rolled. Let X be the face value that turns up, and let $X_1, X_2$ be two
independent observations of X. Compute the PMF of the sample mean.
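The check for part 1, as a sketch in Python using only the standard library: enumerate all $2^3$ equally likely samples and tabulate the sample mean.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# All 2^3 = 8 equally likely samples of size 3 on X ~ B(1, 0.5)
counts = Counter(Fraction(sum(s), 3) for s in product([0, 1], repeat=3))

# PMF of the sample mean: P(mean = k/3) = (#samples with sum k) / 8
for mean_value in sorted(counts):
    print(mean_value, Fraction(counts[mean_value], 8))
# mean: 0    1/3  2/3  1
# prob: 1/8  3/8  3/8  1/8
```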

1.2 Sampling distributions

1.2.1 Sampling distribution of a mean

We first discuss the sampling distribution of the mean from a Normal population, since
it can be derived theoretically and is exact.


Theorem 1.2.1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ population. Then

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \tag{1.2}$$

Proof. Using MGFs (from STA221), we know that a sum of independent normal
random variables is also normally distributed, so it is only a matter
of finding the parameters. Thus,

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu$$

and

$$V(\bar{X}) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
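Theorem 1.2.1 is easy to check empirically; the following is a simulation sketch, assuming numpy is available (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 800.0, 40.0, 16, 100_000

# Each row is one random sample of size n from N(mu, sigma^2)
samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)

print(means.mean())   # close to mu = 800
print(means.var())    # close to sigma^2 / n = 1600 / 16 = 100
```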

What if we are sampling from a population with an unknown distribution? The
sampling distribution of the mean will still be approximately normal with mean $\mu$ and
variance $\sigma^2/n$, provided that the sample size is large enough. This result is an immediate
consequence of the Central Limit Theorem (CLT).


Theorem 1.2.2. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population
with mean $\mu$ and variance $\sigma^2$. Then if n is sufficiently large,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \tag{1.3}$$

approximately.

The normal approximation for X̄ will generally be good if n ≥ 30, provided the popu-
lation distribution is not terribly skewed.
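To see the CLT at work on a skewed population, here is an illustrative simulation sketch (assuming numpy and scipy are available; the exponential population and the threshold 2.5 are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu = 2.0            # exponential population with mean 2 (and sd equal to its mean)
n, reps = 30, 200_000

means = rng.exponential(mu, size=(reps, n)).mean(axis=1)

# Empirical P(mean > 2.5) vs the CLT approximation N(mu, mu^2 / n)
print((means > 2.5).mean())
print(1 - norm.cdf(2.5, loc=mu, scale=mu / np.sqrt(n)))
```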

Example 1.2.1. An electrical firm manufactures light bulbs that have a length of
life that is approximately normally distributed, with mean equal to 800 hours and a
standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs
will have an average life of less than 775 hours.

Example 1.2.2. Suppose the mean age of University of Botswana students is 22.3
years and the standard deviation is 4 years. What is the probability that the average age
of 64 randomly selected students is greater than 23 years?

Example 1.2.3. Let X denote the number of flaws in a 1 meter length of copper wire.
The probability mass function of X is presented in the following table.

x          0      1      2      3
P(X = x)   0.48   0.39   0.12   0.01

Suppose 100 wires are sampled from this population. What is the probability that the
average number of flaws per wire in this sample is less than 0.5?
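The three examples can be checked numerically against Theorems 1.2.1 and 1.2.2; the following is a sketch assuming scipy is available (printed values are approximate):

```python
from math import sqrt
from scipy.stats import norm

# Example 1.2.1: mean ~ N(800, 40^2/16); P(mean < 775)
print(norm.cdf(775, loc=800, scale=40 / sqrt(16)))         # about 0.0062

# Example 1.2.2: mean approx N(22.3, 4^2/64); P(mean > 23)
print(1 - norm.cdf(23, loc=22.3, scale=4 / sqrt(64)))      # about 0.0808

# Example 1.2.3: population mean and variance from the PMF, then CLT with n = 100
x, p = [0, 1, 2, 3], [0.48, 0.39, 0.12, 0.01]
mu = sum(xi * pi for xi, pi in zip(x, p))                  # 0.66
var = sum(xi**2 * pi for xi, pi in zip(x, p)) - mu**2      # about 0.5244
print(norm.cdf(0.5, loc=mu, scale=sqrt(var / 100)))        # about 0.014
```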

1.2.2 Sampling distribution of a proportion


Recall from STA221 that if $Y_1, Y_2, \ldots, Y_n$ are independently and identically distributed
(i.i.d.) Bernoulli(p) random variables, then their sum, $X = Y_1 + Y_2 + \ldots + Y_n$,
follows a binomial distribution, Bin(n, p). Thus X is the sum of the sample observations,
and the sample proportion

$$\hat{p} = \frac{X}{n} = \frac{Y_1 + Y_2 + \ldots + Y_n}{n} \tag{1.4}$$

is also the sample mean, $\bar{Y}$. Since the mean of a Bernoulli random variable is p and its
variance is $p(1-p)$, it then follows from the CLT that if n is sufficiently large,

 
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right) \tag{1.5}$$

and

$$X \sim N\big(np, np(1-p)\big) \tag{1.6}$$

The general conditions on n and p for the approximation to be of good quality are
$np \geq 5$ and $n(1-p) \geq 5$.
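As a sketch of how the approximation in (1.6) is used in practice (assuming scipy is available; this anticipates Example 1.2.5(a) below, and the 0.5 continuity correction is a standard refinement not covered in this chapter):

```python
from math import sqrt
from scipy.stats import norm

p, n = 0.07, 250                           # Example 1.2.5(a): defect rate, lot size
mean, sd = n * p, sqrt(n * p * (1 - p))    # X is approximately N(np, np(1-p))

# P(X < 20) = P(X <= 19); evaluating at 19.5 applies a continuity correction
print(norm.cdf(19.5, loc=mean, scale=sd))  # about 0.69
```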

Example 1.2.4. Suppose that at Gaborone Senior School, 10% of the students are over
18 years of age. In a sample of 400 students, what is the probability that more than
110 of them are over 18?

Example 1.2.5. In a certain process, the probability of producing a defective component is 0.07.

(a) In a sample of 250 randomly chosen components, what is the probability that
fewer than 20 of them are defective?


(b) To what value must the probability of a defective component be reduced so that
only 1% of lots of 250 components contain 20 or more defectives?

1.2.3 Sampling distribution of the difference between two means

The previous examples have been concerned with the sampling distribution of a single
mean. However, many researchers are usually interested in comparative experiments in
which one group is compared to the other. For example, sociologists may be interested
in a comparative study between male-headed and female-headed households. The basis
for that comparison revolves around $\mu_1 - \mu_2$, the difference in population means.

Suppose that we have two independent populations with means $\mu_1$ and $\mu_2$, and
variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Let $\bar{X}_1$ and $\bar{X}_2$ be the corresponding means from
random samples of size $n_1$ and $n_2$ respectively. Then according to Theorem 1.2.2, both
$\bar{X}_1$ and $\bar{X}_2$ are approximately Normally distributed with means $\mu_1$ and $\mu_2$ and variances
$\sigma_1^2/n_1$ and $\sigma_2^2/n_2$ respectively. As a result, $\bar{X}_1 - \bar{X}_2$ is also Normally distributed with
mean


$$\mu_{\bar{X}_1 - \bar{X}_2} = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2 \tag{1.7}$$


and variance

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \tag{1.8}$$

Hence,

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

is approximately a standard normal random variable.

Similarly, the difference in sample proportions is also approximately normally distributed. This is
because the proportion X/n, where X is a Binomial random variable, is actually an
average of a set of 0s and 1s and is therefore, by the CLT, approximately normal for sufficiently
large sample sizes. Hence

$$\hat{P}_1 - \hat{P}_2 \sim N\left(p_1 - p_2, \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right) \tag{1.9}$$

Example 1.2.6. A random sample of size 25 is taken from a normal population having
a mean of 80 and a standard deviation of 5. A second random sample of size 36 is taken
from a different normal population having a mean of 75 and a standard deviation of 3.
Find the probability that the sample mean computed from the 25 measurements will
exceed the sample mean computed from the 36 measurements by at least 3.4 but less
than 5.9.
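A numerical check of Example 1.2.6 using (1.7) and (1.8), sketched under the assumption that scipy is available:

```python
from math import sqrt
from scipy.stats import norm

# X1bar - X2bar ~ N(80 - 75, 5^2/25 + 3^2/36) = N(5, 1.25)
mean_diff = 80 - 75
sd_diff = sqrt(5**2 / 25 + 3**2 / 36)

# P(3.4 <= X1bar - X2bar < 5.9)
print(norm.cdf(5.9, mean_diff, sd_diff) - norm.cdf(3.4, mean_diff, sd_diff))
# about 0.7135
```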

Example 1.2.7. Two different box-filling machines are used to fill cereal boxes on an
assembly line. The critical measurement influenced by these machines is the weight of
the product in the boxes. Engineers are quite certain that the variance of the weight of
product is $\sigma^2 = 1$ gram$^2$. Experiments are conducted using both machines with sample
sizes of 36 each. The sample averages for machines A and B are $\bar{x}_A = 4.5$ grams and
$\bar{x}_B = 4.7$ grams. Engineers are surprised that the two sample averages for the filling
machines are so different.

(a) Calculate $P(\bar{X}_A - \bar{X}_B \geq 0.2)$ under the condition that $\mu_A = \mu_B$.

(b) Do the aforementioned experiments seem to, in any way, strongly support a
conjecture that the population means for the two machines are different? Explain
using your answer in (a).
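For part (a), under $\mu_A = \mu_B$ the difference $\bar{X}_A - \bar{X}_B$ is centred at 0 with variance $1/36 + 1/36$; a minimal sketch, assuming scipy:

```python
from math import sqrt
from scipy.stats import norm

sd_diff = sqrt(1 / 36 + 1 / 36)       # sd of XAbar - XBbar, about 0.2357
print(1 - norm.cdf(0.2 / sd_diff))    # P(XAbar - XBbar >= 0.2), about 0.198
```

The resulting probability bears directly on the reasoning asked for in part (b).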

1.2.4 Sampling distribution of the sample variance

Let us consider a random sample of size n from a Normal population with mean $\mu$ and
variance $\sigma^2$. Instead of finding the sampling distribution of the sample variance directly,
we consider the distribution of the statistic $(n-1)S^2/\sigma^2$. Note that this is indeed a
statistic since the variance $\sigma^2$ is considered known here.

However, before we discuss the sampling distribution of $(n-1)S^2/\sigma^2$, we have to
state the following important result.

Theorem 1.2.3. Let $X_1, X_2, \ldots, X_n$ be independent and identically Normally
distributed random variables with mean $\mu$ and variance $\sigma^2$. Then the random
variable

$$Y = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$$

has a chi-square distribution with n degrees of freedom.


Now consider

$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} \left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum_{i=1}^{n} (X_i - \bar{X}) + \sum_{i=1}^{n} (\bar{X} - \mu)^2$$

Note that the cross-product term vanishes since $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$, and that $(\bar{X} - \mu)^2$
is constant in i, so its sum equals $n(\bar{X} - \mu)^2$. Thus,

$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$$

Now dividing throughout by $\sigma^2$ we get

$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} + \frac{(\bar{X} - \mu)^2}{\sigma^2/n}$$

Two points to note from the equation above:

i. The term $\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$ is exactly Y from Theorem 1.2.3 and therefore has a
chi-square distribution with n degrees of freedom.

ii. As for the last term, recall that $\bar{X} \sim N(\mu, \sigma^2/n)$. Therefore, similar to the
above term, $\frac{(\bar{X} - \mu)^2}{\sigma^2/n}$ also follows a chi-square distribution, but with one degree
of freedom.

It thus follows that $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} = (n-1)S^2/\sigma^2$ has a chi-square distribution
with $n - 1$ degrees of freedom. We present the formal theorem below.


Theorem 1.2.4. Let $X_1, X_2, \ldots, X_n$ be independent and identically Normally
distributed random variables with mean $\mu$ and variance $\sigma^2$. Also let $S^2$ be
the sample variance. Then the statistic

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$$

has a chi-square distribution with $n - 1$ degrees of freedom.

It can be noted that the difference between Theorems 1.2.3 and 1.2.4 is that the population
mean is known in the former, while in the latter it is replaced by the sample mean.
Thus, when $\mu$ is not known there is one less degree of freedom. That is, when
we use the sample data to estimate $\mu$ by $\bar{X}$, there is one less degree of freedom in the
information used to estimate $\sigma^2$.
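Theorem 1.2.4 can also be checked empirically; a simulation sketch assuming numpy and scipy are available (the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
mu, sigma, n, reps = 3.0, 1.0, 5, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)       # sample variance S^2 (divisor n - 1)
stat = (n - 1) * s2 / sigma**2

print(stat.mean())                                # close to n - 1 = 4
print((stat > chi2.ppf(0.95, df=n - 1)).mean())   # close to 0.05
```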

Example 1.2.8. A manufacturer of car batteries guarantees that the batteries will
last, on average, 3 years with a standard deviation of 1 year. If five of these batteries
have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, should the manufacturer still be
convinced that the batteries have a standard deviation of 1 year? Assume that the
battery lifetime follows a normal distribution.
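A numerical sketch for Example 1.2.8 (assuming scipy is available): compute $\chi^2 = (n-1)s^2/\sigma^2$ for the five lifetimes and see where it falls in the $\chi^2_4$ distribution.

```python
from scipy.stats import chi2

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]
n = len(lifetimes)
xbar = sum(lifetimes) / n                                  # 3.0
s2 = sum((x - xbar) ** 2 for x in lifetimes) / (n - 1)     # 0.815
stat = (n - 1) * s2 / 1.0**2                               # 3.26

# Compare with the middle 95% of the chi-square(4) distribution
print(stat, chi2.ppf(0.025, df=4), chi2.ppf(0.975, df=4))
# 3.26 lies between about 0.484 and 11.14, so sigma = 1 year is plausible
```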

