Sampling Distribution-1
Sampling:
Judgment sampling
In judgment sampling, personal knowledge and opinion are used to identify the items from the
population that are to be included in the sample.
This sampling is based on someone’s expertise about the population.
Systematic sampling
In systematic sampling, elements are selected from the population at a uniform interval that is
measured in time, order, or space.
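The selection rule above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample`, the random starting point, and the population of 100 numbered items are assumptions, not part of the notes.

```python
import random

def systematic_sample(population, k, seed=None):
    """Take every k-th element, starting at a random position among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # assumed convention: random start in 0..k-1
    return population[start::k]   # uniform interval of k elements

# Hypothetical population: items numbered 1..100, sampled at an interval of 10.
accounts = list(range(1, 101))
sample = systematic_sample(accounts, k=10, seed=1)
print(sample)
```

Whatever the starting point, consecutive selected items are exactly 10 positions apart, which is the "uniform interval" in the definition.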
Stratified sampling
In this sampling, we divide the population into relatively homogeneous groups, called strata. Then
we use one of two approaches. Either we select at random from each stratum a specified number of
elements corresponding to the proportion of that stratum in the population as a whole or we draw an
equal number of elements from each stratum and give weight to the results according to the stratum’s
proportion of total population.
Stratified sampling is appropriate when the population is already divided into groups of different sizes
and we wish to acknowledge this fact.
For example, a state could be separated into counties, a school into grades, or a university into
faculties. These groups would be the strata.
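The first approach (a number of elements from each stratum in proportion to its share of the population) might look like the sketch below. The faculty names, stratum sizes and the function `proportional_stratified_sample` are hypothetical, chosen so that the allocation works out to whole numbers.

```python
import random

def proportional_stratified_sample(strata, total_n, seed=None):
    """Draw from every stratum a random sample proportional to the stratum's size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Allocation for this stratum: its share of the population times total_n.
        n_h = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical university of 1000 students split into three faculties.
faculties = {
    "Arts": [f"A{i}" for i in range(500)],
    "Science": [f"S{i}" for i in range(300)],
    "Law": [f"L{i}" for i in range(200)],
}
chosen = proportional_stratified_sample(faculties, total_n=50, seed=1)
sizes = {name: len(units) for name, units in chosen.items()}
print(sizes)  # {'Arts': 25, 'Science': 15, 'Law': 10}
```

With less convenient stratum sizes the rounded allocations may not add up exactly to `total_n`; a real design would need a rule for distributing the remainder.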
Cluster sampling
In cluster sampling, we divide the population into groups, or clusters, and then select a random sample
of these clusters. The individuals in each selected cluster are then examined.
For example: In a study of the opinions of homeless people across a country, rather than study a few
homeless people in all towns, a number of towns are selected and a significant number of homeless
people are interviewed in each one.
We use stratified sampling when each group has small variation within each group but there is a wide
variation between the groups, while cluster sampling is used in the opposite case when there is wide
variation within each group but the groups are essentially similar to each other.
The main difference between stratified and cluster sampling is that in stratified sampling all the
strata need to be sampled. In cluster sampling one proceeds by first selecting a number of clusters at
random and then either sampling each selected cluster or conducting a census of it. Usually not all
clusters are included.
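A minimal sketch of cluster sampling with a census of each chosen cluster follows. The towns, the number of people per town and the function `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and keep every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    # Census of each selected cluster: all of its members are included.
    return {name: list(clusters[name]) for name in chosen_names}

# Hypothetical country with 8 towns of 20 people each.
towns = {f"Town{t}": [f"T{t}-person{i}" for i in range(20)] for t in range(1, 9)}
selected = cluster_sample(towns, n_clusters=3, seed=1)
print(sorted(selected))
```

Unlike the stratified case, only 3 of the 8 groups contribute to the sample, and the unselected towns are not examined at all.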
Sampling Distribution:
The probability distribution of all the possible values of a sample statistic (mean, median,
proportion) is called the sampling distribution of that statistic.
Depending on the statistic, statisticians speak of the:
Sampling distribution of the mean
Sampling distribution of the median
Sampling distribution of the proportion
Any probability distribution (and therefore, any sampling distribution) can be partially described by its
mean and standard deviation.
The standard deviation of a sampling distribution is called the standard error of the statistic.
So, rather than saying “the standard deviation of the distribution of sample means”,
we say “the standard error of the mean”.
Suppose we take all possible samples from a population (for example, samples of the expenditures
of students in different classes) and calculate the mean and standard deviation of each sample.
Each sample would then have its own mean, $\bar{x}$ (x bar), and its own standard
deviation, $S$.
The individual sample means would not all be equal to the population mean. They
would tend to be near the population mean, but only rarely would they be exactly
that value.
Now we produce a distribution of the means of every sample that could be taken.
This distribution is called the sampling distribution of the mean.
This distribution of sample means (the sampling distribution) would have its own
mean, $\mu_{\bar{x}}$ (mu sub x bar), and its own standard deviation, $\sigma_{\bar{x}}$ (sigma sub x bar).
The standard deviation of the distribution of sample means, $\sigma_{\bar{x}}$, is the “standard error of
the mean”.
The standard error of the mean measures the extent to which we expect the means from
different samples to vary.
2. The sampling distribution has a standard deviation (a standard error) equal to the population
standard deviation divided by the square root of the sample size:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Question: For a population 0, 4, 8, 12 construct sampling distribution of mean for samples of size 2
taken without replacement and find its mean and standard error.
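Total number of samples of size 2 out of 4 objects, taken without replacement = $\binom{4}{2} = 6$
The six samples are (0, 4), (0, 8), (0, 12), (4, 8), (4, 12) and (8, 12), with means 2, 4, 6, 6, 8 and 10.
$\bar{x}$    $P(\bar{x})$
2     1/6
4     1/6
6     2/6
8     1/6
10    1/6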
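$\bar{x}$    $P(\bar{x})$    $\bar{x} \cdot P(\bar{x})$    $\bar{x} - \mu_{\bar{x}}$    $(\bar{x} - \mu_{\bar{x}})^2$    $P(\bar{x}) \cdot (\bar{x} - \mu_{\bar{x}})^2$
2        0.167    0.33    -4.00    16.00    2.67
4        0.167    0.67    -2.00     4.00    0.67
6        0.333    2.00     0.00     0.00    0.00
8        0.167    1.33     2.00     4.00    0.67
10       0.167    1.67     4.00    16.00    2.67
Total             6.00                      6.67
$\mu_{\bar{x}} = \sum \bar{x} \cdot P(\bar{x}) = 6$
$\sigma_{\bar{x}} = \sqrt{\sum P(\bar{x}) \cdot (\bar{x} - \mu_{\bar{x}})^2} = \sqrt{6.67} = 2.58$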
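This can be done directly by using
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
(When sampling is without replacement, the Finite Population Multiplier is included.)
The population mean and standard deviation are found from:
$x$      $x - \mu$        $(x - \mu)^2$
0      0 - 6 = -6      36
4      4 - 6 = -2       4
8      8 - 6 = 2        4
12     12 - 6 = 6      36
Total  24              80
$\mu = \frac{\sum x}{N} = \frac{24}{4} = 6$
$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{80}{4} = 20$
$\sigma = \sqrt{20} = 4.47$
Then $\sigma_{\bar{x}} = \frac{4.47}{\sqrt{2}} \sqrt{\frac{4-2}{4-1}} = 3.16 \times 0.816 = 2.58$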
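The result can be checked by brute force. The sketch below, which assumes only the Python standard library, lists every sample of size 2 without replacement, computes the standard deviation of the sample means, and compares it with the formula that includes the finite population multiplier. The variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

population = [0, 4, 8, 12]
n = 2
N = len(population)

# All C(4, 2) = 6 samples without replacement and their means.
sample_means = [sum(s) / n for s in combinations(population, n)]
mean_of_means = sum(sample_means) / len(sample_means)
se_enumerated = sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
)

# Same quantity from sigma / sqrt(n) times the finite population multiplier.
mu = sum(population) / N
sigma = sqrt(sum((x - mu) ** 2 for x in population) / N)
se_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(sample_means)                                    # [2.0, 4.0, 6.0, 6.0, 8.0, 10.0]
print(mean_of_means, round(se_enumerated, 2), round(se_formula, 2))
```

Both routes give a mean of 6 and a standard error of about 2.58.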
Question: For a population 0, 4, 8, 12 construct sampling distribution of mean for samples of size 2
taken with replacement and find its mean and standard error.
Total number of samples of size 2 out of 4 objects, taken with replacement = $N^n = 4^2 = 16$
$\bar{x}$       0     2     4     6     8     10    12
$P(\bar{x})$    1/16  2/16  3/16  4/16  3/16  2/16  1/16
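$\bar{x}$    $P(\bar{x})$    $\bar{x} \cdot P(\bar{x})$    $\bar{x} - \mu_{\bar{x}}$    $(\bar{x} - \mu_{\bar{x}})^2$    $P(\bar{x}) \cdot (\bar{x} - \mu_{\bar{x}})^2$
0        0.0625    0.00    0 - 6 = -6     36    2.25
2        0.125     0.25    2 - 6 = -4     16    2.00
4        0.1875    0.75    4 - 6 = -2      4    0.75
6        0.25      1.50    6 - 6 = 0       0    0.00
8        0.1875    1.50    8 - 6 = 2       4    0.75
10       0.125     1.25    10 - 6 = 4     16    2.00
12       0.0625    0.75    12 - 6 = 6     36    2.25
Total              6.00                         10.00
$\mu_{\bar{x}} = \sum \bar{x} \cdot P(\bar{x}) = 6.00$
$\sigma_{\bar{x}} = \sqrt{\sum P(\bar{x}) \cdot (\bar{x} - \mu_{\bar{x}})^2} = \sqrt{10} = 3.162$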
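This can be done directly by using
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
(When sampling is with replacement, the Finite Population Multiplier is ignored.)
With $\sigma = \sqrt{20} = 4.47$ for this population,
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{20}}{\sqrt{2}} = \sqrt{10} = 3.162$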
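As a cross-check, the whole sampling distribution with replacement can be built by enumeration. The sketch below uses exact fractions so the probabilities come out as sixteenths; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

population = [0, 4, 8, 12]
n = 2

# With replacement, order matters: 4 ** 2 = 16 equally likely samples.
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

mu_xbar = sum(mean * p for mean, p in dist.items())
var_xbar = sum(p * (mean - mu_xbar) ** 2 for mean, p in dist.items())

for mean, p in dist.items():
    print(mean, p)
print(mu_xbar, var_xbar, round(sqrt(var_xbar), 3))  # 6 10 3.162
```

Note that `Fraction` prints probabilities in lowest terms, so 2/16 appears as 1/8 and 4/16 as 1/4.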
Question: A random sample of size 16 is drawn from a population with replacement. The standard
deviation of the population is 5.
(i) Find standard error of mean
(ii) Find standard error of mean if sample size is increased to 100.
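One way to work this question is sketched below. It assumes that, because sampling is with replacement, the finite population multiplier is ignored and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ applies; the helper name `standard_error` is illustrative.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean when the finite population multiplier is ignored."""
    return sigma / sqrt(n)

se_16 = standard_error(5, 16)    # part (i)
se_100 = standard_error(5, 100)  # part (ii)
print(se_16, se_100)  # 1.25 0.5
```

Increasing the sample size from 16 to 100 multiplies $\sqrt{n}$ by 2.5, so the standard error falls by the same factor.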
Question: A random sample of size 25 is drawn from a population with mean 82 and standard
deviation 30. It is desired to reduce the standard error of mean by 1/3. What sample size
must be selected to accomplish that?
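Solution:
Population size is not given, so we assume it is large (or infinite). The FPM is therefore
ignored, and
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Putting n = 25 and $\sigma$ = 30, we get
$\sigma_{\bar{x}} = \frac{30}{\sqrt{25}} = \frac{30}{5} = 6$
The standard error $\sigma_{\bar{x}} = 6$ is to be reduced to $\sigma_{\bar{x}} = 2$ (one third of its present value).
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ or $\sqrt{n} \, \sigma_{\bar{x}} = \sigma$ or $\sqrt{n} = \frac{\sigma}{\sigma_{\bar{x}}}$
Or $n = \frac{\sigma^2}{\sigma_{\bar{x}}^2} = \frac{30^2}{2^2} = \frac{900}{4} = 225$
So a sample of size 225 must be selected.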
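The rearranged formula $n = \sigma^2 / \sigma_{\bar{x}}^2$ can be wrapped in a small helper. This is a sketch: the name `required_sample_size` and the rounding up to a whole number are assumptions.

```python
from math import ceil, sqrt

def required_sample_size(sigma, target_se):
    """Smallest n for which sigma / sqrt(n) does not exceed target_se."""
    return ceil((sigma / target_se) ** 2)

current_se = 30 / sqrt(25)   # 6.0 with the present sample of 25
target_se = current_se / 3   # one third of the present standard error
n_required = required_sample_size(30, target_se)
print(current_se, target_se, n_required)  # 6.0 2.0 225
```

Cutting the standard error to a third requires nine times the sample size, since the standard error depends on $\sqrt{n}$.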
Question: A bank calculates that its individual savings accounts are normally distributed with a mean
of $2000 and a standard deviation of $600. If the bank takes a random sample of 100
accounts, what is the probability that the sample mean will lie between $1900 and $2050?
Solution: This question is about the sampling distribution of the mean, therefore we calculate the
standard error of the mean:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{600}{\sqrt{100}} = \frac{600}{10} = 60$ dollars ← Standard error of the mean
Now we use the equation for z values, which enables us to use the standard Normal Probability
Distribution (SNPD) table to determine the probability that the sample mean will lie between $1900
and $2050.
First we calculate z for a sample mean of 1900:
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{1900 - 2000}{60} = \frac{-100}{60} = -1.67$ ← Standard deviations from the mean of the SNPD
Then for a sample mean of 2050:
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{2050 - 2000}{60} = \frac{50}{60} = 0.83$
Now we use the SNPD table to find the area between the mean and each z value:
for a z value of -1.67 we find 0.4525,
and for a z value of 0.83 we find 0.2967.
$P(1900 \leq \bar{x} \leq 2050) = P(-1.67 \leq Z \leq 0.83)$
$= P(-1.67 \leq Z \leq 0) + P(0 \leq Z \leq 0.83)$
$= 0.4525 + 0.2967 = 0.7492$
Hence 0.7492 is the total probability that the sample mean will lie between $1900 and
$2050.
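The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch; it uses unrounded z values, so the answer differs from the table-based 0.7492 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 600, 100
se = sigma / sqrt(n)  # standard error of the mean: 60.0

# Sampling distribution of the mean: normal with mean mu and standard deviation se.
xbar_dist = NormalDist(mu=mu, sigma=se)
prob = xbar_dist.cdf(2050) - xbar_dist.cdf(1900)

print(se, round(prob, 3))
```

The computed probability is close to 0.75, in line with the hand calculation.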
The sampling distribution of the mean is normal if the population is normal, and it is approximately
normal whatever the shape of the population if the sample size is large (as a rule of thumb, about 30 or more).
Question: A population has mean 40 and standard deviation 6. What is the shape of the sampling
distribution of mean for samples of size (i) 100 (ii) 25 ?
Solution:
(i) The sampling distribution of the mean would be approximated by a normal distribution
irrespective of the shape of the population, because the sample size n = 100, i.e. n > 30.
The mean and standard error of this distribution are:
$\mu_{\bar{x}} = \mu = 40$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{100}} = \frac{6}{10} = 0.6$
(ii) The sampling distribution of the mean for n < 30 can be normal only when the
population is normal. Since n = 25, i.e. n < 30, and the shape of the population is not
known, nothing can be said about the shape of the sampling distribution.
The mean and standard error of this distribution (whether normal or not) are:
$\mu_{\bar{x}} = \mu = 40$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{25}} = \frac{6}{5} = 1.2$
Question: The distribution of annual earnings of all bank tellers with five years’ experience has a mean
of $19,000 and a standard deviation of $2,000. If we draw a sample of 30 tellers, what is the
probability that their earnings will average more than $19,750 annually?
Solution:
First we calculate the standard error of the mean from the population standard deviation:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2000}{\sqrt{30}} = \frac{2000}{5.477} = 365.16$ dollars ← Standard error of the mean
As we are dealing with a sampling distribution, we must use the equation for the z value and
the SNPD table for a sample mean of 19,750:
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{19750 - 19000}{365.16} = \frac{750}{365.16} = 2.05$
Now we use the standard Normal Probability Distribution table to determine the area between
the mean and a z value of 2.05, which we find to be 0.4798.
The area beyond z = 2.05 is therefore 0.5 - 0.4798 = 0.0202.
Hence there is a 0.0202 probability of average earnings being more than $19,750 annually
in a group of 30 tellers.
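The upper-tail probability can also be computed directly. The sketch below assumes, as the solution does, that a sample of 30 is large enough for the normal approximation; because z is not rounded to 2.05, the result differs slightly from 0.0202.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 19000, 2000, 30
se = sigma / sqrt(n)                    # about 365.15
z = (19750 - mu) / se                   # about 2.05
prob_more = 1 - NormalDist().cdf(z)     # area to the right of z

print(round(se, 2), round(z, 2), round(prob_more, 3))
```

The result is roughly 0.02, matching the table-based answer to two decimal places.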
The sample proportion is a statistic, $p = \frac{x}{n}$, where $x$ is the number of successes in a sample of size $n$.
The population proportion is denoted by $\pi$, where $\pi = \frac{x}{N}$ and $x$ is now the number of successes in a
population of size $N$.
When sampling is with replacement, the standard error of $p$ is:
$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
But in the sampling distribution of the proportion without replacement, the mean and standard error of $p$ are:
$\mu_p = \pi$
$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \cdot \sqrt{\frac{N-n}{N-1}}$
Question: Consider a population of 5 students Kamran (K), Hafeez (H), Shumaila (S), Noreen (N) and
Tahmina (T).
(a) Develop the sampling distribution of the proportion of females for simple random
samples of size 2 drawn without replacement.
(b) Determine the mean and variance of the distribution
(c) Verify that $\mu_p = \pi$ and $\sigma_p^2 = \frac{\pi(1-\pi)}{n} \cdot \frac{N-n}{N-1}$
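Solution: Total number of samples of size 2 out of 5 objects, taken without replacement = $\binom{N}{n} = \binom{5}{2} = 10$
Population proportion of females = (number of females in the population) / (number of students in the population)
Or $\pi = \frac{3}{5} = 0.6$, then $1 - \pi = 0.4$
(a) Of the 10 samples, 1 contains no female (KH), 6 contain one female (KS, KN, KT, HS, HN, HT)
and 3 contain two females (SN, ST, NT), so the sampling distribution of $p$ is:
$p$       0     0.5   1
$P(p)$    1/10  6/10  3/10
(b) $\mu_p = \sum p \cdot P(p) = 0(0.1) + 0.5(0.6) + 1(0.3) = 0.6$
$\sigma_p^2 = \sum p^2 \cdot P(p) - \mu_p^2 = 0.45 - 0.36 = 0.09$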
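(c) From the population, $\pi = \frac{3}{5} = 0.6$ and N = 5
Then $\mu_p = \pi = 0.6$
$\sigma_p^2 = \frac{\pi(1-\pi)}{n} \cdot \frac{N-n}{N-1} = \frac{0.6 \times 0.4}{2} \cdot \frac{5-2}{5-1} = \frac{0.24}{2} \cdot \frac{3}{4} = 0.09$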
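Parts (a) to (c) can be verified by listing all 10 samples. In the sketch below the gender labels are an assumption consistent with $\pi = 3/5$ (Kamran and Hafeez male; Shumaila, Noreen and Tahmina female), and exact fractions are used so that the comparison with the formula is exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Assumed labels: "M" for male, "F" for female.
students = {"K": "M", "H": "M", "S": "F", "N": "F", "T": "F"}
n = 2
N = len(students)

# (a) Proportion of females in each of the C(5, 2) = 10 samples.
samples = list(combinations(students, n))
props = [Fraction(sum(students[s] == "F" for s in sample), n) for sample in samples]
dist = {p: Fraction(c, len(samples)) for p, c in sorted(Counter(props).items())}

# (b) Mean and variance of the sampling distribution.
mu_p = sum(p * prob for p, prob in dist.items())
var_p = sum(prob * (p - mu_p) ** 2 for p, prob in dist.items())

# (c) The same quantities from the formulas, including the f.p.m.
pi = Fraction(3, 5)
var_formula = pi * (1 - pi) / n * Fraction(N - n, N - 1)

print(dist)
print(mu_p, var_p, var_formula)  # 3/5 9/100 9/100
```

The enumerated mean equals $\pi$ = 0.6 and both variance calculations give 0.09.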
If the random sampling is without replacement and the sampling fraction $\frac{n}{N} \geq 0.05$,
the f.p.m. must be used and $\sigma_p^2 = \frac{\pi(1-\pi)}{n} \cdot \frac{N-n}{N-1}$
When $n \geq 50$ and both $n\pi$ and $n(1-\pi)$ are greater than 5, the sampling distribution of $p$
can be considered normal.
When the distribution of $p$ is normal, the statistic $z = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}}$ will be a standard normal variable.
Generally the population proportion $\pi$ is not known, and the proportion of a random sample
(say $p_0$) is used as an estimate of $\pi$. In the formulas, $\pi$ is then replaced by the sample estimate $p_0$.
For large samples, the sampling distribution of $p_0$ is approximately normal if the interval
$p_0 \pm 3\sigma_p = p_0 \pm 3\sqrt{\frac{p_0(1-p_0)}{n}}$ lies entirely within (0, 1), that is, it includes neither 0 nor 1,
and then $z = \frac{p - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
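The interval criterion is easy to automate. The function name `normal_approx_ok` and the two trial cases ($p_0$ = 0.2 with n = 100 and with n = 20) are hypothetical, chosen only to show one case that passes and one that fails.

```python
from math import sqrt

def normal_approx_ok(p0, n):
    """Check whether p0 +/- 3 standard errors stays strictly inside (0, 1)."""
    se = sqrt(p0 * (1 - p0) / n)
    lower, upper = p0 - 3 * se, p0 + 3 * se
    return (lower > 0 and upper < 1), (lower, upper)

ok_large, interval_large = normal_approx_ok(0.2, 100)  # interval about (0.08, 0.32)
ok_small, interval_small = normal_approx_ok(0.2, 20)   # lower limit falls below 0
print(ok_large, ok_small)  # True False
```

With the same $p_0$, the smaller sample fails the check because its standard error is large enough for the interval to reach below 0.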
$\sigma_p^2 = \frac{\pi(1-\pi)}{n} = \frac{0.3 \times 0.7}{50} = 0.0042$
(ii) Since the sampling distribution of p has been determined to be normal, the z-
transformation is used to find probabilities.
$z = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}} = \frac{0.4 - 0.3}{\sqrt{0.0042}} = 1.54$
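The arithmetic in this solution can be reproduced as follows. The values $\pi$ = 0.3, n = 50 and p = 0.4 are read off the working above; the variable names and the explicit check of the normality conditions are illustrative additions.

```python
from math import sqrt

pi, n, p = 0.3, 50, 0.4

# Conditions for treating the distribution of p as normal.
conditions_met = n >= 50 and n * pi > 5 and n * (1 - pi) > 5

var_p = pi * (1 - pi) / n        # variance of the sample proportion
z = (p - pi) / sqrt(var_p)       # z-transformation of p = 0.4

print(conditions_met, round(var_p, 4), round(z, 2))  # True 0.0042 1.54
```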
Solution: Estimated population proportion = $p_0 = \pi = \frac{313}{1000} = 0.313$
Required: $P(p < 0.35)$
We check whether the sample size is large enough to use the normal
approximation for the sampling distribution of p. The criterion is tested by the interval
$p_0 \pm 3\sigma_p = p_0 \pm 3\sqrt{\frac{p_0(1-p_0)}{n}} = 0.313 \pm 3\sqrt{\frac{0.313 \times 0.687}{1000}} = 0.313 \pm 0.043992$
Or (0.269, 0.357). Since this interval lies within the interval (0, 1), the normal
approximation will be adequate.
Therefore,