Chapter One
Objectives
After completing this chapter you will be able to:
▪ explain the concept of sampling
What are sampling and sampling distributions?
i. Sampling error:
❖ It is the discrepancy between a population value (parameter) and the corresponding sample value (statistic).
❖ It can arise when inappropriate sampling techniques are applied.
Advantages of sampling over a complete census:
➢ Greater speed
➢ Greater accuracy
➢ Greater scope
➢ More detailed information can be obtained.
Types of sample selection
There are two types of sample selection:
i. Sample selection with replacement and
ii. Sample selection without replacement
Sampling with replacement
Sampling without replacement
➢A unit selected from the population is not returned (replaced) to the population before the next selection.
➢The population size reduces by one with each selection.
➢There are ${}_N C_n = \binom{N}{n}$ possible samples (order ignored).
➢The probability of selection is not constant: it is 1/N for the 1st unit, 1/(N-1) for the 2nd, 1/(N-2) for the 3rd, and so on.
➢Note: Sample outcomes are statistically independent when sampling with
replacement and are statistically dependent when sampling without replacement
Example
A population consists of the five numbers 2, 3, 6, 8, 11. Find all possible samples of size two which can be drawn
i) without replacement, ii) with replacement.
Solution: (i) There are ${}_5C_2 = 10$ samples of size two which can be drawn without replacement, namely (2, 3), (2, 6), (2, 8), (2, 11), (3, 6), (3, 8), (3, 11), (6, 8), (6, 11), (8, 11).
The selection (2, 3) is considered the same as (3, 2).
(ii) There are $5^2 = 25$ samples of size two which can be drawn with replacement, namely (2, 2), (2, 3), (2, 6), (2, 8), (2, 11), (3, 2), (3, 3), (3, 6), (3, 8), (3, 11), (6, 2), (6, 3), (6, 6), (6, 8), (6, 11), (8, 2), (8, 3), (8, 6), (8, 8), (8, 11), (11, 2), (11, 3), (11, 6), (11, 8), (11, 11).
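As a quick check of the counts in this example, the two lists can be reproduced with Python's standard `itertools` module. This is only an illustrative sketch: `combinations` ignores order and never repeats a unit (sampling without replacement), while `product` keeps order and allows repeats (sampling with replacement).

```python
from itertools import combinations, product

population = [2, 3, 6, 8, 11]
n = 2

# Without replacement, order ignored: (2, 3) and (3, 2) count as one sample.
without_replacement = list(combinations(population, n))

# With replacement, order matters and a unit may repeat: N**n ordered samples.
with_replacement = list(product(population, repeat=n))

print(len(without_replacement))  # 10 = 5C2
print(len(with_replacement))     # 25 = 5**2
```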
Sampling Techniques
i. Simple Random Sampling:
▪ All elements in the population have the same pre-assigned non-zero probability of being included in the sample.
▪ Every possible sample of a specific size has an equal chance of being selected.
▪ A list of all elements (a sampling frame) is needed.
▪ Sampling can be done with or without replacement.
▪ It can be done using either the lottery method or a table of random numbers.
In the lottery method, first give a unique identification code to each unit of the population. Then write the codes on identical pieces of paper, mix them up in a bowl, and select the units whose codes appear on the randomly drawn pieces of paper.
In the table of random numbers method, to select a sample of size n, first make a list of the population to be sampled and give a distinct code number to each unit of the population. Then choose a starting point and a direction of reading in the table at random, and take the units whose code numbers appear (skipping numbers that match no code) until n units have been selected.
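The lottery method can be mimicked in software. The sketch below is a minimal illustration, assuming the identification codes are held in a Python list; the names `frame` and `simple_random_sample` are hypothetical. Python's `random.sample` draws without replacement (a slip is not returned to the bowl), and `random.choices` draws with replacement.

```python
import random

def simple_random_sample(units, n, replace=False, seed=None):
    """Draw an SRS of size n from `units`, a list acting as the sampling frame.

    replace=False mimics the lottery method: a drawn slip is not put back.
    replace=True puts each slip back before the next draw.
    """
    rng = random.Random(seed)
    if replace:
        return rng.choices(units, k=n)
    return rng.sample(units, n)

# Hypothetical frame: 50 units with identification codes 1..50.
frame = list(range(1, 51))
sample = simple_random_sample(frame, 5, seed=2024)
print(sample)
```

Passing a seed makes the draw reproducible, which helps when a selection has to be checked later; omit it for a fresh random draw.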
Stratified Random Sampling
▪ The population is divided into non-overlapping (each unit in the population belongs to one and only one stratum) and exhaustive groups called strata. The strata are formed so that elements in the same stratum are more or less homogeneous, while elements in different strata differ. In short, elements within a stratum should be homogeneous, and the strata should be heterogeneous with respect to each other.
▪ A sample is selected from each stratum using SRS.
▪ It is applied when the population is heterogeneous.
▪ Some of the criteria for dividing a population into strata are: sex (male, female); age (under 18, 18 to 28, 29 to 39, ...); occupation (blue-collar, professional, other); etc.
▪ In stratified sampling, the population of size N is divided into, say, k relatively homogeneous strata of sizes $N_1, N_2, N_3, \dots, N_k$ such that $N = \sum_{i=1}^{k} N_i$.
▪ Draw simple random samples (without replacement) from each of the k strata.
▪ Let a sample of size $n_i$ be drawn from the ith stratum (i = 1, 2, 3, ..., k) such that $n = \sum_{i=1}^{k} n_i$, where n is the total sample size from a population of size N.
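The procedure above (an SRS without replacement inside every stratum, with the $n_i$ adding up to n) can be sketched in a few lines. The strata, their members and the sizes $n_i$ below are hypothetical, chosen only to illustrate the mechanics; `stratified_sample` is not a library function.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an SRS without replacement of size sizes[name] from each stratum.

    strata: dict mapping stratum name -> list of units (non-overlapping).
    sizes:  dict mapping stratum name -> n_i.
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical strata, e.g. students grouped by year of study.
strata = {
    "year1": [f"Y1-{i}" for i in range(40)],
    "year2": [f"Y2-{i}" for i in range(30)],
    "year3": [f"Y3-{i}" for i in range(20)],
    "year4": [f"Y4-{i}" for i in range(10)],
}
sizes = {"year1": 4, "year2": 3, "year3": 2, "year4": 1}
sample = stratified_sample(strata, sizes, seed=7)
n = sum(len(s) for s in sample.values())  # n = sum of the n_i
print(n)  # 10
```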
Example
1. Suppose the president of a university wants to know about the experiences of students in all four years at the university. Furthermore, the president wishes to see whether the students' experiences differ from year to year (1st from 2nd, 2nd from 3rd, and 3rd from 4th year). The president divides the students into 4 groups (one per year) and randomly selects students from each group for the sample.
2. A population consists of males and females who are smokers and nonsmokers. The researcher wants to include in the sample people from each group, that is, males who smoke, males who do not smoke, females who smoke, and females who do not smoke. To accomplish this, the researcher divides the population into four subgroups and then selects a random sample from each subgroup.
Proportional Allocation of sample size in stratified sampling
• The items are selected from each stratum in the same proportion as they exist in the population.
• The allocation of sample sizes is termed proportional if the sampling fraction, i.e., the ratio of the sample size to the population size, remains the same in all strata.
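Mathematically, the principle of proportional allocation gives
$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \frac{n_3}{N_3} = \dots = \frac{n_k}{N_k}$.
By the property of ratios and proportions, each of these ratios is equal to the ratio of the sum of the numerators to the sum of the denominators, i.e.,
$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \frac{n_3}{N_3} = \dots = \frac{n_k}{N_k} = \frac{n_1 + n_2 + n_3 + \dots + n_k}{N_1 + N_2 + N_3 + \dots + N_k} = \frac{\sum_{i=1}^{k} n_i}{\sum_{i=1}^{k} N_i} = \frac{n}{N} = c$ (constant),
since the total sample size n and the population size N are fixed.
Hence $n_1 = N_1\left(\frac{n}{N}\right)$, $n_2 = N_2\left(\frac{n}{N}\right)$, $n_3 = N_3\left(\frac{n}{N}\right)$, ..., $n_i = N_i\left(\frac{n}{N}\right)$, (i = 1, 2, 3, ..., k).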
Example
• A stratified sample of size n = 80 is to be taken from a population of size N = 2000, which consists of four strata for which N1 = 500, N2 = 1200, N3 = 200 and N4 = 100. If we use proportional allocation, how large a sample must be taken from each stratum?
• Solution: In proportional allocation, the sample size for the ith stratum is given by
• $n_i = N_i\left(\frac{n}{N}\right)$, (i = 1, 2, 3, ..., k). Then
• $n_1 = N_1\left(\frac{n}{N}\right) = 500\left(\frac{80}{2000}\right) = 20$ must be taken from the 1st stratum,
• $n_2 = N_2\left(\frac{n}{N}\right) = 1200\left(\frac{80}{2000}\right) = 48$, $n_3 = N_3\left(\frac{n}{N}\right) = 200\left(\frac{80}{2000}\right) = 8$,
• $n_4 = N_4\left(\frac{n}{N}\right) = 100\left(\frac{80}{2000}\right) = 4$.
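The allocation rule $n_i = N_i(n/N)$ is simple enough to check mechanically. The helper below is a hypothetical name used only for illustration; it reproduces the four stratum sizes of this example.

```python
def proportional_allocation(stratum_sizes, n):
    """Return n_i = N_i * (n / N) for each stratum (may be fractional in general)."""
    N = sum(stratum_sizes)
    return [N_i * n / N for N_i in stratum_sizes]

alloc = proportional_allocation([500, 1200, 200, 100], 80)
print(alloc)  # [20.0, 48.0, 8.0, 4.0]
```

Here every $n_i$ is a whole number. In general $N_i(n/N)$ may be fractional, and how to round so that the $n_i$ still add up to n is a separate decision that this sketch does not make.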
Cluster Sampling
➢The population is divided into non-overlapping groups called clusters, formed in such a way that the elements within a cluster are heterogeneous.
➢Some of these clusters are selected at random, and all members of the selected clusters are used as the subjects of the sample.
➢Cluster sampling is used when the population is large and a simple random sample is difficult to generate, or when the subjects are spread over a large geographic area.
Example
1. Estimate the average annual household (HH) income in AA. Let each sub-city represent a cluster. A sample of clusters could be randomly selected, and every household within these clusters could be interviewed to find the average annual HH income in AA.
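In contrast with stratified sampling, randomness here enters only at the cluster level: whole clusters are drawn, and every unit inside a chosen cluster is kept. The sketch below assumes hypothetical cluster labels and invented household income figures purely to show the mechanics; `cluster_sample` is an illustrative name, not a library function.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly pick m clusters (SRS without replacement) and keep ALL their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    units = [u for name in chosen for u in clusters[name]]
    return chosen, units

# Hypothetical clusters (e.g. sub-cities), each a list of invented household incomes.
clusters = {
    "A": [1200, 3400, 2500],
    "B": [5000, 800],
    "C": [2200, 2600, 3100, 1900],
    "D": [4000, 1500],
}
chosen, households = cluster_sample(clusters, 2, seed=3)
mean_income = sum(households) / len(households)  # estimate from all interviewed households
print(chosen, mean_income)
```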
Systematic Sampling:
➢ This technique is recommended if a complete list of the sampling units is available and the units are arranged in some systematic order, such as alphabetical, chronological, or geographical order.
➢ The procedure starts by determining the first element to be included in the sample.
➢ Only the first unit is selected at random (from among the first k units); the remaining units are selected automatically in a definite sequence by taking every kth item from the sampling frame.
➢ Let N = population size and n = sample size; then $k = \frac{N}{n}$ is called the sampling interval.
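A minimal sketch of systematic selection follows, assuming for simplicity that N is an exact multiple of n (so $k = N/n$ is a whole number) and using a hypothetical ordered frame of 100 units. Only the starting position is random; every later unit is fixed by the interval k.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample of size n from an ordered frame (assumes N is a multiple of n)."""
    N = len(frame)
    k = N // n                                  # sampling interval k = N / n
    start = random.Random(seed).randint(1, k)   # random start among the first k units
    # Units at 1-based positions start, start + k, start + 2k, ...
    return [frame[start - 1 + j * k] for j in range(n)]

frame = list(range(1, 101))  # hypothetical ordered list of N = 100 units
s = systematic_sample(frame, 10, seed=5)
print(s)
```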
2. Convenience Sampling
• The investigator selects a sample from the population in a manner that is
relatively easy and convenient/suitable.
3. Quota Sampling
Properties of the Sampling Distribution of Means
1. The mean of the sampling distribution of the means is equal to the population mean: $\mu = \mu_{\bar X} = \bar{\bar X}$.
2. The standard deviation of the sampling distribution of the means (the standard error) is equal to the population standard deviation divided by the square root of the sample size: $\sigma_{\bar X} = \sigma/\sqrt{n}$. This holds when the population is very large relative to the sample (as a rule of thumb, n < 0.05N) or when sampling is done with replacement. If N is finite and n ≥ 0.05N,
$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.
The expression $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor (finite population multiplier).
In the calculation of the standard error of the mean, if the population standard deviation σ is unknown, the standard error of the mean $\sigma_{\bar X}$ can be estimated by the sample standard error of the mean $S_{\bar X}$, where S is the sample standard deviation:
$S_{\bar X} = \frac{S}{\sqrt{n}}$ or $S_{\bar X} = \frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.
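The rule for when to apply the finite population correction can be written as a small function. This is a sketch under the n ≥ 0.05N rule of thumb stated above; `standard_error` is a hypothetical helper, and the same function works with the sample S in place of σ when σ is unknown.

```python
import math

def standard_error(sd, n, N=None):
    """Standard error of the mean.

    sd: population sigma, or the sample S when sigma is unknown.
    N:  population size; if given and n >= 0.05 N, the finite population
        correction factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = sd / math.sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(standard_error(2000, 30), 2))      # 365.15 (no correction)
print(round(standard_error(8.3, 45, 350), 2))  # 1.16 (45/350 > 5%, so corrected)
```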
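Worked example (a population of N = 5 values; all samples of size n = 3 drawn without replacement):
$\mu = \frac{\sum X}{N} = 30$ and $\bar{\bar X} = \frac{\sum \bar X}{10} = 30$, where 10 = ${}_5C_3$ is the number of possible samples. Regardless of the sample size, $\mu = \bar{\bar X}$.
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{1000}{5}} = 14.142$
• $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{14.142}{\sqrt{3}}\sqrt{\frac{5-3}{5-1}} = 5.774$, which agrees with the value obtained directly from the 10 sample means: $\sigma_{\bar X} = \sqrt{\frac{\sum (\bar X_i - \bar{\bar X})^2}{10}} = \sqrt{\frac{333.4}{10}} = 5.774$
• Since averaging reduces variability, $\sigma_{\bar X} < \sigma$, except in the cases where σ = 0 or n = 1.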
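The population values of this example are not listed, so the sketch below assumes the population 10, 20, 30, 40, 50, which is one set of five values consistent with μ = 30 and Σ(X − μ)² = 1000. It enumerates all 10 samples of size 3 drawn without replacement and compares the standard error computed directly from the sample means with the formula that uses the finite population correction factor.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical population chosen to match the figures above:
# N = 5, mu = 30, sum of squared deviations = 1000.
population = [10, 20, 30, 40, 50]
N, n = len(population), 3

mu = mean(population)
sigma = pstdev(population)                  # sqrt(1000 / 5), about 14.142

# All 5C3 = 10 samples drawn without replacement, and their means.
sample_means = [mean(s) for s in combinations(population, n)]

grand_mean = mean(sample_means)             # equals mu
se_direct = pstdev(sample_means)            # sqrt(sum((xbar - mu)^2) / 10)
se_formula = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))
print(grand_mean, se_direct, se_formula)
```

Both routes give about 5.774, which is the point of the finite population correction: without it, $\sigma/\sqrt{n}$ alone would give about 8.165 here.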
Central Limit Theorem and the Sampling Distribution of the Mean
• The Central Limit Theorem (CLT) states that:
1. If the population is normally distributed, the distribution of sample means is normal
regardless of the sample size.
2. If the population from which samples are taken is not normal, the distribution of sample means will be approximately normal if the sample size (n) is sufficiently large (commonly n ≥ 30). The larger the sample size, the closer the sampling distribution is to the normal curve.
The relationship between the shape of the population distribution and the shape of the sampling distribution of the mean is called the Central Limit Theorem.
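The second statement of the CLT can be seen in a small simulation. The sketch below assumes a hypothetical, strongly skewed population (an exponential distribution with mean 1 and standard deviation 1), draws many samples of size 30, and checks that the sample means centre on μ, spread by about $\sigma/\sqrt{n}$, and fall within ±1.96 standard errors about 95% of the time, as a normal curve would predict.

```python
import math
import random
import statistics

# Hypothetical skewed population: exponential distribution with mean 1 and sd 1.
rng = random.Random(42)
mu, sigma = 1.0, 1.0
n, reps = 30, 20_000

# Draw `reps` samples of size n and record each sample mean.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = sigma / math.sqrt(n)
print(statistics.fmean(means))   # close to mu
print(statistics.pstdev(means))  # close to sigma / sqrt(n), about 0.183

# Share of sample means within mu +/- 1.96 se; near 0.95 if the shape is close to normal.
share = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(share)
```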
The significance of the Central Limit Theorem
➢ It permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the frequency distribution of that population other than what we can get from the sample.
➢ It also permits us to use the normal distribution (curve) for analyzing distributions whose shape is unknown.
➢ It creates the potential for applying the normal distribution to many problems
when the sample is sufficiently large.
Examples 1 and 2
1. The distribution of annual earnings of all bank tellers with five years of
experience is skewed negatively. This distribution has a mean of Birr 15,000 and a
standard deviation of Birr 2000. If we draw a random sample of 30 tellers, what is
the probability that their earnings will average more than Birr 15,750 annually? Interpret the result.
2. Suppose that during any hour in a large department store, the average number of
shoppers is 448, with a standard deviation of 21 shoppers. What is the probability
of randomly selecting 49 different shopping hours, counting the shoppers, and
having the sample mean fall between 441 and 446 shoppers, inclusive?
Solution 1
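1. Calculate $\mu_{\bar X}$ and $\sigma_{\bar X}$:
$\mu_{\bar X} = \mu =$ Birr 15,000
$\sigma_{\bar X} = \sigma/\sqrt{n} = 2000/\sqrt{30} =$ Birr 365.15
2. Calculate Z for $\bar X$:
$Z_{\bar X} = \frac{\bar X - \bar{\bar X}}{\sigma_{\bar X}} = \frac{\bar X - \mu}{\sigma_{\bar X}}$, $Z_{15{,}750} = \frac{15{,}750 - 15{,}000}{365} = +2.05$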
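Only steps 1 and 2 of this solution appear here. The sketch below carries the calculation through to a probability using Python's `statistics.NormalDist` in place of a printed z-table, so the last digit may differ slightly from a table lookup with the rounded z = 2.05. It also applies the same steps to Example 2 (shoppers per hour), whose worked solution is not reproduced in this section.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1

# Example 1: bank tellers
mu, sigma, n = 15_000, 2_000, 30
se = sigma / math.sqrt(n)          # about 365.15
z = (15_750 - mu) / se             # about +2.05
p_above = 1 - std_normal.cdf(z)    # P(xbar > 15,750), about 0.02

# Example 2: shoppers per hour, same steps
se2 = 21 / math.sqrt(49)           # 3.0
p_between = std_normal.cdf((446 - 448) / se2) - std_normal.cdf((441 - 448) / se2)
print(p_above, p_between)
```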
4. Suppose that a random sample of size 36 is being drawn from a population with a mean of
278. If 86% of the time the sample mean is less than 280, what is the population standard
deviation?
5. A teacher gives a test to a class containing several hundred students. It is known that the
standard deviation of the scores is about 12 points. A random sample of 36 scores is obtained.
a) What is the probability that the sample mean will differ from the population mean by
more than 6 points?
b) What is the probability that the sample mean will be within 6 points of the population mean?
Solution 3
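1. Calculate $\mu_{\bar X}$ and $\sigma_{\bar X}$:
$\mu_{\bar X} = \mu = 37.6$ years. Since $n/N = 45/350 > 5\%$, the finite population correction factor (FPCF) is needed:
$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{8.3}{\sqrt{45}}\sqrt{\frac{350-45}{350-1}} = 1.16$
2. Calculate Z for $\bar X$:
$Z_{\bar X} = \frac{\bar X - \bar{\bar X}}{\sigma_{\bar X}} = \frac{\bar X - \mu}{\sigma_{\bar X}}$, $Z_{40} = \frac{40 - 37.6}{1.16} = +2.07$
3. Find the area covered by the interval:
$P(\bar X < 40) = P(Z < +2.07)$
$= 0.5 + P(0 < Z < 2.07)$
$= 0.5 + 0.48077$
$= 0.98077$
4. Interpret the result: there is a 98.08% chance that a random sample of 45 hourly employees will have a mean age of less than 40 years.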
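The same steps can be checked numerically. The inputs (μ = 37.6, σ = 8.3, N = 350, n = 45) are taken from the solution above, since the problem statement is not shown here. Computing with the unrounded z (about 2.075) gives roughly 0.981, a little above the table value 0.98077 obtained with z rounded to 2.07.

```python
import math
from statistics import NormalDist

mu, sigma, n, N = 37.6, 8.3, 45, 350

fpc = math.sqrt((N - n) / (N - 1))   # used because n/N = 45/350 > 5%
se = sigma / math.sqrt(n) * fpc      # about 1.16
z = (40 - mu) / se                   # about +2.07
p_below = NormalDist().cdf(z)        # P(xbar < 40)
print(se, z, p_below)
```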
Solution 4
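$\mu = 278$, $n = 36$, $\bar X = 280$, $P(\bar X < 280) = 0.86$, $\sigma = ?$
The area between 0 and Z is 0.86 - 0.5 = 0.36, so Z = +1.08.
$Z_{280} = \frac{280 - 278}{\sigma_{\bar X}}$, so $+1.08 = \frac{2}{\sigma_{\bar X}}$ and $\sigma_{\bar X} = \frac{2}{1.08} = 1.85$
$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$, so $1.85 = \frac{\sigma}{\sqrt{36}} = \frac{\sigma}{6}$ and $\sigma = 6 \times 1.85 = 11.1$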
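Working backwards from a probability to σ uses the inverse of the normal CDF instead of a table lookup. A minimal sketch with `statistics.NormalDist().inv_cdf` follows; it reproduces z ≈ 1.08, $\sigma_{\bar X}$ ≈ 1.85 and σ ≈ 11.1 (the small difference in the third significant digit comes from not rounding z).

```python
import math
from statistics import NormalDist

mu, n, xbar, prob_below = 278, 36, 280, 0.86

z = NormalDist().inv_cdf(prob_below)  # z with 86% of the area below it, about 1.08
se = (xbar - mu) / z                  # sigma_xbar, about 1.85
sigma = se * math.sqrt(n)             # population sd, about 11.1
print(z, se, sigma)
```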
Solution 5
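a. n = 36, σ = 12, $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = 2$
$P(\bar X > \mu + 6) + P(\bar X < \mu - 6) = ?$
$Z_{\mu+6} = \frac{(\mu + 6) - \mu}{2} = +3$, $Z_{\mu-6} = \frac{(\mu - 6) - \mu}{2} = -3$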
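Because the two tails are symmetric (Z = ±3), part (a) is twice the upper tail area and part (b) is its complement. The sketch below finishes both parts with `statistics.NormalDist`; it is a numerical check of the steps above, not the slide's own continuation.

```python
import math
from statistics import NormalDist

sigma, n, d = 12, 36, 6

se = sigma / math.sqrt(n)                  # 2.0
z = d / se                                 # 3.0
p_differ = 2 * (1 - NormalDist().cdf(z))   # (a) P(|xbar - mu| > 6), about 0.0027
p_within = 1 - p_differ                    # (b) P(|xbar - mu| <= 6), about 0.9973
print(p_differ, p_within)
```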
Properties of the Sampling Distribution of $\hat p$
1. The mean of the sampling distribution of the sample proportion is always equal to the population proportion P, i.e., $E(\hat p) = P$.
2. The standard error of the proportion is equal to $\sigma_{\hat p} = \sqrt{\frac{Pq}{n}}$,
where P = population proportion, q = 1 - P and n = sample size.
Or, for a finite population,
$\sigma_{\hat p} = \sqrt{\frac{Pq}{n}}\sqrt{\frac{N-n}{N-1}}$, where $\sqrt{\frac{N-n}{N-1}}$ is the finite population correction factor.
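The two standard-error formulas for a proportion can be wrapped in one function. In this sketch (`se_proportion` is a hypothetical name) the correction factor is applied whenever a population size N is passed, since no cutoff rule for proportions is stated here; the caller decides whether the population counts as finite.

```python
import math

def se_proportion(P, n, N=None):
    """Standard error of a sample proportion, sqrt(P q / n) with q = 1 - P.

    If a finite population size N is given, multiply by the finite
    population correction factor sqrt((N - n) / (N - 1)).
    """
    se = math.sqrt(P * (1 - P) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(se_proportion(0.6, 120), 4))   # 0.0447
print(round(se_proportion(0.28, 140), 4))  # 0.0379
```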
Central Limit Theorem (CLT) and the Sampling Distribution of $\hat p$
By the CLT, the normal distribution approximates the shape of the distribution of sample proportions if np and nq are both greater than 5. Consequently, we solve problems involving sample proportions by using a normal distribution whose mean and standard deviation are:
$\mu_{\hat p} = P$, $\sigma_{\hat p} = \sqrt{\frac{Pq}{n}}$, and $Z_{\hat p} = \frac{\hat p - P}{\sigma_{\hat p}}$
Examples
1. Suppose that 60% of the electrical contractors in a region use a particular brand of wire.
What is the probability of taking a random sample of size 120 from these electrical
contractors and finding that a proportion of 0.50 or less use that brand of wire?
2. If 10% of a population of parts is defective, what is the probability of randomly
selecting 80 parts and finding that 12 or more are defective?
3. Suppose that a population proportion is .40 and that 80% of the time you draw a random
sample from this population, you get a sample proportion of 0.35 or more. How large a
sample were you taking?
4. If a population proportion is 0.28 and if the sample size is 140, 30% of the time the
sample proportion will be less than what value if you are taking random samples?
Solution 1
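n = 120, P = 0.6, q = 0.4, $P(\hat p \le 0.5) = ?$
1. Check that np and nq > 5:
120 × 0.6 = 72 and 120 × 0.4 = 48. Both are greater than 5.
2. Calculate $\sigma_{\hat p}$:
$\sigma_{\hat p} = \sqrt{\frac{Pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{120}} = 0.0447$
3. Calculate Z for $\hat p$:
$Z_{\hat p} = \frac{\hat p - P}{\sigma_{\hat p}}$, $Z_{0.5} = \frac{0.5 - 0.6}{0.0447} = -2.24$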
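The solution stops at the Z value. The sketch below repeats the np/nq check and finishes with the lower-tail probability from `statistics.NormalDist`; the result, about 0.0127, is close to the table value for Z = -2.24 (0.0125), the small gap coming from z not being rounded.

```python
import math
from statistics import NormalDist

P, n, cutoff = 0.60, 120, 0.50
q = 1 - P
if n * P <= 5 or n * q <= 5:              # here np = 72 and nq = 48
    raise ValueError("normal approximation needs np > 5 and nq > 5")

se = math.sqrt(P * q / n)                 # about 0.0447
z = (cutoff - P) / se                     # about -2.24
p_at_most = NormalDist().cdf(z)           # P(p_hat <= 0.50)
print(se, z, p_at_most)
```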
Solution 2
5. About 6.81% of the time, twelve or more defective parts would appear in a random sample of eighty parts when the population proportion is 0.10.
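Only the concluding interpretation of this solution appears here. The earlier steps (np = 8 and nq = 72 both exceed 5, $\hat p = 12/80 = 0.15$, then the standard error and Z) can be reproduced as a sketch; with the unrounded z of about 1.49 the upper tail is about 0.068, in line with the 6.81% above.

```python
import math
from statistics import NormalDist

P, n, defective = 0.10, 80, 12
q = 1 - P
if n * P <= 5 or n * q <= 5:              # here np = 8 and nq = 72
    raise ValueError("normal approximation needs np > 5 and nq > 5")

p_hat = defective / n                     # 0.15
se = math.sqrt(P * q / n)                 # about 0.0335
z = (p_hat - P) / se                      # about +1.49
p_12_or_more = 1 - NormalDist().cdf(z)    # P(p_hat >= 0.15)
print(z, p_12_or_more)
```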
Solution 3
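P = 0.4, $P(\hat p > 0.35) = 0.80$, n = ?
1. The area between Z and 0 is 0.80 - 0.50 = 0.30, so |Z| = 0.84; Z is negative because 0.35 lies below P: $Z_{0.35} = -0.84$.
2. $Z_{0.35} = \frac{0.35 - 0.4}{\sigma_{\hat p}}$, so $-0.84 = \frac{-0.05}{\sigma_{\hat p}}$ and $\sigma_{\hat p} = 0.0595$
3. $\sigma_{\hat p} = \sqrt{\frac{Pq}{n}}$, so $0.0595 = \sqrt{\frac{0.4 \times 0.6}{n}}$; squaring both sides, $0.0595^2 \approx 0.0035 = \frac{0.24}{n}$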
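Solving $\sigma_{\hat p} = \sqrt{Pq/n}$ for n finishes this calculation. The sketch below uses `NormalDist().inv_cdf(0.20)` for the z value (about -0.8416) rather than the table's -0.84; either way n comes out at about 68 (the table value gives about 67.7), and the sketch rounds up to a whole sample size.

```python
import math
from statistics import NormalDist

P, cutoff, prob_above = 0.40, 0.35, 0.80

z = NormalDist().inv_cdf(1 - prob_above)  # about -0.84: 80% of the area lies above z
se = (cutoff - P) / z                     # sigma_phat, about 0.0595
n = P * (1 - P) / se**2                   # from se = sqrt(P q / n)
n_required = math.ceil(n)                 # whole number of observations
print(z, se, n, n_required)
```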
Solution 4
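P = 0.28, n = 140, $P(\hat p < X) = 0.30$, X = ?
The area between Z and 0 is 0.50 - 0.30 = 0.20, so Z = -0.52.
$\sigma_{\hat p} = \sqrt{\frac{Pq}{n}} = \sqrt{\frac{0.28 \times 0.72}{140}} = 0.0379$
$Z_{\hat p} = \frac{\hat p - P}{\sigma_{\hat p}}$, so $-0.52 = \frac{\hat p - 0.28}{0.0379}$
$-0.0197 = \hat p - 0.28$
$X = \hat p = 0.26$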
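The same answer follows from placing the 30th percentile of the approximating normal distribution, $P + z\,\sigma_{\hat p}$. A minimal sketch with `statistics.NormalDist` (z about -0.524 unrounded) gives X ≈ 0.26.

```python
import math
from statistics import NormalDist

P, n, prob_below = 0.28, 140, 0.30

se = math.sqrt(P * (1 - P) / n)           # about 0.0379
z = NormalDist().inv_cdf(prob_below)      # about -0.52
x = P + z * se                            # value with 30% of sample proportions below it
print(se, z, x)
```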