Chapter Three
Sampling Theory
3.1. Basic Concepts of Sampling Theory
1) Population (or universe) is the group of all elements or observations (persons, animals, objects,
measurements, etc.) under consideration in a given problem. The word population is a technical
term in statistics and does not necessarily refer to people.
Examples:
All students in this university;
All households in Mettu town;
All fish in a lake, etc.
2) Census is a collection of data from the whole population (that is, complete enumeration). It is the
actual measurement or observation of all possible elements from the population or it is a survey
of everyone in the population.
3) Reference population (source or target population) is the population of interest, to which the
researcher would like to generalize the results of the study. Example: If a researcher would like
to study the effect of a new fertilizer on crop yield in Ethiopia, then the reference population is all
farmers in Ethiopia who are using the new fertilizer.
4) Sampling theory is a study of relationships existing between a population and samples drawn
from the population. Attaining a specified precision at minimum cost is the main intention of
sampling theory. In sampling theory population is often required as an assumption.
5) Sample is the small group that is chosen for the study. It is a part or portion or sub set of a
population taken so that some generalizations about the population can be made. The main
concern in sampling is to ensure that the sample accurately represents the population we are
interested to study. That is, samples are taken in a way that they will be representative of the
population.
6) Sampling is the process of selecting a finite number of elements from a given
population of interest for the purposes of an inquiry. Example: In industry, the quality of a product is
assessed through sampling; public opinion on social, economic and political problems is
ascertained through sampling.
7) Sample size is the number of individuals or observations in a sample (usually denoted by n).
8) Parameter is any measurable characteristic of a population. Example: Population means,
Population standard deviations, population medians, etc.
9) Statistic is a number resulting from manipulation of sample data. That is, it is any measurable
characteristic of a sample. Example: sample means, sample standard deviations, sample medians,
etc. A statistic is used to estimate a population parameter such as the population mean ( μ ),
the population standard deviation ( σ ), etc.
10) The sampling error is the difference between a sample statistic and its corresponding population
parameter. It is the error that occurs because a sample has been taken instead of a census. For
example: the sample mean may differ from the true population mean.
11) Sampling Unit is the ultimate unit to be sampled (the elements of the population to be sampled). It is
the unit of selection in the sampling process. Examples:
In a sample of households, the sampling unit is a household;
In a sample of students, a student is the sampling unit.
In a sample of districts, the sampling unit is a district, etc.
12) Sampling Frame is the list of all possible units in the reference population, from which a sample
is to be drawn. Example: If a researcher would like to do a research on poverty levels of residents
in a town and if s/he decided that the sampling unit for the study is an individual, then the
sampling frame would be the list of all individuals living in that town. A student roster is a
sampling frame for a sample of students.
13) Sample design is a set of procedures for selecting the units from the population that are to be in
the sample.
14) Sampling fraction (sampling ratio) is the ratio of the number of units in the sample to the
number of units in the sampling frame or in the reference population. For example, a sampling
fraction of 1:3 corresponds to a sampling interval of 3, that is, 1 in every 3 units is selected. This
means that the sample constitutes 33.3% of the total units in the sampling frame or in the reference
population.
An application of the terminologies
situation, the product "sampled" would be lost during the sampling process, so a complete
census is out of the question.
d) The sample results are usually adequate: In practice, a sample can be more accurate than a
census.
e) Speed: The collection and analysis of data can be done more quickly if the data are not
excessive. Time and energy are saved. That is, the data can be collected and summarized
more quickly with a sample than with a census. This is a valid consideration when the
information is urgently needed.
f) It enables the researcher to get more detailed information about a particular subject under
investigation. If only a few people are surveyed, the researcher can conduct an in-depth
interview by spending more time with each person, thus getting more information about the
subject. That is not to say the smaller the sample, the better; in fact, the opposite is true. In
general, larger samples, if correct sampling techniques are used, give more reliable
information about the population.
Disadvantages of sampling:
i. Reliability problem: If the sample is not a true representative of the population, then we may
sacrifice reliability in favor of less time and money.
ii. Sampling is not suitable when complete information is required on each and every element of the
population; in that case a census should be used.
3.3. Errors in Sampling
There are two types of errors:
1. Sampling error is the discrepancy between the population value (parameter) and the
sample value (statistic). It may arise from an inappropriate sampling technique. It can
be reduced by increasing the size of the sample. When n = N, the sampling error is 0.
2. Non-sampling errors (bias) are due to procedural problems such as:
Subjects' non-response
Incorrect responses
Problems with the sampling frame
Measurement error
Errors at different stages in processing the data.
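As a rough illustration of the first type of error, the sketch below draws simple random samples of increasing size from a made-up population and prints the sampling error of the mean. The population values, the seed and the function name `sampling_error` are assumptions for illustration only, not data from this chapter. A single draw can fluctuate, but when n = N the sample is a census and the error is exactly zero.

```python
import random
from statistics import mean

def sampling_error(population, n, rng):
    """Sample mean minus population mean for one simple random sample of size n."""
    sample = rng.sample(population, n)   # sampling without replacement
    return mean(sample) - mean(population)

rng = random.Random(1)                                   # arbitrary seed (assumption)
population = [rng.randint(1, 100) for _ in range(200)]   # hypothetical values
for n in (5, 20, 100, len(population)):
    print(n, round(sampling_error(population, n, rng), 3))
```

Non-sampling errors are not affected by this: taking a larger sample does nothing to correct a faulty frame or a biased measurement.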
Ways to reduce data error
Ensure that survey instruments are well prepared, simple to read, and easy to understand.
Properly select and train interviewers to control data-gathering bias or error.
Use sound editing, coding, and tabulating procedures to reduce the possibility of data
processing error.
3.4. Types of samples
A population is often too large for information to be collected from all of its members. Instead, a
representative sub-group of the population (a sample) is included in the investigation. Sampling involves
the selection of a number of study units from a defined population. The main concern in sampling is,
therefore, to ensure that the sample accurately represents the population we are interested in studying.
Sampling methods can be categorized as probability and non-probability.
i. Probability Sampling:
• A probability sample is a sample selected such that each item in the population being studied has
a known chance (greater than zero) of being included in the sample.
• These methods remove human judgment from the sampling process and tend to give a more
representative sample.
• There are a number of probability sampling methods; some of them are discussed below:
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
Multi-stage Sampling
• Example: Suppose a researcher wants to know the impact of microfinance on the clients'
household income. S/he wishes to select 10 clients out of 250 clients, and a research assistant is
required to select a random sample. Assuming that you are the research assistant, select a simple
random sample of 10 clients.
• Solution:
• Number each client from 001 to 250 (based on the alphabetical order of their names or on
their identity numbers),
• Using a table of random numbers, find the starting point. To find the starting point, one
generally closes one's eyes and places one's finger anywhere on the table. In this
case, let us select number 005 in the 6th row and 2nd column,
• Going down the column and continuing to the next columns, select the first 10
numbers that fall between 001 and 250.
• The numbers are 005, 042, 159, 049, 173, 172, 029, 221, 213 and 205. Therefore,
clients with these numbers will be included in the sample for further analysis.
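The same selection can be sketched in code. This is a minimal sketch in which Python's pseudo-random generator stands in for the printed random number table, so it will not reproduce the ten numbers listed above; the function name `simple_random_sample` and the seed are assumptions for illustration.

```python
import random

def simple_random_sample(n_population, n_sample, seed=None):
    """Return n_sample distinct client numbers drawn from 1..n_population."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no client is picked twice.
    return sorted(rng.sample(range(1, n_population + 1), n_sample))

selected = simple_random_sample(250, 10, seed=42)   # seed is arbitrary
print(selected)
```

Fixing the seed only makes the draw repeatable; leaving it as `None` gives a different sample on each run.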
2. Systematic Sampling (Quasi-random sampling): In systematic sampling, the elements to be
included in a sample are picked at a constant interval. That is, the items or individuals of the
population are arranged in some order and a random starting point is selected from 1 through k,
where k is the sampling interval.
In systematic sampling:
• A complete list of all the elements within the population (sampling frame) is required.
• The procedure is to take every kth item from the sampling frame.
• Let N = population size, n = sample size and k = sampling interval; then k = N/n.
Example 1: Suppose there are 2000 subjects in the population and a sample of 50 subjects
is needed. Select a systematic sample of these 50 subjects.
• Solution: The sampling interval (k) is 2000/50 = 40. The number of the first subject to be
included in the sample is chosen randomly, for example, by blindly picking one out of 40
pieces of paper numbered 1 to 40. Suppose subject 12 was the first subject selected; then the
sample would consist of the subjects numbered 12, 52, 92, and so on up to 1972, which gives
50 subjects.
• A sample chosen this way is not a simple random sample: each member of the population has the
same chance (1/k) of being selected, but only k different samples are possible, so not every
combination of n members can occur.
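The procedure in Example 1 can be sketched as follows. The function name `systematic_sample` is an assumption, and the sketch assumes that N is an exact multiple of n, as in the example.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Pick every k-th unit from a frame numbered 1..N, where k = N // n."""
    k = N // n                                      # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start in 1..k
    return [start + i * k for i in range(n)]

sample = systematic_sample(2000, 50, start=12)      # start fixed to match Example 1
print(sample[:3], sample[-1])   # [12, 52, 92] 1972
```

Calling the function without `start` chooses the first subject at random, which is what the slips of paper do in the worked solution.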
3. Stratified Sampling
If the population from which the sample is to be drawn does not constitute a homogeneous group, the
stratified sampling technique is used in order to obtain a representative sample. Under this technique, the
population is divided into various classes or sub-populations, each of which is more homogeneous
than the total population. The different sub-populations are called strata.
Then items (elements) are selected from each stratum by the random sampling technique. Since each
stratum is more homogeneous than the total population, we are able to get a more precise estimate for each
stratum.
The strata are formed according to characteristics such as sex, race, region or
institutional affiliation (for example, faculty).
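A minimal sketch of stratified selection is given below. It assumes proportional allocation, meaning the same fraction is drawn from every stratum. That is one common choice and is not prescribed by the text above. The faculty names, stratum sizes and the function name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample from each stratum.

    strata: dict mapping stratum name -> list of units.
    fraction: share of each stratum to select (proportional allocation).
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        size = max(1, round(len(units) * fraction))   # at least one unit per stratum
        sample[name] = rng.sample(units, size)
    return sample

# Hypothetical frame: students grouped by faculty.
strata = {
    "Engineering": [f"E{i}" for i in range(1, 301)],
    "Business": [f"B{i}" for i in range(1, 201)],
    "Health": [f"H{i}" for i in range(1, 101)],
}
chosen = stratified_sample(strata, fraction=0.10, seed=7)
print({name: len(units) for name, units in chosen.items()})
```

Because every stratum contributes units, no faculty can be left out of the sample by chance, which is the point of stratifying a heterogeneous population.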
4. Cluster Sampling
• If the population is homogeneous and very large or resides in a large area, it is costly and time
consuming to take samples by using the three methods mentioned above.
• In this case, we divide the population into groups called clusters and then we select
representative clusters randomly. Finally, the sample is taken from the selected clusters.
We can take either all members of the selected clusters or we may select samples from the clusters
by using other sampling techniques.
Procedures:
• The reference population is divided into clusters or subgroups, preferably similar in size.
• A sample of the clusters is taken by random or systematic sampling.
• All the units in the selected clusters are then studied, or we may select a sample from each cluster.
If only part of the elements in each cluster is included in the sample, the procedure is called
two-stage sampling.
• The first stage is selecting a sample of clusters and the second stage is selecting a sample of
elements from each cluster.
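The cluster procedure above can be sketched as follows, covering both the one-stage case (all units in the selected clusters) and the two-stage case (a sample within each selected cluster). The villages, household labels and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, units_per_cluster=None, seed=None):
    """Select clusters at random, then take all or some of their units.

    clusters: dict mapping cluster name -> list of units.
    units_per_cluster: None keeps every unit (one-stage); an integer draws
    that many units from each selected cluster (two-stage).
    """
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)    # stage 1: clusters
    sample = {}
    for name in picked:
        units = clusters[name]
        if units_per_cluster is None:
            sample[name] = list(units)
        else:                                            # stage 2: units
            sample[name] = rng.sample(units, min(units_per_cluster, len(units)))
    return sample

# Hypothetical frame: 20 villages of 50 households each.
villages = {f"village_{v}": [f"v{v}_hh{h}" for h in range(1, 51)] for v in range(1, 21)}
one_stage = cluster_sample(villages, n_clusters=4, seed=3)
two_stage = cluster_sample(villages, n_clusters=4, units_per_cluster=10, seed=3)
print(sum(len(u) for u in one_stage.values()), sum(len(u) for u in two_stage.values()))
```

Only the list of clusters is needed at the first stage; lists of individual units are needed only for the clusters that were actually selected.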
5. Multi-Stage sampling
• is a sampling technique that is used when the reference population is large and widely scattered.
• Selection of samples is done in stages until the final sampling unit is obtained.
• The number of stages of sampling is the number of times a sampling procedure is carried out.
• The primary sampling unit (PSU) is the sampling unit in the first sampling stage and the
secondary sampling unit (SSU) is the sampling unit in the second sampling stage, etc.
• For example, the PSUs can be the woredas, the SSUs can be the kebeles, etc. From the PSUs, we
select a sample by a suitable method; each selected PSU is further sub-divided into
second-stage units (say kebeles), and from these SSUs a sample is again taken by some suitable
method.
• Further stages may be added if required.
• Example: A multistage sampling procedure was used to conduct a research entitled “Factors
affecting saving behavior of households in Ilu Ababore Zone”.
• Procedures followed:
• Select 8 woredas randomly.
• Select 50 kebeles randomly from the selected woredas.
• Select 400 households randomly from the selected kebeles.
ii. Non-probability sampling
- Does not give each element of the population a known chance of being included in the sample.
- Units are selected at the discretion of the researcher.
- Such samples derive their control from the judgment of the researcher.
- This sampling technique is used when a sampling frame does not exist.
- There are a number of non-probability sampling methods, including:
Quota Sampling
Judgment sampling
Snowball sampling
Convenience sampling
1. Quota sampling
- is a method that ensures that a certain number of sample units from different categories
with specific characteristics are represented. Here, judgmental and convenience sampling
methods are combined.
- Quota sampling can be applied for affirmative action.
- Example: Suppose we know that 54% of the adults in a community are females, and the study
requires 100 respondents as a sample. In quota sampling, we might interview the first 54
females and the first 46 males.
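The quota rule in this example can be sketched as a simple filter over respondents in the order they are met. The stream of arrivals below is made up, and the function name `quota_sample` is an assumption for illustration.

```python
def quota_sample(arrivals, quotas):
    """Take respondents in arrival order until each category's quota is full.

    arrivals: iterable of (respondent_id, category) pairs in the order met.
    quotas: dict mapping category -> number of respondents wanted.
    """
    remaining = dict(quotas)
    chosen = []
    for respondent_id, category in arrivals:
        if remaining.get(category, 0) > 0:
            chosen.append((respondent_id, category))
            remaining[category] -= 1
        if not any(remaining.values()):   # every quota has been filled
            break
    return chosen

# Hypothetical stream of 300 adults, alternating female and male.
arrivals = [(i, "female" if i % 2 == 0 else "male") for i in range(300)]
sample = quota_sample(arrivals, {"female": 54, "male": 46})
print(len(sample))
```

Nothing here is random: who ends up in the sample depends entirely on who is encountered first, which is why quota sampling is classed as non-probability sampling.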
2. Judgment (Purposive or deliberate) sampling
- In this approach the investigator has complete freedom in choosing the sample according to his or her
own wishes.
- The experienced individual (researcher) selects the sample based upon his or her judgment about some
appropriate characteristics required of the sample members.
3. Snowball Sampling
• It is also known as Multiplicity sampling or Multi-stage Sampling
• The term snowball comes from the analogy of a snowball, which begins small but becomes bigger
and bigger as it rolls downhill.
• Snowball sampling is popular among scholars conducting observational research and in
community study.
• The major purpose of snowball sampling is to estimate characteristics that are rare in the total
population.
• First, initial respondents are selected randomly; additional respondents are then obtained from
referrals or from other information provided by the initial respondents.
• E.g., consider a researcher who uses the telephone to obtain referrals. Random telephone calls are made,
and the respondents (answering the call) are asked if they know someone else who meets the study's
respondent qualifications.
4. Convenience Sampling
- Is a method in which a sample is chosen with ease of access being the primary concern.
- Example: Interviews conducted in convenient locations such as a student lounge.
- The availability and willingness to respond are the major factors in selecting the
respondents.
3.5. Sampling distribution
It is often impossible to measure the mean or standard deviation of an entire population unless the
population is small, or we do a nationwide census. The population mean and standard deviation are
examples of population parameters--descriptive measurements of the entire population. Given the
impracticality of measuring population parameters, we instead measure sample statistics--descriptive
measurements of a sample. Examples of sample statistics are the sample mean, sample median, and
sample standard deviation.
So why not use the sample statistic as an estimate of the corresponding population parameter, for instance
the sample mean as an estimate of the population mean? The question is how confident we can be in the
sample statistic.
Hence, a sampling distribution is a probability distribution of the possible values of a sample
statistic, such as the sample mean or the sample proportion.
3.5.1. Sampling Distribution of the Means ( $\bar{x}$ )
The sampling distribution of the sample mean, $\bar{x}$, is the probability distribution consisting of a list of all
possible sample means of a given sample size selected from a population, and the probability of
occurrence associated with each sample mean.
Example: Consider a population of five values: 10, 20, 30, 40 and 50.
A random sample of three is to be selected from this population and its mean computed. Develop the sampling
distribution of the mean.
To find how many different samples of size three can be taken from a finite population of size five, we
can use the combinations formula ${}_N C_n$, i.e. the number of possible samples of size three that can be drawn
from a population of five when order is not important:
$${}_5 C_3 = 10$$
Sampling distribution of the sample mean ( $\bar{x}$ )
$$\mu_{\bar{x}} = \frac{20 + 23.33 + 26.67 + 26.67 + 33.33 + 33.33 + 30 + 30 + 36.67 + 40}{10} = 30$$
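The enumeration behind this result can be checked with a short sketch. It lists every unordered sample of size three from the five population values, computes each sample mean, and tabulates the resulting sampling distribution. Exact fractions are used to avoid rounding; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [10, 20, 30, 40, 50]
n = 3

# Every unordered sample of size n, and its mean as an exact fraction.
means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Sampling distribution: each distinct mean with its probability.
distribution = {m: Fraction(c, len(means)) for m, c in sorted(Counter(means).items())}
for m, p in distribution.items():
    print(f"{float(m):6.2f}  {p}")

mean_of_means = sum(means) / len(means)
print(len(means), mean_of_means)   # 10 30
```

The mean of the ten sample means equals the population mean of 30, which is the property stated for the sampling distribution of the mean.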
Example. The following distribution is the hourly wage of seven employees
If we plan to take a sample of two employees, there are ${}_7 C_2 = 21$ possible samples. The 21 possible
samples and their means are the following:
Sample:       AB   AC   AD   AE   AF   AG   BC   BD   BE   BF   BG
Sample mean:  7    7.5  7.5  7    7.5  8    7.5  7.5  7    7.5  8

Sample:       CD   CE   CF   CG   DE   DF   DG   EF   EG   FG
Sample mean:  8    7.5  8    8.5  7.5  8    8.5  7.5  8    8.5
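Sampling distribution of the sample mean (counts taken from the 21 sample means listed above):
Sample mean   Number of samples   Probability
7.00          3                   0.1429
7.50          9                   0.4286
8.00          6                   0.2857
8.50          3                   0.1429
Total         21                  1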
The mean of the distribution of sample means is obtained by summing the various sample means and
dividing the sum by the number of samples.
$$\mu_{\bar{x}} = \frac{7 + 7.5 + \dots + 8.5}{21} = \frac{162}{21} = 7.71$$
The sampling distribution of the mean is described by two parameters: the mean of the sample means and
the standard deviation of the sample means, which is termed the standard error of the mean ($\sigma_{\bar{x}}$). The mean
of the sample means ($\bar{\bar{x}}$ or $\mu_{\bar{x}}$) is always equal to the population mean ($\mu$):
$$\mu_{\bar{x}} = \bar{\bar{x}} = \mu$$
The standard error of the mean is equal to population standard deviation divided by the square root of the
sample size.
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
This formula applies when the population is large relative to the sample (n < 0.05N). If the population is
finite and n ≥ 0.05N, the standard error is multiplied by the finite population multiplier:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$$
If n is large (n ≥ 30), the sampling distribution of the mean can be approximated by a normal distribution.
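As a check on the finite population formula, the sketch below applies it to the five-value population 10, 20, 30, 40, 50 with n = 3 and compares the result with the standard deviation of all ten possible sample means. The function name `standard_error` is an assumption for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population multiplier when N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

population = [10, 20, 30, 40, 50]
sigma = pstdev(population)                       # population standard deviation
by_formula = standard_error(sigma, n=3, N=len(population))

# Direct check: standard deviation of all 10 possible sample means.
sample_means = [mean(s) for s in combinations(population, 3)]
by_enumeration = pstdev(sample_means)
print(round(by_formula, 4), round(by_enumeration, 4))   # 5.7735 5.7735
```

Here n/N = 0.6, well above 0.05, so omitting the multiplier would overstate the standard error (about 8.16 instead of 5.77).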
Exercise
1. Consider a finite population with five elements labeled A, B, C, D, and E. Ten possible simple
random samples of size 2 can be selected.
a. List the 10 samples beginning with AB, AC, and so on.
b. Using simple random sampling, what is the probability that each sample of size 2 is selected?
2. Assume a finite population has 350 elements. Using the last three digits of each of the
following five-digit random numbers, determine the first four elements that will be selected for
the simple random sample.
98601 73022 83448 02147 34229 27553 84147 93289 14209
2. If the population is not normal, the distribution of sample means will be approximately normal if the
sample size n is sufficiently large.
In order to use the central limit theorem, we need to know the population standard deviation $\sigma$. When it is
not known, the standard deviation of the sample, designated by $s$, is used to approximate it.
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$ if the population standard deviation is known, or
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$ if the population standard deviation is unknown.
Example 1:
The annual wages of all employees of a company have a mean of 20,400 and a standard deviation
of 3,200. The personnel manager is going to take a random sample of 36 employees. What is the
probability that the sample mean will exceed 21,000?
n = 36, $\mu$ = 20,400 and $\sigma$ = 3,200
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21000 - 20400}{3200/\sqrt{36}} = 1.125$$
$$P(\bar{x} > 21{,}000) = P(Z > 1.13) = 0.1292$$
Example 2: A company makes engines used in speed boats. The company's engineers believe that the
engine delivers an average power of 220 horsepower (HP) and that the standard deviation of the power
delivered is 15 HP. A potential buyer intends to sample 100 engines (each engine to be run a single
time). What is the probability that the sample mean, $\bar{x}$, will be less than 217 HP?
$$P(\bar{x} < 217) = P\left(Z < \frac{217 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z < \frac{217 - 220}{15/\sqrt{100}}\right) = P(Z < -2) = 0.0228$$
Thus, if the population mean is indeed $\mu$ = 220 HP and the standard deviation is $\sigma$ = 15 HP, there is a
rather small probability that the potential buyer's tests will result in a sample mean lower than 217 HP.
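Both worked examples can be reproduced with a short sketch that uses the standard normal distribution from Python's standard library in place of a printed Z table. The function name `prob_sample_mean` and its `tail` argument are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean(x_bar, mu, sigma, n, tail):
    """Return (z, probability): P(mean > x_bar) for tail='upper', P(mean < x_bar) for tail='lower'."""
    z = (x_bar - mu) / (sigma / sqrt(n))     # standardize with the standard error
    lower = NormalDist().cdf(z)              # P(Z < z)
    return z, (1 - lower if tail == "upper" else lower)

z1, p1 = prob_sample_mean(21_000, 20_400, 3_200, 36, tail="upper")   # Example 1
z2, p2 = prob_sample_mean(217, 220, 15, 100, tail="lower")           # Example 2
print(round(z1, 3), round(p1, 4))
print(round(z2, 3), round(p2, 4))
```

For Example 1 the code works with Z = 1.125 directly and gives a probability of roughly 0.130, slightly above the table value of 0.1292, because the table lookup rounds Z to 1.13 first. For Example 2, Z is exactly -2, so the result agrees with 0.0228.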
Sampling Distribution of the Proportion ( $\bar{p}$ )
It is the probability distribution of the sample proportion ( $\bar{p}$ ).
$$\bar{p} = \frac{x}{n}$$
where
x = number of items in the sample that carry the specific characteristic
n = sample size
The sampling distribution of the proportion has two parameters:
Mean of the sample proportion ( $\mu_{\bar{p}}$ ):
$$\mu_{\bar{p}} = p \quad \text{(population proportion)}$$
The standard error of the proportion:
$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}}$$
where
$\sigma_{\bar{p}}$ = standard error of the proportion
p = population proportion
q = 1 - p
n = sample size
If n ≥ 0.05N, $\sigma_{\bar{p}}$ is calculated as:
$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} \times \sqrt{\frac{N-n}{N-1}}$$
where $\sqrt{\frac{N-n}{N-1}}$ is the finite population multiplier.
The central limit theorem states that
1. The sampling distribution of the proportion is approximately normal if np ≥ 5 and nq ≥ 5,
i.e. if n is large.
2. The sampling distribution of the proportion is normally distributed regardless of the sample size if the
population is normally distributed.
Example: Suppose that 60% of the electrical contractors in a region use a particular brand of wire. What
is the probability that, in a random sample of 120 of these electrical contractors, 50% or less use
that brand of wire?
Step 1. Check that np ≥ 5 and nq ≥ 5:
np = 120 × 0.6 = 72
nq = 120 × 0.4 = 48
Step 2. Calculate $\mu_{\bar{p}}$ and $\sigma_{\bar{p}}$:
$$\mu_{\bar{p}} = p = 0.6$$
$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{120}} = 0.0447$$
Step 3. Calculate Z:
$$Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{0.5 - 0.6}{0.0447} = -2.24$$
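The three steps can be followed through to a probability with the sketch below, which evaluates P(Z ≤ z) with the standard normal distribution from Python's standard library. The function name `prob_proportion_at_most` is an assumption; the optional `N` argument applies the finite population multiplier and is not used in this example.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_at_most(p_bar, p, n, N=None):
    """Return (standard error, z, P(sample proportion <= p_bar)) under the normal approximation."""
    q = 1 - p
    se = sqrt(p * q / n)
    if N is not None:                      # finite population multiplier
        se *= sqrt((N - n) / (N - 1))
    z = (p_bar - p) / se
    return se, z, NormalDist().cdf(z)

se, z, prob = prob_proportion_at_most(0.5, 0.6, 120)
print(round(se, 4), round(z, 2), round(prob, 4))
```

The standard error and Z agree with Steps 2 and 3 (0.0447 and -2.24), and the resulting probability is about 0.013. A Z table read at -2.24 gives 0.0125.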