Lecture 8 SAMPLING AND SAMPLING DISTRIBUTIONS - ECN 2331 NOTES
By Mr. Matongo
Lecture Aim
This unit aims to familiarize you with Sampling and Sampling Techniques.
By the time you finish working through this unit, you should be able to:
1. Define Sampling
2. Differentiate between a sample and a population
3. Differentiate between estimators and estimates
4. Discuss the key sampling techniques
5. State the advantages and disadvantages of Sampling
Sampling
Sampling is simply the process of selecting a sample from the population. The entire process of drawing
a subset of elements (Sample) from a population so that results of that subset may be generalized to the
population is called sampling. Recall that in statistical inference, information contained in a sample is
used to make inferences about population characteristics.
A. Some Concepts
We discuss below some of the key concepts frequently used in sampling theory.
1. Population
A population is simply a set of all the elements of interest in a study. If you think of any study,
you will always have in your mind a collection of units that are relevant for that particular
enquiry/study. A unit in this regard, is an entity on which we can make observations according to
a well-defined procedure. The entire collection of such units is called a population or universe.
This being the case, we may have a population of human beings, cattle, trees, prices, production,
etc. Below are some of the key concepts that define Populations.
i. Finite and Infinite Population: A population can either be finite or infinite. A
particular population is finite if its elements can be counted and listed, and infinite if there is
no limit to the number of its elements. Usually in practice, we are concerned with finite populations.
ii. Census: Refers to the procedure of inquiry based upon obtaining information from all the
units of a population. It also is known as the complete enumeration method.
iii. Target Population: Refers to the entire group of elements to which the researcher
wishes to generalize the study findings. It meets a set of criteria of interest to the researcher.
The term is often used interchangeably with population.
2. Sample
A sample is a subset of the population. When we have a collection of a part or section of the
population, it is called a sample. A census, as we have seen earlier, is based upon obtaining
information from every member of the population. However, in order to obtain information
about a certain characteristic of the population, we need not always resort to a census. In practice,
we get quite satisfactory results by studying an appropriate sample from the population. The
procedure of obtaining a sample is known as a sample survey. In the case of a census, we examine
the entire population; on the other hand, when we take a sample, we consider a representative
fraction of the population and use the sample information to make inferences about the entire population.
Representative sample: A sample is representative if all salient features of the population are
present in it. Ideally, every sample should be a representative sample. For example, if a
population has 30% males and 70% females, then we also expect the sample to have nearly 30%
males and 70% females. In another example, if we take out a handful of groundnuts from a
100 kg bag of groundnuts, we expect the same quality of groundnuts in our hand as is inside the
bag. Similarly, it is expected that a drop of blood will give the same information as all the
blood in the body.
Sampling frame: A list of all the elements in the population from which the sample is
drawn.
3. Parameter
A parameter is a characteristic of a population. It refers to a numerical characteristic of a
population, such as the mean and standard deviation. It is a fixed descriptive measure of a
population based on all the elements of the population. For example, we may be
interested in the mean income of the people of Lusaka for a particular year. We may also like to
know the standard deviation of these incomes. Here, both the mean and the standard
deviation are parameters. Parameters are conventionally denoted by Greek letters. For
example, the population mean is denoted by µ and the population standard deviation is
denoted by σ.
It is important to note that the value of a parameter is computed from all the population
observations. Thus, the parameter 'mean income' is calculated from all the income figures of
different individuals that constitute the population.
4. Statistic
A statistic is a characteristic of a sample. It is defined as a numerical value, which is obtained
from a sample of data. It is a descriptive statistical measure of a sample based on all the elements
of the selected sample. The sample mean, sample median, sample mode and sample standard
deviation, for instance, are all statistics. Recall also that a sample is at its best if it is
representative. The common use of a statistic is to estimate a particular population parameter.
Note that from a given population it is possible to draw many different samples, and the value
of the statistic will vary from sample to sample. Conventionally,
a statistic is denoted by a Roman (English) letter. For example, the sample mean may be denoted by
x̄ and the sample standard deviation may be denoted by s.
5. Estimator and Estimate
As mentioned earlier, the purpose of a statistic is to estimate some population parameter. The
procedure followed or the formula used to compute a statistic is called an estimator and the
value of a statistic so computed is known as an estimate. For example, if we use the formula
$$\bar{x} = \frac{\sum x_i}{n}$$
for calculating a statistic, then this formula is an estimator. Next, if we
use this formula and get x̄ = 10, then this '10' is an estimate.
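The distinction can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean` and the five observations are made up for the example, chosen so that the estimate comes out to the value 10 used above.

```python
def sample_mean(values):
    """Estimator: the rule 'add up the observations and divide by n'."""
    return sum(values) / len(values)


# Hypothetical sample of n = 5 observations (made-up numbers).
sample = [8, 12, 9, 11, 10]

# Applying the estimator to one particular sample produces an estimate.
estimate = sample_mean(sample)
print(estimate)  # 10.0
```

The function is the estimator because it can be applied to any sample; the number 10.0 is the estimate because it belongs to this one sample only.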
In conclusion, the table below gives a comparative summary of useful population and sample
concepts.
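Basis for comparison | Statistic | Parameter
Meaning | A numerical characteristic of a sample | A numerical characteristic of a population
Computed from | The elements of the selected sample | All the elements of the population
Value | Varies from sample to sample | Fixed
Notation | Roman (English) letters | Greek letters
Mean | x̄ | µ
Standard deviation | s | σ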
B. Sampling Techniques
The method you use to select a sample is known as your sampling technique. We will consider two
types of sampling techniques:
1) Probability sampling: every element of the population has a known, non-zero probability of being
selected, and the selection is made at random.
2) Non-probability sampling: the choice of elements from the population does not follow the theory of
probability, so the selection probabilities are unknown.
Probability Sampling
- Uses random selection
- Can estimate sampling error
The main probability sampling techniques are:
• Simple random sampling.
• Stratified sampling.
• Cluster sampling.
• Systematic sampling.
A simple random sample of size n from a finite population of size N is defined as a sample selected
such that each possible sample of size n has the same probability of being selected. This can be achieved
by choosing the elements for the sample one at a time in such a way that, at each step, every element
remaining in the population has the same probability of being chosen. Under simple random sampling,
one can sample either without replacement or with replacement. Note, however, that when simple random
sampling is referred to in these notes, we will assume that the sampling is without replacement. The
number of different simple random samples of size n that can be selected from a finite population of
size N is
$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
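As a small illustration, the count of possible samples and the drawing of one simple random sample can be sketched in Python. The five-element population, the sample size of 2 and the seed are assumptions made for the example only.

```python
import itertools
import math
import random

population = ["A", "B", "C", "D", "E"]  # hypothetical population, N = 5
n = 2                                   # assumed sample size

# Number of different simple random samples: N! / (n! (N - n)!).
num_samples = math.comb(len(population), n)
print(num_samples)  # 10

# Listing every possible sample confirms the count.
all_samples = list(itertools.combinations(population, n))
print(len(all_samples))  # 10

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Without replacement: no element can appear twice in the sample.
srs_without = rng.sample(population, n)

# With replacement: the same element may be drawn more than once.
srs_with = rng.choices(population, k=n)
```

With N = 5 and n = 2 there are only 10 possible samples, so each one has probability 1/10 of being the sample drawn without replacement.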
Sampling from an infinite population
In some situations, the population is either infinite or so large that for practical purposes it must be
treated as infinite. A simple random sample from an infinite population is a sample selected such that the
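following conditions are satisfied:
1. Each element selected comes from the same population.
2. Each element is selected independently.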
The purpose of the second requirement is to prevent selection bias. Selection bias can be avoided by
ensuring that the selection of a particular element does not influence the selection of any other elements.
In other words, the elements must be selected independently.
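A rough sketch of this idea in Python is given below. It assumes, purely for illustration, that the "population" is an ongoing process (for example, transaction times at a till) modelled by an exponential distribution with mean 5; the function name `next_observation`, the distribution and the seed are not part of these notes.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable


def next_observation():
    """One observation from the ongoing process.

    Assumed model: exponential distribution with mean 5 (rate 1/5).
    """
    return rng.expovariate(1 / 5)


# Every draw uses the same generator (same population), and no draw
# depends on the value of any earlier draw (independent selection).
sample = [next_observation() for _ in range(30)]
print(len(sample))  # 30
```

Because the process has no fixed list of elements, there is no sampling frame to draw from; the sample is built one independent observation at a time.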
Point estimation
After obtaining a simple random sample, one can estimate the value of the population parameter of
interest. Note that a numerical characteristic of a sample is a sample statistic. For example, we use the
sample mean (x̄) and sample standard deviation (s) to estimate the population mean (µ) and population
standard deviation (σ) respectively.
Recall that:
$$\bar{x} = \frac{\sum x_i}{n}$$
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
In addition, we can also use sample proportions (denoted by p̂) to estimate population proportions
(p). The above computations give rise to what we call point estimation. That is, sample data is used to
compute sample statistics that serve as estimates of population parameters.
Thus, we refer to the sample mean x̄ as the point estimator of the population mean µ, the sample
standard deviation s as the point estimator of the population standard deviation σ, and the sample
proportion p̂ as the point estimator of the population proportion p. The numerical value obtained for x̄,
s, or p̂ is called the point estimate.
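The three point estimates can be computed with Python's standard `statistics` module, as in the sketch below. The sample values are invented for illustration; `statistics.stdev` divides by n - 1, which matches the usual formula for the sample standard deviation.

```python
import statistics

# Hypothetical sample of n = 5 people (made-up values).
incomes = [52, 48, 55, 60, 45]                  # income in some unit
employed = [True, False, True, True, False]     # whether each person is employed

x_bar = statistics.mean(incomes)       # point estimate of the population mean
s = statistics.stdev(incomes)          # point estimate of the population sd
p_hat = sum(employed) / len(employed)  # point estimate of the population proportion

print(x_bar)        # 52
print(round(s, 2))  # 5.87
print(p_hat)        # 0.6
```

Here the squared deviations from the mean sum to 138, so s is the square root of 138 / 4 = 34.5.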
As stated previously, point estimates may deviate from the true values of the population parameters.
This is because the estimate is generated using a sample rather than a census of the entire population.
The absolute value of the difference between an unbiased point estimate and the corresponding
population parameter is called the sampling error. For example, the sampling errors for the sample
mean, standard deviation and proportion are:
|x̄ − µ|, |s − σ| and |p̂ − p|
Note that in practice we cannot compute the sampling error exactly, because the true values of the
population parameters are usually unknown.
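In a simulation the population is known, so the sampling error can be computed directly. The sketch below builds a hypothetical population of 10,000 values (normal with mean 50 and standard deviation 10, an assumption made only for this example), draws one simple random sample of 100, and measures how far the estimates fall from the parameters.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population; in real studies these values are not all observed.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population sd (divides by N)

# One simple random sample of size 100, without replacement.
sample = rng.sample(population, 100)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)           # sample sd (divides by n - 1)

error_mean = abs(x_bar - mu)  # sampling error of the sample mean
error_sd = abs(s - sigma)     # sampling error of the sample sd

print(round(error_mean, 3), round(error_sd, 3))
```

Running the last few lines again with a different sample gives different errors, which is another way of seeing that a statistic varies from sample to sample while the parameter stays fixed.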
Other sampling methods
There are several other sampling techniques used to select a sample from a population. These include:
Stratified sampling:
Under this method, the elements in the population are first divided into groups called strata in such a
way that each element in the population belongs to one and only one stratum. For example, students
may be grouped according to their minors. After forming the strata, a simple random sample is drawn
from each stratum. Note that if the elements within each stratum are homogeneous, the strata will have
low variances, so even a small sample from each stratum gives a precise estimate.
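The two steps of stratified sampling can be sketched in Python as follows. The population of 300 students, the three minors and the choice of 10 students per stratum (equal allocation) are all assumptions made for the illustration.

```python
import random
from collections import defaultdict

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical population: (student_id, minor) pairs, 100 students per minor.
minors = ["Mathematics", "Statistics", "Demography"]
population = [(i, minors[i % 3]) for i in range(300)]

# Step 1: divide the population into strata; each student is in exactly one.
strata = defaultdict(list)
for student in population:
    strata[student[1]].append(student)

# Step 2: draw a simple random sample from every stratum.
per_stratum = 10  # assumed equal allocation
stratified_sample = []
for minor, members in strata.items():
    stratified_sample.extend(rng.sample(members, per_stratum))

print(len(stratified_sample))  # 30
```

Every minor is guaranteed to appear in the sample, which a single simple random sample of 30 students would not guarantee.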
Cluster sampling
Under this technique, the elements in the population are first divided into separate groups called clusters
in such a way that each element of the population belongs to one and only one cluster.
Then a simple random sample of the clusters is taken, and all the elements within each sampled
cluster form the sample. Note that cluster sampling provides the best estimates when the elements
within each cluster are heterogeneous, so that each cluster resembles a small-scale version of the population.
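A minimal sketch of cluster sampling in Python is shown below. The 20 villages of 15 households each, the naming scheme and the choice of 4 clusters are hypothetical values picked for the example.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 clusters (villages), 15 households in each.
clusters = {
    f"village_{c}": [f"v{c}_h{h}" for h in range(15)]
    for c in range(20)
}

# Step 1: take a simple random sample of clusters, not of households.
chosen = rng.sample(sorted(clusters), 4)

# Step 2: every household in each chosen cluster enters the sample.
cluster_sample = [household for name in chosen for household in clusters[name]]

print(len(cluster_sample))  # 60
```

Note the contrast with stratified sampling: there, every group contributes a few elements; here, a few groups contribute all of their elements.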
Systematic sampling
Under this method, the sample is selected by taking every k-th element from a list of all the elements
in the population, where k = N/n is the sampling interval. For example, we can obtain a sample of 20
observations from a population of 1000 observations by selecting one observation out of every
1000/20 = 50 observations. Note that under systematic sampling the first element is chosen randomly
from the first k elements; the rest follow at intervals of k.
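The example of 20 observations from a population of 1000 can be sketched in Python as follows. The name `k` for the sampling interval, the labels 1 to 1000 and the seed are choices made for this illustration.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

population = list(range(1, 1001))  # hypothetical population labelled 1..1000
n = 20                             # desired sample size
k = len(population) // n           # sampling interval: 1000 / 20 = 50

# Random starting position within the first interval (index 0..49).
start = rng.randrange(k)

# Take the starting element and every k-th element after it.
systematic_sample = population[start::k]

print(len(systematic_sample))  # 20
```

Only the starting point is random; once it is fixed, the rest of the sample is determined, which is why the ordering of the list matters in systematic sampling.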
Convenience sampling
Unlike simple random sampling, stratified sampling, cluster sampling and systematic sampling,
convenience sampling is a nonprobability sampling technique. As the name suggests, the sample is
identified primarily for convenience. Under convenience sampling, sample selection and data collection
are relatively easy. However, there is no way to evaluate how representative the sample is of the
population, so the results may be misleading.
Judgement sampling
This is also a nonprobability sampling technique. Under this approach, a knowledgeable person on a
subject of interest chooses the elements of the population that he or she feels are most representative of
the population. For example, a reporter may select two or three parliamentarians whom she or he
judges to represent the opinions of all the parliamentarians. Note that the quality of the sample
depends on the judgment of the person selecting the sample.
Advantages of sampling:
Sampling permits a high degree of accuracy due to a limited area of operations. Moreover, careful
execution of field work is possible. Ultimately, the results of sampling studies turn out to be
sufficiently accurate.
4. Less time consuming:
Sampling takes less time than the census technique.
5. Feasibility:
Conducting the experiment on a smaller number of units is more feasible.
Disadvantages of sampling:
1. Chances of bias:
A serious limitation of the sampling method is that it may involve biased selection and thereby
lead us to draw erroneous conclusions. Bias arises when the method used to select the sample
is faulty. Relatively small samples properly selected may be much more reliable than
large samples poorly selected.
2. Difficulties in selecting a truly representative sample:
Samples produce reliable and accurate results only when they are representative of the whole
group. Selection of a truly representative sample is difficult when the phenomena under study
are of a complex nature.
3. Inadequate knowledge in the subject:
Use of the sampling method requires adequate specialized knowledge of sampling techniques.
Sampling involves statistical analysis and calculation of probable error. When the researcher
lacks specialized knowledge in sampling, he or she may commit serious mistakes. Consequently, the
results of the study will be misleading.
4. Impossibility of sampling:
Deriving a representative sample is difficult when the universe is too small or too
heterogeneous. In this case, a census is the only alternative. Moreover, in studies requiring a
very high standard of accuracy, the sampling method may be unsuitable, since there is a chance
of error even if samples are drawn most carefully.