Lecture 1
Lecture Notes 1
Assa Mulagha-Maganga
Lilongwe University of Agriculture and Natural Resources
Department of Mathematical Sciences (Statistics), Chancellor College
Summer 2022
1.1 Introduction
This section presents an introduction to the theory and practice of statistical sampling. It
outlines how students should sample elements from a population, and then discusses various
terms and concepts in sampling.
Statistics for Economists 2
Systematic random sampling is a statistical sampling technique used to select a sample from
a population. It involves selecting a starting point at random and then selecting every nth
member of the population. For example, if you wanted to select a sample of 100 students
from a school with 1000 students, you could choose a random number between 1 and 10 and
then select every 10th student after that.
The advantage of using systematic random sampling is that it can be quicker and more
efficient than other methods of sampling, such as simple random sampling, especially when
dealing with large populations. It can also help to ensure that the sample is representative
of the population, as long as the starting point is chosen at random and the interval is
appropriate.
However, one potential disadvantage of systematic random sampling is that it can introduce
bias if there is any pattern or periodicity in the population that aligns with the sampling
interval. To avoid this, it is important to ensure that the starting point is chosen randomly
and that the interval does not coincide with any patterns in the population.
To obtain a systematic sample of 5 of the nation’s 30 districts, divide 30 by 5 to
obtain a sampling interval of 6. From a list of the 30 districts, select one of the first 6 districts at random.
Suppose district number 2 is selected. To obtain the other 4 districts, add 6 repeatedly:
2 + 6 = 8, 8 + 6 = 14, 14 + 6 = 20, and 20 + 6 = 26. The systematic sample consists of
districts 2, 8, 14, 20, and 26.
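The procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed implementation: the function name `systematic_sample` and its `start` and `rng` parameters are assumptions introduced here, and the districts are simply numbered 1 to 30.

```python
import random

def systematic_sample(population, n, start=None, rng=random):
    """Select n items: a random start in the first k, then every k-th item."""
    k = len(population) // n              # sampling interval, rounded down
    if start is None:
        start = rng.randint(1, k)         # random start among positions 1..k
    # Positions start, start + k, start + 2k, ... (1-based), n items in total.
    return [population[start - 1 + i * k] for i in range(n)]

districts = list(range(1, 31))            # districts numbered 1 to 30

# With the start fixed at district 2, the interval of 6 gives the other four.
print(systematic_sample(districts, 5, start=2))   # [2, 8, 14, 20, 26]

# In practice the start is drawn at random (seeded here for repeatability).
print(systematic_sample(districts, 5, rng=random.Random(1)))
```

Fixing `start` is only useful for checking the arithmetic by hand; a real sample should leave it to the random draw.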
When a population can be clearly divided into groups based on some characteristic, then
stratified random sampling can be used to guarantee that each group is represented in the
sample. The groups are also called strata. For example, college students can be grouped as
full time or ODL, or as male or female. Once the strata are defined, we can apply simple random
sampling within each stratum to collect the sample.
For instance, we might study the security expenditures of the 352 largest companies in
Malawi. Suppose the objective of the study is to determine whether firms with high
returns on equity (a measure of profitability) spent more of each sales dollar on security than
firms with a low return or deficit. To make sure that the sample is a fair representation of
the 352 companies, the companies are grouped on percent return on equity. Table 1.1 shows
the strata and the relative frequencies. If simple random sampling were used, observe that
firms in the 3rd and 4th strata would have a high chance of selection (probability of 0.87) while
firms in the other strata would have a low chance of selection (probability of 0.13). We might not
select any firms in stratum 1 or 5 simply by chance. However, stratified random sampling
guarantees that at least one firm from each of strata 1 and 5 is represented in the sample. Suppose
that 50 firms are selected for intensive study. Then 1 firm (0.02 × 50) from stratum 1
would be randomly selected, 5 firms (0.10 × 50) from stratum 2 would be randomly selected,
and so on. In this case, the number of firms sampled from each stratum is proportional to
the stratum’s relative frequency in the population.
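Proportional allocation followed by simple random sampling within each stratum can be sketched as follows. Because the table of firm counts is not reproduced here, the sketch uses the store counts from the drug store question in the Unit Activity instead. The function names and the store identifiers are hypothetical, and the total sample of 24 stores is taken from that question.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Sample size per stratum, proportional to the stratum's population share."""
    total = sum(stratum_sizes.values())
    # Rounding can make the parts sum to slightly more or less than n
    # when the shares are not whole numbers.
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def stratified_sample(strata, n, rng=random):
    """Simple random sample within each stratum, using proportional allocation."""
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

# Store counts per city from the drug store activity; store IDs are made up.
store_counts = {"Mzuzu": 20, "Lilongwe": 40, "Kasungu": 20, "Zomba": 10, "Blantyre": 30}
strata = {city: [f"{city}-{i}" for i in range(1, count + 1)]
          for city, count in store_counts.items()}

print(proportional_allocation(store_counts, 24))
# {'Mzuzu': 4, 'Lilongwe': 8, 'Kasungu': 4, 'Zomba': 2, 'Blantyre': 6}
sample = stratified_sample(strata, 24, random.Random(0))
print({city: len(stores) for city, stores in sample.items()})
```

Every stratum with a non-zero allocation is certain to appear in the sample, which is the property that simple random sampling cannot promise.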
Table 1. Test scores for UNIMA Entrance
Another common type of sampling is cluster sampling. It is often employed to reduce the
cost of sampling a population scattered over a large geographic area. It is a probability
sampling method in which each sampling unit is a collection or cluster of elements, which
are close together. For example, it may be a collection or groups of households that are
geographically close together. The first task in cluster sampling is to specify appropriate
clusters. For example, we might divide a city into regions such as blocks or areas, and then
select a simple random sample of blocks from the population. This is where probability
theory applies: the blocks are chosen at random from all the blocks available.
Suppose you want to determine the views of residents in a district about FISP policies.
Selecting a random sample of residents in the district and personally contacting each one
would be time consuming and very expensive. Instead, you could employ cluster sampling by
subdividing the district into small units, either TAs or EPAs. These are often called primary
units. Suppose you divided the district into 5 TAs as primary units, then selected 3
TAs at random and concentrated your efforts in these primary units. You could take a random sample
of the residents in each of these TAs and interview them.
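The two stages of this design can be sketched in Python. The district below is entirely hypothetical (5 TAs with 200 numbered residents each), and the function name and the choice of 10 interviews per TA are assumptions made for illustration.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """Stage 1: pick clusters at random. Stage 2: sample units inside each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical district: 5 primary units (TAs), each with 200 numbered residents.
district = {f"TA{i}": [f"TA{i}-resident{j}" for j in range(1, 201)]
            for i in range(1, 6)}

sample = two_stage_cluster_sample(district, n_clusters=3, n_per_cluster=10,
                                  rng=random.Random(0))
for ta, residents in sample.items():
    print(ta, len(residents))
```

Only the selected TAs need a full list of residents, which is where the cost saving over a district-wide simple random sample comes from.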
• Law of Large Numbers (LLN) - Draw observations at random from any population
with finite mean µ. As the number of observations drawn increases, the mean x̄ of the
observed values tends to get closer and closer to the mean µ of the population.
• The weak law of large numbers (WLLN) (also called Khinchin’s law) states that
the sample average converges in probability towards the expected value
$\bar{x}_n \xrightarrow{p} \mu$ as $n \to \infty$
That is, for any positive number $\epsilon$,
$\lim_{n \to \infty} \Pr(|\bar{x}_n - \mu| < \epsilon) = 1$
Interpreting this result, the weak law states that for any nonzero margin specified
(ϵ), no matter how small, with a sufficiently large sample there will be a very high
probability that the average of the observations will be close to the expected value;
that is, within the margin.
• The strong law of large numbers (SLLN) (also called Kolmogorov’s law) states
that the sample average converges almost surely to the expected value
$\bar{x}_n \xrightarrow{a.s.} \mu$ as $n \to \infty$
That is,
$\Pr\left(\lim_{n \to \infty} \bar{x}_n = \mu\right) = 1$
What this means is that the probability that, as the number of trials n goes to infinity,
the average of the observations converges to the expected value, is equal to one.
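A small simulation can make the law of large numbers concrete. The sketch below assumes a population that is not in the notes, namely rolls of a fair six-sided die with mean µ = 3.5, and tracks the running sample mean at a few sample sizes. The seed and the checkpoints are arbitrary choices.

```python
import random

rng = random.Random(0)          # seeded so the run is repeatable
mu = 3.5                        # population mean of a fair six-sided die

total = 0
running_means = {}
checkpoints = (10, 100, 1_000, 10_000, 100_000)
for n in range(1, 100_001):
    total += rng.randint(1, 6)  # one more observation
    if n in checkpoints:
        running_means[n] = total / n

for n, xbar in running_means.items():
    print(f"n = {n:>6}: sample mean = {xbar:.4f}, distance from mu = {abs(xbar - mu):.4f}")
```

The distance from µ need not shrink at every step, but for large n it is very likely to be small, which is the statement of the weak law.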
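The population distribution for the five towns is:
x      30   33   36   39   42
P(x)   0.2  0.2  0.2  0.2  0.2
The population mean is $\mu = 36$. The population variance is $\sigma^2 = \sum x^2 P(x) - \mu^2 = 900 \times 0.2 + 1089 \times 0.2 + 1296 \times 0.2 +
1521 \times 0.2 + 1764 \times 0.2 - 1296 = 1314 - 1296 = 18$. The population standard deviation is
the square root of 18, or 4.24.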
The number of samples of size 3 possible from this population is equal to the number of
combinations possible when selecting three towns from five. The number of possible samples
is $C_3^5 = \frac{5!}{3!\,2!} = 10$. Using the letters A, B, C, D, and E rather than the names of the towns,
Table 4 gives all the possible samples of three towns, the sample values, and the means of
the samples.
Table 4. Possible samples
The sampling distribution of the mean is obtained from Table 4. For random sampling,
each of the samples in Table 4 is equally likely to be selected. The probability of selecting a
sample with mean 39 is 0.1 since only one of 10 samples has a mean of 39. The probability
of selecting a sample with mean 36 is 0.2, since two of the samples have a mean equal to 36.
Table 5 gives the sampling distribution of the sample mean.
Table 5. Sampling distribution
x̄ 33 34 35 36 37 38 39
P (x̄) 0.1 0.1 0.2 0.2 0.2 0.1 0.1
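The sampling distribution can be checked by listing every sample of size 3 and averaging its values. The sketch below uses the five population values 30, 33, 36, 39 and 42. Which letter goes with which value is an assumption made here, since only the values themselves are needed for the distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population values from the notes; which letter goes with which value is assumed.
towns = {"A": 30, "B": 33, "C": 36, "D": 39, "E": 42}

samples = list(combinations(towns, 3))            # all 10 samples of size 3
means = {s: Fraction(sum(towns[t] for t in s), 3) for s in samples}

# Each sample is equally likely, so P(mean) = (samples with that mean) / 10.
counts = Counter(means.values())
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for s, m in means.items():
    print("".join(s), [towns[t] for t in s], m)
for m, p in sampling_dist.items():
    print(f"P(mean = {m}) = {float(p)}")
```

Exact fractions are used so that the probabilities add up to 1 without floating-point rounding.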
The previous section discussed sampling methods that can be used to select a sample that
is a fair or unbiased representation of the population. In each method, it is important to
note that the selection of every possible sample of a specified size from a population has a
known chance or probability. This is another way to describe an unbiased sampling method.
Samples are used to estimate population characteristics. For example, the mean of a sample
is used to estimate the population mean. However, since the sample is a part or portion of
the population, it is unlikely that the sample mean would be exactly equal to the population
mean. Similarly, it is unlikely that the sample standard deviation would be exactly equal to
the population standard deviation. We can therefore expect a difference between a sample
statistic and its corresponding population parameter. This difference is called the sampling error:
Sampling error = |x̄ − µ|
Consider again the population of the five towns with the most Nigerian-owned
businesses. The mean of this population is 36. Table 5 gives the possible sample
means for all samples of size 3 selected from this population. Table 6 gives the sampling
errors and probabilities associated with all the different sample means.
Table 6. Sampling error
From Table 6, it is seen that the probability of no sampling error in this scenario is 0.2.
There is a 60% chance that the sampling error is 1 or less.
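These probabilities follow directly from the sampling distribution of the sample mean. A minimal sketch, assuming only the distribution already given in the notes and the population mean of 36:

```python
from collections import defaultdict
from fractions import Fraction

mu = 36   # population mean from the notes
# Sampling distribution of the sample mean, as exact fractions.
sampling_dist = {33: Fraction(1, 10), 34: Fraction(1, 10), 35: Fraction(2, 10),
                 36: Fraction(2, 10), 37: Fraction(2, 10), 38: Fraction(1, 10),
                 39: Fraction(1, 10)}

error_dist = defaultdict(Fraction)        # Fraction() starts each entry at 0
for xbar, p in sampling_dist.items():
    error_dist[abs(xbar - mu)] += p       # sampling error = |sample mean - mu|

for err in sorted(error_dist):
    print(f"sampling error {err}: probability {float(error_dist[err])}")

p_at_most_1 = sum(p for err, p in error_dist.items() if err <= 1)
print(float(error_dist[0]), float(p_at_most_1))   # 0.2 0.6
```

Sample means on either side of µ share the same sampling error, which is why the probabilities for 35 and 37 are added together.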
x̄ 33 34 35 36 37 38 39
P (x̄) 0.1 0.1 0.2 0.2 0.2 0.1 0.1
The mean of the sample mean is found as follows: $\mu_{\bar{x}} = \sum \bar{x}\,P(\bar{x}) = 33 \times 0.1 + 34 \times 0.1 + 35 \times
0.2 + 36 \times 0.2 + 37 \times 0.2 + 38 \times 0.1 + 39 \times 0.1 = 36$. The variance of the sample mean is
found as follows: $\sigma_{\bar{x}}^2 = \sum \bar{x}^2 P(\bar{x}) - \mu_{\bar{x}}^2 = 1089 \times 0.1 + 1156 \times 0.1 + 1225 \times 0.2 + 1296 \times 0.2 +
1369 \times 0.2 + 1444 \times 0.1 + 1521 \times 0.1 - 1296 = 3$. The standard error of the mean, $\sigma_{\bar{x}}$,
is equal to $\sqrt{3} = 1.73$.
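The same arithmetic can be verified in Python. The sketch below assumes only the sampling distribution tabulated above. The variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Sampling distribution of the sample mean, as exact fractions.
sampling_dist = {33: Fraction(1, 10), 34: Fraction(1, 10), 35: Fraction(2, 10),
                 36: Fraction(2, 10), 37: Fraction(2, 10), 38: Fraction(1, 10),
                 39: Fraction(1, 10)}

# Mean: sum of (value x probability).
mean = sum(x * p for x, p in sampling_dist.items())
# Variance: sum of (value squared x probability) minus the squared mean.
variance = sum(x**2 * p for x, p in sampling_dist.items()) - mean**2
std_error = sqrt(variance)

print(mean, variance, round(std_error, 2))   # 36 3 1.73
```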
When sampling from a normally distributed population, the mean of the distribution of sample
means is $\mu_{\bar{x}} = \mu$, and the standard deviation of this distribution
is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The shape of the distribution of the sample means is normal or bell-shaped
regardless of the sample size. The central limit theorem states that when sampling from
a large population of any distribution shape, the sample means have an approximately normal distribution
when the sample size is large; a common rule of thumb is a sample size of 30 or more. Furthermore, the mean of the distribution of sample
means is $\mu_{\bar{x}} = \mu$, and the standard deviation of this distribution is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. It is
important to note that when sampling from a non-normal distribution, x̄ can be treated as
approximately normal only if the sample size is large enough, conventionally 30 or more.
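A simulation offers a rough check of these two formulas for a non-normal population. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), which is not from the notes and is chosen only because it is strongly skewed. The seed, the sample size of 30 and the number of repetitions are likewise arbitrary.

```python
import random
import statistics
from math import sqrt

rng = random.Random(42)      # seeded so the run is repeatable
n = 30                       # size of each sample
num_samples = 20_000         # number of samples drawn

# Exponential population with rate 1: strongly skewed, mean 1, std dev 1.
mu, sigma = 1.0, 1.0
sample_means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
                for _ in range(num_samples)]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.pstdev(sample_means)

print(f"mean of sample means:    {mean_of_means:.4f} (theory: {mu:.4f})")
print(f"std dev of sample means: {std_of_means:.4f} (theory: {sigma / sqrt(n):.4f})")
```

Plotting a histogram of `sample_means` would show the bell shape emerging even though individual observations are far from normal.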
1.7 Determining the sample size for the estimation of the population mean
Two important questions in the design of any sample survey are the
total cost of the survey and the precision of the main estimates. Both are related to
the size of the sample, given the variability of the data, the type of sampling, and the method of
estimation. The ultimate goal is a survey designed to provide
estimates with minimum sampling error (that is, estimates with maximum precision that
are close to the population parameter) when the total cost is fixed. A sample size that
fulfills these conditions is called the optimal sample size (Edriss 2012). This sub-section
provides methods of determining the optimal sample size.
The maximum error of estimate when using x̄ as a point estimate of µ is defined by the formula
$E = z\,\sigma_{\bar{x}}$
If the maximum error of estimate is specified, then the sample size necessary to give this
maximum error may be determined from the formula above. Replacing $\sigma_{\bar{x}}$ by
$\sigma/\sqrt{n}$ gives $E = z\,\sigma/\sqrt{n}$, and solving for n gives the formula below. The value obtained
for n is always rounded up to the next whole number. This is a conservative approach and
sometimes results in a sample size larger than actually needed.
$n = \frac{z^2 \sigma^2}{E^2}$
Example: A sociologist desires to estimate the mean age of teenagers having abortions. She
wishes to estimate µ with a 99% confidence interval so that the maximum error of estimate
is E = 0.1 years. The z value is 2.58. The range of ages for teenagers is 19 − 13 = 6 years.
A rough estimate of the standard deviation is obtained by dividing the range by 4 to obtain
1.5 years. This method for estimating σ works best for mound-shaped distributions. The
approximate sample size is $n = \frac{(2.58)^2 (1.5)^2}{(0.1)^2} = 1497.69$, which is rounded up to 1498.
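The calculation is easy to script. The sketch below applies the sample size formula for a mean to the values stated in the example (z = 2.58, a range of 6 years, E = 0.1). The function name `sample_size_for_mean` is an assumption introduced here.

```python
from math import ceil

def sample_size_for_mean(z, sigma, max_error):
    """n = z^2 * sigma^2 / E^2, rounded up to the next whole number."""
    return ceil((z * sigma / max_error) ** 2)

age_range = 19 - 13              # range of ages in years
sigma_estimate = age_range / 4   # rough estimate of sigma: range / 4 = 1.5

print(sample_size_for_mean(z=2.58, sigma=sigma_estimate, max_error=0.1))   # 1498
```

Rounding up rather than to the nearest integer keeps the achieved maximum error at or below the target.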
1.8 Determining the sample size for the estimation of the population proportion
When p̄ is used as a point estimate of p, the maximum error of estimate is given by
$E = z\,\sigma_{\bar{p}}$
If the maximum error of estimate is specified, then the sample size necessary to give this
maximum error may be determined from the formula below. If $\sigma_{\bar{p}}$ is replaced by $\sqrt{pq/n}$, where q = 1 − p, and the
resultant equation is solved for n, we obtain
$n = \frac{z^2 pq}{E^2}$
Since p and q are usually unknown, they must be estimated when the formula above is used
to determine a sample size to give a specified maximum error of estimate. If a reasonable
estimate of p and q exists, then the estimate is used in the formula. If no reasonable estimate
is known, then both p and q are replaced by 0.5. This gives a conservative estimate for
n. That is, replacing p and q by 0.5 usually gives a larger sample size than is needed, but it
covers all cases, because the product pq is largest when p = q = 0.5.
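The formula for a proportion can be sketched the same way. The function name is an assumption. The first call uses the figures from the SCOM question in the Unit Activity, with z = 1.65 for a probability of 0.90 and p = 0.35 as the prior estimate closest to 0.5. The second call uses purely illustrative values (z = 1.96, E = 0.05) to show the conservative default p = q = 0.5.

```python
from math import ceil

def sample_size_for_proportion(z, max_error, p=0.5):
    """n = z^2 * p * q / E^2 with q = 1 - p, rounded up to a whole number."""
    q = 1 - p
    return ceil(z**2 * p * q / max_error**2)

# SCOM attendance question: prior estimates between 30% and 35%, E = 0.5%.
print(sample_size_for_proportion(z=1.65, max_error=0.005, p=0.35))   # 24775

# Illustrative values only: no prior estimate, so p defaults to 0.5.
print(sample_size_for_proportion(z=1.96, max_error=0.05))            # 385
```

When a range of prior estimates is available, using the value closest to 0.5 gives the largest, and therefore safest, sample size within that range.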
Unit Activity:
1. A district wishes to obtain a measure of the agribusiness competency of the farmers.
The district consists of 40 different farmer field schools. Describe how you could use
cluster sampling to obtain a measure of the agribusiness competency of the farmer field
school participants in the district.
Solution: Randomly select a small number, say 5, of the 40 farmer field schools. Then
administer a test of agribusiness competency to all participants in the 5 selected schools.
The test results from the 5 schools constitute a cluster sample.
2. Refer to Problem 1 above. Explain how you could use stratified sampling to determine
the agribusiness competency of the farmer field school participants in the district.
Solution: Consider each of the 40 farmer field schools as a stratum. Randomly select
a sample from each farmer field school proportional to the number of participants in that
school and administer the agribusiness competency test to the selected
participants. Note that even though stratified sampling could be used, cluster sampling
as described in Problem 1 would be easier to administer.
3. A large study is undertaken to estimate the percentage of students at tertiary education
institutions who attend SCOM meetings. Other studies have indicated that the percent
ranges between 30% and 35%. How large a sample is needed in order to estimate the
true percentage for all such students with a maximum error of estimate equal to 0.5%
with a probability of 0.90?
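Answer: taking p = 0.35 (the estimate closest to 0.5, which gives the larger sample), q = 0.65,
z = 1.65, and E = 0.005, n = (1.65)²(0.35)(0.65)/(0.005)² = 24,774.75, so the sample size is 24,775.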
4. A drug store chain has stores located in 5 cities as follows: 20 stores in Mzuzu, 40 stores
in Lilongwe, 20 stores in Kasungu, 10 stores in Zomba, and 30 stores in Blantyre. In
order to estimate pharmacy sales, the following number of stores are randomly selected
from the 5 cities: 4 from Mzuzu, 8 from Lilongwe, 4 from Kasungu, 2 from Zomba,
and 6 from Blantyre. What type of sampling is being used?
Answer: stratified sampling
5. When is it appropriate to use stratified random sampling? What are strata, and how
should strata be selected?
• When the population consists of two or more groups that differ with respect to
the variable of interest.
• Strata are non-overlapping groups of similar units.
• Strata should be chosen so that the units in each stratum are similar on some
characteristic (often a categorical variable).
6. When is cluster sampling used? Why do we describe this type of sampling by using
the term cluster?
• Cluster sampling is often used when selecting a sample from a large geographical
region. The term is used because at each stage we “cluster” units into subpopulations.
7. Explain how to take a systematic sample of 100 companies from the 1,853 companies
that are members of an industry trade association.
• First divide 1,853 by 100 and round down to 18. Randomly select 1 company
from the first 18 in a list of all the companies. From the company selected,
count down 18 places to reach the next company to select. Continue this
process until the sample reaches 100 companies.
8. Explain how a stratified random sample is selected. Discuss how you might define the
strata to survey student opinion on a proposal to charge all students a MK20,000 fee for
a new university-run bus system that will provide transportation between off-campus
apartments and campus locations.
• A stratified random sample is selected by dividing the population into some number
of strata, and then randomly sampling inside each stratum. Potential strata:
students who live off campus and students who live on campus.
9. Marketing researchers often use city blocks as clusters in cluster sampling. Using this
fact, explain how a market researcher might use multistage cluster sampling to select a
sample of consumers from all cities having a population of more than 10,000 in Malawi.
• List all cities with population > 10,000. In each city, randomly select a number
of city blocks. In each city block, take a random sample of individuals.
10. When a pizza restaurant’s delivery process is operating effectively, pizzas are delivered
in an average of 45 minutes with a standard deviation of 6 minutes. To monitor its
delivery process, the restaurant randomly selects five pizzas each night and records
their delivery times. (a) For the sake of argument, assume that the population of all
delivery times on a given evening is normally distributed with a mean of 45 minutes and
a standard deviation of 6 minutes. (That is, we assume that the delivery process is
operating effectively.) Find the mean and the standard deviation of the population of
all possible sample means.
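For the pizza delivery question, one possible way to check an answer is to apply the results for the distribution of sample means directly: the mean of the sample means equals µ, and their standard deviation equals σ/√n. The sketch below assumes the values stated in the question (µ = 45 minutes, σ = 6 minutes, n = 5 pizzas per night).

```python
from math import sqrt

mu, sigma, n = 45, 6, 5          # minutes, minutes, pizzas sampled per night

mean_of_sample_means = mu                    # mean of all possible sample means
std_of_sample_means = sigma / sqrt(n)        # standard error of the mean

print(mean_of_sample_means, round(std_of_sample_means, 2))   # 45 2.68
```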