STA 410 Lecture Notes

Lecturer notes

Uploaded by

gitaumoseskiarie
STA 410: Sample Surveys Theory and Methods

Prof. Davis Bundi


[email protected]
Department of Mathematics, University of Nairobi

September 2024
Contents

1 Introduction
1.1 Introduction to Sample Survey
1.1.1 Basic concepts
1.1.2 Sample Surveys Theory
1.1.3 Population
1.1.4 Types of populations
1.1.5 Sampling frame
1.1.6 Importance of sampling frames
1.1.7 Challenges with sampling frames
1.2 Uses, scope, and advantages of sample surveys
1.2.1 Uses of sample surveys
1.2.2 Scope of sample surveys
1.2.3 Advantages of sample surveys
1.2.4 Key Considerations in sample surveys
1.3 Probability sampling
1.3.1 Definition
1.3.2 Types of probability sampling
1.4 Non-Probability sampling
1.4.1 Types of Non-probability sampling
1.5 Comparison of probability and non-probability sampling
1.5.1 Probability sampling
1.5.2 Non-probability sampling
1.6 Simple random sampling (SRS)
1.6.1 Key concepts
1.6.2 Advantages of simple random sampling
1.7 Types of simple random sampling
1.7.1 Simple random sampling with replacement (SRSWR)
1.7.2 Simple random sampling without replacement (SRSWOR)
1.7.3 SRSWR example
1.7.4 SRSWOR example
1.8 Applications
1.8.1 SRSWR
1.8.2 SRSWOR
1.9 Mathematical formulation
1.9.1 SRSWR
1.9.2 SRSWOR
1.10 Chapter Example
1.11 Chapter Exercises

2 Estimators, Accuracy, and Sample Size
2.1 Estimators, accuracy level and estimation of sample size
2.1.1 Estimator
2.1.2 Key properties
2.2 Accuracy Level
2.2.1 Key Concepts
2.3 Estimation of sample size
2.3.1 Key Concepts
2.4 Sample size estimation for means
2.4.1 Example
2.5 Sample size estimation for proportions
2.5.1 Example
2.6 Sample size estimation for comparing two means
2.6.1 Example
2.7 Sample size estimation for comparing two proportions
2.7.1 Example
2.8 Statistical power and effect size
2.8.1 Sample size estimation
2.9 Chapter Examples
2.10 Chapter Exercises

3 Sampling Methods
3.1 Stratified Random Sampling
3.1.1 Advantages of stratified sampling
3.1.2 Steps in stratified random sampling
3.1.3 Objective of optimal allocation
3.1.4 Neyman allocation formula
3.1.5 Benefits of Neyman allocation
3.1.6 Estimation of population mean and variance
3.1.7 Numerical example
3.2 Introduction to systematic sampling
3.2.1 Steps in systematic sampling
3.2.2 Advantages of systematic sampling
3.2.3 Disadvantages of systematic sampling
3.3 Cluster sampling
3.3.1 One-stage cluster sampling
3.3.2 Two-stage cluster sampling
3.3.3 Multi-stage cluster sampling
3.4 Criteria for choosing a sampling design
3.4.1 Research Objectives
3.4.2 Population Characteristics
3.4.3 Sampling frame
3.4.4 Resources
3.4.5 Desired precision and accuracy
3.4.6 Method of data collection
3.4.7 Statistical analysis
3.4.8 Example: Choosing a sampling design
3.5 Chapter Examples
3.6 Chapter Exercises

4 Project
4.1 Project Instructions
4.2 Sample Questionnaire
4.2.1 Sample Questionnaire

5 Diagrams
5.1 Systematic sampling diagram
5.2 Diagram of Cluster Sampling
5.3 Stratified Sampling Diagram
5.4 Diagram of Multi-stage sampling

Chapter 1

Introduction

1.1 Introduction to Sample Survey


Sample survey theory is a branch of statistics that deals with the methods and principles for selecting
and analyzing a sample in order to infer properties of a population.

• Sample survey theory and methods encompass a range of principles and techniques used to collect,
analyze, and interpret data from a subset (sample) of a larger population. The goal is to make
inferences about the population based on the sample data. This field is crucial in statistics, social
sciences, market research, public health, and many other domains.

• Sample survey theory and methods provide a scientific approach to collecting and analyzing data,
enabling researchers to draw valid conclusions about larger populations from smaller, manageable
samples.

1.1.1 Basic concepts


• Population: The entire group of individuals or items of interest.

• Sample: A subset of the population selected for measurement.

• Sampling Frame: A list or database from which the sample is drawn.

• Parameter: A numerical characteristic of the population (e.g., mean, proportion).

• Statistic: A numerical characteristic of the sample used to estimate the population parameter.

1.1.2 Sample Surveys Theory


1.1.3 Population
• The term "population" in statistics refers to the entire group of individuals or items that we are
interested in studying.

• A population can be finite or infinite.

• Examples:

– All the students in a university (finite population).
– All possible outcomes of rolling a die (finite population).
– All stars in the universe (so numerous that the population is treated as infinite in practice).

1.1.4 Types of populations


• Target population: The entire group about which we want to draw conclusions.

• Study population: The group from which we actually collect data, which should ideally be a
representative subset of the target population.

1.1.5 Sampling frame


• A sampling frame is a list or a device used to define a researcher’s population of interest. It is a
practical representation of the population from which a sample can be drawn.

• Examples:

– A list of registered voters in a city.


– A customer database of a retail store.
– An employee roster of a company.

1.1.6 Importance of sampling frames


• A complete, up-to-date frame gives every member of the population a chance to be included in the sample.

• Helps in minimizing selection bias.

• Provides a basis for defining the population of interest.

1.1.7 Challenges with sampling frames


• The sampling frame may not include all elements of the population.

• Some elements may be listed more than once.

• The sampling frame may not be up to date, leading to inaccuracies.

1.2 Uses, scope, and advantages of sample surveys


1.2.1 Uses of sample surveys
• To gather data on characteristics, behaviors, and opinions of a population.

• To estimate population parameters (e.g., mean, proportion).

• To test hypotheses and make inferences about a population.

• To monitor and evaluate programs and interventions.

1.2.2 Scope of sample surveys
• Descriptive surveys: Aim to describe the characteristics of a population at a given point in
time.

– Example: A survey to determine the average income of households in a city.

• Analytical surveys: Aim to explore relationships between variables and test hypotheses.

– Example: A survey to investigate the relationship between exercise habits and health outcomes.

• Exploratory surveys: Used for exploratory research to generate hypotheses.

– Example: A survey to identify potential factors affecting employee satisfaction.

• Causal surveys: Aim to identify cause-and-effect relationships.

– Example: A survey to assess the impact of a training program on employee performance.

1.2.3 Advantages of sample surveys


• Cost-effective: Surveys based on samples are usually less expensive than those based on entire
populations.

• Time-saving: Collecting data from a sample is quicker than from an entire population.

• Manageable data collection: Handling a smaller amount of data is more practical and manageable.

• Greater accuracy: If properly designed, sample surveys can provide more accurate and reliable
results than attempting to survey an entire population due to better quality control.

• Feasibility: Some populations are too large or inaccessible to survey in their entirety, making
sampling the only feasible option.

1.2.4 Key Considerations in sample surveys


• Sampling method: The method used to select the sample (e.g., random sampling, stratified
sampling) affects the validity of the survey results.

• Sample size: Larger sample sizes generally lead to more accurate estimates but also increase
costs.

• Non-response: Non-response can introduce bias if certain groups are underrepresented.

• Questionnaire design: The design of the survey questionnaire can significantly impact the
quality of the data collected.

1.3 Probability sampling
1.3.1 Definition
Probability sampling is a sampling technique where each member of the population has a known,
non-zero chance of being selected in the sample.

1.3.2 Types of probability sampling


Simple random sampling (SRS)
Every member of the population has an equal chance of being selected.

• Example: Selecting 10 students from a class of 50 by drawing names from a hat.

Systematic sampling
Every k-th member of the population is selected after a random start.

• Example: Selecting every 5th person on an alphabetical list of employees after starting at a
random point.

Stratified sampling
The population is divided into homogeneous subgroups (strata) and random samples are taken from
each stratum.

• Example: Dividing a population into male and female groups and then randomly selecting an
equal number of participants from each group.

Cluster sampling
The population is divided into clusters, some clusters are randomly selected, and all members of selected
clusters are included in the sample.

• Example: Dividing a city into blocks and then randomly selecting certain blocks, including all
households within those blocks.

Multistage sampling
A combination of different sampling methods used in various stages.

• Example: Using cluster sampling to select schools and then using simple random sampling to
select students within those schools.
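
The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee frame, the male/female split, the city blocks, and the function names `systematic_sample`, `stratified_sample`, and `cluster_sample` are all hypothetical, and a fixed seed is used so the run is reproducible.

```python
import random

def systematic_sample(frame, k, rng):
    """Select every k-th unit after a random start among the first k positions."""
    start = rng.randrange(k)              # random start: 0, 1, ..., k - 1
    return frame[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample (without replacement) from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, m, rng):
    """Select m clusters at random and keep every unit in the selected clusters."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(410)                  # fixed seed, hypothetical choice

# Hypothetical alphabetical list of 50 employees
employees = [f"E{i:02d}" for i in range(1, 51)]
sys_sample = systematic_sample(employees, k=5, rng=rng)

# Hypothetical strata: first 25 employees in one group, the rest in the other
strata = {"male": employees[:25], "female": employees[25:]}
strat_sample = stratified_sample(strata, n_per_stratum=4, rng=rng)

# Hypothetical city: 8 blocks with 5 households each
blocks = {f"block{b}": [f"block{b}-house{h}" for h in range(1, 6)] for b in range(1, 9)}
clus_sample = cluster_sample(blocks, m=2, rng=rng)

print(sys_sample)
print(strat_sample)
print(clus_sample)
```

Multistage sampling would chain these steps, for example by calling `cluster_sample` to pick schools and then drawing a simple random sample of students inside each selected school.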

1.4 Non-Probability sampling


Non-probability sampling is a sampling technique where some elements of the population have no chance
of being selected, or the probability of selection is unknown.

1.4.1 Types of Non-probability sampling
Convenience sampling
Samples are chosen based on their ease of access.

• Example: Surveying people at a shopping mall because it is easy to find participants.

Judgmental/Purposive sampling
Samples are selected based on the researcher’s knowledge and judgment.

• Example: Choosing experts in a field to participate in a study.

Quota sampling
The population is segmented into mutually exclusive sub-groups, and then a non-random set of
observations is chosen from each subgroup.

• Example: Ensuring that a survey includes a certain number of men and women based on their
proportion in the population.

Snowball sampling
Existing study subjects recruit future subjects from among their acquaintances.

• Example: A researcher studying drug abuse starts with known users who then refer other users.

1.5 Comparison of probability and non-probability sampling


1.5.1 Probability sampling
• Allows for generalization of results to the population.

• Reduces sampling bias.

• Requires a complete sampling frame.

• More complex and time-consuming.

1.5.2 Non-probability sampling


• Easier and faster to implement.

• Useful in exploratory research.

• Cannot reliably generalize results to the population.

• Higher risk of sampling bias.

1.6 Simple random sampling (SRS)
Simple Random Sampling (SRS) is a basic sampling technique where each element in the population
has an equal chance of being selected, so that every possible sample of a given size is equally likely.

1.6.1 Key concepts


• Population: The entire group of individuals or items that we are interested in studying.

• Sample: A subset of the population selected for observation and analysis.

• Sampling Frame: A list or representation of all elements in the population from which the
sample is drawn.

1.6.2 Advantages of simple random sampling


1. Simple random sampling ensures that every member of the population has an equal probability
of being included in the sample. This equal probability guards against selection bias, so the
sample is representative of the entire population on average.

2. The process of selecting a simple random sample is straightforward. It can be achieved using
random number generators or drawing lots, making it easier to implement compared to more
complex sampling methods.

3. Data analysis is simplified with simple random sampling. Since each observation is equally likely,
standard statistical techniques and formulas can be applied without the need for complex adjustments.

4. Simple random sampling tends to give more stable estimates of population parameters than
non-random sampling methods. Because every unit has the same selection probability, no part of
the population is systematically over- or under-represented, and the sampling variance can be
estimated from the sample itself.

5. Simple random sampling can be easily adapted to various types of data and research designs.
It does not require specific information about the population structure, making it versatile for
different research contexts.

6. Simple random sampling provides a foundation for more advanced statistical methods, including
stratified sampling and cluster sampling. Understanding simple random sampling is essential for
implementing and interpreting these more complex methods.

Simple random sampling is a fundamental method in statistics with several advantages, including
unbiased representation, ease of implementation, simplified analysis, stable estimates, flexibility, and its role
as a basis for advanced techniques. These advantages make it a preferred choice in many research and
data collection scenarios.

1.7 Types of simple random sampling
1.7.1 Simple random sampling with replacement (SRSWR)
• In SRSWR, each element in the population is returned to the population after it is selected. This
allows the same element to be selected more than once.

• Procedure:

1. Identify the population size N .


2. Determine the sample size n.
3. Randomly select an element from the population.
4. Record the selected element and return it to the population.
5. Repeat steps 3 and 4 until the sample size n is reached.

• Advantages:

– Each selection is independent of the others.


– Easier to handle mathematically, especially for theoretical work.

• Disadvantages:

– May include the same element multiple times, which might not be practical in some cases.

1.7.2 Simple random sampling without replacement (SRSWOR)


• In SRSWOR, each element in the population can be selected only once. Once selected, it is not
returned to the population.

• Procedure:

1. Identify the population size N .


2. Determine the sample size n.
3. Randomly select an element from the population.
4. Record the selected element and remove it from the population.
5. Repeat steps 3 and 4 until the sample size n is reached.

• Advantages:

– Ensures that each element in the sample is unique.


– More representative of the population if duplicates are not desirable.

• Disadvantages:

– Each selection is not independent, making the analysis slightly more complex.

1.7.3 SRSWR example
• Suppose we have a population of 5 students: A, B, C, D, E.

• We want to select a sample of 3 students with replacement.

• Randomly selected students could be A, C, C.

1.7.4 SRSWOR example


• Suppose we have the same population of 5 students: A, B, C, D, E.

• We want to select a sample of 3 students without replacement.

• Randomly selected students could be A, C, D.
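
The two examples can be reproduced with Python's standard `random` module. This is a minimal sketch: the seed is an arbitrary assumption made only for reproducibility, and the variable names `srswr` and `srswor` are illustrative.

```python
import random

population = ["A", "B", "C", "D", "E"]
n = 3
rng = random.Random(2024)   # arbitrary seed, assumed for reproducibility

# SRSWR: every draw is made from the full population, so repeats are possible.
srswr = [rng.choice(population) for _ in range(n)]

# SRSWOR: a drawn unit is not returned, so all sampled units are distinct.
srswor = rng.sample(population, n)

print("SRSWR :", srswr)
print("SRSWOR:", srswor)
```

`rng.choice` in a loop mirrors steps 3 to 5 of the SRSWR procedure, while `rng.sample` performs the draw-and-remove steps of the SRSWOR procedure in a single call.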

1.8 Applications
1.8.1 SRSWR
• Useful in scenarios where the population size is small, and repeated measurements are acceptable
(e.g., quality control processes).

1.8.2 SRSWOR
• Commonly used in surveys and opinion polls where duplicate selections are not desirable (e.g.,
market research).

1.9 Mathematical formulation


1.9.1 SRSWR
• Probability of selecting any specific element in one draw: $\frac{1}{N}$.

• For n draws, the probability remains $\frac{1}{N}$ for each draw, independent of previous draws.

In SRSWR, each member of the population is expected to appear in the sample $\frac{n}{N}$ times, and the
probability that it appears at least once is $1 - \left(1 - \frac{1}{N}\right)^n$, where n is the sample size and
N is the population size. The variance of the sample mean X̄ is given by:

\[
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n},
\]

where $\sigma^2$ is the population variance.

1.9.2 SRSWOR
• Probability of selecting any specific element on the first draw: $\frac{1}{N}$.

• Probability of selecting any specific remaining element on the second draw: $\frac{1}{N-1}$, and so on.

• The selection probability changes with each draw, making it dependent on previous selections.

In SRSWOR, the variance of the sample mean is adjusted for the fact that sampling without
replacement reduces the variance of the sample mean compared to sampling with replacement. The
variance is given by:

\[
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right),
\]

where $\sigma^2$ is the population variance (computed with divisor N), n is the sample size, and N is the population size.

Numerical example
Consider a population of size N = 100 with a population variance σ 2 = 25. Suppose we draw a sample
of size n = 10.

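For SRSWR
\[
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{25}{10} = 2.5
\]

For SRSWOR
\[
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right)
= \frac{25}{10} \left( \frac{100 - 10}{100 - 1} \right) = 2.5 \times \frac{90}{99} \approx 2.2727
\]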
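
The same arithmetic can be checked with a short Python sketch. The function names `var_mean_srswr` and `var_mean_srswor` are illustrative, not taken from any library.

```python
def var_mean_srswr(sigma2, n):
    """Variance of the sample mean under sampling with replacement."""
    return sigma2 / n

def var_mean_srswor(sigma2, n, N):
    """Variance of the sample mean under sampling without replacement.

    Multiplies the SRSWR variance by the finite population
    correction (N - n) / (N - 1).
    """
    return (sigma2 / n) * (N - n) / (N - 1)

wr = var_mean_srswr(sigma2=25, n=10)
wor = var_mean_srswor(sigma2=25, n=10, N=100)

print(wr)              # 2.5
print(round(wor, 4))   # 2.2727
```

The correction factor approaches 1 when N is much larger than n, so the two designs give nearly the same variance for small sampling fractions.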

1.10 Chapter Example


A researcher wants to conduct a survey to estimate the average income of residents in a city. The
total population of the city is 100,000 residents. The researcher creates a sampling frame consisting of
10,000 residents randomly selected from a city database.
If the researcher randomly selects a sample of 200 residents from this sampling frame, what is the
probability that a randomly chosen resident from the sampling frame is included in the sample?

Solution
To determine the probability that a randomly chosen resident from the sampling frame is included in
the sample, we use the formula for the probability of inclusion in a simple random sample. Given:

• Total number of residents in the sampling frame (N) = 10,000

• Number of residents in the sample (n) = 200

The probability of inclusion for a randomly chosen resident in the sampling frame is given by:
\[
\text{Probability} = \frac{n}{N} \tag{1.1}
\]
Substitute the given values:
\[
\text{Probability} = \frac{200}{10{,}000} = 0.02 \tag{1.2}
\]
So, the probability that a randomly chosen resident from the sampling frame is included in the sample
is 0.02, or 2%.
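
The ratio n/N can be justified by counting samples. The derivation below assumes the 200 residents are drawn without replacement, which the example does not state explicitly; the symbols $s$ (the selected sample) and $i$ (a fixed resident) are introduced here for illustration. Of the $\binom{N}{n}$ equally likely samples, those containing resident $i$ are formed by choosing the other $n-1$ members from the remaining $N-1$ residents.

```latex
\[
P(i \in s)
  = \frac{\binom{N-1}{n-1}}{\binom{N}{n}}
  = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!}
  = \frac{n}{N},
\qquad
\frac{200}{10{,}000} = 0.02 .
\]
```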

1.11 Chapter Exercises
1. Simple random sampling: A population consists of 1000 individuals. A simple random sample of
100 individuals is selected. What is the probability that a particular individual is included in the
sample?

2. Population mean: The weights (in kg) of a population of 5 individuals are as follows: 60, 65, 70,
75, and 80. Calculate the population mean.

3. Sample mean: From the population given in Question 2, a simple random sample of 3 individuals
is drawn. The weights of the selected individuals are 60, 70, and 75. Calculate the sample mean.

4. Standard error of the mean: A population has a mean (µ) of 50 and a standard deviation (σ) of
10. If a sample of 25 individuals is selected, what is the standard error of the mean?

5. Population Size and Sample Selection: Suppose a researcher wants to conduct a survey on the
health habits of university students in a large city. The city has 50,000 university students. The
researcher decides to select a sample of 500 students. What is the sampling fraction for this
study?

6. Sampling frame accuracy: A researcher is studying the impact of a new teaching method on high
school students. The sampling frame consists of 1,200 high school students, but it is estimated
that 10% of the sampling frame is outdated or incorrect. If the researcher randomly selects
a sample of 100 students from this frame, what is the expected number of students from the
sampling frame who may be outdated or incorrect?

7. Estimation of population parameter: A random sample of 200 households is taken from a city
with 10,000 households. If the sample mean for monthly grocery expenditure is $400 with a
sample standard deviation of $50, estimate the total monthly grocery expenditure for the entire
population of households in the city. Assume the sample is representative of the population.

Chapter 2

Estimators, Accuracy, and Sample Size

2.1 Estimators, accuracy level and estimation of sample size


2.1.1 Estimator
An estimator is a rule or formula that tells us how to calculate an estimate of a population parameter
based on sample data.

2.1.2 Key properties


Unbiasedness
An estimator is unbiased if its expected value is equal to the true value of the population parameter.

• Example: The sample mean (x̄) is an unbiased estimator of the population mean (µ).

Consistency
An estimator is consistent if, as the sample size increases, the estimator converges in probability to the
true value of the parameter.

• Example: The sample mean (x̄) becomes closer to the population mean (µ) as the sample size
increases.

Efficiency
An estimator is efficient if it has the smallest variance among all unbiased estimators of the parameter.

• Example: When sampling from a normal distribution, the sample mean (x̄) has the smallest
variance among all unbiased estimators of the population mean (µ).

Sufficiency
An estimator is sufficient if it captures all the information about the parameter contained in the sample.

• Example: The sample mean (x̄) is a sufficient estimator for the population mean (µ) when
sampling from a normal distribution.

Robustness
An estimator is robust if it remains relatively unaffected by small deviations from model assumptions.
• Example: The median is a robust estimator of central tendency, especially in the presence of
outliers.

Examples of estimators
• Mean (x̄): Used to estimate the population mean (µ)

• Proportion (p̂): Used to estimate the population proportion (p)

• Variance ($s^2$): Used to estimate the population variance ($\sigma^2$)
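
As a small illustration, the three estimators can be computed with Python's standard `statistics` module. The data are hypothetical values made up for this sketch; the last lines also illustrate the robustness property by adding one extreme value and comparing how far the mean and the median move.

```python
import statistics

# Hypothetical sample data (not from the notes)
heights = [160, 165, 170, 172, 175, 180]     # heights in cm
prefers_online = [1, 0, 1, 1, 0, 1]          # 1 = yes, 0 = no

x_bar = statistics.mean(heights)             # estimates the population mean
s2 = statistics.variance(heights)            # divisor n - 1, estimates the population variance
p_hat = sum(prefers_online) / len(prefers_online)   # estimates the population proportion

# Robustness: add one extreme (hypothetical) value and compare the shifts
with_outlier = heights + [250]
mean_shift = statistics.mean(with_outlier) - x_bar
median_shift = statistics.median(with_outlier) - statistics.median(heights)

print(x_bar, s2, p_hat)
print(mean_shift, median_shift)
```

Note that `statistics.variance` uses the divisor n - 1, whereas `statistics.pvariance` uses the divisor n and is meant for a complete population.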

2.2 Accuracy Level


Accuracy level refers to how close an estimate is to the true value of the population parameter. It is
often quantified using the concepts of bias and variance.

2.2.1 Key Concepts


Bias
The difference between the expected value of the estimator and the true value of the parameter.
• Low Bias: Indicates that the estimator is accurate on average.

Variance
The variability of the estimator due to random sampling.
• Low Variance: Indicates that the estimator is stable and provides similar results for different
samples.

Mean squared error (MSE)


Combines bias and variance to provide a single measure of accuracy.

\[ \mathrm{MSE}(\hat{\theta}) = \mathrm{Bias}^2(\hat{\theta}) + \mathrm{Var}(\hat{\theta}) \]
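
The decomposition can be verified exactly on a tiny population by listing every possible sample. The sketch below uses the five weights from Exercise 2 of Section 1.11 and assumes SRSWOR with n = 3, so each of the 10 possible samples is equally likely. The helper names `avg` and `bias_variance_mse` are illustrative, and the sample maximum is included only as a deliberately biased estimator of the mean.

```python
from itertools import combinations

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def bias_variance_mse(estimator, population, n, target):
    """Exact bias, variance and MSE of an estimator under SRSWOR.

    All samples of size n are equally likely, so expectations are
    plain averages over every possible sample.
    """
    estimates = [estimator(s) for s in combinations(population, n)]
    expected = avg(estimates)
    bias = expected - target
    variance = avg((e - expected) ** 2 for e in estimates)
    mse = avg((e - target) ** 2 for e in estimates)
    return bias, variance, mse

population = [60, 65, 70, 75, 80]     # weights in kg
mu = avg(population)                  # population mean

results = {
    "sample mean": bias_variance_mse(avg, population, 3, mu),
    "sample maximum": bias_variance_mse(max, population, 3, mu),
}

for name, (bias, variance, mse) in results.items():
    print(name, round(bias, 4), round(variance, 4), round(mse, 4),
          round(bias ** 2 + variance, 4))
```

For the sample mean the bias is zero, so the MSE equals the variance; for the sample maximum the squared bias dominates the MSE.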

2.3 Estimation of sample size


• Estimating sample size is a crucial step in the design of any survey or experiment. It involves
determining the number of observations or replicates necessary to achieve a desired level of accuracy
and confidence in the results.

• The required sample size depends on the research objectives, the variability in the population,
the precision required for the estimates, the desired confidence level, and, when hypotheses are
tested, the power of the test and the effect size. An adequate sample size ensures that the study
has enough power to detect a statistically significant effect if one exists.

• In this lecture, we will discuss the different methods for estimating sample sizes for various
types of data and study designs.

2.3.1 Key Concepts


Margin of error (E)
The maximum acceptable difference between the sample estimate and the true population parameter.
• Formula: $E = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$

Confidence level (1 - α)
The probability that the interval estimate will contain the true parameter.
• Common confidence levels are 90%, 95%, and 99%.

Sample size formula for mean


\[
n = \left( \frac{Z_{\alpha/2} \times \sigma}{E} \right)^2
\]
where $Z_{\alpha/2}$ is the critical value from the standard normal distribution, $\sigma$ is the population standard
deviation, and E is the margin of error.

Sample size formula for proportion


\[
n = \frac{Z_{\alpha/2}^2 \times p(1 - p)}{E^2}
\]
where p is the estimated population proportion.

2.4 Sample size estimation for means


To estimate the sample size for a study where the goal is to estimate the population mean, we can use
the following formula:
\[
n = \left( \frac{Z_{\alpha/2} \cdot \sigma}{E} \right)^2
\]
where:
• n is the required sample size.
• $Z_{\alpha/2}$ is the critical value of the normal distribution at the desired confidence level (e.g., 1.96 for
95% confidence).
• $\sigma$ is the population standard deviation.
• E is the desired margin of error.

2.4.1 Example
Suppose we want to estimate the average height of university students with a 95% confidence level and
a margin of error of 2 cm. If the population standard deviation is known to be 10 cm, the required
sample size is:
\[
n = \left( \frac{1.96 \cdot 10}{2} \right)^2 = 96.04
\]
Therefore, we need a sample size of at least 97 students.
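
The calculation, including the rounding-up step, can be sketched in Python as follows. The function name `sample_size_mean` is illustrative. The result is rounded up rather than to the nearest integer because a smaller sample would give a margin of error larger than E.

```python
import math

def sample_size_mean(z, sigma, E):
    """Return the raw and rounded-up sample size for estimating a mean."""
    n_raw = (z * sigma / E) ** 2
    return n_raw, math.ceil(n_raw)

n_raw, n = sample_size_mean(z=1.96, sigma=10, E=2)
print(round(n_raw, 2), n)   # 96.04 97

# Halving the margin of error roughly quadruples the required sample size
_, n_half = sample_size_mean(z=1.96, sigma=10, E=1)
print(n_half)               # 385
```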

2.5 Sample size estimation for proportions


For estimating a population proportion, the sample size can be estimated using the formula:
\[
n = \frac{Z_{\alpha/2}^2 \cdot p \cdot (1 - p)}{E^2}
\]
where:
• n is the required sample size.
• $Z_{\alpha/2}$ is the critical value of the normal distribution at the desired confidence level.
• p is the estimated population proportion.
• E is the desired margin of error.

2.5.1 Example
Suppose we want to estimate the proportion of university students who prefer online classes with a
95% confidence level and a margin of error of 5%. If we expect the proportion to be around 50%, the
required sample size is:
\[
n = \frac{1.96^2 \cdot 0.5 \cdot (1 - 0.5)}{0.05^2} = 384.16
\]
Therefore, we need a sample size of at least 385 students.
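
A short Python sketch of the same formula is given below. The function name `sample_size_proportion` and the grid of trial proportions are illustrative assumptions. The loop shows why p = 0.5 is the usual conservative choice when no prior estimate is available: the product p(1 - p) is largest there, so it gives the largest required sample.

```python
import math

def sample_size_proportion(z, p, E):
    """Raw (unrounded) sample size for estimating a proportion."""
    return z ** 2 * p * (1 - p) / E ** 2

n_raw = sample_size_proportion(z=1.96, p=0.5, E=0.05)
n = math.ceil(n_raw)
print(round(n_raw, 2), n)   # 384.16 385

# Required sample size for several hypothetical values of p
sizes = {p: math.ceil(sample_size_proportion(1.96, p, 0.05))
         for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(sizes)
```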

2.6 Sample size estimation for comparing two means


When comparing the means of two independent groups, the sample size can be estimated using the
formula:
\[
n = \frac{2 \cdot (Z_{\alpha/2} + Z_{\beta})^2 \cdot \sigma^2}{\Delta^2}
\]
where:
• n is the sample size for each group.
• $Z_{\alpha/2}$ is the critical value of the normal distribution at the desired confidence level.
• $Z_{\beta}$ is the critical value for the desired power (e.g., 0.84 for 80% power).
• $\sigma$ is the pooled standard deviation of the two groups.
• $\Delta$ is the minimum detectable difference between the two means.

2.6.1 Example
Suppose we want to compare the average exam scores of two teaching methods with a 95% confidence
level and 80% power. If the pooled standard deviation is 15 points and we want to detect a difference
of 5 points, the required sample size for each group is:

\[ n = \frac{2 \cdot (1.96 + 0.84)^2 \cdot 15^2}{5^2} = \frac{2 \cdot 7.84 \cdot 225}{25} = 141.12 \]
Therefore, we need a sample size of at least 142 students in each group.
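The two-means formula can be checked numerically with a sketch like the one below. The function name `sample_size_two_means` is hypothetical, and the quantiles are computed exactly rather than rounded to 1.96 and 0.84, so intermediate values differ slightly from a hand calculation.

```python
import math
from statistics import NormalDist


def sample_size_two_means(sigma: float, delta: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided comparison of two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance-level quantile
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Exam-score example: pooled SD of 15 points, detectable difference of 5 points.
print(sample_size_two_means(sigma=15, delta=5))
```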

2.7 Sample size estimation for comparing two proportions


For comparing two proportions, the sample size can be estimated using the formula:

\[ n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 \cdot (p_1 (1 - p_1) + p_2 (1 - p_2))}{(p_1 - p_2)^2} \]

where:

• n is the sample size for each group.

• Zα/2 is the critical value of the normal distribution at the desired confidence level.

• Zβ is the critical value for the desired power.

• p1 and p2 are the estimated proportions in each group.

• p1 − p2 is the minimum detectable difference between the two proportions.

2.7.1 Example
Suppose we want to compare the proportion of smokers between two age groups with a 95% confidence
level and 80% power. If we expect the proportions to be 20% and 30% in the two groups, the required
sample size for each group is:

\[ n = \frac{(1.96 + 0.84)^2 \cdot (0.2 \cdot (1 - 0.2) + 0.3 \cdot (1 - 0.3))}{(0.2 - 0.3)^2} = \frac{7.84 \cdot 0.37}{0.01} = 290.08 \]

Therefore, we need a sample size of at least 291 participants in each group.
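As a cross-check of the two-proportion formula, here is a minimal Python sketch. The name `sample_size_two_proportions` is an assumption for illustration, and exact normal quantiles are used in place of the rounded values 1.96 and 0.84.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for detecting p1 vs p2 with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Smoking example: expected proportions of 20% and 30%.
print(sample_size_two_proportions(0.2, 0.3))
```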

2.8 Statistical power and effect size


Statistical power, effect size, and sample size are interrelated concepts crucial in designing experiments
and studies. Understanding these concepts helps ensure that a study is well-designed to detect
meaningful effects if they exist.

Statistical power
Statistical power is the probability that a test correctly rejects a false null hypothesis, i.e., the
probability of avoiding a Type II error (1 − β).

• Importance: High power reduces the risk of concluding that there is no effect when one actually
exists.

• Typical value: Researchers often aim for a power of 0.80 or 80%, which means there’s an 80%
chance of detecting an effect if it exists.

Effect size
Effect size is a measure of the strength of the relationship between two variables or the magnitude of
the difference between groups.

Types of effect size


• Cohen’s d: Used for measuring the effect size between two means.

• Pearson’s r: Used for measuring the strength of the correlation between two variables.

• Odds ratio: Used in logistic regression to measure the association between an exposure and an
outcome.

Importance: Effect size helps to convey the practical significance of a study's findings, not just the
statistical significance.

2.8.1 Sample size estimation


Purpose: To determine the number of observations or subjects needed to detect an effect of a given
size with a certain level of confidence.

Factors influencing sample size:


• Effect size: Larger effects are easier to detect, requiring smaller sample sizes.

• Significance level (α): The probability of rejecting the null hypothesis when it is true (Type I
error). Common values are 0.05 or 0.01.

• Power (1 − β): As discussed, higher power requires a larger sample size.

• Variability: More variability in the data requires a larger sample size to detect the effect.
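The interplay between these factors can be illustrated with a small power calculation. The sketch below uses the normal approximation for a two-sided, two-sample comparison and ignores the negligible rejection probability in the opposite tail; the function name `power_two_sample` is hypothetical.

```python
import math
from statistics import NormalDist


def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a standardized effect d with n units per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # The standardized difference in means has standard error sqrt(2 / n).
    return NormalDist().cdf(d * math.sqrt(n_per_group / 2) - z_alpha)


# Power grows with the sample size and with the effect size.
for n in (20, 63, 100):
    print(n, round(power_two_sample(d=0.5, n_per_group=n), 3))
```

Lowering α raises the critical value `z_alpha` and therefore reduces power for a fixed sample size, which is why stricter significance levels call for larger samples.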

Table 2.1: Type I and Type II errors

                        Accept H0                   Reject H0
If H0 is correct        Correct decision (1 − α)    Type I error (α)
If H0 is not correct    Type II error (β)           Correct decision (1 − β)

Example: Estimating sample size for a two-Sample t-Test


We want to compare the mean blood pressure between two groups (treatment and control).
Parameters:

• Effect size (Cohen’s d): 0.5 (moderate effect)

• Significance level (α) : 0.05

• Power (1 − β) : 0.80

Using a standard sample size formula for a two-sample t-test:

\[ n = \frac{2 \sigma^2 (Z_{\alpha/2} + Z_{\beta})^2}{\Delta^2} \]
where:

• σ 2 is the variance (assumed to be equal for both groups),

• Zα/2 and Zβ are the z-values corresponding to the significance level and power, respectively,

• ∆ is the difference in means.

Example: Estimating sample size for proportion testing


We want to estimate the proportion of a population that supports a new policy.
Parameters:

• Estimated proportion (p): 0.50 (conservative estimate)

• Margin of error (E): 0.05

• Significance level (α): 0.05

Using the sample size formula for a proportion:


\[ n = \frac{Z_{\alpha/2}^2 \cdot p \cdot (1 - p)}{E^2} \]

2.9 Chapter Examples
Question 1: Properties of estimators
A researcher is using a sample mean (X̄) to estimate the population mean (µ). Suppose the sample
mean from a random sample of 50 observations is 75, and the population standard deviation is 10.

1. Calculate the standard error of the sample mean.

2. What is the 95% confidence interval for the population mean?

Solution 1
1. The standard error (SE) of the sample mean is calculated as:
\[ SE = \frac{\sigma}{\sqrt{n}} \]

where σ is the population standard deviation and n is the sample size.


\[ SE = \frac{10}{\sqrt{50}} \approx 1.414 \]

2. The 95% confidence interval is given by:

X̄ ± Zα/2 × SE

For a 95% confidence level, Zα/2 ≈ 1.96:

CI = 75 ± 1.96 × 1.414 ≈ (72.23, 77.77)
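The standard error and interval above can be reproduced with a few lines of Python. This is an illustrative sketch; `mean_confidence_interval` is a hypothetical helper name, and it assumes a z-based interval with a known (or well-estimated) standard deviation.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(xbar: float, sd: float, n: int,
                             confidence: float = 0.95) -> tuple[float, float]:
    """z-based confidence interval for a mean: xbar +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sd / math.sqrt(n)  # margin of error
    return xbar - half_width, xbar + half_width


low, high = mean_confidence_interval(xbar=75, sd=10, n=50)
print(round(low, 2), round(high, 2))
```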

Question 2: Accuracy level


A company wants to estimate the average time taken by their employees to complete a task at a
90% confidence (accuracy) level. If the desired margin of error is 5 minutes and the population standard deviation
is known to be 20 minutes, what should be the minimum sample size needed?

Solution 2
The sample size (n) required for a given margin of error (E) and confidence level can be calculated
using:
\[ n = \left( \frac{Z_{\alpha/2} \times \sigma}{E} \right)^2 \]
For a 90% confidence level, Zα/2 ≈ 1.645:
\[ n = \left( \frac{1.645 \times 20}{5} \right)^2 = 6.58^2 \approx 43.30 \]

So, the minimum sample size needed is 44 (rounded up).

Question 3: Estimation of sample size
A researcher wants to estimate the proportion of voters supporting a particular candidate in an election
with a margin of error of 0.03 and a confidence level of 95%. If the estimated proportion is 0.6, what
is the required sample size?

Solution 3
The sample size for estimating proportions can be calculated using:
\[ n = \frac{p (1 - p) \times Z_{\alpha/2}^2}{E^2} \]
where p is the estimated proportion, E is the margin of error, and Zα/2 is the Z-value for the confidence
level. For a 95% confidence level, Zα/2 ≈ 1.96:
\[ n = \frac{0.6 \times (1 - 0.6) \times 1.96^2}{0.03^2} = \frac{0.921984}{0.0009} \approx 1024.43 \]
So, rounding up, the required sample size is 1,025.

Question 4: Estimators and accuracy level


A researcher wants to estimate the mean income of households in a city. A sample of 100 households is
taken, and the sample mean income is found to be $50,000 with a sample standard deviation of $8,000.
Calculate the 95% confidence interval for the population mean income.

Solution 4
To calculate the 95% confidence interval for the population mean, we use the formula:
\[ \bar{x} \pm z \left( \frac{\sigma}{\sqrt{n}} \right) \]
where:
• x̄ is the sample mean,
• z is the z-value for the 95% confidence level (1.96),
• σ is the sample standard deviation,
• n is the sample size.
Given: x̄ = 50,000, σ = 8,000, n = 100, z = 1.96
The margin of error (E) is:
\[ E = 1.96 \left( \frac{8000}{\sqrt{100}} \right) = 1.96 \times 800 = 1568 \]
Thus, the 95% confidence interval is:
50,000 ± 1,568
So, the 95% confidence interval for the population mean income is:
(48,432, 51,568)

Question 5: Estimation of sample size
A survey is being planned to estimate the proportion of voters who support a new policy. The desired
margin of error is 3%, and the confidence level is 95%. Estimate the minimum sample size required if
the estimated proportion is 0.5.

Solution 5
The formula to estimate the required sample size for a proportion is:

\[ n = \frac{z^2 \cdot p \cdot (1 - p)}{E^2} \]
where:

• z is the z-value for the 95% confidence level (1.96),

• p is the estimated proportion,

• E is the margin of error.

Given: z = 1.96, p = 0.5 and E = 0.03


The sample size (n) is:

\[ n = \frac{1.96^2 \cdot 0.5 \cdot 0.5}{0.03^2} = \frac{3.8416 \cdot 0.25}{0.0009} = \frac{0.9604}{0.0009} \approx 1067.11 \]
Since the sample size must be an integer, we round up to the next whole number:

n ≈ 1068

Thus, the minimum sample size required is 1068.

Question 6: Power and effect size


Suppose you want to conduct a study to compare the means of two independent groups using a t-test.
You expect a medium effect size (Cohen’s d = 0.5) and you aim for a statistical power of 0.80 with a
significance level of 0.05. Calculate the required sample size for each group.

Solution 6
To estimate the sample size, we use the following formula for the two-sample t-test:
\[ n = \frac{2 (Z_{\alpha/2} + Z_{\beta})^2 \sigma^2}{d^2} \]
where:

• n is the sample size per group,

• Zα/2 is the Z-score for the significance level,

• Zβ is the Z-score for the statistical power,

• σ is the standard deviation of the outcome,

• d is the effect size (Cohen’s d).

Given:

• α = 0.05, so Zα/2 = 1.96 (for a two-tailed test),

• Power = 0.80, so Zβ = 0.84,

• d = 0.5,

• Assume σ = 1 (standardized effect size).

Plugging in the values:

\[ n = \frac{2 (1.96 + 0.84)^2 \times 1^2}{0.5^2} = \frac{2 \times 7.84}{0.25} = 62.72 \]
You should round up to the next whole number, so n ≈ 63. Each group should have 63 participants.

Question 7: Power and effect size


You are planning a study to compare the proportion of individuals with a certain characteristic between
two groups. You expect a small effect size (proportion difference p1 − p2 = 0.1) and want a power of 0.90 with a
significance level of 0.01. Calculate the required sample size for each group.

Solution 7
For estimating sample size for comparing two proportions, we use the following formula:
\[ n = \frac{(Z_{\alpha/2} + Z_{\beta})^2 [p_1 (1 - p_1) + p_2 (1 - p_2)]}{(p_1 - p_2)^2} \]
where:

• n is the sample size per group,

• Zα/2 is the Z-score for the significance level,

• Zβ is the Z-score for the statistical power,

• p1 and p2 are the proportions,

• p1 − p2 is the effect size (proportion difference).

Given:

• α = 0.01, so Zα/2 = 2.58 (for a two-tailed test),

• Power = 0.90, so Zβ = 1.28,

• p1 − p2 = 0.1,

• Assume p1 = 0.5 and p2 = 0.4.


Plugging in the values:

\[ n = \frac{(2.58 + 1.28)^2 [0.5 \times (1 - 0.5) + 0.4 \times (1 - 0.4)]}{0.1^2} \]
\[ n = \frac{14.8996 \times [0.25 + 0.24]}{0.01} = \frac{14.8996 \times 0.49}{0.01} \approx 730.08 \]
You should round up to the next whole number, so n = 731. Each group should have 731 participants.

2.10 Chapter Exercises


1. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with mean µ and variance σ².
Consider the estimator $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i$. Is $\hat{\mu}$ an unbiased estimator of µ? Justify your answer.

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution $N(\mu, \sigma^2)$. Show that the
variance of the sample mean $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is $\frac{\sigma^2}{n}$.

3. A random sample of size 25 is drawn from a normal population with an unknown mean µ and a
known standard deviation σ = 10. The sample mean is found to be X̄ = 50. Construct a 95%
confidence interval for µ.

4. You wish to estimate the mean cholesterol level in a population with a margin of error of 5 mg/dL
and a confidence level of 95%. The population standard deviation is known to be 20 mg/dL.
What sample size is required?

5. Define the consistency of an estimator and prove whether the sample mean $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is a
consistent estimator of the population mean µ.

6. A researcher wants to estimate the proportion of voters in a large city who support a particular
candidate. She wants the estimate to be within 3% of the true proportion with 95% confidence.
What sample size is needed if no preliminary estimate of the proportion is available?

7. Suppose we want to estimate the mean income of a population. From previous studies, we know
that the standard deviation of income in the population is σ = $10,000. We want our estimate
to be within $1,000 of the true mean with 95% confidence.

(a) What sample size is needed to achieve this level of accuracy?


(b) If we increase our desired confidence level to 99%, how does the required sample size change?

8. Power and effect size: A researcher wants to conduct a study to compare the means of two
independent groups. The researcher aims for a statistical power of 0.80 and expects a medium
effect size (Cohen’s d = 0.50). The significance level (α) is set at 0.05. Calculate the required
sample size for each group to detect the effect with the desired power.

9. A pharmaceutical company is testing the effectiveness of a new drug. They want to estimate the
proportion of patients who experience a positive effect from the drug. They desire a margin of
error of 3% at a 95% confidence level.

(a) What sample size should they use if they have no prior estimate for the proportion?
(b) How does the sample size change if they estimate the proportion to be around 0.5 based on
preliminary studies?

Chapter 3

Sampling Methods

3.1 Stratified Random Sampling


Stratified random sampling is a method of sampling that involves dividing a population into distinct
subgroups, known as strata, that are homogeneous with respect to certain characteristics. Each stratum
is then sampled independently using simple random sampling. The main objective of stratified random
sampling is to increase the precision of the sample estimates by reducing the variability within each
stratum.

3.1.1 Advantages of stratified sampling


• Increased precision: Stratified sampling often provides more precise estimates of population
parameters compared to simple random sampling (SRS), especially when strata are homogeneous
within themselves but differ from each other.

• Improved representation: Ensures that each stratum is represented in the sample, providing a
more accurate representation of the entire population.

• Reduced variability: Reduces variability within each stratum, leading to more reliable estimates.

• Focused analysis: Allows for separate analysis of different strata, useful for understanding specific
subgroups.

• Cost-effective: Can be more cost-effective than SRS if data collection is more efficient within
strata.

3.1.2 Steps in stratified random sampling


1. Define the population: Clearly identify the population from which the sample is to be drawn.

2. Determine the strata: Divide the population into non-overlapping subgroups (strata) based on
specific characteristics relevant to the study.

3. Determine the sample size: Decide the total sample size and allocate samples to each stratum.
This can be done proportionally or using optimal allocation.

4. Select samples from each stratum: Use simple random sampling to select the required number of
samples from each stratum.

5. Combine the Samples: Combine the samples from all strata to form the final sample.
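Step 3 mentions proportional allocation, in which each stratum receives a share of the sample equal to its share of the population, n_h = n · N_h / N. A minimal sketch is given below; the function name `proportional_allocation` and the stratum sizes in the usage line are illustrative assumptions.

```python
def proportional_allocation(stratum_sizes: list[int], n: int) -> list[int]:
    """Allocate a total sample of n to strata in proportion to their sizes."""
    total = sum(stratum_sizes)
    # Rounded shares may not sum exactly to n and can need a manual adjustment.
    return [round(n * size / total) for size in stratum_sizes]


# Illustrative strata of 500, 300 and 200 units with a total sample of 100.
print(proportional_allocation([500, 300, 200], n=100))
```

Proportional allocation ignores differences in variability between strata; optimal (Neyman) allocation, discussed next, takes them into account.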

3.1.3 Objective of optimal allocation


The goal of optimal allocation is to determine the sample size for each stratum in a way that minimizes
the total variance of the estimator of the population mean. Neyman’s method provides a way to allocate
samples optimally based on the standard deviation within each stratum.

3.1.4 Neyman allocation formula


Neyman’s allocation method allocates samples to each stratum in proportion to both the stratum’s
size and the stratum’s standard deviation. The formula for calculating the optimal sample size ni for
stratum i is:
\[ n_i = \frac{N_i S_i}{\sum_{j=1}^{L} N_j S_j} \cdot n \]

where:
• Ni = Population size of stratum i

• Si = Standard deviation of stratum i


• $\sum_{j=1}^{L} N_j S_j$ = Sum of products of population size and standard deviation across all strata

• n = Total sample size

Steps for optimal allocation


1. Determine Population Parameters: Obtain the size Ni and standard deviation Si for each stratum.

2. Calculate Products Ni Si : For each stratum, calculate the product of its population size and its
standard deviation.

3. Compute the Sum: Calculate the total sum of all products $\sum_{j=1}^{L} N_j S_j$.

4. Calculate Sample Sizes: Use Neyman’s formula to compute the sample size for each stratum.

5. Apply Rounding: Since sample sizes must be integers, round the results to the nearest whole
number if necessary.
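The steps above can be sketched in Python. The rounding rule used here (round every stratum down, then hand the leftover units to the strata with the largest fractional parts so that the total equals n) is an assumption added for illustration; the notes only say to round to a whole number. The function name `neyman_allocation` is likewise hypothetical.

```python
import math


def neyman_allocation(sizes: list[int], sds: list[float], n: int) -> list[int]:
    """Allocate n units across strata in proportion to N_i * S_i."""
    weights = [size * sd for size, sd in zip(sizes, sds)]   # N_i * S_i
    total = sum(weights)                                    # sum of N_j * S_j
    raw = [n * w / total for w in weights]                  # unrounded n_i
    alloc = [math.floor(r) for r in raw]
    leftover = n - sum(alloc)
    # Assumed rule: leftover units go to the largest fractional remainders.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


print(neyman_allocation([500, 300, 200], [10_000, 8_000, 12_000], n=100))
```

Rounding each stratum independently to the nearest integer can produce a total that differs from n, which is why the leftover units are redistributed explicitly.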

Example
Consider a population divided into three strata with the following characteristics, and a total sample
size of n = 100.

Stratum    Population Size Ni    Mean Income X̄i    Standard Deviation Si
1          500                   45,000             10,000
2          300                   55,000             8,000
3          200                   60,000             12,000

Step-by-Step Solution:
1. Compute Ni Si:

• Stratum 1: 500 × 10,000 = 5,000,000

• Stratum 2: 300 × 8,000 = 2,400,000

• Stratum 3: 200 × 12,000 = 2,400,000

2. Calculate Sum:
\[ \sum_{j=1}^{3} N_j S_j = 5{,}000{,}000 + 2{,}400{,}000 + 2{,}400{,}000 = 9{,}800{,}000 \]

3. Compute Sample Sizes:

• For Stratum 1:
\[ n_1 = \frac{500 \times 10{,}000}{9{,}800{,}000} \times 100 \approx 51.02 \]
• For Stratum 2:
\[ n_2 = \frac{300 \times 8{,}000}{9{,}800{,}000} \times 100 \approx 24.49 \]
• For Stratum 3:
\[ n_3 = \frac{200 \times 12{,}000}{9{,}800{,}000} \times 100 \approx 24.49 \]
Rounding to the nearest whole number gives 51, 24, and 24, which sums to 99 rather than 100. The
remaining unit can go to either Stratum 2 or Stratum 3, since their fractional parts are equal. Thus, the
optimal sample sizes are approximately 51 for Stratum 1, 25 for Stratum 2, and 24 for Stratum 3.

3.1.5 Benefits of Neyman allocation


• Efficiency: Reduces variance of the estimator by allocating more samples to strata with higher
variability.
• Accuracy: Provides a more accurate estimate of the population parameters by reflecting the
variability within each stratum.

Considerations
• Accuracy of Estimates: Ensure the estimates of standard deviations are as accurate as possible.
• Practical Constraints: The theoretical optimal sample sizes may need adjustment based on
practical constraints such as budget or resources.

Conclusion
Optimal allocation using Neyman's method enhances the precision of estimates in stratified sampling
by allocating more of the sample to larger and more variable strata. It is a powerful tool in survey design
and analysis. More general optimal allocation also accounts for the cost of sampling within each stratum,
in addition to its size and variability.

3.1.6 Estimation of population mean and variance
Estimation of population mean
The population mean µ is estimated by:

\[ \hat{\mu} = \frac{1}{N} \sum_{h=1}^{L} N_h \bar{X}_h \]

where:

• L is the number of strata.

• Nh is the population size of the h-th stratum.

• X̄h is the mean of the h-th stratum.



• N is the total population size, $N = \sum_{h=1}^{L} N_h$.

Estimation of population variance


The variance of the stratified sample mean is:

\[ \mathrm{Var}(\hat{\mu}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h^2 \frac{\sigma_h^2}{n_h} \]

where:

• σh2 is the variance within the h-th stratum.

• nh is the sample size from the h-th stratum.
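The two estimators above can be sketched as follows, using the stratum figures from the numerical example in Section 3.1.7 as input. The function names are hypothetical, and the variance formula is the one given above, without a finite population correction.

```python
def stratified_mean(sizes: list[int], means: list[float]) -> float:
    """Population mean estimate: (1/N) * sum of N_h * mean_h."""
    total = sum(sizes)
    return sum(size * mean for size, mean in zip(sizes, means)) / total


def stratified_mean_variance(sizes: list[int], variances: list[float],
                             sample_sizes: list[int]) -> float:
    """Variance of the stratified mean: (1/N^2) * sum of N_h^2 * var_h / n_h."""
    total = sum(sizes)
    weighted = sum(size ** 2 * var / n
                   for size, var, n in zip(sizes, variances, sample_sizes))
    return weighted / total ** 2


sizes = [100, 150, 250]
print(stratified_mean(sizes, [50, 55, 60]))
print(stratified_mean_variance(sizes, [25, 20, 15], [10, 15, 20]))
```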

3.1.7 Numerical example


Consider a population divided into 3 strata with the following characteristics:

• Stratum 1: N1 = 100, X̄1 = 50, σ₁² = 25, Sample size n1 = 10

• Stratum 2: N2 = 150, X̄2 = 55, σ₂² = 20, Sample size n2 = 15

• Stratum 3: N3 = 250, X̄3 = 60, σ₃² = 15, Sample size n3 = 20

Total population size N = N1 + N2 + N3 = 100 + 150 + 250 = 500.

Estimate of population mean


\[ \hat{\mu} = \frac{1}{500} (100 \times 50 + 150 \times 55 + 250 \times 60) \]
\[ \hat{\mu} = \frac{1}{500} (5000 + 8250 + 15000) = \frac{28250}{500} = 56.5 \]

Estimate of population variance
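\[ \mathrm{Var}(\hat{\mu}) = \frac{1}{500^2} \left( 100^2 \cdot \frac{25}{10} + 150^2 \cdot \frac{20}{15} + 250^2 \cdot \frac{15}{20} \right) \]
\[ \mathrm{Var}(\hat{\mu}) = \frac{1}{250000} \left( 10000 \times 2.5 + 22500 \times \frac{4}{3} + 62500 \times 0.75 \right) \]
\[ \mathrm{Var}(\hat{\mu}) = \frac{1}{250000} (25000 + 30000 + 46875) = \frac{101875}{250000} = 0.4075 \]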

3.2 Introduction to systematic sampling


Systematic sampling is a type of probability sampling method in which sample members from a larger
population are selected according to a random starting point but with a fixed, periodic interval. This
interval, called the sampling interval, is calculated by dividing the population size by the desired sample
size. Systematic sampling is simple to execute and is often used in various fields, including survey
research and quality control.

3.2.1 Steps in systematic sampling


1. Define the population (N): Determine the total number of units in the population.

2. Determine the sample size (n): Decide how many units you want to include in the sample.

3. Calculate the sampling interval (k): Compute the interval by dividing the population size by the
sample size, k = N/n.

4. Select a random starting point (r): Choose a random number between 1 and k to determine
where to start sampling.

5. Select every k-th unit: Starting from the random starting point, select every k-th unit in the
population until the desired sample size is reached.

3.2.2 Advantages of systematic sampling


• Simplicity and ease of implementation.

• Ensures even coverage of the population.

• Requires less time and resources compared to simple random sampling.

3.2.3 Disadvantages of systematic sampling


• Potential for periodicity bias if the population has a hidden pattern that matches the sampling
interval.

• Not suitable if the population size is not known.

Numerical Example
Consider a population of 100 students in a university department. We want to select a sample of 10
students using systematic sampling.

Step-by-step solution
• Population size (N): 100

• Sample size (n): 10

• Calculate the sampling interval (k):

N 100
k= = = 10
n 10

• Select a random starting point (r): Suppose we randomly select 3.

• Select every k-th unit: Starting from the 3rd student, we select every 10th student.

Selected students: 3, 13, 23, 33, 43, 53, 63, 73, 83, 93

Thus, the systematic sample consists of the 10 students at positions 3, 13,
23, 33, 43, 53, 63, 73, 83, and 93 in the population list.

Conclusion
Systematic sampling is a practical and efficient method for selecting a sample from a large population.
It provides a simple way to ensure that the sample is spread evenly across the population, although care
must be taken to avoid periodicity bias.

3.3 Cluster sampling


Cluster sampling involves dividing the population into separate groups, known as clusters, and then
randomly selecting a few of these clusters to form the sample. All elements within the chosen clusters
are then surveyed.

3.3.1 One-stage cluster sampling


In one-stage cluster sampling, the population is divided into clusters (usually based on geographical or
administrative boundaries). A random sample of clusters is selected, and then all elements within these
selected clusters are included in the sample.

Example
If a city is divided into 50 blocks (clusters), and we randomly select 10 blocks, then all households
within these 10 blocks are surveyed.

Advantages
• Cost-effective and convenient, especially when the population is spread over a large area.

• Reduces travel and administrative costs.

Disadvantages
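• Increased sampling error if the clusters are homogeneous within but heterogeneous between, that is, if units in the same cluster resemble each other while the clusters differ from one another.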

3.3.2 Two-stage cluster sampling


In two-stage cluster sampling, the population is divided into clusters. First, a random sample of clusters
is selected. Then, within each selected cluster, a random sample of elements is chosen.

Example
If a city is divided into 50 blocks, we first randomly select 10 blocks. Then, within each selected block,
we randomly select 20 households to survey.

Advantages
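• Often more precise than one-stage cluster sampling for the same total number of sampled elements, because the sample can be spread over more clusters.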

• Allows for a more manageable sample size.

Disadvantages
• More complex and time-consuming than one-stage sampling.

• Requires a complete list of elements within each selected cluster.

3.3.3 Multi-stage cluster sampling


Multi-stage cluster sampling involves dividing the population into clusters and then taking a series of
random samples from these clusters, progressing through multiple stages.

Example
• Stage 1: Randomly select districts.

• Stage 2: Within each selected district, randomly select schools.

• Stage 3: Within each selected school, randomly select students.

Advantages
• Flexible and can be adapted to different types of populations and research questions.

• Reduces costs and effort compared to sampling every element in the population.

Disadvantages
• More complex and can introduce more sampling error at each stage.

• Requires a good understanding of the population structure.

Numerical Example: Three-Stage Cluster Sampling


Suppose we want to estimate the average income of households in a large country. The country is
divided into districts, each district into blocks, and each block into households.

Stages
1. Stage 1: Randomly select 5 districts from the country.

2. Stage 2: From each selected district, randomly select 3 blocks.

3. Stage 3: From each selected block, randomly select 10 households.

Data
Assume the following average incomes (in thousands) and number of households (per block):
• District 1: Block 1 ($50, 40), Block 2 ($55, 35), Block 3 ($53, 30)

• District 2: Block 1 ($48, 45), Block 2 ($52, 50), Block 3 ($51, 60)

• District 3: Block 1 ($47, 42), Block 2 ($49, 48), Block 3 ($50, 45)

• District 4: Block 1 ($54, 35), Block 2 ($56, 40), Block 3 ($55, 37)

• District 5: Block 1 ($50, 50), Block 2 ($52, 45), Block 3 ($51, 47)

Solution
1. Calculate the total income for each selected block (average income × number of households):

• District 1:
– Block 1: $50 × 40 = 2000
– Block 2: $55 × 35 = 1925
– Block 3: $53 × 30 = 1590
• District 2:
– Block 1: $48 × 45 = 2160
– Block 2: $52 × 50 = 2600
– Block 3: $51 × 60 = 3060
• District 3:
– Block 1: $47 × 42 = 1974
– Block 2: $49 × 48 = 2352
– Block 3: $50 × 45 = 2250

• District 4:
– Block 1: $54 × 35 = 1890
– Block 2: $56 × 40 = 2240
– Block 3: $55 × 37 = 2035
• District 5:
– Block 1: $50 × 50 = 2500
– Block 2: $52 × 45 = 2340
– Block 3: $51 × 47 = 2397

2. Sum the products for each district and divide by the total number of households:

• District 1: 2000 + 1925 + 1590 = 5515 and total households 40 + 35 + 30 = 105


• District 2: 2160 + 2600 + 3060 = 7820 and total households 45 + 50 + 60 = 155
• District 3: 1974 + 2352 + 2250 = 6576 and total households 42 + 48 + 45 = 135
• District 4: 1890 + 2240 + 2035 = 6165 and total households 35 + 40 + 37 = 112
• District 5: 2500 + 2340 + 2397 = 7237 and total households 50 + 45 + 47 = 142

3. Calculate the overall average income:

• Sum of district income totals: 5515 + 7820 + 6576 + 6165 + 7237 = 33313

• Sum of households: 105 + 155 + 135 + 112 + 142 = 649
• Overall average income: 33313 / 649 ≈ 51.33 thousand

The estimated average income for the households in the country is $51.33 thousand.
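The household-weighted average in this example can be reproduced with a short script. The data structure below simply restates the block figures listed under Data (average income in thousands, number of households); the variable names are illustrative.

```python
# (average income in thousands, number of households) for each sampled block
blocks = {
    1: [(50, 40), (55, 35), (53, 30)],
    2: [(48, 45), (52, 50), (51, 60)],
    3: [(47, 42), (49, 48), (50, 45)],
    4: [(54, 35), (56, 40), (55, 37)],
    5: [(50, 50), (52, 45), (51, 47)],
}

# Block total = average income x households; sum over all sampled blocks.
total_income = sum(mean * hh for district in blocks.values() for mean, hh in district)
total_households = sum(hh for district in blocks.values() for _, hh in district)

estimate = total_income / total_households
print(total_income, total_households, round(estimate, 2))
```

Weighting each block average by its number of households prevents small blocks from having the same influence as large ones.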

3.4 Criteria for choosing a sampling design


Selecting an appropriate sampling design is crucial for ensuring the validity and reliability of research
findings. The choice of sampling design depends on several criteria:

3.4.1 Research Objectives


• Clarity of objectives: The research objectives must be clear and specific. The sampling design
should align with these objectives to effectively address the research questions.

• Nature of study: Different studies (e.g., descriptive, analytical, experimental) may require different
sampling designs.

3.4.2 Population Characteristics


• Size of the population: The total number of elements in the population can influence the choice
of sampling design.

• Diversity: The heterogeneity or homogeneity of the population affects the sampling design. More
diverse populations may require stratified or cluster sampling to ensure representation.

3.4.3 Sampling frame
• Availability of a sampling frame: A complete and accurate list of the population elements
(sampling frame) is essential for many sampling designs like simple random sampling and stratified
sampling.

• Quality of the sampling frame: The sampling frame should be up-to-date and free from
duplications or omissions.

3.4.4 Resources
• Budget constraints: The available budget can limit the choice of sampling design. Some designs,
like simple random sampling, may be less expensive than others like stratified sampling.

• Time constraints: The time available for data collection can also influence the sampling design.
Cluster sampling can be quicker in some scenarios compared to other designs.

3.4.5 Desired precision and accuracy


• Precision: The level of precision required for the estimates determines the sample size and design.
Stratified sampling, for example, can provide more precise estimates than simple random sampling.

• Accuracy: Higher accuracy may necessitate larger sample sizes or more sophisticated designs to
minimize bias and error.

3.4.6 Method of data collection


• Feasibility: The sampling design must be feasible given the method of data collection (e.g.,
face-to-face interviews, online surveys).

• Data collection techniques: Some designs may be more suitable for certain data collection
techniques. For instance, telephone surveys might be easier to conduct using systematic sampling.

3.4.7 Statistical analysis


• Analysis requirements: The planned statistical analysis can influence the choice of sampling
design. Some analyses may require certain types of data that are best obtained through specific
sampling methods.

• Complexity of analysis: More complex designs like multistage sampling can make data analysis
more complicated and require advanced statistical techniques.

3.4.8 Example: Choosing a sampling design


A researcher wants to estimate the average household income in a city with 100,000 households.
The research objectives include understanding income distribution across different neighborhoods. The
researcher has a limited budget and needs results within a month.

Solution
1. Research Objectives: The objective is to estimate average household income and understand the
income distribution across neighborhoods.

2. Population Characteristics: The population (100,000 households) is large and diverse, with
significant income variation across neighborhoods.

3. Sampling Frame: A complete and accurate list of households is available, categorized by
neighborhoods.

4. Resources: The budget is limited, and the results are needed within a month.

5. Desired Precision and Accuracy: The researcher requires a high level of precision to understand
income differences across neighborhoods.

6. Method of Data Collection: The data collection method will be face-to-face interviews, which
can be resource-intensive.

7. Statistical Analysis: The analysis will involve comparing income levels across different
neighborhoods.

Chosen sampling design


Stratified Random Sampling is chosen because:

• It aligns with the objective of comparing income levels across neighborhoods (strata).

• It ensures high precision within each neighborhood.

• The complete list of households (sampling frame) is available and categorized by neighborhood.

• It balances the need for precision and the constraints of limited budget and time.

Sampling procedure
1. Stratify: Divide the city into neighborhoods.

2. Random Sampling: Randomly select households within each neighborhood.

3. Sample Size: Determine the sample size for each neighborhood based on the desired precision.

3.5 Chapter Examples


Optimal allocation in stratified sampling
You are conducting a survey to estimate the average income of a population divided into three strata.
The strata and their characteristics are given in the table below:

Stratum    Population Size Ni    Mean Income X̄i    Standard Deviation Si
1          500                   45,000             10,000
2          300                   55,000             8,000
3          200                   60,000             12,000

You need to determine the optimal allocation of sample sizes ni for each stratum using the Neyman
allocation method. Assume the total sample size is 100.

Solution
To find the optimal sample size for each stratum, use Neyman’s allocation formula:
\[ n_i = \frac{N_i S_i}{\sum_{j=1}^{L} N_j S_j} \cdot n \]
where:

• Ni is the population size of stratum i,

• Si is the standard deviation of stratum i,

• n is the total sample size.

Calculations:

1. Compute the product Ni Si for each stratum:

• For Stratum 1: 500 × 10, 000 = 5, 000, 000


• For Stratum 2: 300 × 8, 000 = 2, 400, 000
• For Stratum 3: 200 × 12, 000 = 2, 400, 000
2. Calculate the sum $\sum_{j=1}^{L} N_j S_j$:

\[ \sum_{j=1}^{3} N_j S_j = 5{,}000{,}000 + 2{,}400{,}000 + 2{,}400{,}000 = 9{,}800{,}000 \]

3. Determine the optimal sample size ni for each stratum:

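• For Stratum 1:
\[ n_1 = \frac{500 \times 10{,}000}{9{,}800{,}000} \times 100 \approx 51.02 \]
• For Stratum 2:
\[ n_2 = \frac{300 \times 8{,}000}{9{,}800{,}000} \times 100 \approx 24.49 \]
• For Stratum 3:
\[ n_3 = \frac{200 \times 12{,}000}{9{,}800{,}000} \times 100 \approx 24.49 \]

Rounding to the nearest whole number gives 51, 24, and 24, which sums to 99 rather than 100. The
remaining unit can go to either Stratum 2 or Stratum 3, since their fractional parts are equal. Thus, the
optimal sample sizes are approximately 51 for Stratum 1, 25 for Stratum 2, and 24 for Stratum 3.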

3.6 Chapter Exercises
1. A researcher wants to estimate the average height of students in a university. The university has
three faculties: Science, Arts, and Engineering, with 300, 200, and 100 students, respectively.
If the researcher decides to use stratified random sampling and wants a total sample size of
60 students, how many students should be sampled from each faculty to ensure proportional
representation?

2. A survey is conducted to determine the average monthly expenditure on groceries by households
in a city. The city is divided into four income groups: low, lower-middle, upper-middle, and high.
The variances of monthly expenditures for these groups are 250, 400, 300, and 200, respectively.
If a total sample size of 100 is allocated optimally to minimize the variance of the estimate, how
should the sample be distributed among the four income groups?

3. A government agency is interested in estimating the average number of hours of internet usage
per week by households in a region. The region is divided into 10 clusters, each containing
50 households. The agency randomly selects 3 clusters and surveys all households within those
clusters. The average hours of internet usage for the selected clusters are 20, 25, and 30 hours,
respectively. Estimate the average number of hours of internet usage per week for the entire region.

4. A company wants to inspect the quality of products coming off a production line. The production
line produces 1,000 items per day. The company decides to use systematic sampling to select 50
items for inspection each day. If the first item is selected randomly from the first 20 items, and
every 20th item thereafter is selected, list the indices of the 50 items that will be inspected.

5. A researcher wants to estimate the average daily calorie intake of children in a large city. The
city is divided into 20 districts, each district into 10 schools, and each school into 5 classes. The
researcher uses a three-stage cluster sampling method. First, 4 districts are randomly selected.
Then, 3 schools are randomly selected from each chosen district. Finally, 2 classes are randomly
selected from each chosen school. If the average daily calorie intake for the selected classes is
1800, 1900, 1750, 1850, 1700, 1950, 2000, 2100, and 2200 calories, estimate the average daily
calorie intake for all children in the city.

6. In a town of 10,000 households, a researcher uses a telephone directory containing 8,000 households
as a sampling frame. If the researcher selects 500 households using systematic sampling,
what is the sampling interval?

7. A researcher is conducting a survey to estimate the average expenditure on health care across a
population divided into four strata. The total sample size available for the survey is 120. Using
Neyman’s optimal allocation method, determine the sample size for each stratum. The strata
and their characteristics are given in the table below:

Stratum   Population Size (Ni)   Standard Deviation (Si)
1         800                    1,200
2         600                    1,000
3         400                    1,500
4         200                    900

Chapter 4

Project

4.1 Project Instructions


To write a research essay, start by selecting a clear, focused topic. Conduct thorough research using
credible sources, and take detailed notes. Create an outline to organize your main points logically.
Write a strong introduction with a research statement, followed by body paragraphs that support your
research. Conclude by summarizing your main points. Finally, revise and proofread to ensure clarity,
coherence, and correctness.

1. Select a current, relevant topic for your research essay

2. Each group should have a unique topic of study (Duplicate and copied work is prohibited)

3. The project is worth 30 marks with a maximum of 8 pages.

4. Use single spacing and Times New Roman, font size 12

5. Each group should consist of a minimum of 5 and a maximum of 7 students.

6. The project should include the following components:

(a) Title page (1 page) (2 Marks)


i. Title of the work (Short and informative)
ii. Course title and code
iii. Names of the students and their registration numbers
iv. An abstract (less than 200 words)
(b) Introduction (1 page maximum) (5 Marks)
i. Introduction of the study (less than half a page)
ii. State one or two hypotheses
iii. Literature review of the study (half a page). Be clinical in writing your literature review
(c) Methodology (2 pages maximum) (8 Marks)
i. Define the sampling frame
ii. State your target and study population
iii. Explain the sampling method(s) used

iv. Show how you estimated the sample size
v. What are your dependent and independent variables?
vi. Mention the study variables that will assist you in questionnaire development and selection
of the data analysis methods.
(d) Questionnaire development (2 pages maximum) (6 Marks)
i. A simple, easy-to-understand questionnaire
ii. Be creative in the questionnaire development
(e) Data analysis (1 page maximum) (5 Marks)
i. Descriptive statistics methods to analyze the data from the questionnaire
ii. Inferential statistics methods to analyze the data from the questionnaire
iii. Since NO DATA IS COLLECTED, the team will not perform any data analysis
(f) References and Appendix (1 Page maximum)
i. References from the citations
ii. Any other information necessary in your work
(g) Please be creative in your writing and work (4 Marks)
(h) Other pertinent information
i. Submission deadline is 21st November 2024
ii. Submit a hard copy typed in Word or LaTeX
iii. Adhere closely to the instructions given but diversity is allowed

4.2 Sample Questionnaire


This is a guideline for developing a questionnaire on the topic: Understanding Generation Z in Kenya

Introduction
This guideline is designed to help undergraduate students develop a questionnaire to collect data on
Generation Z in Kenya. The aim is to understand their views on education, politics, and social life. A
well-structured questionnaire will enable researchers to gather relevant and insightful data.

Steps to Develop a Questionnaire


1. Define the objective: Clearly define the objective of the questionnaire. In this case, the objective
is to understand Generation Z’s perspectives on education, politics, and social life in Kenya.
2. Identify the key topics: Identify the key topics you want to cover in the questionnaire. For this
study, the key topics are education, political views, and social life.
3. Identify the hypothesis to test: Identify the hypothesis that you wish to test with the data that
will be collected.
4. Develop questions: Create questions for each key topic. Ensure the questions are clear, concise,
and unbiased. Use a mix of question types (e.g., multiple-choice, Likert scale, open-ended)
to gather comprehensive data.

4.2.1 Sample Questionnaire
Section 1: Demographics
1. Age: _____

2. Gender:

• Male
• Female
• Other

3. Location: _____

4. Educational Level:

• Primary
• Secondary
• Tertiary

Section 2: Education
1. How satisfied are you with the current education system in Kenya?

• Very Satisfied
• Satisfied
• Neutral
• Dissatisfied
• Very Dissatisfied

2. Do you believe that the education system adequately prepares you for the job market?

• Yes
• No
• Not Sure

Section 3: Political Views


1. How interested are you in politics?

• Very Interested
• Interested
• Neutral
• Not Interested
• Not Interested at All

2. How much do you trust the current political system in Kenya?

• Completely Trust
• Trust
• Neutral
• Distrust
• Completely Distrust

Section 4: Social Life


1. How do you rate your overall quality of social life?

• Very High
• High
• Neutral
• Low
• Very Low

2. How important is social media in your daily life?

• Very Important
• Important
• Neutral
• Unimportant
• Very Unimportant

3. What are the biggest social issues facing Generation Z in Kenya?

• _____

Conclusion
This guideline provides a structured approach to developing a questionnaire aimed at understanding
Generation Z in Kenya. By following these steps and using the sample questionnaire, researchers can
gather valuable data on the perspectives of this demographic group.

Chapter 5

Diagrams

5.1 Systematic sampling diagram

[Figure: population units 1 to 20 shown along a line, with the systematically sampled points S1, S5,
S9, S13, and S17 marked.]

Figure 5.1: Illustration of Systematic Sampling
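
The selection pattern in the figure can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the function name systematic_sample is hypothetical, and the sampling interval k = 4 is inferred from the sampled points S1, S5, S9, S13, and S17 rather than stated in the notes.

```python
import random


def systematic_sample(N, k, start=None, seed=None):
    """Return the 1-based indices selected by systematic sampling.

    N:     population size
    k:     sampling interval
    start: first selected unit (1..k); drawn at random if omitted
    """
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # Take the start unit and every k-th unit after it.
    return list(range(start, N + 1, k))


print(systematic_sample(20, 4, start=1))  # [1, 5, 9, 13, 17]
```

With a random start the selected units shift, but when N is a multiple of k (as with N = 20 and k = 4) every possible start yields the same number of sampled units, N / k.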

5.2 Diagram of Cluster Sampling
[Figure: a population partitioned into clusters, with Cluster 1 and Cluster 2 labelled.]

5.3 Stratified Sampling Diagram


[Figure: the total population is divided into Stratum 1, Stratum 2, and Stratum 3; sampling is carried
out separately within each stratum to obtain that stratum's sample size.]
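
The idea of sampling separately within each stratum can be sketched in code. The example below is illustrative only: the function name stratified_sample, the unit identifiers, and the stratum and sample sizes are hypothetical values chosen for the demonstration, not figures from the notes.

```python
import random


def stratified_sample(strata, sample_sizes, seed=None):
    """Draw a simple random sample (without replacement) in each stratum.

    strata:       dict mapping stratum name -> list of unit identifiers
    sample_sizes: dict mapping stratum name -> number of units to draw
    """
    rng = random.Random(seed)
    return {
        name: sorted(rng.sample(units, sample_sizes[name]))
        for name, units in strata.items()
    }


# Hypothetical population of 100 units split into three strata.
strata = {
    "Stratum 1": list(range(1, 51)),    # units 1..50
    "Stratum 2": list(range(51, 81)),   # units 51..80
    "Stratum 3": list(range(81, 101)),  # units 81..100
}
sample_sizes = {"Stratum 1": 5, "Stratum 2": 3, "Stratum 3": 2}

sample = stratified_sample(strata, sample_sizes, seed=42)
for name, units in sample.items():
    print(name, units)
```

The sample sizes passed in can come from any allocation rule, such as proportional or Neyman allocation.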

5.4 Diagram of Multi-stage sampling
[Figure: flow from the Population to Primary Sampling Units (PSUs), then a Sample of PSUs, then
Secondary Sampling Units (SSUs), then a Sample of SSUs, ending with the Final Sampling Units.]
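
The stages in the diagram can be illustrated with a small two-stage example. This is a sketch under assumptions: the function name two_stage_sample and the toy population of 5 PSUs with 4 SSUs each are hypothetical, and simple random sampling without replacement is assumed at both stages.

```python
import random


def two_stage_sample(population, n_psu, n_ssu, seed=None):
    """Select n_psu primary units, then n_ssu secondary units in each.

    population: dict mapping PSU name -> list of SSU identifiers
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of PSUs.
    chosen_psus = rng.sample(sorted(population), n_psu)
    # Stage 2: simple random sample of SSUs within each chosen PSU.
    return {psu: rng.sample(population[psu], n_ssu) for psu in chosen_psus}


# Hypothetical population: 5 PSUs, each containing 4 SSUs.
population = {
    f"PSU{i}": [f"PSU{i}-SSU{j}" for j in range(1, 5)] for i in range(1, 6)
}

sample = two_stage_sample(population, n_psu=2, n_ssu=3, seed=1)
for psu, ssus in sample.items():
    print(psu, ssus)
```

A third stage would repeat the same step once more inside each selected SSU.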
