Lecture 3 - Sampling Design - 2018

This document discusses sampling methods and sampling error. It outlines different sampling methods like simple random sampling, stratified random sampling, and cluster sampling. It explains that drawing a random sample allows each unit to have a known probability of being selected, making the sample representative of the population. The document also discusses how the mean of random samples follows a normal distribution around the true population mean, and how the standard error decreases as the sample size increases. It emphasizes that proper sampling and an adequately sized sample are needed to make reliable inferences about the population.

Uploaded by

Yamin

SAMPLING DESIGN

Dr. Muhammad Farooq Naseer

Lahore University of Management Sciences


Center for Economic Research in Pakistan
• Yesterday you learnt about the Smart Policy Design framework
• Today we will be focusing on gathering data
• Access to good data lies at the heart of each stage in the SPD process
OUTLINE

I. Sampling Methods
II. Sampling Error
III. Power Calculations for Sample Size
IV. Example
PART I:
SAMPLING
WHY SAMPLE A PORTION?
• Measuring the whole (population) can be too costly!
• Data can be obtained more quickly… with lower measurement error
• Proper sampling provides reliable information about the population (… Part II)
HOW (NOT) TO DRAW A SAMPLE?

• Convenience Sampling
  - choose the easiest respondents available, e.g. the first ten people you meet on the street
• Judgment, or Purposive, Sampling
  - choose objects that are believed to give accurate results, e.g. select the ‘average’ student in 3 different colleges
• Quota Sampling
  - select a certain number (i.e. quota) of each type
  - can produce substantial bias (e.g. 1992 UK election polls)
  - still widely used, especially for telephone surveys with high non-response levels

HOW TO DRAW A SAMPLE?
• Random Sampling
  - use a mechanical chance process that gives each unit a known, positive probability of being sampled
  - for instance, given a list of ALL individuals in the population you could:
    - toss a coin
    - draw lots out of a basket
    - use a random number table
    - use computer software
• A random sample is ‘representative’ of the population
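A minimal sketch of the "computer software" option, assuming a hypothetical sampling frame of 124 unit IDs (the frame, the function name and the seed are made up for illustration). Python's standard library can draw a simple random sample without replacement:

```python
import random

# Hypothetical sampling frame: a list of ALL units in the population.
population = [f"person_{i:03d}" for i in range(1, 125)]

def draw_simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every unit has inclusion probability n / len(frame)."""
    rng = random.Random(seed)  # seeded only so the draw can be reproduced
    return rng.sample(frame, n)

sample = draw_simple_random_sample(population, n=10, seed=2018)
inclusion_probability = 10 / len(population)
print(sample)
print(f"Each unit's chance of selection: {inclusion_probability:.3f}")
```

The key property is that the selection probability is known in advance and positive for every unit, which is what separates this from convenience or judgment sampling.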


COMMON MISCONCEPTIONS
• Misconception: a large sample size (n) compensates for poor sampling design
  - The Literary Digest 1936 poll of 2.4 million voters... still got it very wrong!
• Misconception: the required sample size depends on the size of the population (N)
  - No!
  - Imagine a population of identical clones… a single respondent would be enough, however large N is
  - Just poll 1200 voters in any election
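To illustrate why roughly 1200 respondents can serve for any large electorate, here is a rough sketch of the 95% margin of error for a sample proportion. It assumes a simple random sample, the worst case p = 0.5, and the usual finite population correction; the population sizes are hypothetical.

```python
import math

def margin_of_error(n, N=None, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    If a population size N is given, apply the finite population correction.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return z * se

# Hypothetical electorates of very different sizes, same n = 1200.
for N in (100_000, 10_000_000, 1_000_000_000):
    print(f"N = {N:>13,}: margin of error = {margin_of_error(1200, N):.4f}")
```

Under these assumptions the margin of error stays a little under 3 percentage points whatever the population size, because n, not N, drives the standard error.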
EXAMPLE: MEASURING JOB SATISFACTION

Doctors at the city hospitals were listed and contacted by telephone starting with the biggest hospital. The first fifty (50) doctors who agreed to be interviewed formed the sample.
EXAMPLE: COUNTING WORN-OUT BOOKS

A librarian uses a random number table to select 100 locations in the library shelves. He physically examines all the books at selected locations and records the number of books that require binding.
EXAMPLE: HAZARDOUS CHILD LABOR

A researcher states that 2% of all children in a city suffered a work-related injury… by surveying children who received medical attention at different hospitals/clinics in the city
TYPES OF RANDOM SAMPLING DESIGNS
• Simple Random Sampling
  - List every individual in the population of interest
  - Randomly pick n individuals from the population such that each individual, and each group of n individuals, is equally likely to be in the sample
• Stratified Random Sampling
  - Mark separate sub-groups, or strata, in the population list
  - Randomly pick individuals (possibly a different fraction) from each group
• Cluster Sampling
  - Randomly pick clusters and then (randomly) sample multiple individuals from each cluster
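The stratified and cluster designs can be sketched as follows. This is only an illustration: the population of 6 schools with 20 students each, the urban/rural strata, and the sampling fractions are all hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 6 schools (clusters) x 20 students, each tagged with a stratum.
population = [
    {"id": f"s{school}_{k}", "school": school,
     "stratum": "urban" if school < 3 else "rural"}
    for school in range(6)
    for k in range(20)
]

def stratified_sample(units, key, fractions):
    """Sample a (possibly different) fraction from each stratum."""
    chosen = []
    for stratum, fraction in fractions.items():
        members = [u for u in units if u[key] == stratum]
        chosen += rng.sample(members, round(fraction * len(members)))
    return chosen

def cluster_sample(units, key, n_clusters, n_per_cluster):
    """Randomly pick clusters, then randomly sample units within each one."""
    clusters = sorted({u[key] for u in units})
    chosen = []
    for c in rng.sample(clusters, n_clusters):
        members = [u for u in units if u[key] == c]
        chosen += rng.sample(members, n_per_cluster)
    return chosen

strat = stratified_sample(population, "stratum", {"urban": 0.10, "rural": 0.25})
clust = cluster_sample(population, "school", n_clusters=2, n_per_cluster=5)
print(len(strat), "units in the stratified sample")
print(len(clust), "units in the cluster sample")
```

In both designs every unit still has a known, positive selection probability; what changes is how that probability is structured.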
RECAP
• A good sample must be representative of the population
  - Therefore, draw a random sample
• Different kinds of random sampling designs exist
• Choosing the appropriate design and sample size for your problem is important…
  - requires skill and careful thought

Next up:
A random sample produces reliable information about the population
PART II:
UNDERSTANDING
SAMPLING ERROR
…AND HOW TO PLAN FOR IT
SAMPLING: A DEEPER CONSIDERATION

• Margin of sampling error
• Standard error of estimates

We are very often interested in the sample average… let’s understand its behavior across different random samples
AGE OF COURSE APPLICANTS… POPULATION (N=124)
AVERAGE AGE IN A RANDOM SAMPLE (n=10)
AVERAGE AGE IN A LARGER SAMPLE (n=20)
AVERAGE AGE ACROSS 300 RANDOM SAMPLES (n=10)
AVERAGE AGE ACROSS 300 RANDOM SAMPLES (n=20)
LOOKING AT SMOOTHED DISTRIBUTIONS…
SAMPLE MEAN VALUES ACROSS DIFFERENT SAMPLES
• Follow (approximately) a Normal distribution
• Center: = true (population mean) value [unknown]
  - In random samples, the sample mean tends to lie around the true value
• Spread: “sampling error” across repeated samples [Std Error; known]
  - Reduced as n increases
• Symmetry: half of the random samples give an estimate that’s larger than the actual value, and half give one that’s smaller
• Range of possible values: in principle unbounded, but about 95% of sample means lie within 2 SEs of the true value.
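The behaviour described above can be checked with a small simulation. This is a sketch only: the applicant ages are not included in the slides, so a hypothetical population of 124 ages is generated as a stand-in, and the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(3)  # arbitrary seed, for reproducibility only

# Hypothetical stand-in for the N = 124 applicant ages (the real data are not in the slides).
population = [rng.randint(22, 45) for _ in range(124)]
pop_mean = statistics.mean(population)

def sample_means(n, draws=300):
    """Means of `draws` simple random samples of size n (without replacement)."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

means_10 = sample_means(10)
means_20 = sample_means(20)

for n, means in ((10, means_10), (20, means_20)):
    print(f"n={n}: centre={statistics.mean(means):.2f} "
          f"(population mean {pop_mean:.2f}), spread={statistics.stdev(means):.2f}")
```

The centre of the sample means should sit close to the population mean for both sample sizes, while the spread (the standard error) should be visibly smaller for n = 20 than for n = 10.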
SAMPLE MEAN VALUES…
[Figure: two panels, “Distribution of Age in the Population” and “Average Age across Random Samples”]
RECAP…
• Often interested in population mean of some outcome variable
• Alas, sample mean contains sampling error
  - Any given random sample is not an exact copy of the population
  - Therefore, sample mean ≠ population mean
• Still, random sampling yields reliable information
  - Sample mean value tends to lie around the true population mean [approximately following a well-defined Normal distribution]
  - Moreover, we can measure the magnitude of sampling error
  - … and control it by varying the sample size

Next up:
How to determine the correct sample size for our study
PART III:
DETERMINING
SAMPLE SIZE
IMPORTANT POINTS TO CONSIDER
• In reality
  - The population mean is unknown
  - You can only ever draw one sample from the population
  - Can calculate the sample mean and its standard error (width) but not the population mean (location)
  - Often interested in difference in means, e.g. to measure treatment effects
• If the sample is small, sampling error is too large and the sample mean is not so informative
  - Can’t reject the “Zero effect” assumption even if sample mean is large and positive
• Can the sample be too large??
• Want the sample size to be “just right” for the job at hand
  - Need to do power calculations
WHEN THE TRUE MEAN IS UNKNOWN…
STATISTICAL POWER (TO DISTINGUISH BETWEEN POSSIBLE TRUE STATES)
• Minimize overlap!
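POWER FORMULA FOR CLUSTERED RCT

$$\text{EffectSize} = \left(t_{1-\kappa} + t_{\alpha}\right)\,\sqrt{1 + \rho\,(m-1)}\,\sqrt{\frac{1}{P(1-P)}}\,\sqrt{\frac{\sigma^{2}}{n}}$$

• t_{1-κ}: Power
• t_{α}: Significance Level
• σ²: Variance
• ρ: ICC
• m: Average Cluster Size
• P: Proportion in Treatment
• n: Sample Size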
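A rough sketch of how the ingredients of this formula combine, written as a minimum detectable effect calculation. It rests on several assumptions that are not in the slides: normal critical values in place of t values, a two-sided test, and hypothetical inputs (n = 1000, outcome SD of 1). The function and parameter names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(n, sigma, m=1, rho=0.0, P=0.5, alpha=0.05, power=0.80):
    """Effect size detectable with the given power, using normal critical values.

    n: total sample size, sigma: outcome SD, m: average cluster size,
    rho: intra-cluster correlation, P: proportion of the sample treated.
    """
    z = NormalDist()
    t_alpha = z.inv_cdf(1 - alpha / 2)  # assumption: two-sided test
    t_power = z.inv_cdf(power)
    clustering_inflation = sqrt(1 + rho * (m - 1))
    allocation = sqrt(1 / (P * (1 - P)))
    return (t_power + t_alpha) * clustering_inflation * allocation * sqrt(sigma**2 / n)

mde_individual = minimum_detectable_effect(n=1000, sigma=1.0)
mde_clustered = minimum_detectable_effect(n=1000, sigma=1.0, m=20, rho=0.1)
print(f"individual randomization: {mde_individual:.3f} SD")
print(f"clusters of 20, ICC 0.1:  {mde_clustered:.3f} SD")
```

Each factor moves the detectable effect in the direction the slides describe: more noise, more clustering, or a more lopsided treatment split all raise it, while a larger n lowers it.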
POWER: MAIN INGREDIENTS

For a given significance level, power depends on the following:

1. Assumed Effect Size
2. Sample Size
3. Variance of outcome in the study population
4. Clustering
5. Proportion of sample in T vs C
STANDARDIZED EFFECT SIZE
An effect size of… | Is considered… | …and it means that…
0.2 | Modest | The average member of the treatment group had a better outcome than the 58th percentile of the control group
0.5 | Large | The average member of the treatment group had a better outcome than the 69th percentile of the control group
0.8 | Whoa…that’s a big effect size! | The average member of the treatment group had a better outcome than the 79th percentile of the control group
[Figure beside each row: density plot, x-axis from -4 to 6]
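The percentile interpretation can be reproduced with the standard Normal CDF. This sketch assumes a Normally distributed outcome with equal variance in the two groups; the function name is illustrative.

```python
from statistics import NormalDist

def control_percentile(effect_size):
    """Percentile of the control distribution at the treatment-group mean."""
    return 100 * NormalDist().cdf(effect_size)

for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: {control_percentile(d):.0f}th percentile of the control group")
```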
SAMPLE SIZE AND EFFECT SIZE AGAINST POWER
VARIANCE
• The “sampling error” (standard error) of the sample average is directly proportional to the standard deviation, i.e. the square root of the variance (“natural noise”), of the outcome variable in the population
• There is sometimes very little we can do to reduce the noise
  - The underlying variance is what it is
• We can try to “absorb” variance:
  - by, say, controlling for other variables
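As a reminder of how population noise feeds into sampling error, for a simple random sample of size n from a large population with outcome standard deviation σ (symbols chosen here for illustration), the standard error of the sample mean is:

$$\mathrm{SE}(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sigma^{2}}{n}}$$

So, holding n fixed, a population with four times the variance has twice the standard error, and needs four times the sample to reach the same precision.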
CLUSTERED SAMPLING: INTUITION
Suppose you want to know how close the next elections will be:
• Method 1: Randomly select 50 people from the entire population
• Method 2: Randomly select 10 families (“clusters”), and ask 5 members of each family their opinion
• Method 2 will yield relatively imprecise/noisy estimates if the political opinion within families does not vary a lot (high intra-cluster correlation)
• In the presence of intra-cluster correlation, cluster sampling gives lower power than simple random sampling (SRS).
• Need a larger sample for the same power in case of cluster sampling
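One common way to quantify this loss is the design effect, 1 + (m - 1) × ICC, where m is the number of respondents per cluster. The sketch below applies it to the 10 families of 5 from the example; the ICC values are hypothetical, since the slides give none for this case.

```python
def design_effect(m, rho):
    """Variance inflation from sampling m units per cluster with ICC rho."""
    return 1 + (m - 1) * rho

def effective_sample_size(n_total, m, rho):
    """Size of a simple random sample with the same precision."""
    return n_total / design_effect(m, rho)

# Method 2 from the slide: 10 families x 5 members = 50 respondents.
for rho in (0.0, 0.3, 0.8):  # hypothetical ICC values
    print(f"ICC={rho}: 50 clustered respondents ~ "
          f"{effective_sample_size(50, 5, rho):.1f} independent ones")
```

With no intra-cluster correlation the two methods are equally precise; as family members agree more, the 50 clustered interviews carry the information of far fewer independent ones.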
HIGH INTRA-CLUSTER CORRELATION
LOW INTRA-CLUSTER CORRELATION
SAMPLE SPLIT: 50% T, 50% C
POWER: 91%
[Figure: control and treatment distributions with the power region marked; x-axis from -4 to 6]
SAMPLE SPLIT: 75% T, 25% C
POWER: 83%
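The drop from 91% to 83% can be approximated with a short calculation. This is a sketch under assumptions the slides do not state: a one-sided 5% test, Normal critical values, and the same total sample size, variance and effect size in both designs.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()

def power_after_resplit(power_balanced, P, alpha=0.05):
    """Power at treatment share P, given the power of a 50/50 split.

    Assumes a one-sided normal test; n, variance and effect size are held fixed.
    """
    t_alpha = z.inv_cdf(1 - alpha)
    # Standardized effect (effect / SE) implied by the balanced design.
    signal = t_alpha + z.inv_cdf(power_balanced)
    # SE scales with sqrt(1 / (P(1-P))), which equals 2 at P = 0.5.
    se_ratio = sqrt(1 / (P * (1 - P))) / 2
    return z.cdf(signal / se_ratio - t_alpha)

power_75_25 = power_after_resplit(0.91, 0.75)
print(f"power with a 75/25 split: {power_75_25:.2f}")
```

Under these assumptions the result lands close to the 83% shown on the slide, which illustrates why a balanced split is the most efficient use of a fixed sample.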
THE POWER CURVE

B = Assumed effect size in SD units


PART IV:
EXERCISE ON
POWER
CALCULATION
SAMPLE SIZE CALCULATION IN STATA

• Let’s think about an actual problem requiring sample size computation…
  - How large a sample is needed to test the effectiveness of a school-based education intervention?
  - Outcome: student learning (math test scores)
  - Treatment 1: school meals
  - Treatment 2: teacher training
• Offer the treatment to a group of schools and compare the outcomes with another set of schools
  - Method: avg_score_t - avg_score_c
SAMPLE SIZE CALCULATION IN STATA

• Sample size computation…
  - Mean Math score: 46.6%
  - SD: 15.2
  - 0.1 SD shift in mean: 48.1%
  - 0.2 SD shift in mean: 49.6%
  - 0.5 SD shift in mean: 54.2%
  - ICC: 0.41
sampsi 46.6 48.1, sd(15.2) power(.80)
sampclus, obsclus(20) rho(0.41)
sampclus, obsclus(10) rho(0.41)
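For readers without Stata, the same kind of calculation can be sketched in Python. This is not a reimplementation of `sampsi` or `sampclus`: it uses the textbook normal-approximation formula for a two-sided comparison of two means with equal arms, followed by the design effect, so the numbers may differ somewhat from what Stata prints. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(mean_c, mean_t, sd, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 2 * (z_sum * sd / (mean_t - mean_c)) ** 2

def clustered(n, obs_per_cluster, rho):
    """Inflate n by the design effect; return (students, schools) per arm."""
    n_adj = n * (1 + (obs_per_cluster - 1) * rho)
    return ceil(n_adj), ceil(n_adj / obs_per_cluster)

# Inputs from the slide: mean 46.6, 0.1 SD shift to 48.1, SD 15.2, ICC 0.41.
n_srs = n_per_arm(46.6, 48.1, sd=15.2)
print("individual randomization, per arm:", ceil(n_srs))
for m in (20, 10):
    students, schools = clustered(n_srs, m, rho=0.41)
    print(f"{m} students per school: {students} students in {schools} schools per arm")
```

With an ICC as high as 0.41, sampling fewer students per school reduces the total number of students needed but raises the number of schools, which is the trade-off the two `sampclus` calls are meant to expose.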
KEY TAKEAWAYS
• Random sampling is the best method of selecting a study sample from your population
  - Produces reliable results
• Given sampling error, sample size should give enough power (>= 80%) to distinguish “No effect” from alternative scenarios of interest
• Important to choose the appropriate sampling design and sample size for your problem

NEXT LECTURE:
Non-Experimental Evaluation Methods
