Sampling

Uploaded by josiah Dennie
Copyright © All Rights Reserved
Sampling

Summarising & Presenting Data


Measures of Central Tendency & Spread
Outline
• Sampling – 1 hr

• Summarising and Presenting Data – 1 hr

• Measures of Central Tendency and Spread – 1 hr


Sampling
Learning Objectives
• Distinguish between a population and a sample

• Identify a sampling frame

• Outline inclusion/exclusion criteria

• Distinguish between probability and non-probability sampling methods

• Describe the advantages and disadvantages of probability samples

• Recognise bias in a sample


Definition of Sampling

Procedure by which some members of a given population are selected as representatives of the entire population


Why do we use samples?

To get information from large populations:
• At minimal cost
• At maximum speed
• With increased accuracy
• Using enhanced tools
Key Sampling Terms

Sample
• Subset of the population
• Population: group of individuals sharing the same characteristics, e.g. age, disease

Study Population
• The group of individuals/study units about which a particular investigation may provide information
• Example: children <5 years, hospital discharges, health events…
Definition of Sampling Terms

Target Population
• The whole group of study units to which we are interested in applying our conclusions

Representative Sample
• Has all the important characteristics of the population from which it is drawn
Definition of Sampling Terms

Sampling frame
• Any list of all the sampling units in the population
• e.g. list of households, health care units…
Sampling scheme/method
• Method of selecting sampling units from the sampling frame
• e.g. random selection, convenience sample…
Sampling and Representativeness
Target Population → Sampling Population → Sample
[Figure: the sample shown inside the sampling population, which lies inside the target population]
Sampling and Representativeness
Target Population → Sampling Population → Sample
Study on prevalence of chlamydial infection in women in St Georges
[Figure: nested groups labelled "Females in St Georges" (target population), "Females in POS" and "Females"]
KEY QUESTIONS
• What is a study population?
  – Must be clearly defined
• How many study units do we need in our sample?
  – Need to calculate sample size
• How will we select these study units?
  – Must know what sampling method to use
STUDY POPULATION
• Definition is important
• Set inclusion and exclusion criteria
  – Inclusion: specify the characteristics that define subjects relevant to the research question
  – Exclusion: subjects who are suitable except for characteristics that would interfere with data quality
• Implications of the criteria (e.g. for generalization of findings) must be considered

INCLUSION – EXCLUSION
Inclusion criteria:
• Demographic characteristics
• Clinical characteristics
• Disease status
• Geographic characteristics (administrative)
Exclusion criteria:
• Inability to provide good data
• High possibility of loss during follow-up
• High risk of side effects
• Ethical reasons (participation would be unethical)
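Inclusion and exclusion criteria act as two filters applied in turn to candidate subjects: a subject must meet every inclusion criterion and no exclusion criterion. The sketch below is illustrative only; the record fields (`age`, `sex`, `district`, `likely_lost_to_follow_up`) and the specific cut-offs (females aged 15-49 living in St Georges) are hypothetical assumptions, not criteria from any real study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical fields, chosen only to illustrate the idea
    ident: int
    age: int
    sex: str
    district: str
    likely_lost_to_follow_up: bool


def meets_inclusion(c: Candidate) -> bool:
    # Inclusion (hypothetical): demographic (female, aged 15-49) and geographic characteristics
    return c.sex == "F" and 15 <= c.age <= 49 and c.district == "St Georges"


def meets_exclusion(c: Candidate) -> bool:
    # Exclusion (hypothetical): otherwise suitable, but likely to be lost during follow-up
    return c.likely_lost_to_follow_up


def study_population(candidates):
    """Keep candidates who meet every inclusion criterion and no exclusion criterion."""
    return [c for c in candidates if meets_inclusion(c) and not meets_exclusion(c)]


candidates = [
    Candidate(1, 25, "F", "St Georges", False),  # eligible
    Candidate(2, 60, "F", "St Georges", False),  # fails the age criterion
    Candidate(3, 30, "M", "St Georges", False),  # fails the sex criterion
    Candidate(4, 35, "F", "St Georges", True),   # excluded: likely lost to follow-up
]
print([c.ident for c in study_population(candidates)])  # [1]
```

Candidate 4 shows the difference between the two filters: she satisfies every inclusion criterion but is still removed by an exclusion criterion.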
SAMPLING METHODS
Non-probability sampling methods (sampling frame unavailable):
• Convenience sampling
• Purposive sampling
• Quota sampling
• Snowball sampling
Probability sampling methods (sampling frame available):
• Simple random sampling
• Systematic random sampling
• Stratified sampling
• Cluster and multi-stage sampling
NONPROBABILITY SAMPLING METHODS
Sampling frame not available
• Probability of being chosen: unknown
  – Study units cannot be sampled in a way that makes the probability of selecting different units known
• These methods are inappropriate for generalization
  – Cannot extrapolate to the larger population
CONVENIENCE SAMPLING
• For convenience's sake
• Units that are available at the time of data collection are selected
• Unrepresentative; biased
PURPOSIVE SAMPLING
• Subjects specifically selected on the researcher’s subjective judgment that they are the most representative
• Used in qualitative research
• Can be quite biased
QUOTA SAMPLING
• Ensures that a certain number of sampling units with specific characteristics appear in the sample
• Sample reflects the population structure for those characteristics
• The sample does not claim to be representative of the entire population
• Used when time/resources are constrained
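Quota sampling can be sketched in two steps: set quotas that mirror the population structure, then accept whoever turns up until each quota is full. This is a minimal sketch under stated assumptions. It borrows the 540 males and 660 females of this section's school example. It uses largest-remainder rounding (one reasonable choice, not a prescribed rule) so the quotas add up to the sample size. The arrival order is hypothetical. Note that selection within each category is not random, which is why the sample cannot claim overall representativeness.

```python
def proportional_quotas(pop_counts, n):
    """Quotas proportional to the population structure, summing exactly to n.

    Uses largest-remainder rounding on integer arithmetic.
    """
    total = sum(pop_counts.values())
    quotas = {k: c * n // total for k, c in pop_counts.items()}
    remainders = {k: c * n % total for k, c in pop_counts.items()}
    shortfall = n - sum(quotas.values())
    # Hand the leftover places to the categories with the largest remainders
    for k in sorted(pop_counts, key=lambda k: remainders[k], reverse=True)[:shortfall]:
        quotas[k] += 1
    return quotas


def fill_quotas(arrivals, quotas):
    """Accept people in the order they turn up until every quota is full.

    arrivals: iterable of (person, category). Selection is NOT random:
    whoever is available first is taken.
    """
    remaining = dict(quotas)
    sample = []
    for person, category in arrivals:
        if remaining.get(category, 0) > 0:
            sample.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas filled
    return sample, remaining


print(proportional_quotas({"M": 540, "F": 660}, 100))  # {'M': 45, 'F': 55}

# Hypothetical stream of people who happen to be available
arrivals = [("a", "F"), ("b", "F"), ("c", "F"), ("d", "M"), ("e", "M")]
sample, unfilled = fill_quotas(arrivals, {"F": 2, "M": 1})
print(sample)  # ['a', 'b', 'd']
```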
SNOWBALL SAMPLING
• Subjects are successively recruited by referrals from other subjects
• Very useful for difficult-to-recruit groups
• Commonly used in HIV research of populations that are difficult to locate (e.g. homeless people, IV drug users)
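Snowball recruitment proceeds in waves: seeds (wave 0) refer contacts (wave 1), who refer further contacts (wave 2), and so on. A breadth-first walk over a referral network captures this. The sketch assumes a hypothetical `referrals` mapping and a `max_waves` stopping rule; neither comes from the slides. People with no referral path from a seed are never reached, which illustrates why such samples are not representative.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_waves, max_size=None):
    """Recruit seeds, then the people they refer, wave by wave (breadth-first).

    referrals: hypothetical mapping person -> people they would refer.
    Returns a dict mapping each recruit to the wave in which they joined.
    """
    wave_of = {}
    queue = deque()
    for s in seeds:
        if s not in wave_of:
            wave_of[s] = 0
            queue.append(s)
    while queue:
        person = queue.popleft()
        if wave_of[person] >= max_waves:
            continue  # do not follow referrals beyond the last wave
        for referred in referrals.get(person, []):
            if referred in wave_of:
                continue  # already recruited
            if max_size is not None and len(wave_of) >= max_size:
                return wave_of  # target sample size reached
            wave_of[referred] = wave_of[person] + 1
            queue.append(referred)
    return wave_of


# Hypothetical referral network
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": [], "F": ["G"]}
print(snowball_sample(referrals, seeds=["A"], max_waves=2))
# {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
```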
Advantages & Disadvantages of Non-Probability Sampling
• Advantages of Non-Probability Sampling
1. Cost-effective and time-effective compared to probability sampling
2. Practical and effective when probability sampling is unfeasible or impractical
3. Responses are faster
• Disadvantages of Non-Probability Sampling
1. Lack of representation of the entire population: an unknown proportion of the population is not included in the sample group
2. Lower level of generalization of research findings
3. Difficulties in estimating/calculating sampling variability and identifying possible bias
PROBABILITY SAMPLING METHODS
• Involve random selection procedures
• Each unit of the sample is chosen on the basis of chance
• Reduce the possibility of selection bias
• All units should have an equal, or at least a known, chance of being included in the sample
• Allow statistical theory to be applied to the results


SIMPLE RANDOM SAMPLING
To select a simple random sample you need to:
• Make a numbered list of all the units in the study population
• Determine the sample size
• Use a "lottery" method or a table of random numbers to select the sample
SIMPLE RANDOM SAMPLING
Example:
Evaluate the prevalence of tooth decay among the 1200 children attending a school
• List of children attending the school
• Children numbered from 1 to 1200
• Sample size = 100 children
• Random sampling of 100 numbers between 1 and 1200

How to randomly select?
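One way to make the "lottery" draw for the tooth-decay example is with Python's standard `random` module, which here plays the role of the table of random numbers. This is a minimal sketch. The function name is hypothetical, and the seed value is arbitrary; it only makes the draw reproducible.

```python
import random


def simple_random_sample(population_size, n, seed=None):
    """Draw n distinct unit numbers from 1..population_size, every unit equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible (e.g. for auditing)
    # rng.sample draws without replacement, so no child can be selected twice
    return sorted(rng.sample(range(1, population_size + 1), n))


selected = simple_random_sample(1200, 100, seed=42)
print(len(selected), selected[:5])
# Each child's chance of selection is n / N = 100 / 1200 = 1/12
```

The selected numbers are then matched back to the numbered list of children.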


SIMPLE RANDOM SAMPLING
• Advantages
  – Simple
  – Sampling error easily measured
• Disadvantages
  – Need complete list of units
  – Does not always achieve best representativeness
  – Units may be scattered and poorly accessible
SYSTEMATIC SAMPLING
• Individuals or study units are chosen at regular intervals (e.g. every 8th) from the sampling frame
• List all units (persons) in the population; assign a number to each unit
• Calculate the sampling interval (population size ÷ sample size)
• Select the first unit at random, based on the sampling interval
  – A point between 1 and the sampling interval
• Subsequent units are chosen at equal intervals

SYSTEMATIC SAMPLING

Example:
N = 1200 and n = 60
• Sampling interval = 1200/60 = 20
• List persons from 1 to 1200
• Randomly select a number between 1 and 20 (example: 8)
  – 1st person selected = the 8th on the list
  – 2nd person = 8 + 20 = the 28th, etc.
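The procedure above reduces to a few lines of arithmetic: compute the interval k = N/n, pick a random start between 1 and k, then take every k-th unit. A minimal sketch. The function name is hypothetical. For simplicity it assumes N is a multiple of n, as in the example; otherwise the last few units on the list could never be chosen.

```python
import random


def systematic_sample(population_size, n, start=None, seed=None):
    """Select every k-th unit after a random start between 1 and k.

    Assumes population_size is a multiple of n (as in the 1200 / 60 example).
    """
    k = population_size // n  # interval between selected units, e.g. 1200 // 60 = 20
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both ends
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the interval k")
    return [start + i * k for i in range(n)]


units = systematic_sample(1200, 60, start=8)  # reproduce the example's start of 8
print(units[:3], units[-1])  # [8, 28, 48] 1188
```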
SYSTEMATIC SAMPLING
[Figure: sampling frame with units numbered 1, 2, 3, … 55, … from which units are taken at a fixed interval]
SYSTEMATIC SAMPLING
Advantages
• Less time-consuming and easier than simple random sampling

Disadvantages
• Need complete list of units
• Risk of bias, e.g. if the list has a periodic pattern that coincides with the interval between selected units
STRATIFIED SAMPLING
• Sampling frame divided into sub-groups (strata), each with specific characteristics
• Assign a number to each unit in each stratum; select random or systematic samples of predetermined sizes from each stratum
• Combine the stratum samples to form the full sample, which includes units representative of all groups/strata with specific characteristics
• Units in each group have the same probability of selection, but the probability may differ between groups
STRATIFIED SAMPLING

• Advantages
  – We can take a relatively large sample from a small group in our population
  – All subgroups are represented, allowing separate conclusions about each of them
• Disadvantages
  – Sampling error difficult to measure
  – Loss of precision if very small numbers are sampled in individual strata
STRATIFIED SAMPLING
Example:
Calculate the prevalence of tooth decay among 1200 children attending a school, with equal representation of males and females (sample size = 100)
• List all children attending the school
• Divide the children into two groups: 540 M and 660 F
• Assign each child a number
  Males: 1 to 540; Females: 1 to 660
• Randomly select 50 males and 50 females
Source: CDC
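The school example can be sketched as an independent simple random sample within each stratum. The sketch also computes each stratum's selection probability, which is equal within a stratum but different between strata. Because males are over-represented relative to their share of the school, a school-wide prevalence should weight each stratum's prevalence by its population share. The function names are hypothetical, and the stratum prevalences (30% and 20%) are invented purely to show the arithmetic.

```python
import random


def stratified_sample(strata_sizes, allocation, seed=None):
    """Simple random sample of a predetermined size drawn separately in each stratum.

    Units are numbered 1..size within each stratum, as in the example.
    """
    rng = random.Random(seed)
    return {
        stratum: sorted(rng.sample(range(1, size + 1), allocation[stratum]))
        for stratum, size in strata_sizes.items()
    }


def combined_prevalence(strata_sizes, stratum_prevalence):
    """Whole-population estimate: weight each stratum's prevalence by its population share."""
    total = sum(strata_sizes.values())
    return sum(strata_sizes[s] / total * stratum_prevalence[s] for s in strata_sizes)


strata = {"M": 540, "F": 660}
allocation = {"M": 50, "F": 50}  # equal representation, total sample size 100
sample = stratified_sample(strata, allocation, seed=1)

# Same probability within a stratum, different probabilities between strata
for s in strata:
    print(s, f"{allocation[s] / strata[s]:.3f}")
# M 0.093
# F 0.076

# Hypothetical stratum prevalences, for illustration only
print(round(combined_prevalence(strata, {"M": 0.30, "F": 0.20}), 3))  # 0.245
```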
CLUSTER SAMPLING
• Used when it is difficult or impossible to select a simple random sample
• The population to be studied is divided into natural, geographically distinct groups or “clusters” (such as schools, health centres, villages, or camps)
• A list of groupings of study units is available
• A large population can also be divided into a number of small units or clusters
CLUSTER SAMPLING
• Each cluster is assigned a number, and a sample of clusters is then randomly selected
• Once the clusters are selected:
  – all units within the selected clusters are included in the sample (called one-stage cluster sampling), OR
  – a random sample of units is taken within the selected clusters (called two-stage cluster sampling)
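Both variants share a first stage that randomly selects clusters. They differ only in what happens inside each selected cluster. A minimal sketch, assuming a hypothetical frame of 30 villages with 20 households each; all names are illustrative.

```python
import random


def cluster_sample(clusters, n_clusters, units_per_cluster=None, seed=None):
    """Randomly select clusters, then take all units (one-stage) or a random
    subsample of units within each selected cluster (two-stage).

    clusters: mapping of cluster name -> list of units in that cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: random clusters
    sample = {}
    for name in chosen:
        units = clusters[name]
        if units_per_cluster is None:
            sample[name] = list(units)  # one-stage: every unit in the cluster
        else:
            k = min(units_per_cluster, len(units))
            sample[name] = rng.sample(units, k)  # two-stage: random units within it
    return sample


# Hypothetical frame: 30 villages with 20 households each
villages = {f"village{i:02d}": [f"v{i:02d}-hh{j:02d}" for j in range(1, 21)] for i in range(1, 31)}

one_stage = cluster_sample(villages, n_clusters=5, seed=3)
two_stage = cluster_sample(villages, n_clusters=5, units_per_cluster=8, seed=3)
print(sum(len(u) for u in one_stage.values()))  # 100
print(sum(len(u) for u in two_stage.values()))  # 40
```

In the field, only the list of villages is needed to begin. Household lists are needed only for the five villages actually drawn.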
MULTISTAGE SAMPLING

• Carried out in phases
• Usually involves more than one sampling procedure
• Used in very large and diverse populations

Example:
Sampling unit = household
• 1st stage: drawing areas or blocks
• 2nd stage: drawing buildings, houses
• 3rd stage: drawing households
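The three-stage example can be sketched as a recursive draw over a nested frame of blocks, buildings and households. Each stage samples only within the units selected at the stage before. The frame sizes (10 blocks × 5 buildings × 6 households) and the number drawn per stage are hypothetical. For simplicity the sketch builds the whole frame up front, although in practice only the selected blocks and buildings would need listing.

```python
import random


def multistage_sample(frame, draws, seed=None):
    """Draw a simple random sample at each level of a nested frame.

    frame: nested dicts ending in lists, e.g. block -> building -> [households].
    draws: number of units to draw at each stage, e.g. (2, 3, 2).
    """
    rng = random.Random(seed)

    def draw(level, stage):
        if isinstance(level, dict):
            picked = rng.sample(sorted(level), min(draws[stage], len(level)))
            # Go one stage deeper, but only inside the units just selected
            return {name: draw(level[name], stage + 1) for name in picked}
        # Final stage: level is a list of households
        return rng.sample(level, min(draws[stage], len(level)))

    return draw(frame, 0)


# Hypothetical frame: 10 blocks x 5 buildings x 6 households
frame = {
    f"block{b}": {
        f"block{b}-bldg{g}": [f"block{b}-bldg{g}-hh{h}" for h in range(1, 7)]
        for g in range(1, 6)
    }
    for b in range(1, 11)
}

result = multistage_sample(frame, draws=(2, 3, 2), seed=7)
households = [hh for bldgs in result.values() for hhs in bldgs.values() for hh in hhs]
print(len(households))  # 12  (2 blocks x 3 buildings x 2 households)
```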
CLUSTER and MULTISTAGE SAMPLING
Advantages
• A sampling frame of individual units is not required for the whole population
• Less travel/resources required
• Sample is easier to select than a simple random sample
• Efficient for face-to-face interviews when units are dispersed over a large area
Disadvantage
• Loss of precision if units within clusters are homogeneous: estimates vary more from sample to sample than under simple random sampling of the same size (large design effect)
• Homogeneity can be taken into account in sample size calculations and analysis ("design effect")
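A widely used approximation (due to Kish) links the design effect to cluster homogeneity: DEFF = 1 + (m − 1) × ICC. Here m is the average number of units sampled per cluster, and ICC is the intracluster correlation, which measures how alike units in the same cluster are. Multiplying a simple-random-sample size by DEFF gives roughly the cluster-sample size needed for the same precision. The sketch below assumes this approximation; the values m = 20 and ICC = 0.05 are hypothetical.

```python
import math


def design_effect(avg_cluster_size, icc):
    """Kish's approximation for cluster samples: DEFF = 1 + (m - 1) * ICC.

    avg_cluster_size (m): average number of units sampled per cluster.
    icc: intracluster correlation, i.e. how alike units in the same cluster are
         (0 = no more alike than units picked at random).
    """
    return 1 + (avg_cluster_size - 1) * icc


def inflate_sample_size(n_srs, deff):
    """Cluster-sample size giving roughly the precision of a simple random sample of n_srs."""
    # round() strips floating-point noise so that e.g. 195.00000000000003 is not ceiled to 196
    return math.ceil(round(n_srs * deff, 9))


deff = design_effect(avg_cluster_size=20, icc=0.05)
print(round(deff, 2))                  # 1.95
print(inflate_sample_size(100, deff))  # 195
```

With ICC = 0 the design effect is 1 and no inflation is needed. The more homogeneous the clusters, the larger the sample must be.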
BIAS IN SAMPLING

• Bias in sampling is a systematic error in sampling procedures that leads to a distortion in the results of the study
• The study fails to collect a representative sample
• There are 3 broad categories of bias:
  – Selection bias
  – Measurement bias
  – Confounding bias
SELECTION BIAS
• Non-representative sample
• Selection of subjects leads to results that differ from what you would have obtained if you had enrolled the entire target population
• Subjects differ in aspects which may affect the outcome
  – e.g. volunteers for a study may be people who already exercise to prevent CHD
Example
• A research center is investigating a new weight-loss programme. Advertisements go out via social media seeking volunteers.
• Sampling bias:
  – Limited to people who use the social media site; individuals who enroll could be different from the overall population.
  – Volunteers are likely to be interested because they may be actively trying to lose weight.
  – Not a representative sample; may have characteristics very different from the population of interest.
MEASUREMENT BIAS
• Information collected for use as a study variable is inaccurate
• Systematic error that favors a particular result/outcome
• Inconsistent measurement techniques
  – The measurement process systematically over- or underestimates the true value (outcome variable)
  – Remedy: standardize machines, train interviewers
• Persons affected by the disease are more likely to remember their exposure (recall bias)
  – e.g. trauma
• e.g. survey interviewers asking about deaths were poorly trained and included deaths that occurred before the time period of interest
CONFOUNDING BIAS
• Extra variable we did not account for (confounder)
• Association between exposure and outcome distorted by another variable
• Accounts for some/all of the observed relationship between exposure and outcome
[Figure: Smoking (confounder) related to both Coffee (exposure) and CHD (outcome)]
BIAS DUE TO NONRESPONSE
• Systematic favoring of certain outcomes when individuals who chose to participate differ from those who chose not to participate
• To reduce it:
  – Pretest data collection tools
  – Follow up non-responders
  – Carry out a separate study of non-responders
  – Include additional people in the sample
• The bigger the non-response rate, the more important it is to take remedial action
• One must report the non-response rate and honestly discuss whether and how it might have influenced the results
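Two numbers from this slide are simple arithmetic: the non-response rate to be reported, and how many extra people to invite so that enough still respond. A minimal sketch; the function names and the figures (250 invited, 200 responding, an expected 80% response rate) are hypothetical. Inviting more people restores the number of respondents, but it does not remove bias if non-responders differ from responders.

```python
import math


def non_response_rate(n_invited, n_responded):
    """Proportion of selected people who did not take part (this is what gets reported)."""
    return (n_invited - n_responded) / n_invited


def invitations_needed(target_respondents, expected_response_rate):
    """Enlarge the sample so that, at the expected response rate, enough people still respond."""
    # round() strips floating-point noise before rounding up to a whole person
    return math.ceil(round(target_respondents / expected_response_rate, 9))


print(non_response_rate(250, 200))   # 0.2
print(invitations_needed(100, 0.8))  # 125
```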
SAMPLING ERROR

• No sample is the exact mirror image of the population
• The magnitude of the error can be measured in probability samples
  – Expressed by the standard error of the mean, proportion, differences, etc.
• Function of:
  – the amount of variability in measuring the factor of interest
  – sample size
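The standard errors of a mean (s/√n) and of a proportion (√(p(1 − p)/n)) show both ingredients listed above: more variability raises the error, and a larger sample lowers it. A minimal sketch, assuming simple random sampling from a large population. It applies no finite population correction and no design effect, and the data values are hypothetical.

```python
import math
import statistics


def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n), with s the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


print(round(se_mean([2, 4, 4, 4, 5, 5, 7, 9]), 3))  # 0.756
# Same proportion, four times the sample size -> half the standard error
print(round(se_proportion(0.2, 100), 3))  # 0.04
print(round(se_proportion(0.2, 400), 3))  # 0.02
```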
Sources:
• Designing Clinical Research – Stephen Hulley et al.
• EPIET Introductory course
• IDEA
