Preparatory Lecture: Confounding and Bias
Introduction
Most epidemiological studies measure disease frequency in two (or more) groups that differ only in the exposure of interest.
The two measures of disease frequency are combined into a single measure of association: risk or rate ratio, odds ratio, or risk or rate difference.
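A minimal Python sketch (not from the original lecture) of how two measures of disease frequency are combined into the measures of association named above. The function name `measures_of_association` and the a/b/c/d layout are illustrative assumptions; the counts are borrowed from the X-ray and breast cancer example later in these notes.

```python
def measures_of_association(a, b, c, d):
    """2x2 table: a = exposed cases, b = unexposed cases,
    c = exposed non-cases, d = unexposed non-cases."""
    risk_exposed = a / (a + c)      # disease frequency in the exposed group
    risk_unexposed = b / (b + d)    # disease frequency in the unexposed group
    return {
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }

result = measures_of_association(a=40, b=80, c=9960, d=39920)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because the disease is rare in this table, the odds ratio comes out very close to the risk ratio.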
The next step is to evaluate whether the result observed in the data is true, or whether it is false and there is an alternative explanation. This is the process of assessing the validity of a study result.
Random Error
[Figure: results of repeated surveys (e.g., Survey 2, Survey 3) compared with the real population value]
In sum
Sampling error
- Difference between survey result and population value due to random selection of the sample
- Greater with smaller sample sizes
- Induces lack of precision
Bias
- Difference between survey result and population value due to error in measurement, selection of a non-representative sample, or other factors
- Due to factors other than sample size
- Therefore, a large sample size cannot guarantee absence of bias
- Induces lack of accuracy, even with good precision
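To illustrate the contrast above, here is a small simulation sketch. The population mean, SD and the +5 instrument offset are hypothetical values chosen only for illustration. Larger surveys shrink the spread of survey results (sampling error), but the offset (bias) does not shrink with sample size.

```python
import random
import statistics

TRUE_MEAN = 120.0  # hypothetical population mean, e.g. systolic BP in mmHg
TRUE_SD = 15.0     # hypothetical population standard deviation
OFFSET = 5.0       # hypothetical systematic error added by a faulty instrument

def survey_means(n, reps, offset, rng):
    """Mean of each of `reps` simulated surveys of size n; `offset` is added
    to every measurement to mimic a systematic (biased) instrument."""
    return [
        statistics.fmean(rng.gauss(TRUE_MEAN, TRUE_SD) + offset for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
for n in (10, 1000):
    unbiased = survey_means(n, 200, 0.0, rng)
    biased = survey_means(n, 200, OFFSET, rng)
    print(
        f"n={n:4d}  spread of survey results (SD)={statistics.stdev(unbiased):.2f}  "
        f"average error: unbiased={statistics.fmean(unbiased) - TRUE_MEAN:+.2f}, "
        f"biased={statistics.fmean(biased) - TRUE_MEAN:+.2f}"
    )
```

The biased surveys stay about 5 units off at both sample sizes, which is the point made above: precision improves with size, accuracy does not.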
Definitions
ERROR:
1. A false or mistaken result obtained in a study or experiment.
2. Random error is the portion of variation in measurement that has no apparent connection to any other measurement or variable, generally regarded as due to chance.
3. Systematic error often has a recognizable source (e.g., a faulty measuring instrument) or pattern (e.g., it is consistently wrong in a particular direction).
(Last)
Bias
Deviation of results or inferences from the truth, or processes leading to such deviation. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.
(Last)
A process at any stage of inference tending to produce results that depart systematically from true values.
(Fletcher)
Bias
Bias is a systematic error in an epidemiologic study that results in an incorrect estimation of the association between exposure and outcome.
Errors in epidemiological studies
- Random errors
- Systematic errors (= bias): result in low validity of the epidemiological measure (the measure is not true)
  1. Selection bias
  2. Information bias
  3. Confounding
[Figure: error plotted against study size]
Estimation
When we measure an OR, we obtain a point estimate.
We will never know the true value.
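Because we only ever have a point estimate, it is usually reported with a confidence interval. The sketch below uses the logit (Woolf) approximation, one standard choice among several; the function name `odds_ratio_ci` is hypothetical, and the counts are, for illustration, the "true" dog-ownership table from the TBE example later in these notes.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR point estimate with an approximate 95% CI (Woolf / logit method).
    a, b = exposed / unexposed cases; c, d = exposed / unexposed controls."""
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper

or_hat, lower, upper = odds_ratio_ci(20, 20, 20, 60)
print(f"OR = {or_hat:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these small counts the interval is wide (roughly 1.3 to 6.7), a reminder that the point estimate alone says little about the true value.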
Classification of bias
There are three broad categories of bias:
- selection bias
- confounding
- measurement bias
Systematic error
- Does not decrease with increasing sample size
- Types: selection bias, information bias, confounding
Bias
- Systematic deviations in study findings from the truth
- Results from errors in the collection, analysis, interpretation, publication, or review of data
Selection Bias
Error due to systematic differences between the characteristics of the people selected for a study and those who are not.
Selection bias
Errors due to systematic differences in characteristics between those who are selected for study and those who are not.
(Last; Beaglehole)
Ascertainment Bias
Systematic failure to represent equally all classes of cases or persons supposed to be represented in a sample. This bias may arise because of the nature of the sources from which the persons come, e.g., a specialized clinic.
Case ascertainment
Who is your case? A patient? A deceased person?
Who is a non-case?
Information Bias (Observation Bias, Measurement Bias)
Measurement bias
Systematic error arising from inaccurate
measurements (or classification) of subjects
or study variables.
(Last)
Measurement / (mis)classification
Exposure misclassification occurs when exposed subjects are incorrectly classified as unexposed, or vice versa.
Disease misclassification occurs when diseased subjects are incorrectly classified as non-diseased, or vice versa.
(Norell)
Causes of misclassification
1. Measurement gap: the gap between the measured and the true value of a variable
- Observer / interviewer bias
- Recall bias
- Reporting bias
2. Gap between the theoretical and empirical definitions of exposure / disease
Empirical definition
Exposure: passive smoking; empirically, time spent with smokers (e.g., having smokers as roommates)
Disease: myocardial infarction; empirically, certain diagnostic criteria (chest pain, enzyme levels, signs on ECG)
Exposure misclassification
Non-differential
- Misclassification does not differ between cases and non-cases
- Generally leads to dilution of effect, i.e., bias towards RR = 1 (no association)
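Example: Non-differential Exposure Misclassification
Disease: breast cancer; exposure: X-ray exposure

True classification
                       X-ray +    X-ray -     Total
Breast cancer +             40         80       120
Breast cancer -          9,960     39,920    49,880
Total                   10,000     40,000    50,000
RR = (40/10,000) / (80/40,000) = 2

Non-differential misclassification (25% of truly unexposed subjects, cases and non-cases alike, classified as exposed)
                       X-ray +    X-ray -     Total
Breast cancer +             60         60       120
Breast cancer -         19,940     29,940    49,880
Total                   20,000     30,000    50,000
RR = (60/20,000) / (60/30,000) = 1.5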
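The X-ray example can be reproduced with a short sketch that moves the same fraction of truly unexposed subjects into the exposed column among cases and non-cases alike. The helper names are hypothetical. Sweeping the fraction shows the RR sliding towards 1, the "dilution" described above.

```python
def risk_ratio(cases_exp, cases_unexp, noncases_exp, noncases_unexp):
    risk_exp = cases_exp / (cases_exp + noncases_exp)
    risk_unexp = cases_unexp / (cases_unexp + noncases_unexp)
    return risk_exp / risk_unexp

def misclassify_unexposed_as_exposed(table, fraction):
    """Non-differential: the same `fraction` of truly unexposed subjects is
    labelled exposed among cases AND among non-cases."""
    cases_exp, cases_unexp, noncases_exp, noncases_unexp = table
    moved_cases = cases_unexp * fraction
    moved_noncases = noncases_unexp * fraction
    return (cases_exp + moved_cases, cases_unexp - moved_cases,
            noncases_exp + moved_noncases, noncases_unexp - moved_noncases)

true_table = (40, 80, 9960, 39920)  # X-ray example: cases, then non-cases
for fraction in (0.0, 0.1, 0.25, 0.5):
    observed = misclassify_unexposed_as_exposed(true_table, fraction)
    print(f"fraction misclassified {fraction:.2f}: RR = {risk_ratio(*observed):.2f}")
```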
An example of non-differential misclassification in an exposure variable
We want to compare mean blood pressure levels between cases and controls.
The blood pressure checker has a problem and always reads 5 mmHg higher than the true value.
All subjects were examined with the same blood pressure checker.
Therefore, there is no problem for internal comparison.
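A quick arithmetic check of why a constant offset does not matter for the internal comparison: adding 5 mmHg to everyone shifts both group means by 5, so their difference is unchanged. The readings below are hypothetical.

```python
from statistics import mean

# Hypothetical true systolic BP readings (mmHg)
cases_true = [150, 162, 171, 158]
controls_true = [128, 135, 141, 132]
OFFSET = 5  # the faulty checker reads 5 mmHg too high for everyone

cases_measured = [bp + OFFSET for bp in cases_true]
controls_measured = [bp + OFFSET for bp in controls_true]

true_difference = mean(cases_true) - mean(controls_true)
measured_difference = mean(cases_measured) - mean(controls_measured)
print(f"true difference: {true_difference}, measured difference: {measured_difference}")
```

Absolute levels are wrong by 5 mmHg, so the same instrument would not be adequate for comparing against an external reference value.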
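Example: Differential Exposure Misclassification
Disease: breast cancer; exposure: X-ray exposure

True classification
                       X-ray +    X-ray -     Total
Breast cancer +             40         80       120
Breast cancer -          9,960     39,920    49,880
Total                   10,000     40,000    50,000
RR = (40/10,000) / (80/40,000) = 2

Differential misclassification (cases classified correctly; 25% of truly unexposed non-cases classified as exposed)
                       X-ray +    X-ray -     Total
Breast cancer +             40         80       120
Breast cancer -         19,940     29,940    49,880
Total                   19,980     30,020    50,000
RR = (40/19,980) / (80/30,020) = 0.75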
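The same tables can be expressed through the sensitivity and specificity of the exposure measurement, allowed to differ between cases and non-cases. The sketch below assumes sensitivity 1.0 in both groups, specificity 1.0 in cases and 0.75 in non-cases, values chosen to reproduce the differential example. The function `classify_exposure` is a hypothetical helper.

```python
def classify_exposure(true_exposed, true_unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts in one group, given the
    sensitivity and specificity of the exposure measurement in that group."""
    observed_exposed = sensitivity * true_exposed + (1 - specificity) * true_unexposed
    observed_unexposed = (1 - sensitivity) * true_exposed + specificity * true_unexposed
    return observed_exposed, observed_unexposed

def risk_ratio(cases_exp, cases_unexp, noncases_exp, noncases_unexp):
    risk_exp = cases_exp / (cases_exp + noncases_exp)
    risk_unexp = cases_unexp / (cases_unexp + noncases_unexp)
    return risk_exp / risk_unexp

true_cases = (40, 80)          # (exposed, unexposed)
true_noncases = (9960, 39920)  # (exposed, unexposed)

# Differential: measurement quality differs between cases and non-cases
obs_cases = classify_exposure(*true_cases, sensitivity=1.0, specificity=1.0)
obs_noncases = classify_exposure(*true_noncases, sensitivity=1.0, specificity=0.75)

rr = risk_ratio(*obs_cases, *obs_noncases)
print(f"observed non-cases: {obs_noncases}, observed RR = {rr:.2f}")
```

Unlike the non-differential case, differential error can push the estimate in either direction; here it crosses below 1.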
Causes of Differential Exposure Misclassification
Recall bias: systematic error due to differences in accuracy or completeness of recall to memory of past events or experience.
For example, patients suffering from MI are more likely than controls to recall and report lack of exercise in the past.
Causes of Differential Exposure Misclassification
Measurement bias:
e.g., analysis of Hb by different methods (cyanmethemoglobin and Sahli's) in cases and controls.
e.g., biochemical analysis of the two groups in two different laboratories, which give consistently different results.
Causes of Differential Exposure Misclassification
Interviewer / observer bias: systematic error due to observer variation (failure of the observer to measure or identify a phenomenon correctly).
e.g., in patients with thromboembolism, the interviewer looks for a history of oral contraceptive pill (OCP) use more aggressively.
Confounding
1. A relationship between the effects of two or more causal factors as observed in a set of data such that it is not logically possible to separate the contribution that any single causal factor has made to an effect.
(Last)
Confounding
When another exposure exists in the study population (besides the one being studied) and is associated both with the disease and with the exposure being studied. If this extraneous factor, itself a determinant of or risk factor for the health outcome, is unequally distributed between the exposure subgroups, it can lead to confounding.
(Beaglehole)
Confounding
- Confounders are risk factors for the outcome.
- Confounders are related to the exposure of interest.
- Confounders are NOT on the causal pathway between the exposure and the outcome of interest.
[Diagram: EMF → down-regulation of pineal hormone → outcome (causation?)]
Examples of confounding
Exposure: smoking; outcome: lung cancer; confounder: age (as age advances, the chances of lung cancer increase)
Examples of confounding
Exposure: coffee drinking; outcome: heart disease; confounder: smoking (smoking increases the risk of heart disease)
Examples of confounding
Exposure: alcohol intake; outcome: myocardial infarction; confounder: sex
Biases operating (in a study of exercise and coronary heart disease)
- Selection: volunteers might have had an initially lower risk (e.g., lower lipids).
- Measurement: the exercise group had a better chance of having a coronary event detected, since its members were likely to be examined more frequently.
- Confounding: if the exercise group smoked fewer cigarettes (smoking is a known risk factor for CHD).
Randomization
The only way to equalize all extraneous factors, or "everything else", is to assign patients to groups randomly, so that each patient has an equal chance of falling into the exposed or unexposed group.
- Equalizes even those factors which we might not know about!
- But it is not always possible.
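A small simulation sketch of random allocation, using a hypothetical cohort: a measured factor (smoking) and an "unmeasured" trait both end up in similar proportions in the two groups, without the investigator doing anything about either.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical cohort of 2,000 patients: smoking is a measured extraneous factor;
# "unmeasured_trait" stands for a factor the investigators do not know about.
patients = [
    {"smoker": rng.random() < 0.30, "unmeasured_trait": rng.random() < 0.10}
    for _ in range(2000)
]

# Random allocation: shuffle, then split in half, so every patient has an
# equal chance of landing in either group.
rng.shuffle(patients)
exposed, unexposed = patients[:1000], patients[1000:]

def proportion(group, factor):
    """Share of patients in `group` who have `factor`."""
    return sum(patient[factor] for patient in group) / len(group)

for factor in ("smoker", "unmeasured_trait"):
    print(f"{factor:17s} exposed={proportion(exposed, factor):.3f} "
          f"unexposed={proportion(unexposed, factor):.3f}")
```

Balance is only expected on average; in small trials chance imbalances remain possible.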
Restriction
Subjects chosen for study are
restricted to only those possessing
a narrow range of characteristics,
to equalize important extraneous
factors
Example: restriction
Study: effect of age on the prognosis of MI
Restriction: male / white / uncomplicated anterior wall MI
Important extraneous factors controlled for: sex / race / severity of disease
Limitation: results not generalizable to females, non-white people, or those with complicated MI
Matching - definition
The process of making a study group
and a comparison group comparable
with respect to extraneous factors
(Last)
Types of Matching
- Caliper matching: the process of matching a comparison group to a study group within a specific distance for a continuous variable (e.g., matching age to within 2 years)
- Frequency matching: the frequency distributions of the matched variable(s) are made similar in the study and comparison groups
- Category matching: matching the groups in broad classes, such as relatively wide age ranges or occupational groups
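Caliper matching can be sketched as a greedy search: each case, in turn, takes the closest unused control within the caliper. This is one simple variant, not the only way to do it; the IDs and ages are hypothetical, and the 2-year caliper follows the example above.

```python
def caliper_match(cases, controls, caliper):
    """Greedy 1:1 caliper matching on a continuous variable (here age).

    Each case, in the order given, takes the closest still-unused control whose
    age is within `caliper` years; ties go to the alphabetically first control ID.
    Cases with no eligible control are mapped to None."""
    available = dict(controls)
    matches = {}
    for case_id, case_age in cases.items():
        eligible = [
            (abs(age - case_age), control_id)
            for control_id, age in available.items()
            if abs(age - case_age) <= caliper
        ]
        if eligible:
            _, best = min(eligible)
            matches[case_id] = best
            del available[best]  # each control is used only once
        else:
            matches[case_id] = None
    return matches

cases = {"case1": 34, "case2": 51, "case3": 67, "case4": 80}
controls = {"ctrlA": 50, "ctrlB": 35, "ctrlC": 70, "ctrlD": 66, "ctrlE": 52}
print(caliper_match(cases, controls, caliper=2))
```

The unmatched case illustrates the practical limitation listed next: the stricter the matching criteria, the harder it is to find suitable controls.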
Limitations:
- Controls bias only for those factors involved in the match.
- It is usually not possible to match for more than a few factors, because of the practical difficulties of finding patients who meet all matching criteria.
- If categories for matching are relatively crude, there may be room for substantial differences between matched groups.
Stratification
The process of or the result of separating
a sample into several sub-samples
according to specified criteria such as age
groups, socio-economic status etc.
(Last)
Example of confounding
Crude
                 Alcohol +   Alcohol -
MI +                140         100
Total            30,000      30,000
RR = (140/30,000) / (100/30,000) = 1.4

Stratified by sex
                 Alcohol +            Alcohol -
                 Males   Females      Males   Females
MI +              120       20          60       40
Total          20,000   10,000      10,000   20,000
RR (males)   = (120/20,000) / (60/10,000) = 1
RR (females) = (20/10,000) / (40/20,000) = 1
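The stratified analysis above can be summarized by a single adjusted estimate. The sketch below uses the Mantel-Haenszel pooled risk ratio, a standard method for this, though not one named in the lecture. The crude RR of 1.4 disappears once sex is held constant.

```python
def risk_ratio(a, n1, c, n0):
    """a/n1 = risk in the exposed, c/n0 = risk in the unexposed."""
    return (a / n1) / (c / n0)

def mantel_haenszel_rr(strata):
    """Pooled RR: sum(a*n0/N) / sum(c*n1/N) over strata, N = n1 + n0."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator

# (MI in drinkers, drinkers, MI in non-drinkers, non-drinkers)
strata = {"males": (120, 20000, 60, 10000), "females": (20, 10000, 40, 20000)}
crude = tuple(sum(values) for values in zip(*strata.values()))  # (140, 30000, 100, 30000)

print(f"crude RR: {risk_ratio(*crude):.2f}")
for sex, counts in strata.items():
    print(f"RR among {sex}: {risk_ratio(*counts):.2f}")
print(f"Mantel-Haenszel RR: {mantel_haenszel_rr(strata.values()):.2f}")
```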
Standardization
A set of techniques used to remove as far
as possible the effects of differences in
age or other confounding variables when
comparing two or more populations
The method uses weighted averaging of
rates specific for age, sex, or some other
potentially confounding variable(s),
according to some specified distribution
of these variables
(Last)
Example: direct standardization
HOSPITAL A
Preop risk    Patients   Deaths   Mortality (%)
High             500        30        6
Medium           400        16        4
Low              300         2        0.67
Total           1200        48        4
HOSPITAL Std
Preop
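A sketch of the direct method using Hospital A's stratum-specific mortality from the example. The standard population used in the original slide is not recoverable, so the one below is HYPOTHETICAL and chosen only to show the mechanics: each stratum rate is applied to the standard population, and the expected deaths are divided by its size.

```python
# Hospital A, by pre-operative risk: (patients, deaths)
hospital_a = {"high": (500, 30), "medium": (400, 16), "low": (300, 2)}

# HYPOTHETICAL standard population (the original standard is not recoverable)
standard_population = {"high": 300, "medium": 400, "low": 500}

def crude_rate(observed):
    deaths = sum(d for _, d in observed.values())
    patients = sum(p for p, _ in observed.values())
    return deaths / patients

def directly_standardized_rate(observed, standard):
    """Apply each stratum-specific death rate to the standard population and
    divide the expected deaths by the standard population's size."""
    expected_deaths = sum(
        (deaths / patients) * standard[stratum]
        for stratum, (patients, deaths) in observed.items()
    )
    return expected_deaths / sum(standard.values())

print(f"crude mortality, hospital A:        {crude_rate(hospital_a):.2%}")
print(f"standardized mortality, hospital A: "
      f"{directly_standardized_rate(hospital_a, standard_population):.2%}")
```

Because this hypothetical standard has fewer high-risk patients than Hospital A, the standardized rate falls below the crude 4%. Comparing two hospitals only makes sense when both are standardized to the same distribution.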
Multivariate adjustment
Simultaneously controlling the effects of many variables to determine the independent effect of one.
Can select, from a large number of variables, a smaller subset that independently and significantly contributes to the overall variation in outcome, and can arrange variables in order of the strength of their contribution.
The only feasible way to deal with many variables at one time during the analysis phase.
Examples: multivariate adjustment
CHD is the joint result of lipid abnormalities, hypertension (HT), smoking, family history, diabetes mellitus (DM), exercise and personality type.
Start with 2x2 tables using one variable at a time.
Contingency tables, i.e., stratified analyses, examine whether the effect of one variable changes in the presence or absence of one or more other variables.
1. Blinding of the:
- subject
- observer / interviewer
- analyser
2. Strict (standard) definitions for exposure / disease / outcome
3. Equal efforts to discover events in all the groups
Controlling confounding
Similar to controlling for selection bias: use randomization, restriction, matching, stratification, standardization, multivariate analysis, etc.
EXAMPLE OF RANDOM ERROR
By chance, there are more episodes of gastroenteritis in the bottle-fed group in the study sample.
Or, also by chance, no difference in risk is found.
EXAMPLE OF RANDOM MISCLASSIFICATION
Lack of good information on feeding history results in some breastfeeding mothers being randomly classified as bottle-feeding, and vice versa.
EXAMPLE OF BIAS
The medical records of bottle-fed babies alone are less complete than those of breast-fed babies (perhaps bottle-fed babies go to the doctor less), and thus record fewer episodes of gastroenteritis in that group only.
This is called bias because the observation itself is in error.
EXAMPLE OF CONFOUNDING
The mothers of breast-fed babies are of higher social class, and the babies thus have better hygiene, less crowding and perhaps other factors that protect against gastroenteritis. Crowding and hygiene are truly protective against gastroenteritis, but we mistakenly attribute their effects to breastfeeding. This is called confounding because the observation is correct, but its explanation is wrong.
Prevention of Bias
- Sampling
- Sample size
- Study design
- Sources of data
- Methods of data collection
Selection bias
Error because the exposure-disease association is different for participants and non-participants in the study.
Arises from:
- errors in the procedures to select participants
- factors that influence participation
Non-response
Non-response occurs when certain questions in a survey are not answered by a respondent.
Non-response also takes place when a randomly sampled individual cannot be contacted or refuses to participate in a survey.
Information bias
Error because the measurement of exposure or disease is different between the comparison groups.
Arises from errors in the:
- procedures to measure exposure
- procedures to diagnose disease
Examples of information bias
- Diagnostic bias
- Recall bias
- Researcher influence
Measurement bias
Inaccurate measurement of study variables can lead to bias.
Sources of inaccurate measurement:
- Subject error: error within the individual for any reason, e.g., imperfect recall of past exposures
- Instrument error: e.g., equipment not properly calibrated, wording of a question
- Observer error: error in the use of the instrument or in recording
Misclassification (TBE cases and controls, by dog ownership)

True
             TBE cases   Controls
Dog              20          20
No dog           20          60
OR = ad/bc = 3.0

Differential
             TBE cases   Controls
Dog              24          20
No dog           16          60
OR = ad/bc = 4.5

Non-differential
             TBE cases   Controls
Dog              24          28
No dog           16          52
OR = ad/bc = 2.8
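A short check of the three odds ratios above, using OR = ad/bc with a, b = dog owners among cases / controls and c, d = non-owners among cases / controls. The function name is illustrative.

```python
def odds_ratio(cases_dog, controls_dog, cases_no_dog, controls_no_dog):
    """OR = ad / bc with a, b = dog owners among cases / controls and
    c, d = non-owners among cases / controls."""
    return (cases_dog * controls_no_dog) / (controls_dog * cases_no_dog)

tables = {
    "true": (20, 20, 20, 60),
    "differential": (24, 20, 16, 60),
    "non-differential": (24, 28, 16, 52),
}
for label, counts in tables.items():
    print(f"{label:17s} OR = {odds_ratio(*counts):.1f}")
```

Misclassifying only cases inflates the OR (4.5), while misclassifying both groups pulls it towards 1 (2.8).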
Non-differential misclassification
Same degree of misclassification in both cases and controls.
The OR will be underestimated: the true value is higher.
Clear definitions
Good measuring methods
Blinding
Standardised procedures
Quality control
Minimising measurement bias
1. Use valid, reliable tools to measure all study subjects
2. Train staff and monitor their use of research tools
3. Carry out regular quality checks of research tools
4. Blind study subjects and assessors
5. Keep subjects in a case-control study unaware of the study hypothesis
6. Consider a sub-study to determine the validity and reliability of measurements
Confounding
It occurs when there is a confounder, which is independently associated with both exposure and disease.
[Diagram: Confounder linked to both Exposure and Disease]
Confounding
Defined as a situation in which the measure of the effect of an exposure on disease is distorted because of the association of the study factor with other factors that influence the outcome.
These other factors are called confounders.
Confounding: Example
Exposure: alcohol; outcome: lung cancer; confounder: smoking
Confounding: example
                 Lung cancer   No lung cancer
Drinker               50              50
Non-drinker           50             150
Total                100             200

Confounding: example (stratified by smoking)
Smokers
                 Lung cancer   No lung cancer
Drinker               45              15
Non-drinker           30              10
Total                 75              25
Among smokers, 45/75 = 60% of lung cancer cases drink, and 15/25 = 60% of controls drink.

Non-smokers
                 Lung cancer   No lung cancer
Drinker                5              35
Non-drinker           20             140
Total                 25             175
Among non-smokers, 5/25 = 20% of lung cancer cases drink, and 35/175 = 20% of controls drink.
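The same conclusion can be read off as odds ratios. This sketch collapses the two smoking strata back into the crude table and compares the crude OR with the stratum-specific ORs. It also shows the smoking-drinking association among controls, which is what makes smoking a confounder here.

```python
def odds_ratio(case_exp, case_unexp, control_exp, control_unexp):
    return (case_exp * control_unexp) / (case_unexp * control_exp)

# Counts: (lung cancer & drinker, lung cancer & non-drinker,
#          no lung cancer & drinker, no lung cancer & non-drinker)
strata = {"smokers": (45, 30, 15, 10), "non-smokers": (5, 20, 35, 140)}
crude = tuple(sum(counts) for counts in zip(*strata.values()))  # collapse over smoking

print(f"crude OR (drinking and lung cancer): {odds_ratio(*crude):.1f}")
for name, counts in strata.items():
    print(f"OR among {name}: {odds_ratio(*counts):.1f}")

# Smoking vs drinking among controls: share of controls who drink, per stratum
for name, (_, _, ctrl_drink, ctrl_no_drink) in strata.items():
    print(f"{name}: {ctrl_drink / (ctrl_drink + ctrl_no_drink):.0%} of controls drink")
```

The crude OR of 3.0 vanishes (OR = 1.0) within each smoking stratum.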
An Example
Maternal coffee consumption during pregnancy

All mothers (crude)
                Low birth weight   Normal birth weight
Coffee                 170                  96
No coffee               90                  88

Smokers
                Low birth weight   Normal birth weight
Coffee                 160                  16
No coffee               80                   8

Non-smokers
                Low birth weight   Normal birth weight
Coffee                  10                  80
No coffee               10                  80

Evidence of Confounding
ORcrude = 1.73
ORsmokers = 1.00
ORnon-smokers = 1.00
The association between coffee consumption and having a low birth weight baby is confounded by smoking. This is demonstrated by the lack of effect in each stratum.
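A sketch reproducing the three ORs above and pooling the strata with the Mantel-Haenszel odds ratio, a standard summary for stratified case-control data that is not itself named on the slide. The smokers' "no coffee, normal birth weight" count is taken as 88 - 80 = 8, the only value consistent with the crude table.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled OR across strata: sum(a*d/n) / sum(b*c/n), n = stratum size."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# (coffee & LBW, coffee & normal weight, no coffee & LBW, no coffee & normal weight)
smokers = (160, 16, 80, 8)
non_smokers = (10, 80, 10, 80)
crude = tuple(s + n for s, n in zip(smokers, non_smokers))  # collapse over smoking

print(f"crude OR:           {odds_ratio(*crude):.2f}")
print(f"OR smokers:         {odds_ratio(*smokers):.2f}")
print(f"OR non-smokers:     {odds_ratio(*non_smokers):.2f}")
print(f"Mantel-Haenszel OR: {mantel_haenszel_or([smokers, non_smokers]):.2f}")
```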
Male gender and malaria (crude data)
              Males   Females   Total
Cases            88        62     150
Controls         68        82     150

Question: Is male gender causally related to the risk of malaria?
- Yes
- No
- Further study is needed
[Diagram: male gender → malaria (?), with outdoor occupation as a possible confounder]
Malaria
              Outdoor, N (%)   Indoor, N   Total, N (%)
Males            68 (43.6)          88      156 (100)
Females          13 (9.0)          131      144 (100)
OR = 7.8
Question: Is outdoor occupation associated with male gender?
- Yes
- No
Second criterion: Is the putative confounder associated with the outcome (case-control status)?
[Diagram: male gender, outdoor occupation and malaria, with the links to malaria marked "?"]
Malaria
              Outdoor, N (%)   Indoor, N   Total, N (%)
Cases            63 (42.0)          87      150 (100)
Controls         18 (12.0)         132      150 (100)
OR = 5.3
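The two confounder criteria in the malaria example boil down to two odds ratios, computed below alongside the crude exposure-outcome OR. The gender-by-case-status counts (cases 88 males / 62 females, controls 68 / 82) are the ones that appear in these notes; they reproduce the crude OR of 1.71 and the totals of 156 males and 144 females. The function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR = ad / bc for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Criterion 1: putative confounder (outdoor occupation) vs exposure (male gender)
# rows: males, females; columns: outdoor, indoor
or_gender_outdoor = odds_ratio(68, 88, 13, 131)

# Criterion 2: putative confounder vs outcome (case-control status)
# rows: cases, controls; columns: outdoor, indoor
or_outdoor_malaria = odds_ratio(63, 87, 18, 132)

# Crude exposure-outcome association
# rows: cases, controls; columns: males, females
or_gender_malaria = odds_ratio(88, 62, 68, 82)

print(f"male gender vs outdoor occupation: OR = {or_gender_outdoor:.1f}")
print(f"outdoor occupation vs malaria:     OR = {or_outdoor_malaria:.1f}")
print(f"male gender vs malaria (crude):    OR = {or_gender_malaria:.2f}")
```

Both criteria are met, so the crude gender-malaria association may well be confounded by occupation.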
Question: Is outdoor occupation (or something for which this variable is a marker, e.g., exposure to mosquitoes) causally related to malaria?
- Yes
- No
[Diagram: male gender and malaria, annotated "Yes, it could be" and "Probably not"]
Question
Provided that:
the crude association between male gender and malaria is OR = 1.71,
and
Controlling confounding
In the design:
- Restriction of the study
- Matching
In the analysis:
- Restriction of the analysis
- Stratification
- Multivariable regression
Strategy: Specification (restriction)
Example: include only non-smokers.
Advantages: easily understood.
Disadvantages: limits generalizability; may limit sample size.

Strategy: Matching
Example: match smoking status of cases and controls.

Strategy: Stratification
Advantages: easily understood; reversible.
Disadvantages: may be limited by sample size for each stratum; difficult to control for multiple confounders.

Strategy: Statistical adjustment
Example: conduct multivariate analysis controlling (adjusting) for smoking status.
Advantages: multiple confounders can be controlled; reversible.
Disadvantages: need advanced statistical techniques; results may be difficult to understand.
Restriction
We study only mothers of a certain age.
[Diagram: many children → Down syndrome, studied only in 35-year-old mothers]
Matching
Selection of controls to be identical to the cases with respect to the distribution of one or more potential confounders.
[Diagram: many children → Down syndrome, with maternal age as the matched confounder]
Multivariable regression
Analyse the data in a statistical model that includes both the presumed cause and possible confounders.
Measure the odds ratio (OR) for each of the exposures, independent of the others.
Logistic regression is the most common model in epidemiology.
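To show what "independent of the others" means, here is a self-contained logistic regression sketch fitted by Newton-Raphson in plain Python. In practice a statistics package would be used; this hand-written fitter is only for illustration. The data are the coffee, smoking and low-birth-weight counts from the earlier example, with the smokers' no-coffee/normal-weight cell taken as 88 - 80 = 8. The model is logit(p) = b0 + b1*coffee + b2*smoker, so exp(b1) is the coffee OR adjusted for smoking.

```python
import math

# Grouped data: (coffee, smoker, low-birth-weight babies, normal-weight babies)
CELLS = [(1, 1, 160, 16), (0, 1, 80, 8), (1, 0, 10, 80), (0, 0, 10, 80)]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(cells, iterations=30):
    """Newton-Raphson maximum likelihood for logit(p) = b0 + b1*coffee + b2*smoker."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3                       # gradient of the log-likelihood
        info = [[0.0] * 3 for _ in range(3)]    # information matrix X'WX
        for coffee, smoker, cases, noncases in cells:
            x = (1.0, coffee, smoker)
            n = cases + noncases
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(3):
                score[i] += (cases - n * p) * x[i]
                for j in range(3):
                    info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

b0, b_coffee, b_smoking = fit_logistic(CELLS)
print(f"adjusted OR, coffee:  {math.exp(b_coffee):.2f}")
print(f"adjusted OR, smoking: {math.exp(b_smoking):.2f}")
```

The adjusted coffee OR comes out at 1, matching the stratified analysis, while smoking carries the whole association.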
Example: miners' exposure and lung cancer
Example: selection bias and confounding
Confounding variables
In our study of miners:
1. Smoking is an independent risk factor (cause) of the disease (lung cancer).
2. More underground miners smoke, i.e., smoking is unevenly distributed among the exposed and non-exposed.
3. Smoking is not on the causal pathway between exposure and disease.
A confounding factor is one that is associated with both the exposure and the disease (i.e., it has an association with both the disease and the risk factor under study) and that may distort the relationship between the two and confound (confuse) the study results.
Confounding
[Diagram: Exposure → Outcome, with a third variable related to both]
Confounding
Exposure: coffee; outcome: CHD; confounder: smoking
Smoking is correlated with coffee drinking and is a risk factor for CHD even for those who do not drink coffee.
Confounding factor:
Apparent conclusion: drinking coffee causes CHD.
However, drinking coffee may not be the cause of CHD; rather, smokers are also coffee drinkers.
Confounding
- Risk factor (independent variable): coffee
- Disease (dependent variable): CHD
- Covariable (confounder): smoking
Example:
In a study of the association between tobacco smoking and lung cancer, age would be a confounding factor if the average ages of the non-smoking and smoking groups in the study population were very different, since lung cancer incidence increases with age.
Another example:
The possible association between meat consumption and colon cancer may be due to other accompanying factors, such as decreased intake of vegetables or increased intake of fat, rather than to meat consumption itself.
Problem
The annual report of POF Hospital for the year 2006 shows 200 cases of myocardial infarction, 35 cases of cholecystitis, 105 cases of pneumonia and 350 cases of acute gastroenteritis. The results of this report cannot be generalized to the total population of Faisalabad on account of:
a. Confounding bias
b. Memory bias
c. Selection bias
d. Berkson's bias
e. Interviewer bias
Key: d
A study was done to compare the lung capacity of coal miners to the lung capacity of farm workers. The researcher studied 200 workers of each type. Other factors that might affect lung capacity are smoking habits and exercise habits. The smoking habits of the two worker types are similar, but the coal miners generally exercise less than the farm workers.
1. Which of the following is the explanatory variable in this study?
a. Exercise
b. Lung capacity
c. Smoking or not
d. Occupation
2. Which of the following is a confounding variable in this study?
a. Exercise
b. Lung capacity
c. Smoking or not
d. Occupation