Core Concepts in Statistics and Research Methods.
doi: 10.1016/j.bjae.2024.11.006
Advance Access Publication Date: 15 January 2025
experimental clinical research designs (including the randomised trial) will follow.

Clinical research design
Clinical scientific endeavour can be divided into primary research, which reports and interprets new data directly produced by undertaking the research; and secondary research, which gathers and analyses data previously reported, often by other researchers, to generate new interpretations. An example of secondary research is a systematic review, with or without meta-analysis. Secondary research methods and interpretation will be considered in a forthcoming article in this journal.
Clinical studies generating new data, that is to say primary research, can be categorised as observational or experimental. In an observational study, researchers do not assign participants to an exposure, intervention, or treatment as part of the research protocol, whereas in experimental research participants are allocated to one or more interventions by the research team. In both observational and experimental research we often wish to study a specific outcome (the ‘dependent variable’) and its relationship to the exposure or treatment (the ‘independent variable’). One type of experimental study, commonly conducted in clinical research, is the randomised trial. The term ‘clinical trial’ is often used to mean research with an experimental, rather than observational, design. This nomenclature is used in this and a future article in this series on randomised trials.
Another way to classify clinical research is according to the intended clinical question. Much research is devoted to assessing the impact of treatment on ill health, but investigations of disease prevention and diagnostic testing, and studies to clarify genetic or epidemiological uncertainty, are also common. A general taxonomy of primary clinical research is shown in Figure 1. Importantly, data generated from research may be either quantitative (i.e. numerical) or qualitative (i.e. narrative). Many research studies collect both forms of data to report a mixed methods analysis, which draws on the strengths of each type of information.

Ethical principles of clinical research
The ethical principles underpinning clinical practice, namely respect for individual autonomy, beneficence, non-maleficence and justice, are also fundamental to the design and conduct of good research and medical statistics. The principles of ethical human research were first outlined in the 1947 Nuremberg Code, subsequently organised in the 1964 Declaration of Helsinki (most recently updated in 2013) and implemented in European and American jurisdictions by Good Clinical Practice regulations.2–4 Although arrangements of ethical approval of clinical research vary by country, the principle that all research involving human participation or personal data (whether observational or experimental) requires careful consideration of ethical implications before it commences is universal. Such scrutiny should be both internal (i.e. ethical risks posed by the study considered by the researchers themselves) and external (e.g. by a specialised research ethics committee otherwise independent of the study and investigators). In the UK, ethical scrutiny of clinical research connected with National Health Service (NHS) patients is provided by a network of more than 80 multidisciplinary NHS Research Ethics Committees (RECs), organised by the NHS Health Research Authority and obligated to provide opinion within a statutory 60-day timeline. Ethical review of medical research falling outside the remit of NHS RECs (i.e. not connected in any way with NHS patients) is usually undertaken by University or other institution-specific ethics committees. Poorly conducted studies, for example with imprecise aims or hypotheses, inappropriate designs, defective data management leading to data falsification, badly applied statistical methods etc, are themselves unethical because they waste resources and may lead to incorrect conclusions harmfully influencing clinical practice or policy. It is therefore an ethical imperative that research is conducted to a satisfactory standard.

Observational study design
Observational studies intend to provide useful data on the characteristics of samples of participants from a population.
[Fig 1 Taxonomy of primary clinical research. Clinical research is divided into observational (no allocation to an intervention) and experimental (allocation to one or more interventions).]
By virtue of their lack of assignment of participants to an intervention, observational studies are inherently unable to draw conclusive causative inferences on treatment-outcome relationships. The volume and quality of routinely collected health data has increased in recent decades (in part because of electronic data storage), which invites observational analyses on available data seeking answers to important research questions.
Observational studies may have several advantages over experimental research, such as the randomised trial. Observational studies commonly describe much larger samples than randomised trials and so can usefully explore rare outcomes. Their non-interventional nature means they can be less expensive and less time-consuming to conduct than randomised trials. Certain research questions may be fundamentally unethical to answer by an interventional study design, or involve variables (such as inherited traits) that are impossible to manipulate, and so observational research is sometimes the only reasonable option to advance science on certain topics.
A summary of common quantitative observational study designs in anaesthetic research is found in Table 1. Although it has been proposed that prospectively conducted cohort studies should be ranked highest in the hierarchy of observational evidence, followed by retrospective cohort studies, case-control studies, cross-sectional studies then case series, in practice it is more important to acknowledge that different designs may be best suited to answering different research questions, and to appraise the specific methodological and analytical strengths and weaknesses of any individual study before determining its place in the wider literature.9

Comparison of case-control and cohort studies
The most prevalent designs in observational clinical research are case-control and cohort studies, although these labels are frequently misapplied. The important difference between the two methodologies is that case-control studies enrol participants based on their outcome status (with or without a disease or outcome) and seek exposure risk factors, whereas a cohort design enrols on exposure status (with or without a risk factor) and describes one or more subsequent outcomes. As a consequence, rare outcomes are frequently examined via case-control design but, conversely, rare exposures are best examined by a cohort study. Both designs can be subject to prospective or retrospective data aggregation, although cohort studies are more frequently prospective and case-
Table 1 Common quantitative observational study designs in anaesthetic and critical care research with illustrative examples of each study type.

Case report or series
Description: Description of one or more clinical vignettes from a real-world setting.
Strengths and uses: Useful to illustrate novel and unusual features in patients in the context of existing knowledge on a topic, which may warrant subsequent further study by other methodologies.
Limitations: No control comparison, limited generalisability.
Example: Alexeev M, and colleagues. Carbon dioxide embolism during posterior retroperitoneal adrenalectomy.5

Cross-sectional
Description: A description of outcome, exposure data, or both from individuals collected at a single time point. A ‘snapshot’ of a population at a specific time.
Strengths and uses: Commonly deployed to determine prevalence. Describes routinely collected health data, research-specific data, or both as a population-based description of current practice.
Limitations: Cannot determine causal relationships between exposure and outcome.
Example: Walker EMK, and colleagues. Patient reported outcome of adult perioperative anaesthesia in the United Kingdom: a cross-sectional observational study.6

Cohort
Description: Participants are included based on their exposure status (‘exposed’ or ‘unexposed’) and either retrospective or prospective longitudinal data described.
Strengths and uses: Can estimate the incidence of multiple outcomes and risk factors linked by temporality (i.e. participants at the time of enrolment had the exposure but not the outcome).
Limitations: Follow-up of participants over time (longitudinally) can be expensive, time-consuming and incur data loss. Selection bias must be minimised by ensuring exposed and unexposed participants are similar in all important regards except for their exposure status.
Example: Robba C, and colleagues. Intracranial pressure monitoring in patients with acute brain injury in the intensive care unit (SYNAPSE-ICU): an international, prospective observational cohort study.7

Case-control
Description: Participants are defined by their outcome status, for example presence of a condition or disease (‘cases’) that has already occurred and compared with subjects without the outcome (‘controls’) to seek exposure risk factors.
Strengths and uses: Can suggest associative relationships but not establish causation. Retrospective collection of exposure data is usually less time-consuming than prospective study, especially for rare outcomes.
Limitations: Choice of control group vitally important to prevent selection bias as controls must be similar to cases except for the outcome in question. Cannot provide incidence rates.
Example: Rujirojindakul P, and colleagues. Risk factors for reintubation in the post-anaesthetic care unit: a case–control study.8
control studies more frequently retrospective. Whether a cohort study is designated ‘prospective’ or ‘retrospective’ is most straightforwardly determined by the manner in which exposure data are gathered in relation to the subject outcome. In a prospective cohort subjects are enrolled and exposure data collected before subjects develop the outcome of interest. In a retrospective cohort enrolment and exposure data are established after subjects have already developed the outcome. Sometimes studies labelled as ‘retrospective’ (e.g. research using previously collected large registries or repositories of health data) could in fact be designated ‘prospective’ in that such exposure data were collected from subjects before an outcome of interest occurred and before the research was conceived.
Case-control studies often report the proportion of subjects with (‘cases’) and without (‘controls’) an outcome who experienced one or more exposure variables. This relationship is best expressed as an odds ratio (OR) with 95% confidence interval (95% CI). Case-control studies cannot establish the prevalence or the incidence of outcomes. In contrast, a cohort study can report the incidence of an outcome in an exposed group and compare this with the outcome incidence in a non-exposed group, producing a relative risk (RR) with 95% CI. Means of calculating OR and RR have been detailed in a previous article in this series.10

Association, correlation and causation
The terms ‘association’, ‘correlation’ and ‘causation’ have distinct meanings in clinical research.5,11 Association is the most general term and means only that one variable provides information about another variable. An associative relationship does not need to be linear. Correlation describes an increasing or decreasing linear trend between variables and thus implies association. Correlation between linearly related, normally distributed, variables can be quantified, in terms of direction and strength, by a Pearson correlation coefficient. Non-normally distributed data are better assessed by the Spearman rank correlation coefficient, which requires no assumptions of normality. Correlation coefficients, which range from -1 (perfect negative correlation) to +1 (perfect positive correlation), should always be reported with a 95% CI to provide bounds of certainty in the reported results. Correlation does not predict how one variable causes another one to change and correlation is not causation.12 Causation occurs if one variable relies on another variable for its value.13 Causation requires an association between variables but does not imply correlation. Using obstructive sleep apnoea (OSA) as an example, we can say OSA is associated with age because there is a non-linear (bimodal) relationship between presence of OSA and age in years. We may also state that OSA severity is correlated with body mass index, and that OSA causes postoperative pulmonary complications.
Causality is usually assessed by the experimental randomised trial. However, causal inference can be speculated from observational data using a variety of possible criteria and statistical techniques reviewed elsewhere.14–16 Such causal speculation from observational evidence may be subsequently sustained or overruled by later experimental studies.

a distortion of the true effect of any given exposure factor on a specific outcome.17 Bias undermines the internal validity of observational research by impeding the reader’s confidence that the study has measured what it aimed to measure.18 Bias can also affect external validity if the reported results cannot be usefully generalised from the studied sample to the wider population. Biases may increase or decrease the sample data RR or OR reported in studies, the direction of change depending on the specific bias under examination, and this direction may not be predictable in advance. It is not unusual for observational analyses examining identical clinical questions to produce discordant results. Such discordance may arise from how each analysis has mitigated potential biases. An example from the recent anaesthesia literature is the comparative benefit of sugammadex vs neostigmine for the prevention of postoperative pulmonary complications. Two large observational datasets have produced differing conclusions and these differences could plausibly be ascribed in part to their respective approaches to handling confounding variables and reducing bias.19–21 It is critical that researchers consider and seek to minimise as much as possible the different types of bias set out below.

Selection/attrition bias
In broadest terms, selection bias occurs when a study sample does not accurately represent the target population.22 Truly random study sampling from a population of interest is usually not possible in clinical observational research, as it is more common to execute selected sampling using data that has already been collected from the population. In cohort and case-control studies, selection bias most seriously compromises internal validity when the sampling technique results in groups being different in one or more important regards other than their exposure or outcome status, respectively. For example, in a hypothetical prospective cohort study of quality of recovery (outcome) after surgery and general anaesthesia (exposure) requiring the administration of postoperative written questionnaires, patients with high literacy and previous educational attainment may be more likely to submit a complete dataset, resulting in potential bias. Selection bias can also limit external validity, for example when the sample is derived from a single institution, making generalisability to other healthcare settings challenging.

Measurement bias
Measurement bias describes systematic errors in the measurement or classification of either exposures or outcomes and is also known as information or observation bias. In cohort and case-control studies, measurement bias is minimised by collecting information from all participants, regardless of group, in the same manner. For example, in a hypothetical cohort study of accidental awareness during general anaesthesia (AAGA; outcome) after elective general surgery (exposure), patients undergoing day-case procedures may report lower rates of AAGA when answering a remote written assessment, compared with patients who underwent inpatient surgery and subsequently undergo in-person follow-up interview with a healthcare professional. The method of determining whether the outcome has occurred may itself induce error in the results.
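The Pearson and Spearman coefficients described under ‘Association, correlation and causation’, and the recommendation to report a 95% CI, can be illustrated with a minimal Python sketch. The body mass index and apnoea-hypopnoea index values are invented for illustration, the function names are our own, and the interval uses the Fisher z approximation, which is one common choice and assumes approximately bivariate normal data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation (Fisher z); needs n > 3 and |r| < 1."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Hypothetical data: body mass index (kg m^-2) and apnoea-hypopnoea index.
bmi = [22.0, 25.5, 28.0, 31.5, 35.0, 38.5, 42.0, 45.0]
ahi = [3.0, 6.0, 5.0, 14.0, 22.0, 20.0, 41.0, 55.0]

r = pearson_r(bmi, ahi)
low, high = fisher_ci(r, len(bmi))
print(f"Pearson r = {r:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"Spearman rho = {spearman_rho(bmi, ahi):.2f}")
```

With so few observations the interval is wide, which is the practical reason for reporting it alongside the coefficient.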
Fig 2 Simplified causal diagram illustrating confounding, mediating and moderating variables affecting the relationship between exposure and outcome, with a theoretical example from anaesthesia and critical care. Diagram labels: exposure or treatment (e.g. emergency Caesarean delivery); outcome (e.g. AAGA); confounder (e.g. obesity); mediator (e.g. difficult tracheal intubation); moderator (e.g. use of pEEG monitoring); collider (e.g. post-traumatic stress); covariate (e.g. use of NMBD). AAGA, accidental awareness under general anaesthesia; NMBD, neuromuscular blocking drug; pEEG, processed electroencephalography.
confounding variable affects the outcome in question and is related in some non-causative manner to the measured exposure variables.23 The relationship between exposure and outcome is thus distorted by the effect of the third extraneous variable, the confounder. This effect is particularly troublesome when the confounding variable is present to a varying extent in different study groups. A key strength of randomised clinical trials over observational studies is that randomisation itself prevents non-random unequal distribution of confounding variables between groups and therefore eliminates confounding as a methodological concern. Confounding can thus be considered the central methodological difficulty of observational studies. In a hypothetical observational examination of renal complications (outcome) after antibiotic prescription (exposure) in critically ill patients, illness severity can be anticipated as a confounding variable, because the most unwell patients are likely to receive more antibiotics and are also expected to sustain a higher rate of complications. The study must therefore control for illness severity to delineate the underlying relationship between exposure and outcome.
Unlike a confounder, a collider variable is one which itself causally arises from both the exposure and outcome. If such a collider is statistically controlled for (in the manner of a confounder) a distorted association between exposure and outcome may arise. Misidentification of a collider variable as a confounder with subsequent controlling according to the methods outlined below itself induces bias.
In further contrast to a confounder, which exerts an effect on both exposure and outcome, a mediator is a variable that intervenes between exposure and outcome. A mediator arises from the exposure and precedes the outcome. Another type of variable affecting the exposure–outcome relationship is a moderator. A moderator variable influences the direction or strength of the relationship between exposure and outcome. Figure 2 illustrates these concepts in a basic causal schematic with putative examples of each variable in the context of an observational study of accidental awareness under general anaesthesia in parturients.
An extreme example of the harmful effect of confounding on data interpretation is Simpson’s paradox. Simpson’s paradox describes the phenomenon of a relationship between exposure and outcome variables being entirely reversed depending on whether analysis is conducted at an aggregated or a stratified level. Box 1 provides a hypothetical illustrative example of this phenomenon.

Methods to control for confounding in observational studies
Measures to reduce selection and measurement bias must be instituted in the design phase of an observational study. Similarly, possible confounding factors should be anticipated during study design to allow collection of these variables and subsequent control by a variety of possible statistical adjustments. Unknown or unmeasured confounders cannot be adjusted for, so careful consideration of possible confounders at the study planning phase is important. Adjustments usually aim to isolate the effect of one or more exposures on the outcome of interest by holding other, potentially confounding, factors constant during analysis. In increasing order of complexity, we present a summary of common adjustment techniques in observational research below.

Matching
Matching can be applied prospectively or retrospectively to cohort or case-control studies by identifying key confounders
Box 1
The harmful effect of confounding: a worked example of Simpson’s paradox
Consider a hypothetical prospective cohort study comparing thoracic epidural analgesia (TEA) with paravertebral blockade (PVB) to prevent severe pain on coughing (11-point numeric rating score ≥7) in 1000 adults after blunt force chest wall trauma. Five hundred patients received TEA and 500 patients received PVB.
The following contingency table summarises the success rates of these analgesic techniques, expressed as the percentage of patients experiencing severe pain 24 h after injury, stratified by number of rib fractures sustained. TEA is associated with higher success rates (i.e. reduced occurrence of severe pain) for both lower (6.9% vs 12.8%; relative risk [RR]=0.54) and higher (27.3% vs 32.8%; RR=0.83) numbers of rib fractures, but TEA appears less effective than PVB if number of fractured ribs is not considered (21.4% vs 15.4%; RR=1.39).
This paradox arises because the probability of a patient receiving either TEA or PVB depended on the number of fractured ribs sustained and this confounding variable was disproportionately present between treatment groups. Most patients with ≤4 fractured ribs (i.e. 436/581, 75%) received PVB whereas sustaining >4 rib fractures more commonly (355/419 or 84.7%) led to TEA insertion.

Patients with severe pain at 24 h (%)    TEA     PVB
≤4 rib fractures                         6.9     12.8
>4 rib fractures                         27.3    32.8
All patients                             21.4    15.4
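The reversal described in Box 1 can be reproduced with a short Python sketch. The event counts below are not given in the text: they are back-calculated here from the quoted percentages and group sizes (for example, 581 − 436 = 145 patients with ≤4 fractures received TEA, and 6.9% of 145 is 10), so they should be read as an assumption consistent with the Box rather than as reported data.

```python
# (patients with severe pain, patients in group); counts are back-calculated
# from the percentages and group sizes quoted in Box 1 (an assumption).
strata = {
    "<=4 rib fractures": {"TEA": (10, 145), "PVB": (56, 436)},
    ">4 rib fractures": {"TEA": (97, 355), "PVB": (21, 64)},
}

def risk(events, total):
    """Proportion of patients in a group who experienced the outcome."""
    return events / total

def relative_risk(exposed, comparator):
    """Risk in the exposed group divided by risk in the comparator group."""
    return risk(*exposed) / risk(*comparator)

# Stratified analysis: TEA vs PVB within each rib-fracture stratum.
for name, groups in strata.items():
    rr = relative_risk(groups["TEA"], groups["PVB"])
    print(f"{name}: RR = {rr:.2f}")

# Aggregated analysis: pool events and patients across strata for each arm.
pooled = {
    arm: (
        sum(groups[arm][0] for groups in strata.values()),
        sum(groups[arm][1] for groups in strata.values()),
    )
    for arm in ("TEA", "PVB")
}
pooled_rr = relative_risk(pooled["TEA"], pooled["PVB"])
print(f"All patients: RR = {pooled_rr:.2f}")
```

Within each stratum the relative risk is below 1 (favouring TEA), yet the pooled relative risk is above 1, because most TEA patients belong to the higher-risk stratum.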
and ensuring participants enrolled to study groups are matched by these characteristics. Matching can be conducted on an individual (also known as a ‘one-to-one’ or ‘pairwise’) or frequency basis. Individual matching pairs each study subject with a comparator subject sharing the matched characteristic(s). Frequency matching intends to ensure the frequency of nominated confounders is equal between groups. For example, in a case-control study of patients’ age and perioperative myocardial injury, individual matching might pair participants with the same type of operation, preoperative smoking status and diabetes mellitus status (all potential confounders). In contrast, frequency matching aims for these confounders to be present in equal proportions overall in both ‘case’ and ‘control’ groups. If conducted correctly, matching will result in no statistical difference in the prespecified confounder between study groups. The main limitations of matching are practical and analytical. From a practical perspective, matching increases the difficulty of recruitment into the study, especially if multiple matching characteristics are proposed, as each participant needs to share characteristics with their opposing matched subject. Matching difficulty can be mitigated by applying a degree of reasonable flexibility in the matched characteristic; for example, accepting a matched body mass index range within plus or minus 5 kg m⁻², rather than a single value. Nevertheless, matching can only be conducted on a limited number of potential confounders, sometimes far fewer than could reasonably be supposed to exist in a given clinical scenario. The analytical limitation imposed by matching is that the matched characteristics themselves cannot be examined with regard to their effect on outcome (in a cohort) or exposure (in a case-control study).18

Stratification
Analysis by stratification involves grouping sample data by potential confounding variables (into ‘strata’) and undertaking analysis by these subgroups to seek the relationship between exposure and outcome unmodified by that confounder. The main limitation of stratification is that while it is straightforward to undertake and interpret in the presence of only one or two potential confounders, it becomes unwieldy if many potential confounders are reported, especially if each confounder itself necessitates reporting of multiple relevant levels (e.g. body mass index grouped by <18.5, 18.5–24.9, 25–29.9, 30–34.9, 35–39.9, >40 kg m⁻²), in contrast to simple dichotomised data (e.g. biological sex: male, female). For example, stratification adjustment for a hypothetical observational study examining the relationship between surgical site infection (outcome) and presence of diabetes mellitus (exposure), where biological sex (two levels), body mass index (six levels) and smoking status (current, former or never) were offered as potential confounders, would require the reporting of 2 × 6 × 3 = 36 substrata. The more substrata analyses are undertaken, the smaller the corresponding sample size and the greater the risk that a positive finding may arise purely by chance. To control simultaneously for multiple confounders, regression analysis is frequently used.

Regression analysis
In observational research, regression analysis is commonly used to model the relationship between an outcome (dependent variable) and one or more exposures (independent variables). Such techniques are useful to control for confounders, but also to make predictions based on observed data. The relationship between a single continuous, dichotomous or categorical exposure variable and a continuous outcome can be quantified by linear regression producing a regression coefficient (β) indicating the direction and strength of the relationship, and associated 95% CI. At its most basic, linear regression represents the plotting of a line of best fit through two variables graphically represented on a scatter plot and will allow an estimate of how much the outcome variable changes for every unit change in exposure variable. In the circumstance of a dichotomous outcome variable (e.g. dead or alive at a specific time point), logistic regression analysis can be used. Logistic regression provides the relative odds (odds ratio, OR) of experiencing the outcome for different levels of the exposure.
Fig 3 Confounding by indication, with an example from anaesthesia and critical care. Diagram labels: the indication (e.g. placental abruption) prompts the exposure or treatment (e.g. emergency general anaesthesia) and causes the outcome (e.g. major haemorrhage), producing a distorted or false association between exposure and outcome.
Linear and logistic regression can be extended to explore the relationship between multiple simultaneous exposure variables and the outcome of interest, known as multivariable regression analysis. Multivariable regression estimates the contribution of each exposure variable while controlling for the other independent variables (including possible confounders) by holding them constant. The ability of each type of regression analysis to accurately model data relies on certain assumptions specific to the model about the observed sample data. For example, in multivariable linear regression these are that the outcome is linearly related to the independent variables (linearity), that all differences between observed and modelled values are normally distributed with a fixed variance (normality and homoscedasticity) and that sample observations are independent from one another (independence).24
The sample size must be considered when planning a multivariable regression model. Broadly, the more independent variables that are included in a regression analysis, the larger the sample size that may be required. For example, it is usually not appropriate to test for 10 variables in a regression model (e.g. age, sex, weight, height, ASA score, duration of anaesthesia, presence of neuraxial analgesia, intraoperative opioid use, surgical complexity, length of inpatient stay) in a dataset of only 50 subjects. Statistical input should always be sought to ensure an analysis is adequately powered. Transparent reporting of regression analysis should always confirm the assumptions upon which it is conducted (which can largely be done by examination of a scatter plot) and model goodness-of-fit (the detailed description of which is beyond the scope of this article). Failure to do so may increase uncertainty associated with the regression data.24 A fuller exposition of regression analysis will feature in a forthcoming article in this journal.

Propensity scoring
Propensity scoring extends matching, stratification and regression, as described above, by summarising multiple measured confounders into a single value.25 This value, the propensity score, is the probability (defined from 0 to 1) that a subject receives a given treatment conditional on measured baseline characteristics including identified confounders.26 The purpose of propensity matching is to limit confounding by indication, a concept illustrated in Figure 3.27 The decision of which variables to include in a propensity score model can be facilitated by construction of causal directed acyclic graphs, which serve to collate relationships between exposure, outcome and other variables of interest, including (among others) confounders, mediators and moderators.28 The propensity score is usually calculated using logistic regression with treatment/exposure as the dependent variable and the baseline characteristics and specified confounders which predict the treatment/exposure as independent variables. The effect of a treatment intervention is then estimated among subjects with the same propensity score, thereby controlling the bias induced by those confounding variables. One advantage of propensity scoring over multivariable regression is its ability to account for more potential confounders than can typically be accommodated using regression analysis in circumstances when the outcome is rare. Another advantage of propensity scoring is that, unlike multivariable regression, propensity scoring is separate from outcome analysis and therefore less likely to be influenced by a researcher’s expectations and bias.
Matching subjects by propensity score creates balance between groups for confounders, and therefore between-group differences in outcome can be ascribed with less bias to direct treatment effect.29 The disadvantage of propensity score matching is the necessary exclusion of non-matched subjects, leading to loss of sample size, power and precision of estimates. Use of matching for propensity score analysis has been criticised as unintentionally increasing bias, to the extent that some authors recommend that matching is not used as part of a propensity score analysis.30
Multiple alternative techniques exist to handle propensity scores that do not rely on subject matching, including propensity score stratification, weighting and covariate adjustment. The execution and comparative advantages and disadvantages of these methods are beyond the scope of this article but have been reviewed in detail elsewhere.31,32 The method used to apply propensity scoring can markedly affect
the arising study results and so expert statistical input should be sought for all planned propensity analyses.33–36

Propensity scoring imitates the inherent facility of randomised trials to balance confounders between groups.37 The chief difference, and therefore weakness, of propensity scoring is that whereas randomisation controls for both measured and unmeasured confounders, propensity scoring can only control known and measured confounding variables. When presented with propensity scored analyses it is important to assess the balance of baseline characteristic variables between study groups. Balance may be assessed most straightforwardly by comparing summary statistics such as mean values or proportions. Many analyses present standardised differences (i.e. for any given variable the difference between groups divided by the pooled standard deviation of both groups) and a difference of <0.1 is usually considered unimportant.38 It is also vital to consider which variables have not been included in the propensity score (either by deliberate exclusion by the researchers, or because they were unmeasured), as unaccounted confounding variables will exert a biasing effect on study results.

Instrumental variable analysis
Given the fundamental weakness of propensity scoring (that it can only attempt to account for known and measured confounding factors), a statistical technique capable of controlling both known and unknown variables would offer a substantial benefit. Instrumental variable analysis seeks to achieve this by identifying a variable (the ‘instrumental variable’) which is associated with the treatment/exposure but has no direct association with the outcome, except through its influence on treatment. A recently published example in the anaesthetic literature sought to examine the relationship between intraoperative hydromorphone administration dose (exposure) and postoperative pain (outcome) in the presence of multiple possible unmeasured confounders.39 The unit dose of hydromorphone contained in single vials and available to clinicians was shown to be associated with subsequent dose of drug given during surgery (exposure) but it is not, in itself, related to pain scores (outcome). Unit dose of hydromorphone could therefore serve as the instrumental variable, facilitating an estimation of the exposure–outcome relationship without confounding. The main weakness of instrumental variable analysis is that identifying a suitable instrumental variable, fulfilling the various assumptions required, is often difficult. Detailed review of this technique is available to interested readers.40

Reporting and interpretation of observational studies
By providing a standardised structure to manuscripts intended for publication, reporting guidelines improve understanding and appraisal of clinical observational research. A major barrier to evidence synthesis and generation of high-quality meta-analysis is incomplete or inadequate reporting in primary literature. The ‘Enhancing the quality and transparency of health research (EQUATOR) network’ provides a database of >200 reporting guidelines for observational research, primus inter pares being the ‘Strengthening the reporting of observational studies in epidemiology (STROBE) statement’.41 STROBE, and its subsequent extensions, can be applied to a wide variety of observational study designs, and are usually mandatory components of manuscript submission for peer review to high impact specialty journals. Specific statistical techniques, such as propensity scoring, should be reported to published standards, where these exist, and should be sufficiently detailed to allow replication of the analysis by readers.42,43

Interpretation of observational research findings should always pay regard to the associative, rather than causative, nature of the observed exposure–outcome relationship. When evaluating an item of observational research, we believe clinicians should pay regard to the following fundamental aspects of any study: the clarity of the research question, aims or objectives and whether the chosen study design is capable of providing insight into these stated uncertainties; whether justification for the sample size is provided and important potential biases are addressed in the design and analyses; and the consistency and simplicity of communication of the study results, with careful acknowledgement of data limitations and contradictions.

Conclusions
The complexities of observational research should not be underestimated: the above description of biases and their means of control is far from comprehensive. Planning, conducting and reporting such research requires clinical subject matter experts to work closely with statisticians so that appropriately careful conclusions are drawn about the relationship between studied exposure and outcome.

MCQs
The associated MCQs (to support CME/CPD activity) will be accessible at www.bjaed.org/cme/home by subscribers to BJA Education.

Declaration of interests
DWH is an editor and editorial board member of BJA Education. BS declares no relevant conflicts of interest.

References
1. Lind J (1753). The James Lind library. May 26, 2010. Available from: https://www.jameslindlibrary.org/lind-j-1753/ (accessed 1 July 2024)
2. ICH Official web site: ICH. Available from: https://www.ich.org/page/efficacy-guidelines (accessed 1 July 2024)
3. Katz J. The Nuremberg code and the Nuremberg trial. A reappraisal. JAMA 1996; 276: 1662–6
4. World Medical Association. World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects. JAMA 2013; 310: 2191–4
5. Alexeev M, Fedorov E, Kuleshov O, Rebrova D, Efremov S. Carbon dioxide embolism during posterior retroperitoneal adrenalectomy. Anaesth Rep 2022; 10, e12164
6. Walker EM, Bell M, Cook TM, Grocott MP, Moonesinghe SR. Patient reported outcome of adult perioperative anaesthesia in the United Kingdom: a cross-sectional observational study. Br J Anaesth 2016; 117: 758–66
7. Robba C, Graziano F, Rebora P et al. Intracranial pressure monitoring in patients with acute brain injury in the intensive care unit (SYNAPSE-ICU): an international, prospective observational cohort study. Lancet Neurol 2021; 20: 548–58
8. Rujirojindakul P, Geater AF, McNeil EB et al. Risk factors for reintubation in the post-anaesthetic care unit: a case–control study. Br J Anaesth 2012; 109: 636–42
9. Sterrantino AF. Observational studies: practical tips for avoiding common statistical pitfalls. Lancet Reg Health Southeast Asia 2024; 25, 100415
10. Sidebotham D, Hewson D. Core concepts in statistics and research methods. Part I: statistical inference. BJA Educ 2024; 25: 29–37
11. Altman N, Krzywinski M. Association, correlation and causation. Nat Methods 2015; 12: 899–900
12. Janse RJ, Hoekstra T, Jager KJ et al. Conducting correlation analysis: important limitations and pitfalls. Clin Kidney J 2021; 14: 2332–7
13. Pearl J, Glymour M, Jewell NP. Causal inference in statistics: a primer. New York, USA: Wiley; 2016
14. Gianicolo EAL, Eichler M, Muensterer O, Strauch K, Blettner M. Methods for evaluating causality in observational studies. Dtsch Arztebl Int 2020; 117: 101–7
15. Hill AB. The environment and disease: association or causation? Proc R Soc Med 1965; 58: 295–300
16. Parascandola M, Weed DL. Causation in epidemiology. J Epidemiol Community Health 2001; 55: 905–12
17. Garegnani LI. Bias, quality and reporting in health research: differences and tools for appraisal. BMJ Evid Based Med 2023; 28: 407–9
18. Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet 2002; 359: 248–52
19. Suleiman A, Munoz-Acuna R, Azimaraghi O et al. The effects of sugammadex vs. neostigmine on postoperative respiratory complications and advanced healthcare utilisation: a multicentre retrospective cohort study. Anaesthesia 2023; 78: 294–302
20. Kheterpal S, Vaughn MT, Dubovoy TZ et al. Sugammadex versus neostigmine for reversal of neuromuscular blockade and postoperative pulmonary complications (STRONGER): a multicenter matched cohort analysis. Anesthesiology 2020; 132: 1371–81
21. Sidebotham D, Frampton C. Sugammadex and neostigmine: when better may not be best. Anaesthesia 2023; 78: 557–60
22. Greenacre ZA. The importance of selection bias in internet surveys. Open J Stat 2016; 6: 397–404
23. Skelly AC, Dettori JR, Brodt ED. Assessing bias: the importance of considering confounding. Evid Based Spine Care J 2012; 3: 9–12
24. Ernst AF, Albers CJ. Regression assumptions in clinical psychology research practice – a systematic review of common misconceptions. PeerJ 2017; 5, e3323
25. Brookhart MA, Wyss R, Layton JB, Stürmer T. Propensity score methods for confounding control in nonexperimental research. Circ Cardiovasc Qual Outcomes 2013; 6: 604–11
26. Stürmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol 2006; 59: 437–47
27. Butwick AJ, Palanisamy A. Mode of anaesthesia for Caesarean delivery and maternal morbidity: can we overcome confounding by indication? Br J Anaesth 2018; 120: 621–3
28. Tennant PWG, Murray EJ, Arnold KF et al. Use of directed acyclic graphs (DAGs) to identify confounders in applied health research: review and recommendations. Int J Epidemiol 2021; 50: 620–32
29. Staffa SJ, Zurakowski D. Five steps to successfully implement and evaluate propensity score matching in clinical research studies. Anesth Analg 2018; 127: 1066
30. King G, Nielsen R. Why propensity scores should not be used for matching. Political Anal 2019; 27: 435–54
31. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res 2011; 46: 399–424
32. Schulte PJ, Mascha EJ. Propensity score methods: theory and practice for anesthesia research. Anesth Analg 2018; 127: 1074
33. Austin PC, Mamdani MM. A comparison of propensity score methods: a case-study estimating the effectiveness of post-AMI statin use. Stat Med 2006; 25: 2084–106
34. Austin PC. The performance of different propensity score methods for estimating marginal odds ratios. Stat Med 2007; 26: 3078–94
35. Kurth T, Walker AM, Glynn RJ et al. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol 2006; 163: 262–70
36. Laborde-Castérot H, Agrinier N, Thilly N. Performing both propensity score and instrumental variable analyses in observational studies often leads to discrepant results: a systematic review. J Clin Epidemiol 2015; 68: 1232–40
37. Streiner DL, Norman GR. The pros and cons of propensity scores. Chest 2012; 142: 1380–2
38. Haukoos JS, Lewis RJ. The propensity score. JAMA 2015; 314: 1637–8
39. Ershoff B. Intraoperative hydromorphone decreases postoperative pain: an instrumental variable analysis. Br J Anaesth 2023; 131: 104–12
40. Iwashyna TJ, Kennedy EH. Instrumental variable analyses. Exploiting natural randomness to understand causal mechanisms. Ann Am Thorac Soc 2013; 10: 255–60
41. von Elm E, Altman DG, Egger M et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007; 370: 1453–7
42. Andrew BY, Brookhart MA, Pearse R, Raghunathan K, Krishnamoorthy V. Propensity score methods in observational research: brief review and guide for authors. Br J Anaesth 2023; 131: 805–9
43. Yao XI, Wang X, Speicher PJ. Reporting and guidelines in propensity score analysis: a systematic review of cancer and cancer surgical studies. J Natl Cancer Inst 2017; 109: djw323
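The standardised difference used to judge covariate balance in propensity score analyses can be illustrated with a minimal sketch. The ages below are invented for illustration, the function name is hypothetical, and the pooled standard deviation is assumed to be the square root of the average of the two group variances, which is one common convention rather than the only one.

```python
import math
from statistics import mean, variance


def standardised_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation.

    Assumption: pooled SD = sqrt((var_treated + var_control) / 2).
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages (years) in two study groups after propensity score adjustment
treated_age = [62, 58, 71, 66, 59, 64, 68, 61]
control_age = [63, 57, 70, 67, 60, 65, 66, 62]

smd = standardised_difference(treated_age, control_age)
# An absolute standardised difference below 0.1 is usually considered unimportant
balanced = abs(smd) < 0.1
print(f"standardised difference = {smd:.3f}; balanced = {balanced}")
```

In practice the same calculation would be repeated for every baseline variable, and any variable with an absolute value of 0.1 or more would prompt a closer look at how the propensity score was built.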
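The logic of instrumental variable analysis can be sketched with simulated data. Everything below is an assumption made for illustration: the data are randomly generated, the effect sizes are arbitrary, and the simple ratio (Wald) estimator is only one of several instrumental variable estimators. It is not intended to reproduce the analysis in the cited hydromorphone study.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(42)   # fixed seed so the sketch is reproducible
TRUE_EFFECT = -1.0        # assumed causal effect of dose on pain score
n = 50_000

vial, dose, pain = [], [], []
for _ in range(n):
    z = rng.choice([1.0, 2.0])        # instrument: vial unit dose (hypothetical units)
    u = rng.gauss(0.0, 1.0)           # unmeasured confounder (raises both dose and pain)
    x = z + u + rng.gauss(0.0, 0.5)   # exposure: dose given depends on vial size and u
    y = TRUE_EFFECT * x + 2.0 * u + rng.gauss(0.0, 1.0)  # outcome: pain score
    vial.append(z)
    dose.append(x)
    pain.append(y)

# Naive regression slope of pain on dose: biased by the unmeasured confounder
naive_estimate = covariance(dose, pain) / covariance(dose, dose)
# Ratio (Wald) estimator: uses only the variation in dose driven by the instrument
iv_estimate = covariance(vial, pain) / covariance(vial, dose)
print(f"naive: {naive_estimate:.2f}, IV: {iv_estimate:.2f}, true: {TRUE_EFFECT}")
```

Because the simulated confounder increases both dose and pain, the naive slope is pulled away from the assumed effect, whereas the instrument-based ratio recovers it approximately. This holds only because the simulated instrument affects pain solely through dose, which is the assumption that is hard to verify in real data.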