
Copyright © 2022 by The Journal of Bone and Joint Surgery, Incorporated

The Orthopaedic Forum
The Table I Fallacy: P Values in Baseline Tables of
Randomized Controlled Trials
Bart G. Pijls, MD, PhD

Investigation performed at the Department of Orthopaedics, Leiden University Medical Center, Leiden, the Netherlands

Disclosure: The Disclosure of Potential Conflicts of Interest forms are provided with the online version of the article (http://links.lww.com/JBJS/G962).

J Bone Joint Surg Am. 2022;104:e71(1-2). Volume 104-A, Number 16, August 17, 2022. http://dx.doi.org/10.2106/JBJS.21.01166

In randomized controlled trials (RCTs), it is common practice to present the baseline characteristics of the study groups in a table, generally referred to as “Table I,” which allows the investigators and readers to assess if there are imbalances of prognostic variables at baseline¹. It is important to be able to identify differences in the baseline characteristics of the study groups because imbalances of prognostic variables may confound the observed results of the RCT¹⁻⁶. Randomization is a tool that can be used to achieve a balanced distribution of the known and unknown prognostic variables at baseline¹⁻⁶. However, randomization does not guarantee such balance, and imbalances due to chance are possible with randomization, especially for small (pilot) RCTs and cluster randomized trials¹⁻⁶.

Significance tests are often used to identify imbalances of prognostic variables, and the results of these tests are presented in the first table alongside baseline characteristics¹,². However, significance testing at baseline is not necessary when randomization has been performed correctly, and this unnecessary testing may thus be called a “Table I fallacy.”³,⁴ In addition, item 15 of the Consolidated Standards of Reporting Trials (CONSORT) Statement clearly advises against it¹.

There are several reasons why p values in baseline tables are considered a Table I fallacy. First, it is irrelevant whether differences at baseline are significant because even nonsignificant differences can influence the association between treatment and outcome⁵. Second, using significance testing to compare baseline variables is illogical because, in this setting, significance testing assesses the probability that differences at baseline have occurred because of chance when we already know that all differences are due to chance/randomization¹⁻⁶. When the significance level, alpha, that is chosen is 5%, there is a 1 in 20 (or 5%) probability of finding a baseline characteristic with a p value of <0.05 in Table I⁶. Third, p values are used in the context of hypothesis testing, whereas Table I describes baseline characteristics⁴,⁵. Fourth, the baseline characteristics by which an analysis is adjusted should preferably be chosen a priori (in the trial protocol)³. These characteristics should be selected on the basis of clinical reasoning and include variables that are known to be important predictors of outcome³. Performing significance tests on baseline characteristics and reporting their results (i.e., p values) in RCTs should thus be avoided.

The Table I fallacy also exists in orthopaedic literature. To quantify the magnitude of the Table I fallacy in orthopaedic literature, a review was performed of all RCTs published in 2019 and 2020 in the first quartile (Q1, top 25%) of orthopaedic journals, as identified by the latest (2020) journal citation reports⁷. After screening the table of contents of 18 journals, 399 of 10,774 articles were identified as RCTs. Of these 399 RCTs, 247 (62%) reported p values in Table I or in the accompanying text; the proportion did not differ greatly between 2019 (60%) and 2020 (64%). There were considerable differences between journals: the percentage of published RCTs in which p values were presented in baseline tables ranged from 0% to 100% (Table I).

Previous reviews of nonorthopaedic journals have estimated that testing for baseline differences in RCTs declined from 58%⁴ and 48%⁸ in the late 20th century to 35%² in 2011, which is far less common than the 62% for RCTs published in leading orthopaedic journals in 2019 and 2020.

In conclusion, as an orthopaedic clinical and research community, we can and should do better. Authors, reviewers, and editors should be aware of the Table I fallacy; should follow the CONSORT Statement, which states that statistical tests of baseline characteristics should not be performed¹; and should stop reporting p values in baseline tables of RCTs.

Bart G. Pijls, MD, PhD¹
¹Department of Orthopaedics, Leiden University Medical Center, Leiden, the Netherlands

Email for corresponding author: [email protected]

TABLE I Total Number of RCTs and Number of RCTs That Reported P Values in Baseline Tables Published in 2019 and 2020 in Q1 Orthopaedic Journals*

Journal              Total No. of RCTs    No. (%) Reporting P Values in Baseline Table
JOA                  55                   43 (78%)
JBJS                 51                   35 (69%)
BJJ                  47                   27 (57%)
AJSM                 43                   33 (77%)
KSSTA                35                   22 (63%)
Spine                27                   22 (81%)
Arthroscopy          25                   21 (84%)
CORR                 21                   17 (81%)
AO                   21                   1 (5%)
JOPT                 19                   0 (0%)
The Spine Journal    15                   15 (100%)
OC                   14                   0 (0%)
JOSPT                11                   1 (9%)
CJSM                 5                    3 (60%)
JOT                  4                    4 (100%)
JOR                  3                    1 (33%)
BJR                  2                    2 (100%)
Cartilage            1                    0 (0%)
Total                399                  247 (62%)

*JOA = The Journal of Arthroplasty; JBJS = The Journal of Bone & Joint Surgery; BJJ = The Bone & Joint Journal; AJSM = The American Journal of Sports Medicine; KSSTA = Knee Surgery, Sports Traumatology, Arthroscopy; CORR = Clinical Orthopaedics and Related Research; AO = Acta Orthopaedica; JOPT = Journal of Physiotherapy; OC = Osteoarthritis and Cartilage; JOSPT = Journal of Orthopaedic & Sports Physical Therapy; CJSM = Clinical Journal of Sport Medicine; JOT = Journal of Orthopaedic Translation; JOR = Journal of Orthopaedic Research; BJR = Bone & Joint Research.
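
The second argument in the text, that at a significance level of 5% about 1 in 20 baseline comparisons will show p < 0.05 even when randomization was performed correctly, can be illustrated with a short simulation. The sketch below is not part of the original article. The sample size, the normally distributed baseline variable (think of age), the known-variance z-test, and all function names are assumptions chosen only to keep the example self-contained.

```python
import math
import random
from statistics import NormalDist, fmean

ALPHA = 0.05  # significance level used in the article's example


def baseline_p_value(rng, n_per_arm=50, mu=60.0, sigma=10.0):
    """Randomize 2 * n_per_arm patients and test one baseline variable."""
    # All patients come from ONE population: any difference between the
    # arms is therefore due to chance (randomization) alone.
    values = [rng.gauss(mu, sigma) for _ in range(2 * n_per_arm)]
    rng.shuffle(values)  # simple randomization into two equal arms
    arm_a, arm_b = values[:n_per_arm], values[n_per_arm:]
    # Two-sample z-test with known sigma (an illustrative assumption;
    # a real baseline table would more likely use a t-test).
    z = (fmean(arm_a) - fmean(arm_b)) / (sigma * math.sqrt(2.0 / n_per_arm))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def share_significant(n_trials=4000, seed=1):
    """Fraction of simulated trials whose baseline test gives p < ALPHA."""
    rng = random.Random(seed)
    hits = sum(baseline_p_value(rng) < ALPHA for _ in range(n_trials))
    return hits / n_trials


def prob_at_least_one(k, alpha=ALPHA):
    """Chance that at least one of k independent baseline tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    print(f"simulated share with p < {ALPHA}: {share_significant():.3f}")
    print(f"P(at least one of 10 characteristics): {prob_at_least_one(10):.3f}")
```

The simulated share should sit close to 5%, although every difference in the simulation is produced by chance. With ten independent baseline characteristics, `prob_at_least_one(10)` is about 0.40, so a "significant" row in a baseline table is expected fairly often in correctly randomized trials. Independence between characteristics is a simplifying assumption; real baseline variables are usually correlated.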

References
1. CONSORT 2010. 15. Baseline Data. CONSORT Transparent Reporting of Trials. Accessed 2021 Sep 22. http://www.consort-statement.org/checklists/view/32—consort-2010/510-baseline-data
2. Knol MJ, Groenwold RHH, Grobbee DE. P-values in baseline tables of randomised controlled trials are inappropriate but still common in high impact journals. Eur J Prev Cardiol. 2012 Apr;19(2):231-2.
3. Roberts C, Torgerson DJ. Understanding controlled trials: baseline imbalance in randomised controlled trials. BMJ. 1999 Jul 17;319(7203):185.
4. Altman DG, Doré CJ. Randomisation and baseline comparisons in clinical trials. Lancet. 1990 Jan 20;335(8682):149-53.
5. Senn S. Testing for baseline balance in clinical trials. Stat Med. 1994 Sep 15;13(17):1715-26.
6. Altman DG. Comparability of randomised groups. The Statistician. 1985;34:125-36.
7. Clarivate Analytics. Journal Citation Reports 2020 Journal Impact Factor. 2021. https://clarivate.com/webofsciencegroup/solutions/journal-citation-reports/
8. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet. 2000 Mar 25;355(9209):1064-9.
