Radiol 2021210063


ORIGINAL RESEARCH • THORACIC IMAGING

Deep Learning to Determine the Activity of Pulmonary Tuberculosis on Chest Radiographs
Seowoo Lee, MD  •  Jae-Joon Yim, MD  •  Nakwon Kwak, MD  •  Yeon Joo Lee, MD  •  Jung-Kyu Lee, MD  • 
Ji Yeon Lee, MD  •  Ju Sang Kim, MD  •  Young Ae Kang, MD, PhD  •  Doosoo Jeon, MD, PhD  • 
Myoung-jin Jang, PhD  •  Jin Mo Goo, MD, PhD  •  Soon Ho Yoon, MD, PhD
From the Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital, 101 Daehak-ro, Chongno-gu, Seoul 03080, Korea (S.L., J.M.G., S.H.Y.); Division of Pulmonary and Critical Medicine, Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea (J.J.Y., N.K.); Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea (Y.J.L.); Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul Metropolitan Government–Seoul National University Boramae Medical Center, Seoul, Korea (J.K.L.); Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, National Medical Center, Seoul, Korea (J.Y.L.); Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Incheon St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Incheon, Korea (J.S.K.); Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea (Y.A.K.); Department of Internal Medicine, Pusan National University Yangsan Hospital, Pusan National University School of Medicine, Yangsan, Korea (D.J.); Medical Research Collaborating Center, Seoul National University Hospital, Seoul, Korea (M.J.J.); Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Korea (J.M.G.); and Department of Radiology, UMass Memorial Medical Center, Worcester, Mass (S.H.Y.). Received January 9, 2021; revision requested March 9; revision received May 14; accepted June 3.
Address correspondence to S.H.Y. (e-mail: [email protected]).
Supported by Research Program 2019 funded by the Seoul National University College of Medicine Research Foundation.

Conflicts of interest are listed at the end of this article.

Radiology 2021; 301:435–442  •  https://doi.org/10.1148/radiol.2021210063

Background:  Determining the activity of pulmonary tuberculosis on chest radiographs is difficult.

Purpose:  To develop a deep learning model to identify active pulmonary tuberculosis on chest radiographs.

Materials and Methods:  Chest radiographs were retrospectively gathered from a multicenter consecutive cohort with pulmonary tuberculosis who were successfully treated between 2011 and 2017, along with normal radiographs to enrich a negative class. The pretreatment and posttreatment radiographs were labeled as positive and negative classes, respectively. A neural network was trained with those radiographs to calculate the probability of active versus healed tuberculosis. A single-center consecutive cohort (test set 1; 89 patients, 148 radiographs) and data from one multicenter randomized controlled trial (test set 2; 366 patients, 3774 radiographs) were used to test the model. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the model and of the four expert readers.

Results:  In total, 6654 pre- and posttreatment radiographs from 3327 patients (mean age ± standard deviation, 55 years ± 19; 1884 men) with pulmonary tuberculosis and 3182 normal radiographs from as many patients (mean age, 53 years ± 14; 1629 men) were gathered. For test set 1, the model showed a higher AUC (0.83; 95% CI: 0.73, 0.89) than one pulmonologist (0.69; 95% CI: 0.61, 0.76; P < .001) and performed similarly to the other readers (AUC, 0.79–0.80; P = .14–.23). For 200 randomly selected radiographs from test set 2, the model had a higher AUC (0.84) than the pulmonologists (0.71 and 0.74; P < .001 and .01, respectively) and performed similarly to the radiologists (0.79 and 0.80; P = .08 and .06, respectively). The model output increased by 0.30 on average with a higher degree of smear positivity (95% CI: 0.20, 0.39; P < .001) and decreased during treatment (baseline, 3 months, and 6 months: 0.85, 0.51, and 0.26, respectively).

Conclusion:  A deep learning model performed similarly to radiologists for accurately determining the activity of pulmonary tuberculosis on chest radiographs; it also was able to follow posttreatment changes.
© RSNA, 2021

Online supplemental material is available for this article.

Mycobacterium tuberculosis is one of the top 10 causes of death worldwide and remains a major public health problem. According to the World Health Organization global tuberculosis report (1), 10 million people were estimated to be infected with tuberculosis in 2020. Sputum smear microscopy after Ziehl-Neelsen staining and microbiologic culture are mandatory tests for the diagnosis and treatment monitoring of pulmonary tuberculosis. Polymerase chain reaction assay and T-cell–based tests are used for diagnosis but not for treatment monitoring (2,3). Alongside microbiologic confirmation, chest radiography is the first choice of initial imaging study as a triage tool, diagnostic aid, and screening method for detecting tuberculosis (4). Chest radiographs with a visual scoring system provide a high sensitivity (up to 98%) but suffer from low specificity (76%) for culture-confirmed pulmonary tuberculosis (4), and there is substantial variability across readers. The reader must also have sufficient experience (5).

A deep neural network can outperform human experts in detecting radiographic abnormalities of pulmonary tuberculosis from normal chest radiographs. Hwang et al (6) showed state-of-the-art classification performance for screening active pulmonary tuberculosis with a deep learning algorithm, with area under the receiver operating characteristic curves (AUCs) between 0.98 and 1.00. However,

the networks were not trained with chest radiographs in treated patients with tuberculosis who had varying residual fibrotic abnormalities (7). Thus, little is known regarding how current networks would perform in countries with a high burden of tuberculosis. These networks would be most useful in these countries due to the shortage of trained imaging professionals. In these countries, patients with treated or spontaneously healed tuberculosis are relatively common, and the frequency of radiologic sequelae is up to 90% (8). In addition, immigrants or asylum-seekers from high- and low-incidence countries pose concerns for tuberculosis control. For immigration control, the initial screening and monitoring of active tuberculosis primarily depends on chest radiographs (9). In such situations, assessing the radiographic activity of pulmonary tuberculosis is more relevant than differentiating patients with active pulmonary tuberculosis from healthy individuals.

The purpose of this study was to develop a deep neural network to determine the activity of pulmonary tuberculosis on chest radiographs and to compare its performance with that of imaging professionals. In addition, we sought to determine if this method could be used to follow the therapeutic response on chest radiographs.

Abbreviations
AUC = area under the receiver operating characteristic curve, IQR = interquartile range

Summary
A deep learning model was able to determine the activity of tuberculosis on chest radiographs, reflecting bacilli burden.

Key Results
• In two test sets that included radiographs depicting active and healed tuberculosis (test set 1, n = 148; test set 2 subset, n = 200), a deep learning model (area under the receiver operating characteristic curves [AUCs], 0.83 and 0.84, respectively) differentiated active from healed tuberculosis on radiographs, with comparable performance to that of expert readers (AUCs, 0.69–0.80 [P < .001 to P = .23] and 0.71–0.80 [P < .001 to P = .08]).
• A higher degree of sputum smear positivity increased model output (odds ratio, 1.36 per 0.1 increase in the disease activity score), which decreased during treatment by 0.37 per month on average (P < .001).

Materials and Methods
This retrospective study was approved by the institutional review boards of the participating hospitals with a waiver of the requirement for informed consent from each patient.

Study Sample
We retrospectively searched a consecutive cohort of patients with pulmonary tuberculosis who were successfully treated at six Korean hospitals (hospitals A–F) between 2011 and 2017 (Fig 1, Table 1). We applied the following eligibility criteria: (a) patients who were confirmed to have active pulmonary tuberculosis with use of sputum microscopy, culture, or polymerase chain reaction test; (b) the treatment outcome was “cured” or “completed” according to the definitions from the World Health Organization guidelines; and (c) available pre- and posttreatment chest radiographs.

Pretreatment chest radiographs were obtained within the period between 1 month before the initiation of antituberculosis medication and 1 week after the initiation. Posttreatment radiographs were obtained within the period between 1 week before the last medication and 1 month after the last medication. We labeled the pretreatment radiographs as the positive class and the posttreatment radiographs as the negative class. We included both posteroanterior and anteroposterior chest radiographs. We excluded a total of 1127 patients; exclusion criteria are shown in Table E1 (online).

To enrich the set of radiographs that did not have any radiologic sequelae with healed tuberculosis, we further collected a similar number of normal chest radiographs from hospital A from the same period. A lack of radiographic abnormalities was confirmed with corresponding CT scans within 1 month of the radiographic examination. All of these radiographs were labeled as the negative class.

Designing and Training the Deep Neural Network
After compiling 9836 chest radiographs from 6509 patients, we randomly assigned radiographs for training and validation at a ratio of 7:3 (Table 2; Table E2 [online]). EfficientNet, which was introduced by Tan and Le (10), was adopted as a base feature extractor. The network was built and trained by using open-source software (TensorFlow, version 1.11.0; Keras, version 2.2.4) (Appendix E1 [online]). The output of the neural network was a value between 0 and 1 for the probability of active tuberculosis. The network was implemented on a public website (version 1.0; https://radiologist.app/activetb).

Validation of the Neural Network
We applied the model to two temporal test sets. For test set 1, we included 89 patients with the same eligibility criteria from hospital A in 2018 (Table E3 [online]), providing a total of 148 radiographs: 80 pretreatment radiographs and 68 posttreatment radiographs. For test set 2, we included a total of 3774 consecutive radiographs (349 pretreatment, 296 posttreatment, and 3129 in-treatment radiographs) from 366 patients from a randomized controlled trial from February 2014 through January 2017 (11) at three of the participating centers (hospitals A–C) and another hospital (hospital G) (Table 2; Table E4 [online]). We excluded patients enrolled in that trial from the training and validation sets and test set 1 and assigned them to the test set 2 to avoid patient overlap.

Comparison with Expert Human Readers
Two pulmonologists dedicated to managing tuberculosis (reader 1, J.J.Y., and reader 2, N.K., with 26 years and 10 years of experience, respectively) and two expert thoracic radiologists (reader 3, S.H.Y., and reader 4, J.M.G., with 15 years and 30 years of experience) participated as human readers. The readers could access information on each patient’s age and sex but were blinded to disease activity information. They consecutively and independently evaluated the 148 chest radiographs from test set 1 and 200 randomly selected pre- and posttreatment chest radiographs from test set 2. There was a 1-month interval between readings of the two data sets. They evaluated pulmonary tuberculosis activity on each radiograph with use of a five-point ordinal scale derived from modified definitions of active and inactive tuberculosis from the literature (12): definitely active tuberculosis, probably active tuberculosis, equivocal activity, probably healed tuberculosis, and definitely healed tuberculosis (Fig E1 [online]).

Radiology: Volume 301: Number 2, November 2021  •  radiology.rsna.org

Figure 1:  Study diagram.

Table 1: Demographic Characteristics of Patients Treated for Pulmonary Tuberculosis in the Training and Validation Data Sets

Characteristic     Total     Hospital A  Hospital B  Hospital C  Hospital D  Hospital E  Hospital F
No. of patients    3327      550         513         430         595         580         659
Sex
  Male             1884      321         274         266         339         331         353
  Female           1443      229         239         164         256         249         306
Age (y)*           55 ± 19   63 ± 18     55 ± 19     52 ± 19     54 ± 19     54 ± 19     53 ± 19

Note.—Unless otherwise specified, data are numbers of patients.
* Data are means ± standard deviations.

Table 2: Number of Chest Radiographs in the Training, Validation, and Test Data Sets

Data Set                                  Total   Positive Class (Pretreatment)   Negative Class, Posttreatment   Negative Class, Healthy
Training and validation data set          9836    3327                            3327                            3182
  Training data set                       6883    2328                            2328                            2227
  Validation data set                     2953    999                             999                             955
Test set 1                                148     80                              68                              …
Test set 2*                               3774    349                             296                             …
  Pre- and posttreatment only             645     349                             296                             …
  Randomly selected for human reviewers   200     100                             100                             …

* Test set 2 included 3129 radiographs obtained during treatment.

Table 3: Areas under the Receiver Operating Characteristic Curve for the Deep Learning Model and Human Readers

                                                    Human Readers*
Data Set                        Model               Reader 1            Reader 2            Reader 3            Reader 4
Test set 1 (n = 148)            0.83 (0.76, 0.89)   0.69 (0.61, 0.76)   0.79 (0.71, 0.85)   0.80 (0.73, 0.86)   0.79 (0.71, 0.85)
                                                    [P < .001]          [P = .16]           [P = .23]           [P = .14]
Test set 2 (n = 645)†           0.82 (0.79, 0.85)   …                   …                   …                   …
Randomly selected radiographs   0.84 (0.78, 0.89)   0.71 (0.64, 0.77)   0.74 (0.67, 0.80)   0.79 (0.72, 0.84)   0.80 (0.73, 0.85)
  from test set 2 for human                         [P < .001]          [P = .001]          [P = .08]           [P = .06]
  reading (n = 200)

Note.—Data in parentheses are 95% CIs. P values in brackets indicate the result of comparison between neural network and each reader.
* Readers 1 and 2 were pulmonologists; readers 3 and 4 were thoracic radiologists.
† Pre- and posttreatment radiographs only.
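
The 7:3 assignment described under Designing and Training the Deep Neural Network can be checked against the counts in Table 2 with a short sketch. It assumes that the split was made per patient within each group (tuberculosis and normal) and that the training share was rounded down. The article states only that radiographs were randomly assigned at a ratio of 7:3, so both points are assumptions, although they do reproduce the published totals. The function name `split_ids` and the ID strings are hypothetical.

```python
import random


def split_ids(ids, seed=0):
    """Shuffle ids reproducibly and split them 7:3 (training : validation)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10  # integer arithmetic, rounds down
    return shuffled[:n_train], shuffled[n_train:]


# Group sizes taken from Table 2; the ID strings are hypothetical placeholders.
tb_patients = [f"tb_{i}" for i in range(3327)]
normal_patients = [f"normal_{i}" for i in range(3182)]

tb_train, tb_val = split_ids(tb_patients)
normal_train, normal_val = split_ids(normal_patients)

# Each tuberculosis patient contributes one pretreatment (positive) and one
# posttreatment (negative) radiograph; each normal patient contributes one.
n_train = 2 * len(tb_train) + len(normal_train)
n_val = 2 * len(tb_val) + len(normal_val)
print(n_train, n_val)  # 6883 2953
```

Splitting by patient rather than by radiograph keeps both images of a pair in the same partition, which is consistent with the equal pretreatment and posttreatment counts in each row of Table 2.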

Statistical Analysis
The area under the curve at receiver operating characteristic analyses was used to evaluate the performance of the deep neural network and human readers in test sets 1 and 2. The empirical method suggested by DeLong et al (13) was used to compare receiver operating characteristic curves with each other. The degree of agreement across human readers was assessed with the Cohen κ coefficient. We evaluated the relationship between the neural network score and the readers’ grading and across the readers by using Spearman correlation coefficients.

For test set 2 analyses, we assessed the correlation between the network-driven logit-transformed disease activity score and the degree of smear positivity on sputum smear test (Appendix E1 [online]) results by using a linear mixed model with patients as a random effect to account for multiple measurements per patient. The association of the activity score with the degree of smear positivity was evaluated for an ordinal logistic regression model with use of generalized estimating equation analysis to account for multiple measurements per patient. The linear mixed model was also used to evaluate whether the activity score decreased with treatment.

P < .05 was considered indicative of statistically significant difference. Statistical analyses were performed by using SAS version 9.4 (SAS Institute), SPSS version 23 (IBM), and MedCalc version 19.2.6 (MedCalc Software).

Results

Characteristics of the Study Sample
A total of 6654 pre- and posttreatment chest radiographs from 3327 patients with pulmonary tuberculosis (mean age ± standard deviation, 55 years ± 19; 1884 men) were included in this study (Fig 1, Table 1), along with 3182 normal chest radiographs in 3182 patients from hospital A (mean age, 53 years ± 14; 1629 men) for training and validation.

Performance of the Deep Neural Network
For the test set 1, which consisted of 148 chest radiographs, the network had an AUC of 0.83 (95% CI: 0.76, 0.89; P < .001) (Table 3; Fig E2 [online]). The two pulmonologists had AUCs of 0.69 (95% CI: 0.61, 0.76; P < .001) and 0.79 (95% CI: 0.71, 0.85; P < .001), and the two radiologists had AUCs of 0.80 (95% CI: 0.73, 0.86; P < .001) and 0.79 (95% CI: 0.71, 0.85; P < .001). The neural network showed higher performance than reader 1 (P < .001), but we did not find evidence of a difference in its performance from that of the other human readers (P = .16, .23, and .14). The model’s and readers’ AUCs for smear-positive cases were 0.89 and 0.83–0.93, respectively, and those for smear-negative cases were 0.82 and 0.64–0.76. The Cohen κ values between pairs of pulmonologists and radiologists were 0.28 and 0.33, respectively, on a five-point scale. The κ values were 0.53 for the two pulmonologists and 0.54 for the two radiologists when the results were dichotomized as active or healed tuberculosis (Table E5 [online]). The Spearman correlation coefficients between the network and each reader were between 0.59 and 0.73, and those across the readers were between 0.55 and 0.80 (Table E5 [online]).

For test set 2, which comprised 645 chest radiographs when considering pre- and posttreatment radiographs only, the neural network’s AUC was 0.82 (95% CI: 0.79, 0.85; P < .001). For 200 randomly selected chest radiographs from test set 2, the AUC was 0.84 (95% CI: 0.78, 0.89; P < .001) (Table 3, Fig 2). For the same data set, the pulmonologists had AUCs of 0.71 (95% CI: 0.64, 0.77; P < .001) and 0.74 (95% CI: 0.67, 0.80; P < .001), and the radiologists had AUCs of 0.79 (95% CI: 0.72, 0.84; P < .001) and 0.80 (95% CI: 0.73, 0.85; P < .001). The neural network showed higher performance than the pulmonologists (P < .001 and P = .001, respectively), but we did not find evidence of a difference in performance between the model and the thoracic radiologists (P = .08 and P = .06, respectively) (Table 3, Fig 2). The Cohen κ values between pairs of pulmonologists and radiologists were 0.20 and 0.33, respectively, on a five-point scale. The κ values were 0.33 for the two pulmonologists and 0.58 for the two radiologists when the results were dichotomized as active or healed tuberculosis (Table E5 [online]). Spearman correlation coefficients between the network and each reader were between 0.54 and 0.76, and those across the readers were between 0.64 and 0.79 (Table E5 [online]).

Figure 2:  Graph shows receiver operating characteristic curves of the neural network model and human readers in test set 2.
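
For readers less familiar with the metric, the empirical AUC reported for the model and the readers can be sketched as the proportion of (active, healed) radiograph pairs in which the active radiograph receives the higher rating, with ties counted as one half. The sketch below is not the authors' analysis code (they report using SAS, SPSS, and MedCalc). The numeric coding of the five-point scale in `RATING`, the function name `auc`, and the six toy cases are assumptions made up for illustration only.

```python
def auc(labels, scores):
    """Empirical AUC: share of positive/negative pairs ranked correctly (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Assumed numeric coding of the five-point ordinal scale (higher = more active).
RATING = {
    "definitely healed": 1,
    "probably healed": 2,
    "equivocal activity": 3,
    "probably active": 4,
    "definitely active": 5,
}

# Made-up toy data: 1 = pretreatment (active), 0 = posttreatment (healed).
labels = [1, 1, 1, 0, 0, 0]
reader_grades = [
    "definitely active", "probably active", "equivocal activity",
    "equivocal activity", "probably healed", "definitely healed",
]
reader_scores = [RATING[g] for g in reader_grades]
model_scores = [0.96, 0.70, 0.40, 0.55, 0.20, 0.09]

print(round(auc(labels, reader_scores), 3))
print(round(auc(labels, model_scores), 3))
```

Because the readers' scale has only five levels, ties between an active and a healed radiograph are common, which is why the half-credit rule matters more for the readers than for the continuous model output.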
Disease activity scores of 0.15 and 0.82 determined active pulmonary tuberculosis at sensitivity and specificity of 95%, respectively, in test set 2. The cutoff at 95% sensitivity provided a specificity of 48.5% (95% CI: 37.1, 60.2) in test set 1 and 26.0% (95% CI: 21.3, 31.3) in test set 2. The cutoff at 95% specificity produced a sensitivity of 40.0% (95% CI: 30.0, 51.0) in test set 1 and 49.6% (95% CI: 44.3, 54.8) in test set 2. For radiographs that the network misclassified in test set 2, the readers’ AUCs were below 0.3 (95% sensitivity setting) and 0.5 (95% specificity setting), indicating that on the radiographs misclassified by the model, it was also difficult for human experts to determine the disease activity (Table E6 [online]). Model-derived disease activity scores according to the Likert-based human rater assessment are summarized in Table E7 (online).

Figure 3:  Box and whisker plot shows positive correlation between disease activity score and microscopic grade of Mycobacterium sputum smear test (negative, no bacilli detected on smear; ± [ie, equivocal positive], one to two bacilli found on 300 microscopic fields; 1+, one to nine bacilli found per 100 fields; 2+, one to nine bacilli found per 10 fields; 3+, one to nine bacilli per field; 4+, more than nine bacilli found per field). The Spearman correlation coefficient of the two variables was 0.34 (P < .001). Linear regression showed a sputum grade coefficient of 0.083 and a y-intercept of 0.53 (P < .001). Center lines represent median values; upper and lower borders of the boxes indicate 25th and 75th percentile values; upper and lower ends of vertical dotted lines show the 25th and 75th percentiles ± 1.5 interquartile range; dots represent outlier values.

Table 4: Temporal Changes in Model Output during Antituberculosis Treatment

Month of Treatment   No. of Radiographs   Mean Model Output*   Median Model Output†
0                    783                  0.72 ± 0.28          0.85 (0.52–0.96)
1                    1083                 0.69 ± 0.30          0.83 (0.43–0.95)
2                    525                  0.63 ± 0.32          0.72 (0.31–0.93)
3                    384                  0.52 ± 0.31          0.51 (0.21–0.85)
4                    390                  0.42 ± 0.28          0.36 (0.17–0.66)
5                    333                  0.37 ± 0.26          0.29 (0.16–0.55)
6                    294                  0.33 ± 0.23          0.26 (0.15–0.46)
7                    155                  0.34 ± 0.25          0.25 (0.13–0.49)
8                    21                   0.27 ± 0.21          0.18 (0.12–0.39)
9                    4                    0.15 ± 0.11          0.11 (0.09–0.22)

Note.—Model output was a value between 0 and 1 for the probability of active tuberculosis.
* Data are means ± standard deviations.
† Data in parentheses are interquartile ranges.

Relationship between the Disease Activity Score and Sputum Smear Test
The median disease activity scores were 0.36 (interquartile range [IQR], 0.15–0.74), 0.72 (IQR, 0.44–0.83), 0.82 (IQR, 0.37–0.94), 0.70 (IQR, 0.55–0.90), 0.85 (IQR, 0.66–0.98), and 0.89 (IQR, 0.81–0.99) for the negative (no bacilli detected on smear), equivocal positive (one to two bacilli found on 300 microscopic fields), 1+ (one to nine bacilli found per 100 fields), 2+ (one to nine bacilli found per 10 fields), 3+ (one to nine bacilli per field), and 4+ (more than nine bacilli found per field) degrees of sputum smear positivity, respectively (Fig 3; Table E8 [online]). The linear mixed model showed that the log odds of the disease activity score increased by 0.30 on average (95% CI: 0.20, 0.39; P < .001) when the degree of sputum smear positivity became higher. The generalized estimating equation analysis of the ordinal logistic regression model showed that the odds of


having a higher degree of the sputum smear test increased by a factor of 1.36 (95% CI: 1.22, 1.50; P < .001) per 0.1 increase in the disease activity score. The analysis also showed that the odds of having a lower degree of sputum smear positivity increased by a factor of 1.94 (95% CI: 1.63, 2.30; P < .001) per 1-month elapse of the antituberculosis treatment.

Disease Activity Scores during Antituberculosis Treatment
In test set 2, the median disease activity scores were 0.85 (IQR, 0.52–0.96), 0.51 (IQR, 0.21–0.85), 0.26 (IQR, 0.15–0.46), and 0.11 (IQR, 0.09–0.22) at 0, 3, 6, and 9 months, respectively, since the start of treatment (Table 4). The linear mixed model showed that the disease activity score gradually decreased by 0.37 on average (95% CI: 0.35, 0.39; P < .001) on the logit scale of disease activity score per 1 month of treatment duration (Figs 4–6).

Discussion
Several deep learning approaches have been reported to differentiate active pulmonary tuberculosis from normal lungs on chest radiographs (6,14,15), but this study differs from prior research by introducing radiographs that showed postinflammatory sequelae, including fibrotic changes, granulomas, and volume loss of the lung. We collected only one image pair per patient (ie, one radiograph showing active tuberculosis and

Figure 4:  Box and whisker plot shows a temporal change in disease activity score (on log-odds scale) during antituberculosis treatment. Center lines represent median values; upper and lower borders of the boxes indicate 25th and 75th percentile values; upper and lower ends of vertical dotted lines show the 25th and 75th percentiles ± 1.5 interquartile range; dots represent outlier values.
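
Because the treatment effect is reported on the logit (log-odds) scale, a small sketch may help translate it back to the 0–1 score. The code below applies the reported average decrease of 0.37 logit units per month to the baseline median score of 0.85 from Table 4. This is only a unit-conversion illustration and not the authors' mixed model: applying a population-average slope to a median is an assumption, and the function names are hypothetical.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


BASELINE_MEDIAN = 0.85    # median model output at month 0 (Table 4)
SLOPE_PER_MONTH = -0.37   # reported average change per month on the logit scale


def projected_score(months, baseline=BASELINE_MEDIAN, slope=SLOPE_PER_MONTH):
    """Score implied by a constant logit-scale slope (illustrative assumption)."""
    return inv_logit(logit(baseline) + slope * months)


for month in (0, 3, 6):
    print(month, round(projected_score(month), 2))
```

The same fixed step on the logit scale produces unequal steps on the 0–1 scale, larger near 0.5 and smaller near 0 or 1. The observed medians at 3 and 6 months in Table 4 (0.51 and 0.26) fall below this simple projection, which is one reason to treat it as a conversion aid rather than a prediction.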

Figure 5:  Representative images in a 44-year-old woman with pulmonary tuberculosis in test set 1 (A) before treatment, (B) after treatment, and (C) with magnified
view of the lung lesion. (A) The model was able to accurately determine tuberculosis activity, producing a high disease activity score of 0.96. Pulmonologists interpreted
the radiograph as definitely active tuberculosis, and radiologists interpreted it as probably active tuberculosis. (B) After 6 months of antituberculosis medication, the disease
activity score decreased to a low value of 0.09. Pulmonologists interpreted the radiograph as probably healed tuberculosis and definitely healed tuberculosis, and radiolo-
gists interpreted it as probably healed tuberculosis and equivocal activity. Human experts mostly accurately determined tuberculosis activity, but there were some discrepan-
cies on posttreatment chest radiographs. (C) The model detected clustered tiny nodular opacities with fuzzy margins and suggested a high score of active tuberculosis. On
the other hand, the model did not respond to tiny nodular opacities with clear margins.
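
The two operating points reported in the Results (a score of 0.15 for 95% sensitivity and 0.82 for 95% specificity in test set 2) can be applied to the scores in Figure 5 as a worked example. The three-zone rule, the zone labels, and the function name `triage` are assumptions for illustration; the article does not propose this rule, and it assumes that scores at or above a cutoff count as positive.

```python
HIGH_SENSITIVITY_CUTOFF = 0.15  # score giving 95% sensitivity in test set 2
HIGH_SPECIFICITY_CUTOFF = 0.82  # score giving 95% specificity in test set 2


def triage(score):
    """Place a disease activity score into one of three illustrative zones."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score >= HIGH_SPECIFICITY_CUTOFF:
        return "likely active"
    if score < HIGH_SENSITIVITY_CUTOFF:
        return "likely healed"
    return "indeterminate"


print(triage(0.96))  # pretreatment radiograph in Figure 5
print(triage(0.09))  # posttreatment radiograph in Figure 5
```

Under these assumptions, the pretreatment radiograph falls above the high-specificity cutoff and the posttreatment radiograph falls below the high-sensitivity cutoff; scores between the two cutoffs would still need clinical and microbiologic correlation.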


Figure 6:  Representative consecutive images in a 53-year-old man during treatment for pulmonary tuberculosis at (A) 7 days (disease activity score, 0.94), (B) 49
days (disease activity score, 0.68), (C) 140 days (disease activity score, 0.38), and (D) 175 days (disease activity score, 0.18) since the start of antituberculosis treatment.
Disease activity scores calculated by the model decreased gradually during antituberculosis treatment. Heat maps (right panels) also showed a gradual decrease in inten-
sity and territories of active lung lesions.

one showing healed tuberculosis) in thousands of consecutive multicenter patients to encompass the full spectrum of initial radiologic manifestations and sequelae in the data set while avoiding similar or redundant images. Notably, the training data set comprised a 7-year consecutive cohort from six hospitals; thus, the data set included the full spectrum of typical and atypical radiographic findings of active and healed pulmonary tuberculosis that accompanied intrathoracic extrapulmonary tuberculosis (eg, tuberculous pleurisy) and underlying lung disease (eg, interstitial lung abnormalities).

This deep learning network may be advantageous in countries with a high burden of tuberculosis where spontaneously healed or previously treated patients with tuberculosis are prevalent. Those countries usually suffer from low incomes and a shortage of expert imaging professionals (1). It is important to triage patients with suspected tuberculosis at chest radiography into those who should be tested bacteriologically or with the Xpert MTB/RIF assay (4), and any radiographic abnormalities are generally regarded as a positive result, given that tuberculosis can sometimes manifest atypically (4). If a deep learning network can accurately differentiate active tuberculosis from healed tuberculosis on radiographs, then this will provide more efficient triage for testing in limited-resource settings by decreasing unnecessary testing in individuals less likely to have active tuberculosis (Table E9 [online]).

Another potential application of the network is to monitor and determine the treatment response of intractable mycobacterial diseases. Specifically, the disease activity score was correlated with the grade of the sputum smear and decreased with the duration of treatment. Those findings are in accordance with previously reported results for the visual severity of pulmonary tuberculosis on radiographs (16). Multidrug- or extensively drug-resistant pulmonary tuberculosis typically requires long-term treatment. Current proxy markers for treatment success are time to sputum culture conversion and culture conversion status at 2 months or 6 months (17,18). However, these markers have some limitations in either sensitivity or specificity, and confirming negative conversion takes 2 and 8 weeks in liquid and solid culture media, respectively. Given the finding that lower network-derived disease activity scores were associated with a lower bacilli burden, monitoring the activity score on chest radiographs during treatment may supplement current prognostic markers. The same approach can be attempted to monitor the treatment response of nontuberculous mycobacterial lung diseases that share similar radiographic findings with tuberculosis (19), for which objective quantitative tools for monitoring during treatment are scarce.

This study had limitations. First, the network was validated with a limited quantity of retrospectively collected data. Also, three of the four hospitals that contributed data for test set 2 had also supplied training data. Furthermore, the addition of normal radiographs to negative cases may inflate the performance of the network. Second, the network could not separate patients with healed tuberculosis from healthy individuals. Radiographic abnormalities in active tuberculosis can completely resolve in up to 60% of patients with treated tuberculosis (20), and posttreatment fibrotic sequelae may not develop if lesions are healed before forming necrosis (21). Differentiating patients with healed tuberculosis from healthy individuals on radiographs would be more difficult than differentiating active tuberculosis from healed tuberculosis, and further study is warranted for this task. Third, despite the decrease in the disease


activity score during treatment, a considerable overlap existed. Physicians would be needed to assess treatment response or tuberculosis activity in a comprehensive evaluation of symptomatic improvement, score change, and sputum or culture conversion. Fourth, our cohort did not include patients with human immunodeficiency virus and tuberculosis (22), so the performance of the network is unknown in those patients. Fifth, a high disease activity score with the network may result from parenchymal abnormalities of other pathogens in patients with healed tuberculosis, such as nontuberculous mycobacteria or aspergillosis (7,12). Sixth, we did not apply lung segmentation, which might improve the network’s performance. Seventh, human readers and our network tried to differentiate active tuberculosis from healed tuberculosis on a single radiograph. Human readers will differentiate them more straightforwardly if previous radiographs are available for comparison, and our network did not consider this. Eighth, expert readers did not have any instructions for a harmonized interpretation of radiographs, and this might have resulted in limited agreements across readers.

In conclusion, the deep neural network was able to determine the activity of tuberculosis on chest radiographs, reflecting bacilli burden and changes after treatment. The network may help radiologically triage patients with active tuberculosis by excluding healed tuberculosis in high-burden countries and may assist in monitoring the activity of mycobacterial diseases that require long-term treatment.

Author contributions: Guarantors of integrity of entire study, S.L., Y.J.L., S.H.Y.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, S.L., N.K., Y.J.L., J.Y.L., J.S.K., D.J., S.H.Y.; clinical studies, S.L., J.J.Y., N.K., Y.J.L., J.K.L., J.Y.L., J.S.K., D.J., J.M.G., S.H.Y.; experimental studies, N.K.; statistical analysis, S.L., J.J.Y., N.K., J.S.K., M.J.J., S.H.Y.; and manuscript editing, S.L., J.J.Y., N.K., Y.J.L., J.S.K., Y.A.K., J.M.G., S.H.Y.

Disclosures of Conflicts of Interest: S.L. Activities related to the present article: will receive stock options from Medical IP for the potential commercialization of the model presented in the article. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. J.J.Y. disclosed no relevant relationships. N.K. disclosed no relevant relationships. Y.J.L. disclosed no relevant relationships. J.K.L. disclosed no relevant relationships. J.Y.L. disclosed no relevant relationships. J.S.K. disclosed no relevant relationships. Y.A.K. disclosed no relevant relationships. D.J. disclosed no relevant relationships. M.J.J. disclosed no relevant relationships. J.M.G. Activities related to the present article: is member of Radiology editorial board. Activities not related to the present article: received or will receive research grants from Infinitt Healthcare, Dongkook Lifescience, and LG Electronics. Other relationships: disclosed no relevant relationships. S.H.Y. Activities related to the present article: will receive stock options from Medical IP for the potential commercialization of the model presented in the article. Activities not related to the present article: is the chief medical officer of Medical IP and receives stock options as compensation. Other relationships: disclosed no relevant relationships.

References
1. World Health Organization. Global tuberculosis report 2020. Geneva, Switzerland: World Health Organization, 2020.
2. Friedrich SO, Rachow A, Saathoff E, et al. Assessment of the sensitivity and specificity of Xpert MTB/RIF assay as an early sputum biomarker of response to tuberculosis treatment. Lancet Respir Med 2013;1(6):462–470.
3. Clifford V, He Y, Zufferey C, Connell T, Curtis N. Interferon gamma release assays for monitoring the response to treatment for tuberculosis: a systematic review. Tuberculosis (Edinb) 2015;95(6):639–650.
4. World Health Organization. Chest radiography in tuberculosis detection: summary of current World Health Organization recommendations and guidance on programmatic approaches. Geneva, Switzerland: World Health Organization, 2016.
5. Balabanova Y, Coker R, Fedorin I, et al. Variability in interpretation of chest radiographs among Russian clinicians and implications for screening programmes: observational study. BMJ 2005;331(7513):379–382.
6. Hwang EJ, Park S, Jin KN, et al. Development and validation of a deep learning-based automatic detection algorithm for active pulmonary tuberculosis on chest radiographs. Clin Infect Dis 2019;69(5):739–747.
7. Kim HY, Song KS, Goo JM, Lee JS, Lee KS, Lim TH. Thoracic sequelae and complications of tuberculosis. RadioGraphics 2001;21(4):839–858; discussion 859–860.
8. Ali MG, Muhammad ZS, Shahzad T, Yaseen A, Irfan M. Post tuberculosis sequelae in patients treated for tuberculosis: an observational study at a tertiary care center of a high TB burden country. Eur Respir J 2018;52:PA2745.
9. Aldridge RW, Zenner D, White PJ, et al. Tuberculosis in migrants moving from high-incidence to low-incidence countries: a population-based cohort study of 519 955 migrants screened before entry to England, Wales, and Northern Ireland. Lancet 2016;388(10059):2510–2518.
10. Tan M, Le Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: Kamalika C, Ruslan S, eds. Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research: PMLR Proceedings of Machine Learning Research, 2019; 6105–6114.
11. Lee JK, Lee JY, Kim DK, et al. Substitution of ethambutol with linezolid during the intensive phase of treatment of pulmonary tuberculosis: a prospective, multicentre, randomised, open-label, phase 2 trial. Lancet Infect Dis 2019;19(1):46–55.
12. Nachiappan AC, Rahbar K, Shi X, et al. Pulmonary tuberculosis: role of radiology in diagnosis and management. RadioGraphics 2017;37(1):52–72.
13. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44(3):837–845.
14. Khan FA, Majidulla A, Tavaziva G, et al. Chest x-ray analysis with deep learning-based software as a triage test for pulmonary tuberculosis: a prospective study of diagnostic accuracy for culture-confirmed disease. Lancet Digit Health 2020;2(11):e573–e581.
15. Pasa F, Golkov V, Pfeiffer F, Cremers D, Pfeiffer D. Efficient deep network architectures for fast chest x-ray tuberculosis screening and visualization. Sci Rep 2019;9(1):6268.
16. Ralph AP, Ardian M, Wiguna A, et al. A simple, valid, numerical score for grading chest x-ray severity in adult smear-positive pulmonary tuberculosis. Thorax 2010;65(10):863–869.
17. Kurbatova EV, Cegielski JP, Lienhardt C, et al. Sputum culture conversion as a prognostic marker for end-of-treatment outcome in patients with multidrug-resistant tuberculosis: a secondary analysis of data from two observational cohort studies. Lancet Respir Med 2015;3(3):201–209.
18. Rajpurkar P, Irvin J, Bagul A, et al. MURA Dataset: Towards Radiologist-Level Abnormality Detection in Musculoskeletal Radiographs. arXiv:1712.06957 2017. https://arxiv.org/abs/1712.06957. Published 2017. Accessed July 19, 2020.
19. Koh WJ, Kwon OJ, Lee KS. Nontuberculous mycobacterial pulmonary diseases in immunocompetent patients. Korean J Radiol 2002;3(3):145–157.
20. Menon B, Nima G, Dogra V, Jha S. Evaluation of the radiological sequelae after treatment completion in new cases of pulmonary, pleural, and mediastinal tuberculosis. Lung India 2015;32(3):241–245.
21. Long R, Maycher B, Dhar A, Manfreda J, Hershfield E, Anthonisen N. Pulmonary tuberculosis treated with directly observed therapy: serial changes in lung structure and function. Chest 1998;113(4):933–943.
22. Lee CH, Hwang JY, Oh DK, et al. The burden and characteristics of tuberculosis/human immunodeficiency virus (TB/HIV) in South Korea: a study from a population database and a survey. BMC Infect Dis 2010;10(1):66.
