Deep Learning COVID-19 Features On CXR Using Limited Training Data Sets
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 39, NO. 8, AUGUST 2020
Abstract: Under the global pandemic of COVID-19, the use of artificial intelligence to analyze chest X-ray (CXR) images for COVID-19 diagnosis and patient triage is becoming important. Unfortunately, due to the emergent nature of the COVID-19 pandemic, systematically collecting a CXR data set for deep neural network training is difficult. To address this problem, we propose a patch-based convolutional neural network approach with a relatively small number of trainable parameters for COVID-19 diagnosis. The proposed method is inspired by our statistical analysis of potential imaging biomarkers in CXR radiographs. Experimental results show that our method achieves state-of-the-art performance and provides clinically interpretable saliency maps, which are useful for COVID-19 diagnosis and patient triage.

Index Terms: COVID-19, chest X-ray, deep learning, segmentation, classification, saliency map.

I. INTRODUCTION

CORONAVIRUS disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), became a global pandemic less than four months after it was first reported, reaching 3.3 million confirmed cases and 238,000 deaths as of May 2, 2020. Due to its highly contagious nature and the lack of appropriate treatments and vaccines, early detection of COVID-19 is increasingly important to prevent further spreading and to flatten the curve for proper allocation of limited medical resources.

Currently, reverse transcription polymerase chain reaction (RT-PCR), which detects viral nucleic acid, is the gold standard for COVID-19 diagnosis, but RT-PCR results using nasopharyngeal and throat swabs can be affected by sampling errors and low viral load [1]. Antigen tests may be fast, but they have poor sensitivity.

Since most COVID-19 infected patients were diagnosed with pneumonia, radiological examinations may be useful for diagnosis and assessment of disease progression. Chest computed tomography (CT) screening on initial patient presentation showed higher sensitivity than RT-PCR [2] and even confirmed COVID-19 infection in cases with negative or weakly-positive RT-PCR [1]. Accordingly, recent COVID-19 radiological literature has primarily focused on CT findings [2], [3]. However, as the prevalence of COVID-19 increases, the routine use of CT places a huge burden on radiology departments and risks infection of the CT suites, so the need to recognize COVID-19 features on chest X-ray (CXR) is increasing. Common CXR findings reflect those described by CT, such as bilateral, peripheral consolidation and/or ground-glass opacities [2], [3]. Specifically, Wong et al. [4] described frequent CXR appearances of COVID-19. Unfortunately, CXR findings are reported to have a lower sensitivity than initial RT-PCR testing (69% versus 91%, respectively) [4]. Despite this low sensitivity, CXR abnormalities were detectable in 9% of patients whose initial RT-PCR was negative.

As the COVID-19 pandemic threatens to overwhelm healthcare systems worldwide, CXR may be considered as a tool for identifying COVID-19 if its diagnostic performance is improved. Even if CXR cannot completely replace RT-PCR, pneumonia is a clinical manifestation of patients at higher risk of requiring hospitalization. CXR can therefore be used for patient triage, prioritizing patients' treatments to help saturated healthcare systems in the pandemic situation. This is especially important because, in general, the most frequent known etiology of community-acquired pneumonia is bacterial infection [5]. By excluding this population through triage, limited medical resources can be spared substantially.

Accordingly, deep learning (DL) approaches to COVID-19 classification on CXR have been actively explored [6]–[12]. In particular, Wang and Wong [6] proposed an open-source deep convolutional neural network platform called COVID-Net that is tailored for the detection of COVID-19 cases from chest radiography images. They reported that COVID-Net achieves 80% sensitivity for COVID-19 cases.

Inspired by this early success, in this paper we further investigate deep convolutional neural networks and evaluate their feasibility for COVID-19 diagnosis. Unfortunately, under the current public health emergency, it is difficult to collect a large set of well-curated data for training neural networks.

Manuscript received April 23, 2020; revised May 2, 2020; accepted May 5, 2020. Date of publication May 8, 2020; date of current version July 30, 2020. This work was supported by the National Research Foundation of Korea under Grant NRF-2020R1A2B5B03001980. (Yujin Oh and Sangjoon Park are co-first authors.) (Corresponding author: Jong Chul Ye.)

The authors are with the Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, South Korea (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMI.2020.2993291
© IEEE 2020. This article is free to access and download, along with rights for full text and data mining, re-use and analysis
Authorized licensed use limited to: IEEE Xplore. Downloaded on October 30,2022 at 17:38:56 UTC from IEEE Xplore. Restrictions apply.
OH et al.: DL COVID-19 FEATURES ON CXR USING LIMITED TRAINING DATA SETS 2689
Fig. 1. Overall architecture of the proposed neural network approach: (a) Segmentation network, and (b) Classification network.
Therefore, one of the main focuses of this paper is to develop a neural network architecture that is suitable for training with a limited data set and that still produces radiologically interpretable results. The most frequently observed distribution patterns of COVID-19 in CXR are bilateral involvement, peripheral distribution, and ground-glass opacification (GGO) [13]. A properly designed neural network should therefore reflect such radiological findings.

To achieve this goal, we first investigate several imaging biomarkers that are often used in CXR analysis, such as the lung area intensity distribution and the cardio-thoracic ratio. Our analysis found statistically significant differences in the patch-wise intensity distribution, which correlate well with the radiological findings of localized intensity variations in COVID-19 CXR. These findings led us to propose a novel patch-based deep neural network architecture with random patch cropping. In this architecture, the final classification result is obtained by majority voting over inference results at multiple patch locations. An important advantage of the proposed method is that, owing to patch-based training, the network complexity is relatively small. In addition, multiple patches from each image can be used to augment the training data set, so that even with a limited data set the neural network can be trained efficiently without overfitting. By combining this network with our novel preprocessing step, which normalizes data heterogeneities and bias, we demonstrate that the proposed architecture provides better sensitivity and interpretability than the existing COVID-Net [6] on the same data set.

Furthermore, we extend the idea of the gradient-weighted class activation map (Grad-CAM) [14]. Another important contribution of this paper is a novel probabilistic Grad-CAM that takes patch-wise disease probability into account when generating a global saliency map. The resulting class activation maps clearly show interpretable results that are well correlated with radiological findings.

II. PROPOSED NETWORK ARCHITECTURE

The overall algorithmic framework is given in Fig. 1. The CXR images are first pre-processed for data normalization. The pre-processed data are then fed into a segmentation network, from which lung areas are extracted as shown in Fig. 1(a). From the segmented lung area, a classification network classifies the corresponding diseases using patch-by-patch training and inference. The final decision is then made by majority voting as shown in Fig. 1(b). Additionally, a probabilistic Grad-CAM saliency map is calculated to provide an interpretable result. Each network is described in detail below.

A. Segmentation Network

Our segmentation network aims to extract the lung and heart contours from chest radiography images. We adopted an extended fully convolutional (FC)-DenseNet103 to perform semantic segmentation [15]. The training objective is

$$\arg\min_{\Theta} L(\Theta) \qquad (1)$$
2690 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 39, NO. 8, AUGUST 2020
where $L(\Theta)$ is the cross-entropy loss of multi-categorical semantic segmentation and $\Theta$ denotes the network parameter set, which is composed of filter kernel weights and biases. Specifically, $L(\Theta)$ is defined as

$$L(\Theta) = -\sum_{s}\sum_{j} \lambda_s \, \mathbb{1}(y_j = s) \log\big(p_{\Theta}(x_j)\big) \qquad (2)$$

where $\mathbb{1}(\cdot)$ is the indicator function, $p_{\Theta}(x_j)$ denotes the softmax probability of the $j$-th pixel in a CXR image $x$, and $y_j$ denotes the corresponding ground-truth label. $s$ denotes the class category, i.e., $s \in \{\text{background}, \text{heart}, \text{left lung}, \text{right lung}\}$, and $\lambda_s$ denotes the weight given to each class category.

CXR images from different data sources may be heterogeneous in bit depth, compression type, image size, acquisition condition, scanning protocol, postprocessing, etc. Therefore, we developed a universal preprocessing step for data normalization to ensure a uniform intensity histogram throughout the entire dataset. The detailed preprocessing steps are as follows:
1) Data type casting (from uint8/uint16 to float32)
2) Histogram equalization (gray level = [0, 255.0])
3) Gamma correction (γ = 0.5)
4) Image resize (height, width = [256, 256])

Using the preprocessed data, we trained FC-DenseNet103 [15] as our backbone segmentation network architecture. Network parameters were randomly initialized. We applied the Adam optimizer [16] with an initial learning rate of 0.0001. Whenever the training loss did not improve by a certain criterion, the learning rate was reduced by a factor of 10. We adopted an early stopping strategy based on validation performance. The batch size was set to 2 after tuning. We implemented the network using the PyTorch library [17].

B. Classification Network

The classification network aims to classify the chest X-ray images according to disease type. We adopted the relatively simple ResNet-18 as the backbone of our classification algorithm for two reasons. First, it helps prevent overfitting, which is known to occur when an overly complex model is trained on a small amount of data. Second, we intended to perform transfer learning with weights pre-trained on ImageNet to compensate for the small training data set. We found that these strategies make training stable even when the dataset is small.

The labels were divided into four classes: normal, bacterial pneumonia, tuberculosis (TB), and viral pneumonia, which includes pneumonia caused by COVID-19 infection. We assigned viral pneumonia caused by other viruses (e.g., SARS-CoV or MERS-CoV) to the same class as COVID-19. These pneumonias are reported to have similar radiologic features that are challenging to distinguish even for experienced radiologists [18]. Instead, we concentrated on more feasible tasks, such as distinguishing bacterial pneumonia or TB from viral pneumonia. These classes show considerable differences in radiologic features, and separating them is still useful for patient triage.

The pre-processed images were first masked with the lung masks from the segmentation network and then fed into a classification network. The classification network was implemented in two versions: global and local approaches. In the global approach, the masked images were resized to 224 × 224 and fed into the network. This approach focuses on the global appearance of the CXR data and was used as a baseline network for comparison; many existing studies employ a similar procedure [6]–[9]. In the local patch-based approach, which is our proposed method, the masked images were randomly cropped to 224 × 224. The resulting patches were used as the network inputs, as shown in Fig. 1(b). In contrast to the global approach, the CXR images are resized to a much larger 1024 × 1024 size for our classification network to better reflect the original pixel distribution. Accordingly, the segmentation mask from Fig. 1(a) is upsampled to match the 1024 × 1024 image size. To avoid cropping patches from the empty area of the masked image, the patch centers were randomly selected within the lung areas. During inference, K patches were randomly acquired for each image to represent the attributes of the whole image. K was chosen so that the patches cover all lung pixels multiple times. Each patch was then fed into the network to generate an output. The final decision among the K network outputs was made by majority voting, i.e., the most frequently predicted class was taken as the final output, as depicted in Fig. 1(b). In our experiments, K was set to 100, meaning that 100 patches were randomly generated from each whole image for majority voting.

For network training, parameters pre-trained on ImageNet were used for weight initialization, after which the network was trained on the CXR data. For optimization, the Adam optimizer [16] with a learning rate of 0.00001 was applied. The network was trained for 100 epochs, with early stopping based on validation performance metrics. A batch size of 16 was used. We applied weight decay and L1 regularization to prevent overfitting. The classification network was also implemented with the PyTorch library.

C. Probabilistic Grad-CAM Saliency Map Visualization

We investigate the interpretability of our approach by visualizing a saliency map. One of the most widely used saliency map visualization methods is the so-called gradient-weighted class activation map (Grad-CAM) [14]. Specifically, the Grad-CAM saliency map of class $c$ for a given input image $x \in \mathbb{R}^{m\times n}$ is defined by

$$l^c(x) = UP\left(\sigma\left(\sum_k \alpha_k^c f^k(x)\right)\right) \in \mathbb{R}^{m\times n} \qquad (3)$$

where $f^k(x) \in \mathbb{R}^{u\times v}$ is the $k$-th feature channel at the last convolution layer (layer 4 of ResNet-18 in our case). $UP(\cdot)$ denotes the upsampling operator from a $u\times v$ feature map to the $m\times n$ image, and $\sigma(\cdot)$ is the rectified linear unit (ReLU) [14]. Here, $\alpha_k^c$ is the feature weighted parameter
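The weighted cross-entropy of Eq. (2) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation. It assumes that $p_{\Theta}(x_j)$ in Eq. (2) is read as the softmax probability assigned to class $s$ at pixel $j$, which is the usual reading. The helper names `softmax` and `weighted_cross_entropy` are hypothetical.

```python
import math
from typing import List, Sequence

# Class index order assumed for this sketch (s in Eq. (2)).
CLASSES = ("background", "heart", "left lung", "right lung")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the class logits of one pixel."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Sketch of Eq. (2), summed (not averaged) over pixels j.

    probs[j][s]: softmax probability of class s at pixel j
    labels[j]:   ground-truth class index y_j
    weights[s]:  class weight lambda_s
    """
    loss = 0.0
    for p_j, y_j in zip(probs, labels):
        # The indicator 1(y_j = s) keeps only the term with s = y_j.
        loss -= weights[y_j] * math.log(p_j[y_j])
    return loss


if __name__ == "__main__":
    pixel_probs = [softmax([2.0, 0.1, 0.1, 0.1]), softmax([0.0, 0.0, 0.0, 0.0])]
    print(weighted_cross_entropy(pixel_probs, labels=[0, 3], weights=[0.5, 1.0, 1.0, 2.0]))
```

In PyTorch, `torch.nn.CrossEntropyLoss(weight=..., reduction="sum")` applied to raw logits computes the same sum. Its default `reduction="mean"` additionally divides by the summed weights of the target pixels.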
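The four preprocessing steps listed for the segmentation network can be chained as in the following pure-Python sketch, which operates on a small image stored as nested lists. Several details are assumptions not stated in the text:

- Histogram equalization uses the standard cumulative-histogram mapping.
- Gamma correction uses the form $I_{out} = 255\,(I_{in}/255)^{\gamma}$.
- Resizing uses nearest-neighbor sampling. A real pipeline would more likely use bilinear interpolation from an imaging library.

All function names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]


def cast_to_float(img: Sequence[Sequence[int]]) -> Image:
    """Step 1: cast uint8/uint16 pixel values to floating point."""
    return [[float(v) for v in row] for row in img]


def equalize_histogram(img: Image, max_level: float = 255.0) -> Image:
    """Step 2: map gray values through the normalized cumulative histogram."""
    flat = sorted(v for row in img for v in row)
    n = len(flat)
    cdf = {}
    for i, v in enumerate(flat):
        cdf[v] = i + 1  # number of pixels with value <= v
    cdf_min = cdf[flat[0]]
    if n == cdf_min:  # constant image: nothing to spread out
        return [[0.0 for _ in row] for row in img]
    return [[max_level * (cdf[v] - cdf_min) / (n - cdf_min) for v in row] for row in img]


def gamma_correct(img: Image, gamma: float = 0.5, max_level: float = 255.0) -> Image:
    """Step 3: gamma correction on the [0, max_level] range (assumed form)."""
    return [[max_level * (v / max_level) ** gamma for v in row] for row in img]


def resize_nearest(img: Image, height: int = 256, width: int = 256) -> Image:
    """Step 4: nearest-neighbour resize to height x width."""
    h, w = len(img), len(img[0])
    return [[img[r * h // height][c * w // width] for c in range(width)] for r in range(height)]


def preprocess(img: Sequence[Sequence[int]]) -> Image:
    """Chain steps 1-4 in the order given in the text."""
    return resize_nearest(gamma_correct(equalize_histogram(cast_to_float(img))))


if __name__ == "__main__":
    out = preprocess([[0, 10], [10, 20]])
    print(len(out), len(out[0]), round(out[0][-1], 2))
```

With γ = 0.5 the mapping brightens mid-gray values (here 170 becomes about 208), which expands contrast in darker lung regions after equalization.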
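The inference procedure of the local patch-based approach is sketched below: random patch centers inside the lung mask, K crops, and majority voting. This is a simplified sketch under stated assumptions:

- Images and masks are nested lists.
- `classify` is a hypothetical stand-in for the trained ResNet-18 that returns a class label.
- Patches near the image border are shifted inward so they stay inside the image. The text does not specify border handling, and the patch size is assumed not to exceed the image size.
- Ties in the vote go to the class that was predicted first.

Function names are hypothetical. The defaults K = 100 and a 224 × 224 patch follow the text.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Center = Tuple[int, int]


def sample_patch_centers(mask: Sequence[Sequence[bool]], k: int, rng: random.Random) -> List[Center]:
    """Draw k patch centres uniformly from pixels inside the lung mask."""
    lung_pixels = [(r, c) for r, row in enumerate(mask) for c, inside in enumerate(row) if inside]
    if not lung_pixels:
        raise ValueError("empty lung mask")
    return [rng.choice(lung_pixels) for _ in range(k)]


def crop(img: Image, center: Center, size: int) -> Image:
    """Crop a size x size patch around center, shifted inward at the borders."""
    h, w = len(img), len(img[0])
    top = min(max(center[0] - size // 2, 0), h - size)
    left = min(max(center[1] - size // 2, 0), w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def majority_vote(
    img: Image,
    mask: Sequence[Sequence[bool]],
    classify: Callable[[Image], str],
    k: int = 100,
    size: int = 224,
    seed: int = 0,
) -> str:
    """Classify k random patches and return the most frequent label."""
    rng = random.Random(seed)
    votes = Counter(classify(crop(img, c, size)) for c in sample_patch_centers(mask, k, rng))
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 8x8 image: dark left half, bright right half; lung mask on the right.
    toy_img = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
    toy_mask = [[c >= 6 for c in range(8)] for _ in range(8)]

    def brightness_classifier(patch: Image) -> str:
        mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
        return "bright" if mean > 0.5 else "dark"

    print(majority_vote(toy_img, toy_mask, brightness_classifier, k=15, size=4))
```

Sampling centers only from lung pixels mirrors the stated goal of avoiding patches drawn from the empty, masked-out area.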
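Eq. (3) can be illustrated with the sketch below. It combines toy $u \times v$ feature maps $f^k(x)$ with weights $\alpha_k^c$, applies the ReLU $\sigma$, and upsamples to $m \times n$.

- **Weights.** The definition of $\alpha_k^c$ is cut off in this section, so the hypothetical helper `gap_weights` follows the original Grad-CAM formulation [14]. There, $\alpha_k^c$ is the spatial average of the gradient of the class-$c$ score with respect to $f^k$. Here the gradients are supplied as plain numbers rather than computed by a framework.
- **Upsampling.** Nearest-neighbor upsampling stands in for $UP(\cdot)$, whose interpolation method is not stated.
- **Scope.** This covers only the standard Grad-CAM of Eq. (3), not the probabilistic extension.

```python
from typing import List, Sequence

Map = List[List[float]]


def gap_weights(grads: Sequence[Map]) -> List[float]:
    """alpha_k^c as the spatial mean of d(score_c)/d(f^k) (standard Grad-CAM)."""
    return [sum(sum(row) for row in g) / (len(g) * len(g[0])) for g in grads]


def grad_cam(features: Sequence[Map], alphas: Sequence[float], m: int, n: int) -> Map:
    """Eq. (3): UP(ReLU(sum_k alpha_k^c f^k(x))), upsampled from u x v to m x n."""
    u, v = len(features[0]), len(features[0][0])
    # Weighted channel sum followed by ReLU at each feature-map location.
    cam = [
        [max(0.0, sum(a * f[i][j] for a, f in zip(alphas, features))) for j in range(v)]
        for i in range(u)
    ]
    # Nearest-neighbour upsampling to the input image size.
    return [[cam[r * u // m][c * v // n] for c in range(n)] for r in range(m)]


if __name__ == "__main__":
    feats = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, -1.0], [-1.0, 0.0]]]
    for row in grad_cam(feats, gap_weights(grads), 4, 4):
        print(row)
```

The ReLU discards locations whose weighted evidence argues against class $c$, which is why the second channel (negative weight) only suppresses activation and never adds to it.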
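The cardio-thoracic ratio, one of the imaging biomarkers investigated in this work, is conventionally the maximal horizontal width of the heart divided by the maximal horizontal width of the thorax. The sketch below estimates it from the heart, left-lung, and right-lung masks produced by the segmentation network. It approximates the thoracic width by the horizontal extent of the two lung masks combined. That approximation and the function names are assumptions for illustration. The exact measurement protocol behind Table IX is not given in this section.

```python
from typing import Sequence

Mask = Sequence[Sequence[bool]]


def horizontal_extent(mask: Mask) -> int:
    """Width in pixels between the leftmost and rightmost foreground columns."""
    cols = [c for row in mask for c, inside in enumerate(row) if inside]
    return max(cols) - min(cols) + 1 if cols else 0


def cardiothoracic_ratio(heart: Mask, left_lung: Mask, right_lung: Mask) -> float:
    """Heart width divided by thoracic width (approximated by the lung union)."""
    thorax = [[a or b for a, b in zip(row_l, row_r)] for row_l, row_r in zip(left_lung, right_lung)]
    thorax_width = horizontal_extent(thorax)
    if thorax_width == 0:
        raise ValueError("empty lung masks")
    return horizontal_extent(heart) / thorax_width


if __name__ == "__main__":
    # One-row toy masks on a 10-pixel-wide image (image left = patient's right).
    right_lung = [[c <= 3 for c in range(10)]]
    left_lung = [[c >= 6 for c in range(10)]]
    heart = [[3 <= c <= 7 for c in range(10)]]
    print(cardiothoracic_ratio(heart, left_lung, right_lung))
```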
TABLE IV
DATASET FOR COMPARISON WITH COVID-NET
TABLE V
CXR SEGMENTATION RESULTS
TABLE VII
LUNG AREAS INTENSITY VARIANCE STATISTICS
TABLE VIII
LUNG AREAS INTENSITY VARIANCE STATISTICS BY EXCLUDING AP SUPINE RADIOGRAPHS
TABLE IX
CARDIOTHORACIC RATIO STATISTICS
TABLE X
CLASSIFICATION RESULTS FROM THE GLOBAL APPROACH AND THE PROPOSED PATCH-BASED CLASSIFICATION NETWORK
TABLE XI
SENSITIVITY OF THE GLOBAL APPROACH AND THE LOCAL PATCH-BASED CLASSIFICATION NETWORK
Fig. 5. Scatter plot (left) and corresponding mean values with one-standard-deviation error bars. Each point depicts a patch that was correctly classified to its ground-truth label. All parameter values were normalized to an arbitrary unit. Classes statistically differentiable from the COVID-19 and viral cases (p < 0.001) are marked at each error bar.

class showed distinctly lower intensity values (p < 0.001 for all) than the other classes, together with highly intensity-variant characteristics, which are reflected in the large error bars. This result is in accordance with the results for lung area intensity and intensity variance (Fig. 4(a), (b)). The intra-patch intensity distribution, however, showed no difference compared to the normal class (p > 0.05). From these intra- and inter-patch intensity distribution results, we infer that intra-patch variance, which represents local texture information, was not crucially informative. In contrast, the globally distributed multifocal intensity change may be an important discriminating feature for COVID-19 diagnosis, and it is strongly correlated with the radiological findings.

One finding common to all the marker candidates was that there was no difference between the COVID-19 and viral cases (p > 0.05 for all markers), which is also consistent with radiological findings [18]. Therefore, in the classification network, the COVID-19 and viral classes were merged into one class.

D. Classification Performance

The classification performance of the proposed method is provided in Table X. The confusion matrices for (a) the global method and (b) the local patch-based method are shown in Fig. 6. The proposed local patch-based approach consistently outperformed the global approach in all metrics. In particular, as shown in Table XI, our method achieved a sensitivity of 92.5% for COVID-19 and viral pneumonia. This is acceptable performance for a screening method, considering two reference points: the sensitivity of COVID-19 diagnosis from X-ray images is known to be 69% even for clinical experts, and the current gold standard, RT-PCR, has a sensitivity of 91% [4]. Moreover, the sensitivities for the other classes are also significantly higher than with the global approach, which confirms the efficacy of our method.

E. Interpretability Using Saliency Map

Fig. 7 and Fig. 8 illustrate examples of saliency map visualization. As shown in Fig. 7(a), the existing Grad-CAM method for the global approach only focuses on the broad main lesion, so it cannot properly differentiate multifocal lesions within the image. With the probabilistic Grad-CAM, by contrast, our local patch-based approach effectively visualized multifocal GGOs and consolidations, as shown in Fig. 7(c). This was consistent with the findings reported by clinical experts. In particular, we computed the probabilistic Grad-CAM for the COVID-19 class using patient images from various classes (i.e., normal, bacterial, TB, and COVID-19). A noticeable activation map was observed only in the COVID-19 patient data set, whereas almost no activation was observed in patients with other diseases and conditions, as shown in Fig. 8. These results strongly support our claim that the probabilistic Grad-CAM saliency map from our local patch-based approach is more intuitive and interpretable than those of existing methods.

V. DISCUSSION

A. COVID-19 Features on CXR

In the diagnosis of COVID-19, other diseases mimicking COVID-19 pneumonia should be differentiated, including community-acquired pneumonia such as streptococcus pneumonia, mycoplasma- and chlamydia-related pneumonia, and other coronavirus infections.
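Returning to the patch-wise intensity analysis accompanying Fig. 5, the contrast between intra-patch variance (local texture) and the spread of intensity across patches (globally distributed, multifocal change) can be made concrete with the sketch below. The exact statistics used for Fig. 5 are not defined in this section, so the definitions here are assumptions:

- Intra-patch variance is the mean of the per-patch pixel variances.
- Inter-patch variance is the variance of the per-patch mean intensities.

The function name is hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def patch_intensity_stats(patches: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Summarise flattened patches by intra- and inter-patch intensity statistics.

    intra_variance: average of the pixel variances within each patch
    inter_variance: variance of the mean intensities across patches
    """
    patch_means: List[float] = [mean(p) for p in patches]
    return {
        "mean_intensity": mean(patch_means),
        "intra_variance": mean(pvariance(p) for p in patches),
        "inter_variance": pvariance(patch_means),
    }


if __name__ == "__main__":
    uniform_texture = [[0.0, 2.0, 0.0, 2.0], [0.0, 2.0, 0.0, 2.0]]
    multifocal = [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
    print(patch_intensity_stats(uniform_texture))
    print(patch_intensity_stats(multifocal))
```

In the toy example both sets of patches have the same overall mean intensity. The first varies only within patches (texture), while the second varies only between patches, which is the kind of contrast the intra-/inter-patch comparison is meant to separate.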
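The per-class sensitivities compared in Table XI are recall values that can be read off confusion matrices such as those in Fig. 6. For each true class, sensitivity is the fraction of its images that were assigned to that class. A minimal sketch follows. The class names mirror the four labels used in this work. The counts in the example are made up for illustration and are not taken from the paper's results.

```python
from typing import Dict, Sequence

CLASSES = ("normal", "bacterial", "TB", "COVID-19/viral")


def per_class_sensitivity(cm: Sequence[Sequence[int]], classes: Sequence[str] = CLASSES) -> Dict[str, float]:
    """Sensitivity (recall) per class; cm[i][j] counts true class i predicted as j."""
    result: Dict[str, float] = {}
    for i, name in enumerate(classes):
        positives = sum(cm[i])  # all images whose ground truth is class i
        result[name] = cm[i][i] / positives if positives else float("nan")
    return result


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    toy_cm = [
        [8, 1, 1, 0],
        [0, 9, 1, 0],
        [0, 0, 10, 0],
        [1, 0, 1, 8],
    ]
    print(per_class_sensitivity(toy_cm))
```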
Fig. 8. Examples of probabilistic Grad-CAM of COVID-19 class for (a) normal, (b) bacterial, (c) tuberculosis, and (d) COVID-19 pneumonia patients.
TABLE XII
COMPARISON OF OUR METHOD WITH COVID-NET
TABLE XIII
LUNG SEGMENTATION RESULTS COMPARISON