Das, Santosh, Pal - 2020
https://doi.org/10.1007/s13246-020-00888-x
SCIENTIFIC PAPER
Abstract
Since December 2019, the Coronavirus Disease (COVID-19) pandemic has caused world-wide turmoil in a short period
of time, and the infection, caused by SARS-CoV-2, is spreading rapidly. AI-driven tools are used to identify Coronavirus
outbreaks as well as forecast their nature of spread, where imaging techniques are widely used, such as CT scans and chest
X-rays (CXRs). In this paper, motivated by the fact that X-ray imaging systems are more prevalent and cheaper than CT scan
systems, a deep learning-based Convolutional Neural Network (CNN) model, which we call Truncated Inception Net, is
proposed to screen COVID-19 positive CXRs from other non-COVID and/or healthy cases. To validate our proposal, six dif-
ferent types of datasets were employed by taking the following CXRs: COVID-19 positive, Pneumonia positive, Tuberculosis
positive, and healthy cases into account. The proposed model achieved an accuracy of 99.96% (AUC of 1.0) in classifying
COVID-19 positive cases from combined Pneumonia and healthy cases. Similarly, it achieved an accuracy of 99.92% (AUC
of 0.99) in classifying COVID-19 positive cases from combined Pneumonia, Tuberculosis, and healthy CXRs. To the best
of our knowledge, as of now, the achieved results outperform the existing AI-driven tools for screening COVID-19 using the
acquired CXRs, and demonstrate the viability of using the proposed Truncated Inception Net as a screening tool.
Keywords COVID-19 · Deep learning · CNN · Inception net · Pneumonia · Tuberculosis · Chest X-rays
Physical and Engineering Sciences in Medicine
Fig. 1 COVID-19 pneumonia is characterized primarily by patches of Ground-Glass Opacity (GGO) and consolidations. In these
CXRs, the GGO areas, in early stages of COVID-19, are identified/annotated with white arrows. These annotations were made in
the original dataset, which solely attribute the clinical implications
have proposed the use of radiography techniques such as Computed Tomography (CT) scans and chest X-rays (CXRs) for
COVID-19 screening. Early studies of COVID-19 positive patients have shown that their CT scans and CXRs show identifiable
abnormalities [5, 6], according to which COVID-19 pneumonia was more likely to have peripheral distribution, ground-glass
opacity, fine reticular opacity, and vascular thickening, but less likely to have a central+peripheral distribution, pleural
effusion and lymphadenopathy. The idea is further strengthened by observing the high correlation between the PCR and
radiological results, as demonstrated in [7]. In [8, 9], authors establish that the sensitivity of CT scan imaging outperforms
the conventional PCR technique. The possible reasons may be immature development of nucleic acid detection technology, low
patient viral load or improper clinical sampling, as stated in [8]. According to [6, 10–12], COVID-19 infection can primarily
be characterized through radiographs by patches of Ground-Glass Opacity (GGO) and consolidation (See Fig. 1). Additionally,
authors in [4, 13] have provided a deep insight into the statistical growth of radiological cues in COVID-19 positive patients
and the temporal stages of the disease’s growth in a host’s body, respectively. According to these works, the disease can be
temporally divided into four phases: early phase, progressive phase, severe phase, and dissipative phase. During this time,
the CT scores and number of zones involved progress rapidly, peaking during illness days 6–11, followed by persistence of the
high levels. The major pattern of abnormality found after symptom onset was ground-glass opacity, in accordance with the other
contemporary works. These paved the way to a faster screening procedure than the PCR. Despite the radiological findings, a
problem remains in using radiography as the primary screening tool for COVID-19. The problem is the lack of skilled
radiologists across the globe. Ever since the advent of digital radiography, CT scans and CXRs have been used globally, but the
final interpretations are required to be done by the experts, which could be time-consuming. Besides, authors in [5] have
demonstrated through an experiment that sensitivity and specificity of screening COVID-19 positive CT images fluctuate
significantly when done by radiologists. Therefore, for mass screening, automated or specifically AI-driven tools need to be
deployed across the globe, particularly in resource-constrained regions.
For all healthcare and/or (bio)medical problems, for more than a decade, deep learning has been at the forefront of
automation, especially in medical imaging. This motivates its use in COVID-19 screening. Recently in [14], the author stated
that to detect COVID-19, AI-driven tools are expected to have active-learning based cross-population train/test models that
employ multitudinal and multimodal data. In this work [14], the use of deep learning and image data, such as CT scans and CXRs,
is addressed. Even though multimodal data can improve confidence in decision-making, for the COVID-19 case, such data are not
available as of today. Due to lack of data, COVID-19 reveals the limits of AI-driven tools. Since the COVID-19 pandemic began,
several systems have been released to automate the screening procedure. Alibaba released an AI-based system to screen COVID-19
infection from CT scans, with an accuracy of 96% [15]. Researchers in [16] proposed a Convolutional Neural Network (CNN) based
technique to differentiate COVID-19 from Pneumonia and normal cases of CT scans, with a classification sensitivity of 0.90 for
COVID-19. In [17], researchers have used a 3-dimensional deep learning model to segment infected regions from CT scans,
followed by an attention driven network to classify COVID-19 from Influenza-A viral pneumonia and normal cases, and an
Fig. 2 (Above) The original architecture of the Inception Net V3 model, which was implemented for classifying images of the
ImageNet database [27]. (Below) The Truncated Inception Net model, which is proposed in our work for screening COVID-19
positive CXRs. The model retains 3 inception modules and 1 grid size reduction module from the original architecture (given
above)
accuracy of 86.7% was reported. Further, researchers in [18] have also proposed image segmentation schemes to detect lesions
in CT scans. The prospective system is based on the popular UNet++ architecture, and it produces bounding boxes around lesion
regions. The system achieved a result of 100% per patient and 94.34% per image sensitivity. Authors in [19, 20] have proposed
deep learning models to classify COVID-19 positive CXRs from normal and Pneumonia cases, respectively. In [19], authors have
investigated various standard CNN models, such as ResNet and AlexNet, to extract deep features, which was followed by a Support
Vector Machine (SVM) to classify COVID-19 positive cases. A maximum accuracy of 95.38% was reported using ResNet50 as the
feature extractor. The latter work proposed a tailored CNN model using residual connections to achieve promising results of
80% positive predictive value (PPV) [20]. Additionally, authors in [21] have demonstrated the use of ResNet50, Inception Net V3,
and Inception-Resnet V2 for identifying COVID-19 positive CXRs from healthy ones. Similar experiments have also been
demonstrated in [22], which focuses on the importance of transfer learning for medical image classifications, in the context of
COVID-19. On the whole, researchers found that the use of chest radiographs is better in terms of lung abnormalities screening
[11, 14, 23–26]. With these, COVID-19 can be analyzed better using radiological image data [7, 8].
In this work, considering the fact that X-ray imaging systems are more prevalent and cheaper than CT scan systems, we use
deep learning to screen COVID-19 using CXRs. We propose a CNN-based model, which we call Truncated Inception Net, solely based
on the Inception Net V3 architecture [27] (Fig. 2). Using the limited number of COVID-19 positive CXRs, made available by
Cohen [28], we analyse our model’s performance as a binary classifier, where additional datasets are also taken into account.
As collecting COVID-19 data alone is not trivial, our experimental datasets are composed of COVID-19, Pneumonia, Tuberculosis,
and healthy cases, which are sufficient to validate COVID-19 positive cases. For this, several publicly available datasets,
such as the Pneumonia dataset [29] and the Tuberculosis datasets (Shenzhen, China, and Montgomery County, USA) [30], are used to
create six different experimental tests. Such varied dataset combinations for analyzing deep learning models on COVID-19 CXRs
have not been demonstrated in the literature so far. On the whole, through this work, we demonstrate that the Truncated
Inception Net deep learning model is a viable option for COVID-19 screening and it outperforms the state-of-the-art results for
COVID-19 positive cases, on the obtained and manually combined datasets.

Proposed method

Given that COVID-19 shows patches of GGO and consolidation in CXRs [10], to detect COVID-19 positive cases, a multi-resolution
analysis of the CXR images is deemed useful. This functionality of analyzing spatial data at multiple resolutions is possessed
by the Inception module, which is the fundamental block of the popular Inception Net V3 [27]. Additionally, considering the
fact that the number of data samples of COVID-19 positive CXRs is very scarce at present, a modified version of the Inception
Net V3 model [27] is proposed, which we call Truncated Inception Net. The Truncated Inception Net is primarily designed to avoid
Fig. 3 Block diagram: The block diagram presents the internal pipeline of an Inception module, which forms the building block
of the InceptionNet. Multiple sized kernels (e.g. 3×3 and 5×5) are used to convolve with the input image, to extract features
of varied spatial resolution. Finally, the activation maps obtained from the parallel computations are stacked depth-wise to
form the output
[Diagram labels: Input image; 1x1 convolution; 3x3 convolution; 5x5 convolution; 3x3 max pooling; depth-wise feature maps
(extracted)]

Truncated architecture

Since the original Inception Net V3 model was built for the ImageNet database, the architectural complexity of the model is
well justified. On the contrary, the COVID-19 dataset used in our work is very small compared to the ImageNet database.
Therefore, a truncation of the model is necessary to reduce the model complexity, and thereby the number of trainable
parameters, to prevent the model from overfitting. The model was truncated at a point where it retained 3 Inception modules and
1 grid size reduction block from the beginning, followed by the cascading of a Max Pooling and a Global Average Pooling layer
to reduce the output dimension. The point of truncation was chosen experimentally as the one that yielded the best
classification results. Finally, a fully connected layer was cascaded to perform the classification task. The truncation of the
model not only reduced training time and trainable parameters but also reduced the processing time while evaluating a CXR to
detect COVID-19 positive cases. As a consequence, it facilitates mass screening at an efficient speed and accuracy. For more
detailed information about computational efficiency, we refer to “Experimental setup” section. The architecture of the complete
Truncated Inception Net can be visualized in Fig. 2.
Collecting a COVID-19 dataset is not trivial. We, however, collect a number of CXR benchmark collections (C1 to C3) from the
literature (See Table 1). They help to showcase/validate the usability and robustness of our model.
The primary motivation behind constructing the various data combinations (D1 to D6) is to show the robustness of the Truncated
Inception Net to detect COVID-19 positive cases. Further, COVID-19 is believed to have a
Fig. 5 Few samples: a COVID-19, b Pneumonia, c Tuberculosis, and d Healthy CXRs. GGO and consolidations are observed in
COVID-19 CXRs
Table 1 Data collection (publicly available)
Collection        # of positive cases   # of negative cases
C1: COVID-19      162                   –
C2: Pneumonia     4280                  1583
C3: TB (China)    342                   340
    TB (USA)      58                    80

close relationship with traditional Pneumonia. Therefore, a separate dataset (D5) was constructed to show whether our proposed
model is able to differentiate COVID-19 positive cases from those traditional Pneumonia positive cases. Besides, CXRs of
Tuberculosis manifestation were also added in D6 to prove that our model is robust enough to identify COVID-19 from other
diseases like TB, Pneumonia, and healthy CXRs. The robustness also lies in the way we collect data, where regional variation
can be considered as a crucial element. In our datasets, the healthy CXRs in D1, D2, and D3 are collected from different
regions of the world. Considering multiple combinations of data from different places can help develop cross-population
train/test models.¹
As an input to our model, CXR images were scaled down to the size of 224×224×3 to match the input dimensions of the Truncated
Inception Net. Such a resizing can also reduce computational complexity. Since the pixels of the CXRs have bounded discrete
values, the images were normalized using the min-max scaling scheme. The choice is further backed by the fact that
standardization (zero-mean unit variance) assumes the data to always have a Gaussian distribution, which might not always be
the case. Additionally, pixel intensities of COVID-19 features like GGO patches and consolidation fall in the same range of
bones in

¹ Even though our tests proved that the proposed model can be considered as a cross-population train/test model, it is beyond
the scope of the paper.
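Dataset  COVID-19 +  COVID-19 −  Pneumonia +  Pneumonia −  TB (China) +  TB (China) −  TB (USA) +  TB (USA) −
D1       162         –           –            –            –             340           –           –
D2       162         –           –            –            –             –             –           80
D3       162         –           –            1583         –             –             –           –
D4       162         –           –            1583         –             340           –           80
D5       162         –           4280         1583         –             –             –           –
D6       162         –           4280         1583         342           340           58          80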
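As a rough illustration of how imbalanced the data combinations D1 to D6 are, the sketch below pools every non-COVID CXR as the negative class and computes the accuracy that a trivial classifier would reach by labelling everything negative. The counts are read off the rows above; the pooling of Pneumonia and TB positives into the negative class follows the binary COVID+/COVID− setting described in the paper, and the names `combinations` and `all_negative_accuracy` are hypothetical.

```python
# (COVID-19 positive, pooled negative) counts per data combination.
# Assumption: every non-COVID CXR (healthy, Pneumonia, TB) is a negative sample.
combinations = {
    "D1": (162, 340),
    "D2": (162, 80),
    "D3": (162, 1583),
    "D4": (162, 1583 + 340 + 80),
    "D5": (162, 4280 + 1583),
    "D6": (162, 4280 + 1583 + 342 + 340 + 58 + 80),
}


def all_negative_accuracy(positives: int, negatives: int) -> float:
    """Accuracy of a classifier that predicts 'negative' for every sample."""
    return negatives / (positives + negatives)


baseline = {name: all_negative_accuracy(p, n) for name, (p, n) in combinations.items()}

for name, value in baseline.items():
    print(f"{name}: all-negative accuracy = {100 * value:.2f}%")
```

On D6 the trivial classifier already reaches roughly 97.6% accuracy while detecting no COVID-19 case at all, which is why accuracy alone is a weak indicator on these combinations.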
CXRs, as demonstrated by quick preliminary experiments. So histogram matching was also excluded as a normalization scheme,
since it decreases the signal to noise ratio in this scenario.

Validation protocol and evaluation metrics

To validate our proposed model, a 10 fold cross-validation scheme was adopted for training and testing purposes on all six
datasets: D1–D6. The process of 10 fold cross-validation works in the following way: say there are 100 data samples in the
total dataset. Then samples 1–10 are made a subset and labelled as fold-1, samples 11–20 are labelled as fold-2 and so on. This
creates 10 disjoint subsets of the original dataset. Following this, the model to be tested is first trained on subsets 1–9 and
tested on subset 10. Similarly, in the second trial the model is tested on subset 9 after being trained on the remaining
subsets. This scheme ensures that the model’s performance is not biased by the presence of outlier data samples in the training
or testing datasets. Following this strategy, each of the constructed datasets (D1–D6) was subdivided into 10 subsets of almost
equal number of data samples. Then the model was trained on 9 subsets and tested on the remaining 1 subset. This process was
repeated using each of the subsets as a test set exactly once. After the ten separate trials of training and testing, the
result was averaged over the ten trials to assess the mean (and standard deviation) performance of the model on that dataset.
This procedure can be well understood by observing the result pattern in Table 4, which tabulates the tenfold cross-validation
performance of the model on dataset D6. For each of the 10 folds, six different evaluation metrics were employed: (a) Accuracy
(ACC); (b) Area under the ROC curve (AUC); (c) Sensitivity (SEN); (d) Specificity (SPEC); (e) Precision (PREC); and (f) F1
score. These can be computed as follows:

\[
\mathrm{ACC} = \frac{t_p + t_n}{t_p + t_n + f_p + f_n}, \quad
\mathrm{SEN} = \frac{t_p}{t_p + f_n}, \quad
\mathrm{SPEC} = \frac{t_n}{t_n + f_p}, \quad
\mathrm{PREC} = \frac{t_p}{t_p + f_p}, \quad \text{and} \quad
\mathrm{F1\ score} = 2\,\frac{\mathrm{PREC} \times \mathrm{SEN}}{\mathrm{PREC} + \mathrm{SEN}},
\]

where $t_p$, $f_p$, $t_n$, and $f_n$ are the total number of true positives, false positives, true negatives, and false
negatives. The mean scores from all 10 folds were taken for each of the above metrics, to get the final results on a particular
dataset.
In traditional deep learning tasks, a primary metric like accuracy is sufficient to judge the performance of a deep learning
model as a binary classifier. On the contrary, such an assumption does not work well when considering imbalanced datasets. In
such cases (as in medical datasets), the positive class to be predicted often has far fewer data samples than the negative
class. Therefore, accuracy would demonstrate a fairly high value even if the model labels all the test data to be negative.
Therefore, special attention is given to metrics like Sensitivity/Recall, Precision, and F1 score here.
In the context of COVID-19, the Sensitivity metric plays a very crucial role when deploying a model for screening patients in
the early stages of a pandemic. Sensitivity measures the likelihood that the model would not miss classifying COVID-19 positive
samples/patients. This prevents the further spreading of the infection. Secondly, the precision measures the likelihood that a
model would not make a mistake to classify normal patients as COVID-19 positive. This metric becomes very important in the
later stages of a pandemic, when medical resources are limited, and they are available only to the patients that are in need.
Besides, F1 score is used to extract the combined performance score of a model, which is the harmonic mean of the precision and
sensitivity of a model.

Results and analysis

Before providing quantitative results, we first present activation maps generated by our proposed model for COVID-19 positive,
Pneumonia positive, and TB positive CXRs (Fig. 6). It can be observed that in the preliminary layers (like Conv2D), the lung
region is clearly visible in the activation map for normal CXR, while the clarity gradually decreases for pneumonia and further
for COVID-19 CXR. This corresponds to the growth of GGO patches
Fig. 6 Activation maps generated by the second convolutional layer (Conv2D), the second inception module (Mixed1), and the
grid-size reduction module (Mixed3) in our model. The input samples are taken from a COVID-19 positive, b Pneumonia positive,
and c Tuberculosis positive CXRs
Table 3 Results: average ACC in %, AUC, SEN, SPEC, PREC, and F1 score using 10 fold cross-validation with 𝜎 standard deviation
Dataset  ACC             AUC            SEN            SPEC           PREC           F1 Score
D1       99.50 ± 0.245   0.99 ± 0.053   0.96 ± 0.015   1.0 ± 0.0      1.0 ± 0.0      0.97 ± 0.007
D2       94.04 ± 3.250   1.0 ± 0.0      0.88 ± 0.092   1.0 ± 0.0      1.0 ± 0.0      0.93 ± 0.045
D3       100 ± 0.0       1.0 ± 0.0      1.0 ± 0.0      1.0 ± 0.0      1.0 ± 0.0      1.0 ± 0.0
D4       99.87 ± 0.019   0.99 ± 0.100   0.96 ± 0.020   1.0 ± 0.0      1.0 ± 0.0      0.97 ± 0.015
D5       99.96 ± 0.002   1.0 ± 0.0      0.98 ± 0.015   0.99 ± 0.100   0.98 ± 0.002   0.98 ± 0.013
D6       99.92 ± 0.100   0.99 ± 0.006   0.93 ± 0.096   1.0 ± 0.0      1.0 ± 0.0      0.96 ± 0.055
𝜇        98.77           0.99           0.95           0.99           0.99           0.97
𝜎        ± 0.702         ± 0.026        ± 0.039        ± 0.016        ± 0.001        ± 0.021
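Apart from AUC, the metrics reported in Table 3 follow directly from confusion-matrix counts. The sketch below shows one way to compute them; the function name `binary_metrics`, the zero-denominator convention, and the example counts are assumptions made for illustration and are not taken from the paper's experiments. AUC is omitted because it needs the ranked prediction scores rather than the four counts.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute ACC, SEN, SPEC, PREC and F1 from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Convention chosen here: return 0.0 when the denominator is empty
        return num / den if den else 0.0

    sen = ratio(tp, tp + fn)
    prec = ratio(tp, tp + fp)
    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "SEN": sen,
        "SPEC": ratio(tn, tn + fp),
        "PREC": prec,
        "F1": ratio(2 * prec * sen, prec + sen),
    }


# Hypothetical test fold: 7 positives (1 missed), 677 negatives (none misflagged)
scores = binary_metrics(tp=6, fp=0, tn=677, fn=1)
```

In this made-up fold a single missed positive leaves accuracy above 99.8% but drops sensitivity to about 0.86, which illustrates why sensitivity and F1 are more informative than accuracy when positives are rare.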
in COVID-19 positive CXRs. However, in the later layers of the model, the activation maps become more abstract, for which the
terminal dense layer is used in the model to map these abstract feature representations to their corresponding labels (COVID+
or COVID−).
Following the validation protocol and evaluation metrics mentioned in the previous “Validation protocol and evaluation metrics”
section, we present the mean scores that were achieved using the tenfold cross-validation train-test scheme, on each of the six
different datasets: D1–D6. The experimental results are documented in Table 3. Also, standard deviation (𝜎) is reported in all
cases, whose low values indicate the statistical robustness of our model. Our proposed Truncated Inception Net model achieves a
classification ACC, AUC, SEN, SPEC, PREC, and F1 score of 99.96%, 1.0, 0.98, 0.99, 0.98, and 0.98, respectively, on the
dataset: D5
Table 4 Results: ACC in %, AUC, SEN, SPEC, PREC, and F1 score for each fold of 10 fold cross-validation on the D6 dataset
Dataset-fold  ACC       AUC       SEN       SPEC     PREC     F1 Score
D6-1          100       1.0       1.0       1.0      1.0      1.0
D6-2          99.85     0.99      0.86      1.0      1.0      0.92
D6-3          99.85     0.99      0.86      1.0      1.0      0.92
D6-4          100       1.0       1.0       1.0      1.0      1.0
D6-5          100       1.0       1.0       1.0      1.0      1.0
D6-6          100       1.0       1.0       1.0      1.0      1.0
D6-7          100       1.0       1.0       1.0      1.0      1.0
D6-8          99.85     0.99      0.86      1.0      1.0      0.92
D6-9          99.70     0.99      0.71      1.0      1.0      0.83
D6-10         100       1.0       1.0       1.0      1.0      1.0
𝜇             99.92     0.99      0.93      1.0      1.0      0.96
𝜎             ± 0.100   ± 0.006   ± 0.096   ± 0.0    ± 0.0    ± 0.055
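The tenfold protocol behind Table 4 can be sketched in two steps: partition the sample indices into ten disjoint, nearly equal folds, then aggregate the per-fold scores into a mean and a standard deviation. The helper names below are hypothetical, and the contiguous (unshuffled) partition mirrors the paper's 100-sample example rather than any confirmed implementation detail. The per-fold SEN and ACC values are copied from Table 4; the population form of the standard deviation is used because it appears to match the reported 𝜎 for SEN.

```python
from statistics import mean, pstdev
from typing import Iterator, List, Tuple


def contiguous_folds(n_samples: int, k: int = 10) -> List[List[int]]:
    """Split indices 0..n_samples-1 into k disjoint, nearly equal contiguous folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), using each fold as the test set once."""
    folds = contiguous_folds(n_samples, k)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Per-fold scores on D6, copied from Table 4
sen = [1.0, 0.86, 0.86, 1.0, 1.0, 1.0, 1.0, 0.86, 0.71, 1.0]
acc = [100, 99.85, 99.85, 100, 100, 100, 100, 99.85, 99.70, 100]

sen_mu, sen_sigma = mean(sen), pstdev(sen)
acc_mu, acc_sigma = mean(acc), pstdev(acc)
print(f"SEN: {sen_mu:.2f} ± {sen_sigma:.3f}")
print(f"ACC: {acc_mu:.3f} ± {acc_sigma:.3f}")
```

In practice one would usually shuffle or stratify the samples before forming folds so that every fold contains positives; that step is not described in the text and is therefore left out of this sketch.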
Table 5 Comparison: computational time (in ms) between Inception Net V3 (full architecture) and Truncated Inception Net, on 10
samples (randomly selected)
Model                    CXR1   CXR2   CXR3   CXR4   CXR5   CXR6   CXR7   CXR8   CXR9   CXR10  Mean (𝜇)
Inception Net V3         22.10  28.80  21.30  20.20  20.60  19.90  22.50  20.90  21.40  21.40  21.90±2.40
Truncated Inception Net  8.63   11.00  9.53   8.02   8.93   8.63   8.70   9.64   9.30   10.30  9.27±0.84
Ratio                    2.56   2.61   2.23   2.52   2.30   2.30   2.58   2.16   2.30   2.07   2.36±0.18
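The ratio row of Table 5 is the per-sample quotient of the two timing rows. A short check under that reading is given below, with the timings copied from the table and variable names chosen here for illustration. Recomputed values can differ from the printed ones in the last digit, presumably because of how the table was rounded.

```python
from statistics import mean

# Per-sample inference times in ms, copied from Table 5
full = [22.10, 28.80, 21.30, 20.20, 20.60, 19.90, 22.50, 20.90, 21.40, 21.40]
truncated = [8.63, 11.00, 9.53, 8.02, 8.93, 8.63, 8.70, 9.64, 9.30, 10.30]

# Speed-up of the truncated model on each sample
ratios = [f / t for f, t in zip(full, truncated)]

print(f"mean full      = {mean(full):.2f} ms")
print(f"mean truncated = {mean(truncated):.2f} ms")
print(f"mean speed-up  = {mean(ratios):.2f}x")
```

On these ten samples the truncated model is a little more than twice as fast per CXR, consistent with the ratio row.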
comparison, ResNet50 and SVM [19], COVID-Net [20], ResNet50 [21], and Inception Net V3 [21] are considered even though they are
not peer-reviewed research articles. We have compared against these works using exactly the same evaluation metrics (ACC in %,
AUC, SEN, SPEC, PREC, and F1 score) and nature of dataset. Like other works, we take COVID-19 positive and healthy CXRs from the
Pneumonia dataset (D3 in our case), and use this result as a comparison to other works. Besides, since all models were based on
deep learning models, we consider an essential element, i.e., number of parameters, in our comparison. Table 6 provides a
complete comparative study. Not all the authors reported AUC, SPEC, and F1 score. Note that our model was used as a binary
classifier to screen a CXR as COVID+ or COVID−, while not all the stated works performed the same. The mentioned results belong
to the COVID+ class, wherever multi-class classification was done instead of binary classification. On the whole, considering
the number of parameters, the proposed Truncated Inception Net outperforms all. Note that, since our model is a derivative of
the Inception Net V3 model, it is worth comparing the two. We observe that, in both computational time (Table 5) and performance
scores (Table 6), Truncated Inception Net performs better than Inception Net V3 [21]. For a better understanding, three
different performance scores, the poorest, the best, and the average, are considered from Table 3. This suggests that the
Truncated Inception Net is not only more computationally effective in terms of training and usability, but also more flexible
for the purpose of active learning [14].
Even though the performed experiments validate the proposed deep learning model for screening COVID-19 positive CXRs, it is
important to understand that the system relies completely on visual cues in the input data. Therefore, in the early stages of
COVID-19, when the radiologically observable cues have not yet developed, the system might fail to perform as stated. A detailed
study on this is a scope for future work, where the input data shall be additionally labeled with the stage of COVID-19 it
depicts as well. However, data acquired for this work did not contain any explicit information regarding the stages of COVID-19
in the individual CXRs. Further, the system is limited by its capacity to localize the disease in the CXR. As seen in the
activation maps of deeper layers (Fig. 6), the model develops an intrinsic representation of the CXR features rather than an
accurate spatial heat-map, which is then mapped to the output using a dense layer classifier. The mentioned goal can be achieved
by using more data or a deep learning model(s) that is/are pre-trained on a large number of CXRs of different diseases (like
CheXNet [31]), which shall be our future goal.

Conclusion and future works

In this work, we have proposed the Truncated Inception Net deep learning model to detect COVID-19 positive patients using chest
X-rays. For validation, experimental tests were done on six different experimental datasets by combining COVID-19 positive,
Pneumonia positive, Tuberculosis positive, and healthy CXRs. The proposed model outperforms the state-of-the-art results in
detecting COVID-19 cases from non-COVID ones. Besides, considering the number of parameters used in our proposed model, it is
computationally efficient as compared to the original Inception Net V3 model and other works proposed in the literature. It is
important to note that the study has no clinical implications. Instead, we solely aimed to check whether the proposed Truncated
Inception Net could be used in detecting COVID-19 positive cases using CXRs.
Observing the performance scores, the Truncated Inception Net can serve as a milestone for screening COVID-19 under an
active-learning framework on multitudinal/multimodal data [14]. It also motivates work on cross-population train/test models.
Integrating this model with the CheXNet model [31] will be our immediate plan, since CheXNet is primarily employed to analyze
CXRs.