Das, Santosh, Pal - 2020


Physical and Engineering Sciences in Medicine

https://doi.org/10.1007/s13246-020-00888-x

SCIENTIFIC PAPER

Truncated inception net: COVID‑19 outbreak screening using chest X‑rays
Dipayan Das1 · K. C. Santosh2 · Umapada Pal3

Received: 1 May 2020 / Accepted: 11 June 2020


© Australasian College of Physical Scientists and Engineers in Medicine 2020

Abstract
Since December 2019, the Coronavirus Disease (COVID-19) pandemic has caused worldwide turmoil in a short period of time, and the infection, caused by SARS-CoV-2, is spreading rapidly. AI-driven tools are used to identify Coronavirus outbreaks as well as to forecast their nature of spread, where imaging techniques such as CT scans and chest X-rays (CXRs) are widely used. In this paper, motivated by the fact that X-ray imaging systems are more prevalent and cheaper than CT scan systems, a deep learning-based Convolutional Neural Network (CNN) model, which we call Truncated Inception Net, is proposed to screen COVID-19 positive CXRs from other non-COVID and/or healthy cases. To validate our proposal, six different types of datasets were employed by taking the following CXRs into account: COVID-19 positive, Pneumonia positive, Tuberculosis positive, and healthy cases. The proposed model achieved an accuracy of 99.96% (AUC of 1.0) in classifying COVID-19 positive cases from combined Pneumonia and healthy cases. Similarly, it achieved an accuracy of 99.92% (AUC of 0.99) in classifying COVID-19 positive cases from combined Pneumonia, Tuberculosis, and healthy CXRs. To the best of our knowledge, as of now, the achieved results outperform the existing AI-driven tools for screening COVID-19 using the acquired CXRs, and demonstrate the viability of using the proposed Truncated Inception Net as a screening tool.

Keywords COVID-19 · Deep learning · CNN · Inception net · Pneumonia · Tuberculosis · Chest X-rays

Note that the study has no clinical implications. Instead, we solely aimed to check whether the proposed Truncated Inception Net could possibly be used in detecting COVID-19 positive cases using CXRs.

* K. C. Santosh
[email protected]
Dipayan Das
[email protected]
Umapada Pal
[email protected]

1 Department of Electronics and Communication Engineering, National Institute of Technology, Durgapur, India
2 Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA
3 Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India

Introduction

Coronavirus Disease 2019 (COVID-19) is an infectious disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) [1]. The disease was first identified in 2019 in Wuhan, China, and has since spread globally, resulting in the 2019–2020 Coronavirus pandemic [2]. With more than 5.92 million confirmed cases of infection and 364,000 deaths by the fifth month after its discovery (as of May 30, 2020), SARS-CoV-2 continues to infect people worldwide [3]. The virus is primarily transmitted among individuals through respiratory droplets. Studies have also shown that the virus can persist on surfaces which an infected individual might have touched. As a consequence, by the end of March 2020, the spread of this virus had been described as exponential [3].

The gold standard for the diagnosis and detection of COVID-19 is the polymerase chain reaction (PCR). It can detect the SARS-CoV-2 RNA from respiratory specimens obtained through nasopharyngeal or oropharyngeal swabs. Despite the high sensitivity and accuracy of the PCR technique, the method is highly time-consuming and resource-intensive. Therefore, considering the unprecedented spread rate of the virus across the globe and the rapid temporal progression of the disease throughout a subject's body [4], a faster screening tool is necessary for COVID-19 outbreaks. As an alternative to the traditional PCR technique, researchers have proposed the use of radiography techniques such as Computed Tomography (CT) scans and chest X-rays (CXRs) for COVID-19 screening. Early studies of COVID-19 positive patients have shown that their CT scans and CXRs show identifiable abnormalities [5, 6], according to which COVID-19 pneumonia was more likely to have peripheral distribution, ground-glass opacity, fine reticular opacity, and vascular thickening, but less likely to have a central+peripheral distribution, pleural effusion and lymphadenopathy. The idea is further strengthened by observing the high correlation between the PCR and radiological results, as demonstrated in [7]. In [8, 9], authors establish that the sensitivity of CT scan imaging outperforms the conventional PCR technique. The possible reasons may be immature development of nucleic acid detection technology, low patient viral load or improper clinical sampling, as stated in [8]. According to [6, 10–12], COVID-19 infection can primarily be characterized through radiographs by patches of Ground-Glass Opacity (GGO) and consolidation (see Fig. 1). Additionally, authors in [4, 13] have provided a deep insight into the statistical growth of radiological cues in COVID-19 positive patients and the temporal stages of the disease's growth in a host's body, respectively. According to these works, the disease can be temporally divided into four phases: early phase, progressive phase, severe phase, and dissipative phase. During this time, the CT scores and number of zones involved increase rapidly, peak during illness days 6–11, and then persist at high levels. The major pattern of abnormality found after symptom onset was ground-glass opacity, in accordance with the other contemporary works. These paved the way to a faster screening procedure than the PCR. Despite the radiological findings, there still exists a problem in using radiography as the primary screening tool for COVID-19. The problem is the lack of skilled radiologists across the globe. Ever since the advent of digital radiography, CT scans and CXRs have been used globally, but the final interpretations are required to be done by the experts, which could be time-consuming. Besides, authors in [5] have demonstrated through an experiment that sensitivity and specificity of screening COVID-19 positive CT images fluctuate significantly when done by radiologists. Therefore, for mass screening, automated or specifically AI-driven tools need to be deployed across the globe, particularly in resource-constrained regions.

Fig. 1 COVID-19 pneumonia is characterized primarily by patches of Ground-Glass Opacity (GGO) and consolidations. In these CXRs, the GGO areas, in early stages of COVID-19, are identified/annotated with white arrows. These annotations come from the original dataset, to which the clinical implications are solely attributed

For all healthcare and/or (bio)medical problems, for more than a decade, deep learning has been a pinnacle in automation, especially in medical imaging. This motivates its use in COVID-19 screening. Recently in [14], the author stated that to detect COVID-19, AI-driven tools are expected to have active-learning based cross-population train/test models that employ multitudinal and multimodal data. In this work [14], the use of deep learning and image data, such as CT scans and CXRs, is addressed. Even though multimodal data can improve confidence in decision-making, for the COVID-19 case, such data are not available as of today. Due to lack of data, COVID-19 reveals the limits of AI-driven tools. As soon as the COVID-19 pandemic came into play, several systems were released to automate the screening procedure. Alibaba released an AI-based system to screen COVID-19 infection from CT scans, with an accuracy of 96% [15]. Researchers in [16] proposed a Convolutional Neural Network (CNN) based technique to differentiate COVID-19 from Pneumonia and normal cases of CT scans, with a classification sensitivity of 0.90 for COVID-19. In [17], researchers have used a 3-dimensional deep learning model to segment infected regions from CT scans, followed by an attention driven network to classify COVID-19 from Influenza-A viral pneumonia and normal cases, and an


accuracy of 86.7% was reported. Further, researchers in [18] have also proposed image segmentation schemes to detect lesions in CT scans. The prospective system is based on the popular UNet++ architecture, and it produces bounding boxes around lesion regions. The system achieved a sensitivity of 100% per patient and 94.34% per image. Authors in [19, 20] have proposed deep learning models to classify COVID-19 positive CXRs from normal and Pneumonia cases, respectively. In [19], authors have investigated various standard CNN models, such as ResNet and AlexNet, to extract deep features, which was followed by a Support Vector Machine (SVM) to classify COVID-19 positive cases. A maximum accuracy of 95.38% was reported using ResNet50 as the feature extractor. The latter work proposed a tailored CNN model using residual connections to achieve promising results of 80% positive predictive value (PPV) [20]. Additionally, authors in [21] have demonstrated the use of ResNet50, Inception Net V3, and Inception-Resnet V2 for identifying COVID-19 positive CXRs from healthy ones. Similar experiments have also been demonstrated in [22], which focuses on the importance of transfer learning for medical image classifications, in the context of COVID-19. On the whole, researchers found that the use of chest radiographs is better in terms of lung abnormalities screening [11, 14, 23–26]. With these, COVID-19 can be analyzed better using radiological image data [7, 8].

Fig. 2 (Above) The original architecture of the Inception Net V3 model, which was implemented for classifying images of the ImageNet database [27]. (Below) The Truncated Inception Net model, which is proposed in our work for screening COVID-19 positive CXRs. The model retains 3 inception modules and 1 grid size reduction module from the original architecture (given above)

In this work, considering the fact that X-ray imaging systems are more prevalent and cheaper than CT scan systems, we use deep learning to screen COVID-19 using CXRs. We propose a CNN-based model, which we call Truncated Inception Net, solely based on the Inception Net V3 architecture [27] (Fig. 2). Using the limited number of COVID-19 positive CXRs, made available by Cohen [28], we analyse our model's performance as a binary classifier, where additional datasets are also taken into account. As the COVID-19 dataset collection, alone, is not trivial, our experimental datasets are composed of COVID-19, Pneumonia, Tuberculosis, and healthy cases, which are sufficient to validate COVID-19 positive cases. For this, several publicly available datasets, such as the Pneumonia dataset [29] and the Tuberculosis datasets (Shenzhen, China, and Montgomery County, USA) [30], are used to create six different experimental tests. Such varied dataset combinations for analyzing deep learning models on COVID-19 CXRs have not been demonstrated in the literature so far. On the whole, through this work, we demonstrate that the Truncated Inception Net deep learning model is a viable option for COVID-19 screening and that it outperforms the state-of-the-art results for COVID-19 positive cases, on the obtained and manually combined datasets.

Proposed method

Given that COVID-19 shows patches of GGO and consolidation in CXRs [10], to detect COVID-19 positive cases, a multi-resolution analysis of the CXR images is deemed useful. This functionality of analyzing spatial data at multiple resolutions is possessed by the Inception module, which is the fundamental block of the popular Inception Net V3 [27]. Additionally, considering the fact that the number of data samples of COVID-19 positive CXRs is very scarce at present, a modified version of the Inception Net V3 model [27] is proposed, which we call Truncated Inception Net. The Truncated Inception Net is primarily designed to avoid


possible overfitting due to the lack of COVID-19 positive samples. Further, the Truncated Inception Net is computationally efficient. Originally, the Inception Net model was used on the ImageNet database, consisting of more than 1.3 million images from 1000 different classes. In what follows, the different aspects of the Truncated Inception Net model architecture and implementation are discussed.

Inception module

The multi-resolution analysis capability of the Inception module comes from its inherent architecture. In traditional CNN models, kernels of specific receptive field sizes are used in specific layers to capture features through the use of convolution. On the contrary, in an inception module, kernels of various receptive field sizes (1×1, 3×3, and 5×5) are used in parallel to extract features of varying sizes. The extracted parallel features are then stacked depth-wise to form the output of the inception module. A 3×3 max pooled version of the input is also stacked along with the previous feature maps. The combined output of the inception module provides rich feature maps of varying perspectives as inputs to the next convolutional layer of the CNN. Such a property of the Inception module explains its unique performance in medical imaging, and in our case, on the COVID-19 CXRs. For better understanding, the schematic representation of the Inception module is presented in Fig. 3.

Fig. 3 Block diagram: The block diagram presents the internal pipeline of an Inception module, which forms the building block of the InceptionNet. Multiple sized kernels (e.g. 3×3 and 5×5) are used to convolve with the input image, to extract features of varied spatial resolution. Finally, the activation maps obtained from the parallel computations are stacked depth-wise to form the output (diagram labels: input image; 1×1, 3×3, and 5×5 convolutions; 3×3 max pooling; depth-wise feature maps (extracted))

Truncated architecture

Since the original Inception Net V3 model was built for the ImageNet database, the architectural complexity of the model is well justified. On the contrary, the COVID-19 dataset used in our work is immensely small compared to the ImageNet database. Therefore, a truncation of the model is necessary to reduce the model complexity, and eventually the number of trainable parameters, to prevent the model from overfitting. The model was truncated at a point where it retained 3 Inception modules and 1 grid size reduction block from the beginning, followed by the cascading of a Max Pooling and a Global Average Pooling layer to reduce the output dimension. The point of truncation was chosen experimentally as the one that yielded the best classification results. Finally, a fully connected layer was cascaded to perform the classification task. The truncation of the model not only reduced training time and trainable parameters but also reduced the processing time while evaluating a CXR to detect COVID-19 positive cases. As a consequence, it facilitates mass screening at an efficient speed and accuracy. For more detailed information about computational efficiency, we refer to the "Experimental setup" section. The architecture of the complete Truncated Inception Net can be visualized in Fig. 2.

Adaptive learning rate protocol

For training the Truncated Inception Net, a dynamic protocol was used to control the learning rate at each epoch for the following reasons: (a) a constant learning rate of high value often leads to divergence of the weight vectors' trajectory from the global minimum in the loss function space during the optimization process, and (b) an arbitrarily chosen low value often requires a longer training time. Therefore, a dynamic procedure was adopted, where the learning rate was initialized with a value of 0.001 and then reduced by a factor of 2 every time the validation loss remained the same or did not decrease for more than 3 epochs. In this case, the factor of 3 epochs is known as the patience factor. The process is explained through the diagram in Fig. 4. This procedure yielded the behavior of reducing the velocity of approach when the weight vectors are close to the global minimum, to prevent overshooting. Note that the initial value of 0.001, reduction factor of 2, and patience factor of 3 epochs were chosen after experimenting with multiple values and monitoring the classification performance. For the model and datasets considered in this work, the aforementioned values proved to have the best results. Additional optimization procedures like Grid Search, Particle Swarm Optimization, or Genetic Algorithm can be used for the optimization of the same.
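The adaptive learning rate protocol described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the class name `ReduceOnPlateau`, the validation losses, and the reading of "more than 3 epochs" as "3 consecutive epochs without improvement" are all assumptions made here.

```python
class ReduceOnPlateau:
    """Divide the learning rate by `factor` when validation loss stops improving."""

    def __init__(self, initial_lr=0.001, factor=2.0, patience=3):
        self.lr = initial_lr          # initial value stated in the text
        self.factor = factor          # reduction factor stated in the text
        self.patience = patience      # patience factor stated in the text
        self.best_loss = float("inf")
        self.stale_epochs = 0         # epochs since the last improvement

    def step(self, val_loss):
        """Record this epoch's validation loss and return the updated rate."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            # Assumption: reduce after `patience` consecutive stale epochs.
            if self.stale_epochs >= self.patience:
                self.lr /= self.factor
                self.stale_epochs = 0
        return self.lr


# Hypothetical validation losses, one per epoch (not taken from the paper).
schedule = ReduceOnPlateau()
history = [schedule.step(loss) for loss in [0.90, 0.70, 0.70, 0.71, 0.70, 0.60]]
print(history)
```

In a Keras training loop, the built-in `ReduceLROnPlateau` callback with `monitor="val_loss"`, `factor=0.5`, and `patience=3` gives comparable behavior; the paper does not state which mechanism was actually used.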


Fig. 4 The learning rate is reduced every time the validation loss does not improve for more than the specified patience factor, which is 3 epochs (empirically designed)

Transfer learning

Deep learning models are inherently data intensive. However, since the size of the COVID-19 dataset is very small compared to standard datasets used in deep learning, the concept of transfer learning can be applied to augment the decision-making process. Transfer learning uses the concept of transferring knowledge from one domain to another by using trained weights from the previous domain. Traditionally in a CNN, the weight matrices of several layers from the beginning are frozen while training on the secondary domain, and only the remaining layers are fine-tuned. This process works well when both the domains share an overlapping region in the low-level features. In our case, since the ImageNet and the COVID-19 datasets belong to non-overlapping domains, the trained weights from the ImageNet dataset were used to initialize the weights of our model, but none of them were frozen. This kept all the layers initialized with relatively more meaningful weights than random initialization, and subject to learning during the training procedure.

Experimental setup

Datasets

Collecting a COVID-19 dataset is not trivial. We, however, collect a number of CXR benchmark collections (C1 to C3) from the literature (see Table 1). They help to showcase/validate the usability and robustness of our model.

C1: COVID-19 collection [28] is an open-source collection that is made available and maintained by Joseph Paul Cohen. At the time of the present study, it is composed of 162 COVID-19 positive CXRs, along with some other CXRs of diseases like MERS, SARS, and viral Pneumonia. For our purpose, only COVID-19 positive posteroanterior CXRs are considered.
C2: Pneumonia collection [29] (Kaggle CXR collection) is composed of 5863 CXRs. Out of these, 1583 CXRs are normal or healthy CXRs and the remaining 4280 CXRs show various manifestations of viral and bacterial Pneumonia.
C3: Two publicly available Tuberculosis (TB) collections [30] are considered: (a) Shenzhen, China and (b) Montgomery County, USA. These CXR benchmark collections were made available by the U.S. National Library of Medicine, National Institutes of Health (NIH). The Shenzhen, China collection is composed of 340 normal cases and 342 positive cases of TB. The Montgomery County, USA collection is composed of 80 normal CXRs and 58 TB positive CXRs.

A few samples from the aforementioned collections are visualized in Fig. 5. Using the aforementioned collections, we constructed six different combinations of data to train and validate our model. As provided in Table 2, these six different combinations of datasets (D1 to D6) are listed below:

D1: In dataset D1, 162 COVID-19 positive CXRs and 340 healthy CXRs from the Shenzhen, China collection are considered.
D2: In dataset D2, 162 COVID-19 positive CXRs and 80 healthy CXRs from the Montgomery County, USA collection are considered.
D3: In dataset D3, 162 COVID-19 positive CXRs and 1583 healthy CXRs from the Pneumonia collection are considered.
D4: In dataset D4, 162 COVID-19 positive CXRs and 2003 healthy CXRs, combined from the Shenzhen, Montgomery, and Pneumonia collections, are considered.
D5: In dataset D5, 162 COVID-19 positive CXRs, 4280 Pneumonia positive CXRs, and 1583 healthy CXRs from the Pneumonia collection are considered.
D6: In dataset D6, 162 COVID-19 positive CXRs and 6683 non-COVID CXRs (comprising 4280 Pneumonia positive, 400 TB positive, and 2003 healthy CXRs) are considered.

The primary motivation behind constructing the various data combinations (D1 to D6) is to show the robustness of the Truncated Inception Net to detect COVID-19 positive cases. Further, COVID-19 is believed to have a


close relationship with traditional Pneumonia. Therefore, a separate dataset (D5) was constructed to show whether our proposed model is able to differentiate COVID-19 positive cases from those traditional Pneumonia positive cases. Besides, CXRs of Tuberculosis manifestation were also added in D6 to prove that our model is robust enough to identify COVID-19 from other diseases like TB, Pneumonia, and healthy CXRs. The robustness also lies in the way we collect data, where regional variation can be considered as a crucial element. In our datasets, the healthy CXRs in D1, D2, and D3 are collected from different regions of the world. Considering multiple combinations of data from different places can help develop cross-population train/test models.1

1 Even though our tests proved that the proposed model can be considered as a cross-population train/test model, it is beyond the scope of the paper.

Fig. 5 Few samples: a COVID-19, b Pneumonia, c Tuberculosis, and d Healthy CXRs. GGO and consolidations are observed in COVID-19 CXRs

Table 1 Data collection (publicly available)

Collection       # of positive cases   # of negative cases
C1: COVID-19     162                   –
C2: Pneumonia    4280                  1583
C3: TB (China)   342                   340
    TB (USA)     58                    80

As an input to our model, CXR images were scaled down to the size of 224×224×3 to match the input dimensions of the Truncated Inception Net. Such a resizing can also reduce computational complexity. Since the pixels of the CXRs have bounded discrete values, the images were normalized using the min-max scaling scheme. The choice is further backed by the fact that standardization (zero-mean unit variance) assumes the data to always have a Gaussian distribution, which might not always be the case. Additionally, pixel intensities of COVID-19 features like GGO patches and consolidation fall in the same range as those of bones in


Table 2 Experimental datasets using Table 1

Dataset  COVID-19      Pneumonia     TB (China)    TB (USA)
         +ve    −ve    +ve    −ve    +ve    −ve    +ve    −ve
D1       162    –      –      –      –      340    –      –
D2       162    –      –      –      –      –      –      80
D3       162    –      –      1583   –      –      –      –
D4       162    –      –      1583   –      340    –      80
D5       162    –      4280   1583   –      –      –      –
D6       162    –      4280   1583   342    340    58     80

Index: +ve = positive cases and −ve = negative/healthy cases
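The composition of D1 to D6 can be checked with a short script that rebuilds each dataset from the collection counts in Table 1. This is only an illustrative sketch: the dictionary layout and the helper `non_covid_count` are hypothetical, while the counts are copied from Tables 1 and 2. It also prints the accuracy that a trivial classifier labelling every CXR as non-COVID would reach, which gives a sense of how imbalanced each dataset is.

```python
# (positive, negative/healthy) CXR counts per collection, copied from Table 1.
collections = {
    "COVID-19": (162, 0),
    "Pneumonia": (4280, 1583),
    "TB (China)": (342, 340),
    "TB (USA)": (58, 80),
}

# Non-COVID parts of each dataset, following Table 2.
# "+" selects a collection's positive CXRs, "-" its healthy CXRs.
datasets = {
    "D1": [("TB (China)", "-")],
    "D2": [("TB (USA)", "-")],
    "D3": [("Pneumonia", "-")],
    "D4": [("Pneumonia", "-"), ("TB (China)", "-"), ("TB (USA)", "-")],
    "D5": [("Pneumonia", "+"), ("Pneumonia", "-")],
    "D6": [("Pneumonia", "+"), ("Pneumonia", "-"), ("TB (China)", "+"),
           ("TB (China)", "-"), ("TB (USA)", "+"), ("TB (USA)", "-")],
}


def non_covid_count(parts):
    """Total number of non-COVID CXRs selected by a list of (collection, sign)."""
    return sum(collections[name][0 if sign == "+" else 1] for name, sign in parts)


covid = collections["COVID-19"][0]
for name, parts in datasets.items():
    negatives = non_covid_count(parts)
    # Accuracy of a trivial classifier that predicts "non-COVID" for every CXR.
    baseline_acc = 100.0 * negatives / (negatives + covid)
    print(f"{name}: {covid} COVID-19 vs {negatives} non-COVID, "
          f"all-negative baseline accuracy {baseline_acc:.2f}%")
```

The totals reproduce the 2003 healthy CXRs of D4 and the 6683 non-COVID CXRs of D6 given in the dataset list.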

CXRs, as demonstrated by quick preliminary experiments. So histogram matching was also excluded as a normalization scheme, since it decreases the signal to noise ratio in this scenario.

Validation protocol and evaluation metrics

To validate our proposed model, a 10 fold cross-validation scheme was adopted for training and testing purposes on all six datasets: D1–D6. The process of 10 fold cross-validation works in the following way: say there are 100 data samples in the total dataset. Then samples 1–10 are made a subset and labelled as fold-1, samples 11–20 are labelled as fold-2, and so on. This creates 10 disjoint subsets of the original dataset. Following this, the model to be tested is first trained on subsets 1–9 and tested on subset 10. Similarly, in the second trial the model is tested on subset 9 after being trained on the remaining subsets. This scheme ensures that the model's performance is not biased by the presence of outlier data samples in the training or testing datasets. Following this strategy, each of the constructed datasets (D1–D6) was subdivided into 10 subsets of almost equal number of data samples. Then the model was trained on 9 subsets and tested on the remaining 1 subset. This process was repeated using each of the subsets as a test set once. After the ten separate trials of training and testing, the result was averaged over the ten trials to assess the mean (and standard deviation) performance of the model on that dataset. This procedure can be understood by observing the result pattern in Table 4, which tabulates the tenfold cross-validation performance of the model on dataset D6. For each of the 10 folds, six different evaluation metrics were employed: (a) Accuracy (ACC); (b) Area under the ROC curve (AUC); (c) Sensitivity (SEN); (d) Specificity (SPEC); (e) Precision (PREC); and (f) F1 score. These can be computed as follows:

ACC = (tp + tn)/(tp + tn + fp + fn), SEN = tp/(tp + fn),
SPEC = tn/(tn + fp), PREC = tp/(tp + fp), and
F1 score = 2((PREC × SEN)/(PREC + SEN)),

where tp, fp, tn, and fn are the total number of true positives, false positives, true negatives, and false negatives. The mean scores from all 10 folds were taken for each of the above metrics, to get the final results on a particular dataset.

In traditional deep learning tasks, a primary metric like accuracy is sufficient to judge the performance of a deep learning model as a binary classifier. On the contrary, such an assumption does not work well when considering imbalanced datasets. In such cases (like in medical datasets), the positive class to be predicted often has far fewer data samples than the negative class. Therefore, accuracy would demonstrate a fairly high value even if the model labels all the test data to be negative. Therefore, special attention is given to metrics like Sensitivity/Recall, Precision, and F1 score here.

In the context of COVID-19, the Sensitivity metric plays a very crucial role when deploying a model for screening patients in the early stages of a pandemic. Sensitivity measures the likelihood that the model would not miss classifying COVID-19 positive samples/patients. A high sensitivity helps prevent further spreading of the infection. Secondly, the precision measures the likelihood that a model would not make a mistake to classify normal patients as COVID-19 positive. This metric becomes very important in the later stages of a pandemic, when medical resources are limited, and they are available only to the patients that are in need. Besides, F1 score is used to extract the combined performance score of a model, which is the harmonic mean of the precision and sensitivity of a model.

Results and analysis

Before providing quantitative results, we first present activation maps generated by our proposed model for COVID-19 positive, Pneumonia positive, and TB positive CXRs (Fig. 6). It can be observed that in the preliminary layers (like Conv2D), the lung region is clearly visible in the activation map for normal CXR, while the clarity gradually decreases for pneumonia and further for COVID-19 CXR. This corresponds to the growth of GGO patches

Fig. 6 Activation maps generated by the second convolutional layer (Conv2D), the second inception module (Mixed1), and the grid-size reduction module (Mixed3) in our model. The input samples are taken from a COVID-19 positive, b Pneumonia positive, and c Tuberculosis positive CXRs

Table 3 Results: average ACC in %, AUC, SEN, SPEC, PREC, and F1 score using 10 fold cross-validation with σ standard deviation

Dataset  ACC             AUC             SEN             SPEC            PREC            F1 Score
D1       99.50 ± 0.245   0.99 ± 0.053    0.96 ± 0.015    1.0 ± 0.0       1.0 ± 0.0       0.97 ± 0.007
D2       94.04 ± 3.250   1.0 ± 0.0       0.88 ± 0.092    1.0 ± 0.0       1.0 ± 0.0       0.93 ± 0.045
D3       100 ± 0.0       1.0 ± 0.0       1.0 ± 0.0       1.0 ± 0.0       1.0 ± 0.0       1.0 ± 0.0
D4       99.87 ± 0.019   0.99 ± 0.100    0.96 ± 0.020    1.0 ± 0.0       1.0 ± 0.0       0.97 ± 0.015
D5       99.96 ± 0.002   1.0 ± 0.0       0.98 ± 0.015    0.99 ± 0.100    0.98 ± 0.002    0.98 ± 0.013
D6       99.92 ± 0.100   0.99 ± 0.006    0.93 ± 0.096    1.0 ± 0.0       1.0 ± 0.0       0.96 ± 0.055
μ        98.77           0.99            0.95            0.99            0.99            0.97
σ        ± 0.702         ± 0.026         ± 0.039         ± 0.016         ± 0.001         ± 0.021
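The columns of Table 3 follow the metric definitions given in the "Validation protocol and evaluation metrics" section, averaged over folds. A minimal sketch of that computation is shown here. The helper `binary_metrics`, the per-fold confusion counts, and the use of the population standard deviation are assumptions for illustration only; none of them are taken from the paper. AUC is omitted because it requires the classifier's scores rather than confusion counts.

```python
from statistics import mean, pstdev


def binary_metrics(tp, fp, tn, fn):
    """Return ACC (in %), SEN, SPEC, PREC and F1 from confusion counts."""
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)   # undefined if the model predicts no positives
    f1 = 2 * (prec * sen) / (prec + sen)
    return {"ACC": acc, "SEN": sen, "SPEC": spec, "PREC": prec, "F1": f1}


# Hypothetical per-fold confusion counts (tp, fp, tn, fn); NOT from the paper.
folds = [(16, 0, 668, 0), (15, 0, 668, 1), (16, 1, 667, 0)]

per_fold = [binary_metrics(*counts) for counts in folds]

# Mean and (population) standard deviation of each metric across folds.
summary = {
    name: (mean(m[name] for m in per_fold), pstdev(m[name] for m in per_fold))
    for name in per_fold[0]
}
for name, (mu, sigma) in summary.items():
    print(f"{name}: {mu:.3f} ± {sigma:.3f}")
```

With real experiments, `folds` would hold one tuple per cross-validation fold (ten in this study), obtained by comparing predictions against ground-truth labels on each held-out subset.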

in COVID-19 positive CXRs. However, in the later layers of the model, the activation maps become more abstract, for which the terminal dense layer is used in the model to map these abstract feature representations to their corresponding labels (COVID+ or COVID−).

Following the validation protocol and evaluation metrics mentioned in the previous "Validation protocol and evaluation metrics" section, we present the mean scores that were achieved using the tenfold cross-validation train-test scheme on each of the six different datasets: D1–D6. The experimental results are documented in Table 3. Also, standard deviation (σ) is reported in all cases, whose low values support the statistical robustness of our model. Our proposed Truncated Inception Net model achieves a classification ACC, AUC, SEN, SPEC, PREC, and F1 score of 99.96%, 1.0, 0.98, 0.99, 0.98, and 0.98, respectively, on the dataset: D5


(COVID-19 positive case detection against Pneumonia and healthy cases) and that of 99.92%, 0.99, 0.93, 1.0, 1.0, and 0.96, respectively, on the D6 dataset (COVID-19 positive case detection against Pneumonia, TB, and healthy CXRs). Since the custom datasets being used were highly imbalanced in terms of class representation, sensitivity and precision are the most significant metrics in our case, as said in the "Validation protocol and evaluation metrics" section. Consequently, the proposed model achieves high sensitivity and precision on these datasets. For a better understanding of the results, six different ROC curves are shown in Fig. 7; one for each dataset, starting from D1 to D6.

Additionally, since for every dataset we computed tenfold cross-validation, for better understanding of how average scores and their standard deviation were computed, the results obtained from each fold on the dataset: D6 are provided in Table 4. Besides, the proposed Truncated Inception Net model performs 2.36 ± 0.18 times faster on average than the Inception Net V3 model. In Table 5, computational times (by taking 10 different CXR samples as input) are used to demonstrate the differences between them. The primary reason is the large number of parameters in the original Inception Net V3 model. Precisely, this model contains more than 21.7 million trainable parameters, in contrast to our model, which contains only 2.1 million trainable parameters, making it a better choice for training on small datasets and also for active learning. Therefore, for mass screening in resource-constrained areas, employing a faster tool is a must.

Fig. 7 The ROC curves obtained for the six different datasets D1–D6. The black dotted curve represents the ROC of a random guessing classifier

Table 4 Results: ACC in %, AUC, SEN, SPEC, PREC, and F1 score for each fold of 10 fold cross-validation on the D6 dataset

Dataset-fold  ACC      AUC      SEN      SPEC     PREC     F1 Score
D6-1          100      1.0      1.0      1.0      1.0      1.0
D6-2          99.85    0.99     0.86     1.0      1.0      0.92
D6-3          99.85    0.99     0.86     1.0      1.0      0.92
D6-4          100      1.0      1.0      1.0      1.0      1.0
D6-5          100      1.0      1.0      1.0      1.0      1.0
D6-6          100      1.0      1.0      1.0      1.0      1.0
D6-7          100      1.0      1.0      1.0      1.0      1.0
D6-8          99.85    0.99     0.86     1.0      1.0      0.92
D6-9          99.70    0.99     0.71     1.0      1.0      0.83
D6-10         100      1.0      1.0      1.0      1.0      1.0
μ             99.92    0.99     0.93     1.0      1.0      0.96
σ             ± 0.100  ± 0.006  ± 0.096  ± 0.0    ± 0.0    ± 0.055

Table 5 Comparison: computational time (in ms) between Inception Net V3 (full architecture) and Truncated Inception Net, for 10 samples (randomly selected)

Model                    CXR1   CXR2   CXR3   CXR4   CXR5   CXR6   CXR7   CXR8   CXR9   CXR10  Mean (μ)
Inception Net V3         22.10  28.80  21.30  20.20  20.60  19.90  22.50  20.90  21.40  21.40  21.90 ± 2.40
Truncated Inception Net  8.63   11.00  9.53   8.02   8.93   8.63   8.70   9.64   9.30   10.30  9.27 ± 0.84
Ratio                    2.56   2.61   2.23   2.52   2.30   2.30   2.58   2.16   2.30   2.07   2.36 ± 0.18

Discussion

Since the COVID-19 outbreak, very few pieces of work have been proposed/reported using CXRs to detect COVID-19 positive cases (see "Introduction" section). In our


Table 6  Comparison table

Model                    # of COVID-19  # of non-COVID-19  ACC      AUC   SEN   SPEC  PREC  F1     Remarks          # of parameters
                         CXRs           CXRs               (in %)                           score                   (in million)
ResNet50 and SVM [19]    25             25                 95.38    –     0.97  0.93  –     0.95   –                23.5
COVID-Net [20]           68             2794               83.50    –     1.0   –     –     –      –                116.6
ResNet50 [21]            50             50                 98.0     –     0.96  1.0   1.0   0.98   –                23.5
Inception Net V3 [21]    50             50                 97.0     –     0.94  1.0   1.0   0.96   –                21.7
Truncated inception net  162            80 (D2)            94.04    1.0   0.88  1.0   1.0   0.93   Poor             2.1
Truncated inception net  162            1583 (D3)          100.0    1.0   1.0   1.0   1.0   1.0    Best             2.1
Truncated inception net  162            –                  98.77    0.99  0.95  0.99  0.99  0.97   Average (D1–D6)  2.1

comparison, ResNet50 and SVM [19], COVID-Net [20], ResNet50 [21], and Inception Net V3 [21] are considered, even though they are not peer-reviewed research articles. We compared our model with these works using the same evaluation metrics (ACC in %, AUC, SEN, SPEC, PREC, and F1 score) and the same nature of dataset. Like the other works, we took COVID-19 positive CXRs and healthy CXRs from the Pneumonia dataset (D3 in our case) and used this result for the comparison. Besides, since all the models are based on deep learning, we include an essential element, i.e., the number of parameters, in our comparison. Table 6 provides a complete comparative study. Not all the authors reported AUC, SPEC, and F1 score. Note that our model was used as a binary classifier to screen a CXR as COVID+ or COVID-, while not all the stated works did the same. Wherever multi-class classification was done instead of binary classification, the reported results belong to the COVID-19 positive (COVID+) class. On the whole, considering the number of parameters, the proposed Truncated Inception Net outperforms all of them. Note that, since our model is a derivative of the Inception Net V3 model, it is worth comparing the two. We observe that, in both computational time (Table 5) and performance scores (Table 6), Truncated Inception Net performs better than Inception Net V3 [21]. For a better understanding, three different performance scores (poor, best, and average) are considered from Table 3. This suggests that the Truncated Inception Net is not only more computationally effective in terms of training and usability, but also more flexible for the purpose of active learning [14].

Even though the experiments validate the proposed deep learning model for screening COVID-19 positive CXRs, it is important to understand that the system relies completely on visual cues in the input data. Therefore, in the early stages of COVID-19, when the radiologically observable cues have not yet developed, the system might fail to perform as stated. A detailed study of this is left for future work, where the input data shall additionally be labeled with the stage of COVID-19 it depicts. However, the data acquired for this work did not contain any explicit information regarding the stages of COVID-19 in the individual CXRs. Further, the system is limited in its capacity to localize the disease in the CXR. As seen in the activation maps of deeper layers (Fig. 6), the model develops an intrinsic representation of the CXR features rather than an accurate spatial heat-map, which is then mapped to the output using a dense layer classifier. The mentioned goal can be achieved by using a larger amount of data or deep learning model(s) pre-trained on a large number of CXRs of different diseases (like CheXNet [31]), which shall be our future goal.

Conclusion and future works

In this work, we have proposed the Truncated Inception Net deep learning model to detect COVID-19 positive patients using chest X-rays. For validation, experimental tests were done on six different experimental datasets by combining COVID-19 positive, Pneumonia positive, Tuberculosis positive, and healthy CXRs. The proposed model outperforms the state-of-the-art results in detecting COVID-19 cases from non-COVID ones. Besides, considering the number of parameters used in our proposed model, it is computationally efficient compared to the original Inception Net V3 model and other works proposed in the literature. It is important to note that the study has no clinical implications. Instead, we solely aimed to check whether the proposed Truncated Inception Net could be used in detecting COVID-19 positive cases using CXRs.

Observing the performance scores, the Truncated Inception Net can serve as a milestone for screening COVID-19 under an active-learning framework on multitudinal/multimodal data [14]. It also motivates work on cross-population train/test models. Integrating this model with the CheXNet model [31] will be our immediate plan, since CheXNet is primarily employed to analyze CXRs.


Compliance with ethical standards

Conflict of interest Authors declared no conflict of interest.

Ethical approval This article does not contain any studies with human participants performed by any of the authors.

References

1. World Health Organization (2020) Naming the coronavirus disease (covid-19) and the virus that causes it. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/naming-the-coronavirus-disease-(covid-2019)-and-the-virus-that-causes-it
2. World Health Organization (2020) The continuing 2019-ncov epidemic threat of novel coronaviruses to global health - the latest 2019 novel coronavirus outbreak in wuhan, China. https://pubmed.ncbi.nlm.nih.gov/31953166/
3. World Health Organization (2020) Coronavirus disease (covid-2019) situation reports. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
4. Li M, Lei P, Zeng B, Li Z, Peng Y, Fan B, Wang C, Li Z, Zhou J, Shaobo H et al (2020) Coronavirus disease (covid-19): spectrum of ct findings and temporal progression of the disease. Acad Radiol 27:603
5. Bai HX, Hsieh B, Xiong Z, Halsey K, Choi JW, Tran Thi ML, Pan I, Shi L-B, Wang D-C, Mei J et al (2020) Performance of radiologists in differentiating covid-19 from viral pneumonia on chest ct. Radiology, pp 200823
6. Gross A, Thiemig D, Koch F-W, Schwarz M, Gläser S, Albrecht T (2020) CT appearance of severe, laboratory-proven coronavirus disease 2019 (covid-19) in a caucasian patient in Berlin, Germany. Georg Thieme Verlag KG, In RöFo-Fortschritte auf dem Gebiet der Röntgenstrahlen und der bildgebenden Verfahren
7. Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Tao Q, Sun Z, Xia L (2019) Correlation of chest ct and RT-PCR testing in coronavirus disease, (covid-19) in China: a report of 1014 cases. Radiology 200642:2020
8. Fang Y, Zhang H, Xie J, Lin M, Ying L, Pang P, Ji W (2020) Sensitivity of chest ct for covid-19: comparison to RT-PCR. Radiology. https://doi.org/10.1148/radiol.2020200432
9. Chung JH, Elicker BM, Ketai LH, Kanne JP, Little BP (2020) Essentials for radiologists on covid-19: an update-radiology scientific expert panel. Radiology 200527:27
10. Kong W, Agarwal PP (2020) Chest imaging appearance of covid-19 infection. Radiology 2(1):e200028
11. Huang C, Wang Y, Li X, Ren L, Zhao J, Yi H, Zhang L, Fan G, Jiuyang X, Xiaoying G et al (2020) Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395(10223):497–506
12. Ng M-Y, Lee EY, Yang J, Yang F, Li X, Wang H, Lui MM, Shing-Yen LC, Leung B, Khong P-L et al (2020) Imaging profile of the covid-19 infection: radiologic findings and literature review. Radiology 2(1):e200034
13. Wang Y, Dong C, Hu Y, Co Li, Ren Q, Zhang X, Shi H, Zhou M (2020) Temporal changes of CT findings in 90 patients with covid-19 pneumonia: a longitudinal study. Radiology 10:200843
14. Santosh KC (2020) Ai-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal data. J Med Syst 44(5):1–5
15. Technology org, ai algorithm detects coronavirus infections in patients from ct scans with 96% accuracy. https://www.technology.org/2020/03/01/ai-algorithm-detects-coronavirus-infections-in-patients-from-ct-scans-with-96-accuracy, 2020 (accessed March 02, 2020)
16. Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, Bai J, Lu Y, Fang Z, Song Q et al (2020) Artificial intelligence distinguishes covid-19 from community acquired pneumonia on chest CT. Radiology 284:574
17. Xu X, Jiang X, Ma C, Du P, Li X, Lv S, Yu L, Chen Y, Su J, Lang G et al (2020) Deep learning system to screen coronavirus disease 2019 pneumonia. arXiv preprint arXiv:2002.09334
18. Chen J, Wu L, Zhang J, Zhang L, Gong D, Zhao Y, Hu S, Wang Y, Hu X, Zheng B et al (2020) Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study. medRxiv
19. Sethy PK, Behera SK (2020) Detection of coronavirus disease (covid-19) based on deep features. Preprints
20. Wang L, Wong A (2020) Covid-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images. arXiv:2003.09871
21. Narin A, Kaya C, Pamuk Z (2020) Automatic detection of coronavirus disease (covid-19) using X-ray images and deep convolutional neural networks. arXiv preprint arXiv:2003.10849
22. Apostolopoulos Ioannis D, Mpesiana Tzani A (2020) Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys Eng Sci Med 1:1
23. Santosh KC, Vajda S, Antani SK, Thoma GR (2016) Edge map analysis in chest X-rays for automatic pulmonary abnormality screening. Int J Comput Assist Radiol Surg 11(9):1637–1646
24. Karargyris A, Siegelman J, Tzortzis D, Jaeger S, Candemir S, Xue Z, Santosh KC, Vajda S, Antani SK, Folio LR, Thoma GR (2016) Combination of texture and shape features to detect pulmonary abnormalities in digital chest X-rays. Int J Comput Assist Radiol Surg 11(1):99–106
25. Vajda S, Karargyris A, Jäger S, Santosh KC, Candemir S, Xue Z, Antani Sameer K, Thoma George R (2018) Feature selection for automatic tuberculosis screening in frontal chest radiographs. J Med Syst 42(8):1–11
26. Santosh KC, Antani S (2017) Automated chest X-ray screening: can lung region symmetry help detect pulmonary abnormalities? IEEE Trans Med Imaging 37(5):1168–1177
27. Christian S, Vincent V, Sergey I, Jon S, Zbigniew W (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
28. Paul CJ (2020) Covid-19 image data collection. https://github.com/ieee8023/covid-chestxray-dataset
29. Paul M (2020) Kaggle chest X-ray images (pneumonia) dataset. https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia/
30. U.S. National Library of Medicine. Tuberculosis chest X-ray image data sets. https://ceb.nlm.nih.gov/tuberculosis-chest-X-ray-image-data-sets/ (2020)
31. Pranav R, Jeremy I, Kaylie Z, Brandon Y, Hershel M, Tony D, Daisy D, Aarti B, Curtis L, Katie S et al (2017) Chexnet: radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
