Measurement
Amit Kumar Jaiswal, Prayag Tiwari, Sachin Kumar, Deepak Gupta, Ashish Khanna, Joel J.P.C. Rodrigues
journal homepage: www.elsevier.com/locate/measurement
https://doi.org/10.1016/j.measurement.2019.05.076

Article history: Received 11 March 2019; Received in revised form 1 May 2019; Accepted 21 May 2019; Available online 4 June 2019.

Abstract

The rich collection of annotated datasets has driven the robustness of deep learning techniques across diverse medical imaging tasks. Globally, over 15% of deaths of children under the age of five are caused by pneumonia. In this study, we describe our deep learning based approach for the identification and localization of pneumonia in chest X-ray (CXR) images. Researchers usually employ CXRs for diagnostic imaging studies; however, several factors, such as the positioning of the patient and the depth of inspiration, can change the appearance of a chest X-ray, complicating interpretation further. Our identification model (https://github.com/amitkumarj441/identify_pneumonia) is based on Mask R-CNN, a deep neural network which incorporates global and local features for pixel-wise segmentation. Our approach achieves robustness through critical modifications of the training process and a novel post-processing step which merges bounding boxes from multiple models. The proposed identification model achieves better performance when evaluated on a chest radiograph dataset depicting potential pneumonia causes.

Keywords: Chest X-ray; Medical imaging; Object detection; Segmentation
512 A.K. Jaiswal et al. / Measurement 145 (2019) 511–518
rate. Deep learning has already proven an effective approach to object detection and segmentation, image classification, natural language processing, etc. Further, deep learning has also shown its potential in medical image analysis for object detection and segmentation, such as radiology image analysis to study anatomical or pathological structures of the human body [10–12]. Deep learning has also provided higher accuracy than traditional neural network architectures.

In the remainder of this article, we first review the literature related to pneumonia identification in chest X-ray images in Section 2, followed by the proposed model architecture in Section 3, detailing the algorithm and training steps in different stages. Section 4 presents our extensive analysis of the RSNA dataset, with image augmentation steps, results from the cleaned data, and evaluation metrics; Section 5 reports the evaluation results of our proposed model as well as ensembles of it. Finally, we conclude our work in Section 6 along with future work.

2. Literature survey

[…] object in the image should be detected carefully, and the segmentation of each instance should be done precisely. Therefore, a different approach is required to handle both instance segmentation and object detection. Such powerful methods are faster region-based CNN (F-RCNN) [21] and FCN (Fully Convolutional Network) [22]. Moreover, F-RCNN can be extended with an additional branch for segmentation mask prediction on each region of interest, along with the existing branches for classification. This extended network is called Mask R-CNN, and it improves on F-RCNN in terms of efficiency and accuracy. Kaiming He et al. [23] presented the Mask R-CNN approach for object instance segmentation, comparing their results with the best models from COCO 2016 [24,25]. Luc et al. [26] extended this approach by introducing instance-level segmentation that predicts convolutional features.

3. Proposed architecture

In this section, we formulate and explore the problem pipeline, followed by our model based on Mask R-CNN for detecting pneumonia symptoms from chest X-ray images.
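Since the identification model builds on the publicly released Mask R-CNN implementation and COCO weights referenced in the footnotes, a training setup might look like the sketch below. This is an illustrative configuration only: the class count reflects the two classes of this task (background plus lung opacity), but the hyperparameter values and file paths are our own assumptions, not the paper's reported settings.

```python
from mrcnn.config import Config
from mrcnn import model as modellib

class PneumoniaConfig(Config):
    # Illustrative values; the paper does not list its exact hyperparameters.
    NAME = "pneumonia"
    NUM_CLASSES = 1 + 1          # background + lung opacity
    IMAGES_PER_GPU = 8
    DETECTION_MIN_CONFIDENCE = 0.78

model = modellib.MaskRCNN(mode="training", config=PneumoniaConfig(),
                          model_dir="./logs")
# Start from COCO weights, skipping the head layers whose shapes
# depend on NUM_CLASSES and therefore differ from the COCO model.
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])
```

The `exclude` list is the standard way to fine-tune the Matterport implementation on a dataset with a different number of classes.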
Fig. 1. Mask R-CNN based model for opacity identification and pixel-wise disease segmentation.
prone region in lungs around the rectangular bounding boxes. For instance, refer to the input image and prediction sample in Fig. 1.

Table 1
List of parameters in post-processing stage.

2 https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5.
3 The augmentation step is discussed in a later section.
4 https://github.com/jrosebr1/imutils/blob/master/imutils/object_detection.py.
5 The RSNA pneumonia dataset can be found at https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data.
6 Here, examples represent the patients' chest examinations during data collection by an expert team of radiographers.
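The box-merging post-processing relies on non-maximum suppression of the kind provided by the utility linked in footnote 4. A minimal pure-Python sketch of greedy NMS is shown below; it is our own simplified version (overlap measured against the candidate box's area), not the exact imutils implementation.

```python
def nms(boxes, scores, overlap_thresh=0.5):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes.

    Keeps the highest-scoring box, drops any remaining box whose overlap
    with it exceeds overlap_thresh, and repeats with the survivors.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        survivors = []
        for j in order:
            # Intersection rectangle between box i and candidate j.
            ix = max(0, min(boxes[i][2], boxes[j][2]) - max(boxes[i][0], boxes[j][0]))
            iy = max(0, min(boxes[i][3], boxes[j][3]) - max(boxes[i][1], boxes[j][1]))
            # Overlap ratio: intersection area over the candidate's area.
            area_j = (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1])
            if area_j == 0 or (ix * iy) / area_j <= overlap_thresh:
                survivors.append(j)
        order = survivors
    return keep
```

When merging predictions from multiple models, the boxes and confidence scores from all models can simply be concatenated before the suppression step.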
Table 2
Features of RSNA dataset.

Table 3
RSNA training and test image set.

Also, we give a brief overview of the training and test data of the RSNA dataset in these two stages, reported in Table 3. Detailed information about the difference between the number of images in Stage 1 and Stage 2 is given in a later section below.

We have examined the training data, classifying the positive and negative features among patients of different ages, reported in Fig. 2, whereas the various feature classes among patients of different age groups are reported in Fig. 3.

4. Experimental evaluation

4.1. Data preparation and augmentation

Dataset: We used a large publicly available chest radiograph dataset from RSNA7, which annotated 30,000 exams from the original 112,000 chest X-ray dataset [18] to identify instances of potential pneumonia as a training set, and STR8 generated consensus annotations for approximately 4500 chest X-rays to be used as test data. The annotated collection contains the participants' ground truth, which we follow in training our algorithm for evaluation. The set of 30,000 samples is made up of 15,000 samples with pneumonia-related labels such as 'Pneumonia', 'Consolidation', and 'Infiltration'; 7500 samples chosen randomly with the 'No Findings' label; and another 7500 randomly selected samples carrying neither the pneumonia-related labels nor the 'No Findings' label. They created a unique identifier for each of those 30,000 samples.

Annotation: Samples were annotated using a proprietary web-based annotation system, with each reader permanently blinded to the other readers' annotations. Every radiology practitioner who took part initially worked on the same set of 50 exemplary chest X-rays in a hidden manner; the annotations were then made visible to the other practitioners for the same 50 chest X-rays for evaluation, as this enables questions (for example, whether an X-ray with healed rib fractures should be enumerated as 'Normal' or not) and preliminary calibration. The final sets of labels are given in Table 4. There are 'Question' labels to flag questions to be answered by a chest radiology practitioner. Overall, an equal distribution of 30,000 human lungs was annotated by six expert radiologists to assess whether the collected images of lung opacities were equivocal for pneumonia, with their corresponding bounding boxes setting forth the status. Also, twelve other experts from STR collaborated in annotating roughly 4500 human lungs. Out of the 4500 triple-read conditions, we divided these chest X-rays into three sets: 1500 human lungs in the training set, 1000 in the test set (initial stage), and the remaining 2000 in the test set at the final stage. However, the test sets were double-checked by five radiology practitioners, including six other radiologists from the first group.

Primary consideration: We discuss the adjudication during data collection for such a sophisticated task. A bounding box is assessed as isolated in a multi-read case provided that it does not coincide with the bounding boxes of the other two readers, i.e., these two readers fail to flag that particular area of the image as suspicious for pneumonia. Whenever the adjudicator concurs that the isolated bounding box is valid, the box endures as a positive minority belief; in other cases it is discarded. Initially, they assigned a confidence score to the bounding boxes. Low-confidence bounding boxes were discarded, and high/intermediate-confidence boxes were aggregated into a group of appropriate pneumonia. Given a low-probability bounding box, they discard the box and check whether the labeling is abnormal or no lung opacity. Opposed bounding boxes were adjudicated by one of two thoracic radiology practitioners in multi-read cases that did not reach consensus. Also, the practitioners found that the annotations of all three readers in adjudicated cases differ by more than 15%. They used the intersection of the remaining bounding boxes when at least 50% was coincided by one of the bounding boxes; this step has an ample effect in discarding a few pixels of data for multiple readers, including positive pixels. They put 1500 of the 4500 triple-read cases into the training set to average out probable distinctions between single- and multi-read cases. The remaining 3000 triple-read cases were allocated to the test set. A majority vote is used to distinguish weak labels. The radiologists followed the requirements below during data collection:

1. Bounding box (Lung Opacity): A patient's chest radiograph includes findings of fever and cough as potential signs of pneumonia.
2. They made few conjectures in the absence of lateral radiography, serial examination, and clinical information.
3. Based on Fleischner's [32] work, they considered every region which was more opaque than the neighbouring area.
4. They also excluded areas such as nodule(s), evident mass(es), linear atelectasis, and lobar collapse.

Data augmentation: We performed augmentation on the lung opacities and image data with random scaling, including shifting in coordinate space ((x1, y1), (x2, y2)), as well as increasing/decreasing brightness and contrast, including blurring with a Gaussian blur, in batches. Examples of images after augmentation are reported in Fig. 4.

The outcomes in Figs. 2 and 3 signify the status of patient classes and labels from the X-ray images, and we see a highly imbalanced dataset. The imbalance between too many negatives and too few positives in the training and test datasets generates a critical issue: we want high recall, but the model could predict all negatives to attain high accuracy, and recall would suffer significantly. In this case, it is unclear whether or not the present imbalance is acceptable. We test whether balancing the class distribution would yield any improvement. To do this, we have trained our model on two training data sets, one balanced and the other not. We create the balanced dataset by augmenting more images into the negative (0) class. We discussed previously the augmentation steps, which include flipping, rotating, scaling, cropping, translating, and noise adding. Introducing such images into the negative class can possibly create radically new features that do not exist in the other class. For example, if we choose to flip every negative-class image, then we have in the negative class a set of images that have the right and left parts of the body switched, while the other class does not have this feature. This is not desirable because the network may learn unnecessary (and incorrect) features, such as that an image with the left part of the body on a certain side is more likely to exhibit non-pneumonia.

7 Radiological Society of North America.
8 Society of Thoracic Radiology.
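The augmentation steps described above (flipping, brightness changes, and shifting bounding boxes in coordinate space) can be sketched as follows. This is a minimal plain-Python illustration; the helper names are our own, and the jitter ranges are assumptions rather than the paper's settings.

```python
import random

def hflip(image):
    """Horizontally flip a 2D image given as a list of pixel rows."""
    return [row[::-1] for row in image]

def adjust_brightness(image, delta):
    """Add a constant to every pixel, clamped to the 8-bit range [0, 255]."""
    return [[max(0, min(255, p + delta)) for p in row] for row in image]

def shift_box(box, dx, dy):
    """Shift a bounding box ((x1, y1), (x2, y2)) in coordinate space."""
    (x1, y1), (x2, y2) = box
    return ((x1 + dx, y1 + dy), (x2 + dx, y2 + dy))

def augment(image, box, rng=random):
    """Randomly flip and brightness-jitter an image, keeping its box in step."""
    width = len(image[0])
    if rng.random() < 0.5:
        image = hflip(image)
        (x1, y1), (x2, y2) = box
        # Mirror the box around the vertical axis along with the image.
        box = ((width - x2, y1), (width - x1, y2))
    image = adjust_brightness(image, rng.randint(-20, 20))
    return image, shift_box(box, rng.randint(-2, 2), rng.randint(-2, 2))
```

The key point illustrated is that geometric transforms must be applied to the opacity bounding boxes and the pixels together, which is why the flip branch rewrites the box coordinates.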
Fig. 2. Positive and negative features among patients of different age groups.

Table 4
List of labels.

Probability    Opacity  No opacity  Abnormal
High           Yes      No          No
Intermediate   Yes      No          No
Moderate       Yes      No          No

So we scaled (cropping a little and then resizing to the original size) the images.

We also classify the distribution of the positional-view feature, a radiographic view allied with the patient position, given for the training and test data in Table 5.

Data cleaning: We have performed extensive data cleaning on the Stage 2 dataset and have explored the class probability ranking among males and females, reported in Fig. 5. It shows that there are more chest X-ray images of males than females; both genders most often carry the class ''No Lung Opacity/Not Normal", but beyond this, men are more likely to have the class ''Lung Opacity", whereas women are proportionally less likely. This explains the class probability ranking among men and women.

4.2. Performance measures

We employ the mean of the intersection over union (IoU) of paired ground-truth bounding boxes and predictions at varied thresholds. The IoU, computed over the regions of the predicted and ground-truth bounding boxes, serves as the evaluation metric for the pneumonia identification task. It follows the formula below:
Table 5
Distribution of positional features in RSNA dataset.

IoU_region(B_predicted, B_groundtruth) = |B_predicted ∩ B_groundtruth| / |B_predicted ∪ B_groundtruth|    (3)

The IoU determines a true positive when a predicted object is paired with a ground-truth object above the threshold, which ranges from 0.4 to 0.75 with a step size of 0.05, to classify ''misses" and ''hits".9 Pairing between predicted bounding boxes and ground-truth bounding boxes is assessed in descending order of prediction confidence and is strictly injective.

Given any threshold value t, the mean threshold value (MTV) over the outcomes for that threshold can be computed from the counts of true positives (cTP), false positives (cFP), and false negatives (cFN):

MTV(t) = cTP(t) / (cTP(t) + cFP(t) + cFN(t))    (4)

Also, we compute a mean score (MS) for every image over all threshold values:

MS_i = (1 / |Threshold|) Σ_t MTV(t)    (5)

Therefore, we can compute the mean score for the dataset as follows:

MS_dataset = (1 / |Image|) Σ_i MS_i    (6)

where Image in the dataset can be either a predicted bounding box or a ground-truth bounding box.

5. Evaluation results

We report our prediction results in this section, followed by results from the ensemble model.

We perform ensembling in Stage 2 owing to its labelled dataset, whereas the dataset in Stage 1 was highly imbalanced. The variance in the dataset arises because radiologists are overloaded with reading high volumes of images every shift. We have discussed this in an earlier section of this article. In Fig. 6, we overlay the probabilities of the ground truth labels to check whether they are flipped or not. This also shows the successful predictions, depicting inconsistency between ground-truth and prediction bounding boxes. We trained our proposed model on an NVIDIA Tesla P100 GPU in Stage 2 and a Tesla K80 in Stage 1, which also indicates that one needs efficient computing resources to model such a task on a highly imbalanced dataset.

The prediction outcome of our model at a given threshold is reported in Table 6, in which the best prediction set of bounding boxes and ground-truth boxes results in Stage 2. Also, the predicted sample set depicting pneumonia showing the position

9 A predicted box hits when it reaches a threshold of 0.5, provided its IoU with a ground-truth box is greater than 0.5.
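The metric in Eqs. (3)–(6) can be sketched in plain Python as follows. The pairing here is a greedy, injective match in the given prediction order (assumed sorted by descending confidence), which is a simplification of the confidence-ordered pairing described above; function names are our own.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2), per Eq. (3)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle; zero if the boxes do not overlap.
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def mtv(preds, truths, t):
    """Mean threshold value (Eq. 4): TP / (TP + FP + FN) at IoU threshold t.

    Pairing is greedy and injective: each ground-truth box is matched
    to at most one prediction, in prediction order."""
    unmatched = list(truths)
    tp = 0
    for p in preds:  # assumed sorted by descending confidence
        hit = next((g for g in unmatched if iou(p, g) > t), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(preds) - tp
    fn = len(unmatched)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0

def mean_score(preds, truths, thresholds=None):
    """Per-image mean score over IoU thresholds 0.4, 0.45, ..., 0.75 (Eq. 5)."""
    if thresholds is None:
        thresholds = [0.4 + 0.05 * k for k in range(8)]
    return sum(mtv(preds, truths, t) for t in thresholds) / len(thresholds)
```

The dataset-level score of Eq. (6) is then just the average of `mean_score` over all images.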
Fig. 6. The results from the Stage 2 dataset. The probability is overlaid on a few images covering all patient classes and labels. Green, orange, blue and red overlays show predictions and ground truth labels, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 6
Result: prediction at given threshold.

Table 7
Ensemble model results.
Our proposed approach, as illustrated at the beginning of this section, applies a typical ensemble model after the post-processing step, which is then employed to obtain the prediction set of patients having pneumonia. We ensembled our Mask-RCNN based models built on ResNet50 and ResNet101, and the result is reported in Table 7.

6. Conclusion and future work

In this work, we have presented our approach for identifying pneumonia and for understanding how the lung image size plays an important role in model performance. We found that the distinction between the presence and absence of pneumonia in these images is quite subtle; larger images can be more beneficial for deeper information, but the computational cost grows exponentially when dealing with large images. Our proposed architecture with regional context, such as Mask-RCNN, supplied extra context for generating accurate results. Also, using thresholds on the background while training tuned our network to perform well on this task.

Acknowledgements

We would like to acknowledge the Radiological Society of North America for the chest X-ray dataset and Kaggle for computing infrastructure support.

Amit Kumar Jaiswal and Prayag Tiwari have received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 721321.

Sachin Kumar has received funding from the Ministry of Education and Science of the Russian Federation (government order 2.7905.2017/8.9).

Joel J. P. C. Rodrigues has received funding from National Funding from the FCT – Fundação para a Ciência e a Tecnologia through the UID/EEA/50008/2019 Project; from RNP, with resources from MCTIC, Grant No. 01250.075413/2018-04, under the Centro de Referência em Radiocomunicações – CRR project of the Instituto Nacional de Telecomunicações (Inatel), Brazil; and from the Brazilian National Council for Research and Development (CNPq) via Grant No. 309335/2017-5.