Abnormality Detection in Chest X-Ray Images Using Uncertainty Prediction Autoencoders
1 Introduction
Chest X-ray has been widely adopted for annual medical screening, where the
main purpose is to check whether the lungs are healthy. Considering the huge
number of routine medical examinations worldwide, it would be desirable to have
an intelligent system that helps clinicians automatically detect potential abnormalities
in chest X-ray images. Here we consider such a specific task of abnormality
detection, for which only normal (i.e., healthy) data are available during model training.
© Springer Nature Switzerland AG 2020
A. L. Martel et al. (Eds.): MICCAI 2020, LNCS 12266, pp. 529–538, 2020.
https://doi.org/10.1007/978-3-030-59725-2_51
2 Method
The problem of interest is to automatically determine whether any new chest
X-ray image is abnormal (‘unhealthy’) or not, based only on a collection of normal
(‘healthy’) images. Since abnormality in X-ray images could be due to small
areas of lesions or subtle, unexpected contrast changes between local regions,
extracting an image-level feature representation may suppress such small-scale
features, while extracting features for each local image patch may fail to detect
the contrast-based abnormalities; both result in failed abnormality detection.
In comparison, the reconstruction error based on pixel-level differences
between the original image and its reconstruction by an autoencoder
may be a more appropriate measure for detecting abnormality in X-ray
images, because both local and global features are implicitly used to
reconstruct each pixel. However, it has been observed that
there often exists relatively large reconstruction errors around the boundaries
between different regions (e.g., lung vs. the others, foreground vs. background,
Fig. 2) even in normal images. Such large errors could result in false positive
detection, i.e., considering a normal image as abnormal. Therefore, it would
be desirable to automatically suppress the contribution of such reconstruction
errors in anomaly detection. Simply detecting edges and removing their contribu-
tions in reconstruction error may not work well due to the difficulty in detecting
low-contrast boundaries in X-ray images and possibly larger reconstruction
errors close to region boundaries. In this paper, we apply a probabilistic
approach to automatically down-weight the contribution of normal regions with
larger reconstruction errors. The basic idea is to train an autoencoder to
simultaneously reconstruct the input image and estimate the pixel-wise uncertainty in
reconstruction (Fig. 1), where larger uncertainties often appear at normal regions
with larger reconstruction errors. On the other hand, there are often relatively
large reconstruction errors with small reconstruction uncertainties at abnormal
regions in the lung area. Altogether, normal images can be more easily separated
from abnormal images based on the uncertainty-weighted reconstruction errors.
Fig. 1. Autoencoder with both the reconstruction μ(x) and the predicted pixel-wise
uncertainty σ²(x) as outputs.
where xi,k is the k-th element of the expected output yi (i.e., the input xi),
and μk(xi) is the k-th element of the actual output μ(xi). Then the autoencoder
can be optimized by maximizing the log-likelihood over all the normal (training)
images, i.e., by minimizing the negative log-likelihood function L(θ),
L(\theta) = \frac{1}{ND} \sum_{i=1}^{N} \sum_{k=1}^{D} \left[ \frac{(x_{i,k} - \mu_k(x_i))^2}{\sigma_k^2(x_i)} + \log \sigma_k^2(x_i) \right].  (3)
Equation (3) reduces to a squared-error loss based on the Mahalanobis distance
when the variance elements σk²(xi) are fixed and independent of the input xi,
and further to the ordinary mean squared error (MSE) loss based on the Euclidean
distance when they are not only fixed but also all equal.
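As a concrete illustration, the loss of Eq. (3) can be sketched in NumPy; the function name and array shapes here are our own choices for exposition, not from the paper:

```python
import numpy as np

def uncertainty_nll(x, mu, sigma2):
    """Eq. (3): squared reconstruction error normalized by the predicted
    per-pixel variance, plus a log-variance penalty that keeps the model
    from inflating sigma^2 everywhere. x, mu, sigma2 are (N, D) arrays."""
    n, d = x.shape
    return np.sum((x - mu) ** 2 / sigma2 + np.log(sigma2)) / (n * d)

# With the variance fixed to 1 for every pixel, the log term vanishes and
# the loss reduces to the ordinary mean squared error, as noted above.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
mu = rng.standard_normal((4, 16))
ones = np.ones_like(x)
assert np.isclose(uncertainty_nll(x, mu, ones), np.mean((x - mu) ** 2))
```

In a real training loop the gradient of this loss would be taken with respect to the network parameters producing μ and σ²; the sketch only shows the forward computation.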
Note that for each input image xi, the model generates two outputs:
the reconstruction μ(xi) and the noise variance σ²(xi) = (σ1²(xi), σ2²(xi), ...,
σD²(xi))ᵀ (Fig. 1). Interestingly, while μ(xi) is supervised to approach xi,
Based on the above analysis, for any new image x, it is natural to use the pixel-
wise normalized reconstruction error (the first term in Eq. (3)) to represent the
degree of abnormality of each pixel xk, and the average of such errors over all
pixels as the abnormality score A(x) of the image, i.e.,

A(x) = \frac{1}{D} \sum_{k=1}^{D} \frac{(x_k - \mu_k(x))^2}{\sigma_k^2(x)}.  (4)
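The score of Eq. (4) amounts to a variance-normalized mean squared error over the image; a minimal sketch (the function name is ours):

```python
import numpy as np

def abnormality_score(x, mu, sigma2):
    """Eq. (4): average of pixel-wise reconstruction errors, each
    normalized by its predicted variance. Inputs are flattened
    images of length D."""
    return np.mean((x - mu) ** 2 / sigma2)

# The same raw reconstruction error counts for little where the model is
# also uncertain (large sigma^2), but counts fully where it is confident.
x = np.array([1.0, 1.0])
mu = np.array([0.0, 0.0])
confident = np.array([0.1, 0.1])    # small predicted variance
uncertain = np.array([10.0, 10.0])  # large predicted variance
assert abnormality_score(x, mu, confident) > abnormality_score(x, mu, uncertain)
```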
Since the pixel-wise uncertainties σk²(x) depend on the input x, they cannot be
estimated as easily as a fixed variance. To the best of our knowledge, this is the
first time such pixel-wise, input-dependent uncertainty has been applied to the
estimation of abnormality. If the image x is normal, pixels or regions with larger
reconstruction errors are often also assigned larger uncertainties, suppressing
their normalized reconstruction errors.
3 Experiments
3.1 Experimental Setup
Datasets. Our method is tested on two publicly available chest X-ray datasets:
1) the RSNA Pneumonia Detection Challenge dataset1 and 2) the pediatric chest
X-ray dataset2. The RSNA dataset is a subset of ChestX-ray14 [19]; it contains
26,684 X-rays: 8,851 normal, 11,821 no lung opacity/not normal, and 6,012 lung
opacity. The pediatric dataset consists of 5,856 X-rays from normal children and
patients with pneumonia.
Protocol. For the RSNA dataset, we used 6,851 normal images for training,
1,000 normal and 1,000 abnormal images for testing. On this dataset, our method
was tested under three settings: 1) normal vs. lung opacity; 2) normal
vs. not normal; and 3) normal vs. all (lung opacity and not normal). For the
pediatric dataset, 1,249 normal images were used for training, and the original
author-provided test set was used to evaluate the performance. The test set
contains 234 normal images and 390 abnormal images. All images were resized
to 64 × 64 pixels, and the pixel values of each image were normalized to [−1, 1].
The area under the ROC curve (AUC) is used to evaluate performance, together
with the equal error rate (EER) and the F1-score (at the EER).
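The paper does not specify how AUC is computed; a standard formulation is the Mann–Whitney rank statistic, sketched below (the function name is ours):

```python
import numpy as np

def auc_from_scores(normal_scores, abnormal_scores):
    """AUC as the probability that a randomly drawn abnormal image
    receives a higher abnormality score than a randomly drawn normal
    one (Mann-Whitney U statistic, with ties counted as 0.5)."""
    normal = np.asarray(normal_scores, dtype=float)
    abnormal = np.asarray(abnormal_scores, dtype=float)
    greater = (abnormal[:, None] > normal[None, :]).sum()
    ties = (abnormal[:, None] == normal[None, :]).sum()
    return (greater + 0.5 * ties) / (len(abnormal) * len(normal))

# Perfect separation gives AUC = 1.0; identical distributions give 0.5.
assert auc_from_scores([0.1, 0.2], [0.8, 0.9]) == 1.0
assert auc_from_scores([0.5, 0.5], [0.5, 0.5]) == 0.5
```

The EER is then the operating point on the same ROC curve where the false positive rate equals the false negative rate.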
Implementation. The backbone of our method is a convolutional autoencoder.
The network is symmetric, consisting of an encoder and a decoder. The encoder
contains four layers (each a 4 × 4 convolution with stride 2), followed by two
fully connected layers with output sizes of 2048 and 16, respectively. The decoder
mirrors the encoder: two fully connected layers followed by four transposed
convolutions. The channel sizes are 16-32-64-64 for the encoder and 64-64-32-16
for the decoder. All convolutions and
transposed convolutions are followed by batch normalization and ReLU nonlin-
earity except for the last output layer. We trained our model for 250 epochs. The
optimization was done using the Adam optimizer with a learning rate 0.0005.
For numerical stability, we did not directly predict σ² in Eq. (3); instead, the
uncertainty output by the model is the log variance, log σ².
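The log-variance parameterization can be illustrated as follows; this is a sketch of the stability trick just described, with names of our own choosing:

```python
import numpy as np

def uncertainty_nll_stable(x, mu, log_var):
    """Loss of Eq. (3) rewritten in terms of a log-variance output:
    exp(-log_var) replaces 1/sigma^2, so the network can emit any real
    number without risking division by a zero or negative variance."""
    return np.mean(np.exp(-log_var) * (x - mu) ** 2 + log_var)

# Equivalent to the direct form err^2 / sigma^2 + log(sigma^2)
# whenever sigma^2 = exp(log_var) is well behaved.
x = np.array([0.5, -0.3])
mu = np.array([0.0, 0.0])
log_var = np.array([0.2, -1.0])
sigma2 = np.exp(log_var)
direct = np.mean((x - mu) ** 2 / sigma2 + np.log(sigma2))
assert np.isclose(uncertainty_nll_stable(x, mu, log_var), direct)
```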
3.2 Evaluations
Baselines. Our method is compared with three baselines as well as state-of-the-
art methods for anomaly detection; the compared methods are summarized below.
1 https://www.kaggle.com/c/rsna-pneumonia-detection-challenge.
2 https://doi.org/10.17632/rscbjbr9sj.3.
Table 1. Comparison with other methods under different metrics. Bold face indicates
the best result, and italics the second best.
Fig. 2. Exemplar reconstructions of normal (rows 1–2) and abnormal (rows 3–4) test
images. x is the input; x′, x″, and μ(x) are the reconstructions from AE, f-AnoGAN,
and our method, respectively; operators are pixel-wise. Green bounding boxes mark
abnormal regions.
Fig. 3. Histograms of abnormality score for normal (blue) and abnormal (red) images
in the test set (RSNA Setting-1). Left: without uncertainty normalization. Right: with
uncertainty normalization. Scores are normalized to [0, 1] in each subfigure. (Color
figure online)
Ablation Study. Table 2 shows that incorporating the uncertainty loss into the
autoencoder alone (i.e., without uncertainty normalization) does not improve
performance (Table 2, ‘without-U’, AUC = 0.68, similar to that of the vanilla
AE). In contrast, the uncertainty-normalized abnormality score (‘with-U’)
largely improves performance. Interestingly, adding skip connections downgraded
the performance.
Table 2. Ablation study on RSNA Setting-1. ‘U’ denotes uncertainty output. ‘0’–‘4’:
number of skip connections between encoder and decoder convolutional layers, with ‘1’
for the connection between encoder’s last and decoder’s first convolutional layers.
Skip connections 0 1 2 3 4
with-U 0.89 0.62 0.50 0.44 0.38
without-U 0.68 0.43 0.38 0.33 0.33
4 Conclusion
References
1. Alaverdyan, Z., Jung, J., Bouet, R., Lartizien, C.: Regularized siamese neural net-
work for unsupervised outlier detection on brain multiparametric magnetic res-
onance imaging: application to epilepsy lesion screening. Med. Image Anal. 60,
101618 (2020)
2. Baur, C., Wiestler, B., Albarqouni, S., Navab, N.: Fusing unsupervised and super-
vised deep learning for white matter lesion segmentation. In: International Confer-
ence on Medical Imaging with Deep Learning, pp. 63–72 (2019)
3. Chen, X., Konukoglu, E.: Unsupervised detection of lesions in brain MRI using
constrained adversarial auto-encoders. In: International Conference on Medical
Imaging with Deep Learning (2018)
4. Chen, X., Pawlowski, N., Glocker, B., Konukoglu, E.: Unsupervised lesion detection
with locally Gaussian approximation. In: Suk, H.-I., Liu, M., Yan, P., Lian, C. (eds.)
MLMI 2019. LNCS, vol. 11861, pp. 355–363. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32692-0_41
5. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: Proceedings
of the International Conference on Learning Representations (2014)
6. Dorta, G., Vicente, S., Agapito, L., Campbell, N.D., Simpson, I.: Structured uncer-
tainty prediction networks. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 5477–5485 (2018)
7. Gong, D., et al.: Memorizing normality to detect anomaly: memory-augmented
deep autoencoder for unsupervised anomaly detection. In: Proceedings of the IEEE
International Conference on Computer Vision, pp. 1705–1714 (2019)
8. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Infor-
mation Processing Systems, pp. 2672–2680 (2014)
9. He, Y., Zhu, C., Wang, J., Savvides, M., Zhang, X.: Bounding box regression with
uncertainty for accurate object detection. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 2888–2897 (2019)
538 Y. Mao et al.
10. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning
for computer vision? In: Advances in Neural Information Processing Systems, pp.
5574–5584 (2017)
11. Mourão-Miranda, J., et al.: Patient classification as an outlier detection problem:
an application of the one-class support vector machine. NeuroImage 58(3), 793–804
(2011)
12. Sabokrou, M., Khalooei, M., Fathy, M., Adeli, E.: Adversarially learned one-class
classifier for novelty detection. In: Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, pp. 3379–3388 (2018)
13. Schlegl, T., Seeböck, P., Waldstein, S.M., Langs, G., Schmidt-Erfurth, U.:
f-AnoGAN: fast unsupervised anomaly detection with generative adversarial
networks. Med. Image Anal. 54, 30–44 (2019)
14. Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsu-
pervised anomaly detection with generative adversarial networks to guide marker
discovery. In: Niethammer, M., et al. (eds.) IPMI 2017. LNCS, vol. 10265, pp.
146–157. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59050-9_12
15. Schölkopf, B., Williamson, R.C., Smola, A.J., Shawe-Taylor, J., Platt, J.C.: Sup-
port vector method for novelty detection. In: Advances in Neural Information Pro-
cessing Systems, pp. 582–588 (2000)
16. Seeböck, P., et al.: Unsupervised identification of disease marker candidates in
retinal OCT imaging data. IEEE Trans. Med. Imaging 38(4), 1037–1047 (2018)
17. Sidibe, D., et al.: An anomaly detection approach for the identification of DME
patients using spectral domain optical coherence tomography images. Comput.
Methods Programs Biomed. 139, 109–117 (2017)
18. Tang, Y.X., Tang, Y.B., Han, M., Xiao, J., Summers, R.M.: Abnormal chest x-ray
identification with generative adversarial one-class classifier. In: IEEE International
Symposium on Biomedical Imaging, pp. 1358–1361 (2019)
19. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8:
hospital-scale chest x-ray database and benchmarks on weakly-supervised classi-
fication and localization of common thorax diseases. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 2097–2106 (2017)
20. Ziegler, G., Ridgway, G.R., Dahnke, R., Gaser, C., Initiative, A.D.N., et al.: Indi-
vidualized Gaussian process-based prediction and detection of local and global gray
matter abnormalities in elderly subjects. NeuroImage 97, 333–348 (2014)