Highlights:
Automatic and accurate esophageal lesion classification and segmentation are of great significance for clinically estimating the lesion status of esophageal diseases and making suitable diagnostic schemes. Due to individual variations and visual similarities of lesions in shape, color, and texture, current clinical methods remain subject to potential misjudgment risks and time-consumption issues. In this paper, we propose an Esophageal Lesion Network (ELNet) for automatic esophageal lesion classification and segmentation using deep convolutional neural networks (DCNNs). The underlying method automatically integrates dual-view contextual lesion information, extracting global features and local features for esophageal lesion classification of four esophageal image types (Normal, Inflammation, Barrett, and Cancer), and proposes a lesion-specific segmentation network for automatic pixel-level annotation of three esophageal lesion types. On an established clinical large-scale database of 1051 white-light endoscopic images, ten-fold cross-validation is used for method validation. Experiment results show that the proposed framework achieves classification with a sensitivity of 0.9034, a specificity of 0.9718, and an accuracy of 0.9628, and segmentation with a sensitivity of 0.8018, a specificity of 0.9655, and an accuracy of 0.9462. All of these indicate that our method enables efficient, accurate, and reliable esophageal lesion diagnosis in clinical practice.
The main contributions of our work can be summarized as follows:
1. For the first time, the proposed ELNet enables automatic, reliable, and comprehensive esophageal lesion classification of four esophageal image types (Normal, Inflammation, Barrett, and Cancer) and lesion-specific segmentation from clinical white-light esophageal images, helping clinicians make suitable and rapid diagnostic schemes.
2. A novel Dual-Stream Network (DSN) is proposed for esophageal lesion classification. DSN automatically integrates dual-view contextual lesion information using two CNN streams to complementarily extract the global features from the holistic esophageal images and the local features from the lesion patches.
3. Lesion-specific esophageal lesion annotation with a Segmentation Network with Classification (SNC) strategy is proposed to automatically annotate three lesion types (Inflammation, Barrett, Cancer) at the pixel level and to reduce the intra-class differences of esophageal lesions.
4. A clinical large-scale esophageal database is established for esophageal lesion classification and segmentation. This database includes 1051 white-light endoscopic images covering four different lesion types. Each image in this database has a classification label and its corresponding segmentation annotation.
I. Introduction
Accurate classification and segmentation of esophageal lesions are effective tools that help clinicians make reliable diagnostic schemes, which depend heavily on image analysis of potential lesions. (1) Accurate classification
of esophageal lesions is important because it reveals the esophageal lesion status, which can further determine the prognosis of patients with esophageal lesions (Hu, Hu et al. 2010). In advanced stages, the five-year survival rate of esophageal cancer is 20.9%, while it is greater than 85% in the early stages (Janurova and Bris 2014). (2) Accurate segmentation of esophageal lesions can provide the annotation information of lesion regions and enable elaborate feature analysis of lesion sizes, shapes, and colors. Classification and segmentation of esophageal lesions are indispensable and complementary, together providing comprehensive information for a thorough understanding of esophageal lesions in clinical studies.
Changes in the esophagus mucosa are closely related to the stages of cancerous development, which makes classifying and segmenting esophageal lesions highly significant for the clinician (Wang and Sampliner 2008). Different stages of esophageal lesions produce physiological and visual variations in the esophagus mucosa. Typical white-light images with three types of esophageal lesions and one normal type are shown in Fig. 1. The normal type shows no apparent lesion areas in Fig. 1(a). The type Inflammation is featured by red and white mucosa with strip shapes, as shown in Fig. 1(b). For the type Barrett, a clear boundary between normal areas and lesion areas appears in the epithelium, as shown in Fig. 1(c). Esophageal cancer refers to a malignant lesion formed by abnormal proliferation in the esophageal epithelium, which presents irregular mucosae and disordered or missing blood vessels in esophageal images, as shown in Fig. 1(d) (Zellerhoff, Lenze et al. 2016). These visual differences among lesions in esophageal images provide theoretical support for esophageal lesion classification.
However, due to individual lesion variations in shape, color, and texture, esophageal lesion detection that depends on clinician experience still suffers from potential misjudgments and time-consuming procedures. To overcome the aforementioned problems, automated esophageal lesion detection using computer vision methods can be used for lesion classification and segmentation (Domingues, Sampaio et al. 2019).
Accurate and automated classification and segmentation from esophageal images remain an open and challenging task because: 1) significant intra-lesion differences in shape, size, and color seriously hamper the classification and segmentation performance, as shown in Fig. 2(a); 2) inter-lesion similarities easily cause two different lesion types to fall into the same category, as shown in Fig. 2(b); 3) varying illumination and noise, such as specular reflection, easily produce negative impacts on esophageal lesion classification and segmentation (Tanaka, Fujiwara et al. 2018).
Fig. 1. Four types of white-light esophageal images: (a) The normal type shows no lesion. (b) The type Inflammation is featured by red and white mucosa with strip shapes. (c) The type Barrett is featured by a clear boundary between normal areas and lesion areas appearing in the epithelium. (d) The type Cancer is characterized by irregular mucosae and disordered or missing blood vessels.
Fig. 2. Intra-lesion differences and inter-lesion similarities make esophageal lesion classification and segmentation challenging. (a) Intra-lesion differences in shape, size, and color for the Inflammation lesion. (b) Inter-lesion similarities between the types Barrett and Normal.
Esophageal lesions vary considerably within each lesion type in shape, color, and size, which hampers capturing common lesion features. In this paper, to tackle this problem, we design the lesion-specific segmentation network to automatically annotate three lesion types (Inflammation, Barrett, Cancer) at the pixel level.
1.2. Contributions
In this paper, we propose the Esophageal Lesion Network (ELNet) based on deep CNNs to classify and segment esophageal lesions with four interdependent functional parts: the Preprocessing module, the Location module, the Classification module, and the Segmentation module. (1) To normalize esophageal images, reduce the obstruction of irrelevant information, and tackle the data imbalance problem, the Preprocessing module performs normalization, specular reflection removal, and data augmentation on the original esophageal images. (2) To highlight esophageal lesions, the Location module employs Faster R-CNN to focus on the ROIs of esophageal lesions. (3) To accurately predict esophageal lesion statuses, the Classification module is designed to classify four esophageal lesion types. (4) To obtain accurate annotation at the pixel level, the Segmentation module is employed to automatically segment three lesion types.
The main contributions of our work can be summarized as follows:
1. For the first time, we propose the ELNet for automatic, reliable, and comprehensive esophageal lesion classification and lesion-specific segmentation from clinical white-light esophageal images. It enables efficient esophageal lesion detection, helping clinicians make suitable and rapid diagnostic schemes.
2. A novel Dual-Stream Network (DSN) is proposed for esophageal lesion classification. DSN automatically integrates dual-view contextual lesion information using two CNN streams to complementarily extract the global and local features. It effectively improves the esophageal lesion classification performance for automatically predicting esophageal lesion statuses.
3. Lesion-specific esophageal lesion annotation with the Segmentation Network with Classification (SNC) strategy is proposed to reduce the intra-lesion differences when automatically segmenting three lesion types at the pixel level.
4. A clinical large-scale esophageal database is established for esophageal lesion classification and segmentation. This database includes 1051 white-light endoscopic images covering four different lesion types. Each esophageal image in this database has a classification label and its corresponding segmentation annotation.
Experiment results show that the proposed ELNet achieves classification results with a sensitivity of 0.9397, a specificity of 0.9825, and an accuracy of 0.9771, and segmentation results with a sensitivity of 0.8018, a specificity of 0.9655, and an accuracy of 0.9462. All of these indicate that our method enables efficient, accurate, and reliable esophageal lesion diagnosis in clinics.
The remainder of this paper is organized as follows. In Section II, we present the proposed ELNet for esophageal lesion classification and segmentation. In Section III, the implementation details of ELNet are reported. Experiments and evaluation validating the performance of ELNet are given in Section IV. Finally, we conclude the paper and discuss related future work in Section V.
II. Method
The proposed ELNet includes the following interdependent functional parts: (1) the Preprocessing module performs normalization, specular reflection removal, and data augmentation to normalize the esophageal images, reduce the obstruction of irrelevant information, and tackle the overfitting problem; (2) the Location module employs Faster R-CNN to highlight the ROIs of the esophageal lesions; (3) the Classification module utilizes the proposed DSN, consisting of a Global Stream and a Local Stream, to simultaneously extract global and local features for four-class esophageal lesion classification (Normal, Inflammation, Barrett, and Cancer); (4) the Segmentation module performs automatic esophageal lesion annotation at the pixel level using the proposed lesion-specific segmentation network. The main workflow is outlined in Fig. 3.
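To make the workflow concrete, a minimal Python sketch of how the four modules compose is given below; every function is a hypothetical stand-in for the corresponding module, not the paper's implementation.

# Minimal sketch of how the ELNet modules compose.
# All functions are hypothetical stand-ins, not the paper's actual code.

def preprocess(image):          # (1) normalization, specular removal, augmentation
    return image                # stub

def locate_lesions(image):      # (2) Faster R-CNN lesion ROI proposals
    return []                   # stub: list of (x, y, w, h) boxes

def classify_dsn(image, rois):  # (3) Dual-Stream Network over image + ROI patches
    return "Normal"             # stub: one of the four class labels

def segment(image, label):      # (4) lesion-specific segmentation network per type
    return None                 # stub: pixel-level mask

def elnet_pipeline(image):
    image = preprocess(image)
    rois = locate_lesions(image)
    label = classify_dsn(image, rois)
    mask = segment(image, label) if label != "Normal" else None
    return label, mask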
Fig. 3. Overview of the proposed ELNet for esophageal lesion classification and segmentation.
$$ s = \begin{cases} \dfrac{1}{2}(2r - g - b) = \dfrac{3}{2}(r - m), & \text{if } (b + r) > 2g \\ \dfrac{1}{2}(r + g - 2b) = \dfrac{3}{2}(m - b), & \text{if } (b + r) \le 2g \end{cases} \qquad (2) $$
where m is the pixel intensity, s is the saturation, and r, g, and b represent the red, green, and blue channels of the image. Specular reflections can be detected via two threshold values $m_{max}$ and $s_{max}$ based on the bi-dimensional histogram. A pixel p is part of the specular region if it meets the following conditions:
$$ \begin{cases} m_p \ge \dfrac{1}{2}\, m_{max} \\ s_p \le \dfrac{1}{3}\, s_{max} \end{cases} \qquad (3) $$
where $m_{max}$ and $s_{max}$ are the maximum values of $m_p$ and $s_p$ among all the pixels in an image, respectively. The related parameters are obtained by experiments with a large quantity of esophageal images.
(2) Correction: The Navier-Stokes method (Bertalmio, Bertozzi et al. 2001) performs the correction. The corrected images are obtained by replacing the specular reflection points with the average of neighboring pixel values. The images before and after specular reflection removal are depicted in Fig. 4.
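A minimal sketch of this two-step removal, assuming OpenCV's Navier-Stokes inpainting (cv2.INPAINT_NS) as the correction step and the thresholds of Eq. (3):

import cv2
import numpy as np

def remove_specular_reflections(bgr):
    # bgr: 8-bit color endoscopic image. Thresholds 1/2 and 1/3 follow Eq. (3).
    b, g, r = [c.astype(np.float32) for c in cv2.split(bgr)]
    m = (r + g + b) / 3.0                                        # pixel intensity
    s = np.where(b + r > 2 * g, 1.5 * (r - m), 1.5 * (m - b))    # saturation, Eq. (2)
    # Detection per Eq. (3): bright, weakly saturated pixels
    mask = ((m >= 0.5 * m.max()) & (s <= s.max() / 3.0)).astype(np.uint8) * 255
    # Correction: Navier-Stokes inpainting (Bertalmio et al. 2001)
    return cv2.inpaint(bgr, mask, 3, cv2.INPAINT_NS)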
Fig. 4. Specular reflection removal for original esophageal images (top) and corresponding images after specular reflection removal (bottom). The non-uniformity of the illumination caused by the deviation of the light sources generates specular reflections appearing as white spots in endoscopic images. The arrows in the original esophageal images point to these white spots.
$$ y = \frac{x - MinValue}{MaxValue - MinValue}, \qquad (1) $$
where x is the input pixel value of an esophageal image, MinValue and MaxValue are the minimum and maximum pixel values of this esophageal image, and y is the output pixel value after normalization.
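A one-line sketch of this min-max normalization (assuming Eq. (1) is the standard form, as the surrounding text indicates):

import numpy as np

def min_max_normalize(x):
    # Rescale pixel values to [0, 1] per Eq. (1); epsilon avoids division by zero.
    return (x - x.min()) / (x.max() - x.min() + 1e-8)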
2.1.3. Data Augmentation
Data augmentation tackles the over-fitting problem. In our experiment, it includes translation to simulate the left and right position changes of the gastroscope, rescaling to simulate the gastroscope stretch, rotation to simulate the rotational movement of the gastroscope, and flipping to simulate the mirroring of the gastroscope. A summary of the transformations with their parameters is given in TABLE 1, followed by a short code sketch.
TABLE 1.
Data Augmentation Parameters.
Transformation Type | Description
Rotation            | Randomly rotate by an angle in [0°, 360°]
Flipping            | 0 (without flipping) or 1 (with flipping)
Rescaling           | Randomly with a scale factor between 1/1.6 and 1.6
Translation         | Randomly with a shift between −10 and 10 pixels
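A minimal sketch of these transformations, assuming torchvision as the augmentation library and 512×512 inputs (per TABLE 2), so that a 10-pixel shift corresponds to 10/512 of the image width:

import torchvision.transforms as T

augment = T.Compose([
    T.RandomRotation(degrees=(0, 360)),             # rotation in [0, 360] degrees
    T.RandomHorizontalFlip(p=0.5),                  # flipping: 0 or 1
    T.RandomAffine(degrees=0,
                   translate=(10 / 512, 10 / 512),  # +/-10 pixel translation
                   scale=(1 / 1.6, 1.6)),           # rescaling factor
])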
[Fig. 6 diagram: the Global Stream (Conv 1-4 and Pool 1-5 with four residual blocks, taking the holistic image) and the Local Stream (Conv 1-4 and Pool 1-3 with two residual blocks, taking the clipped lesion patch) are fused and passed through a Softmax layer over the four classes Normal, Inflammation, Barrett, and Cancer.]
Fig. 6. The structure of the proposed Dual-Stream Network (DSN).
The Classification module performs the classification of four esophageal image types: Normal, Inflammation, Barrett, and Cancer. To automatically integrate dual-view contextual lesion information and accurately classify the four esophageal lesion types, the DSN is designed with two complementary streams: the Global Stream and the Local Stream. The Global Stream takes the holistic esophageal images as input and extracts global features reflecting the contrast between lesion regions and the background. The Local Stream takes the four types of lesion patches generated by the Location module as input and extracts local lesion features related to the textures, shapes, and colors of lesions. The structure of the DSN for esophageal lesions is depicted in Fig. 6.
The detailed configuration of the proposed DSN is shown in TABLE 2. Given the input data scale and image size, the Global Stream has 21 layers, including 16 convolution layers and 5 pooling layers. The stride of each convolution layer is set to 1 to capture the lesion features, except for Conv 1, whose stride is 2 to reduce the computation parameters in the Global Stream. The Local Stream is designed to have 13 layers, including 10 convolution layers and 3 pooling layers. Conv 4 is added in the Local Stream to keep the same output size as the Global Stream. Each convolution layer is followed by a batch normalization layer and a ReLU activation layer.
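To make the dual-stream idea concrete, here is a heavily simplified PyTorch sketch; the layer counts and the fusion-by-concatenation step are assumptions based on Fig. 6 and TABLE 2, and the real streams are far deeper.

import torch
import torch.nn as nn

class DualStreamNet(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        def stream(width=64):
            # Tiny stand-in for one CNN stream: conv/BN/ReLU blocks + pooling.
            return nn.Sequential(
                nn.Conv2d(3, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(width, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.global_stream = stream()   # input: 512x512 holistic image
        self.local_stream = stream()    # input: 64x64 lesion patch
        self.classifier = nn.Linear(64 * 2, num_classes)

    def forward(self, image, patch):
        # Fusion by concatenating the two feature vectors (assumed from Fig. 6).
        fused = torch.cat([self.global_stream(image), self.local_stream(patch)], dim=1)
        return self.classifier(fused)   # logits; softmax applied as in Eq. (8)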
written as:
$$ \hat{y}_m = \frac{e^{x_m}}{\sum_{k=1}^{4} e^{x_k}}, \qquad (8) $$
where $\hat{y}_m$ is the output probability of the m-th class and $x_m$ represents the m-th input neuron from the upper layer.
We choose the cross-entropy loss as the objective function of DSN to accelerate training. The cross-entropy loss function is given by (9):
$$ loss = -\operatorname{mean}\big( y \log(\hat{y}_m) + (1 - y) \log(1 - \hat{y}_m) \big), \qquad (9) $$
where y is the label vector and $\hat{y}_m$ is the predicted output vector of the proposed DSN.
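The paper writes Eq. (9) in a binary element-wise form; a common multi-class instantiation over the four softmax outputs is PyTorch's nn.CrossEntropyLoss, sketched below (the framework choice is an assumption).

import torch
import torch.nn as nn

logits = torch.randn(8, 4)            # batch of 8 predictions over 4 classes
labels = torch.randint(0, 4, (8,))    # ground-truth class indices
probs = torch.softmax(logits, dim=1)              # Eq. (8)
loss = nn.CrossEntropyLoss()(logits, labels)      # multi-class form of Eq. (9);
# combines log-softmax with the negative log-likelihood in one call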
TABLE 2
The detailed configuration of the proposed DSN for esophageal lesion classification.
Dual-Stream Network (DSN)
Global Stream: Layer | Kernel Size, Channel Number | Output Size
Data            | -        | 512×512
Conv 1          | 3×3, 64  | 256×256
Pool 1          | 2×2, 64  | 128×128
Residual Conv 1 | 1×1, 32  | 128×128
Local Stream: Layer | Kernel Size, Channel Number | Output Size
Data            | -        | 64×64
Conv 1          | 3×3, 64  | 64×64
Pool 1          | 2×2, 64  | 32×32
Residual Conv 1 | 1×1, 32  | 32×32
3.1. Material.
The clinical database, containing 1051 standard white-light endoscopic images from 748 patients, was obtained from the First Affiliated Hospital of Nanjing Medical University between July 2017 and July 2018. The inclusion criterion for this database is the availability of conventional white-light endoscopy and pathologic analysis. Images of poor quality and images captured from patients undergoing surgical or endoscopic resection are excluded. All the included esophageal images contain pixel-level lesion annotations manually marked by licensed physicians and one of four esophageal lesion labels based on strict histological proof.
Ten-fold cross-validation is used: 80 percent of the esophageal images are used for training, and 10 percent are used for testing. The remaining images are used as the validation dataset to optimize the training parameters and improve the generalization capability of the proposed network. Note that there is no data overlap between the training, validation, and test datasets. The detailed statistics of the collected esophageal images are given in TABLE 3.
TABLE 3
Statistics of the esophageal image database.
           | Normal | Inflammation | Cancer | Barrett
Train      | 203    | 345          | 100    | 207
Validation | 25     | 43           | 12     | 25
Test       | 26     | 44           | 14     | 27
Total      | 254    | 412          | 126    | 259
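One common way to realize this 80/10/10 rotation in code is sketched below; the exact fold assignment is not specified in the paper, so the rotation scheme here is an assumption.

import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(1051)                      # one index per esophageal image
kf = KFold(n_splits=10, shuffle=True, random_state=0)
folds = [test for _, test in kf.split(indices)]
for i in range(10):
    test_idx = folds[i]                        # 1 fold (10%) for testing
    val_idx = folds[(i + 1) % 10]              # 1 fold (10%) for validation
    train_idx = np.concatenate(                # remaining 8 folds (80%) for training
        [f for k, f in enumerate(folds) if k not in (i, (i + 1) % 10)])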
$$ SENS = \frac{TP}{TP + FN}, \qquad (10) $$
$$ SPEC = \frac{TN}{FP + TN}, \qquad (11) $$
$$ ACC = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (12) $$
where:
TP, True Positives: the number of positive samples (foreground pixels) correctly classified.
FP, False Positives: the number of negative samples (background pixels) wrongly classified as positive.
TN, True Negatives: the number of negative samples (background pixels) correctly classified.
FN, False Negatives: the number of positive samples (foreground pixels) wrongly classified as negative.
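A small sketch of Eqs. (10)-(12) computed from binary masks:

import numpy as np

def pixel_metrics(pred, gt):
    # Sensitivity, specificity, accuracy from binary masks (assumes both
    # foreground and background pixels are present in the ground truth).
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)       # foreground correctly classified
    tn = np.sum(~pred & ~gt)     # background correctly classified
    fp = np.sum(pred & ~gt)      # background wrongly classified as foreground
    fn = np.sum(~pred & gt)      # foreground wrongly classified as background
    sens = tp / (tp + fn)                     # Eq. (10)
    spec = tn / (fp + tn)                     # Eq. (11)
    acc = (tp + tn) / (tp + tn + fp + fn)     # Eq. (12)
    return sens, spec, acc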
Fig. 8. ROC curves and AUC values of the DSN (Red), the Global Network (Blue) and the Local Network (Green).
The confusion matrix is a quantitative graph used to reflect the performance of a classification method on a testing set (Zhang, Shao et al. 2006). The diagonal values represent the number of correct classifications for each class, and the other entries represent the confusion counts between every two classes. As shown in Fig. 9, the DSN reduces the confusion degree in comparison with the Global Stream, increasing the number of correct classifications for Inflammation (by 4) and Barrett (by 6). The type Cancer would achieve better performance if more data were available.
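For illustration, such a matrix can be computed with scikit-learn; the labels below are invented toy values, not results from the paper.

from sklearn.metrics import confusion_matrix

classes = ["Normal", "Inflammation", "Barrett", "Cancer"]
y_true = ["Normal", "Barrett", "Cancer", "Inflammation", "Barrett"]   # toy data
y_pred = ["Normal", "Barrett", "Inflammation", "Inflammation", "Barrett"]
cm = confusion_matrix(y_true, y_pred, labels=classes)
# Rows: true class; columns: predicted class; diagonal: correct counts.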
Fig. 9. Confusion matrices of (a) the proposed DSN and (b) the Global Stream on the esophageal image database.
To help understand the features extracted by the DSN, we compute class activation maps (CAMs) from the Dual-Stream Network (Selvaraju, Cogswell et al. 2017). The visualization results of the proposed DSN on the three esophageal lesion types are shown in Fig. 10. The closer a heatmap region is to red, the stronger the activation in the original image, which indicates that information from that area contributes more to the final decision. As can be seen from Fig. 10, the proposed DSN efficiently extracts esophageal lesion features, suppresses the irrelevant background information, and achieves excellent classification performance.
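A hedged sketch of the Grad-CAM computation cited above, written for a single-input classifier for brevity (the DSN itself takes two inputs); the hook placement, layer choice, and toy model are assumptions.

import torch

def grad_cam(model, conv_layer, image, class_idx):
    # Weight the chosen conv layer's feature maps by the pooled gradients
    # of the target class score (Selvaraju et al. 2017).
    feats, grads = [], []
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)   # global-average-pooled grads
    cam = torch.relu((weights * feats[0]).sum(dim=1))   # weighted sum + ReLU
    return cam / (cam.max() + 1e-8)                     # normalized heatmap

# Toy usage with a minimal stand-in model:
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 4))
cam = grad_cam(model, model[0], torch.randn(1, 3, 64, 64), class_idx=2)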
Fig. 10. Visualization results of three types of esophageal lesion images (top) and the corresponding heatmaps (bottom)
of the proposed DSN. The pixel areas related to esophageal lesions can be accurately highlighted by the proposed DSN
for three types of esophageal lesion images in our database.
The lesion-specific segmentation with the SNC strategy achieves the best segmentation performance in all three average metrics, with a sensitivity of 0.8018, a specificity of 0.9655, and an accuracy of 0.9462. Compared with the SNNC strategy, the lesion-specific segmentation with SNC significantly increases the effectiveness and stability in terms of accuracy (6.96% improvement), sensitivity (21.80% improvement), and specificity (6.87% improvement). Fig. 11 qualitatively compares the segmentation performance of the proposed SNC strategy and the SNNC strategy. The lesion-specific segmentation with the SNC strategy matches well with the ground truth made by the specialist. With the SNNC strategy, the segmentation results for the types Cancer and Inflammation produce relatively higher false positives at the pixel level due to under-fitting across the three esophageal lesion types. These gains of our method result from the fact that the lesion-specific segmentation provides an independent and efficient network for every esophageal lesion type, adapting to each lesion type and reducing false positives at the pixel level (a sketch of the SNC selection logic follows Fig. 11).
TABLE 6.
The quantitative segmentation results of the SNC strategy and the SNNC strategy for esophageal lesions. The results of the SNNC strategy are given in parentheses.
     | Inflammation    | Cancer          | Barrett         | Average
ACC  | 0.9282 (0.8806) | 0.9075 (0.7676) | 0.9915 (0.9152) | 0.9462 (0.8766)
SENS | 0.6909 (0.5824) | 0.8020 (0.8455) | 0.9387 (0.5315) | 0.8018 (0.5838)
SPEC | 0.9648 (0.9095) | 0.9337 (0.7374) | 0.9954 (0.9462) | 0.9655 (0.8968)
Fig. 11. Qualitative segmentation results of both the SNC and SNNC strategies for three types of esophageal lesions (Barrett, Cancer, and Inflammation). The top row shows the ground truth highlighted with a red line, and the rows below show the predicted masks (SNC and SNNC) highlighted with a green line. The three columns represent the three different types of esophageal lesions, respectively.
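To summarize the SNC strategy in code, here is a minimal sketch; the per-type architecture is an assumption (the paper only specifies one lesion-specific network per lesion type, for which any encoder-decoder such as a U-Net would fit).

import torch.nn as nn

def make_seg_net():
    # Placeholder per-type network: conv layers -> 1-channel mask logits.
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 1))

seg_nets = {t: make_seg_net() for t in ("Inflammation", "Barrett", "Cancer")}

def segment_with_classification(image, predicted_label):
    # SNC: the classifier's label routes the image to its lesion-specific net.
    if predicted_label == "Normal":
        return None                           # no lesion to annotate
    return seg_nets[predicted_label](image)   # pixel-level mask logits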
In future work: (1) the classification could be extended to finer cancer subtypes such as squamous cell carcinoma and adenocarcinoma (Asan and Nature 2017), which would benefit the estimation of esophageal lesion statuses and the making of suitable diagnostic schemes; (2) a semi-supervised CNN-based method for esophageal lesions is required due to the lack of classification labels and annotations when building larger clinical training databases (Ge, Yang et al. 2019, Ge, Yang et al. 2019, Yin, Zhao et al. 2019).
Acknowledgment
This research was supported in part by the State's Key Project of Research and Development Plan under Grants 2017YFA0104302, 2017YFC0109202, and 2017YFC0107900, in part by the National Natural Science Foundation under Grants 61801003, 61871117, and 81471752, and in part by the China Scholarship Council under No. 201906090145.
REFERENCES
Antony, J., K. McGuinness, N. E. O'Connor and K. Moran (2016). Quantifying radiographic knee
osteoarthritis severity using deep convolutional neural networks. 2016 23rd International
Conference on Pattern Recognition (ICPR), IEEE.
Asan, U., et al. (2017). "Integrated genomic characterization of oesophageal carcinoma." Nature 541(7636): 169.
Badrinarayanan, V., A. Kendall and R. Cipolla (2017). "Segnet: A deep convolutional encoder-decoder architecture for image segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12): 2481-2495.
Bertalmio, M., A. L. Bertozzi and G. Sapiro (2001). Navier-stokes, fluid dynamics, and image and
video inpainting. Proceedings of the 2001 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition. CVPR 2001, IEEE.
Chen, Y., H. Xie and H. Shin (2018). "Multi-layer fusion techniques using a CNN for multispectral pedestrian detection." IET Computer Vision 12(8): 1179-1187.
Domingues, I., I. L. Sampaio, H. Duarte, J. A. Santos and P. H. Abreu (2019). "Computer vision in esophageal cancer: a literature review." IEEE Access 7: 103080-103094.
Everingham, M., L. Van Gool, C. K. Williams, J. Winn and A. Zisserman (2007). "The PASCAL visual
object classes challenge 2007 (VOC2007) results."
Ge, R., G. Yang, Y. Chen, L. Luo, C. Feng, H. Ma, J. Ren and S. Li (2019). "K-net: Integrate left
ventricle segmentation and direct quantification of paired echo sequence." IEEE transactions on
medical imaging 39(5): 1690-1702.
Ge, R., G. Yang, Y. Chen, L. Luo, C. Feng, H. Zhang and S. Li (2019). "PV-LVNet: Direct left ventricle
multitype indices estimation from 2D echocardiograms of paired apical views with deep neural
networks." Medical image analysis 58: 101554.
Georgakopoulos, S. V., D. K. Iakovidis, M. Vasilakakis, V. P. Plagianakos and A. Koulaouzidis (2016).
Weakly-supervised convolutional learning for detection of inflammatory gastrointestinal lesions.
2016 IEEE international conference on imaging systems and techniques (IST), IEEE.
He, K., X. Zhang, S. Ren and J. Sun (2016). Deep residual learning for image recognition. Proceedings
of the IEEE conference on computer vision and pattern recognition.
Hong, J., B.-y. Park and H. Park (2017). Convolutional neural network classifier for distinguishing
Barrett's esophagus and neoplasia endomicroscopy images. 2017 39th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE.
Horie, Y., T. Yoshio, K. Aoyama, S. Yoshimizu, Y. Horiuchi, A. Ishiyama, T. Hirasawa, T. Tsuchida, T. Ozawa and S. Ishihara (2019). "Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks." Gastrointestinal Endoscopy 89(1): 25-32.
Hu, Y., C. Hu, H. Zhang, Y. Ping and L.-Q. Chen (2010). "How does the number of resected lymph nodes influence TNM staging and prognosis for esophageal carcinoma?" Annals of Surgical Oncology 17(3): 784-790.
Janurova, K. and R. Bris (2014). "A nonparametric approach to medical survival data: Uncertainty in
the context of risk in mortality analysis." Reliability Engineering & System Safety 125: 145-152.
Kandemir, M., A. Feuchtinger, A. Walch and F. A. Hamprecht (2014). Digital pathology: Multiple
instance learning can detect Barrett's cancer. 2014 IEEE 11th International Symposium on
Biomedical Imaging (ISBI), IEEE.
Kothari, S., H. Wu, L. Tong, K. E. Woods and M. D. Wang (2016). Automated risk prediction for
esophageal optical endomicroscopic images. 2016 IEEE-EMBS International Conference on
Biomedical and Health Informatics (BHI), IEEE.
Li, B. and M. Q.-H. Meng (2009). "Texture analysis for ulcer detection in capsule endoscopy images." Image and Vision Computing 27(9): 1336-1342.
Mendel, R., A. Ebigbo, A. Probst, H. Messmann and C. Palm (2017). Barrett’s esophagus analysis using
convolutional neural networks. Bildverarbeitung für die Medizin 2017, Springer: 80-85.
Noh, H., S. Hong and B. Han (2015). Learning deconvolution network for semantic segmentation.
Proceedings of the IEEE international conference on computer vision.
Qassim, H., A. Verma and D. Feinzimer (2018). Compressed residual-VGG16 CNN model for big data
places image recognition. 2018 IEEE 8th Annual Computing and Communication Workshop and
Conference (CCWC), IEEE.
Ronneberger, O., P. Fischer and T. Brox (2015). U-net: Convolutional networks for biomedical image
segmentation. International Conference on Medical image computing and computer-assisted
intervention, Springer.
Selvaraju, R. R., M. Cogswell, A. Das, R. Vedantam, D. Parikh and D. Batra (2017). Grad-cam: Visual
explanations from deep networks via gradient-based localization. Proceedings of the IEEE
international conference on computer vision.
Singh, H., S. Rote, A. Jada, E. D. Bander, G. J. Almodovar-Mercado, W. I. Essayed, R. Härtl, V. K. Anand, T. H. Schwartz and J. P. Greenfield (2018). "Endoscopic endonasal odontoid resection with real-time intraoperative image-guided computed tomography: report of 4 cases." Journal of Neurosurgery 128(5): 1486-1491.
Souza, L., C. Hook, J. P. Papa and C. Palm (2017). Barrett’s esophagus analysis using SURF features.
Bildverarbeitung für die Medizin 2017, Springer: 141-146.
Szegedy, C., S. Ioffe, V. Vanhoucke and A. A. Alemi (2017). Inception-v4, inception-resnet and the
impact of residual connections on learning. Thirty-first AAAI conference on artificial intelligence.
Tanaka, K., M. Fujiwara and H. Toyoda (2018). "An unlikely lesion to be identified in the cervical esophagus." Gastroenterology 155(3): 610-612.
Tchoulack, S., J. P. Langlois and F. Cheriet (2008). A video stream processor for real-time detection and
correction of specular reflections in endoscopic images. 2008 Joint 6th International IEEE
Northeast Workshop on Circuits and Systems and TAISA Conference, IEEE.
Van Der Sommen, F., S. Zinger and E. J. Schoon (2013). Computer-aided detection of early cancer in
the esophagus using HD endoscopy images. Medical Imaging 2013: Computer-Aided Diagnosis,
International Society for Optics and Photonics.
Van Der Sommen, F., S. Zinger, E. J. Schoon and P. De With (2014). "Supportive automatic annotation of early esophageal cancer using local gabor and color features." Neurocomputing 144: 92-106.
Wang, K. K. and R. E. Sampliner (2008). "Updated guidelines 2008 for the diagnosis, surveillance and therapy of Barrett's esophagus." American Journal of Gastroenterology 103(3): 788-797.
Wu, Z., C. Shen and A. Van Den Hengel (2019). "Wider or deeper: Revisiting the resnet model for visual recognition." Pattern Recognition 90: 119-133.
Graphical abstract
Fig. 3. Overview of the proposed ELNet for esophageal lesion classification and segmentation.
In this paper, we propose the Esophageal Lesion Network (ELNet) based on deep CNNs to classify and segment esophageal lesions with four interdependent functional parts: the Preprocessing module, the Location module, the Classification module, and the Segmentation module. (1) To normalize esophageal images, reduce the obstruction of irrelevant information, and tackle the data imbalance problem, the Preprocessing module performs normalization, specular reflection removal, and data augmentation on the original esophageal images. (2) To highlight esophageal lesions, the Location module employs Faster R-CNN to focus on the ROIs of esophageal lesions. (3) To accurately estimate esophageal lesion statuses and tackle the challenges of intra-lesion differences and inter-lesion similarities, the Classification module is designed to classify four esophageal lesion types (Normal, Inflammation, Barrett, and Cancer). (4) To obtain accurate annotation at the pixel level, the Segmentation module is employed to automatically segment three lesion types (Inflammation, Barrett, and Cancer).