Besides, Attention U-net can be deployed with minimal computational requirements while increasing model sensitivity and prediction accuracy.

For each sample in the training set, we apply the normalization step and construct a series of multi-scale images, as in the segmentation task, at three resolutions (180 × 180, 256 × 256, and 450 × 450) using a smoothing Gaussian kernel. This process is known as an image pyramid, and each image in the pyramid is called an octave. The strength of this transform lies in locating objects faster through a coarse-to-fine strategy and in letting the network exploit more information about an object across multiple resolution levels. As in the segmentation task, we combine the segmentation feature maps of each image in a scaled set, after processing by Attention U-net (figure 2), into a new feature vector for a fully connected layer at the next stage. This combination can be seen as an ensemble of Attention U-nets over several scales, which represents the variation of object shape, boundary, and structure at different sizes and thereby improves performance in identifying the lesion attribute.
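As a concrete illustration, the following sketch builds such a three-octave pyramid with OpenCV. The smoothing strength sigma and the interpolation mode are assumptions; the text only specifies a smooth Gaussian kernel and the three target resolutions.

```python
import cv2

# Target resolutions for the image pyramid ("octaves") fed to the
# multi-scale Attention U-nets.
SCALES = [(180, 180), (256, 256), (450, 450)]

def build_pyramid(image, sigma=1.0):
    """Return one Gaussian-smoothed copy of `image` per target scale.

    `sigma` is an assumed smoothing strength; the paper only states that
    a smooth Gaussian kernel is applied before rescaling.
    """
    octaves = []
    for (h, w) in SCALES:
        # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
        smoothed = cv2.GaussianBlur(image, ksize=(0, 0), sigmaX=sigma)
        octaves.append(cv2.resize(smoothed, (w, h),
                                  interpolation=cv2.INTER_LINEAR))
    return octaves
```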
Recalling that we apply the proposed framework separately to each lesion attribute, we can model the prediction task as five independent binary segmentation problems. Because labeled data are scarce (the Streaks attribute, for example, has non-empty masks for only 100 images), we use data augmentation to enlarge the training set. The selected augmentations are: random horizontal and vertical flips, random rotation between 0 and 30 degrees, image zooming by a random factor between 0.5 and 1.5, random image translation and shearing with values between 0.2 and 0.4, random shifts in the color channels, multiplication of image intensities by a random value between 0.7 and 1.3, and addition of Gaussian, salt, and pepper noise. A sketch of this pipeline follows.
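A minimal sketch of such a pipeline with the Keras ImageDataGenerator. The shift range of 0.3, the noise strengths, and the noise probability are illustrative assumptions; the text gives only the 0.2–0.4 range for translation and shearing and does not quantify the noise.

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def add_noise(img):
    """Add Gaussian or salt-and-pepper noise, then rescale intensities.
    Assumes images in [0, 1]; noise strengths are illustrative."""
    img = img.copy()
    if np.random.rand() < 0.5:
        img += np.random.normal(0.0, 0.05, img.shape)   # Gaussian noise
    else:
        mask = np.random.rand(*img.shape[:2])
        img[mask < 0.01] = 0.0                          # pepper
        img[mask > 0.99] = 1.0                          # salt
    # Multiply intensities by a random value in [0.7, 1.3].
    return np.clip(img * np.random.uniform(0.7, 1.3), 0.0, 1.0)

augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=30,          # rotations in [0, 30] degrees
    zoom_range=[0.5, 1.5],      # random zoom factor
    width_shift_range=0.3,      # translation (text says 0.2-0.4; 0.3 assumed)
    height_shift_range=0.3,
    shear_range=0.4,            # shearing
    channel_shift_range=0.2,    # random shift in color channels
    preprocessing_function=add_noise,
)
```

For segmentation, the same geometric transform must be applied to the image and its mask; in Keras this is typically arranged by running two generators with a shared random seed.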
Furthermore, instead of splitting the data into two separate parts for training and testing, we apply 5-fold cross-validation, exploiting all data for both training the network and evaluating performance, which we report as the average over the folds; a sketch follows.
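A sketch of this scheme with scikit-learn, assuming image and mask arrays `images` and `masks`; `build_attention_unet` and `evaluate_jaccard` are hypothetical placeholders for the model constructor and the scoring routine.

```python
import numpy as np
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kfold.split(images):
    model = build_attention_unet()                 # hypothetical constructor
    model.fit(images[train_idx], masks[train_idx],
              epochs=60, batch_size=8)             # 60 epochs/fold; batch size assumed
    fold_scores.append(evaluate_jaccard(model,     # hypothetical scoring helper
                                        images[val_idx], masks[val_idx]))
print("average Jaccard over folds:", np.mean(fold_scores))
```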
To keep the network parameters from being biased toward the negative class, we apply a modified Jaccard loss function [21] to evaluate segmentation quality:

$$L = 1 - \frac{\sum y_{\mathrm{truth}}\, y_{\mathrm{predict}} + \alpha_1}{\sum y_{\mathrm{truth}}^2 + \sum y_{\mathrm{predict}}^2 - \sum y_{\mathrm{truth}}\, y_{\mathrm{predict}} + \alpha_2} \qquad (4)$$

where $y_{\mathrm{predict}}$ and $y_{\mathrm{truth}}$ are the prediction and the corresponding ground-truth output for a training sample, and $\alpha_1$ and $\alpha_2$ are set to 0 and 1, respectively.
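Eq. (4) translates directly into a Keras loss. A sketch, assuming the sums run over all pixels in the batch (the text does not say whether they are taken per image or per batch):

```python
from keras import backend as K

ALPHA1, ALPHA2 = 0.0, 1.0   # alpha_1 and alpha_2 from Eq. (4)

def jaccard_loss(y_true, y_pred):
    """Modified Jaccard loss of Eq. (4)."""
    intersection = K.sum(y_true * y_pred)
    denom = K.sum(K.square(y_true)) + K.sum(K.square(y_pred)) - intersection
    return 1.0 - (intersection + ALPHA1) / (denom + ALPHA2)
```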
Finally, to derive the final binary mask, we apply a double-thresholding strategy [22]: the larger threshold T2 yields seed regions of the segmentation, these regions then grow in all directions, and the smaller threshold T1 is applied to neighbouring pixels to delineate the object boundary.
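A sketch of this strategy implemented as hysteresis thresholding via SciPy connected components; growing the seed regions through all connected pixels above T1 is one natural reading of the description, and the default threshold values are those found by the grid search reported in the Experiment section.

```python
import numpy as np
from scipy import ndimage

def double_threshold(prob, t1=0.64, t2=0.8):
    """Double thresholding [22]: pixels above t2 seed regions, which then
    grow through all connected pixels above t1."""
    seeds = prob > t2
    candidates = prob > t1
    labels, _ = ndimage.label(candidates)   # connected components above t1
    keep = np.unique(labels[seeds])         # components containing a seed
    return np.isin(labels, keep[keep > 0])
```

Read this way, the strategy is essentially hysteresis thresholding, as also provided by skimage.filters.apply_hysteresis_threshold.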
Experiment

We ran our experiments on an Intel Core i9 machine with 64 GB of RAM and two GeForce GTX 1080 Ti GPUs under Linux. We use the Keras library with TensorFlow as the back-end. Optimization uses the AMSGrad variant of the Adam optimizer [23], with a learning rate and weight decay of approximately 10^-4 for all layers. The kernel sizes of the convolution and max-pooling layers are 3 × 3 and 2 × 2, respectively, and the fully connected layer has size 180. A grid search finds optimal values of 0.8 for the T2 threshold and 0.64 for T1. A five-fold cross-validation scheme is applied for each lesion structure, with 60 epochs per fold, and performance is computed by averaging the per-fold results.
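This optimizer setup corresponds to a one-liner in Keras. A sketch, where `jaccard_loss` is the Eq. (4) loss above and `build_attention_unet` is the hypothetical constructor from the cross-validation sketch; reading the stated weight decay as the optimizer's learning-rate decay is an assumption.

```python
from keras.optimizers import Adam

# AMSGrad variant of Adam [23]; lr and decay of ~1e-4, as stated above.
model = build_attention_unet()   # hypothetical constructor (see sketch above)
model.compile(optimizer=Adam(lr=1e-4, decay=1e-4, amsgrad=True),
              loss=jaccard_loss)
```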
To be consistent with the standard requirements of the ISIC challenge, we use the Jaccard index as the main score. We compare our cross-validation results with the two top methods on the leaderboard for this challenge: NMN's method [11] and LeHealth's method [10]. NMN's method builds an ensemble of three deep learning architectures: ResNet [5], Inception-ResNet-v2 [24], and DenseNet169 [18]. LeHealth's method proposes a model called PSPNet (pyramid scene parsing network), developed on top of ResNet [5]. At the time of writing, the 2018 ISIC Challenge is closed to further submissions; we therefore take the results for these methods from the information published on the organizers' website.
Results by our visually explainable learning systems for skin lesion
Figure 4 illustrates predicted results of our method for three typical lesion attributes: pigment network, globules, and streaks; blue regions indicate ground-truth labels and red regions represent our predictions. Table 2 compares the performance of our system against the NMN and LeHealth teams. Our average Jaccard index is 0.278, which is 0.002 higher than LeHealth's method and 0.029 lower than the best approach. Our method surpasses the competitors in two of the five categories, Globules and Streaks, but performs poorly on the detection of Negative Network.
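For reference, the per-attribute scores in Table 2 are Jaccard indices between binary masks; a minimal implementation follows, where treating an empty ground truth matched by an empty prediction as a perfect score is an assumption about the challenge's convention.

```python
import numpy as np

def jaccard_index(y_true, y_pred):
    """Jaccard index between two binary masks (the challenge's main score)."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    union = np.logical_or(y_true, y_pred).sum()
    if union == 0:
        return 1.0   # both masks empty; convention assumed
    return np.logical_and(y_true, y_pred).sum() / union
```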
The critical point here is that, even though we use only the Attention U-net with a moderate network size, we still measure an accuracy comparable to the best method, which combines several large models such as ResNet and DenseNet. In other words, our framework has the potential to improve performance further by applying ensembling tactics. However, there is a trade-off between model complexity and the desired accuracy: in practice, such a prediction framework only serves as a supporting tool for doctors making the final diagnosis, so when there is no significant difference between two methods, model complexity may be the decisive factor, for example when embedding a diagnostic system in mobile devices.

Conclusion

In this work we made three new contributions. First, we proposed a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. Our method is built on Attention U-net with multi-scale images as input.

Second, we applied a new transfer-learning strategy that adapts the weights of a segmentation deep network. The main component of our framework is an Attention U-net whose weights are initialized from a segmentation network built for Task 1 of the ISIC 2018 challenge. In addition, we use a series of multi-scale images as input to exploit better intermediate feature representations. The experiments on Task 2 of the ISIC 2018 challenge show that the proposed method achieves performance close to the best approach with a significantly less complex model.

Third, we propose a visual explanation framework on top of these results: we were able to generate accurate deep visual explanations of model behaviour. With this third point, the likelihood of social impact is high: the implemented systems could be used immediately by a dermatologist without prior training. We described how to apply the machine learning and explanation models in practice and introduced a visual explanation framework for dermatologists. In addition, the machine learning models are based on public datasets (ISIC).

We applied new machine learning techniques to learn explainable visual models for the dermatology application domain and created an explanation interface for the dermatologist through the visualization of typical lesion attributes (figure 4). In future work, we will conduct an evaluation study in which dermatologists perform lesion classification with the help of the medical decision support provided by our visually explainable learning systems for skin lesion detection. Future work also includes measuring the effectiveness of visual explanations in decision-support experiments, as well as interactive machine learning experiments in which the doctor can identify and correct (visual) errors, fueling future research on the continuous training of machine learning models in deployed application contexts.
Figure 4: Results for three typical lesion attributes: Pigment Network (top row), Globules (middle row), and Streaks (bottom row). Blue regions indicate the ground-truth label; red regions indicate our prediction.
Table 2: Comparing our results with the NMN and LeHealth methods (Jaccard index); * marks the best method per attribute.

Method | Pigment Network | Globules | Milia-like Cysts | Negative Network | Streaks | Average
Our Method | 0.535 | 0.312* | 0.162 | 0.187 | 0.197* | 0.278
NMN's method | 0.544* | 0.252 | 0.165* | 0.285* | 0.123 | 0.307*
LeHealth's method | 0.482 | 0.239 | 0.132 | 0.225 | 0.145 | 0.276
References

[1] Rebecca L. Siegel, Kimberly D. Miller, and Ahmedin Jemal. Cancer statistics, 2016. CA: A Cancer Journal for Clinicians, 66(1):7–30, 2016.

[2] Nabin K. Mishra and M. Emre Celebi. An overview of melanoma detection in dermoscopy images using image processing and machine learning. arXiv preprint arXiv:1601.07843, 2016.

[3] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1902.03368, 2019.

[4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[6] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.

[7] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.

[8] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.

[9] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y. Hammerla, Bernhard Kainz, et al. Attention U-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.

[10] Jinyi Zou, Xiao Ma, Cheng Zhong, and Yao Zhang. Dermoscopic image analysis for ISIC challenge 2018. arXiv preprint arXiv:1807.08948, 2018.

[11] Navid Alemi Koohbanani, Mostafa Jahanifar, Neda Zamani Tajeddin, Ali Gooya, and Nasir Rajpoot. Leveraging transfer learning for segmenting lesions and their attributes in dermoscopy images. arXiv preprint arXiv:1809.10243, 2018.

[12] Gabriëlle Ras, Marcel van Gerven, and Pim Haselager. Explanation methods in deep learning: Users, values, concerns and challenges, pages 19–36. Springer International Publishing, Cham, 2018.

[13] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.

[14] Mengnan Du, Ninghao Liu, Qingquan Song, and Xia Hu. Towards explanation of DNN-based prediction with guided feature inversion. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, pages 1358–1367, New York, NY, USA, 2018. ACM.

[15] David Gunning and David Aha. DARPA's Explainable Artificial Intelligence (XAI) program. AI Magazine, 40(2):44–58, June 2019.

[16] Housam Khalifa Bashier Babiker and Randy Goebel. An introduction to deep visual explanation. CoRR, abs/1711.09482, 2017.

[17] François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1251–1258, 2017.

[18] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.

[19] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

[20] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

[21] Yading Yuan and Yeh-Chi Lo. Improving dermoscopic image segmentation with enhanced convolutional-deconvolutional networks. IEEE Journal of Biomedical and Health Informatics, 23(2):519–526, 2017.

[22] Yang Shen, Chen Shu-zhen, and Zhang Bing. An improved double-threshold method based on gradient histogram. Wuhan University Journal of Natural Sciences, 9(4):473–476, 2004.

[23] Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237, 2019.

[24] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.