Besides, Attention U-net can be deployed with minimal computational requirements while increasing model sensitivity and prediction accuracy.

For each sample in the training set, we apply the normalization step and construct a series of multi-scale images, as in the segmentation task, at three resolutions (180 × 180, 256 × 256, and 450 × 450) using a smoothing Gaussian kernel. This process is known as an image pyramid, and each image in the pyramid is called an octave. The strength of this transform lies in locating objects faster through a coarse-to-fine strategy and in letting the network exploit more information about an object across multiple resolution levels. As in the segmentation task, we combine the segmentation feature maps of each image in a scaled set, after processing by Attention U-net (figure 2), into a new feature vector for a fully connected layer at the next stage. This combination can be seen as an ensemble of Attention U-nets over several scales, which represents the variation of object shape, boundary, and structure at different sizes and thereby improves performance in identifying the lesion attribute.
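As a concrete illustration, the following sketch builds such a three-octave pyramid with OpenCV. The smoothing strength sigma and the interpolation mode are assumptions; the text only specifies a smooth Gaussian kernel and the three target resolutions.

```python
import cv2

# Target resolutions for the image pyramid ("octaves") fed to the
# multi-scale Attention U-nets.
SCALES = [(180, 180), (256, 256), (450, 450)]

def build_pyramid(image, sigma=1.0):
    """Return one Gaussian-smoothed copy of `image` per target scale.

    `sigma` is an assumed smoothing strength; the paper only states that
    a smooth Gaussian kernel is applied before rescaling.
    """
    octaves = []
    for (h, w) in SCALES:
        # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
        smoothed = cv2.GaussianBlur(image, ksize=(0, 0), sigmaX=sigma)
        octaves.append(cv2.resize(smoothed, (w, h),
                                  interpolation=cv2.INTER_LINEAR))
    return octaves
```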
Recalling that we apply the proposed framework separately to each lesion attribute, we can model the prediction task as five independent binary segmentation problems. Because labeled data are scarce (the Streaks attribute, for example, has non-empty masks for only 100 images), we use data augmentation to enlarge the training set. The selected augmentations are: random horizontal and vertical flips, random rotation between 0 and 30 degrees, image zooming by a random factor between 0.5 and 1.5, random image translation and shearing with values between 0.2 and 0.4, random shifts in the color channels, multiplication of image intensities by a random value between 0.7 and 1.3, and addition of Gaussian, salt, and pepper noise. A sketch of this pipeline follows.
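A minimal sketch of such a pipeline with the Keras ImageDataGenerator. The shift range of 0.3, the noise strengths, and the noise probability are illustrative assumptions; the text gives only the 0.2–0.4 range for translation and shearing and does not quantify the noise.

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def add_noise(img):
    """Add Gaussian or salt-and-pepper noise, then rescale intensities.
    Assumes images in [0, 1]; noise strengths are illustrative."""
    img = img.copy()
    if np.random.rand() < 0.5:
        img += np.random.normal(0.0, 0.05, img.shape)   # Gaussian noise
    else:
        mask = np.random.rand(*img.shape[:2])
        img[mask < 0.01] = 0.0                          # pepper
        img[mask > 0.99] = 1.0                          # salt
    # Multiply intensities by a random value in [0.7, 1.3].
    return np.clip(img * np.random.uniform(0.7, 1.3), 0.0, 1.0)

augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=30,          # rotations in [0, 30] degrees
    zoom_range=[0.5, 1.5],      # random zoom factor
    width_shift_range=0.3,      # translation (text says 0.2-0.4; 0.3 assumed)
    height_shift_range=0.3,
    shear_range=0.4,            # shearing
    channel_shift_range=0.2,    # random shift in color channels
    preprocessing_function=add_noise,
)
```

For segmentation, the same geometric transform must be applied to the image and its mask; in Keras this is typically arranged by running two generators with a shared random seed.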
Furthermore, instead of splitting the data into two separate parts for training and testing, we apply 5-fold cross-validation, exploiting all data for both training the network and evaluating performance, which we report as the average over the folds; a sketch follows.
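A sketch of this scheme with scikit-learn, assuming image and mask arrays `images` and `masks`; `build_attention_unet` and `evaluate_jaccard` are hypothetical placeholders for the model constructor and the scoring routine.

```python
import numpy as np
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kfold.split(images):
    model = build_attention_unet()                 # hypothetical constructor
    model.fit(images[train_idx], masks[train_idx],
              epochs=60, batch_size=8)             # 60 epochs/fold; batch size assumed
    fold_scores.append(evaluate_jaccard(model,     # hypothetical scoring helper
                                        images[val_idx], masks[val_idx]))
print("average Jaccard over folds:", np.mean(fold_scores))
```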
To keep the network parameters from being biased toward the negative class, we apply a modified Jaccard loss function [21] to evaluate segmentation quality:

$$L = 1 - \frac{\sum y_{\mathrm{truth}}\, y_{\mathrm{predict}} + \alpha_1}{\sum y_{\mathrm{truth}}^2 + \sum y_{\mathrm{predict}}^2 - \sum y_{\mathrm{truth}}\, y_{\mathrm{predict}} + \alpha_2} \qquad (4)$$

where $y_{\mathrm{predict}}$ and $y_{\mathrm{truth}}$ are the prediction and the corresponding ground-truth output for a training sample, and $\alpha_1$ and $\alpha_2$ are set to 0 and 1, respectively.
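Eq. (4) translates directly into a Keras loss. A sketch, assuming the sums run over all pixels in the batch (the text does not say whether they are taken per image or per batch):

```python
from keras import backend as K

ALPHA1, ALPHA2 = 0.0, 1.0   # alpha_1 and alpha_2 from Eq. (4)

def jaccard_loss(y_true, y_pred):
    """Modified Jaccard loss of Eq. (4)."""
    intersection = K.sum(y_true * y_pred)
    denom = K.sum(K.square(y_true)) + K.sum(K.square(y_pred)) - intersection
    return 1.0 - (intersection + ALPHA1) / (denom + ALPHA2)
```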
Finally, to derive the final binary mask, we apply a double-thresholding strategy [22]: the larger threshold T2 yields seed regions of the segmentation, these regions then grow in all directions, and the smaller threshold T1 is applied to neighbouring pixels to delineate the object boundary.
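A sketch of this strategy implemented as hysteresis thresholding via SciPy connected components; growing the seed regions through all connected pixels above T1 is one natural reading of the description, and the default threshold values are those found by the grid search reported in the Experiment section.

```python
import numpy as np
from scipy import ndimage

def double_threshold(prob, t1=0.64, t2=0.8):
    """Double thresholding [22]: pixels above t2 seed regions, which then
    grow through all connected pixels above t1."""
    seeds = prob > t2
    candidates = prob > t1
    labels, _ = ndimage.label(candidates)   # connected components above t1
    keep = np.unique(labels[seeds])         # components containing a seed
    return np.isin(labels, keep[keep > 0])
```

Read this way, the strategy is essentially hysteresis thresholding, as also provided by skimage.filters.apply_hysteresis_threshold.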
Experiment

We ran our experiments on an Intel Core i9 machine with 64 GB of RAM and two GeForce GTX 1080 Ti GPUs under Linux. We use the Keras library with TensorFlow as the back-end. Optimization uses the AMSGrad variant of the Adam optimizer [23], with a learning rate and weight decay of approximately 10^-4 for all layers. The kernel sizes of the convolution and max-pooling layers are 3 × 3 and 2 × 2, respectively, and the fully connected layer has size 180. A grid search finds optimal values of 0.8 for the T2 threshold and 0.64 for T1. A five-fold cross-validation scheme is applied for each lesion structure, with 60 epochs per fold, and performance is computed by averaging the per-fold results.
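This optimizer setup corresponds to a one-liner in Keras. A sketch, where `jaccard_loss` is the Eq. (4) loss above and `build_attention_unet` is the hypothetical constructor from the cross-validation sketch; reading the stated weight decay as the optimizer's learning-rate decay is an assumption.

```python
from keras.optimizers import Adam

# AMSGrad variant of Adam [23]; lr and decay of ~1e-4, as stated above.
model = build_attention_unet()   # hypothetical constructor (see sketch above)
model.compile(optimizer=Adam(lr=1e-4, decay=1e-4, amsgrad=True),
              loss=jaccard_loss)
```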
To be consistent with the standard requirements of the ISIC challenge, we use the Jaccard index as the main score. We compare our cross-validation results with the two top methods on the leaderboard for this challenge: NMN's method [11] and LeHealth's method [10]. NMN's method builds an ensemble of three deep learning architectures: ResNet [5], Inception-ResNet-v2 [24], and DenseNet169 [18]. LeHealth's method proposes a model called PSPNet (pyramid scene parsing network), developed on top of ResNet [5]. At the time of writing, the 2018 ISIC Challenge is closed to further submissions; we therefore take the results for these methods from the information published on the organizers' website.
Results by our visually explainable learning systems for skin lesion
Figure 4 illustrates predicted results of our method for three typical lesion attributes: pigment network, globules, and streaks; blue regions indicate ground-truth labels and red regions represent our predictions. Table 2 compares the performance of our system against the NMN and LeHealth teams. Our average Jaccard index is 0.278, which is 0.002 higher than LeHealth's method and 0.029 lower than the best approach. Our method surpasses the competitors in two of the five categories, Globules and Streaks, but performs poorly on the detection of Negative Network.
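For reference, the per-attribute scores in Table 2 are Jaccard indices between binary masks; a minimal implementation follows, where treating an empty ground truth matched by an empty prediction as a perfect score is an assumption about the challenge's convention.

```python
import numpy as np

def jaccard_index(y_true, y_pred):
    """Jaccard index between two binary masks (the challenge's main score)."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    union = np.logical_or(y_true, y_pred).sum()
    if union == 0:
        return 1.0   # both masks empty; convention assumed
    return np.logical_and(y_true, y_pred).sum() / union
```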
The critical point here is that, even though we use only the Attention U-net with a moderate network size, we still measure an accuracy comparable to the best method, which combines several large models such as ResNet and DenseNet. In other words, our framework has the potential to improve performance further by applying ensembling tactics. However, there is a trade-off between model complexity and the desired accuracy: in practice, such a prediction framework only serves as a supporting tool for doctors making the final diagnosis, so when there is no significant difference between two methods, model complexity may be the decisive factor, for example when embedding a diagnostic system in mobile devices.

Conclusion

In this work we made three new contributions. First, we proposed a new approach to automatically predict the locations of visual dermoscopic attributes for Task 2 of the ISIC 2018 Challenge. Our method is built on Attention U-net with multi-scale images as input.

Second, we applied a new transfer-learning strategy that adapts the weights of a segmentation deep network. The main component of our framework is an Attention U-net whose weights are initialized from a segmentation network built for Task 1 of the ISIC 2018 challenge. In addition, we use a series of multi-scale images as input to exploit better intermediate feature representations. The experiments on Task 2 of the ISIC 2018 challenge show that the proposed method achieves performance close to the best approach with a significantly less complex model.

Third, we propose a visual explanation framework on top of these results: we were able to generate accurate deep visual explanations of model behaviour. With this third point, the likelihood of social impact is high: the implemented systems could be used immediately by a dermatologist without prior training. We described how to apply the machine learning and explanation models in practice and introduced a visual explanation framework for dermatologists. In addition, the machine learning models are based on public datasets (ISIC).

We applied new machine learning techniques to learn explainable visual models for the dermatology application domain and created an explanation interface for the dermatologist through the visualization of typical lesion attributes (figure 4). In future work, we will conduct an evaluation study in which dermatologists perform lesion classification with the help of the medical decision support provided by our visually explainable learning systems for skin lesion detection. Future work also includes measuring the effectiveness of visual explanations in decision-support experiments, as well as interactive machine learning experiments in which the doctor can identify and correct (visual) errors, fueling future research on the continuous training of machine learning models in deployed application contexts.
Figure 4: Results for three typical lesion attributes: Pigment Network (top row), Globules (middle row), and Streaks (bottom row). Blue regions indicate the ground-truth label; red regions indicate our prediction.
Table 2: Comparing our results with the NMN and LeHealth methods (Jaccard index); * marks the best method per attribute.

Method | Pigment Network | Globules | Milia-like Cysts | Negative Network | Streaks | Average
Our Method | 0.535 | 0.312* | 0.162 | 0.187 | 0.197* | 0.278
NMN's method | 0.544* | 0.252 | 0.165* | 0.285* | 0.123 | 0.307*
LeHealth's method | 0.482 | 0.239 | 0.132 | 0.225 | 0.145 | 0.276
References

[1] Rebecca L. Siegel, Kimberly D. Miller, and Ahmedin Jemal. Cancer statistics, 2016. CA: A Cancer Journal for Clinicians, 66(1):7–30, 2016.

[2] Nabin K. Mishra and M. Emre Celebi. An overview of melanoma detection in dermoscopy images using image processing and machine learning. arXiv preprint arXiv:1601.07843, 2016.

[3] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1902.03368, 2019.

[4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[6] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.

[7] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.

[8] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.

[9] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y. Hammerla, Bernhard Kainz, et al. Attention U-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.

[10] Jinyi Zou, Xiao Ma, Cheng Zhong, and Yao Zhang. Dermoscopic image analysis for ISIC challenge 2018. arXiv preprint arXiv:1807.08948, 2018.

[11] Navid Alemi Koohbanani, Mostafa Jahanifar, Neda Zamani Tajeddin, Ali Gooya, and Nasir Rajpoot. Leveraging transfer learning for segmenting lesions and their attributes in dermoscopy images. arXiv preprint arXiv:1809.10243, 2018.

[12] Gabriëlle Ras, Marcel van Gerven, and Pim Haselager. Explanation methods in deep learning: Users, values, concerns and challenges, pages 19–36. Springer International Publishing, Cham, 2018.

[13] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.

[14] Mengnan Du, Ninghao Liu, Qingquan Song, and Xia Hu. Towards explanation of DNN-based prediction with guided feature inversion. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, pages 1358–1367, New York, NY, USA, 2018. ACM.

[15] David Gunning and David Aha. DARPA's Explainable Artificial Intelligence (XAI) program. AI Magazine, 40(2):44–58, June 2019.

[16] Housam Khalifa Bashier Babiker and Randy Goebel. An introduction to deep visual explanation. CoRR, abs/1711.09482, 2017.

[17] François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1251–1258, 2017.

[18] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.

[19] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

[20] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

[21] Yading Yuan and Yeh-Chi Lo. Improving dermoscopic image segmentation with enhanced convolutional-deconvolutional networks. IEEE Journal of Biomedical and Health Informatics, 23(2):519–526, 2017.

[22] Yang Shen, Chen Shu-zhen, and Zhang Bing. An improved double-threshold method based on gradient histogram. Wuhan University Journal of Natural Sciences, 9(4):473–476, 2004.

[23] Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237, 2019.

[24] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.