
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 23, NO. 2, MARCH 2019, p. 547

DermaKNet: Incorporating the Knowledge of Dermatologists to Convolutional Neural Networks for Skin Lesion Diagnosis

Iván González-Díaz, Member, IEEE

Abstract—Traditional approaches to automatic diagnosis of skin lesions consisted of classifiers working on sets of hand-crafted features, some of which modeled lesion aspects of special importance for dermatologists. Recently, the broad adoption of convolutional neural networks (CNNs) in most computer vision tasks has brought about a great leap forward in terms of performance. Nevertheless, with this performance leap, the CNN-based computer-aided diagnosis (CAD) systems have also brought a notable reduction of the useful insights provided by hand-crafted features. This paper presents DermaKNet, a CAD system based on CNNs that incorporates specific subsystems modeling properties of skin lesions that are of special interest to dermatologists, aiming to improve the interpretability of its diagnosis. Our results prove that the incorporation of these subsystems not only improves the performance, but also enhances the diagnosis by providing more interpretable outputs.

Index Terms—Skin lesion analysis, melanoma, convolutional neural networks, dermoscopy, CAD.

I. INTRODUCTION

EARLY Melanoma Diagnosis is one of the traditional fields of application of Computer Aided Diagnosis (CAD) systems. In addition to the high incidence and aggressiveness of melanoma (it is the skin cancer that causes the most deaths in Europe [1]), there are other aspects that make it an especially suitable field for automatic diagnosis methods. For example, the early removal of the lesion completely cures the disease, effectively preventing metastasis [2]. Melanocytes are among the very few cells that are naturally colored and visible to the eye, which makes it possible to diagnose them using clinical images. Also, with the use of portable and affordable acquisition instruments, such as dermatoscopes, the accuracy of the diagnosis can be improved by 5–30% [3]. As a result, there is a growing interest in incorporating automatic systems into the daily practice of dermatologists, aiming not to replace their diagnosis, but to improve it by providing valuable information about the clinical case, and by serving as filtering tools that automatically detect those cases with a high confidence of benignity, which can have a great impact on the final number of moles that must be analyzed by the clinicians.

However, despite the research efforts devoted to the topic, these systems have yet to become part of everyday clinical practice. From our point of view, there are two factors currently hampering the adoption of CAD systems by dermatologists. Firstly, the lack of large, open, annotated datasets, containing images of lesions gathered by different medical institutions and a great variety of dermatoscopes, has undermined the generalization capability of developed CAD systems, leading to poor results when applied to different datasets. Additionally, it has prevented standard and fair comparisons between proposed methods, thus hindering scientific advances in the field. Secondly, most CAD systems simply provide a tentative diagnosis to the clinicians, which does not actually help them much in practice. Hence, it would be more desirable for these systems to be able to provide some insight about the elements and properties of the lesion that support the diagnosis.

In regard to the first factor, the International Skin Imaging Collaboration: Melanoma Project (ISIC¹) is an academia-industry partnership created to facilitate digital skin imaging technologies to help reduce melanoma mortality. In addition to developing standards to address the technologies, techniques, and terminology used in skin imaging, ISIC is continuously building an open-source, public-access archive (ISIC Archive²) of skin images that allows researchers to assess and validate their CAD systems. The archive is large and includes images acquired using different devices from multiple medical institutions. Furthermore, since 2016, the association has also been promoting research in the field by organizing an International Challenge in which automatic methods for lesion segmentation, dermoscopic feature detection and skin disease diagnosis are evaluated using images of the archive [4], [5].

With respect to the second factor, in the last few years there have been changes in machine learning technology that have increased the difficulty of interpreting the results of CAD systems. Whereas traditional approaches relied on low-level handcrafted features computed over the lesion [6], [7], some of

Manuscript received October 13, 2017; revised December 5, 2017 and January 15, 2018; accepted February 12, 2018. Date of publication February 16, 2018; date of current version March 6, 2019. This work was supported in part by the National Grant TEC2014-53390-P and National Grant TEC2014-61729-EXP of the Spanish Ministry of Economy and Competitiveness, and in part by NVIDIA Corporation with the donation of the TITAN X GPU. The author is with the Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Madrid 28045, Spain (e-mail: [email protected]). Digital Object Identifier 10.1109/JBHI.2018.2806962

¹ http://isdis.net/isic-project/
² https://isic-archive.com/

2168-2194 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

them modeling aspects of special importance for dermatologists in their diagnosis [8], [9], modern approaches, such as [10] and [11], have adopted the use of Convolutional Neural Networks (CNNs), due to their impressive performance in many computer vision tasks such as classification [12], [13], detection [14] and segmentation [15], [16]. The drawback of CNN-based systems is the lack of a clear understanding of the underlying factors and properties that support the final decision.

This paper presents DermaKNet (Dermatologist Knowledge Network), a CAD system for automatic diagnosis of skin lesions that aims to keep the best of both alternatives and, consequently, incorporates the intuitions of dermatologists into a CNN-based framework. By developing novel computational blocks in the net, we model properties of the lesions that are known to be discriminative for clinicians. The benefit of our approach is twofold: firstly, as we will demonstrate in the experimental section, the performance of the diagnosis is improved and, secondly, the interpretability of the system can be enhanced by analyzing the outputs of these expert-inspired blocks. In particular, our system includes several elements, not found in general-purpose classification CNNs, that become the main contributions of this work:
- A Dermoscopic Structure Segmentation Network, which segments the lesion area into a set of high-level dermoscopic features corresponding to global and local structures that have turned out to be of special interest for dermatologists in their diagnosis. In the absence of strongly annotated data, we have trained this network from weakly-annotated clinical cases.
- A novel Modulation Block that incorporates these segmentations into the diagnosis process as probabilistic modulators of neuron activations.
- Two additional novel blocks, Polar Pooling and Asymmetry, that mimic the way in which dermatologists analyze skin lesions.
- 3-branch top layers in the diagnosis CNN, which provide the final diagnosis using both the traditional information channels and these novel pathways modeling expert intuitions.
- Some other elements with a great impact on the final system performance, such as a specifically tailored data augmentation process, or an external classifier based on non-visual meta-data.

The remainder of this paper is organized as follows: Section II reviews the related literature. In Section III we provide a general description of our method for automatic diagnosis of skin lesions. Sections IV and V present our Dermoscopic Structure Segmentation and Diagnosis networks, respectively. Section VI explains the experiments and discusses the results that support our method and, finally, Section VII summarizes our conclusions and outlines future lines of research.

II. RELATED WORK

Traditional approaches address the problem of automatic melanoma diagnosis using discriminative methods working over sets of hand-crafted visual features from dermoscopic images. These features vary from general-purpose descriptors, e.g., color and texture filter-banks [17]–[19], to problem-dependent knowledge-based features. The latter deserve more interest from our point of view, since they aim to model particular lesion aspects of special importance for dermatologists. Consequently, besides improving the system performance, they also enhance the interpretability of the automatic diagnosis [20]. In [8], the authors proposed a reduced set of interpretable features modeling some properties of the ABCD rule [21], such as symmetry and border sharpness. Along the same lines, some other methods start by detecting a set of dermoscopic structures that are later used to generate the diagnosis. Examples of these dermoscopic features include reticular patterns, dots and globules, streaks, etc. The complete set of structures that are commonly considered was defined in the pattern analysis method for melanoma diagnosis [3], which has been widely adopted by specialists due to its accurate results.

To tackle the problem of detecting dermoscopic features, classical segmentation techniques, such as Gaussian Mixture Models [22], Markov Random Fields [23] and Topic Models [9], or even discriminative approaches working over textons [24], have been adopted in the literature. Once the areas corresponding to some of these structures have been identified, a diagnosis can be inferred: in [25] the ABCD rule is combined with structure recognition in an attempt to detect suspicious lesions, in [26] the 7-point checklist method is applied to the outputs of these structure detectors, and in [9] probabilistic segmentation maps are used to build a set of specific classifiers, each one focusing on a particular structure, which are then fused to provide the final diagnosis.

During the last few years, with the advent and broad adoption of CNNs in many recognition problems in computer vision, several works have been proposed that apply this paradigm to melanoma classification. In [10] CNNs are combined with sparse coding and SVMs to provide a diagnosis. In [27] a Fully Convolutional Neural Network (FCNN) is first used to segment the input image into lesion area and surrounding skin; then a square and tight cropping is performed, and finally a diagnosis is provided using a CNN that is fine-tuned from the well-known ResNet model [13]. In [28], the authors trained a CNN using a very large dataset with 129,450 clinical images and 2,032 different diseases, and tested its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: malignant carcinomas versus benign seborrheic keratoses and malignant melanomas versus benign nevi. Their results show that the automatic system achieves performance similar to all tested experts across both tasks, demonstrating a level of competence comparable to dermatologists. However, despite their impressive performance when enough training data is available, CNN-based methods still lack a clear understanding of the underlying factors and properties that support their final decision, limiting their usability and preventing their broad adoption by dermatologists.

In this paper, we propose to incorporate knowledge-based interpretable properties of skin lesions into the framework of CNNs. Although Majtner et al. [29] have previously tried to

Fig. 1. Main processing pipeline of DermaKNet. Each clinical case is defined by an image X_c. The Lesion Segmentation Net first segments the image into areas corresponding to lesion and surrounding skin, giving rise to the binary mask M_c. Then, the Data Augmentation Module extends the initial visual support of the lesion and generates additional views X̃_c^v of the lesion by applying rotations and crops. Next, the Dermoscopic Structure Segmentation Network segments each lesion view into a set of high-level dermoscopic structures s. Finally, the whole set of lesion images X̃_c^v and their corresponding segmentation maps S_c^{v,s} are passed to the Diagnosis Network, which generates the diagnosis.
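As a minimal illustration of this pipeline, the data flow can be sketched as follows. Every function here is a hypothetical placeholder for the corresponding module (LSN, Data Augmentation, DSSN, DN and the meta-data classifier); only the ordering of the steps follows the paper, none of this is the authors' code.

```python
# Illustrative skeleton of the DermaKNet pipeline (placeholders only).

def lesion_segmentation_net(image):
    """LSN stand-in: a binary mask M_c (here trivially marking everything)."""
    return [[1 for _ in row] for row in image]

def augment(image, mask, n_rotations=8, n_crops=3):
    """Data Augmentation stand-in: 8 rotations x 3 crops -> 24 views."""
    return [(image, mask) for _ in range(n_rotations * n_crops)]

def dermoscopic_structure_net(view):
    """DSSN stand-in: 8 segmentation maps, one per dermoscopic structure
    (collapsed here to one scalar per structure)."""
    return [0.0] * 8

def diagnosis_net(views, structure_maps):
    """DN stand-in: a tentative malignancy score for the clinical case."""
    return 0.5

def dermaknet(image, metadata_score=None):
    mask = lesion_segmentation_net(image)                    # step 1
    views = augment(image, mask)                             # step 2
    maps = [dermoscopic_structure_net(v) for v, _ in views]  # step 3
    score = diagnosis_net(views, maps)                       # step 4
    if metadata_score is not None:                           # step 5: factorize
        score *= metadata_score                              # with the SVM score
    return score
```

The real networks of course replace each stub, but the factorized final score in `dermaknet` mirrors the way the non-visual meta-data classifier modulates the visual diagnosis.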

fuse hand-crafted features with CNNs, their approach simply fused the outputs of two independent classifiers (one based on hand-crafted features and the other using a CNN) to generate the final diagnosis. To the best of our knowledge, this is the first attempt to achieve a seamless integration between the knowledge of dermatologists and CNNs. For that purpose, we have developed several novel processing blocks, with the dual goal of improving the system performance and gaining interpretability in the diagnosis.

Fig. 2. Some examples of clinical cases (top) and the binary lesion segmentations computed by our Lesion Segmentation Module (bottom).

III. AN AUTOMATIC METHOD FOR SKIN LESION DIAGNOSIS

In this section we provide a general description of our CAD system and also explain those processing blocks that, although they have an important impact on the system performance, do not constitute the main contributions of our paper. These main contributions are described later in their own sections.

A. General Description of the System

The main pipeline of DermaKNet is depicted in Fig. 1. It comprises the following steps:
1) For each clinical case c, a dermoscopic image X_c is first passed to the Lesion Segmentation Network (LSN), which generates a binary mask M_c outlining the area of the image corresponding to the lesion. A description of this module is given in Section III-B.
2) Next, the pair {X_c, M_c} goes through the Data Augmentation Module. This module extends the initial visual support of the lesion and generates additional views v of the lesion by applying rotations and crops. Hence, the output of this module is an extended set of images X̃_c^v representing the clinical case. Section III-C provides a detailed description of the data augmentation process.
3) The following step is performed by the Dermoscopic Structure Segmentation Network (DSSN). It aims to segment each view of the lesion X̃_c^v into a set of eight dermoscopic features corresponding to global and local structures that have turned out to be relevant for dermatologists in their daily practice. Examples of these structures are dots/globules, regression areas, streaks, etc. The output of this subsystem is a set of 8 segmentation maps S_c^{v,s}, s = 1...8, each associated with one of the considered structures. This module, as well as the format of the segmentation maps, is introduced in Section IV.
4) The augmented set {X̃_c^v, S_c^{v,s}} is passed to the Diagnosis Network (DN), which provides a tentative diagnosis for the clinical case. The description of this network can be found in Section V.
5) If additional non-visual meta-data about the lesion (e.g., patient age, sex, etc.) are available, the previous diagnosis is further factorized using the score of a classifier working over this non-visual information to produce the final diagnosis Y_c. This classifier is described in Section III-D.

B. Lesion Segmentation Network (LSN)

The Lesion Segmentation Network (LSN) has been developed by training a Fully Convolutional Network (FCN) [15]. FCNs have achieved state-of-the-art results on the task of semantic image segmentation (general content), as demonstrated in the PASCAL VOC Segmentation task [30]. In order to train a network for our particular task of lesion-skin segmentation, we have used the training set for the lesion segmentation task in the 2017 ISBI challenge [5].

In Fig. 2 we show various examples of lesion segmentations computed by this module. Let us note that the goal is not to generate very accurate segmentation maps, but to produce

Fig. 3. Illustration of the process of data augmentation: (a) original image, (b) rotated image, (c) largest inner rectangle containing pixels of the lesion, (d)–(f) three square croppings containing partial views of the lesion.

Fig. 4. Example of a rotated and cropped view of a lesion and its Normalized Polar Coordinates. (Left) View of the lesion. (Middle) Values of the normalized ratio. (Right) Angle.

binary masks M_c that broadly identify the area of the image that corresponds to the lesion.

C. Data Augmentation Module and Normalized Polar Coordinates

It is well known that data augmentation often boosts the performance of deep neural networks, mainly when the amount of available training data is limited. Among all the potential image variations and artifacts, invariance to orientation is probably the main requirement in our particular scenario, as dermatologists do not follow a specific protocol regarding orientation during the acquisition of an image with the dermatoscope. More complex geometric transformations, such as affine or projective transformations, are less interesting since the dermatoscope is normally placed just over, and orthogonally to, the lesion surface.

Based on these observations, the particular process of data augmentation for a given clinical case c is illustrated in Fig. 3 and described next:
1) First, starting from the pair {X_c, M_c}, we generate a set of rotated versions [see Fig. 3(b)].
2) Since rotating an image without losing any visual information requires adding new areas that did not exist in the original view, we find and crop the largest inner rectangle in which all pixels belong to the original image [see Fig. 3(c)]. We have observed that removing these black areas (using the inner rectangle), at the expense of losing some regions of the lesion, produces better results than keeping the whole lesion together with the black regions. The rationale is that, although some lesion details may be partially lost in some of the views, we consider all of them by analyzing the whole augmented set of lesion views. However, this makes it necessary to perform data augmentation at test time as well.
3) Finally, since our subsequent CNNs (Structure Segmentation and Diagnosis) require square input images of 256 × 256 pixels, we perform various square crops which are in turn re-sized to these dimensions [see Fig. 3(d)–(f)].

In particular, in our approach we have considered 8 rotations and 3 crops for each clinical case c, leading to an augmented set of 24 images, each one represented by a tensor X̃_c^v ∈ R^{256×256×3}, with v = 1...24.

In addition, during the data augmentation process we compute the pixel Normalized Polar Coordinates for each generated view X̃_c^v. The goal of these coordinates is to provide invariance against shifts, rotations, changes in size, and even irregular shapes of the lesions in subsequent processing steps. To do so, we transform the original Cartesian pixel coordinates (x_i, y_i) into the normalized polar coordinates (r_i, θ_i), where r_i ∈ [0, 1] and θ_i ∈ [0, 2π) stand for the normalized ratio and angle, respectively. The transformation is computed as follows: first, the binary mask of the lesion is approximated by an ellipse with the same second-order moments. Then, we learn an affine matrix A that transforms the ellipse into a normalized (unit-radius) circle centered at location (0, 0). Finally, we map each pixel in the original lesion to its projection in the normalized circle, and obtain the normalized polar coordinates as the ratio and angle computed from the projected pixel coordinates. Fig. 4 shows an example of a rotated and cropped view of a lesion and its corresponding normalized polar coordinates.

D. Considering Non-Visual Lesion Meta-Data in the Diagnosis

As we will describe in detail in Section VI, the proposed model has been used to participate in the 2017 ISBI Challenge on Skin Lesion Analysis Towards Melanoma Detection. In this challenge, additional valuable meta-data was provided with the images that can help automatic systems improve their performance, namely: a) the approximate age of the patient, rounded to 5-year intervals (or 'unknown' if not available), and b) sex, containing the gender of the patient (or 'unknown' if not available). Hence, we have complemented the outputs of the CNN-based system with the score provided by a Support Vector Machine (SVM) [31] working on this external non-visual meta-data. Since both the age and sex variables are discrete, we have transformed them into numerical inputs following this process: for each considered category c and meta-data M, we model the corresponding likelihoods p(M|c) as discrete random variables. Then, given a new clinical case, the input to the SVM is computed by evaluating the likelihood of the current sample. Furthermore, we have also included an additional input feature computed as the relative area of the lesion with respect to the total size of the image.
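The normalized polar transform of Section III-C can be sketched in a few lines of plain Python. This is our own illustrative re-implementation, not the authors' code; it assumes uniform filled-ellipse moments, so the ellipse semi-axes are taken as 2√λ of the mask-covariance eigenvalues.

```python
import math

def normalized_polar_coords(mask_pixels, query_pixels):
    """Map pixels to Normalized Polar Coordinates (r, theta).

    mask_pixels: list of (x, y) lesion pixels (support of the binary mask).
    The lesion is approximated by the ellipse with the same second-order
    moments; an affine map sends that ellipse to the unit circle at (0, 0),
    and (r, theta) are read off the projected coordinates.
    """
    n = len(mask_pixels)
    mx = sum(p[0] for p in mask_pixels) / n
    my = sum(p[1] for p in mask_pixels) / n
    # second-order central moments (covariance of the mask support)
    cxx = sum((p[0] - mx) ** 2 for p in mask_pixels) / n
    cyy = sum((p[1] - my) ** 2 for p in mask_pixels) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in mask_pixels) / n
    # eigen-decomposition of the 2x2 covariance matrix
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    d = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + d, tr / 2 - d
    if abs(cxy) > 1e-12:
        v1 = (l1 - cyy, cxy)
    else:
        v1 = (1.0, 0.0) if cxx >= cyy else (0.0, 1.0)
    nv = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / nv, v1[1] / nv)
    v2 = (-v1[1], v1[0])
    # a uniform filled ellipse with semi-axes (a, b) has variances
    # (a^2/4, b^2/4), so the semi-axes are 2*sqrt(eigenvalue)
    a, b = 2 * math.sqrt(max(l1, 1e-12)), 2 * math.sqrt(max(l2, 1e-12))
    out = []
    for (x, y) in query_pixels:
        dx, dy = x - mx, y - my
        u = (dx * v1[0] + dy * v1[1]) / a   # project onto principal axes
        w = (dx * v2[0] + dy * v2[1]) / b   # and normalize by the semi-axes
        out.append((math.hypot(u, w), math.atan2(w, u) % (2 * math.pi)))
    return out
```

For a roughly circular mask of radius R, a boundary pixel maps to r ≈ 1 and the center to r = 0, matching the unit-radius normalization described above.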

As shown in Fig. 1, a probabilistic output of this SVM is then factorized with the output of the Diagnosis Network to provide the final diagnosis Y_c of the system.

IV. DERMOSCOPIC STRUCTURE SEGMENTATION NETWORK (DSSN)

The goal of the Dermoscopic Structure Segmentation Network is the following: given an input view of the lesion X̃_c^v, it aims to provide a segmentation considering a pre-defined set of dermoscopic features that correspond with global and local structures of special interest for dermatologists in their diagnosis.

A. Considered Dermoscopic Structures

In this work we have considered a set of eight structures:
1. Dots, globules and cobblestone pattern [32, pp. 15–17]: although different, they have been fused into one class for the purpose of system development due to their visual similarities. These patterns consist of a certain number of round or oval elements, variously sized, with shades that can be brown and gray-black. In the case of cobblestone structures, they are usually larger, more densely grouped and somewhat angulate. In general, they are often located in lesion areas that are growing. While an even spatial distribution with regular size and shape is associated with benignity, various sizes and shapes, or an irregular or localized distribution, usually occur in melanoma. Depending on their relative extent in the lesion area, these features can be identified as either local structures or global patterns.
2. Reticular pattern and pigmented networks [32, pp. 10–13]: they cover most parts of certain lesions. They look like grids of thin brown lines over a light brown background and are quite common in melanocytic lesions. If globally distributed, this structure is related to benign lesions. However, variations in size and form are indicative of malignancy. Depending on their relative extent in the lesion area, these features can be identified as either local structures or global patterns.
3. Homogeneous areas [32, pp. 14–15]: these areas are diffuse, with brown, grey-black, grey-blue or reddish-black shade, where there is no other local feature that can be recognized. A globally distributed pattern of bluish hue is the hallmark of the blue nevus. With other shades, it may be present in several types of lesions, such as Clark nevi, dermal nevi or nodular and metastatic melanomas. Depending on their relative extent in the lesion area, these features can be identified as either local structures or global patterns.
4. Regression [32, pp. 20–21]: these structures are generally well-defined white and/or blue areas that appear when the immune system has attacked the lesion. White areas resemble a superficial scar, and blue areas may appear as diffuse blue-gray areas or peppering, which is an aggregation of blue-grey dots. Regression areas are always considered local structures.
5. Blue-white veil [32, pp. 22–23]: a region of grey-blue to whitish-blue blurred pigmentation, correlated with pigmented network disorder (globules or streaks) and highly indicative of melanoma. This structure is always considered local in our annotations.
6. Streaks [32, pp. 17–18]: black or light to dark brown longish structures of variable thickness, not clearly combined with pigmented networks, and easily observed when located at the periphery of the lesion. In general, they tend to converge toward the center of the lesion. An even, radial distribution of the streaks around the border of the lesion is characteristic of Reed nevus. However, an asymmetric or localized distribution of streaks suggests malignancy. This structure is always local, and spatially localized on the lesion borders.
7. Vascular structures [32, p. 23]: homogeneous areas with vessels. Depending on their shape, they may be a clear sign of malignancy. While abundant and prominent comma vessels often exist in dermal nevi, some other vascular patterns, such as arborizing, hairpin or linear irregular ones, are more frequent in melanomas. These structures are always considered local in our annotations.
8. Unspecific pattern: we group in this category those parts of the lesion that cannot be assigned to any of the previous structures. No direct diagnostic implication can be inferred from it. Nevertheless, it is more often related to melanoma, or at least it suggests that the lesion must be carefully explored. Depending on its relative extent in the lesion area, this feature can be identified as either local or global in our annotations.

B. A Weak Learning Approach for Segmentation

The main challenge in developing the DSSN is the annotation of the training dataset. A traditional supervised approach would require a ground-truth pixel-wise segmentation for each training image. This kind of strong annotation is often hard to obtain, as it demands a huge effort from the dermatologists to manually outline the segmentations of the structures. Alternatively, providing weak image-level labels indicating only which dermoscopic structures are present in a lesion is much easier for dermatologists and becomes more affordable. Hence, following this alternative approach, we asked dermatologists of a collaborating medical institution, the Hospital Doce de Octubre in Madrid, to annotate the ISIC 2016 training dataset [4] with the presence or absence of the 8 aforementioned dermoscopic structures. In particular, we asked them to provide one label L(s) per structure s and clinical case: L(s) = 0 if the structure is not present, L(s) = 1 if it takes up just a local area of the lesion (local structure), and L(s) = 2 if it is present and dominant enough to be considered a global pattern in the lesion.

Given this weakly-annotated dataset, we have developed a segmentation network based on the method described in [33], where the authors introduced a Constrained Convolutional Neural Network for weakly supervised segmentation. For the sake of completeness, we include here those equations of the original model that accommodate the extensions and modifications for our particular scenario. For an in-depth discussion and

derivation of these equations, the interested reader is referred to where p− (s) = 1 − p+ (s), and 1[·] is an indicator function
the original paper [33]. which is evaluated only when the inner condition is satisfied.
To keep the notation simple, we omit the image index in the Given the probability distribution of an image stated in (2),
following paragraphs. Let us consider the dermoscopic structure the constrained CNN optimization for weakly-supervised seg-
segmentation as a pixel-wise labeling problem in which each mentation proposed in [33] is:
pixel i in the lesion area is labeled as belonging to a particular
structure si , s = 1...P = 8 or to a background class (s = 0). find θ
Passing the input image through the segmentation CNN will → −
− →
subject to A Q ≥ b (4)
produce a spatially reduced score map fi (si ; θ) (64 × 64 in our
case) at its top layer, where θ represents the set of parameters →

where Q is the vectorized form of the network output Q(S|θ),
of the CNN. Applying a parametric softmax over the network →

and A ∈ RK ×P N and b ∈ RK define K linear constraints over
scores, we can model the label of each pixel location i as a
the output distribution Q. Since this problem is not convex with
probabilistic random variable with value qi (si |θ):
respect to the network parameters θ, the authors defined a vari-
1 ational latent probability distribution P (S) over the semantic
qi (si |θ) = exp (γfi (si |θ)) (1) labels, which is independent of the CNN parameters θ, applied
Zi
the constraints to this new distribution rather than to the orig-
Here si is the random variable that represents the la- inal network output Q(S|θ), and enforced P (S) and Q(S|θ)
 (dermoscopic structure) at the location i and Zi =
bel to model the same probability distribution by minimizing the
\sum_{s=0}^{P=8} \exp(\gamma f_i(s|\theta)) is the partition function at the location i. The utility and appropriateness of the parameter \gamma, which was not included in the original model, will be discussed later on.

At this point, we have introduced another modification to the original model. Our problem is highly unbalanced: whereas structures such as dots/globules or reticular patterns are very common, others like blue-white veil or regression patterns are less frequent. Moreover, the frequency of a structure does not correspond with its impact on the diagnosis, and less frequent patterns are in general more indicative of malignant lesions. We have observed that learning the model directly from the data leads to solutions that focus more on the correct segmentation of the most frequent patterns, while failing on those less-frequent but more meaningful patterns. To avoid such a situation, we have introduced weights that control the influence of the different structures in the learning process and produce more balanced segmentations. Hence, considering marginal independence, the probability distribution on an image can be factorized as:

    Q(S|\theta) = \prod_{i}^{N} q_i(s_i|\theta)^{w_i}    (2)

where N is the total number of pixels in the spatially-reduced image generated by the CNN. The weights w_i might simulate the repetition of a sample in the training data, thus giving it more influence over the learned model. Although these weights can control the influence of each pixel i in the image, in our case the value is the same for all pixels in the image, w_i = w, and represents a measure of the lesion abnormality, which depends on the weak ground-truth labels indicating the presence or absence of each dermoscopic structure. In fact, this weight w is inversely proportional to the likelihood of the present structures; if p^+(s) is the probability that a lesion contains the structure s, we compute w as:

    w = \frac{1}{P} \sum_{s=1}^{P} \left( \mathbb{1}[L(s) > 0]\, p^-(s) + \mathbb{1}[L(s) = 0]\, p^+(s) \right)    (3)

Kullback-Leibler divergence between them. The resulting formulation becomes a Lagrangian optimization problem and gives rise to the following update equation:

    p_i(s) = \frac{1}{Z_i} \exp\left(\gamma f_i(s;\theta) + A_{i,s}^T \lambda\right)    (5)

where \lambda \geq 0 are the dual variables introduced in the optimization, and Z_i = \sum_s \exp(\gamma f_i(s;\theta) + A_{i,s}^T \lambda) is the local partition in location i. Additionally, the final loss and its gradient needed by the optimization become:

    L(\theta) = -\sum_i \sum_{s_i} w_i\, p_i(s_i) \log q_i(s_i|\theta)    (6)

    \frac{\partial L(\theta)}{\partial f_i(s_i)} = \gamma w_i \left[ q_i(s_i|\theta) - p_i(s_i) \right]    (7)

The presence or absence of a dermoscopic structure, as well as additional cues about its size (global, local) or its location (borders, center) within the lesion, lead to particular constraints in the model. These constraints are applied over the accumulated probability P(s) = \sum_i p_i(s|\theta), computed over all pixel locations in the segmentation map. In contrast to the original formulation in (4), we can now apply the constraints to the latent distribution A\vec{P} \geq \vec{b}, considering the following cases:
• Absent structure: If a dermoscopic structure s is not present in an image, we impose one constraint that acts as an upper bound over the accumulated probability:

    \sum_{i=1}^{N} p_i(s) \leq 0    (8)

• Global structure: If a dermoscopic structure s is considered as a global pattern, we impose one constraint acting as a lower bound over the accumulated probability P(s), enforcing that a minimum area of the lesion corresponds to that structure:

    l_s \leq \sum_{i=1}^{N} p_i(s)    (9)
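As an illustration (not the paper's MatConvNet implementation), the accumulated-probability checks behind the absent/global constraints in (8) and (9) can be sketched in NumPy; the 0.5 lower-bound fraction mirrors the l_s = 0.5N choice stated in the text, and the function name is our own:

```python
import numpy as np

def check_constraints(seg, present, is_global, l_frac=0.5):
    """Sketch of the accumulated-probability constraints in Eqs. (8)-(9).

    seg:       (N, P) array of per-pixel structure probabilities p_i(s).
    present:   per-structure weak label, True if the structure appears.
    is_global: per-structure flag, True if it is a global pattern.
    Returns a boolean array saying whether each structure's constraint holds.
    """
    n_pixels, n_structs = seg.shape
    acc = seg.sum(axis=0)  # accumulated probability P(s) over all locations
    ok = np.ones(n_structs, dtype=bool)
    for s in range(n_structs):
        if not present[s]:
            ok[s] = acc[s] <= 0.0                 # Eq. (8): upper bound, absent structure
        elif is_global[s]:
            ok[s] = acc[s] >= l_frac * n_pixels   # Eq. (9): lower bound, global structure
    return ok
```

In the actual model these inequalities are not checked a posteriori but enforced during learning through the dual variables \lambda of Eq. (5).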
GONZÁLEZ-DÍAZ: DERMAKNET: INCORPORATING THE KNOWLEDGE OF DERMATOLOGISTS TO CNNs 553
Here l_s has been set to l_s = 0.5N, thus requiring that at least 50% of pixels in the lesion belong to that structure.
• Local structure: If a structure s is local, we impose two constraints acting as lower and upper bounds over the accumulated probability P(s), respectively:

    l_s \leq \sum_{i=1}^{N} p_i(s) \leq u_s    (10)

where l_s = 0.10N and u_s = 0.5N are the lower and upper bounds.
• Spatially localized structures: Since some of the structures tend to appear in particular locations of the lesion, we can enforce our model to learn this dependency. This is the case, for example, of the streak pattern, which only appears in the borders of the lesion. Hence, considering that the location of a dermoscopic feature is restricted to a certain region R, and defining R̄ as its complement, we impose two constraints:

    \sum_{i \in \bar{R}} p_i(s) \leq 0    (11)

    l_s \leq \sum_{i \in R} p_i(s)    (12)

where l_s = 0.5N_R, N_R being the number of pixels in region R. We define the region R using the Normalized Polar Coordinates introduced in Section III-C, which allows us, for example, to define ring-shaped areas modeling the outer part of a lesion.

Once we have defined the constraints, we can discuss the role of the parameter \gamma in the softmax function [see (1)]. We have observed that using a simple non-parametric softmax function leads to situations in which constraints over local structures were often obeyed by simply assigning some residual probability to every pixel in the segmentation map. This residual probability is not enough to assign any pixel to the local structure (they show higher probabilities for other structures), but allows for fulfilling the constraints over the accumulated probability. From our point of view, this is an undesired behavior, since what one would like to have instead is a small region of pixels with high probabilities of belonging to the corresponding local structure. In other words, we prefer pixels that are clearly associated with a particular class, as long as they produce an actual image segmentation, rather than pixels with some residual probability for each category. To address this issue, we use values of \gamma \geq 1 so that we can control how the softmax approximates the max function while it remains differentiable. In our case, we have used a value of \gamma = 2.

We have implemented the DSSN taking the well-known resnet-50 [13] as initialization, removing the top layers, and using the ISIC 2016 training dataset [4] and the described constrained optimization with weak annotations. This module produces, for each view v of a clinical case c, a tensor S_v^c ∈ R^{64×64×8} that contains the 8 probability maps of the considered structures.

To sum up, we have extended the original approach in [33] with three contributions: we have introduced a parametric softmax that helps to model the problem constraints, preventing certain malfunctions; we have incorporated instance weights into the problem statement so that we can deal with the unbalanced nature of labels; and, finally, we have extended the set of constraints by adding one new family that allows us to take advantage of the prior knowledge about the spatial location of structures in the lesion.

Fig. 5. Three illustrative results of the Dermoscopic Structures Segmentation Network. Top row: original images. Bottom row: segmentations. Colors represent dermoscopic structures: brown is dots/globules, mustard is reticular pattern/pigmented networks, green is homogeneous, grey is hypopigmented/regression areas, and blue is vascular structures. It is worth noting that, for the sake of easy visualization, each pixel has been assigned to the most probable category.

In Fig. 5 we show some examples of the segmentation maps generated by the DSSN. Let us note that, just to provide a simplified visualization in this figure, we have transformed the tensor S_v^c, containing eight probabilistic maps, into a hard segmentation in which each pixel has been assigned to the most likely category. However, no spatial post-processing techniques (such as Markov Fields or other smoothing algorithms) have been applied.

V. DIAGNOSIS NETWORK (DN)

The Diagnosis Network (DN) gathers information from the previous modules and generates a diagnosis for each clinical case.

As in the previous module, we have also taken the resnet-50 [13] as a basis, which uses residual layers to avoid the degradation problem when more and more layers are stacked in the network. When applied to our 256 × 256-pixel images, the last convolutional block (res5c) of this network produces a tensor T_c ∈ R^{8×8×2048} containing the scores of high-level latent concept detectors (e.g., in Imagenet, the dataset for which it was originally designed, those were 2048 latent visual concepts).

In the original network, an average pooling layer transforms this tensor into a single value per channel and image, T_s ∈ R^{1×1×2048}, which is followed by a fully connected layer and a softmax that generates the vector containing the probabilities of the considered visual concepts. Hence, the goal of the average pooling is to fuse detections at various locations of the input image and to generate a unified score for each latent high-level concept.
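The effect of the parametric softmax discussed in Section IV can be illustrated with a small NumPy sketch (a stand-in for illustration, not the released MatConvNet code): increasing \gamma sharpens the distribution toward the max while keeping it differentiable.

```python
import numpy as np

def parametric_softmax(f, gamma=2.0):
    """Softmax with a sharpness parameter gamma (gamma >= 1).

    gamma = 1 recovers the standard softmax; larger values push the
    output toward a one-hot max while remaining differentiable.
    """
    z = gamma * np.asarray(f, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

For scores [2.0, 1.0, 0.0], the probability of the leading class grows from roughly 0.67 at \gamma = 1 to roughly 0.87 at \gamma = 2, the kind of sharpening that discourages the residual-probability workaround described in Section IV.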
Fig. 6. Processing pipeline of the Diagnosis Network. The outputs of layer res5c of the original resnet-50 are modulated by the segmentation maps coming from DSSN, providing an extended set of channels. These channels, after batch normalization and ReLU activation function, are passed through a 3-branch processing pipeline that analyzes the presence of visual patterns, their spatial location, and the asymmetry of the lesion, respectively, to generate the diagnosis.

In our approach, we have modified the structure of the top layers of the network, giving rise to the pipeline illustrated in Fig. 6. In the following sections, we will first introduce the structure of the top layers in the DN and, then, we will provide a detailed description of those blocks that have been specifically designed to work with dermoscopic images of skin lesions.

A. Overview of the DN

As shown in Fig. 6, we have introduced several blocks to generate a final diagnosis Y_v^c for each considered view of a clinical case. Compared to the original resnet-50, we first apply a Modulation Block over the outputs of the convolutional res5c layer. This block, described in Section V-B, aims to modulate the previous outputs using the probabilistic segmentation maps provided by DSSN. As explained below, this block multiplies the total number of channels or latent visual patterns by 9, going from 2048 to 18432. Next, the Modulation Block is followed by Batch Normalization and non-linear ReLU (Rectified Linear Unit) activation layers. Finally, rather than just applying the Average Pooling + Fully Connected approach of the original resnet-50, we have subdivided the pipeline into three parallel processing branches:
1) Branch 1: the original pipeline with an average pooling (8 × 8 in our case), followed by a fully connected layer (FC1).
2) Branch 2: it performs an average normalized polar pooling (see Section V-C for further details) (R × Θ; R = 3, Θ = 8), followed by a fully connected layer (FC2). This branch provides a spatially discriminant analysis of the lesion.
3) Branch 3: it follows the previous polar pooling, estimates the asymmetry of the lesion (see Section V-D for a complete description), and applies a fully connected layer (FC3) over the asymmetry measures.
The outputs of these three branches are then linearly combined using a Sum Block, and the class probabilities are computed using a softmax. Finally, in order to generate a unified final output for each clinical case Y_c, we consider independence between views, leading to a factorization:

    Y_c = \prod_{v=1}^{V} Y_v^c    (13)

B. Modulation Block

The goal of the Modulation Block is to incorporate the segmentations provided by DSSN into the diagnosis process. To do so, this block fuses the structure segmentation maps described in Section IV-B with the outputs of the previous layer in the CNN.

In particular, if the output of the previous layer is a tensor x ∈ R^{M×N×O}, where M × N are the dimensions of the output and O is the number of output channels, and s ∈ R^{M×N×P} is a segmentation map that has been previously re-sized to match the feature map, the output of this module is an extended and modulated feature map y ∈ R^{M×N×OP}. To compute this output, we modulate the o-th channel x_o with the s-th segmentation map s_s, producing a modulated channel y_k:

    y_k = x_o \odot s_s, \quad o = 1...O, \; s = 1...P, \; k = 1...OP    (14)

Since the segmentations computed by DSSN are fixed, this module has no parameters to be optimized during the training phase. Hence, the backpropagation process only requires the derivative with respect to the data:

    \frac{\partial z}{\partial x_o} = \sum_{k \in K_o} \frac{\partial z}{\partial y_k} \odot s_k    (15)

where K_o corresponds to all the modulated channels k generated from the channel o, and s_k is the corresponding modulating map for that k.

The application of this module to our diagnosis network has been adapted as follows: it has been added to the network just after the res5c layer of the original resnet-50 [13]. Hence, we modulate O = 2048 channels using the probabilities of the P = 8 segmentation maps of local and global structures described in Section IV-A. In addition, we also concatenate the original input channels to the modulated ones, resulting in an extended set of O(P + 1) channels (18432 in our case).

C. Polar Pooling

This block aims to perform pooling operations (average or max pooling), but instead of doing them over rectangular spatial regions, these operations are done over sectors defined in polar coordinates. Hence, for a given number of rings R (with r ∈ [0, 1]) and angular sectors Θ (angles θ ∈ [0, 2π)), this block

transforms an input x ∈ RM ×N ×O into an output y ∈ RR ×Θ×O , During back-propagation, the gradients needed by the
where O is the number of channels. stochastic gradient descent algorithm are:
Furthermore, in order to deal with lesions of irregular Θ/2
shape, we use the normalized polar coordinates described in ∂z 2  ∂z
= ϕ(ri , θj , θk ) (18)
Section III-C. Since, depending on the particular shape of a ∂xr i ,θ j ,o RΘ ∂yθ k ,o
k =1
lesion and the size of the tensor being pooled, some combina-
tions (r, θ) may not contain pixels within the lesion, we can where:

also define overlaps between adjacent sectors to improve the xr i ,θ j − xr i ,θ k −j , θj ∈ [θk , θk + π)
smoothness of the outputs. Moreover, we use a non-uniform ϕ(ri , θj , θk ) =
xr i ,θ k −j − xr i ,θ j , otherwise
radius quantization in order to generate fixed-area rings that (19)
contain the same number of pixels in the hypothetical case
of an ideally circular lesion. To that end, the k − th ring is E. Details About Learning and Evaluation Processes
defined as:
  In this section we provide some useful details about the learn-
k−1 k ing process of DN. As mentioned in Section V, we have taken
≤r< (16) the original resnet-50 as initialization and fine-tuned the network
R R
using our own training data. When nothing else is specified, all
for k = 1...R. Given the proposed normalized polar coordinate the new layers in the network are initialized using weights com-
system, the equations needed to perform the forward and back- puted using Xavier’s method [34].
ward steps in the inference process do not differ from those ones Furthermore, due to the high degree of expressiveness of
of a regular max or average pooling block in Cartesian coordi- branches 2 and 3 with respect to the first branch, we have ob-
nates. Furthermore, it is worth noting that, once this block is served that training the whole system at a time was prone to
applied and data is converted into polar coordinates, no more overfitting. Hence, instead, we have first trained a model us-
convolutional layers can be applied as the spatial relationships ing only the first branch with a learning rate of Lr = 10−4 and
between contiguous values in the output matrix have been re- a weight Decay of W d = 10−4 . Once a coarse convergence is
defined (e.g., considering that columns in the data matrix refer reached, we have added the other two branches, frozen all layers
to angles, the first and last columns are adjacent in the angular up to (and including) the Modulation Block, initialized weights
space). For that reason, in our approach, this module is followed for branch 2 and 3 to zero, and learned the weights of the upper
by some blocks that are, either fully connected, or specifically layers using the following learning rates:
designed to work with polar coordinates (e.g., the Asymmetry r For branch 1 the original learning rate Lr1 = 10−4 and
block). weight Decay W d = 10−4 .
r For branches 2 and 3 the original learning rate and weight
D. Asymmetry Block decays are divided or multiplied by the total number of
input spatial neurons in the fully connected block, respec-
Melanomas tend to grow differently along each direction, tively. This stronger regularization and slower learning rate
becoming more asymmetric than benign lesions. This is why prevents these branches from getting more relevance than
symmetry is present in a variety of diagnosis algorithms, such the original one due to their expressiveness, and therefore
as the ABCD rule of dermoscopy [21]. The symmetry rule re- minimizes the likelihood of overfitting.
quires finding the axis of maximum symmetry according to The code that implements DermaKNet is available online.3
some criteria (e.g., shape, color), and its perpendicular. In doing
so, the lesion is labeled by dermatologists either as symmetric VI. EXPERIMENTAL SECTION
in one or two axes, or as asymmetric.
Our asymmetry block computes metrics that evaluate the A. Datasets and Experimental Setup
asymmetry of a lesion with respect to various axes. In par- DermaKNet has been assessed using the official dataset of
ticular, given a polar division of the lesion into R × Θ sectors, the 2017 ISBI Challenge on Skin Lesion Analysis Towards
we compute the asymmetry for axes aligned with the Θ/2 an- Melanoma Detection4 [5]. This challenge consists of three dif-
gles in the range [0, π). To do so, our approach folds the lesion ferent parts: 1) Lesion Segmentation, 2) Detection and Local-
over each angle θ and computes the accumulated square differ- ization of Visual Dermoscopic Features/Patterns, and 3) Dis-
ence between corresponding sectors. Hence, for a given input ease Classification. We focus on part 3, being our goal the
x ∈ RR ×Θ×O , this module generates an output y ∈ R1×Θ×O as automatic diagnosis of dermoscopic images into three differ-
follows: ent categories: 1) Nevus: benign skin tumor, derived from
R Θ/2
melanocytes (melanocytic), 2) Melanoma: malignant skin tu-
1  2 mor, derived from melanocytes (melanocytic), and 3) Sebor-
yθ k ,o = xr i ,θ k + j −1 ,o − xr i ,θ k − j ,o (17)
RΘ i=1 j =1 rheic Keratosis: benign skin tumor, derived from keratinocytes

where, in case the angle index θj becomes j <= 0 it is substi- 3 https://github.com/igondia/matconvnet-dermoscopy

tuted by Θ − j. 4 https://challenge.kitware.com/#challenge/583f126bcad3a51cc66c8d9a

(non-melanocytic). Malignancy diagnosis data were obtained


from expert consensus and pathology report information.
The official dataset contains 2750 dermoscopic images gath-
ered from the daily clinical practice of a wide range of med-
ical centers, thus varying in resolution and capturing devices
and conditions. The dataset has been split into training, vali-
dation and test sets with 2000, 150, and 600 images, respec-
tively. The proportions of the classes in the training dataset are
the following: 374 melanomas, 254 seborrheic keratosis, and
1372 benign nevi. Similar proportions were found in the test
dataset, but not in the validation dataset. Besides the images, Fig. 7. AUC results for segmentation of dermoscopic structures. Our
additional non-visual meta-data about the clinical cases such as approach (DSSN) is compared to the original CCNN [33].
the approximate age and sex of the patient were provided when
available.
In addition to the official dataset, we have also considered B. Evaluation of the Dermoscopic Structure
two other external resources for training, namely: Segmentation Network
r A dataset obtained from the EDRA Interactive Atlas of
Prior to the analysis of the segmentation network, which is
Dermoscopy [35], with images gathered with the objective the final goal of this subsystem, we would like to provide an
of providing a panoramic view of skin lesions diagnosis assessment of the DSSN module. To that purpose, we have
using dermoscopy. Images were saved in JPEG format with subdivided our 2016 ISBI challenge training dataset with weak
almost uniform sizes (768 × 512), with ground truth labels annotations into train and validation sets. Fig. 7 shows a com-
determined by histopathological diagnosis, and no meta- parison between the original CCNN [33], which has served as
data provided with the image files. It contains a total set of a baseline for our method, and our approach including spatially
724 melanocytic lesions, 222 of them are melanomas and localized structures, parametric softmax and instance weights
the remaining 502 are benign nevi. that model lesion abnormality in training. Let us note that for
r A dataset built from the ISIC archive [36], the global reposi-
the baseline CCNN we have considered three weak annotations
tory that was used to generate the official challenge datasets. (absent, local and global) using the same parameters as in our
By considering images belonging to any of the three con- solution. Results are given in terms of AUC computed at image
sidered categories, we have found 2104 new images that level, by accumulating the probability of the considered der-
were not included in the official dataset of the challenge: moscopic features over the pixels in the lesion. We can see in
1606 nevi, 466 melanomas and 32 seborrheic keratosis. the figures that our proposal improves the performance of the
In order to assess the performance of our approach, we have baseline for some of the structures, specially on those that are
used the same evaluation metrics proposed by the organizers of spatially localized (e.g., streaks) or less frequent in the dataset
the challenge. In particular, among all the considered metrics, (e.g., streaks and blueveil), in both cases as a consequence of
we will focus on: our extensions in the model (spatially localized structures and
r Area Under Curve (AUC): the AUC is computed indepen-
training weights). In average, we are getting an improvement of
dently for two different binary classification problems: 1) 2.5 in AUC with respect to the baseline.
melanoma vs rest, and 2) seborrheic keratosis vs rest. In
addition, the average AUC of the two problems is also pro-
vided. AUC was considered the main evaluation metric in C. Assessment of the Proposed Blocks in the Automatic
the challenge and served to rank the official submissions Diagnosis System
and select the winning approaches. In this set of experiments, we have assessed how the different
r Specificity evaluated at a sensitivity of 95% (SP95): this blocks designed for our system help to improve the performance
complementary metric evaluates how automatic methods of the diagnosis. Although during the preparation of the chal-
can filter out benign lesions, delivering to dermatologists lenge we have used the train and validation sets to make deci-
only those that become good candidates to be malignant. sions about the system configuration, here we have preferred to
Hence, by fixing a very high value on the sensitivity, we show the results over the test set (see Table I). The rationale be-
ensure that our system minimizes the number of non- hind is that, as we have already mentioned, the validation dataset
detections, and assess its ability to reduce the clinical effort follows a particular data distribution, with different proportions
of dermatologists, which would have a high impact in their of melanoma, keratosis and benign nevu than those found in
daily clinical practice. the training and test sets. Hence, results in validation dataset,
The experiments in this section are organized as follows: although show similar behaviors, are different in absolute terms
first, we assess the performance of DSSN, then we evaluate the and less meaningful for the analysis. In addition, it should be
influence of each individual proposed block in the diagnosis mentioned that these models have been trained using only the
network. Finally, the optimal configuration is evaluated in com- official 2017 ISBI training set.
parison to the official submissions of the challenge. Table I shows the results of this experiment. Let us note
that, unless specified, the data augmentation described in
TABLE I
A COMPARISON BETWEEN SEVERAL VERSIONS OF DERMAKNET AND THE BASELINE

No | Method                     | Mel AUC | SK AUC | Avg AUC
1  | Resnet-50 [13]             | 83.1    | 91.8   | 87.4
2  | Modulation + Branch 1      | 83.4    | 92.5   | 88.0
3  | Modulation + Branches 1-3  | 83.6    | 92.7   | 88.2
4  | (3) + Test Data Aug.       | 84.8    | 94.6   | 89.7
5  | (4) + meta-data            | 85.9    | 95.8   | 90.8

AUC is given for Melanomas vs rest (Mel AUC), Seborrheic Keratosis vs rest (SK AUC), and average (Avg AUC).

Section III-C is only performed over the training data. The first evaluated method is the original resnet-50 [13], which has been fine-tuned using our training data and becomes the baseline algorithm in the comparison. The second approach substitutes the top layers in the original network with our Modulation Block (Section V-B) followed by the regular Branch 1, and shows that incorporating the dermoscopic segmentations provides an absolute improvement of 0.60 in terms of average AUC. When we further incorporate the multi-branch processing with spatial and asymmetry analysis, we gain an additional increment of 0.2 in average AUC, which, although not very notable, is still valuable. By applying data augmentation also to the test samples and fusing the results using the factorization introduced at the end of Section V-A, we get a quite significant additional improvement of 1.5 in average AUC. From our point of view, this improvement comes from the following fact: the inner rectangles used during data augmentation remove the non-lesion black areas at the expense of also losing some areas of the lesion. This leads to training images that show partial views of the lesion, which requires performing the same process on the test images in order to establish fair comparisons between data samples. Finally, the full system incorporating all the previous blocks, as well as the score of an SVM classifier over the non-visual meta-data (see Section III-D), achieves the best results, with an additional gain of 1.1 in average AUC. Hence, all these results demonstrate how each proposed extension enhances the quality of the diagnosis.

D. Comparison With the State-of-the-Art

In this section we assess the performance of DermaKNet (version 5 in Table I) trained over an extended dataset containing both the official 2017 ISBI Challenge train dataset and the two external resources. In Table II we show a comparison between our method and the top 5 performing methods among the 23 official submissions to the challenge.

It is worth noting that we have included two versions of DermaKNet. Our official submission to the challenge, denoted as DermaKNet (Official), was trained using the official training dataset and the EDRA external resource, and shows some minor differences with respect to the model described here (the interested reader is referred to [38] for the corresponding description). Our current proposal, denoted as DermaKNet (Current), in contrast, was trained using also images from the ISIC archive.

Considering the official submission and using the AUC, DermaKNet ranked first in the category of Seborrheic Keratosis vs rest, and fourth in Melanoma vs rest, achieving a global second position in the challenge. However, if we further analyze the SP95 results, which account for the specificity of the diagnosis at a 95% sensitivity, our model clearly outperforms the rest of the approaches in both categories. As we have already mentioned, this probably represents the most realistic scenario of application, in which CAD systems are used to filter out benign cases, thus reducing the number of lesions that require care from dermatologists.

Moreover, our current implementation achieves even better results than our previous one. Again, using AUC as the main performance metric, it outperforms all the official submissions in the Seborrheic Keratosis vs rest problem, and now also in the average of the two categories. Furthermore, our results for melanoma detection are now very close to those of the winning approach in the category. In addition, considering the SP95 metric, our method clearly becomes the state-of-the-art in the three considered problems (any of the two tasks and average). Hence, these results completely validate our approach and demonstrate the utility of incorporating intuitions from dermatologists into the CNN structure.

E. An Example of an Interpretable Diagnosis

The advantage of incorporating the intuitions of dermatologists into the processing pipeline of a CNN is not restricted to the enhancement of the system performance; it also provides additional valuable information regarding the clinical case that might help medical staff in their diagnosis. Fig. 8 shows two examples of a system output built using DermaKNet: we can provide information about the dermoscopic features, not only about their location in the lesion, but also about their contribution to the final diagnosis. In particular, the bottom-center diagram shows the accumulated contribution to the diagnosis score from each particular dermoscopic structure s [see (14)]. For that purpose, given the diagnosed class, we compute the non-probabilistic class-score of each dermoscopic feature s as the output of the Sum Block in Fig. 6 in which only those channels k (k = 1...18432) that correspond to the structure s are considered in the computations. After applying a ReLU that removes negative (inhibiting) scores, we compute the final relative contribution by applying a normalization that ensures a total score of 1 over all the dermoscopic structures. Furthermore, the bottom-right diagram shows a normalized (for visualization enhancement) measure of the per-angle accumulated symmetry over the k channels modulated by each dermoscopic structure s. This symmetry is computed as follows: we consider the accumulated output of the asymmetry block for each angle, y_{\theta_k} = \sum_o y_{\theta_k,o} [see (17)], and then perform two consecutive normalizations: the first one adapts the asymmetry values to the lesion content, dividing y_{\theta_k} by the accumulated energy of the input, e = \sum_{r_i,\theta_j,o} x_{r_i,\theta_j,o}^2 [see (17)]; the second is a max-min normalization that ensures a final asymmetry in the range [0, 1] and improves visualization. Finally, symmetry values are computed as 1 − asymmetry.
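A minimal sketch of the per-structure contribution computation just described (ReLU to drop inhibiting scores, then normalization to a total of 1); the function name is our own, and the per-structure class-scores are assumed to be given:

```python
import numpy as np

def structure_contributions(class_scores):
    """Relative contribution of each dermoscopic structure to the diagnosis.

    class_scores: per-structure class-scores taken at the Sum Block, using
    only the channels modulated by each structure. Negative (inhibiting)
    scores are removed with a ReLU and the rest is normalized to sum to 1.
    """
    s = np.maximum(np.asarray(class_scores, dtype=float), 0.0)  # ReLU
    total = s.sum()
    return s / total if total > 0 else s
```

For example, scores [-0.3, 1.2, 0.8] yield contributions [0, 0.6, 0.4], which is the kind of breakdown plotted in the bottom-center diagram of Fig. 8.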
TABLE II
A COMPARISON BETWEEN DERMAKNET AND THE TOP FIVE PERFORMING OFFICIAL SUBMISSIONS TO THE 2017 ISBI CHALLENGE ON SKIN LESION ANALYSIS

Method                    | Mel AUC | SK AUC | Avg AUC | Mel SP95 | SK SP95 | Avg SP95
Matsunaga et al. [37]     | 86.8    | 95.3   | 91.1    | 36.6     | 78.4    | 57.5
DermaKNet (Official) [38] | 85.6    | 96.5   | 91.0    | 40.4     | 82.4    | 61.4
Menegola et al. [39]      | 87.4    | 94.3   | 90.8    | 39.5     | 69.0    | 54.3
Bi17 et al. [40]          | 87.0    | 92.1   | 89.6    | 39.8     | 47.6    | 43.7
Yang et al. [41]          | 83.0    | 94.2   | 88.6    | 36.6     | 74.5    | 55.6
DermaKNet (Current)       | 87.3    | 96.2   | 91.7    | 46.0     | 84.3    | 65.2

AUC and SP95 are given for Melanomas (Mel) vs rest, Seborrheic Keratosis (SK) vs rest, and average (Avg).

Fig. 8. Two examples of an interpretable system output generated with our approach: (a) Melanoma and (b) Seborrheic Keratosis. Each case contains 6 figures which represent (from top to bottom and left to right): original image and diagnosis, binary mask with lesion/skin, segmentation into dermoscopic features, automatic diagnosis, contribution of each dermoscopic feature to the final diagnosis, and symmetry measures by angle.

In addition to offering more insight into the automatic diagnosis, this information might also become the basis for other end-user applications, such as e-learning tools that help in the training of new specialists.

VII. CONCLUSION AND FURTHER WORK

In this paper we have introduced DermaKNet, a CAD system for the diagnosis of skin lesions that is composed of several CNNs, each one devoted to a specific task: lesion-skin segmentation, detection of dermoscopic features, and global lesion diagnosis. Our goal throughout the whole system is to incorporate the expert knowledge provided by dermatologists into the decision process, overcoming the traditional limitation of deep learning regarding the lack of interpretability of the results. In order to achieve a seamless integration between CNNs and this expert information, we have developed several novel processing blocks.

We have assessed our system on the challenging dataset used in the 2017 ISBI Challenge on Skin Lesion Analysis Towards Melanoma Detection, in the task of automatic diagnosis of melanoma and seborrheic keratosis. Our results prove that modeling expert-based information enhances the system performance and achieves very competitive results. In particular, the last version of our model ranks first in the Seborrheic Keratosis category and average AUCs, and is very competitive in melanoma. Furthermore, our results in Specificity at a 95% Sensitivity are clearly better than those of the rest of the approaches, which makes our system very suitable as an automatic filtering module reducing the workload of dermatologists.

In addition to this gain in performance, we have also shown that we can produce a more interpretable diagnosis on top of our system. Looking at the outputs of those intermediate blocks modeling intuitions from dermatologists, we can get more insight into which dermoscopic features are influencing the diagnosis, the lesion symmetry, and even the spatial locations that support a certain diagnosis.

The main lines of further research comprise the design of new blocks implementing other aspects of the lesions that are of interest to dermatologists, the development of segmentation methods that account for other useful dermoscopic features, and the exploration of novel ways of incorporating the dermoscopic structures segmentation into the diagnosis process. With respect to the latter, we will consider multi-task losses [42], which allow for sharing processing layers between both tasks and fusing the segmentation and diagnosis networks into end-to-end trainable architectures.

ACKNOWLEDGMENT

The authors would like to kindly thank the dermatologists of Hospital 12 de Octubre of Madrid for their inestimable help annotating the data contents with the weak labels of structural patterns.

REFERENCES

[1] J. Ferlay et al., "Cancer incidence and mortality patterns in Europe: Estimates for 40 countries in 2012," Eur. J. Cancer, vol. 49, no. 6, pp. 1374–1403, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0959804913000075
[2] M. A. Weinstock, "Cutaneous melanoma: Public health approach to early detection," Dermatologic Therapy, vol. 19, no. 1, pp. 26–31, 2006. [Online]. Available: http://dx.doi.org/10.1111/j.1529-8019.2005.00053.x
[3] H. Pehamberger, A. Steiner, and K. Wolff, "In vivo epiluminescence microscopy of pigmented skin lesions. I. Pattern analysis of pigmented skin lesions," J. Amer. Acad. Dermatology, vol. 17, no. 4, pp. 571–583, 1987. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0190962287702394
[4] D. Gutman et al., "Skin lesion analysis toward melanoma detection: A challenge at the International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration (ISIC)," arXiv:1605.01397, 2016.
[5] N. C. F. Codella et al., "Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC)," arXiv:1710.05006, 2017. [Online]. Available: http://arxiv.org/abs/1710.05006
[6] A. Madooei, M. S. Drew, M. Sadeghi, and M. S. Atkins, Intrinsic Melanin and Hemoglobin Colour Components for Skin Lesion Malignancy Detection. Berlin, Germany: Springer, 2012, pp. 315–322. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-33415-3_39
[7] P. Rubegni et al., "Objective follow-up of atypical melanocytic skin lesions: A retrospective study," Arch. Dermatological Res., vol. 302, no. 7, pp. 551–560, 2010. [Online]. Available: http://dx.doi.org/10.1007/s00403-010-1051-6
[8] M. Zortea et al., "Performance of a dermoscopy-based computer vision system for the diagnosis of pigmented skin lesions compared with visual evaluation by experienced dermatologists," Artif. Intell. Med., vol. 60, no. 1, pp. 13–26, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0933365713001589
[9] J. López-Labraca, M. Á. Fernández-Torres, I. González-Díaz, F. Díaz-de María, and Á. Pizarro, "Enriched dermoscopic-structure-based CAD system for melanoma diagnosis," Multimedia Tools Appl., Jun. 2017. [Online]. Available: https://doi.org/10.1007/s11042-017-4879-3
[10] N. Codella, J. Cai, M. Abedini, R. Garnavi, A. Halpern, and J. R. Smith, Deep Learning, Sparse Coding, and SVM for Melanoma Recognition in Dermoscopy Images. Cham, Switzerland: Springer, 2015, pp. 118–126. [Online]. Available: https://doi.org/10.1007/978-3-319-24888-2_15
[11] L. Yu, H. Chen, Q. Dou, J. Qin, and P. A. Heng, "Automated melanoma recognition in dermoscopy images via very deep residual networks," IEEE Trans. Med. Imag., vol. 36, no. 4, pp. 994–1004, Apr. 2017.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Red Hook, NY, USA: Curran Associates, Inc., 2012, pp. 1097–1105. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[13] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, Jun. 2016, pp. 770–778. [Online]. Available: https://doi.org/10.1109/CVPR.2016.90
[14] R. Girshick, "Fast R-CNN," in Proc. Int. Conf. Comput. Vis., 2015, pp. 1440–1448.
[15] E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, Apr. 2017.
[16] O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation. Cham, Switzerland: Springer, 2015, pp. 234–241. [Online]. Available: https://doi.org/10.1007/978-3-319-24574-4_28
[17] M. Varma and A. Zisserman, "A statistical approach to texture classification from single images," Int. J. Comput. Vis., vol. 62, no. 1–2, pp. 61–81, 2005.
[18] M. Abedini, Q. Chen, N. Codella, R. Garnavi, and X. Sun, "Accurate and scalable system for automatic detection of malignant melanoma," in Digital Imaging and Computer Vision. Boca Raton, FL, USA: CRC Press, Sep. 2015, pp. 293–343. [Online]. Available: http://dx.doi.org/10.1201/b19107-11
[19] H. Zare and M. Taghi Bahreyni Toossi, "Early detection of melanoma in dermoscopy of skin lesion images by computer vision-based system," in Digital Imaging and Computer Vision. Boca Raton, FL, USA: CRC Press, Sep. 2015, pp. 345–384.
[20] G. Fabbrocini et al., Automatic Diagnosis of Melanoma Based on the 7-Point Checklist. Berlin, Germany: Springer, 2014, pp. 71–107. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-39608-3_4
[21] F. Nachbar et al., "The ABCD rule of dermatoscopy," J. Amer. Acad. Dermatology, vol. 30, no. 4, pp. 551–559, 1994. [Online]. Available: http://dx.doi.org/10.1016/S0190-9622(94)70061-3
[22] A. Sáez, C. Serrano, and B. Acha, "Model-based classification methods of global patterns in dermoscopic images," IEEE Trans. Med. Imag., vol. 33, no. 5, pp. 1137–1147, May 2014.
[23] C. Serrano and B. Acha, "Pattern analysis of dermoscopic images based on Markov random fields," Pattern Recognit., vol. 42, no. 6, pp. 1052–1057, 2009. [Online]. Available: http://dblp.uni-trier.de/db/journals/pr/pr42.html#SerranoA09
[24] M. Sadeghi, T. K. Lee, D. McLean, H. Lui, and M. S. Atkins, "Global pattern analysis and classification of dermoscopic images using textons," Proc. SPIE, vol. 8314, 2012, Art. no. 83144X. [Online]. Available: http://dx.doi.org/10.1117/12.911818
[25] A. G. Isasi, B. G. Zapirain, and A. M. Zorrilla, "Melanomas non-invasive diagnosis application based on the ABCD rule and pattern recognition image processing algorithms," Comput. Biol. Med., vol. 41, no. 9, pp. 742–755, 2011. [Online]. Available: http://dblp.uni-trier.de/db/journals/cbm/cbm41.html#IsasiZZ11
[26] G. D. Leo, A. Paolillo, P. Sommella, G. Fabbrocini, and O. Rescigno, "A software tool for the diagnosis of melanomas," in Proc. IEEE Instrum. Meas. Technol. Conf., May 2010, pp. 886–891.
[27] L. Yu, H. Chen, Q. Dou, J. Qin, and P. A. Heng, "Automated melanoma recognition in dermoscopy images via very deep residual networks," IEEE Trans. Med. Imag., vol. 36, no. 4, pp. 994–1004, Apr. 2017.
[28] A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, pp. 115–118, 2017.
[29] T. Majtner, S. Yildirim-Yayilgan, and J. Y. Hardeberg, "Combining deep learning and hand-crafted features for skin lesion classification," in Proc. 6th Int. Conf. Image Process. Theory, Tools Appl., Dec. 2016, pp. 1–6.
[30] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes challenge: A retrospective," Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136, Jan. 2015.
[31] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995. [Online]. Available: https://doi.org/10.1007/BF00994018
[32] A. Marghoob, J. Malvehy, R. Braun, and A. Kopf, An Atlas of Dermoscopy (Encyclopedia of Visual Medicine). Boca Raton, FL, USA: CRC Press, 2004.
[33] D. Pathak, P. Krähenbühl, and T. Darrell, "Constrained convolutional neural networks for weakly supervised segmentation," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1796–1804.
[34] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. 13th Int. Conf. Artificial Intelligence and Statistics (PMLR, vol. 9), Y. W. Teh and M. Titterington, Eds., Sardinia, Italy, May 2010, pp. 249–256.
[35] G. Argenziano, H. P. Soyer, and V. D. Giorgi, Interactive Atlas of Dermoscopy. Milan, Italy: Edra Medical Publishing and New Media, 2002.
[36] "International Skin Imaging Collaboration: Melanoma Project," ISIC Archive, 2017. [Online]. Available: https://isic-archive.com/
[37] K. Matsunaga, A. Hamada, A. Minagawa, and H. Koga, "Image classification of melanoma, nevus and seborrheic keratosis by deep neural network ensemble," arXiv:1703.03108, 2017. [Online]. Available: http://arxiv.org/abs/1703.03108
[38] I. González-Díaz, "Incorporating the knowledge of dermatologists to convolutional neural networks for the diagnosis of skin lesions," arXiv:1703.01976, 2017. [Online]. Available: http://arxiv.org/abs/1703.01976
[39] A. Menegola, J. Tavares, M. Fornaciali, L. T. Li, S. E. F. de Avila, and E. Valle, "RECOD Titans at ISIC Challenge 2017," arXiv:1703.04819, 2017. [Online]. Available: http://arxiv.org/abs/1703.04819
[40] L. Bi, J. Kim, E. Ahn, and D. Feng, "Automatic skin lesion analysis using large-scale dermoscopy images and deep residual networks," arXiv:1703.04197, 2017. [Online]. Available: http://arxiv.org/abs/1703.04197
[41] X. Yang, Z. Zeng, S. Y. Yeo, C. Tan, H. L. Tey, and Y. Su, "A novel multi-task deep learning model for skin lesion segmentation and classification," arXiv:1703.01025, 2017. [Online]. Available: http://arxiv.org/abs/1703.01025
[42] P. Kisilev, E. Sason, E. Barkan, and S. Hashoul, Medical Image Description Using Multi-Task-Loss CNN. Cham, Switzerland: Springer, 2016, pp. 121–129. [Online]. Available: https://doi.org/10.1007/978-3-319-46976-8_13