
Received 22 September 2022, accepted 22 October 2022, date of publication 26 October 2022, date of current version 4 November 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3217217

A Deep Learning Approach Based on Explainable Artificial Intelligence for Skin Lesion Classification
NATASHA NIGAR 1, MUHAMMAD UMAR 2, MUHAMMAD KASHIF SHAHZAD 3, SHAHID ISLAM 1, AND DOUHADJI ABALO 4
1 Department of Computer Science (RCET), University of Engineering and Technology, Lahore 39161, Pakistan
2 Tekhqs Inc., Irvine, CA 92620, USA
3 Power Information Technology Company (PITC), Ministry of Energy, Power Division, Government of Pakistan, Lahore 39161, Pakistan
4 Department of Mathematics, University of Lomé, Lomé, Togo

Corresponding author: Douhadji Abalo ([email protected])

ABSTRACT Skin lesion types are highly similar in the early stages of skin cancer, which delays diagnosis. Deep learning algorithms are well-recognized solutions to this problem; however, these black box approaches result in a lack of trust because dermatologists are unable to interpret and validate the decisions made by the models. In this paper, an explainable artificial intelligence (XAI) based skin lesion classification system is proposed to improve skin lesion classification accuracy. This will help dermatologists to make rational diagnoses in the early stages of skin cancer. The proposed XAI model is validated using the International Skin Imaging Collaboration (ISIC) 2019 dataset. The developed model identifies eight types of skin lesions (dermatofibroma, squamous cell carcinoma, benign keratosis, melanocytic nevus, vascular lesion, actinic keratosis, basal cell carcinoma and melanoma) with classification accuracy, precision, recall and F1 score of 94.47%, 93.57%, 94.01%, and 94.45%, respectively. These predictions are further analyzed using the local interpretable model-agnostic explanations (LIME) framework to generate visual explanations that match prior beliefs and general explanation best practices. The explainability integrated within our model will enhance its applicability in real clinical practice.

INDEX TERMS Explainable artificial intelligence, skin lesion classification, deep learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Okyay Kaynak.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

VOLUME 10, 2022
N. Nigar et al.: Deep Learning Approach Based on Explainable Artificial Intelligence for Skin Lesion Classification

I. INTRODUCTION
Skin cancer is a type of cancer that affects the surface of the skin. More than 5 million people in the United States have been diagnosed with skin cancer [1]. Improving the diagnostic accuracy and the rate of early diagnosis is therefore a crucial task. In this regard, both medical experts and researchers are putting great effort into advancing medical diagnosis, treatments, and examinations [2].

A skin lesion is an abnormal appearance or growth of skin compared to the skin area around it. Lesions can differ in type, texture, color, shape, affected location and distribution. They are classified into 2,032 categories that are organized into a hierarchy [3], as shown in Fig. 1. The first two main categories of this hierarchy are melanocytic and non-melanocytic. Whether a lesion is melanocytic (i.e., pigmented) or non-melanocytic (i.e., non-pigmented) depends on the presence or the lack of melanocytes and melanin pigment in the lesion, respectively. Melanocytic lesions have 8 global features, which aid in the detailed classification of pigmented skin lesions, and 14 local features that give more accurate information about a given lesion [4]. Non-melanocytic lesions can appear yellow or orange due to keratin; or red, purple, blue and black due to hemoglobin [5]. Lesions can be cancerous (i.e., malignant) or non-cancerous (i.e., benign).

Dermoscopy is one of the most widely used skin imaging techniques to improve the diagnostic performance and reduce skin cancer deaths [6]. It is a non-invasive method in which a magnified and well-illuminated picture of the skin is taken to clearly see and understand the lesion area [7]. This technique
is usually used to diagnose skin cancer in its early stages and enhances the diagnostic ability of doctors. Usually, dermatologists analyze dermoscopic images (also known as biomedical images) through visual inspection, which requires a high degree of skill and concentration, and is time-consuming and prone to operator bias [8]. The reason is that infected skin regions and normal moles are so similar that it is sometimes hard to make an accurate diagnosis.

FIGURE 1. Skin lesions hierarchy.

To assist dermatologists in diagnosing skin cancer, many computer aided diagnosis (CAD) systems [9], [10], [11], [12], [13] have been developed, which not only bypass the aforementioned issues but also improve the accuracy, efficiency and objectivity of the diagnosis system. In this regard, deep learning (DL) algorithms have shown promising results and large potential for image processing and data analysis. Owing to its popularity and unique features, DL has been widely used in many complex domains, e.g., detection, identification, classification, and recognition of objects [14]. It is a machine learning (ML) technique that adds more ‘depth’ (complexity) to the model and transforms the data using various functions that allow data representation in a hierarchical way, through several levels of abstraction [15]. Because of the more complex models employed, DL can solve more complex problems in a fast and efficient manner [16]. DL algorithms such as convolutional neural networks (CNNs), together with image processing techniques, are the most important part of common CAD systems [17].

However, the use of such CAD systems by dermatologists and patients remains doubtful because the processing cycle behind model learning and feature encoding is not well understood. A DL model without a rational explanation is a barrier for dermatologists in accurate decision making. Occasionally, the experts find it difficult to understand the predictions made by the model. For example, when a DL model diagnoses skin cancer with 87% accuracy, it is frequently difficult to understand why the model produces inaccurate results in the remaining 13% of cases, and how to improve these decisions. DL models are not always similar to, or representative of, dermatologists’ decision-making processes. Hence, these models are often regarded as ‘black boxes’ that do not give a clear explanation for their conclusions. The lack of model transparency associated with DL algorithms in the complete cycle of decision-making cannot be neglected in skin cancer diagnosis. It is, therefore, necessary to develop robust approaches to better understand the black box decisions. Such approaches are commonly referred to as interpretable deep learning or XAI [18].

In this study, the state-of-the-art pre-trained deep learning algorithm ResNet-18 [19] is applied to the ISIC 2019 dataset to classify 8 skin lesions, using LIME as an explanation method with enhanced explanation and accuracy. We train the model to address the problem of the imbalanced dataset and show its effect on the accuracy of the model. In summary, we present a robust model with enhanced accuracy that involves XAI techniques in skin cancer diagnosis, and we make the following main contributions:
• Model Transparency: An XAI model is developed employing the LIME framework and ResNet-18 i) to explain why a deep learning model predicts a particular skin lesion, and ii) to increase the model accuracy, which can increase the level of trust and thus the safety of the diagnostic system.
• Data Set: The developed approach is tested with 25,331 dermoscopic images using the ISIC 2019 dataset.

This paper is organized as follows: Section II overviews the background and related work, highlighting the merits and limitations of existing methods. The developed model is explained in Section III, followed by experimental analysis in Section IV. The threats to validity of this study are discussed in Section V. Finally, Section VI concludes this study with future directions.

II. LITERATURE REVIEW
A. BACKGROUND
1) LIME FRAMEWORK
In this paper, LIME (local interpretable model-agnostic explanations) is used as an XAI (eXplainable AI) method. It is a post hoc method, which is applied after the model is trained [31]. Moreover, model-agnostic refers to the group of
explainers that are not specifically designed for a certain ML algorithm and have a wide scope [31].

LIME [32] is a popular technique for interpreting and explaining the black box decisions made by ML algorithms. The objective of LIME is to train surrogate models locally and explain an individual prediction [32]. The high-level structure of LIME is presented in Fig. 2. In the first step, a synthetic dataset is generated by randomly perturbing the samples around an instance, drawing from a normal distribution. The corresponding predictions for this perturbed dataset are gathered using the black box model, and LIME then uses them to train an interpretable model (e.g., linear regression).

Linear regression is used to estimate the relationship between a dependent variable and one or more independent variables by utilizing a regression line, as shown in Eq. (1).

$$y = a + b x_i \tag{1}$$

where y is the dependent variable, x is the independent variable, a is the intercept, b is the slope of the line and i = 1, 2, ..., n. The main purpose of this equation is to predict the value of the target variable from given predictor variables. Further, the number of important features (K) is given as input to LIME to generate the explanation. The model is easier to understand with a lower value of K. There are many techniques to select the K important features, e.g., backward or forward selection of features and highest weights of linear regression coefficients. The forward feature selection method is used by LIME for small datasets having fewer than 6 attributes. For higher dimensional datasets, it uses the highest weights approach [33].

The mathematical formulation of LIME is stated in Eq. (2).

$$\text{explanation}(x) = \underset{g \in G}{\arg\min}\; L(f, g, \pi_x) + \Omega(g) \tag{2}$$

where x represents the instance to be explained, g represents the interpretable model, and f is the black box model being explained. The loss function L, also known as the fidelity function (e.g., mean squared error), calculates the explanation’s closeness to the original model’s prediction, and $\Omega(g)$ penalizes the complexity of g. In addition, G refers to a group of potentially interpretable models, such as decision trees. The neighbourhood size around the initial instance x is defined by the proximity measure $\pi_x$.

B. RELATED WORK
According to the World Health Organization (WHO) [34], cancer is expected to be the leading cause of death (13.1 million) by 2030. Skin cancer is common in human beings; it arises from the skin due to the abnormal growth of cells that can easily invade and spread to other parts of the human body [35].

TABLE 1. Research matrix of related work.

Different methods focusing on skin lesion classification have been presented and implemented in the healthcare domain over recent years. In this regard, Chowdhury et al. [20] used a custom CNN to identify 7 classes of skin diseases using the HAM10000 dataset [36]. They used CAM [37] as an XAI method, and the maximum achieved accuracy is 82.7% with 78% precision. Esteva et al. [21] used a CNN to identify 7 classes, using the ISIC 2018 dataset and Backpropagation [38] as the explainable method. They achieved 94% Area Under Curve (AUC). Li et al. [22] used CAM [37] as an explainable method with the ISIC 2017 dataset to detect 7 classes of skin diseases. However, they used Wilcoxon’s signed-rank test [39] to differentiate their results. Li et al. [6] incorporated Occlusion [40] as the explainable method using the ISIC 2018 dataset to diagnose 7 classes of skin diseases with an accuracy rate of 85%, using an ensemble of VGG16 [41] and ResNet-50 [19].

Nunnari et al. [23] utilized GradCAM [42] as an explainable method with the ISIC 2019 dataset, classifying 8 skin classes. They also used VGG16 [41] and ResNet-50 [19] as explanation models with 72.2% and 76.7% accuracy, respectively. Sadeghi et al. [24] used ResNet-50 [19] to identify 4 skin classes with 1021 dermoscopic images. They incorporated Content-Based Image Retrieval (CBIR) [43] as the explanation method, with an accuracy rate of 60.94%. Xie et al. [25]
used CAM [37] as an explanation method to classify 3 skin diseases with a modified version of a deep CNN and achieved an average accuracy rate of 90.4%. They used the ISIC 2017 and PH2 [44] datasets. Yang et al. [26] used ResNet-50 [19] along with CAM [37] as the explanation method to classify 2 skin diseases using the ISIC 2017 dataset, with an accuracy rate of 83%. Young et al. [27] used both GradCAM [42] and Kernel SHAP [45] as explanation methods with the HAM10000 dataset [36] to identify 2 skin diseases with an accuracy rate of 85%. Zunair et al. [28] used VGG16 [41] to classify 2 skin diseases using the ISIC 2016 dataset and CAM [37] as the explanation method, with sensitivity 91.76% and AUC 81.18%.

FIGURE 2. The workflow of LIME method.

In our work, we also compare our model accuracy with studies that have not applied an XAI method. In this context, the deep learning system of Brinker et al. [17] outperformed 136 out of 157 experienced dermatologists from the hospitals of a German university. When the system’s results were compared to those of board-certified dermatologists, the system outperformed 136 of 157 in the melanoma detection challenge. They used 12,378 images from the ISIC dataset for training the network. A total of 100 images were utilised to compare the system’s performance against that of human specialists. They used the Local Outlier Factor (LOF) approach to find outliers. The specificity of the network was 86.5%, compared with only 60% for the human experts, while the sensitivity was 74.1% for both the doctors and the network.

Kassem et al. [9] studied skin lesion classification into eight classes. In their research, they employed the ISIC 2019 dataset for testing and training. They demonstrated that image augmentation and transfer learning can improve classification rates. Their results show 94.2% accuracy, 74.5% sensitivity, 96.5% specificity, 73.62% precision and 74.04% F1 score using image augmentation techniques. When they applied additional image augmentation steps and a modified GoogleNet architecture, the results obtained were 94.92% accuracy, 79.8% sensitivity, 97% specificity, 80.36% precision and 80.07% F1 score.

Kasani et al. [29] compared various deep learning architectures for melanoma diagnosis. They tested the most recent deep learning architectures for melanoma detection in dermoscopic images. They used image pre-processing to improve image quality and remove noise. Overfitting was reduced using the data augmentation approach. According to their research, the data augmentation and image preparation techniques considerably improve the classification rates. They were able to reach 93% precision, 92% accuracy and 92% recall.

The technique proposed by Salido et al. [12] automatically segmented the skin lesion after pre-processing the photos by removing undesirable elements such as hair. They constructed a deep CNN after eliminating artifacts and noise from the images. Their tests revealed that the processed photos had a high level of categorization accuracy. They were able to reach 93% accuracy and sensitivity in the 84-94% range.

Shahin et al. [10] proposed a framework based on a deep neural network that follows an ensemble approach to skin lesion classification by integrating the Inception V3 and ResNet-50 architectures. To train the algorithm, they used the ISIC 2018 dataset. The system was tested and validated on the same dataset of dermoscopic images. The experimental results achieved a validation accuracy of up to 89.9%. Sherif et al. [13] also employed a deep CNN for melanoma classification and detection. To train the system, they used the ISIC 2018 dataset. The system was tested and validated on the same dermoscopic image dataset. They were able to reach 96.67% accuracy.

Ünver et al. [30] used recent deep learning algorithms for melanoma detection. You Only Look Once (YOLO) [46] and the GrabCut algorithm [47] were used to detect and segment the melanoma-affected body parts. YOLO is used for detection; it gives strong detection results and is very fast and computationally inexpensive [46]. The GrabCut algorithm was then applied to segment the detected area in the image. They used the PH2 and ISBI 2017 datasets and obtained an accuracy of 93.39%.

Table 1 presents the summary of these works. It can be observed that most of the researchers have used CAM [37] as the model explainability method, with accuracy that is not very high. Only 1 study has considered the ISIC 2019 dataset (with a large number of images). This motivates our research to develop a robust XAI based model with the goal of achieving AI model transparency, traceability, and improvement in skin lesion classification.

III. PROPOSED METHODOLOGY
In this section, we explain the proposed methodology. The flow is shown in Fig. 3 and the steps are explained below.


FIGURE 3. The workflow of proposed methodology.

FIGURE 4. Data augmented preview.

A. DERMOSCOPY IMAGE PRE-PROCESSING
Due to the intricacy of digital pictures, the detection of malignancy by visual evaluation becomes complicated. As a result, effective image processing techniques are required to assist clinicians in properly diagnosing skin lesions. In this study, the training set contained more than 25,000 skin lesion images of different resolutions [9]. As the resolution of all lesion images is greater than 299 × 299, it was necessary to extract the region of interest (ROI) and get rid of unnecessary/redundant regions from each image. Therefore, these images are cropped automatically and processed before they are used in the classification algorithm. This pre-processing step is necessary to reduce the computation time and increase the effective performance and reliability of the classifier.

1) IMAGE RESAMPLING AND CROPPING
This step applies image resampling and cropping to the images. Image resampling is a technique used to manipulate the size of an image. Increasing the size of the image is called upsampling, while decreasing the size is called downsampling. These two techniques are essential for applications like image display, compression, and progressive transmission. During upsampling or downsampling, the two-dimensional (2D) representation is kept the same while the spatial resolution is increased or reduced, respectively. Cropping, on the other hand, is a technique used to find the ROI in an image by framing and clipping the area.

2) IMAGE RESIZING WITH ADDING ZERO-PADDING
The data obtained from the ISIC archive [9] is not always ready to feed directly into the algorithm, which requires structured, clean, and meaningful data. To overcome this problem, all images from the archive are resized to 224 × 224 without losing any feature. The pseudo-code for this process is as follows:
1) Identify which side of the image is shorter.
2) Find the difference between the two sides.
3) Take half of the difference.
4) Pad the short side with zeros, adding half of the difference at each end.
5) Resize the image to 224 × 224.
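
The five zero-padding steps listed above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: images are represented as nested lists of grayscale pixel values, the function names `pad_to_square` and `resize_nearest` are hypothetical, and nearest-neighbour sampling is an assumption, since the paper does not state which interpolation was used. When the difference between the two sides is odd, the extra row or column of zeros is placed on the second side, which is also an assumption.

```python
def pad_to_square(image):
    """Zero-pad the shorter dimension of a 2D image (list of rows) to make it square."""
    height, width = len(image), len(image[0])
    diff = abs(height - width)      # step 2: difference between the two sides
    before = diff // 2              # step 3: half of the difference
    after = diff - before           # the remainder goes on the opposite side
    if height < width:              # step 1: the height is the short side
        zero_row = [0] * width
        return ([list(zero_row) for _ in range(before)]
                + [list(row) for row in image]
                + [list(zero_row) for _ in range(after)])
    # The width is the short side (or the image is already square).
    return [[0] * before + list(row) + [0] * after for row in image]


def resize_nearest(image, size=224):
    """Step 5: resize a 2D image to size x size with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [[image[r * height // size][c * width // size] for c in range(size)]
            for r in range(size)]


# Example: a 2 x 4 image becomes 4 x 4 after padding, then 224 x 224.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8]]
square = pad_to_square(img)
resized = resize_nearest(square, 224)
```

Padding before resizing keeps the lesion's aspect ratio intact; resizing a rectangular image directly to 224 × 224 would stretch it. In practice an imaging library would replace the pure-Python loops.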


TABLE 2. ISIC dataset 2019 [9] distribution.

TABLE 3. Hyperparameters of the Resnet-18 classifier.

B. DATA AUGMENTATION
In this step, the data augmentation technique is employed. In image classification, this equates to rotating, flipping, and cropping the picture. The ISIC dataset was supplemented with several random modifications to make the most of our limited training samples and improve the model’s accuracy. Furthermore, data augmentation is intended to aid in preventing overfitting (a typical problem in ML with limited datasets, in which the model learns patterns that do not apply to new data) and, as a result, improve the model’s capacity to generalise. Model overfitting can also be avoided by using an early stopping criterion [48]. Fig. 4 shows the data augmentation of a few dataset instances.
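
The rotating, flipping, and cropping operations described above can be illustrated with a small sketch. The paper does not report the exact transformations or their parameters, so everything here is an assumption made for illustration: the function names are hypothetical, each transformation is applied with probability 0.5, rotation is restricted to 90 degrees, and the crop removes one row and one column. Images are again nested lists of pixel values.

```python
import random


def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top-bottom flip)."""
    return [list(row) for row in image[::-1]]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, rng):
    """Apply each transformation with probability 0.5, then take a random crop."""
    out = [list(row) for row in image]
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = flip_vertical(out)
    if rng.random() < 0.5:
        out = rotate_90(out)
    return random_crop(out, len(out) - 1, len(out[0]) - 1, rng)


# Example: one augmented variant of a 3 x 3 image, with a fixed seed.
sample = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
variant = augment(sample, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which makes it easier to compare training runs.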

C. FEATURES EXTRACTION USING RESNET-18
Existing algorithms require manual feature extraction and pre-processing, and calculate only numeric values. To bypass these cumbersome steps and let the algorithm perform the feature extraction itself, we use the transfer learning algorithm ResNet-18 [19], which is a specialized version of a CNN. The general architecture of the algorithm is shown in Fig. 5.

D. PREDICTION EXPLAINABILITY
In this step, the LIME framework is applied. LIME is an approach for explaining individual predictions that uses a local, interpretable model to approximate any black box ML model. We perturbed the original data points, fed them into the black box model, and then observed the outcomes. The technique then weights the perturbed data points based on their distance from the original point. Finally, it uses those sample weights to train a surrogate model, such as linear regression, on the dataset. The newly trained explanation model may then be used to explain each of the original data points.

IV. EXPERIMENTAL ANALYSIS
A. EXPERIMENTAL SETUP
1) DATASET
The developed model is evaluated on skin lesion classification using the ISIC 2019 dataset. This dataset is publicly available and comprises 25,331 RGB images. It is divided into 8 classes, namely: melanocytic nevus (NV), melanoma (MEL), benign keratosis (BKL), basal cell carcinoma (BCC), squamous cell carcinoma (SCC), vascular lesion (VASC), dermatofibroma (DF), and actinic keratosis (AKIEC). The images are distributed as NV: 12,875, MEL: 4,522, BKL: 2,624, BCC: 3,323, SCC: 628, VASC: 253, DF: 239 and AKIEC: 867. Each dataset image is labelled with one type of skin lesion (Table 2). In Fig. 6, we depict several forms of skin cancer. This dataset is one of the most difficult to categorise into eight classes because of the uneven number of images in each class.

2) PARAMETERS
Table 3 shows the hyperparameters of the Resnet-18 classifier used in the experiments.

3) PERFORMANCE MEASURES
To evaluate the performance of classifiers, common quantitative metrics are presented in this section. For classification problems, results are categorized as either normal or abnormal cases, named the positive class and the negative class, respectively. The prediction results can also be either true or false, implying a correct or an incorrect prediction, respectively. Thus, we can categorize classification results into the four possible states below, which together are commonly known as the confusion matrix [49].
i) True positive (TP): Correct prediction of positive class
ii) True negative (TN): Correct prediction of negative class
iii) False positive (FP): Incorrect prediction of positive class
iv) False negative (FN): Incorrect prediction of negative class
Based on the confusion matrix, the Accuracy, Precision, Recall and F1 score are calculated as below:

$$\text{Accuracy} = \frac{TP + TN}{FP + TN + TP + FN} \tag{3}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$

The F1 score is the harmonic mean of precision and recall:

$$F_1 = \left(\frac{\text{recall}^{-1} + \text{precision}^{-1}}{2}\right)^{-1} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{6}$$

4) EXPERIMENTAL ENVIRONMENT
In our experiments, it took about 24 hours to train the ResNet-18 model with NVIDIA GeForce GTX 1650 GPUs. All the experiments are implemented in Python, running on a personal computer with an Intel Core i5 3.2 GHz CPU and 16 GB RAM.
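
As a worked illustration of the performance measures in Eqs. (3) to (6), the following sketch computes all four from the confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical and are not results from the paper; the sketch also assumes that no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (fp + tn + tp + fn)           # Eq. (3)
    precision = tp / (tp + fp)                           # Eq. (4)
    recall = tp / (tp + fn)                              # Eq. (5)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (6)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

For a multi-class problem such as the eight lesion classes, these quantities are typically computed per class (treating that class as positive and all others as negative) and then averaged; the paper does not state which averaging scheme it uses.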


FIGURE 5. Resnet-18 transfer learning algorithm layers.

FIGURE 6. An illustration of ISIC 2019 skin lesions instances.

B. RESULTS AND DISCUSSION
Skin cancer detection is complicated by the irregular forms of skin lesions, the various colours of each skin, and the need to define the ROI on each dermoscopic picture. The detection of minute changes on the skin requires expertise in this field. However, the human eye may not always catch these tiny changes. Many lives can be saved by assisting doctors with computer vision and deep learning techniques. With this motivation, we studied skin cancer malignancy detection to classify skin lesions and identify malignant cases. The pre-training settings and post-training measurements of all experiments showed that skin cancer malignancy detection is a difficult task, and that generalizing a model for all cases requires image pre-processing techniques to be applied before the images are fed into any deep learning algorithm. We ran many experiments and tried various techniques to handle the complexity of the skin lesion classes.

Regarding model selection, we compare ResNet-18 with Inception v3 using the ISIC 2019 dataset. ResNet is one of the most powerful deep neural networks and has achieved excellent performance for classification problems [19]. Inception v3 [50] is a model pre-trained on the ImageNet dataset. It has also shown better performance for image classification tasks as compared to other deep learning algorithms [50]. The results indicate that ResNet-18 outperforms Inception v3 in terms of accuracy, precision, recall and F1 score, as shown in Fig. 7. Therefore, we select ResNet-18 as our final model for training purposes.

FIGURE 7. Models comparison using ISIC 2019 dataset.

In the first part of the experiments, 8000 images are used that were not pre-processed before being fed to the algorithm. The purpose is to examine the performance of the Resnet-18 algorithm in the presence of noise and other artifacts, to see how much noise it tolerates. Images were randomly split into training and testing subsets. We obtained an F1 score of 0.75 (75%) for classification.


FIGURE 8. Developed model working example.

FIGURE 9. Model learning rate vs loss.

In the second part of the experiments, 1600 pre-processed and augmented images are used for training, and 240 pre-processed and augmented images for testing. After this, the classification algorithm is trained. The performance measures were accuracy, precision, recall and F1 score, and their values were 94.47%, 93.57%, 94.01%, and 94.45%, respectively. Compared to the first part of the experiments, this shows higher average recall and F1 score values. This indicates that image pre-processing has a profound impact on the classification algorithm by making the ROI cleaner, more distinguishable, more obvious, and easier to capture, so that the algorithm can extract better features from the image and learn better.

A working example of the developed model is shown in Fig. 8. The visual representation of the results (Fig. 11) shows that our developed model detected each infected image correctly with 100% confidence. This result is a good indicator of the potential of such a technology to classify accurately and eventually help physicians increase their diagnostic prediction power. We also present the learning rate (log) against the loss (Fig. 9); it can be seen that as the learning rate increases, there is a point where the loss stops decreasing and starts to increase. The confusion matrix is shown in Fig. 10. It can be observed that the true positives for the 8 classes are AKIEC: 548, BCC: 544, BKL: 537, DF: 528, MEL: 515, NV: 559, SCC: 535, and VASC: 542. This means that our developed model has correctly identified the respective disease in a high percentage of cases.

FIGURE 10. Confusion matrix of our developed model.

V. THREATS TO VALIDITY
This study aims to help dermatologists in the early assessment of skin cancer. However, there are some limitations to this work. First, we have not considered larger or different datasets. Second, we have used only one pre-trained network in our work. Extending the model to incorporate more advanced pre-trained models could result in improved classification performance. Third, more training data could lead to better results. Resizing the images to very small patches could affect the classifier’s performance, as some useful information about the lesions may be lost when images are downsized. Balancing the dataset could also affect the classifier’s performance by decreasing the total number of samples available for training and validation.
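
The effect of balancing on the number of available samples can be made concrete with a short calculation. The paper does not say how the dataset was balanced, so this sketch assumes random undersampling of every class to the size of the smallest class; the class counts are the ISIC 2019 figures reported in Section IV, and the name `undersample_counts` is hypothetical.

```python
# ISIC 2019 class counts as reported in the dataset description.
CLASS_COUNTS = {
    "NV": 12875, "MEL": 4522, "BKL": 2624, "BCC": 3323,
    "SCC": 628, "VASC": 253, "DF": 239, "AKIEC": 867,
}


def undersample_counts(counts):
    """Return per-class counts after undersampling every class to the smallest one."""
    smallest = min(counts.values())
    return {label: smallest for label in counts}


total_before = sum(CLASS_COUNTS.values())       # 25,331 images in total
balanced = undersample_counts(CLASS_COUNTS)
total_after = sum(balanced.values())            # 8 classes x 239 images
fraction_kept = total_after / total_before
```

Under this assumption only 1,912 of the 25,331 images, roughly 7.5%, would remain, which illustrates why balancing by undersampling can limit the data available for training and validation.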


FIGURE 11. Infected images detection by our developed model with 100% confidence.

VI. CONCLUSION AND FUTURE WORK
Skin cancer is the most common type of cancer and a major health and economic concern. Dermatologists examine patients individually with the naked eye or a magnifying glass to diagnose skin cancer. However, with the advancements in the field of ML, early skin cancer detection can be made more accurate through skin lesion classifiers. Consequently, an ML driven solution has the potential to save many lives by assisting in the early detection of malignant lesions, assisting in decision-making, reducing diagnostic costs, and reducing money spent on treatment. This offers great help for doctors and patients and can
be offered through smartphone apps, websites, or hospital stations.

In this paper, transfer learning and the pre-trained deep neural network Resnet-18 are used to develop the ML model using the ISIC 2019 dataset. This model is capable of accurately classifying eight different types of lesions with accuracy, precision, recall, and F1 score of 94.47%, 93.57%, 94.01%, and 94.45%, respectively. Moreover, the LIME framework is used to present useful explanations to support rational decisions. The visual explanations are capable of demonstrating the model’s good generalisation as well as biases learned from the outlier images. Moreover, these insights enable researchers and field experts to better understand the rationale associated with skin lesion classification resulting from the black-box model’s inner workings.

It is worth mentioning that the availability and quality of the dataset are critical for training more accurate ML models. The ISIC 2019 dataset, used in this paper, comprises 25,331 images with 8 skin lesion classes. Due to privacy, these

[8] D. Thanh and S. Dvoenko, “A denoising of biomedical images,” Int. Arch. Photogramm., Remote Sens. Spatial Inf. Sci., vol. 40, no. 5, p. 73, 2015.
[9] M. A. Kassem, K. M. Hosny, and M. M. Fouad, “Skin lesions classification into eight classes for ISIC 2019 using deep convolutional neural network and transfer learning,” IEEE Access, vol. 8, pp. 114822–114832, 2020.
[10] A. H. Shahin, A. Kamal, and M. A. Elattar, “Deep ensemble learning for skin lesion classification from dermoscopic images,” in Proc. 9th Cairo Int. Biomed. Eng. Conf. (CIBEC), Dec. 2018, pp. 150–153.
[11] Y. Li and L. Shen, “Skin lesion analysis towards melanoma detection using deep learning network,” Sensors, vol. 18, no. 2, p. 556, Feb. 2018.
[12] J. A. A. Salido and C. Ruiz, “Using deep learning for melanoma detection in dermoscopy images,” Int. J. Mach. Learn. Comput., vol. 8, no. 1, pp. 61–68, Feb. 2018.
[13] F. Sherif, W. A. Mohamed, and A. Mohra, “Skin lesion analysis toward melanoma detection using deep learning techniques,” Int. J. Electron. Telecommun., vol. 65, no. 4, pp. 597–602, 2019.
[14] T. A. Kumar, R. Rajmohan, M. Pavithra, S. A. Ajagbe, R. Hodhod, and T. Gaber, “Automatic face mask detection system in public transportation in smart cities using IoT and deep learning,” Electronics, vol. 11, no. 6, p. 904, Mar. 2022.
[15] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Netw., vol. 61, pp. 85–117, Oct. 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0893608014002135
[16] K. Weiss, T. M. Khoshgoftaar, and D. Wang, “A survey of transfer learning,” J. Big Data, vol. 3, no. 1, pp. 1–40, Dec. 2016.
[17] T. J. Brinker, A. Hekler, A. H. Enk, J. Klode, A. Hauschild, C. Berking,
datasets require continuous enrichment with patients consent B. Schilling, S. Haferkamp, D. Schadendorf, T. Holland-Letz, J. S. Utikal,
which is not obvious. The proposed approach where ML and C. von Kalle, ‘‘Deep learning outperformed 136 of 157 dermatologists
model is complemented with XAI helps the dermatologist in a head-to-head dermoscopic melanoma image classification task,’’ Eur.
J. Cancer, vol. 113, pp. 47–54, May 2019.
with a visual rational to identify new classes and enrich the [18] F. Doshi-Velez and B. Kim, ‘‘Towards a rigorous science of interpretable
existing datasets with good examples for improved perfor- machine learning,’’ 2017, arXiv:1702.08608.
mance in earliest skin lesions detection. This is a signifi- [19] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image
recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
cant contribution in not only improving skin cancer detection Jun. 2016, pp. 770–778.
accuracy but also in identifying the new classes. [20] T. Chowdhury, A. R. S. Bajwa, T. Chakraborti, J. Rittscher, and U. Pal,
As a future work, a more robust model can be developed ‘‘Exploring the correlation between deep learned and clinical features in
melanoma detection,’’ in Proc. Annu. Conf. Med. Image Understand. Anal.
that considers other diseases, as well as opposing examples Cham, Switzerland: Springer, 2021, pp. 3–17.
such as healthy skin, fingers, hair, nose, eyes, and background [21] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau,
and S. Thrun, ‘‘Dermatologist-level classification of skin cancer with deep
objects. This addition will help the model in better gener-
neural networks,’’ Nature, vol. 542, no. 7639, pp. 115–118, 2017.
alising features association with a given lesion while ignor- [22] W. Li, J. Zhuang, R. Wang, J. Zhang, and W.-S. Zheng, ‘‘Fusing metadata
ing adjacent features. Moreover, gathering written reports of and dermoscopy images for skin disease diagnosis,’’ in Proc. IEEE 17th
Int. Symp. Biomed. Imag. (ISBI), Apr. 2020, pp. 1996–2000.
lesion observations, both in technical and non-technical lan- [23] F. Nunnari, M. A. Kadir, and D. Sonntag, ‘‘On the overlap between
guages, is another task that would lead to the adoption of this grad-CAM saliency maps and explainable visual features in skin cancer
model. This may also help to create a model to generate image images,’’ in Proc. Int. Cross-Domain Conf. Mach. Learn. Knowl. Extrac-
tion. Cham, Switzerland: Springer, 2021, pp. 241–253.
captions to serve as an image explanation which is important [24] M. Sadeghi, P. K. Chilana, and M. S. Atkins, ‘‘How users perceive content-
for the decision being made. based image retrieval for identifying skin images,’’ in Understanding and
Interpreting Machine Learning in Medical Image Computing Applications.
REFERENCES Cham, Switzerland: Springer, 2018, pp. 141–148.
[1] R. L. Siegel, K. D. Miller, A. G. Sauer, S. A. Fedewa, L. F. Butterly, [25] Y. Xie, J. Zhang, Y. Xia, and C. Shen, ‘‘A mutual bootstrapping model for
J. C. Anderson, A. Cercek, R. A. Smith, and A. Jemal, ‘‘Colorectal cancer automated skin lesion segmentation and classification,’’ IEEE Trans. Med.
statistics, 2020,’’ CA A, Cancer J. Clinicians, vol. 70, no. 3, pp. 145–164, Imag., vol. 39, no. 7, pp. 2482–2493, Dec. 2020.
2020. [26] J. Yang, F. Xie, H. Fan, Z. Jiang, and J. Liu, ‘‘Classification for dermoscopy
[2] S. A. Ajagbe, K. A. Amuda, M. A. Oladipupo, F. A. Oluwaseyi, and K. images using convolutional neural networks based on region average pool-
I. Okesola, ‘‘Multi-classification of Alzheimer disease on magnetic res- ing,’’ IEEE Access, vol. 6, pp. 65130–65138, 2018.
onance images (MRI) using deep convolutional neural network (DCNN) [27] K. Young, G. Booth, B. Simpson, R. Dutton, and S. Shrapnel, ‘‘Deep neural
approaches,’’ Int. J. Adv. Comput. Res., vol. 11, no. 53, p. 51, 2021. network or dermatologist?’’ in Interpretability of Machine Intelligence in
[3] C. Barata, J. S. Marques, and M. E. Celebi, ‘‘Deep attention model for the Medical Image Computing and Multimodal Learning for Clinical Decision
hierarchical diagnosis of skin lesions,’’ in Proc. IEEE/CVF Conf. Comput. Support. Cham, Switzerland: Springer, 2019, pp. 48–55.
Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2019, pp. 1–9. [28] H. Zunair and A. B. Hamza, ‘‘Melanoma detection using adversarial train-
[4] H. P. Soyer, G. Argenziano, V. Ruocco, and S. Chimenti, ‘‘Dermoscopy ing and deep transfer learning,’’ Phys. Med. Biol., vol. 65, no. 13, Jul. 2020,
of pigmented skin lesions*(part ii),’’ Eur. J. Dermatol., vol. 11, no. 5, Art. no. 135005, doi: 10.1088/1361-6560/ab86d3.
pp. 483–498, 2001. [29] S. H. Kassani and P. H. Kassani, ‘‘A comparative study of deep learning
[5] B. Ankad, P. Sakhare, and M. Prabhu, ‘‘Dermoscopy of non-melanocytic architectures on melanoma detection,’’ Tissue Cell, vol. 58, pp. 76–83,
and pink tumors in Brown skin: A descriptive study,’’ Indian J. Der- Jun. 2019.
matopathol. Diagnostic Dermatol., vol. 4, no. 2, p. 41, 2017. [30] H. M. Ünver and E. Ayan, ‘‘Skin lesion segmentation in dermoscopic
[6] X. Li, J. Wu, E. Z. Chen, and H. Jiang, ‘‘From deep learning towards images with combination of Yolo and GrabCut algorithm,’’ Diagnostics,
finding skin lesion biomarkers,’’ in Proc. 41st Annu. Int. Conf. IEEE Eng. vol. 9, no. 3, p. 72, Jul. 2019.
Med. Biol. Soc. (EMBC), Jul. 2019, pp. 2797–2800. [31] B. H. M. van der Velden, H. J. Kuijf, K. G. A. Gilhuijs, and M. A. Viergever,
[7] C. Shorten and T. M. Khoshgoftaar, ‘‘A survey on image data augmentation ‘‘Explainable artificial intelligence (XAI) in deep learning-based medical
for deep learning,’’ J. Big Data, vol. 6, no. 1, pp. 1–48, Dec. 2019. image analysis,’’ Med. Image Anal., vol. 79, Jul. 2022, Art. no. 102470.
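The LIME explanations referred to in the conclusion rest on a simple idea: perturb the input by hiding image segments, query the black-box model on each perturbed copy, and fit a proximity-weighted linear surrogate whose coefficients rank the segments. The sketch below illustrates that idea in plain Python under several stated assumptions. It is not the `lime` package's API and not the authors' code. The function names, the toy classifier, the distance measure (fraction of hidden segments), and the kernel width are all hypothetical, and the regularisation and feature selection used by the real framework are omitted.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def explain_locally(black_box, num_segments, num_samples=500,
                    kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate around the unperturbed input.

    Each sample is a binary mask: 1 keeps a segment, 0 hides it.
    Returns one coefficient per segment; larger means more local influence.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(num_samples):
        mask = [rng.randint(0, 1) for _ in range(num_segments)]
        hidden_fraction = 1.0 - sum(mask) / num_segments
        # Exponential kernel: masks closer to the original input count more.
        weights.append(math.exp(-(hidden_fraction ** 2) / kernel_width ** 2))
        rows.append(mask + [1.0])  # trailing 1.0 is the intercept column
        targets.append(black_box(mask))
    d = num_segments + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             for j in range(d)] for i in range(d)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
            for i in range(d)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[:-1]  # drop the intercept


def toy_classifier(mask):
    """Hypothetical stand-in for a CNN's class probability."""
    return 0.1 + 0.7 * mask[2] + 0.1 * mask[0]


segment_weights = explain_locally(toy_classifier, num_segments=4)
top_segment = max(range(4), key=lambda i: segment_weights[i])
```

Because the toy classifier is itself linear in the mask, the surrogate recovers its coefficients and segment 2 comes out as the most influential. For a real CNN the surrogate is only a local approximation, which is why the kernel down-weights heavily perturbed samples.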


NATASHA NIGAR received the Ph.D. degree from the School of Computer Science, University of Birmingham, U.K., in 2021. From 2008 to 2009, she was a Lab Engineer at the National University of Computer and Emerging Sciences, Lahore, Pakistan. She was a Software Engineer at Palmchip Pvt. Ltd., Lahore, from 2009 to 2011, and a Senior Software Quality Assurance Engineer at Netsol Technologies, Lahore, from 2011 to 2013. She is currently working as an Assistant Professor with the Department of Computer Science, University of Engineering and Technology. She was awarded the Faculty Development Program Scholarship to pursue her Ph.D. studies. Her research interests include computational intelligence, optimization in dynamic and uncertain environments, and machine learning.

MUHAMMAD UMAR received the M.Sc. degree from the Department of Computer Science, University of Engineering and Technology, Lahore, Pakistan, in 2020. From 2018 to 2020, he worked as a Senior Software Engineer at NETSOL Technologies Inc., Lahore. He worked as a Software Engineer at B.I.S.E, Lahore, from 2017 to 2018, and at Ebryx Pvt Ltd., Lahore, from 2016 to 2017. He is currently working as a Software Developer with TEK Headquarters, Irvine, CA, USA. He has been awarded the Outstanding Performance Award and the Certificate of Appreciation during his professional career. His research interests include artificial intelligence, machine learning, and cloud computing.

MUHAMMAD KASHIF SHAHZAD received the bachelor's degree in engineering from UET, Lahore, Pakistan, in 2000, and the master's and Ph.D. degrees in industrial systems engineering from the University of Grenoble, France, in 2008 and 2012, respectively. He is currently working as the Chief Technical Officer (CTO) with Power Information Technology Company (PITC), Ministry of Energy, Power Division, Government of Pakistan. He has vast experience of working in the large-scale European Research and Development projects IMPROVE and INTEGRATE. He specializes in designing and delivering technology-driven smart grid solutions and is working with USAID in developing solutions to improve Pakistan's power sector in a deregulated market. He has 38 publications and two book chapters. Moreover, he has more than 20 years of professional experience in designing, business process re-engineering, and managing large-scale software development projects. His research interests include data models interoperability, advanced software engineering, technology, smart grid solutions, and engineering data management.

SHAHID ISLAM received the B.S. and M.S. degrees in computer science from the University of Engineering and Technology, Lahore, Pakistan, in 2003 and 2008, respectively. He is currently working as an Assistant Professor with the Rachna College of University of Engineering and Technology. His research interests include cloud computing, machine learning, semantic web, m-learning, and intelligent agent applications.

DOUHADJI ABALO received the Ph.D. degree in mathematics from the University of Lomé, in 2021. Currently, he is working as an Assistant Professor of mathematics (harmonic analysis) with the University of Lomé. His research interests include topology, pure mathematics, matrix theory, and harmonic analysis.
