Neural Computing and Applications (2023) 35:815–853

https://doi.org/10.1007/s00521-022-07762-9

ORIGINAL ARTICLE

Skin cancer diagnosis based on deep transfer learning and sparrow search algorithm

Hossam Magdy Balaha¹ · Asmaa El-Sayed Hassan²

Received: 19 January 2022 / Accepted: 1 September 2022 / Published online: 23 September 2022
© The Author(s) 2022

Abstract
Skin cancer affects the lives of millions of people every year, as it is considered the most popular form of cancer. In the USA alone, approximately three and a half million people are diagnosed with skin cancer annually. The survival rate diminishes steeply as the skin cancer progresses. Despite this, it is an expensive and difficult procedure to discover this cancer type in the early stages. In this study, a threshold-based automatic approach for skin cancer detection, classification, and segmentation utilizing a meta-heuristic optimizer named sparrow search algorithm (SpaSA) is proposed. Five U-Net models (i.e., U-Net, U-Net++, Attention U-Net, V-Net, and Swin U-Net) with different configurations are utilized to perform the segmentation process. Besides this, the meta-heuristic SpaSA optimizer is used to perform the optimization of the hyperparameters using eight pre-trained CNN models (i.e., VGG16, VGG19, MobileNet, MobileNetV2, MobileNetV3Large, MobileNetV3Small, NASNetMobile, and NASNetLarge). The dataset is gathered from five public sources from which two types of datasets are generated (i.e., 2-classes and 10-classes). For the segmentation, concerning the "skin cancer segmentation and classification" dataset, the best reported scores by U-Net++ with DenseNet201 as a backbone architecture are 0.104, 94.16%, 91.39%, 99.03%, 96.08%, 96.41%, 77.19%, and 75.47% in terms of loss, accuracy, F1-score, AUC, IoU, dice, hinge, and squared hinge, respectively, while for the "PH2" dataset, the best reported scores by the Attention U-Net with DenseNet201 as a backbone architecture are 0.137, 94.75%, 92.65%, 92.56%, 92.74%, 96.20%, 86.30%, 92.65%, 69.28%, and 68.04% in terms of loss, accuracy, F1-score, precision, sensitivity, specificity, IoU, dice, hinge, and squared hinge, respectively. For the "ISIC 2019 and 2020 Melanoma" dataset, the best reported overall accuracy from the applied CNN experiments is 98.27% by the MobileNet pre-trained model. Similarly, for the "Melanoma Classification (HAM10K)" dataset, the best reported overall accuracy from the applied CNN experiments is 98.83% by the MobileNet pre-trained model. For the "skin diseases image" dataset, the best reported overall accuracy from the applied CNN experiments is 85.87% by the MobileNetV2 pre-trained model. After computing the results, the suggested approach is compared with 13 related studies.

Keywords Skin cancer · Melanoma cancer · Non-melanoma cancer · Convolution neural network (CNN) · Deep learning (DL) · Meta-heuristic optimization · Segmentation · Sparrow search algorithm (SpaSA)

1 Introduction

Skin cancer is the abnormal growth of cells located in the skin that frequently develops on the sun-exposed (i.e., UV light) regions of the skin. However, skin regions that are not regularly exposed to sunlight may also develop cancer. Around the world, skin cancer is the most prevalent cancer type. The primary types of skin cancer include "basal cell carcinoma," "squamous cell carcinoma," and "melanoma." Every year, the number of diagnosed cases is over 3.5 million in the United States, which exceeds the counts of lung, breast, and colon cancers combined. Every 57 seconds, one person is diagnosed with skin cancer [48].

Skin cancer can be categorized into two major categories (i.e., melanoma and non-melanoma) concerning the cell type that developed cancer. The different types of each category, statistics, risk factors, diagnosis, and treatment

Corresponding author: Hossam Magdy Balaha, [email protected]

1 Computer Engineering and Systems Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt
2 Mathematics and Engineering Physics Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt


will be discussed in Sect. 2.1. Early detection and screening of skin cancer can lead to a full recovery, as it is with every cancer type. Different machine and deep learning architectures and approaches have been proposed to perform the task of skin cancer detection, classification, and segmentation, e.g., support vector machines (SVM) [24], fuzzy C-means [116], recurrent neural networks [111], and deep neural networks [35].

The current study concentrates on skin cancer detection, classification, and segmentation. The classification is accomplished using 8 pre-trained convolution neural network (CNN) models. They are VGG16, VGG19, MobileNet, MobileNetV2, MobileNetV3Large, MobileNetV3Small, NASNetMobile, and NASNetLarge. The segmentation is done by employing five different U-Net models. They are U-Net, U-Net++, Attention U-Net, V-Net, and Swin U-Net. Additionally, the CNN hyperparameters optimization is done using the sparrow search algorithm (SpaSA) to reach the performance metrics of the state-of-the-art (SOTA).

1.1 Paper contributions

The current study contributions can be recapped in the next points:

• Presenting a survey on the skin disorders with a graphical taxonomy.
• Performing the skin cancer segmentation task using U-Net, U-Net++, Attention U-Net, V-Net, and Swin U-Net models.
• Performing the skin cancer classification task using 8 pre-trained CNN models (i.e., VGG16, VGG19, MobileNet, MobileNetV2, MobileNetV3Large, MobileNetV3Small, NASNetMobile, and NASNetLarge).
• Utilizing the SpaSA approach for the hyperparameters optimization processes.
• Reporting the SOTA performance metrics and comparing them with different related studies and approaches.

1.2 Paper organization

The rest of the current study is organized as follows: Section 2 presents a survey of skin disorders. Section 3 presents and summarizes the related studies. In Section 4, the background is discussed; it covers deep learning classification, parameters optimization, transfer learning, data scaling, data augmentation, segmentation, deep learning segmentation, meta-heuristic optimization, and performance metrics. In Section 5, the methodology, datasets acquisition and pre-processing, segmentation phase, learning and optimization, and the overall pseudo-code are discussed. Section 6 presents the details and discussions of the experiments and results. Section 7 presents the study limitations, and finally, Section 8 concludes the paper and presents the future work.

2 Skin disorders medical survey

The largest organ in the body is the skin [133]. It helps regulate body temperature, protects against injuries (and infections), produces vitamin D, and stores fat (and water). It consists of three main layers: (1) the epidermis (i.e., the skin outer layer), (2) the dermis (i.e., the skin inner layer), and (3) the hypodermis (i.e., the skin deepest layer).

Skin disorders are the conditions that affect the layers of the skin [57]. They can cause rashes, sores, itching, or other changes. Some skin conditions can be caused by lifestyle factors, while others can be a result of genetic factors. Skin disorders differ widely in severity and symptoms: they can have genetic or situational causes; be permanent or temporary; painful or painless; and life-threatening or minor [3].

2.1 Skin disorders taxonomy

Skin-related disorders can be classified into permanent and temporary. Temporary disorders include (1) acne, (2) contact dermatitis, (3) cold sore, (4) keratosis pilaris, (5) blister, (6) hives, (7) sunburn, (8) actinic keratosis, (9) carbuncle, (10) latex allergy, (11) cellulitis, (12) measles, (13) chickenpox, and (14) impetigo. Permanent disorders can be divided into skin cancer and skin diseases (i.e., not cancer). Skin diseases are (1) lupus, (2) eczema, (3) rosacea, (4) seborrheic dermatitis, (5) psoriasis, (6) vitiligo, and (7) melasma. As mentioned before, melanoma and non-melanoma are the two main categories of skin cancer [101]. Types of melanoma cancers incorporate (1) superficial spreading melanoma, (2) nodular melanoma, (3) acral lentiginous melanoma, (4) lentigo maligna melanoma, and (5) other rare melanomas. Squamous cell carcinoma, basal cell carcinoma, Merkel cell cancer, and cutaneous lymphomas are types of non-melanoma skin cancer [8, 45]. Figure 1 shows a taxonomy of skin disorders with graphical samples.

2.2 Melanoma skin cancer

Melanoma is one of the deadliest cancers in the world. It may spread to other body parts if it has not been discovered and treated in an early phase. In the following subsections, the melanoma development, statistics, and other information related to it will be presented.


Fig. 1 Skin disorders taxonomy graphical summary

2.2.1 Melanoma development

The most profound epidermis layer, found exactly above the dermis, contains the melanocyte cells which produce the color (i.e., pigment) of the skin. When healthy melanocytes get out of control, a cancerous tumor (i.e., melanoma) is created [25, 102]. This tumor can extend to further body parts. Occasionally, normal moles or nevi already located on the skin can form a melanoma. In such a case, the mole undergoes changes that are usually visible (e.g., the shape, border, size, or color of the mole changing) [92]. The scalp, face, trunk or torso (i.e., abdomen, back, and chest), arms, and legs are the most common melanoma locations.


Nonetheless, it can develop on the neck, head, and areas that are not exposed to the sun (i.e., anywhere on the body). Cutaneous melanoma, which develops first in the skin, is considered the most common type of melanoma. There are three popular types of it. First, "superficial spreading melanoma" is the most popular one; it represents up to 70% of melanomas and commonly develops from a present mole. The second one is "lentigo maligna melanoma," which older people are more likely to develop. Frequently, it begins on skin areas that are often sun-exposed. About 15% of melanomas are diagnosed as "nodular melanoma," representing the third melanoma type [83]. It usually arises as a bump on the skin.

More rarely, the mouth, the mucous membranes that line the gastrointestinal tract, and a woman's vagina can develop melanoma. Also, melanoma can develop in the eye [26]. It is worth mentioning that the rare types are reported graphically in Fig. 1.

2.2.2 Melanoma statistics

In the USA, an estimated 106,110 adults (i.e., 62,260 men and 43,850 women) are diagnosed with invasive melanoma in 2021 [29, 114, 115]. Among men and women, the fifth most common type of cancer is melanoma. It is 20 times less common in black people than in white people and is one of the most regular types of cancer diagnosed among young adults, specifically women. The average diagnostic age for it is 65. In 2020, the predicted melanoma cases diagnosed in people aged 15 to 29 were about 2,400. Over the past three decades, the number of diagnosed cases of this type of cancer has increased sharply [44].

The rates increased annually by around 2% from 2008 to 2017. However, the number of melanoma-diagnosed teenagers aged 15 to 19 between 2007 and 2016 declined by 6% a year. The number of adults in their 20's decreased by 3%. Most of the deaths associated with skin cancer are caused by melanoma, representing approximately 75% of deaths. 7,180 deaths (i.e., 4,600 men and 2,580 women) are estimated to occur in 2021 from melanoma. However, deaths from melanoma have decreased from 2014 to 2018 by almost 5% in adults older than 50 and by 7% otherwise [114].

2.2.3 Melanoma risk factors

Risk factors for melanoma include indoor tanning (i.e., people using sun lamps, tanning beds, or tanning parlors are more likely to grow all types of skin cancer) [95], moles, and fair skin (i.e., people with blue eyes, blond or red hair, and freckles). Studies show that 10% of melanomas may be linked to genetic factors or conditions [74]. Some inherited genetic conditions, previous skin cancer, race or ethnicity, and a weakened or suppressed immune system are also risk factors for melanoma [84].

2.2.4 Melanoma early recognition

Recognition of the early warning signs [84], including new skin growth, a suspicious change in an existing mole or nevus, and a sore that does not heal within two weeks, is very important. Changes in the mole size, shape, feel, or color are often the initial and most important warning signs of melanoma [41, 84, 92]. A guide to the familiar signs of melanoma is known as the "ABCDE" rule [125] and can be summarized as follows:

• A is for asymmetry: One half of a nevus or mole does not match the other.
• B is for border: The edges are blurred, notched, ragged, or irregular.
• C is for color: The color of the mole varies and may have black, brown, and tan shades and white, gray, red, or blue areas.
• D is for diameter: The spot is larger than 6 millimeters across (i.e., about 0.25 inch), although melanomas sometimes can be more diminutive than this.
• E is for evolving: The size, color, or shape of a mole is altered. Additionally, when an existing nevus develops melanoma, its texture becomes hard or lumpy.

2.2.5 Melanoma diagnosis and treatment

For melanoma, a biopsy from the lesion (i.e., the suspicious area of skin) is the only sure way to diagnose the cancer [43]. During a biopsy, a sample of tissue is taken to be tested in a laboratory. Computed tomography scan, ultrasound, positron emission tomography scan, and magnetic resonance imaging are some of the other tests that can be done to diagnose and define the stage of melanoma [47].

Treatment recommendations depend on numerous factors, including the stage of the melanoma, the thickness of the initial melanoma, whether the cancer has grown or not, the rate of melanoma growth, the presence of specified genetic changes in the affected cells, and some other medical circumstances [84]. To obtain a suitable treatment arrangement, potential side effects, the overall health, and the preferences of the patient are considered [122].

For people with local melanoma and most people with regional melanoma, the main treatment is surgery. Radiation therapy is another treatment option that employs X-rays or other particles with high energy to damage cancer cells [23]. After surgery, it is common to prescribe radiation therapy to avert the cancer from coming back (i.e., recurrence).


Other treatment options involve systemic therapy, which comprises immunotherapy, targeted therapy, and chemotherapy [112].

2.3 Non-melanoma skin cancer

When healthy cells of the skin mutate and grow out of control, a tumor mass is formed [132]. Non-melanoma skin cancer can be partitioned into three main types (i.e., "basal cell carcinoma," "squamous cell carcinoma," and "Merkel cell cancer").

2.3.1 Non-melanoma development

Basal cells can be defined as the round-shaped cells existing in the lower epidermis. This cell type develops about 80% of non-melanoma cancer, defined as "basal cell carcinomas" [103]. They mostly occur on the head and neck, and the main cause is exposure to the sun. This type of skin cancer rarely expands to other body parts as it usually grows gradually. The epidermis is mostly formed of flat, scale-shaped cells called squamous cells. Approximately 20% of skin cancers arise from these cells and are named "squamous cell carcinomas" [103]. Sun exposure is the main cause of it, so it is diagnosed in many skin regions. Also, skin that has been exposed to X-rays, burned, or damaged by chemicals can develop this type of carcinoma. The percentage of squamous cell carcinomas that expand to other body parts ranges from 2% to 5%. "Merkel cell cancer" is a fast-growing and highly aggressive cancer [96]. It initiates at hormone-producing cells just below the hair follicles and the skin. It is commonly discovered in the head and neck area. It is worth mentioning that the rare types are reported graphically in Fig. 1.

2.3.2 Non-melanoma statistics

In the USA alone, some people are diagnosed with more than one skin cancer type, so 3.3 million people are estimated to be diagnosed with 5.4 million cases of basal and squamous cell carcinoma [103]. For several years, non-melanoma cancer cases have been increasing. The causes of this increase are longer life spans, increased sun exposure, and earlier detection of the disease. When compared to each other, basal cell carcinoma is more popular than squamous cell carcinoma. In recent years, the rate of deaths from these skin cancers has decreased. Every month, more than 5,400 people worldwide die of non-melanoma skin cancer [49].

2.3.3 Non-melanoma risk factors

Similar to melanoma, indoor tanning, ultraviolet light exposure, and light-colored skin make people more likely to develop non-melanoma skin cancer [81]. The risk of getting non-melanoma skin cancers rises with age. Women are less likely to develop this type of cancer than men. Exposure to considerable amounts of certain chemicals such as coal tar, arsenic, and paraffin raises the chance of developing skin cancer [70]. People with smoking habits are more likely to develop squamous cell cancer, particularly on the lips.

2.3.4 Non-melanoma diagnosis and treatment

Since it is rare for non-melanoma cancer to expand, a biopsy is usually the only test required to analyze and acquire the stage of cancer [81]. As mentioned in Sect. 2.2.5, a biopsy is a small amount of tissue extracted for testing beneath a microscope. For non-melanoma skin cancer, surgery is the main treatment [81]. It involves extracting the cancerous part and the surrounding skin. Other treatments include anti-cancer creams, freezing (i.e., cryotherapy), photodynamic therapy, and radiotherapy.

3 Related studies

Research in the domain of melanoma detection, segmentation, and recognition is still ongoing. Many automated approaches and techniques have been proposed to assist in computer-aided diagnosis. Previous related studies can be categorized into two main classifications: machine learning (ML) and deep learning (DL) classification techniques.

3.1 Classical machine learning-based approaches

The classical ML algorithms consist of many steps, e.g., pre-processing, feature extraction and reduction, and classification [91]. The accuracy of classification is based on the extracted features, so feature extraction is a key step. There are two main types of extracted features: high-level (i.e., local) and low-level (i.e., global) features [121].

In Pugazhenthi et al. [98], a gray-level co-occurrence matrix (GLCM) is employed to extract the texture features, e.g., contrast, entropy, energy, and inverse difference moment, from the segmented images. Then, these features were utilized to recognize the skin disease and classify it as melanoma, leprosy, or eczema using decision trees. An accuracy of 87% was obtained.

Arivuselvam et al. [9] used a fuzzy clustering algorithm, and the features were extracted from the input images by


GLCM and a Gabor filter, in which features such as size, color, and texture were extracted. The SVM classifier was then utilized to calculate the feature values of 1,500 dataset images and classify them.

In Khan et al. [71], a Gaussian filter is utilized to remove the noise from the images of the skin lesion, followed by segmenting out the lesion by using an enhanced K-means clustering, and a unique hybrid super-feature vector is formed. For the classification, an SVM is applied. Their proposed approach was evaluated using the DERMIS dataset, which has 397 skin cancer images; 251 were nevus and 146 were melanoma. An accuracy of 96% was obtained.

Astorino et al. [10] proposed a multiple instance learning algorithm. It was applied on 160 clinical images divided into 80 melanomas and 80 nevi. Their methodology obtained an accuracy of 92.50%, a sensitivity of 97.50%, and a specificity of 87.50%.

Balaji et al. [22] used a dynamic graph cut algorithm to perform the skin lesion segmentation, followed by a Naive Bayes classifier for skin disorder classification. Their proposed method was tested on the ISIC 2017 dataset, and they achieved an accuracy of 94.3% for benign cases, 91.2% for melanoma, and 92.9% for keratosis.

In Murugan et al. [90], the watershed segmentation method was implemented to perform the segmentation task. The resultant segments were subjected to a feature extraction method in which the ABCD rule, GLCM, and the shape of the extracted features were utilized for classification. They used classifiers including K-nearest neighbor (KNN), random forest, and SVM. An accuracy of 89.43%, a sensitivity of 91.15%, and a specificity of 87.71% were achieved.

İlkin et al. [65] used the SVM algorithm as a classifier that utilizes a Gaussian radial basis function enhanced by the bacterial colony algorithm. The proposed model was trained and evaluated using two datasets, namely ISIC and PH2. AUC values of 98% and 97% were obtained for ISIC and PH2, respectively.

3.2 Deep learning-based approaches

In the late 1990s, a shift from fully human-designed systems to computer-trained systems was delivered. This was done using sample data, from which the vectors of handcrafted features were extracted [50]. The next step was to allow the computers to figure out how to extract the suitable features from the input to perform the required task. The concept of learning how to extract features automatically from input data is the essence of numerous DL algorithms.

In Adegun and Viriri [2], an improved encoder–decoder network with sub-networks linked through a series of skip connections was utilized for feature extraction and learning. Their algorithm was evaluated on two public datasets, PH2 and the International Symposium on Biomedical Imaging (ISBI) 2017. For the ISBI 2017 dataset, the reported accuracy and dice coefficient were 95% and 92%, respectively, whereas the reported accuracy and dice coefficient were 95% and 93% for the PH2 dataset.

Albahli et al. [5] used YOLOv4-DarkNet and active contour for localization and segmentation of melanoma. Their algorithm was evaluated on ISIC 2016 and 2018. The reported dice score and Jaccard coefficient were 1 and 0.989.

Shan et al. [113] proposed the FC-DPN segmentation topology. It was constructed over a dual-path and fully convolutional network. For the modified ISIC 2017 challenge test dataset, their proposed method gained a Jaccard index and an average dice coefficient of 80.02% and 88.13%, respectively, while a Jaccard index and an average dice coefficient of 83.51% and 90.26% were obtained for the PH2 dataset.

Junayed et al. [67] introduced a CNN-based model to categorize skin cancer. Initially, a dataset was collected and divided into four categories of skin cancer images. Then, augmentation techniques were applied to increase the dataset size. In the test phase, their proposed model achieved a 95.98% accuracy, exceeding the GoogleNet and MobileNet models by 1.76% and 1.12%, respectively.

Alheejawi et al. [6] suggested a DL-based technique to segment regions of melanoma. Results obtained using a small dataset of melanoma images showed that the suggested approach could perform the segmentation with a dice coefficient of around 85%. Their method is proper for clinical examination as it had a short execution time with a fast turnaround time.

Vani et al. [128] suggested a DL-based system to predict the existence and the type of melanoma. Pre-processing methods were utilized to improve the images used for classification. CNN and self-organizing map (SOM) classifiers were utilized for the melanoma classification process. Their proposed system reported an accuracy of 90% and a specificity of 99%.

Li and Jimenez [78] proposed a novel testing method based on the extreme learning machine network and AlexNet. Additionally, a new improved version of the Grasshopper optimization algorithm (GOA) was utilized to tune the hyperparameters of the proposed method. Their method was evaluated using the PH2 dataset with an accuracy of 98% and a sensitivity of 93%, and it had the highest efficiency when compared to some different SOTA methods.

In Hasan et al. [56], an automated skin lesion classification framework was proposed. Their proposed method merged pre-processing and a hybrid convolutional neural network. It had three distinct feature extractor


modules that were fused to produce better-depth feature maps of the lesion. Lesion segmentation, augmentation, and class rebalancing were used to conduct the pre-processing phase. Three datasets, namely the ISIC-2016, ISIC-2017, and ISIC-2018 datasets, were utilized to evaluate the proposed algorithm. It achieved an AUC of 96%, 95%, and 97% for the three used datasets, respectively.

Maniraj and Maran [82] suggested a hybrid deep learning approach that utilized subband fusion of 3D wavelets. Their method consists of three stages (i.e., simple median filtering, 3D wavelet transform, and multiclass classification). The performance results on the PH2 dataset showed that it could effectively distinguish normal, benign, and malignant skin images with 99.33% average accuracy and more than 90% sensitivity and specificity.

3.3 Related studies summary

Table 1 summarizes the discussed related studies. They are organized in descending order concerning the publication year.

3.4 Plan of solution

In medical imaging applications, skin cancer detection, classification, and segmentation are important and difficult tasks. In the current study, various DL architectures are proposed to solve the skin cancer classification and segmentation problem. Transfer learning (TL) and SpaSA are used to tune (i.e., optimize) the training parameters and hyperparameters. Different experiments are performed, and various performance metrics are utilized for evaluation. The best architectures are documented, stored, and reported to be used in further times.

4 Preliminaries

The current section discusses, for the reader, the background and elementary parts behind the proposed approach. The methodology section depends on them. It is organized into the following points:

• Data scaling and augmentation.
• Segmentation.
• Deep learning (DL) classification.
• Transfer learning (TL).
• Parameters optimization.
• Meta-heuristic optimization.
• Performance metrics.

4.1 Data scaling and augmentation

4.1.1 Data scaling

To normalize the scale of features or independent variables of data, scaling methods are employed. In data processing, scaling is generally conducted during the data pre-processing step to fit the data within a specific range [4, 88]. The four applied scaling techniques in the current research are (1) standardization, (2) normalization, (3) min-max scaling, and (4) max-absolute scaling.

• Standardization: The standardization (i.e., z-score normalization) modifies the data so that the distribution has a mean of 0 and a standard deviation of 1.
• Normalization: The dataset is re-scaled from its original range so that all values are in a new range [0 : 1].
• Min-max scaling: In min-max scaling, the data are transformed so that the features are within a specified range.
• Max-absolute scaling: The max-absolute scaling is obtained by finding the absolute maximum value in the dataset and dividing all the values in the column by that maximum value.

4.1.2 Data augmentation

Image data augmentation is a procedure that is employed to boost the dataset size artificially by generating altered versions of the images [86]. Data augmentation assists in coping with the "not enough data" issue, prevents overfitting, and advances the ability of the models to generalize [16]. The transformation matrix can also be used to get the coordinates of a point after applying the data augmentation method to an image. Image augmentation methods adopted in the current study experiments are (1) flipping, (2) rotation, (3) shifting, (4) shearing, (5) zooming, (6) cropping, (7) color change, and (8) brightness change.

• Flipping: Images can be flipped horizontally and vertically. In some frameworks, functions for vertical flips are not provided. Instead, a vertical flip can be employed by rotating an image by 180 degrees and then executing a horizontal flip.
• Rotation: Rotation is accomplished by rotating the image on an axis between 1 and 359 degrees, rotating the image around the center or any other point, counterclockwise or clockwise. As the degree of rotation increases, data labels may no longer be preserved.
• Shifting: Shifting the entire pixels of an image from one position to another is known as shift augmentation. Two types of shifting (i.e., horizontal-axis and vertical-axis shift augmentation) exist.
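As a concrete illustration, the four scaling techniques of Sect. 4.1.1 can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's implementation; the array `x` is a hypothetical stand-in for flattened pixel intensities.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # stand-in for flattened pixel intensities

# (1) Standardization (z-score): zero mean, unit standard deviation.
standardized = (x - x.mean()) / x.std()

# (2) Normalization: re-scale the original range to [0, 1].
normalized = (x - x.min()) / (x.max() - x.min())

# (3) Min-max scaling: transform to an arbitrary target range [a, b].
a, b = -1.0, 1.0
min_max = a + (x - x.min()) * (b - a) / (x.max() - x.min())

# (4) Max-absolute scaling: divide by the largest absolute value.
max_abs = x / np.abs(x).max()
```

With these definitions, normalization is simply min-max scaling with the target range fixed to [0 : 1].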


Table 1 Related studies summary

| References | Year | Approach | Dataset | Best performance |
|---|---|---|---|---|
| [78] | 2022 | AlexNet + extreme learning machine network | PH2 | Accuracy of 98% and sensitivity of 93% |
| [56] | 2022 | CNN | ISIC 2016 [54], ISIC 2017 [37], and ISIC 2018 [36] | AUC of 96%, 95%, and 97%, respectively |
| [82] | 2022 | Hybrid deep learning | PH2 | 99.33% average accuracy and more than 90% sensitivity and specificity |
| [9] | 2021 | SVM + fuzzy clustering | Their own dataset | Accuracy of 92.04%, sensitivity of 80.11%, specificity of 95.01%, and precision of 80.17% |
| [65] | 2021 | Bacterial colony optimization algorithm-based SVM | ISIC 2016 [54] and PH2 | Precision of 0.969, recall of 0.979, F-measure of 0.974, accuracy of 0.975, and AUC of 0.98 |
| [67] | 2021 | CNN | Their own dataset | 95.98% accuracy |
| [6] | 2021 | Improved NS-Net deep learning network | Their own dataset collected from Cross Cancer Institute, University of Alberta, Edmonton, Canada | Dice coefficient of around 85% |
| [128] | 2021 | SOM + CNN | Their own dataset collected from the ISIC archive | Accuracy of 90% and specificity of 99% |
| [5] | 2020 | CNN (YOLOv4) | ISIC 2016 [54] and ISIC 2018 [36, 127] | Average accuracy of 95% and Jaccard coefficient of 0.989 |
| [113] | 2020 | Fully convolutional network and dual path network | ISIC 2017 [37] | Dice coefficient of 90.26% and a Jaccard index of 83.51% |
| [10] | 2020 | Multiple instance learning | Their own dataset | Accuracy of 92.50%, sensitivity of 97.50%, and specificity of 87.50% |
| [22] | 2020 | Dynamic graph cut and Naive Bayes | ISIC 2017 [37] | Sensitivity, specificity, and diagnostic accuracy of 91.7%, 70.1%, and 72.7%, respectively |
| [98] | 2019 | Decision tree | Their own dataset | Accuracy of 87% |
| [90] | 2019 | SVM, random forest, and KNN | ISIC 2016 [54] | Accuracy of 89.43%, sensitivity of 91.15%, and specificity of 87.71% |
| [71] | 2019 | K-means clustering | DERMIS dataset [77] | 96% accuracy |
| [2] | 2019 | Deep convolutional encoder–decoder network | PH2 and ISBI 2017 [37] | Accuracy and dice coefficient of 95% and 93% |


• Shearing: Shearing is used for shifting one part of the image like a parallelogram and transforming the orientation of the image.
• Zooming: Zooming is applied to create images with varying levels of zooming. This augmentation zooms the image and randomly adds new pixels to the image. The image can be zoomed out or zoomed in.
• Cropping: Random cropping is the method of cropping a part of the image randomly. Similarly, center cropping is also employed to crop the image and is applied when the image center holds more information than the corners.
• Color changing: Data of digital images are regularly encoded as a tensor that has dimensions of (height × width × color channels). Color augmentation changes the pixel values instead of the position.
• Brightness changing: Changing the brightness of the image is one way of performing data augmentation. Compared to the original image, the resultant one becomes lighter or darker.

4.2 Segmentation

Skin cancer segmentation algorithms are broadly categorized as thresholding, region-based, or edge-based methods [117]. Thresholding uses a combination of clustering, adaptive thresholding, and global thresholding. Good results can be achieved by thresholding methods when the contrast between the skin and the lesion is good, i.e., when the corresponding histogram is bimodal; when the two regions overlap, thresholding fails [100]. Edge-based methods function badly when the edges are not well defined, for example, when a smooth transition between the skin and the lesion takes place. In these situations, the outline may leak through the edges as they have gaps. Region-based approaches face problems when the lesion area is structured or differently colored, leading to over-segmentation [68].

4.2.1 Threshold-based segmentation

Threshold-, pixel-, or point-based segmentation [64] is the simplest method to drive the segmentation of images, relying on grayscale values to segment pixels in an image. Various algorithms have been suggested for skin segmentation and classification, including histogram-based thresholding and piecewise linear classifiers.

4.2.2 Edge-based segmentation

Boundary- or edge-based segmentation algorithms [68] divide an image utilizing the boundaries between regions, seeking border pixels and joining them to produce contours of the image. Procedures are established for applying these methods both manually and automatically. For manual methods, the mouse is used to lay lines that describe the edges of an image among regions, while for the automatic ones, some edge-detection filters are executed to divide the pixels into edge or non-edge based on the filter output. Edge-detection filters include the Watershed segmentation algorithm, the Laplacian of Gaussian filter, and the Canny Edge Detector [130].

4.2.3 Region-based segmentation

In region-based segmentation methods, an image is segmented into groups of regions (i.e., similar pixels) relying on some features [69]. The core principle relies on the concept that, inside the same area, neighboring pixels have the same value. It can be achieved by comparing all pixels with their neighbors in a specific region; based on the similarity condition, the pixel is added to a particular region [130]. In the segmentation process, instead of the original input image, a featured image is used. The featured image is described with small neighborhoods from regions [119]. To use a region-based segmentation method, suitable threshold approaches have to be employed [68], as noise has a significant influence on the output [130]. Some region-based methods are region splitting, region growing, and region merging.

4.2.4 Deep learning (DL) segmentation

DL-based image segmentation techniques can be assorted into semantic, instance, panoptic, and depth segmentation sorts according to the segmentation goal. However, due to the huge variety in those tasks in terms of volume of work, the architectural categorization is used instead. The architectural grouping of these models includes CNNs [33], recurrent neural networks and long short-term memory networks [60], encoder-decoders [11], and generative adversarial networks [52].

The U-Net model: U-Net [104] is an architecture used for semantic segmentation and is characterized by its symmetric U-shape. U-Net is composed of an encoder and a decoder. The contracting path (i.e., encoder) is used to capture and collect context, and the symmetric expanding path (i.e., decoder) is employed to enable accurate localization. The encoder obeys the typical architecture of a convolutional network. It is applied to transform the input volume into a lower-dimensional space. The encoder has a modular structure composed of repeating convolution blocks. In the expansive path, every step consists of an


up-sampling of the feature map. The decoder also has a modular structure, but its goal is to increase the spatial dimensions while reducing the number of encoder feature map channels.

The U-Net++ model: The U-Net++ [140] architecture, which can be considered an extension of U-Net, is essentially a deeply supervised encoder-decoder network where the sub-networks of the decoder and encoder are linked through a series of dense and nested skip paths. These re-designed paths seek to decrease the semantic gap between the sub-networks of the encoder and decoder. The U-Net++ architecture maintains the benefits of catching fine-grained details, producing better segmentation results than U-Net.

The Attention U-Net model: Similar to the U-Net model, the Attention U-Net [93] includes an expansion path at the right and a contraction path at the left. At each level, it has a skip connection which is an attention gate. The attention gates are merged into the typical U-Net architecture to accentuate salient features that are pushed through the skip connections. For each skip connection, the gating signal aggregates information from several imaging scales, which helps reach better performance and improves the resolution of the attention weights.

The V-Net model: The architecture of the V-Net [87] is very close to the widely used U-Net model, despite some differences. In the V-Net architecture, the left part can be separated into different phases running at various resolutions. In each stage, there are one to three convolution layers. To facilitate learning the residual function, nonlinearities are used to process the input. Then, the processed input is employed in the convolution layers and appended to the output of the convolution layer of that phase. When compared to a non-residual learning architecture such as U-Net, convergence is guaranteed by the V-Net network.

The Swin U-Net model: Swin-U-Net [30] is a U-shaped transformer-based architecture with skip connections for local and global feature learning. For the encoder, to extract context features, a hierarchical Swin transformer with shifted windows is employed. For the decoder, an asymmetric Swin transformer with a patch expanding layer performs the up-sampling operation to restore the feature map's spatial resolution.

4.3 Deep learning (DL) classification

The procedure of sorting a given data set into classes is known as classification. Classification can be done on structured and unstructured data. Its main goal is to map input variables to discrete output variables to identify which class the new data will fall into. Usually, the classes are mentioned as categories, targets, or labels. Several DL algorithms can be used to perform the task of classification, such as CNNs [75], recurrent neural networks [111], long short-term memory networks [60], generative adversarial networks [52], radial basis function networks [27], deep belief networks [59], and autoencoders [107]. In the current study, only CNN models are used to perform the classification.

4.3.1 Convolution neural network (CNN)

Neural networks are at the core of DL algorithms and are considered an ML subset [51]. They are composed of layers of nodes, including an input layer, one or more hidden layers, and an output layer. Each node inside each layer has an associated weight and threshold and is connected to other nodes. If the node output is higher than the threshold value, the node is triggered and starts to send data to the subsequent layer. Otherwise, no data will be sent [32].

Neural networks are categorized into different types that are used to perform different tasks. For instance, recurrent neural networks [111] are generally used for speech recognition and natural language processing, while CNNs [73, 137] are frequently utilized for the tasks of computer vision and classification. Before the use of CNNs, manual feature extraction methods were required to recognize objects in images. Now, CNNs offer a scalable approach to recognize and classify images. For the training process, they need graphical processing units (GPUs), so they are computationally demanding [76].

CNNs differ from other types by their higher performance with image, audio signal, or speech input types [12]. They consist of three primary layer types, namely convolutional, pooling, and fully connected (FC) layers. The convolution layer is the earliest layer of a typical CNN. Following it, additional convolution and pooling layers exist, and the final layer is an FC one [14]. After each layer, the complexity of the CNN increases and it identifies larger parts of the image. As the image proceeds through the CNN layers, more extensive elements of the object begin to be recognized until the expected object is finally recognized [53].

Convolution layer: The central building block of a CNN is the convolutional layer, as the majority of the computation happens inside it. However, it requires some elements, e.g., input data, a feature map, and a filter. A tensor, also called a kernel or filter, seeks the presence of the features, which is known as a convolution [39]. The kernel is a 2D weights array that symbolizes an image part. During training, back-propagation and gradient descent are used to adjust parameters like weight values.
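The convolution operation described above can be sketched directly; the following is a minimal NumPy version with a made-up image and kernel for illustration (a sketch, not the paper's implementation):

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide a 2D kernel over a 2D image (valid padding, no bias)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

# A 5x5 toy image and a 3x3 vertical-edge-style kernel (illustrative values only).
img = np.arange(25, dtype=float).reshape(5, 5)
k = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)
feature_map = conv2d_valid(img, k)
print(feature_map.shape)  # (3, 3)
```

In a real CNN, the kernel values are the learnable weights adjusted by back-propagation rather than fixed as here.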


The three hyperparameters that affect the size of the output and that have to be adjusted before the training begins include [79]:

• The number of filters: The output depth is affected by it. For illustration, three different feature maps associated with three distinct depths are produced by three different filters.
• Stride: The number of pixels that the kernel proceeds through the input matrix is defined as the stride.
• Padding: Discussed in the following paragraph.

There are different types of padding [7]:

• Zero padding: It is typically used when the filters and the input image do not fit. All elements that lay outside the input matrix are set to zero, so an output of greater or equal size is produced.
• Valid padding: Also named no padding. In this type, if the dimensions do not match, the last convolution will be discarded.
• Same padding: This type guarantees that the output and input layers are of the same size.
• Full padding: In this type, the size of the output is increased by appending zeros to the input border.

Activation function: The linear convolution outputs are passed into a nonlinear activation function. Previously, the used nonlinear functions were smooth ones such as the tangent hyperbolic (i.e., Tanh) or Sigmoid functions [99]. Lately, the most widely used function is the rectified linear unit (ReLU) [21]. ReLU is a piecewise linear function that returns 0 if a negative input is received; else, it returns the input value. Thus, the output ranges from 0 to infinity. For neural network types, it has evolved as the standard activation function, as its architecture is simpler, easier to train, and often outperforms others [28].

Pooling layer: The pooling layers (i.e., down-sampling layers) are used to perform dimensionality reduction and minimize the number of input parameters. Similar to the convolution one, a filter is passed over the entire input by the pooling operations. The distinction is that the filter contains no weights. Rather, a summation function is applied by the kernel to the values in the receptive field, populating the output array [31]. The major pooling types are:

• Max pooling: The input is passed to a filter that specifies the pixel holding the maximum value to be sent to the output array, in which (1) patches are extracted from the feature maps of the input, (2) in each patch, the maximum value is generated, and (3) the other values are discarded [110].
• Average pooling: The input is passed to a filter that calculates the average value to be sent to the output array.

In the pooling layer, a bunch of information is lost, but several benefits are gained. Pooling layers assist to lower the CNN complexity, enhance efficiency, and restrict the risk of over-fitting [124].

Fully connected (FC) layer: Simply, an FC layer is a feed-forward neural network. The last few layers in the network are FC layers. The output of the last convolution or pooling layer (i.e., feature maps) is regularly flattened and then fed into the FC layer. A learnable weight is utilized to connect every input to every output [19]. The final layer usually has several output nodes, the same as the number of classes. Each FC layer is followed by a nonlinear function such as ReLU [15].

4.4 Transfer learning (TL)

The reuse of a formerly trained model for a novel problem is called transfer learning (TL). In DL, it has become popular to use TL, as deep neural networks can then be trained with relatively small data. It is very helpful in data science, as the majority of real-world problems do not have a considerable amount of classified data to train complex models [18, 126]. In TL, what has been learned in one task is exploited to enhance generalization in a second task. Generalization can be done by loosening the assumption that the test and training data must be identically distributed and independent [13]. The extensive idea is to employ the knowledge that has been gained by a model trained with plenty of labeled data in a novel task with a small amount of data [94]. There are five types of transfer learning: (1) domain adaptation, (2) domain confusion, (3) multitask learning, (4) one-shot learning, and (5) zero-shot learning [109].

The TL process is divided into four contexts relying on "what to transfer" in learning. They involve approaches of (1) the instance-transfer, (2) the feature-representation-transfer, (3) the parameter-transfer, and (4) the relational-knowledge-transfer [1]. TL uses a previously trained stored model as a starting point for DL. This enables fast progress and improved performance [80]. Many pre-trained CNN models are available to be used, such as VGG16 [118], ResNet [58], MobileNet [62], Xception [34], NASNet [141], and DenseNet [63].

NASNet, VGG, and MobileNet architectures are utilized for image classification. The used architectures in the current study are NASNetLarge and NASNetMobile [141], MobileNet [62], MobileNetV2 [108], MobileNetV3Small and MobileNetV3Large [61], and VGG16 and VGG19 [118]. In all classification experiments, the size of the input image is set to (100 × 100 × 3).
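The padding and pooling behaviors described above can be checked numerically. The sketch below assumes the usual output-size relation out = (n + 2p - k) / s + 1 (standard convolution arithmetic, not stated explicitly in the text) and uses a made-up 4×4 input:

```python
import numpy as np

def conv_output_size(n, k, stride=1, padding=0):
    """Spatial output size of a convolution/pooling window."""
    return (n + 2 * padding - k) // stride + 1

def pool2d(x, size=2, stride=2, mode="max"):
    """Pooling over a 2D array: a weightless reduction per patch."""
    oh = conv_output_size(x.shape[0], size, stride)
    ow = conv_output_size(x.shape[1], size, stride)
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [9., 10., 11., 12.],
              [13., 14., 15., 16.]])
print(pool2d(x, mode="max"))  # keeps the largest value of each 2x2 patch
print(pool2d(x, mode="avg"))  # keeps the mean of each 2x2 patch
# 'same' padding keeps the size: e.g., n=100, k=3, s=1 needs p=1.
assert conv_output_size(100, 3, stride=1, padding=1) == 100
```

Both pooled outputs are 2×2, illustrating the dimensionality reduction (and information loss) the text mentions.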


4.4.1 NASNet

The Google ML group has materialized the idea of an optimized network through the concept of NAS, which is based on reinforcement learning [141]. The architecture consists of a controller RNN and a CNN, which are to be trained. The NASNet is trained with two sizes of input images, 331 × 331 and 224 × 224, to obtain the NASNetLarge and NASNetMobile architectures, respectively. When moving from NASNetMobile to NASNetLarge, there is great growth in the number of parameters. NASNetMobile and NASNetLarge have 5,326,716 and 88,949,818 parameters, respectively, which makes NASNetLarge less reliable.

4.4.2 MobileNet

MobileNet is created to efficiently increase accuracy while being aware of the limited on-device resources. To meet the resource constraints of the computing devices, MobileNet has low-latency and low-power models [61]. MobileNet uses separable filters, which are a mixture of a point-wise and a depth-wise convolution. It operates filters with a size of 1 × 1 for minimizing the computational overheads of the normal convolution operation. Hence, the network is lighter in terms of size and computational complexity. The MobileNet has 4.2 million parameters with an input image of size 224 × 224 × 3.

4.4.3 MobileNetV2

The MobileNetV2 architecture is close to the original MobileNet, except that inverted residual blocks with bottlenecking features were utilized and nonlinearities in narrow layers were removed [108]. It has fewer parameters than the original MobileNet. MobileNets support all input sizes larger than 32 × 32, with greater image sizes giving a better performance. In MobileNetV2, two sorts of blocks exist. The first is a residual block with a stride of 1. The other is used for downsizing and is a block with a stride of 2. For both types of blocks, there are 3 layers. The initial layer is a 1 × 1 convolution with ReLU6, the next is a depth-wise convolution, and the last one is a further 1 × 1 convolution but without any activation function.

4.4.4 MobileNetV3

The major contribution of MobileNetV3 is the utilization of AutoML to obtain the best possible neural network architecture for a given problem. Precisely, MobileNetV3 combines a pair of AutoML techniques: NetAdapt and MnasNet [61]. MobileNetV3 initially uses MnasNet to search for a coarse architecture. MnasNet utilized reinforcement learning to select the optimal configuration from a discrete set of choices. Then, the architecture is fine-tuned using NetAdapt, which cuts the underutilized activation channels in small increments. MobileNetV3 is represented as two models: MobileNetV3Large and MobileNetV3Small, targeted at high and low resource use cases, respectively. When compared to MobileNetV2, MobileNetV3Large is 3.2% more accurate on ImageNet classification, and latency is decreased by 20%. Similarly, MobileNetV3Small is more accurate by 6.6% with comparable latency.

4.4.5 VGG model

The VGG is formed of convolution and pooling layers stacked together [118]. In VGG16, the network depth is 16 layers without the issue of vanishing gradients. It includes 13 convolution layers, 5 max-pooling layers, and 3 dense layers, with two 4096-sized layers. A nonlinear ReLU activation function is used by all the hidden layers, while the final layer uses a SoftMax function. In contrast, the VGG19 network depth is 19 layers. There are 16 convolution layers, 5 max-pooling layers, and 3 dense layers, with two 4096-sized layers. Similar to VGG16, all the hidden layers of VGG19 utilize the ReLU activation function and the final layer uses the SoftMax function.

Table 2 summarizes the classification models used in the current study.

4.5 Parameters optimization

Parameters optimization is the process of selecting the values of parameters that are optimal for some desired purpose (e.g., minimizing an error function). The parameters are the weights and biases of the network. The cost (i.e., error) function is used to perform the comparison between the model predictions and the target values [129]. Some of the weights optimizers are the gradient descent algorithm [106], Adam (adaptive moment optimization algorithm) [72], Nadam [123], Adagrad [89, 131], AdaDelta [136], AdaMax [40], and Ftrl [85].

Table 2 Classification models summary

Model name        Reference  Parameters #  Size (MB)
NASNetLarge       [141]      88,949,818    343
NASNetMobile      [141]      5,326,716     23
MobileNet         [62]       4,253,864     16
MobileNetV2       [108]      3,538,984     14
MobileNetV3Small  [61]       2.54 M        N/A
MobileNetV3Large  [61]       5.48 M        N/A
VGG16             [118]      138,357,544   528
VGG19             [118]      143,667,240   549
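The update rules behind several of the optimizers listed in Sect. 4.5 can be sketched in a few lines. These are the standard textbook forms with made-up example values, not the exact internals of any library implementation:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Gradient descent: a small step against the gradient.
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    # Momentum: accumulate a velocity to speed up the search toward minima.
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: merges a momentum-like first moment with an RMSProp-like
    # second moment, plus bias correction for early steps.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.array([1.0, -2.0])
grad = np.array([0.5, -0.5])
w_sgd = sgd_step(w, grad, lr=0.1)
w_adam, m, v = adam_step(w, grad, np.zeros(2), np.zeros(2), t=1)
print(w_sgd)  # [ 0.95 -1.95]
```

On the first Adam step the bias correction makes the update close to lr times the sign of the gradient, which is why Adam's initial steps are scale-insensitive.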


To minimize the error function, the gradient descent algorithm [106] updates the parameters. Small steps in the negative direction of the loss function gradient are taken by this algorithm. Adam [72] merges the heuristics of the RMSProp [42] and momentum [134] optimization algorithms. The momentum optimizer speeds up the search in the minima direction, while the RMSProp optimizer prevents the search in the direction of the oscillation.

4.6 Meta-heuristic optimization

Usually, numerous real-world optimization problems involve a big number of decision variables, complex nonlinear constraints, and objective functions; hence, they are increasingly becoming challenging. When the objective constraints have multiple peaks, traditional optimization approaches such as numerical methods become less powerful. Meta-heuristic optimization methods have become powerful tools for managing optimization issues. Their popularity is driven by the following aspects [139]:

• Simplicity: These meta-heuristic methods are mathematical models derived from nature; they are generally simple, easy to perform, and develop variants according to existing approaches.
• Black box: For a given problem, a set of inputs can offer a set of outputs.
• Randomness: This allows the meta-heuristic algorithm to prevent trapping into local optima and inspect the entire search space.
• Highly flexible: Their practicality can be applied to diverse types of optimization problems, e.g., complex numerical problems with plentiful local minima, nonlinear problems, or non-differentiable problems.

4.6.1 Sparrow search algorithm (SpaSA)

The sparrow search algorithm (SpaSA) [135] is inspired by the foraging strategies and the anti-predation behaviors of sparrows. Compared with traditional heuristic search methods, it has strong optimization ability, fast convergence speed, and more extensive application procedures. Hence, the SpaSA is captivating the attention of researchers in various fields.

It was originally suggested by Xue and Shen [135]. The sparrow population is divided into (1) the discoverer and (2) the follower sparrows according to their role in the food search procedure. Each of them performs its behavioral strategies separately. In most cases, the discoverers are 0.2 of the population size. They are the guiders, leading other individuals in the food search. To obtain more food, the roles are switched flexibly between the discoverers and the followers, which compete for the food resources of their companions. However, the proportion of the followers and the discoverers inside the population is fixed. The individuals' energy and the sparrows' anti-predation behavior determine their foraging strategies. The mathematical representation of the SpaSA algorithm will be discussed in Sect. 5.4.1.

Why has the sparrow search algorithm (SpaSA) been selected? SpaSA is a relatively new swarm intelligence heuristic algorithm. As reported in [135], results showed that the proposed SpaSA is superior to the grey wolf optimizer (GWO), the gravitational search algorithm (GSA), and particle swarm optimization (PSO) in terms of accuracy, convergence speed, stability, and robustness. Additionally, the SpaSA has high performance in diverse search spaces. Using the SpaSA, the local optimum issue is avoided effectively as it has a good ability to explore the potential region of the global optimum.

4.7 Performance metrics

Evaluating the quality of the produced output by comparing images is an essential part of measuring progress [138]. Performance metrics are distinct from loss functions. Loss functions give a measure of model performance during training, while metrics are employed to estimate the performance of a model. Nevertheless, the loss function can also be utilized as a performance metric. The assessment metric should provide details related to the task, whether it is interventional or diagnostic. For illustration, some tasks require real-time operations, while tasks for diagnostic procedures can be conducted offline. For choosing the optimal approach, the importance of different performance metrics may differ. Performance metrics can be categorized as (1) spatial overlap-based metrics, (2) probabilistic-based metrics, (3) pair-counting-based metrics, (4) volume- or area-based metrics, (5) information theoretic-based metrics, and (6) spatial distance-based metrics [120]. In the current study, only spatial overlap- and probabilistic-based metrics were used, so they will be discussed in this section.

4.7.1 Spatial overlap-based metrics

The overlap-based performance metrics are the ones that can be acquired from the cardinalities of the confusion matrix. A confusion matrix is a matrix created to assess the model performance. It matches the actual values with the ones predicted by the model. It consists of: (1) true positive (TP), (2) true negative (TN), (3) false positive (FP), and (4) false negative (FN).

The accuracy is the ratio of the correct predictions for the test data to the total predictions. The true negative rate (TNR), also termed specificity, estimates the ability of the model to predict the true negatives of every class.
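All of the overlap-based metrics of Sect. 4.7.1 follow directly from the four confusion-matrix cardinalities. A minimal sketch using the standard definitions (the counts below are illustrative, not results from the paper):

```python
def overlap_metrics(tp, tn, fp, fn):
    """Standard overlap-based metrics from confusion-matrix cardinalities."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)                  # sensitivity / recall
    tnr = tn / (tn + fp)                  # specificity
    precision = tp / (tp + fp)            # positive predictive value (PPV)
    dice = 2 * tp / (2 * tp + fp + fn)    # overlap index / F1-score
    jaccard = tp / (tp + fp + fn)         # intersection over union (IoU)
    return {"accuracy": accuracy, "tpr": tpr, "tnr": tnr,
            "precision": precision, "dice": dice, "jaccard": jaccard,
            "fnr": 1 - tpr, "fpr": 1 - tnr}

m = overlap_metrics(tp=80, tn=90, fp=10, fn=20)
print(round(m["accuracy"], 3), round(m["dice"], 3))  # 0.85 0.842
```

Note that the FNR and FPR are simply the complements of the TPR and TNR, matching the relationship described in the text.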


Likewise, the true positive rate (TPR), also termed sensitivity or recall, is defined as the ratio of samples that were predicted to belong to a class to all of the samples that truly belong to this class. Hence, it estimates the model's ability to predict the true positives of every class.

The false negative rate (FNR) and the false positive rate (FPR) (i.e., fallout) are another two metrics associated with the two previously mentioned metrics. Additionally, precision, also termed positive predictive value (PPV), is the ratio of true positives among the retrieved instances.

The Dice coefficient, also termed the overlap index and F1-score, is one of the most applied metrics to evaluate medical images [46]. Besides the direct comparison between the true and predicted values, it is often used to estimate repeatability. The Jaccard index (JAC), also termed the intersection over union (IoU), is the intersection of two sets divided by their union [66].

4.7.2 Probabilistic-based metrics

Probabilistic-based metrics are defined as a statistical function's measure estimated from the voxels in the overlap region. The receiver operating characteristic (ROC) curve is a relationship graph between the FPR and the TPR. The area under the ROC curve (AUC) was suggested by Hanley and McNeil [55] as an evaluation of the accuracy of diagnostic radiology. In the situation of comparing the predicted and true values, the AUC defined according to [97] is considered, that is, the trapezoidal area determined by the lines TPR = 0 and FPR = 1 and the measurement point.

5 Methodology and suggested approach

In summary, the images are accepted by the input layer. In the next phase, they are pre-processed by employing dataset augmentation, scaling, and balancing. The images can be classified and segmented after that using the suggested pre-trained models. Finally, the transfer learning and meta-heuristic optimization phase occurs. After completion, the figures, statistics, and post-trained models are produced. In the following subsections, these phases are discussed.

5.1 Dataset acquisition

In the current study, five publicly available datasets are utilized and downloaded from Kaggle. The first dataset is named "ISIC 2019 and 2020 Melanoma dataset" [37, 38, 105, 127]. It is composed of 11,449 images. It is partitioned into 2 classes: "MEL" and "NEVUS." It can be downloaded and used from https://www.kaggle.com/qikangdeng/isic-2019-and-2020-melanoma-dataset.

The second one is named "Melanoma Classification (HAM10000)" [36, 127]. It is composed of 10,015 images in which images have different sizes. It is partitioned into 2 classes: "Melanoma" and "NotMelanoma." It can be downloaded and used from https://www.kaggle.com/adacslicml/melanoma-classification-ham10k.

The third one is named "Skin diseases image dataset". It is composed of 27,153 images in which images have different sizes. It is partitioned into 10 classes: "Atopic Dermatitis," "Basal Cell Carcinoma," "Benign Keratosis-like Lesions," "Eczema," "Melanocytic Nevi," "Melanoma," "Psoriasis pictures Lichen Planus and related diseases," "Seborrheic Keratoses and other Benign Tumors," "Tinea Ringworm Candidiasis and other Fungal Infections," and "Warts Molluscum and other Viral Infections." It can be downloaded and used from https://www.kaggle.com/ismailpromus/skin-diseases-image-dataset.

The fourth one is named "Skin cancer segmentation and classification". It is composed of 10,015 images in which images are of different sizes. It can be downloaded and used from https://www.kaggle.com/surajghuwalewala/ham1000-segmentation-and-classification. The fifth one is named "PH2". It is composed of 200 dermoscopic images of melanocytic lesions. It can be downloaded and used from https://www.fc.up.pt/addi/ph2%20database.html.

Table 3 summarizes the used datasets, and Fig. 2 shows samples from them.

5.2 Dataset pre-processing

5.2.1 Dataset scaling

Data scaling is discussed in Sect. 4.1.1, and the corresponding equations that are used in the current study are Eq. 1 for standardization, Eq. 2 for normalization, Eq. 3 for the min-max scaler, and Eq. 4 for the max-absolute scaler, where μ is the image mean and σ is the image standard deviation.

output = (input - μ) / σ    (1)

output = input / max(input)    (2)

output = (input - min(input)) / (max(input) - min(input))    (3)

output = input / |max(input)|    (4)

5.2.2 Dataset augmentation and balancing

Before the training process, data balancing is applied to balance the categories since the number of images per category is not even.
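The four scaling variants of Eqs. 1-4 can be written directly in NumPy. The array below is a made-up grayscale example; this is a sketch of the equations, not the paper's code:

```python
import numpy as np

def standardize(x):
    # Eq. 1: zero mean, unit standard deviation.
    return (x - x.mean()) / x.std()

def normalize(x):
    # Eq. 2: divide by the maximum value.
    return x / x.max()

def min_max_scale(x):
    # Eq. 3: map the value range onto [0, 1].
    return (x - x.min()) / (x.max() - x.min())

def max_abs_scale(x):
    # Eq. 4 as written: divide by the absolute value of the maximum.
    return x / np.abs(x.max())

img = np.array([0., 64., 128., 255.])  # illustrative pixel intensities
print(min_max_scale(img)[-1])  # 1.0
```

For non-negative image data, Eqs. 2 and 4 coincide; they differ only when the maximum value is negative.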


Table 3 The used datasets summary


Dataset Classes Classes Images # Size of Extensions Source (link)
# image

ISIC 2019 and 2 ‘‘MEL’’ and ‘‘NEVUS’’ 25,331 images for Different ‘‘.jpg’’ https://www.kaggle.
2020 ISIC 2019 and sizes com/qikangdeng/isic-
Melanoma 11, 449 images 2019-and-2020-
dataset for ISIC 2020 melanoma-dataset
Melanoma 2 ‘‘Melanoma’’ and ‘‘NotMelanoma’’ 10, 015 Different ‘‘.jpg’’ https://www.kaggle.
Classification sizes com/adacslicml/
(HAM10K) melanoma-
classification-ham10k
Skin diseases 10 ‘‘Atopic Dermatitis,’’ ‘‘Basal Cell 27, 153 Different ‘‘.jpg’’ https://www.kaggle.
image dataset Carcinoma,’’ ‘‘Benign Keratosis-like sizes com/ismailpromus/
Lesions,’’ ‘‘Eczema,’’ ‘‘Melanocytic skin-diseases-image-
Nevi,’’ ‘‘Melanoma,’’ ‘‘Psoriasis dataset
pictures Lichen Planus and related
diseases,’’ ‘‘Seborrheic Keratoses and
other Benign Tumors,’’ ‘‘Tinea
Ringworm Candidiasis and other
Fungal Infections’’ and ‘‘Warts
Molluscum and other Viral Infections’’
Skin cancer N/A N/A 10, 015 Different ‘‘.jpg’’ https://www.kaggle.
segmentation sizes com/
and surajghuwalewala/
classification ham1000-
segmentation-and-
classification
PH2 N/A N/A 200 Similar ‘‘.bmp’’ https://www.fc.up.pt/
addi/ph2%20database.
html

methods of data augmentation discussed in Sect. 4.1.2. The used ranges in this process are (1) 25 degrees for rotation, (2) 15% for shifting the width and height, (3) 15% for shearing, (4) applying horizontal and vertical flipping, and (5) changing the brightness in the range of [0.8, 1.2]. Additionally, in the learning and optimization phase, data augmentation is used to augment the images to avoid any over-fitting and increase the diversity [17]. The used transformation matrices are Eq. 5 for horizontal flipping (i.e., x-axis), Eq. 6 for rotation, Eq. 7 for shifting, Eq. 8 for shearing, and Eq. 9 for zooming, where $\theta$ is the rotation angle, $t_x$ determines the shifting along the x-axis while $t_y$ determines the shifting along the y-axis, $sh_x$ determines the shear factor along the x-axis while $sh_y$ determines the shear factor along the y-axis, and $C_x$ determines the zoom factor along the x-axis while $C_y$ determines the zoom factor along the y-axis.

$$\text{Flipping Matrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix} \tag{5}$$

$$\text{Rotation Matrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{6}$$

Fig. 2 Samples from the used datasets
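These transformation matrices can be written out directly in NumPy. The sketch below mirrors the entries of Eqs. 6–9; note that whether a matrix is applied to a column vector or a row vector (and hence whether the translation terms sit in the last row or last column) is a convention, so the worked example uses only the zoom matrix, which behaves the same under both conventions:

```python
import numpy as np

def rotation_matrix(theta):
    # Eq. 6: rotation by angle theta (radians) in homogeneous coordinates
    return np.array([[np.cos(theta),  np.sin(theta), 0.0],
                     [-np.sin(theta), np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])

def shifting_matrix(tx, ty):
    # Eq. 7: shifting by tx along the x-axis and ty along the y-axis
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [tx,  ty,  1.0]])

def shearing_matrix(shx, shy):
    # Eq. 8: shear factors along the x- and y-axes
    return np.array([[1.0, shy, 0.0],
                     [shx, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def zooming_matrix(cx, cy):
    # Eq. 9: zoom factors along the x- and y-axes
    return np.array([[cx,  0.0, 0.0],
                     [0.0, cy,  0.0],
                     [0.0, 0.0, 1.0]])

# Example: zoom a homogeneous pixel coordinate (x, y, 1) by 1.5x on both axes
point = np.array([10.0, 20.0, 1.0])
zoomed = zooming_matrix(1.5, 1.5) @ point  # -> [15.0, 30.0, 1.0]
```

In practice libraries such as Keras' `ImageDataGenerator` (used later for the optimized augmentation ranges) build and apply these affine matrices internally.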


$$\text{Shifting Matrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ t_x & t_y & 1 \end{bmatrix} \tag{7}$$

$$\text{Shearing Matrix} = \begin{bmatrix} 1 & sh_y & 0 \\ sh_x & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{8}$$

$$\text{Zooming Matrix} = \begin{bmatrix} C_x & 0 & 0 \\ 0 & C_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{9}$$

5.3 Segmentation phase

The segmentation phase is qualified for segmenting the tumor portion from the medical skin images. In the current phase, the U-Net with five different flavors designed for image segmentation was used. The used U-Net models in the present study are U-Net [104], U-Net++ [140], Attention U-Net [93], V-Net [87], and Swin U-Net [30]. The U-Net consists of the left contraction path and the right expansion path. In the current study, there are four different configurations applied to three U-Net models (i.e., U-Net, U-Net++, and Attention U-Net). They are: (1) the default left contraction path is used in two settings, and it is replaced with the VGG19 and DenseNet121 architectures for the other two settings, (2) the pre-trained weights for VGG19 and DenseNet121 are set with ImageNet, (3) the ImageNet weights are frozen from being updated, (4) the depth of the architecture is set to five with the numbers of filters of [64, 128, 256, 512, 1024] in each level (i.e., block), (5) the input image size is set to (128 × 128 × 3), and (6) the output mask size is (128 × 128 × 1).

5.3.1 The U-Net model

In the current study, four configurations of U-Net are employed to perform the segmentation task. In the first configuration, the architecture provided in [104] is utilized, whereas batch normalization and GeLU as a hidden activation function are applied in the second one. For the third and fourth configurations, in addition to batch normalization and the GeLU hidden activation function, VGG19 and DenseNet201 are utilized as a network backbone (i.e., replace the encoder with the SOTA architectures). A summarization of the four different U-Net configurations is presented in Table 4.

5.3.2 The U-Net++ model

Similarly to U-Net, four configurations of U-Net++ are employed to perform the segmentation task. In the first configuration, the architecture provided in [140] is utilized, whereas batch normalization and GeLU as a hidden activation function are applied in the second one. For the third and fourth configurations, in addition to batch normalization and the GeLU hidden activation function, VGG19 and DenseNet201 are utilized as a backbone. For the fourth configuration, deep supervision is deactivated. A summarization of the four different U-Net++ configurations is presented in Table 5.

5.3.3 The Attention U-Net model

Similar to the previous two architectures, four configurations of Attention U-Net are used to perform the segmentation task. In the first configuration, the architecture provided in [93] is utilized, whereas the remaining configurations are the same as the previous two networks. For the fourth configuration, ReLU was used as an attention activation function and Add is used as an attention type. A summarization of the four different Attention U-Net configurations is presented in Table 6.

5.3.4 The V-Net model

In this study, only one configuration is used for the V-Net model. GeLU is used as the hidden activation function, batch normalization is applied, and the pooling and un-pooling are deactivated. The configuration of the V-Net architecture is presented in Table 7.

5.3.5 The Swin U-Net model

Similar to V-Net, only one configuration is used for the Swin U-Net model. The configuration of the Swin U-Net architecture is presented in Table 8.

5.4 Learning and optimization

To achieve the SOTA performance, different DL training hyperparameters (as shown in Table 9) are required to be optimized. Try-and-error, grid search, and meta-heuristic optimization algorithms are techniques used to optimize the hyperparameters. Try-and-error is a weak technique as it does not cover the ranges of the hyperparameters. The grid search covers them, but a long time (e.g., months) is required to complete the searching process.

In the current study, we are optimizing (1) the loss function, (2) dropout, (3) batch size, (4) the parameters (i.e., weights) optimizer, (5) the pre-training TL model learn ratio, (6) the dataset scaling technique, (7) whether to apply augmentation or not, (8) width shift range, (9) rotation range, (10) shear range, (11) height shift range, (12) horizontal flipping, (13) zoom range, (14) vertical flipping, and (15)


Table 4 The U-Net summary

Keyword | Hidden activation function | Backbone | Freeze backbone | Stack down # | Stack up # | Batch normalization | Freeze batch normalization | Pooling | Unpooling
U-Net-Default | ReLU | None | N/A | 2 | 2 | False | True | True | True
U-Net-None | GeLU | None | N/A | 2 | 2 | True | True | False | False
U-Net-VGG19 | GeLU | VGG19 | True | 2 | 2 | True | True | False | False
U-Net-DenseNet201 | GeLU | DenseNet201 | True | 2 | 2 | True | True | False | False
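As a concrete illustration, the "U-Net-VGG19" row of Table 4 maps naturally onto the `keras-unet-collection` package listed among the used Python packages. The configuration dictionary below is a sketch of that mapping; the commented call assumes the installed version exposes `models.unet_2d` with these keyword names, which may differ between versions:

```python
# Keyword arguments mirroring the "U-Net-VGG19" configuration in Table 4.
# The exact signature of keras_unet_collection.models.unet_2d may vary by
# version; treat this as an illustrative mapping, not a verified call.
unet_vgg19_config = dict(
    input_size=(128, 128, 3),                 # input image size
    filter_num=[64, 128, 256, 512, 1024],     # five levels of filters
    n_labels=1,                               # single-channel output mask
    stack_num_down=2, stack_num_up=2,         # Stack down # / Stack up #
    activation='GELU',                        # hidden activation function
    output_activation='Sigmoid',              # segmentation output
    batch_norm=True,                          # batch normalization on
    pool=False, unpool=False,                 # pooling / unpooling off
    backbone='VGG19', weights='imagenet',     # pre-trained encoder
    freeze_backbone=True,                     # ImageNet weights frozen
    freeze_batch_norm=True,
)

# from keras_unet_collection import models
# model = models.unet_2d(**unet_vgg19_config)
```

Swapping `backbone` to `'DenseNet201'` (or `None` for the default contraction path) reproduces the other rows of the table.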

Table 5 The U-Net++ summary

Keyword | Hidden activation function | Backbone | Freeze backbone | Stack down # | Stack up # | Batch normalization | Freeze batch normalization | Pooling | Unpooling | Deep supervision
U-Net++-Default | ReLU | None | N/A | 2 | 2 | False | True | True | True | False
U-Net++-None | GeLU | None | N/A | 2 | 2 | True | True | False | False | False
U-Net++-VGG19 | GeLU | VGG19 | True | 2 | 2 | True | True | False | False | False
U-Net++-DenseNet201 | GeLU | DenseNet201 | True | 2 | 2 | True | True | False | False | False

Table 6 The Attention U-Net summary

Keyword | Hidden activation function | Backbone | Freeze backbone | Stack down # | Stack up # | Batch normalization | Freeze batch normalization | Attention activation | Attention type
Attention U-Net-Default | ReLU | None | N/A | 2 | 2 | False | True | ReLU | Add
Attention U-Net-None | GeLU | None | N/A | 2 | 2 | True | True | ReLU | Add
Attention U-Net-VGG19 | GeLU | VGG19 | True | 2 | 2 | True | True | ReLU | Add
Attention U-Net-DenseNet201 | GeLU | DenseNet201 | True | 2 | 2 | True | True | ReLU | Add

Table 7 The V-Net configuration

Keyword | Hidden activation function | Batch normalization | Pooling | Un-pooling | Res initial # | Res max #
V-Net | GeLU | True | False | False | 1 | 3

brightness range. In the case the sixth hyperparameter is true, the last eight hyperparameters will be optimized; otherwise, they will be neglected. Hence, at least 6 hyperparameters are required to be optimized. So, if the grid search approach is utilized, this will lead to O(N^6) concerning the running complexity. As a result, the meta-heuristic optimization approach using the Sparrow Search Algorithm (SpaSA) is applied in the current study.
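The cost argument can be made concrete. With N candidate values per hyperparameter, an exhaustive grid over the six always-active hyperparameters costs N^6 model trainings, while SpaSA trains at most population × iterations models (10 × 10 with the settings of Table 11). The grid resolution N = 10 below is an assumed value for illustration:

```python
# Exhaustive grid over the 6 always-optimized hyperparameters vs. SpaSA budget.
grid_resolution = 10   # assumed number of candidate values per hyperparameter
always_active = 6      # at least 6 hyperparameters are always optimized
grid_trainings = grid_resolution ** always_active   # 10^6 = 1,000,000 trainings

population_size = 10   # SpaSA population size (Table 11)
iterations = 10        # SpaSA number of iterations (Table 11)
spasa_trainings = population_size * iterations      # 100 trainings

print(grid_trainings // spasa_trainings)  # -> 10000 (grid is 10,000x costlier)
```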


Table 8 The Swin U-Net configuration

Keyword | Stack up # | Stack down # | Patch size | Heads # | Window size | MLP # | Filter begin # | Depth | Shift window
Swin U-Net | 2 | 2 | (2, 2) | [4, 8, 8, 8] | [4, 2, 2, 2] | 512 | 64 | 4 | True

Table 9 A list of parameters and hyperparameters in a CNN to be optimized

Layer | Parameters | Hyperparameters
Convolution layer | Kernels' weights | Number of kernels, kernel size, stride, activation function, and padding
Pooling layer | — | Filter size, pooling method, padding, and stride
Fully connected layer | Neurons' weights | Activation function and number of weights
Others | — | Optimizer, model architecture, loss function, learning rate, epochs, batch size, weight initialization, dataset splitting, and regularization

5.4.1 Sparrow search algorithm (SpaSA)

This approach will be used to solve the optimization issue to obtain the best combinations. As a beginning, all sparrow populations and their parameters are initialized randomly from the given ranges (as described in Table 11). The steps of the hyperparameters optimization include (1) objective function calculation, (2) population sorting, (3) selection, and (4) updating. After operating a set of iterations, the best global optimal location and fitness value are reported. The steps are explained comprehensively in the next subsections.

5.4.2 Initial population

Initially, the sparrow population and its relevant parameters are selected randomly. An arbitrary method is used to generate the initial population in SpaSA. It can be defined as shown in Eq. 10, where $X_{i,j}$ is the position of the $i$th sparrow in the $j$th search space, $i$ is the solution index, and $j$ is the dimension index. $D$ in the current study will be set to 15 (i.e., the number of hyperparameters required to be optimized). It will generate a population with a size of $(P_s \times D)$, where $P_s$ is the population size (i.e., number of sparrows) and its value is set to 10 in the current study.

$$X_{i,j} = LB_j + (UB_j - LB_j) \times \text{random}(1, D) \tag{10}$$

5.4.3 Objective function calculation

The objective function is applied to each sparrow to determine the corresponding score. The current problem is a maximization one: the higher the value, the better the sparrow. To simplify this step, the objective function can be thought of as a black box in which the solution is the input and the score (i.e., accuracy in this case) will be the output. What happens internally? After accepting the solution required to be evaluated, the 15 elements mentioned earlier are extracted and applied to the pre-trained CNN model (e.g., VGG16). Initially, the model uses these certain values to start the learning process (i.e., the training and validation processes). Then, it evaluates itself on the entire dataset to find the overall performance metrics. Finally, the objective function returns the accuracy. The reported performance metrics are discussed in Sect. 4.7 and their equations are Eq. 11 for accuracy, Eq. 12 for specificity, Eq. 13 for recall, Eq. 14 for FNR, Eq. 15 for fallout, Eq. 16 for precision, Eq. 17 for dice coef., Eq. 18 for JAC, and Eq. 19 for AUC.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$

$$\text{Specificity} = TNR = \frac{TN}{TN + FP} \tag{12}$$

$$\text{Sensitivity} = \text{Recall} = TPR = \frac{TP}{TP + FN} \tag{13}$$

$$FNR = \frac{FN}{FN + TP} = 1 - TPR \tag{14}$$

$$FPR = \text{Fallout} = 1 - TNR = \frac{FP}{FP + TN} \tag{15}$$

$$\text{Precision} = PPV = \frac{TP}{TP + FP} \tag{16}$$

$$\text{Dice} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{17}$$

$$JAC = \frac{TP}{TP + FP + FN} = \frac{\text{Dice}}{2 - \text{Dice}} \tag{18}$$
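Equations 11–19 reduce to simple functions of the confusion-matrix counts. The sketch below reproduces them in plain Python; the example numbers are the MobileNet counts later reported in Table 15 for the "ISIC 2019 and 2020 Melanoma dataset":

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Compute the reported metrics (Eqs. 11-19) from confusion-matrix counts."""
    tpr = tp / (tp + fn)                 # sensitivity / recall (Eq. 13)
    tnr = tn / (tn + fp)                 # specificity (Eq. 12)
    fnr = 1 - tpr                        # Eq. 14
    fpr = 1 - tnr                        # fallout (Eq. 15)
    dice = 2 * tp / (2 * tp + fp + fn)   # Eq. 17
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),  # Eq. 11
        'specificity': tnr,
        'recall': tpr,
        'precision': tp / (tp + fp),                  # Eq. 16
        'dice': dice,
        'jaccard': dice / (2 - dice),                 # Eq. 18
        'auc': 1 - (fpr + fnr) / 2,                   # Eq. 19 (the paper's balanced form)
    }

# MobileNet counts on the "ISIC 2019 and 2020 Melanoma dataset" (Table 15)
m = metrics_from_confusion(tp=11250, tn=11250, fp=198, fn=198)
print(round(m['accuracy'] * 100, 2))  # -> 98.27, matching Table 17
```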


$$AUC = 1 - \frac{FPR + FNR}{2} = 1 - 0.5 \times \left( \frac{FP}{FP + TN} + \frac{FN}{FN + TP} \right) \tag{19}$$

5.4.4 Population sorting

After calculating the objective function of each sparrow in the population set, sparrows are sorted in descending arrangements concerning the values of the objective function.

5.4.5 Selection

The current best individual $X_{best}^{t}$ and worst individual $X_{worst}^{t}$ and their fitness values are picked to be applied in the updating process.

5.4.6 Population updating

Using SpaSA, the individual with the best fitness values has the priority to collect food in the search procedure and oversee the entire population movement. So, updating the sparrow location for producers is important and can be done using Eq. 20, where $h$ represents the number of the current iteration and $T$ is the maximal iterations number. $X_{i,j}$ represents the current position of the $i$th sparrow in the $j$th dimension. $\alpha$ is a random number $\in [0, 1]$. $Q$ is a random number from the normal distribution. $L$ represents a $1 \times D$ matrix containing all-1 elements. $R_2$ and $ST$ represent warning and safety values, respectively, with $R_2 \in [0, 1]$ and $ST \in [0.5, 1]$. When $R_2 < ST$, there are no predators and the discoverers can widely search for food sources. Otherwise, some sparrows have detected the predators and the whole population flies to other safe areas when the chirping alarm happens.

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \times \exp\left(\dfrac{-h}{\alpha \times T}\right), & \text{if } R_2 < ST \\ X_{i,j}^{t} + Q \times L, & \text{otherwise} \end{cases} \tag{20}$$

Additionally, some of the followers supervise the discoverers and compete with those discoverers having high predation rates for food, increasing their nutrition. The followers' position is updated using Eq. 21, where $X_P$ is the current optimal discoverer position and $X_{worst}$ indicates the current worst position. $A$ is a $1 \times D$ matrix whose elements are only $-1$ or $1$, with $A^{+} = A^{T} (A A^{T})^{-1}$. If $i > 0.5 \times n$, the followers are starving and have low levels of energy reserves; they leave to search for food in other areas. The movement of the leaving followers is in a random direction away from the current worst position. Otherwise, the followers with high levels of energy move to the discoverers that have found good food.

$$X_{i,j}^{t+1} = \begin{cases} Q \times \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & \text{if } i > 0.5 \times n \\ X_{P}^{t+1} + \left| X_{i,j}^{t} - X_{P}^{t+1} \right| \times A^{+} \times L, & \text{otherwise} \end{cases} \tag{21}$$

It is assumed that only 10% to 20% of the entire sparrow population are aware of the danger. The sparrows' initial positions are randomly formed in the population using Eq. 22, where $\beta$, the step size control parameter, is a random number drawn from a normal distribution with a mean value of 0 and a variance of 1. $X_{best}$ is the current global optimal location. $K \in [-1, 1]$ is a random number that denotes the direction in which the sparrow moves. $\epsilon$ is the smallest constant to avoid the zero-division error. $f_i$ is the fitness value of the present sparrow; hence, $f_g$ and $f_w$ are the current global best and worst fitness values, respectively. When $f_i > f_g$, it indicates that the sparrow is at the edge of the group. $X_{best}$ represents the location of the center of the population, and it is safe around it, while $f_i = f_g$ reveals that the sparrows that are in the middle of the population are aware of the danger.

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \times \left| X_{i,j}^{t} - X_{best}^{t} \right|, & \text{if } f_i \neq f_g \\ X_{i,j}^{t} + K \times \dfrac{\left| X_{i,j}^{t} - X_{worst}^{t} \right|}{(f_i - f_w) + \epsilon}, & \text{otherwise} \end{cases} \tag{22}$$

5.5 The overall pseudocode and flowchart

The steps are iteratively computed for a number of iterations. Algorithm 1 and the corresponding flowchart in Fig. 3 summarize the proposed learning and optimization approach.
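The three update rules (Eqs. 20–22) can be sketched compactly in NumPy. This is an illustrative, simplified single-step implementation: bound handling and the outer fitness loop are omitted, and `alpha`, `Q`, `K`, and `beta` are drawn as described in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def producer_update(X, h, T, ST=0.8):
    """Eq. 20: producers exploit when R2 < ST, otherwise flee with a random step."""
    R2 = rng.random()
    alpha = rng.random()                       # alpha in (0, 1]
    if R2 < ST:
        return X * np.exp(-h / (alpha * T + 1e-12))
    Q = rng.standard_normal()                  # Q ~ N(0, 1)
    return X + Q * np.ones_like(X)             # L is an all-ones 1 x D matrix

def follower_update(X, i, n, X_p, X_worst):
    """Eq. 21: starving followers (i > n/2) leave; others follow the producer."""
    if i > 0.5 * n:
        Q = rng.standard_normal()
        return Q * np.exp((X_worst - X) / (i ** 2))
    A = rng.choice([-1.0, 1.0], size=(1, X.size))
    A_plus = A.T @ np.linalg.inv(A @ A.T)      # A+ = A^T (A A^T)^-1
    return X_p + (np.abs(X - X_p) @ A_plus).ravel() * np.ones_like(X)

def danger_update(X, f_i, f_g, f_w, X_best, X_worst, eps=1e-12):
    """Eq. 22: danger-aware sparrows move toward the safe center or away from the worst."""
    if f_i != f_g:
        beta = rng.standard_normal()           # step-size control, N(0, 1)
        return X_best + beta * np.abs(X - X_best)
    K = rng.uniform(-1, 1)
    return X + K * np.abs(X - X_worst) / ((f_i - f_w) + eps)
```

With D = 15 hyperparameters, each position vector `X` holds one candidate combination; after every update it would be decoded, evaluated by the black-box objective (Sect. 5.4.3), and re-sorted.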


6 Experiments and discussions

The experiments are divided into two categories: (1) segmentation experiments and (2) optimization, learning, and classification experiments.

6.1 Experiments configurations

Generally, ‘‘Python’’ programming language is used in the


current study for coding and testing. Google Colab, with its
GPU, is the learning and optimization environment. Ten-
sorflow, Keras, keras-unet-collection, NumPy, OpenCV,
Pandas, and Matplotlib are the major used Python packages
[20]. The dataset split ratio is set to 85% (for training and
validation) and 15% (for testing). Dataset shuffling is
applied randomly during the learning process. The images
are resized to (100 × 100 × 3) for classification and to (128 × 128 × 3) for segmentation in the RGB color space.
Table 10 summarizes the common configurations of the
experiments, Table 11 summarizes the optimization,
learning, and classification specific configurations, and
Table 12 summarizes the segmentation specific
configurations.
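The 85%/15% shuffled split described above can be sketched with NumPy alone; the seed value and the sample count (the HAM10K size of 10,015 is used as an example) are illustrative:

```python
import numpy as np

def train_test_split_indices(n_samples, test_ratio=0.15, seed=42):
    """Shuffle sample indices and split them 85%/15%, as in the experiments."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)       # random dataset shuffling
    n_test = int(round(n_samples * test_ratio))
    return indices[n_test:], indices[:n_test]  # (train + validation, test)

train_idx, test_idx = train_test_split_indices(10015)  # e.g., HAM10K size
print(len(train_idx), len(test_idx))  # -> 8513 1502
```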

6.2 Segmentation experiments

The current subsection presents and discusses the experiments related to segmentation. The experiments are applied using U-Net [104], U-Net++ [140], Attention U-Net [93],
Swin U-Net [30], and V-Net [87]. Table 13 shows the
summarization of the reported results related to the segmentation experiments. For the "Skin cancer segmentation and classification" dataset, Table 13 shows that the best model is the "U-Net++-DenseNet201" concerning the loss, accuracy, F1, AUC, IoU, and dice values. However, the "U-Net++-Default" model is the best concerning the specificity, hinge, and squared hinge values. It is worth

Fig. 3 The suggested learning and hyperparameters optimization flowchart

Table 10 The used experiments common configurations

Configuration | Specifications
Dataset details | Table 3
Scripting language | Python
Python packages | Tensorflow, Keras, keras-unet-collection, NumPy, OpenCV, Scikit-Learn, SciPy, Pandas, and Matplotlib
Learning and optimization environment | Google Colab (Intel(R) Xeon(R) CPU @ 2.00 GHz, Tesla T4 16 GB GPU with CUDA v.11.2, and 12 GB RAM)


Table 11 The used optimization, learning, and classification specific configurations


Configuration Specifications

Image size ð100  100  3Þ for classification


Train split ratio 85% to 15%
Shuffle dataset Yes
Number of epochs 5
Hyperparameters optimizer Sparrow Search Algorithm (SpaSA)
SpaSA population size 10
SpaSA number of iterations 10
Output activation function SoftMax
Early stopping patience 5
Pre-trained parameters ImageNet
initializers
Pre-trained models MobileNet, MobileNetV2, MobileNetV3Small, MobileNetV3Large, VGG16, VGG19, NASNetMobile, and
NASNetLarge
Loss Categorical Crossentropy, Categorical Hinge, KLDivergence, Poisson, Squared Hinge, and Hinge
Parameters optimizer Adam, NAdam, AdaGrad, AdaDelta, AdaMax, RMSProp, SGD, Ftrl, SGD Nesterov, RMSProp Centered, and
Adam AMSGrad
Dropout range [0, 0.6]
Batch size 4 to 48 with a step of 4
Pre-trained model learn ratio 1 to 100 with a step of 1
Scaling techniques Normalize, Standard, Min Max, and Max Abs
Apply data augmentation Boolean (Yes or No)
Rotation range 0 to 45 with a step of 1
Width shift range [0, 0.25]
Height shift range [0, 0.25]
Shear range [0, 0.25]
Zoom range [0, 0.25]
Horizontal flip range Boolean (Yes or No)
Vertical flip range Boolean (Yes or No)
Brightness range [0.5, 2.0]

Table 12 The used segmentation-specific configurations

Configuration | Specifications
Image size | (128 × 128 × 3) for segmentation
Train split ratio | 85% to 15%
Filters | [64, 128, 256, 512, 1024]
Loss | Binary crossentropy
Parameters optimizer | Adam
Batch size | 4
Number of epochs | 10
Early stopping patience | 5
Output activation function | Sigmoid
Pre-trained parameters initializers | ImageNet
Freeze backbone | Yes
Freeze batch normalization | Yes

Table 13 Summary of the segmentation experiments and results concerning the "Skin cancer segmentation and classification" dataset

Keyword | Model | Configuration | Loss | Accuracy (%) | F1 (%) | Precision (%) | Sensitivity (%) | Specificity (%) | AUC (%) | IoU (%) | Dice (%) | Hinge (%) | Squared hinge (%)
V-Net-VGG19 | V-Net | Section 5.3.4 | 0.291 | 85.28 | 76.42 | 69.00 | 91.85 | 85.53 | 95.13 | 88.28 | 88.63 | 76.95 | 75.14
Attention U-Net-Default | Attention U-Net | Table 6 | 0.225 | 89.62 | 81.30 | 87.94 | 79.42 | 96.49 | 95.16 | 91.91 | 92.36 | 79.97 | 77.70
Attention U-Net-None | Attention U-Net | Table 6 | 0.130 | 93.13 | 89.26 | 95.25 | 85.67 | 98.54 | 98.50 | 95.74 | 96.09 | 78.19 | 76.39
Attention U-Net-VGG19 | Attention U-Net | Table 6 | 0.112 | 93.88 | 90.84 | 93.50 | 89.58 | 97.78 | 98.85 | 95.53 | 95.88 | 77.33 | 75.52
Attention U-Net-DenseNet201 | Attention U-Net | Table 6 | 0.142 | 92.89 | 88.45 | 96.12 | 83.35 | 98.77 | 98.71 | 94.55 | 95.17 | 80.00 | 76.73
Swin U-Net | Swin U-Net | Section 5.3.5 | 0.227 | 89.41 | 80.35 | 84.27 | 80.72 | 95.27 | 95.40 | 91.09 | 91.63 | 80.36 | 77.58
U-Net++-Default | U-Net++ | Table 5 | 0.572 | 72.09 | 0 | 0 | 0 | 99.57 | 51.96 | 75.62 | 76.74 | 93.22 | 88.16
U-Net++-None | U-Net++ | Table 5 | 0.142 | 92.99 | 88.86 | 93.45 | 86.17 | 98.02 | 98.43 | 94.68 | 95.22 | 78.99 | 76.14
U-Net++-VGG19 | U-Net++ | Table 5 | 0.117 | 93.84 | 90.52 | 90.51 | 91.63 | 96.86 | 98.84 | 94.91 | 95.22 | 76.62 | 74.98
U-Net++-DenseNet201 | U-Net++ | Table 5 | 0.104 | 94.16 | 91.39 | 94.51 | 89.36 | 98.16 | 99.03 | 96.08 | 96.41 | 77.19 | 75.47
U-Net-Default | U-Net | Table 4 | 0.237 | 89.05 | 80.61 | 82.17 | 83.68 | 94.30 | 94.94 | 90.80 | 91.24 | 79.25 | 76.97
U-Net-None | U-Net | Table 4 | 0.326 | 86.01 | 75.65 | 70.33 | 86.38 | 88.40 | 92.72 | 84.40 | 84.97 | 79.21 | 76.24
U-Net-VGG19 | U-Net | Table 4 | 0.120 | 93.64 | 90.16 | 93.18 | 88.65 | 97.79 | 98.58 | 95.43 | 95.78 | 77.48 | 75.65
U-Net-DenseNet201 | U-Net | Table 4 | 0.110 | 93.99 | 91.00 | 94.54 | 88.70 | 98.09 | 98.91 | 95.46 | 95.86 | 77.34 | 75.22


Fig. 4 Graphical summary of the segmentation experiments and results concerning the ‘‘Skin cancer segmentation and classification’’ dataset

mentioning that the "V-Net-VGG19" model is better than the others concerning the sensitivity and recall values, and the "Attention U-Net-DenseNet201" model is better concerning the precision value. Figure 4 presents a graphical summarization of the reported segmentation results concerning the "Skin cancer segmentation and classification" dataset. For the "PH2" dataset, Table 14 shows that the best model is the "Attention U-Net-DenseNet201" concerning the loss, accuracy, F1, IoU, and dice values. However, the "Swin U-Net" model is the best concerning the precision, specificity, and squared hinge values. It is worth mentioning that the "U-Net++-Default" model is ignored as it reported meaningless performance metrics. Figure 5 presents a graphical summarization of the reported segmentation results concerning the "PH2" dataset.

6.3 Learning and optimization experiments

The current subsection presents and discusses the experiments related to the learning and optimization processes using the mentioned pre-trained TL CNN models (i.e., MobileNet, MobileNetV2, MobileNetV3Small, MobileNetV3Large, VGG16, VGG19, NASNetMobile, and NASNetLarge) and the SpaSA meta-heuristic optimizer. The number of epochs is set to 5. The numbers of SpaSA iterations and population size are set to 10 each. The captured and reported metrics are the loss, accuracy, F1, precision, recall, sensitivity, specificity, AUC, IoU coef., Dice coef., cosine similarity, TP, TN, FP, FN, logcosh error, mean absolute error, mean IoU, mean squared error, mean squared logarithmic error, and root mean squared error.

6.4 The "ISIC 2019 and 2020 Melanoma dataset" experiments

Table 15 shows the TP, TN, FP, and FN of the best solutions after the learning and optimization processes on each pre-trained model concerning the "ISIC 2019 and 2020 Melanoma dataset" dataset. It shows that the MobileNet pre-trained model has the lowest FP and FN values. On the other hand, MobileNetV3Small has the highest FP and FN values. The best solutions combinations concerning each model are reported in Table 16. It shows that the KLDivergence loss is recommended by five models, while the Poisson loss is recommended by two models only. The AdaGrad parameters optimizer is recommended by four models, while the SGD Nesterov parameters optimizer is recommended by two only. All models recommended applying data augmentation. The min-max scaler is recommended by three models. From the values reported in Table 15 and the learning history, we can report different performance metrics. The reported metrics are partitioned into two types. The first reflects the metrics that are required to be maximized (i.e., accuracy, F1, precision, recall, specificity, sensitivity, AUC, IoU, Dice, and cosine similarity). The second reflects the metrics that are required to be minimized (i.e., categorical crossentropy, KLDivergence, categorical hinge, hinge, squared hinge, Poisson, logcosh error, mean absolute error, mean IoU, mean squared error, mean squared logarithmic error, and root mean squared error). The first category metrics are reported in Table 17, while the second is in Table 18. From them, we can report that the MobileNet pre-trained model is the best model compared to others concerning the "ISIC 2019 and 2020 Melanoma dataset" dataset. Figures 6 and 7 present graphical summarizations of the reported learning and optimization results concerning the "ISIC 2019 and 2020 Melanoma dataset" dataset.

Table 14 Summary of the segmentation experiments and results concerning the "PH2" dataset

Keyword | Model | Configuration | Loss | Accuracy (%) | F1 (%) | Precision (%) | Sensitivity (%) | Specificity (%) | IoU (%) | Dice (%) | Hinge (%) | Squared hinge (%)
V-Net-VGG19 | V-Net | Section 5.3.4 | 0.377 | 84.43 | 74.28 | 80.86 | 68.69 | 92.29 | 59.09 | 74.28 | 78.47 | 73.50
Attention U-Net-Default | Attention U-Net | Table 6 | 0.374 | 84.89 | 69.99 | 90.66 | 57.00 | 97.44 | 53.83 | 69.99 | 82.90 | 79.23
Attention U-Net-None | Attention U-Net | Table 6 | 0.394 | 87.47 | 82.05 | 79.20 | 85.11 | 89.05 | 69.57 | 82.05 | 74.08 | 70.03
Attention U-Net-VGG19 | Attention U-Net | Table 6 | 0.377 | 79.35 | 74.36 | 60.35 | 96.83 | 72.03 | 59.18 | 74.36 | 71.93 | 70.37
Attention U-Net-DenseNet201 | Attention U-Net | Table 6 | 0.137 | 94.75 | 92.65 | 92.56 | 92.74 | 96.20 | 86.30 | 92.65 | 69.28 | 68.04
Swin U-Net | Swin U-Net | Section 5.3.5 | 0.445 | 84.68 | 69.05 | 99.09 | 52.99 | 99.77 | 52.73 | 69.05 | 83.43 | 80.30
U-Net++-Default | U-Net++ | Table 5 | 0.629 | 66.43 | 0 | 100 | 0 | 100 | 0 | 0 | 87.98 | 80.29
U-Net++-None | U-Net++ | Table 5 | 1.332 | 75.28 | 69.82 | 55.44 | 94.25 | 67.46 | 53.63 | 69.82 | 71.97 | 70.97
U-Net++-VGG19 | U-Net++ | Table 5 | 10.720 | 62.57 | 61.96 | 44.94 | 99.74 | 46.66 | 44.88 | 61.96 | 69.69 | 69.62
U-Net++-DenseNet201 | U-Net++ | Table 5 | 1.516 | 87.88 | 83.11 | 72.02 | 98.23 | 83.82 | 71.10 | 83.11 | 70.77 | 70.63
U-Net-Default | U-Net | Table 4 | 0.453 | 77.03 | 42.20 | 95.93 | 27.05 | 99.49 | 26.74 | 42.20 | 86.88 | 79.70
U-Net-None | U-Net | Table 4 | 0.275 | 88.30 | 78.30 | 93.35 | 67.43 | 97.88 | 64.34 | 78.30 | 78.85 | 76.00
U-Net-VGG19 | U-Net | Table 4 | 0.242 | 90.23 | 86.37 | 80.63 | 92.98 | 89.32 | 76.00 | 86.37 | 70.66 | 69.54
U-Net-DenseNet201 | U-Net | Table 4 | 1.908 | 67.77 | 65.60 | 48.88 | 99.71 | 54.01 | 48.81 | 65.60 | 69.52 | 69.41


Fig. 5 Graphical summary of the segmentation experiments and results concerning the ‘‘PH2’’ dataset

Table 15 The confusion matrix results concerning the "ISIC 2019 and 2020 Melanoma dataset" dataset

Model name | TP | TN | FP | FN
MobileNet | 11,250 | 11,250 | 198 | 198
MobileNetV2 | 11,129 | 11,129 | 295 | 295
MobileNetV3Small | 10,647 | 10,647 | 801 | 801
MobileNetV3Large | 10,994 | 10,994 | 446 | 446
VGG16 | 10,886 | 10,886 | 538 | 538
VGG19 | 11,098 | 11,098 | 350 | 350
NASNetMobile | 11,062 | 11,062 | 362 | 362
NASNetLarge | 10,670 | 10,670 | 778 | 778

6.5 The "Melanoma Classification (HAM10K)" experiments

Table 19 shows the TP, TN, FP, and FN of the best solutions after the learning and optimization processes on each pre-trained model concerning the "Melanoma Classification (HAM10K)" dataset. It shows that the MobileNet pre-trained model has the lowest FP and FN values. On the other hand, MobileNetV3Small has the highest FP and FN values. The best solutions combinations concerning each model are reported in Table 20. It shows that the KLDivergence loss is recommended by four models, while the Squared Hinge loss is recommended by two models only. The SGD Nesterov parameters optimizer is recommended by five models, while the SGD parameters optimizer is recommended by two only. Seven models recommended applying data augmentation. The min-max scaler is recommended by four models. From the values reported in Table 19 and the learning history, we can report different performance metrics. The reported metrics are partitioned into two types. The first reflects the metrics that are required to be maximized (i.e., accuracy, F1, precision, recall, specificity, sensitivity, AUC, IoU, Dice, and cosine similarity). The second reflects the metrics that are required to be minimized (i.e., categorical crossentropy, KLDivergence, categorical hinge, hinge, squared hinge, Poisson, logcosh error, mean absolute error, mean IoU, mean squared error, mean squared logarithmic error, and root mean squared error). The first category metrics are reported in Table 21, while the second is in Table 22. From them, we can report that the MobileNet pre-trained model is the best model compared to others concerning the "Melanoma Classification (HAM10K)" dataset. Figures 8 and 9 present graphical summarizations of the reported learning and optimization results concerning the "Melanoma Classification (HAM10K)" dataset.

6.6 The "Skin diseases image dataset" experiments

Table 23 shows the TP, TN, FP, and FN of the best solutions after the learning and optimization processes on each pre-trained model concerning the "Skin diseases image dataset." It shows that the MobileNetV2 pre-trained model has the lowest FP and FN values. On the other hand, MobileNetV3Small has the highest FP and FN values. The best solutions combinations concerning each model are reported in Table 24. It shows that the KLDivergence loss


Table 16 The best solutions after the training and optimization process concerning the "ISIC 2019 and 2020 Melanoma dataset" dataset
[The per-model rows of this rotated table were not recoverable from the extraction. Its columns are: Model name, Loss, Batch size, Dropout, TL learn ratio, Optimizer, Scaler, Apply augmentation, Rotation range, Width shift range, Height shift range, Shear range, Zoom range, Horizontal flip, Vertical flip, and Brightness range.]

is recommended by five models, while the Categorical Crossentropy loss is recommended by three models only. The AdaMax parameters optimizer is recommended by four models, while the SGD parameters optimizer is recommended by two only. Seven models recommended applying data augmentation. The standardization scaler is recommended by four models. From the values reported in Table 23 and the learning history, we can report different performance metrics. The reported metrics are partitioned into two types. The first reflects the metrics that are required to be maximized (i.e., accuracy, F1, precision, recall, specificity, sensitivity, AUC, IoU, dice, and cosine similarity). The second reflects the metrics that are required to be minimized (i.e., categorical crossentropy, KLDivergence, categorical hinge, hinge, squared hinge, Poisson, logcosh error, mean absolute error, mean IoU, mean squared error, mean squared logarithmic error, and root mean squared error). The first category metrics are reported in Table 25, while the second is in Table 26. From them, we can report that the MobileNetV2 pre-trained model is the best model compared to others concerning the "Skin diseases image dataset." Figures 10 and 11 present graphical summaries of the reported learning and optimization results concerning the "Skin diseases image dataset."

6.7 Overall discussions

The experiments conducted in this study are split into two categories, i.e., segmentation and classification. In the segmentation experiments, different U-Net models were used. Concerning the "Skin cancer segmentation and classification" dataset, the best model was the U-Net++ with DenseNet201 as a backbone regarding the loss, accuracy, F1-score, AUC, IoU, and Dice values. However, the Attention U-Net model was the best regarding the AUC value. The worst results were obtained by the default U-Net++ architecture. The scores achieved by U-Net++ with DenseNet201 as a backbone were 94.16%, 91.39%, 99.03%, 96.08%, 96.41%, 77.19%, and 75.47% in terms of accuracy, F1-score, AUC, IoU, Dice, hinge, and squared hinge, respectively. Concerning the "PH2" dataset, the best model was the Attention U-Net with DenseNet201 as a backbone regarding the loss, accuracy, F1, IoU, and Dice values. However, the "Swin U-Net" model is the best concerning the precision, specificity, and squared hinge values. The achieved scores by Attention U-Net with

Table 17 The "ISIC 2019 and 2020 Melanoma dataset" dataset experiments with the maximized metrics

Model name | Accuracy (%) | F1 (%) | Precision (%) | Recall (%) | Sensitivity (%) | Specificity (%) | AUC (%) | IoU (%) | Dice (%) | Cosine similarity (%)
MobileNet | 98.27 | 98.27 | 98.27 | 98.27 | 98.27 | 98.27 | 99.42 | 98.00 | 98.32 | 98.49
MobileNetV2 | 97.42 | 97.42 | 97.42 | 97.42 | 97.42 | 97.42 | 98.19 | 98.12 | 98.21 | 97.56
MobileNetV3Small | 93.00 | 93.00 | 93.00 | 93.00 | 93.00 | 93.00 | 97.87 | 93.37 | 94.28 | 94.12
MobileNetV3Large | 96.10 | 96.10 | 96.10 | 96.10 | 96.10 | 96.10 | 98.85 | 96.01 | 96.58 | 96.61
VGG16 | 95.29 | 95.29 | 95.29 | 95.29 | 95.29 | 95.29 | 99.13 | 94.51 | 95.49 | 96.25
VGG19 | 96.94 | 96.94 | 96.94 | 96.94 | 96.94 | 96.94 | 99.19 | 96.11 | 96.78 | 97.23
NASNetMobile | 96.83 | 96.83 | 96.83 | 96.83 | 96.83 | 96.83 | 99.14 | 96.69 | 97.16 | 97.19
NASNetLarge | 93.20 | 93.20 | 93.20 | 93.20 | 93.20 | 93.20 | 98.18 | 90.67 | 92.48 | 94.46

Table 18 The "ISIC 2019 and 2020 Melanoma" dataset experiments with the minimized metrics
Model name Logcosh Mean absolute error Mean IoU Mean squared error Mean squared logarithmic error Root mean squared error

MobileNet 0.006 0.025 0.512 0.014 0.007 0.120


MobileNetV2 0.010 0.027 0.685 0.024 0.011 0.154
MobileNetV3Small 0.024 0.086 0.323 0.054 0.026 0.232
MobileNetV3Large 0.014 0.051 0.478 0.032 0.015 0.178
VGG16 0.016 0.068 0.264 0.034 0.017 0.185
VGG19 0.012 0.048 0.363 0.026 0.013 0.162
NASNetMobile 0.012 0.043 0.289 0.026 0.013 0.162
NASNetLarge 0.024 0.113 0.250 0.051 0.025 0.225
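Some of the minimized metrics in Table 18 (and the analogous Tables 22 and 26) are tied together algebraically, which is a useful consistency check on the reported values. The sketch below uses only the published MobileNet MSE from Table 18; the residual value of 0.1 is illustrative and not a paper result.

```python
import math

# RMSE is the square root of MSE: the MobileNet row of Table 18 lists
# MSE = 0.014 and RMSE = 0.120, and sqrt(0.014) ~ 0.118 agrees up to rounding.
mse = 0.014
rmse = math.sqrt(mse)
print(round(rmse, 3))  # 0.118

# Log-cosh behaves like MSE/2 for small residuals (and like |x| for large
# ones), which is why the Logcosh column tracks roughly half the MSE column.
# The residual below is illustrative, not taken from the paper.
residual = 0.1
print(round(math.log(math.cosh(residual)), 6))  # 0.004992
print(round(residual ** 2 / 2, 6))              # 0.005
```

The same check applied to the other rows confirms that each RMSE entry is the square root of the corresponding MSE entry up to two-decimal rounding.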

Table 19 The confusion matrix results concerning the "Melanoma Classification (HAM10K)" dataset

Model name TP TN FP FN
MobileNet 9867 9867 117 117
MobileNetV2 9745 9745 263 263
MobileNetV3Small 8999 8999 985 985
MobileNetV3Large 9720 9720 264 264
VGG16 9417 9417 591 591
VGG19 9811 9811 189 189
NASNetMobile 9323 9323 677 677
NASNetLarge 9163 9163 837 837

Fig. 6 Summary of the confusion matrix results concerning the "ISIC 2019 and 2020 Melanoma" dataset
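Because the counts in Table 19 are balanced per model (TP = TN and FP = FN), the percentage metrics reported in Table 21 follow directly from them. A minimal check on the MobileNet row:

```python
# Derive the Table 21 metrics from the Table 19 confusion-matrix counts
# (MobileNet row: TP = TN = 9867, FP = FN = 117).
tp, tn, fp, fn = 9867, 9867, 117, 117

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)            # recall = sensitivity
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

# With TP = TN and FP = FN, all of these collapse to the same value,
# which is why Table 21 repeats 98.83 across most columns.
print(round(accuracy * 100, 2))  # 98.83
print(round(f1 * 100, 2))        # 98.83
```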

Fig. 7 Summary of the "ISIC 2019 and 2020 Melanoma" dataset experiments with the maximized metrics

Table 20 The best solutions after the training and optimization process concerning the "Melanoma Classification (HAM10K)" dataset

Columns: Model name, Loss, Batch size, Dropout, TL learn ratio, Optimizer, Scaler, Apply augmentation, Rotation range, Width shift range, Height shift range, Shear range, Zoom range, Horizontal flip, Vertical flip, Brightness range

MobileNet KLDivergence 48 0.16 53 SGD Nesterov Min-Max Yes 15 0.23 0.11 0.05 0.13 Yes No 0.78–1.14
MobileNetV2 Squared Hinge 36 0.56 46 SGD Min-Max Yes 20 0.01 0.22 0.09 0.01 Yes Yes 0.92–1.08
MobileNetV3Small Hinge 48 0.6 100 Adam AMSGrad Max-Absolute No N/A N/A N/A N/A N/A N/A N/A N/A
MobileNetV3Large KLDivergence 32 0.51 42 SGD Nesterov Standardization Yes 26 0.06 0.02 0.05 0.04 No No 0.57–0.68
VGG16 Poisson 12 0.08 90 SGD Nesterov Standardization Yes 39 0.25 0.24 0.23 0.08 No Yes 1.28–1.66
VGG19 KLDivergence 40 0.01 92 SGD Nesterov Normalization Yes 3 0.25 0.03 0.12 0.05 No No 1.81–1.95
NASNetMobile KLDivergence 16 0.08 95 SGD Min-Max Yes 38 0.02 0.19 0 0.08 Yes No 1.86–1.97
NASNetLarge Squared Hinge 16 0.19 77 SGD Nesterov Min-Max Yes 0 0.02 0.21 0.24 0.08 Yes No 1.0–1.24
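To make Table 20 concrete, its MobileNet row can be read as a preprocessing-plus-augmentation configuration. The sketch below writes the Min-Max scaler out by hand on a toy vector and groups the augmentation columns as the keyword arguments that Keras' ImageDataGenerator accepts; it illustrates how such a row would be applied and is not the paper's actual code.

```python
# Min-Max scaling (the scaler SpaSA selected for MobileNet in Table 20)
# maps each feature into [0, 1]; shown on a toy pixel-intensity vector.
pixels = [0.0, 64.0, 128.0, 255.0]
lo, hi = min(pixels), max(pixels)
scaled = [(p - lo) / (hi - lo) for p in pixels]
print(scaled[-1])  # 1.0

# The remaining columns of the MobileNet row, grouped as the keyword
# arguments that Keras' ImageDataGenerator accepts.
mobilenet_augmentation = {
    "rotation_range": 15,
    "width_shift_range": 0.23,
    "height_shift_range": 0.11,
    "shear_range": 0.05,
    "zoom_range": 0.13,
    "horizontal_flip": True,
    "vertical_flip": False,
    "brightness_range": (0.78, 1.14),
}
# e.g. ImageDataGenerator(**mobilenet_augmentation) in a TensorFlow/Keras setup
print(len(mobilenet_augmentation))  # 8
```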

Table 21 The "Melanoma Classification (HAM10K)" dataset experiments with the maximized metrics
Model name Accuracy (%) F1 (%) Precision (%) Recall (%) Sensitivity (%) Specificity (%) AUC (%) IoU (%) Dice (%) Cosine similarity (%)

MobileNet 98.83 98.83 98.83 98.83 98.83 98.83 99.45 98.83 98.97 98.91
MobileNetV2 97.37 97.37 97.37 97.37 97.37 97.37 98.26 97.95 98.07 97.44
MobileNetV3Small 90.13 90.13 90.13 90.13 90.13 90.13 90.13 93.42 93.42 90.13
MobileNetV3Large 97.36 97.36 97.36 97.36 97.36 97.36 99.21 96.93 97.42 97.62
VGG16 94.09 94.09 94.09 94.09 94.09 94.09 98.44 92.10 93.58 95.06
VGG19 98.11 98.11 98.11 98.11 98.11 98.11 99.52 97.87 98.18 98.24
NASNetMobile 93.23 93.23 93.23 93.23 93.23 93.23 97.59 94.39 95.00 94.29
NASNetLarge 91.63 91.63 91.63 91.63 91.63 91.63 95.07 91.07 92.30 92.35

Table 22 The "Melanoma Classification (HAM10K)" dataset experiments with the minimized metrics
Model name Logcosh Mean absolute error Mean IoU Mean squared error Mean squared logarithmic error Root mean squared error

MobileNet 0.005 0.015 0.585 0.010 0.005 0.102


MobileNetV2 0.011 0.029 0.779 0.025 0.012 0.158
MobileNetV3Small 0.043 0.099 0.820 0.099 0.047 0.314
MobileNetV3Large 0.010 0.039 0.274 0.022 0.011 0.150
VGG16 0.021 0.096 0.251 0.046 0.022 0.213
VGG19 0.007 0.027 0.251 0.017 0.008 0.129
NASNetMobile 0.024 0.075 0.412 0.053 0.026 0.231
NASNetLarge 0.033 0.115 0.258 0.072 0.035 0.269

DenseNet201 as a backbone architecture were 0.137, 94.75%, 92.65%, 92.56%, 92.74%, 96.20%, 86.30%, 92.65%, 69.28%, and 68.04% in terms of loss, accuracy, F1-score, precision, sensitivity, specificity, IoU, Dice, hinge, and squared hinge. It is worth noting that the worst results were also obtained by the default U-Net++ architecture.

The SpaSA meta-heuristic optimizer was utilized to optimize the pre-trained CNN models' hyperparameters to achieve the learning, classification, and optimization phase. For the classification experiments, eight pre-trained CNN architectures are used, i.e., VGG16, VGG19, MobileNet, MobileNetV2, MobileNetV3Small, MobileNetV3Large, NASNetLarge, and NASNetMobile. For the "ISIC 2019 and 2020 Melanoma" dataset, the best reported overall accuracy from the applied CNN experiments is 98.27% by the MobileNet pre-trained model. The average accuracy was 95.88%. The floored average TP, TN, FP, and FN were 11,250, 11,250, 198, and 198, respectively. Similarly, for the "Melanoma Classification (HAM10K)" dataset, the best reported overall accuracy from the applied CNN experiments is 98.83% by the MobileNet pre-trained model. The average accuracy was 95.09%. The floored

Fig. 8 Summary of the confusion matrix results concerning the "Melanoma Classification (HAM10K)" dataset


Fig. 9 Summary of the "Melanoma Classification (HAM10K)" dataset experiments with the maximized metrics
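The IoU and Dice columns reported across Tables 17, 21, and 25 are not independent measures: computed from the same prediction and ground truth, they are monotonically related. A small sketch (the overlap counts are illustrative, not paper values):

```python
# Dice and IoU (Jaccard) computed from the same overlap always satisfy
# Dice = 2 * IoU / (1 + IoU).
def dice_from_iou(iou: float) -> float:
    return 2 * iou / (1 + iou)

# Illustrative overlap: 80 pixels in the intersection, 100 in the union.
intersection, union = 80, 100
iou = intersection / union
dice = 2 * intersection / (union + intersection)  # = 2|A n B| / (|A| + |B|)

print(round(iou, 3))                 # 0.8
print(round(dice, 3))                # 0.889
print(round(dice_from_iou(iou), 3))  # 0.889
```

This is why Dice is always at least as large as IoU in the reported tables, and why a model ranking by one usually matches the ranking by the other.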

Table 23 The confusion matrix results concerning the "Skin diseases image dataset"

Model name TP TN FP FN
MobileNet 21,111 240,729 3351 6009
MobileNetV2 22,130 241,562 2518 4990
MobileNetV3Small 9657 241,447 2849 17,487
MobileNetV3Large 16,843 240,391 3689 10,277
VGG16 18,261 241,430 2938 8891
VGG19 18,873 242,176 2156 8275
NASNetMobile 20,983 238,828 5396 6153
NASNetLarge 20,455 237,669 6519 6677

average TP, TN, FP, and FN were 9,867, 9,867, 117, and 117, respectively. For the "Skin diseases image" dataset, the best reported overall accuracy from the applied CNN experiments is 85.87% by the MobileNetV2 pre-trained model. The average accuracy was 75.50%. The floored average TP, TN, FP, and FN were 22,130, 241,562, 2518, and 4990, respectively.

6.8 Related studies comparisons

Table 27 shows a comparison between the suggested approach and related studies concerning the same used datasets. It can be observed from the literature that there are different research works for skin cancer diagnosis. Additionally, the use of meta-heuristic algorithms for skin cancer diagnosis is increasing exponentially. In the current study, a hybrid algorithm using deep learning, transfer learning, and a recently proposed meta-heuristic algorithm named SpaSA has been suggested for performing skin cancer diagnosis. Many optimization approaches, including hybrid mechanisms, were previously used for skin cancer diagnosis; however, the SpaSA has not been utilized, as far as we know, in the published articles. Moreover, the suggested algorithm is trained and evaluated using 5 different datasets. Figure 12 shows a graphical comparison between the current study and the related studies.

6.9 Time complexity and other approaches remarks

The current study concentrated on skin cancer detection, classification, and segmentation. The classification was accomplished using 8 pre-trained CNNs, while the segmentation was done by employing 5 different U-Net models. The CNN hyperparameters optimization was performed using the SpaSA to gain the SOTA performance metrics. The target during the different experiments was to achieve high performance metrics. The learning and processing time was high and hence was not declared exactly in this study. However, approximate times can be calculated. It is worth noting that the time depended mainly on the working environment. The current study worked on Google Colab. Assume that each CNN model takes one minute approximately. There are 15 hyperparameters to optimize using the SpaSA. The number of SpaSA iterations is set to 10, the SpaSA population size is set to 10, and the number of epochs is set to 5. Hence, there are 10 × 10 × 5 = 500 runs for each model to complete. The approximate time is 500 minutes (i.e., 8.33 hours) for a single model. We have 8 pre-trained models and 3 datasets. Hence, there are 24 experiments. The total approximate time can be 200 hours (i.e., 8.33 days). If the SpaSA is replaced with the grid search (GS) native search approach, the time would be more than that (e.g., months) as the GS searches for all


possible combinations. From Table 11, we have 6 losses, 11 parameters optimizers, 60 dropouts, 12 batch sizes, 100 learning ratios, 4 scalers, 2 augmentation values, 45 rotations, 25 shifts in width, 25 shifts in height, 25 shears, 25 zooms, 2 horizontal flips, 2 vertical flips, and 15 brightness ranges. From that, we can obtain approximately 4E16 combinations, which can last for more than 77 billion years.
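The arithmetic behind these estimates can be reproduced directly, under the section's own assumption of roughly one minute per training run:

```python
# Reproduce the section's timing estimates under its ~1 minute/run assumption.
minutes_per_run = 1

# SpaSA: 10 iterations x population of 10 x 5 epochs = 500 runs per model.
spasa_runs = 10 * 10 * 5
hours_per_model = spasa_runs * minutes_per_run / 60      # ~8.33 hours

# 8 pre-trained CNNs x 3 classification datasets = 24 experiments.
total_hours = hours_per_model * 8 * 3
print(spasa_runs, round(total_hours))  # 500 200

# Exhaustive grid search over the 15 hyperparameter ranges listed above.
grid_sizes = [6, 11, 60, 12, 100, 4, 2, 45, 25, 25, 25, 25, 2, 2, 15]
combinations = 1
for size in grid_sizes:
    combinations *= size

years = combinations * minutes_per_run / (60 * 24 * 365)
print(f"{combinations:.2e}")  # 4.01e+16 combinations
print(round(years / 1e9))     # ~76 billion years (the text rounds this to 77)
```

The exact product is about 4.0 × 10^16 combinations, i.e., on the order of 76 billion years at one minute each, which is the basis for the "more than 77 billion years" figure above.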

7 Study limitations

Even though the proposed study demonstrated the potential of using deep learning models for detecting, classifying, and segmenting skin cancer, the suggested approach presents some limitations. The main limitation is the instantaneity, where the most time-consuming stage is the training of the classifier. The slow convergence of the boosting algorithm and high-dimensional features are the causes of this limitation. Another limitation is that only 8 CNN architectures and 5 U-Net models are used.

Table 24 The best solutions after the training and optimization process concerning the "Skin diseases image dataset" [the rotated table listing each model's loss, batch size, dropout, TL learn ratio, optimizer, scaler, and augmentation settings was flattened by extraction and could not be reliably reconstructed]

8 Conclusions and future work

Automatic skin cancer detection and segmentation is an open-ended research area that steadily requires improvement. In this research, a methodological approach for classifying various skin images into their corresponding category was developed with the help of convolution neural networks. CNN is a well-known SOTA approach for classifying images and big data. The study showed several works related to the current work. The deep learning-based skin cancer detection, classification, and segmentation system was developed. Different experiments were achieved, and the results are reported. For the segmentation phase, concerning the "Skin cancer segmentation and classification" dataset, the best model was the U-Net++ with DenseNet201 as a backbone regarding the loss, accuracy, F1-score, AUC, IoU, and Dice values. The achieved scores by this architecture were 94.16%, 91.39%, 99.03%, 96.08%, 96.41%, 77.19%, and 75.47% in terms of accuracy, F1-score, AUC, IoU, Dice, hinge, and squared hinge. Additionally, concerning the "PH2" dataset, the best model was the Attention U-Net with DenseNet201 as a backbone regarding the loss, accuracy, F1, IoU, and Dice values. The achieved scores by this architecture were 0.137, 94.75%, 92.65%, 92.56%, 92.74%, 96.20%, 86.30%, 92.65%, 69.28%, and 68.04% in terms of loss, accuracy, F1-score, precision, sensitivity, specificity, IoU, Dice, hinge, and squared hinge. The SpaSA meta-heuristic optimizer was

Table 25 The "Skin diseases image dataset" experiments with the maximized metrics
Model name Accuracy (%) F1 (%) Precision (%) Recall (%) Sensitivity (%) Specificity (%) AUC (%) IoU (%) Dice (%) Cosine similarity (%)

MobileNet 82.09 81.68 86.22 77.84 77.84 98.63 98.35 81.63 83.96 85.02
MobileNetV2 85.87 85.33 89.74 81.60 81.60 98.97 99.01 83.28 85.72 88.08
MobileNetV3Small 54.64 45.67 75.16 35.58 35.58 98.83 91.75 55.92 60.32 62.93
MobileNetV3Large 72.11 70.30 82.03 62.11 62.11 98.49 96.80 71.52 74.89 76.87
VGG16 76.19 74.45 86.05 67.25 67.25 98.80 97.70 75.44 78.41 80.29
VGG19 79.30 77.97 89.72 69.52 69.52 99.12 98.28 76.43 79.52 82.88
NASNetMobile 78.23 78.36 79.54 77.33 77.33 97.79 95.91 82.97 84.31 80.78
NASNetLarge 75.59 75.60 75.84 75.39 75.39 97.33 91.19 82.81 83.42 76.88

Table 26 The "Skin diseases image dataset" experiments with the minimized metrics
Model name Logcosh Mean absolute error Mean IoU Mean squared error Mean squared logarithmic error Root mean squared error

MobileNet 0.012 0.048 0.452 0.026 0.013 0.160


MobileNetV2 0.010 0.043 0.451 0.021 0.010 0.143
MobileNetV3Small 0.027 0.119 0.450 0.057 0.028 0.240
MobileNetV3Large 0.018 0.075 0.450 0.038 0.019 0.195
VGG16 0.015 0.065 0.450 0.032 0.016 0.179
VGG19 0.013 0.061 0.450 0.029 0.014 0.169
NASNetMobile 0.015 0.047 0.476 0.034 0.017 0.185
NASNetLarge 0.019 0.050 0.585 0.044 0.021 0.210

utilized to optimize the pre-trained CNN models' hyperparameters to achieve the learning, classification, and optimization phase. They were VGG16, VGG19, MobileNet, MobileNetV2, MobileNetV3Small, MobileNetV3Large, NASNetLarge, and NASNetMobile. The dataset was collected from different public sources. After collection, they are categorized into 2-class and 10-class datasets. For the "ISIC 2019 and 2020 Melanoma" dataset, the best reported overall accuracy from the applied CNN experiments is 98.27% by the MobileNet pre-trained model. Similarly, for the "Melanoma Classification (HAM10K)" dataset, the best reported overall accuracy from the applied CNN experiments is 98.83% by the MobileNet pre-trained model. For the "Skin diseases image" dataset, the best reported overall accuracy from the applied CNN experiments is 85.87% by the MobileNetV2 pre-trained model. The current study results were then compared with 13 prior related works. This showed that this study had outperformed numerous prior works. Even though there is a lot of work on the recognition and segmentation of skin cancer, it was a big challenge to get a suitable accuracy due to the lack of datasets in this area. In future studies, the plan is to (1) try other ML or DL techniques, (2) work on enhancing the performance of the skin cancer

Fig. 10 Summary of the confusion matrix results concerning the "Skin diseases image dataset"


Fig. 11 Summary of the "Skin diseases image dataset" experiments with the maximized metrics

Table 27 Comparison between the suggested approach and related studies

Study | Year | Dataset | Approach | Best performance
[90] | 2019 | ISIC 2016 [54] | SVM, Random Forest, and KNN | Accuracy of 89.43%, sensitivity of 91.15%, and specificity of 87.71%
[71] | 2019 | DERMIS dataset [77] | K-means clustering | 96% accuracy
[2] | 2019 | PH2 and ISIC 2017 [37] | Deep Convolutional Encoder-Decoder Network | Accuracy and Dice coefficient of 95% and 93%
[98] | 2019 | Their own dataset | Decision Tree | Accuracy of 87%
[5] | 2020 | ISIC 2016 [54] and ISIC 2018 [36, 127] | CNN (YOLOv4) | Average accuracy of 95% and Jaccard coefficient of 0.989
[113] | 2020 | ISIC 2017 [37] | Fully convolutional network and dual path network | Dice coefficient of 90.26% and a Jaccard index of 83.51%
[22] | 2020 | ISIC 2017 [37] | Dynamic graph cut and Naive Bayes | Sensitivity, specificity, and diagnostic accuracy of 91.7%, 70.1%, and 72.7%, respectively
[10] | 2020 | Their own dataset | Multiple instance learning | Accuracy of 92.50%, sensitivity of 97.50%, and specificity of 87.50%
[65] | 2021 | ISIC 2016 [54] and PH2 dataset | Bacterial colony optimization algorithm based SVM | Precision of 0.969, recall of 0.979, F-measure of 0.974, accuracy of 0.975, and AUC of 0.98
[128] | 2021 | Their own dataset collected from the ISIC archive | SOM + CNN | Accuracy of 90% and specificity of 99%
[9] | 2021 | Their own dataset | SVM + Fuzzy clustering | Accuracy of 92.04%, sensitivity of 80.11%, specificity of 95.01%, and precision of 80.17%
[6] | 2021 | Their own dataset collected from Cross Cancer Institute, University of Alberta, Edmonton, Canada | Improved NS-Net deep learning network | Dice coefficient of around 85%
[67] | 2021 | Their own dataset | CNN | 95.98% accuracy
[78] | 2022 | PH2 dataset | AlexNet + Extreme Learning Machine network | Accuracy of 98% and sensitivity of 93%
[56] | 2022 | ISIC 2016 [54], ISIC 2017 [37], and ISIC 2018 [36] datasets | CNN | AUC of 96%, 95%, and 97%, respectively
[82] | 2022 | PH2 dataset | Hybrid deep learning | 99.33% average accuracy and more than 90% sensitivity and specificity
Current Study | 2022 | Five datasets described in Table 3 | Hybrid (SpaSA, CNN, and U-Net) | 94.16% and 94.75% (segmentation, for the two datasets respectively) and 98.27%, 98.83%, and 85.87% (classification, for the three datasets respectively)


segmentation phase, and (3) evaluate the system with further available datasets.

Fig. 12 Graphical comparison between the current study and the related studies

Author agreement statement

We, the undersigned authors, declare that this manuscript is original, has not been published before, and is not currently being considered for publication elsewhere. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We understand that the "corresponding author" is the sole contact for the editorial process. He is responsible for communicating with the other authors about progress, submissions of revisions, and final approval of proofs.

Author contributions All the authors have participated in writing the manuscript and have revised the final version. All authors read and approved the final manuscript.
Appendices

Table of Abbreviations

Table 28 shows the abbreviations and the corresponding meaning. They are sorted alphabetically in ascending order.

Table 28 Table of abbreviations

Abbreviation Definition
Adam Adaptive moment optimization algorithm
AUC Area under the ROC curve
CNN Convolution neural network
DL Deep learning
FC Fully connected
GLCM Gray-level co-occurrence matrix
ISBI International symposium on biomedical imaging
IoU Intersection over union
JAC Jaccard index
KNN K-nearest neighbor
ML Machine learning
RELU Rectified linear unit
ROC Receiver operating characteristic
SOM Self-organizing map
SOTA State-of-the-art
SpaSA Sparrow search algorithm
SVM Support vector machine
TL Transfer learning

Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Data availability The datasets, if existing, that are used, generated, or analyzed during the current study: (A) if the datasets are owned by the authors, they are available from the corresponding author on reasonable request; (B) if the datasets are not owned by the authors, the supplementary information including the links and sizes is included in this published article.

Declarations

Conflict of interest No conflict of interest exists. We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Intellectual property We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, concerning intellectual property. In so doing, we confirm that we have followed the regulations of our institutions concerning intellectual property.

Research ethics We further confirm, if existing, that any aspect of the work covered in this manuscript that has involved human patients has been conducted with the ethical approval of all relevant bodies and that such approvals are acknowledged within the manuscript. Written consent to publish potentially identifying information, such as details of the case and photographs, was obtained from the patient(s) or their legal guardian(s).

Authorship We confirm that the manuscript has been read and approved by all named authors. We confirm that the order of authors listed in the manuscript has been approved by all named authors.

Human participants and/or animals The current study does not contain any studies with human participants and/or animals performed by any of the authors.


Consent to participate There is no informed consent for the current study.

Consent for publication Not applicable.

Contact with the editorial office The "Corresponding Author" who is declared on the title page. This author submitted this manuscript using his account in the editorial submission system. (A) We understand that the "Corresponding Author" is the sole contact for the editorial process (including the editorial submission system and direct communications with the office). He is responsible for communicating with the other authors about progress, submissions of revisions, and final approval of proofs. (B) We confirm that the email address shown below is accessible by the "Corresponding Author" and is the address to which the "Corresponding Author's" editorial submission system account is linked and has been configured to accept email from the editorial office (email: [email protected]).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Abdulazeem Y, Balaha HM, Bahgat WM, Badawy M (2021) Human action recognition based on transfer learning approach. IEEE Access 9:82058–82069
2. Adegun AA, Viriri S (2019) Deep learning-based system for automatic melanoma detection. IEEE Access 8:7160–7172
3. Ahmed NJ (2021) Prescribing practices of medications in the outpatient dermatology department of a public hospital. J Pharm Res Int, pp 70–74
4. Ahsan MM, Mahmud M, Saha PK, Gupta KD, Siddique Z (2021) Effect of data scaling methods on machine learning algorithms and model performance. Technologies 9(3):52
5. Albahli S, Nida N, Irtaza A, Yousaf MH, Mahmood MT (2020) Melanoma lesion detection and segmentation using YOLOv4-darknet and active contour. IEEE Access 8:198403–198414
6. Alheejawi S, Berendt R, Jha N, Maity SP, Mandal M (2021) Detection of malignant melanoma in H&E-stained images using deep learning techniques. Tissue Cell 73:101659
7. Alsallakh B, Kokhlikyan N, Miglani V, Yuan J, Reblitz-Richardson O (2020) Mind the pad—CNNs can develop blind spots. arXiv preprint arXiv:2010.02178
8. American Society of Clinical Oncology (2021) Skin cancer (non-melanoma). http://www.cancer.net/cancer-types/skin-cancer-non-melanoma. Accessed 24 Dec 2021
9. Arivuselvam B et al (2021) Skin cancer detection and classification using SVM classifier. Turk J Comput Math Educ (TURCOMAT) 12(13):1863–1871
10. Astorino A, Fuduli A, Veltri P, Vocaturo E (2020) Melanoma detection by means of multiple instance learning. Interdiscip Sci Comput Life Sci 12(1):24–31
11. Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
12. Baghdadi NA, Malki A, Abdelaliem SF, Balaha HM, Badawy M, Elhosseini M (2022) An automated diagnosis and classification of COVID-19 from chest CT images using a transfer learning-based convolutional neural network. Comput Biol Med 144:105383
13. Bahgat WM, Balaha HM, AbdulAzeem Y, Badawy MM (2021) An optimized transfer learning-based approach for automatic diagnosis of COVID-19 from chest x-ray images. PeerJ Comput Sci 7:e555
14. Balaha HM, Ali HA, Badawy M (2021) Automatic recognition of handwritten Arabic characters: a comprehensive review. Neural Comput Appl 33(7):3011–3034
15. Balaha HM, Ali HA, Saraya M, Badawy M (2021) A new Arabic handwritten character recognition deep learning system (AHCR-DLS). Neural Comput Appl 33(11):6325–6367
16. Balaha HM, Ali HA, Youssef EK, Elsayed AE, Samak RA, Abdelhaleem MS, Tolba MM, Shehata MR, Mahmoud MR, Abdelhameed MM et al (2021) Recognizing Arabic handwritten characters using deep learning and genetic algorithms. Multimed Tools Appl 80(21):32473–32509
17. Balaha HM, Balaha MH, Ali HA (2021) Hybrid COVID-19 segmentation and recognition framework (HMB-HCF) using deep learning and genetic algorithms. Artif Intell Med 119:102156
18. Balaha HM, El-Gendy EM, Saafan MM (2021) CovH2SD: a COVID-19 detection approach based on Harris hawks optimization and stacked deep learning. Expert Syst Appl 186:115805
19. Balaha HM, El-Gendy EM, Saafan MM (2022) A complete framework for accurate recognition and prognosis of COVID-19 patients based on deep transfer learning and feature classification approach. Artif Intell Rev 55:1–46
20. Balaha HM, Saafan MM (2021) Automatic exam correction framework (AECF) for the MCQs, essays, and equations matching. IEEE Access 9:32368–32389
21. Balaha HM, Saif M, Tamer A, Abdelhay EH (2022) Hybrid deep learning and genetic algorithms approach (HMB-DLGAHA) for the early ultrasound diagnoses of breast cancer. Neural Comput Appl 34(11):8671–8695
22. Balaji V, Suganthi S, Rajadevi R, Kumar VK, Balaji BS, Pandiyan S (2020) Skin disease detection and segmentation using dynamic graph cut algorithm and classification through Naive Bayes classifier. Measurement 163:107922
23. Baskar R, Lee KA, Yeo R, Yeoh K-W (2012) Cancer and radiation therapy: current advances and future directions. Int J Med Sci 9(3):193
24. Binaghi E, Omodei M, Pedoia V, Balbi S, Lattanzi D, Monti E (2014) Automatic segmentation of MR brain tumor images using support vector machine in combination with graph cut. In: IJCCI (NCTA), pp 152–157
25. Binder M, Schwarz M, Winkler A, Steiner A, Kaider A, Wolff K, Pehamberger H (1995) Epiluminescence microscopy: a useful tool for the diagnosis of pigmented skin lesions for formally trained dermatologists. Arch Dermatol 131(3):286–291
26. Bronkhorst IH, Jager MJ (2013) Eye 27(2):217–223
27. Broomhead DS, Lowe D (1988) Radial basis functions, multivariable functional interpolation and adaptive networks. Tech rep, Royal Signals and Radar Establishment Malvern (United Kingdom)
28. Brownlee J (2019) A gentle introduction to the rectified linear unit (ReLU). Mach Learn Mastery 6
29. Cancer.Net (2021) Melanoma: statistics. https://www.cancer.net/cancer-types/melanoma/statistics. Accessed 24 Dec 2021


30. Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, Wang M (2021) Swin-Unet: Unet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537
31. Cao Z, Yang H, Zhao J, Guo S, Li L (2021) Attention fusion for one-stage multispectral pedestrian detection. Sensors 21(12):4184
32. Chan H-P, Lo S-CB, Sahiner B, Lam KL, Helvie MA (1995) Computer-aided detection of mammographic microcalcifications: pattern recognition with an artificial neural network. Med Phys 22(10):1555–1567
33. Chen L-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587
34. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
35. Ciregan D, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3642–3649
36. Codella N, Rotemberg V, Tschandl P, Celebi ME, Dusza S, Gutman D, Helba B, Kalloo A, Liopyris K, Marchetti M et al (2019) Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the international skin imaging collaboration (ISIC). arXiv preprint arXiv:1902.03368
37. Codella NC, Gutman D, Celebi ME, Helba B, Marchetti MA, Dusza SW, Kalloo A, Liopyris K, Mishra N, Kittler H et al (2018) Skin lesion analysis toward melanoma detection: a challenge at the 2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC). In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018). IEEE, pp 168–172
38. Combalia M, Codella NC, Rotemberg V, Helba B, Vilaplana V, Reiter O, Carrera C, Barreiro A, Halpern AC, Puig S et al (2019) BCN20000: dermoscopic lesions in the wild. arXiv preprint arXiv:1908.02288
39. Coşkun M, Uçar A, Yildirim Ö, Demir Y (2017) Face recognition based on convolutional neural network. In: 2017 international conference on modern electrical and energy systems (MEES). IEEE, pp 376–379
40. Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or −1. arXiv preprint arXiv:1602.02830
41. Danshina SD, Markov A, Huldani A (2020) Causes, symptoms, diagnosis and treatment of melanoma. Int J Pharm Res 12(3):903
42. Dauphin YN, De Vries H, Bengio Y (2015) Equilibrated adaptive learning rates for non-convex optimization. arXiv
47. … staging of adults with cutaneous melanoma. Cochrane Database Syst Rev 7:CD012806
48. Esteva A, Kuprel B, Thrun S (2015) Deep networks for early stage skin disease and skin cancer classification. Project report
49. Fitzmaurice C, Abate D, Abbasi N, Abbastabar H, Abd-Allah F, Abdel-Rahman O, Abdelalim A, Abdoli A, Abdollahpour I, Abdulle AS et al (2019) Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to 2017: a systematic analysis for the global burden of disease study. JAMA Oncol 5(12):1749–1768
50. Fujiyoshi H, Hirakawa T, Yamashita T (2019) Deep learning-based image recognition for autonomous driving. IATSS Res 43(4):244–252
51. Georgevici AI, Terblanche M (2019) Neural networks and deep learning: a brief introduction
52. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27
53. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J et al (2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377
54. Gutman D, Codella NC, Celebi E, Helba B, Marchetti M, Mishra N, Halpern A (2016) Skin lesion analysis toward melanoma detection: a challenge at the international symposium on biomedical imaging (ISBI) 2016, hosted by the international skin imaging collaboration (ISIC). arXiv preprint arXiv:1605.01397
55. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36
56. Hasan MK, Elahi MTE, Alam MA, Jawad MT, Martí R (2022) DermoExpert: skin lesion classification using a hybrid convolutional neural network through segmentation, transfer learning, and augmentation. Inf Med Unlocked, p 100819
57. Hay R, Bendeck SE, Chen S, Estrada R, Haddix A, McLeod T, Mahé A (2006) Skin diseases. Disease control priorities in developing countries, 2nd edn
58. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
59. Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
60. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
61. Howard A, Sandler M, Chu G, Chen L-C, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V et al (2019) Searching for MobileNetV3. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1314–1324
preprint arXiv:1502.04390 62. Howard A.G, Zhu M, Chen B, Kalenichenko D, Wang W,
43. Davis LE, Shalin SC, Tackett AJ (2019) Current state of mel- Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient
anoma diagnosis and treatment. Cancer Biol Therapy convolutional neural networks for mobile vision applications.
20(11):1366–1379 arXiv preprint arXiv:1704.04861
44. DeSantis CE, Miller KD, Dale W, Mohile SG, Cohen HJ, Leach 63. Huang G, Liu Z, Van Der Maaten L, Weinberger K Q (2017)
CR, Sauer AG, Jemal A, Siegel RL (2019) Cancer statistics for Densely connected convolutional networks. In: Proceedings of
adults aged 85 years and older. CA: A Cancer J Clin the IEEE conference on computer vision and pattern recogni-
69(6):452–467 tion, pp 4700–4708
45. DeVita VT, Lawrence TS, Rosenberg SA (2015) Cancer of the 64. Ibraheem N. A, Khan R.Z, Hasan M.M. (2013) Comparative
skin: cancer: principles & practice of oncology. Lippincott study of skin color based segmentation techniques. Int J Appl Inf
Williams & Wilkins, Philadelphia Syst, 5(10)
46. Dice LR (1945) Measures of the amount of ecologic association 65. İlkin S, Gençtürk TH, Gülağız FK, Özcan H, Altuncu MA,
between species. Ecology 26(3):297–302 Şahin S (2021) hybSVM: Bacterial colony optimization algo-
47. Dinnes J, di Ruffano LF, Takwoingi Y, Cheung ST, Nathan P, rithm based SVM for malignant melanoma detection. Eng Sci
Matin RN, Chuchu N, Chan SA, Durack A, Bayliss SE et al Technol Int J 24:1059–1071
(2019) Ultrasound, CT, MRI, or PET-CT for staging and re- 66. Jaccard P (1912) The distribution of the flora in the alpine zone.
1. New Phytol 11(2):37–50
852 Neural Computing and Applications (2023) 35:815–853