Article
A Skin Disease Classification Model Based on DenseNet and
ConvNeXt Fusion
Mingjun Wei 1 , Qiwei Wu 1 , Hongyu Ji 2 , Jingkun Wang 3 , Tao Lyu 4 , Jinyun Liu 1, * and Li Zhao 3, *
1 College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
2 School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
3 Beijing National Research Center for Information Science and Technology, Institute for Precision Medicine,
Tsinghua University, Beijing 100084, China
4 Department of Obstetrics and Gynecology, Beijing Tsinghua Changgung Hospital, Beijing 102218, China
* Correspondence: [email protected] (J.L.); [email protected] (L.Z.)
Abstract: Skin disease is one of the most common diseases. Because the categories of skin
diseases are intricate, their symptoms are very similar in the early stage and the lesion samples are
extremely unbalanced, their classification is challenging. At the same time, under the condition of
limited data, a single convolutional neural network model has a weak generalization ability, an
insufficient feature extraction ability and a low classification accuracy. Therefore, in this paper, we
proposed a convolutional neural network model for skin disease classification based on model fusion.
Through model fusion, deep and shallow feature fusion, and the introduction of an attention module,
the feature extraction capacity of the model was strengthened. In addition, a series of works such
as model pre-training, data augmentation, and parameter fine-tuning were conducted to upgrade
the classification performance of the model. The experimental results showed that when working
on our private dataset dominated by acne-like skin diseases, our proposed model outperformed the
two baseline models of DenseNet201 and ConvNeXt_L by 4.42% and 3.66%, respectively. On the
public HAM10000 dataset, the accuracy and f1-score of the proposed model were 95.29% and 89.99%,
respectively, which also achieved good results compared with other state-of-the-art models.
Keywords: attention module; classification; feature fusion; model fusion; skin disease
1. Introduction
Skin disease is a severe global public health problem that affects a large number of
people [1]. The symptoms of skin diseases are diverse, and the change of the symptoms is
a long-term process. It is difficult for ordinary people to determine the type of skin disease
with the naked eye, and most people often neglect the changes in their skin symptoms,
which can lead to severe consequences such as permanent skin damage and even the risk
of skin cancer [2]. Moreover, the early treatment of skin cancer can decrease morbidity
and mortality [3].
With its rapid development, deep learning has quickly become the preferred method
for medical image analysis [4,5]. Compared with traditional classification methods, deep
learning has a stronger robustness and a better generalization ability [6]. Convolutional
neural networks are among the most well-known and representative deep learning
models [7,8]. They have been widely used in many aspects of medical image analysis [9,10],
and great progress has been made in medical image classification. For example,
Datta et al. [11] combined soft attention and Inception ResNet-V2 [12] (IRv2) to construct
an IRV2-SA model for dermoscopic image classification. This combination improved the
sensitivity score compared to the baseline model, reaching 91.6% on the ISIC2017 [13]
dataset. Apart from that, its accuracy on the HAM10000 [14] dataset was 93.7%, which was
4.7% higher than the baseline model. Lan et al. [15] proposed a capsule network method
called FixCaps. It is an improved convolutional neural network model based on
CapsNets [16] with a larger receptive field. It works by applying a high-performance large
kernel with a kernel size of up to 31 × 31 at the bottom convolutional layer. At the same
time, an attention mechanism was introduced to reduce the loss of spatial information
caused by convolution and pooling, and it achieved an accuracy of 96.49% and an f1-score
of 86.36% on the HAM10000 dataset.
The IRV2-SA and FixCaps models perform well in terms of classification accuracy.
However, they are not impeccable with respect to other classification performance
criteria, and their performance is not satisfactory for classes with only a small number of
samples. Enhancing their classification accuracy is difficult because of the restricted
available image data of skin diseases and the extreme imbalance of lesion samples. In
addition, the categories of skin diseases are intricate, and the symptoms are very similar
in the early stages, which makes the classification task even harder. At the same time, the
generalization ability of a single network model trained with restricted data is weak, and
its feature extraction ability is insufficient. Attaining a high classification accuracy is
therefore still challenging. The common research strategies for the problems of small data
samples and class imbalance are data augmentation and enhancing the feature extraction
ability of the model.
All in all, the main contributions of this paper can be summarized in the follow-
ing points:
1. In this work, a convolutional neural network (CNN) model based on model fusion
was proposed for skin disease classification. DenseNet201 [17] and ConvNeXt_L [18]
were selected as the backbone sub-classification models for the model fusion.
2. To enhance the feature extraction ability of the proposed network model, the
Efficient Channel Attention [19] module and the Gated Channel Transforma-
tion [20] attention module were introduced into the core blocks of DenseNet201
and ConvNeXt_L, respectively.
3. A parallel strategy was applied to fuse the features of the deep and shallow layers to
further enhance the feature-extraction ability of the model.
4. The classification performance of the model was improved through a series of works
such as model pre-training, data augmentation, and parameter fine-tuning.
5. Extensive experiments were conducted to compare the proposed model with the basic
CNN models commonly used in recent years and to ensure the validity of this work.
The proposed network model was trained and tested on a private dataset dominated
by acne-like skin diseases as well as on the public HAM10000 [14] (Human-Against-
Machine with 10000 training images) dataset, which is extremely imbalanced across
skin disease classes, and it was compared with other state-of-the-art models on the
HAM10000 dataset. This verified the generalization capacity and the accuracy of the
proposed network model.
2. Related Work
CNN models have been widely explored for skin disease classification, and some
of these models have achieved very good classification performances. Below, we sum-
marized the relevant published work of some researchers in the field of skin disease
image classification.
Many researchers have proposed reliable multi-class CNN models. Mobiny et al. [21]
proposed an approximate risk-aware deep Bayesian model named Bayesian DenseNet-169,
which outputs an estimate of model uncertainty without additional parameters or signifi-
cant changes to the network architecture. It increased the classification accuracy of the base
DenseNet169 [17] model from 81.35% to 83.59% on the HAM10000 dataset. Wang et al. [22]
proposed an interpretability-based CNN model. It is a multi-class classification model
that takes skin lesion images and patient metadata as the input for skin lesion diagnosis.
It achieved a 95.1% and 83.5% accuracy and sensitivity, respectively, on the HAM10000
dataset. Allugunti et al. [23] created a multi-class CNN model for diagnosing skin cancer.
The proposed model makes a distinction between lentigo maligna, superficial spreading and
nodular melanoma, which permits early diagnosis and the prompt treatment needed to stop
the further progression of the disease. Anand et al. [24] modified
the Xception [25] model by adding layers such as a pooling layer, two dense layers, and a
dropout layer. A new fully connected (FC) layer changed the original FC layer with seven
skin disease classes. It had a classification accuracy of 96.40% on the HAM10000 dataset.
Improving the classification accuracy of the model by using ensemble learning is also
an effective method. Thurnhofer-Hemsi et al. [26] proposed an ensemble composed of
improved CNNs combined with a regularly spaced test-time-shifting technique for skin
lesion classification. It generates multiple shifted versions of each test image, passes them to
every classifier in the ensemble and then combines all the outputs for classification. It had a
classification accuracy of 83.6% on the HAM10000 dataset.
Through the introduction of an attention module, the feature extraction ability of a
model can be enhanced, thereby improving the classification performance of the model.
Karthik et al. [27] replaced the standard Squeeze-and-Excite [28] block in the Efficient-
NetV2 [29] model with an Efficient Channel Attention [19] block, and the total number
of training parameters dropped significantly. The test accuracy of the model reached
84.70% on a dataset covering four types of skin disease, namely acne, actinic keratosis, melanoma
and psoriasis.
Through image processing techniques such as image conversion, equalization, en-
hancement and segmentation, the accuracy of image classification can be enhanced.
Abayomi-Alli et al. [30] proposed an improved data augmentation model for the effective
detection of melanoma skin cancer. The method was based on oversampling data em-
bedded in a nonlinear low-dimensional manifold to create synthetic melanoma images.
It achieved a 92.18%, 80.77%, 95.1% and 80.84% accuracy, sensitivity, specificity and
f1-score, respectively, on the PH2 [31] dataset. Hoang et al. [32] proposed a novel method
using a new segmentation approach and wide-ShuffleNet for skin lesion classification. It
first separates the lesion from the background by computing an entropy-based weighted
sum first-order cumulative moment (EW-FCM) of the skin image. The segmentation
results are then input into a new deep learning structure, wide-ShuffleNet, and classified.
It achieved a 96.03%, 70.71%, 75.15%, 72.61% and 84.80% specificity, sensitivity, preci-
sion, f1-score and accuracy, respectively, on the HAM10000 dataset. Malibari et al. [33]
proposed an Optimal Deep-Neural-Network-Driven Computer-Aided Diagnosis Model
for skin cancer detection and classification. The model primarily applies a
Wiener-filtering-based pre-processing step followed by a U-Net segmentation approach.
The model achieved a maximum accuracy of 99.90%. Nawaz et al. [34] proposed an im-
proved Deep-Learning-based method, namely, the DenseNet77-based UNET model. Their
experiments demonstrated the robustness of the model and its ability to accurately identify
skin lesions of different colors and sizes. It obtained a 99.21% and 99.51% accuracy on the
ISIC2017 [13] and ISIC2018 [35] datasets, respectively.
Therefore, by summarizing the related work published by these researchers in the
field of skin disease image classification, we proposed a CNN model for skin disease
classification based on model fusion. In addition, through a series of work such as model
fusion, deep and shallow feature fusion, the introduction of an attention module, model
pre-training, data augmentation and parameter fine-tuning, the classification performance
of the proposed model was enhanced.
3. Method
First, we trained and tested the classification performance of basic CNN models
(including ResNet50 [36], EfficientNet_B4 [37], DenseNet201 [17] and ConvNeXt_L [18])
that have been commonly used in recent years on our private dataset dominated by acne-
like skin diseases. This was a typical dataset with a small amount of sample data and
extremely unbalanced categories. It was then found that the two CNN models DenseNet201
and ConvNeXt_L achieved a good classification performance, with accuracy rates of 92.12%
and 92.88%, respectively, which made them the two
best-performing models. Multi-model fusion can be configured with any number of sub-
classification CNN models at the same time. However, the more sub-classifiers there are,
the less computationally efficient the model is, and it is important to strike a balance [38].
Therefore, we chose DenseNet201 and ConvNeXt_L as the backbone sub-classification
models of our model fusion.
In DenseNet, the l-th layer takes the concatenation of the feature maps of all preceding layers
as its input:

x_l = H_l([x_0, x_1, \ldots, x_{l-1}]) \qquad (1)

where H_l denotes the composite function (normalization, activation and convolution) of the
l-th layer.
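As a concrete illustration of Equation (1), the following minimal PyTorch sketch (our
illustration, not the authors' released code) builds a dense layer whose input is the
concatenation of all preceding feature maps; the growth rate and tensor shapes are arbitrary
toy values.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One layer H_l of a dense block: it consumes [x0, x1, ..., x_{l-1}] concatenated
    along the channel axis and emits a new feature map (Equation (1))."""
    def __init__(self, in_ch: int, growth: int = 32):
        super().__init__()
        self.h = nn.Sequential(nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
                               nn.Conv2d(in_ch, growth, kernel_size=3, padding=1))

    def forward(self, features):               # features: list [x0, x1, ..., x_{l-1}]
        return self.h(torch.cat(features, 1))  # x_l = H_l([x0, x1, ..., x_{l-1}])

# Toy usage: three dense layers with 64 input channels and a growth rate of 32.
features = [torch.randn(1, 64, 56, 56)]
for l in range(3):
    layer = DenseLayer(in_ch=64 + 32 * l)
    features.append(layer(features))
```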
In the ConvNeXt block, the number of base channels was changed from 64 to 96. Then, an
inverted bottleneck structure was adopted, while the rectified linear unit (ReLU) and batch
normalization (BN) were replaced by a Gaussian error linear unit [41] (GELU) and layer
normalization [42] (LN). Finally, the convolution kernel was enlarged to 7 × 7.
[Figure: the structure of the improved DenseNet layer (BN, ReLU, 1 × 1 Conv, BN, ReLU,
3 × 3 Conv, followed by an ECA block consisting of AdaptiveAvgPool, Conv1d and Sigmoid).]

Figure 2. The structure of the improved ConvNeXt block (7 × 7 depthwise Conv with 96 channels,
LN, 1 × 1 Conv with 384 channels, GELU, 1 × 1 Conv with 96 channels, followed by a GCT block
with L2-norm, channel normalization, γ, β and tanh gating).
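The two attention modules introduced into the core blocks can be sketched as follows. This
is a minimal PyTorch rendition of Efficient Channel Attention [19] and Gated Channel
Transformation [20] based on their published formulations; the kernel size and parameter
initializations are illustrative assumptions rather than the exact settings used in this paper.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling, a 1-D convolution over the
    channel dimension and sigmoid gating, used in the improved DenseNet layer."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                               # x: (N, C, H, W)
        y = self.avg_pool(x)                            # (N, C, 1, 1)
        y = y.squeeze(-1).transpose(1, 2)               # (N, 1, C)
        y = self.sigmoid(self.conv(y))                  # local cross-channel interaction
        y = y.transpose(1, 2).unsqueeze(-1)             # back to (N, C, 1, 1)
        return x * y                                    # re-weight the channels

class GCT(nn.Module):
    """Gated Channel Transformation: L2 global context embedding, channel normalization
    and a tanh gate, used in the improved ConvNeXt block."""
    def __init__(self, channels: int, eps: float = 1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x):                               # x: (N, C, H, W)
        # Global context embedding via the per-channel L2 norm.
        embedding = (x.pow(2).sum(dim=(2, 3), keepdim=True) + self.eps).sqrt() * self.alpha
        # Normalize the embedding across channels.
        norm = self.gamma * embedding / (
            (embedding.pow(2).mean(dim=1, keepdim=True) + self.eps).sqrt())
        gate = 1.0 + torch.tanh(norm + self.beta)       # gating signal around 1
        return x * gate
```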
3.3. Macro Design
Different sub-models have different expressive abilities, and by combining the parts
they are good at, a model that is "accurate" in all aspects is obtained. Therefore, we fused
the two improved sub-models to form the backbone of our classification model.
The features extracted by the shallow network were relatively close to the input and
contained more pixel information, that is, fine-grained information such as the color, texture,
edges and corners of the image. The receptive field of the shallow network was smaller,
and the overlapping area of the receptive fields was also smaller, so the shallow network
could capture more details. However, the semantics were lower due to less convolution
going through. The features extracted by the deep network were closer to the output and
contained more abstract information, that is, coarse-grained information such as semantic
information. However, the resolution was low, and the perception of details was poor.
Therefore, combining the characteristics of the two, a parallel strategy was adopted to fuse
the deep and shallow features. It can be represented by Function (2), where x represents
the input, Conv represents the 2 × 2 convolution operation with stride 2, and Dropout
represents the operation of randomly ignoring some features, which could significantly
reduce the overfitting phenomenon [43].

G(x) = \mathrm{Conv}(\mathrm{Dropout}(x)) \qquad (2)

The complete structure of our proposed model is shown in Figure 3. For the improved
DenseNet model, the features output from the second block are first passed through
the (2) operation and then added and fused with the features output from the third block.
The fused features are again subjected to the (2) operation, and they are then added and
fused with the features output by the fourth block to serve as the final output features.
The extracted features are first adaptively average-pooled, and then the multi-dimensional
features are one-dimensionalized by the flattening layer. For the improved ConvNeXt
model, the features output by the third stage are first subjected to the (2) operation, and
they are then added and fused with the features output by the fourth stage as the final
output features. The extracted features are adaptively average-pooled. Finally, the features
output by the two improved sub-models are concatenated for classification. In addition, all
the models were pre-trained on ImageNet [44], where the weight files were either obtained
from Torchvision or Github. In order to match our proposed model, we replaced and
deleted some keys in the weight files.

Figure 3. The full structure of the proposed model (improved DenseNet branch: blocks with 6, 12,
48 and 12 improved layers connected by transition layers; improved ConvNeXt branch: stages with
3, 3, 27 and 3 improved blocks connected by downsample layers; the adaptively average-pooled and
flattened outputs of the two branches are concatenated and passed to a linear layer for the output).
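A minimal PyTorch sketch of the macro design in Figure 3, written for illustration only: the
G(x) = Conv(Dropout(x)) operation of Equation (2), the addition-based fusion of shallow and
deep feature maps, and the concatenation of the pooled features of the two branches before
the final linear layer. The channel sizes, dropout rate and module names are placeholders,
not values taken from the paper.

```python
import torch
import torch.nn as nn

class DownFuse(nn.Module):
    """Equation (2): Dropout followed by a 2x2 convolution with stride 2, used to project a
    shallow feature map before adding it to the next (spatially smaller) deep feature map."""
    def __init__(self, in_ch: int, out_ch: int, p: float = 0.2):
        super().__init__()
        self.dropout = nn.Dropout2d(p)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.conv(self.dropout(x))

def fuse_branch(feats, g_ops):
    """feats: feature maps from consecutive blocks/stages (shallow to deep); each shallow
    map is passed through G(.) and added to the next deeper map."""
    out = feats[0]
    for deep, g in zip(feats[1:], g_ops):
        out = g(out) + deep
    return out

class FusionHead(nn.Module):
    """Pool each branch output, flatten, concatenate and classify (cf. Figure 3)."""
    def __init__(self, dense_dim: int, convnext_dim: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.norm = nn.LayerNorm(convnext_dim)
        self.fc = nn.Linear(dense_dim + convnext_dim, num_classes)

    def forward(self, dense_feat, convnext_feat):
        d = torch.flatten(self.pool(dense_feat), 1)
        c = self.norm(torch.flatten(self.pool(convnext_feat), 1))
        return self.fc(torch.cat([d, c], dim=1))

# Toy usage: fuse two feature maps of 256 and 512 channels from one branch.
f2, f3 = torch.randn(1, 256, 28, 28), torch.randn(1, 512, 14, 14)
fused = fuse_branch([f2, f3], [DownFuse(256, 512)])
```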
4. Experiment and Results
4.1. Datasets
The first experimental dataset in this paper was provided by Peking Union Medical
College Hospital, and all participants provided informed consent. This dataset had a total
of 2600 images, including 1600 images of acne skin diseases, 400 images of melasma skin
diseases, 300 images of rosacea skin diseases and 300 images of nevus of Ota skin diseases.
These images and labels were rigorously reviewed by multiple experienced dermatologists.
Some of the sample images from the dataset are shown in Figure 4. We randomly divided
the dataset into a training set and a test set according to a ratio of 8:2. The fact that there
were far more acne skin disease images than images of the other three classes led to an
irregular distribution of skin disease images and an unbalanced dataset. Therefore, we used
data augmentation to balance the data so as to improve the classification performance of
the model, reduce the overfitting of the data and make the model more stable in the learning
process [45]. We expanded the training set eight times by horizontal flipping, vertical
flipping, increasing the brightness, center cropping, Cutout [46], Cutmix [47], Augmix [48]
and Random Erasing [49], but we did not modify the test set. Figure 5 shows the number
of images for each class of skin disease in the test set. Before training, we normalized the
pixel values of the input images to a [0, 1] range and resized the images to 512 × 512 pixels.

Figure 5. The number of images of each class of skin disease in the test set (Acne: 320, Melasma: 80,
Rosacea: 60, Nevus of Ota: 60).
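A hedged torchvision sketch of the augmentation and preprocessing described above. The
paper expands the training set eight-fold offline; here the same operations are expressed as
online transforms, the brightness and erasing parameters are illustrative guesses, AugMix
requires torchvision >= 0.13, and Cutout [46]/Cutmix [47] are omitted because they need
custom or batch-level implementations.

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.Resize((512, 512)),            # images are resized to 512 x 512
    transforms.RandomHorizontalFlip(),        # horizontal flipping
    transforms.RandomVerticalFlip(),          # vertical flipping
    transforms.ColorJitter(brightness=0.2),   # brightness change (value is a guess)
    transforms.AugMix(),                      # Augmix [48]
    transforms.ToTensor(),                    # scales pixel values to the [0, 1] range
    transforms.RandomErasing(p=0.25),         # Random Erasing [49]
])

test_tf = transforms.Compose([                # the test set is not augmented
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])
```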
In addition, in order to verify the generalization ability of our proposed network model
and make the accuracy of the model more convincing, we conducted additional experiments
on the public dataset HAM10000 [14] (Human-Against-Machine with 10000 training images).
It contains 10015 images of skin diseases that are divided amongst seven classes, including
three hundred and twenty-seven images of actinic keratosis and intraepithelial carcinoma
(AKIEC), five hundred and fourteen images of basal cell carcinoma (BCC), one thousand and
ninety-nine images of benign keratosis-like lesions (BKL), one hundred and fifteen images of
dermatofibroma (DF), one thousand one hundred and thirteen images of melanoma (MEL),
six thousand seven hundred and five images of melanocytic nevi (NV) and one hundred and
forty-two images of vascular skin lesions (VASC). So, it is a dataset with extremely imbalanced
skin disease classes. Some sample images from the HAM10000 dataset are shown in Figure 6.
Then, we normalized the dataset to a uniform size (300 × 300). For a fair comparison with the
other models, we divided the dataset in two ways. In the first way, 828 skin disease images
were randomly extracted as the test set, which was the same as the dataset division of the
models IRv2-RA [11], FixCaps [15], etc. In the second way, we randomly divided the training
set and the test set according to a ratio of 8:2. The test set had 2000 skin disease images, which
was the same as the dataset division of the models Shifted2-Nets [26], etc. Table 1 shows the
number of images for each class of skin disease in the test set for the two partitions. In addition,
in order to make the model have better experimental results, the training dataset was processed
with the same data augmentation method as the first private dataset.

Table 1. The number of images for each class of skin disease in the test set for the two partitions.

Class First Way Second Way
AKIEC 23 65
BCC 26 103
BKL 66 219
DF 6 23
MEL 34 221
NV 663 1341
VASC 10 28
Total 828 2000

4.2. Metrics
In the conducted experiments, various metrics were used to evaluate the performance of
the proposed model and to compare it to that of four basic models, namely ResNet50,
EfficientNet_B4, DenseNet201 and ConvNeXt_L. We also compared the models proposed by
others. The preliminary metrics were accuracy, precision, recall and f1-score. To extend our
metrics to multiclass classification, the macro-average was also calculated.
Accuracy is the most intuitive performance measure, and it is simply a ratio of the correctly
predicted observations to the total observations. The accuracy was calculated by using (3) [50],
where TP (true positives) represents the correctly predicted positive values, which means that
the value of the actual class is yes and the value of the predicted class is also yes. TN (true
negatives) represents the correctly predicted negative values, which means that the value of the
actual class is no and the value of the predicted class is also no. FP (false positives) represents
when the actual class is no and the predicted class is yes. FN (false negatives) represents when
the actual class is yes but the predicted class is no.
accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)
Precision is the ratio of correctly predicted positive observations to the total predicted
positive observations. The precision was calculated using Equation (4) [51].
precision = \frac{TP}{TP + FP} \qquad (4)
Recall is the proportion of actual positives that are identified correctly. The recall was
calculated using Equation (5) [51].
recall = \frac{TP}{TP + FN} \qquad (5)
The f1-score takes into account both precision and recall. The f1-score was calculated
using Equation (6) [51].
f1\text{-}score = \frac{2 \times precision \times recall}{precision + recall} \qquad (6)
The macro-average treats each class equally, with all classes having the same weight.
It is obtained by adding up the evaluation metrics (precision/recall/f1-score) of the different
classes and calculating the average. For example, for the precision of a k-class problem, the
macro-average is calculated by using (7) [52].

macro\text{-}precision = \frac{1}{k}\sum_{i=1}^{k} precision_i \qquad (7)
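The four metrics and their macro-averages in Equations (3)-(7) can be computed, for example,
with scikit-learn (the paper does not state which implementation was used; this is only an
illustration):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0, 3]   # toy ground-truth class indices
y_pred = [0, 1, 2, 1, 1, 0, 3]   # toy predictions

accuracy = accuracy_score(y_true, y_pred)                     # Equation (3)
precision, recall, f1, _ = precision_recall_fscore_support(   # Equations (4)-(7), macro-averaged
    y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={accuracy:.4f} macro-P={precision:.4f} "
      f"macro-R={recall:.4f} macro-F1={f1:.4f}")
```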
4.3. Results
We conducted experiments on both our private dataset and the public dataset HAM10000.
The operating system of the experimental server was Ubuntu 20.04, which was configured
with 1 AMD EPYC 7642 48-Core CPU and 8 NVIDIA RTX 3090 24GB GPUs.
4.3.1. The First Dataset
The best test accuracy achieved by each model on our private dataset is shown in Figure 7.

Figure 7. The best test accuracy of each model on our private dataset (ResNet50: 88.58%,
EfficientNet_B4: 91.54%, DenseNet201: 92.12%, ConvNeXt_L: 92.88%, Ours: 96.54%).

In addition, the confusion matrix corresponding to the best accuracy of each model
is shown in Figure 8. Then, these confusion matrixes were used to calculate the precision
of each model separately for each class based on (4), as shown in Table 2. Meanwhile, the
macro-average of the precision was calculated based on (7). Similarly, the recall, f1-score and
corresponding macro-average of each model for each class were calculated based on (5), (6)
and (7), respectively, as shown in Tables 3 and 4. It can be seen from Tables 2-4 that our
proposed model was not only better than the other four basic models in terms of accuracy
but also better than them in terms of precision, recall and f1-score. At the same time, our
proposed model not only outperformed the other models in the categories with more data
but also performed better in the categories with less data. For example, our proposed model
outperformed ResNet50 by 16.77 percentage points in terms of precision on the images of
nevus of Ota skin disease. Finally, these experimental results demonstrated that our proposed
model had a better classification performance.
Table 2. The precision of each model on our private dataset; the unit is %.

Model Acne Melasma Rosacea Nevus of Ota Macro-Average
ResNet50 93.52 78.65 85.42 79.66 84.31
EfficientNet_B4 93.94 85.37 94.34 83.64 89.32
DenseNet201 95.62 85.90 90.00 83.87 88.85
ConvNeXt_L 95.94 85.71 92.86 86.67 90.30
Ours 98.12 90.70 96.55 96.43 95.45

Table 3. The recall of each model on our private dataset; the unit is %.

Model Acne Melasma Rosacea Nevus of Ota Macro-Average
ResNet50 94.69 87.50 68.33 78.33 82.21
EfficientNet_B4 96.88 87.50 83.33 76.67 86.10
DenseNet201 95.62 83.75 90.00 86.67 89.01
ConvNeXt_L 95.94 90.00 86.67 86.67 89.82
Ours 98.12 97.50 93.33 90.00 94.74

Table 4. The f1-score of each model on our private dataset; the unit is %.

Figure 8. The confusion matrix of each model on our private dataset.

4.3.2. The Second Dataset
Comparing the models on the public dataset HAM10000, the experimental environment of
this dataset was basically the same as the experimental environment of the first private dataset.
Similarly, categorical cross-entropy was selected as the loss function, the initial learning rate
was 0.01, the momentum was 0.9 and the weight decay was 0.0001, but the batch size was 128.
The MultiStepLR algorithm was used to dynamically adjust the learning rate and reduce the
learning rate in the eighth, fifteenth and twentieth epoch, respectively, and the gamma was 0.1.
All of the models were trained for 40 epochs.
To begin with, based on the dataset divided in the first way (test set of 828 images), the
accuracy of the four basic models, our proposed model and those proposed by others are shown
in Table 5. It can be seen from Table 5 that our proposed model was improved by 2.42 percentage
points and 1.57 percentage points, respectively, compared with the two DenseNet201 and
ConvNeXt_L baseline models. Compared with CNN [54], IM-CNN [22] and IRv2-RA [11], our
proposed model outperformed them in terms of accuracy by 9.31%, 1.89% and 0.19%, respectively,
but compared with FixCaps [15], our proposed model was 1.2% lower in terms of accuracy.
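The training configuration described above for the HAM10000 experiments can be sketched
as follows. The use of SGD is an assumption (the paper cites [53] and gives the momentum and
weight decay but does not name the optimizer class), and the model and data are toy stand-ins
so that the snippet runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins; in the paper these are the fused model and the HAM10000 training loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 7))
train_loader = DataLoader(TensorDataset(torch.randn(256, 3, 32, 32),
                                        torch.randint(0, 7, (256,))), batch_size=128)

criterion = nn.CrossEntropyLoss()                       # categorical cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[8, 15, 20], gamma=0.1)       # decay at epochs 8, 15 and 20

for epoch in range(40):                                 # 40 epochs, batch size 128
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```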
Table 5. The accuracy of each model on the public dataset HAM10000 (test set of 828 images); the
unit is %.

Model Accuracy
ResNet50 90.10
EfficientNet_B4 93.36
DenseNet201 92.87
ConvNeXt_L 93.72
IRv2-RA [11] 93.47
FixCaps [15] 96.49
IM-CNN [22] 95.10
CNN [54] 85.98
Ours 95.29
In addition, Tables 6–8 show the precision, recall and f1-score of each model based on
the dataset divided in the first way (test set of 828 images), respectively. As can be seen
from Tables 6–8, our proposed model performed slightly better in the categories with less
data. For example, from Table 7, it can be seen that our proposed model outperformed the
IRv2-RA [11] and FixCaps [15] models by 83% and 33.3%, respectively, in the dermatofi-
broma (DF) skin disease category. Taken together, our proposed model outperformed all
the other models in terms of the macro-average recall and macro-average f1-score. Mean-
while, in terms of macro-average precision, our proposed model was higher than most
models, and it was only 0.79% and 0.59% lower than the IRv2-RA [11] and FixCaps [15]
models, respectively.
Table 6. The precision of each model on the public dataset HAM10000 (test set of 828 images); the
missing values of indicators are replaced by “-”, and the unit is %.
Table 7. The recall of each model on the public dataset HAM10000 (test set of 828 images); the missing
values of indicators are replaced by “-”, and the unit is %.
Table 8. The f1-score of each model on the public dataset HAM10000 (test set of 828 images); the
missing values of indicators are replaced by “-”, and the unit is %.
Finally, Tables 9–12 show the accuracy, precision, recall and f1-score of each model
on the dataset divided in the second way (test set of 2000 images), respectively. It can be
observed from Tables 9–12 that our proposed model not only outperformed the models
proposed by others in terms of accuracy but also outperformed the models proposed by
others in terms of the macro-average precision, macro-average recall and macro-average
f1-score. In particular, compared with the models proposed by others in terms of the macro-
average recall and macro-average f1-score, our proposed model possessed the largest
improvement of 18.91% and 14.75%, respectively. All in all, our proposed model not only
possessed a good classification performance on our private dataset but also showed good
classification performance on the public dataset HAM10000. In addition, compared with
the other state-of-the-art models, it also achieved good results. This demonstrates that
our proposed model possessed a good generalization ability at the same time. In order to
facilitate the comparison of the classification performance of multiple models on multiple
datasets, we performed a statistical analysis on the accuracy rates of the models ResNet50,
EfficientNet_B4, DenseNet201, ConvNeXt_L and our model on the above three datasets.
The critical value calculated using the non-parametric Friedman test [55] was 0.0218. So, the
test accuracy of these models showed a significant difference. Then, the post-hoc Nemenyi
test [55] was used to further distinguish the model performance, where the calculated
critical difference (CD) was 3.5215. The critical difference diagram is shown in Figure 9.
According to Figure 9, it can be seen that our proposed model performed better overall.
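The statistical comparison described above can be reproduced in outline as follows; the
accuracy matrix comes from Figure 7 and Tables 5 and 9, while the use of scipy and
scikit-posthocs is an assumption about tooling, not a statement about what the authors used.

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# Rows: datasets (private, HAM10000 828-image split, HAM10000 2000-image split).
# Columns: ResNet50, EfficientNet_B4, DenseNet201, ConvNeXt_L, Ours.
acc = np.array([
    [88.58, 91.54, 92.12, 92.88, 96.54],
    [90.10, 93.36, 92.87, 93.72, 95.29],
    [81.85, 88.20, 87.75, 88.40, 90.85],
])

stat, p_value = friedmanchisquare(*acc.T)     # one sample of accuracies per model
print(f"Friedman p-value: {p_value:.4f}")     # a small p-value indicates significant differences
print(sp.posthoc_nemenyi_friedman(acc))       # pairwise Nemenyi post-hoc comparisons
```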
Table 9. The accuracy of each model on the public dataset HAM10000 (test set of 2000 images); the
unit is %.
Model Accuracy
ResNet50 81.85
EfficientNet_B4 88.20
DenseNet201 87.75
ConvNeXt_L 88.40
Bayesian DenseNet169 [21] 83.59
MobileNetV2-LSTM [56] 85.34
EW-FCM and wide-ShuffleNet [32] 84.80
Shifted2-Nets [26] 83.60
Ours 90.85
Table 10. The precision of each model on the public dataset HAM10000 (test set of 2000 images); the
missing values of indicators are replaced by “-”, and the unit is %.
Table 11. The recall of each model on the public dataset HAM10000 (test set of 2000 images); the
missing values of indicators are replaced by “-”, and the unit is %.
Figure 9. The critical difference diagram.

5. Discussion
Although our proposed model possessed a good classification performance on the
datasets with an extreme imbalance or a small number of samples, it was not flawless
and still had limitations. For example, our proposed model consumed a lot of computing
resources while training, and the training speed was also relatively slow. In addition, our
proposed model recognized fewer types of skin diseases and requires training on more
benchmark datasets in order to refine it. Therefore, in future work we will carry out a
lightweight transformation of the proposed model in order to adapt it to different work
scenarios. In addition, we will test our proposed model by using other benchmark datasets
with different skin diseases.
6. Conclusions
In this paper, we proposed a convolutional neural network model for skin disease
classification based on model fusion. We chose DenseNet201 and ConvNeXt_L as the
backbone sub-classification models of our model fusion. In addition, on the core block of
each sub-classification model, an attention module was introduced to assist the network
in acquiring a region of interest in order to enhance the ability of the network model to
extract image features. In addition, the features extracted by the shallow network could
capture more details, and the features extracted by the deep network contained more
abstract semantic information. Combining the characteristics of the two, a parallel strategy
was adopted to fuse the features of the deep and shallow layers. Finally, through a series
of works such as model pre-training, data augmentation and parameter fine-tuning, the
classification performance of the proposed model was further improved.
On the private dataset, the proposed model achieved an accuracy of 96.54%, which
was 4.42% and 3.66% higher than the two baseline models, respectively. On the public
dataset, HAM10000, the accuracy and f1-score of the proposed model were 95.29% and
89.99%, respectively, which also achieved good results compared to the other state-of-the-
art models. It was demonstrated that the proposed model possessed a good classification
performance on datasets with an extreme imbalance or a small number of samples as
well as a good generalization ability.
Author Contributions: Conceptualization, M.W. and Q.W.; methodology, Q.W.; validation, H.J. and
J.W.; formal analysis, M.W. and J.L.; writing—original draft preparation, Q.W.; writing—review
and editing, J.L. and L.Z.; supervision, T.L.; project administration, T.L. and L.Z. M.W., Q.W. and
H.J. contributed equally to the work. All authors have read and agreed to the published version of
the manuscript.
Funding: This publication emanated from research conducted with the financial support of the
National Key Research and Development Program of China under grant no. 2017YFE0135700 and the
Tsinghua Precision Medicine Foundation under grant no. 2022TS003.
Institutional Review Board Statement: Ethical review and approval were waived for this study
because the data used in this study only involved pictures of skin diseases, which do not involve
ethics, and we did not involve experiments using animals.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: We analyzed a public dataset in this study. It is available at https://
challenge.isic-archive.com/data/#2018 (accessed on 10 November 2022).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Karimkhani, C.; Dellavalle, R.P.; Coffeng, L.E.; Flohr, C.; Hay, R.J.; Langan, S.M.; Nsoesie, E.O.; Ferrari, A.J.; Erskine, H.E.;
Silverberg, J.I. Global skin disease morbidity and mortality: An update from the global burden of disease study 2013. JAMA
Dermatol. 2017, 153, 406–412. [CrossRef] [PubMed]
2. Leiter, U.; Eigentler, T.; Garbe, C. Epidemiology of skin cancer. Sunlight Vitam. D Ski. Cancer 2014, 810, 120–140.
3. Baumann, B.C.; MacArthur, K.M.; Brewer, J.D.; Mendenhall, W.M.; Barker, C.A.; Etzkorn, J.R.; Jellinek, N.J.; Scott, J.F.; Gay, H.A.;
Baumann, J.C. Management of primary skin cancer during a pandemic: Multidisciplinary recommendations. Cancer 2020, 126,
3900–3906. [CrossRef] [PubMed]
4. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J.A.; Van Ginneken, B.; Sánchez, C.I.
A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [CrossRef] [PubMed]
5. Abdulrahman, A.A.; Rasheed, M.; Shihab, S. The Analytic of image processing smoothing spaces using wavelet. In Proceedings
of the Ibn Al-Haitham International Conference for Pure and Applied Sciences (IHICPS), Baghdad, Iraq, 9–10 December 2020;
p. 022118.
6. Rashid, T.; Mokji, M.M. Low-Resolution Image Classification of Cracked Concrete Surface Using Decision Tree Technique. In
Control, Instrumentation and Mechatronics: Theory and Practice; Springer: Berlin/Heidelberg, Germany, 2022; pp. 641–649.
7. Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications.
Neurocomputing 2017, 234, 11–26. [CrossRef]
8. Pouyanfar, S.; Sadiq, S.; Yan, Y.; Tian, H.; Tao, Y.; Reyes, M.P.; Shyu, M.-L.; Chen, S.-C.; Iyengar, S.S. A survey on deep learning:
Algorithms, techniques, and applications. ACM Comput. Surv. (CSUR) 2018, 51, 1–36. [CrossRef]
9. Ker, J.; Wang, L.; Rao, J.; Lim, T. Deep learning applications in medical image analysis. IEEE Access 2017, 6, 9375–9389. [CrossRef]
10. Anwar, S.M.; Majid, M.; Qayyum, A.; Awais, M.; Alnowami, M.; Khan, M.K. Medical image analysis using convolutional neural
networks: A review. J. Med. Syst. 2018, 42, 226. [CrossRef]
11. Datta, S.K.; Shaikh, M.A.; Srihari, S.N.; Gao, M. Soft Attention Improves Skin Cancer Classification Performance. In Interpretability
of Machine Intelligence in Medical Image Computing, and Topological Data Analysis and Its Applications for Medical Data; Springer:
Berlin/Heidelberg, Germany, 2021; pp. 13–23.
12. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections on learning.
In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
13. Codella, N.C.; Gutman, D.; Celebi, M.E.; Helba, B.; Marchetti, M.A.; Dusza, S.W.; Kalloo, A.; Liopyris, K.; Mishra, N.; Kittler, H.
Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi),
hosted by the international skin imaging collaboration (isic). In Proceedings of the 2018 IEEE 15th International Symposium on
Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 168–172.
14. Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of
common pigmented skin lesions. Sci. Data 2018, 5, 180161. [CrossRef] [PubMed]
15. Lan, Z.; Cai, S.; He, X.; Wen, X. FixCaps: An Improved Capsules Network for Diagnosis of Skin Cancer. IEEE Access 2022, 10,
76261–76267. [CrossRef]
16. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. Adv. Neural Inf. Process. Syst. 2017, 30, 3856–3866.
17. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
18. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
19. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks.
In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA,
13–19 June 2020.
20. Yang, Z.; Zhu, L.; Wu, Y.; Yang, Y. Gated channel transformation for visual recognition. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11794–11803.
21. Mobiny, A.; Singh, A.; Van Nguyen, H. Risk-aware machine learning classifier for skin lesion diagnosis. J. Clin. Med. 2019, 8, 1241.
[CrossRef]
22. Wang, S.; Yin, Y.; Wang, D.; Wang, Y.; Jin, Y. Interpretability-based multimodal convolutional neural networks for skin lesion
diagnosis. IEEE Trans. Cybern. 2021, 52, 12623–12637. [CrossRef] [PubMed]
23. Allugunti, V.R. A machine learning model for skin disease classification using convolution neural network. Int. J. Comput.
Program. Database Manag. 2022, 3, 141–147.
24. Anand, V.; Gupta, S.; Koundal, D.; Nayak, S.R.; Nayak, J.; Vimal, S. Multi-class Skin Disease Classification Using Transfer
Learning Model. Int. J. Artif. Intell. Tools 2022, 31, 2250029. [CrossRef]
25. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
26. Thurnhofer-Hemsi, K.; López-Rubio, E.; Domínguez, E.; Elizondo, D.A. Skin lesion classification by ensembles of deep convolu-
tional networks and regularly spaced shifting. IEEE Access 2021, 9, 112193–112205. [CrossRef]
27. Karthik, R.; Vaichole, T.S.; Kulkarni, S.K.; Yadav, O.; Khan, F. Eff2Net: An efficient channel attention-based convolutional neural
network for skin disease classification. Biomed. Signal Process. Control 2022, 73, 103406. [CrossRef]
28. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
29. Tan, M.; Le, Q. Efficientnetv2: Smaller models and faster training. In Proceedings of the International Conference on Machine
Learning, Shenzhen, China, 26 February–1 March 2021; pp. 10096–10106.
30. Abayomi-Alli, O.O.; Damasevicius, R.; Misra, S.; Maskeliunas, R.; Abayomi-Alli, A. Malignant skin melanoma detection using
image augmentation by oversamplingin nonlinear lower-dimensional embedding manifold. Turk. J. Electr. Eng. Comput. Sci.
2021, 29, 2600–2614. [CrossRef]
31. Mendonça, T.; Ferreira, P.M.; Marques, J.S.; Marcal, A.R.; Rozeira, J. PH 2-A dermoscopic image database for research and
benchmarking. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC), Osaka, Japan, 3–7 July 2013; pp. 5437–5440.
32. Hoang, L.; Lee, S.-H.; Lee, E.-J.; Kwon, K.-R. Multiclass Skin Lesion Classification Using a Novel Lightweight Deep Learning
Framework for Smart Healthcare. Appl. Sci. 2022, 12, 2677. [CrossRef]
33. Malibari, A.A.; Alzahrani, J.S.; Eltahir, M.M.; Malik, V.; Obayya, M.; Al Duhayyim, M.; Neto, A.V.L.; de Albuquerque, V.H.C.
Optimal deep neural network-driven computer aided diagnosis model for skin cancer. Comput. Electr. Eng. 2022, 103, 108318.
[CrossRef]
34. Nawaz, M.; Nazir, T.; Masood, M.; Ali, F.; Khan, M.A.; Tariq, U.; Sahar, N.; Damaševičius, R. Melanoma segmentation: A
framework of improved DenseNet77 and UNET convolutional neural network. Int. J. Imaging Syst. Technol. 2022, 32, 2137–2153.
[CrossRef]
35. Codella, N.; Rotemberg, V.; Tschandl, P.; Celebi, M.E.; Dusza, S.; Gutman, D.; Helba, B.; Kalloo, A.; Liopyris, K.; Marchetti, M.
Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic).
arXiv 2019, arXiv:1902.03368.
36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
37. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International
Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
38. Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249. [CrossRef]
39. Zhou, Z.-H. Ensemble learning. In Machine Learning; Springer: Berlin/Heidelberg, Germany, 2021; pp. 181–210.
40. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted
windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October
2021; pp. 10012–10022.
41. Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415.
42. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
43. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks
from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
44. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of
the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
45. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [CrossRef]
46. DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552.
47. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. Cutmix: Regularization strategy to train strong classifiers with local-
izable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea,
27 October–2 November 2019; pp. 6023–6032.
48. Hendrycks, D.; Mu, N.; Cubuk, E.D.; Zoph, B.; Gilmer, J.; Lakshminarayanan, B. Augmix: A simple data processing method to
improve robustness and uncertainty. arXiv 2019, arXiv:1912.02781.
49. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. In Proceedings of the AAAI Conference on
Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 13001–13008.
50. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4.
51. Olson, D.L.; Delen, D. Advanced Data Mining Techniques; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008.
52. Opitz, J.; Burst, S. Macro f1 and macro f1. arXiv 2019, arXiv:1911.03347.
53. Bottou, L. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012;
pp. 421–436.
54. Alwakid, G.; Gouda, W.; Humayun, M.; Sama, N.U. Melanoma Detection Using Deep Learning-Based Classifications. Healthcare
2022, 10, 2481. [CrossRef] [PubMed]
55. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
56. Srinivasu, P.N.; SivaSai, J.G.; Ijaz, M.F.; Bhoi, A.K.; Kim, W.; Kang, J.J. Classification of skin disease using deep learning neural
networks with MobileNet V2 and LSTM. Sensors 2021, 21, 2852. [CrossRef] [PubMed]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.