Deep Feature Extraction of Pap Smear Images Based On Convolutional Neural Network and Vision Transformer For Cervical Cancer Classification
Abstract—Cervical cancer is a malignant disease that women commonly experience. This cancer can be prevented if screening is carried out early using the Pap smear method. However, the Pap smear technique yields a subjective diagnosis. An appropriate decision-making method is needed to overcome this obstacle, such as a computer-based diagnosis method that applies machine learning. We combine deep feature extraction using transfer learning from convolutional neural network models and vision transformers to obtain local and global features, which together represent a more comprehensive variety of an image's features. The combined features are then reduced in two steps, principal component analysis and linear discriminant analysis, to obtain a representation of the essential features of the data. The reduced features are then analyzed using several classifiers, including SVM, K-NN, MLP, and LR. The proposed framework was evaluated on three publicly accessible datasets, namely Herlev, Mendeley LBC, and SIPaKMeD, achieving classification accuracies of 97.83% (SVM, K-NN, MLP, and LR), 100% (SVM, K-NN, MLP, and LR), and 98.52% (SVM, K-NN, and LR), respectively.

Keywords—Cervical Cancer, Deep Feature Extraction, Feature Reduction, Classification

I. INTRODUCTION

Cervical cancer is a leading contributor to cancer mortality among women. It constitutes around 12% of all cancer cases and poses a significant mortality risk for women globally [1]. In 2018, an estimated 570,000 women were diagnosed with cervical cancer worldwide, and approximately 311,000 women died from the disease [2]. This cancer arises from abnormal cell growth in the cervix and can then spread, damaging normal cells in other tissues [3]. Cervical cancer can often be detected before it progresses because precancerous lesions can be found with a Pap smear test [4]. Human papillomavirus (HPV) is the primary etiological factor responsible for cervical cancer in women. This cancer is one of the deadliest diseases, but it can be cured if identified at an early stage [5].

The mortality rate of cervical cancer can be reduced by routine examination of the cervix for early detection [6]. A Pap smear test is one method of detecting cell changes in a woman's cervix: it looks for changes in cervical cells from normal to abnormal or precancerous [7]. Visual observation of Pap smear slides under a microscope has several weaknesses, so the results are considered subjective and can lead to an inaccurate diagnosis. A computer-based detection system is needed to minimize observation errors [7]. Using computers to distinguish normal from abnormal cells can therefore support diagnosis and help save lives.

Machine learning is one method that can be used in the medical field to diagnose cells. It has been widely used as decision support to reduce bias when observing data [8]. Several studies use machine learning to support the diagnosis of cervical cells from Pap smear images by applying deep feature extraction with convolutional neural networks (CNNs) [9], [10]. Other researchers use vision transformers (ViTs) to extract deep features [6], [11].

This study presents an alternative framework that uses CNNs and ViTs to derive deep features from image data and applies two steps of feature reduction, principal component analysis (PCA) and linear discriminant analysis (LDA), to obtain, from a large feature space, the essential features with respect to the class labels. As in [9], we use transfer learning from the CNNs ResNet-50, DenseNet-121, VGG-16, and Inception-V3. The ViTs used in this study are ViT-B16, ViT-L16, ViT-B32, and ViT-L32; they are added to increase the variety of features from which relevant features can be obtained. The proposed approach achieves high classification accuracy for the two-class problems on the Herlev and Mendeley LBC datasets and for the three-class problem on the SIPaKMeD dataset, using the same class settings as [9].

This study performs deep feature extraction using pre-trained CNNs and ViTs. The objective is to generate a more comprehensive set of features from which relevant features can be extracted. The extracted features are concatenated and then standardized using a standard scaler routine from a Python library. PCA and LDA are then applied to reduce the standardized features, and the reduced features are classified using support vector machine (SVM), k-nearest neighbor (K-NN), multi-layer perceptron (MLP), and logistic regression (LR) classifiers. PCA [9] and LDA [12] can each extract significant structural characteristics, so applying them sequentially combines the benefits of both. The overall flow of the proposed framework is illustrated in Fig. 1.
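The pipeline described in the preceding paragraph (concatenate, standardize, reduce with PCA and then LDA, classify) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the random matrices are hypothetical stand-ins for CNN and ViT feature vectors, their widths (2048 and 768) are illustrative, scikit-learn's `StandardScaler` is assumed to be the standard scaler routine mentioned in the text, and keeping 50 PCA components is an arbitrary choice the paper does not specify.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pipeline(classifier, n_pca=50):
    """Standardize, then PCA, then LDA, then the classifier (n_pca is illustrative)."""
    return make_pipeline(
        StandardScaler(),              # subtract each feature's mean, divide by its std
        PCA(n_components=n_pca),       # first reduction step: keep n_pca components
        LinearDiscriminantAnalysis(),  # second step: at most (n_classes - 1) dimensions
        classifier,
    )


rng = np.random.default_rng(0)
n_images, n_classes = 300, 3
labels = np.arange(n_images) % n_classes  # hypothetical, balanced three-class labels

# Placeholder "deep features": random vectors with a small class-dependent shift.
# In the framework these would be CNN and ViT embeddings (widths are hypothetical).
cnn_features = rng.normal(size=(n_images, 2048)) + 0.3 * labels[:, None]
vit_features = rng.normal(size=(n_images, 768)) - 0.3 * labels[:, None]

# Feature fusion: concatenate the two feature sets along the feature axis.
fused = np.hstack([cnn_features, vit_features])

classifiers = {
    "SVM": SVC(kernel="rbf"),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}

# Fit one full pipeline per classifier on the same fused features.
fitted = {name: build_pipeline(clf).fit(fused, labels) for name, clf in classifiers.items()}

reducer = fitted["SVM"][:-1]  # every fitted step except the final classifier
print("fused:", fused.shape)
print("after PCA + LDA:", reducer.transform(fused).shape)
```

The fused matrix has 2048 + 768 = 2816 columns, while the reduced output has only two, because LDA yields at most one fewer dimension than the number of classes. This is one way to see why the LDA step produces far smaller feature dimensions than PCA alone.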
Below is a concise overview of the contributions of the proposed framework:
1. We demonstrate feature fusion of CNN and ViT feature extraction.
2. We apply two feature reduction steps, PCA and LDA, to obtain important features from the combined features.
3. We evaluate the proposed framework on three publicly available Pap smear datasets: Herlev [13], Mendeley LBC [14], and SIPaKMeD [15].

Figure 1. The proposed framework (diagram labels: images of cytology; CNN and ViT extraction of features with ResNet-50, VGG-16, DenseNet-121, Inception-V3, ViT-B16, ViT-B32, ViT-L16, and ViT-L32; concatenation of CNN and ViT features).

This paper is organized into five sections. Section I summarizes the research. Section II reviews related work on cervical cancer classification. Section III explains the materials and methods. Section IV presents and discusses the experimental results, and Section V presents the conclusions and future work.

II. RELATED WORKS
Prediction is an essential technique in machine learning [16]. Machine learning techniques have been broadly applied in the medical field, especially to predict the type or category of cancer cells and whether they are normal or abnormal. Many researchers have studied cancer, including cervical cancer [4], [6], [9], [11]. These studies show that machine learning can support decisions because it produces more objective predictions.

Feature extraction and reduction are important stages that significantly affect classification results. Large feature dimensions can cause redundancy and require more processing resources. The PCA used in [9] extracts important features with the aim of increasing model accuracy; however, the reduced feature space still has large dimensions, which can bias the data. The LDA applied in [12] produced significant features with much smaller dimensions.

Feature extraction with a CNN does not require cell image segmentation [17], and neither does the ViT-based feature extraction performed in [6], [11]. Combining deep feature extraction from CNN and ViT is therefore a suitable way to capture both local and global features in images. The proposed method is evaluated with several classifiers, namely SVM, K-NN, MLP, and LR, to obtain the best accuracy.

Artificial neural networks (ANNs) require large quantities of data to avoid overfitting and poor generalization, yet the Herlev and Mendeley LBC datasets each contain fewer than a thousand images. We therefore apply transfer learning from pre-trained CNN and ViT models to overcome the limited number of images, since pre-trained models can improve the classification model. Several studies use different transfer learning models [6], [9], [10], [11], [12].

III. MATERIALS AND METHODS
A. Datasets
This study evaluates the proposed framework on three publicly accessible datasets: Herlev [13], Mendeley LBC [14], and SIPaKMeD [15]. The Herlev dataset comprises 917 images in seven cell classes, three normal and four abnormal (Fig. 2). The Mendeley LBC dataset comprises 963 images in four cell classes, one normal and three abnormal (Fig. 3). The SIPaKMeD dataset comprises 4049 images in five cell classes: two normal, two abnormal, and one benign (Fig. 4). Sample images of the Herlev, Mendeley LBC, and SIPaKMeD datasets are shown in Figs. 2 to 4, respectively. Tables I to III present, respectively, the distribution of images in
291
Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on March 12,2025 at 08:21:32 UTC from IEEE Xplore. Restrictions apply.
The 2024 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)
the cervical cytology data obtained from the three datasets: Herlev, Mendeley LBC, and SIPaKMeD.

Figure 2. Herlev dataset: (a) Normal squamous; (b) Intermediate squamous; (c) Columnar; (d) Mild dysplasia; (e) Moderate dysplasia; (f) Severe dysplasia; (g) Carcinoma in situ

Figure 3. Mendeley LBC dataset: (a) Negative for intraepithelial lesion or malignancy (NILM); (b) Low-grade squamous intraepithelial lesion (LSIL); (c) High-grade squamous intraepithelial lesion (HSIL); (d) Squamous cell carcinoma (SCC)

Figure 4. SIPaKMeD dataset: (a) Superficial; (b) Parabasal; (c) Koilocytotic; (d) Dyskeratotic; (e) Metaplastic

Table I: The Distribution of Images within the Herlev Dataset
Category | Cell type | Number of images
Normal | Intermediate squamous | 70
Normal | Columnar | 98
Normal | Normal squamous | 74
Abnormal | Mild dysplasia | 182
Abnormal | Carcinoma in situ | 150
Abnormal | Moderate dysplasia | 146
Abnormal | Severe dysplasia | 197

Table II: The Distribution of Images within the Mendeley LBC Dataset
Category | Cell type | Number of images
Normal | NILM | 613
Abnormal | LSIL | 163
Abnormal | HSIL | 113
Abnormal | SCC | 74

Table III: The Distribution of Images within the SIPaKMeD Dataset
Category | Cell type | Number of images
Normal | Superficial | 831
Normal | Parabasal | 787
Abnormal | Koilocytotic | 825
Abnormal | Dyskeratotic | 813
Benign | Metaplastic | 793

B. Deep Features Extraction

Manual feature extraction is limited in the number of features and in their correlation. Extracting many features manually is laborious, time-consuming, and error-prone, which degrades classification quality. We propose deep feature extraction that combines transfer learning from CNNs and ViTs: CNNs capture local information, while ViTs capture global information in images. The CNNs used for transfer learning are ResNet-50, DenseNet-121, VGG-16, and Inception-V3, and the ViTs are ViT-B16, ViT-L16, ViT-B32, and ViT-L32.

1) ResNet-50: ResNet-50 [18] is a deep convolutional neural network with fifty layers. A residual network is an ANN built by stacking residual blocks, each of which adds its input to its output through a skip connection.

2) VGG-16: VGG-16 is a deep CNN with 16 weight layers [19]. Its defining characteristic is the use of small 3x3 convolutional layers; stacking them increases network depth while keeping the network efficient.

3) DenseNet-121: DenseNet-121 [20] is a CNN architecture in which each layer is connected to every other layer in a feedforward fashion, so that each layer receives the feature maps of all preceding layers. DenseNets have several notable advantages: they mitigate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. Owing to its speed, efficiency, and ease of use, this network has become popular in the medical imaging domain [21].

4) Inception-V3: Inception-V3 [32] is a convolutional neural network of the Inception family that incorporates several improvements, such as label smoothing. The architecture is characterized by blocks of parallel convolutions whose outputs are concatenated along the channel dimension. These parallel convolutions help mitigate overfitting while keeping computational complexity manageable.

5) Vision Transformer [22]: ViT is a deep learning architecture that treats an image as a sequence of patches and processes it with transformer layers to extract features. ViT has two main components: the embedding layer and the transformer layers. The image is divided into small patches, and each patch is flattened into a vector and projected, yielding a sequence of tokens. This sequence passes through a stack of transformer blocks, each consisting of a multi-head self-attention layer followed by an MLP. ViT-B16 and ViT-B32 are ViT-Base architectures whose embedding layers have output sizes (14, 14, 768) and (7, 7, 768), with 86 million and 88 million parameters, respectively; both have 12 transformer blocks. ViT-L16 and ViT-L32 are ViT-Large architectures whose embedding layers have output sizes (24, 24, 1024) and (12, 12, 1024), with 304 million and 306 million parameters, respectively; both have 24 transformer blocks.
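The embedding grid sizes listed for the ViT variants follow from simple patch arithmetic: an image of side S cut into P x P patches gives an (S/P) x (S/P) grid, and each grid cell is projected to the model's hidden width (768 for ViT-Base, 1024 for ViT-Large). The sketch below is an illustration, not the authors' code. The input resolutions (224 x 224 for the Base models, 384 x 384 for the Large models) are our assumption, chosen only because they reproduce the listed grid sizes; the function names are hypothetical. ViT also prepends a learnable class token, so the transformer sequence is one token longer than the patch count.

```python
# Patch arithmetic for the four ViT variants (illustrative sketch).
# Assumed input sizes: 224x224 for ViT-Base, 384x384 for ViT-Large.
HIDDEN_WIDTH = {"B": 768, "L": 1024}  # standard ViT-Base / ViT-Large embedding widths


def patch_grid(image_size: int, patch_size: int) -> int:
    """Number of patches along one side of a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    return image_size // patch_size


def embedding_shape(variant: str, patch_size: int, image_size: int) -> tuple:
    """(rows, cols, width) of the patch-embedding output before flattening to tokens."""
    side = patch_grid(image_size, patch_size)
    return (side, side, HIDDEN_WIDTH[variant])


def num_tokens(patch_size: int, image_size: int, class_token: bool = True) -> int:
    """Sequence length fed to the transformer blocks (patches plus optional class token)."""
    return patch_grid(image_size, patch_size) ** 2 + int(class_token)


def raw_patch_length(patch_size: int, channels: int = 3) -> int:
    """Number of values in one flattened RGB patch before the linear projection."""
    return patch_size * patch_size * channels


for name, variant, patch, size in [
    ("ViT-B16", "B", 16, 224),
    ("ViT-B32", "B", 32, 224),
    ("ViT-L16", "L", 16, 384),
    ("ViT-L32", "L", 32, 384),
]:
    print(name, embedding_shape(variant, patch, size),
          num_tokens(patch, size), "tokens,",
          raw_patch_length(patch), "values per raw patch")
```

Under these assumptions, the loop reports (14, 14, 768) with 197 tokens for ViT-B16 and (12, 12, 1024) with 145 tokens for ViT-L32. It also shows why a 32-pixel patch (3072 raw values) yields four times fewer tokens than a 16-pixel patch at the same resolution.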
C. Principal Component Analysis (PCA)
PCA is a linear dimensionality reduction technique based on simple statistical theory. It projects high-dimensional data onto a lower-dimensional space whose axes, called principal components, are orthogonal directions chosen to maximize the variance of the projected data [23].

D. Linear Discriminant Analysis (LDA)
LDA is a commonly used machine learning approach for reducing the number of feature dimensions in classification tasks [24]. It is a statistical technique that reduces dimensionality by maximizing the variance between classes while minimizing the variance within classes, which maximizes the separation between the classes.

E. Classification
Classification is the last step, following the reduction of the feature dimensions. The classifiers used in this work are SVM, K-NN, MLP, and LR.

1) Support Vector Machine (SVM) [25]: SVM is a widely used supervised learning approach for classification tasks. Its goal is to find the optimal hyperplane, also known as the decision boundary, that separates the classes in the input space. SVM uses a kernel function to map the input into a higher-dimensional feature space, enabling non-linear separation of heterogeneous data. The decision boundary is chosen to maximize the margin, defined as the smallest distance between the separating hyperplane and the closest points of each class.

2) K-Nearest Neighbor (K-NN): The K-NN algorithm belongs to the supervised learning domain. It is one of the simplest and most widely used algorithms in machine learning and is commonly applied to a range of classification tasks [26]. K-NN uses a distance measure, such as the Euclidean distance, to compute the distance between a new data point and each data point in the training set. It then finds the K nearest training points and assigns the new point the class label that appears most frequently among them (for regression, the mean of the neighbors' values is used instead).

3) Multi-Layer Perceptron (MLP): MLP is a commonly used ANN in supervised learning, particularly for classification. It is a neural network with one or more hidden layers composed of nodes (neurons) that apply nonlinear transformations to the input data. The input is processed sequentially through the hidden layers, each connected to the preceding one by a set of weights. During training, the weights are adjusted to minimize the difference between the predicted and actual outputs, and the last layer of the MLP produces the final output.

4) Logistic Regression (LR): LR is a statistical method used to predict the probability of an event from independent variables. It models the log-odds (logit) of the event as a linear function of the independent variables, so that the predicted probability follows the logistic (sigmoid) curve, and thereby describes the association between the independent variables and a binary dependent variable, also called the target variable.

IV. RESULTS AND DISCUSSION

The concatenated CNN and ViT features are standardized with the scaling function of a Python library. Standardization subtracts each feature's mean from the feature values and then divides them by the feature's standard deviation, z = (x - μ) / σ, where μ and σ are the mean and standard deviation of that feature, so that every feature has zero mean and unit variance. The standardized features are then reduced by the two-step feature reduction method, PCA followed by LDA.

After feature reduction, classification is performed with four classifiers: SVM, K-NN, MLP, and LR. The train_test_split function of a Python library is used to partition the dataset into training, validation, and testing subsets. The test results of the four classifiers show that, in most settings, the proposed approach achieves higher accuracy than the existing approach [9]. The testing accuracy, precision, recall, and F1-score of the proposed approach and the existing research are shown in Tables IV to VI.

Table IV: Comparison between the proposed approach and the existing approach (two classes of the Herlev dataset)
Method | Split (Train-Val-Test) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 70%-15%-15% | 90.58 | 91.34 | 83.74 | 86.63
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 80%-10%-10% | 89.13 | 91.2 | 82.56 | 85.51
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 90%-5%-5% | 97.83 | 98.75 | 92.86 | 95.52
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 70%-15%-15% | 93.48 | 93.5 | 89.3 | 91.13
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 80%-10%-10% | 89.13 | 87.52 | 85.81 | 86.6
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 90%-5%-5% | 97.83 | 98.75 | 92.86 | 95.52
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 70%-15%-15% | 86.96 | 82.61 | 87.58 | 84.39
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 80%-10%-10% | 83.7 | 80.65 | 85.21 | 81.9
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 90%-5%-5% | 97.83 | 98.75 | 92.86 | 95.52
CNN+ViT+PCA+LDA+LR (Proposed Method) | 70%-15%-15% | 92.75 | 92.96 | 87.91 | 90.04
CNN+ViT+PCA+LDA+LR (Proposed Method) | 80%-10%-10% | 91.3 | 91.3 | 87.35 | 89.01
CNN+ViT+PCA+LDA+LR (Proposed Method) | 90%-5%-5% | 97.83 | 98.75 | 92.86 | 95.52
CNN+PCA+GWO+SVM (Basak et al. [9]) | 70%-15%-15% | 85.48 | 89.51 | 73.04 | 76.82
CNN+PCA+GWO+SVM (Basak et al. [9]) | 80%-10%-10% | 84.89 | 88.20 | 75.45 | 78.64
CNN+PCA+GWO+SVM (Basak et al. [9]) | 90%-5%-5% | 92.61 | 89.55 | 80.40 | 83.82
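The results text describes partitioning the data with train_test_split into training, validation, and testing subsets. A single call to scikit-learn's `train_test_split` returns only two subsets, so a three-way split such as 70%-15%-15% needs two successive calls, as sketched below. This is an illustration under assumptions, not the authors' code: stratified splitting, the random seed, the 30 PCA components, and macro averaging of precision, recall, and F1-score are our choices, since the paper does not state them, and `three_way_split` and `evaluate` are hypothetical helper names. Keeping the scaler, PCA, and LDA inside one pipeline fitted on the training subset keeps validation and test data out of the fitted statistics; whether the original experiments did this is not stated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def three_way_split(X, y, train=0.70, val=0.15, test=0.15, seed=0):
    """Split into train/validation/test subsets with two train_test_split calls."""
    if not np.isclose(train + val + test, 1.0):
        raise ValueError("fractions must sum to 1")
    # First call: carve off the test subset.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test, stratify=y, random_state=seed)
    # Second call: split the remainder; val is rescaled to a fraction of the remainder.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=val / (train + val), stratify=y_rest, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


def evaluate(model, X, y):
    """Accuracy plus macro-averaged precision, recall, and F1, in percent."""
    y_pred = model.predict(X)
    p, r, f1, _ = precision_recall_fscore_support(y, y_pred, average="macro", zero_division=0)
    return {"accuracy": 100 * accuracy_score(y, y_pred),
            "precision": 100 * p, "recall": 100 * r, "f1": 100 * f1}


rng = np.random.default_rng(1)
y = np.arange(400) % 2                               # hypothetical two-class labels
X = rng.normal(size=(400, 500)) + 0.5 * y[:, None]   # placeholder fused features

X_tr, X_va, X_te, y_tr, y_va, y_te = three_way_split(X, y)
model = make_pipeline(StandardScaler(), PCA(n_components=30),
                      LinearDiscriminantAnalysis(), SVC())
model.fit(X_tr, y_tr)  # every step is fitted on the training subset only

for name, (Xs, ys) in {"train": (X_tr, y_tr), "val": (X_va, y_va), "test": (X_te, y_te)}.items():
    print(name, len(ys), evaluate(model, Xs, ys))
```

Reporting the training, validation, and testing scores from the same fitted model mirrors the layout of the training/validation/testing accuracy tables; swapping `SVC()` for another classifier reuses the same split.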
Table V: Comparison between the proposed approach and the existing approach (two classes of the Mendeley LBC dataset)
Method | Split (Train-Val-Test) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 70%-15%-15% | 97.93 | 97.27 | 98.39 | 97.78
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 80%-10%-10% | 98.97 | 98.78 | 99.12 | 98.94
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 90%-5%-5% | 100 | 100 | 100 | 100
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 70%-15%-15% | 98.62 | 98.15 | 98.92 | 98.51
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 80%-10%-10% | 98.97 | 98.78 | 99.12 | 98.94
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 90%-5%-5% | 100 | 100 | 100 | 100
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 70%-15%-15% | 98.62 | 98.15 | 98.92 | 98.51
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 80%-10%-10% | 97.94 | 97.62 | 98.25 | 97.89
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 90%-5%-5% | 100 | 100 | 100 | 100
CNN+ViT+PCA+LDA+LR (Proposed Method) | 70%-15%-15% | 98.62 | 98.15 | 98.92 | 98.51
CNN+ViT+PCA+LDA+LR (Proposed Method) | 80%-10%-10% | 98.97 | 98.78 | 99.12 | 98.94
CNN+ViT+PCA+LDA+LR (Proposed Method) | 90%-5%-5% | 100 | 100 | 100 | 100
CNN+PCA+GWO+SVM (Basak et al. [9]) | 70%-15%-15% | 98.28 | 98.44 | 97.84 | 98.10
CNN+PCA+GWO+SVM (Basak et al. [9]) | 80%-10%-10% | 97.22 | 97.52 | 96.85 | 97.09
CNN+PCA+GWO+SVM (Basak et al. [9]) | 90%-5%-5% | 95.44 | 95.43 | 94.67 | 94.93

Table VI: Comparison between the proposed approach and the existing approach (three classes of the SIPaKMeD dataset)
Method | Split (Train-Val-Test) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 70%-15%-15% | 95.39 | 95.03 | 93.83 | 94.36
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 80%-10%-10% | 96.05 | 94.92 | 95.2 | 95.04
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 90%-5%-5% | 98.52 | 98.81 | 97.8 | 98.28
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 70%-15%-15% | 95.56 | 95.52 | 93.97 | 94.65
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 80%-10%-10% | 96.3 | 95.6 | 95.4 | 95.48
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 90%-5%-5% | 98.52 | 98.81 | 97.8 | 98.28
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 70%-15%-15% | 96.05 | 95.01 | 95.01 | 95.01
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 80%-10%-10% | 96.79 | 95.79 | 95.77 | 95.78
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 90%-5%-5% | 97.54 | 98.04 | 95.99 | 96.9
CNN+ViT+PCA+LDA+LR (Proposed Method) | 70%-15%-15% | 96.05 | 95.56 | 94.69 | 95.09
CNN+ViT+PCA+LDA+LR (Proposed Method) | 80%-10%-10% | 96.79 | 95.56 | 96.06 | 95.8
CNN+ViT+PCA+LDA+LR (Proposed Method) | 90%-5%-5% | 98.52 | 98.81 | 97.8 | 98.28
CNN+PCA+GWO+SVM (Basak et al. [9]) | 70%-15%-15% | 91.95 | 91.95 | 90.82 | 91.25
CNN+PCA+GWO+SVM (Basak et al. [9]) | 80%-10%-10% | 93.19 | 92.48 | 92.07 | 92.21
CNN+PCA+GWO+SVM (Basak et al. [9]) | 90%-5%-5% | 94.63 | 94.39 | 94.07 | 94.17

Tables VII to XV present the training, validation, and testing accuracy of the proposed approach and compare them with the existing research. Each table compares the classifiers for one dataset and one split of the data.

Table VII: Accuracy comparison of the proposed approach and the existing approach (two classes of the Herlev dataset, Train:70%-Val:15%-Test:15%)
Method | Training (%) | Validation (%) | Testing (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 100 | 94.93 | 90.58
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 100 | 92.03 | 93.48
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 100 | 86.96 | 86.96
CNN+ViT+PCA+LDA+LR (Proposed Method) | 100 | 92.75 | 92.75
CNN+PCA+GWO+SVM (Basak et al. [9]) | 93.56 | 88.43 | 85.48

Table VIII: Accuracy comparison of the proposed approach and the existing approach (two classes of the Herlev dataset, Train:80%-Val:10%-Test:10%)
Method | Training (%) | Validation (%) | Testing (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 100 | 94.57 | 89.13
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 100 | 92.39 | 89.13
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 100 | 81.52 | 83.7
CNN+ViT+PCA+LDA+LR (Proposed Method) | 100 | 94.57 | 91.3
CNN+PCA+GWO+SVM (Basak et al. [9]) | 94.06 | 88.73 | 84.89

Table IX: Accuracy comparison of the proposed approach and the existing approach (two classes of the Herlev dataset, Train:90%-Val:5%-Test:5%)
Method | Training (%) | Validation (%) | Testing (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 100 | 86.96 | 97.83
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 100 | 86.96 | 97.83
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 100 | 86.96 | 97.83
CNN+ViT+PCA+LDA+LR (Proposed Method) | 100 | 89.13 | 97.83
CNN+PCA+GWO+SVM (Basak et al. [9]) | 94.21 | 81.01 | 92.61

Table X: Accuracy comparison of the proposed approach and the existing approach (two classes of the Mendeley LBC dataset, Train:70%-Val:15%-Test:15%)
Method | Training (%) | Validation (%) | Testing (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 100 | 98.61 | 97.93
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 100 | 98.61 | 98.62
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 100 | 98.61 | 98.62
CNN+ViT+PCA+LDA+LR (Proposed Method) | 100 | 98.61 | 98.62
CNN+PCA+GWO+SVM (Basak et al. [9]) | 99.75 | 97.27 | 98.28
Table XI: Accuracy comparison of the proposed approach and the existing approach (two classes of the Mendeley LBC dataset, Train:80%-Val:10%-Test:10%)

Table XII: Accuracy comparison of the proposed approach and the existing approach (two classes of the Mendeley LBC dataset, Train:90%-Val:5%-Test:5%)
Method | Training (%) | Validation (%) | Testing (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 100 | 100 | 100
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 100 | 100 | 100
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 100 | 100 | 100
CNN+ViT+PCA+LDA+LR (Proposed Method) | 100 | 100 | 100
CNN+PCA+GWO+SVM (Basak et al. [9]) | 99.60 | 98.47 | 95.44

Table XIII: Accuracy comparison of the proposed approach and the existing approach (three classes of the SIPaKMeD dataset, Train:70%-Val:15%-Test:15%)
Method | Training (%) | Validation (%) | Testing (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 100 | 97.2 | 95.39
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 100 | 97.2 | 95.56
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 100 | 97.2 | 96.05
CNN+ViT+PCA+LDA+LR (Proposed Method) | 100 | 98.02 | 96.05
CNN+PCA+GWO+SVM (Basak et al. [9]) | 95.95 | 92.72 | 91.95

Table XIV: Accuracy comparison of the proposed approach and the existing approach (three classes of the SIPaKMeD dataset, Train:80%-Val:10%-Test:10%)
Method | Training (%) | Validation (%) | Testing (%)
CNN+ViT+PCA+LDA+SVM (Proposed Method) | 100 | 97.04 | 96.05
CNN+ViT+PCA+LDA+K-NN (Proposed Method) | 100 | 96.79 | 96.3
CNN+ViT+PCA+LDA+MLP (Proposed Method) | 100 | 97.53 | 96.79
CNN+ViT+PCA+LDA+LR (Proposed Method) | 100 | 97.28 | 96.79
CNN+PCA+GWO+SVM (Basak et al. [9]) | 96.52 | 92.71 | 93.19

Table XV: Accuracy comparison of the proposed approach and the existing approach (three classes of the SIPaKMeD dataset, Train:90%-Val:5%-Test:5%)

On the two-class Herlev dataset, the best results, obtained with the 90%-5%-5% split, are accuracy = 97.83%, precision = 98.75%, recall = 92.86%, and F1-score = 95.52% (Table IV). On the two-class Mendeley LBC dataset, accuracy, precision, recall, and F1-score all reach 100% with the 90%-5%-5% split (Table V). On the three-class SIPaKMeD dataset, the best results, also obtained with the 90%-5%-5% split, are accuracy = 98.52%, precision = 98.81%, recall = 97.80%, and F1-score = 98.28% (Table VI).

V. CONCLUSIONS AND FUTURE WORK

The application of computer-based machine learning to medical image analysis has proven to be an effective way to address the limitations of manual data analysis: it reduces data redundancy and speeds up the analysis process. This study was motivated by the high mortality rates of cervical cancer in many countries, which prompted us to explore how machine learning techniques can address diagnostic challenges. We therefore created a framework that uses CNN and ViT, which can extract intricate information. The feature dimensions obtained by CNN and ViT feature extraction are reduced to generate important features corresponding to the class labels.

This research presents an alternative method for selecting features from the large feature space obtained by CNN and ViT feature extraction, using PCA and LDA to obtain the essential features. PCA and LDA are commonly used feature reduction techniques that aim to preserve feature quality while minimizing redundant features.

In most of the evaluated settings, the proposed method yields better results than the previous approach [9]. The proposed framework achieves accuracies of 97.83% (SVM, K-NN, MLP, and LR), 100% (SVM, K-NN, MLP, and LR), and 98.52% (SVM, K-NN, and LR) on the three freely accessible datasets Herlev, Mendeley LBC, and SIPaKMeD, respectively.

This research can be extended by integrating CNN convolution layers with ViT transformer blocks. In addition to integrating architectures, feature selection techniques should be considered to identify significant features and further improve classification results.