IEEE - 61001

Comparing Transformers and CNN Approaches for Malware Detection: A Comprehensive Analysis

2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT) | 979-8-3503-7024-9/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICCCNT61001.2024.10724417


Sarath Jayan Nair
Center for Cybersecurity Systems and Networks
Amrita Vishwa Vidyapeetham
Amritapuri, India

Sreelakshmi R Syam
Center for Cybersecurity Systems and Networks
Amrita Vishwa Vidyapeetham
Amritapuri, India

Abstract—Detecting malicious software, known as malware, is crucial in cybersecurity because of constantly evolving threats and the ways in which malware tries to avoid detection. This research investigates the efficacy of deep learning models for multi-class malware classification using the MaleVis image dataset. Research on using transformer architectures for malware detection and classification is limited. The proposed study explores the Convolutional Vision Transformer (CvT) for detecting malware. It compares CvT's performance with the Vision Transformer (ViT) and the pre-trained Convolutional Neural Network EfficientNet B0. Each model is fine-tuned on the MaleVis dataset to distinguish between different malware categories and benign samples. A comprehensive assessment using multiple evaluation metrics suggests that CvT outperformed the other models, with an F1-score of 0.96054. ViT followed closely with a score of 0.95821, while EfficientNet B0 scored 0.87386. The research aims to contribute to cybersecurity advancements by leveraging modern deep learning techniques for enhanced malware detection.

Index Terms—Malware, Convolutional Vision Transformer, Vision Transformer, EfficientNet B0, Convolutional Neural Network.

I. INTRODUCTION

Malware can be considered any malicious software that disrupts the intended operation of a system or gathers sensitive information. It encompasses viruses, worms, ransomware, spyware, trojans, etc., categorized according to the type of attack and its functioning [1]. Some widely known cyber attacks include the Stuxnet worm, the WannaCry ransomware, the NotPetya malware and the Equifax data breach [2]. Every year, malware attacks create substantial risks to computer systems, networks and mobile devices. Security organizations are processing a growing number of malware samples, with some handling over 450,000 malware and potentially unwanted applications (PUAs) per day [3]. As the volume of malware continues to rise, manual analysis becomes impractical. Machine learning and deep learning offer promising solutions to accelerate the analysis process [4][5]. Deep learning methods can determine the prominent features by leveraging deep neural networks. They do not rely solely on expert domain knowledge to identify the features for classification [6][7].

Former approaches to malware classification depended on signature-based and heuristics-based methods. Signature-based methods rely on an existing database of known malware, against which software is compared to identify malicious samples. This approach is ineffective against new and unknown malware. Heuristic methods analyze the behaviour of programs to detect malware based on suspicious patterns. In an image-based malware detection approach, the binary data are converted to RGB or greyscale images. Machine learning or deep learning models then analyze these images for patterns characteristic of various malware families. Transforming binaries into a visual format allows vision transformers and convolutional neural networks (CNNs) to be applied to malware images. These models can uncover subtle anomalies and correlations, providing a dynamic and robust detection method [8].

Early work such as [9] was among the first to use malware byteplot images for classification, applying image processing for feature extraction and faster classification. Subsequently, CNN algorithms prevalent in computer vision were adapted for malware detection and classification. The seminal work by Dosovitskiy et al. [10] established the basis for employing the transformer architecture in image classification, subsequently facilitating its application in malware detection. More recent studies, such as [11] employing CNNs and [12] utilizing vision transformers, primarily focus on datasets composed exclusively of malware samples. This hampers the models' capability to differentiate between benign and malicious samples within a detection framework. Moreover, the research in [13] achieves a notable accuracy of 97% in binary classification. However, it performs poorly in multiclass classification, with a macro F1-score of 0.497.

This research work investigates the application of the Convolutional Vision Transformer (CvT) for malware detection. CvT, introduced in [14], combines the advantages of CNNs and the Vision Transformer and has been shown to improve performance on image recognition tasks. The work proposes fine-tuning a pre-trained CvT model on the MaleVis dataset, which encompasses 25 malware classes and a benign class. Furthermore, a comparative analysis evaluates the performance of CvT against the Vision Transformer and EfficientNet B0, a conventional CNN model noted for its effectiveness in [11].

II. RELATED WORK

Continued proliferation of malware necessitates diverse detection methods.

15th ICCCNT IEEE Conference, June 24-28, 2024, IIT - Mandi, Kamand, India

Researchers have therefore investigated various techniques, particularly the transformation of malware binaries into image data for classification. Nataraj et al. were among the first to use this method [9]. They observed that visual patterns in malware images could effectively distinguish between malware families, and they used image processing techniques to classify these patterns. This method promises reduced feature extraction complexity. It also offers potentially faster classification than traditional methods that require detailed binary or behavioural analysis. A study by Ben et al. [15] builds on this by employing the K-Nearest Neighbors (KNN) algorithm, along with GIST descriptors for feature extraction from malware images. The study uses a dataset of malware from 25 different families. The model achieved a high classification accuracy of 97%, demonstrating the efficacy of image-based features for malware classification. The authors suggest that better performance can be achieved using deep learning techniques. In [16], the authors propose a novel lightweight vision transformer for malware detection on resource-constrained devices. The proposed model transforms executable bytecode into images for the transformer model to learn. The results suggest that its accuracy of 94% outperforms traditional CNN models. They also confirm that ViTs do not require deeper network layers to achieve similar performance.

Jayasudha et al. [11] explore the effectiveness of six popular CNN architectures across three datasets with varying class imbalance: Malimg (highly imbalanced), MaleVis (balanced) and a blended dataset (intermediately imbalanced). The authors highlight the shortcomings of traditional signature-based methods in modern malware detection. They propose transfer learning as a solution because it can automate feature extraction and recognize patterns indicative of malicious behavior. Results indicate that model performance varies significantly with class imbalance, with fewer training epochs needed for convergence on more imbalanced datasets. ResNet50, EfficientNetB0 and DenseNet169 performed well across all dataset types, while VGG16 and XceptionNet showed sensitivity to imbalance. The study achieved up to 97% precision on the highly imbalanced dataset and 95% on the intermediate and balanced datasets. However, the absence of benign byteplot images in the training data poses a challenge for real-world application. It potentially hinders the model's ability to distinguish between malicious and legitimate software.

Following the successful adoption of transformer architectures in image classification tasks, exemplified by Dosovitskiy et al. [10], researchers began exploring their application in malware classification. Ben et al. [12] compared the vision transformer with CNN models on the Malimg dataset. They emphasized the transformer's improved performance, especially with large and complex datasets. The study highlights the importance of choosing the optimal model based on specific task requirements and computational resources. However, like previous studies, this research is limited to malware samples, potentially reducing its applicability in real-world settings. In a separate study [13], Sachith et al. introduce SHERLOCK, a novel deep learning model for malware detection based on the vision transformer architecture and self-supervised learning. SHERLOCK demonstrates robust learning from unlabeled data, which enhances generalization and the ability to detect new malware types. It achieves 97% accuracy in binary classification. However, its macro F1-scores for classifying up to 47 types and 696 families were 0.497 and 0.491, respectively.

Limited research currently applies transformer architectures specifically to malware detection. Previous studies have predominantly focused on datasets composed exclusively of malware samples. These studies provide significant insights into malware classification. However, the lack of benign samples in the training data impedes the models' ability to distinguish effectively between malware and non-malicious samples. This study uses the MaleVis dataset, which includes 25 malware classes and one benign class. Moreover, it explores the Convolutional Vision Transformer (CvT), which integrates the strengths of convolutional neural networks (CNNs) and Vision Transformers (ViTs) to improve the efficiency and effectiveness of image recognition tasks [14]. Additionally, a comparative analysis of CvT, ViT and a traditional CNN model (EfficientNet B0) is performed to assess their performance in malware detection.

III. METHODOLOGY

A. System Model

The proposed system model is illustrated in figure 1. The research uses the MaleVis dataset, partitioned into three subsets for training, validation and testing. Data preprocessing includes:
- resizing images,
- applying data augmentation to the training data, and
- normalizing pixel values.

Three pre-trained models are then fine-tuned on the preprocessed data: the Vision Transformer, the Convolutional Vision Transformer and EfficientNet-B0. The performance of the fine-tuned models is assessed on the test data using multiple evaluation metrics.

Fig. 1. Proposed System Model

B. Dataset

The MaleVis dataset offers a collection of malware visualizations specifically designed for malware analysis and detection research. The dataset transforms malware binaries into visual image formats. This allows the examination of distinctive patterns and characteristics associated with malicious software.

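The core idea behind the dataset is turning a binary file into an image. The following is a minimal sketch of that idea. It packs consecutive byte triples into RGB pixels, row by row, and then resizes the result to a square with nearest-neighbour sampling. This mapping is an illustrative assumption. The paper does not describe the exact layout or resizing used by the ‘bin2png’ tool that produced MaleVis. The helper names `bytes_to_rgb` and `resize_nearest` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def bytes_to_rgb(data: bytes, width: int) -> Image:
    """Pack consecutive byte triples into RGB pixels, `width` pixels per row.

    Hypothetical mapping: row-major order with zero padding at the end.
    The tool used to build MaleVis may use a different layout.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    row_len = 3 * width  # bytes consumed by one image row
    # Zero-pad so the byte stream fills a whole number of rows.
    padded = data + bytes(-len(data) % row_len)
    image: Image = []
    for start in range(0, len(padded), row_len):
        row = padded[start:start + row_len]
        # Indexing a bytes object yields ints in the range 0..255.
        image.append([(row[i], row[i + 1], row[i + 2]) for i in range(0, row_len, 3)])
    return image


def resize_nearest(image: Image, size: int) -> Image:
    """Nearest-neighbour resize of a non-empty image to size x size (e.g. 224)."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


if __name__ == "__main__":
    raw = bytes(range(256)) * 10  # stand-in for the bytes of an executable
    square = resize_nearest(bytes_to_rgb(raw, width=64), 224)
    print(len(square), len(square[0]), square[0][0])
```

The fixed row width and nearest-neighbour resizing were chosen for brevity. A real pipeline would normally read the file from disk and use an image library for resampling.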

The Multimedia Information Lab of the Department of Computer Engineering at Hacettepe University created the MaleVis dataset [17] in collaboration with Comodo Inc. It comprises 14,226 RGB images across 26 classes: 25 malware categories and 1 legitimate class. The binary data sourced from Comodo Inc. were transformed into 3-channel RGB images using the ‘bin2png’ tool. They were then resized into square formats of 224x224 and 300x300 pixels. This study uses the 224x224 version. The dataset covers a broad range of malware types, providing a robust basis for training and testing machine learning and deep learning models. Each entry consists of an image representing the original binary data of either malware or legitimate software, together with a label specifying its category. The distribution of the classes within the dataset is illustrated in figure 2.

Fig. 2. MaleVis Data Distribution

C. Data Preprocessing

Data preprocessing is an essential first step for enhancing the performance of deep learning models. It prepares the image data, improving its quality and consistency for effective model training and evaluation. Images are resized to 224x224, the input size required by the CvT, ViT and EfficientNet-B0 models used in this study. Normalization computes the mean and standard deviation of each RGB channel. Each channel value is then transformed by subtracting the mean and dividing by the standard deviation. Keeping the scale and distribution of pixel values uniform in this way facilitates better model convergence during training.

Data augmentation is applied to the training data to improve the models' robustness and generalization. Two techniques are used: random resized cropping and random horizontal flipping. Random resized cropping crops images at random and resizes the crops to a consistent size. This helps the model recognize patterns and objects at various scales and aspect ratios. Random horizontal flipping mirrors each image horizontally with a probability of 0.5. It adds variability to the training set without extra data collection, improving the model's resilience to mirrored inputs and its generalization capacity.

For the validation and test datasets, resizing and center cropping ensure uniformity in the content of the images being analyzed. These preprocessing techniques are implemented with PyTorch's ‘torchvision.transforms’ module. They help the deep learning models train more effectively, improving their performance and reliability.

D. Model Fine-Tuning

The research work uses two pre-trained transformer architectures from the Hugging Face transformers library: the Vision Transformer (ViT) and the Convolutional Vision Transformer (CvT). Both are fine-tuned on the preprocessed MaleVis training data to adapt them to the specialized task of classifying malware byteplot images.

The starting points are the ‘ViTForImageClassification’ and ‘CvtForImageClassification’ models from the Hugging Face ‘transformers’ library, which are pre-configured for image classification tasks. Each model's output layer is then adjusted to match the number of unique labels in the dataset, so that its predictions correspond to the classification task.

The training process is configured using the ‘TrainingArguments’ class. The key parameters are:
- a moderate batch size of 16, to balance computational efficiency and model performance, and
- a low learning rate of 0.0002, so that the pre-trained weights are adjusted only lightly.

The models' performance is evaluated at the end of each epoch, and checkpoints are saved only when the F1-score improves. The F1-score serves as the primary performance metric because it balances precision and recall, which matters particularly for datasets with uneven class distributions. Fine-tuning is executed using the ‘Trainer’ class, which streamlines the training, validation and testing of the model.

The EfficientNet-B0 model used is a pre-trained CNN from the TensorFlow Keras applications module. It is augmented with two additional layers: a global average pooling layer and a prediction layer with 26 units. The prediction layer uses a sigmoid activation function, making it suitable for multi-label classification. Prior to fine-tuning, the first 150 of the model's 238 layers are frozen, so only the remaining layers are updated during training. The model is compiled using the Adam optimizer and categorical cross-entropy loss, which is well suited to multiclass classification. Finally, the model is fine-tuned for 20 epochs on the training data. The validation data are used to measure the model's performance and to check for overfitting.

This fine-tuning approach enhances the model's performance on the target task of malware detection and classification from byteplot images.

E. Evaluation Metrics

No single metric captures all aspects of a model's performance. Multiple metrics are therefore used to evaluate the CvT, ViT and EfficientNet-B0 models on the multiclass classification task. Accuracy offers a general overview.

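The following is a minimal sketch of how the metrics in this section can be computed from true and predicted class labels. It builds a confusion matrix first and then derives accuracy and the macro-averaged precision, recall and F1-score. Two conventions are assumptions, because the paper does not state which ones its evaluation code followed:
- Macro F1 is taken as the mean of per-class F1-scores, as in scikit-learn's `average="macro"`. It is not the harmonic mean of macro precision and macro recall.
- Undefined ratios (0/0) are scored as 0.

The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """cm[t][p] counts samples of true class t that were predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: an undefined ratio (0/0) counts as 0.
    return num / den if den else 0.0


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> Dict[str, float]:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally long, non-empty label sequences")
    cm = confusion_matrix(y_true, y_pred, n_classes)
    precisions, recalls, f1s = [], [], []
    for k in range(n_classes):
        tp = cm[k][k]
        predicted_k = sum(cm[t][k] for t in range(n_classes))  # column sum: TP + FP
        actual_k = sum(cm[k])  # row sum: TP + FN
        p = _safe_div(tp, predicted_k)
        r = _safe_div(tp, actual_k)
        precisions.append(p)
        recalls.append(r)
        f1s.append(_safe_div(2 * p * r, p + r))  # per-class harmonic mean
    correct = sum(cm[k][k] for k in range(n_classes))
    return {
        # Accuracy equals the support-weighted mean of per-class recall.
        "accuracy": correct / len(y_true),
        # Macro averages give every class equal weight, whatever its size.
        "macro_precision": sum(precisions) / n_classes,
        "macro_recall": sum(recalls) / n_classes,
        "macro_f1": sum(f1s) / n_classes,
    }


if __name__ == "__main__":
    # Toy labels for three classes, not data from the paper.
    print(macro_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_classes=3))
```

For MaleVis, `n_classes` would be 26 (25 malware classes plus the benign class). The same confusion matrix is what figures such as the per-model confusion matrices visualize.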

However, accuracy alone does not distinguish between error types. Macro-averaging addresses this by averaging precision, recall and F1-score across all classes. Each class receives equal weight, which avoids bias toward the larger classes. The F1-score, the harmonic mean of precision and recall, is given more emphasis in this study because it balances the trade-off between these two metrics, leading to a more comprehensive evaluation.

IV. RESULTS

The research work is carried out using the MaleVis dataset, which is divided into subsets for training, validation and testing. This data is used to fine-tune three pre-trained models: two transformer models, the Vision Transformer and the Convolutional Vision Transformer, and one CNN model, EfficientNet-B0. The raw images are resized according to each model's input requirements, and the 3-channel RGB values are normalized. Data augmentation is performed on the training data to improve the models' generalization ability and robustness. Resizing and center cropping are applied to the validation and test data to keep the analyzed images uniform. The models are fine-tuned with the Hugging Face transformers library and the TensorFlow Keras applications module.

A. Vision Transformer (ViT)

The ViT model is fine-tuned on the training data and assessed on the validation data. Figure 3 shows that both training and validation losses decrease as the number of epochs increases, demonstrating effective learning and convergence. The model shows no signs of overfitting, as the gap between the training and validation losses does not widen. Beyond 12 epochs, the validation losses show minor fluctuations but generally remain stable or continue to decrease slightly. This pattern suggests that the model reaches a stable state with minimal gains from additional epochs, implying diminishing returns from further training past a certain point. The training setup preserves the best-performing model configuration.

The model's performance is then assessed on unseen test data. It records an F1-score of 0.95821 and a precision of 0.96148. The confusion matrix in figure 4 gives a more detailed view of the model's performance and its errors.

Fig. 4. ViT Model Confusion Matrix

B. Convolutional Vision Transformer (CvT)

Figure 5 shows the training and validation losses of the CvT model during fine-tuning. The plot suggests that the model is learning effectively from the training data, as the loss decreases over increasing epochs. From epoch 14 onwards, the reduction in both training and validation losses becomes less pronounced. This indicates that the model is approaching a plateau in its learning, where additional epochs yield smaller improvements in loss. Overall, the data implies that the CvT model fine-tunes effectively, learns efficiently from the training data and generalizes well to unseen data without significant overfitting.

The fine-tuned model is then tested on unseen data, achieving an F1-score of 0.96054 and a precision score of 0.96523. Figure 6 shows the confusion matrix of the CvT model, giving more information about the model's performance.

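Section III-D describes the checkpointing rule: evaluate after every epoch and keep a checkpoint only when the validation F1-score improves. The sketch below illustrates that rule in a simplified, framework-independent way. It is not the Hugging Face `Trainer` implementation. The function name `select_checkpoints` and the `min_delta` parameter are hypothetical, and the F1 values in the demo are made up.

```python
from typing import List, Sequence, Tuple


def select_checkpoints(val_f1: Sequence[float], min_delta: float = 0.0) -> Tuple[List[int], int, float]:
    """Return (saved_epochs, best_epoch, best_f1) for per-epoch validation F1-scores.

    A checkpoint is kept only when an epoch's F1 beats the best so far by more
    than `min_delta` (hypothetical knob; 0.0 requires any strict improvement).
    Epochs are numbered from 1.
    """
    if not val_f1:
        raise ValueError("need at least one epoch")
    saved: List[int] = []
    best_epoch, best_f1 = 0, float("-inf")
    for epoch, f1 in enumerate(val_f1, start=1):
        if f1 > best_f1 + min_delta:
            best_epoch, best_f1 = epoch, f1
            saved.append(epoch)  # in a real run, the model weights would be written here
    return saved, best_epoch, best_f1


if __name__ == "__main__":
    # Illustrative scores only, not results from the paper.
    history = [0.80, 0.85, 0.84, 0.90, 0.90]
    print(select_checkpoints(history))  # ([1, 2, 4], 4, 0.9)
```

Reloading the checkpoint from `best_epoch` after training corresponds to preserving the best-performing configuration. In the Hugging Face `TrainingArguments` API, the closest options are `metric_for_best_model` and `load_best_model_at_end`, used with matching evaluation and save strategies. The paper does not list its exact settings.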
Fig. 3. ViT Training-Validation Loss

C. EfficientNet-B0
Fine-tuning the EfficientNet-B0 architecture resulted in an overall decrease in both training and validation loss, as shown in figure 7. Initially, the validation loss is higher than the training loss, highlighting difficulties in the model's ability
to generalize to the validation data at the onset of training. With continued training, generalization improves, although persistent fluctuations in the validation loss indicate potential areas for optimization. Despite these fluctuations, the overall decrease in both training and validation losses indicates that the model is able to learn. The significant drops in validation loss at several points suggest that the model periodically achieves better generalization. Figure 8 shows the training and validation accuracy of the CNN model; both metrics improve significantly in the early epochs. The initial disparity between training and validation accuracy diminishes as the epochs advance. By the 20th epoch, the accuracy measures begin to stabilize.

Evaluation on unseen test data yielded an F1-score of 0.87386 and a precision score of 0.89240. The confusion matrix in figure 9 provides a detailed breakdown of the model's classification errors.

Fig. 5. CvT Training-Validation Loss
Fig. 6. CvT Model Confusion Matrix
Fig. 7. EfficientNet-B0 Training-Validation Loss
Fig. 8. EfficientNet-B0 Training-Validation Accuracy

D. Comparative Analysis

Table I compares the multiclass classification performance of the three models: ViT, CvT and the CNN (EfficientNet-B0). The CvT model achieves an F1-score of 0.96054. This slightly surpasses the ViT, which scored 0.95821, and significantly exceeds EfficientNet-B0, which recorded 0.87386. Besides achieving the highest F1-score, the CvT model also offers substantial benefits in training efficiency. On an A100 GPU in Google Colab, CvT required approximately 43 minutes to complete training, significantly less than the 91 minutes taken by the ViT model. Considering both the F1-score and training duration, the CvT model emerges as the most efficient and effective of the three models examined.

V. CONCLUSION

The research work implements the fine-tuning of a pre-trained Convolutional Vision Transformer (CvT) for malware
detection using the MaleVis dataset, which consists of RGB images belonging to 25 malware classes and 1 benign class. CvT is compared comprehensively against the Vision Transformer (ViT) and the CNN model EfficientNet-B0, with all models fine-tuned on the same dataset. The performance evaluation reveals that the CvT model achieved an F1-score of 0.96054. This slightly surpassed the ViT, which scored 0.95821, and significantly outperformed EfficientNet-B0, which recorded 0.87386. Moreover, the CvT model showed high training efficiency, requiring roughly half the training time of the ViT (about 43 versus 91 minutes). The results imply that the CvT model offers a favorable balance of high accuracy and efficiency. It is therefore a preferable choice in scenarios where both performance and speed are critical. Future studies could broaden the scope to include diverse datasets to validate and potentially strengthen the robustness of the results. Studies could also focus on optimizing the training duration and computational resources required by the models, particularly the transformer models.

Fig. 9. EfficientNet-B0 Model Confusion Matrix

TABLE I
MULTICLASS CLASSIFICATION RESULTS

Metric             ViT       CvT       EfficientNet-B0
Macro F1-Score     0.95821   0.96054   0.87386
Macro Precision    0.96148   0.96523   0.89240
Macro Recall       0.95769   0.95934   0.86758
Accuracy           0.95769   0.95934   0.86758

REFERENCES
[1] S. Talukder, “Tools and techniques for malware detection and analysis,” arXiv preprint arXiv:2002.06819, 2020.
[2] M. Thakur, “Cyber security threats and countermeasures in digital age,” Journal of Applied Science and Education (JASE), pp. 1–20, 2024.
[3] AV-Test, “Malware statistics & trends report 2024.” https://www.av-test.org/en/statistics/malware/.
[4] R. Vinayakumar, K. Soman, P. Poornachandran, and V. K. Menon, “A deep-dive on machine learning for cyber security use cases,” in Machine Learning for Computer and Cyber Security, pp. 122–158, CRC Press, 2019.
[5] K. M. Balasubramanian, S. V. Vasudevan, S. K. Thangavel, G. Kumar, K. Srinivasan, A. Tibrewal, and S. Vajipayajula, “Obfuscated malware detection using machine learning models,” in 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–8, IEEE, 2023.
[6] C. Reilly, S. O Shaughnessy, and C. Thorpe, “Robustness of image-based malware classification models trained with generative adversarial networks,” in Proceedings of the 2023 European Interdisciplinary Cybersecurity Conference, pp. 92–99, 2023.
[7] B. A. V. Vidyapeetham, “API call based malware detection approach using recurrent neural network (LSTM),” in Intelligent Systems Design and Applications: 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018) held in Vellore, India, December 6-8, 2018, Volume 1, vol. 940, p. 87, Springer, 2019.
[8] S. Akarsh, K. Simran, P. Poornachandran, V. K. Menon, and K. Soman, “Deep learning framework and visualization for malware classification,” in 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), pp. 1059–1063, IEEE, 2019.
[9] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath, “Malware images: visualization and automatic classification,” in Proceedings of the 8th international symposium on visualization for cyber security, pp. 1–7, 2011.
[10] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[11] M. Jayasudha, A. Shaik, G. Pendharkar, S. Kumar, B. Muhesh Kumar, and S. Balaji, “Comparative analysis of imbalanced malware byteplot image classification using transfer learning,” in International Conference on Power Engineering and Intelligent Systems (PEIS), pp. 313–324, Springer, 2023.
[12] I. Ben abdel ouahab, L. Elaachak, and M. Bouhorma, “Enhancing malware classification with vision transformers: A comparative study with traditional cnn models,” in Proceedings of the 6th International Conference on Networking, Intelligent Systems & Security, pp. 1–5, 2023.
[13] S. Seneviratne, R. Shariffdeen, S. Rasnayaka, and N. Kasthuriarachchi, “Self-supervised vision transformers for malware detection,” IEEE Access, vol. 10, pp. 103121–103135, 2022.
[14] H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, “Cvt: Introducing convolutions to vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 22–31, October 2021.
[15] B. A. O. Ikram, B. Mohammed, B. A. Abdelhakim, E. A. Lotfi, and B. Zafar, “Machine learning application for malwares classification using visualization technique,” in Proceedings of the 4th international conference on smart city applications, pp. 1–6, 2019.
[16] A. Ravi, V. Chaturvedi, and M. Shafique, “Vit4mal: Lightweight vision transformer for malware detection on edge devices,” ACM Transactions on Embedded Computing Systems, vol. 22, no. 5s, pp. 1–26, 2023.
[17] A. S. Bozkir, A. O. Cankaya, and M. Aydos, “Utilization and comparison of convolutional neural networks in malware recognition,” in 2019 27th signal processing and communications applications conference (SIU), pp. 1–4, IEEE, 2019.