A project report on
Analytics
by
April, 2024
DECLARATION
I hereby declare that the thesis entitled “Transformer Based Skin Cancer Classification”
submitted by me, for the award of the degree of M.Tech. (Integrated) Computer Science
and Engineering with Specialization in Business Analytics, Vellore Institute of Technology,
Chennai, is a record of bona fide work carried out by me under the supervision of Dr.
Rajesh R.
I further declare that the work reported in this thesis has not been submitted and will not be
submitted, either in part or in full, for the award of any other degree or diploma in this
institute or any other institute or university.
Place: Chennai
CERTIFICATE
This is to certify that the report entitled “Transformer Based Skin Cancer Classification” is
prepared and submitted by Darsi Venkata Sai Mahidhar (20MIA1016) to Vellore Institute of
Technology, Chennai, in partial fulfillment of the requirement for the award of the degree of
M.Tech. (Integrated) Computer Science and Engineering with Specialization in Business
Analytics, and is a bonafide record of work carried out under my guidance. The project
fulfills the requirements as per the regulations of this University and in my opinion meets
the necessary standards for submission. The contents of this report have not been
submitted and will not be submitted either in part or in full, for the award of any other
degree or diploma.
Date:
Name: Name:
Date: Date:
ABSTRACT
Skin cancer is one of the major health concerns worldwide: its incidence rises steadily, and
early diagnosis is required for an effective treatment outcome. Despite advances in
diagnostic imaging and machine learning, classifying skin lesions in dermoscopic images
remains complex, because benign and malignant lesions can share very similar textures,
colors, and shapes. In this context, we propose a novel scheme for skin cancer
classification using BEiT, which overcomes some of the inadequacies of traditional CNNs
and even more advanced models such as VGG-19 by providing a more effective mechanism
for capturing global contextual information.
Using the large ISIC 2019 dataset, which contains thousands of labeled dermoscopic images, our
model classifies eight classes of skin lesions (melanoma, melanocytic nevus, basal cell
carcinoma, benign keratosis, actinic keratosis, squamous cell carcinoma, dermatofibroma,
and vascular lesions). To handle the class imbalance in the dataset, we
augmented the data to obtain a balanced set of 9,560 images. This augmentation
reduced class bias and enhanced the model's generalization capability.
Images were resized to 224 x 224 during preparation; this input size is widely
used by models such as BEiT, balancing efficient processing with enough preserved detail. The
dataset was divided into training, validation, and testing sets at a ratio of 80:10:10 for a robust
evaluation. Because hair strands interfere with classification, we also included a hair-removal
step based on a black hat filter, a morphological operation that highlights thin, dark structures such as hairs.
Our experimental design included a comparative analysis of CNN, VGG-19,
Vision Transformer (ViT), and BEiT models to understand the
strengths and weaknesses of each when dealing with dermoscopic images.
Employing its self-attention mechanism and patch-level pretraining,
BEiT reached better classification accuracy than CNN and VGG-19 in
detecting faint patterns spread across an image. This means that the model is
more appropriate for tasks that require capturing global context together with fine-
grained details, which is the key difference when identifying visually similar types of skin
lesions.
The findings of the project demonstrate a great potential for improving the quality of
skin cancer diagnosis through the analysis of dermoscopic images, making this approach
a very promising tool for clinicians in terms of making
an early diagnosis and treating patients with skin cancer appropriately for
better outcomes. Future work includes refining this model with other image modalities,
more elaborate data augmentation strategies, and clinical validation to make it more
clinically applicable.
Key words: ISIC (International Skin Imaging Collaboration), CNN, VGG19, ViT (Vision
Transformer), BEiT (Bidirectional Encoder representation from Image Transformers)
more than all, he taught me patience in my endeavor. My association with him is not
It is with gratitude that I would like to extend my thanks to the visionary leader Dr. G.
Viswanathan our Honorable Chancellor, Mr. Sankar Viswanathan, Dr. Sekar Viswanathan,
Dr. G V Selvam Vice Presidents, Dr. Sandhya Pentareddy, Executive Director, Ms.
Special mention to Dr. Ganesan R, Dean, Dr. Parvathi R, Associate Dean Academics, Dr.
Vellore Institute of Technology, Chennai for spending their valuable time and efforts in
Head of the Department, Project Coordinator, Dr. Yogesh C, SCOPE, Vellore Institute
of Technology, Chennai, for their valuable support and encouragement to take up and
My sincere thanks to all the faculties and staff at Vellore Institute of Technology,
Chennai, who helped me acquire the requisite knowledge. I would like to thank my parents
for their support. It is indeed a pleasure to thank my friends who encouraged me to take up
Place: Chennai
Mahidhar
CONTENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS
1. CHAPTER - 1
INTRODUCTION
1.1 INTRODUCTION
1.4 OBJECTIVES
1.5 CHALLENGES
2. CHAPTER – 2 BACKGROUND STUDY
3.3.1 PRE-PROCESSING
3.3.2 VISUALIZATION
APPENDIX
REFERENCES
LIST OF FIGURES
LIST OF TABLES
Introduction
1.1 INTRODUCTION
Skin cancer constitutes a significant public health concern on a global scale, with rising
incidence rates highlighting the necessity for prompt and precise diagnostic measures to
improve patient outcomes. This paper proposes a novel method for the classification of skin
cancer using the BEiT architecture. Compared with deep learning architectures such as
CNNs and VGG-19, BEiT offers a better ability to capture
subtle, global-level features from images, making it an excellent candidate for
handling the small differences found in dermoscopic images, where understanding the context
of the whole lesion matters.
Utilizing the ISIC 2019 dataset, which contains numerous labeled images representing
diverse lesion types, we built a balanced collection of 9,560 images. This comprehensive dataset facilitated the training
and assessment of our model on a wide variety of cases, thereby improving its robustness.
We resized all the images to 224x224 pixels, as BEiT requires inputs of this size. A
black hat filter was also applied, because hair artifacts are a very common problem in dermoscopic
imaging: they obscure lesion details and impair model learning.
In the comparative study of CNN, VGG-19, ViT (Vision
Transformer), and BEiT, the BEiT model demonstrated better classification accuracy than the other models,
suggesting a better capacity to process complex dermoscopic images. This implies
that transformer-based frameworks hold great potential for medical imaging
applications, considering the deep need for precise accuracy while identifying slight visual
patterns.
By applying transformer-based models to medical imaging, this research facilitates a trajectory toward heightened accuracy and earlier
detection, and also opens up avenues for prospective advancements within the same
domain, encompassing applications to other types of cancer and other medical imaging problems.
Skin cancer demands accurate and sensitive detection methods. Though melanoma is the deadliest of all the
types of skin cancer, timely diagnosis substantially improves survival opportunities. The
conventional methods of detection are mainly based on specialized clinical experience and
might take a lot of time. Thus, automated classification methods have gained popularity, with
deep learning models such as CNNs proving to be powerful for skin cancer images. CNNs
are acknowledged for their ability to learn high-order abstractions of complex image
features, but they fall short in many cases at identifying and capturing the global
context within dermoscopic images, which may be crucial for distinguishing visually
similar lesions. BEiT uses self-attention to capture both local and global features in images, making it
well suited to this task; recent studies show that transformer models, such as BEiT, tend to outperform traditional CNNs in
such settings. The project applies the BEiT model to the classification of skin cancer using the ISIC 2019
dataset, comparing its performance with CNN and VGG-19 models. The goal
is to show that BEiT could emerge as a more accurate tool for the detection of skin
cancer, ultimately contributing to advanced diagnostic aids for early detection.
Clinical diagnosis still relies largely on the visual inspection of
dermoscopic images, a process that demands substantial expertise and remains prone to
human error. Deep learning frameworks, particularly CNNs, have shown promise in
supporting image-based diagnostics, but they
often fail to capture the full global context of the image. Transformer-based
models, such as the Vision Transformer and Bidirectional Encoder representation from Image
Transformers, have shown promise in capturing both local and global
features, yet they remain relatively unexplored for skin cancer classification using dermoscopic images. This research aims to fill this gap by investigating
the accuracy and reliability of transformer models for classifying skin cancer,
eventually working towards a tool that clinicians can confidently use to improve
early diagnosis and treatment decisions.
1.4 OBJECTIVES
The primary objective of this work is to evaluate transformer-based models,
namely ViT and BEiT, on dermoscopic images of skin cancer lesions. Transformers
have been shown to have significant potential for analyzing global and local
features in highly complex image data, thanks to their self-attention mechanisms. This work
examines how effective they are at analyzing dermoscopic images, with their inherent
visual complexity, and assesses the case for
using transformer models as more context-aware and accurate diagnostic aids for skin cancer. In
such a context, we also consider traditional Convolutional Neural Networks like VGG-19
and ResNet-50. CNNs tend to dominate image classification but rely on local feature
extraction and therefore tend not to use the global context needed for representing
complex images. We evaluate and compare these models across several performance
metrics, including accuracy, precision, recall, and F1-score, to identify the strengths and
limitations of each approach. By comparing the classification results, this study
provides insight into whether transformers offer a meaningful performance advantage over
conventional CNNs.
1.5 CHALLENGES
Many skin lesion types have very similar visual appearances, being rounded with relatively
minor color differences. Models therefore find such classes very hard to separate,
because the minor, sometimes subtly nuanced differences between classes are exactly what they
need to capture. These visual overlaps easily lead to misclassifications, especially when
the distinguishing cues are faint.
Computation Power
Training on large sets of high-resolution dermoscopic images demands considerable
processing power, especially when complex models like transformers are applied. Efficient
use of such computational resources is critical to keep up with the demands without
compromising performance.
Preprocessing Requirement
The quality, lighting, and imaging conditions of dermoscopic images vary widely and
require standardization. Hair artifacts frequently found in dermoscopic images obscure lesion features and interfere with
model learning. Therefore, hair removal is needed as a preprocessing step so that the
model can focus on the lesion itself.
Real-Time Applicability
For practical use in clinical environments, the model must be fast enough to
deliver real-time results. Optimization of the model should balance speed and
accuracy, enabling rapid diagnosis and reliable real-world performance without compromising either.
Multiclass Complexity
Multiclass classification poses a particularly challenging task when
diagnosing skin cancers, because distinguishing malignant from benign lesions requires a very high level of
precision. Therefore, it is critical to build models that perform consistently across all classes while
avoiding bias toward any one of them.
CHAPTER 2
BACKGROUND STUDY
Actinic Keratosis (AK), or solar keratosis, is a pre-cancerous skin lesion formed as the result of
extensive exposure to ultraviolet light, most commonly from sunlight. It presents most often
as rough, scaly skin patches in sun-exposed regions such as the
face, neck, and hands. Although AK itself is not malignant, it can progress to squamous cell
carcinoma if left untreated.
Melanoma is the most aggressive form of skin cancer and originates from melanocytes, the
cells responsible for producing melanin, the pigment that gives the skin its color.
Melanoma can arise anywhere on the skin and often begins as a new or changing mole.
It accounts for a smaller share of skin cancer cases but for the majority of deaths
from skin cancer due to its high metastatic potential. Early detection leads to successful
treatment.
Melanocytic nevi (NV) are benign growths of melanocytes. These lesions are usually benign and can appear
anywhere on the skin as small, round or oval macules with uniform pigmentation.
While most melanocytic nevi are benign, some atypical or dysplastic nevi are more
prone to becoming malignant, even culminating in melanoma. Therefore, any changes in
their size, shape, or color should be evaluated.
Squamous cell carcinoma (SCC) is among the most common types of non-melanoma skin
cancer and arises from the squamous cells of the epidermis.
SCC is generally due to accumulated ultraviolet exposure, and it typically presents as firm red
nodules or as scaly, crusted patches that may bleed or ulcerate. SCC metastasizes less
often than melanoma but can invade adjacent tissues if left untreated.
Early intervention can usually control SCC and prevent further progression.
Basal Cell Carcinoma (BCC) is the most common skin carcinoma and originates in the
basal cells of the lower layer of the epidermis. BCC often appears as pearly or waxy bumps,
flat pink patches, or sores that never heal. It is usually caused by excessive
sun exposure and tends to occur more commonly in fair-skinned individuals.
Although it very rarely metastasizes, BCC grows in size and invades the surrounding
tissues, causing considerable local destruction. BCC is highly treatable if
discovered early.
Dermatofibromas (DF) are benign skin growths that typically come to medical attention as firm,
raised nodules, sometimes brown or reddish in color. They do not represent
malignant skin lesions and may follow minor skin injuries such as insect bites.
Dermatofibromas are generally innocuous and can be left alone unless they become
bothersome; no established relationship to any of the skin cancers has been demonstrated.
Vascular lesions (VASC) are abnormal formations of blood vessels in the skin, such as
hemangiomas. These small, benign growths usually appear as red or purple spots on the skin. Even though
they are usually harmless, some vascular lesions, such as large
or rapidly growing hemangiomas, require medical attention. Vascular lesions are not malignant and are not
composed of cancer cells; however, they have to be differentiated from malignant skin
lesions during diagnosis.
Hrithwik et al. (2024) proposed a hybrid deep learning model combining VGG16 and
ResNet50 to improve skin cancer detection and classification. Utilizing a dataset of 3,000
images across nine skin conditions, the researchers addressed class imbalance through
class weights and emphasized rigorous data pre-processing. The hybrid model achieved an accuracy of
99.51% and a testing accuracy of 91.82%, while VGG16 and ResNet50 had lower testing
accuracies of 70.03% and 88.89%, respectively. The hybrid model thus outperformed the individual networks.
These findings underscore the effectiveness of the hybrid approach in enhancing skin
cancer classification. [1]
Shahriar Himel et al. introduce a skin cancer classification approach using the Vision
Transformer (ViT), specifically Google's ViT-patch32 model, in conjunction with the
Segment Anything Model (SAM) for effective segmentation of cancerous areas. They report
that the ViT-Google model achieved a high classification accuracy of 96.15% and an
impressive ROC AUC score of 99.49%, surpassing other tested models. Despite these
promising results, the research highlights a gap in the model’s applicability to diverse skin
types, as the dataset primarily represents fair-skinned individuals. Future work is proposed
to expand the dataset to include a more diverse range of ethnic backgrounds. [2]
Naeem et al. present SNC_Net, an advanced model for skin cancer detection that
integrates handcrafted and deep learning features from dermoscopic images. The model
uses a convolutional neural network (CNN) alongside handcrafted feature extraction and
employs the SMOTE Tomek approach to address class imbalance. Evaluated on the ISIC
dataset, the model outperformed comparison networks such as ResNet-101. Despite these impressive results, the research highlights a gap in applying
the model in broader clinical settings; future work is
suggested to include federated learning for improved model accuracy and wider
applicability. [3]
[4] Another study proposes a multimodal framework
for skin cancer detection using both images and accompanying metadata. This framework
leverages BEiT, a vision transformer pre-trained on masked image modeling tasks, for
encoding images. The proposed approach integrates encoded metadata with visual
features, and the study highlights the potential for implementing this framework in telemedicine and other medical
settings, with adaptation for remote communities to further improve the model's applicability and
accessibility.
[5] Yang et al. present a novel skin cancer classification method utilizing a transformer-
based architecture for improved accuracy. Their approach involves four key steps: class
rebalancing of seven skin cancer types, splitting images into patches and flattening
them into tokens, processing these tokens with a transformer encoder, and a final
classification block with dense layers and batch normalization. Transfer learning is
employed, with pretraining on ImageNet and fine-tuning on the HAM10000 dataset. This
method achieved a classification accuracy of 94.1%, surpassing the IRv2 model with soft
attention and other baseline models on the Edinburgh DERMOFIT dataset. Despite these advancements, the
study highlights the potential for further improvements with larger transformer models and
more extensive pretrained datasets, suggesting future work could enhance classification
performance further.
[6] Pacal, Alaftekin, and Zengul present an advanced skin cancer diagnostic method using
an enhanced Swin Transformer model. Their approach integrates a hybrid shifted window-
based self-attention mechanism to focus on relevant
regions and capture fine details while maintaining efficiency. They additionally modify
components of the architecture to improve accuracy, training speed, and parameter efficiency. Evaluated on the ISIC 2019
dataset, the modified Swin model achieved an accuracy of 89.36%, outperforming both
traditional CNNs and state-of-the-art vision transformers. This study demonstrates the
significant potential of deep learning in enhancing diagnostic precision and efficiency for
skin cancer detection, highlighting its impact on improving patient outcomes and setting
a benchmark for future work.
[7] Gulzar and Khan address the challenge of accurately diagnosing melanoma skin cancer
by enhancing image segmentation techniques. They propose the hybrid TransUNet model,
combining Vision Transformers with U-Net, to capture detailed spatial relationships in skin
lesion images. This approach addresses limitations of pure transformers, which struggle
with small medical datasets and low-resolution features. Their results show that
TransUNet outperforms simpler segmentation models, and the
study highlights that while TransUNet excels in accuracy and detailed segmentation, it
requires more training and inference time compared to simpler models. This indicates a
need to balance accuracy with computational efficiency. Future research could integrate
classification capabilities and early detection of malignant skin lesions, aiming for a more comprehensive
analysis system.
[8] Arshed et al. propose using Vision Transformers (ViT) for multi-class skin cancer
classification. They address the challenge of class imbalance and dataset diversity by employing data
augmentation. Despite these advancements, the study points to the need for enhanced preprocessing techniques
to further improve model robustness and accuracy. This gap suggests that future
research should focus on refining data handling and augmentation strategies to better
support ViT and other deep learning models in skin cancer diagnosis.
[9] Cirrincione et al. propose a Vision Transformer (ViT)-based model for melanoma
classification using ISIC data. Their model, ViT-Large with 307 million parameters, achieved high
performance, with an AUROC of 94.8%. They optimized the model through extensive hyperparameter tuning,
including learning rates and layer configurations, finding that 24 layers provided the
best balance between accuracy and complexity. Despite these advancements, the study
notes that integrating attention maps for model interpretability remains a future research
direction. This approach aims to improve the understanding of which image regions
drive the model's predictions.
[10] Gallazzi et al. investigate Transformer-based deep neural networks for multiclass skin
lesion classification. Trained on a newly released benchmark dataset for 2023, they achieved a competitive test accuracy, and the study
emphasizes the benefits of using large, merged datasets to enhance model generalization.
However, the application of Transformer models to medical imaging is still emerging, and
further research is needed to explore their full potential and integration with clinical
workflows. The authors have shared their benchmarks and dataset on GitHub, promoting
reproducible research in skin lesion
diagnostics. This research highlights the promise of Transformers for advanced skin
lesion classification and sets the stage for future developments in medical image analysis.
[11] Xu et al. propose a novel multi-modal transformer-based framework for skin tumor
classification, addressing the challenge of integrating diverse clinical data sources. Their
framework fuses clinical images, dermoscopic images, and patient metadata, and reports
a 2.8% improvement in accuracy on the Derm7pt dataset and an impressive 88.5%
accuracy on a larger in-house dataset. Despite these advancements, the study notes
challenges with imbalanced data distributions and the need for further research on
addressing these imbalances and optimizing modality fusion strategies. The proposed
approach highlights the potential for improved skin tumor classification and provides a solid
foundation for future multi-modal research.
[12] Nahata and Singh address the critical issue of skin cancer detection by developing
Convolutional Neural Network (CNN) models aimed at classifying different types of skin
lesions. They focus on utilizing various CNN architectures, including Inception V3 and
InceptionResNet, and compare their performance. Their approach, tested on the ISIC challenge dataset, achieves high
classification accuracy, with InceptionResNet reaching 91%. The study highlights the
effectiveness of these CNN models in distinguishing between skin cancer types, leveraging
data augmentation to improve robustness. However, the research could benefit from
addressing limitations in generalization and performance across diverse datasets. Future work could
also investigate integrating multi-modal data to further enhance detection accuracy and
clinical applicability.
[13] Magdy et al. (2024) present advanced methods for enhancing the accuracy of skin
cancer classification through dermoscopic image analysis. The study introduces two key
approaches: the first leverages k-nearest neighbor (KNN) as a classifier, using various
pretrained deep neural networks (e.g., AlexNet, VGG, ResNet, EfficientNet) as feature
extractors, while the second approach optimizes AlexNet's hyperparameters via the
Grey Wolf Optimizer. Additionally, the authors compare machine learning techniques
(such as KNN and support vector machines) with deep learning models, demonstrating
that their proposed methods achieve superior classification accuracy, surpassing 99% on
their benchmark dataset.
[14] Rashid et al. (2024) present a deep transfer learning approach for the early detection
of melanoma, a highly dangerous form of skin cancer. The study introduces a novel
transfer-learning framework for classifying
skin lesions as either malignant or benign. Employing the ISIC 2020 dataset, the authors
demonstrate that their model not only achieves superior accuracy but also reduces
computational cost. The work underlines the
potential of transfer learning in enhancing early skin cancer detection, thereby contributing
to better patient outcomes.
[15] Gregoor et al. (2024) evaluate the impact of an AI-based mobile health (mHealth) app
for skin cancer detection in a large Dutch population. The study involved 2.2 million adults
who were given free access to the app, with a comparison between users and non-users of
the app. The results revealed that mHealth users had a higher incidence of
dermatological claims for (pre)malignant lesions and benign tumors compared to controls,
with a notable increase in healthcare consumption. While the app enhanced the
detection of (pre)malignant skin conditions, it also led to a higher cost per additional
(pre)malignant lesion detected. This research highlights the app's potential benefits for
early skin cancer detection but also points out the challenge of increased healthcare
utilization for benign conditions, emphasizing the need for a balanced approach in
deploying such tools.
Chapter 3
PROPOSED SYSTEM
The ISIC 2019 dataset is intended for the development and assessment of automated skin cancer detection
systems. Comprising a total of 25,331 images, it holds
good promise for training as well as testing machine learning models. The dataset is
categorized into eight distinct classes of skin cancer images: melanoma (MEL),
melanocytic nevi (NV), basal cell carcinoma (BCC), benign keratosis lesions (BKL), actinic keratoses
(AK), squamous cell carcinoma (SCC), dermatofibroma (DF), and vascular lesions
(VASC). Each class has a different number of images; melanoma has 4,522 images while
actinic keratoses has 867. The images are mainly in JPEG format and vary in resolution,
mainly 600 x 450 pixels and 1024 x 1024 pixels. The image data is accompanied by
crucial patient metadata, consisting of age, sex, and the anatomical location of
the lesion; these contextual attributes can be utilized during training to improve the
model's performance in classification tasks. Despite its class imbalance and the high
variability between lesion types, the ISIC 2019 dataset is a rich resource for researchers
developing deep learning models for the classification of skin cancer. Its
extensive use in studies evaluating various model architectures makes its importance in
dermatology self-evident.
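As a concrete starting point, the class distribution can be read directly from the official ground-truth file, which stores one label column per diagnosis. The snippet below is a minimal sketch: the shortened file path is an assumption (the appendix uses a local copy), and the one-hot column layout is the standard ISIC 2019 format rather than anything specific to this project.

import pandas as pd

# Load the ISIC 2019 ground-truth file (adjust the path to the local copy).
gt = pd.read_csv("ISIC_2019_Training_GroundTruth.csv")

# The ground truth is one-hot encoded: one column per diagnosis plus the image id.
class_cols = [c for c in gt.columns if c != "image"]

# Count images per class to reveal the imbalance discussed above.
class_counts = gt[class_cols].sum().sort_values(ascending=False)
print(class_counts)
print("Total images:", len(gt))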
3.3.1 PRE-PROCESSING
We dealt with the initial imbalance in the number of image samples per class, which
can introduce bias and hinder the generalization ability
of the model. We therefore reduced the overrepresented classes and enlarged the smaller
ones by augmentation. This balancing produced a uniform dataset with 1,195 images
per class, amounting to 9,560 images in total. Additionally, all images were
resized, which ensures a consistent input dimension across the dataset and
keeps the focus on diagnostic features. Hair artifacts may interfere with key visual
cues, so we first converted each image to grayscale to help detect the hair,
applied a black hat filter with a rectangular (9, 9) kernel to highlight the hair strands,
and then used a Gaussian blur to remove noise. We then applied a binary threshold mask to
isolate the hair regions, followed by an inpainting step that fills the removed hair
pixels from the surrounding skin. The black hat operation is defined as:
Blackhat(I) = Closing(I) − I, where Closing(I) = (I ⊕ S) ⊖ S,
I is the grayscale image, S is the structuring element (here a 9 × 9 rectangle),
⊕ denotes morphological dilation, and ⊖ denotes morphological erosion.
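The following OpenCV sketch mirrors this hair-removal pipeline. It is illustrative only: the threshold value and the inpainting radius are assumptions, not the exact parameters used in the project code.

import cv2

def remove_hair(image_path, thresh=10, inpaint_radius=6):
    # Read the image and convert it to grayscale for hair detection.
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Black hat with a rectangular 9x9 structuring element highlights dark hair strands.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)

    # Gaussian blur suppresses noise before thresholding.
    blurred = cv2.GaussianBlur(blackhat, (3, 3), 0)

    # Binary threshold isolates the hair mask (threshold value is illustrative).
    _, mask = cv2.threshold(blurred, thresh, 255, cv2.THRESH_BINARY)

    # Inpainting fills the masked hair pixels from the surrounding skin.
    return cv2.inpaint(img, mask, inpaint_radius, cv2.INPAINT_TELEA)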
Data augmentation was used to further improve model robustness. We applied horizontal and vertical flips to increase variability
and reduce overfitting. The increase in variability
through these augmentation techniques helped the model learn more
generalizable patterns from the diversity of the training samples. Through these
preprocessing and augmentation steps, we obtained a clean
and standardized dataset that enhances the model's ability to classify lesions of skin
cancer.
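A minimal augmentation helper in the style of the appendix code, which uses albumentations flips, might look as follows; the function name and the resize step are illustrative rather than the project's exact routine.

import cv2
from albumentations import HorizontalFlip, VerticalFlip

def augment_and_resize(image, size=224):
    # Keep the original plus a horizontally and a vertically flipped copy.
    variants = [image]
    variants.append(HorizontalFlip(p=1.0)(image=image)["image"])
    variants.append(VerticalFlip(p=1.0)(image=image)["image"])

    # Resize everything to the 224x224 input expected by the models.
    return [cv2.resize(v, (size, size)) for v in variants]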
3.3.2 VISUALIZATION
Convolutional Neural Networks (CNNs) are a type of deep neural network created primarily
for image recognition and categorization. A CNN has numerous layers, including
convolutional, pooling, and fully connected layers. Convolutional layers collect information from
input images by convolving learnable filters across the data. This approach creates feature
maps that capture spatial patterns and hierarchical representations in the image. The
pooling layers minimize the spatial dimensions of the feature maps, which improves
computational efficiency, while the fully connected layers combine the extracted features and perform classification using the learnt
representations. A common CNN architecture has alternating convolutional and pooling
layers, followed by fully connected layers. The convolutional layers use activation functions like
ReLU (Rectified Linear Unit) to bring nonlinearity into the network. The pooling layers,
typically max pooling or average pooling, downsample feature maps to extract the
most important information. The fully connected layers classify using softmax activation
and output probabilities for each class. The architecture of a CNN is adjusted to the dataset's
complexity, with deeper networks capable of learning more abstract properties. However, deeper
structures raise computing demands and the risk of overfitting, demanding careful
regularization and tuning.
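A minimal PyTorch sketch of such a network is shown below. The channel sizes and dropout rate are illustrative assumptions, not the exact configuration trained in this project (the project code is listed in the Appendix).

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    # Stacked conv + ReLU + max-pooling blocks followed by fully connected layers.
    def __init__(self, num_classes=8):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)            # halves spatial dimensions
        self.fc1 = nn.Linear(128 * 28 * 28, 256)  # for 224x224 inputs after 3 poolings
        self.fc2 = nn.Linear(256, num_classes)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))      # 224 -> 112
        x = self.pool(F.relu(self.conv2(x)))      # 112 -> 56
        x = self.pool(F.relu(self.conv3(x)))      # 56 -> 28
        x = torch.flatten(x, 1)
        x = self.dropout(F.relu(self.fc1(x)))
        return self.fc2(x)                        # logits; softmax is applied in the loss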
3.3.3.2 VGG – 19
VGG19 is one of the best-known deep convolutional neural networks, developed
by the Visual Geometry Group at the University of Oxford. It is widely
acknowledged for being simple but very effective, so much so that it has been adopted
broadly for image classification, object detection, and segmentation. The VGG19 architecture has 19 layers:
16 convolutional layers, 3 fully connected layers, and a final softmax layer at the end. This
depth lets the model pervasively capture very intricate spatial information within images by
stacking small 3x3 convolutions.
Architecture of VGG19
By setting the padding to 1, the spatial dimensions of the image remain constant
throughout each convolution block, so that more feature information is carried forward
as the depth of the network increases. The number of filters is doubled with every
successive block, starting from 64 in the first block and reaching 512 in the deeper
layers. This step-wise procedure helps VGG19 pick up features at progressively higher
levels of abstraction.
A max-pooling layer with a 2x2 filter and stride of 2 is applied after every
convolutional block, halving the spatial dimensions. This reduction
lowers computation while retaining the crucial features of the input. The max pooling
operation takes the maximum value from each segment of the feature map.
The network expects input images of
224x224 pixels, comprising three channels (RGB). To conform to this input size,
images undergo preprocessing that includes resizing and normalization, thereby ensuring
their compatibility with the model and facilitating an expedited training process.
Typically, the pixel values are normalized before being sent to the network
by subtracting the mean RGB values computed over the training dataset. This
improves training stability, since gradient updates become smoother and
convergence faster.
The feature extraction process of VGG19 begins with the detection of basic
edges and textures in the earlier layers and then continues to include more complex
shapes and structures in the deeper layers.
The acquired features are then fed into three fully connected layers, with
4,096 units in each of the first two layers and 1,000 units in the last (one per class in the
original ImageNet setting). Each fully connected layer computes an output of the form
y = ReLU(W·X + b),
where W is the weight matrix, X is the input feature vector, b is the bias, and ReLU
(Rectified Linear Unit) introduces non-linearity by activating only positive values, helping
the network model complex decision boundaries.
Classification layer and applications: the final layer of the VGG19 model is a softmax layer
that produces the distribution of probabilities across classes, given by:

softmax(z_i) = exp(z_i) / Σ_j exp(z_j),  j = 1, …, K

where z_i is the input to the i-th output neuron and K is the number of classes. The softmax
function ensures that the outputs add up to 1, making it suitable for multi-class
classification tasks. The model then selects the class with the highest probability as its
prediction.
The VGG19 architecture, deep but uniform in its 3x3 convolutions, is well suited to feature extraction
and transfer learning. Despite its large number of parameters, close to 143
million, which makes it computationally expensive, VGG19 is a straightforward
yet very effective approach to visual recognition. Fields such as medical imaging or
fine-grained object detection benefit greatly from such features, since fine details may carry
critical diagnostic information.
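In practice VGG19 is usually fine-tuned rather than trained from scratch. The torchvision sketch below shows one plausible setup; freezing the convolutional features and attaching an 8-class head are assumptions for illustration, not necessarily the exact configuration used here.

import torch.nn as nn
from torchvision import models

# Load VGG19 pretrained on ImageNet (downloads the torchvision weights).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)

# Optionally freeze the convolutional feature extractor.
for p in vgg.features.parameters():
    p.requires_grad = False

# Replace the final 1000-way ImageNet classifier with an 8-class head.
vgg.classifier[6] = nn.Linear(4096, 8)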
The breakthrough computer vision contender is the Vision Transformer, or
ViT, built upon the transformer architecture originally proposed for Natural Language
Processing and applied here to image classification. The core intuition behind ViT is that an image,
just like text, can be split into patches that can be viewed as sequences of tokens,
similar to the word tokens used in transformer-based language models such as BERT. Given
a large enough amount of data and training, a transformer over such patches can model
images well enough to outperform plain CNNs on many tasks.
Architecture: The ViT model works by splitting an input image into fixed-size patches. Each of these
patches is flattened into a one-dimensional vector, and a linear embedding is applied to
map those vectors into a higher dimension. These patch embeddings then serve as the
input tokens of the transformer.
The most prominent difference between ViT and traditional CNNs is that the
transformer does not rely on convolutions or local receptive fields to learn spatial
structure. Instead, each
layer of self-attention calculates the interactions of all patches in the input, which
enables different parts of the image to be focused upon according to their context within
the whole. The output of the last transformer layer is fed into the classification head,
which in practice is almost always a small MLP.
Encoder: The ViT encoder consists of several self-attention layers combined
with position-wise feed-forward networks, enabling each
encoding layer of the model to capture both local and global relationships between patches.
In fact, the actual core of the transformer is this self-attention mechanism, which enables
the model to take into account the relevance of each patch of the image with respect to
every other patch, regardless of spatial distance. That is a major difference from
CNNs, where every filter is sensitive only to a localized region of the image.
Layer Normalization and Residual Connections: these stabilize the training and
let gradients flow smoothly through the network. As such layers are stacked, the model
captures increasingly complex and abstract relations between image patches.
Decoder:
Unlike most sequence-to-sequence tasks such as translation, ViT does not need a
conventional decoder; it uses the encoder
alone to obtain a representation of the image, which is then used for classification
purposes.
In the final stage, the output representations are aggregated through a
classification token, closely resembling the [CLS] token that BERT uses in NLP to
summarize information from the entire sequence. The token is then passed through the
classifier, typically an MLP, whose output gives the class to which
the image should be assigned. Since this is a classification problem, the task does not
need a specialized decoder for sequential outputs, which makes ViT simpler in that respect.
Attention Network: The attention network gives ViT its core ability to compute
attention scores among all the patches in the input image. Whereas the localized receptive
fields of CNNs make it difficult to focus on different parts of the image simultaneously,
ViT can catch intricate dependencies between distant, far-apart spatial patches.
Multi-head attention projects the input embeddings into multiple attention heads,
each learning different aspects of the input data. This allows ViT to attend to different
regions of the image at the same time, capturing both local features (for
example, edges and textures) and the global context.
The outputs of the attention heads are then sent through a
feed-forward network and passed on for further processing. This architecture enables ViT
to learn well from the image and understand both its contents and its global context.
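To make the mechanism concrete, the toy snippet below runs multi-head self-attention over a sequence of patch embeddings; the 196 x 768 shape and 12 heads correspond to a common base-sized configuration and are assumptions, not this project's exact settings.

import torch
import torch.nn as nn

# One image as a sequence of 196 patch embeddings of dimension 768.
patches = torch.randn(1, 196, 768)          # (batch, num_patches, embed_dim)

attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)

# Self-attention: queries, keys, and values all come from the same patch sequence,
# so every patch can attend to every other patch regardless of spatial distance.
out, weights = attn(patches, patches, patches)

print(out.shape)      # torch.Size([1, 196, 768])
print(weights.shape)  # torch.Size([1, 196, 196]) - attention averaged over heads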
Tokenizer: The tokenizer in ViT converts the image into a sequence of tokens, which the
transformer then processes. First, the image is divided into fixed-size,
non-overlapping patches, usually of 16x16 pixels. These patches are flattened into one-
dimensional vectors and then linearly embedded into a higher-dimensional space using a
learned projection. The process transforms the image into a sequence of vectors, each
representing a unique part of the image. Positional encodings are added to
ensure that the spatial relationships between them stay preserved, so that the transformer
can keep track of where every patch is located in the original image. The transformer takes
these patches as input, applies self-attention to capture the dependencies among them,
and feeds the resulting vectors through a multi-layered transformer. Such a tokenization approach is
very different from the pixel-wise processing prevalent in CNNs. It makes it possible to
treat an image like a sequence of tokens, just as NLP models treat
text.
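A compact sketch of this tokenization step is given below. The strided-convolution trick for "flatten and linearly project" and the 224/16/768 sizes are standard choices and only assumptions about the configuration.

import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    # Split the image into non-overlapping 16x16 patches, project each patch
    # linearly, and add learned positional embeddings.
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, embed_dim))

    def forward(self, x):                    # x: (batch, 3, 224, 224)
        x = self.proj(x)                     # (batch, 768, 14, 14)
        x = x.flatten(2).transpose(1, 2)     # (batch, 196, 768) - one token per patch
        return x + self.pos_embed            # add positional information

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 768])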
BEiT (Bidirectional Encoder representation from Image Transformers) is a highly powerful model that uses the
transformer architecture originally proposed for Natural Language Processing
tasks. Contrary to traditional CNNs, which use localized filters to capture image features,
BEiT exploits the transformer's ability to capture
contextual relationships and long-range dependencies within an image. The key innovation of
BEiT is its pretext task, inspired by masked language modeling. Just as BERT learns language by
predicting masked words, BEiT uses this pretext task to build an understanding of image
structure in a bidirectional way, which lets it process visual inputs much more effectively
and achieve strong downstream performance.
Architecture: The BEiT architecture follows the transformer encoder structure. First, the
image is divided into fixed-size patches, as in ViT. These patches are treated as tokens and
combined with positional embeddings so that spatial information is retained. They are then
processed as a sequence by the transformer encoder. It is the pretraining
strategy that lets BEiT stand out: a vision-specific version of masked language modeling.
Unlike BERT, which predicts missing words, BEiT predicts the original patches of an
image that have been masked out, forcing the model to learn the global structure of the
image.
This yields much more general image representations and makes the
model very effective at classification tasks when fine-tuned on specific datasets.
Encoder:
The input image is split into patches, and the sequence of patches is embedded by a patch embedding
layer; positional encodings are added to the patches to preserve spatial relationships. These patches then feed into the
transformer encoder. The self-attention mechanism in the encoder captures long-
range dependencies inside an image and enables the model to learn intricate relationships between
different parts of an image, something traditional CNNs have struggled to capture.
Decoder: BEiT lacks the traditional decoder that some transformer models use, since it is
primarily intended for image classification, with its focus placed on representation learning.
However, in the context of its pretraining task, masked image modeling, a decoder-like role can be
seen in the mechanism by which the model attempts to reconstruct the original patches of
the image from the corrupted input during the pretraining phase. The output from the
encoder is typically passed through a classification head, usually a simple MLP, during fine-tuning.
Multi-head self-attention works by projecting the input representations (image patches) into different attention heads, allowing
every attention head to attend to different parts of the input sequence. This mechanism
allows BEiT to compute a weighted sum of patches according to their relative
significance, and it also allows the model to relate features at any location across an image,
irrespective of distance, thereby overcoming a drawback of the classical CNN,
where each pixel has limited knowledge outside the receptive field of a convolutional
operation. Following this, position-wise feed-forward networks add non-
linearity, making the transformed input features more flexible. In the encoding layers, layer
normalization and residual connections stabilize training.
Tokenization:
One of the central preprocessing steps in BEiT is that the input
image is split into patches which do not overlap each other. Unlike the fixed
pixel grid processed directly by traditional CNNs, these patches are treated almost like the tokens
in text-processing models such as BERT: they are
flattened and mapped to a high-dimensional space via a linear
embedding layer, and this set of embeddings serves as the input tokens of the model. In the
pretraining phase, patches are masked so that the model must learn to predict the
original patches from the context given by the surrounding visible patches. This forces
the model to learn rich, contextual representations of images, akin to the way BERT's
masked-word objective works for text, and a visual tokenizer ensures that images are
transformed into the right format for input into the transformer encoder.
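For fine-tuning, a pretrained BEiT checkpoint can be loaded with a fresh classification head. The sketch below uses the Hugging Face transformers API; the checkpoint name, the 8-label head, and the placeholder image file are assumptions for illustration, not necessarily the setup used in this project.

import torch
from PIL import Image
from transformers import BeitImageProcessor, BeitForImageClassification

# Load a BEiT checkpoint and attach a new 8-class head for the ISIC lesion classes.
# ignore_mismatched_sizes lets the pretrained head be replaced by a fresh one.
checkpoint = "microsoft/beit-base-patch16-224"
processor = BeitImageProcessor.from_pretrained(checkpoint)
model = BeitForImageClassification.from_pretrained(
    checkpoint, num_labels=8, ignore_mismatched_sizes=True
)

# Single-image inference sketch; fine-tuning would wrap the model in a standard training loop.
image = Image.open("example_lesion.png").convert("RGB")  # hypothetical file name
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(-1))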
4.1 CNN
The graphs above show training and validation loss and accuracy over 20 epochs, which
indicate how the training progressed. In the loss plot, both training and
validation loss decrease, meaning that the model is learning and reducing its
errors over time. The validation loss tracks the training loss closely, so the model
does not appear to be overfitting.
The accuracy graph likewise shows improvement in both training and validation accuracy
over each epoch of training, with accuracy peaking at about 60–70% at the end of training.
Validation accuracy exceeds training accuracy early on, but their close convergence
towards the final epochs suggests stability and balance in this model's learning.
Such convergence between training and validation accuracy is a good sign of the
general capability of the model, though further epochs of training or tuning may improve
performance.
4.2 VGG19
The graphs above plot the training and validation loss and accuracy over 20 epochs,
reporting on the learning of the model. In the loss graph, both training and validation losses
steadily decrease as time progresses, suggesting that the model is successfully learning
and reducing error. The losses at the end are close to each other, implying the
model is not grossly overfitting to the training dataset, since there is no significant
gap between the two curves.
In the accuracy graph, training and validation accuracies improve steadily over the epochs,
with validation accuracy at times crossing the training accuracy. Together with the
similar performance on both the training and the validation set, this
suggests that the model has good generalization skills without overfitting. After
the 20 epochs, the model achieved a validation accuracy approaching 0.72,
according to the test accuracy metric. The steady improvement and the close alignment
between training and validation metrics give the impression that the model is genuinely
learning rather than memorizing the training data.
4.3 ViT
Figure 10 Training and Validation Accuracies
These plots show both the training and validation accuracy and loss over 30 epochs. As
can be seen in the accuracy plot, the training accuracy increases steadily,
approaching 95–98%, while the validation accuracy also climbs but oscillates
around 90–92%. This means that the model is doing very well on the training data
and reasonably well on the validation data, with only mild overfitting since the two curves
remain close.
In the loss plot, the training loss keeps decreasing consistently, so the model is probably
learning well, but the validation loss fluctuates without dropping smoothly and then
levels off with some oscillation. The gap between training and validation loss hints
at overfitting: the model learns the patterns specific to the training data well but does
not generalize equally well to unseen samples. This pattern indicates that although there
is no severe overfitting, further regularization or tuning could improve
generalization.
When both training loss and validation loss decrease, and both training accuracy and
validation accuracy increase, it indicates that the model is learning effectively and
generalizing well to new data.
When the training loss continues to decrease but the validation loss starts to increase,
or if the training accuracy increases significantly while the validation accuracy plateaus or
decreases, it can be a sign of overfitting. This means the model is too specialized to
the training data and may not perform well on new, unseen data.
The classification report shows strong model performance on the skin cancer classification
task. Precision, recall, and F1-scores are high across most classes, with "VASC" achieving
perfect precision and recall (1.00). The "DF" class, however, has a lower recall of 0.69,
indicating that 31% of true "DF" cases are missed. Despite this, the model maintains a high
overall accuracy of 91%. The macro and weighted averages for precision, recall, and
F1-score are 0.91, reflecting consistent performance. While the model performs well
overall, classes with overlapping visual characteristics remain harder to separate.
Model     Accuracy (%)
CNN       78.80
VGG19     73.24
ViT       98.18
BEiT      98.81
The table shows a clear gap between
traditional CNN-based models and transformer-based models. While the CNN model
reached an accuracy of 78.80%, VGG19, another CNN-based model, reached only
73.24%. This signifies that they were capable to some degree of
classifying the skin cancer images but failed to learn complex global features. The
transformer-based models perform far better: ViT reaches 98.18% and BEiT performs best,
at 98.81%. The result therefore indicates that
transformer-based models, especially BEiT, extract better local and global features from
dermoscopic images.
Given the good performance of BEiT during training, it was used for testing. Inference on the
test set of 957 images gives a good overall performance of 91%, with solid
individual class metrics in precision, recall, and F1-score. VASC was perfect
in both precision and recall, with the NV, BCC, and SCC classes also scoring highly
on all of the metrics evaluated. However, some types were problematic: "DF"
(Dermatofibroma) had a lower recall of only 0.69, likely due to difficulties in distinguishing
its features from those of other classes. "BKL" (Benign Keratosis-Like Lesions) also showed slightly
lower precision and recall, likely due to overlap of characteristics with other diagnoses.
The BEiT model exhibited resilience in differentiating among various types of skin
cancer, attaining notable accuracy alongside well-balanced precision and recall in the
majority of classifications.
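These per-class figures come from a standard classification report; the sketch below shows how such a report is produced with scikit-learn, with short placeholder label lists standing in for the predictions collected over the 957 test images.

from sklearn.metrics import accuracy_score, classification_report

# Placeholder labels and predictions; in the real evaluation these come from
# running the fine-tuned BEiT model over the test loader.
class_names = ["MEL", "NV", "BCC", "AK", "BKL", "DF", "VASC", "SCC"]
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 1, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=class_names, zero_division=0))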
The model is therefore a promising candidate for real-time detection of skin cancer,
especially in clinical settings. Its remarkable
performance on the test set signifies that this model can help dermatologists make sharper
diagnoses of their patients' skin lesions; however, some classes require
further improvement for better diagnostic correctness.
UI OUTPUT
CHAPTER 6
In the present study we created and tested a classification model for skin cancer using the
BEiT architecture, which outperformed the conventional CNN-based models, VGG19
and a plain CNN, in terms of accuracy as well as feature extraction ability. Based on
the strength of the BEiT model in capturing local as well as global features, the model
achieved a training accuracy of 98.81% and a testing accuracy of 91%, promising its use
as a diagnostic tool for proficiently classifying skin cancer. The classification report further
demonstrated robust performance across most classes but still exhibits challenges in
identifying certain types of skin lesions, such as Dermatofibroma (DF) and Benign Keratosis-
Like Lesions (BKL).
Even so, further enhancements of both accuracy and reliability are still
possible. Future work could add preprocessing steps specific to dermoscopic images, such as color normalization and more
elaborate morphological operations, and could generate synthetic images with GANs in order to better represent the
sparser categories. Also, training with a more substantial but perhaps more heterogeneous
set may lead to stronger generalization, especially for classes that share
attributes.
Another avenue for further study is adding multi-scale attention mechanisms to
the model, which would give it a greater chance of attending to smaller but
diagnostically important features of dermoscopic images. In addition, optimization for real-time use
may allow faster and more efficient inference, increasing the model's applicability to real
clinical environments. These are the areas in which this model can become more
accurate and robust, paving the way for clinicians to obtain an accurate and reliable
diagnostic aid.
Appendices
Pre-processing
import os
import re
import time
import random

import cv2
import numpy as np
import pandas as pd
import tensorflow as tf

# Compatibility patch for imgaug/scipy (relies on scipy.spatial imports not shown here)
spatial.QhullError = QhullError

# import imgaug as ia
hp = {}
hp['image_size'] = 512
hp['num_channels'] = 3
hp['batch_size'] = 32
hp['lr'] = 1e-4
hp["num_epochs"] = 30
hp['num_classes'] = 8
hp['dropout_rate'] = 0.1
# read data
md = pd.read_csv("C://Users//mahid//Downloads//isic-2019//ISIC_2019_Training_GroundTruth.csv")
md.head()
md.shape
# separate melanoma, basal cell carcinoma, squamous cell carcinoma, etc. from the dataset
# length of data
mel_count = len(mel_images)
bcc_count = len(bcc_images)
scc_count = len(scc_images)
nv_count = len(nv_images)
ak_count = len(ak_images)
bkl_count = len(bkl_images)
vasc_count = len(vasc_images)
unk_count = len(unk_images)
df_count = len(df_images)
df_count)
nv_count ]
centre_circle = plt.Circle((0,0),0.70,fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.axis('equal')
plt.title('Distribution of Images')
MEL = []
SCC = []
BCC = []
NV = []
AK = []
VASC = []
DF = []
BKL = []
path = "C://Users//mahid//Downloads//isic-2019//ISIC_2019_Training_Input//ISIC_2019_Training_Input"
# Sort image file paths into per-class lists based on the ground-truth membership sets
for i in os.listdir(path):
    # print(i)
    name = i.split('.')[-2]
    if name in mel_images:
        MEL.append(os.path.join(path, i))
    elif name in scc_images:
        SCC.append(os.path.join(path, i))
    elif name in bcc_images:
        BCC.append(os.path.join(path, i))
    elif name in nv_images:
        NV.append(os.path.join(path, i))
    elif name in ak_images:
        AK.append(os.path.join(path, i))
    elif name in vasc_images:
        VASC.append(os.path.join(path, i))
    elif name in df_images:
        DF.append(os.path.join(path, i))
    elif name in bkl_images:
        BKL.append(os.path.join(path, i))
"""
def apply_dullrazor(image_path):
img = cv2.imread(image_path)
# Gaussian filter
plt.subplot(1, 2, 1)
plt.imshow(original)
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(processed)
plt.title('Segmented Image')
plt.axis('off')
original_images = []
processed_images = []
os.makedirs('C:/Users/mahid/Downloads/isic-2019//preproessed')
VASC[90:95]+DF[90:95] + BKL[90:95]
len(img_list)
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("/kaggle/working/preproessed/img_"+str(i)+".png", processed)
# Append to the lists
original_images.append(original)
processed_images.append(processed)
for i in range(len(original_images)):
axs[i, 0].imshow(original_images[i])
axs[i, 0].axis('off')
axs[i, 1].imshow(processed_images[i])
axs[i, 1].axis('off')
os.makedirs("C:/Users/mahid/Downloads/isic-2019/DF", exist_ok=True)
import shutil
shutil.rmtree("C:/Users/mahid/Downloads/isic-2019//DF")
os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/MEL", exist_ok=True)
os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/SCC", exist_ok=True)
os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/BCC", exist_ok=True)
os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/AK", exist_ok=True)
os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/VASC", exist_ok=True)
os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/DF", exist_ok=True)
os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/BKL", exist_ok=True)
# os.makedirs("/kaggle/working/DA/BCC", exist_ok=True)
start = time.time()
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("C:/Users/mahid/Downloads/isic-2019//DF/MEL/img_"+str(i)+".png",
processed)
start = time.time()
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/SCC/img_"+str(i)+".png",
processed)
start = time.time()
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("C:/Users/mahid/Downloads/isic-2019//DF/BCC/img_"+str(i)+".png",
processed)
start = time.time()
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/NV/img_"+str(i)+".png",
processed)
start = time.time()
for i, filename in enumerate(AK):
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/AK/img_"+str(i)+".png",
processed)
start = time.time()
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/VASC/img_"+str(i)+".png",
processed)
start = time.time()
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/DF/img_"+str(i)+".png",
processed)
start = time.time()
if filename.endswith('.jpg') or filename.endswith('.png'):
cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/BKL/img_"+str(i)+".png",
processed)
import os
import cv2
save_images = []
name = x.split("/")[-1].split(".")
image_name = name[0]
image_ext = name[1]
image = cv2.imread(x)
# Apply HorizontalFlip
aug = HorizontalFlip(p=1.0)
augmented = aug(image=image)
x1 = augmented["image"]
# Apply VerticalFlip
aug = VerticalFlip(p=1.0)
augmented = aug(image=image)
x2 = augmented["image"]
augemented = aug(image=image)
x3 = augemented["image"]
augemented = aug(image=image)
x4 = augemented["image"]
# else:
try:
filename
cv2.imwrite(image_path, img_resized)
except Exception as e:
continue
image_name = name[0]
image_ext = name[1]
image = cv2.imread(x)
if image is None:
print(f"Failed to read image: {x}")
# Apply HorizontalFlip
aug = HorizontalFlip(p=1.0)
augmented = aug(image=image)
x1 = augmented["image"]
# Apply VerticalFlip
aug = VerticalFlip(p=1.0)
augmented = aug(image=image)
x2 = augmented["image"]
try:
filename
cv2.imwrite(image_path, img_resized)
except Exception as e:
print(f"Error processing image {image_name}: {e}")
continue
# Example usage
augment_data_bcc(BCC, "C:/Users/mahid/Downloads/isic-2019/DAA/BCC")
bcc_l = glob("C:/Users/mahid/Downloads/isic-2019/DAA/BCC/*")
len(bcc_l)
import os
import cv2
handling
image_name = name[0]
image_ext = name[1]
image = cv2.imread(x)
if image is None:
aug = HorizontalFlip(p=1.0)
augmented = aug(image=image)
x1 = augmented["image"]
save_images.append((x1, "aug1"))
try:
filename
except Exception as e:
continue
# Example usage
augment_data_mel(MEL, "C:/Users/mahid/Downloads/isic-2019/DA/MEL")
mel_l = glob("C:/Users/mahid/Downloads/isic-2019/DA/MEL/*")
len(mel_l)
import os
import cv2
image_name = name[0]
image_ext = name[1]
image = cv2.imread(x)
if image is None:
# Apply HorizontalFlip
aug = HorizontalFlip(p=1.0)
augmented = aug(image=image)
x1 = augmented["image"]
# Apply VerticalFlip
aug = VerticalFlip(p=1.0)
augmented = aug(image=image)
x2 = augmented["image"]
# augemented = aug(image=image)
# x3 = augemented["image"]
# augemented = aug(image=image)
# x4 = augemented["image"]
try:
filename
cv2.imwrite(image_path, img_resized)
except Exception as e:
continue
# Example usage
#augment_data_bcc(AK, "C:/Users/mahid/Downloads/isic-2019/DAA/AK")
ak_l = glob("C:/Users/mahid/Downloads/isic-2019/DAA/AK/*")
len(ak_l)
augment_data_bcc(SCC, "C:/Users/mahid/Downloads/isic-2019/DAA/SCC")
augment_data_bcc(DF, "C:/Users/mahid/Downloads/isic-2019/DAA/DF")
augment_data_bcc(VASC, "C:/Users/mahid/Downloads/isic-2019/DAA/VASC")
augment_data_bcc(NV, "C:/Users/mahid/Downloads/isic-2019/DAA/NV")
augment_data_bcc(BKL, "C:/Users/mahid/Downloads/isic-2019/DAA/BKL")
## Balanced Data
"""
import os
import shutil
os.makedirs(new_dest_folder, exist_ok=True)
os.path.isfile(os.path.join(folder_path, img))]
images.sort()
selected_images = images[:image_limit]
img_name = os.path.basename(img_path)
try:
# Copy the image to the new destination subfolder (use `shutil.move` to move
instead of copy)
shutil.copy(img_path, new_img_path)
except Exception as e:
# Example usage
src_folder = "C:/Users/mahid/Downloads/isic-2019/DAA"
dest_folder = "C:/Users/mahid/Downloads/selected_images_new"
move_images_from_folders(src_folder, dest_folder)
ak = glob("C:/Users/mahid/Downloads/selected_images_new/AK/*")
mel = glob("C:/Users/mahid/Downloads/selected_images_new/MEL/*")
bcc = glob("C:/Users/mahid/Downloads/selected_images_new/BCC/*")
scc = glob("C:/Users/mahid/Downloads/selected_images_new/SCC/*")
nv = glob("C:/Users/mahid/Downloads/selected_images_new/NV/*")
bkl = glob("C:/Users/mahid/Downloads/selected_images_new/BKL/*")
df = glob("C:/Users/mahid/Downloads/selected_images_new/DF/*")
vasc =glob("C:/Users/mahid/Downloads/selected_images_new/VASC/*")
new_mel_count = len(mel)
new_bcc_count = len(bcc)
new_scc_count = len(scc)
new_nv_count = len(nv)
new_ak_count = len(ak)
new_bkl_count = len(bkl)
new_df_count = len(df)
new_vasc_count = len(vasc)
labels = ["MEL", "BCC", "SCC", "AK", "BKL", "DF", "VASC", "NV" ]
new_df_count, new_vasc_count,new_nv_count]
# Colors for the pie chart (you can expand the color list if needed)
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.axis('equal')
plt.show()
CNN
# Install gdown
# https://drive.google.com/file/d/1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz/view?usp=sharing
# !gdown --id 1_1vTv57QQcDCrbV6bzwpgTUZhd4y52ke
import tarfile
import os
print(os.listdir())
import os
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
# Function to remove hair from images
def apply_dullrazor(image_path):
img = cv2.imread(image_path)
return dst
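The apply_dullrazor fragment above omits the black-hat filtering steps described in the methodology. The following sketch is consistent with that description; the kernel size and threshold value are assumptions.
import cv2

def apply_dullrazor(image_path):
    # Read the dermoscopic image
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Black-hat morphology highlights dark hair strands against lighter skin
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    # Gaussian filter smooths the black-hat response before thresholding
    blurred = cv2.GaussianBlur(blackhat, (3, 3), 0)
    _, mask = cv2.threshold(blurred, 10, 255, cv2.THRESH_BINARY)
    # Inpaint the masked hair pixels using the surrounding skin texture
    dst = cv2.inpaint(img, mask, 6, cv2.INPAINT_TELEA)
    return dst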
class SkinCancerDataset(Dataset):
self.root_dir = root_dir
self.transform = transform
self.image_paths = []
self.labels = []
if os.path.isdir(class_path):
self.image_paths.append(image_path)
self.labels.append(class_idx)
def __len__(self):
return len(self.image_paths)
image_path = self.image_paths[idx]
label = self.labels[idx]
image = Image.fromarray(image)
if self.transform:
image = self.transform(image)
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.RandomHorizontalFlip(),
transforms.RandomVerticalFlip(),
transforms.RandomRotation(20),
transforms.ToTensor()
])
dataset = SkinCancerDataset(root_dir="/kaggle/working/Augumented_images_new", transform=transform)
train_data, test_data = train_test_split(dataset, test_size=0.2, random_state=42)
class CustomCNN(nn.Module):
super(CustomCNN, self).__init__()
self.dropout = nn.Dropout(0.5)
self.pool = nn.MaxPool2d(2, 2)
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = self.pool(F.relu(self.conv3(x)))
x = self.pool(F.relu(self.conv4(x)))
x = F.relu(self.fc1(x))
x = self.dropout(x)
x = self.fc2(x)
return x
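The CustomCNN listing above omits its layer definitions. The following is a self-contained version with four convolution/pooling stages for 224 x 224 inputs and eight output classes; the channel widths and hidden size are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CustomCNN(nn.Module):
    def __init__(self, num_classes=8):
        super(CustomCNN, self).__init__()
        # Four convolution blocks, each halving the spatial size via max pooling
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.5)
        # 224 -> 112 -> 56 -> 28 -> 14 after four poolings
        self.fc1 = nn.Linear(256 * 14 * 14, 512)
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x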
model = model.to(device)
regularization
train_acc_history = []
val_acc_history = []
train_loss_history = []
val_loss_history = []
model.train()
running_loss = 0.0
correct = 0
total = 0
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()
_, predicted = outputs.max(1)
total += labels.size(0)
correct += predicted.eq(labels).sum().item()
train_acc_history.append(train_acc)
train_loss_history.append(train_loss)
# Validation step
model.eval()
val_loss = 0.0
correct = 0
total = 0
with torch.no_grad():
outputs = model(inputs)
loss = criterion(outputs, labels)
val_loss += loss.item()
_, predicted = outputs.max(1)
total += labels.size(0)
correct += predicted.eq(labels).sum().item()
val_acc_history.append(val_acc)
val_loss_history.append(val_loss)
print(f"Epoch {epoch+1}/{num_epochs}: Train Loss {train_loss:.4f}, Train Acc {train_acc:.2f}, "
f"Val Loss {val_loss:.4f}, Val Acc {val_acc:.2f}")
torch.save(model.state_dict(), "skin_cancer_model.pth")
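For completeness, the following is a compact sketch of the train/validation loop that the fragments above belong to, assuming the model, train_loader, val_loader, and history lists defined earlier; the Adam optimizer and learning rate are assumptions, and the 20 epochs match the recorded histories below.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer

num_epochs = 20
for epoch in range(num_epochs):
    # ----- training -----
    model.train()
    running_loss, correct, total = 0.0, 0, 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    train_loss = running_loss / len(train_loader)
    train_acc = 100.0 * correct / total
    train_loss_history.append(train_loss)
    train_acc_history.append(train_acc)

    # ----- validation -----
    model.eval()
    val_running_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()
    val_loss = val_running_loss / len(val_loader)
    val_acc = 100.0 * correct / total
    val_loss_history.append(val_loss)
    val_acc_history.append(val_acc)
    print(f"Epoch {epoch+1}/{num_epochs}: Train Acc {train_acc:.2f}, Val Acc {val_acc:.2f}")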
1.2153, 1.1801, 1.1374, 1.1247, 1.0372, 1.0094, 0.9296, 0.9064, 0.8544, 0.7971, 0.7540, 0.7304]
val_loss_history = [1.7590, 1.6307, 1.5635, 1.4442, 1.3952, 1.3771, 1.3163, 1.2706, 1.2710, 1.2359, 1.2415, 1.2013, 1.2172, 1.1425, 1.1483, 1.1416, 1.1184, 1.1350, 1.2033, 1.0989]
train_acc_history = [18.01, 30.27, 35.24, 41.55, 44.88, 47.07, 49.10, 50.78, 53.48, 54.32, 56.18, 56.78, 60.26, 60.53, 64.40, 65.76, 67.67, 68.94, 71.09, 72.31]
val_acc_history = [30.92, 35.62, 38.76, 45.62, 47.52, 48.24, 49.35, 50.00, 50.52, 52.22, 53.14, 55.75, 54.51, 54.97, 58.10, 59.08, 58.95, 58.82, 59.35, 61.37]
# Plot loss
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(train_loss_history, label="Train Loss")
plt.plot(val_loss_history, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
# Plot accuracy
plt.subplot(1, 2, 2)
plt.plot(train_acc_history, label="Train Accuracy")
plt.plot(val_acc_history, label="Validation Accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy (%)")
plt.legend()
plt.tight_layout()
plt.show()
import torch
import torch.nn.functional as F
# Load the held-out test dataset and the saved model weights for evaluation
model = model.to(device)
criterion = nn.CrossEntropyLoss()
test_loss = 0.0
correct = 0
total = 0
all_labels = []
all_preds = []
# Testing loop
with torch.no_grad():
outputs = model(inputs)
test_loss += loss.item()
_, predicted = outputs.max(1)
total += labels.size(0)
correct += predicted.eq(labels).sum().item()
all_labels.extend(labels.cpu().numpy())
all_preds.extend(predicted.cpu().numpy())
# Classification report and test AUC
print(f"Test AUC: {test_auc:.2f}")
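The following is a hedged sketch of the test-set evaluation that produces the loss, accuracy, AUC, and classification report referred to above, assuming the model, criterion, device, and test_loader defined earlier; macro one-vs-rest averaging for the AUC is an assumption.
import torch
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score

model.eval()
test_loss, correct, total = 0.0, 0, 0
all_labels, all_preds, all_probs = [], [], []

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        probs = torch.softmax(outputs, dim=1)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
        all_labels.extend(labels.cpu().numpy())
        all_preds.extend(predicted.cpu().numpy())
        all_probs.extend(probs.cpu().numpy())

test_acc = 100.0 * correct / total
print(classification_report(all_labels, all_preds))
# One-vs-rest AUC over the eight lesion classes, using the softmax probabilities
test_auc = roc_auc_score(all_labels, np.array(all_probs), multi_class="ovr", average="macro")
print(f"Test Loss: {test_loss / len(test_loader):.4f}, Test Acc: {test_acc:.2f}, Test AUC: {test_auc:.2f}")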
# Install gdown
# https://drive.google.com/file/d/1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz/view?usp=sharing
import tarfile
import os
print(os.listdir())
import os
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
def apply_dullrazor(image_path):
img = cv2.imread(image_path)
# Gaussian filter
class SkinCancerDataset(torch.utils.data.Dataset):
self.image_dir = image_dir
self.filepaths = []
self.labels = []
self.transform = transform
self.filepaths.append(image_path)
def __len__(self):
return len(self.filepaths)
image_path = self.filepaths[idx]
# Convert to tensor
if self.transform:
img = self.transform(hair_removed_img)
label = self.labels[idx]
data_transforms = transforms.Compose([
transforms.ToPILImage(),
transforms.Resize((224, 224)),
transforms.RandomHorizontalFlip(),
transforms.RandomVerticalFlip(),
transforms.RandomRotation(15),
transforms.ToTensor(),
])
# Load dataset
image_dir = '/kaggle/working/Augumented_images_new'
test_size])
# Data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
class VGG19Modified(nn.Module):
super(VGG19Modified, self).__init__()
self.vgg = models.vgg19(pretrained=True)
param.requires_grad = False
self.vgg.classifier = nn.Sequential(
nn.Linear(25088, 4096),
nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(4096, 4096),
nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(4096, num_classes)
x = self.vgg(x)
return x
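Restated with the method headers and freeze loop that the extraction dropped, the VGG-19 transfer-learning model above reads roughly as follows; freezing only the convolutional feature extractor is an assumption.
import torch.nn as nn
from torchvision import models

class VGG19Modified(nn.Module):
    def __init__(self, num_classes):
        super(VGG19Modified, self).__init__()
        self.vgg = models.vgg19(pretrained=True)
        # Freeze the pretrained convolutional feature extractor
        for param in self.vgg.features.parameters():
            param.requires_grad = False
        # Replace the classifier head for the eight lesion classes
        self.vgg.classifier = nn.Sequential(
            nn.Linear(25088, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        return self.vgg(x)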
# Instantiate model, loss function, and optimizer
model = VGG19Modified(num_classes=num_classes)
model = nn.DataParallel(model)
model = model.to(device)
criterion = nn.CrossEntropyLoss()
train_loss = []
val_loss = []
train_acc = []
val_acc = []
model.train()
running_loss = 0.0
correct = 0
total = 0
for images, labels in train_loader:
optimizer.zero_grad()
outputs = model(images)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()
_, predicted = torch.max(outputs, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
train_loss.append(epoch_loss)
train_acc.append(epoch_acc)
# Validation
model.eval()
val_running_loss = 0.0
val_correct = 0
val_total = 0
with torch.no_grad():
outputs = model(images)
loss = criterion(outputs, labels)
val_running_loss += loss.item()
_, predicted = torch.max(outputs, 1)
val_total += labels.size(0)
val_correct += (predicted == labels).sum().item()
val_loss.append(val_epoch_loss)
val_acc.append(val_epoch_acc)
num_epochs = 20
plt.figure(figsize=(14, 5))
# Loss plot
plt.subplot(1, 2, 1)
plt.title('Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
# Accuracy plot
plt.subplot(1, 2, 2)
plt.title('Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
model.eval()
test_correct = 0
test_total = 0
with torch.no_grad():
outputs = model(images)
_, predicted = torch.max(outputs, 1)
test_total += labels.size(0)
torch.save(model.state_dict(), "skin_cancer_classification_VGG19_20epoch_FT.pth")
ViT MODEL
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under
import os
print(os.path.join(dirname, filename))
# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of
# Install gdown
# https://drive.google.com/file/d/1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz/view?usp=sharing
import tarfile
import os
import os
import cv2
import torch
import torch.nn as nn
def apply_dullrazor(image_path):
img = cv2.imread(image_path)
# Gaussian filter
return dst
# Dataset directory
data_dir = '/kaggle/working/Augumented_images_new'
preprocess = transforms.Compose([
transforms.ToPILImage(),
transforms.ToTensor(),
])
class SkinCancerDataset(torch.utils.data.Dataset):
self.root_dir = root_dir
self.transform = transform
self.image_paths = []
self.labels = []
self.classes = os.listdir(root_dir)
self.image_paths.append(img_path)
self.labels.append(label)
def __len__(self):
return len(self.image_paths)
img_path = self.image_paths[idx]
label = self.labels[idx]
if self.transform:
img = self.transform(img)
test_size])
# Dataloaders
config = ViTConfig.from_pretrained('google/vit-base-patch16-384', num_labels=len(dataset.classes))
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-384', config=config, ignore_mismatched_sizes=True)
model = nn.DataParallel(model)
criterion = nn.CrossEntropyLoss()
model.train()
total_train = 0
for images, labels in train_loader:
optimizer.zero_grad()
outputs = model(images).logits
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
train_loss += loss.item()
_, predicted = torch.max(outputs, 1)
total_train += labels.size(0)
train_acc_history.append(train_acc)
train_loss_history.append(train_loss / len(train_loader))
model.eval()
total_val = 0
with torch.no_grad():
outputs = model(images).logits
loss = criterion(outputs, labels)
val_loss += loss.item()
_, predicted = torch.max(outputs, 1)
total_val += labels.size(0)
val_acc_history.append(val_acc)
val_loss_history.append(val_loss / len(val_loader))
print(f'Epoch {epoch+1}: Train Acc {train_acc:.4f}, Val Acc {val_acc:.4f}')
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(train_acc_history, label='Train Accuracy')
plt.plot(val_acc_history, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(train_loss_history, label='Train Loss')
plt.plot(val_loss_history, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
torch.save(model.state_dict(), "skin_cancer_classification_VIT_20epoch_FT.pth")
BEiT MODEL
# Install gdown
# https://drive.google.com/file/d/1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz/view?usp=sharing
import tarfile
# Open the tar file
import os
print(os.listdir())
import os
import cv2
import torch
import random
import numpy as np
from transformers import BeitConfig, BeitForImageClassification, BeitModel
import torch.nn as nn
def apply_dullrazor(image_path):
img = cv2.imread(image_path)
import os
import cv2
import torch
import random
import numpy as np
import torch.nn as nn
class SkinCancerDataset(Dataset):
self.labels = labels
self.transform = transform
def __len__(self):
return len(self.img_paths)
img_path = self.img_paths[idx]
label = self.labels[idx]
_, processed_img = apply_dullrazor(img_path)
pil_img = Image.fromarray(processed_img)
if self.transform:
pil_img = self.transform(pil_img)
import os
import pandas as pd
directory = '/kaggle/working/Augumented_images_new'
image_names = []
subfolder_names = []
if os.path.isdir(subfolder_path):
image_names.append(image_name)
subfolder_names.append(subfolder)
df.to_csv('/kaggle/working/image_folder_mapping.csv', index=False)
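A minimal sketch of the image-to-class mapping built above; the CSV column names are assumptions.
import os
import pandas as pd

directory = '/kaggle/working/Augumented_images_new'
image_names, subfolder_names = [], []

# Walk each class subfolder and record (image, class) pairs
for subfolder in os.listdir(directory):
    subfolder_path = os.path.join(directory, subfolder)
    if os.path.isdir(subfolder_path):
        for image_name in os.listdir(subfolder_path):
            image_names.append(image_name)
            subfolder_names.append(subfolder)

df = pd.DataFrame({"image": image_names, "label": subfolder_names})
df.to_csv('/kaggle/working/image_folder_mapping.csv', index=False)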
img_paths = []
labels = []
if os.path.isdir(class_folder):
img_paths.append(os.path.join(class_folder, img_name))
# Train-test-validation split
random_state=42, stratify=labels)
random_state=42, stratify=y_temp)
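The two stratified splits above correspond to the 80:10:10 train/validation/test split described in the methodology; the sketch below uses assumed variable names and the img_paths and labels lists collected above.
from sklearn.model_selection import train_test_split

# First hold out 20% of the data, stratified by class label
X_train, X_temp, y_train, y_temp = train_test_split(
    img_paths, labels, test_size=0.20, random_state=42, stratify=labels)

# Then split that 20% equally into validation and test sets (10% + 10% overall)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, random_state=42, stratify=y_temp)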
# Transformations
transform = transforms.Compose([
transforms.ToTensor(),
])
self.img_paths = img_paths
self.labels = labels
self.transform = transform
def __len__(self):
return len(self.img_paths)
img_path = self.img_paths[idx]
label = self.labels[idx]
_, processed_img = apply_dullrazor(img_path)
pil_img = Image.fromarray(processed_img)
if self.transform:
pil_img = self.transform(pil_img)
# Create datasets
import torch
import torch.nn as nn
# Set device
# Load the configuration for the BEiT model with modified dropout rates
config = BeitConfig.from_pretrained("microsoft/beit-base-patch16-224")
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224", config=config)
class ModifiedBeitModel(nn.Module):
super(ModifiedBeitModel, self).__init__()
self.base_model = base_model
head
self.classifier = base_model.classifier
x = self.base_model(x).logits
x = self.dropout(x)
return x
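Filled out with its constructor arguments and dropout layer, the BEiT wrapper above might read as follows; the dropout probability and the eight-class head replacement are assumptions, the latter mirroring the dashboard code later in this listing.
import torch.nn as nn
from transformers import BeitConfig, BeitForImageClassification

config = BeitConfig.from_pretrained("microsoft/beit-base-patch16-224")
base_model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224", config=config)
# Replace the 1000-class ImageNet head with an eight-class head
base_model.classifier = nn.Linear(base_model.classifier.in_features, 8)

class ModifiedBeitModel(nn.Module):
    def __init__(self, base_model, dropout_p=0.3):
        super(ModifiedBeitModel, self).__init__()
        self.base_model = base_model
        # Extra dropout on top of the classification head for regularization
        self.dropout = nn.Dropout(dropout_p)
        self.classifier = base_model.classifier

    def forward(self, x):
        x = self.base_model(x).logits
        x = self.dropout(x)
        return x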
model = ModifiedBeitModel(model)
model = nn.DataParallel(model)
model.to(device)
criterion = nn.CrossEntropyLoss()
num_epochs = 20
model.train()
running_loss = 0.0
correct = 0
total = 0
optimizer.zero_grad()
outputs = model(imgs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
train_losses.append(running_loss / len(train_loader))
train_acc.append(correct / total)
# Validation
model.eval()
val_running_loss = 0.0
val_correct = 0
val_total = 0
with torch.no_grad():
outputs = model(imgs)
loss = criterion(outputs, labels)
val_running_loss += loss.item()
_, val_predicted = torch.max(outputs.data, 1)
val_total += labels.size(0)
val_losses.append(val_running_loss / len(val_loader))
val_acc.append(val_correct / val_total)
print("Training complete.")
import numpy as np
model.eval()
test_correct = 0
test_total = 0
all_preds = []
all_labels = []
# Collect predictions and labels
with torch.no_grad():
_, predicted = torch.max(outputs.data, 1)
test_total += labels.size(0)
all_preds.extend(predicted.cpu().numpy())
all_labels.extend(labels.cpu().numpy())
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print("\nClassification Report:")
plt.figure(figsize=(8, 6))
# Plot the confusion matrix as a heatmap
sns.heatmap(confusion_matrix(all_labels, all_preds), annot=True, fmt='d',
xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()
# One-vs-rest approach
fpr = {}
tpr = {}
roc_auc = {}
for i in range(len(class_names)):
plt.plot([0, 1], [0, 1], 'k--') # Diagonal line for random classifier
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.legend(loc="lower right")
plt.show()
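The one-vs-rest ROC block above is truncated. The sketch below shows how the per-class curves can be computed with scikit-learn; note that the fragment binarizes the hard predictions (all_preds_binarized), whereas this sketch uses the per-class softmax probabilities collected during testing, which is a common alternative.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

# Binarize the integer labels into one-hot indicator columns (one per class)
all_labels_binarized = label_binarize(all_labels, classes=list(range(len(class_names))))
all_probs = np.array(all_probs)  # shape: (n_samples, n_classes), softmax outputs

fpr, tpr, roc_auc = {}, {}, {}
plt.figure(figsize=(10, 8))
for i in range(len(class_names)):
    fpr[i], tpr[i], _ = roc_curve(all_labels_binarized[:, i], all_probs[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
    plt.plot(fpr[i], tpr[i], label=f"{class_names[i]} (AUC = {roc_auc[i]:.2f})")

plt.plot([0, 1], [0, 1], 'k--')  # Random-classifier diagonal
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend(loc="lower right")
plt.show()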
# Precision-Recall Curve
plt.figure(figsize=(10, 8))
all_preds_binarized[:, i])
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend(loc="lower left")
plt.show()
model_save_path = 'scd_beit_2.pth'
torch.save(model.state_dict(), model_save_path)
import streamlit as st
import torch
import os
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")
model.classifier = torch.nn.Linear(model.classifier.in_features, 8)
model.load_state_dict(torch.load("C:/Users/mahid/Downloads/scd_beit_1.pth", map_location=torch.device('cpu')), strict=False)
model.eval()
transform = transforms.Compose([
transforms.Resize((224, 224)), # Resize to match model input size
transforms.ToTensor(),
])
# Class labels
class_labels = {
0: "NV" 1 ,
7: "BKL"
device
with torch.no_grad():
outputs = model(image).logits
_, predicted = torch.max(outputs, 1)
return predicted.item()
# Streamlit Dashboard
if uploaded_file:
image = Image.open(uploaded_file).convert('RGB')
# Predict class
# Display result
use_column_width=True)
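A minimal sketch of the Streamlit dashboard that the final fragment describes; the widget labels and the full class-label mapping are assumptions (only the NV and BKL indices appear in the fragment), while the checkpoint path is the one shown above.
import streamlit as st
import torch
from PIL import Image
from torchvision import transforms
from transformers import BeitForImageClassification

# Rebuild the fine-tuned BEiT classifier and load the saved weights
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")
model.classifier = torch.nn.Linear(model.classifier.in_features, 8)
model.load_state_dict(torch.load("C:/Users/mahid/Downloads/scd_beit_1.pth",
                                 map_location=torch.device('cpu')), strict=False)
model.eval()

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize to match the model input size
    transforms.ToTensor(),
])

# Class labels (order assumed)
class_labels = {0: "NV", 1: "MEL", 2: "BCC", 3: "SCC", 4: "AK", 5: "DF", 6: "VASC", 7: "BKL"}

def predict(image):
    tensor = transform(image).unsqueeze(0)
    with torch.no_grad():
        outputs = model(tensor).logits
        _, predicted = torch.max(outputs, 1)
    return predicted.item()

st.title("Skin Cancer Classification (BEiT)")
uploaded_file = st.file_uploader("Upload a dermoscopic image", type=["jpg", "jpeg", "png"])
if uploaded_file:
    image = Image.open(uploaded_file).convert('RGB')
    st.image(image, caption="Uploaded image", use_column_width=True)
    label_idx = predict(image)
    st.write(f"Predicted class: {class_labels[label_idx]}")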
REFERENCES
[1]. Hritwik Ghosh, Irfan Sadiq Rahat, Sachi Nandan Mohanty, J. V. R. Ravindra, Abdus
Techniques for Skin Cancer Detection. International Journal of Computer and Systems
[2]. Himel, G. M. S., Islam, M. M., Al-Aff, K. A., Karim, S. I., & Sikder, M. K. U. (2024). Skin
cancer segmentation and classification using Vision Transformer for automatic analysis in
[3]. Naeem, A., Anees, T., Khalil, M., Zahra, K., Naqvi, R. A., & Lee, S. (2024). SNC_NET:
Skin cancer detection by integrating handcrafted and Deep Learning-Based features using
[4]. Vachmanus, S., Noraset, T., Piyanonpong, W., Rattananukrom, T., & Tuarob, S.
145467–145484. https://doi.org/10.1109/access.2023.3345225
[5]. Yang, G., Luo, S., & Greer, P. (2023). A novel Vision transformer model for skin cancer
9335–9351. https://doi.org/10.1007/s11063-023-11204-5
[6] Pacal, I., Alaftekin, M., & Zengul, F. D. (2024). Enhancing Skin Cancer Diagnosis Using
[7]. Gulzar, Y., & Khan, S. A. (2022). Skin lesion Segmentation Based on Vision
[8]. Arshed, M. A., Mumtaz, S., Ibrahim, M., Ahmed, S., Tahir, M., & Shafi, M. (2023). Multi-
Class skin cancer classification using vision transformer networks and convolutional neural
415. https://doi.org/10.3390/info14070415
[9]. Cirrincione, G., Cannata, S., Cicceri, G., Prinzi, F., Currieri, T., Lovino, M., Militello, C.,
Xplore. https://ieeexplore.ieee.org/document/10623626
[11] Xu, J., Gao, Y., Liu, W., Huang, K., Zhao, S., Lu, L., DAMO Academy, Alibaba Group,
Wang, X., Hua, X.-S., Wang, Y., Chen, X., & Department of Dermatology, Xiangya Hospital
Central South University. (2021). RemixFormer: a transformer model for precision skin
article]. https://www.cs.jhu.edu/~lelu/publication/MICCAI%202022_paper1023_RemixForm
er.pdf
[12] Nahata, H., & Singh, S. P. (2020). Deep learning solutions for skin cancer detection
159–182). https://doi.org/10.1007/978-3-030-40850-3_8
Xplore. https://ieeexplore.ieee.org/abstract/document/10181245
[14]. Rashid, J., Ishfaq, M., Ali, G., Saeed, M. R., Hussain, M., Alkhalifah, T., Alturise, F., &
[15]. Gregoor, A. M. S., Sangers, T. E., Bakker, L. J., Hollestein, L., De Groot, C. a. U. –.,
Nijsten, T., & Wakkee, M. (2023). An artificial intelligence based app for skin cancer