
Plagiarism Checker X - Report

Originality Assessment

13%
Overall Similarity

Date: Nov 15, 2024
Matches: 2120 / 16155 words detected
Sources: 46
Remarks: Low similarity; check with your supervisor if changes are required.
Verify Report: View Certificate Online
v 8.0.7 - WML 3
FILE: TRANSFORMER BASED SKIN CANCER CLASSIFICATION - 1.DOCX
A project report on

TRANSFORMER BASED SKIN CANCER CLASSIFICATION

Submitted in partial fulfillment for the award of the degree of

M.Tech. (Integrated) Computer Science and Engineering with Specialization in Business

Analytics

by

Darsi Venkata Sai Mahidhar (20MIA1016)

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

April, 2024

TRANSFORMER BASED SKIN CANCER CLASSIFICATION


Submitted in partial fulfillment for the award of the degree of

M.Tech. (Integrated) Computer Science and Engineering with Specialization in Business

Analytics

by

Darsi Venkata Sai Mahidhar (20MIA1016)

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

April, 2024

DECLARATION

I hereby declare that the thesis entitled "Transformer Based Skin Cancer Classification" submitted by me, for the award of the degree of M.Tech. (Integrated) Computer Science and Engineering with Specialization in Business Analytics, Vellore Institute of Technology, Chennai, is a record of bonafide work carried out by me under the supervision of Dr. Rajesh R.

I further declare that the work reported in this thesis has not been submitted and will not be

submitted, either in part or in full, for the award of any other degree or diploma in this

institute or any other institute or university.

Place: Chennai

Date:    Signature of the Candidate

School of Computer Science and Engineering

CERTIFICATE

This is to certify that the report entitled "Transformer Based Skin Cancer Classification", prepared and submitted by Darsi Venkata Sai Mahidhar (20MIA1016) to Vellore Institute of Technology, Chennai, in partial fulfillment of the requirements for the award of the degree of M.Tech. (Integrated) Computer Science and Engineering with Specialization in Business Analytics, is a bonafide record of work carried out under my guidance. The project fulfills the requirements as per the regulations of this University and, in my opinion, meets the necessary standards for submission. The contents of this report have not been submitted and will not be submitted, either in part or in full, for the award of any other degree or diploma, and the same is certified.

Signature of the Guide:

Name: Dr. Rajesh R

Date:

Signature of the Examiner 1    Signature of the Examiner 2

Name: Name:

Date: Date:

Approved by the Head of Department

ABSTRACT

Skin cancer is one of the major health conditions worldwide; its incidence increases steadily, and early diagnosis is required for an effective treatment outcome. Despite advances in diagnostic imaging and machine learning technologies, classifying the various skin lesions in dermoscopic images remains complex because benign and malignant lesions share similar textures, colors, and shapes. In this context, a novel scheme for skin cancer classification using BEiT is proposed, which overcomes some of the inadequacies of traditional CNNs and even more advanced models like VGG-19 through a more effective mechanism for capturing global contextual information.

Using the large ISIC 2019 dataset containing thousands of labeled dermoscopic images, our model classifies the eight different classes of skin lesions in the dataset. To handle the class imbalance in the dataset, we augment the data to obtain a balanced set of 9,560 images. This augmentation reduced class bias and enhanced the model's generalization capability.

During preparation, images were resized to 224 x 224 pixels, an input size widely used by models such as BEiT because it allows efficient processing while retaining enough detail. The dataset was split into training, validation, and testing sets at a ratio of 80:10:10 for robust evaluation. Because hair strands interfere with accurate classification, we included a hair-removal step based on a black hat filter, a morphological image-processing technique that eliminates hair strands without substantially altering the skin texture of an image.

Our experimental design included a comparative analysis of CNN, VGG-19, Vision Transformer (ViT), and BEiT models to understand the strengths and weaknesses of each model when dealing with dermoscopic images. By employing the self-attention mechanism and pretraining on image patches, BEiT reached better classification accuracies than CNN and VGG-19 in detecting faint patterns spread across an image. This makes the model more appropriate for tasks that require capturing global context combined with fine-grained details, which is the main difference when identifying visually similar types of skin lesions.
The findings of the project demonstrate the great potential for improving the quality of skin cancer diagnosis through the analysis of dermoscopic images using advanced deep learning methodologies, particularly transformer-based architectures. This approach highlights the importance of innovation in machine learning techniques within medical imaging as a promising tool for clinicians, both for making an early diagnosis and for treating patients with skin cancer appropriately for better outcomes. Future work includes refining this model with other image modalities, more elaborate data augmentation strategies, and clinical validation to make it more applicable in real-world medical settings.

Key words: ISIC (International Skin Imaging Collaboration), CNN, VGG19, ViT (Vision Transformers), BEiT (Bidirectional Encoder Representation from Image Transformers), Black Hat Filter.


ACKNOWLEDGEMENT

It is my pleasure to express my deep sense of gratitude to Dr. Rajesh R, Assistant Professor, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, for his constant guidance, continual encouragement, and understanding; more than all, he taught me patience in my endeavor. My association with him is not confined to academics only; it is a great opportunity on my part to work with an intellectual and expert in the field of Deep Learning.

It is with gratitude that I would like to extend my thanks to our visionary leader Dr. G. Viswanathan, Honorable Chancellor; Mr. Sankar Viswanathan, Dr. Sekar Viswanathan, and Dr. G V Selvam, Vice Presidents; Dr. Sandhya Pentareddy, Executive Director; Ms. Kadhambari S. Viswanathan, Assistant Vice-President; Dr. V. S. Kanchana Bhaaskaran, Vice-Chancellor; Dr. T. Thyagarajan, Pro-Vice Chancellor, VIT Chennai; and Dr. P. K. Manoharan, Additional Registrar, for providing an exceptional working environment and inspiring all of us during the tenure of the course.

Special mention to Dr. Ganesan R, Dean; Dr. Parvathi R, Associate Dean Academics; and Dr. Geetha S, Associate Dean Research, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, for spending their valuable time and efforts in sharing their knowledge and for helping us in every aspect.

In a jubilant state, I express my whole-hearted thanks to Dr. Sivabalakrishnan M, Head of the Department, and Dr. Yogesh C, Project Coordinator, SCOPE, Vellore Institute of Technology, Chennai, for their valuable support and encouragement to take up and complete the thesis.

My sincere thanks to all the faculty and staff at Vellore Institute of Technology, Chennai, who helped me acquire the requisite knowledge. I would like to thank my parents for their support. It is indeed a pleasure to thank my friends who encouraged me to take up and complete this task.

Place: Chennai

Date: 10-11-24    Name of the student

Darsi Venkata Sai

Mahidhar

CONTENTS

CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS

1. CHAPTER – 1 INTRODUCTION
   1.1 INTRODUCTION
   1.2 BACKGROUND STUDY
   1.3 PROJECT STATEMENT
   1.4 OBJECTIVES
   1.5 CHALLENGES

2. CHAPTER – 2 BACKGROUND STUDY
   2.1 BACKGROUND STUDY OF SKIN CANCER
   2.2 LITERATURE SURVEYS

3. CHAPTER – 3 PROPOSED SYSTEM
   3.1 DATASET DESCRIPTION
   3.2 PROPOSED ARCHITECTURE
   3.3 PROPOSED METHODOLOGY
       3.3.1 PRE-PROCESSING
       3.3.2 VISUALIZATION
       3.3.3 ALGORITHMS USED

4. CHAPTER – 4 RESULTS AND DISCUSSION

5. CHAPTER – 5 CONCLUSION AND FUTURE WORK

APPENDIX
REFERENCES

LIST OF FIGURES

1. IMAGE SAMPLES FOR EACH CLASS
2. PROPOSED ARCHITECTURE
3. PROPORTION OF IMAGES BEFORE THE PREPROCESSING
4. PROPORTION OF IMAGES AFTER THE PREPROCESSING
5. HAIR REMOVAL COMPARISON BEFORE AND AFTER
6. CNN TRAINING AND VALIDATION LOSS
7. CNN TRAINING AND VALIDATION ACCURACIES
8. VGG-19 TRAINING AND VALIDATION LOSS
9. VGG-19 TRAINING AND VALIDATION ACCURACIES
10. ViT TRAINING AND VALIDATION ACCURACIES
11. ViT TRAINING AND VALIDATION LOSS
12. BEiT MODEL RESULTS
13. CLASSIFICATION REPORT OF BEiT MODEL
14. UI PREDICTION OUTPUT

LIST OF TABLES

2.1 MODELS ACCURACIES COMPARISON


LIST OF ACRONYMS

CNN – Convolutional Neural Network

ViT – Vision Transformers

BEiT – Bidirectional Encoder Representation from Image Transformers

ISIC - International Skin Imaging Collaboration


Chapter 1

Introduction

1.1 INTRODUCTION

Skin cancer constitutes a significant public health concern on a global scale, with rising incidence rates highlighting the necessity for prompt and precise diagnostic measures to enhance patient outcomes. While conventional diagnostic techniques are beneficial, they frequently fall short in efficiency and accuracy, especially when managing extensive collections of dermoscopic images. This report proposes a novel method for the classification of skin cancer using BEiT, a cutting-edge transformer-based framework. Unlike traditional deep learning architectures such as CNNs and VGG-19, BEiT offers a better ability to capture subtle features at the global level of an image, making it an excellent candidate for handling the small differences found in dermoscopic images, where understanding context is vital to distinguish between similar types of lesions.

Utilizing the ISIC 2019 dataset, which contains numerous labeled images representing eight separate categories of skin lesions, including Actinic Keratosis, Melanoma, Melanocytic Nevus, Squamous Cell Carcinoma, Basal Cell Carcinoma, Dermatofibroma, and Vascular Lesions, we implemented several data augmentation strategies to create a balanced collection of 9,560 images. This comprehensive dataset facilitated the training and assessment of our model on a wide variety of cases, thereby improving its robustness and capacity to generalize among diverse types of lesions.

We resized all the images to 224x224 pixels, as BEiT requires input of this size. A black hat filter was also applied, because hair artifacts are a very common problem in dermoscopic imaging: they obscure lesion details and impair classification accuracy. Pre-processing thus yields cleaner images with better presentation of the lesion features critical for accurate classification.

In the comparative study of CNN, VGG-19, ViT (Vision Transformer), and BEiT, the BEiT model demonstrated better classification accuracy than the other models, suggesting a better capacity to process complex dermoscopic images. This implies that transformer-based frameworks hold great potential for use in medical imaging applications, given the deep need for precise accuracy when identifying slight visual patterns.

This initiative foregrounds the considerable promise of cutting-edge deep learning architectures, especially transformers, in enhancing diagnostic accuracy in the context of skin cancer identification. With the application of such techniques in medical imaging, this research facilitates a trajectory toward higher accuracy and earlier detection, and also opens up avenues for prospective advancements in the same domain, encompassing applications to other types of cancer and other medical issues that depend on the assessment of visual images.

1.2 BACKGROUND STUDY


With the incidence of skin cancer on the rise globally comes an ever-increasing need for accurate and sensitive detection methods. Though melanoma is the deadliest of all types of skin cancer, timely diagnosis substantially improves survival prospects. Conventional detection methods are mainly based on specialized clinical experience and can take a lot of time. Thus, automated classification methods gained popularity, with deep learning models such as CNNs proving powerful on skin cancer images. CNNs are acknowledged for their ability to learn high-order abstractions of complex image features, but in many cases they fall short in capturing the global context within dermoscopic images, which can be crucial for distinguishing visually similar lesions.

Recent transformer-based models, such as the Bidirectional Encoder Representation from Image Transformers (BEiT), appear to be a quite promising alternative. Unlike CNNs, the model uses self-attention to capture both local and global features in images, making it better at dealing with the complexity of dermoscopic images. Preliminary studies show that transformer models such as BEiT tend to outperform traditional CNNs in medical imaging tasks, including skin lesion classification.

This project applies the BEiT model to the classification of skin cancer using the ISIC 2019 dataset, comparing its performance with CNN and VGG-19 models. The goal is to show that BEiT can emerge as a more accurate tool for the detection of skin cancer, ultimately benefiting advanced diagnostic aids for the early detection of cancer and better patient outcomes.

1.3 PROJECT STATEMENT

Accurate diagnosis of skin cancer heavily relies on the precise interpretation of dermoscopic images, a process that demands substantial expertise and remains prone to human error. Deep learning frameworks, particularly CNNs, have shown promise in supporting image-based diagnostics, but they often fail to capture the full context of an image and can therefore make mistakes in complex scenarios. Recent breakthroughs in transformer-based models, such as the Vision Transformer and the Bidirectional Encoder Representation from Image Transformers, show promise in capturing both local and global features effectively and therefore represent a potential for increased diagnostic proficiency. However, their effectiveness for skin cancer classification using dermoscopic images remains insufficiently studied. This research aims to fill this gap by investigating the accuracy and reliability of transformer models for classifying skin cancer, with the eventual goal of developing a tool that clinicians can confidently use to improve diagnostic precision and complement expert evaluation.

1.4 OBJECTIVES

This project aims to establish the classification capabilities of advanced transformer-based models, namely ViT and BEiT, on dermoscopic images of skin cancer lesions. Transformers have significant potential for analyzing global and local features in highly complex image data thanks to their self-attention mechanisms. The project examines how effective they are at analyzing dermoscopic images, which inherently require interpreting subtle patterns in a nuanced manner, and explores using transformer models as more context-aware and accurate diagnostic aids for skin cancer.

To achieve an intuitive understanding of transformer-based models in this context, they are compared against traditional Convolutional Neural Networks such as VGG-19 and ResNet-50. CNNs tend to dominate image classification but rely on local feature extraction and therefore do not exploit global context when representing complex images. We evaluate and compare these models across several performance metrics, including accuracy, precision, recall, and F1-score, to identify the strengths and limitations of each approach. By comparing the classification results, this study provides insights into whether transformers offer a meaningful performance advantage over CNNs in dermoscopic image analysis.
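As a small illustration of how these metrics can be computed for any of the compared models, the sketch below uses scikit-learn; the label arrays are placeholders, not results from this project.

from sklearn.metrics import classification_report

# y_true: ground-truth class indices of the test set; y_pred: model predictions (placeholders here)
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]
# Prints per-class precision, recall, and F1-score along with overall accuracy
print(classification_report(y_true, y_pred, digits=4))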

1.5 CHALLENGES

 Similarity Between Classes

Many skin lesion types have very similar visual appearances, often rounded with relatively minor color differences. Models therefore find these classes very hard to classify, because the minor, sometimes subtly nuanced differences between classes are exactly what they need to capture. Such visual overlap easily leads to misclassifications, especially when classification relies on lesions with overlapping features, requiring advanced models that catch fine details.

 Computation Power

Significant computational resources are needed to preprocess and classify dermoscopic images appropriately. High-resolution images necessitate large memory capacities and processing power, especially when complex models like transformers are applied. Efficient use of such computational resources is critical to keep up with these demands without degrading classification precision.

 Preprocessing Requirement

The quality, lighting, and imaging conditions of dermoscopic images vary considerably, requiring robust preprocessing. Standardization of quality without loss of diagnostic features must be ensured to guarantee consistency and reliability. Hair artifacts frequently found in dermoscopic images obscure lesion features and interfere with model learning. Therefore, hair removal is needed as a preprocessing step so that the model is not misled by irrelevant information around the lesions.


 Real-time Detection

For practical use in clinical environments, the model must be fast enough to deliver real-time results. Optimization of the model should strike a balance between speed and accuracy, providing rapid diagnosis and reliable real-world performance without compromising the quality of diagnosis.

 Multi-Disease and Multi-Class Classification

Automatic classification of several skin diseases into various classes is a prerequisite for diagnosing skin cancers. Multi-class classification poses a very challenging task because distinguishing between malignant and benign lesions requires a very high level of precision. Therefore, it is critical to create models that are consistent across all classes when they are applied in a clinical setup with reliable outputs.

CHAPTER 2

BACKGROUND STUDY

2.1 SKIN CANCER

Actinic Keratosis (AK), or solar keratosis, is a pre-cancerous skin lesion formed as the result of extensive exposure to ultraviolet light, most commonly from sunlight. It presents most often as rough, scaly skin patches in sun-exposed regions such as the face, neck, and hands. While AK itself is not malignant, it can degenerate into cancerous squamous cell carcinoma (SCC), making early detection important.

Melanoma is the most aggressive form of skin cancer and originates from melanocytes, the cells responsible for producing melanin, the pigment that gives the skin its color. Melanoma can originate from any area of the skin and often begins as a new or changing mole with irregular borders, variegated coloration, and asymmetry. It accounts for a smaller proportion of skin cancer cases but for the majority of skin cancer deaths due to its high metastatic potential. Early detection leads to successful treatment.

Melanocytic Nevus: Commonly referred to as a mole, a melanocytic nevus is usually a benign growth of melanocytes. These lesions can appear anywhere on the skin as small, round or oval-shaped macules with uniform pigmentation. While most melanocytic nevi are benign, some atypical or dysplastic nevi may be more prone to becoming malignant, even culminating in melanoma. Therefore, any changes in nevi should be monitored for early detection of eventual malignant evolution.

Squamous Cell Carcinoma (SCC): Among the most common types of non-melanoma skin cancer, SCC arises from the squamous cells of the epidermis. It is generally caused by cumulative exposure to ultraviolet radiation and typically presents as firm red nodules or as scaly, crusted patches that bleed or ulcerate. SCC metastasizes less often than melanoma but can invade adjacent tissues if left untreated. Early intervention can easily manage SCC and prevent further progression.

Basal Cell Carcinoma (BCC) is the most common skin carcinoma and originates in the basal cells of the lower layer of the epidermis. BCC often appears as pearly or waxy bumps, flat pink patches, or sores that never heal. It is usually caused by excessive sun exposure and tends to occur more commonly in fair-skinned individuals. Although it very rarely metastasizes, BCC grows in size and invades surrounding tissues, causing considerable local destruction. BCC is highly treatable if discovered early.

Dermatofibromas are benign skin growths that typically come to medical attention as firm, raised nodules, sometimes with brown or reddish coloring. They are not malignant skin lesions and may follow minor injuries, such as insect bites. Dermatofibromas are generally innocuous and can be left alone unless they become bothersome. Though no established relationship to any of the skin cancers has been identified, proper differentiation of these lesions is necessary to avoid misdiagnosis.

Vascular Lesions: These are abnormal formations of blood vessels in the skin, such as hemangiomas, cherry angiomas, and vascular malformations. Most are small, benign growths that usually appear as red or purple spots on the skin. Even though they are usually harmless, some vascular lesions, such as large or rapidly growing hemangiomas, require medical attention. Vascular lesions are not malignant and are not composed of cancer cells; however, they must be differentiated from malignant skin lesions for proper diagnosis and treatment.

2.2 LITERATURE REVIEW

Hrithwik et al. (2024) proposed a hybrid deep learning model combining VGG16 and ResNet50 to improve skin cancer detection and classification. Utilizing a dataset of 3,000 images across nine skin conditions, the researchers addressed class imbalance through class weights and emphasized rigorous data pre-processing. Their methodology involved evaluating various models, including DenseNet121, VGG16, ResNet50, and an ensemble approach. Results showed that DenseNet121 achieved a high training accuracy of 99.51% and a testing accuracy of 91.82%, while VGG16 and ResNet50 had lower testing accuracies of 70.03% and 88.89%, respectively. The hybrid model outperformed the individual models, achieving a training accuracy of 98.75% and a validation accuracy of 97.50%. These findings underscore the effectiveness of the hybrid approach in enhancing skin cancer classification performance, although further refinement and validation in clinical settings are necessary. [1]

Shahriar Himel et al. introduce a skin cancer classification approach using the Vision Transformer (ViT), specifically Google's ViT-patch32 model, in conjunction with the Segment Anything Model (SAM) for effective cancerous-area segmentation. Leveraging the HAM10000 dataset, the researchers employed preprocessing techniques like normalization and augmentation to enhance model robustness. Their results demonstrate that the ViT-Google model achieved a high classification accuracy of 96.15% and an impressive ROC AUC score of 99.49%, surpassing the other tested models. Despite these promising results, the research highlights a gap in the model's applicability to diverse skin types, as the dataset primarily represents fair-skinned individuals. Future work is proposed to expand the dataset to include a more diverse range of ethnic backgrounds and to explore federated learning for continuous model improvement. [2]

Naeem et al. present SNC_Net, an advanced model for skin cancer detection that integrates handcrafted and deep learning features from dermoscopic images. The model utilizes a convolutional neural network (CNN) alongside handcrafted feature extraction and employs the SMOTE-Tomek approach to address class imbalance. Evaluated on the ISIC 2019 dataset, SNC_Net achieved a notable accuracy of 97.81%, precision of 98.31%, and F1 score of 98.10%, outperforming baseline models such as EfficientNetB0 and ResNet-101. Despite these impressive results, the research highlights a gap in applying SNC_Net to camera-captured images, which limits its generalizability. Future work is suggested to include federated learning for improved model accuracy and broader applicability. [3]

[4] Vachmanus et al. introduce DeepMetaForge, a deep-learning framework designed for skin cancer detection using both images and accompanying metadata. The framework leverages BEiT, a vision transformer pre-trained on masked image modeling tasks, for encoding images. The proposed approach integrates encoded metadata with visual features, employing a novel Deep Metadata Fusion Module (DMFM) to enhance classification accuracy. Tested on four datasets of dermoscopic and smartphone images, DeepMetaForge achieved an average macro-average F1 score of 87.1%. The study highlights the potential for implementing this framework in telemedicine and other medical applications, although it notes limitations in data preprocessing and handling imbalances. Future research is suggested to explore multiclass classification, object detection, and adaptation for remote communities to further improve the model's applicability and performance in diverse settings.

[5] Yang et al. present a novel skin cancer classification method utilizing a transformer-based architecture for improved accuracy. Their approach involves four key steps: class rebalancing of seven skin cancer types, splitting images into patches and flattening them into tokens, processing these tokens with a transformer encoder, and a final classification block with dense layers and batch normalization. Transfer learning is employed, with pretraining on ImageNet and fine-tuning on the HAM10000 dataset. This method achieved a classification accuracy of 94.1%, surpassing the IRv2 model with soft attention and other state-of-the-art methods. Additionally, the approach outperformed baseline models on the Edinburgh DERMOFIT dataset. Despite these advancements, the study highlights the potential for further improvements with larger transformer models and more extensive pretraining datasets, suggesting future work could enhance classification performance through scaling and broader training data.

[6] Pacal, Alaftekin, and Zengul present an advanced skin cancer diagnostic method using an enhanced Swin Transformer model. Their approach integrates a hybrid shifted window-based multi-head self-attention (HSW-MSA) mechanism, replacing the conventional shifted window-based multi-head self-attention (SW-MSA), to better handle overlapping cancerous regions and capture fine details while maintaining efficiency. Additionally, they substitute the standard multi-layer perceptron (MLP) with a SwiGLU-based MLP, improving accuracy, training speed, and parameter efficiency. Evaluated on the ISIC 2019 dataset, the modified Swin model achieved an accuracy of 89.36%, outperforming both traditional CNNs and state-of-the-art vision transformers. This study demonstrates the significant potential of deep learning in enhancing diagnostic precision and efficiency for skin cancer detection, highlighting its impact on improving patient outcomes and setting new benchmarks for future research in medical image analysis.

[7] Gulzar and Khan address the challenge of accurately diagnosing melanoma skin cancer by enhancing image segmentation techniques. They propose the hybrid TransUNet model, combining Vision Transformers with U-Net, to capture detailed spatial relationships in skin lesion images. This approach addresses limitations of pure transformers, which struggle with small medical datasets and low-resolution features. Their results show that TransUNet achieves superior performance, with an accuracy of 92.11% and a Dice coefficient of 89.84%, outperforming traditional U-Net and attention-based methods. The study highlights that while TransUNet excels in accuracy and detailed segmentation, it requires more training and inference time compared to simpler models, indicating a need to balance accuracy with computational efficiency. Future research could integrate these segmentation results with classification tasks to enhance diagnostic capabilities and early detection of malignant skin lesions, aiming for a more comprehensive analysis system.

[8] Arshed et al. propose using Vision Transformers (ViT) for multi-class skin cancer classification, comparing them with traditional CNN-based transfer learning methods. They address the challenge of class imbalance and dataset diversity by employing data augmentation, fine-tuning, and leveraging pre-trained models. Their ViT model achieved a notable accuracy of 92.14%, surpassing CNN-based methods across multiple evaluation metrics. This performance highlights ViT's potential in overcoming the limitations of CNNs, particularly in distinguishing between similar skin lesions. Despite these advancements, the study points to the need for enhanced preprocessing techniques to further improve model robustness and accuracy. This gap suggests that future research should focus on refining data handling and augmentation strategies to better support ViT and other deep learning models in skin cancer diagnosis.

[9] Cirrincione et al. propose a Vision Transformer (ViT)-based model for melanoma classification, distinguishing malignant melanoma from non-cancerous lesions using public ISIC data. Their model, ViT-Large with 307 million parameters, achieved high performance with an accuracy of 94.8%, sensitivity of 92.8%, specificity of 96.7%, and an AUROC of 94.8%. They optimized the model through extensive hyperparameter tuning, including learning rates and layer configurations, finding that 24 layers provided the best balance between accuracy and complexity. Despite these advancements, the study notes that integrating attention maps for model interpretability remains a future research direction. This approach aims to enhance the understanding of which image regions contribute most to classification, addressing a gap in model explainability and providing more insight into the decision-making process of ViT-based systems.

[10] Gallazzi et al. investigate Transformer-based deep neural networks for multiclass skin lesion classification, a novel approach in medical imaging. Leveraging the self-attention mechanism inherent to Transformers, their framework captures intricate spatial dependencies in skin images without extensive pre-processing. Evaluating their model on a newly released benchmark dataset for 2023, they achieved a test accuracy of 86.37%, demonstrating superior performance compared to traditional CNNs. Their study emphasizes the benefits of using large, merged datasets to enhance model generalization. However, the application of Transformer models to medical imaging is still emerging, and further research is needed to explore their full potential and integration with clinical workflows. The authors have shared their benchmarks and dataset on GitHub, promoting transparency and further investigation into Transformer-based methods in medical diagnostics. This research highlights the promise of Transformers for advanced skin lesion classification and sets the stage for future developments in medical image analysis.

[11] Xu et al. propose a novel multi-modal transformer-based framework for skin tumor classification, addressing the challenge of integrating diverse clinical data sources. Their method, RemixFormer, incorporates a cross-modality fusion module to effectively combine clinical images, dermoscopic images, and patient metadata. The framework leverages a disease-wise pairing strategy to handle missing modalities and enhances performance through the Swin transformer backbone. Their extensive experiments demonstrate a significant improvement over previous methods, achieving a 6.5% increase in F1 score and a 2.8% improvement in accuracy on the Derm7pt dataset, and an impressive 88.5% accuracy on a larger in-house dataset. Despite these advancements, the study notes challenges with imbalanced data distributions and the need for further research on addressing these imbalances and optimizing modality fusion strategies. The proposed approach highlights the potential for improved skin tumor classification and provides a solid foundation for future developments in multi-modal medical imaging.

[12] Nahata and Singh address the critical issue of skin cancer detection by developing a Convolutional Neural Network (CNN) model aimed at classifying different types of skin lesions. They focus on utilizing various CNN architectures, including Inception V3 and InceptionResNet, and incorporate transfer learning techniques to enhance model performance. Their approach, tested on the ISIC challenge dataset, achieves high classification accuracy, with InceptionResNet reaching 91%. The study highlights the effectiveness of these CNN models in distinguishing between skin cancer types, leveraging data augmentation to improve robustness. However, the research could benefit from exploring additional model variations and preprocessing techniques to address potential limitations in generalization and performance across diverse datasets. Future work could also investigate integrating multi-modal data to further enhance detection accuracy and clinical applicability.
[13] Magdy et al. (2024) present advanced methods for enhancing the accuracy of skin cancer classification through dermoscopic image analysis. The study introduces two key approaches: the first leverages k-nearest neighbor (KNN) as a classifier, using various pretrained deep neural networks (e.g., AlexNet, VGG, ResNet, EfficientNet) as feature extractors, while the second optimizes AlexNet's hyperparameters via the Grey Wolf Optimizer. Additionally, the authors compare machine learning techniques (such as KNN and support vector machines) with deep learning models, demonstrating that their proposed methods achieve superior classification accuracy, surpassing 99% on a dataset of 4,000 images from the ISIC archive.

[14] Rashid et al. (2024) present a deep transfer learning approach for the early detection of melanoma, a highly dangerous form of skin cancer. The study introduces a novel diagnostic model based on MobileNetV2, a deep convolutional neural network, to classify skin lesions as either malignant or benign. By employing the ISIC 2020 dataset and applying data augmentation techniques to address class imbalance, the authors demonstrate that their model not only achieves superior accuracy but also reduces computational costs compared to state-of-the-art methods. This research highlights the potential of transfer learning in enhancing early skin cancer detection, thereby contributing to better patient outcomes.

[15] Gregoor et al. (2024) evaluate the impact of an AI-based mobile health (mHealth) app for skin cancer detection in a large Dutch population. The study involved 2.2 million adults who were given free access to the app, with a comparison between users and non-users of the app. The results revealed that mHealth users had a higher incidence of dermatological claims for (pre)malignant lesions and benign tumors compared to controls, with a notable increase in healthcare consumption. While the app enhanced the detection of (pre)malignant skin conditions, it also led to a higher cost per additional (pre)malignant lesion detected. This research highlights the app's potential benefits for early skin cancer detection but also points out the challenge of increased healthcare utilization for benign conditions, emphasizing the need for a balanced approach in deploying AI-based diagnostic tools.

Chapter 3

PROPOSED SYSTEM

3.1 DATASET DESCRIPTION

The ISIC 2019 Challenge dataset is an all-inclusive collection of dermoscopic images intended for the development and assessment of automated skin cancer detection systems. Comprising 25,331 images drawn from several contributing datasets, including HAM10000 and BCN_20000, it is well suited for training and testing machine learning models. The dataset is categorized into eight distinct classes of skin cancer images: melanoma (MEL), melanocytic nevus (NV), basal cell carcinoma (BCC), benign keratosis lesions (BKL), actinic keratoses (AKIEC), squamous cell carcinoma (SCC), dermatofibroma (DF), and vascular lesions (VASC). Each class contains a different number of images; melanoma has 4,522 images while actinic keratoses has 867. The images are mainly in JPEG format and vary in resolution, mostly 600 x 450 pixels or 1024 x 1024 pixels. The image data is accompanied by crucial patient metadata, including age, sex, and lesion location, which can be used as additional context during training to improve the model's performance in classification tasks. The class imbalance and the high variability between lesion types make the ISIC 2019 dataset a goldmine for researchers developing deep learning models for skin cancer classification. Its extensive use in studies evaluating various machine learning architectures makes its importance in building improved automated diagnostic technologies, leading to better patient outcomes in dermatology, self-evident.

Figure 1: Image samples for each class

3.2 PROPOSED ARCHITECTURE

Figure 2 Proposed Architecture

3.3 PROPOSED METHODOLOGY

3.3.1 PRE-PROCESSING

We first addressed the class imbalance in the dataset: the number of image samples per class was unequal, which can introduce bias and hinder the model's ability to generalize. We therefore reduced the overrepresented classes and augmented the smaller ones, obtaining a uniform dataset of 1,195 images per class, which amounts to 9,560 images after balancing. Additionally, all images were resized, which ensures consistency in the input dimensions across the dataset and standardizes them for model training.

We also employed a hair-removal procedure to enhance the images and improve the focus on diagnostic features, since hair artifacts may obscure key visual information in dermoscopic images. Each image was converted to grayscale to help detect hair, a black hat filter with a rectangular (9, 9) kernel was applied to highlight the hair strands, and a Gaussian blur was then used to remove noise. We then applied a binary threshold to build a mask isolating the hair regions and used an inpainting technique that fills the removed hair pixels from the surrounding pixel values.

Blackhat(I)=Closing(I)−I

where:

o I is the original grayscale image.

o Closing(I) is a morphological closing operation, which is defined as a dilation followed by

an erosion using a given structuring element (kernel).

The morphological closing operation with a structuring element S is given by:

Closing(I)=(I⊕S)⊖S

where:

 I⊕S denotes the dilation of I with the structuring element S.

 I⊖S denotes the erosion of I with S.
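A minimal sketch of this hair-removal pipeline using OpenCV is shown below; the (9, 9) rectangular kernel follows the description above, while the threshold value and inpainting radius are illustrative assumptions.

import cv2

def remove_hair(image_bgr):
    # Convert to grayscale to make hair strands easier to detect
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Black hat filtering with a rectangular (9, 9) kernel highlights dark hair strands
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    # Gaussian blur suppresses noise before thresholding
    blurred = cv2.GaussianBlur(blackhat, (3, 3), 0)
    # Binary threshold produces a mask that isolates the hair regions
    _, mask = cv2.threshold(blurred, 10, 255, cv2.THRESH_BINARY)
    # Inpainting fills the masked hair pixels from the surrounding pixel values
    return cv2.inpaint(image_bgr, mask, 3, cv2.INPAINT_TELEA)

# Example usage (the path is a placeholder)
clean = remove_hair(cv2.imread("lesion.jpg"))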


To further enrich the dataset, we used data augmentation to enhance model robustness. We applied horizontal and vertical flips to increase variability and reduce overfitting. The increase in variability from these augmentations helped the model learn more generalizable patterns from the diversity of the training samples. Through these preprocessing steps, hair removal, balancing, and augmentation, we obtained a robust and standardized dataset that enhances the model's ability to classify skin cancer lesions.
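A minimal illustrative sketch of this resizing and flip-based augmentation, written with torchvision (the exact transform parameters and tooling used in the project may differ), is shown below.

from torchvision import transforms

# Training-time preprocessing: resize to the 224 x 224 input size and apply
# random horizontal/vertical flips, as described above.
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ToTensor(),
])

# Validation/test images are only resized and converted to tensors.
eval_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])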

Figure 4: Proportion of images after the preprocessing

3.3.2 VISUALIZATIONS

Figure 5: Hair removal comparison before and after

3.3.3 ALGORITHMS USED

3.3.3.1 Convolutional Neural Networks (CNN):

Convolutional Neural Networks (CNNs) are a type of deep neural network created primarily for image recognition and categorization. A CNN has numerous layers, including convolutional, pooling, and fully connected layers. Convolutional layers collect information from input images by convolving learnable filters across the data. This approach creates feature maps that capture spatial patterns and hierarchical representations in the image. The pooling layers reduce the spatial dimensions of the feature maps, which improves translation invariance and reduces computational complexity. Finally, fully connected layers combine the retrieved characteristics and conduct classification using the learned representations. A common CNN architecture has alternating convolutional and pooling layers, followed by fully connected layers. The convolutional layers use activation functions like ReLU (Rectified Linear Unit) to bring nonlinearity into the network. The pooling layers, using max pooling or average pooling, downsample feature maps to extract the most important information. The fully connected layers classify using softmax activation and output probabilities for each class. The architecture of a CNN is adjusted to the dataset's complexity, with deeper networks capable of learning more abstract properties. However, deeper structures raise computing demands and the potential for overfitting, demanding careful architecture design and regularization procedures.

The output of a convolutional unit is computed as:

y = f( Σ_i W_i · X_i + b )

where:
W_i are the filter weights
X_i is the input data
b is the bias term
f is the activation function
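A minimal PyTorch sketch of such an architecture, alternating convolution/pooling blocks followed by fully connected layers, is shown below; the layer sizes are illustrative, with eight output classes as in this project, and this is not necessarily the exact network used in the experiments.

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        # Two convolution + ReLU + max-pooling blocks extract feature maps
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Fully connected layers map the flattened features to class scores
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 56 * 56, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# For a 224 x 224 RGB input, the output is one score per class
logits = SimpleCNN()(torch.randn(1, 3, 224, 224))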

3.3.3.2 VGG – 19

VGG19 is one of the most widely known deep convolutional neural networks, developed by the Visual Geometry Group at the University of Oxford. It is highly acknowledged for being simple yet very effective, so much so that it has become a standard architecture in computer vision applications such as image classification, object detection, and segmentation. The VGG19 architecture has 19 weight layers: 16 convolutional layers and 3 fully connected layers, followed by a final softmax layer. The consistent application of small 3x3 filters across many convolutional layers lets the model capture very intricate spatial information within images.

Architecture of VGG19

The structure of VGG19 is systematically arranged into five distinct convolutional blocks, each succeeded by a max-pooling layer. Within each convolutional layer, a 3x3 filter is employed with a stride of 1 and padding chosen to maintain the spatial dimensions, as represented by the standard output-size relation:

Output size = (W − F + 2P) / S + 1

where W is the input size, F = 3 is the filter size, P = 1 is the padding, and S = 1 is the stride. Setting the padding to 1 ensures that the spatial dimensions of the image remain constant throughout each convolution block, so that more feature information is carried forward as the depth of the network increases. The number of filters is doubled with every successive block, starting from 64 in the first block and reaching 512 in the deeper layers. This step-wise procedure helps VGG19 pick up features at greater and more abstract levels in the later stages.

A max-pooling layer with a 2x2 filter and a stride of 2 is applied after every convolutional block, halving the spatial dimensions. This reduction lowers computation while retaining the crucial features of the input. The max-pooling operation takes the maximum value from each segment of the feature map, improving the generalization ability of the network by focusing on the pertinent features.

Image processing in VGG19 necessitates a predetermined input image dimension of 224x224 pixels, comprising three channels (RGB). To conform to this specified input size, images undergo preprocessing that includes resizing and normalization, thereby ensuring their compatibility with the model and facilitating an expedited training process. Typically, the pixel values are normalized before being sent to the neural network by subtracting the mean RGB values computed from the training dataset. This improves training stability since the gradient updates are smoother, making convergence faster.

The feature extraction process of VGG19 begins with the detection of basic edges and textures in the earlier layers and then continues to include more complex shapes and patterns deeper in the model.

The acquired features are then fed into three fully connected layers: 4096 units in each of the first two layers and 1000 units in the last, corresponding to the number of classes in the original (ImageNet) version of the architecture. Each fully connected layer computes:

y = ReLU(W · X + b)

where W is the weight matrix, X is the input feature vector, b is the bias, and ReLU (Rectified Linear Unit) introduces non-linearity by activating only positive values, helping the network learn complex patterns.

Classification Layer and Applications: The final layer of the VGG19 model is a softmax layer that produces a probability distribution across classes:

softmax(z_i) = exp(z_i) / Σ_j exp(z_j),   j = 1, ..., K

where z_i is the input to the i-th neuron and K is the number of classes. The softmax function ensures that the outputs add up to 1, making it suitable for multi-class classification tasks. The model then selects the class with the highest probability as its prediction.

The VGG19 architecture, deep but uniform in its 3x3 convolutions, is well suited for feature extraction and transfer learning. Despite its large number of parameters, close to 143 million, which makes computation expensive, VGG19 remains a straightforward yet very effective approach for visual recognition. Fields such as medical imaging or fine-grained object detection benefit greatly from such features, since fine details may mark the difference between two structural variations.
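A minimal sketch of adapting a pretrained VGG19 from torchvision to this eight-class task is shown below; replacing only the final classifier layer and freezing the feature extractor is an illustrative transfer-learning setup, not necessarily the exact configuration used in the experiments.

import torch.nn as nn
from torchvision import models

# Load VGG19 pretrained on ImageNet and replace the final 1000-way layer
# with an 8-way layer for the skin lesion classes.
vgg19 = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg19.classifier[6] = nn.Linear(4096, 8)

# Optionally freeze the convolutional feature extractor and train only the classifier.
for param in vgg19.features.parameters():
    param.requires_grad = False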

3.3.3.3 ViT (Vision Transformers)

The Vision Transformer, or ViT, is a breakthrough computer vision contender built upon the transformer architecture originally proposed for Natural Language Processing, applied here to image classification. The core intuition behind ViT is that an image, just like text, can be split into patches that are treated as a sequence of tokens, similar to the word tokens used in transformer-based language models such as BERT. This lets transformers capture long-range dependencies among the patches and model images well enough to surpass plain CNNs on many tasks, given a large enough amount of data and training.

Architecture: The ViT model takes an input image and splits it into fixed-size, non-overlapping patches. Each of these patches is flattened into a one-dimensional vector, and a linear embedding is applied to map those vectors into a higher-dimensional space. These patch embeddings serve as the input tokens to the transformer model.

The most prominent difference between ViT and traditional CNNs is that the transformer does not rely on convolutions or local receptive fields to learn spatial hierarchies; instead, it relies on a self-attention mechanism to represent the global relationship of each patch to the whole image.

The transformer in ViT is a multi-layered network of self-attention and feed-forward blocks. Each self-attention layer calculates the interactions among all patches in the input, which enables different parts of the image to be focused upon according to their context. The output of the last transformer layer is fed into the classification head, which in practice is almost always a very simple MLP that predicts the final image class.

Encoder: The ViT encoder combines several self-attention layers with position-wise feed-forward networks, enabling the model to capture both local and global relationships between the patches formed from the image.

The self-attention mechanism is the actual core of the transformer: it enables the model to take into account the relevance of each patch of the image with respect to the other patches, regardless of their spatial distance. This is a major difference from CNNs, where each filter is sensitive only to a localized region of the image.

The transformer encoder separates into modules as follows:

Multi-head Self-Attention: computes attention between patches, thereby enabling the model to focus on the right parts of an image.

Feed-Forward Neural Network: after attention, each patch embedding is processed through a fully connected network to transform the representation.

Layer Normalization and Residual Connections: these stabilize the training and facilitate smooth gradient flow throughout the network. Each such layer is repeated multiple times as data flows through the network, so that the model captures increasingly complex and abstract relations between image patches.

Decoder:

Unlike most sequence-to-sequence tasks such as translation, ViT does not need a conventional decoder; it uses the encoder itself to obtain the representation of the image, which is then used for classification.

In the final stages, the output representations are aggregated into a classification token, closely resembling the BERT [CLS] token used in NLP to summarize information from the entire image. The token is then passed through the classifier, typically an MLP, whose output gives the class under which the image should be classified. Since this is a classification problem, the task does not need a specialized decoder for sequential outputs, which makes ViT simpler in that respect than other transformer-based models.

Attention Network: The attention network is the core of ViT's ability to capture relationships throughout the image. The multi-head self-attention mechanism calculates attention scores among all the patches of the input image. Whereas localized receptive fields in CNNs make it difficult to focus on different parts of the image simultaneously, ViT can capture intricate dependencies between distant, far-apart spatial patches.

Multi-head attention projects the input embeddings into multiple attention heads, each learning different aspects of the input data. This allows ViT to attend to different regions of the image at the same time, capturing both local features, such as textures or edges, and global features, such as shapes or objects. The outputs from the attention heads are then sent through a feed-forward network and passed on for further processing. This architecture enables ViT to learn the contents of an image and its global context well.
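A minimal sketch of the scaled dot-product self-attention underlying this mechanism, applied to a sequence of patch embeddings, is shown below; the tensor shapes are illustrative.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (num_patches, embed_dim); scores weight every patch against every other patch
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# 196 patches (14 x 14 grid of 16 x 16 patches from a 224 x 224 image), 768-dim embeddings
x = torch.randn(196, 768)
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v = patch embeddings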

Tokenizer: The tokenizer in ViT converts the image into a sequence of tokens that the transformer then processes. First, the image is divided into fixed-size, non-overlapping patches, usually 16x16 pixels. These patches are flattened into one-dimensional vectors and then linearly embedded into a higher-dimensional space using a learned projection. The process transforms the image into a sequence of vectors, each representing a unique part of the image. Positional encodings are added to ensure that the spatial relationships between patches are preserved, so that the transformer can keep track of where each patch was located in the original image. The transformer takes these patches as input, applies self-attention to capture the dependencies among them, and feeds the vectors through a multi-layered transformer. Such a tokenization approach is very different from the pixel-wise processing prevalent in CNNs: it makes it possible to treat an image as a sequence of tokens, just as NLP models treat text.
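As an illustration, a pretrained ViT from the Hugging Face transformers library can be adapted to this eight-class problem roughly as follows; the checkpoint name and the re-headed classifier are assumptions for this sketch, not necessarily the exact configuration used in the experiments.

import torch
from transformers import ViTForImageClassification

# Load a ViT pretrained on ImageNet and re-head it for 8 skin lesion classes.
vit = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=8,
    ignore_mismatched_sizes=True,  # replaces the original 1000-class head
)

inputs = torch.randn(1, 3, 224, 224)          # one preprocessed 224 x 224 RGB image
logits = vit(pixel_values=inputs).logits      # shape (1, 8): one score per lesion class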

3.3.3.4 BEiT (Bidirectional Encoder Image Transformer)

BEiT (Bidirectional Encoder Image Transformer) is a highly powerful model that uses the transformer architecture originally proposed for Natural Language Processing tasks. Contrary to traditional CNNs, which make use of localized filters to capture image features, BEiT exploits the transformer's capacity for capturing contextual relationships and long-range dependencies within an image. The key innovation of BEiT is its pretext task, inspired by the masked language modeling used by BERT, which enriches the understanding of image structure in a bidirectional way. This makes the processing of visual inputs much more efficient and leads to very good performance in image classification tasks.

Architecture: The BEiT architecture follows the transformer encoder structure. First, the image is divided into fixed-size patches, as in ViT. These patches are treated as tokens, given positional embeddings so that spatial information is retained, and then processed together by the transformer encoder. What lets BEiT stand out is its pretraining strategy, a vision-specific version of masked language modeling: unlike BERT, which predicts missing words, BEiT predicts the original patches of an image that have been masked out, forcing the model to learn the global structure of the image. This yields much more general image representations, making the model very good at classification tasks when fine-tuned on specific datasets.

Encoder:

The encoder of BEiT is structured as a transformer encoder, a series of layers consisting of self-attention followed by a feed-forward neural network. The input image is split into patches, and the sequence of patches is embedded by a patch embedding layer that maps flattened patches to a higher-dimensional space, adding positional encodings to preserve spatial relationships. These patch embeddings are then fed into the transformer encoder. The self-attention mechanism in the encoder captures long-range dependencies inside an image and enables the model to learn intricate relationships between different parts of an image, something traditional CNNs struggled with because their architecture emphasizes local dependencies in an image.

Decoder: BEiT lacks the traditional decoder that some transformer models use in sequence-to-sequence tasks such as text generation or translation. The model is primarily used for image classification, with its focus on learning efficient, informative representations of images using the encoder only. However, in the context of its masked image modeling pretraining task, a decoder-like mechanism can be viewed as the component with which the model attempts to reconstruct the original patches of the image from the corrupted input. In the fine-tuning phase, the output from the encoder is typically passed through a classification head, usually a simple MLP, to make predictions about the class of the image.


Attention Network: The attention network is the heart of the transformer encoder in BEiT. BEiT employs a multi-head self-attention mechanism that projects the input representations (image patches) into several attention heads, allowing every head to attend to different parts of the input sequence. This mechanism lets BEiT compute a weighted sum of the input patches according to their relative significance and look up features anywhere across the image irrespective of distance, thereby overcoming a drawback of classical CNNs, in which each pixel has limited knowledge outside the receptive field of a convolutional operation. Following the attention layers, position-wise feed-forward networks add non-linearity, making the transformed features more flexible. Within the encoding layers, layer normalization with residual connections is used to enhance training stability and smooth gradient flow during backpropagation.
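For intuition, the sketch below shows the core of a multi-head self-attention layer of the kind described above; the head count and dimensions are illustrative assumptions, and a production model would normally rely on an existing implementation such as torch.nn.MultiheadAttention.

import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention over a sequence of patch embeddings."""
    def __init__(self, embed_dim=768, num_heads=12):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, embed_dim * 3)   # joint Q, K, V projection
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):                                # x: (B, N, embed_dim)
        B, N, _ = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)             # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)                      # attention weights per head
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.out(out)

x = torch.randn(1, 196, 768)                             # 196 patch tokens
print(MultiHeadSelfAttention()(x).shape)                 # torch.Size([1, 196, 768])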

Tokenization:

A central step in processing the input image within BEiT is splitting the image into non-overlapping patches. Rather than operating directly on a fixed grid of pixels as traditional CNNs do, BEiT treats these patches much like the tokens of text-processing models such as BERT: the tokenizer first splits the image into small patches, which are then flattened and mapped into a high-dimensional space via a linear embedding layer. This set of embeddings serves as the model's input tokens. In the pretraining phase, a proportion of the patches are masked so that the model must learn to predict the original patches from the context provided by the surrounding visible patches. This forces the model to learn rich, contextual representations of images, akin to BERT's contextualized representations of text. Such pretraining ensures that the tokenizer transforms images into the right format for input to the model, which is then fine-tuned for downstream tasks such as image classification.
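As a rough illustration of the masking idea, the sketch below randomly hides a fraction of patch tokens before they enter the encoder; the 40% mask ratio and the zero-valued mask token are illustrative assumptions, not the exact BEiT pretraining recipe.

import torch

def mask_patches(tokens, mask_ratio=0.4, mask_token=None):
    """Randomly replace a fraction of patch tokens with a mask token."""
    B, N, D = tokens.shape
    if mask_token is None:
        mask_token = torch.zeros(D)              # placeholder mask embedding
    num_masked = int(N * mask_ratio)
    masked = tokens.clone()
    mask = torch.zeros(B, N, dtype=torch.bool)
    for b in range(B):
        idx = torch.randperm(N)[:num_masked]     # patches hidden for this image
        masked[b, idx] = mask_token
        mask[b, idx] = True
    return masked, mask                          # mask marks positions to predict

tokens = torch.randn(2, 196, 768)                # patch embeddings from the tokenizer
masked_tokens, mask = mask_patches(tokens)
print(mask.sum(dim=1))                           # number of masked patches per image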


CHAPTER 4

RESULTS AND DISCUSSION

4.1 CONVOLUTIONAL NEURAL NETWORKS (CNN):

Figure 6 CNN Training and Validation Loss

Figure 7 CNN Training and Validation Accuracies

The graphs above show the training and validation loss and accuracy over 20 epochs, indicating how the training progressed. In the loss plot, both training and validation loss decrease steadily, meaning that the model is learning and reducing its errors over time. The validation loss tracks the training loss very closely; hence the model is not overfitting and generalizes well to the validation data.

The accuracy graph likewise shows improvement in both training and validation accuracy over each epoch, with accuracies reaching roughly 60-70% by the end of training. Validation accuracy exceeds training accuracy early on but converges closely with it towards the final epochs, suggesting stability and balance in the model's learning. Such convergence between training and validation accuracy is a good sign of the model's generalization capability, though further training epochs or tuning may improve performance.

4.2 VGG19

Figure 8 VGG-19 Training and Validation Loss

Figure 9 VGG-19 Training and Validation Accuracies

The graphs above plot the training and validation loss and accuracy over 20 epochs, reporting on how the model learned. In the loss graph, both training and validation losses steadily decrease as training progresses, suggesting that the model is successfully learning and reducing error. The final losses are close to each other, implying that the model is not grossly overfitting to the training dataset, since there is no significant gap between the two curves.

In the accuracy graph, training and validation accuracies improve steadily over the epochs, with validation accuracy at times crossing the training accuracy. Together with similar results on both the training and validation sets, this suggests that the model generalizes well without overfitting. After 20 epochs, the model achieved a validation accuracy approaching 0.72. The steady improvement and close alignment between training and validation metrics indicate that the model is genuinely learning, and further gains may be possible by adjusting hyperparameters or training for more epochs.

4.3 ViT

Figure 10 ViT Training and Validation Accuracies

Figure 11 ViT Training and Validation Loss

These plots show the training and validation accuracy and loss over 30 epochs. As can be seen in the accuracy plot, the training accuracy rises steadily to near-perfect values of 95-98%, while the validation accuracy also climbs but oscillates around 90-92%. This means the model performs very well on the training data and reasonably well on the validation data, with mild overfitting since the training accuracy exceeds the validation accuracy.

In the loss plot, the training loss keeps decreasing consistently, so the model is probably learning well, but the validation loss fluctuates without dropping smoothly and eventually levels off with some oscillation. The gap between training and validation loss points to overfitting: the model learns patterns specific to the training data well but does not generalize equally well to unseen data. This pattern indicates that, although overall performance is good, regularization or early stopping would probably improve generalization.

4.4 BEiT (Bidirectional Encoder Image Transformer)

Figure 12 BEiT Model Results

When both training loss and validation loss decrease and both training accuracy and validation accuracy increase, it indicates that the model is learning effectively and generalizing well to new data. When the training loss continues to decrease but the validation loss starts to increase, or when the training accuracy increases significantly while the validation accuracy plateaus or decreases, it can be a sign of overfitting: the model has become too specialized to the training data and may not perform well on new, unseen data.

Figure 13 Classification Report of the BEiT Model

The classification report shows strong model performance on the skin cancer classification task. Precision, recall, and F1-scores are high across most classes, with "VASC" achieving perfect precision and recall (1.00). The "DF" class, however, has a lower recall of 0.69, indicating that 31% of true "DF" cases are missed. Despite this, the model maintains a high overall accuracy of 91%. The macro and weighted averages for precision, recall, and F1-score are all 0.91, reflecting consistent performance. While the model performs well overall, improving "DF" recall could further enhance its effectiveness.
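For reference, a classification report of this kind can be produced from the test predictions with scikit-learn, as in the minimal sketch below; the label names follow the eight classes used in this work, while y_true and y_pred are placeholders for the labels and predictions collected over the actual test loader.

from sklearn.metrics import classification_report

class_names = ["MEL", "SCC", "BCC", "AK", "BKL", "DF", "VASC", "NV"]

# Placeholders: integer class indices gathered over the test set
# (true labels and the argmax of the model's logits).
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 5, 5]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 4, 5]

print(classification_report(y_true, y_pred, target_names=class_names, digits=2))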


Model      Training Accuracy (%)
CNN        78.80
VGG19      73.24
ViT        98.18
BEiT       98.81

Table 1 – Comparison of model training accuracies


The model training accuracies indicate a significant performance gap between traditional CNN-based models and transformer-based models. While the CNN model reached an accuracy of 78.80%, VGG19, another CNN-based model, only reached 73.24%. This suggests that these models were capable of classifying the skin cancer images to some degree but failed to learn complex global features. Conversely, the transformer-based models showed a much more significant improvement: ViT reached 98.18%, and BEiT performed best at 98.81%. The results therefore indicate that transformer-based models, especially BEiT, capture both local and global features of dermoscopic images more effectively.

Given its strong performance during training, BEiT was used for testing. The classification report generated on an independent test set of 957 images shows a good overall performance of 91%, with solid per-class metrics for precision, recall, and F1-score. VASC achieved perfect precision and recall, and the NV, BCC, and SCC classes also scored highly on all of the evaluated metrics. However, some types were problematic: "DF" (Dermatofibroma) had a lower recall of only 0.69, likely due to difficulties in distinguishing its features from other lesion types, and "BKL" (Benign Keratosis-Like Lesions) also showed slightly lower precision and recall, likely because its characteristics overlap with those of other diagnoses. Overall, the BEiT model proved resilient in differentiating among various types of skin cancer, attaining notable accuracy alongside well-balanced precision and recall in the majority of classes.

The transformer-based methodology is very robust in extracting features and holds promise for real-time detection of skin cancer, which would be especially useful in clinical settings. Its strong performance on the test set suggests that the model can help dermatologists make accurate diagnoses of their patients' skin lesions; however, some classes require further improvement for better diagnostic correctness.

UI OUTPUT

Figure 14 UI prediction Output

CHAPTER 6

CONCLUSION AND FUTURE WORK

In the present study, we created and tested a skin cancer classification model based on the BEiT architecture, which outperformed conventional CNN-based models such as the custom CNN and VGG19 in terms of accuracy as well as feature extraction capability. Thanks to the BEiT model's strength in capturing both local and global features, it achieved a training accuracy of 98.81% and a testing accuracy of 91%, showing promise as a diagnostic tool for proficiently classifying skin cancer. The classification report further demonstrated robust performance across most classes, but challenges remain in identifying certain types of skin lesions such as Dermatofibroma (DF) and Benign Keratosis-Like Lesions (BKL).

Even so, further improvements in both accuracy and reliability are still possible. Future research could concentrate on improving the model's effectiveness by developing more advanced data augmentation approaches tailored to dermoscopic images, including color normalization and related operations, as well as the generation of synthetic images with GANs to better represent the sparser categories. Training on a larger and more heterogeneous dataset may also lead to stronger generalization, especially for classes that share visual attributes.

Another avenue for further study is adding multi-scale attention mechanisms to the model, giving it a greater chance to focus on smaller but diagnostically important features of dermoscopic images. In addition, real-time optimization could enable faster and more efficient inference, increasing the model's applicability to real clinical environments. These are the areas in which the model can become more accurate and robust, paving the way for clinicians to obtain an accurate and reliable instrument for the early and correct diagnosis of skin cancer.

Appendices

Pre processing

import pandas as pd

import numpy as np

import time

import random

from tqdm import tqdm

import os

import cv2

import re

import matplotlib.pyplot as plt

from glob import glob

import tensorflow_hub as hub

import tensorflow as tf

import time

from PIL import Image

from scipy.spatial.qhull import QhullError

from scipy import spatial

spatial.QhullError = QhullError

from imgaug import augmenters as iaa

import os
import cv2

from tqdm import tqdm

from albumentations import HorizontalFlip, VerticalFlip

# import imgaug as ia

# import imgaug.augmenters as iaa

from tensorflow.keras.utils import load_img

from tensorflow.keras import layers

from tensorflow.keras.models import Model, load_model

from tensorflow.keras.callbacks import *

from tensorflow.keras.applications import ResNet50

from tensorflow.keras.utils import plot_model

from sklearn.metrics import confusion_matrix

from albumentations import RandomRotate90, GridDistortion, HorizontalFlip, VerticalFlip,

RandomBrightnessContrast, ShiftScaleRotate, Rotate

hp = {}

hp['image_size'] = 512

hp['num_channels'] = 3

hp['batch_size'] = 32

hp['lr'] = 1e-4

hp["num_epochs"] = 30

hp['num_classes'] = 8

hp['dropout_rate'] = 0.1

hp['class_names'] = ["MEL", "SCC", "BCC", "AK", "BKL", "DF", "VASC", "NV"]

#read data

md = pd.read_csv("C://Users//mahid//Downloads//isic-2019//ISIC_2019_Training_GroundTruth.csv")

md.head()

md.shape

# separate Melanoma, Basal cell carcinoma, Squamous cell carcinoma from dataset

mel_images = md.loc[md['MEL'] == 1, 'image'].tolist()

bcc_images = md.loc[md['BCC'] == 1, 'image'].tolist()

scc_images = md.loc[md['SCC'] == 1, 'image'].tolist()

nv_images = md.loc[md['NV'] == 1, 'image'].tolist()

ak_images = md.loc[md['AK'] == 1, 'image'].tolist()

bkl_images = md.loc[md['BKL'] == 1, 'image'].tolist()

vasc_images = md.loc[md['VASC'] == 1, 'image'].tolist()

unk_images = md.loc[md['UNK'] == 1, 'image'].tolist()

df_images = md.loc[md['DF'] == 1, 'image'].tolist()

# length of data

mel_count = len(mel_images)

bcc_count = len(bcc_images)

scc_count = len(scc_images)

nv_count = len(nv_images)

ak_count = len(ak_images)

bkl_count = len(bkl_images)
vasc_count = len(vasc_images)

unk_count = len(unk_images)

df_count = len(df_images)

print(mel_count, bcc_count, scc_count, nv_count, ak_count, bkl_count, vasc_count,

df_count)

labels =["MEL", "SCC", "BCC", "AK", "BKL", "DF", "VASC", "NV"]

sizes = [mel_count, bcc_count, scc_count, ak_count, bkl_count,df_count,vasc_count,

nv_count ]

#plot pie chart

colors = ['#ff9999', '#66b3ff', '#99ff99']

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)

centre_circle = plt.Circle((0,0),0.70,fc='white')

fig = plt.gcf()

fig.gca().add_artist(centre_circle)

plt.axis('equal')

plt.title('Distribution of Images')

MEL = []

SCC = []

BCC = []

NV = []

AK = []

VASC = []

DF = []

BKL = []
path = "C://Users//mahid//Downloads//isic-2019//ISIC_2019_Training_Input//ISIC_2019_Training_Input"

for i in os.listdir(path):

#print(i)

name = i.split('.')[-2]

if name in mel_images:

MEL.append(os.path.join(path, i))

elif name in scc_images:

SCC.append(os.path.join(path, i))

elif name in bcc_images:

BCC.append(os.path.join(path, i))

elif name in nv_images:

NV.append(os.path.join(path, i))

elif name in ak_images:

AK.append(os.path.join(path, i))

elif name in vasc_images:

VASC.append(os.path.join(path, i))

elif name in df_images:

DF.append(os.path.join(path, i))

elif name in bkl_images:

BKL.append(os.path.join(path, i))

len(MEL), len(SCC), len(BCC), len(NV), len(AK), len(VASC), len(DF), len(BKL)

"""# Data Preprocessing


# Hair removing Function

"""

def apply_dullrazor(image_path):

# Read the image

img = cv2.imread(image_path)

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Convert the image to grayscale

gray_scale = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

# Black hat filter

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))

blackhat = cv2.morphologyEx(gray_scale, cv2.MORPH_BLACKHAT, kernel)

# Gaussian filter

bhg = cv2.GaussianBlur(blackhat, (3, 3), cv2.BORDER_DEFAULT)

# Binary thresholding (MASK)

ret, mask = cv2.threshold(bhg, 10, 255, cv2.THRESH_BINARY)

# Replace pixels of the mask

dst = cv2.inpaint(img, mask, 6, cv2.INPAINT_TELEA)

return img, dst

original, processed = apply_dullrazor(MEL[100])


plt.figure(figsize=(15, 6))

plt.subplot(1, 2, 1)

plt.imshow(original)

plt.title('Original Dermoscopy Image')

plt.axis('off')

plt.subplot(1, 2, 2)

plt.imshow(processed)

plt.title('Segmented Image')

plt.axis('off')

original_images = []

processed_images = []

os.makedirs('C:/Users/mahid/Downloads/isic-2019//preproessed')

# len(MEL), len(SCC), len(BCC), len(NV), len(AK), len(VASC), len(DF), len(BKL)

img_list = MEL[90:95] + SCC[90:95] + BCC[90:95] + NV[90:95] + AK[90:95] +

VASC[90:95]+DF[90:95] + BKL[90:95]

len(img_list)

for i, filename in enumerate(img_list):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)

cv2.imwrite("/kaggle/working/preproessed/img_"+str(i)+".png", processed)
# Append to the lists

original_images.append(original)

processed_images.append(processed)

fig, axs = plt.subplots(len(original_images), 2, figsize=(10, 5 * len(original_images)))

for i in range(len(original_images)):

axs[i, 0].imshow(original_images[i])

axs[i, 0].set_title('Original Image'+str([i]))

axs[i, 0].axis('off')

axs[i, 1].imshow(processed_images[i])

axs[i, 1].set_title('Processed Image')

axs[i, 1].axis('off')

"""# Apply Hair removal to all images"""

os.makedirs("C:/Users/mahid/Downloads/isic-2019/DF", exist_ok=True)

import shutil

shutil.rmtree("C:/Users/mahid/Downloads/isic-2019//DF")

os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/MEL", exist_ok=True)

os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/SCC", exist_ok=True)

os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/BCC", exist_ok=True)

# len(MEL), len(SCC), len(BCC), len(NV), len(AK), len(VASC), len(DF), len(BKL)


os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/NV", exist_ok=True)

os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/AK", exist_ok=True)

os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/VASC", exist_ok=True)

os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/DF", exist_ok=True)

os.makedirs("C:/Users/mahid/Downloads/isic-2019//DF/BKL", exist_ok=True)

# os.makedirs("/kaggle/working/DA/BCC", exist_ok=True)

start = time.time()

for i, filename in enumerate(MEL):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)

processed = cv2.resize(processed, (512,512))

cv2.imwrite("C:/Users/mahid/Downloads/isic-2019//DF/MEL/img_"+str(i)+".png",

processed)

print("Time Taken: %f" % (time.time() - start))

start = time.time()

for i, filename in enumerate(SCC):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)

processed = cv2.resize(processed, (512,512))

cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/SCC/img_"+str(i)+".png",
processed)

print("Time Taken: %f" % (time.time() - start))

start = time.time()

for i, filename in enumerate(BCC):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)

processed = cv2.resize(processed, (512,512))

cv2.imwrite("C:/Users/mahid/Downloads/isic-2019//DF/BCC/img_"+str(i)+".png",

processed)

print("Time Taken: %f" % (time.time() - start))

# len(MEL), len(SCC), len(BCC), len(NV), len(AK), len(VASC), len(DF), len(BKL)

start = time.time()

for i, filename in enumerate(NV):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)

processed = cv2.resize(processed, (512,512))

cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/NV/img_"+str(i)+".png",

processed)

print("Time Taken: %f" % (time.time() - start))

start = time.time()
for i, filename in enumerate(AK):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)

processed = cv2.resize(processed, (512,512))

cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/AK/img_"+str(i)+".png",

processed)

print("Time Taken: %f" % (time.time() - start))

start = time.time()

for i, filename in enumerate(VASC):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)

processed = cv2.resize(processed, (512,512))

cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/VASC/img_"+str(i)+".png",

processed)

print("Time Taken: %f" % (time.time() - start))

start = time.time()

for i, filename in enumerate(DF):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)


processed = cv2.resize(processed, (512,512))

cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/DF/img_"+str(i)+".png",

processed)

print("Time Taken: %f" % (time.time() - start))

start = time.time()

for i, filename in enumerate(BKL):

if filename.endswith('.jpg') or filename.endswith('.png'):

# Apply the function

original, processed = apply_dullrazor(filename)

processed = cv2.resize(processed, (512,512))

cv2.imwrite("C:/Users/mahid/Downloads/isic-2019/DF/BKL/img_"+str(i)+".png",

processed)

print("Time Taken: %f" % (time.time() - start))

"""# Data Augmentation"""

import os

import cv2

from tqdm import tqdm

from albumentations import HorizontalFlip, VerticalFlip

def augment_data_bcc(images, save_path, W=224, H=224, augment=True):

save_images = []

os.makedirs(save_path, exist_ok=True) # Ensure the save directory exists


for x in tqdm(images, total=len(images)):

name = x.split("/")[-1].split(".")

image_name = name[0]

image_ext = name[1]

image = cv2.imread(x)

if augment: # Augmentations applied if augment is True

# Apply HorizontalFlip

aug = HorizontalFlip(p=1.0)

augmented = aug(image=image)

x1 = augmented["image"]

# Apply VerticalFlip

aug = VerticalFlip(p=1.0)

augmented = aug(image=image)

x2 = augmented["image"]

aug = Rotate(p=1.0, limit=270)

augemented = aug(image=image)

x3 = augemented["image"]

aug = Rotate(p=1.0, limit=90)

augemented = aug(image=image)

x4 = augemented["image"]

# Collect original and augmented images

save_images = [(image, "original"), (x1, "aug1"), (x2, "aug2"), (x3,"aug3"), (x4,


"aug4")]

# else:

# save_images = [(image, "original")] # Only original image

try:

# Save all augmented images with unique filenames

for img, suffix in save_images:

img_resized = cv2.resize(img, (W, H)) # Resize to (224, 224)

temp_img_name = f"{image_name}_{suffix}.{image_ext}" # Add suffix to

filename

image_path = os.path.join(save_path, temp_img_name)

cv2.imwrite(image_path, img_resized)

except Exception as e:

print(f"Error processing image {image_name}: {e}")

continue

def augment_data_bcc(images, save_path, W=224, H=224, augment=True):

os.makedirs(save_path, exist_ok=True) # Ensure the save directory exists

for x in tqdm(images, total=len(images)):

name = os.path.basename(x).split(".") # Better handling for Windows paths

image_name = name[0]

image_ext = name[1]

image = cv2.imread(x)

if image is None:
print(f"Failed to read image: {x}")

continue # Skip the image if reading fails

save_images = [(image, "original")] # Start with the original image

if augment: # Augmentations applied if augment is True

# Apply HorizontalFlip

aug = HorizontalFlip(p=1.0)

augmented = aug(image=image)

x1 = augmented["image"]

# Apply VerticalFlip

aug = VerticalFlip(p=1.0)

augmented = aug(image=image)

x2 = augmented["image"]

# Collect original and augmented images

save_images.extend([(x1, "aug1"), (x2, "aug2")])

try:

# Save all images (original and augmented) with unique filenames

for img, suffix in save_images:

img_resized = cv2.resize(img, (W, H)) # Resize to (W, H)

temp_img_name = f"{image_name}_{suffix}.{image_ext}" # Add suffix to

filename

image_path = os.path.join(save_path, temp_img_name)

cv2.imwrite(image_path, img_resized)

except Exception as e:
print(f"Error processing image {image_name}: {e}")

continue

# Example usage

augment_data_bcc(BCC, "C:/Users/mahid/Downloads/isic-2019/DAA/BCC")

bcc_l = glob("C:/Users/mahid/Downloads/isic-2019/DAA/BCC/*")

len(bcc_l)

import os

import cv2

from tqdm import tqdm

from albumentations import HorizontalFlip

def augment_data_mel(images, save_path, W=224, H=224, augment=True):

os.makedirs(save_path, exist_ok=True) # Ensure the save directory exists

for x in tqdm(images, total=len(images)):

name = os.path.basename(x).split(".") # Use os.path.basename for better path

handling

image_name = name[0]

image_ext = name[1]

image = cv2.imread(x)

if image is None:

print(f"Failed to read image: {x}")

continue # Skip the image if reading fails


save_images = [(image, "original")] # Start with the original image

if augment: # Apply augmentations if augment is True

# Apply Horizontal Flip

aug = HorizontalFlip(p=1.0)

augmented = aug(image=image)

x1 = augmented["image"]

# Collect original and augmented images

save_images.append((x1, "aug1"))

try:

# Save all images (original and augmented) with unique filenames

for img, suffix in save_images:

img_resized = cv2.resize(img, (W, H)) # Resize to (W, H)

temp_img_name = f"{image_name}_{suffix}.{image_ext}" # Add suffix to

filename

image_path = os.path.join(save_path, temp_img_name)

cv2.imwrite(image_path, img_resized) # Save the image

except Exception as e:

print(f"Error processing image {image_name}: {e}")

continue

# Example usage

augment_data_mel(MEL, "C:/Users/mahid/Downloads/isic-2019/DA/MEL")

mel_l = glob("C:/Users/mahid/Downloads/isic-2019/DA/MEL/*")
len(mel_l)

import os

import cv2

from tqdm import tqdm

from albumentations import HorizontalFlip, VerticalFlip

def augment_data_bcc(images, save_path, W=224, H=224, augment=True):

os.makedirs(save_path, exist_ok=True) # Ensure the save directory exists

for x in tqdm(images, total=len(images)):

name = os.path.basename(x).split(".") # Better handling for Windows paths

image_name = name[0]

image_ext = name[1]

image = cv2.imread(x)

if image is None:

print(f"Failed to read image: {x}")

continue # Skip the image if reading fails

save_images = [(image, "original")] # Start with the original image

if augment: # Augmentations applied if augment is True

# Apply HorizontalFlip

aug = HorizontalFlip(p=1.0)

augmented = aug(image=image)

x1 = augmented["image"]
# Apply VerticalFlip

aug = VerticalFlip(p=1.0)

augmented = aug(image=image)

x2 = augmented["image"]

# aug = Rotate(p=1.0, limit=270)

# augemented = aug(image=image)

# x3 = augemented["image"]

# aug = Rotate(p=1.0, limit=90)

# augemented = aug(image=image)

# x4 = augemented["image"]

# Collect original and augmented images

save_images.extend([(x1, "aug1"), (x2, "aug2")]) # , (x3, "aug3"), (x4, "aug4")

try:

# Save all images (original and augmented) with unique filenames

for img, suffix in save_images:

img_resized = cv2.resize(img, (W, H)) # Resize to (W, H)

temp_img_name = f"{image_name}_{suffix}.{image_ext}" # Add suffix to

filename

image_path = os.path.join(save_path, temp_img_name)

cv2.imwrite(image_path, img_resized)

except Exception as e:

print(f"Error processing image {image_name}: {e}")

continue
# Example usage

#augment_data_bcc(AK, "C:/Users/mahid/Downloads/isic-2019/DAA/AK")

ak_l = glob("C:/Users/mahid/Downloads/isic-2019/DAA/AK/*")

len(ak_l)

augment_data_bcc(SCC, "C:/Users/mahid/Downloads/isic-2019/DAA/SCC")

augment_data_bcc(DF, "C:/Users/mahid/Downloads/isic-2019/DAA/DF")

augment_data_bcc(VASC, "C:/Users/mahid/Downloads/isic-2019/DAA/VASC")

augment_data_bcc(NV, "C:/Users/mahid/Downloads/isic-2019/DAA/NV")

augment_data_bcc(BKL, "C:/Users/mahid/Downloads/isic-2019/DAA/BKL")

"""## After Augmentation

## Balanced Data

"""

import os

import shutil

def move_images_from_folders(src_folder, dest_folder, image_limit=1195):

# Get all subfolders from the source folder

subfolders = [folder for folder in os.listdir(src_folder) if


os.path.isdir(os.path.join(src_folder, folder))]

for folder in subfolders:

# Create a corresponding subfolder in the destination folder

new_dest_folder = os.path.join(dest_folder, folder)

os.makedirs(new_dest_folder, exist_ok=True)

# Get all images in the current subfolder

folder_path = os.path.join(src_folder, folder)

images = [os.path.join(folder_path, img) for img in os.listdir(folder_path) if

os.path.isfile(os.path.join(folder_path, img))]

# Sort images to ensure consistent ordering

images.sort()

# Take only the first `image_limit` images

selected_images = images[:image_limit]

for img_path in selected_images:

# Extract image file name

img_name = os.path.basename(img_path)

# Define the new path in the destination subfolder

new_img_path = os.path.join(new_dest_folder, img_name)

try:

# Copy the image to the new destination subfolder (use `shutil.move` to move

instead of copy)
shutil.copy(img_path, new_img_path)

print(f"Copied {img_name} to {new_dest_folder}")

except Exception as e:

print(f"Failed to copy {img_name}: {e}")

# Example usage

src_folder = "C:/Users/mahid/Downloads/isic-2019/DAA"

dest_folder = "C:/Users/mahid/Downloads/selected_images_new"

move_images_from_folders(src_folder, dest_folder)

ak = glob("C:/Users/mahid/Downloads/selected_images_new/AK/*")

mel = glob("C:/Users/mahid/Downloads/selected_images_new/MEL/*")

bcc = glob("C:/Users/mahid/Downloads/selected_images_new/BCC/*")

scc = glob("C:/Users/mahid/Downloads/selected_images_new/SCC/*")

nv = glob("C:/Users/mahid/Downloads/selected_images_new/NV/*")

bkl = glob("C:/Users/mahid/Downloads/selected_images_new/BKL/*")

df = glob("C:/Users/mahid/Downloads/selected_images_new/DF/*")

vasc =glob("C:/Users/mahid/Downloads/selected_images_new/VASC/*")

new_mel_count = len(mel)

new_bcc_count = len(bcc)

new_scc_count = len(scc)

new_nv_count = len(nv)

new_ak_count = len(ak)

new_bkl_count = len(bkl)

new_df_count = len(df)

new_vasc_count = len(vasc)
labels = ["MEL", "BCC", "SCC", "AK", "BKL", "DF", "VASC", "NV" ]

sizes = [new_mel_count, new_bcc_count, new_scc_count, new_ak_count,new_bkl_count,

new_df_count, new_vasc_count,new_nv_count]

# Colors for the pie chart (you can expand the color list if needed)

colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#c2c2f0', '#ffb3e6', '#ff6666', '#c4e17f']

# Plot the pie chart

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)

# Adding a central circle for a donut chart effect

centre_circle = plt.Circle((0, 0), 0.70, fc='white')

fig = plt.gcf()

fig.gca().add_artist(centre_circle)

# Equal aspect ratio ensures that pie is drawn as a circle.

plt.axis('equal')

plt.title('Distribution of Images across 8 Classes')

plt.show()

CNN

# Install gdown

# !pip install gdown

# Download the file from Google Drive

# https://drive.google.com/file/d/1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz/view?usp=sharing
# !gdown --id 1_1vTv57QQcDCrbV6bzwpgTUZhd4y52ke

!gdown --id 1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz

# Extract the tar file

import tarfile

# Open the tar file

with tarfile.open('Augumented_images_new.tar', 'r') as tar:

tar.extractall() # Extract to the current working directory

# List the extracted files to verify

import os

# Check the contents of the current directory

print(os.listdir())

import os

import cv2

import numpy as np

import torch

import torchvision.transforms as transforms

from torch.utils.data import Dataset, DataLoader

from sklearn.model_selection import train_test_split

from PIL import Image

import torch.nn as nn

import torch.optim as optim

import matplotlib.pyplot as plt

import torch.nn.functional as F
# Function to remove hair from images

def apply_dullrazor(image_path):

img = cv2.imread(image_path)

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

gray_scale = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))

blackhat = cv2.morphologyEx(gray_scale, cv2.MORPH_BLACKHAT, kernel)

bhg = cv2.GaussianBlur(blackhat, (3, 3), cv2.BORDER_DEFAULT)

ret, mask = cv2.threshold(bhg, 10, 255, cv2.THRESH_BINARY)

dst = cv2.inpaint(img, mask, 6, cv2.INPAINT_TELEA)

return dst

# Custom dataset to apply dullrazor and prepare data

class SkinCancerDataset(Dataset):

def __init__(self, root_dir, transform=None):

self.root_dir = root_dir

self.transform = transform

self.image_paths = []

self.labels = []

# Loop through subfolders and get image paths and labels

for class_idx, class_folder in enumerate(os.listdir(root_dir)):

class_path = os.path.join(root_dir, class_folder)

if os.path.isdir(class_path):

for image_name in os.listdir(class_path):

image_path = os.path.join(class_path, image_name)

self.image_paths.append(image_path)
self.labels.append(class_idx)

def __len__(self):

return len(self.image_paths)

def __getitem__(self, idx):

image_path = self.image_paths[idx]

label = self.labels[idx]

image = apply_dullrazor(image_path) # Apply dullrazor function

image = Image.fromarray(image)

if self.transform:

image = self.transform(image)

return image, label

# Define transforms (no color-changing augmentations)

transform = transforms.Compose([

transforms.Resize((224, 224)),

transforms.RandomHorizontalFlip(),

transforms.RandomVerticalFlip(),

transforms.RandomRotation(20),

transforms.ToTensor()

])

# Load dataset and split

dataset = SkinCancerDataset(root_dir="/kaggle/working/Augumented_images_new",

transform=transform)
train_data, test_data = train_test_split(dataset, test_size=0.2, random_state=42)

train_data, val_data = train_test_split(train_data, test_size=0.2, random_state=42)

train_loader = DataLoader(train_data, batch_size=32, shuffle=True)

val_loader = DataLoader(val_data, batch_size=32, shuffle=False)

test_loader = DataLoader(test_data, batch_size=32, shuffle=False)

class CustomCNN(nn.Module):

def __init__(self, num_classes):

super(CustomCNN, self).__init__()

self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)

self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)

self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)

self.conv4 = nn.Conv2d(256, 512, kernel_size=3, padding=1)

self.fc1 = nn.Linear(512 * 14 * 14, 1024)

self.fc2 = nn.Linear(1024, num_classes)

self.dropout = nn.Dropout(0.5)

self.pool = nn.MaxPool2d(2, 2)

def forward(self, x):

x = self.pool(F.relu(self.conv1(x)))

x = self.pool(F.relu(self.conv2(x)))

x = self.pool(F.relu(self.conv3(x)))

x = self.pool(F.relu(self.conv4(x)))

x = x.view(-1, 512 * 14 * 14)

x = F.relu(self.fc1(x))

x = self.dropout(x)

x = self.fc2(x)
return x

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = CustomCNN(num_classes=8) # 8 classes in the dataset

model = nn.DataParallel(model) # Wrap the model for multiple GPUs

model = model.to(device)

# Define the loss function and optimizer with L2 regularization

criterion = nn.CrossEntropyLoss() # Cross-entropy loss for classification

optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4) # L2

regularization

# Function to train and validate the model using GPUs

def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=20):

train_acc_history = []

val_acc_history = []

train_loss_history = []

val_loss_history = []

for epoch in range(num_epochs):

model.train()

running_loss = 0.0

correct = 0

total = 0

for inputs, labels in train_loader:

inputs, labels = inputs.to(device), labels.to(device) # Move data to GPU


optimizer.zero_grad()

outputs = model(inputs)

loss = criterion(outputs, labels)

loss.backward()

optimizer.step()

running_loss += loss.item()

_, predicted = outputs.max(1)

total += labels.size(0)

correct += predicted.eq(labels).sum().item()

train_acc = 100.0 * correct / total

train_loss = running_loss / len(train_loader)

train_acc_history.append(train_acc)

train_loss_history.append(train_loss)

# Validation step

model.eval()

val_loss = 0.0

correct = 0

total = 0

with torch.no_grad():

for inputs, labels in val_loader:

inputs, labels = inputs.to(device), labels.to(device) # Move data to GPU

outputs = model(inputs)

loss = criterion(outputs, labels)

val_loss += loss.item()
_, predicted = outputs.max(1)

total += labels.size(0)

correct += predicted.eq(labels).sum().item()

val_acc = 100.0 * correct / total

val_loss = val_loss / len(val_loader)

val_acc_history.append(val_acc)

val_loss_history.append(val_loss)

print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc:

{train_acc:.2f}, "

f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}")

return train_acc_history, val_acc_history, train_loss_history, val_loss_history

# Train the model

train_acc, val_acc, train_loss, val_loss = train_model(model, train_loader, val_loader,

criterion, optimizer, num_epochs=20)

# Save the model

torch.save(model.state_dict(), "skin_cancer_model.pth")

import matplotlib.pyplot as plt

# Provided training and validation loss and accuracy data

train_loss_history = [2.0301, 1.7410, 1.6451, 1.5336, 1.4192, 1.3667, 1.3115, 1.2734,

1.2153, 1.1801, 1.1374, 1.1247, 1.0372, 1.0094, 0.9296, 0.9064, 0.8544, 0.7971, 0.7540,

0.7304]
val_loss_history = [1.7590, 1.6307, 1.5635, 1.4442, 1.3952, 1.3771, 1.3163, 1.2706,

1.2710, 1.2359, 1.2415, 1.2013, 1.2172, 1.1425, 1.1483, 1.1416, 1.1184, 1.1350, 1.2033,

1.0989]

train_acc_history = [18.01, 30.27, 35.24, 41.55, 44.88, 47.07, 49.10, 50.78, 53.48, 54.32,

56.18, 56.78, 60.26, 60.53, 64.40, 65.76, 67.67, 68.94, 71.09, 72.31]

val_acc_history = [30.92, 35.62, 38.76, 45.62, 47.52, 48.24, 49.35, 50.00, 50.52, 52.22,

53.14, 55.75, 54.51, 54.97, 58.10, 59.08, 58.95, 58.82, 59.35, 61.37]

# Plot loss

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)

plt.plot(train_loss_history, label='Train Loss')

plt.plot(val_loss_history, label='Validation Loss')

plt.xlabel("Epochs")

plt.ylabel("Loss")

plt.title("Train and Validation Loss")

plt.legend()

# Plot accuracy

plt.subplot(1, 2, 2)

plt.plot(train_acc_history, label='Train Accuracy')

plt.plot(val_acc_history, label='Validation Accuracy')

plt.xlabel("Epochs")

plt.ylabel("Accuracy (%)")

plt.title("Train and Validation Accuracy")

plt.legend()
plt.tight_layout()

plt.show()

import torch

from sklearn.metrics import roc_auc_score

import torch.nn.functional as F

# Load the trained model

model = CustomCNN(num_classes=8) # must match the 8 classes used during training

model.load_state_dict(torch.load("/kaggle/working/skin_cancer_model.pth")) # Load the

saved model

model = model.to(device)

model.eval() # Set model to evaluation mode

# Define loss function (same as during training)

criterion = nn.CrossEntropyLoss()

def test_model(model, test_loader):

test_loss = 0.0

correct = 0

total = 0

all_labels = []

all_preds = []

# Testing loop

with torch.no_grad():

for inputs, labels in test_loader:


inputs, labels = inputs.to(device), labels.to(device)

outputs = model(inputs)

loss = criterion(outputs, labels)

test_loss += loss.item()

_, predicted = outputs.max(1)

total += labels.size(0)

correct += predicted.eq(labels).sum().item()

# Collect predictions and true labels for AUC

all_labels.extend(labels.cpu().numpy())

all_preds.extend(torch.softmax(outputs, dim=1).cpu().numpy()) # full class-probability rows for multi-class AUC

test_loss = test_loss / len(test_loader)

test_acc = 100.0 * correct / total

test_auc = roc_auc_score(all_labels, all_preds, multi_class='ovr') # AUC for multi-class

classification

print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.2f}%, Test AUC:

{test_auc:.2f}")

return test_loss, test_acc, test_auc

# Create test DataLoader (same as train and val loaders)

test_loader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=False)

# Run the test

test_loss, test_acc, test_auc = test_model(model, test_loader)


VGG – 19

# Install gdown

!pip install gdown

# Download the file from Google Drive

# https://drive.google.com/file/d/1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz/view?usp=sharing

# !gdown --id 1_1vTv57QQcDCrbV6bzwpgTUZhd4y52ke

!gdown --id 1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz

# Extract the tar file

import tarfile

# Open the tar file

with tarfile.open('Augumented_images_new.tar', 'r') as tar:

tar.extractall() # Extract to the current working directory

# List the extracted files to verify

import os

# Check the contents of the current directory

print(os.listdir())

import os

import cv2

import numpy as np

import torch

import torch.nn as nn
import torch.optim as optim

import torch.nn.functional as F

from torchvision import datasets, models, transforms

from torch.utils.data import DataLoader, random_split

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

# Hair removal function

def apply_dullrazor(image_path):

img = cv2.imread(image_path)

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Convert the image to grayscale

gray_scale = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

# Black hat filter

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))

blackhat = cv2.morphologyEx(gray_scale, cv2.MORPH_BLACKHAT, kernel)

# Gaussian filter

bhg = cv2.GaussianBlur(blackhat, (3, 3), cv2.BORDER_DEFAULT)

# Binary thresholding (MASK)

ret, mask = cv2.threshold(bhg, 10, 255, cv2.THRESH_BINARY)

# Replace pixels of the mask

dst = cv2.inpaint(img, mask, 6, cv2.INPAINT_TELEA)


return img, dst

# Custom dataset with preprocessing

class SkinCancerDataset(torch.utils.data.Dataset):

def __init__(self, image_dir, transform=None):

self.image_dir = image_dir

self.classes = os.listdir(image_dir) # Assuming subfolder names are class names

self.filepaths = []

self.labels = []

self.transform = transform

for idx, class_name in enumerate(self.classes):

class_folder = os.path.join(image_dir, class_name)

for image_name in os.listdir(class_folder):

image_path = os.path.join(class_folder, image_name)

self.filepaths.append(image_path)

self.labels.append(idx) # Class index as label

def __len__(self):

return len(self.filepaths)

def __getitem__(self, idx):

image_path = self.filepaths[idx]

img, hair_removed_img = apply_dullrazor(image_path)

# Convert to tensor

if self.transform:

img = self.transform(hair_removed_img)
label = self.labels[idx]

return img, label

# Data augmentation without changing color

data_transforms = transforms.Compose([

transforms.ToPILImage(),

transforms.Resize((224, 224)),

transforms.RandomHorizontalFlip(),

transforms.RandomVerticalFlip(),

transforms.RandomRotation(15),

transforms.ToTensor(),

transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

])

# Load dataset

image_dir = '/kaggle/working/Augumented_images_new'

dataset = SkinCancerDataset(image_dir, transform=data_transforms)

# Train, validation, test split

train_size = int(0.7 * len(dataset))

val_size = int(0.15 * len(dataset))

test_size = len(dataset) - train_size - val_size

train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size,

test_size])

# Data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# VGG19 model with dropout and regularization

class VGG19Modified(nn.Module):

def __init__(self, num_classes):

super(VGG19Modified, self).__init__()

self.vgg = models.vgg19(pretrained=True)

# Freeze feature extraction layers

for param in self.vgg.features.parameters():

param.requires_grad = False

# Modify the classifier

self.vgg.classifier = nn.Sequential(

nn.Linear(25088, 4096),

nn.ReLU(),

nn.Dropout(0.5),

nn.Linear(4096, 4096),

nn.ReLU(),

nn.Dropout(0.5),

nn.Linear(4096, num_classes)
) # close the nn.Sequential classifier

def forward(self, x):

x = self.vgg(x)

return x
# Instantiate model, loss function, and optimizer

num_classes = len(os.listdir(image_dir)) # Assuming folder names are class labels

model = VGG19Modified(num_classes=num_classes)

# Use GPU if available

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.DataParallel(model)

model = model.to(device)

# Loss and optimizer

criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam(model.parameters(), lr=0.0001, weight_decay=0.001) # Added

weight_decay for L2 regularization

# Training and validation function

def train_model(model, criterion, optimizer, train_loader, val_loader, num_epochs=20):

train_loss = []

val_loss = []

train_acc = []

val_acc = []

for epoch in range(num_epochs):

model.train()

running_loss = 0.0

correct = 0

total = 0
for images, labels in train_loader:

images, labels = images.to(device), labels.to(device)

optimizer.zero_grad()

outputs = model(images)

loss = criterion(outputs, labels)

loss.backward()

optimizer.step()

running_loss += loss.item()

_, predicted = torch.max(outputs, 1)

total += labels.size(0)

correct += (predicted == labels).sum().item()

epoch_loss = running_loss / len(train_loader)

epoch_acc = correct / total

train_loss.append(epoch_loss)

train_acc.append(epoch_acc)

# Validation

model.eval()

val_running_loss = 0.0

val_correct = 0

val_total = 0

with torch.no_grad():

for images, labels in val_loader:

images, labels = images.to(device), labels.to(device)


outputs = model(images)

loss = criterion(outputs, labels)

val_running_loss += loss.item()

_, predicted = torch.max(outputs, 1)

val_total += labels.size(0)

val_correct += (predicted == labels).sum().item()

val_epoch_loss = val_running_loss / len(val_loader)

val_epoch_acc = val_correct / val_total

val_loss.append(val_epoch_loss)

val_acc.append(val_epoch_acc)

print(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {epoch_loss:.4f}, Train Acc:

{epoch_acc:.4f}, Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_acc:.4f}")

return train_loss, train_acc, val_loss, val_acc

# Training the model

num_epochs = 20

train_loss, train_acc, val_loss, val_acc = train_model(model, criterion, optimizer,

train_loader, val_loader, num_epochs)

# Plot accuracy and loss

def plot_metrics(train_loss, val_loss, train_acc, val_acc):

epochs = range(1, len(train_loss) + 1)

plt.figure(figsize=(14, 5))
# Loss plot

plt.subplot(1, 2, 1)

plt.plot(epochs, train_loss, 'b', label='Train Loss')

plt.plot(epochs, val_loss, 'r', label='Val Loss')

plt.title('Loss')

plt.xlabel('Epochs')

plt.ylabel('Loss')

plt.legend()

# Accuracy plot

plt.subplot(1, 2, 2)

plt.plot(epochs, train_acc, 'b', label='Train Acc')

plt.plot(epochs, val_acc, 'r', label='Val Acc')

plt.title('Accuracy')

plt.xlabel('Epochs')

plt.ylabel('Accuracy')

plt.legend()

plt.show()

plot_metrics(train_loss, val_loss, train_acc, val_acc)

# Testing the model

model.eval()

test_correct = 0

test_total = 0
with torch.no_grad():

for images, labels in test_loader:

images, labels = images.to(device), labels.to(device)

outputs = model(images)

_, predicted = torch.max(outputs, 1)

test_total += labels.size(0)

test_correct += (predicted == labels).sum().item()

test_acc = test_correct / test_total

print(f'Test Accuracy: {test_acc:.4f}')

# Save the model's state_dict (weights only)

torch.save(model.state_dict(), "skin_cancer_classification_VGG19_20epoch_FT.pth")

VIT MODEL

import numpy as np # linear algebra

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory

# For example, running this (by clicking run or pressing Shift+Enter) will list all files under

the input directory

import os

for dirname, _, filenames in os.walk('/kaggle/input'):


for filename in filenames:

print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets

preserved as output when you create a version using "Save & Run All"

# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of

the current session

# Install gdown

!pip install gdown

# Download the file from Google Drive

# https://drive.google.com/file/d/1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz/view?usp=sharing

# !gdown --id 1_1vTv57QQcDCrbV6bzwpgTUZhd4y52ke

!gdown --id 1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz

# Extract the tar file

import tarfile

# Open the tar file

with tarfile.open('Augumented_images_new.tar', 'r') as tar:

tar.extractall() # Extract to the current working directory

# List the extracted files to verify

import os

# Check the contents of the current directory


print(os.listdir())

import os

import cv2

import torch

import torch.nn as nn

from torch.optim import AdamW

from torchvision import transforms

from torch.utils.data import DataLoader, random_split

from transformers import ViTForImageClassification, ViTConfig

import matplotlib.pyplot as plt

# Apply DullRazor to all images

def apply_dullrazor(image_path):

img = cv2.imread(image_path)

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Convert the image to grayscale

gray_scale = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

# Black hat filter

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))

blackhat = cv2.morphologyEx(gray_scale, cv2.MORPH_BLACKHAT, kernel)

# Gaussian filter

bhg = cv2.GaussianBlur(blackhat, (3, 3), cv2.BORDER_DEFAULT)

# Binary thresholding (MASK)


ret, mask = cv2.threshold(bhg, 10, 255, cv2.THRESH_BINARY)

# Replace pixels of the mask

dst = cv2.inpaint(img, mask, 6, cv2.INPAINT_TELEA)

return dst

# Dataset directory

data_dir = '/kaggle/working/Augumented_images_new'

# Preprocessing transformation (Resize, Normalize)

preprocess = transforms.Compose([

transforms.ToPILImage(),

transforms.Resize((384, 384)), # ViT input size

transforms.ToTensor(),

transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

])

# Custom Dataset Class for loading and preprocessing the images

class SkinCancerDataset(torch.utils.data.Dataset):

def __init__(self, root_dir, transform=None):

self.root_dir = root_dir

self.transform = transform

self.image_paths = []

self.labels = []

self.classes = os.listdir(root_dir)

for label, class_name in enumerate(self.classes):

class_dir = os.path.join(root_dir, class_name)


for img_name in os.listdir(class_dir):

img_path = os.path.join(class_dir, img_name)

self.image_paths.append(img_path)

self.labels.append(label)

def __len__(self):

return len(self.image_paths)

def __getitem__(self, idx):

img_path = self.image_paths[idx]

label = self.labels[idx]

img = apply_dullrazor(img_path) # Apply hair removal

if self.transform:

img = self.transform(img)

return img, label

# Load Dataset and Create Train/Validation/Test splits

dataset = SkinCancerDataset(data_dir, transform=preprocess)

train_size = int(0.7 * len(dataset))

val_size = int(0.15 * len(dataset))

test_size = len(dataset) - train_size - val_size

train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size,

test_size])

# Dataloaders

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)


test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

# ViT Model Configuration with dropout

config = ViTConfig.from_pretrained('google/vit-base-patch16-384',

num_labels=len(dataset.classes),

hidden_dropout_prob=0.3, # Dropout in hidden layers

attention_probs_dropout_prob=0.3) # Dropout in attention layers

# Load Pretrained ViT model with dropout

model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-384',

config=config, ignore_mismatched_sizes=True)

model = nn.DataParallel(model)

model.to('cuda' if torch.cuda.is_available() else 'cpu')

# Loss and Optimizer with L2 regularization

criterion = nn.CrossEntropyLoss()

optimizer = AdamW(model.parameters(), lr=0.0001, weight_decay=1e-4) # AdamW with

weight decay for L2 regularization

# Training and Validation Function

def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=20):

train_acc_history, val_acc_history = [], []

train_loss_history, val_loss_history = [], []

for epoch in range(num_epochs):

model.train()

train_loss, correct_train = 0.0, 0

total_train = 0
for images, labels in train_loader:

images, labels = images.to('cuda'), labels.to('cuda')

optimizer.zero_grad()

outputs = model(images).logits

loss = criterion(outputs, labels)

loss.backward()

optimizer.step()

train_loss += loss.item()

_, predicted = torch.max(outputs, 1)

correct_train += (predicted == labels).sum().item()

total_train += labels.size(0)

train_acc = correct_train / total_train

train_acc_history.append(train_acc)

train_loss_history.append(train_loss / len(train_loader))

model.eval()

val_loss, correct_val = 0.0, 0

total_val = 0

with torch.no_grad():

for images, labels in val_loader:

images, labels = images.to('cuda'), labels.to('cuda')

outputs = model(images).logits

loss = criterion(outputs, labels)

val_loss += loss.item()
_, predicted = torch.max(outputs, 1)

correct_val += (predicted == labels).sum().item()

total_val += labels.size(0)

val_acc = correct_val / total_val

val_acc_history.append(val_acc)

val_loss_history.append(val_loss / len(val_loader))

print(f'Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss / len(train_loader):.4f}, '

f'Train Acc: {train_acc:.4f}, Val Loss: {val_loss / len(val_loader):.4f}, Val Acc:

{val_acc:.4f}')

return train_acc_history, val_acc_history, train_loss_history, val_loss_history

# Train the model

train_acc, val_acc, train_loss, val_loss = train_model(model, train_loader, val_loader,

criterion, optimizer, num_epochs=30)

# Plot accuracy and loss

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)

plt.plot(train_acc, label='Train Accuracy')

plt.plot(val_acc, label='Validation Accuracy')

plt.xlabel('Epochs')

plt.ylabel('Accuracy')

plt.legend()

plt.subplot(1, 2, 2)
plt.plot(train_loss, label='Train Loss')

plt.plot(val_loss, label='Validation Loss')

plt.xlabel('Epochs')

plt.ylabel('Loss')

plt.legend()

plt.show()

# Save the model's state_dict (weights only)

torch.save(model.state_dict(), "skin_cancer_classification_VIT_20epoch_FT.pth")

BEIT MODEL

# Install gdown

!pip install gdown

# Download the file from Google Drive

# https://drive.google.com/file/d/1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz/view?usp=sharing

# !gdown --id 1_1vTv57QQcDCrbV6bzwpgTUZhd4y52ke

!gdown --id 1oEtO0YA9nHr0H2unX4QR-2bHNX9b-FGz

# Extract the tar file

import tarfile
# Open the tar file

with tarfile.open('Augumented_images_new.tar', 'r') as tar:

tar.extractall() # Extract to the current working directory

# List the extracted files to verify

import os

# Check the contents of the current directory

print(os.listdir())

import os

import cv2

import torch

import random

import numpy as np

import matplotlib.pyplot as plt

from torchvision import transforms

from PIL import Image

from sklearn.model_selection import train_test_split

from torch.utils.data import DataLoader, Dataset

from transformers import BeitForImageClassification, BeitFeatureExtractor, BeitConfig,

BeitModel

import torch.nn as nn

import torch.optim as optim

# Function to apply dull razor effect

def apply_dullrazor(image_path):
img = cv2.imread(image_path)

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

gray_scale = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))

blackhat = cv2.morphologyEx(gray_scale, cv2.MORPH_BLACKHAT, kernel)

bhg = cv2.GaussianBlur(blackhat, (3, 3), cv2.BORDER_DEFAULT)

ret, mask = cv2.threshold(bhg, 10, 255, cv2.THRESH_BINARY)

dst = cv2.inpaint(img, mask, 6, cv2.INPAINT_TELEA)

return img, dst

import os

import cv2

import torch

import random

import numpy as np

import matplotlib.pyplot as plt

from torchvision import transforms

from PIL import Image

from sklearn.model_selection import train_test_split

from torch.utils.data import DataLoader, Dataset

from transformers import BeitForImageClassification, BeitFeatureExtractor

from IPython.display import clear_output

import torch.nn as nn

import torch.optim as optim

# Custom dataset class

class SkinCancerDataset(Dataset):

def __init__(self, img_paths, labels, transform=None):


self.img_paths = img_paths

self.labels = labels

self.transform = transform

def __len__(self):

return len(self.img_paths)

def __getitem__(self, idx):

img_path = self.img_paths[idx]

label = self.labels[idx]

# Apply hair removal

_, processed_img = apply_dullrazor(img_path)

pil_img = Image.fromarray(processed_img)

if self.transform:

pil_img = self.transform(pil_img)

return pil_img, label

import os

import pandas as pd

# Define the directory containing the subfolders

directory = '/kaggle/working/Augumented_images_new'

# Initialize lists to hold image names and subfolder names

image_names = []
subfolder_names = []

# Loop through each subfolder and collect image names

for subfolder in os.listdir(directory):

subfolder_path = os.path.join(directory, subfolder)

# Check if it's a directory (subfolder)

if os.path.isdir(subfolder_path):

for image_name in os.listdir(subfolder_path):

image_names.append(image_name)

subfolder_names.append(subfolder)

# Create a DataFrame and save it to a CSV file

df = pd.DataFrame({'Image Name': image_names, 'Subfolder Name': subfolder_names})

df.to_csv('/kaggle/working/image_folder_mapping.csv', index=False)

print("CSV file created successfully!")

# Prepare your dataset path

dataset_path = "/kaggle/working/Augumented_images_new"  # Update with your actual path

# Define class names and corresponding indices

class_names = ['NV', 'VASC', 'MEL', 'AK', 'BCC', 'DF', 'SCC', 'BKL']

class_to_idx = {name: idx for idx, name in enumerate(class_names)}

idx_to_class = {idx: name for idx, name in enumerate(class_names)}

# Prepare data with class names


img_paths = []

labels = []

# Populate img_paths and labels using the class-to-index mapping

for class_name in class_names:

class_folder = os.path.join(dataset_path, class_name)

if os.path.isdir(class_folder):

for img_name in os.listdir(class_folder):

img_paths.append(os.path.join(class_folder, img_name))

labels.append(class_to_idx[class_name]) # Use the mapped numeric label

# Rest of the code remains the same

from torch.utils.data import DataLoader, Dataset

from sklearn.model_selection import train_test_split

# Train-test-validation split

X_train, X_temp, y_train, y_temp = train_test_split(img_paths, labels, test_size=0.2,
                                                    random_state=42, stratify=labels)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5,
                                                random_state=42, stratify=y_temp)

# Transformations

transform = transforms.Compose([

transforms.Resize((244, 244)),  # Resize to 244x244 (note: microsoft/beit-base-patch16-224 was pretrained on 224x224 inputs)

transforms.ToTensor(),

])

# Custom dataset class with dull razor effect


class SkinCancerDataset(Dataset):

def __init__(self, img_paths, labels, transform=None):

self.img_paths = img_paths

self.labels = labels

self.transform = transform

def __len__(self):

return len(self.img_paths)

def __getitem__(self, idx):

img_path = self.img_paths[idx]

label = self.labels[idx]

# Apply hair removal

_, processed_img = apply_dullrazor(img_path)

pil_img = Image.fromarray(processed_img)

if self.transform:

pil_img = self.transform(pil_img)

return pil_img, label

# Create datasets

train_dataset = SkinCancerDataset(X_train, y_train, transform)

val_dataset = SkinCancerDataset(X_val, y_val, transform)

test_dataset = SkinCancerDataset(X_test, y_test, transform)

# Create data loaders


train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

import torch

import torch.nn as nn

import torch.optim as optim

from transformers import BeitForImageClassification, BeitConfig

# Set device

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the configuration for the BEiT model with modified dropout rates

config = BeitConfig.from_pretrained("microsoft/beit-base-patch16-224")

config.hidden_dropout_prob = 0.1 # Dropout in hidden layers

config.attention_probs_dropout_prob = 0.1 # Dropout in attention layers

# Initialize model with modified configuration

model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224",

config=config)

# Add additional dropout layer before classification head

class ModifiedBeitModel(nn.Module):
    def __init__(self, base_model, dropout_prob=0.5):
        super(ModifiedBeitModel, self).__init__()
        self.base_model = base_model
        self.dropout = nn.Dropout(dropout_prob)  # Additional dropout before the classification output
        self.classifier = base_model.classifier  # Keep a handle to the original classification head

    def forward(self, x):
        # The base model already applies its classification head and returns logits
        x = self.base_model(x).logits
        # Apply the extra dropout to the logits (acts as regularisation during training)
        x = self.dropout(x)
        return x

# Wrap the base model with the additional dropout layer

model = ModifiedBeitModel(model)

# Enable DataParallel if multiple GPUs are available

model = nn.DataParallel(model)

model.to(device)

# Define loss function and optimizer

criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam(model.parameters(), lr=2e-5)

# Training loop with class names

num_epochs = 20

train_losses, val_losses, train_acc, val_acc = [], [], [], []

for epoch in range(num_epochs):

model.train()

running_loss = 0.0

correct = 0
total = 0

for imgs, labels in train_loader:

imgs, labels = imgs.to(device), labels.to(device)

optimizer.zero_grad()

outputs = model(imgs)

loss = criterion(outputs, labels)

loss.backward()

optimizer.step()

running_loss += loss.item()

_, predicted = torch.max(outputs.data, 1)

total += labels.size(0)

correct += (predicted == labels).sum().item()

train_losses.append(running_loss / len(train_loader))

train_acc.append(correct / total)

# Validation

model.eval()

val_running_loss = 0.0

val_correct = 0

val_total = 0

with torch.no_grad():

for imgs, labels in val_loader:

imgs, labels = imgs.to(device), labels.to(device)

outputs = model(imgs)
loss = criterion(outputs, labels)

val_running_loss += loss.item()

_, val_predicted = torch.max(outputs.data, 1)

val_total += labels.size(0)

val_correct += (val_predicted == labels).sum().item()

val_losses.append(val_running_loss / len(val_loader))

val_acc.append(val_correct / val_total)

print(f"Epoch [{epoch + 1}/{num_epochs}], Train Loss: {train_losses[-1]:.4f}, Train Acc:

{train_acc[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}, Val Acc: {val_acc[-1]:.4f}")

print("Training complete.")

from sklearn.metrics import (confusion_matrix, classification_report, accuracy_score,
                             precision_score, recall_score, f1_score, roc_auc_score,
                             roc_curve, precision_recall_curve)
import seaborn as sns

import matplotlib.pyplot as plt

from sklearn.preprocessing import label_binarize

import numpy as np

# Set model to evaluation mode

model.eval()

test_correct = 0

test_total = 0

all_preds = []

all_labels = []
# Collect predictions and labels

with torch.no_grad():

for imgs, labels in test_loader:

imgs, labels = imgs.to(device), labels.to(device)

outputs = model(imgs)  # ModifiedBeitModel.forward already returns logits, so no .logits access is needed

_, predicted = torch.max(outputs.data, 1)

test_total += labels.size(0)

test_correct += (predicted == labels).sum().item()

# Store predicted and true labels for later analysis

all_preds.extend(predicted.cpu().numpy())

all_labels.extend(labels.cpu().numpy())

# Calculate overall accuracy

test_acc = accuracy_score(all_labels, all_preds)

print(f"Test Accuracy: {test_acc:.4f}")

# Calculate precision, recall, and F1-score

precision = precision_score(all_labels, all_preds, average='weighted')

recall = recall_score(all_labels, all_preds, average='weighted')

f1 = f1_score(all_labels, all_preds, average='weighted')

print(f"Precision: {precision:.4f}")

print(f"Recall: {recall:.4f}")

print(f"F1 Score: {f1:.4f}")


# Generate classification report

print("\nClassification Report:")

print(classification_report(all_labels, all_preds, target_names=class_names))

# Display the confusion matrix with class names

conf_matrix = confusion_matrix(all_labels, all_preds)

plt.figure(figsize=(8, 6))

sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',

xticklabels=class_names, yticklabels=class_names)

plt.title('Confusion Matrix')

plt.xlabel('Predicted Labels')

plt.ylabel('True Labels')

plt.show()

# ROC Curve and AUC Score

# One-vs-rest approach

all_labels_binarized = label_binarize(all_labels, classes=list(range(len(class_names))))

all_preds_binarized = label_binarize(all_preds, classes=list(range(len(class_names))))

fpr = {}

tpr = {}

roc_auc = {}

for i in range(len(class_names)):

fpr[i], tpr[i], _ = roc_curve(all_labels_binarized[:, i], all_preds_binarized[:, i])

roc_auc[i] = roc_auc_score(all_labels_binarized[:, i], all_preds_binarized[:, i])

# Plot ROC curves for each class


plt.figure(figsize=(10, 8))

for i, label in enumerate(class_names):

plt.plot(fpr[i], tpr[i], label=f"ROC curve for {label} (area = {roc_auc[i]:.2f})")

plt.plot([0, 1], [0, 1], 'k--') # Diagonal line for random classifier

plt.xlim([0.0, 1.0])

plt.ylim([0.0, 1.05])

plt.xlabel("False Positive Rate")

plt.ylabel("True Positive Rate")

plt.title("ROC Curves for Each Class")

plt.legend(loc="lower right")

plt.show()

# Precision-Recall Curve

plt.figure(figsize=(10, 8))

for i, label in enumerate(class_names):

precision, recall, _ = precision_recall_curve(all_labels_binarized[:, i],

all_preds_binarized[:, i])

plt.plot(recall, precision, label=f"Precision-Recall curve for {label}")

plt.xlabel("Recall")

plt.ylabel("Precision")

plt.title("Precision-Recall Curves for Each Class")

plt.legend(loc="lower left")

plt.show()

# Save the trained model

model_save_path = 'scd_beit_2.pth'

torch.save(model.state_dict(), model_save_path)

print(f"Model saved to {model_save_path}")


UI

import streamlit as st

import torch

from PIL import Image

from torchvision import transforms

from transformers import BeitForImageClassification, BeitFeatureExtractor

import os

# Load the model

from transformers import BeitForImageClassification

# Load the pre-trained model

model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

# Modify the classifier to have 8 output classes

model.classifier = torch.nn.Linear(model.classifier.in_features, 8)

# Now load the weights (with strict=False if needed)

model.load_state_dict(torch.load("C:/Users/mahid/Downloads/scd_beit_1.pth",

map_location=torch.device('cpu')), strict=False)

model.eval()

model.to('cuda' if torch.cuda.is_available() else 'cpu')

# Define the image transformations

transform = transforms.Compose([
transforms.Resize((224, 224)), # Resize to match model input size

transforms.ToTensor(),

])

# Class labels

class_labels = {
    0: "NV",
    1: "VASC",
    2: "MEL",
    3: "AK",
    4: "BCC",
    5: "DF",
    6: "SCC",
    7: "BKL"
}

# Function to predict class for a single image

def predict_image_class(image, model, device):

image = transform(image).unsqueeze(0).to(device)  # Add batch dimension and move to device

with torch.no_grad():

outputs = model(image).logits

_, predicted = torch.max(outputs, 1)

return predicted.item()

# Streamlit Dashboard

st.title("Image Classification Dashboard")

st.write("Upload an image to classify it.")


# Upload image

uploaded_file = st.file_uploader("Choose an image", type=["jpg", "png", "jpeg"])

if uploaded_file:

st.write("Classifying the image...")

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Open the image file

image = Image.open(uploaded_file).convert('RGB')

# Predict class

predicted_class = predict_image_class(image, model, device)

label = class_labels.get(predicted_class, "Unknown")

# Display result

st.image(image, caption=f"Predicted Class: {label} (Class {predicted_class})",

use_column_width=True)
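
Assuming the dashboard code above is saved as app.py (the filename here is illustrative), it can be started locally with the Streamlit CLI:

streamlit run app.py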

REFERENCES

[1]. Hritwik Ghosh, Irfan Sadiq Rahat, Sachi Nandan Mohanty, J. V. R. Ravindra, & Abdus Sobur. (2024). A study on the application of machine learning and deep learning techniques for skin cancer detection. International Journal of Computer and Systems Engineering, 18(1). https://doi.org/10.5281/zenodo.10525954

[2]. Himel, G. M. S., Islam, M. M., Al-Aff, K. A., Karim, S. I., & Sikder, M. K. U. (2024). Skin cancer segmentation and classification using Vision Transformer for automatic analysis in dermatoscopy-based noninvasive digital system. International Journal of Biomedical Imaging, 2024, 1–18. https://doi.org/10.1155/2024/3022192

[3]. Naeem, A., Anees, T., Khalil, M., Zahra, K., Naqvi, R. A., & Lee, S. (2024). SNC_Net: Skin cancer detection by integrating handcrafted and deep learning-based features using dermoscopy images. Mathematics, 12(7), 1030. https://doi.org/10.3390/math12071030

[4]. Vachmanus, S., Noraset, T., Piyanonpong, W., Rattananukrom, T., & Tuarob, S. (2023). DeepMetaForge: A deep vision-transformer metadata-fusion network for automatic skin lesion classification. IEEE Access, 11, 145467–145484. https://doi.org/10.1109/access.2023.3345225

[5]. Yang, G., Luo, S., & Greer, P. (2023). A novel Vision Transformer model for skin cancer classification. Neural Processing Letters, 55(7), 9335–9351. https://doi.org/10.1007/s11063-023-11204-5

[6]. Pacal, I., Alaftekin, M., & Zengul, F. D. (2024). Enhancing skin cancer diagnosis using Swin Transformer with hybrid shifted window-based multi-head self-attention and SwiGLU-based MLP. https://doi.org/10.1007/s10278-024-01140-8

[7]. Gulzar, Y., & Khan, S. A. (2022). Skin lesion segmentation based on Vision Transformers and convolutional neural networks—A comparative study. Applied Sciences, 12(12), 5990. https://doi.org/10.3390/app12125990

[8]. Arshed, M. A., Mumtaz, S., Ibrahim, M., Ahmed, S., Tahir, M., & Shafi, M. (2023). Multi-class skin cancer classification using vision transformer networks and convolutional neural network-based pre-trained models. Information, 14(7), 415. https://doi.org/10.3390/info14070415

[9]. Cirrincione, G., Cannata, S., Cicceri, G., Prinzi, F., Currieri, T., Lovino, M., Militello, C., Pasero, E., & Vitabile, S. (2023). Transformer-based approach to melanoma detection. Sensors, 23(12), 5677. https://doi.org/10.3390/s23125677

[10]. A large dataset to enhance skin cancer classification with transformer-based deep neural networks. (2024). IEEE Xplore. https://ieeexplore.ieee.org/document/10623626

[11]. Xu, J., Gao, Y., Liu, W., Huang, K., Zhao, S., Lu, L., Wang, X., Hua, X.-S., Wang, Y., & Chen, X. (2021). RemixFormer: A transformer model for precision skin tumor differential diagnosis via multi-modal imaging and non-imaging data. https://www.cs.jhu.edu/~lelu/publication/MICCAI%202022_paper1023_RemixFormer.pdf

[12]. Nahata, H., & Singh, S. P. (2020). Deep learning solutions for skin cancer detection and diagnosis. In Learning and Analytics in Intelligent Systems (pp. 159–182). https://doi.org/10.1007/978-3-030-40850-3_8

[13]. Performance enhancement of skin cancer classification using computer vision. (2023). IEEE Xplore. https://ieeexplore.ieee.org/abstract/document/10181245

[14]. Rashid, J., Ishfaq, M., Ali, G., Saeed, M. R., Hussain, M., Alkhalifah, T., Alturise, F., & Samand, N. (2022). Skin cancer disease detection using transfer learning technique. Applied Sciences, 12(11), 5714. https://doi.org/10.3390/app12115714

[15]. Gregoor, A. M. S., Sangers, T. E., Bakker, L. J., Hollestein, L., De Groot, C. A. U., Nijsten, T., & Wakkee, M. (2023). An artificial intelligence based app for skin cancer detection evaluated in a population based setting. npj Digital Medicine, 6(1). https://doi.org/10.1038/s41746-023-00831-w
