DENSITY AND BI-RADS CLASSIFICATION
OF BREAST MAMMOGRAPHY
by
Cengizhan Şahin
Yeditepe University
Faculty of Engineering
Department of Computer Engineering
2023
DENSITY AND BI-RADS CLASSIFICATION
OF BREAST MAMMOGRAPHY
APPROVED BY:
ACKNOWLEDGEMENTS
First of all I would like to thank my advisor Prof. Dr. Emin Erkan Korkmaz for his
guidance and support throughout my project.
ABSTRACT
Breast cancer is the general term for malignant (cancerous) tumors that occur in breast
tissue. Among women worldwide, breast cancer is the most common type of cancer. Ac-
cording to the World Health Organization (WHO), approximately 2.1 million women are
diagnosed with breast cancer each year and around 627,000 women die from the disease
annually. In Turkey, approximately 40,000 women are diagnosed with breast cancer each
year according to Turkey Cancer Registry data. Artificial intelligence (AI) technologies are
used for the diagnosis and treatment of breast cancer. These technologies are used to ana-
lyze images such as mammography images or very low-resolution images. Mammography
is a radiological imaging method that plays an important role in the early diagnosis of breast
cancer. Deep learning allows data analytics and learning processes to be performed using
artificial neural networks (ANNs). Deep learning algorithms can be used to diagnose breast
cancer from mammography images. These algorithms can identify signs of cancer in new
images by learning from pre-labeled mammography images. In this study, a new approach
will be developed using deep learning techniques for breast cancer diagnosis from mammog-
raphy images. Two different methods will be used for diagnosis. Deep learning models such
as ViT, a novel architecture constructed with the ViT model, a CNN model, and our CNN model optimized with a metaheuristic optimization algorithm will be tested as the classifier. We propose a novel architecture for this classification task, and the results will be compared at the end.
ÖZET
Meme kanseri, meme dokusunda meydana gelen kötü huylu (kanserli) tümörler için
kullanılan genel bir terimdir. Dünya çapında kadınlar arasında meme kanseri en yaygın
kanser türüdür. Dünya Sağlık Örgütü’ne (WHO) göre, her yıl yaklaşık 2,1 milyon kadına
meme kanseri teşhisi konulmakta ve yılda yaklaşık 627.000 kadın bu hastalıktan ölmekte-
dir. Türkiye’de ise Türkiye Kanser Kayıtları verilerine göre her yıl yaklaşık 40.000 kadına
meme kanseri teşhisi konulmaktadır. Meme kanserinin teşhis ve tedavisi için yapay zeka
(AI) teknolojileri kullanılmaktadır. Bu teknolojiler, mamografi görüntüleri veya çok düşük
çözünürlüklü görüntüler gibi görüntüleri analiz etmek için kullanılır. Mamografi, meme
kanserinin erken teşhisinde önemli rol oynayan radyolojik bir görüntüleme yöntemidir. De-
rin öğrenme, veri analitiği ve öğrenme süreçlerinin yapay sinir ağları (YSA) kullanılarak
gerçekleştirilmesini sağlar. Derin öğrenme algoritmaları, mamografi görüntülerinden meme
kanserini teşhis etmek için kullanılabilir. Bu algoritmalar, önceden etiketlenmiş mamo-
grafi görüntülerinden öğrenerek yeni görüntülerdeki kanser belirtilerini tespit edebilir. Bu
çalışmada, mamografi görüntülerinden meme kanseri teşhisi için derin öğrenme teknikleri
kullanılarak yeni bir yaklaşım geliştirilecektir. Teşhis için iki farklı yöntem kullanılacak-
tır. Sınıflandırıcı için ViT, ViT modeli ile oluşturulmuş yeni bir mimari, bir CNN modeli
ve metasezgisel optimizasyon algoritması ile optimize edilmiş CNN modelimiz gibi derin
öğrenme modelleri test edilecektir. Bu sınıflandırma görevi için yeni bir mimari önerdik ve
sonuçlar sonunda karşılaştırılacaktır.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
ÖZET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
LIST OF SYMBOLS/ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . xiii
1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1. Breast Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2. Medical Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3. Breast Cancer Classification . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.1. Breast Imaging Reporting and Data System (BI-RADS) Classification 3
1.3.2. Mammographic Density . . . . . . . . . . . . . . . . . . . . . . . 3
1.4. Convolutional Neural Network (CNN) . . . . . . . . . . . . . . . . . . . . 4
1.5. Vision Transformer (ViT) . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6. Meta-heuristic Optimization Algorithms . . . . . . . . . . . . . . . . . . . 6
1.6.1. COOT Optimization Algorithm . . . . . . . . . . . . . . . . . . . 6
1.7. Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.8. Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.9. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.9.1. Accessibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.9.2. As a Basis for Other Applications . . . . . . . . . . . . . . . . . . 8
1.10. Scope and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.11. Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.12. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2. BACKGROUND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1. BREAST CANCER STAGE CLASSIFICATION ON DIGITAL MAMMO-
GRAM IMAGES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2. Deep Learning RN-BCNN Model for Breast Cancer BI-RADS Classification 10
2.3. Comparison of segmentation-free and segmentation-dependent computer-aided
diagnosis of breast masses on a public mammography dataset . . . . . . . . 11
2.4. An integrated framework for breast mass classification and diagnosis using
stacked ensemble of residual neural networks . . . . . . . . . . . . . . . . . 11
2.5. Designing a grey wolf optimization based hyper-parameter optimized con-
volutional neural network classifier for skin cancer detection . . . . . . . . 12
3. ANALYSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1. TEKNOFEST Artificial Intelligence Competition Dataset in Healthcare . . 13
3.1.1. Preprocess of Dataset Excel . . . . . . . . . . . . . . . . . . . . . 14
3.1.2. Preprocess of Images . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.2.1. NOT Bitwise Operator . . . . . . . . . . . . . . . . . . . 15
3.1.2.2. Horizontal Flipping . . . . . . . . . . . . . . . . . . . . 17
3.2. The mini-MIAS Mammography Dataset . . . . . . . . . . . . . . . . . . . 18
3.2.1. Preprocess of Labels Text File . . . . . . . . . . . . . . . . . . . . 19
3.2.2. Preprocess of Images . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3. Vision Transformer Model . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4. Convolutional Neural Network (CNN) . . . . . . . . . . . . . . . . . . . . 23
3.4.1. Convolution Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4.2. Max Pooling Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4.3. Our CNN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5. CNN COOT Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.6. Vision Transformer COOT Optimization on Classifier Layer . . . . . . . . 26
3.7. Pipelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4. DESIGN AND IMPLEMENTATION . . . . . . . . . . . . . . . . . . . . . . . 29
4.1. Preprocess of MIAS Dataset . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2. Preprocess of TEKNOFEST Mammography Dataset . . . . . . . . . . . . 31
4.3. Vision Transformer Model . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3.1. Lightning Module Implementation . . . . . . . . . . . . . . . . . . . 34
4.4. CNN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4.1. Implementing the CNN Model . . . . . . . . . . . . . . . . . . . . 36
4.5. CNN COOT Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5.1. Implementing the COOT Optimization Algorithm . . . . . . . . . . 39
4.6. ViT COOT Optimization on Classifier Layer . . . . . . . . . . . . . . . . . 40
4.6.1. Implementing the Custom ViT Model . . . . . . . . . . . . . . . . 42
5. TEST AND RESULTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1. MIAS Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.1. Vision Transformer Model without COOT Optimization . . . . . . 44
5.1.2. Vision Transformer Model with COOT Optimization . . . . . . . . 45
5.1.3. Vision Transformer Model with Unfrozen Layers . . . . . . . . . . 46
5.1.4. CNN Model without COOT Optimization . . . . . . . . . . . . . . 48
5.1.5. CNN Model with COOT Optimization . . . . . . . . . . . . . . . . 48
5.1.6. Comparison of CNN models with ViT Models . . . . . . . . . . . 49
5.1.7. Comparison of Results with Other Studies on mini-MIAS Dataset . 49
5.2. TEKNOFEST Mammography Dataset . . . . . . . . . . . . . . . . . . . . 50
5.2.1. Vision Transformer Model . . . . . . . . . . . . . . . . . . . . . . 50
5.2.2. CNN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6. CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.1. Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
LIST OF FIGURES
Figure 1.1. Distribution of the Top Five Most Common Cancer Types in Women
Worldwide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Figure 1.2. Distribution of the Top 10 Most Common Cancer Types in Women as
a Proportion of Total Cancer. . . . . . . . . . . . . . . . . . . . . . . 2
Figure 3.10. LCC Image After Bitwise NOT Operator was Used . . . . . . . . . . 18
Figure 3.15. Basic Flowchart of COOT algorithm . . . . . . . . . . . . . . . . . . 26
Figure 5.17. Loss Graphs of CNN with COOT . . . . . . . . . . . . . . . . . . . . 49
LIST OF TABLES
LIST OF SYMBOLS/ABBREVIATIONS
1. INTRODUCTION
Cancer remains a leading cause of mortality worldwide. However, thanks to the quick
development of technology and the emergence of artificial intelligence, we now have the
opportunity to support medical experts and help in the early identification of cancer. This
project’s main goal is to investigate how cutting-edge algorithms may help to accurately
classify BI-RADS classes and mammography density. We want to improve patient outcomes
by enhancing the diagnostic process and utilizing the potential of artificial intelligence.
1.1. Breast Cancer

Breast cancer is a type of cancer that occurs when cells in the breast tissue abnormally
and uncontrollably divide and multiply, forming a lump or mass [1]. Breast cancer can be
categorized as normal or abnormal, and benign or malignant. Benign tumors grow slowly
and do not spread to neighboring tissues or other parts of the body, unlike malignant tumors
which can cause harm.
According to Turkey Cancer Statistics, breast cancer is the leading cause of mortality in
women in Turkey. Approximately 24,000 new cases are diagnosed every year, and about 5%
of these cases result in death [2]. Breast cancer can occur in women of any age after puberty,
but the risk increases with age.
In 2020, there were 2.3 million women diagnosed with breast cancer worldwide, resulting
in 685,000 deaths. As of the end of 2020, there were 7.8 million women who had been
diagnosed with breast cancer in the past 5 years and were still alive, making breast cancer
the most common cancer in the world. According to GLOBOCAN 2020 data released by
the International Agency for Research on Cancer, breast cancer is the most common cancer
among women worldwide, as shown in Figure 1.1 [3].
Figure 1.1. Distribution of the Top Five Most Common Cancer Types in Women
Worldwide
According to the 2017 Turkey Cancer Statistics by the Turkish Ministry of Health, Gen-
eral Directorate of Public Health, the mortality rate in women is shown in Figure 1.2 [3].
Figure 1.2. Distribution of the Top 10 Most Common Cancer Types in Women as a
Proportion of Total Cancer.
In addition, approximately 0.5-1% of breast cancer cases are seen in men. Breast cancer
treatment in men is the same as the methods applied to women.
When detected early, breast cancer treatment is particularly effective, with survival rates of around 90%. Early detection and diagnosis of breast cancer increase the likelihood of successful treatment and provide the patient with a chance of complete recovery [4].
Computer-aided detection of cancerous cells is an important process in the early detection of
breast cancer and helps experts automate the detection process.
1.2. Medical Imaging

Various medical imaging tools are used to analyze the human body in the diagnosis, treat-
ment, or monitoring process of breast cancer. These are Mammography imaging tool, Ul-
trasound imaging tool, Magnetic Resonance Imaging (MRI), Histological imaging tool, and
Thermological imaging tools [5].
Among these, mammography is the most widely used method for breast cancer imaging. Mammography provides information about the thickness, shape, and structure of
breast tissues and can detect breast cancer in its early stages. It also helps to detect other
changes that may occur in breast tissue, such as fibroadenomas and fibrocystic changes [5].
Mammography is used for two purposes. These are screening mammography used for
evaluating the cancer risk of asymptomatic women and diagnostic mammography used for
diagnosing breast cancer in symptomatic women [6].
1.3. Breast Cancer Classification

1.3.1. Breast Imaging Reporting and Data System (BI-RADS) Classification

Breast cancer is a group of diseases involving irregular changes in and abnormal growth of breast cells. Breast cells are classified into two forms: cancerous cells and non-cancerous cells. Cancerous cells are malignant tumors, and they are divided into invasive cancer and in situ cancer. Non-cancerous cells are benign or normal cells. After radiology experts evaluate mammography images, they classify the condition of the breast with certain numbers to convey their findings precisely and clearly. These numbers are classified with a general system called BI-RADS in breast imaging reports [7]. The BI-RADS classification is shown in Table 1.1.
Table 1.1. BI-RADS Classification List
Category Description
BI-RADS 0 Additional imaging evaluation and/or comparison to prior mammograms is needed.
BI-RADS 1 The breasts are entirely fatty.
BI-RADS 2 There are some findings, but they are benign.
BI-RADS 3 There is probably a benign abnormality, with less than 5% risk of malignancy.
BI-RADS 4 There is a suspicious abnormality, with 30-40% risk of malignancy.
BI-RADS 5 The abnormality is highly suggestive of malignancy, with a risk of over 90%.
BI-RADS 6 Biopsy-proven malignancy.
1.3.2. Mammographic Density

Mammographic density refers to the amount of glandular and fibrous tissue in the breast
relative to the amount of fat tissue. Radiologists often grade mammographic density on a
scale ranging from almost entirely fatty (lowest density) to extremely dense (highest density).
One of the most commonly used scales is the American College of Radiology Mammography
Reporting and Data System, which classifies mammographic density as A, B, C, or D. In some
cases, a numerical scale of 1-4 may be used instead [8].
• Type A (almost entirely fatty, lowest density): Approximately 10% of women have
predominantly fatty, very low-density breasts.
• Type B (scattered fibroglandular density): Roughly 40% of women fall into this low-
density category.
• Type C (heterogeneously dense): About 40% of women have this mammographic den-
sity, which is considered dense and can obscure small cancers.
• Type D (extremely dense): Approximately 10% of women have extremely dense breasts,
which can decrease the sensitivity of mammography.
Dense breast tissue makes it more difficult for radiologists to detect cancer on mammo-
grams. Dense (fibrous and glandular) breast tissue appears white on a mammogram. Breast
masses and cancers can also appear white, making them harder to see. In contrast, fat tissue
appears almost black on a mammogram. Therefore, if most of the breast is fatty tissue, it is
easier to see a white tumor [9].
1.4. Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a type of deep learning algorithm that is par-
ticularly well-suited for image recognition and processing tasks. It is made up of multiple
layers, including convolutional layers, pooling layers, and fully connected layers.
The convolutional layers are the key component of a CNN, where filters are applied to
the input image to extract features such as edges, textures, and shapes. The output of the
convolutional layers is then passed through pooling layers, which are used to down-sample
the feature maps, reducing the spatial dimensions while retaining the most important infor-
mation. The output of the pooling layers is then passed through one or more fully connected
layers, which are used to make a prediction or classify the image.
CNNs are trained using a large dataset of labeled images, where the network learns to rec-
ognize patterns and features that are associated with specific objects or classes. Once trained,
a CNN can be used to classify new images, or extract features for use in other applications
such as object detection or image segmentation.[10]
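As a small numerical illustration of how a convolution filter responds to an edge, the following sketch (with arbitrary toy values, not taken from this project) applies a vertical-edge kernel to a 4x4 patch:

import numpy as np
from scipy.signal import convolve2d

# Toy 4x4 patch: dark on the left, bright on the right.
patch = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)

# A simple vertical-edge filter, similar in spirit to the filters a CNN learns.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

response = convolve2d(patch, kernel, mode="valid")
print(response)  # large values where the intensity changes from dark to bright

In a CNN, many such filters are learned automatically from the training data instead of being designed by hand.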
1.5. Vision Transformer (ViT)
Vision transformers have extensive applications in popular image recognition tasks such
as object detection, image segmentation, image classification, and action recognition. More-
over, ViTs are applied in generative modeling and multi-modal tasks, including visual ground-
ing, visual-question answering, and visual reasoning.
In ViTs, images are represented as sequences, and class labels for the image are predicted,
which allows models to learn image structure independently. Input images are treated as a
sequence of patches where every patch is flattened into a single vector by concatenating the
channels of all pixels in a patch and then linearly projecting it to the desired input dimen-
sion.[11] The total architecture is called Vision Transformer and shown in Figure 1.3.
(vi) Pre-train the model with image labels (fully supervised on a huge dataset).

These steps show how ViT works [12].
1.6. Meta-heuristic Optimization Algorithms

1.6.1. COOT Optimization Algorithm

(i) The objective function is defined, and it is desired to minimize this function.
(ii) Parameters that will minimize this objective function are determined and the values of
these parameters are randomly assigned.
(iii) The objective function is evaluated using these parameters, and a ”total error” value is
calculated based on these evaluations.
(iv) The parameters are changed to minimize this ”total error” value, and this step is re-
peated.
(v) These steps are repeated several times until the parameters that will minimize the ob-
jective function are identified.
The COOT algorithm can be used in various design problems and generally provides fast
results.
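The following sketch illustrates the generic loop described in steps (i)-(v) for a population-based metaheuristic. It is a simplified illustration with a placeholder objective function and a naive position update, not the actual COOT update equations:

import numpy as np

def fitness(x):
    # Placeholder objective; in this project the objective would be, e.g.,
    # the validation loss of a model built from the parameter vector x.
    return np.sum(x ** 2)

def simple_metaheuristic(pop_size=5, max_iter=10, dim=4, lb=-1.0, ub=1.0):
    # (ii) randomly initialize candidate parameter vectors
    pop = np.random.uniform(lb, ub, size=(pop_size, dim))
    scores = np.array([fitness(ind) for ind in pop])   # (iii) evaluate the population
    best = pop[scores.argmin()].copy()

    for _ in range(max_iter):                           # (v) repeat until the budget is used
        for i in range(pop_size):
            # (iv) move each candidate towards the current best with some random noise
            candidate = pop[i] + np.random.rand(dim) * (best - pop[i]) \
                        + 0.1 * np.random.randn(dim)
            candidate = np.clip(candidate, lb, ub)      # keep candidates inside the bounds
            if fitness(candidate) < scores[i]:
                pop[i], scores[i] = candidate, fitness(candidate)
        best = pop[scores.argmin()].copy()
    return best, scores.min()

best_params, best_score = simple_metaheuristic()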
1.7. Transfer Learning

Transfer learning is a technique in machine learning where a model trained on one task is
used as the starting point for a model on a second task. This can be useful when the second
task is similar to the first task, or when there is limited data available for the second task.
By using the learned features from the first task as a starting point, the model can learn more
quickly and effectively on the second task. This can also help to prevent overfitting, as the
model will have already learned general features that are likely to be useful in the second
task.[14]
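A minimal sketch of this idea in PyTorch is shown below, assuming a torchvision ResNet-18 backbone purely for illustration (this is not the model used in this project): the pretrained feature extractor is frozen and only a newly added classification head is trained.

import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet (the "first task").
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained weights so only the new head is updated.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for the "second task"
# (e.g. a hypothetical 3-class mammography problem).
backbone.fc = nn.Linear(backbone.fc.in_features, 3)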
1.8. Terms
• Pretrained neural network refers to first training a model on one task or dataset. Then
using the parameters or model from this training to train another model on a different
task or dataset.[16]
• Finetune is a process of adjusting the neural network weights to better fit the training
data. This can be done by increasing or decreasing the learning rate, or by changing
the network architecture. Fine tuning is often used to improve the performance of a
neural network on a specific task or dataset.[17]
• Hyperparameters are the parameters that are determined before the training process.
• Overfitting occurs when a machine learning model is too complex and learns too much
from the training data. In other words, the model fits the training data too well, but
fails to generalize to new, unseen data. This results in poor performance and inaccurate
predictions.[18]
• Machine learning is a branch of artificial intelligence (AI) and computer science which
focuses on the use of data and algorithms to imitate the way that humans learn, gradu-
ally improving its accuracy.[19]
1.9. Motivation
1.9.1. Accessibility
This project aims to provide a solution for radiologists who regularly examine mammo-
grams and seek ways to alleviate their workload, thereby addressing an important aspect of
accessibility in the field of breast cancer diagnosis. By developing an efficient and accurate
classification system for BI-RADS classes and mammography density, we strive to empower
radiologists with a reliable tool that can assist in their decision-making process and enhance
their efficiency, ultimately contributing to improved accessibility to timely and accurate di-
agnoses.
1.9.2. As a Basis for Other Applications

The Vision Transformer model and CNN models can be used in any computer vision
problem, including object detection, classification, feature extraction, and segmentation.
1.10. Scope and Limitations

• Limited availability of high-quality labeled data for training and validation. This can
lead to overfitting, where the model performs well on the training data but poorly on
new, unseen data.
• Class imbalance where one or more classes are underrepresented in the dataset. This
can lead to biased predictions and poor performance on the minority classes.
• Preprocessing the mammography images to extract the regions of interest and remove
artifacts and noise is critical for accurate classification. However, preprocessing can
be time-consuming and can introduce artifacts and noise that affect the classification
performance.
• The Vision Transformer model and CNNs can be highly complex, with many layers
and parameters. This can make training and optimization challenging and require large amounts of computational resources.
1.11. Problem Definition

The aim of this project is to address the classification and density detection of breast
cancer, which is a prevalent and life-threatening cancer. The primary objective is to reduce the
workload of radiologists in analyzing mammography images. To achieve this goal, the project
utilizes advanced computer vision techniques, including the Vision Transformer model and
a CNN model that is optimized using metaheuristic optimization algorithms, specifically the
COOT algorithm. Transfer learning is used to enhance the performance of the models. The
proposed solution is expected to contribute to the early detection of breast cancer and improve
patient outcomes.
1.12. Requirements
This project was developed using Google Colab, Python, and preprocessing and deep learning libraries. In this case, there are only three requirements: a computer, an internet connection, and access to Google Colab.
2. BACKGROUND
2.1. BREAST CANCER STAGE CLASSIFICATION ON DIGITAL MAMMOGRAM IMAGES

Dr. G. Rasitha Banu, Fathima N Sakeena, Mrs. Mumtaj, and Mr. Agha Sheraz Hanif
(2018)[20] published a study to detect the stage of breast cancer from digital mammographic
images based on pixel size. The MIAS (Mammogram Image Analysis Society) database was
used as the dataset. This study was evaluated through preprocessing, region of interest (ROI)
segmentation, feature extraction and selection, and classification stages. For preprocessing,
a Gaussian filter and adaptive histogram equalization (AHE) were used to remove noise. In
the segmentation stage, ROI was used to divide areas corresponding to different objects to
enhance the tumor area in mammographic images. In the feature extraction and selection
stage, the pixel size area was calculated to identify various stages of breast cancer using
ROI. In this study, 50 malignant mammogram images were used from the MIAS database,
including 16 malignant mammogram images from the dense glandular group, 16 from the
fatty group, and 18 from the fatty glandular group. Additionally, the pixel area was calculated
using ROI. The value of the pixel depends on the stage of the detected cancer. In the Random
Forest Classifier, there were 22 correctly identified samples and 28 misidentified samples.
The performance analysis of the classifiers showed that J48 had an accuracy of 96.66%,
which was better than Random Forest Algorithms.
2.2. Deep Learning RN-BCNN Model for Breast Cancer BI-RADS Classification
Shahbaz Siddeeq et al. (2021) [21] published a study in which they performed BI-RADS
classification using the INbreast dataset. After dividing the dataset into training-validation-
test sets, data augmentation was applied to the training set using a combination of random
rotation and zoom called elastic deformation to achieve better performance. Then, a custom
ResNet-based neural network (RN-BCNN) was trained. The proposed model was trained in
four different ways: random dataset, homogeneous dataset, augmented dataset, and unaug-
mented dataset. The augmented and randomly generated dataset showed the best perfor-
mance with 85.9%. Better results were obtained compared to similar studies previously pub-
lished.
2.3. Comparison of segmentation-free and segmentation-dependent computer-aided diagnosis of breast masses on a public mammography dataset

Rebecca Sawyer Lee, Jared A. Dunnmon, Ann He, Siyi Tang, Christopher Re, and Daniel
L. Rubin (2021)[22] performed four high-performing methods trained and evaluated using
standard partitions on the CBIS-DDSM (Curated Breast Imaging Subset DDSM) mass clas-
sification dataset. These methods include the Bag-of-Visual Words (BoVW) approach and a
standard Convolutional Neural Network (CNN) based approach in the field of deep learning.
BoVW subjected a region-of-interest (ROI) bounded patch to a filtering process and then
calculated a basic feature set over the entire ROI using SIFT (Scale-Invariant Feature Trans-
form) via a bounding box. Forward image classification relies on a feature histogram based
on the number of assigned image patches by the unified clustering approach. Finally, a logis-
tic regression classifier regularized by L1, also known as LASSO (Least Absolute Shrinkage
and Selection Operator), was trained. The CBIS-DDSM dataset was used as the dataset in
this paper. Each method was trained and tested using the training and test partitions provided
in the dataset, which contained 691 training samples (355 benign, 336 malignant) and 200
test samples (117 benign, 83 malignant). According to the results obtained, CNN showed
better performance than all other approaches.
2.4. An integrated framework for breast mass classification and diagnosis using stacked
ensemble of residual neural networks
Asma Baccouche, Begonya Garcia-Zapirain, and Adel S. Elmaghraby (2022) [23] propose a stack model for the classification and diagnosis of breast masses. This stack model
consists of the basic model of ResNet architecture and its modifications. Three different
ResNet models were suggested for the classification of breast masses in this study. To create
this stack model, the last fully connected layer of each ResNetV2 architecture is removed and
considered as a two-layer meta-classifier model that combines the layers of the three mod-
els. Three different fully connected layers with sizes of 1000, 100, and 10 were combined
with Sigmoid and ReLU activation functions in this meta-classifier model. This stack model
showed good performance in the classification of breast masses.
2.5. Designing a grey wolf optimization based hyper-parameter optimized convolutional
neural network classifier for skin cancer detection
A method for detecting skin cancer has been proposed by Rasmiranjan Mohakud and Ra-
jashree Dash (2021)[24]. This method is a CNN model that uses a skin image to determine
the likelihood of skin cancer. Grey Wolf Optimization (GWO) was used to optimize the hy-
perparameters of the CNN model. The CNN model aims to automatically select the features
of the skin image and use these features to predict the likelihood of skin cancer. GWO was
used to optimize the hyperparameters of the CNN model and was compared with other meth-
ods. The results showed that GWO outperformed other methods. This study demonstrated
that GWO can be an effective method. The proposed model’s effectiveness was validated
by comparing it with other nature-inspired techniques, such as Particle Swarm Optimization
(PSO) and Genetic Algorithm (GA), on the multi-class ISIC (International Skin Imaging Collaboration) skin lesion dataset. 80% of the images were used for training, and 20%
were used for testing. The weighted model consisted of 3 convolutional layers, 3 relu layers,
3 dropout layers, 3 max-pooling layers, 1 flatten layer, and 2 dense layers. A comparative
study was conducted using three different artificial neural network models for the skin cancer
classification problem. These models included PSO, GA, and GWO. The ISIC skin lesion
dataset was used to compare the performance of these models. The results showed that the
GWO-based model performed better than the other two models. The GWO-based model
had an accuracy value of 98.33% and obtained a lower loss value compared to the other two
models. The results of this study showed that the GWO-based CNN model optimized with
automatic hyperparameters can perform well in solving the skin cancer classification problem
under different environmental conditions.
3. ANALYSIS
3.1. TEKNOFEST Artificial Intelligence Competition Dataset in Healthcare

The dataset to be used throughout the project consists of mammogram images prepared for the TEKNOFEST 2023 Artificial Intelligence Competition in Healthcare, using data provided by the Turkish Ministry of Health - General Directorate of Health Information Systems to TUSEB (Health Institutes of Turkey). The total number of usable images is 15,907, and the total number of patients is 3,978. The training dataset includes the BI-RADS classes BI-RADS 0, BI-RADS 1-2, and BI-RADS 4-5. It also contains information about breast composition as 'A, B, C, D' [25].
Upon examination of the dataset, it is found that there are four files in DICOM format
for each patient. These are named ’RCC’, ’RMLO’, ’LCC’, ’LMLO’.
The letter ’R’ indicates the right breast, and the letter ’L’ indicates the left breast. Figure 3.1
shows the inside-to-outside view, and Figure 3.2 shows the outside-to-inside view.
The file content for each patient containing mammographic images in the dataset is as
shown in Figure 3.3.
The files in the dataset are as shown in Figure 3.4. Each patient has been assigned a number, and the data labels refer to each patient by this number. The columns from left to right are "HastaNo" (Patient Number), "BIRADS CATEGORY", and "BREAST COMPOSITION". We will use the BIRADS CATEGORY for our classification task. A portion of the dataset is shown in Figure 3.5.
The class balance of the BI-RADS classes and breast composition classes is shown in Figures 3.6 and 3.7.
Figure 3.1. Mediolateral Oblique View of the Left Breast    Figure 3.2. Craniocaudal View of the Left Breast
3.1.1. Preprocess of Dataset Excel

Every patient has four mammography images (LCC, LMLO, RCC, RMLO) in their own folder, but each image does not have its own label in the Excel file, and the data is not split into training, validation, and test sets. Therefore, preprocessing the Excel file, creating new rows for every patient to keep the information more accessible, and splitting the labels and information into training, validation, and test sets is necessary. Figure 3.8 shows the final form of the Excel file, and Algorithms 1, 2, and 3 show the pseudocode of the preprocessing steps.

Figure 3.5. Dataset Labels in an Excel Table

Figure 3.6. Class Distribution of the BI-RADS Category    Figure 3.7. Class Distribution of the Breast Composition Category
3.1.2. Preprocess of Images

Preprocessing plays a vital role in enhancing the quality and usability of medical images. As part of our pipeline, we will employ various preprocessing techniques to prepare the input data for further analysis. First, we used the NOT bitwise operator to get rid of unnecessary background noise and make the breast tissue visible. After that, we used horizontal flipping to make the LCC and LMLO images look like the RCC and RMLO images, in order to eliminate the differences between training images.
Algorithm 1 Function to create rows
Function create_rows_1(save_name_array, df, index):
    rows ← []
    for each name in save_name_array do
        attributes ← split name by "."
        append to rows a dictionary with the following keys and values:
            "ID" ← attributes[0] split by "_" [0]
            "HASTANO" ← attributes[0] split by "_" [1]
            "YÖN" ← attributes[0] split by "_" [2]
            "YÖNTEM" ← attributes[0] split by "_" [3]
            "FİLTRE" ← attributes[0] split by "_" [4]
            "BIRADS KATEGORİSİ" ← value from df at index, "BIRADS KATEGORİSİ" column
            "MEME KOMPOZİSYONU" ← value from df at index, "MEME KOMPOZİSYONU" column
    end
    return rows

Figure 3.8. Preprocessed Excel File

3.1.2.1. NOT Bitwise Operator. Using bitwise operators on images is a very common preprocessing step in the literature. The NOT bitwise operator helps us extract the essential part of the image. We read all the images one by one and applied the bitwise NOT operator using a preprocessing library.
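A minimal sketch of this step with OpenCV is given below; the directory paths and loop are illustrative, not the exact script used in this project.

import cv2
import os

INPUT_DIR = "images/"       # hypothetical input directory
OUTPUT_DIR = "images_not/"  # hypothetical output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

for name in os.listdir(INPUT_DIR):
    img = cv2.imread(os.path.join(INPUT_DIR, name))
    inverted = cv2.bitwise_not(img)  # invert pixel values so the breast tissue stands out
    cv2.imwrite(os.path.join(OUTPUT_DIR, name), inverted)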
3.1.2.2. Horizontal Flipping. In order to flip all the LMLO and LCC images horizontally, we read the images one by one, checked the corresponding ID in the dataset Excel file, and checked whether each image was LCC or LMLO. After that, all the LCC and LMLO images were flipped horizontally using a preprocessing library.
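A minimal sketch of the flipping step, assuming OpenCV as the preprocessing library; the file-name check used here is purely illustrative.

import cv2

def flip_if_left(image_path):
    """Flip LCC/LMLO images horizontally so they match the RCC/RMLO orientation."""
    img = cv2.imread(image_path)
    if "LCC" in image_path or "LMLO" in image_path:
        img = cv2.flip(img, 1)  # flipCode=1 flips around the vertical axis
        cv2.imwrite(image_path, img)
    return img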
Figure 3.10. LCC Image After Bitwise NOT Operator was Used    Figure 3.11. LCC Image After Horizontal Flip was Used
3.2. The mini-MIAS Mammography Dataset

The mini-MIAS dataset is a subset of the Mammographic Image Analysis Society (MIAS) database, which contains digital mammograms of 161 women. The dataset contains pairs of left and right mammograms of the same woman, for a total of 322 images. The images are 1024 by 1024 pixels in size and have been centered in the matrix. The labels for the images are provided in a text file with seven columns. The column information is given below, and an example of the label text file is shown in Table 3.1.
CALC = Calcification
ASYM = Asymmetry
NORM = Normal
7th column: Approximate radius (in pixels) of a circle enclosing the abnormality [26].

Table 3.1. Mini-MIAS dataset first 4 images label format
REFNUM BG CLASS SEVERITY X Y Radius
mdb001 G CIRC B 535 425 197
mdb002 G CIRC B 522 280 69
mdb003 D NORM
mdb004 D NORM
In this project, we are going to classify the 2nd column, which is the character of the background tissue.
3.2.1. Preprocess of Labels Text File

After reading the labels text file as a CSV, we applied label encoding to columns 2, 3, and 4. While doing this, we shuffled the dataframe and then split it into train, validation, and test sets. Finally, we had three label Excel files for train, validation, and test. The pseudocode is given in Algorithm 4.
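A minimal sketch of this step with pandas and scikit-learn is given below; the column names follow Table 3.1, while the 80/10/10 split ratios are an assumption based on Algorithm 4 below.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("mias_labels.txt", sep=r"\s+",
                 names=["REFNUM", "BG", "CLASS", "SEVERITY", "X", "Y", "RADIUS"])

# Encode the categorical columns (background tissue, abnormality class, severity).
for col in ["BG", "CLASS", "SEVERITY"]:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

# Shuffle, then split 80/10/10 into train, validation, and test label files.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
n = len(df)
df.iloc[: int(0.8 * n)].to_excel("train.xlsx", index=False)
df.iloc[int(0.8 * n): int(0.9 * n)].to_excel("val.xlsx", index=False)
df.iloc[int(0.9 * n):].to_excel("test.xlsx", index=False)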
Figure 3.12. Gaussian Blur applied image Figure 3.13. Rotation applied image
3.2.2. Preprocess of Images

The mini-MIAS dataset consists of only 322 images, which is relatively small for an image classification task. To enhance the learning capabilities of our model, we employed data augmentation techniques, specifically rotation and Gaussian blur. During the preprocessing phase, we iterated through each image in the dataset while simultaneously processing the corresponding label text file. Within this loop, a random number was generated, either 0 or 1. When 0 was generated, we applied Gaussian blur (Figure 3.12) to introduce noise to the image. Conversely, if 1 was generated, we generated another random number, again either 0 or 1. In the case of 0 being generated, we performed a 10-degree rotation on the image, while a -10-degree rotation was applied (Figure 3.13) when 1 was generated. The purpose of employing these random filters was to augment the dataset, ensuring that each image had one augmented counterpart. The full procedure is given in Algorithm 4.
Algorithm 4 Split dataset into train, val, and test sets
Function SplitDataset(df, DATASET_PATH, SAVE_PATH):
    Create a list of random numbers from 0 to the length of the df DataFrame
    Shuffle the list of random numbers
    Create three empty DataFrames: train_df, val_df, and test_df
    foreach index in random_numbers do
        Open the image file at DATASET_PATH + df.loc[index, "REFNUM"] + ".pgm"
        Convert the image to a NumPy array
        if the random number is 0 then
            Apply a Gaussian blur to the image
        else if the random number is 1 then
            Generate a random number from 0 to 1
            if the random number is 0 then
                Rotate the image to the left
            else
                Rotate the image to the right
            end
        end
        Create a row for the image and its metadata
        Append the row to the df DataFrame
        if index < 0.8 × length of df then
            Append the row to the train_df DataFrame
            Write the image to the file SAVE_PATH + "train/" + df.loc[index, "REFNUM"] + ".png"
        else if index < 0.9 × length of df then
            Append the row to the val_df DataFrame
            Write the image to the file SAVE_PATH + "val/" + df.loc[index, "REFNUM"] + ".png"
        else
            Append the row to the test_df DataFrame
            Write the image to the file SAVE_PATH + "test/" + df.loc[index, "REFNUM"] + ".png"
        end
    end
    Write the df DataFrame to the file SAVE_PATH + "info.xlsx"
    Create a LabelEncoder object for each of the BG, CLASS, and SEVERITY columns
    Fit the LabelEncoder objects to the df DataFrame
    Transform the train_df, val_df, and test_df DataFrames using the LabelEncoder objects
    Write the train_df, val_df, and test_df DataFrames to the files SAVE_PATH + "train.xlsx", SAVE_PATH + "val.xlsx", and SAVE_PATH + "test.xlsx", respectively
3.3. Vision Transformer Model
In order to handle the training dataset effectively, we devised a strategy tailored to the resource limitations of Colab. To overcome potential memory constraints, we implemented a custom dataset class, derived from an existing class, which enabled us to efficiently manipulate the dataset. This involved implementing a method to retrieve the item and label from the dataset at a specific index, allowing us to load and preprocess the images in a memory-friendly manner.
Next, we developed a Vision Transformer (ViT) class, inheriting from another class, to or-
chestrate the training, validation, and testing stages. Within this class, we defined evaluation
metrics to assess the model’s performance across epochs and in the final outcome.
To implement the ViT model itself, we integrated the state-of-the-art ViT model into
a ”ViT” object using our custom class. Additionally, we incorporated techniques to prevent
overfitting and leverage the best weights. We employed early stopping and model checkpoint
objects, which facilitated the training process. If the validation loss did not decrease for seven
steps, training was stopped, and the best weights were saved for subsequent testing of the
model.
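A minimal sketch of these two callbacks with PyTorch Lightning is shown below; the monitored metric name is an assumption.

from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Stop training if the validation loss has not improved for 7 consecutive checks,
# and keep only the best-performing weights for later testing.
early_stopping = EarlyStopping(monitor="val_loss", patience=7, mode="min")
checkpoint = ModelCheckpoint(monitor="val_loss", save_top_k=1, mode="min")
# Both objects are then passed to the trainer, e.g. pl.Trainer(callbacks=[early_stopping, checkpoint]).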
For training the ViT model, we utilized a ”Trainer” object that streamlined the training
process. By providing the training dataset and validation dataset to the trainer, we iteratively
updated the model’s parameters. Finally, we evaluated the model’s performance using the test
dataset, enabling a comprehensive assessment of its generalization capabilities. Throughout
this entire process, we saved the model for future use with our proposed model.
Finally, to evaluate our model, we plotted the train loss, train accuracy, validation loss, and validation accuracy graphs to obtain more accurate feedback on whether the model learned properly.
3.4. Convolutional Neural Network (CNN)
The use of Convolutional Neural Network (CNN) models in mammography tasks has
become increasingly popular in recent years. A CNN is a type of deep learning model that is
particularly effective in analyzing visual data, such as images. It consists of multiple layers
that perform various operations on the input data to extract meaningful features and make
accurate predictions.
The convolution layer is responsible for applying filters or kernels to the input data. These
filters are small matrices that are convolved with the input image, performing element-wise
multiplication and summation operations. The purpose of this convolution operation is to
detect important features or patterns in the data, such as edges, textures, or shapes. Each
filter in the convolution layer specializes in detecting a specific feature, and by applying
multiple filters, the layer can capture a wide range of features simultaneously.
Following the convolution layer, we have the max pooling layer. Its role is to downsample
the feature maps generated by the convolution layer. By reducing the spatial dimensions
of the feature maps, the pooling layer retains the most salient information while discarding
unnecessary details. The most common pooling technique is max pooling, where the input
is divided into non-overlapping regions, and only the maximum value within each region is
retained. This process helps to make the network more robust to small spatial translations
and reduces the number of parameters, making the model more computationally efficient.
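As a small numerical illustration of 2x2 max pooling (with arbitrary toy values):

import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 2],
                        [5, 2, 8, 7],
                        [1, 0, 3, 4]])

# Split the 4x4 map into non-overlapping 2x2 regions and keep each region's maximum.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 2]
               #  [5 8]]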
In the presented CNN architecture, three key blocks, each consisting of a convolution layer followed by a max pooling layer, have been integrated. This arrangement allows the network to progressively learn and extract higher-level features from the input data. The output of these layers is then fed into a classifier for making predictions. Our architecture can be seen in Figure 3.14.
To develop the CNN model, the training data was used to train the network, while the
validation data was employed to evaluate its performance. The model weights were saved
after training, which can be utilized for testing on new, unseen data. Before using the test
data, the train loss, train accuracy, validation loss, and validation accuracy were plotted on
graphs. These visualizations provide valuable insights into how well the model has learned
from the training data and whether it is generalizing effectively to unseen data.
3.5. CNN COOT Optimization

In this specific implementation, the COOT algorithm is used to optimize a CNN model
by finding the best parameters for each layer. The CNN model consists of three convolutional
layers followed by a flatten layer and a fully connected layer. The parameters to be optimized
include the number of filters (nk), kernel size (ks), pooling size (ps), and dropout rate (dr) for
each convolutional layer.
We defined a main function as the fitness function that represents the CNN model. It takes
a parameter vector ‘x‘ as input and builds the CNN model using the provided parameters. The
model is then trained using the ‘fit‘ function with the specified training and validation data
generators.
During the training process, the validation loss, loss, and accuracy values are recorded
and stored in a DataFrame called ‘df‘. The ‘df‘ DataFrame is then saved to a CSV file for
further analysis.
The COOT algorithm is initialized with a population size of 5 (‘pop‘) and a maximum
number of iterations of 10 (‘MaxIter‘). The dimension of the solution space is 12 (‘dim‘),
corresponding to the 12 parameters to be optimized. The lower and upper bounds of the
parameter search interval are specified using the ‘lb‘ and ‘ub‘ arrays.
The ‘COOT‘ function implements the main optimization loop of the COOT algorithm.
It initializes the leader positions and fitness values and performs the iterative optimization
process. The algorithm updates the positions of the coots and leaders based on the defined
equations and boundary checks. The convergence curve and the best solution found are stored
and returned.
Overall, this code implements the COOT algorithm (Figure 3.15) to optimize a CNN
model by finding the best parameters for each layer. The algorithm iteratively updates the
solutions to improve the model’s performance, and the convergence curve is recorded for
analysis.
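The following sketch shows the general idea of such a fitness function, where a candidate vector x is decoded into per-layer hyperparameters of a Keras CNN. The parameter ordering, bounds, input size, and short training run are assumptions for illustration, not the exact implementation used in this work.

import numpy as np
from tensorflow.keras import layers, models

def build_cnn(x, input_shape=(224, 224, 3), num_classes=3):
    """Decode the 12-element vector x into a CNN.

    Assumed layout: for each of the 3 convolutional blocks, x holds
    [number of filters nk, kernel size ks, pooling size ps, dropout rate dr].
    """
    model = models.Sequential()
    for i in range(3):
        nk, ks, ps, dr = x[4 * i: 4 * i + 4]
        kwargs = {"activation": "relu", "padding": "same"}
        if i == 0:
            kwargs["input_shape"] = input_shape
        model.add(layers.Conv2D(int(nk), (int(ks), int(ks)), **kwargs))
        model.add(layers.MaxPooling2D((int(ps), int(ps))))
        model.add(layers.Dropout(float(dr)))
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def fitness(x, train_gen, val_gen):
    # The COOT algorithm minimizes this value (the final validation loss).
    model = build_cnn(x)
    history = model.fit(train_gen, validation_data=val_gen, epochs=2, verbose=0)
    return history.history["val_loss"][-1]

# Illustrative lower and upper bounds for the 12 parameters:
lb = np.array([16, 3, 2, 0.1] * 3)
ub = np.array([128, 5, 3, 0.5] * 3)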
3.6. Vision Transformer COOT Optimization on Classifier Layer
To implement our model, we defined a class called CustomModel. This module inte-
grated the Vision Transformer model, the ANN classifier, and various activation functions.
We used the Vision Transformer model that we trained before as the base of our Vision Transformer for the first method. For the second method, we used the untrained version of the Vision Transformer model.
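A minimal sketch of such a wrapper in PyTorch is given below. The classifier sizes, dropout rate, and the use of the timm library for the ViT backbone are assumptions; the pretrained flag corresponds to the choice between the two methods described above.

import torch.nn as nn
import timm  # assumed source of the ViT backbone

class CustomModel(nn.Module):
    def __init__(self, num_classes=3, hidden=256, dropout=0.3, pretrained=True):
        super().__init__()
        # ViT backbone without its own classification head.
        self.vit = timm.create_model("vit_base_patch16_224",
                                     pretrained=pretrained, num_classes=0)
        # Small ANN classifier on top of the ViT features.
        self.classifier = nn.Sequential(
            nn.Linear(self.vit.num_features, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.vit(x))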
For training and evaluation, we utilized train and validation datasets. During training,
we employed an early stopping callback to monitor the validation loss and halt training if
overfitting occurred. We recorded the training and validation loss as well as the accuracy for
each epoch.
To evaluate the performance of our model, we calculated the validation loss and accuracy
after each epoch. We also saved the best-performing model based on the validation loss.
We used 4 parameters for COOT optimization, to optimize the dense layer parameters and dropout rates. Additionally, we used a maximum of 10 iterations and a population of 5 coots. The flowchart of the COOT algorithm can be seen in Figure 3.15.
Finally, we analyzed the results by comparing the performance of different model con-
figurations and selecting the configuration with the best validation accuracy as the optimal
model architecture.
Figure 3.16. The State Diagram of the TEKNOFEST Pipeline    Figure 3.17. The State Diagram of the MIAS Pipeline
3.7. Pipelines
In order to accommodate the differences between the two datasets, we have developed
two separate pipelines. Each dataset has its own unique preprocessing steps and utilizes dif-
ferent models. The state diagrams for these pipelines, based on the TEKNOFEST Mammog-
raphy dataset and the mini MIAS dataset, are depicted in Figures 3.16 and 3.17 respectively.
These diagrams provide a visual representation of the various states involved in the pipelines’
workflow, showcasing the distinct processes employed for each dataset.
4. DESIGN AND IMPLEMENTATION
The implementation of this project was done in Python, utilizing the Google Colab en-
vironment. Python was chosen as the programming language due to its extensive library
support for preprocessing, deep learning, and machine learning tasks. The most frequently
used libraries in this project include sklearn, keras, pytorch, pandas, numpy, and matplotlib.
Python was the preferred programming language for this project due to its robust library
ecosystem, which offers a wide range of tools and functionalities for tasks such as data pre-
processing, deep learning, and machine learning. The Google Colab environment was uti-
lized for its convenience and accessibility, providing a web-based platform for coding and
executing Python scripts.
Among the various libraries used in this implementation, sklearn (scikit-learn) is a popu-
lar machine learning library in Python, providing efficient tools for data preprocessing, fea-
ture extraction, and model evaluation. Keras is a high-level neural networks library, known
for its user-friendly interface and compatibility with different deep learning frameworks.
PyTorch, another widely used library, offers a dynamic neural network framework, en-
abling easy implementation of complex neural architectures and facilitating efficient training
and inference processes. Pandas is a versatile library for data manipulation and analysis,
providing convenient data structures and functions for handling structured data.
Numpy, a fundamental library for scientific computing in Python, offers support for large,
multi-dimensional arrays and a wide range of mathematical operations. Lastly, Matplotlib is
a plotting library that enables the creation of visualizations and graphs to effectively commu-
nicate data insights.
By leveraging these libraries, the project benefits from the extensive functionality and
convenience they provide, allowing for efficient implementation of preprocessing, deep learn-
ing, and machine learning tasks in Python.
4.1. Preprocess of MIAS Dataset

The implemented code is responsible for conducting various operations on images, generating new datasets for training, validation, and testing purposes, and augmenting the dataset. In order to achieve this, the code makes use of several libraries including pandas, numpy, random, and cv2 (OpenCV). Additionally, it utilizes functions from the PIL (Python Imaging Library) and sklearn.preprocessing modules.
To begin with, the code imports the necessary libraries and initializes three empty dataframes
that will be used to store the training, validation, and testing data. Furthermore, three func-
tions are defined: ‘gaussianBlur‘, which applies Gaussian blur to an input image, and ‘ro-
tateLeft‘ and ‘rotateRight‘, which rotate an image to the left and right respectively.
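A minimal sketch of what these three helper functions could look like with OpenCV; the kernel size and the ±10-degree angles follow the augmentation scheme described in the analysis chapter, but the exact values are assumptions.

import cv2

def gaussianBlur(image):
    # Apply a Gaussian blur to introduce mild smoothing/noise.
    return cv2.GaussianBlur(image, (5, 5), 0)

def _rotate(image, angle):
    h, w = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, matrix, (w, h))

def rotateLeft(image):
    return _rotate(image, 10)    # rotate 10 degrees counter-clockwise

def rotateRight(image):
    return _rotate(image, -10)   # rotate 10 degrees clockwise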
Another important function in the code is ‘createRow‘, which takes an index and a dataframe
(‘df‘) as input and generates a new row using specific values extracted from the dataframe.
This function is crucial as it allows for the creation of new data rows for each processed
image.
The code proceeds by initializing an array called ‘randArr‘ that contains random indices.
This array is used to shuffle the order in which the images are processed. For each image,
the code performs the following steps:
2. The corresponding image file is opened based on the randomly selected index, and it
is converted into a numpy array.
3. Depending on the random number, either Gaussian blur or rotation is applied to the
image, resulting in a new image.
4. The ‘createRow‘ function is called to generate a new data row for the image.
5. The new row is concatenated with the original dataframe (‘df‘), thereby updating it
with the newly generated data.
6. Based on the current iteration index, the data is divided into the appropriate dataframes
for training, validation, and testing. The transformed images are also saved in their respective
folders.
7. Finally, the updated dataframe (‘df‘) is saved to an Excel file named ”info.xlsx” at
the specified save path. Additionally, the labels for the train, validation, and test datasets are
separately saved to Excel files.
It is important to note that the OpenCV library's functions were utilized for applying the Gaussian blur and rotation augmentation methods. The remaining parts of the algorithm were implemented by us.
4.2. Preprocess of TEKNOFEST Mammography Dataset

We used the OpenCV library's function to apply the NOT bitwise operator to all of the images, reading them one by one and saving them to the same directory using a for loop and iterating over the directory names with the 'os' library.

After that, we implemented the code for flipping the LCC and LMLO images and creating the dataset by splitting the images and labels into train, validation, and test sets.
2. ‘flipImage(image)‘: This function flips the input image horizontally using the ‘np.fliplr‘
function and returns the flipped image.
4. ‘saveImages(saveArray, dir)‘: This function saves the images in the ‘saveArray‘ list
to the specified directory (‘dir‘) using the ‘cv2.imwrite‘ function.
5. ‘createRows1(saveNameArray, df, index)‘: This function creates rows of data for the
first dataframe (‘newDf1‘) based on the ‘saveNameArray‘, the original dataframe (‘df‘), and
the current ‘index‘ value. It extracts attributes from the save names and combines them with
specific values from the original dataframe, forming a dictionary for each row. The rows are
then collected into a list and returned.
6. ‘appendToDf(newDf, rows)‘: This function appends the rows of data in the ‘rows‘
list to the provided dataframe (‘newDf‘). It iterates over the rows and uses ‘pd.concat‘ to
concatenate each row as a new DataFrame to the existing ‘newDf‘. The updated ‘newDf‘ is
returned.
The remaining part of the code initializes several dataframes (‘newDf1‘, ‘newDf2‘, ‘trainDf1‘,
‘valDf1‘, ‘testDf1‘, ‘trainDf2‘, ‘valDf2‘, ‘testDf2‘) and other necessary variables. Then, it
enters a loop that iterates over the indices of the original dataframe (‘df‘).
Within each iteration, the code performs the following steps:
- Reads the RCC, RMLO, LCC, and LMLO images using the 'readImages' function.
- Checks if the image shapes are valid (1080x1080x3). If not, it skips to the next iteration.
- Flips the LCC and LMLO images using the 'flipImage' function.
- Generates saving names for the images using the 'createSavingNames' function.
- Creates rows of data for 'newDf1' using the 'createRows1' function.
- Appends the rows to the corresponding dataframes ('newDf1', 'trainDf1', 'valDf1', 'testDf1') using the 'appendToDf' function.
- Saves the images to the appropriate folders using the 'saveImages' function.
Finally, the updated dataframes (‘newDf1‘, ‘trainDf1‘, ‘valDf1‘, ‘testDf1‘) are saved to
Excel files, and the program finishes.
This code implementation demonstrates the utilization of various functions and dataframes
to process and organize the image data, create new rows of data, and save the transformed
images and data to Excel files.
Only for flipping the images did we use OpenCV's built-in function. The rest of the code was implemented by us.
4.3. Vision Transformer Model

The developed code begins by importing the required libraries, including pandas, os, PIL, torch, and various modules from torchvision, torch.nn, and other packages. These libraries are necessary for data handling, model creation, optimization, and evaluation.

Next, a custom dataset class ('CustomDataset') is defined, which inherits from the 'torch.utils.data.Dataset' class. This class takes in the root directory of the dataset, the path to an Excel file containing labels, and an optional transform parameter. In the '__getitem__' method, it retrieves the
image path and label corresponding to the given index from the Excel file. It then opens the
image using PIL, converts it to grayscale, and applies the specified transform. Finally, the
transformed image and label are returned.
Instances of the ‘CustomDataset‘ class are created for the training, validation, and test
sets, specifying the root directory and Excel file paths for each set. These datasets are then
passed to ‘DataLoader‘ objects, which enable efficient loading of the data in batches during
training and evaluation.
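A minimal sketch of this dataset class and the corresponding loader is given below; the column names, file extension, and transform are assumptions based on the description above.

import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class CustomDataset(Dataset):
    def __init__(self, root_dir, excel_path, transform=None):
        self.root_dir = root_dir
        self.labels = pd.read_excel(excel_path)
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Assumed columns: "REFNUM" (file name without extension) and "BG" (label).
        row = self.labels.iloc[idx]
        image_path = os.path.join(self.root_dir, row["REFNUM"] + ".png")
        image = Image.open(image_path).convert("L")  # convert to grayscale
        if self.transform:
            image = self.transform(image)
        return image, int(row["BG"])

transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_ds = CustomDataset("data/train", "train.xlsx", transform)
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)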
The loss function (‘nn.CrossEntropyLoss‘) and optimizer (‘optim.Adam‘) are defined for
training the model.
An instance of ‘ViTClassifier‘ is created, along with the necessary callbacks for early
stopping and model checkpointing. A PyTorch Lightning trainer (‘pl.Trainer‘) is initialized
with the desired settings, including the maximum number of epochs, the logger for Tensor-
Board logging, and the callbacks. The ‘fit‘ method is called to train the model on the provided
dataloaders.
After training, the best model is loaded using the ‘ModelCheckpoint‘ callback. The ‘test‘
method is then called to evaluate the model on the test dataloader. The model is put into eval-
uation mode (‘vit_classifier.eval()‘) and iterates over the test dataloader to collect predicted
labels and true labels. The F1 score is calculated using ‘sklearn.metrics.f1_score‘, and the
confusion matrix is computed using ‘sklearn.metrics.confusion_matrix‘.
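A minimal sketch of this evaluation loop; here vit_classifier and test_loader stand for the trained model and test dataloader described above.

import torch
from sklearn.metrics import f1_score, confusion_matrix

vit_classifier.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels in test_loader:
        logits = vit_classifier(images)
        all_preds.extend(logits.argmax(dim=1).cpu().tolist())
        all_labels.extend(labels.cpu().tolist())

print("F1 score:", f1_score(all_labels, all_preds, average="macro"))
print(confusion_matrix(all_labels, all_preds))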
In summary, this code implements a pipeline for training, validating, and testing a Vi-
sion Transformer model for image classification. It demonstrates the use of custom datasets,
dataloaders, and PyTorch Lightning for efficient training and evaluation of deep learning
models.
4.3.1. Lightning Module Implementation

The ViTClassifier Lightning module extends the pl.LightningModule class and overrides several key methods:
forward: This method defines the forward pass of the model, where the input data is
passed through the model layers to generate the output logits.
training_step: This method is called during the training loop for each batch of data. It
calculates the model’s logits, computes the loss based on the provided criterion, and logs the
training loss and accuracy.
validation_step: Similar to training_step, this method is called during the validation loop to calculate the loss and accuracy on the validation set.

test_step: This method is called during the testing loop to calculate the loss and accuracy on the test set. Additionally, it collects the predicted labels and true labels for further evaluation.
configure_optimizers: This method is responsible for defining the optimizer for the model.
It returns an instance of the optimizer, which is used during training.
By implementing these methods within the Lightning module, the training, validation,
and testing steps are clearly defined and separated, making the code more modular and main-
tainable. Additionally, PyTorch Lightning provides many other features and utilities, such
as automatic checkpointing, early stopping, and distributed training support, which further
simplify the training process.
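Put together, the Lightning module could be sketched as follows. The backbone construction (torchvision's ‘vit_b_16‘ as a stand-in for the project's own pre-trained ViT), the learning rate, and the logged metric names are assumptions made for illustration.

import torch
import torch.nn as nn
import pytorch_lightning as pl
from torchvision.models import vit_b_16, ViT_B_16_Weights

class ViTClassifier(pl.LightningModule):
    def __init__(self, num_classes=3, lr=1e-4):
        super().__init__()
        # Stand-in backbone: a pre-trained ViT-B/16 with a fresh classification head
        backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
        backbone.heads = nn.Linear(backbone.hidden_dim, num_classes)
        self.model = backbone
        self.criterion = nn.CrossEntropyLoss()
        self.lr = lr

    def forward(self, x):
        return self.model(x)                        # logits

    def _step(self, batch):
        images, labels = batch
        logits = self(images)
        loss = self.criterion(logits, labels)
        acc = (logits.argmax(dim=1) == labels).float().mean()
        return loss, acc

    def training_step(self, batch, batch_idx):
        loss, acc = self._step(batch)
        self.log("train_loss", loss)
        self.log("train_acc", acc)
        return loss

    def validation_step(self, batch, batch_idx):
        loss, acc = self._step(batch)
        self.log("val_loss", loss)
        self.log("val_acc", acc)

    def test_step(self, batch, batch_idx):
        # Loss/accuracy are logged here; predictions are collected separately for sklearn metrics
        loss, acc = self._step(batch)
        self.log("test_loss", loss)
        self.log("test_acc", acc)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)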
The developed code consists of an implementation for an image classification task using
a convolutional neural network (CNN). Here is an explanation of the code and its functions:
First, the necessary libraries and modules are imported, including ‘pandas‘, ‘numpy‘,
‘tensorflow‘, and ‘keras‘. These libraries provide tools for data handling, image preprocess-
ing, and building neural networks.
Next, the code loads the labels for the training, testing, and validation datasets from Ex-
cel files using the ‘pd.read_excel‘ function. The file paths for the image datasets are also
specified.
The code then sets up the image data generators using ‘ImageDataGenerator‘ from Keras.
These generators perform data augmentation and rescaling on the images. Separate genera-
tors are created for the training, validation, and testing datasets.
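The generator setup might look roughly like the sketch below; the directory paths, dataframe column names, and augmentation parameters are assumptions for illustration.

import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_labels = pd.read_excel("labels/train.xlsx")            # assumed path and columns
val_labels = pd.read_excel("labels/val.xlsx")
train_labels["class"] = train_labels["class"].astype(str)    # flow_from_dataframe expects string labels
val_labels["class"] = val_labels["class"].astype(str)

# Augmentation only on the training images; validation/test images are just rescaled
train_datagen = ImageDataGenerator(rescale=1.0 / 255, shear_range=0.2,
                                   zoom_range=0.2, horizontal_flip=True)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_dataframe(
    train_labels, directory="data/train", x_col="filename", y_col="class",
    target_size=(224, 224), batch_size=16, class_mode="categorical")
val_generator = val_datagen.flow_from_dataframe(
    val_labels, directory="data/val", x_col="filename", y_col="class",
    target_size=(224, 224), batch_size=16, class_mode="categorical")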
The number of training and validation samples, as well as other hyperparameters like the
number of epochs and batch size, are specified.
Next, the code creates a CNN model using the ‘Sequential‘ class from Keras. The model
consists of several convolutional layers, max-pooling layers, a flatten layer, and fully con-
nected layers. The model is compiled with the categorical cross-entropy loss function and
the SGD optimizer.
The model is then trained using the ‘fit‘ function. The training data is provided through
the ‘train_generator‘, and the validation data is provided through the ‘val_generator‘. The
training process is logged, and the training and validation loss and accuracy metrics are
recorded.
After training, the model is evaluated on the test dataset using the ‘evaluate‘ function,
and the results are stored in the ‘score‘ variable. The validation and training loss values are
extracted from the training history.
The trained model is saved to a file using the ‘save_model‘ function, and its architecture
is visualized and saved as an image using the ‘plot_model‘ function.
Finally, the code plots the validation loss and training loss curves using ‘matplotlib‘.
It also loads the saved model, makes predictions on the test dataset, and generates a clas-
sification report and confusion matrix using ‘classification_report‘ and ‘confusion_matrix‘
functions from ‘sklearn.metrics‘.
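These steps could be wired together roughly as in the sketch below, where ‘model‘ is the CNN built as described above (a definition sketch follows the architecture breakdown further down). The file names and plotting details are assumptions, and ‘plot_model‘ additionally requires the pydot and graphviz packages.

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import plot_model
from sklearn.metrics import classification_report, confusion_matrix

history = model.fit(train_generator, epochs=epochs, validation_data=val_generator)

score = model.evaluate(test_generator)               # [loss, accuracy]
train_loss = history.history["loss"]
val_loss = history.history["val_loss"]

model.save("cnn_model.h5")                           # assumed file name
plot_model(model, to_file="cnn_model.png", show_shapes=True)

plt.plot(train_loss, label="training loss")
plt.plot(val_loss, label="validation loss")
plt.legend()
plt.savefig("loss_curves.png")

# Reload the saved model and evaluate it with sklearn metrics; the test generator
# must be created with shuffle=False so that test_generator.classes lines up
# with the predictions.
model = load_model("cnn_model.h5")
probs = model.predict(test_generator)
y_pred = np.argmax(probs, axis=1)
y_true = test_generator.classes
print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))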
Overall, this code demonstrates the process of training a CNN for image classification us-
ing Keras and TensorFlow libraries, including data loading, data augmentation, model build-
ing, training, evaluation, and result analysis.
1. Input Layer:
- The input layer is defined using the ‘input_shape‘ parameter, which specifies the shape of the input image tensors.
- The input images are expected to have a width and height of 224 pixels and 3 color channels (RGB), as defined by ‘img_width‘, ‘img_height‘, and ‘input_shape‘.
2. Convolutional Layers:
- The CNN starts with a convolutional layer defined by ‘Conv2D‘ with 32 filters, a filter size of (3, 3), and the ReLU activation function.
- This layer is followed by a max-pooling layer defined by ‘MaxPooling2D‘ with a pool size of (2, 2).
- The process is repeated with another convolutional layer of 64 filters and a max-pooling layer.
- A third convolutional layer with 128 filters and a max-pooling layer is added for more complex feature extraction.
3. Flatten Layer:
- The output from the last convolutional layer is flattened into a 1-dimensional vector using the ‘Flatten‘ layer.
- This allows the data to be fed into a fully connected layer.
4. Fully Connected Layers:
- A fully connected layer with 64 units and the ReLU activation function is added using ‘Dense‘.
- A dropout layer with a dropout rate of 0.5 is included to prevent overfitting.
- Finally, the output layer with 3 units (corresponding to the number of classes) and the softmax activation function is added.
5. Model Compilation:
- The model is compiled using the stochastic gradient descent (SGD) optimizer with a learning rate of 0.001.
- The categorical cross-entropy loss function is used for multi-class classification.
- The accuracy metric is specified to monitor the model’s performance during training.
The CNN architecture implemented above follows a common pattern of alternating con-
volutional and pooling layers to extract hierarchical features from the input images. The fi-
nal fully connected layers are responsible for learning the classification decision boundaries
based on the extracted features.
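Under exactly these choices, the model definition can be sketched as follows (only the variable names are illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import SGD

img_width, img_height = 224, 224
input_shape = (img_width, img_height, 3)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(3, activation="softmax"),          # 3 output classes
])

model.compile(optimizer=SGD(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])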
The developed code implements the COOT algorithm, which is used for hyperparameter optimization
of convolutional neural networks (CNNs). In this code, we applied the COOT algorithm to the
previously implemented CNN model. The implementation utilizes various modules and functions
from the Keras and TensorFlow libraries.
The code begins by importing the necessary libraries, including Keras, TensorFlow, NumPy,
and Pandas, to handle image data, neural network models, data preprocessing, and evaluation.
It also imports the required modules for image data generation and metrics calculation.
Next, the file paths for the training, validation, and testing datasets are specified. The
images are loaded using the ImageDataGenerator class from Keras, which performs data
augmentation techniques such as rescaling, shear range, zoom range, and horizontal flip to
enhance the dataset.
The labels for the datasets are read from Excel files using the Pandas library. The labels
are then converted to categorical format for multi-class classification using the
ImageDataGenerator class provided by the Keras library.
The training, validation, and testing data generators are created using the flow_from_dataframe
function, which generates batches of augmented image data from the specified directories.
The generators provide the image data along with their corresponding labels for training,
validation, and testing.
The same CNN model was used for hyperparameter optimization. The CNN model architecture
is defined using the Sequential class from Keras. The model consists of three
convolutional layers with specified filter sizes, kernel sizes, activation functions, and pool-
ing sizes. Dropout layers are added to prevent overfitting. The flattened output is passed
through fully connected layers with a ReLU activation function. The output layer uses the
softmax activation function for multi-class classification.
The model is compiled with the categorical_crossentropy loss function and the SGD op-
timizer. The training process is initiated using the fit function, where the model is trained on
the training data generator for the specified number of epochs. Early stopping is applied to
monitor the validation loss and stop training if there is no improvement.
The model’s performance is evaluated using the evaluate function, which computes the
loss and accuracy on the validation dataset.
The COOT algorithm is then implemented using the defined functions. The algorithm
starts by initializing the Vulture population, including the leaders and coots, within the speci-
fied bounds. Fitness values are calculated for each vulture using the provided fitness function.
During the iterations of the COOT algorithm, leaders and coots exchange positions based
on certain conditions. The best fitness score and corresponding position are updated through-
out the iterations. The convergence curve, which tracks the best fitness score over iterations,
is also recorded.
Finally, the implementation includes a function for boundary checking to ensure that the
vultures’ positions remain within the defined bounds.
Overall, the implemented code combines the COOT algorithm with CNNs using Keras
to optimize hyperparameters for multi-class classification tasks. The code provides a frame-
work for automatically tuning the architecture of CNNs and evaluating their performance on
image datasets.
4.5.1. Implementing the COOT Optimization Algorithm
2. Fitness Evaluation:
- The ‘evaluate_fitness‘ function calculates the fitness value for each vulture in the population.
- In our implementation, the fitness value is determined by training and evaluating the neural network architecture associated with each vulture on a validation dataset.
- The fitness value is typically a performance metric such as classification accuracy or loss. It indicates how well the neural network performs on the given task.
- The algorithm iterates until a termination criterion is met, as specified by the ‘max_iterations‘ parameter.
- The termination criterion can also be based on the convergence of fitness values or a maximum computational budget.
7. Convergence Curve:
- The ‘convergence_curve‘ list stores the best fitness value achieved in each iteration.
- This list is used to plot a convergence curve, which shows how the fitness score improves over the iterations.
A reference implementation of the COOT algorithm was found online and adapted to Python
for this project.
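As a rough illustration of how such a loop is organized, the sketch below condenses the COOT-specific leader/coot position-update rules into a single simplified move toward the current best solution; the population size, iteration budget, and step sizes are assumptions, and the real update equations of COOT are more involved.

import numpy as np

def border_check(positions, lb, ub):
    # Clip every candidate back into the search bounds
    return np.clip(positions, lb, ub)

def coot_like_search(fitness_fn, lb, ub, n_agents=10, max_iterations=20):
    lb, ub = np.array(lb, dtype=float), np.array(ub, dtype=float)
    dim = len(lb)
    positions = lb + np.random.rand(n_agents, dim) * (ub - lb)     # initial population
    fitness = np.array([fitness_fn(p) for p in positions])         # e.g. validation loss
    best_idx = int(np.argmin(fitness))                             # minimization
    best_pos, best_fit = positions[best_idx].copy(), float(fitness[best_idx])
    convergence_curve = []

    for _ in range(max_iterations):
        for i in range(n_agents):
            # Simplified movement: step toward the best position plus a small random perturbation
            r = np.random.rand(dim)
            positions[i] += r * (best_pos - positions[i]) \
                            + 0.1 * (np.random.rand(dim) - 0.5) * (ub - lb)
        positions = border_check(positions, lb, ub)
        fitness = np.array([fitness_fn(p) for p in positions])
        if fitness.min() < best_fit:
            best_fit = float(fitness.min())
            best_pos = positions[int(np.argmin(fitness))].copy()
        convergence_curve.append(best_fit)                         # best score per iteration

    return best_pos, best_fit, convergence_curve

In our setting, the fitness function would train the CNN with the hyperparameters encoded in the position vector and return, for example, the validation loss or the negative validation accuracy.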
The developed code implements a COOT optimization algorithm for image classification
using the Vision Transformer (ViT) model.
The code begins by importing the necessary libraries and dependencies, including pandas,
numpy, torch, and various modules from the Keras and TensorFlow libraries. It also mounts
the Google Drive to access the dataset files.
Next, the code defines a custom dataset class, ‘CustomDataset‘, which inherits from
the ‘torch.utils.data.Dataset‘ class. This class reads image and label data from Excel files
and transforms the images using specified transformations such as resizing, converting to
grayscale, and normalization.
The transformations are defined using the ‘transforms.Compose‘ function from torchvi-
sion. The dataset is then instantiated for the training, validation, and test sets using the ‘Cus-
tomDataset‘ class and ‘DataLoader‘ for efficient data loading.
Following the dataset setup, the code defines a ViTClassifier class, which is a PyTorch
Lightning module. This class encapsulates the ViT model, criterion (loss function), and op-
timizer. It defines methods for the training, validation, and testing steps, where the forward
pass is performed, and the loss and accuracy are calculated and logged.
Next, a custom model class, ‘CustomModel‘, is defined. This class extends ‘torch.nn.Module‘
and represents the customized ViT model for image classification. It loads a pre-trained ViT
model, freezes its parameters, and appends fully connected layers for classification. The
model’s forward method performs the necessary computations and returns the output proba-
bilities.
The code also includes an ‘EarlyStopping‘ class, which is a callback for early stopping
during training based on a specified monitoring metric and patience value.
A helper function named ‘fun‘ is defined, which takes a configuration vector as input and
performs the training and evaluation process for the ViT model using the specified configu-
ration. It returns the validation accuracy.
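A possible shape of this helper is sketched below; the hyperparameters encoded in the configuration vector (here a learning-rate exponent and a hidden-layer width), the short training budget per candidate, and the reuse of the data loaders and the ‘CustomModel‘ class sketched in Section 4.6.1 are all assumptions for illustration.

import torch
import torch.nn as nn

def fun(config):
    # config is the candidate position proposed by the COOT optimizer
    lr = 10 ** (-float(config[0]))        # assumed: config[0] is the learning-rate exponent
    hidden_units = int(config[1])         # assumed: config[1] is the hidden-layer width

    model = CustomModel(num_classes=3, hidden_units=hidden_units)
    params = [p for p in model.parameters() if p.requires_grad]   # only the unfrozen head
    optimizer = torch.optim.Adam(params, lr=lr)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(3):                    # short training budget per candidate
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total                # validation accuracy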
Next, several utility functions are defined, such as initializing the vulture population (‘ini-
tial‘ function), calculating fitness values for each vulture (‘CaculateFitness1‘ function), and
boundary checking to ensure the vultures’ positions are within the specified bounds (‘Bor-
derCheck1‘ function).
The code also includes a convergence curve to track the global best fitness value over
iterations. Finally, the ‘COOT‘ function saves the results to a CSV file for analysis and
prints the best configuration and validation accuracy.
In summary, the code combines the ViT model with the COOT Optimization Algorithm
for image classification. It defines the necessary classes, functions, and utilities to train and
evaluate the ViT model using different configurations optimized by the COOT Optimization
Algorithm. The results are saved for further analysis and reporting.
4.6.1. Implementing the Custom ViT Model
1. Pre-trained ViT Model: The saved Vision Transformer model is loaded and its classifier
layer is removed. We then froze all of its layers for the first training run, and afterwards
unfroze them for a second training run in order to compare the results.
2. Flattening Layer: After obtaining the feature representation from the pre-trained ViT
model, the feature tensor is flattened into a 2D tensor using the ‘view‘ method. This trans-
formation converts the high-dimensional tensor into a single vector, which is then fed into
the fully connected layers.
3. Fully Connected Layers: Following the flattening operation, the flattened tensor is
passed through a series of fully connected layers (also known as dense layers or linear layers)
in a sequential manner. Each fully connected layer is represented by the ‘nn.Linear‘ class in
PyTorch.
- The fully connected layers enable the model to learn non-linear patterns and relation-
ships within the input data. Each layer performs a matrix multiplication between the input
tensor and a weight matrix, followed by the application of a non-linear activation function to
introduce non-linearity into the model.
- In the implemented code, two fully connected layers are used. The first layer is created
with ‘nn.Linear‘ and takes the flattened tensor as input with a specified number of input fea-
tures (determined by the size of the flattened tensor) and a chosen number of output features
(typically a hyperparameter that can be adjusted). The output of this layer is passed through
a rectified linear unit (ReLU) activation function using ‘nn.ReLU‘.
- The output of the first fully connected layer is then fed into the second fully connected
layer, which takes the number of input features from the previous layer and maps it to the
number of output features equal to the ‘num_classes‘ parameter. This layer is followed by a
softmax activation function, which produces a probability distribution over the classes.
- The number of fully connected layers and their sizes can be customized based on the
specific requirements of the classification task and the complexity of the dataset.
4. Output Layer: The final fully connected layer in the model outputs logits, which are
raw values that represent the predictions for each class. These logits can be interpreted as the
model’s confidence scores for each class. To obtain a probability distribution over the classes,
a softmax activation function is applied to the logits, which converts them into probabilities.
The class with the highest probability is typically considered as the predicted class by the
model.
By combining the pre-trained ViT model with additional fully connected layers, the ‘Cus-
tomModel‘ class extends the capabilities of the pre-trained model, allowing for fine-tuning
and adaptation to a specific image classification task. The fully connected layers enable the
model to learn task-specific features and decision boundaries, enhancing its performance on
the target task.
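The structure described above can be sketched as follows; torchvision's ‘vit_b_16‘ stands in for the saved pre-trained ViT used in the project, and the hidden-layer width is an assumed hyperparameter. The sketch returns logits and leaves the softmax to the caller, since ‘nn.CrossEntropyLoss‘ already expects raw logits.

import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

class CustomModel(nn.Module):
    def __init__(self, num_classes=3, hidden_units=256):
        super().__init__()
        # Stand-in for the saved pre-trained ViT with its classifier layer removed
        self.backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
        self.backbone.heads = nn.Identity()
        for p in self.backbone.parameters():       # freeze the backbone for the first training run
            p.requires_grad = False

        self.fc1 = nn.Linear(768, hidden_units)    # 768 = ViT-B/16 feature dimension
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_units, num_classes)

    def forward(self, x):
        features = self.backbone(x)                # (batch, 768) class-token features
        features = features.view(features.size(0), -1)
        logits = self.fc2(self.relu(self.fc1(features)))
        return logits                              # apply torch.softmax(logits, dim=1) for probabilities

# Unfreezing the backbone for the second training run described above:
# for p in model.backbone.parameters():
#     p.requires_grad = True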
5. TEST AND RESULTS
In this section, we compare the results of the models trained on the same dataset. The models
trained on the MIAS dataset are also compared with previous studies conducted on the MIAS
dataset.
We aimed to classify the density of each mammography image within the dataset. We
trained multiple models, as explained earlier, and obtained the following results for compar-
ison.
The Vision Transformer model without COOT optimization achieved an accuracy of 78%
on the test dataset, with an F1 score of 0.77. The training accuracy, training loss, validation
accuracy, and validation loss graphs are shown in Figures 5.1, 5.2, 5.3, and 5.4, respectively.
The confusion matrix is presented in Figure 5.5.
The model demonstrated good generalization to the data; however, the results were obtained
under the constraint of limited data availability. The training graphs indicate that the model
achieved the best possible performance without overfitting the data.
Figure 5.1. Train Accuracy Graph of ViT
Figure 5.2. Train Loss Graph of ViT
Figure 5.3. Val Accuracy Graph of ViT
Figure 5.4. Val Loss Graph of ViT
Next, we used the trained model weights and optimized the classifier layer with the COOT
optimization algorithm, aiming to enhance performance. The optimization results were saved
to an Excel file, as shown in Figure 5.6. Based on the validation loss, validation accuracy,
training loss, and training accuracy, we selected the best hyperparameters to train and test
the model; they can be found in the last row of the Excel file. The accuracy and F1 score
obtained were 78.57% and 0.77, respectively. The training graphs are illustrated in Figures
5.7 and 5.8, and the confusion matrix can be seen in Figure 5.9.
Figure 5.7. Accuracy Graphs of ViT with COOT Optimization
Figure 5.8. Loss Graphs of ViT with COOT Optimization
Comparing the results, we observed that the model achieved similar performance to the
one without COOT optimization. Several reasons could account for this outcome. Firstly,
the limited availability of data might have influenced the performance once again. When the
ViT model extracted features from the images and passed the data to the classifier optimized
with COOT, it is possible that the ViT model could not extract sufficient features to improve
the classifier’s performance. Secondly, the ViT model may not require optimization of the
classifier layer. We will further compare the results obtained from the model that does not
use the earlier obtained weights with this model.
For the second optimization process of the ViT model, we trained the ViT model with unfrozen
layers while optimizing the classifier layer. The Excel file containing the optimization
details can be seen in Figure 5.10. We employed the best hyperparameters, which are provided
in the 7th row of that file. This time, our model performed better, achieving an accuracy of
80% and an F1 score of 0.7982. The training graphs are displayed in Figures 5.11 and 5.12,
and the confusion matrix can be seen in Figure 5.13.
Figure 5.10. Excel File of Optimization Iterations
Figure 5.11. Accuracy Graphs of ViT with COOT Optimization
Figure 5.12. Loss Graphs of ViT with COOT Optimization
The model demonstrated better generalization than the other ViT models. Optimizing the
classifier layer while also training the ViT layers improved performance, as the ViT layers
learned to extract image features suited to the optimized classifier. Had more data been
available, further improvements in model performance could likely have been achieved. The
ViT model’s default classifier layer might be insufficient for this dataset and the extracted
features; with our optimization and a three-layer classifier head, we achieved good
generalization on the dataset.
Figure 5.14. Loss Graphs of CNN Model
Figure 5.15. Confusion Matrix of CNN Model
Next, we trained a CNN model with the previously described architecture: a 4-layered CNN
with max-pooling layers and one classifier layer. The performance of CNN models on
mammography images has been extensively studied in prior research.
Following the training process, the CNN model achieved a test accuracy of 45% and an
F1 score of 0.43. The training graphs and confusion matrix can be seen in Figures 5.14 and
5.15. Based on these results, we can conclude that the model would have better generalization
capabilities if we had a larger training dataset. Furthermore, the obtained results indicate that
the hyperparameters used were not optimal. When comparing the performance of the CNN
model, which did not utilize the COOT algorithm for optimization, with the non-optimized
ViT model, we can infer that the Vision Transformer model performed better under the current
circumstances.
For the final test on this dataset, we employed the COOT optimization algorithm to opti-
mize the hyperparameters of the CNN model we previously trained. We optimized all layers
in the model except for 1 Convolution layer, Max pooling layer, and the classifier layer. The
Excel file containing the details of the optimization process can be seen in Figure 5.16, along
with the training graphs shown in Figure 5.17. However, the optimized CNN model yielded
an accuracy of 51% and an F1 score of 0.52 on the test dataset. The confusion matrix obtained
from the test results is displayed in Figure 5.18. As can be seen from the confusion matrix, the CNN model achieved only limited generalization of the data.
Figure 5.16. Excel File of Optimization Iterations
Figure 5.17. Loss Graphs of CNN with COOT
Figure 5.18. Confusion Matrix of CNN with COOT
Upon comparing the results, we observed that the optimization of the CNN model out-
performed the non-optimized version. However, it is noteworthy that all of our trained ViT
models performed better than our optimized CNN model. Therefore, in the context of fea-
ture extraction, our research indicates that transformer models, such as ViT, exhibit strong
performance. Our proposed simple CNN model was not able to extract enough features from
this dataset, which is likely due to the simplicity of its architecture.
We compared our results with two previous studies conducted on the mini-MIAS dataset.
The first study, ”Using Deep Learning for Mammography Classification” by Pınar Uskaner
Hepsağ, Selma Ayşe Özel, and Adnan Yazıcı[27], proposed a CNN model and achieved an
accuracy of 87% and an F1 score of 0.84 for classifying mammography images based on
density using the mini MIAS dataset.
The second study, ”Deep Learning from Small Dataset for BI-RADS Density Classifica-
tion of Mammography Images” by Peng Shi, Chongshu Wu, Jing Zhong, and Hui Wang[28],
presented an optimized CNN model that achieved an accuracy of 83.6%. Although the F1
score is not explicitly mentioned in their study, their models performed well in terms of den-
sity classification. Our results can be compared to these studies to evaluate the performance
of our models on the MIAS dataset.
Table 5.1. Comparison of Models on F1 Score and Accuracy Metrics

Model                        Accuracy    F1-score
CNN                          45%         43%
Vision Transformer           78%         77%
CNN COOT                     51%         52%
Vision Transformer COOT      80%         79%
Paper 1 CNN Model            87%         87%
Paper 2 CNN Model            83.6%       not reported
We aimed to classify the BI-RADS class of each mammography image within the dataset.
We trained multiple models, as explained earlier, and obtained the following results for com-
parison.
The Vision Transformer model achieved an accuracy of 61% on the test dataset, with an
F1 score of 0.57. The training and validation loss graphs are shown in Figures 5.19 and 5.20,
respectively. The confusion matrix is presented in Figure 5.21.
Figure 5.22. Loss Graphs of CNN
Figure 5.23. Confusion Matrix of CNN
The CNN model achieved an F1 score of 0.32 and an accuracy of 49%. The training and
validation loss graphs are shown in Figure 5.22. The confusion matrix is presented in
Figure 5.23. During the experimentation, the CNN model encountered difficulties
in generalizing the dataset effectively to achieve an optimal solution. In contrast, the Vision
Transformer model exhibited superior performance in comparison. It is plausible that the
simplicity of the CNN model’s architecture might have hindered its ability to perform opti-
mally. The graphical representations clearly indicate that the highest level of generalization
attainable was achieved by the Vision Transformer model.
6. CONCLUSION
Our contribution to the literature has shown that the model architecture we developed for the
MIAS dataset improved the performance metrics compared to using the model in its standard
form. However, the CNN structure we created could not extract sufficient features from either dataset. In
the case of the MIAS dataset, this limitation is believed to stem from both data scarcity and
the simplicity of the model architecture. Moreover, the simplicity of the model architec-
ture resulted in poorer performance compared to all other trained Vision Transformer models
when applied to the Teknofest dataset. Implementing the COOT optimization algorithm en-
hanced the performance achieved on the MIAS dataset. In conclusion, our study has proven,
as supported by the literature, that facilitating the work of radiologists can be accomplished
by utilizing a well-labeled dataset with an adequate amount of data, encompassing density
and BI-RADS classes. However, we were unable to uncover a highly performing and reliable
model suitable for radiologists’ use.
Bibliography
[15] What is deep learning? https://www.ibm.com/topics/deep-learning.
[16] What does pre-training a neural network mean? https://www.baeldung.com/cs/neural-network-pre-training.
[17] Fine tuning in deep learning: What you need to know, https://reason.town/fine-tuning-in-deep-learning/.
[18] What is overfitting in deep learning? https://www.devopsschool.com/blog/what-is-overfitting-in-deep-learning/.
[19] What is machine learning? https://www.ibm.com/topics/machine-learning.
[20] G. Banu, “Breast cancer stage classification on digital mammogram images,” International Journal of Computer Science and Information Security, vol. 18, pp. 81–88, May 2018.
[21] S. Siddeeq, J. Li, H. Bhatti, A. Manzoor, and U. S. Malhi, “Deep learning RN-BCNN model for breast cancer BI-RADS classification,” Jun. 2021, pp. 219–225. doi: 10.1145/3447587.3447620.
[22] R. Sawyer Lee, J. A. Dunnmon, A. He, S. Tang, C. Ré, and D. L. Rubin, “Comparison of segmentation-free and segmentation-dependent computer-aided diagnosis of breast masses on a public mammography dataset,” Journal of Biomedical Informatics, vol. 113, p. 103656, 2021, issn: 1532-0464. doi: 10.1016/j.jbi.2020.103656. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1532046420302847.
[23] A. Baccouche, B. Zapirain, and A. Elmaghraby, “An integrated framework for breast mass classification and diagnosis using stacked ensemble of residual neural networks,” Scientific Reports, vol. 12, p. 12259, Jul. 2022. doi: 10.1038/s41598-022-15632-6.
[24] R. Mohakud and R. Dash, “Designing a grey wolf optimization based hyper-parameter optimized convolutional neural network classifier for skin cancer detection,” Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 8, Part B, pp. 6280–6291, 2022, issn: 1319-1578. doi: 10.1016/j.jksuci.2021.05.012. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1319157821001270.
[25] Teknofest yarışmaları - 34 sağlıkta yapay zeka yarışması [Teknofest competitions - AI in healthcare competition], https://www.teknofest.org/tr/competition. Last accessed: January 10, 2023.
[26] The mini-mias database of mammograms, http://peipa.essex.ac.uk/info/mias.html.
[27] P. Uskaner Hepsağ, S. A. Özel, and A. Yazıcı, “Using deep learning for mammography classification,” IEEE Access, vol. 5, pp. 17518–17525, 2017. doi: 10.1109/ACCESS.2017.2755170.
[28] P. Shi, C. Wu, J. Zhong, and H. Wang, “Deep learning from small dataset for BI-RADS density classification of mammography images,” IEEE Access, vol. 7, pp. 137544–137553, 2019. doi: 10.1109/ACCESS.2019.2934512.