
diagnostics

Article
A New Deep-Learning-Based Model for Breast Cancer
Diagnosis from Medical Images
Salman Zakareya 1, Habib Izadkhah 1,2,* and Jaber Karimpour 1

1 Department of Computer Science, University of Tabriz, Tabriz 5166616471, Iran; [email protected] (S.Z.); [email protected] (J.K.)
2 Research Department of Computational Algorithms and Mathematical Models, University of Tabriz, Tabriz 5166616471, Iran
* Correspondence: [email protected]

Abstract: Breast cancer is one of the most prevalent cancers among women worldwide, and early
detection of the disease can be lifesaving. Detecting breast cancer early allows for treatment to begin
faster, increasing the chances of a successful outcome. Machine learning helps in the early detection
of breast cancer even in places where there is no access to a specialist doctor. The rapid advancement
of machine learning, and particularly deep learning, leads to an increase in the medical imaging
community’s interest in applying these techniques to improve the accuracy of cancer screening. Most data related to diseases are scarce, whereas deep-learning models need a large amount of data to learn well. For this reason, existing deep-learning models cannot work as well on medical images as on other types of images. To overcome this limitation and improve breast cancer detection, this paper proposes a new deep model to classify breast cancer, inspired by two state-of-the-art deep networks, GoogLeNet and the residual block, and developing several new features. Utilizing granular computing, shortcut connections, two learnable activation functions instead of traditional activation functions, and an attention mechanism is expected to improve the accuracy of diagnosis and consequently decrease the load on doctors. Granular computing can improve
diagnosis accuracy by capturing more detailed and fine-grained information about cancer images.
The proposed model’s superiority is demonstrated by comparing it to several state-of-the-art deep models and existing works using two case studies. The proposed model achieved an accuracy of 93% and 95% on ultrasound images and breast histopathology images, respectively.

Keywords: medical image; breast cancer diagnosis; machine learning; deep learning; classification
1. Introduction

Breast cancer is the most commonly diagnosed form of cancer worldwide and the second leading cause of cancer-related deaths. In 2020, breast cancer was diagnosed in 2.3 million women globally, resulting in 685,000 fatalities. Additionally, as of the end of 2020, 7.8 million women had received a breast cancer diagnosis within the last five years [1]. Clinical studies have demonstrated that early detection is crucial for effective treatment and can significantly improve the survival rate of breast cancer patients [2].
Computer-aided detection and diagnosis (CAD) software systems have been developed and clinically used since the 1990s to support radiologists in screening, improve predictive accuracy, and prevent misdiagnosis due to fatigue, eye strain, or lack of experience [3].
The rapid progress of machine learning in both application and efficiency, especially deep learning, has increased the interest of the medical community in using these techniques to improve the accuracy of cancer screening from images. Machine learning can play an essential role in helping medical professionals in the early detection of cancerous lesions. Despite the benefits of using these techniques, cancer screening is associated with a high risk of false positives and false negatives. However, early detection of cancer can contribute to up to a 40% decrease in the mortality rate [2].

Deep-learning networks employ a deeply layered architecture that enables hierarchical learning and progressive extraction of features from data, starting from simple to progressively more complex abstractions. By autonomously learning the maximum possible set of features, the deep-learning algorithm can deliver the most precise results, rendering these networks highly effective for medical image classification and the identification of features such as lesions [4,5].
The performance of convolutional neural networks (CNNs) is hindered by several challenges, and the most popular CNNs attempt to improve their performance by simply stacking convolution layers deeper and deeper. However, the significant areas in an image can vary considerably in size, making it difficult to select the appropriate kernel size for the convolution operation due to the extreme variability in the information’s location. For information that is distributed more locally, a smaller kernel is recommended, whereas a larger kernel is chosen for information distributed more widely. Very deep networks are more prone to overfitting, gradient updates are difficult to propagate through the entire network, and naively stacking large convolution operations is computationally expensive [6,7].
Granular computing focuses on the use of granules, which are small, discrete pieces of knowledge that can be combined, manipulated, or analyzed to solve complex problems; it is based on the idea of breaking down complex problems into smaller, more manageable pieces. Granular computing has been used in various areas, such as data mining, decision support systems, and knowledge discovery. It can be used in image classification by dividing the image into smaller regions or sub-regions, also known as granules, and extracting features from them [8].
This study proposes a novel deep-learning model designed to enhance the accuracy
of breast cancer detection while simultaneously reducing the network’s parameters to
improve training time. Inspired by GoogLeNet and the residual block, the proposed model
considers both the depth and width of the network. Multiple filters of varying sizes operate
at the same level, and their outputs are concatenated and transmitted to the next mod-
ule. Additionally, the use of shortcut connections and two learnable activation functions,
as opposed to traditional activation functions, is expected to reduce time consumption
and improve diagnostic accuracy, thereby potentially alleviating the workload of medi-
cal professionals. We also propose a granular computing-based algorithm for capturing
more detailed and fine-grained information about cancer images. We will apply granular
computing to the dataset before starting the training process.
The paper’s contributions can be outlined as follows:
1. The proposed model has the highest diagnostic accuracy compared to existing breast
cancer methods;
2. This paper is the first study to use the granular computing concept in disease diagnosis;
3. Utilizing wide and depth networks, shortcut connections, and intermediate classifiers,
we design a new deep network to improve the detection of breast cancer;
4. An attention mechanism is proposed to highlight the important features in the input
image, thereby resulting in improved accuracy of the classifier;
5. Two learnable activation functions are developed and utilized instead of traditional
activation functions. Learnable activation functions provide a flexible framework that
can be fine-tuned during training for optimal performance on specific tasks.
The following is the structure of this work: Section 2 addresses the related works,
Section 3 presents the model and implementation, Section 4 presents the results and
discussion, and ultimately, the conclusion is presented.

2. Related Work
Breast cancer ranks among the most prevalent cancers affecting women worldwide.
Early detection and accurate diagnosis are crucial factors for effective treatment and im-
proved patient outcomes [9]. Ultrasound imaging is a widely used method for breast
cancer screening and diagnosis, but it requires skilled radiologists to interpret the images
accurately [10]. According to the National Breast Cancer Foundation’s 2020 report, AI has
been successfully used to diagnose more than 276,000 breast cancer cases. By analyzing
breast cancer images using AI, breast lumps (masses), mass segmentation, breast density,
and breast cancer risk can be identified. In the majority of patients, lumps in the breast are
the most common sign of breast cancer [9]; therefore, their detection is an essential step
used in CAD.
A review of deep-learning applications in breast tumor diagnosis utilizing ultrasound
and mammography images is provided in [11]. Moreover, the research summarizes the
latest progressions in computer-aided diagnosis/detection (CAD) systems that rely on deep-
learning methodologies to automatically recognize breast images, ultimately enhancing
radiologists’ diagnostic precision. Remarkably, the classification process underpinning the
novel deep-learning approaches has demonstrated significant usefulness and effectiveness
as a screening tool for breast cancer.
Recent studies have explored the use of deep-learning techniques, particularly convolu-
tional neural networks (CNNs), for automated breast ultrasound image classification [12–15].
These studies have shown encouraging results. Several convolutional neural network (CNN) models are used for breast cancer image classification, including AlexNet, VGGNet, GoogLeNet, ResNet, and Inception. In their study, the authors of [5] categorized breast
lesions as either benign or malignant. They developed a CNN model to remove speckle
noise from the ultrasound images and then proposed another CNN model for classifying
the ultrasound images. The study [16] discriminates benign cysts from malignant masses
in US images.
In the study presented in [17], various deep-learning models were employed to clas-
sify breast cancer ultrasound images based on their benign, malignant, or normal status.
A dataset comprising a total of 780 images was utilized, and data augmentation and
preprocessing techniques were applied. Three models were evaluated for classification.
Specifically, ResNet50 achieved an accuracy of 85.4%, ResNeXt50 achieved 85.83%, and
VGG16 achieved 81.11%.
The study [18] introduced a novel ensemble deep-learning-enabled clinical decision
support system for the diagnosis and classification of breast cancer based on ultrasound
images. The study presented an optimal multilevel thresholding-based image segmentation
technique for identifying tumor-affected regions. Additionally, an ensemble of three deep-
learning models was developed to extract features, and an optimal machine-learning
classifier was utilized to detect breast cancer.
In the study [14], the authors proposed a system to classify breast masses into normal,
benign, and malignant. Ten well-known, pre-trained CNNs classification models were
compared, and the best model was Inception ResNetV2. In [19], a vector-attention network
(BVA Net) was proposed to classify benign and malignant mass tumors in the breast.
In [20], the authors proposed a CNN-based CAD system for breast ultrasound image
classification (benign and malignant lesions). The study [21] developed a deep-learning
model based on ResNet18 CNN architecture for breast ultrasound image classification.
In addition, the study [22] compared the performance of different deep-learning models,
including CNNs, recurrent neural networks (RNNs), and hybrid models, for breast cancer
diagnosis on ultrasound images. In addition to binary classification, some studies have also
explored multiclass classification of breast ultrasound images. For example, the study [23]
proposed a CNN-based CAD system that can classify breast lesions into four categories:
benign, malignant, cystic, or complex cystic-solid. The system achieved an overall accuracy
of 87% on a dataset of 1000 images.
Gao et al. have devised a computer-aided diagnosis (CAD) system geared toward
screening mammography readings, which demonstrated an accuracy rate of approximately
92% [24]. Similarly, in several studies [25,26], multiple convolutional neural networks
(CNNs) were employed for mass detection in mammographic and ultrasound images.
The study conducted by [3] provides a comprehensive review of the techniques used
for the diagnosis of breast cancer in histopathological images. The state-of-the-art machine-
learning approaches employed at each stage of the diagnosis process, including traditional
methods and deep-learning methods, are presented, and a comparative analysis between
the different techniques is provided. The technical details of each approach and their
respective advantages and disadvantages are discussed in detail.
Lee et al. [27] conducted a study utilizing a deep-learning-based computer-aided pre-
diction system for ultrasound (US) images. The research involved a total of 153 women with
breast cancer, comprising 59 patients with lymph node metastases (LN+) and 94 patients
without (LN−). Multiple machine-learning algorithms, including logistic regression, sup-
port vector machines (SVMs), XGBoost, and DenseNet, were trained and evaluated on the
US image data. The study found that the DenseNet model exhibited the best performance,
achieving an area under the curve (AUC) of 0.8054. This study highlights the potential of
deep-learning techniques in the development of accurate and efficient prediction systems
for breast cancer diagnosis using US imaging.
Sun et al. [28] conducted a study utilizing a convolutional neural network (CNN)
trained and tested on ultrasound images of 169 patients. The training dataset consisted of
248 US images from 124 patients, while the testing dataset comprised 90 US images from
45 patients. The results of the study revealed a somewhat inferior performance, with an
AUC of 0.72 (SD 0.08) and an accuracy of 72.6% (SD 8.4). Notably, the validation process did
not include cross-validation or bootstrapping methods. These findings suggest that further
research is necessary to improve the performance of CNNs in breast cancer diagnosis using
ultrasound imaging.
In a study by [29], a comparison was made between convolutional neural networks
(CNNs) and traditional machine-learning (ML) methods, specifically random forests, in the
context of breast cancer diagnosis. The study utilized a dataset of 479 breast cancer patients,
comprising 2395 breast ultrasound images. The research also focused on different regions
of the ultrasound images, including intratumoral, peritumoral, and combined regions, to
train and evaluate the models. The study found that CNNs outperformed random forests
in all modalities (p < 0.05), and the combination of intratumoral and peritumoral regions
provided the best result, with an AUC of 0.912 [0.834–0.990]. While confidence intervals
were provided, the method used to determine them was not mentioned. These results
highlight the potential of CNNs in breast cancer diagnosis using ultrasound imaging and
the importance of considering different regions of the image in the analysis.
The study proposed by [30] implemented the multilevel transfer-learning (MSTL) algo-
rithm using three pre-trained models, namely EfficientNetB2, InceptionV3, and ResNet50,
along with three optimizers, which included Adam, Adagrad, and stochastic gradient
descent (SGD). The study utilized 20,400 cancer cell images, 200 ultrasound images from
Mendeley, and 400 from the MT-Small dataset. This approach has the potential to reduce
the need for large ultrasound datasets to realize powerful deep-learning models. The
results of this study demonstrate the effectiveness of the MSTL algorithm in breast cancer
diagnosis using ultrasound imaging.
The study [31] presents a review of studies investigating the ability of deep-learning
(DL) approaches to classify histopathological breast cancer images. The article evaluates
current DL applications and approaches to classify histopathological breast cancer im-
ages based on papers published by November 2022. The study findings indicate that
convolutional neural networks, as well as their hybrids, represent the most advanced DL
approaches currently in use for this task. The authors of the study defined two categories
of classification approaches, namely binary and multiclass solutions, in the context of DL-
based classification of histopathological breast cancer images. Overall, this review provides
insights into the current state of the art in DL-based classification of histopathological
breast cancer images and highlights the potential of advanced DL approaches to improve
the accuracy and efficacy of breast cancer diagnosis.
The study [32] proposed a breast cancer classification technique that leverages a
transfer-learning approach based on the VGG16 model. To preprocess the images, a median
filter was employed to eliminate speckle noise. The convolution layers and max pooling
layers of the pre-trained VGG16 model were utilized as feature extractors, while a two-layer
deep neural network was devised as a classifier.
The vision transformer (ViT) architecture has been proven to be advantageous in
extracting long-range features and has thus been employed in various computer vision
tasks. However, despite its remarkable performance in traditional vision tasks, the ViT
model’s supervised training typically necessitates large datasets, thereby posing difficulties
in domains where it is challenging to amass ample data, such as medical image analysis.
In [33], the authors introduced an enhanced ViT architecture, denoted as ViT-Patch, and
investigated its efficacy in addressing a medical image classification problem, namely,
identifying malignant breast ultrasound images.
In summary, these studies showcase the capability of deep-learning techniques in
automating breast image classification and underscore the significance of devising precise
CAD systems to support radiologists in detecting breast cancer. The majority of current
approaches employ pre-existing deep-learning architectures for detecting breast cancer. In
the following, we introduce a novel architecture that surpasses all previous methods.

3. Methodology
Inspired by GoogLeNet [34] and the residual block [35], and adding several other features, in this paper we developed a new deep architecture for breast cancer detection from images. GoogLeNet and the residual block are based on the convolutional neural network (CNN) architecture. GoogLeNet is a deep convolutional neural network architecture developed
architecture. GoogLeNet is a deep convolutional neural network architecture developed
by Google’s research team in 2014. It was the winner of the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) in 2014 and achieved state-of-the-art performance on a
variety of computer vision tasks.
The GoogLeNet architecture consists of a 22-layer deep neural network with a unique
“Inception” module that enables the network to efficiently capture spatial features at differ-
ent scales using parallel convolutional layers. The Inception module combines multiple
convolutional filters of different sizes and concatenates their outputs, allowing the network
to capture both fine-grained and high-level features. On the other hand, a residual block
is a building block used in deep neural networks that helps to address the problem of
vanishing gradients during training. A residual block consists of two or more stacked
convolutional layers followed by a shortcut connection that bypasses these layers. The
shortcut connection allows the gradient to be directly propagated to earlier layers, allowing
for better optimization and deeper architectures.
This study introduces a novel deep-learning-based architecture for breast cancer
detection that stands out from existing architectures in four significant aspects, resulting in
superior performance.
1. Proposing a granular computing-based algorithm aiming to extract more detailed and
fine-grained information from breast cancer images, leading to improved accuracy
and performance;
2. Utilizing wide and deep modules, shortcut connections, and intermediate classifiers
simultaneously in the architecture;
3. Designing an attention mechanism; the attention mechanism in CNNs provides a
powerful tool for selectively focusing on relevant features in the input data, enabling
the network to achieve better accuracy and efficiency;
4. Designing two learnable activation functions and using them instead of traditional
activation functions.
Figure 1 depicts the overall process proposed in this paper. The input of the proposed method is a breast cancer image. If the sizes of the images differ, they are resized to a pre-determined size. After that, the pixels of the image are normalized to [0,1]. Resizing and normalization are preprocessing steps. After the preprocessing step, we use granular computing to highlight important features of an image. The output of the previous step is used to train the proposed deep-learning model. These steps are applied to all images in the dataset. In the following, we describe each of these steps.

Figure 1. The overall process of the proposed method. All steps in this figure are repeated for all images.

3.1. Granular Computing
Granular computing can be used in image classification by dividing the image into smaller regions or sub-regions, also known as granules, and extracting features from them. This approach can improve the accuracy of the classification task by capturing more detailed and fine-grained information about the image [8].
Here, we propose general steps by which granular computing can be applied to image classification in deep learning:
Input: An image to be classified.
Output: A label representing the class of the image.
Preprocessing: Resize the image to a fixed size and normalize the pixel values to a range between 0 and 1.
Granulation: Divide the image into smaller regions or sub-regions, known as granules. This can be performed by using techniques such as windowing, tiling, or segmentation.
Feature Extraction: For each granule, extract features using techniques such as local binary patterns, histograms of oriented gradients, or a CNN. This results in a set of feature vectors, one for each granule.
Feature Aggregation: Combine the feature vectors obtained from the granules and use them to classify the image. This can be performed by using techniques such as mean pooling or max pooling.

Figure 2 shows the proposed steps for granular computing used in this paper. Considering the above steps, we propose Algorithm 1 for applying granular computing in this paper. In this algorithm, we have used the pre-trained VGG16 architecture to extract features for each granule. The size of each granule is considered to be 32 × 32. We apply granular computing to the dataset before starting the training process.

Algorithm 1. Extracting more detailed features by granular computing

import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16

# Pre-trained VGG16 as the per-granule feature extractor, built once;
# each granule is a 32 x 32 x 3 patch.
model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

# Repeat the following steps for all images.
for img in images:  # img: an H x W x 3 image array loaded from the dataset

    # Preprocessing step: resizing and normalization.
    img = cv2.resize(img, (224, 224))
    img = img / 255.0

    # Granulation step: split the image into windows of size 32 x 32.
    granules = []
    for i in range(0, 224, 32):
        for j in range(0, 224, 32):
            granules.append(img[i:i + 32, j:j + 32, :])

    # Feature Extraction step: one feature vector per granule.
    features = []
    for granule in granules:
        feature = model.predict(np.expand_dims(granule, axis=0)).squeeze()
        features.append(feature)

    # Feature Aggregation step: mean pooling over the granule features.
    features = np.array(features)
    aggregated_features = np.mean(features, axis=0)

3.2. Learnable Activation Function


Sigmoid, ReLU, and tanh are widely used activation functions in artificial neural
networks, and they perform satisfactorily in many cases. However, these functions have
some limitations that make their use suboptimal, thus necessitating the development of
learnable activation functions.
Learnable activation functions in artificial neural networks are like functions that
can adapt and modify themselves based on the input received by the neural network.
These functions are characterized by a group of parameters that can be fine-tuned by the
neural network during the training process. The primary advantage of learnable activation
functions is their flexibility and adaptability in adjusting to different types of input data
being processed. There are two primary types of learnable activation functions in artificial
neural networks:
- Parametric activation functions: These functions have a fixed form (such as sigmoid or
ReLU), but they add extra learnable parameters to them. The neural network changes
these parameters to adjust the activation function appropriately on training data;
- Adaptive activation functions: These functions do not have any fixed form or formula.
Instead, they rely on a neural network structure such as RNN or LSTM to learn
appropriate activation functions.
We, here, develop two learnable activation functions named LAF_sigmoid and LAF_relu.
To develop a new parametric learnable activation function, we start by defining a general
form of the function that has learnable parameters. Let us call this function “Learnable Activation Function”, or LAF for short:

LAF(x; W) = a ∗ F(x; W) + b (1)

Here, a and b are adjustable parameters that are learned during training, and F() is a non-linear function that defines the shape of the activation function. The weight matrix W contains learnable values that determine the shape of F(), and it is optimized through backpropagation.

Figure 2. The granular computing process proposed in this paper.

To further develop the LAF, we can choose a suitable non-linear function F(). One possible choice is the sigmoid function:

F(x; W) = 1/(1 + exp(−Wx)) (2)

The sigmoid function is a common choice for activation functions due to its smoothness and boundedness, which is important for the stable training of neural networks. The LAF with the sigmoid function becomes:

LAF_sigmoid(x; W, a, b) = a ∗ (1/(1 + exp(−Wx))) + b (3)

This activation function will be used in the dense layers of the network.
Another possible choice for F() is the ReLU function:

F(x; W) = max(0, Wx) (4)

The ReLU function is preferred for some tasks because of its simplicity and computa-
tional efficiency. The learnable parameters a and b can be added to shift and scale the ReLU
function, resulting in the LAF with the ReLU function:

LAF_relu(x; W, a, b) = a ∗ max(0, Wx) + b (5)

This activation function will be used in the convolutional layers of the network.
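To make this concrete, the following is a minimal Keras sketch of LAF_relu and LAF_sigmoid as custom layers. The per-channel shape of a and b, their initial values, and the assumption that the Wx term is produced by the preceding convolutional or dense layer are illustrative choices of ours, not details fixed by the formulation above.

import tensorflow as tf

class LAFReLU(tf.keras.layers.Layer):
    """LAF_relu(z; a, b) = a * max(0, z) + b with learnable a and b; the Wx
    term is assumed to be the output z of the preceding layer."""
    def build(self, input_shape):
        # One learnable scale and shift per channel (assumed parameterization).
        self.a = self.add_weight(name="a", shape=(input_shape[-1],),
                                 initializer="ones", trainable=True)
        self.b = self.add_weight(name="b", shape=(input_shape[-1],),
                                 initializer="zeros", trainable=True)

    def call(self, z):
        return self.a * tf.nn.relu(z) + self.b

class LAFSigmoid(tf.keras.layers.Layer):
    """LAF_sigmoid(z; a, b) = a * sigmoid(z) + b with learnable a and b."""
    def build(self, input_shape):
        self.a = self.add_weight(name="a", shape=(input_shape[-1],),
                                 initializer="ones", trainable=True)
        self.b = self.add_weight(name="b", shape=(input_shape[-1],),
                                 initializer="zeros", trainable=True)

    def call(self, z):
        return self.a * tf.sigmoid(z) + self.b

Both layers learn a and b through backpropagation together with the rest of the network, e.g., x = LAFReLU()(tf.keras.layers.Conv2D(64, 3, padding="same")(x)).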
The values of a, b, and W can be trained through backpropagation using gradient
descent or other optimization algorithms. The choice of the initial values and number of
hidden units are important factors that can affect the success of training the neural network
using LAFs. There are several advantages of using learnable activation functions in artificial
neural networks:
1. Improved performance: By incorporating learnable activation functions, the neural
network performance can be improved significantly. This is because the activation
function adapts to the input data, allowing for a more accurate representation of
complex relationships between features;
2. Non-linear mapping: Learnable activation functions allow for non-linear mappings
between input and output, which can capture more complex patterns in the data;
3. Flexibility: With traditional activation functions, the network architecture is fixed.
However, using learnable activation functions allows for more flexibility in the
network architecture, as the activation function can be modified according to the
specific task;
4. Reduced overfitting: Learnable activation functions can also help reduce overfitting,
as they can adapt to the input data and generalize better to new data that has not been
seen before;
5. Efficient training: The use of learnable activation functions can also make the training
process more efficient by allowing gradients to be propagated through the network
more smoothly. This can lead to faster convergence and improved performance.

3.3. The Attention Mechanism


The attention mechanism in convolutional neural networks (CNNs) is a tool that
enables networks to selectively focus on specific parts of input data that are crucial for
making decisions. Mimicking the process of human attention, this mechanism allows
neural networks to attend selectively to relevant features in input data. The attention
mechanism can be incorporated into various parts of a network, including the input layer,
the convolutional layer, or the dense layer. Typically, the attention mechanism is added on
top of the convolutional layer as an auxiliary module.
The attention mechanism takes as input the output features of the preceding layer and
computes a set of attention weights for each feature. These weights reflect the importance of
each feature for the current task. To compute these attention weights, a sub-network called
the attention module, which comprises one or more layers, is employed. The attention
weights are then multiplied element-wise with the output features of the preceding layer to
obtain a weighted sum of the characteristics. This weighted sum is then passed to the next
layer of the network. By doing this, the attention mechanism selectively focuses on key
parts of the input data and enhances their impact on the output.
The attention mechanism can be trained end-to-end using backpropagation, and the
attention weights can be learned jointly with the neural network parameters. During
training, the attention mechanism learns to weigh the importance of various features based
on their relevance to the task. This enables the network to selectively attend to the most
informative parts of the input data and ignore irrelevant or noisy features.
In this section, we propose an attention mechanism to apply to the output layer (top
layer). The attention mechanism in CNNs can highlight the salient regions in images that
are significant for the classification task.
In the case of breast cancer detection, this can be useful since certain regions of the
breast image may contain more relevant features for cancer detection compared to others.
Here, we develop an attention mechanism for CNN:
1. Start with a standard convolutional layer with filters of size (k, k) and stride s;
2. Add a second convolutional layer with filters of size 1 × 1 and stride 1. This layer will
compute a scalar attention weight for each pixel in the input image;
3. Apply a Softmax activation function to the output of the attention layer to ensure that
the weights sum up to 1 for each pixel;
4. Multiply the attention weight maps element-wise with the input image to obtain the
attended input image;
5. Feed the attended input image into the next layer of the CNN.
The idea behind this attention mechanism is that the second convolutional layer learns
to compute a scalar attention weight for each pixel in the input image, based on its relevance
to the task at hand. The softmax activation function ensures that the attention weights
sum up to one for each pixel, making them interpretable as a probability distribution
over the pixels. The element-wise multiplication of the attention weight maps and input
image highlights or downplays certain pixels, improving the accuracy of the CNN on the
given task.
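As an illustration of steps 1–5, here is a minimal Keras sketch of this attention mechanism; the stride-1 assumption (so the attention map matches the input spatially) and the layer names are ours.

import tensorflow as tf

class PixelAttention(tf.keras.layers.Layer):
    """A (k, k) convolution (step 1), a 1 x 1 convolution scoring each pixel
    (step 2), a softmax over all pixels (step 3), and an element-wise product
    with the input image (step 4); the result feeds the next layer (step 5)."""
    def __init__(self, filters, k=3, **kwargs):
        super().__init__(**kwargs)
        self.conv = tf.keras.layers.Conv2D(filters, k, padding="same",
                                           activation="relu")
        self.score = tf.keras.layers.Conv2D(1, 1, padding="same")

    def call(self, img):
        logits = self.score(self.conv(img))      # one logit per pixel
        shape = tf.shape(logits)
        flat = tf.reshape(logits, [shape[0], -1])
        weights = tf.nn.softmax(flat, axis=-1)   # weights sum to 1 over pixels
        weights = tf.reshape(weights, shape)
        return img * weights                     # attended input image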

3.4. Wide and Deep Networks, Short Connections, and 1 × 1 Convolutional Layers
Our developed network takes advantage of the features of wide and deep networks,
short connections, and 1 × 1 convolutional layers. In the following, we will examine
each one.
Wide and deep neural networks such as GoogLeNet offer improved accuracy, higher
capacity, faster convergence, and better regularization. They have demonstrated impressive
performance in various tasks, including image recognition, speech recognition, and natural
language processing. Their advantages make them an attractive choice when designing
neural networks.
In neural networks, short connections, which are used in ResNet and DenseNet
networks, are a type of connection between the neurons that bypass one or more layers
in the network. These connections allow information to flow between two layers that are
not directly connected in the network architecture. Short connections, also known as skip
connections, in neural networks can be represented mathematically as an element-wise
summation or concatenation operation between the input to a layer and the output of that
layer. In other words, the output of a layer is added to or concatenated with the input to
that layer or a previous layer. For example, in a convolutional neural network (CNN), a
short connection can be introduced between two convolutional layers by adding the output
of the first convolutional layer to the input of the second convolutional layer. This can be
added as follows:
x1 = Convolutional_layer_1(input)

x2 = Convolutional_layer_2(x1 + input)
where “+” denotes element-wise summation.
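Rendered as a minimal Keras sketch (assuming “same” padding and a channel count that already matches, so the element-wise summation is shape-compatible):

from tensorflow.keras import layers

def two_conv_with_shortcut(inputs, filters):
    # x1 = Convolutional_layer_1(input)
    x1 = layers.Conv2D(filters, 3, padding="same", activation="relu")(inputs)
    # x2 = Convolutional_layer_2(x1 + input): the shortcut adds the block
    # input to the first layer's output before the second convolution.
    x2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(
        layers.Add()([x1, inputs]))
    return x2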
Short connections or residual connections in neural networks have several advantages:
1. Improved gradient flow: By adding short connections, the gradient can flow through
the network more effectively, which eliminates the vanishing gradient problem. The
gradient can be propagated directly to earlier layers, allowing the network to train
deeper architectures;
2. Improved training speed: The use of short connections reduces the number of layers in
the critical learning path, which can speed up the training process. The reduced depth
also means that less computation is required, resulting in a more efficient model;
3. Improved accuracy: Short connections enable the learning of more complex functions
by allowing the network to make use of the information from earlier layers. This can
result in higher accuracy in tasks such as image recognition and speech processing;
4. Reduced overfitting: Short connections can help reduce overfitting by providing a
regularization mechanism. They allow the network to learn simpler representations
for the input data, which leads to better generalization.
In CNNs, 1 × 1 convolutional layers are utilized as a type of layer that executes a
convolution operation by convolving the input tensor with a kernel of size 1 × 1. Despite
their small size, 1 × 1 convolutional layers have various advantages in CNNs:
1. Dimensionality reduction: 1 × 1 convolutional layers can be used to reduce the dimen-
sionality of feature maps, which can be useful in reducing the computational complex-
ity of CNNs while maintaining their accuracy. By using 1 × 1 convolutional layers,
the number of parameters can be reduced while still retaining the important features;
2. Non-linear transformations: Even though it has a kernel of size 1 × 1, this layer applies
non-linear transformations to the input feature maps. The non-linear activation
function applied after the convolution operation contributes to this non-linearity;
3. Improved model efficiency: By reducing the number of parameters, 1 × 1 convolu-
tional layers reduce the computational cost of the model. This can, in turn, improve
the efficiency of the implementation of the model, allowing it to be run on smaller
devices or with fewer computational resources;
4. Feature interaction: A 1 × 1 convolutional layer can act as a feature interaction layer
and induce correlations between features, which can further enhance the representa-
tion power of the network.
These advantages make 1 × 1 convolutional layers an important building block in
CNNs, especially in deeper networks where computational cost and memory usage are of
key concern.
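As a small illustration of the dimensionality-reduction role, consider a hypothetical bottleneck in which a 1 × 1 convolution reduces 256 input channels to 64 before a more expensive 3 × 3 convolution (the channel counts are ours):

from tensorflow.keras import layers

def bottleneck_3x3(x):
    # 1 x 1 convolution: reduces channels (e.g., 256 -> 64) and adds an
    # extra non-linearity at negligible spatial cost.
    x = layers.Conv2D(64, 1, activation="relu")(x)
    # The following 3 x 3 convolution now operates on far fewer channels.
    return layers.Conv2D(128, 3, padding="same", activation="relu")(x)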

3.5. The Designed Architecture


To reduce the high volume of processing in the GoogLeNet network, we changed the
architecture from fully connected to sparsely connected network architectures within the
convoluted layers. The Inception layer, which was inspired by the Hebbian principle of
human learning, is critical to this sparsely connected architecture. For example, a deep-
learning model for recognizing a particular pattern (e.g., face) in an image might have
a layer that focuses on individual parts of an image. The next layer then focuses on the
overall pattern in the image and identifies the various objects in it. To this end, the layer
requires appropriate filter sizes to detect these objects. The Inception layer is crucial in this
scenario, allowing internal layers to determine which filter size is relevant for learning the
required information. Therefore, even if the size of the pattern in the image is different, the
layer can recognize it accordingly.
The proposed system is shown in Figure 3, which shows the block diagram used to diagnose medical images. The proposed system has the same structure as the simple GoogLeNet network. We replaced the Inception module with a new module named X-module. The structure of the new deep neural network consists of the following:
1. Input layer;
2. A convolutional-based attention layer;
3. Convolution layer;
4. Two X-modules with different filter sizes followed by a down-sample module;
5. An auxiliary classifier with a learnable Softmax classifier;
6. Three X-modules with different filter sizes followed by a down-sample module;
7. An auxiliary classifier with a learnable Softmax classifier;
8. An X-module followed by an average pooling layer and a dropout layer;
9. A dense layer-based attention layer;
10. A learnable Softmax classifier as the output layer.

Figure 3. The new CNN system proposed to diagnose medical images.

The details are explained as follows:
Input Layer: In this step, the medical image is entered into the system.
Convolutional-based attention layer: This layer allows for the selective focus on specific parts of the input data that hold significance in determining an outcome.
Convolution Layer: This layer uses convolution operations to produce new feature maps.
X-Module: This module considers both the depth and width of the network, with multiple filters of varying sizes operating at the same level. The outputs of these filters are concatenated before being transmitted to the subsequent module. The main unit in the X-module is a sub-block called the R-block (Figure 4), which is inspired by the residual block. The main difference is that the designed block uses learnable activation functions. The block uses a shortcut connection; the input is added to the output of the block to pass gradient updates through the entire network easily and reduce overfitting. The R-block consists of two convolutional layers stacked on top of each other, with the first layer being succeeded by a batch normalization layer and a learnable activation function that depends on parameters. Using a parametric learnable activation function helps to reduce time consumption and improves learning. Three types of R-blocks are implemented according to the filter size. The filter size of the two convolutional layers in the first R-block is 3 × 3. In the second R-block, the filter size of the two convolutional layers is 5 × 5. The first convolutional layer in the third R-block has a 3 × 3 filter size and the second convolutional layer has a 5 × 5 filter size. We have reduced the number of parameters and computational costs in the X-module by incorporating an additional 1 × 1 convolution in the initial layer, preceding the 3 × 3 and 5 × 5 convolutions. An extra 1 × 1 convolution is also utilized after the max pooling layer.

Figure 4. Structure of the R-block (the main unit of the X-module): (a) R-block 3 × 3. (b) R-block 5 × 5. (c) R-block 3 × 5.
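A minimal sketch of the R-block just described, reusing the LAFReLU layer sketched in Section 3.2; the “same” padding and the 1 × 1 projection on the shortcut when channel counts differ are our assumptions:

from tensorflow.keras import layers

def r_block(x, filters, k1, k2):
    # Conv -> batch normalization -> learnable activation -> conv.
    y = layers.Conv2D(filters, k1, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = LAFReLU()(y)                      # learnable activation (Section 3.2)
    y = layers.Conv2D(filters, k2, padding="same")(y)
    # Shortcut connection: add the block input to the block output.
    if x.shape[-1] != filters:
        x = layers.Conv2D(filters, 1, padding="same")(x)  # assumed projection
    return layers.Add()([x, y])

The three variants of Figure 4 correspond to r_block(x, f, 3, 3), r_block(x, f, 5, 5), and r_block(x, f, 3, 5).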

Different configurations of the X-module are implemented to build different deep neural network (DNN) models. In the first X-module, the first R-block is 3 × 3, the second is 5 × 5, and the third R-block has a 3 × 5 convolution filter size. In the second model, three R-blocks of filter size 3 × 3 are utilized. In the third model, three R-blocks of filter size 5 × 5 are utilized. The structure of the R-block and the X-module are shown in Figures 4 and 5, respectively.

Figure 5. The different structures used in the X-module: (a) X-module with three R-blocks of 3 × 3, called DNN R3_R3_R3. (b) X-module with three R-blocks of 5 × 5, called DNN R5_R5_R5. (c) X-module with R-blocks of 3 × 3, 5 × 5, and 3 × 5, called DNN R3_R5_R35.
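Continuing the sketch, a hypothetical X-module that runs three R-blocks in parallel at the same level and concatenates their outputs, with the assumed 1 × 1 reduction in front of each branch (reusing the r_block sketch above):

from tensorflow.keras import layers

def x_module(x, filters, sizes=((3, 3), (5, 5), (3, 5))):
    # Parallel R-blocks of different filter sizes (the Figure 5c variant);
    # pass sizes=((3, 3),) * 3 or ((5, 5),) * 3 for the other two variants.
    branches = []
    for k1, k2 in sizes:
        b = layers.Conv2D(filters, 1, activation="relu")(x)  # 1x1 reduction
        branches.append(r_block(b, filters, k1, k2))
    return layers.Concatenate()(branches)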

Downsampling module: Following the X-module, this module is implemented to decrease both the size of the feature map and the number of network parameters. The max pooling function is concatenated with a 3 × 3 convolutional layer. As stated before, deep neural networks are computationally expensive. To make them cheaper, the number of input channels is limited by adding a 1 × 1 convolution before the 3 × 3 convolution.

Dense layer-based attention layer: This attention takes as input a 3D tensor represent-
ing the output features of the previous layer and outputs a 2D tensor of attention scores,
where each score represents the relevance of a specific feature.
Output layer: The output in this step will be normal or abnormal.

4. Results and Discussion


The experiments were conducted using the Cairo University ultrasound images dataset
and the breast histopathology images dataset for training and testing. The code was written
in Python, and the experiments were performed on Kaggle, leveraging the power of their
hardware, including GPUs. To train the model, the following settings were employed:
Adam optimizer with a learning rate set to 0.0001, 100 epochs, a batch size of 32, and a
dropout rate of 50%. The number of epochs in a CNN is one of the hyperparameters that
can be tuned to improve the performance of the model. The number of epochs refers to the
number of times the entire training dataset is passed through the CNN during the training
phase. To tune the number of epochs, we used the early stopping technique. Early stopping
is a method used to prevent overfitting by monitoring the validation loss during training
and stopping the training process once the validation loss starts to increase. After applying
this technique, we set the number of epochs to 100.
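A sketch of this training configuration in Keras follows; the patience value, the loss function, and the placeholder model/data names are our assumptions, while the optimizer, learning rate, epoch budget, and batch size are as stated above.

import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Early stopping: watch the validation loss and stop once it starts to rise.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                              restore_best_weights=True)

model.fit(x_train, y_train, validation_split=0.2,
          epochs=100, batch_size=32, callbacks=[early_stop])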
In this study, the proposed approach is assessed in terms of several performance
metrics, including accuracy, loss, precision, recall, and F1 score. These metrics are defined
as follows:
Accuracy: This metric measures the overall performance of the model by calculating
the percentage of correctly predicted labels to the total number of samples in the test dataset.
Mathematically, it can be expressed as:

Accuracy = (Number of Correct Predictions)/(Total Number of Predictions) (6)

Loss: This metric represents the error between the predicted output and the actual
output. The loss function is typically defined during the training phase of the model,
and it is used to optimize the model parameters by minimizing the difference between its
predictions and the true values. The most commonly used loss function in deep learning
is the mean squared error (MSE), which measures the average of the squared differences
between the predicted and true values.
Precision: This metric measures the proportion of true positives (samples that were
correctly classified as positive) to the total number of positive predictions made by the
model. It can be calculated as:

Precision = True Positives/(True Positives + False Positives) (7)

Recall: This metric measures the proportion of true positives to the total number of
true positives and false negatives in the dataset.
F1: This metric calculates the harmonic mean of precision and recall.
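These metrics can be computed directly from the confusion-matrix counts, as in the small sketch below (the function name is illustrative):

def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)           # Equation (6)
    precision = tp / (tp + fp)                           # Equation (7)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return accuracy, precision, recall, f1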
Different DNN models are implemented using different configurations of the X-module,
as depicted in Figure 5. The first model has three R-blocks of 3 × 3 convolution filters.
The second model utilized three R-blocks of 5 × 5. In the third model, the first R-block
is 3 × 3, the second is 5 × 5, and the third R-block has a 3 × 3 filter size in the first
convolutional layer, and the second convolutional layer has a 5 × 5 filter size. We have
utilized various filters and kernels (kernel size). By specifying multiple values for the kernel
parameter within a filter, our model can effectively identify patterns that occur at different
scales within an image. The incorporation of multiple kernels also assists in reducing
overfitting and improving the generalization of the model. This is because including filters
with varying kernel sizes compels the network to learn more diverse and robust feature
representations, leading to an improved ability to generalize the model to new images.
The Cairo University ultrasound images dataset was collected in 2018 and consists of 780 images with an average image size of 500 × 500 pixels [36]. The images are categorized
into three classes, which are normal (133 images), benign (487 images), and malignant
(210 images). The data collected at baseline include breast ultrasound images among
women in ages between 25 and 75 years old. The number of patients is 600 female patients.
Because the number of images in different classes is unbalanced, this may cause the
model to learn some classes better than others, and this can cause the model to perform
inappropriately during use or testing. To prevent this from happening, we randomly
selected an equal number of images from each class. Before starting the training process
of the model, due to the lack of data, we start the data augmentation process. We resize
Diagnostics 2023, 13, x FOR PEER REVIEW 17 of 24
the dataset using cubic interpolation to fit the input requirements of the model. For
augmentation, we applied width and height shifts of 0.1 and a horizontal flip, which tripled
the size of the dataset. With this technique, we tripled the number of data for each class.
After this process, we split the dataset into training and test sets, allocating approximately 80% for training and 20% for testing. Of course, the new augmented images were not used for testing. We maintain the sequence of each image so that every image appears only once in each of the aforementioned sets.

Figure 6 depicts the accuracy and loss diagrams for the three proposed models on the Cairo University ultrasound dataset. The performance of the three proposed CNN models on this dataset in the testing phase is summarized in Table 1. The third model, DNN R3_R5_R35, which uses different sizes of convolutional filters, achieves the best accuracy and the lowest loss, reaching 93% accuracy.

Table 2 is the resultant confusion matrix of the DNN R3_R5_R35 model. The results are promising: the proposed model correctly diagnosed the presence or absence of cancer in most cases, misclassifying only five of the 68 test cases.
[Figure 6 panels: Proposed DNN R3_R5_R35, Proposed DNN R3_R3_R3, and Proposed DNN R5_R5_R5.]
Figure 6. Accuracy and loss diagrams for the three proposed models.

Table 1. Results of the newly developed models on the Cairo University ultrasound images dataset (test part).

Deep-Learning Model       Precision  Recall  F1-Score  Loss    Accuracy
Proposed DNN R5_R5_R5     0.88       0.88    0.88      0.2919  0.88
Proposed DNN R3_R3_R3     0.91       0.91    0.91      0.2445  0.91
Proposed DNN R3_R5_R35    0.93       0.93    0.93      0.2103  0.93

Table 2. Confusion matrix for the test dataset (0 indicates no breast cancer; 1 indicates breast cancer, i.e., a benign or malignant lesion in the image).

                  Predicted
                  0      1
Actual    0      33      4
          1       1     30
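As a quick arithmetic check of Table 2 (plain Python; values read directly from the matrix): overall accuracy is 63/68 ≈ 0.93, matching Table 1, and the per-class precision and recall below are consistent with the 0.93 figures reported there once averaged over the two classes.

```python
# Values read directly from Table 2 (class 1 = breast cancer present).
tn, fp = 33, 4   # actual 0: predicted as 0, predicted as 1
fn, tp = 1, 30   # actual 1: predicted as 0, predicted as 1

accuracy = (tn + tp) / (tn + fp + fn + tp)  # 63 / 68 ~= 0.926 -> reported 93%
recall_1 = tp / (tp + fn)                   # 30 / 31 ~= 0.968
precision_1 = tp / (tp + fp)                # 30 / 34 ~= 0.882
print(accuracy, recall_1, precision_1)
```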

As we used some features of the state-of-the-art GoogLeNet and ResNet architectures in the design of the new architecture, we compared the proposed architecture, i.e., DNN R3_R5_R35, against them in Table 3, using GoogLeNet with 22 layers and ResNet with 50 layers. The results indicate that ResNet outperformed GoogLeNet on two key evaluation metrics: accuracy and F1-score. The table also shows that the proposed method surpassed both of these models, yielding a higher detection accuracy than either state-of-the-art architecture.
Table 3. Comparison of the proposed model against GoogLeNet and ResNet on the test dataset.

Deep-Learning Model           Precision  Recall  F1-Score  Loss    Accuracy
GoogLeNet                     0.86       0.87    0.85      0.59    0.87
ResNet50 (Residual Network)   0.87       0.85    0.86      0.51    0.88
DNN R3_R5_R35                 0.93       0.93    0.93      0.2103  0.93
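For reference, the shortcut (skip) connection borrowed from ResNet, together with a 1 × 1 projection on the identity path, can be sketched as follows; the layer sizes are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ShortcutBlock(nn.Module):
    """Minimal residual block: two 3x3 convolutions plus a skip connection.
    A 1x1 convolution projects the input when channel counts differ, so the
    element-wise addition is valid."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.body(x) + self.proj(x))
```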

The performance of GoogLeNet during the training phase is shown in Figure 7. A comparison of Figures 6 and 7 shows that the proposed models perform well and, unlike GoogLeNet, do not overfit. The DNN R3_R5_R35 model, which uses different sizes of convolutional filters, achieves the best performance. Here, we compare the proposed architecture with state-of-the-art image processing architectures in terms of prediction accuracy and loss on the test data. Table 4 shows these comparisons. The proposed model is superior to all the compared image processing architectures in terms of prediction accuracy on breast cancer images.
Figure 7. Accuracy and loss diagrams for GoogLeNet.

Table 4. Comparison of the proposed model against nine state-of-the-art image processing models on the Cairo University ultrasound images dataset.

Deep-Learning Model       Loss (%)  Accuracy (%)
AlexNet                   66        69
ZFNet                     64        69
VGG-16                    6         73
Inception v4              46        85
MobileNet                 55        85
WideResNet                39        88
GoogLeNet                 59        87
ResNet34                  61        83
ResNet50                  51        88
Proposed DNN R3_R5_R35    21        93

We also applied the proposed model to the breast histopathology images dataset to further evaluate it. The original dataset consisted of 162 whole-mount slide images of breast cancer specimens scanned at 40×. From these, 277,524 patches of size 50 × 50 were extracted (198,738 IDC-negative and 78,786 IDC-positive). Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. We chose 40,000 images in total from the dataset: 20,000 random images from each class. We split the dataset into training and test sets, allocating 80% for training and 20% for testing. The comparison results of the proposed DNN R3_R5_R35 model against existing approaches are listed in Table 5, with the best results highlighted in bold. It is evident from the table that the proposed model outperformed the other models on both assessed criteria.

Table 5. Comparative analysis on the breast histopathology images dataset.

Ref.            Year  Method/Model                          Accuracy (%)  F1-Score (%)
[32]            2023  Transfer learning with VGG16          91            89
[33]            2023  ViT-Patch 32                          89            -
[37]            2021  CNN architecture                      87            87
[38]            2021  DenseNet121                           86            87
[38]            2021  DenseNet169                           85            86
[38]            2021  MobileNet                             86            85
[38]            2021  ResNet50                              84            83
[38]            2021  VGG19                                 84            80
[38]            2021  VGG16                                 85            85
[38]            2021  EfficientNetB0                        84            84
[38]            2021  EfficientNetB4                        82            82
[38]            2021  EfficientNetB5                        84            84
[39]            2020  Residual learning-based CNN           84            83
[40]            2020  CNN                                   85            82
[41]            2021  Patch-Based Deep-Learning Modeling    85            85
Proposed study        Proposed DNN R3_R5_R35 Model          95            93

The objective of this study was to enhance the accuracy of breast cancer detection by applying deep-learning techniques to the development of computer-aided detection systems. The proposed model, utilizing various filter sizes, demonstrated 93% and 95% accuracy on two distinct datasets: ultrasound images and breast histopathology images, respectively. The second goal was to decrease the number of network parameters in order to improve the training time. Training time remains a challenge for any deep-learning model and depends on the available hardware. Training the model takes less than two hours on the ultrasound images and less than six hours on the histopathology images, which compares favorably with other DNN models.
From the brief review above, we can summarize the main findings of this paper as follows:
1. The granular computing technique used in this paper, by breaking images down into smaller, more granular components, can effectively extract features from images, allowing for more accurate and efficient image analysis. This increases efficiency by reducing the computational complexity of image analysis tasks. Moreover, breaking images down into smaller granules can improve the accuracy of image analysis tasks, leading to more reliable results;
2. Activation functions with learnable parameters offer greater flexibility and adaptability than traditional activation functions with fixed parameters, allowing the network to adapt better to different types of data and tasks. These functions can also improve the flow of gradients through the network during training, making the network easier to optimize and reducing the risk of vanishing gradients. Better regularization is another advantage: learnable activation functions can act as a form of regularization, helping to prevent overfitting by constraining the network's capacity and reducing the risk of memorization (a minimal sketch of such a function follows this list);
3. In this study, a range of filters and kernels of varying sizes were employed to effectively identify patterns at multiple scales within an image. By incorporating multiple kernels within a filter, the network was able to learn diverse and robust feature representations, which helped to reduce overfitting and improve the generalization of the model. This approach enabled the model to consider a wider range of input features, leading to higher accuracy in complex tasks compared with a model that employs a single filter and kernel. The use of multiple kernels within a filter therefore represents an effective strategy for improving a neural network's ability to generalize to new images by facilitating the learning of more sophisticated features across a range of spatial scales;
4. Utilizing a wide and deep network, shortcut connections, attention layers, auxiliary classifiers, and learnable activation functions improves diagnostic accuracy and consequently decreases the load on doctors. In addition, using 1 × 1 convolutions reduces the model's time consumption. Compared to existing breast cancer methods, the proposed model achieves the highest diagnostic accuracy.
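As referenced in point 2, a minimal example of an activation function with a learnable parameter is a PReLU-style unit whose negative-side slope is trained by backpropagation; this is a sketch of the general idea only, as the paper's own learnable activations may be parameterized differently:

```python
import torch
import torch.nn as nn

class LearnableActivation(nn.Module):
    """PReLU-style unit: the negative-side slope is a trainable parameter,
    updated by backpropagation along with the network weights."""
    def __init__(self, init_slope=0.25):
        super().__init__()
        self.slope = nn.Parameter(torch.tensor(init_slope))

    def forward(self, x):
        # Identity for x >= 0; learned slope for x < 0.
        return torch.where(x >= 0, x, self.slope * x)
```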
There are two potential limitations to the presented model:
1. The extraction of some patterns from the image may depend on the granularity size. In the proposed granulation, the granularity size was set to 32 × 32 pixels regardless of the image size; consequently, some patterns may not be extracted, weakening the effectiveness of granulation (a minimal granulation sketch follows this list). However, the model's overall performance demonstrates that the proposed granulation method outperformed state-of-the-art models on the datasets under consideration;
2. Incorporating granularity in a model requires additional time before the training
process can commence. It is worth noting, however, that once these granules have
been established, they can be reused multiple times.
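As noted in limitation 1, granulation begins by tiling the image into fixed 32 × 32 granules. A minimal sketch of that tiling step is shown below; how each granule is then scored and highlighted is specific to the proposed model and is not reproduced here:

```python
import numpy as np

def granulate(image, g=32):
    """Tile an image into non-overlapping g x g granules (32 x 32 here,
    matching the paper). Returns the granules in row-major order."""
    h, w = image.shape[:2]
    return [image[r:r + g, c:c + g]
            for r in range(0, h - g + 1, g)
            for c in range(0, w - g + 1, g)]

# A 500 x 500 image (the dataset's average size) yields 15 x 15 = 225 granules;
# the leftover 20-pixel border is dropped by this simple tiling.
print(len(granulate(np.zeros((500, 500, 3), dtype=np.uint8))))  # 225
```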

5. Conclusions and Future Work


By automating the diagnosis process, healthcare professionals can focus on providing
personalized treatment plans for patients. This not only improves patient outcomes but
also reduces healthcare costs by minimizing unnecessary procedures and tests. The use of
machine-learning and deep-learning techniques in breast cancer detection has the potential
to revolutionize the way breast cancer is diagnosed and managed. Incorporating these
tools into healthcare systems has the potential to lower cancer-related mortality rates and
enhance the overall management of breast cancer. This paper proposed a novel granular computing-based deep-learning model for breast cancer detection, which was evaluated on ultrasound image and breast histopathology image datasets. The proposed model uses some effective features of the GoogLeNet and ResNet architectures (such as width and depth modules, 1 × 1 convolutional filters, auxiliary classifiers, and skip connections) and adds new features such as granular computing, activation functions with learnable parameters, and an attention layer to the new architecture. Granular computing can extract the important features of the image and create a new image with those features highlighted before sending it to the training process. This allows the model to require fewer images than in the case where granularity is not used. The proposed model
achieved an accuracy improvement compared to state-of-the-art models. In particular,
the deep-learning model based on granular computing exhibited an accuracy of 93% and
95% on two real-world datasets, ultrasound images and breast histopathology images,
respectively. The model delivered promising results on the datasets. The findings may
encourage radiologists and physicians to leverage the model in the early detection of breast
cancer, leading to improved diagnosis accuracy, reduced time consumption, and eased
workload of doctors. It has been confirmed that granular computing has a positive effect on performance in problems where the number of available images is small, such as breast cancer diagnosis.
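The attention layer referred to above can take many forms; the channel-attention sketch below (squeeze-and-excitation style) is one simple instance, chosen here as an illustrative assumption since the exact attention design is not restated at this point:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global-average-pool
    to a channel descriptor, then re-weight channels by a learned gate."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.gate(x.mean(dim=(2, 3)))  # [b, c] channel weights in (0, 1)
        return x * w.view(b, c, 1, 1)      # scale feature maps channel-wise
```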
For future work, the following items can be performed: (1) The framework can be further optimized for real-time performance; taking inspiration from the MobileNet architecture, e.g., its separable convolutions (a parameter-count sketch follows this paragraph), the number of parameters of the proposed architecture can be reduced without a large drop in accuracy. (2) Instead of granular computing, one can use fuzzy clustering; this method uses fuzzy logic to group pixels in an image into clusters based on their similarity. (3) Medical images are usually very scarce. Their number can be increased with techniques such as Sketch2Photo [42] before starting the learning process. This will increase the accuracy of
the model. (4) Over the last 30 years, hyperspectral imagery (HSI) has gained prominence
for its ability to discern anomalies from natural ground objects based on their spectral
characteristics. The importance of HSI has been recognized in a variety of remote sensing
applications, including but not limited to object classification, hyperspectral unmixing,
anomaly detection, and change detection [43]. We can use this technique to identify breast
cancer. (5) The primary challenge in content-based image retrieval (CBIR) systems is the
presence of a semantic gap that must be narrowed for effective retrieval. To address this
issue, various techniques, such as those outlined in [44], can be employed to incorporate
semantic considerations. (6) To reduce the processing time, it is suggested to use distributed
and parallel similarity retrieval techniques, such as [45], on large CT image sequences.
(7) The proposed framework can be extended to include other types of cancer detection,
such as lung or prostate cancer. This would enable the development of a comprehensive
cancer detection system that can be integrated into existing healthcare systems.
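As a parameter-count sketch for item (1): replacing a standard 3 × 3 convolution with a MobileNet-style depthwise-plus-pointwise pair cuts the weight count substantially; the channel sizes below are illustrative, not those of the proposed architecture:

```python
import torch.nn as nn

# Standard 3x3 conv vs. MobileNet-style depthwise separable pair,
# both mapping 64 -> 128 channels (illustrative sizes).
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise: one filter per channel
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise: 1x1 channel mixing
)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(standard), n_params(separable))  # 73856 vs. 8960 (weights + biases)
```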

Author Contributions: Conceptualization, S.Z. and H.I.; methodology, S.Z. and H.I.; software, S.Z.; validation, S.Z., H.I. and J.K.; formal analysis, S.Z., H.I. and J.K.; investigation, S.Z., H.I. and J.K.; resources, S.Z., H.I. and J.K.; data curation, S.Z., H.I. and J.K.; writing—original draft preparation, S.Z.; writing—review and editing, S.Z., H.I. and J.K.; visualization, S.Z.; supervision, H.I. and J.K.; project administration, H.I. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Publicly available datasets were analyzed in this study. Ultrasound images dataset: https://www.data-in-brief.com/article/S2352-3409(19)31218-1/fulltext (accessed on 1 September 2022). Breast Histopathology Images: https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images (accessed on 1 September 2022).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. World Health Organization: Breast Cancer Web Site. Available online: https://www.who.int/newsroom/fact-sheets/detail/breast-cancer (accessed on 30 September 2022).
2. Mahmood, T.; Li, J.; Pei, Y.; Akhtar, F.; Imran, A.; Rehman, K.U. A brief survey on breast cancer diagnostic with deep learning
schemes using multi-image modalities. IEEE Access 2020, 8, 165779–165809. [CrossRef]
3. Liew, X.Y.; Hameed, N.; Clos, J. A review of computer-aided expert systems for breast cancer diagnosis. Cancers 2021, 13, 2764.
[CrossRef]
4. Almajalid, R.; Shan, J.; Du, Y.; Zhang, M. Development of a deep-learning-based method for breast ultrasound image segmentation.
In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL,
USA, 17–20 December 2018; pp. 1103–1108.
5. Latif, G.; Butt, M.O.; Al Anezi, F.Y.; Alghazo, J. Ultrasound image despeckling and detection of breast cancer using deep CNN. In
Proceedings of the 2020 RIVF International Conference on Computing and Communication Technologies (RIVF), Ho Chi Minh,
Vietnam, 14–15 October 2020; pp. 1–5.
6. Zhu, W.; Xiang, X.; Tran, T.D.; Hager, G.D.; Xie, X. Adversarial deep structured nets for mass segmentation from mammograms.
In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA,
4–7 April 2018; pp. 847–850.
7. Al-Antari, M.A.; Al-Masni, M.A.; Choi, M.T.; Han, S.M.; Kim, T.S. A fully integrated computer-aided diagnosis system for digital
X-ray mammograms via deep learning detection, segmentation, and classification. Int. J. Med. Inform. 2018, 117, 44–54. [CrossRef]
[PubMed]
8. Yao, J.T.; Vasilakos, A.V.; Pedrycz, W. Granular computing: Perspectives and challenges. IEEE Trans. Cybern. 2013, 43, 1977–1989.
[CrossRef] [PubMed]
9. Lee, J.; Kang, B.J.; Kim, S.H.; Park, G.E. Evaluation of computer-aided detection (CAD) in screening automated breast ultrasound
based on characteristics of CAD marks and false-positive marks. Diagnostics 2022, 12, 583.
10. Cheng, H.D.; Shan, J.; Ju, W.; Guo, Y.; Zhang, L. Automated breast cancer detection and classification using ultrasound images: A
survey. Pattern Recognit. 2010, 43, 299–317. [CrossRef]
11. Jiménez-Gaona, Y.; Rodríguez-Álvarez, M.J.; Lakshminarayanan, V. Deep-learning-based computer-aided systems for breast
cancer imaging: A critical review. Appl. Sci. 2020, 10, 8298. [CrossRef]
12. Wang, S.; Huang, J. Breast Lesion Segmentation in Ultrasound Images by CDeep3M. In Proceedings of the 2020 International
Conference on Computer Engineering and Application (ICCEA), Guangzhou, China, 18–20 March 2020; pp. 907–911.
13. Wei, K.; Wang, B.; Saniie, J. Faster Region Convolutional Neural Networks Applied to Ultrasonic Images for Breast Lesion
Detection and Classification. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT),
Romeoville, IL, USA, 31 July–1 August 2020; pp. 171–174.
14. Badawy, S.M.; Mohamed, A.E.N.A.; Hefnawy, A.A.; Zidan, H.E.; GadAllah, M.T.; El-Banby, G.M. Classification of Breast
Ultrasound Images Based on Convolutional Neural Networks—A Comparative Study. In Proceedings of the 2021 International
Telecommunications Conference (ITC-Egypt), Alexandria, Egypt, 13–15 July 2021; pp. 1–7.
15. Tang, P.; Yang, X.; Nan, Y.; Xiang, S.; Liang, Q. Feature Pyramid Nonlocal Network With Transform Modal Ensemble Learning for
Breast Tumor Segmentation in Ultrasound Images. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2021, 68, 3549–3559.
16. Xiao, T.; Liu, L.; Li, K.; Qin, W.; Yu, S.; Li, Z. Comparison of transferred deep neural networks in ultrasonic breast masses
discrimination. BioMed Res. Int. 2018, 2018, 4605191. [CrossRef]
17. Uysal, F.; Köse, M.M. Classification of Breast Cancer Ultrasound Images with Deep Learning-Based Models. Eng. Proc. 2022, 31, 8.
18. Ragab, M.; Albukhari, A.; Alyami, J.; Mansour, R.F. Ensemble deep-learning-enabled clinical decision support system for breast
cancer diagnosis and classification on ultrasound images. Biology 2022, 11, 439. [CrossRef]
19. Xing, J.; Chen, C.; Lu, Q.; Cai, X.; Yu, A.; Xu, Y.; Huang, L. Using BI-RADS stratifications as auxiliary information for breast
masses classification in ultrasound images. IEEE J. Biomed. Health Inform. 2020, 25, 2058–2070. [CrossRef]
20. Ragab, D.A.; Sharkas, M.; Marshall, S.; Ren, J. Breast cancer detection using deep convolutional neural networks and support
vector machines. PeerJ 2019, 7, e6201. [CrossRef]
21. Yu, X.; Wang, S.H. Abnormality diagnosis in mammograms by transfer learning based on ResNet18. Fundam. Inform. 2019,
168, 219–230. [CrossRef]
22. Islam, M.M.; Haque, M.R.; Iqbal, H.; Hasan, M.M.; Hasan, M.; Kabir, M.N. Breast cancer prediction: A comparative study using
machine learning techniques. SN Comput. Sci. 2020, 1, 1–14. [CrossRef]
23. Alzubaidi, L.; Al-Shamma, O.; Fadhel, M.A.; Farhan, L.; Zhang, J.; Duan, Y. Optimizing the performance of breast cancer
classification by employing the same domain transfer learning from hybrid deep convolutional neural network model. Electronics
2020, 9, 445. [CrossRef]
24. Gao, Y.; Geras, K.J.; Lewin, A.A.; Moy, L. New frontiers: An update on computer-aided diagnosis for breast imaging in the age of
artificial intelligence. AJR Am. J. Roentgenol. 2019, 212, 300. [CrossRef]
25. Yap, M.H.; Pons, G.; Marti, J.; Ganau, S.; Sentis, M.; Zwiggelaar, R.; Marti, R. Automated breast ultrasound lesions detection using
convolutional neural networks. IEEE J. Biomed. Health Inform. 2017, 22, 1218–1226. [CrossRef]
26. Moon, W.K.; Lee, Y.W.; Ke, H.H.; Lee, S.H.; Huang, C.S.; Chang, R.F. Computer-aided diagnosis of breast ultrasound images
using ensemble learning from convolutional neural networks. Comput. Methods Programs Biomed. 2020, 190, 105361. [CrossRef]
27. Lee, Y.W.; Huang, C.S.; Shih, C.C.; Chang, R.F. Axillary lymph node metastasis status prediction of early-stage breast cancer using
convolutional neural networks. Comput. Biol. Med. 2021, 130, 104206. [CrossRef]
28. Sun, Q.; Lin, X.; Zhao, Y.; Li, L.; Yan, K.; Liang, D.; Li, Z.C. Deep learning vs. radiomics for predicting axillary lymph node
metastasis of breast cancer using ultrasound images: Don’t forget the peritumoral region. Front. Oncol. 2020, 10, 53. [CrossRef]
[PubMed]
29. Sun, S.; Mutasa, S.; Liu, M.Z.; Nemer, J.; Sun, M.; Siddique, M.; Ha, R.S. Deep learning prediction of axillary lymph node status
using ultrasound images. Comput. Biol. Med. 2022, 143, 105250. [CrossRef] [PubMed]
30. Ayana, G.; Park, J.; Jeong, J.W.; Choe, S.W. A novel multistage transfer learning for ultrasound breast cancer image classification.
Diagnostics 2022, 12, 135. [CrossRef]
31. Yusoff, M.; Haryanto, T.; Suhartanto, H.; Mustafa, W.A.; Zain, J.M.; Kusmardi, K. Accuracy Analysis of Deep Learning Methods
in Breast Cancer Classification: A Structured Review. Diagnostics 2023, 13, 683. [CrossRef] [PubMed]
32. Hossain, A.A.; Nisha, J.K.; Johora, F. Breast Cancer Classification from Ultrasound Images using VGG16 Model based Transfer
Learning. Int. J. Image Graph. Signal Process. 2023, 13, 12. [CrossRef]
33. Feng, H.; Yang, B.; Wang, J.; Liu, M.; Yin, L.; Zheng, W.; Liu, C. Identifying Malignant Breast Ultrasound Images Using ViT-Patch.
Appl. Sci. 2023, 13, 3489. [CrossRef]
34. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Rabinovich, A. Going deeper with convolutions. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
36. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863. [CrossRef]
37. Alanazi, S.A.; Kamruzzaman, M.M.; Islam Sarker, M.N.; Alruwaili, M.; Alhwaiti, Y.; Alshammari, N.; Siddiqi, M.H. Boosting
breast cancer detection using convolutional neural network. J. Healthc. Eng. 2021, 2021, 1–11. [CrossRef]
38. Seemendra, A.; Singh, R.; Singh, S. Breast cancer classification using transfer learning. In Evolving Technologies for Computing,
Communication and Smart World: Proceedings of ETCCS; Springer: Singapore, 2020; pp. 425–436.
39. Gour, M.; Jain, S.; Sunil Kumar, T. Residual learning based CNN for breast cancer histopathological image classification. Int. J.
Imaging Syst. Technol. 2020, 30, 621–635. [CrossRef]
40. Shahidi, F.; Daud, S.M.; Abas, H.; Ahmad, N.A.; Maarop, N. Breast cancer classification using deep learning approaches and
histopathology image: A comparison study. IEEE Access 2020, 8, 187531–187552. [CrossRef]
41. Hirra, I.; Ahmad, M.; Hussain, A.; Ashraf, M.U.; Saeed, I.A.; Qadri, S.F.; Alfakeeh, A.S. Breast cancer classification from
histopathological images using patch-based deep learning modeling. IEEE Access 2021, 9, 24273–24287. [CrossRef]
42. Liu, H.; Xu, Y.; Chen, F. Sketch2Photo: Synthesizing photo-realistic images from sketches via global contexts. Eng. Appl. Artif.
Intell. 2023, 117, 105608. [CrossRef]
43. Wang, S.; Hu, X.; Sun, J.; Liu, J. Hyperspectral anomaly detection using ensemble and robust collaborative representation. Inf. Sci.
2023, 624, 748–760. [CrossRef]
44. Zhuang, Y.; Chen, S.; Jiang, N.; Hu, H. An Effective WSSENet-Based Similarity Retrieval Method of Large Lung CT Image
Databases. KSII Trans. Internet Inf. Syst. 2022, 16, 7.
45. Zhuang, Y.; Jiang, N.; Xu, Y.; Kong, X. Progressive Distributed and Parallel Similarity Retrieval of Large CT Image Sequences in Mobile Telemedicine Networks. Wirel. Commun. Mob. Comput. 2022, 2022, 6458350. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
