Article

Effectiveness Analysis of Deep Learning Methods for Breast Cancer Diagnosis Based on Histopathology Images

1 Department of Computer Engineering, Kocaeli University, 41001 Kocaeli, Turkey
2 Department of Software Engineering, Kocaeli University, 41001 Kocaeli, Turkey
* Author to whom correspondence should be addressed.
Submission received: 29 November 2024 / Revised: 29 December 2024 / Accepted: 15 January 2025 / Published: 21 January 2025

Abstract

The early detection of breast cancer is crucial both for accelerating the treatment process and for preventing the spread of the disease. The accuracy of diagnosis is also significantly influenced by the experience of the pathologist. Many studies have therefore addressed the correct diagnosis of breast cancer to assist specialists and increase diagnostic accuracy. This study focuses on classifying breast cancer using deep learning models, including pre-trained VGG16, MobileNet, and DenseNet201 models and a custom-built Convolutional Neural Network (CNN) whose final dense layer is optimized via the particle swarm optimization (PSO) algorithm. The Breast Histopathology Images Dataset was used to evaluate the performance of the models, from which two datasets were formed: one with 157,572 images at 50 × 50 × 3 (Experimental Study 1) and another with 1116 images resized to 224 × 224 × 3 (Experimental Study 2). Both the original (50 × 50 × 3) and rescaled (224 × 224 × 3) images were thus tested. The highest success rate in Experimental Study 1 was obtained with the custom-built CNN model, with an accuracy of 93.80%, while the MobileNet model yielded an accuracy of 95.54% in Experimental Study 2. The experimental results demonstrate that the proposed models exhibit promising and superior classification accuracy compared to state-of-the-art methods across varying image sizes and dataset volumes.

1. Introduction

Breast cancer is a life-threatening disease defined by the uncontrolled division of cells in breast tissue, with the potential to spread to other parts of the body. As with many diseases, early diagnosis can significantly improve treatment outcomes and survival rates, improving patient response to treatment and increasing the likelihood that the disease is treated before it spreads to other organs. The early detection of breast cancer is therefore of great importance, and many methods are used to diagnose the disease at an early stage. The most commonly used diagnostic methods include ultrasonography, mammography, magnetic resonance imaging, and biopsy. After ultrasonography, magnetic resonance imaging, and mammography, a definitive diagnosis can be made by examining a biopsy sample taken from the suspicious tissue or mass detected in the breast. Specialists analyze thin tissue slices under an optical microscope and determine the degree of malignancy and the extent of the cancerous area based on cell shape, density, and tissue structure. Because carrying out all of these procedures takes a long time, the diagnosis process is prolonged, which is a disadvantage. Moreover, because pathological images are obtained in complex, well-equipped hospitals, these procedures require a specialist physician with extensive knowledge and experience, and the accuracy of the diagnosis depends largely on the pathologist’s expertise in examining the sample taken from the breast. Artificial intelligence (AI)-supported software is therefore needed to help diagnose breast cancer, prevent misdiagnosis, and minimize the workload of specialist doctors; however, the AI-supported systems currently used for breast cancer diagnosis have not yet fully resolved this issue. Consequently, reducing the incidence of cancer, an important problem in human health, is possible through early diagnosis and the application of appropriate treatment to the diagnosed patient. Early diagnosis and timely treatment are of great importance for treating cancer and reducing cancer-related mortality rates. To shorten these processes and increase the accuracy of disease diagnosis, many researchers have proposed deep learning models for the diagnosis and classification of breast cancer [1].
In recent years, deep learning methods, particularly pre-trained models, have received significant interest in medical image analysis because they reduce the need for large, annotated datasets. These techniques speed up the development of reliable diagnostic tools by enabling well-known architectures, such as VGG16, DenseNet, and ResNet, to be adapted for domain-specific applications. Owing to its lightweight design and lower computational complexity, MobileNet has become one of the most effective of these architectures, making it especially well-suited for medical applications. MobileNet, which is built on depthwise separable convolution layers, is well-suited for resource-constrained environments such as real-time diagnostics and mobile health applications because it delivers high accuracy with fewer parameters. The growing use of MobileNet in healthcare demonstrates its ability to bridge the gap between cutting-edge diagnostic methods and real-world applications, providing a viable solution for tasks such as disease classification and detection in radiographic and histopathological imaging with high accuracy [2].
The remainder of this paper is structured as follows: Section 2 reviews the state-of-the-art studies developed for breast cancer diagnosis. Section 3 describes the materials and methods employed in this study, including dataset details, preprocessing techniques, and the methodologies used; the pre-trained VGG16, MobileNet, and DenseNet201 models and the custom-built CNN, whose final dense layer is optimized via the particle swarm optimization (PSO) algorithm, are defined in this section. Section 4 presents the experimental results of the two experimental studies: the first uses 157,572 images of size 50 × 50 × 3, and the second uses 1116 images of size 224 × 224 × 3. For these experiments, a custom-built CNN model and the VGG16, MobileNet, and DenseNet201 models were employed, and all defined performance metrics and outcomes of the proposed approach are reported. Section 5 elaborates on the discussion, providing an in-depth analysis of the findings and their implications. Section 6 compares the proposed method with state-of-the-art studies, placing it in the context of existing research. Finally, Section 7 concludes the paper with a summary of the findings and possible directions for future work.

2. Literature Review

An analysis of the studies in the literature reveals that numerous investigations have been carried out to identify and categorize breast cancer. Kumari et al. used the VGG16, Xception, and DenseNet201 models to classify breast cancer independently of image size, using the BreakHis and IDC datasets for their experiments. To make the images suitable for use in transfer learning models, they resized them to 224 × 224 × 3, and they performed feature extraction with transfer learning methods prior to classification. The best result among the models considered was obtained with the DenseNet201 model: 99.12% accuracy on the BreakHis dataset and 99.42% accuracy on the IDC dataset [3]. Sharmin et al. presented a new approach that combines deep learning (DL) and machine learning (ML) algorithms to classify breast cancer using breast histopathological images. The authors extracted image features using the ResNet50V2 model and then performed breast cancer classification with Decision Tree (DT), Random Forest (RF), Extra Tree (ET), AdaBoost (AB), Histogram Gradient Boosting Classifier (HGBC), Gradient Boosting Classifier (GBC), Extreme Boosting Classifier (XGB), and Light Boosting Classifier (LGB) models. These models were tested using 800 breast histopathological images, and the best classification result was obtained with the LGB model, with an accuracy of 95% [4]. Joshi et al. classified breast cancer using the BreakHis and IDC datasets. The images in the datasets were augmented using various transformations, and dropout and batch normalization were used as regularization techniques. They also compared the performance of CNN models by customizing the EfficientNetB0, ResNet50, and Xception models. The Xception model provided the best results, with an accuracy rate of 93.33% for 40× resolution images in the BreakHis dataset and 88.08% on the IDC dataset [5]. Ali et al. used an imbalanced dataset of breast ultrasound images to classify breast cancer, applying data augmentation and data balancing techniques to address the imbalance. Using InceptionV3, ResNet50, and DenseNet121, they obtained classification predictions for each model and trained an ensemble learning model on these three sets of predictions, which yielded an accuracy rate of 90% [6]. Vedanvita et al. classified breast cancer histopathological images from the IDC dataset using CNNs and ML models, employing CNN, RF, ET, logistic regression (LR), and gradient boosting (GB) classifiers, which obtained accuracy rates of 80.36%, 71.80%, 73.06%, 66.39%, and 77.29%, respectively [7]. In another study, Talo used the BreakHis dataset and the ResNet-50 model to classify breast cancer histopathological images as benign or malignant. The magnification ratios of the images in the dataset were 40×, 100×, 200×, and 400×. Talo applied the proposed model to all four magnification ratios and compared how magnification affected the accuracy rates, obtaining the highest accuracy value (98.83%) for the images with 40× magnification [8].
Karakeci and Talu used the Faster RCNN (region-based CNN) and Mask RCNN algorithms to evaluate the success of breast histopathological image analysis. They resized the images in the dataset to 256 × 256, 512 × 512, and 1024 × 1024 pixels and found that the 256 × 256 size produced the best results [9]. Dandil et al. proposed a hybrid system based on feature fusion, Bag of Words (BoW), and Deep Neural Network (DNN) methods for the binary classification of breast cancer using breast histopathology images; the proposed hybrid method achieved 94.5% and 80.8% accuracy on the training and test datasets, respectively [10]. Khan et al. developed a 2D CNN model with three optimized hidden layers for breast cancer classification, achieving 95% accuracy and a 0.94 F1 score in binary classification between benign and malignant images [11]. Ozdemir proposed a new Fully Convolutional Network (FCN) model for breast cancer classification using a public dataset containing breast ultrasound images. Ozdemir, who also used the U-Net model, compared the performance of the two models over five experiments: on average, the FCN model obtained a 0.772 mean IoU, 0.716 precision, 0.777 recall, and 0.745 F1 score, while the U-Net model obtained a 0.764 mean IoU, 0.676 precision, 0.804 recall, and 0.730 F1 score [12]. Boukaache et al. used two different ultrasound image datasets for breast cancer classification, one with two classes and the other with three classes, and examined the performance improvement of the ResNet18 and VGG16 models when data augmentation was applied. The two-class dataset gave an accuracy rate of 84.5% with the ResNet18 model, rising to 88.2% after data augmentation; the three-class dataset gave 82% accuracy, rising to 82.5% with data augmentation. With the VGG16 model, the two-class dataset obtained 94.33% accuracy before data augmentation and 97.8% afterwards, while the three-class dataset obtained 86.5% accuracy without data augmentation and 90% with it [13]. Karagoz et al. proposed a DL-based model to reduce false-positive and false-negative rates in breast cancer diagnosis. Their model provides high accuracy, efficiency, and consistency by using multiple mammogram images and comprises unsupervised feature extraction with a variational autoencoder (VAE) and a CNN classification step; it demonstrated high performance on both local and public datasets, reducing the false positive call rate from 6.13% to 2.61% and achieving an AUC of 0.98 on the INbreast dataset [14]. Mani et al. optimized the hyperparameters of the pre-trained VGG16 model with the particle swarm optimization (PSO) algorithm to classify breast cancer. After this optimization, the features of the images in the dataset were extracted using the PSO-optimized VGG16 model and then trained with fully connected layers and an LR classifier to perform breast cancer classification.
Their hybrid model yielded 95.7%, 94.1%, 96.9%, and 91.2% accuracy on the BreakHis dataset at 40×, 100×, 200×, and 400× magnifications, respectively [15]. Karakurt and Iseri employed a CNN architecture with three convolutional layers, three ReLU layers, three pooling layers, and one fully connected layer for the classification of breast pathology images, trying various filter sizes and filter numbers on this architecture. To measure its success, the F1 score, accuracy, precision, and recall values were expected to approach 1, and the mean squared error (MSE) and mean square approximation (MSA) values were expected to approach 0. The most successful configuration used a 3 × 3 filter size and 32 filters, giving an F1 score of 0.8238, accuracy of 0.8787, precision of 0.8381, and recall of 0.8762, with MSE and MSA values of 0.1195 and 0.2497, respectively [16]. Muhammad et al. presented a new deep feature extraction model, DRNet, based on transfer learning for detecting breast cancer subtypes using the BreakHis dataset. Using cubic support vector machines (SVM) as the classifier in the final stage, the DRNet model gave average accuracy rates of 98.61%, 98.04%, 97.68%, and 97.71% at 40×, 100×, 200×, and 400× magnifications, respectively [17]. Özgür and Keser proposed the Inception-ResNet-V2 model, which combines the Inception and ResNet models, to detect breast cancer tumors using DL methods. They aimed to facilitate classification by determining the location and regional density of the cancerous region through masking and filling of the mammography images during data preprocessing, and then performed classification with the Inception-ResNet-V2 model, obtaining a 96.21% accuracy rate, 97.48% recall, 98.18% precision, and 97.83% F1 score [18]. Dandil and Serin used breast cancer histopathology images with 40×, 100×, 200×, and 400× magnification ratios to classify breast cancer, proposing deep learning models supported by pre-trained networks. Unlike previous studies in the literature, they compared the pre-trained models with each other after making additions such as data duplication and data reduction. The ResNet50, Xception, InceptionV3, and DenseNet201 models were considered, and the Xception model with a 200× magnification ratio was among those producing the best results, achieving a success rate of 98.01% [19]. In another study, Erdem and Aydın classified breast cancer images using the BreakHis dataset. They developed a new hybrid model called the VIHist network using the pre-trained VGG16 and InceptionV3 models, achieving an accuracy of 99.03% without manually extracting any features [20]. Bayramoglu et al. conducted a study to classify breast cancer as benign or malignant, developing two models with the aim of making the classification independent of the image magnification ratio. The first model predicted only malignancy, while the second simultaneously predicted both malignancy and image magnification; these models achieved 83.25% and 80.10% accuracy rates, respectively [21]. Alghodhaifi et al.
proposed a scheme to classify breast histopathology images using two architectures, IDCDNet and IDCNet, which are similar to the VGGNet architecture. They reported that the IDCNet architecture demonstrated the best performance, with an accuracy rate of 87.13% [22]. Murali classified breast cancer as IDC (invasive ductal carcinoma) and non-IDC using a dataset comprising breast histopathological images. The dataset was normalized to include the same number of IDC and non-IDC images to eliminate class imbalance. The CNN architecture was trained for 100 epochs and reached its maximum performance, an accuracy of 87.47%, after the 88th epoch. When the architecture was validated on the test dataset over 100 epochs, the highest accuracy rate (89.56%) was observed after the 94th epoch [23].
For the automatic detection of breast cancer, Cruz-Roa et al. used a dataset containing whole-slide breast images from 162 female patients. In this study, the researchers compared a manual (handcrafted) feature extraction approach with feature extraction performed by a CNN. The model using manual feature extraction was evaluated with the F1 score and balanced accuracy, which yielded rates of 71.80% and 84.23%, respectively [24]. Zeng and Zhang used an automatic machine learning (AutoML) method to classify breast cancer histopathological images as IDC positive or negative. They augmented the dataset to obtain a total of 397,524 images, of which 363,396 were used for the AutoML model and 34,128 for the holdout method. The AutoML images were allocated as 80% training, 10% testing, and 10% validation. In this experimental study, they obtained an average accuracy of 91.6% and a balanced accuracy rate of 84.6% on the holdout data [25]. Chatterjee and Krishna presented a new approach, a deep residual neural network (DRNN), for IDC breast cancer diagnosis using a dataset of breast histopathological images. Within the scope of the study, 3000 images in the dataset were selected as IDC positive and 4500 as IDC negative. In the preprocessing step, they created a subset and selected different color spaces and color channels from the CIELAB (Commission Internationale de l’Eclairage LAB) color space. By combining the four selected color channels with the original red–green–blue (RGB) color space, they obtained a seven-channel image matrix for model training. They used the Gaussian blur method to remove possible noise from the images, and in the last stage they applied the contrast-limited adaptive histogram equalization (CLAHE) method to make the dataset suitable for model training. Of the 7500 images in the dataset, 6000 were used for model training and testing, with 600 of these reserved for determining model accuracy. As a result, their proposed DRNN method obtained an accuracy rate of 99.29% [26]. Karatayev et al. applied CNNs to the classification of IDC breast cancer. They performed classification with the pre-trained VGG16, ResNet18, DenseNet, and CancerNet models and compared the results with their proposed CNN model, which provided better results than the pre-trained models with an accuracy of 92% [27]. Sathe et al. classified breast cancer with a CNN on a dataset containing breast histopathological images, testing the model on 1728 images and obtaining an accuracy of 87.64% [28]. Tasnim et al. used transfer learning models to classify breast cancer, considering the AlexNet, VGG19, InceptionV3, Xception, and GoogLeNet models. A total of 27,800 images with and without IDC were randomly selected from the 277,524 images in the dataset and used for training and testing. As a result, the AlexNet, VGG19, InceptionV3, Xception, and GoogLeNet models obtained accuracy rates of 96.74%, 94.83%, 92.48%, 90.72%, and 97.80%, respectively [29]. Yilmaz et al.
used deep learning models to compare two architectures in their breast cancer classification study: the DenseNet-201 and Xception models. They trained, tested, and validated the models using 31,827 images selected from the 277,524 images in the dataset. As a result, the DenseNet-201 and Xception models yielded 96.74% and 96.69% accuracy rates, respectively [30]. Kote et al. classified breast cancer using machine learning and deep learning on a dataset containing breast histopathology images. Unlike other studies, they compared classifications on two new datasets created by random selection from the 277,524 images. The first dataset had an unbalanced class distribution, consisting of 65,279 non-IDC and 24,721 IDC-containing images. On this unbalanced dataset, they used the LeNet, AlexNet, VGG19, VGG16, ResNet50, SVM, and Twin SVM models, which achieved accuracy rates of 73%, 79%, 81%, 85%, 88%, 86%, and 73%, respectively. For the second dataset, they preferred a balanced class distribution and therefore randomly selected 5547 images from the original dataset, comprising 2788 images from one class and 2759 from the other. The same LeNet, AlexNet, VGG19, VGG16, ResNet50, SVM, and Twin SVM models achieved accuracy rates of 65%, 72%, 73%, 74%, 78%, 76%, and 62%, respectively, on the balanced dataset [31]. The conclusions of this study were that classification using the unbalanced dataset gave better results than classification using the balanced dataset, and that the ResNet model gave the best results in both cases. Narayanan et al. conducted a breast cancer classification study using a CNN and a dataset of histopathological images. Unlike other studies in the literature, the authors resized the images to 48 × 48 instead of using their original size. In the data preprocessing step, the images were made suitable for training the CNN model using the color constancy technique and the histogram equalization method; the CNN model with color constancy gave an AUC of 0.935, while the model with histogram equalization gave an AUC of 0.876 [32]. Pukale et al. proposed a CNN model for the early detection of breast cancer using a dataset of breast histopathology images. In the preprocessing step, the images were converted to grayscale and any existing noise was removed. Finally, classification was performed with the CNN model after extracting various features, such as energy, entropy, randomness, correlation, and homogeneity, from the images; the suggested CNN model gave an accuracy rate of 90% [33]. Chapala and Sujatha used the ResNet deep learning models to classify breast cancer. By randomly selecting different numbers of images from the dataset, they created subsets and split them into training and test data at different ratios. They first created a dataset containing 157,572 images, 141,814 of which were IDC-positive and 15,758 IDC-negative, and separated it into 90% training and 10% test data.
Classification studies were conducted on this subset using the ResNet50 and ResNet34 models, which obtained 91% and 90% accuracy, respectively. They then divided a dataset consisting of 118,180 IDC-positive images and 39,392 IDC-negative images into 75% training and 25% test data; in this classification study, the ResNet50 and ResNet34 models obtained 88% and 86% accuracy, respectively. Finally, they divided a dataset containing 78,786 IDC-positive and 78,786 IDC-negative images (placing the 157,572 images into 50% training and 50% test sets) and used it to train the ResNet50 and ResNet34 models; as a result of this classification, both models obtained an accuracy rate of 79% [34]. In his study on breast cancer diagnosis, Dang used a dataset containing histopathological breast images and classified the IDC breast cancer dataset using the MobileNetV2 and EfficientNet models, which achieved test accuracy rates of 92.35% and 91.02%, respectively [35].
As this review of the recent literature shows, many studies have investigated the classification of breast cancer using various breast image datasets. Several features distinguish the present study from other studies in the literature. First, this study analyzes the effect of datasets of different sizes on the performance of deep learning models. In addition, the study was conducted using two sub-datasets obtained by randomly selecting images from the original dataset as a strategy for addressing its data imbalance problem. This solution is important in terms of applicability because real-world datasets often exhibit such imbalances. This study also emphasizes transfer learning capabilities. For the first dataset, the models were trained using pre-trained networks without changing the dimensions of the images. For the other dataset, classification was performed by resizing the images to match the standard input dimensions of the pre-trained networks. This helps us understand how deep learning models cope with changing image dimensions in a dataset and under which conditions they perform better. This study also addresses gaps in the literature by focusing on the differences introduced by performing classification with pre-trained networks using the original dimensions (50 × 50 × 3) of the images in the dataset. Furthermore, this customization strategy was tested on a dataset with a larger number of images, which increases the reliability of the study and contributes to a better understanding of dataset sizing processes in future work. It also provides a practical and promising solution to data imbalance problems by offering a new perspective on the dataset sizing process. This study clearly demonstrates the potential advantages of data sizing for deep learning models, which is an important finding that can inform future studies. To propose an effective and superior technique, various models were employed for the classification of the Breast Histopathology Images Dataset: the pre-trained VGG16, MobileNet, and DenseNet201 transfer learning models and a custom-built CNN model were used in the classification phase. The custom-built CNN model yielded an accuracy rate of 93.80% in experimental study 1, and the MobileNet model obtained an accuracy rate of 95.54% in experimental study 2.

3. Materials and Methods

In this study, the Breast Histopathology Images dataset was preprocessed by resizing the images to different sizes. Pre-trained VGG16, MobileNet, and DenseNet201 models, as well as a custom-built Convolutional Neural Network (CNN) model, were employed for the classification task. This study thus investigates how different image sizes and numbers of images affect the performance of deep learning techniques. Finally, the performance of the proposed models was compared. The workflow of the proposed breast cancer detection system is illustrated in Figure S1.

3.1. Breast Histopathology Images Dataset

The dataset used in this study is known in the literature as Breast Histopathology Images [36]. It comprises 162 whole-mount slide images of breast cancer specimens, each scanned at 40× magnification. From these slides, a total of 277,524 histopathology image patches were extracted, categorized into 198,738 patches labeled as negative for invasive ductal carcinoma (IDC) and 78,786 patches labeled as positive for IDC, each sized 50 × 50 pixels. The dataset is organized into folders named with each patient’s ID number, and each patient folder contains two subfolders named 0 and 1: folder 0 holds the images without IDC (negative), and folder 1 holds the images with IDC (positive). Figure 1 shows some sample IDC-positive images and Figure 2 shows some sample IDC-negative images.
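For illustration, this folder layout can be traversed with a few lines of Python. The snippet below is a minimal sketch; the root folder name and the file extension are assumptions and may differ from the actual download.

```python
import glob
import os

# Hypothetical root folder of the Breast Histopathology Images dataset; each
# patient folder is assumed to contain the subfolders "0" (IDC negative) and "1" (IDC positive).
DATASET_ROOT = "IDC_regular_ps50_idx5"

def collect_image_paths(root=DATASET_ROOT):
    """Return two lists of patch file paths: IDC-negative (class 0) and IDC-positive (class 1)."""
    negative = glob.glob(os.path.join(root, "*", "0", "*.png"))
    positive = glob.glob(os.path.join(root, "*", "1", "*.png"))
    return negative, positive

if __name__ == "__main__":
    neg, pos = collect_image_paths()
    print(f"IDC negative patches: {len(neg)}, IDC positive patches: {len(pos)}")
```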

3.2. Data Preprocessing

During the dataset analysis, it was observed that the number of images in the folders representing class 0 was approximately 2.5 times greater than the number of images in the folders representing class 1, as shown in Figure S2. In addition, it was found that some images belonging to class 0 in the original dataset were smaller than 50 × 50 pixels and were deemed irrelevant. To prevent these smaller and irrelevant images from adversely affecting model performance, they were removed from the dataset. Subsequently, to address the issue of dataset imbalance, a balanced subset was created by randomly selecting an equal number of images from both class 0 and class 1. This resulted in a well-organized dataset comprising 78,786 images from class 0 and 78,786 images from class 1, totaling 157,572 images, all of size 50 × 50 × 3 as in Figure S3. Furthermore, a secondary subset was generated, containing 1116 images in total, randomly selected from this balanced dataset, with 558 images from class 0 and 558 images from class 1. The analyses were performed using these two newly curated subsets.
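The balanced subsets described above can be reproduced, in spirit, with a simple random sampling step. The sketch below assumes the collect_image_paths() helper from Section 3.1 and a fixed random seed; the exact selection procedure used in this study may have differed.

```python
import random

def balanced_subset(negative_paths, positive_paths, per_class, seed=42):
    """Randomly sample an equal number of patch paths from each class."""
    rng = random.Random(seed)
    neg = rng.sample(negative_paths, per_class)
    pos = rng.sample(positive_paths, per_class)
    paths = neg + pos
    labels = [0] * per_class + [1] * per_class
    return paths, labels

# Experimental study 1: 78,786 patches per class (157,572 in total).
# paths_1, labels_1 = balanced_subset(neg, pos, per_class=78786)
# Experimental study 2: 558 patches per class (1116 in total).
# paths_2, labels_2 = balanced_subset(neg, pos, per_class=558)
```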

3.3. Convolutional Neural Network (CNN)

Convolutional Neural Network (CNN) models are a special type of neural network used to process high-dimensional data, such as video and image data. The term “convolution” indicates that this network uses a mathematical operation of the same name, and this use of convolution is what differentiates CNNs from multi-layer artificial neural networks. CNNs consist of several sequential layers, each designed to perform a different operation during model training; the main layer types are described below, and a minimal code sketch of such a layer stack follows the descriptions.
Input layer: The input layer receives input data, (e.g., images, video, text, and audio) of certain sizes to the network.
Convolution layer: The convolution layer follows the input layer and helps extract various feature maps by applying filters of the specified size to the data.
Pooling layer: The pooling layer is generally used to reduce the size of the feature map by selecting the highest or average pixel values within a certain filter size applied to the feature map. It follows the convolution layer, and depending on the preferred pooling operation, it helps reduce the computational load and improves the generalization of the network.
Flatten layer: The flatten layer is the most important and last layer before the fully connected layer. This layer transforms the matrices containing information about the feature maps from the previous layers into a one-dimensional tensor and then transfers it to the fully connected layer.
Fully connected layer: The fully connected layer is the last layer of the Convolutional Neural Network. It takes the input tensor from the flatten layer and provides full connectivity between all features, and its output is passed to the output layer for classification or regression.
Batch normalization layer: This layer is typically used after convolutional or dense layers and normalizes the activations in the outputs of these layers. By this mechanism, the model trains faster and becomes more stable.
Dropout layer: This layer, which is typically used after dense layers, temporarily disables some random neurons, thereby reducing overfitting and increasing the generalizability of the model.
Dense layer: In this layer, each neuron (node) in the model is fully connected to all neurons in the previous layer. This layer helps the model draw meaningful conclusions from the received data and is often used in the decision-making phase of the model.
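To make the roles of these layers concrete, the following minimal Keras sketch stacks the layer types listed above; the filter counts, dense sizes, and dropout rate are illustrative and are not the configuration used in this study.

```python
from tensorflow.keras import layers, models

# Minimal illustration of the layer types described above (not the study's exact architecture).
model = models.Sequential([
    layers.Input(shape=(50, 50, 3)),                 # input layer
    layers.Conv2D(32, (3, 3), activation="relu"),    # convolution layer: extracts feature maps
    layers.MaxPooling2D((2, 2)),                     # pooling layer: halves spatial dimensions
    layers.BatchNormalization(),                     # batch normalization: stabilizes training
    layers.Flatten(),                                # flatten layer: 2D feature maps -> 1D tensor
    layers.Dense(64, activation="relu"),             # dense (fully connected) layer
    layers.Dropout(0.5),                             # dropout layer: reduces overfitting
    layers.Dense(1, activation="sigmoid"),           # output layer for binary classification
])
model.summary()
```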

3.4. VGG16

The VGG16 model is part of the VGGNet series, which was developed by the Visual Geometry Group (VGG) at Oxford University in 2014 [37]. The model owes its name to its 16 weight layers: 13 convolution layers and 3 fully connected layers, interleaved with 5 max pooling layers. The final fully connected layer has 1000 channels and is trained to produce results for the 1000 classes of the ImageNet dataset. By default, the image input size is 224 × 224 × 3.

3.5. DenseNet201

The DenseNet201 model was introduced in an article published by a group of researchers in 2016 [38]. This model is the 201-layer version of DenseNet, a densely connected neural network architecture. Unlike other pre-trained models, DenseNet provides deeper and more efficient learning by combining feature maps derived from previous layers, which allows it to perform better with fewer parameters. Unlike common neural network architectures, DenseNet uses dense connections in which each layer is linked to all previous layers; this improves learning efficiency and increases the depth of the network while reducing overfitting. In addition, DenseNet employs a global average pooling layer before classification. The last fully connected layer is trained for the 1000 classes of the ImageNet dataset, with an image input size of 224 × 224 × 3 by default.

3.6. MobileNet

MobileNet, developed by Google researchers in 2017, was proposed to provide lower computational costs and better performance on mobile devices with limited hardware. The MobileNet model reduces the number of learnable parameters by replacing standard convolutions with depthwise separable convolutions, which factor a standard convolution into a depthwise convolution and a 1 × 1 pointwise convolution, as shown in Figure 3. In the depthwise convolution, every input channel is filtered independently; the 1 × 1 pointwise convolution that follows combines the depthwise outputs in a linear way. This factorization leads to a significant reduction in both the computational cost and the size of the model. Each convolution layer is followed by batch normalization and the non-linear ReLU activation function. Strided convolutions are used for downsampling in both the first convolution and the depthwise convolution layers, after which an average pooling layer, a fully connected layer, and a softmax classifier are added. The basic MobileNet contains a total of 28 layers, including the depthwise and pointwise convolution layers [39]. The last fully connected layer is trained for the 1000 classes of the ImageNet dataset, with default image input dimensions of 224 × 224 × 3. The detailed MobileNet architecture is given in Table S1.
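The parameter saving of this factorization can be checked with a short Keras sketch; the input shape and channel counts below are arbitrary and chosen only for illustration.

```python
from tensorflow.keras import layers, models

def count_params(layer, input_shape=(56, 56, 64)):
    """Build a one-layer model and report its trainable parameter count."""
    m = models.Sequential([layers.Input(shape=input_shape), layer])
    return m.count_params()

# Standard 3x3 convolution with 128 output channels.
standard = layers.Conv2D(128, (3, 3), padding="same")
# Depthwise separable equivalent: 3x3 depthwise convolution + 1x1 pointwise convolution.
separable = layers.SeparableConv2D(128, (3, 3), padding="same")

print("standard conv params :", count_params(standard))   # 3*3*64*128 + 128        = 73,856
print("separable conv params:", count_params(separable))  # 3*3*64 + 64*128 + 128   =  8,896
```

For a 64-channel input and 128 output filters, the separable version uses roughly eight times fewer parameters than the standard 3 × 3 convolution, which illustrates why MobileNet is attractive for resource-constrained settings.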

3.7. Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based stochastic optimization technique inspired by the social behavior of bird flocking. In PSO, each candidate solution is represented as a particle with a velocity navigating through the problem space, akin to a flock of birds in search of food. Each particle adjusts its position by integrating elements of its own historical best position and current position, along with those of neighboring particles, to determine its next movement within the search space. It optimizes a problem by iteratively improving a candidate solution relative to a certain quality measure. PSO has gained significant popularity due to its performance across applications, particularly because of its relatively few parameters to tune and its ability to exhibit emergent behaviors. However, a key limitation is its slow convergence rate in high-dimensional search spaces, often failing to reach the global optimum. This shortcoming is not only due to local optima but also the potential constraint of particle velocities, which can limit exploration to sub-regions of the overall search space [40,41].
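The update rule described above can be written compactly in NumPy. The sketch below is a generic PSO minimizer with conventional inertia and acceleration coefficients; the hyperparameter values are illustrative and are not the settings used in this study.

```python
import numpy as np

def pso_minimize(loss_fn, dim, n_particles=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0), seed=0):
    """Minimal particle swarm optimization sketch: minimizes loss_fn over a dim-dimensional space."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(bounds[0], bounds[1], size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # personal best positions
    pbest_val = np.array([loss_fn(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()             # global best position

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity update: inertia + cognitive (personal best) + social (global best) terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([loss_fn(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Example: minimize the sphere function in 5 dimensions.
best, best_val = pso_minimize(lambda x: np.sum(x ** 2), dim=5)
```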

4. Experimental Results

Experimental studies within the scope of this research were conducted using two newly curated datasets derived from the original dataset obtained from the Kaggle platform. Two experimental studies were conducted: one with a dataset consisting of 157,572 images of 50 × 50 × 3 dimensions and the other with 1116 images resized to 224 × 224 × 3 dimensions. A custom-built CNN model and the VGG16, MobileNet, and DenseNet201 deep learning models were used to classify the datasets.
For a fair classification process, the same CNN architecture was employed in both experimental studies, as seen in Figure 4. Two consecutive convolutional (Conv2D) layers are employed, followed by a max pooling (MaxPool2D) layer, which reduces the dimensionality of the feature map by half, thereby enhancing the density of critical features. To ensure stability during the training process, batch normalization is applied post-convolution, normalizing the distribution of data from the preceding layers. To reduce the risk of overfitting and improve the model’s generalizability, a dropout layer was added, randomly deactivating neurons during training. Subsequently, a second block of convolutional and pooling layers is implemented, and the resulting deep features are flattened by a flatten layer. The model architecture continues with five fully connected (dense) layers, each followed by a batch normalization and dropout layer to maintain robust learning and reduce the risk of overfitting. In particular, the weights of the final dense layer are optimized using the PSO algorithm. This process begins by extracting and flattening the current weights of the model’s final dense layer, after which the PSO algorithm optimizes these weights; a hedged sketch of this step is given below.
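The following sketch illustrates how the final dense layer's weights could be rewritten by PSO, assuming a compiled Keras model, a validation set (x_val, y_val), and the pso_minimize() helper sketched in Section 3.7; the actual optimization objective and settings used in this study may differ.

```python
import numpy as np

def optimize_final_dense_layer(model, x_val, y_val, **pso_kwargs):
    """Hedged sketch: tune the weights of the model's final dense layer with PSO,
    using validation loss as the objective (an assumption, not necessarily the study's choice)."""
    final_dense = model.layers[-1]                    # final dense (output) layer
    kernel, bias = final_dense.get_weights()          # current weights and biases
    flat = np.concatenate([kernel.ravel(), bias.ravel()])

    def val_loss(candidate):
        # Write a candidate weight vector back into the layer and measure validation loss.
        k = candidate[:kernel.size].reshape(kernel.shape)
        b = candidate[kernel.size:].reshape(bias.shape)
        final_dense.set_weights([k, b])
        out = model.evaluate(x_val, y_val, verbose=0)
        return out[0] if isinstance(out, (list, tuple)) else out

    best, _ = pso_minimize(val_loss, dim=flat.size, **pso_kwargs)
    final_dense.set_weights([best[:kernel.size].reshape(kernel.shape),
                             best[kernel.size:].reshape(bias.shape)])
    return model
```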
Regularization techniques such as dropout and batch normalization were incorporated in the architecture of the custom-built CNN model and in the customized layers of the transfer learning models. This method aimed to increase the generalization ability of the models. Dropout helped reduce overfitting by randomly deactivating neurons during training, while batch normalization ensured more stable learning by normalizing intermediate outputs. In this way, excessive dependencies were prevented during the learning process of the model. In addition, the training process was terminated by applying the early stopping technique when the validation loss did not improve for 10 epochs. These strategies were effectively used to optimize the training process of the model and minimize overfitting. By using this strategy, the training process was focused on periods that contributed to meaningful learning while avoiding overfitting to the training dataset. Specifically, early stopping restored the model’s weights from the epoch where the validation loss was minimal. This allowed us to use the most generalizable version of the model rather than one trained for unnecessary additional epochs. This approach was particularly beneficial given computational constraints, as it optimized resource usage while maintaining robust model performance.
All experimental studies conducted as part of this research were performed on a computer equipped with an 11th generation Intel® Core™ i7-11800H processor, 16 GB of RAM, and an NVIDIA GeForce RTX 3060 6 GB graphics card. The Anaconda platform (version 23.1.0) was used in the experiments, and the Jupyter Notebook environment in Anaconda was employed for conducting the analyses. The programming language used was Python 3.9.16. The primary libraries used in the experiments were scikit-learn 1.2.0, CUDA Toolkit 11.3.1, cuDNN 8.2.1, Keras 2.6.0, and TensorFlow (including TensorFlow-GPU) 2.6.0.
In this section, the results of the deep learning models implemented within the scope of this study are presented. The results of the deep learning models trained on 157,572 samples with dimensions of 50 × 50 × 3 are reported under the heading “Experimental Study 1”, while the results of the deep learning models using 1116 images with dimensions of 224 × 224 × 3 are reported under the heading “Experimental Study 2”.

4.1. Experimental Study 1

In this section, the results obtained from classifying the dataset of 157,572 images, each with dimensions of 50 × 50 × 3, using the deep learning methods are presented. The methods comprise the custom-built CNN model, with particle swarm optimization applied in the final classification layer, and the customized pre-trained VGG16, MobileNet, and DenseNet201 models; the results obtained by these models are explained in detail below. A total of 80% of the dataset was used as training data, 10% as test data, and 10% as validation data.
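The 80/10/10 split can be reproduced with two calls to scikit-learn's train_test_split, as in the sketch below; whether the split was stratified by class in the original experiments is not stated, so the stratification here is an assumption.

```python
from sklearn.model_selection import train_test_split

def split_80_10_10(images, labels, seed=42):
    """Split data into 80% training, 10% validation, and 10% test subsets (stratified)."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, labels, test_size=0.2, stratify=labels, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```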

4.1.1. Results Obtained Using the Custom-Built CNN Model

The proposed model architecture processes 50 × 50 pixel input images with three color channels (RGB). The model was compiled using the binary cross-entropy (BCE) loss function and trained using the Adam optimization algorithm. The BCE loss aims to minimize the error by measuring the difference between the probabilities predicted by the model and the actual labels (IDC (1): cancer; IDC (0): non-cancer). BCE, which is compatible with the sigmoid activation function, is a standard choice for binary classification problems and plays an effective role in increasing the prediction accuracy of the model. Training was conducted for 50 epochs with a learning rate of 0.001 and a batch size of 200. To prevent overfitting, an early stopping mechanism was employed, halting the training process if no improvement in validation error was observed over 10 consecutive epochs. The model achieved an accuracy of 93.80%, a precision of 84.20%, a recall of 74.76%, and an F1 score of 79.20%. On the validation dataset, the model yielded an accuracy of 93.69%. Figure 5 shows the accuracy and loss graphs of the model.
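In Keras terms, this training configuration corresponds roughly to the compile/fit calls below; model and the data splits are assumed to be defined, and restore_best_weights=True reflects the behavior described in Section 4, where the weights from the epoch with minimal validation loss were restored.

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

# Sketch of the reported training setup for experimental study 1 (assumed variable names).
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])

early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=50, batch_size=200,
                    callbacks=[early_stop])
```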

4.1.2. Results Obtained Using the VGG16 Model

The VGG16 model architecture, which by default accepts input images of size 224 × 224 × 3, was customized and retrained to meet the specific requirements of this study. The image input size in the VGG16 architecture was modified to 50 × 50 × 3. During model training, the earlier layers were initialized with pre-trained weights from the ImageNet dataset and kept frozen so that these layers retained their learned parameters; only the final layers were trained. Additional layers were then appended to the model by accessing its output layer. In the output layer, the sigmoid activation function was employed to calculate the class probabilities, with the BCE loss function used in the training process. The model was trained with the following parameters: Adam optimization algorithm, a learning rate of 0.001, 50 epochs, and a batch size of 200. These settings enabled the model to classify correctly by effectively optimizing the loss function. The model achieved 92.99% accuracy, 81.78% precision, 68.97% recall, and an F1 score of 74.83%. In addition, the model achieved an accuracy of 93.03% on the validation dataset. Figure 6 presents the accuracy and loss graphs of the VGG16 model developed within the framework of experimental study 1.
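A hedged sketch of this customization is shown below, assuming a single sigmoid output unit for the binary decision and an illustrative dense head; the exact head layers used in this study are not detailed here and may differ.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Customized VGG16: ImageNet weights, 50x50x3 input, frozen convolutional base,
# and a new sigmoid classification head (head layer sizes are illustrative assumptions).
base = VGG16(weights="imagenet", include_top=False, input_shape=(50, 50, 3))
base.trainable = False                                # keep the pre-trained weights fixed

x = layers.Flatten()(base.output)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
output = layers.Dense(1, activation="sigmoid")(x)     # binary IDC / non-IDC output

vgg16_custom = models.Model(inputs=base.input, outputs=output)
```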

4.1.3. Results Obtained Using the MobileNet Model

The MobileNet model architecture accepts an image input size of 224 × 224 × 3 by default. Because the image dimensions in the dataset used in this study are 50 × 50 × 3, the MobileNet architecture was customized and retrained to meet the specific requirements of this study, with the image input size adjusted to 50 × 50 × 3. The classification layer of the model was replaced by a two-output classification layer with sigmoid activation. In addition, the weights of the model were initialized randomly, and the model was then retrained on the dataset. The model parameters were set as follows: BCE loss function, Adam optimizer, a learning rate of 0.001, 50 epochs, and a batch size of 200. The model yielded an accuracy of 92.75%, precision of 84.61%, recall of 65.04%, and F1 score of 73.55%. Figure 7 illustrates the accuracy and loss graphs of the MobileNet model developed for experimental study 1.

4.1.4. Results Obtained Using the DenseNet201 Model

The default image input size of the DenseNet201 model architecture is 224 × 224 × 3. Because the image dimensions in the dataset used in this study were 50 × 50 × 3, the DenseNet201 architecture was customized and retrained, with the image input size adjusted to 50 × 50 × 3. The classification layer of the model was replaced by a two-output classification layer with sigmoid activation. In addition, the weights of the model were initialized randomly, and the model was retrained on the dataset. The model parameters were configured as follows: Adam optimizer, BCE loss function, a learning rate of 0.001, 50 epochs, and a batch size of 200. The model trained for 50 epochs achieved an accuracy of 90.17%, precision of 67.20%, recall of 70.69%, and F1 score of 68.90%. Figure 8 shows the accuracy and loss graphs of the DenseNet201 model developed for experimental study 1.

4.2. Experimental Study 2

In this section, the results obtained after resizing the 1116 images from 50 × 50 × 3 to 224 × 224 × 3 and classifying them with the proposed methods are presented. The deep learning methods were the custom-built CNN model and the customized pre-trained VGG16, MobileNet, and DenseNet201 models. In the training stage, 80% of the dataset was used as training data, 10% for testing, and 10% for validation.
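The resizing step itself can be performed with TensorFlow's image utilities, as in the short sketch below; bilinear interpolation is assumed here, since the interpolation method is not specified in the text.

```python
import numpy as np
import tensorflow as tf

def resize_patches(images_50x50, target=(224, 224)):
    """Upscale 50x50x3 patches to 224x224x3 (bilinear interpolation is an assumption)."""
    return tf.image.resize(images_50x50, target, method="bilinear").numpy()

# Example with dummy data shaped like the 1116-image subset.
dummy = np.random.rand(4, 50, 50, 3).astype("float32")
resized = resize_patches(dummy)
print(resized.shape)  # (4, 224, 224, 3)
```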

4.2.1. Results Obtained Using the Custom-Built CNN Model

In this second setting, the custom-built CNN model was again employed to classify the breast cancer images. The model architecture accepts 224 × 224 pixel RGB images as input. The developed model was compiled using the BCE loss function with the Adam optimizer and trained with a batch size of 32 and a learning rate of 0.001 over 50 epochs. To avoid overfitting, an early stopping strategy was used to automatically stop training if no improvement in validation error was seen for 10 successive epochs. The model trained over 50 epochs achieved an accuracy of 92.86%, precision of 90%, recall of 56.25%, and F1 score of 69.23%. Furthermore, the model reached an accuracy rate of 87.50% on the validation dataset. Figure 9 highlights the accuracy and loss graphs of the model.

4.2.2. Results Obtained Using the VGG16 Model

The VGG16 model architecture accepts the image input size as 224 × 224 × 3 by default. Therefore, the dimensions and channels of all images in the dataset were resized from 50 × 50 × 3 to 224 × 224 × 3. In model training, the previous layers were initialized with weights from the ImageNet dataset to maintain the pre-trained weights, and only the last layers were trained. Subsequently, new layers were added by accessing the output layer of the model. The sigmoid activation function is used in the last output layer. The model parameters were set as follows: BCE loss function, Adam optimizer, a learning rate of 0.001, 50 epochs, and a batch size of eight. The model trained for 50 epochs achieved an accuracy of 86.60%, a precision of 75%, a recall of 42.86%, and an F1 score of 54.54%. The model also yielded an accuracy rate of 90.18% on the validation dataset. Figure 10 depicts the accuracy and loss function graph of the VGG16 model for experimental study 2.

4.2.3. Results Obtained Using the MobileNet Model

The MobileNet model is typically designed to operate on input images with a resolution of 224 × 224 × 3. For this reason, the dimensions of all images in the dataset were resized from 50 × 50 × 3 to 224 × 224 × 3. The classification layer of the model was replaced by a two-output sigmoid activation function. In addition, the weights of the model were initialized randomly, and the model was then retrained on the dataset. The model parameters were set as follows: Adam optimizer, BCE loss function, a learning rate of 0.001, 50 epochs, and a batch size of 32. The model trained for 50 epochs yielded an accuracy rate of 95.54%, precision of 90.90%, recall of 71.43%, and an F1 score of 80%. The accuracy graph and loss function graph of the MobileNet model developed for experimental study 2 are depicted in Figure 11.

4.2.4. Results Obtained Using the DenseNet201 Model

The DenseNet201 model is typically developed to run with an input image resolution of 224 × 224 × 3. Therefore, all images in the dataset were resized from 50 × 50 × 3 to 224 × 224 × 3. The classification layer of the model was replaced by a two-output sigmoid activation function. In addition, the weights of the model were initialized randomly, and the model was then retrained on the dataset. The model parameters were set as follows: BCE loss function, Adam optimizer, a learning rate of 0.001, 50 epochs, and a batch size of eight. The model trained for 50 epochs achieved an accuracy of 92.86%, precision of 83.33%, recall of 62.5%, and F1 score of 71.43%. Figure 12 shows the accuracy and loss graphs of the DenseNet201 model developed for experimental study 2, respectively.

5. Discussion

The proposed scheme offers several advantages and makes distinct contributions to the field of breast cancer diagnosis. By employing a dual-dataset approach, this study evaluates the impact of dataset size and image resolution on model performance, providing valuable insights into the influence of data characteristics. The integration of particle swarm optimization (PSO) into the dense layer of the custom-built CNN model represents a novel methodological element that enhances classification accuracy and computational efficiency. Model performance was evaluated using two datasets: the first contains 157,572 images with 50 × 50 × 3 dimensions, while the second contains 1116 images with 224 × 224 × 3 dimensions. Training was performed on both datasets, and the results were compared. The results of experimental study 1, which used the dataset of 157,572 images with dimensions of 50 × 50 × 3, demonstrate that all of the employed deep learning models achieved high accuracy. When the performance metrics were analyzed, the custom-built CNN model had the highest accuracy, precision, recall, and F1 score values among all models, yielding 93.80% accuracy in experimental study 1. The analysis of the results from experimental study 2, which used the dataset of 1116 images with dimensions of 224 × 224 × 3, reveals that all models again achieved high accuracy rates, similar to those obtained in experimental study 1; among the evaluated models, the MobileNet model obtained the highest accuracy rate of 95.54%. However, the use of a smaller dataset in this study influenced the precision, recall, and F1 score values of the models. Experimental study 1 exhibited consistent accuracy, precision, recall, and F1 score values across all models, whereas experimental study 2, which employed the smaller dataset, showed a decrease in precision, recall, and F1 scores for all models except MobileNet. Despite achieving high accuracy and precision with limited data, these models exhibited low recall values. Consequently, dataset size plays an important role in the evaluation of model performance. Models trained with 50 × 50 × 3 images generally exhibited more balanced and elevated precision, recall, and F1 scores compared to the results of experimental study 2, while models trained with the larger 224 × 224 × 3 images generally achieved higher precision, and in the case of MobileNet the highest overall accuracy; their other metrics, such as recall and F1 score, were lower, likely due to the smaller dataset size.
In addition, the analysis of the false-negative and false-positive classifications of the CNN model using heatmap histogram graphs revealed that activation values were low in some images, while others showed a wider range. This indicates that the model failed to exhibit sufficient activation in certain cancerous regions and focused on irrelevant areas in other samples. Similarly, the analysis of the heatmap histograms for images misclassified as positive revealed a wide range of activation values, indicating that the model generated high activation values in some regions, which led to their incorrect classification as cancerous. These issues may arise because the images in the dataset are extracted as patches from 162 whole-mount slide images; the limited size of the patches significantly affects the model’s ability to grasp a wider context and distinguish healthy tissue accurately. Figure 13 shows the images classified as false negatives (FN) and their corresponding Grad-CAM heatmap histogram plots, and Figure 14 shows the images classified as false positives (FP) and the corresponding Grad-CAM heatmap histograms. The original images shown in both figures are in color (RGB) format, representing histopathology images. The histograms in these figures represent the distribution of activation values (between zero and one) in the Grad-CAM heatmaps, not the pixel intensity distribution of the original RGB images. The Grad-CAM heatmaps visualize the image regions that the model finds important during classification, making it possible to understand the model’s decision mechanism and analyze the factors that cause classification errors (a minimal sketch of how such a heatmap can be computed is given at the end of this section). These visualizations play a critical role in evaluating the model’s behavior and identifying areas for improvement. In addition, the results of the model comparison are presented in Table 1 and Table 2, together with the computational complexities and training times.
As shown in Table 1, the custom-built CNN was the most efficient model in terms of both performance and computational cost, providing the highest accuracy rate of 93.80%, the lowest number of floating-point operations (FLOPs) at 0.0889 G, and the shortest training time. The VGG16 and MobileNet models gave similar accuracy rates; the VGG16 model required more computation due to its higher FLOPs, while the MobileNet model offered similar performance with lower FLOPs. The short training time and high accuracy of the custom-built CNN model make it the most effective model: in terms of performance and cost, the CNN model has the best performance-to-cost ratio, with high accuracy and low computational cost. The MobileNet model is an efficient alternative with a fast training time and low FLOPs; however, its accuracy is lower than that of the CNN model. In conclusion, the custom-built CNN model demonstrated superiority over the transfer learning models in terms of performance and cost-effectiveness.
Table 2 presents the results of the deep learning models obtained in experimental study 2. As shown in Table 2, the MobileNet model demonstrated the highest performance, with an accuracy rate of 95.54%, a moderate computational cost of 1.15 G FLOPs, and a training time of roughly 4 min, making it the most efficient model in this study. The custom-built CNN model followed, achieving an accuracy of 92.86% with 2.34 G FLOPs and a training time of slightly under 2 min; its combination of high accuracy, low computational cost, and short training time makes it an attractive option for scenarios with limited computational resources. The DenseNet201 model provides an intermediate balance, with 92.86% accuracy, 8.63 G FLOPs, and approximately 10 min of training, whereas the VGG16 model, with 30.8 G FLOPs and approximately 3 min of training, carries the highest computational cost and delivers the lowest accuracy (86.60%). In addition, Figure 15 presents a comparison of the results of experimental study 1 and experimental study 2.
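The paper does not describe the exact procedure used to obtain the FLOPs and training-time figures in Tables 1 and 2, but one common way to measure them in TensorFlow is sketched below, assuming a built Keras `model` and `tf.data` datasets `train_ds` and `val_ds`; the function and variable names are illustrative. FLOPs are counted for a single forward pass with batch size 1, consistent with the "Batch_Size = 1" note in the tables.

```python
# Hedged sketch: counting forward-pass FLOPs and measuring wall-clock training time.
import time
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

def count_flops(model):
    # Trace the model into a frozen concrete graph and let the TF1-compat profiler
    # count floating-point operations for a single forward pass (batch size 1).
    spec = tf.TensorSpec([1] + list(model.input_shape[1:]), tf.float32)
    concrete = tf.function(lambda x: model(x)).get_concrete_function(spec)
    frozen = convert_variables_to_constants_v2(concrete)
    opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()
    info = tf.compat.v1.profiler.profile(graph=frozen.graph, cmd="op", options=opts)
    return info.total_float_ops              # divide by 1e9 to express as "G" values

def timed_fit(model, train_ds, val_ds, epochs=50):
    # Wall-clock training time, comparable to the "Training Time" column.
    start = time.perf_counter()
    history = model.fit(train_ds, validation_data=val_ds, epochs=epochs, verbose=0)
    return history, time.perf_counter() - start
```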

6. Comparison with State-of-the-Art Studies

The breast cancer classification results are compared with state-of-the-art studies in Table 3. The comparison shows that the deep learning models employed in this study achieve accuracies that compare favorably with those reported in the existing literature.

7. Conclusions

As in many medical decision support systems, deep learning methods have begun to play a crucial role in the diagnosis and early detection of breast cancer, one of the most frequently diagnosed and deadliest cancers in women; its early diagnosis is essential for effective treatment. In this study, the VGG16, MobileNet, and DenseNet201 models and a custom-built CNN model were employed to classify breast cancer. To enhance the capability of the custom-built CNN model, the weights of its final dense layer were optimized using the PSO algorithm. The Breast Histopathology Images Dataset was used to evaluate model performance. From it, two distinct datasets were formed: one with 157,572 images of size 50 × 50 × 3 (experimental study 1) and another with 1116 images of size 224 × 224 × 3 (experimental study 2). This analysis aimed to determine how variations in image dimensions, dataset size, and model architecture affect classification results. In the customized transfer learning models, the original dimensions of the images in each dataset were used as the input size instead of the models' default input sizes. The results of experimental study 1, which used the larger dataset of 50 × 50 × 3 images, revealed that all deep learning models achieved high accuracy. The most efficient model in terms of performance and computational cost was the custom-built CNN model, which had the fastest training time, a low FLOPs count (0.0889 G), and the highest accuracy rate (93.80%); it was particularly notable for its balanced accuracy, precision, recall, and F1 score. In experimental study 2, where a smaller balanced dataset of 1116 images at 224 × 224 × 3 was used, all models also achieved high accuracy. Among them, MobileNet performed best, achieving the highest accuracy of 95.54%, while the custom-built CNN model achieved 92.86% with a favorable balance between performance, computational cost, and training time. These results also suggest that the proposed methods are not prone to overfitting. In addition, the study revealed that while the models trained on the larger (224 × 224 × 3) images achieved higher precision, they suffered from lower recall and F1 scores than the models trained on the much larger set of 50 × 50 × 3 patches, highlighting the significant impact of dataset size on model performance.
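As a rough illustration of how the two experimental datasets could be assembled, the sketch below uses `tf.keras.utils.image_dataset_from_directory`; the directory names, the class-folder layout, and the assumption that resizing is done on the fly are all illustrative and not necessarily the authors' preprocessing pipeline.

```python
# Hedged sketch: building the two experimental datasets from class-labeled image folders.
# Assumes `patches_50x50/` and `subset_224x224/` each contain one subfolder per class
# (e.g. "0" for IDC-negative and "1" for IDC-positive); these paths are illustrative.
import tensorflow as tf

def make_dataset(data_dir, image_size, batch_size):
    return tf.keras.utils.image_dataset_from_directory(
        data_dir,
        labels="inferred",
        label_mode="binary",
        image_size=image_size,      # resizes the patches on the fly
        batch_size=batch_size,
        shuffle=True,
        seed=42,
    )

# Experimental study 1: original 50 x 50 patches, batch size 200 as in Table 1.
ds_small_patches = make_dataset("patches_50x50", image_size=(50, 50), batch_size=200)
# Experimental study 2: 1116 images rescaled to 224 x 224, batch size 32 as in Table 2.
ds_large_images = make_dataset("subset_224x224", image_size=(224, 224), batch_size=32)
```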
In future studies, mammography images of the same patients can be included together with breast cancer histopathology images collected from new patients, and datasets created from both image types can be used jointly to diagnose cancer and its variants with deep learning methods, whose performance depends on the diversity of the data. In addition, the high accuracy of the models in this study indicates their potential for clinical use; nevertheless, more research is necessary to assess how well they integrate into real-world diagnostic workflows. Future studies may therefore focus on evaluating the real-time performance of the models and their adaptability to various clinical settings, so that they can assist healthcare professionals in practice and help prevent delays in diagnosing the disease or determining the cancer type.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/app15031005/s1, Figure S1: Workflow diagram of the proposed study; Figure S2: Distribution of images belonging to each class in the dataset; Figure S3: (a) Class distribution of images in the dataset used in experimental study 1 and (b) class distribution of images in the dataset used in experimental study 2; Table S1: MobileNet detailed architecture.

Author Contributions

Conceptualization, K.K.; methodology, M.K. and K.K.; software, M.K.; validation, M.K. and K.K.; formal analysis, K.K.; investigation, M.K.; resources, M.K.; data curation, M.K.; writing—original draft preparation, M.K.; writing—review and editing, M.K. and K.K.; visualization, M.K. and K.K.; supervision, K.K.; project administration, M.K. and K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

This article originated from Merve Korkmaz’s Master’s thesis titled “Comparative Classification of Breast Cancer with Machine Learning and Deep Learning Techniques”, supervised by Kaplan Kaplan at Kocaeli University, Institute of Natural Sciences, Department of Computer Engineering.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Salunkhe, P.B.; Patil, P.S. Rapid tri-net: Breast cancer classification from histology images using rapid tri-attention network. Multimed. Tools Appl. 2024, 83, 74625–74655. [Google Scholar] [CrossRef]
  2. Jenefa, A.; Lincy, A.; Naveen, V.E. A framework for breast cancer diagnostics based on MobileNetV2 and LSTM-based deep learning. In Computational Intelligence and Modelling Techniques for Disease Detection in Mammogram Images; Academic Press: Cambridge, MA, USA, 2024; pp. 91–110. [Google Scholar]
  3. Kumari, V.; Ghosh, R. A Magnification-Independent Method for Breast Cancer Classification Using Transfer Learning. Healthc. Anal. 2023, 3, 100207. [Google Scholar] [CrossRef]
  4. Sharmin, S.; Ahammad, T.; Talukder, M.A.; Ghose, P. A Hybrid Dependable Deep Feature Extraction and Ensemble-Based Machine Learning Approach for Breast Cancer Detection. IEEE Access 2023, 11, 87694–87708. [Google Scholar] [CrossRef]
  5. Joshi, S.A.; Bongale, A.M.; Olsson, P.O.; Urolagin, S.; Dharrao, D.; Bongale, A. Enhanced Pre-Trained Xception Model Transfer Learned for Breast Cancer Detection. Computation 2023, 11, 17. [Google Scholar] [CrossRef]
  6. Ali, M.D.; Ahmed, M.; Hasan, S.; Siddiquee, M.; Rahman, M.; Rahman, T. Breast Cancer Classification through Meta-Learning Ensemble Technique Using Convolution Neural Networks. Diagnostics 2023, 13, 2242. [Google Scholar] [CrossRef]
  7. Vedanvita, G.; Racha Ganesh, S. Breast Cancer Classification Using Deep Convolutional Neural Networks. CVR J. Sci. Technol. 2022, 23, 52–57. [Google Scholar] [CrossRef]
  8. Dandıl, E.; Selvi, A.O.; Çevik, K.K.; Yıldırım, M.S.; Uzun, S. A Hybrid Method Based on Feature Fusion for Breast Cancer Classification Using Histopathological Images. Eur. J. Sci. Technol. 2021, 29, 129–137. [Google Scholar] [CrossRef]
  9. Karakeçi, Z.B.; Talu, M.F. Cancer Detection and Location Methods in Histopathological Images. Eur. J. Sci. Technol. 2021, 23, 608–616. [Google Scholar] [CrossRef]
  10. Talo, M. Classification of Histopathological Breast Cancer Images Using Convolutional Neural Networks. Fırat Univ. J. Eng. Sci. 2019, 31, 391–398. [Google Scholar]
  11. Khan, S.R.; Raza, A.; Meeran, M.T.; Bilhaj, U. Enhancing Breast Cancer Detection Through Thermal Imaging and Customized 2D CNN Classifiers. VFAST Trans. Softw. Eng. 2023, 11, 80–92. [Google Scholar] [CrossRef]
  12. Özdemir, C. A New FCN Model for Cancer Cell Segmentation in Breast Ultrasound Images. Afyon Kocatepe Univ. J. Sci. Eng. 2023, 23, 1160–1170. [Google Scholar] [CrossRef]
  13. Boukaache, A.; Nasser Edinne, B.; Boudjehem, D. Breast Cancer Image Classification Using Convolutional Neural Networks (CNN) Models. Int. J. Inform. Appl. Math. 2024, 6, 20–34. [Google Scholar] [CrossRef]
  14. Karagöz, M.A.; Demirci, A.; Yıldız, E.; Güler, B. Deep Learning-Based Breast Cancer Diagnosis with Multiview of Mammography Screening to Reduce False Positive Recall Rate. Turk. J. Electr. Eng. Comput. Sci. 2024, 32, 382–402. [Google Scholar] [CrossRef]
  15. Chandana Mani, R.K.; Kamalakannan, J.; Pandu Rangaiah, Y.; Anand, S. A Bio-Inspired Method for Breast Histopathology Image Classification Using Transfer Learning. J. Artif. Intell. Technol. 2023, 3, 89–101. [Google Scholar] [CrossRef]
  16. Karakurt, M.; İşeri, İ. Classification of Pathology Images with Deep Learning Methods. Eur. J. Sci. Technol. 2022, 33, 192–206. [Google Scholar] [CrossRef]
  17. Muhammad, B.; Ozkaynak, F.; Varol, A.; Tuncer, T. A Novel Deep Feature Extraction Engineering for Subtypes of Breast Cancer Diagnosis: A Transfer Learning Approach. In Proceedings of the 10th International Symposium on Digital Forensics and Security (ISDFS), Istanbul, Turkey, 6–7 July 2022. [Google Scholar] [CrossRef]
  18. Özgür, S.N.; Keser, S.B. Classification of Breast Cancer Tumors with Deep Learning Algorithms. Turk. J. Nat. Sci. 2021, 10, 212–222. [Google Scholar] [CrossRef]
  19. Dandil, E.; Serin, Z. Breast Cancer Detection on Histopathological Images Using Deep Neural Networks. Eur. J. Sci. Technol. 2020, 451–463. [Google Scholar] [CrossRef]
  20. Erdem, E.; Aydın, T. Breast Cancer Histopathological Image Classification. J. Inf. Technol. 2021, 14, 87–94. [Google Scholar] [CrossRef]
  21. Bayramoglu, N.; Kannala, J.; Heikkila, J. Deep Learning for Magnification Independent Breast Cancer Histopathology Image Classification. In Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2440–2445. [Google Scholar] [CrossRef]
  22. Alghodhaifi, H.; Alghodhaifi, A.; Alghodhaifi, M. Predicting Invasive Ductal Carcinoma in Breast Histology Images Using Convolutional Neural Network. In Proceedings of the 2019 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, USA, 15–19 July 2019; pp. 374–378. [Google Scholar] [CrossRef]
  23. Murali, A. Non-Invasive, Early Detection of Invasive Ductal Carcinoma (IDC) via Deep Convolutional Neural Networks Using Breast Cancer Histology Images. Int. J. Sci. Eng. Res. 2019, 10, 1788–1795. [Google Scholar]
  24. Cruz-Roa, A.; Gilmore, H.; Basavanhally, A.; Feldman, M.; Ganesan, S.; Shih, N.; Tomaszewski, J.; Madabhushi, A. Automatic Detection of Invasive Ductal Carcinoma in Whole Slide Images with Convolutional Neural Networks. In Medical Imaging 2014: Digital Pathology; SPIE: San Diego, CA, USA, 2014. [Google Scholar] [CrossRef]
  25. Zeng, Y.; Zhang, J. A Machine Learning Model for Detecting Invasive Ductal Carcinoma with Google Cloud AutoML Vision. Comput. Biol. Med. 2020, 122, 103861. [Google Scholar] [CrossRef]
  26. Chatterjee, C.C.; Krishna, G. A Novel Method for IDC Prediction in Breast Cancer Histopathology Images Using Deep Residual Neural Networks. In Proceedings of the 2019 2nd International Conference on Intelligent Communication and Computational Techniques (ICCT), Jaipur, India, 28–29 September 2019; pp. 95–100. [Google Scholar] [CrossRef]
  27. Karatayev, M.; Khalyk, S.; Adai, S.; Lee, M.H.; Demirci, M.F. Breast Cancer Histopathology Image Classification Using CNN. In Proceedings of the 16th International Conference on Electronics Computer and Computation (ICECCO), Kaskelen, Kazakhstan, 25–26 November 2021. [Google Scholar] [CrossRef]
  28. Pushkar Sathe, A.P.; Bombay, M.; Mani, G.S.; Kalathil, D. Cancer Detection Using Machine Learning. Int. Res. J. Eng. Technol. 2020, 7, 399–406. [Google Scholar] [CrossRef]
  29. Tasnim, Z.; Hassan, F.; Ahmed, M.; Mahfuzur Rahman, A. Classification of Breast Cancer Cell Images Using Multiple Convolution Neural Network Architectures. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 308–315. [Google Scholar] [CrossRef]
  30. Yilmaz, F.; Kose, O.; Demir, A. Comparison of Two Different Deep Learning Architectures on Breast Cancer. In Proceedings of the 2019 Medical Technologies Congress (TIPTEKNO), Izmir, Turkey, 3–5 October 2019; pp. 39–42. [Google Scholar] [CrossRef]
  31. Kote, S.; Agarwal, S.; Kodipalli, A.; Martis, R.J. Comparative Study of Classification of Histopathological Images. In Proceedings of the 5th International Conference on Electrical, Electronics, Communication, Computer Technologies and Optimization Techniques (ICEECCOT), Mysuru, India, 10–11 December 2021; pp. 156–160. [Google Scholar] [CrossRef]
  32. Narayanan, B.N.; Krishnaraja, V.; Ali, R. Convolutional Neural Network for Classification of Histopathology Images for Breast Cancer Detection. In Proceedings of the 2019 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, USA, 15–19 July 2019; pp. 291–295. [Google Scholar] [CrossRef]
  33. Pukale, P.D.D.; Kale, D.; Jadhav, R.; Gaikwad, A.; Hadke, A. Deep Learning for Early Detection of Breast Cancer Using Histopathological Images. Int. J. Res. Appl. Sci. Eng. Technol. 2020, 8, 1115–1119. [Google Scholar] [CrossRef]
  34. Chapala, H.R.; Sujatha, B. ResNet: Detection of Invasive Ductal Carcinoma in Breast Histopathology Images Using Deep Learning. In Proceedings of the 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2–4 July 2020; pp. 60–67. [Google Scholar] [CrossRef]
  35. Dang, A.Q. Cancer Prediction Using Machine Learning Algorithms; Computer Science Senior Capstone 2020; Earlham College: Richmond, Indiana, 2020; Available online: https://portfolios.cs.earlham.edu/wp-content/uploads/2020/05/aqdang16_paper_final.pdf (accessed on 5 January 2023).
  36. Breast Histopathology Images Dataset. Kaggle. 2017. Available online: https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images (accessed on 9 October 2024).
  37. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar] [CrossRef]
  38. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar] [CrossRef]
  39. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  40. Pawan, Y.N.; Prakash, K.B.; Chowdhury, S.; Hu, Y.C. Particle Swarm Optimization Performance Improvement Using Deep Learning Techniques. Multimed. Tools Appl. 2022, 81, 27949–27968. [Google Scholar] [CrossRef]
  41. Kushwaha, N.; Pant, M. Modified Particle Swarm Optimization for Multimodal Functions and Its Application. Multimed. Tools Appl. 2019, 78, 23917–23947. [Google Scholar] [CrossRef]
Figure 1. Images with positive IDCs in the dataset.
Figure 2. Images with negative IDCs in the dataset.
Figure 3. Architecture of MobileNet model developed for these experimental studies.
Figure 4. Architecture of custom-built CNN model within the scope of experimental studies.
Figure 5. (a) Accuracy and (b) loss function graph of the custom-built CNN model within the scope of experimental study 1.
Figure 6. (a) Accuracy and (b) loss function graph of VGG16 model within the scope of experimental study 1.
Figure 7. (a) Accuracy and (b) loss function graph of MobileNet model within the scope of experimental study 1.
Figure 8. (a) Accuracy and (b) loss function graph of DenseNet201 model within the scope of experimental study 1.
Figure 9. (a) Accuracy and (b) loss function graph of custom-built CNN model within the scope of experimental study 2.
Figure 10. (a) Accuracy and (b) loss function graph of VGG16 model within the scope of experimental study 2.
Figure 11. (a) Accuracy and (b) loss function graph of MobileNet model within the scope of experimental study 2.
Figure 12. (a) Accuracy and (b) loss function graph of DenseNet201 model within the scope of experimental study 2.
Figure 13. Sample images classified as false negatives and heatmap histogram graphs.
Figure 14. Sample images classified as false positives and heatmap histogram graphs.
Figure 15. Comparison of the results of experimental study 1 and experimental study 2.
Table 1. Accuracy rates, computational complexities, and training times of deep learning models obtained as a result of experimental study 1 (number of images: 157,572; image size: 50 × 50 × 3).

| Model | Training Setup | Accuracy | FLOPs (Batch_Size = 1) | Training Time (s) |
|---|---|---|---|---|
| VGG16 | Batch_size: 200, Epochs: 50 | 92.99% | 1.5 G | 1660.61 |
| MobileNet | Batch_size: 200, Epochs: 50 | 92.75% | 0.049 G | 1025.61 |
| DenseNet201 | Batch_size: 200, Epochs: 50 | 90.18% | 0.399 G | 4858.08 |
| Custom-built CNN | Batch_size: 200, Epochs: 50 | 93.80% | 0.0889 G | 789.09 |
Table 2. Accuracy rates, computational complexities, and training times of deep learning models obtained as a result of experimental study 2 (number of images: 1116; image size: 224 × 224 × 3).

| Model | Training Setup | Accuracy | FLOPs (Batch_Size = 1) | Training Time (s) |
|---|---|---|---|---|
| VGG16 | Batch_size: 8, Epochs: 50 | 86.60% | 30.8 G | 167.39 |
| MobileNet | Batch_size: 32, Epochs: 50 | 95.54% | 1.15 G | 258.82 |
| DenseNet201 | Batch_size: 8, Epochs: 50 | 92.86% | 8.63 G | 591.83 |
| Custom-built CNN | Batch_size: 32, Epochs: 50 | 92.86% | 2.34 G | 111.16 |
Table 3. Comparative results with state-of-the-art literature.

| Study | Method | Dataset | Accuracy Rates |
|---|---|---|---|
| [25] | AutoML vs. Holdout | Breast Histopathology Images, Kaggle | AutoML: 91.6%; Holdout: 84.6% |
| [26] | DRNN | Breast Histopathology Images, Kaggle (subset of 7500 images) | 99.29% |
| [27] | VGG16, ResNet18, DenseNet, CancerNet, and CNN | Breast Histopathology Images, Kaggle | CNN: 92% |
| [28] | AlexNet, VGG19, InceptionV3, Xception, and GoogLeNet | Breast Histopathology Images, Kaggle (1728 images used for model testing) | AlexNet: 96.74%; VGG19: 94.83%; InceptionV3: 92.48%; Xception: 90.72%; GoogLeNet: 97.80% |
| [29] | AlexNet, VGG19, InceptionV3, Xception, and GoogLeNet | Breast Histopathology Images, Kaggle (27,800 images used for training and testing) | AlexNet: 96.74%; VGG19: 94.83%; InceptionV3: 92.48%; Xception: 90.72%; GoogLeNet: 97.80% |
| [30] | DenseNet-201 and Xception | Breast Histopathology Images, Kaggle (31,827 images) | DenseNet-201: 96.74%; Xception: 96.69% |
| [31] | LeNet, AlexNet, VGG19, VGG16, ResNet50, SVM, and Twin SVM | Breast Histopathology Images, Kaggle (imbalanced dataset: 65,279 IDC-negative and 24,721 IDC-positive images; balanced dataset: 2788 IDC-positive and 2759 IDC-negative images) | Imbalanced dataset: LeNet: 73%; AlexNet: 79%; VGG19: 81%; VGG16: 85%; ResNet50: 88%; SVM: 86%; Twin SVM: 73%. Balanced dataset: LeNet: 65%; AlexNet: 72%; VGG19: 73%; VGG16: 74%; ResNet50: 78%; SVM: 76%; Twin SVM: 62% |
| [32] | CNN with color constancy and CNN with histogram equalization | Breast Histopathology Images, Kaggle | Color constancy AUC: 0.935; histogram equalization AUC: 0.876 |
| [33] | CNN | Breast Histopathology Images, Kaggle | 90% |
| [34] | ResNet50 and ResNet34 | Breast Histopathology Images, Kaggle | ResNet50: 91%; ResNet34: 90% |
| [35] | MobileNetV2 and EfficientNet | Breast Histopathology Images, Kaggle | MobileNetV2: 92.35%; EfficientNet: 91.02% |
| This study | Custom-built CNN, MobileNet, DenseNet201, and VGG16 | Breast Histopathology Images, Kaggle | Experimental study 1: custom-built CNN: 93.80%; MobileNet: 92.75%; DenseNet201: 90.18%; VGG16: 92.99%. Experimental study 2: custom-built CNN: 92.86%; MobileNet: 95.54%; DenseNet201: 92.86%; VGG16: 86.60% |