
Results in Engineering 24 (2024) 103448
Automated lung cancer detection using novel genetic TPOT feature optimization with deep learning techniques
Mohamed Hammad a,b,*, Mohammed ElAffendi a, Muhammad Asim a,c, Ahmed A. Abd El-Latif a,d, Radwa Hashiesh b
a EIAS Data Science Lab, College of Computer and Information Sciences, and Center of Excellence in Quantum and Intelligent Computing, Prince Sultan University, Riyadh 11586, Saudi Arabia
b Department of Information Technology, Faculty of Computers and Information, Menoufia University, Shibin El Kom 32511, Egypt
c School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, China
d Department of Mathematics and Computer Science, Faculty of Science, Menoufia University, 32511, Egypt
ARTICLE INFO

Keywords: Lung cancer; CNN; Genetic technique; TPOT; LSTM; Hybrid models

ABSTRACT

Lung cancer remains a leading cause of cancer-related deaths globally. Early and accurate detection is crucial for improving patient outcomes. Traditional methods, relying on manual interpretation of medical images, are time-consuming and prone to errors. Deep learning, particularly convolutional neural networks (CNNs), offers an automated alternative capable of learning intricate patterns from medical images. However, previous deep learning models for lung cancer detection have faced challenges such as limited data, inadequate feature extraction, interpretability issues, and susceptibility to data variability. This paper presents a novel deep learning methodology that addresses these limitations. Our approach leverages expansive datasets, incorporates advanced feature extraction techniques, improves interpretability, and accommodates the diverse nature of lung cancer. Specifically, we develop dedicated models for both chest X-ray and CT images utilizing publicly available datasets from Kaggle. Through the integration of feature selection and model selection techniques, such as employing a genetic algorithm in conjunction with the tree-based pipeline optimization tool (TPOT), we achieved remarkable accuracy. Our X-ray model attains an overall accuracy of 95.47%, the CT model achieves an accuracy of 98.70%, and the combined model achieves an impressive overall accuracy of 98.93%. Our methodology significantly enhances the performance and efficiency of lung cancer detection and is a valuable tool for early diagnosis and intervention.
1. Introduction

Lung cancer poses a substantial global public health challenge and is a significant contributor to cancer-related mortality [1]. The urgency of early detection cannot be overstated, as it facilitates timely intervention and treatment, ultimately improving patient outcomes. However, the manual interpretation of medical images by radiologists is limited by the fact that it is time-consuming, subjective, and susceptible to diagnostic errors due to the intricate and varied nature of lung cancer. Recent advancements in deep learning, particularly convolutional neural networks (CNNs), present a highly promising avenue for enhancing lung cancer detection [2–13]. These algorithms, with a focus on CNNs, autonomously decipher intricate patterns and features from medical images, reducing the requirement for manual feature engineering or domain knowledge [14]. This autonomy, coupled with the availability of extensive datasets, opens avenues for significantly enhancing the efficiency of lung cancer detection.

Deep learning models have demonstrated exceptional achievements across various domains, including medical image analysis and pattern recognition [15–20]. In the realm of lung cancer detection, these models can revolutionize medical imaging by identifying lung nodules or lesions, potentially reshaping lung cancer screening practices [21]. By harnessing the capabilities of deep neural networks, these models complement radiologists’ diagnostic processes, offering second opinions or preliminary screenings, thereby enhancing accuracy and efficiency [21].
* Corresponding author.
E-mail addresses: [email protected], [email protected] (M. Hammad), [email protected] (M. ElAffendi), [email protected] (M. Asim),
[email protected] (A.A. Abd El-Latif), [email protected] (R. Hashiesh).
https://doi.org/10.1016/j.rineng.2024.103448
Received 28 August 2024; Received in revised form 10 October 2024; Accepted 17 November 2024
Available online 18 November 2024
2590-1230/© 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Despite these achievements, previous deep learning methods for detecting lung cancer have several limitations. Insufficient training data, reliance on raw pixel data without advanced feature extraction, interpretability issues, and a limited ability to handle the variability in lung cancer presentations have hindered the effectiveness of these methods. This research proposes an innovative deep learning-based approach to lung cancer detection, aiming to surmount these limitations. Our methodology combines the strengths of deep learning algorithms with extensive datasets, leveraging larger, more diverse, and carefully annotated datasets to enhance model generalizability. By utilizing a comprehensive dataset, our method augments detection performance and robustness. Additionally, we incorporate feature extraction techniques to capture meaningful and discriminative features from medical images. Our method employs strategies that enhance the transparency of deep learning models for the diverse nature of lung cancer by designing models capable of capturing different tumor subtypes, stages, and variations. Training our models on a comprehensive dataset ensures robustness and reliable performance across diverse scenarios.

The novel contributions of this study are as follows:

• Development of deep learning models tailored for X-ray and chest CT image analysis in lung cancer detection.
• Employment of deep learning techniques to learn high-level representations of lung abnormalities, enabling the identification of subtle signs of cancer.
• Integration of CNN layers with long short-term memory (LSTM) layers, which offers benefits in capturing spatial and temporal dependencies, learning hierarchical representations, handling sequential data, mitigating overfitting, and enhancing interpretability.
• Utilization of the genetic algorithm within the tree-based pipeline optimization tool (TPOT) framework to automatically select the best features and model architecture, ensuring optimal model performance.
• Development of an automated system to replace manual interpretation and subjective diagnosis, leading to improved accuracy and efficiency in lung cancer detection. A portion of the source code is accessible at: https://colab.research.google.com/drive/14YK_tu8wcEKOlUCfd1pQBRJU3R3kVMpq?usp=sharing

2. Related work

The field of medical imaging analysis has experienced significant changes in recent years, with a noticeable increase in the use of deep learning techniques, specifically for the detection of lung cancer [2–13]. This emergence has not only been regarded as highly promising but has also revolutionized the approach to lung cancer detection within the realm of medical imaging analysis. Diverse studies have leveraged deep learning models to enhance lung cancer detection using various imaging modalities, including X-ray [6–9] and computed tomography (CT) [2–5,10–13] methods. This section critically reviews the literature on deep learning algorithms for lung cancer diagnosis, elucidating their methodologies, performance measures, and significant contributions.

Several researchers have focused on deep learning models applied to X-ray images for lung cancer detection. For instance, Alshmrani et al. [6] utilized a pretrained VGG-19 model combined with a CNN for feature extraction, achieving an accuracy of 93.75%. Bharati et al. [7] presented a hybrid deep model for X-ray-based lung cancer detection, employing a VGG pretrained with a CNN and achieving an accuracy of 73%. Bhandary et al. [8] introduced two distinct deep learning models, one for classifying X-ray images into pneumonia and normal categories and another incorporating improved AlexNet and handcrafted features for increased lung cancer diagnosis accuracy, achieving an overall classification accuracy of 97.27%. Conversely, CT-scan images have also been extensively used in deep learning-based lung cancer detection models. Lakshmanaprabu et al. [2] presented an automated method for diagnosing lung CT images using an Optimal Deep Neural Network (ODNN) and Linear Discriminant Analysis (LDA). Deep features from CT scans are extracted, and dimensionality is reduced with LDA to classify lung nodules as malignant or benign. The ODNN is optimized using a Modified Gravitational Search Algorithm (MGSA), achieving 96.2% sensitivity, 94.2% specificity, and 94.56% accuracy in lung cancer classification. Shah et al. [5] developed an ensemble approach combining multiple CNNs for lung nodule detection, using the LUNA 16 dataset of annotated CT scans. The Deep Ensemble 2D CNN, consisting of three CNNs with different configurations, achieved a 95% accuracy in classifying cancerous and non-cancerous images, outperforming the baseline method. Rajasekar et al. [10] demonstrated improved performance across six deep learning algorithms, including CNN, CNN GD, VGG-16, VGG-19, Inception V3, and ResNet-50. Analyzing both CT scan and histopathological images, the results indicate that detection accuracy is higher when histopathological tissues are used for analysis. Rehman et al. [12] presented a CNN-based model for tumor and nodule detection in CT scans. It includes preprocessing with filtering for image enhancement and postprocessing with morphological operators for fine segmentation. The model, using an active contour algorithm, achieved 98.33% accuracy, 99.25% validity, and 98.18% dice similarity, demonstrating its effectiveness.

While these studies demonstrate the potential of deep learning models for lung cancer detection, they also exhibit certain limitations, as follows:

• Many studies [5,7,8,10] focus on models tailored to specific datasets, which may limit their ability to generalize to diverse real-world data. For example, Shah et al. [5] used the LUNA 16 dataset for training, but this dataset may not be fully representative of broader clinical populations, raising concerns about the robustness of the model when applied to different patient groups. Similarly, Rajasekar et al. [10] focused on combining CT and histopathological data but did not explore the model’s performance on more diverse, multimodal datasets beyond their experimental setting.
• Several studies [2–4,6–8,12] rely on traditional feature extraction techniques followed by machine learning algorithms for classification. Although these approaches can achieve high accuracy on controlled datasets, manual feature engineering may not fully capture the complex and subtle patterns indicative of lung cancer, especially in heterogeneous medical images. For instance, the reliance on traditional methods in Lakshmanaprabu et al. [2] and Rehman et al. [12] limits the model’s capacity to adapt to varying conditions in unseen data, potentially reducing its clinical applicability. Similarly, Shakeel et al. [3] focused primarily on enhancing image quality through preprocessing techniques, neglecting the deep learning models’ ability to extract relevant features automatically, which may hinder detection of subtle abnormalities.
• Deep learning models often lack transparency, making them difficult to interpret. This is a critical issue in medical applications where explainability is crucial for clinical decision-making. Most of the reviewed studies [2–8,10,12] treat the models as "black boxes," which limits the trust that medical professionals can place in their predictions. Even in studies such as Rajasekar et al. [10], where different architectures were compared, the interpretability of the results was not explicitly addressed, leaving room for improvement in understanding how these models make decisions.
• Access to large and diverse datasets remains a challenge in many studies [2,5,10]. For instance, Lakshmanaprabu et al. [2] and Rajasekar et al. [10] achieved high accuracy with their respective datasets, but their reliance on specific, limited datasets might not translate well to broader clinical applications. A lack of diversity in training data can limit the model’s ability to perform well across different populations and imaging settings.
• While large datasets such as LUNA 16 [5] can enhance the performance of deep learning models, their accessibility often depends on researchers’ resources. Not all research groups can access large,
annotated datasets due to cost, data-sharing restrictions, or institutional limitations. This poses a significant barrier for some studies, where smaller datasets may constrain the ability to fully train and validate deep learning models.

To address the limitations of previous methods, our proposed approach integrates advanced deep learning algorithms with large-scale, diverse datasets, enhancing both generalizability and interpretability. While many earlier studies relied on specific datasets and traditional feature extraction techniques, our first model, combining CNN with LSTM for X-ray image classification, achieved an accuracy of 95.47%, effectively capturing temporal dependencies in the data. Our second model, utilizing CNN and LSTM for CT-scan classification, demonstrated an impressive accuracy of 99.04%, highlighting its capability to extract relevant features from CT images. Additionally, we introduced a combined model that integrates both X-ray and CT-scan images using CNN, a genetic algorithm, and TPOT, achieving a remarkable accuracy of 99.20%. This approach mitigates the limitations of limited data diversity and interpretability seen in prior studies, suggesting strong potential for real-world clinical applications and assisting healthcare professionals in the early and accurate diagnosis of lung cancer.

3. Datasets

3.1. Chest CT dataset

In the implementation of our second model, we focused on the chest CT scan dataset retrieved from Kaggle [22] (https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images). This dataset, which contains diverse chest cancer types, namely large cell carcinoma (LCC), adenocarcinoma (AD), and squamous cell carcinoma (SCC), along with a category of normal cells, consists of images in JPG or PNG format tailored to our model’s specifications. The dataset encompasses 685 training images and 315 testing images, which are resized to 256×256 pixels, converted to grayscale, and normalized between 0 and 1. Notably, the dataset’s organization involves distinct subdirectories within the primary data folder for the training, testing, and validation steps. Appendix A provides further details on each chest cancer type. It is essential to acknowledge the limitations, such as potential biases toward specific architectures and feature extraction techniques, as well as limited diversity in some classes. Fig. 1 visually displays examples of chest CT scan images from the utilized dataset.

Fig. 1. Chest CT-scan image samples.

3.2. X-ray dataset

The dataset employed in our model comprises X-ray images of patients with pneumonia and normal lung conditions. These images were resized to 128×128 pixels and normalized between 0 and 1; the images fell into two classes: PNEUMONIA and NORMAL. The training set consisted of 4684 images, with 1172 images in the testing set. While there is mention of a validation set, specific details are not provided. Potential challenges include class imbalance and limited information on the validation set. The dataset, sourced from a Kaggle repository by [23], is well organized with "train," "test," and "val" folders, each containing "pneumonia" and "normal" subfolders. It encompasses 5,863 X-ray images in JPEG format, allowing for binary classification. Focused on pediatric patients aged one to five years, the images were part of routine clinical care and underwent rigorous quality control to exclude low-quality scans. Diagnostic evaluations by two physicians and an additional review by a third expert ensured that the dataset was suitable for training our artificial intelligence system. Fig. 2 shows examples of chest X-ray images extracted from the dataset.

3.3. Combined features dataset

The features extracted from the chest CT and X-ray datasets were merged to form a combined feature dataset. Features were extracted using our trained model, and the resulting features were saved as CSV files. The training and testing set sizes are 1506 and 377, respectively. Features were normalized between 0 and 1. The merging process aimed to enhance the diversity and richness of features for better model generalization. Fig. 3 shows the class distribution in the combined dataset used to extract lung features.

3.4. Data processing

In the data processing phase, the Chest CT dataset underwent several key steps. Initially, image preprocessing involved resizing images to 256×256 pixels, converting them to grayscale, and normalizing pixel values between 0 and 1. For feature extraction, deep learning models were employed, followed by linear discriminant analysis (LDA) for dimensionality reduction. The features were further optimized using the modified gravitational search algorithm (MGSA). Additionally, data filtering was implemented to remove instances with specific labels, addressing potential biases. Similarly, the X-ray dataset underwent image processing, with images resized to 128×128 pixels and
normalized. Data splitting was performed to create distinct training, test, and validation sets, followed by random shuffling of the dataset. The combined features dataset was created by merging features from the Chest CT and X-ray datasets, and the resulting features were normalized between 0 and 1. To ensure data quality and validation, the label distribution was examined, and data cleaning involved filtering to address biases or imbalances. Scaling was applied to the features to maintain uniformity and prevent dominance. The features extracted from both the Chest CT and X-ray datasets were utilized to train predictive (CNN & LSTM) models.

The forthcoming Results section will delve into the impact and effectiveness of the combined features dataset. This will involve a comparative analysis of the predictive model’s performance with other datasets, shedding light on the robustness and applicability of the proposed methodology.

Fig. 2. Sample chest X-ray images obtained from the dataset (normal class (0) and pneumonia class (1)).

Fig. 3. Class distribution of the extracted lung features dataset.

4. Methodology

In this section, we present a comprehensive overview of our approach for detecting lung cancer through the application of deep learning techniques. Our study revolves around a meticulous process encompassing data collection, preprocessing, model architecture design, training, and evaluation, as shown in Fig. 4. Our methodology features two distinct models, each leveraging a different dataset with varying types of medical images. The initial model is built upon the X-ray dataset, while the subsequent model utilizes the CT dataset. The cascade of steps outlined in Fig. 4 underscores the careful orchestration of our approach. Initial data conversion into a binary format enhances computational efficiency, laying the groundwork for subsequent operations. A pivotal phase involves the integration of both datasets, resulting in the formation of a consolidated dataset conducive to comprehensive analysis. The subsequent extraction of relevant features from the amalgamated data serves as a crucial precursor to our subsequent research endeavors. This amalgamation, coupled with meticulous feature extraction, establishes the foundation for our in-depth exploration and analysis. Notably, the conversion of data into binary format, combined with the manipulation of extracted features, culminates in the application of the TPOT classifier for making predictions. Our methodology underscores the significance of early lung cancer detection, a critical aspect of healthcare. The fusion of advanced deep learning techniques with intricate dataset handling shows the potential for robust and accurate predictions. This approach aligns with the imperative nature of early cancer detection, contributing to enhanced patient outcomes and timely intervention.

Fig. 4. General diagram of our method.

4.1. First deep model for classifying X-ray images

A deep learning model designed for the precise classification of chest X-ray images into "normal" or "pneumonia" categories was introduced. X-ray images were resized to 128×128 pixels, and training/testing sets were established. Fig. 5 shows the first deep model’s structure. In addition, Table 1 provides hyperparameter settings and network specifications of the first model. During training, we employ categorical cross-entropy loss, the Adam optimizer, and a 0.001 learning rate, implementing early stopping to curb overfitting after 20 epochs of stagnant validation accuracy. Model evaluation encompasses key metrics such as accuracy, recall, precision, F1-score, overall accuracy, average accuracy, and a comprehensive confusion matrix, offering a robust assessment of its performance on test data, which we will discuss in the results section. Table 2 provides an overview of the model architecture incorporating convolutional layers with batch normalization, LSTM layers, and a concluding dense layer with a sigmoid activation function for binary classification.

Table 1
Hyperparameter settings and network specifications of the first model.

| Parameter | Value |
|---|---|
| Input Shape | (128, 128, 3) |
| Batch Size | 32 |
| Number of Epochs | 200 |
| Learning Rate | 0.001 |
| Loss Function | Categorical Crossentropy |
| Optimizer | Adam |
| Early Stopping Patience | 20 |
| Activation Function (Conv Layers) | ReLU |
| Activation Function (LSTM) | Tanh |
| Activation Function (Dense Layer) | Sigmoid |
| Output Classes | 2 |

Our developed deep model consists of eleven 2D convolutional layers followed by batch normalization layers. The output of every convolutional layer undergoes a ReLU activation function, given by:

$$ f(x) = \max(0, x) \tag{1} $$

where x is the input to the ReLU function and f(x) represents the output. This activation function is designed to nullify all negative values in the input while preserving positive values. After the last batch normalization layer, we reshape the output tensor to be suitable for the subsequent LSTM layers. This reshaping operation can be represented as:

$$ \text{reshaped\_output} = \operatorname{reshape}(\text{output\_tensor}) \tag{2} $$

where output_tensor represents the output of the last batch normalization layer and reshaped_output represents the reshaped tensor.

The LSTM layers in our model employ the Tanh activation function, given by:

$$ f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{3} $$

where x is the input to the Tanh activation function, and f(x) represents the output. The Tanh function maps the input to a value between -1 and 1, providing centered and bounded nonlinearity. This activation function is suitable for capturing temporal dependencies within the data.
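As a minimal sketch of the ReLU and Tanh activations and the reshape step described above, the following plain-Python snippet evaluates each one on small inputs. The function names and the (H, W, C) to (H, W·C) reshape layout are illustrative assumptions; the paper does not state the target shape passed to the LSTM layers.

```python
import math


def relu(x: float) -> float:
    """ReLU: negative inputs become 0, positive inputs pass through."""
    return max(0.0, x)


def tanh(x: float) -> float:
    """Tanh written out with exponentials, as in the text."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def reshape_to_sequence(feature_map):
    """Assumed reshape layout (not given in the paper):
    (H, W, C) nested lists -> H time steps, each with W*C values."""
    return [[v for pixel in row for v in pixel] for row in feature_map]


if __name__ == "__main__":
    print([relu(v) for v in (-2.0, 0.0, 3.5)])
    print(tanh(0.5), math.tanh(0.5))  # the two values should agree
    fmap = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # toy map: H=2, W=2, C=2
    print(reshape_to_sequence(fmap))
```

Treating each row of the feature map as one time step is only one possible way to turn a convolutional output into a sequence; other layouts would work equally well for illustration.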
Finally, for binary classification, we include a Dense output layer with a Sigmoid activation function, given by:

$$ f(x) = \frac{1}{1 + e^{-x}} \tag{4} $$

where x represents the input to the Sigmoid activation function, and f(x) represents the output. The Sigmoid function is utilized to transform the input into a numerical value ranging from 0 to 1, which signifies the likelihood of the input being classified as a member of the positive class (i.e., indicating the presence of lung cancer). This activation function provides a probabilistic interpretation of the model’s output, facilitating binary classification. Our deep learning model utilizes the categorical_crossentropy loss function and the Adam optimizer [25,26] with a learning rate of 0.001 for model compilation. The choice of this loss function is suitable for our classification problem since it quantifies the discrepancy between the projected probabilities of class membership and the actual class labels. The Adam optimizer, known for its efficiency and adaptability, is utilized to update the model parameters during training, with a learning rate of 0.001 to control the step size for parameter updates.
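To make the sigmoid output and the loss concrete, here is a small worked sketch in plain Python. The cross-entropy formula used is the standard per-sample definition, -Σ y_k log p_k, which the paper names but does not write out; the function names and the example probabilities are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Standard per-sample loss: -sum_k y_k * log(p_k).
    y_true is a one-hot label, y_pred the predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    print(sigmoid(0.0))  # midpoint of the sigmoid
    label = [0.0, 1.0]   # one-hot label for the positive class
    print(categorical_cross_entropy(label, [0.1, 0.9]))  # confident and correct
    print(categorical_cross_entropy(label, [0.9, 0.1]))  # confident and wrong
```

The loss is small when the probability assigned to the true class is close to 1 and grows quickly as that probability approaches 0, which is the "discrepancy" the paragraph above refers to.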
During the training phase, the model is optimized by inputting labeled examples from the training data to adjust its parameters. The validation data, which are a distinct subset of the dataset, are used to assess the model’s performance and observe its capacity to generalize. Early stopping is a regularization approach used to minimize overfitting: training is interrupted if the validation accuracy does not improve for a specified number of consecutive epochs. In our case, training is halted if the validation accuracy fails to improve for 20 consecutive epochs, which prevents further training on a potentially overfitting model. Early stopping thus helps balance the complexity of the model against its ability to generalize by cutting off unneeded training iterations that may result in overfitting the training data.
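The patience rule described above can be sketched in a few lines of plain Python. This is an illustration of the stopping logic only, not the authors' training code; the function name and the synthetic accuracy histories are hypothetical.

```python
def early_stopping_epoch(val_accuracies, patience=20):
    """Return the 1-based epoch at which training would stop, or None
    if the patience limit is never reached within the given history."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            # New best validation accuracy: reset the counter.
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


if __name__ == "__main__":
    # Accuracy improves for 3 epochs, then stays flat for 20 more.
    history = [0.5, 0.6, 0.7] + [0.7] * 20
    print(early_stopping_epoch(history, patience=20))
```

With a patience of 20 and a budget of 200 epochs (Table 1), a run whose validation accuracy plateaus early would end well before the epoch limit.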
4.2. Second deep model developed to classify CT images
Our second deep learning model is designed to classify chest CT

images, specifically focusing on distinguishing between "large cell car
cinoma (LCC), adenocarcinoma (AD), and squamous cell carcinoma
(SCC)" and "normal." The model structure, illustrated in Fig. 6, un
dergoes key steps for effective classification. During data loading, im
ages are fetched from the dataset, organized by labels (LCC, AD, SCC,
normal), resized to 256×256 pixels, and converted to grayscale. Sub
sequent data preparation involves partitioning images into training and
Fig. 5. The overall structure of our first model. testing sets, numerical coding of labels, and normalization. Data labeling
transforms labels into a binary task, categorizing LCC, AD, and SCC as
abnormal and normal as the other. The model architecture includes
convolutional layers, max pooling layers, dropout layers, and an LSTM
6
Table 2
Network architecture details of our first model.
Layer Type Layer Name Filters/Units Kernel Size Strides Activation Function
Input input_layer
Conv2D conv2d 32 (3, 3) (1, 1) ReLU
Batch Normalization batch_normalization
Conv2D conv2d_1 32 (3, 3) (1, 1) ReLU
Conv2D conv2d_2 64 (3, 3) (2, 2) ReLU
Batch Normalization batch_normalization_1
Conv2D conv2d_3 64 (3, 3) (1, 1) ReLU
Conv2D conv2d_4 64 (3, 3) (1, 1) ReLU
Add skip connection
Conv2D conv2d_5 128 (3, 3) (2, 2) ReLU
Conv2D conv2d_6 128 (3, 3) (1, 1) ReLU
Conv2D conv2d_7 128 (3, 3) (1, 1) ReLU
Add skip connection_1
Conv2D conv2d_8 256 (3, 3) (2, 2) ReLU
Conv2D conv2d_9 256 (3, 3) (1, 1) ReLU
Conv2D conv2d_10 256 (3, 3) (1, 1) ReLU
Reshape reshape
LSTM lstm 128 Tanh
LSTM lstm_1 64 Tanh
Dense output_layer 2 Sigmoid
layer, as summarized in Table 3. Training involves categorical cross- Training for multiple epochs allows the model to iteratively update its
entropy loss, the Adam optimizer, and 100 epochs with early stop parameters and learn more complex representations from the data. By
ping. The evaluation metrics, akin to our first model, include confusion training for 100 epochs, we give the model sufficient opportunity to
matrix computations for in-depth analysis. Table 4 provides hyper learn from the dataset and converge to an optimal solution. To mitigate
parameter settings and network specifications of the second model. the issue of overfitting and enhance the model’s ability to generalize, we
Our developed model consists of three 2D convolutional layers, each incorporate the utilization of early stopping and model checkpointing
of which is subsequently followed by a max pooling layer and a dropout callbacks. Additionally, model checkpointing is used to save the model
layer, which has a dropout rate of 0.25. This architecture is effective at with the best validation performance during training. This ensures that
extracting hierarchical features from the input data and preventing we retain the weights and parameters of the model that achieve the
overfitting. The ReLU activation function is employed after each con highest validation accuracy. By saving the best model checkpoint, we
volutional layer. After the convolutional layers, the application of max can avoid losing the progress made by the model during training and
pooling layers decreases the spatial dimensions of the feature maps. The utilize the best-performing model for subsequent evaluation or deploy
process of max pooling entails partitioning the feature map into distinct ment. Table 5 shows the settings of all the hyperparameters for both
regions that do not overlap and subsequently identifying the highest models.
value within each region. The output of the max pooling operation re
tains the most salient features while reducing the spatial size. Max
4.3. Extracted features
pooling helps in achieving translation invariance and spatial down
sampling. To prevent overfitting and improve generalization, we include
For the chest CT dataset, we employed a CNN to classify images into
dropout layers after each max pooling layer. During each training iter
four classes: adenocarcinoma, large cell carcinoma, normal, and squa
ation, dropouts randomly deactivate a portion of the input units by setting them to zero. This regularization technique promotes the acquisition of resilient and comprehensive representations by discouraging the dependence on particular input features. The convolutional layer’s output is subsequently reshaped into a format that is appropriate for the subsequent LSTM layer. The final output layer utilizes the SoftMax activation function, a widely employed method for multiclass classification tasks. The SoftMax function calculates the probability distribution among various classes. It takes the output from the previous layer and normalizes it to ensure that the predicted probabilities sum to 1. This enables the model to provide class probabilities for the final classification decision, allowing for effective multiclass classification.
The compilation procedure in the proposed model incorporates the employment of the categorical cross-entropy loss function and the Adam optimizer, which is consistent with the approach employed in the initial model. During the training process, the model is trained for a predetermined number of epochs, specifically 100 epochs. An epoch is defined as a single iteration through the entirety of the training dataset.
mous cell carcinoma. After training the model, we loaded it and extracted features using the LSTM layer. These features were then reshaped and saved for further analysis. For the chest X-ray dataset, we utilized a similar approach, employing a CNN for binary classification into pneumonia (positive) and normal (negative) cases. The trained model was loaded, and features were extracted using the LSTM layer. These features were reshaped for the chest CT dataset. Now we have features from both datasets, which serve as a valuable resource for subsequent stages in our research, allowing us to explore combined features and their impact on model performance. The extraction process enables us to focus on essential characteristics for accurate classification, enhancing the efficiency of our overall diagnostic system.
4.4. Combined model based on genetic TPOT feature selection
In this phase, we optimized the features using the genetic algorithm to feed them into the TPOT [27] to select the best classifier for the final decision. Fig. 7 shows the algorithm steps of the proposed combined
Table 3
Network architecture details of our second model.
| Layer Type | Layer Name | Filters/Units | Kernel Size | Strides | Activation Function |
| --- | --- | --- | --- | --- | --- |
| Input | input_layer | | | | |
| Conv2D | conv2d | 32 | (3, 3) | (1, 1) | ReLU |
| MaxPooling2D | max_pooling2d | | (2, 2) | (2, 2) | |
| Dropout | dropout | 0.25 (rate) | | | |
| Conv2D | conv2d_1 | 64 | (3, 3) | (1, 1) | ReLU |
| MaxPooling2D | max_pooling2d_1 | | (2, 2) | (2, 2) | |
| Dropout | dropout_1 | 0.25 (rate) | | | |
| Conv2D | conv2d_2 | 128 | (3, 3) | (1, 1) | ReLU |
| MaxPooling2D | max_pooling2d_2 | | (2, 2) | (2, 2) | |
| Dropout | dropout_2 | 0.25 (rate) | | | |
| Reshape | reshape | | | | |
| LSTM | lstm | 64 | | | Tanh |
| Dense | output_layer | 2 | | | Softmax |
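As a rough check on how the feature map shrinks through the layers in Table 3, the sketch below traces the spatial size from a 256 × 256 × 1 input through the three Conv2D/MaxPooling2D stages. The padding mode and the exact reshape applied before the LSTM are not stated in the paper, so both are assumptions: `trace_shapes` and its `padding` argument are illustrative names, and treating each row of the final feature map as one LSTM time step is only one plausible choice.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling window along one axis."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid": no padding


def trace_shapes(input_size=256, padding="valid"):
    """Follow the (height, width, channels) shape through Table 3's conv stack."""
    size, shapes = input_size, []
    for filters in (32, 64, 128):             # Conv2D filter counts from Table 3
        size = conv_out(size, 3, 1, padding)   # Conv2D, 3x3 kernel, stride 1
        size = conv_out(size, 2, 2, "valid")   # MaxPooling2D, 2x2 window, stride 2
        shapes.append((size, size, filters))   # Dropout leaves the shape unchanged
    return shapes


def lstm_input_shape(feature_map):
    """Assumed reshape: each row becomes a time step of width * channels features."""
    height, width, channels = feature_map
    return (height, width * channels)


for mode in ("valid", "same"):
    final = trace_shapes(padding=mode)[-1]
    print(mode, final, "->", lstm_input_shape(final))
```

Under the "valid" assumption the last feature map is 30 × 30 × 128; under "same" it is 32 × 32 × 128. Either way the Reshape layer has to flatten two of the three axes so that the LSTM receives a (time steps, features) sequence.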
Table 4
Hyperparameter settings and network specifications of our second model.
| Parameter | Value |
| --- | --- |
| Input Shape | (256, 256, 1) |
| Batch Size | 32 |
| Number of Epochs | 200 |
| Learning Rate | 0.001 |
| Loss Function | Categorical Crossentropy |
| Optimizer | Adam |
| Early Stopping Patience | 20 |
| Dropout Rate | 0.25 |
| Activation Function (Conv Layers) | ReLU |
| Activation Function (LSTM) | Tanh |
| Activation Function (Dense Layer) | Softmax |
| Output Classes | 2 |
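Table 4 lists an early-stopping patience of 20 epochs. The paper does not say which quantity is monitored, so the sketch below assumes validation loss (a common default) and shows only the stopping rule itself. `epochs_until_stop` is a hypothetical helper for illustration, not the authors' code.

```python
def epochs_until_stop(val_losses, patience=20, max_epochs=200):
    """Return the number of epochs run before early stopping would trigger.

    Training stops once `patience` consecutive epochs pass without the
    monitored loss improving on the best value seen so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


# Toy curve: the loss improves for 5 epochs, then plateaus.
losses = [1.0, 0.8, 0.6, 0.5, 0.4] + [0.45] * 50
print(epochs_until_stop(losses))  # stops 20 epochs after the last improvement
```

With a rule like this, the epoch budget in Table 4 acts as an upper bound: a run ends earlier whenever the monitored value stops improving for 20 epochs in a row.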
Table 5
Details of the hyperparameter settings used to develop the two feature-extraction models.
| Parameter | X-ray Model | Chest Model |
| --- | --- | --- |
| Learning Rate | 0.001 | 0.001 |
| Dropout | None | 0.25 |
| Number of Hidden Units | 2 | 2 |
| Number of Conv2D Layers | 11 | 3 |
| Number of Conv3D Layers | 0 | 0 |
| Number of Dense Layers | 1 | 1 |
| Number of LSTM Layers | 2 | 1 |
| Batch Size | 32 | 32 |
| Number of Epochs | 200 | 100 |
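The evolutionary search that Section 4.4 relies on (a population of candidates, fitness evaluation, crossover and mutation, and selection over several generations) can be illustrated with a deliberately small genetic algorithm over feature masks. This is a sketch of the general technique only, not TPOT's implementation: TPOT evolves whole pipelines rather than bit masks, and `evolve`, `toy_fitness`, and all parameter values here are assumptions chosen for illustration.

```python
import random


def evolve(fitness, n_genes, pop_size=20, generations=10,
           mutation_rate=0.1, seed=0):
    """Evolve bit masks (1 = keep feature) that maximise `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)           # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                  # bit-flip mutation
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)  # remember the best so far
    return best


# Toy fitness (an assumption for illustration): features 0-4 are useful,
# every other selected feature costs a small penalty.
def toy_fitness(mask):
    return sum(mask[:5]) - 0.5 * sum(mask[5:])


best_mask = evolve(toy_fitness, n_genes=12)
print(best_mask, toy_fitness(best_mask))
```

In a real setting the fitness function would be a cross-validated score of a classifier trained on the selected features, which is far more expensive than the toy function used here.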
Fig. 6. Structure of our second model.
model. To optimize lung cancer detection, we employed TPOT to improve the performance of our system. Our approach began by amalgamating features derived from both the initial and subsequent proposed models. These combined features were then optimized using a genetic algorithm within TPOT. The primary objective was to identify the most effective classifier for the final decision. Fig. 7 also outlines the main steps in our integrated model. The genetic algorithm, which is built into TPOT, proceeds through several steps to achieve optimal model selection. TPOT initializes a diverse population of pipelines, each representing a potential solution to the lung cancer detection task. The fitness of these pipelines is evaluated through training and cross-validation, utilizing metrics such as accuracy or F1-score. Evolutionary operators, namely, mutation and crossover, are subsequently applied to generate new pipeline variations, thereby enhancing diversity. This iterative process continues over multiple generations, with the genetic algorithm retaining the best-performing pipelines based on their fitness scores. The outcome is the identification of the most effective pipeline (shown in Fig. 8) for accurately distinguishing
between normal and abnormal cases in X-ray and CT data.
Fig. 7. Pseudocode of the proposed combined method.
The genetic algorithm is utilized within the TPOT library [28] to perform genetic model selection. It is responsible for evolving and selecting the best pipelines (a combination of preprocessing steps and machine learning models) based on their fitness (performance) on a given dataset. TPOT automates the optimization of machine learning pipelines, especially for intricate tasks such as lung cancer detection. By using a genetic algorithm, TPOT navigates a diverse solution space, iteratively refining pipelines to improve performance. The combination of mutation and crossover operators allows TPOT to create new pipeline variations, facilitating the exploration of preprocessing steps and machine learning models. This process iterates until convergence toward the best pipeline is achieved. The best-performing pipeline is the one that maximizes the chosen fitness metric. Fig. 9 serves as a visual representation of sample internal cross-validation scores for different models, highlighting the KNeighbors classifier as the best-performing pipeline with specific parameters. This classifier distinguishes between normal and abnormal cases within X-ray and CT data. In essence, TPOT’s automated approach substantially streamlines the optimization process. The genetic algorithm within TPOT performs both feature selection and model selection. The resulting best model is a pipeline that integrates SelectKBest for feature selection and the KNeighbors classifier for prediction. The code snippets further showcase the evaluation of TPOT’s performance across multiple generations, shedding light on the occurrence of different models in the selection process. Our methodology, supported by TPOT’s automated process, not only streamlines the optimization process but also helps our lung cancer detection system achieve high accuracy and efficiency.
Fig. 8. Pipeline of the best model.
5. Results and analysis
In this section, we present the outcomes of our deep learning methodologies for the detection of lung cancer. Extensive experiments were conducted on two commonly used datasets obtained from Kaggle [22,24]. One dataset consisted of chest X-ray images, while the other dataset comprised CT scan images. The primary aim of our study was to assess the efficacy and proficiency of our models in accurately identifying instances of lung
cancer. Here, we provide a detailed analysis of the experimental results, including accuracy, precision, recall, F1-score, and visualizations of the confusion matrices, showcasing the robustness and efficacy of our proposed approaches. The code implementation utilized the TensorFlow and Keras libraries for building and training the models. The experiments were conducted on a PC equipped with an AMD A8-5550M APU processor running at 2.1 GHz with integrated AMD Radeon HD 8550G graphics. The system had 7.37 GB of RAM (with 1.5 GB available during data collection) and ran on Windows 11 Home (Version 10.0, Build 22631). The machine was an HP ProBook 645 G1, with virtualization support and Second Level Address Translation and Data Execution Prevention features enabled for better performance and security. The training and experimentation were conducted on cloud-based platforms such as Kaggle and Google Colab, leveraging their computational resources and access to specialized deep learning libraries such as TensorFlow and Keras. We have divided this section into three parts: 1) The first part describes the results of our first model for classifying X-ray images into normal and pneumonia classes. 2) The second part describes the results of our second model for classifying CT-scan images. 3) The third part describes the results of our combined model based on genetic analysis and TPOT and compares them with lung dataset features published on Kaggle.
Fig. 9. Internal CV scores used for different models.
5.1. Results obtained for the first model using X-ray images
Fig. 10 shows the confusion matrix that represents the classification results of the first deep learning model to classify X-ray images of the chest into two classes: "normal" and "pneumonia." The figure shows that the model correctly detected 93.33% of the normal images as normal cases. In addition, the model correctly detected 96.21% of the pneumonia images as pneumonia cases. From the confusion matrix, we can also derive several performance metrics to evaluate the classification accuracy of the model:
• Accuracy: Measures the overall correctness of the model’s predictions. It is calculated as (TP + TN) / (TP + FP + FN + TN). In this case, the accuracy can be computed as (840 + 279) / (840 + 20 + 33 + 279) = 0.954, or 95.47%.
• Precision: This metric signifies the model’s capacity to accurately categorize instances as belonging to the positive class. It is calculated as TP / (TP + FP). The precision helps assess the reliability of positive predictions. The precision can be computed as 840 / (840 + 20) = 0.976, or 97.67%.
• Recall (Sensitivity): Quantifies the efficacy of the model in detecting all instances that are actually positive. Recall is calculated as TP / (TP + FN). In this case, recall can be computed as 840 / (840 + 33) = 0.962, or 96.22%.
• F1-Score: The harmonic mean of precision and recall. It serves as a unified metric that balances the evaluation of both precision and recall and is calculated as 2 × (Precision × Recall) / (Precision + Recall). In this case, the F1-Score can be computed as 2 × (0.9767 × 0.9622) / (0.9767 + 0.9622) ≈ 0.9694, or 96.94%.
Fig. 10. Confusion matrix for the first model.
Fig. 11 shows the training and test curves (a) and the loss curves of the training and testing (b). The accuracy and loss curves provide valuable insights into the performance and training dynamics of the lung cancer detection model. Regarding the accuracy curves, the training curve starts at 90% accuracy from the initial epoch and gradually increases, reaching 100% accuracy at epoch 60. After approximately 30 epochs, the training accuracy stabilizes at its peak level. On the other hand, the test curve begins at 75% accuracy and experiences fluctuations in the early epochs. It initially drops to 40% accuracy at epoch 8 but then shows improvement, reaching 93% accuracy by epoch 10. The test accuracy continues to fluctuate, with slight decreases and increases until epoch 20, when it stabilizes at approximately 93%. The accuracy remains relatively steady with minor fluctuations until the end of the training process, ultimately reaching an accuracy of 95.47%.
Turning to the loss curves, the training curve starts with a loss of 0.3 at epoch 0 and consistently decreases as the model learns from the data. By epoch 60, the training loss reaches 0 and remains stable thereafter. Conversely, the test curve begins with a loss of 0 and increases to 1.4 at epoch 8. However, it quickly decreases to 0.2 by epoch 10. The test loss exhibits some fluctuations thereafter, with an increase to 1.0 at epoch 15, followed by a decrease to 0.4 at epoch 20. Although the test loss fluctuates slightly throughout the training process, it generally remains below 1.0. Toward the end, the loss stabilizes at approximately 0.3, indicating that the model has the ability to maintain consistent performance.
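The metric definitions in Section 5.1 can be checked directly from the confusion-matrix counts quoted there (TP = 840, FP = 20, FN = 33, TN = 279). The helper name `binary_metrics` is illustrative and not taken from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


metrics = binary_metrics(tp=840, fp=20, fn=33, tn=279)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Computed from unrounded precision and recall, the F1-score is about 0.9694, in line with the 96.94% listed for the first model in Table 7. The accuracy evaluates to roughly 0.95478, which the paper reports as 95.47%.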
Fig. 11. Accuracy curves obtained for the first model: (a) model accuracy and (b) model loss.
5.2. Results obtained for the second model using chest CT images
Fig. 12 shows the confusion matrix that represents the classification results of the second deep learning model for classifying CT scan images of the chest into two classes: "normal" and "abnormal". The figure shows that the model correctly detected 98.14% of the normal images as normal cases. In addition, the model correctly detected 99.23% of the abnormal images as abnormal cases. From the confusion matrix, the performance metrics to evaluate the classification accuracy of the model are as follows:
The overall accuracy is 99.047%, precision is 99.615%, recall is 99.233% and F1-score is 99.424%. Fig. 13 shows the train and test curves (a) and the loss curves of the train and test (b) for the second model.
Fig. 12. Confusion matrix obtained for the second model.
In terms of the accuracy curves, the training curve starts at 80% accuracy at epoch 0 and steadily increases to 99% accuracy by epoch 25. There is a slight decrease to 75% accuracy and then a subsequent increase to 99% accuracy by the final epoch 40. After approximately 33 epochs, the training accuracy stabilizes and remains consistent. On the other hand, the test curve begins at 83% accuracy and experiences fluctuations in the early epochs. It initially drops to 20% accuracy at epoch 3 but then shows improvement, reaching 90% accuracy by epoch 5. There is a decrease to 50% accuracy at epoch 10, followed by an increase to 93% accuracy at epoch 19. The test accuracy continues to fluctuate, with slight decreases and increases until epoch 25. From epoch 25 onwards, the test accuracy stabilizes at a high level, reaching 99% accuracy.
Fig. 13. Accuracy curves obtained for the second model: (a) model accuracy, and (b) model loss.
Examining the loss curves, the training curve starts with a loss of 0.5 at epoch 0 and steadily decreases throughout the training process. It reaches a loss of 0 at epoch 26 and then increases slightly at epoch 27 before decreasing again. The training loss remains consistent and reaches a stable value of 0 after approximately 36 epochs. Similarly, the test curve begins with a loss of 0.5 and fluctuates in the early epochs. It increases to 1.4 at epoch 5 but then decreases to 0.2 at epoch 8. There is an increase to 2.4 with the highest loss observed at epoch 20. However, the test loss gradually decreases to 0.3 at epoch 25 and undergoes slight fluctuations until epoch 30. After epoch 34, the test loss increases to 1.0 and then decreases to 0, remaining stable until the end of the training process.
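As a consistency check on the metrics reported for the second model at the start of this subsection, the F1-score can be recomputed from the stated precision (99.615%) and recall (99.233%) as their harmonic mean:

```latex
F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2 \times 99.615 \times 99.233}{99.615 + 99.233}
    \approx 99.424\%
```

This agrees with the F1-score of 99.424% quoted in the text above.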
5.3. Results obtained for the combined method based on genetic analysis and TPOT
In this method, we combine the features from the two models using both X-ray and CT-scan images to improve the overall accuracy of the detection. We optimized the combined features using a genetic algorithm with TPOT. We then tested 11 models, and TPOT selected the best model according to its CV score, as shown in Table 6. From these results, we can observe that the best score is reached using the KNeighbors model. As a result, we confirmed the selected model for all analyses and compared it with other previous methods. Fig. 14 shows the confusion matrix that represents the classification results of the combined model to classify the images from different modalities into normal and abnormal. The figure shows that the model correctly detected 99.31% of the normal images as normal cases and misclassified 0.68% of the normal images as abnormal cases. In addition, the model correctly detected 98.80% of the abnormal images as abnormal cases and misclassified 1.19% of the abnormal images as normal cases. From the confusion matrix, the performance metrics to evaluate the classification accuracy of the model are as follows:
The best model achieved an accuracy of 99.20%, precision of 98.65%, recall of 99.06%, and F1 score of 98.86%.
Fig. 14. Confusion matrix obtained for the combined model.
Table 6
TPOT results for 10 generations on our combined dataset.
| Generation | Model Name | Accuracy |
| --- | --- | --- |
| 1 | KNeighborsClassifier | 0.9920318725099602 |
| 2 | KNeighborsClassifier | 0.9920318725099602 |
| 3 | KNeighborsClassifier | 0.9933598937583001 |
| 4 | KNeighborsClassifier | 0.9933598937583001 |
| 5 | KNeighborsClassifier | 0.9933598937583001 |
| 6 | KNeighborsClassifier | 0.9933598937583001 |
| 7 | KNeighborsClassifier | 0.9933598937583001 |
| 8 | KNeighborsClassifier | 0.9933598937583001 |
| 9 | KNeighborsClassifier | 0.9933598937583001 |
| 10 | KNeighborsClassifier | 0.9940239043824701 |
6. Discussion
In this section, we will delve into the implications of our findings, compare them to those of previous studies, and analyze the strengths and limitations of our methodology. The results of our study provide evidence supporting the efficacy of deep learning models in improving lung cancer detection accuracy. By leveraging large-scale datasets and advanced feature extraction techniques, our models achieved high accuracy rates, surpassing the performance of many existing methods. The use of CNNs enables the automatic learning of intricate patterns and features directly from medical images, eliminating the need for manual feature engineering. However, we recognize that simpler algorithms, such as SVM, can also deliver effective results and may outperform complex models in specific contexts. For example, while our first model, which employed CNN+LSTM for X-ray image classification, achieved an accuracy of 95.47%, we acknowledge that certain studies may have shown competitive performance using lighter networks or alternative methodologies.
We focused on two common imaging modalities used for lung cancer detection: chest X-ray and CT images. For chest X-ray images, our model achieved an accuracy of 95.47%. Similarly, for CT scan images, our model attained an accuracy of 99.04%. To overcome the limitations of previous methods, we introduced a combined model that integrated X-ray and CT-scan images using CNN, a genetic algorithm, and TPOT. This model achieved remarkable results, with an accuracy of 99.20%. Table 7 shows a summary of the performances of all the models.
Table 7
Summary of the performance of our models.
| Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
| --- | --- | --- | --- | --- |
| First model | 95.47 | 97.67 | 96.22 | 96.94 |
| Average Accuracy (AA) | 94.77 | | | |
| Second model | 99.04 | 99.61 | 99.23 | 99.42 |
| Average Accuracy (AA) | 98.69 | | | |
| TPOT on combined data | 99.20 | 98.65 | 99.06 | 98.86 |
As Table 7 shows, these performance metrics indicate the effectiveness of our combined model in accurately detecting lung cancer patients.
The accuracy curves of the first model shown in Fig. 11 demonstrate the model’s ability to achieve high accuracy levels, with the training curve reaching 100% accuracy and the test curve stabilizing at approximately 93%. The loss curves show the model’s effectiveness in minimizing the loss, with the training loss reaching 0 and the test loss maintaining a relatively low value. These curves indicate that the model has successfully learned to classify lung cancer X-ray images accurately and achieved good generalization performance. Additionally, the accuracy curves of the second model shown in Fig. 13 demonstrate the model’s ability to achieve high accuracy levels, with both the training and test curves reaching 99% accuracy. The stability of the curves after a certain number of epochs indicates that the model can be generalized
well. Similarly, the loss curves showcase the model’s effectiveness in minimizing the loss, with both the training and test curves reaching a final loss of 0. The stability and low loss values further highlight the model’s successful convergence and ability to accurately classify lung cancer images.
For the proposed combined method, the combined features are optimized using a genetic algorithm in conjunction with TPOT. To determine the best classification model, we tested 10 different models and utilized the CV score as the criterion for selection. As depicted in Table 6, the KNeighbors model attained the highest score, confirming its selection for all subsequent analyses and comparison with previous methods. Additionally, TPOT on combined data demonstrates superior performance, emphasizing the benefits of automated optimization. We compared our results with recent studies on lung cancer detection based on different deep learning approaches, as shown in Table 8.
Comparing our results with those of previous methods, our combined model outperformed most of the other approaches in terms of the metrics used, with the exception of Mohandass et al. [28], whose reported figures in Table 8 are marginally higher (for example, 99.30% vs. 99.20% accuracy). The high accuracy and precision values demonstrate the ability of our combined model to accurately classify lung cancer cases, while the high recall and F1-score values indicate the model’s ability to correctly identify true positive cases and achieve a balanced performance between precision and recall. These findings highlight the effectiveness and potential of our proposed methodology, which utilizes deep learning models and optimization techniques, for accurate and reliable lung cancer detection.
Lakshmanaprabu et al. [2] utilized a methodology combining a CNN, LDA, and MGSA for CT scan image analysis. Although their approach achieved a commendable accuracy of 94.56%, it relied on manual feature extraction and dimensionality reduction techniques. This limits its ability to capture complex patterns and may lead to reduced accuracy and limited generalizability. Similarly, Shakeel et al. [3] employed a CNN with IPCT for CT-scan image classification, achieving an accuracy of 94.5%. However, their approach focused only on enhancing image quality through preprocessing techniques, neglecting the potential of deep learning models to extract relevant features automatically. This may result in a limited ability to capture subtle abnormalities and may decrease the overall accuracy. Shafi et al. [4] proposed a CNN combined with SVM for CT-scan image classification, achieving an accuracy of 94%. While their approach showed promising results, it relied on handcrafted feature extraction and SVM for classification. This manual feature engineering approach may limit its ability to handle the complexity and variability of lung cancer characteristics, potentially affecting the accuracy and generalizability of the model. Shah et al. [5] developed a deep ensemble model for CT-scan image classification, achieving an accuracy of 95%. However, the ensemble approach may introduce additional complexity, requiring careful model selection and integration. This can pose challenges in terms of model interpretability and computational efficiency. The model by Mohandass et al. [28] uses IEWT with an attention-based CNN (DenseNet-201) on CT-scan images, adding significant computational complexity. The wavelet transforms and attention mechanisms increase the need for high-end hardware, limiting its use in real-time or resource-constrained environments. Its high accuracy (99.30%) also raises concerns about overfitting, especially if the dataset lacks diversity. Similarly, Crasta et al. [29] use 3D-VNet and 3D-ResNet on CT-scan images, which also increases computational demands. The use of 3D networks adds data dimensionality, requiring more processing power and memory. While their model shows high sensitivity and accuracy, its complexity may limit its use in clinical settings without adequate infrastructure. Saha et al. [30] introduced VER-Net for CT-scan analysis, but its lower accuracy (91%) suggests limitations in capturing complex features. This may affect its ability to detect abnormal cases correctly.
In terms of X-ray image analysis, Alshmrani et al. [6] combined VGG-19 with a CNN and achieved an accuracy of 96.48%. However, their method focused solely on X-ray images and did not utilize CT-scan images, limiting the diversity and comprehensiveness of the dataset.
Table 8
Comparison of our models with other recent deep learning-based methods.
| Authors | Methodology | Image modality | Performance |
| --- | --- | --- | --- |
| Lakshmanaprabu et al. [2] | CNN+LDA+MGSA | CT-scan images | Specificity = 94.2%; Sensitivity = 96.2%; Accuracy = 94.56% |
| Shakeel et al. [3] | CNN+IPCT | CT-scan images | Recall = 96.80%; Precision = 94%; F1-score = 95.40%; Accuracy = 94.50% |
| Shafi et al. [4] | CNN+SVM | CT-scan images | Recall = 94.50%; Precision = 95%; F1-score = 94.50%; Accuracy = 94% |
| Shah et al. [5] | Deep Ensemble 2D CNN | CT-scan images | Accuracy = 95% |
| Mohandass et al. [28] | Improved Empirical Wavelet Transforms (IEWT) + Attention-based CNN DenseNet-201 | CT-scan images | Precision = 99.40%; Sensitivity = 99.19%; F1-score = 99.11%; Accuracy = 99.30% |
| Crasta et al. [29] | 3D-VNet + 3D-ResNet | CT-scan images | Sensitivity = 98.80%; Specificity = 99.60%; Accuracy = 99.20% |
| Saha et al. [30] | VER-Net | CT-scan images | Precision = 92%; Recall = 91%; F1-score = 91.30%; Accuracy = 91% |
| Alshmrani et al. [6] | VGG-19 + CNN | X-ray images | Recall = 93.75%; Precision = 97.56%; F1-score = 95.62%; Accuracy = 96.48% |
| Bharati et al. [7] | STN + VGG-16 + CNN | X-ray images | Recall = 64%; Precision = 62%; F1-score = 64%; Accuracy = 70.8% |
| Bhandary et al. [8] | Modified AlexNet + SVM | X-ray images | Specificity = 95.63%; Sensitivity = 98.09%; Accuracy = 97.27% |
| Ashwini et al. [31] | CNN + grid search optimization (GSO) | X-ray images | Accuracy = 98.75% |
| Jaya and Krishnakumar [32] | CNN + LSTM | X-ray images | Precision = 96.75%; Specificity = 95.13%; Sensitivity = 96.60%; Accuracy = 93.71% |
| Our First Model | CNN+LSTM | X-ray images | Precision = 97.67%; Recall = 96.22%; F1-score = 96.94%; Accuracy = 95.47% |
Table 8 (continued)
| Authors | Methodology | Image modality | Performance |
| --- | --- | --- | --- |
| Our Second Model | CNN+LSTM | CT-scan images | Recall = 99.23%; Precision = 99.61%; F1-score = 99.42%; Accuracy = 99.04% |
| Our Combined Model | CNN+genetic algorithm+TPOT | X-ray + CT-scan images | Precision = 98.65%; Recall = 99.06%; F1-score = 98.86%; Accuracy = 99.20% |
This may result in reduced accuracy and limited applicability to different imaging modalities. Bharati et al. [7] utilized STN, VGG-16, and CNNs for X-ray image classification, achieving an accuracy of 70.8%. Their approach demonstrated relatively lower accuracy, which may be attributed to the limitations of handcrafted feature extraction and the specific architecture used. This suggests a need for more advanced feature-learning techniques and model optimization. Bhandary et al. [8] proposed a modified AlexNet combined with SVM for X-ray image classification, achieving an accuracy of 97.27%. While their approach demonstrated high accuracy, it relied on handcrafted features and SVM for classification. This may limit its ability to adapt to complex and diverse lung cancer characteristics, potentially impacting the model’s robustness and accuracy. Ashwini et al. [31] and Jaya and Krishnakumar [32] rely on CNN-based architectures for X-ray image analysis. These models may struggle with more complex modalities such as CT scans. Additionally, their dependence on parameter tuning and optimization methods, such as grid search, makes them computationally expensive and may not yield the best results across larger datasets.
In contrast, our first model, which employed CNN+LSTM for X-ray image classification, achieved an accuracy of 95.47%. By leveraging the power of deep learning and sequential modeling, our model demonstrated improved accuracy and the ability to capture temporal dependencies in the image data. Furthermore, our second model, which utilized CNN+LSTM for CT-scan image classification, achieved an impressive accuracy of 99.04%. This highlights the effectiveness of our approach in handling the specific characteristics of CT-scan images and extracting relevant features for accurate lung cancer detection. To overcome the limitations of previous methods, we introduced a combined model that integrated X-ray and CT-scan images using CNN, a genetic algorithm, and TPOT. This model achieved remarkable results, with an accuracy of 99.20%. By leveraging the complementary information from both modalities and optimizing the feature selection process, our combined model demonstrated superior performance in terms of accuracy and precision.
The superior performance of our combined model suggests its potential applicability in real-world clinical settings, assisting healthcare professionals in the early and accurate diagnosis of lung cancer. However, we maintain that ongoing research should continue to explore a variety of methodologies, including simpler and more interpretable models, to ensure a comprehensive understanding of lung cancer detection and the optimization of diagnostic processes.
7. Conclusions
This study introduces a novel approach based on deep learning for the detection of lung cancer, addressing the limitations observed in prior methodologies. Through the utilization of extensive datasets, sophisticated feature extraction methods, and optimization algorithms, our models exhibit enhanced accuracy, resilience, and efficiency in categorizing X-ray and CT-scan images. The initial model demonstrated a classification accuracy of 95.47% in the context of X-ray image classification, whereas the subsequent model exhibited a notable accuracy of 99.04% in the domain of CT-scan image classification. Furthermore, our integrated model, which combines X-ray and CT scan images through the use of a CNN, a genetic algorithm, and TPOT, demonstrates an impressive accuracy rate of 99.20%. When comparing our findings to those of previous methodologies, we achieved higher levels of accuracy and performance metrics. Our models demonstrate superior performance compared to methodologies that depend on manual feature extraction, limited diversity in datasets, and suboptimal selection of models. Moreover, the integration of multiple imaging modalities in our combined model augments the comprehensiveness and precision of lung cancer detection. The exemplary performance exhibited by our models, specifically the integrated model, serves as a testament to their considerable potential in practical clinical settings. Our proposed methodology offers precise and reliable identification of lung cancer, thereby assisting healthcare practitioners in promptly diagnosing this condition, ultimately resulting in improved patient prognosis. The utilization of deep learning methodologies, optimization algorithms, and the incorporation of various imaging modalities signifies notable progress in the field of lung cancer detection, thereby demonstrating the efficacy of machine learning in tackling intricate healthcare obstacles. Subsequent investigations may prioritize the continued enhancement and verification of our models using more extensive and heterogeneous datasets. Additionally, it would be valuable to investigate the implementation of these models in clinical environments to assess their influence on patient care.
Data availability
Data are available at: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia and https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images. Accessed [28-12-2023]
CRediT authorship contribution statement
Mohamed Hammad: Writing – review & editing, Writing – original draft, Validation, Methodology, Formal analysis, Conceptualization. Mohammed ElAffendi: Validation, Supervision, Project administration. Muhammad Asim: Visualization, Investigation, Formal analysis. Ahmed A. Abd El-Latif: Validation, Supervision, Conceptualization. Radwa Hashiesh: Writing – original draft, Software, Resources, Methodology, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication. This paper is derived from a research grant funded by the Research, Development, and Innovation Authority (RDIA) - Kingdom of Saudi Arabia - with grant number (13325-psu-2023-PSNU-R-3-1-EF).
Appendix A
●Adenocarcinoma (AD):
AD is a neoplastic condition originating from the epithelial cells of glandular tissue in multiple organs within the human body [A1]. This particular type of lung cancer is widely prevalent, constituting approximately 40% of the total number of cases. AD commonly originates in the periphery of the lungs and has a greater prevalence among
nonsmokers and younger individuals than alternative forms of lung cancer [A2]. AD is characterized by the histological manifestation of aberrant glandular cells, which exhibit a propensity for organizing into glandular structures or clusters [A3]. The observed cells frequently display different levels of differentiation, spanning from highly differentiated forms that closely resemble typical glandular cells to poorly differentiated forms characterized by significant cellular atypia. Tumor cells have the potential to infiltrate neighboring lung tissue and disseminate to regional lymph nodes or distant locations, resulting in the development of metastatic lesions.
One of the primary difficulties associated with the diagnosis of AD is the presence of various morphological manifestations [A4]. The manifestation of this condition may include the presence of a cohesive mass, a single distinct nodule, or the occurrence of multiple nodules exhibiting a ground-glass appearance on imaging modalities such as CT scans. In addition, AD has the ability to exhibit a diverse array of histological subtypes, such as lepidic, acinar, papillary, micropapillary, and solid patterns. The existence of these diverse patterns contributes to the heterogeneity and intricacy of AD, thereby rendering its diagnosis and classification more arduous. In recent years, there have been notable advancements in molecular profiling techniques, which have contributed to a deeper understanding of the molecular characteristics associated with AD [A5]. The role of genetic alterations, specifically mutations in the epidermal growth factor receptor (EGFR) gene or rearrangements in the anaplastic lymphoma kinase (ALK) gene, in the development and progression of AD has been widely acknowledged [A6]. The molecular modifications have not only contributed to the enhancement of diagnostic precision but also facilitated the identification of targeted therapeutic interventions that can enhance treatment efficacy.
The comprehensive management of AD necessitates a multidisciplinary strategy, encompassing various treatment modalities such as surgery, chemotherapy, targeted therapy, immunotherapy, and radiation therapy. The selection of these interventions is contingent upon the specific stage and molecular characteristics of the tumor. The primary approach for managing early-stage AD is surgical resection, whereas systemic therapies are commonly utilized for advanced or metastatic disease. The implementation of targeted therapeutics, including ALK inhibitors and EGFR tyrosine kinase inhibitors, has significantly improved the prognosis and overall survival rates of individuals with particular molecular abnormalities.
●Large Cell Carcinoma (LCC):
LCC is a variant of non-small cell lung cancer (NSCLC) that constitutes an estimated 10-15% of the total incidence of lung cancer [A7]. This condition is distinguished by the existence of sizable, undifferentiated cells that lack the discernible characteristics observed in other varieties of lung cancer cells. LCC is widely recognized as a highly aggressive variant of lung cancer characterized by its rapid growth and propensity for metastasis. LCC is histologically distinguished by the presence of notably large cells, exhibiting ample cytoplasm, prominent nucleoli, and a heightened rate of mitosis [A8]. Tumor cells demonstrate significant pleomorphism and do not display the differentiation patterns observed in other subtypes of non-small cell lung cancer, such as AD or SCC. The diagnosis of LCC is challenging due to its undifferentiated nature, which frequently results in the absence of specific markers or identifiable features [A9]. LCC has the potential to manifest in various regions of the lung and exhibits a propensity for peripheral growth. This condition may manifest either as a singular mass or as multiple nodules within the lung parenchyma. The tumor exhibits an increased tendency for metastasis, often disseminating to regional lymph nodes and remote anatomical locations. The heightened aggressiveness exhibited by LCC frequently results in a less favorable prognosis than that of other subtypes of NSCLC. The therapeutic strategies employed for the treatment of LCC are comparable to those utilized for other types of NSCLC. The prognosis is contingent upon the disease’s stage at the time of diagnosis and the patient’s overall health status. The main therapeutic approach for early-stage LCC is surgical resection, either as a standalone treatment or in conjunction with complementary modalities such as chemotherapy or radiation therapy. Nevertheless, as a result of its aggressive nature and frequently advanced stage upon diagnosis, a significant number of cases of LCC are not suitable for surgical removal. Chemotherapy is a frequently employed therapeutic approach for the management of advanced or metastatic large-cell carcinoma [A10]. Platinum-based combination regimens, such as the use of cisplatin or carboplatin in conjunction with other cytotoxic agents, are frequently employed in clinical practice. The efficacy of targeted therapies and immunotherapies for the treatment of NSCLC, particularly large-cell carcinoma, is currently under investigation. Differentiating large-cell carcinoma from other types of NSCLC or small-cell lung cancer can be challenging due to the absence of distinct features and markers. Precise histopathological assessment, immunohistochemical staining, and molecular profiling play crucial roles in differentiating LCC from other subtypes and providing valuable guidance for selecting appropriate treatment strategies.
●Squamous Cell Carcinoma (SCC):
SCC represents a subset of NSCLC, constituting an estimated 25-30% of the total incidence of lung cancer [A11]. The origin of this condition can be attributed to the squamous cells that line the airways within the lungs. SCC exhibits a robust correlation with smoking and tends to be predominantly localized in the central region of the lung, primarily within the larger bronchi [A12]. SCC is histologically characterized by the presence of malignant squamous cells exhibiting diverse levels of differentiation. The observed cellular morphology demonstrates squamous differentiation, which is distinguished by the presence of keratin pearls and intercellular bridges. The classification of SCC involves the categorization of tumor cells into subtypes, namely, well-differentiated, moderately differentiated, or poorly differentiated, depending on the level of differentiation exhibited. SCC typically manifests clinically with symptoms indicative of airway obstruction, including cough, wheezing, and hemoptysis (the expectoration of blood) [A13]. The inhalation of foreign objects can lead to atelectasis, a condition characterized by the collapse of lung tissue. Additionally, this phenomenon may be accompanied by paraneoplastic syndromes, which are clinical manifestations of cancer-induced hormonal or immunological impacts on organs located at a distance from the primary tumor site. The management of SCC is contingent upon various factors, including the disease’s stage, the patient’s general well-being, and the existence of specific molecular modifications. The main therapeutic approach for early-stage SCC is surgical resection. In instances where surgical intervention is not a viable option, the utilization of radiation therapy or a combination of chemotherapy and radiation therapy may be employed as a curative modality.
Systemic therapy plays a pivotal role in the management of advanced or metastatic SCC [A14]. Platinum-based chemotherapy regimens, such as the combination of cisplatin or carboplatin with other cytotoxic agents, are frequently employed in clinical practice. The treatment landscape for NSCLC, specifically SCC, has undergone a significant transformation in recent years due to the advent of targeted therapies and immunotherapies. Pharmaceutical agents that selectively target specific genetic aberrations, such as EGFR mutations or ALK rearrangements, have demonstrated notable effectiveness in certain subpopulations of individuals diagnosed with SCC. In addition to molecular targeted therapies, immune checkpoint inhibitors have exhibited significant efficacy in the management of NSCLC, including SCC. The drugs pembrolizumab and nivolumab function by inhibiting immune system checkpoints, thereby enhancing the immune system’s ability to identify and eliminate cancerous cells with greater efficacy. Despite the significant progress made in the development of treatment modalities, timely identification and overall prognosis of SCC continue to be difficult. The prognosis of patients with late-stage disease and its correlation with smoking status are generally inferior to those of patients with other subtypes of NSCLC. Additional investigations are required to elucidate the fundamental molecular mechanisms that propel SCC and to
formulate more efficacious targeted therapeutic approaches.
References
[A1] De Marzo, A. M., Nelson, W. G., Meeker, A. K., & Coffey, D. S. (1998). Stem cell features of benign and malignant prostate epithelial cells. The Journal of urology, 160(6), 2381-2392.
[A2] Thandra, K. C., Barsouk, A., Saginala, K., Aluru, J. S., & Barsouk, A. (2021). Epidemiology of lung cancer. Contemporary Oncology/Współczesna Onkologia, 25(1), 45-52.
[A3] Talia, K. L., Stewart, C. J., Howitt, B. E., Nucci, M. R., & McCluggage, W. G. (2017). HPV-negative Gastric Type Adenocarcinoma in Situ of the Cervix. The American Journal of Surgical Pathology, 41(8), 1023-1033.
[A4] Mukhopadhyay, S., & Katzenstein, A. L. A. (2011). Subclassification of non-small cell lung carcinomas lacking morphologic differentiation on biopsy specimens: utility of an immunohistochemical panel containing TTF-1, napsin A, p63, and CK5/6. The American journal of surgical pathology, 35(1), 15-25.
[A5] Eroles, P., Bosch, A., Pérez-Fidalgo, J. A., & Lluch, A. (2012). Molecular biology in breast cancer: intrinsic subtypes and signaling pathways. Cancer treatment reviews, 38(6), 698-707.
[A6] Li, T., Kung, H. J., Mack, P. C., & Gandara, D. R. (2013). Genotyping and genomic profiling of non–small-cell lung cancer: implications for current and future therapies. Journal of Clinical Oncology, 31(8), 1039.
[A7] Shaw, A. T., Yeap, B. Y., Mino-Kenudson, M., Digumarthy, S. R., Costa, D. B., Heist, R. S., ... & Iafrate, A. J. (2009). Clinical features and outcome of patients with non–small-cell lung cancer who harbor EML4-ALK. Journal of clinical oncology, 27(26), 4247.
[A8] Somerhausen, N. D. S. A., & Fletcher, C. D. (2000). Diffuse-type giant cell tumor: clinicopathologic and immunohistochemical analysis of 50 cases with extraarticular disease. The American journal of surgical pathology, 24(4), 479-492.
[A9] Delattre, O., Zucman, J., Melot, T., Garau, X. S., Zucker, J. M., Lenoir, G. M., ... & Thomas, G. (1994). The Ewing family of tumors–a subgroup of small-round-cell tumors defined by specific chimeric transcripts. New England Journal of Medicine, 331(5), 294-299.
[A10] Volante, M., Birocco, N., Gatti, G., Duregon, E., Lorizzo, K., Fazio, N., ... & Papotti, M. (2014). Extrapulmonary neuroendocrine small and large cell carcinomas: a review of controversial diagnostic and therapeutic issues. Human pathology, 45(4), 665-673.
[A11] Lonardo, F., Li, X., Kaplun, A., Soubani, A., Sethi, S., Gadgeel, S., & Sheng, S. (2010). The natural tumor suppressor protein maspin and potential application in non small cell lung cancer. Current pharmaceutical design, 16(16), 1877-1881.
[A12] Shimada, Y., Ishii, G., Nagai, K., Atsumi, N., Fujii, S., Yamada, A., ... & Ochiai, A. (2009). Expression of podoplanin, CD44, and p63 in squamous cell carcinoma of the lung. Cancer science, 100(11), 2054-2059.
[A13] Ryu, J. H., & Scanlon, P. D. (2001, November). Obstructive lung diseases: COPD, asthma, and many imitators. In Mayo Clinic Proceedings (Vol. 76, No. 11, pp. 1144-1153). Elsevier.
[A14] Oosting, S. F., & Haddad, R. I. (2019). Best practice in systemic therapy for head and neck squamous cell carcinoma. Frontiers in Oncology, 9, 815.
References
[1] J. Maher, Chimeric antigen receptor (CAR) T-cell therapy for patients with lung cancer: current perspectives, Onco Targets Ther. (2023) 515–532.
[2] S.K. Lakshmanaprabu, S.N. Mohanty, K. Shankar, N. Arunkumar, G. Ramirez, Optimal deep learning model for classification of lung cancer on CT images, Future Gen. Comput. Syst. 92 (2019) 374–382.
[3] P.M. Shakeel, M.A. Burhanuddin, M.I. Desa, Lung cancer detection from CT image using improved profuse clustering and deep learning instantaneously trained neural networks, Measurement 145 (2019) 702–712.
[4] I. Shafi, S. Din, A. Khan, I.D.L.T. Díez, R.D.J.P. Casanova, K.T. Pifarre, I. Ashraf, An effective method for lung cancer diagnosis from ct scan using deep learning-based support vector network, Cancers 14 (21) (2022) 5457.
[5] A.A. Shah, H.A.M. Malik, A. Muhammad, A. Alourani, Z.A. Butt, Deep learning ensemble 2D CNN approach towards the detection of lung cancer, Sci. Rep. 13 (1) (2023) 2987.
[6] G.M.M. Alshmrani, Q. Ni, R. Jiang, H. Pervaiz, N.M. Elshennawy, A deep learning architecture for multi-class lung diseases classification using chest X-ray (CXR) images, Alexand. Eng. J. 64 (2023) 923–935.
[7] S. Bharati, P. Podder, M.R.H. Mondal, Hybrid deep learning for detecting lung diseases from X-ray images, Inform. Med. Unlocked 20 (2020) 100391.
[8] A. Bhandary, G.A. Prabhu, V. Rajinikanth, K.P. Thanaraj, S.C. Satapathy, D.E. Robbins, N.S.M. Raja, Deep-learning framework to detect lung abnormality–a study with chest X-Ray and lung CT scan images, Pattern. Recognit. Lett. 129 (2020) 271–278.
[9] S. Arvind, J.V. Tembhurne, T. Diwan, P. Sahare, Improvised light weight deep CNN based U-Net for the semantic segmentation of lungs from chest X-rays, Results Eng. 17 (2023) 100929.
[10] V. Rajasekar, M.P. Vaishnnave, S. Premkumar, V. Sarveshwaran, V. Rangaraaj, Lung cancer disease prediction with CT scan and histopathological images feature analysis using deep learning techniques, Results Eng. 18 (2023) 101111.
[11] T. Mahmmod, N. Ayesha, M. Mujahid, A. Rehman, Customized deep learning framework with advanced sampling techniques for lung cancer detection using CT scans, in: 2024 Seventh International Women in Data Science Conference at Prince Sultan University (WiDS PSU), IEEE, 2024, pp. 110–115.
[12] A. Rehman, M. Harouni, F. Zogh, T. Saba, M. Karimi, F.S. Alamri, G. Jeon, Detection of lungs tumors in CT scan images using convolutional neural networks, IEEE/ACM Trans. Comput. Biol. Bioinform. 21 (4) (2023) 769–777.
[13] R. Jain, P. Singh, M. Abdelkader, W. Boulila, Efficient lung cancer detection using computational intelligence and ensemble learning, PLoS One 19 (9) (2024) e0310882.
[14] K.K. Patro, J.P. Allam, B.C. Neelapu, R. Tadeusiewicz, U.R. Acharya, M. Hammad, P. Pławiak, Application of Kronecker convolutions in deep learning technique for automated detection of kidney stones with coronal CT images, Inf. Sci. 640 (2023) 119005.
[15] M. Hammad, M. ElAffendi, A.A. Ateya, A.A. Abd El-Latif, Efficient brain tumor detection with lightweight end-to-end deep learning model, Cancers 15 (10) (2023) 2837.
[16] M. Agarwal, G. Rani, A. Kumar, P. Kumar, R. Manikandan, A.H. Gandomi, Deep learning for enhanced brain tumor detection and classification, Results. Eng. 22 (2024) 102117.
[17] A.A.A. El-Latif, S.A. Chelloug, M. Alabdulhafith, M. Hammad, Accurate detection of Alzheimer’s disease using lightweight deep learning model on MRI data, Diagnostics 13 (7) (2023) 1216.
[18] A. Ijaz, S. Akbar, B. AlGhofaily, S.A. Hassan, T. Saba, Deep learning for pneumonia diagnosis using cxr images, in: 2023 Sixth International Conference of Women in Data Science at Prince Sultan University (WiDS PSU), IEEE, 2023, pp. 53–58.
[19] K.K. Patro, J.P. Allam, M. Hammad, R. Tadeusiewicz, P. Pławiak, SCovNet: a skip connection-based feature union deep learning technique with statistical approach analysis for the detection of COVID-19, Biocybern. Biomed. Eng. 43 (1) (2023) 352–368.
[20] S. Saber, K. Amin, P. Pławiak, R. Tadeusiewicz, M. Hammad, Graph convolutional network with triplet attention learning for person re-identification, Inf. Sci. 617 (2022) 331–345.
[21] S. Maurya, S. Tiwari, M.C. Mothukuri, C.M. Tangeda, R.N.S. Nandigam, D.C. Addagiri, A review on recent developments in cancer detection using Machine Learning and Deep Learning models, Biomed. Signal. Process. Control 80 (2023) 104398.
[22] Lung Cancer Detection, Jillani, 2022, available online: https://www.kaggle.com/datasets/jillanisofttech/lung-cancer-detection. Accessed [28-12-2023].
[23] Chest X-ray images (pneumonia), Paul Mooney, 2018, available online: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia. Accessed [28-12-2023].
[24] K. Maji, S. Gupta, Evaluation of various loss functions and optimization techniques for MRI brain tumor detection, in: 2023 International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), IEEE, 2023, pp. 1–6.
[25] P. Das, S. Gupta, J. Patra, B. Mondal, ADAMAX optimizer and categorical crossentropy loss function-based CNN method for diagnosing Lung cancer, in: 2023 7th International Conference on Trends in Electronics and Informatics (ICOEI), IEEE, 2023, pp. 806–810.
[26] Olson, R. S., & Moore, J. H. (2016, December). TPOT: A tree-based pipeline optimization tool for automating machine learning. In Workshop on Automatic Machine Learning (pp. 66-74). PMLR.
[27] J.D. Romano, T.T. Le, W. Fu, J.H. Moore, TPOT-NN: augmenting tree-based automated machine learning with neural network estimators, Genet. Program. Evolvable Mach. 22 (2021) 207–227.
[28] G. Mohandass, G.H. Krishnan, D. Selvaraj, C. Sridhathan, Lung cancer classification using optimized attention-based convolutional neural network with DenseNet-201 transfer learning model on CT image, Biomed. Signal. Process. Control 95 (2024) 106330.
[29] L.J. Crasta, R. Neema, A.R. Pais, A novel deep learning architecture for lung cancer detection and diagnosis from computed tomography image analysis, Healthc. Anal. 5 (2024) 100316.
[30] A. Saha, S.M. Ganie, P.K.D. Pramanik, R.K. Yadav, S. Mallik, Z. Zhao, VER-Net: a hybrid transfer learning model for lung cancer detection using CT scan images, BMC Med. Imaging 24 (1) (2024) 120.
[31] S. Ashwini, J.R. Arunkumar, R.T. Prabu, N.H. Singh, N.P. Singh, Diagnosis and multi-classification of lung diseases in CXR images using optimized deep convolutional neural network, Soft Comput. 28 (7) (2024) 6219–6233.
[32] V.J. Jaya, S. Krishnakumar, Multi-classification approach for lung nodule detection and classification with proposed texture feature in X-ray images, Multimed. Tools. Appl. 83 (2024) 3497–3524, https://doi.org/10.1007/s11042-023-15281-5.