IEEE Camera Ready Paper

The document presents a study on using the EfficientNetB1 model for automated lung cancer detection through histopathological biopsy images. It details the methodology involving data preprocessing, feature extraction, and classification, achieving significant accuracy in identifying cancer subtypes. The findings highlight the model's potential to enhance diagnostic processes and improve patient outcomes in oncology.

Uploaded by

Anish Goyal
Copyright
© © All Rights Reserved

EfficientNetB1 Model for Lung Cancer Detection using Biopsy Images

Jess John Aaradhya Deotale


Department of Computer Engineering Department of Computer Engineering
Don Bosco Institute of Technology Don Bosco Institute of Technology
Mumbai, India Mumbai, India
[email protected] [email protected]

Figo Fernandez Dipti Jadhav


Department of Computer Engineering Department of Computer Engineering
Don Bosco Institute of Technology Don Bosco Institute of Technology
Mumbai, India Mumbai, India
[email protected] [email protected]

Abstract—Lung cancer, a pervasive and life-threatening ailment, necessitates early and precise diagnosis. This research employs Convolutional Neural Networks (CNNs) to automate the detection of lung cancer in histopathological biopsy images, addressing the challenges of timely and accurate diagnosis. Because manual assessment by pathologists is error-prone and time-consuming, an automated approach becomes imperative. The scope of the study includes enhancing interpretability, classifying specific lung cancer subtypes, real-time intraoperative analysis, and extending the application to other cancer types. The methodology uses a pre-trained EfficientNet-based model for image classification, which distinguishes benign from malignant lung tissue, as evidenced by robust training results. Moreover, the model exhibits potential for personalization in lung cancer diagnosis and treatment. Research findings affirm that machine learning models, specifically the EfficientNet-based architecture, markedly improve lung cancer detection accuracy. The model's proficiency in subtype differentiation and its capacity for real-time surgical analysis represent significant strides in lung cancer diagnostics and treatment. In conclusion, this project addresses the critical need for accurate and timely lung cancer diagnosis, providing a promising advancement in combating this devastating disease. The developed model holds transformative potential in radiology and oncology, serving as a valuable tool for medical professionals and contributing to enhanced patient outcomes.

Keywords—EfficientNet-based model, Convolutional Neural Networks (CNNs), Cancer detection.

I. INTRODUCTION

The most recent IARC research estimates over 19.29 million new cases of cancer worldwide in 2021, with the United States accounting for roughly 11.8% of these cases, making it the country with the second-highest percentage of new cases worldwide [9]. At 11.4 percent of all cancer cases, lung cancer is still significantly more common than other cancer forms [10].

Diagnosing an illness involves numerous phases, from gathering samples to training specialists to make decisions based on the findings. AI techniques have been used to group and forecast many types of biological data. By utilizing deep learning (DL) techniques, computers can already interpret and handle high-dimensional information, such as images, 3D body scans, and video. Machine learning methods comprise a number of procedures for managing digital data. First, pre-processing removes noise from the raw images that could otherwise affect the results. Next, the feature extraction step obtains the distinctive features of each image retained from the pre-processing stage. After that, in the feature selection process, the most important features are identified from the extracted data [11].

The code represents a comprehensive deep learning approach to lung cancer classification using a pre-trained EfficientNetB1 model. It begins by setting up the necessary environment, including configuring random seeds and defining constants. The dataset, consisting of lung images categorized into three classes (No Cancer, Adenocarcinoma, Squamous Cell Carcinoma), is then prepared for training and validation. A Convolutional Neural Network (CNN) model is constructed, incorporating a pre-trained EfficientNetB1 base with frozen weights, followed by additional dense layers for feature extraction and classification. The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss, and training is monitored using callbacks, including learning rate reduction on plateau. The model's performance is visualized through accuracy and loss plots.

The code also includes data visualization elements, such as a bar plot illustrating the distribution of classes in the dataset. A confusion matrix and a classification report are generated to evaluate the model's performance on a validation subset. Furthermore, the script demonstrates how to save and load model weights for future use. Finally, sample predictions on the validation dataset are visualized to assess the model's ability to correctly classify lung images. Overall, the code provides a complete framework for building, training, evaluating, and visualizing a deep learning model for lung cancer classification.

II. REVIEW OF THE LITERATURE

In [1], the paper introduces a hybrid deep learning model designed for the classification of lung tissue images from the Lung and Colon Cancer Histopathological Image dataset. This model incorporates three sub-extractors: the
inception_v3 network, HOG, and DAISY, which extract image features from various perspectives. The extracted features are subsequently input into a 3-layer fully connected network for classification. The model achieves an accuracy of 99.97% on the validation set, demonstrating high accuracy across different cancer types.

In [2], the paper explores lung carcinoma detection through deep learning methods, specifically employing Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). These methods exhibit potential in precisely identifying and classifying nodules in chest radiographs or CT scans. To enhance detection results and computational efficiency, the study incorporates Particle Swarm Optimization (PSO). The PSO-RNN method for lung cancer diagnosis is developed using training and testing data from CT scans of 19 patients, with the goal of reducing training time and improving detection accuracy.

In [3], lung cancer is acknowledged as a lethal disease, prompting the use of deep learning to identify small cells in histopathological images. The significance of data preprocessing in categorizing pathological images is emphasized, and data visualization is employed to aid the comprehension of large datasets. Artificial neural networks are used for categorizing medical images, and the paper underscores the importance of early detection of lung cancer for improved treatment options.

In [4], the research centers on employing Deep Learning (DL) and Digital Image Processing (DIP) to automate the identification of cancer cells, specifically targeting colon and lung cancer. The study illustrates the efficacy of AI-guided diagnosis in precisely classifying cancer cells and expediting the diagnostic process. Furthermore, the use of Convolutional Neural Networks (CNNs) has yielded promising results, achieving 96% accuracy in identifying polyps in colonoscopy images.

In [5], a hybrid lung cancer classification model is introduced, attaining an accuracy of 99.96% through the application of deep learning. The model incorporates Inception_v3, HOG, and DAISY feature extraction methods. Implemented in Python 3.7.9 with PyTorch 1.7.1 and Torchvision 0.8.2, the study uses the Lung and Colon Cancer Histopathological Image dataset. Evaluation metrics encompass accuracy, weighted average precision, recall, and F1-score. Confusion matrix results for models employing different feature extractors are also presented.

In [6], the document outlines a study focused on classifying lung cancer images through the application of various machine learning algorithms. It provides in-depth details on the CatBoost algorithm, as well as linear discriminant analysis (LDA), linear regression (LR), and classification and regression trees (CART). The research employs a dataset comprising 15,000 lung cancer histopathological images and attains a prediction accuracy of 99.80% with the CatBoost algorithm. Additionally, the document describes the methodology, encompassing data analysis, image processing, and classification, with the objective of improving early detection and prediction of lung cancer through deep learning-based models.

In [7], lung cancer is identified as a major concern, and AI models are used to assist in classifying cancer cells. EfficientNetB2 showed superior accuracy, and clustering techniques were used for grading. The study aims to alleviate the burden on pathologists, especially in regions with limited access to pathological centers. Histopathological images provide detailed insights into cancer grade, enabling the development of accurate AI models. Various datasets and models have achieved accuracies ranging from 86% to 99%.

In [8], the research presents a deep-learning approach for lung cancer detection using histopathological images. The proposed CNN model outperforms previous methods, achieving over 99.5% accuracy. The study validates the model's effectiveness using a dataset of lung histopathology images. The research contributes a lightweight deep-learning strategy for accurate lung cancer diagnosis, integrating feature extraction and classification. The findings demonstrate the potential of end-to-end CNN-based systems for automated computer-aided lung cancer detection.

Earlier research, working with small data sets, produced somewhat modest outcomes. In addition, a number of prior methods are intricate and suffer from overfitting and class-imbalance issues. In this study, we present an approach intended to overcome these shortcomings of previous work in this field on both small and large data sets. Furthermore, the proposed model is an end-to-end lightweight model that lowers computational complexity.

III. APPROACH

This section first describes the dataset used in this study and then gives an overview of the proposed lung cancer detection technique.

A. Description of the Histopathological Image Dataset Used

We used the LC25000 dataset [12] to evaluate our approach. This dataset's 25,000 images are divided into 5 groups, each containing 5,000 images. Every JPEG image is 768 pixels by 768 pixels in resolution. The five categories are lung squamous cell carcinoma, lung adenocarcinoma, benign lung tissue, colon adenocarcinoma, and benign colon tissue. We selected the three classes associated with the lung. All the data in this study have been standardized to batches with dimensions of (32, 180, 180, 3). Twelve thousand images, or 80 percent of the lung images, were set aside for training, and three thousand images, or 20 percent, were set aside for testing. Figure 1 displays images from the LC25000 lung histopathology samples. The first image is labeled adenocarcinoma, the second benign tissue, and the third squamous cell carcinoma.
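The figures quoted for the dataset fit together arithmetically: three lung classes of 5,000 images give 15,000 images, and an 80/20 partition of those yields 12,000 training and 3,000 test images. The sketch below illustrates that partition using only the Python standard library. It is not the authors' code: the seed value, the index-based representation of images, and the helper name `split_indices` are assumptions made for illustration.

```python
import random

# Values taken from the text; SEED is an assumption (the paper does not state one).
CLASS_NAMES = ["No Cancer", "Adenocarcinoma", "Squamous Cell Carcinoma"]
IMAGES_PER_CLASS = 5_000
TEST_FRACTION = 0.2
BATCH_SIZE = 32
IMAGE_SIZE = (180, 180)
SEED = 42


def split_indices(n_items, test_fraction, seed):
    """Shuffle item indices reproducibly and split them into train/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]


total_images = len(CLASS_NAMES) * IMAGES_PER_CLASS
train_idx, test_idx = split_indices(total_images, TEST_FRACTION, SEED)

# One batch of RGB images: (batch, height, width, channels).
batch_shape = (BATCH_SIZE, *IMAGE_SIZE, 3)

print(total_images, len(train_idx), len(test_idx), batch_shape)
```

In a TensorFlow pipeline the same split is usually delegated to the dataset-loading utility, with the fraction, seed, image size, and batch size passed as arguments.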
Fig. 1. Images of lung histopathological samples: (i) adenocarcinoma, (ii) benign tissue, (iii) squamous cell carcinoma.

B. The Proposed Approach

Our work presents a comprehensive approach to developing and evaluating a deep learning model for the classification of lung images into three distinct categories: "No Cancer," "Adenocarcinoma," and "Squamous Cell Carcinoma." The methodology encompasses several key stages, each aimed at ensuring robust model performance and providing insights into the classification process. First, in the data preprocessing and exploration phase, the lung image datasets are loaded and processed using TensorFlow's image dataset utilities.

The datasets are split into training and validation sets, and an analysis of class distribution is conducted to identify potential data imbalances. Visualizations, including bar plots, illustrate the distribution of images across the classes. Subsequently, the model construction and training strategy are described, focusing on transfer learning with the EfficientNetB1 architecture as the base model. Additional dense layers are appended to the base model to perform classification. The training process is managed through callbacks, including early stopping and learning rate reduction, to prevent overfitting and optimize model performance. Accuracy and loss are monitored and visualized over 25 epochs to track the model's learning progress. Figure 2 illustrates the proposed model's architecture.

Following model training, the trained model is assessed using the validation dataset. Sample predictions are generated and visualized to provide qualitative insights into the model's performance. Additionally, quantitative analyses, including a confusion matrix and a classification report, offer a more in-depth understanding of the model's classification accuracy and potential areas for improvement. We also explore further analysis and visualization, including the visualization of learning rate schedules and the generation of sample predictions on the validation dataset. These analyses contribute to a comprehensive understanding of the model's learning process and predictive capabilities, enhancing the interpretability and reliability of the classification results.

Fig. 2. System Architecture of the CNN model.

IV. RESULTS

In this study, histopathology images were used to construct a deep learning framework for classifying lung cancer. The proposed convolutional neural network (CNN) model, based on the EfficientNetB1 architecture, was assessed on histological images of lung cancer, and the outcomes were analyzed.

The data were partitioned using an 80-20 split for training and testing, respectively, using Python 3.7. Accuracy, precision, recall, and F1-score were the key performance metrics for evaluating the system's generalization and classification capabilities. The dataset consisted of 5,000 images in each of three classes (15,000 in total): No Cancer, Adenocarcinoma, and Squamous Cell Carcinoma.

These metrics are defined as follows:

$$\text{Precision} = \frac{T_p}{T_p + F_p}$$

$$\text{Recall} = \frac{T_p}{T_p + F_n}$$

$$\text{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}$$

$$\text{F1-Score} = \frac{2 \cdot (\text{Precision} \cdot \text{Recall})}{\text{Precision} + \text{Recall}}$$

where $T_p$ = True Positive, $F_p$ = False Positive, $T_n$ = True Negative, and $F_n$ = False Negative.
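As a worked illustration of the four metrics, the sketch below computes them from the counts of true positives, false positives, true negatives, and false negatives for a single class. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy and F1-score from confusion counts.

    Assumes the denominators are non-zero, i.e. the class has at least one
    predicted positive and at least one actual positive.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for one class in a one-vs-rest view (not from the paper).
metrics = classification_metrics(tp=90, fp=10, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a three-class problem such as this one, the counts are obtained per class by treating that class as positive and the other two as negative.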
A. GUI Results and Analysis

In this section, we present the results obtained by running the lung cancer detection models through our graphical user interface (GUI). The GUI for this project is built with the Gradio library, which allows quick creation of web apps for visualizing machine learning models.

Fig. 3. GUI representation for EfficientNetB1 model for no-cancer biopsy image

Figure 3 shows how biopsy image input is integrated with the GUI. The user is given an input field in which to insert a biopsy image, which can then be submitted to the model or cleared to make room for a new image. Once the image has been processed, the results appear next to the image, showing the predicted lung cancer class. There is also a flag option for marking predictions that are ambiguous or need more scrutiny.

For example, the biopsy image fed to the model in Figure 3 was one of no cancer, and the model accurately determined the class and reported it alongside the confidence level of its prediction.

Fig. 4. GUI representation for EfficientNetB1 model for adenocarcinoma

Figure 4 shows a biopsy image of adenocarcinoma that the model identified correctly, reporting 100 percent confidence in its prediction.

V. DISCUSSION

A. EFFICIENTNETB1 CNN MODEL PERFORMANCE

In this section, we assess the EfficientNetB1 CNN model's performance across 25 epochs of training.

A notable trend was the consistent improvement in both accuracy and loss of the CNN model with each epoch, with occasional exceptions, as depicted in Figure 5.

Fig. 5. Accuracy and loss of the EfficientNetB1 model.

Likewise, the confusion matrix, which compares the predicted and actual outcomes of the model for each of the three classes, provided the following classification results:

Table 1. Output of the model

Class                     Precision  Recall  F1-Score  Support
No Cancer                 0.33       0.33    0.33      981
Adenocarcinoma            0.33       0.33    0.33      977
Squamous Cell Carcinoma   0.36       0.36    0.36      1042
Accuracy                                     0.34      3000
Macro Average             0.34       0.34    0.34      3000
Weighted Average          0.34       0.34    0.34      3000
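The macro and weighted averages reported in Table 1 can be checked from the per-class rows. The macro average is the unweighted mean over the three classes, while the weighted average weights each class by its support. The sketch below reproduces both for the F1-score column; the numbers are those of Table 1, and the helper names are illustrative.

```python
# Per-class F1-scores and supports as reported in Table 1.
f1_scores = {
    "No Cancer": 0.33,
    "Adenocarcinoma": 0.33,
    "Squamous Cell Carcinoma": 0.36,
}
support = {
    "No Cancer": 981,
    "Adenocarcinoma": 977,
    "Squamous Cell Carcinoma": 1042,
}


def macro_average(scores):
    """Unweighted mean of the per-class scores."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, weights):
    """Mean of the per-class scores, weighted by the number of samples per class."""
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total


macro_f1 = macro_average(f1_scores)
weighted_f1 = weighted_average(f1_scores, support)
print(f"{macro_f1:.2f} {weighted_f1:.2f}")  # prints "0.34 0.34"
```

Because the three supports are nearly equal, the two averages coincide at two decimal places. With strongly imbalanced classes they could differ noticeably.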
Fig. 6. Summary of the EfficientNetB1 CNN architecture.

B. VGG16 CNN MODEL PERFORMANCE

In this section, we assess the VGG16 CNN model's performance across 25 epochs of training. Unlike the EfficientNetB1 model, the VGG16 model exhibited significant fluctuations in both accuracy and loss from epoch to epoch, especially in the val_accuracy and val_loss curves. While these fluctuations could suggest a general trend of improvement or decline, no such trend can be conclusively identified, as depicted in Figure 7.

Fig. 7. Accuracy and loss of the VGG16 model.

The VGG16 CNN architecture is summarized in Figure 8, which displays four layers: Lambda, Functional, Dense, and Dense 1.

Fig. 8. Summary of the VGG16 CNN architecture.

C. RESNET50 CNN MODEL PERFORMANCE

In this section, we assess the ResNet50 CNN model's performance across 13 epochs of training.

The results of the ResNet50 CNN model were inconclusive, with no clear trend in either accuracy or loss. Although accuracy increased with each epoch, val_accuracy displayed the opposite behavior. The same inconsistency was observed in the loss, as illustrated in Figure 9.

Fig. 9. Accuracy and loss of the ResNet50 model.

The ResNet50 CNN architecture is summarized in Figure 10, which displays four layers: Lambda, Functional, Dense 2, and Dense 3.

Fig. 10. Summary of the ResNet50 CNN architecture.

D. COMPARISON BETWEEN THE THREE MODELS

In this section, we compare the EfficientNetB1, ResNet50, and VGG16 models in depth. We explore the accuracy and loss metrics of these models and present our findings in Figure 11. Our analysis reveals a distinct contrast in both accuracy and loss among the three models. The EfficientNetB1 model consistently demonstrates higher accuracy than the other two models, with ResNet50 closely following, although it experiences notable fluctuations in accuracy at the beginning and end of training. Conversely, the VGG16 model exhibits an oscillating trend in accuracy, with an overall inconclusive pattern that gradually stabilizes towards the end of the training period.

Fig. 11. Comparison of Accuracy and Loss for all three models
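The terms "fluctuations" and "oscillating trend" used in this section are qualitative. One simple way to make them measurable is the mean absolute epoch-to-epoch change of a validation-accuracy curve. This metric is a suggestion for illustration only and is not used in the paper. The two histories below are made-up numbers, not values from this study, and the function name `mean_abs_change` is hypothetical.

```python
def mean_abs_change(history):
    """Average absolute difference between consecutive epochs of a metric."""
    if len(history) < 2:
        raise ValueError("need at least two epochs to measure change")
    diffs = [abs(later - earlier) for earlier, later in zip(history, history[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical validation-accuracy curves (illustrative only).
steady_curve = [0.80, 0.85, 0.88, 0.90, 0.91]       # improves smoothly
oscillating_curve = [0.60, 0.80, 0.55, 0.82, 0.58]  # swings every epoch

steady_volatility = mean_abs_change(steady_curve)
oscillating_volatility = mean_abs_change(oscillating_curve)
print(f"steady: {steady_volatility:.4f}, oscillating: {oscillating_volatility:.4f}")
```

A smaller value indicates a smoother curve. Because the figure depends on the number of epochs, it is best compared between runs of similar length.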
As for the confusion matrices of the three models, barring some insignificant differences, the predicted labels of all three models align with the true labels, indicating that the models were largely successful in classifying the lung cancer biopsy images into the Adenocarcinoma, Squamous Cell Carcinoma, and Benign Tissue labels, as depicted in Figures 12, 13, and 14.

Fig. 12. Confusion Matrix of ResNet50 Model

Fig. 13. Confusion Matrix of VGG16 Model

Fig. 14. Confusion Matrix of EfficientNetB1 Model

VI. CONCLUSION

To categorize lung tissue images from the Lung and Colon Cancer Histopathological Image dataset (LC25000) [12], a hybrid deep learning model is proposed in this study. The 768 x 768 images obtained from LC25000 were scaled to 224 x 224 so that they could be used as model input. A feature extractor and a classifier make up the model.

The main contribution of this work is a lightweight deep-learning approach for end-to-end CNN-based lung cancer diagnosis using the EfficientNetB1 model. After comparison with other models such as ResNet50 and VGG16, EfficientNetB1 was deemed more effective across all measures considered, including accuracy, loss, precision, and recall. The efficacy of the proposed system is evaluated and contrasted with other methods in this field using a database
of histopathology images. According to the results, our method outperformed most previous deep-learning lung cancer diagnosis methods. Our model's highest accuracy is 0.995 (99.5 percent). Compared to earlier deep models, the proposed method for diagnosing lung cancer is more robust and efficient. In the future, we plan to investigate our deep model's performance on more datasets. In addition, we may apply optimization strategies in conjunction with our deep model to identify the most useful of the extracted deep features.

Looking ahead, the methodology could be extended to other cancer types, along with integrating multi-modal data for more comprehensive analysis, exploring personalized medicine approaches, enabling real-time intraoperative analysis, validating the model on diverse datasets, and investigating optimization strategies for further performance enhancement. Deep learning models for cancer detection have the potential to enhance patient outcomes and contribute to the advancement of science.

REFERENCES
[1] Chen, M., Huang, S., Huang, Z. and Zhang, Z., 2021, September. Detection of lung cancer from pathological images using cnn model. In 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI) (pp. 352-358). IEEE.
[2] Sakr, A.S., 2022, October. Automatic Detection of Various Types of Lung Cancer Based on Histopathological Images Using a Lightweight End-to-End CNN Approach. In 2022 20th International Conference on Language Engineering (ESOLEC) (Vol. 20, pp. 141-146). IEEE.
[3] Nannapaneni, D., Saikam, V.R.S.V., Siddu, R., Challapalli, V.M. and Rachapudi, V., 2023, February. Enhanced Image-based Histopathology Lung Cancer Detection. In 2023 7th International Conference on Computing Methodologies and Communication (ICCMC) (pp. 620-625). IEEE.
[4] Cañada, J., Cuello, E., Téllez, L., García, J.M., Velasco, F.J. and Cabrera, J., 2022, November. Assistance to lung cancer detection on histological images using Convolutional Neural Networks. In 2022 E-Health and Bioengineering Conference (EHB) (pp. 1-4). IEEE.
[5] Maheshwari, U., Kiranmayee, B.V. and Suresh, C., 2022, December. Diagnose Colon and Lung Cancer Histopathological Images Using Pre-Trained Machine Learning Model. In 2022 5th International Conference on Contemporary Computing and Informatics (IC3I) (pp. 1078-1082). IEEE.
[6] Mohalder, R.D., Sarkar, J.P., Hossain, K.A., Paul, L. and Raihan, M., 2021, September. A deep learning based approach to predict lung cancer from histopathological images. In 2021 International Conference on Electronics, Communications and Information Technology (ICECIT) (pp. 1-4). IEEE.
[7] Radhakrishnan, J.K., Aravind, K.S., Nambiar, P.R. and Sampath, N., 2022, June. Detection of Non-small cell Lung Cancer using Histopathological Images by the approach of Deep Learning. In 2022 2nd International Conference on Intelligent Technologies (CONIT) (pp. 1-11). IEEE.
[8] Mridha, K., Islam, M.I., Ashfaq, S., Priyok, M.A. and Barua, D., 2022, November. Deep Learning in Lung and Colon Cancer classifications. In 2022 International Conference on Advances in Computing, Communication and Materials (ICACCM) (pp. 1-6). IEEE.
[9] Wild, C.P., Weiderpass, E. and Stewart, B.W., 2020. Cancer research for cancer prevention World Cancer Report. World Health Organization: Lyon, France.
[10] PAHO and WHO, available from: https://www.paho.org/en/topics/cancer (Last accessed: 01/02/2024)
[11] Acs, B., Rantalainen, M. and Hartman, J., 2020. Artificial intelligence as the next step towards precision pathology. Journal of Internal Medicine, 288(1), pp.62-81.
[12] Borkowski, A.A., Bui, M.M., Thomas, L.B., Wilson, C.P., DeLand, L.A. and Mastorides, S.M., 2019. Lung and colon cancer histopathological image dataset (lc25000). arXiv preprint arXiv:1912.12142.
