Deep Learning and Machine Learning Algorithms to Predict Lung Cancer
2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE)

Abstract— Lung cancer is one of the deadliest illnesses for patients and is often incurable. Patient survival rates are greatly increased by early cancer prognosis and identification. The Computed Tomography (CT) scan is an important imaging tool for lung cancer diagnosis. However, manual inspection of CT scans is laborious and prone to mistakes. Lung cancer can be recognized automatically by using the image processing techniques of Computer-Aided Diagnosis (CAD). In recent years, researchers have proposed various approaches for accurate prediction using deep learning (DL) and machine learning (ML) techniques. This paper reviews a number of methodologies used in lung cancer prediction, including Artificial Neural Network (ANN), Logistic Regression (LR), Support Vector Machine (SVM), K-means clustering, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM).

Keywords—computer-aided diagnosis, computed tomography, deep learning, lung cancer, machine learning

I. INTRODUCTION

Chronic diseases are a serious global health problem, and many of them are heterogeneous, with various clinically significant subtypes [1]. Lung cancer is a chronic disease and one of the deadliest cancers in terms of prevalence and mortality. Cancer can strike anyone at any age and in any part of the body. Early diagnosis of lung cancer from CT scan images is important and helps save lives [2]. A CT scan, X-ray, magnetic resonance imaging (MRI) and Positron Emission Tomography (PET) can all be used to diagnose lung cancer [3][4]. Among these modalities, the CT scan yields the greatest number of positive instances for lung cancer diagnosis [5]. However, manual examination is time-consuming, prone to misclassification, and may result in inter-observer variability [6].

CT scanning is an efficient method for identifying cancer, and radiotherapy is a significant treatment. Moreover, CT offers low computational cost, fast imaging, wide availability, and efficient categorization and detection [7][8]. Computer-aided diagnosis (CAD) is a popular diagnostic tool for the early detection of diseases. Artificial Intelligence (AI) approaches play a role in the early diagnosis of fatal diseases, particularly in lung cancer detection [9][10]. Machine Learning (ML) techniques are used for the recognition and detection of various kinds of images and signals in biological applications. Hence, DL-based medical image analysis can support tasks such as segmentation, detection and classification. Recently, DL methods have improved prediction performance in disease classification as well as image-based detection. DL models have contributed substantially to medical image analysis for both segmentation and classification [11][12].

II. LITERATURE SURVEY

Wankhade and Vigneshwari [13] implemented early diagnosis of lung cancer using Hybrid Neural Network-Based Cancer Cell Detection (CCDC-HNN). A hybrid of an advanced Three-Dimensional CNN (3D-CNN) and an RNN was used for classification to enhance diagnostic accuracy. A deep neural network (DNN) was applied to extract features from the CT image, and the classifier assigned each tumor to one of two classes, benign or malignant. However, the 3D-CNN needed a large amount of computational resources to extract features because of the large size of medical images.

Maleki and Niaki [14] introduced a CNN to process the CT scan, and an ANN was used to classify the images. Image resizing and denoising techniques were used for image preprocessing. A watershed method was used to separate the background and foreground in image segmentation. Dimensionality reduction and feature selection were performed, and the cancer was classified using three ML algorithms. The model performance was significantly enhanced because the ML algorithms were focused on the lung. However, this method misclassified some images because of the limited amount of data provided during training.

Shafi et al. [15] developed a new approach to classify lung cancer nodules from CT images based on Maximum Intensity Projection (MIP). This approach developed a DL-enabled SVM. The introduced CAD approach identified pathological as well as physiological changes in the soft-tissue cross-section of lung cancer. The model was first trained to identify cancer using chosen profile values, and was then tested and validated on CT scans for an accurate diagnosis of lung cancer. However, this approach required a large amount of data and a feature set for training, and the model was less robust on noisy data.

Pradhan et al. [16] presented an optimized DL method with a new feature extraction approach based on feature integration for lung cancer diagnosis. A correlation-based optimized weighted feature extraction approach using Self Adaptive Sea Lion Optimization (SA-SLnO) was introduced, in which a meta-heuristic was used to optimize the weights. This approach integrated t-SNE features with Principal Component Analysis (PCA) features through weight optimization. It significantly enhanced the accuracy with minimum computational time. However, the performance of this approach was affected by a longer search time.

Nanglia et al. [17] developed a hybrid of SVM and a Feed-Forward Back Propagation Neural Network (FFBPNN) to identify lung cancer. A three-block approach was introduced for classification, in which one block extracted features using the SURF approach with optimization by a Genetic Algorithm (GA), and the terminal block performed classification using the FFBPNN. The FFBPNN achieved better accuracy through the integration of GA and SVM. However, this approach focused only on larger problems and was not suited to simpler problems.

III. TAXONOMY

Lung cancer prediction methods are mainly divided into ML and DL techniques. Each of these two categories contains various methods for predicting lung cancer and providing better results. Fig. 1 shows the taxonomy for lung cancer prediction.

Fig. 1. Taxonomy of lung cancer prediction

A. Lung Cancer Prediction

Patients with lung cancer have a high death rate because the illness is often incurable. Patients' chances of survival can be increased by early detection of lung cancer. Various diagnostic approaches, such as image processing and computer-based diagnosis, have been developed for the prediction of lung cancer. However, early and accurate diagnosis remains a challenge for these methods. Lung cancer prediction is performed using both ML and DL techniques, which are discussed in detail below.

B. Machine Learning Techniques

ML algorithms are mathematical models used to find and understand the underlying patterns in data. Machine learning (ML) is a collection of computer algorithms that can recognize patterns in data, classify it, and make predictions based on the data already available. ML is divided into two types: supervised and unsupervised approaches. Different ML techniques are used, such as SVM, LR, RF, ANN and K-means clustering. These algorithms are described below.

1) Support Vector Machine: Kareem et al. [18] developed a system for identifying lung cancer in the IQ-OTH/NCCD CT scan dataset. This approach contains three major procedures: feature extraction, segmentation, and data enhancement. In pre-processing, Gaussian filtering was employed to reduce noise and improve the image quality. The Otsu thresholding approach was used for the extraction of the features. The SVM was used for classification and assigned the collected data to three classes: normal, benign and malignant. Several SVM kernels and feature extraction approaches were evaluated.

2) Logistic Regression: Yang et al. [19] implemented a radiomics nomogram approach based on CT radiomic features and clinicopathological features to identify a tumor. The Random Forest (RF) approach was used to select radiomic features from CT lung cancer images. Multivariate Logistic Regression (LR) was used to predict the Durable Clinical Benefit (DCB) of Immune Checkpoint Inhibitors (ICIs) by combining the Rad-score with multivariable Cox proportional hazards regression analysis. The radiomics approach was performed with a Rad-score similar to that of the radiomics approach. The multivariate Cox proportional hazards regression analysis was developed to identify progression-free survival. However, the model was built on a relatively small cohort because of its retrospective, single-center data.

3) Artificial Neural Network: Apsari et al. [20] presented an automatic digital classification algorithm based on an ANN and the Self Organizing Map (SOM) approach. The images were segmented through thresholding to obtain the areas of lung cancer, followed by morphological operations such as erosion and dilation. The area, shape and perimeter were extracted from the images and then provided to the ANN classifier. The ANN classified the images, according to the similarity of their features, into three groups: healthy lung, stage I lung cancer and stage II lung cancer.

4) Random Forest with K-means Clustering: Bhattacharjee et al. [21] presented a distinct method for malignant nodule detection from CT images together with a visualization tool to depict the extracted features. This approach deployed a hybrid method based on a K-means visualization tool and an optimized RF classifier, which visualized the malignant and non-malignant clusters and tuned the model hyperparameters to yield the best results. By using the visualization approach, it was possible to identify the cancerous cluster and make deductions from it.

C. Deep Learning Techniques

DL is a branch of ML that is based on ANNs. DL can learn difficult problems and complex relationships within the data. Different DL techniques are used, such as DCNN, RNN, LSTM and DNN. These algorithms are described below.

1) Deep Convolutional Neural Network: Khan and Ansari [22] introduced lung cancer classification using a Deep CNN (DCNN). Preprocessing was applied to the input CT images before they were fed to the network so that all images had equal size. The DCNN operated in two phases: training and testing. In training, this approach classified lung nodules as cancerous (malignant) or noncancerous (benign) to determine the classification accuracy. In testing, an unknown image was provided as an input to the classification model.

2) Recurrent Neural Network: Gunjan et al. [23] focused on feature extraction and training for the identification of lung cancer, using categorization approaches to mine relevant and accurate data. Recurrent Neural Networks (RNNs) are neural networks that integrate feedback loops. The Grey Wolf Optimization (GWO) and RNN approaches were examined. Computer-Aided Diagnosis (CAD) was used as the method for early diagnosis of lung cancer. The approach aimed to classify nodules as malignant or non-malignant and to give results with higher accuracy.

3) Long Short-Term Memory Convolutional Neural Network: Marentakis and colleagues [24] examined the possibilities of non-small cell lung cancer (NSCLC) analysis by applying various feature extraction and classification techniques to CT images. Initially, this approach examined an NSCLC classification process based on radiomic analysis of the 3D tumor region of interest (ROI). Various pre-trained CNN models were then used for the classification process. The CNN with LSTM approach was used to fuse the data over the inherent spatial coherency of the tumor's CT slices and to classify the CT images. The classifier assigned CT images to two major classes: adenocarcinoma (AC) and squamous cell carcinoma (SCC).

4) Deep Neural Network: Shakeel et al. [25] introduced a new image processing and ML approach for the prediction of lung cancer. CT scans of non-small cell lung cancer were gathered in order to make a prognosis about the disease. A multi-level brightness-conserved model was used to process the obtained images. This model effectively analyzed every pixel, removed noise, and improved the quality of the image. The Deep Neural Network (DNN) was used to segment the affected region from the image through its network layers, and a number of features were then extracted. The hybrid spiral optimization intelligent-generalized method was used for the selection of extracted features and, finally, ensemble classifiers were used for classification.

IV. COMPARATIVE ANALYSIS

Lung cancer prediction methods are evaluated against previous approaches to improve the effectiveness of a model. Comparative analysis is important for developing models and effectively enhancing their performance. Table 1 presents the comparative analysis of existing methods.
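A comparison of this kind can be illustrated with a minimal sketch. It is not taken from any of the surveyed papers: it cross-validates three of the classifier families discussed in Section III (SVM, RF and LR) on a synthetic feature matrix generated with scikit-learn's make_classification, which stands in for features extracted from CT images. The function name compare_classifiers, the dataset size and all hyperparameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(n_samples=300, n_features=20, seed=0):
    """Return the mean 5-fold cross-validated accuracy of each classifier."""
    # Synthetic stand-in for image-derived features (assumption: no real CT data).
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=8,
        n_classes=2,
        random_state=seed,
    )
    # Hypothetical model settings; SVM and LR are scaled first because both
    # are sensitive to feature magnitudes.
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, accuracy in compare_classifiers().items():
        print(f"{name}: mean accuracy = {accuracy:.3f}")
```

Using the same cross-validation folds and the same metric for every model keeps the comparison like-for-like. Scores on synthetic data say nothing about clinical performance and are not comparable to the results reported in the surveyed works.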
V. CHALLENGES

The problems identified in recent research on lung cancer prediction are summarized in this section as follows.

● The three-dimensional CNN needed a significant amount of computational resources due to the large size of medical images.

● The classification methods do not correctly classify all the images due to the limited number of data samples in the training process.

● The optimization-based classification approach requires human interaction, with the optimization methods serving only in a supporting role, and the algorithm has a slow convergence speed.

● The classification performance was affected by an inadequate amount of training data to efficiently train the multi-class classifier.

VI. SUMMARY

Lung cancer is an often incurable disease with a high death rate among patients. Identifying and predicting lung cancer at an early stage is challenging. Lung cancer is a chronic disease that raises the mortality rate and inhibits global economic growth. A lack of mental and physical activity is a cause of chronic diseases. Lung disease is manageable; however, it is not treated through medicines. This survey is based on AI concepts, namely ML and DL techniques, for predicting and diagnosing lung cancer. The ML techniques of SVM, RF, LR and ANN as well as the DL techniques of CNN-LSTM, RNN, DNN and DCNN are used to accurately predict lung cancer. Utilizing a greater number of techniques often leads to enhanced prediction in a model. The RNN and LSTM give better results than the other techniques.

REFERENCES

[1] J. Civit-Masot, A. Bañuls-Beaterio, M. Domínguez-Morales, M. Rivas-Pérez, L. Muñoz-Saavedra, and J. M. R. Corral, “Non-small cell lung cancer diagnosis aid with histopathological images using Explainable Deep Learning techniques,” Comput. Methods Programs Biomed., vol. 226, p. 107108, November 2022.
[2] A. R. Bushara, “A deep learning-based lung cancer classification of CT images using augmented convolutional neural networks,” ELCVIA Electronic Letters on Computer Vision and Image Analysis, vol. 21, no. 1, pp. 130–142, June 2022.
[3] X. Ren, L. Jia, Z. Zhao, Y. Qiang, W. Wu, P. Han, J. Zhao, and J. Sun, “Weakly supervised label propagation algorithm classifies lung cancer imaging subtypes,” Sci. Rep., vol. 13, no. 1, p. 5167, March 2023.
[4] M. Masud, N. Sikder, A.-A. Nahid, A. K. Bairagi, and M. A. AlZain, “A Machine Learning Approach to Diagnosing Lung and Colon Cancer Using a Deep Learning-Based Classification Framework,” Sensors, vol. 21, no. 3, p. 748, January 2021.
[5] A. Z. Foeady, S. R. Riqmawatin, and D. C. R. Novitasari, “Lung cancer classification based on CT scan image by applying FCM segmentation and neural network technique,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 19, no. 4, pp. 1284–1290, August 2021.
[6] P. Chaturvedi, A. Jhamb, M. Vanani, and V. Nemade, “Prediction and classification of lung cancer using machine learning techniques,” IOP Conference Series: Materials Science and Engineering, International Conference on Applied Scientific Computational Intelligence using Data Science (ASCI 2020), Jaipur, India, vol. 1099, p. 012059, 22nd–23rd December 2020. IOP Publishing.
[7] Y. Han, Y. Ma, Z. Wu, F. Zhang, D. Zheng, X. Liu, L. Tao, Z. Liang, Z. Yang, X. Li, J. Huang, and X. Guo, “Histologic subtype classification of non-small cell lung cancer using PET/CT images,” Eur. J. Nucl. Med. Mol. Imaging, vol. 48, no. 2, pp. 350–360, February 2021.
[8] T. L. Chaunzwa, A. Hosny, Y. Xu, A. Shafer, N. Diao, M. Lanuti, D. C. Christiani, R. H. Mak, and H. J. W. L. Aerts, “Deep learning classification of lung cancer histology using CT images,” Sci. Rep., vol. 11, p. 5471, March 2021.
[9] M. Vedaraj, C. S. Anita, A. Muralidhar, V. Lavanya, K. Balasaranya, and P. Jagadeesan, “Early Prediction of Lung Cancer Using Gaussian Naive Bayes Classification Algorithm,” Int. J. Intell. Syst. Appl. Eng., vol. 11, no. 6s, pp. 838–848, May 2023.
[10] Y. Chen, E. Zitello, R. Guo, and Y. Deng, “The function of LncRNAs and their role in the prediction, diagnosis, and prognosis of lung cancer,” Clin. Transl. Med., vol. 11, no. 4, p. e367, April 2021.
[11] A. H. Chehade, N. Abdallah, J. M. Marion, M. Oueidat, and P. Chauvet, “Lung and colon cancer classification using medical imaging: A feature engineering approach,” Phys. Eng. Sci. Med., vol. 45, no. 3, pp. 729–746, September 2022.
[12] R. Raza, F. Zulfiqar, M. O. Khan, M. Arif, A. Alvi, M. A. Iftikhar, and T. Alam, “Lung-EffNet: Lung cancer classification using EfficientNet from CT-scan images,” Eng. Appl. Artif. Intell., vol. 126, no. B, p. 106902, November 2023.
[13] S. Wankhade and S. Vigneshwari, “A novel hybrid deep learning method for early detection of lung cancer using neural networks,” Healthcare Anal., vol. 3, p. 100195, November 2023.
[14] N. Maleki and S. T. A. Niaki, “An intelligent algorithm for lung cancer diagnosis using extracted features from Computerized Tomography images,” Healthcare Anal., vol. 3, p. 100150, November 2023.
[15] I. Shafi, S. Din, A. Khan, I. D. L. T. Díez, R. J. P. Casanova, K. T. Pifarre, and I. Ashraf, “An Effective Method for Lung Cancer Diagnosis from CT Scan Using Deep Learning-Based Support Vector Network,” Cancers, vol. 14, no. 21, p. 5457, November 2022.
[16] K. Pradhan, P. Chawla, and S. Rawat, “A deep learning-based approach for detection of lung cancer using self adaptive sea lion optimization algorithm (SA-SLnO),” J. Ambient Intell. Hum. Comput., vol. 14, no. 9, pp. 12933–12947, September 2023.
[17] P. Nanglia, S. Kumar, A. N. Mahajan, P. Singh, and D. Rathee, “A hybrid algorithm for lung cancer classification using SVM and Neural Networks,” ICT Express, vol. 7, no. 3, pp. 335–341, September 2021.
[18] H. F. Kareem, M. S. AL-Husieny, F. Y. Mohsen, E. A. Khalil, and Z. S. Hassan, “Evaluation of SVM performance in the detection of lung cancer in marked CT scan dataset,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 21, no. 3, pp. 1731–1738, March 2021.
[19] B. Yang, L. Zhou, J. Zhong, T. Lv, A. Li, L. Ma, J. Zhong, S. Yin, L. Huang, C. Zhou, X. Li, Y. Q. Ge, X. Tao, L. Zhang, Y. Son, and G. Lu, “Combination of computed tomography imaging-based radiomics and clinicopathological characteristics for predicting the clinical benefits of immune checkpoint inhibitors in lung cancer,” Respiratory Research, vol. 22, p. 189, June 2021.
[20] R. Apsari, Y. N. Aditya, E. Purwanti, and H. Arof, “Development of lung cancer classification system for computed tomography images using artificial neural network,” AIP Conference Proceedings, vol. 2329, p. 050013, February 2021. AIP Publishing.
[21] A. Bhattacharjee, R. Murugan, and T. Goel, “A hybrid approach for lung cancer diagnosis using optimized random forest classification and K-means visualization algorithm,” Health and Technology, vol. 12, no. 4, pp. 787–800, July 2022.
[22] A. Khan and Z. Ansari, “Identification of lung cancer using convolutional neural networks-based classification,” Turk. J. Comput. Math. Educ., vol. 12, no. 10, pp. 192–203, April 2021.
[23] V. K. Gunjan, N. Singh, F. Shaik, and S. Roy, “Detection of lung cancer in CT scans using grey wolf optimization algorithm and recurrent neural network,” Health and Technology, vol. 12, no. 6, pp. 1197–1210, November 2022.
[24] P. Marentakis, P. Karaiskos, V. Kouloulias, N. Kelekis, S. Argentos, N. Oikonomopoulos, and C. Loukas, “Lung cancer histology classification from CT images based on radiomics and deep learning models,” Med. Biol. Eng. Comput., vol. 59, no. 1, pp. 215–226, January 2021.
[25] P. M. Shakeel, M. A. Burhanuddin, and M. I. Desa, “Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier,” Neural Comput. Appl., vol. 34, no. 12, pp. 9579–9592, June 2022.