Investigation of Lung Cancer Prediction and Classification Using CT-Scan Images by Employing Machine Learning & Population Based Techniques
ISSN No:-2456-2165
Abstract:- According to World Health Organization estimates, more than 2.6 million new cases of lung cancer (LC) are diagnosed each year, and it is the most prevalent cause of cancer-related deaths worldwide. Early detection and classification of LC are needed for effective analysis and treatment and for better patient outcomes. Advanced ML algorithms, particularly DL models, have shown significant potential for predicting and classifying lung cancer at an early stage. Early detection allows patients to undergo timely and effective treatment, considerably improving their chances of survival. The purpose of this research is to put forward an ISBSSA (Improved Selection Based Squirrel Search Algorithm)-based machine learning approach for LC prediction and classification using CT scan images. The suggested method makes use of a deep learning model called ISBSSA that has been trained on a substantial dataset of computed tomography (CT) images in order to accurately identify and classify lung cancer cells. For the experimental study, A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis, taken from the Cancer Imaging Archive (CIA), serves as the data source. The LC-CIA dataset includes CT and PET-CT DICOM images of lung cancer patients as well as healthy individuals. The model is trained using appropriate machine learning algorithms along with ISBSSA, such as the Naive Bayes Algorithm (NBA), Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), K-Nearest Neighbour (KNN) and Random Forests (RFs), to predict the presence and type of lung cancer cells in the extracted CT and PET-CT DICOM images. The findings of this study show that the proposed approach is successful in predicting and classifying lung cancer cells in CT scans, which might have significant implications for the early detection and treatment of the disease. The comprehensive results show that 94.02% accuracy, 91.80% sensitivity, 92.76% specificity, 96% precision, 92% recall, 0.90 True Positive, 0.87 True Negative, and 96.13% F-Score are achieved in detecting and classifying the disease effectively, which is the advantage of employing ML and DL approaches.

Keywords:- Lung Cancer Prediction, Classification, Machine Learning, Deep Learning, Feature Selection, Data Mining, Image Processing.

I. INTRODUCTION

Lung cancer is one of the most significant public health issues and one of the main causes of cancer-related fatalities globally. Lung cancer accounts for 11.9% of all cancer diagnoses and 18.4% of cancer deaths worldwide, according to the World Health Organization. Early detection and therapy [1] are essential for increasing the survival rates of lung cancer patients. Machine learning and deep learning models that use advanced computational approaches have shown considerable potential in predicting the presence or absence of lung cancer. Machine learning techniques such as Support Vector Machines (SVM), Random Forest, K-Nearest Neighbours (KNN), and Naive Bayes have been employed with high accuracy for lung cancer prediction. Deep learning models, such as Convolutional Neural Networks (CNN), have also been utilised successfully for lung cancer prediction.

One of the main challenges in lung cancer prediction is the complexity and variability of the disease. Non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) are the two primary kinds of lung cancer [2]. NSCLC is the most prevalent kind of lung cancer, accounting for around 88% of all occurrences. SCLC is less frequent, but it is more aggressive and can spread to other regions of the body quickly. To properly forecast the existence of lung cancer, particular traits or biomarkers must be identified that can discriminate between the two kinds and reliably predict the presence of cancer [3].

To discover these biomarkers and predict the existence of lung cancer, machine learning and deep learning algorithms may be utilised for feature extraction and selection [4-5]. Feature extraction is the process of identifying significant features or qualities in raw data, whereas feature selection is the process of picking the most critical features for effective prediction. For example, in lung cancer prediction using CNN, the CT scans of the lungs are used as input [5], and the deep learning model learns to extract relevant features and classify the image as cancerous or non-cancerous. In SVM, the algorithm
ISBSSA:
The proposed method for LC prediction is carried out using CNN, NBA, SVM, RF, KNN and ISBSSA. The ML and DL models are used for feature selection and extraction. ISBSSA is presented here for LC prediction; it is inspired by the food-foraging behavior of flying squirrels in real life. Depending on the weather conditions, flying squirrels search for food and store it for future use. If the climate is hot, the squirrels come down from their tree, move out, and rapidly search for food to meet their daily needs. They eat acorns, which are available in the hot climate. Once these have been consumed, they search for food for the winter. During bad weather, hickory nuts help the squirrels satisfy their needs. The process therefore depends continuously on the weather conditions in the area. To express this as a mathematical model, the following hypotheses are taken into account.
NBA:
NBA is used to examine a wide range of supervised and unsupervised learning tasks. Suppose we have a CIA dataset with only two features (age and smoking history) and a binary target variable indicating whether or not the patient
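
The two-feature setting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the records are synthetic values invented for illustration (they are not taken from the CIA dataset), the binary target is assumed to mark the presence of lung cancer, and the function names `fit_naive_bayes`, `log_joint` and `predict_proba` are hypothetical. Age is modelled with a per-class Gaussian and smoking history with a smoothed Bernoulli, which is one common way to apply the naive independence assumption to mixed features.

```python
import math

# Synthetic toy records, invented for illustration only (NOT from the CIA dataset):
# (age in years, smoking history 0/1, assumed label 0/1 where 1 = lung cancer)
TOY_RECORDS = [
    (35, 0, 0), (42, 0, 0), (50, 1, 0), (38, 0, 0), (45, 0, 0),
    (61, 1, 1), (68, 1, 1), (72, 1, 1), (58, 0, 1), (65, 1, 1),
]


def fit_naive_bayes(records):
    """Estimate, per class, a prior, a Gaussian for age and a Bernoulli for smoking."""
    model = {}
    for label in (0, 1):
        rows = [r for r in records if r[2] == label]
        ages = [r[0] for r in rows]
        mean = sum(ages) / len(ages)
        # Small constant avoids a zero variance if all ages in a class are equal.
        var = sum((a - mean) ** 2 for a in ages) / len(ages) + 1e-9
        # Laplace smoothing keeps the smoking probability away from 0 and 1.
        p_smoker = (sum(r[1] for r in rows) + 1) / (len(rows) + 2)
        model[label] = {
            "prior": len(rows) / len(records),
            "age_mean": mean,
            "age_var": var,
            "p_smoker": p_smoker,
        }
    return model


def log_joint(model, label, age, smoker):
    """Return log P(label) + log P(age | label) + log P(smoker | label)."""
    p = model[label]
    log_age = (-0.5 * math.log(2 * math.pi * p["age_var"])
               - (age - p["age_mean"]) ** 2 / (2 * p["age_var"]))
    log_smoke = math.log(p["p_smoker"] if smoker else 1 - p["p_smoker"])
    return math.log(p["prior"]) + log_age + log_smoke


def predict_proba(model, age, smoker):
    """Posterior probability of the positive class, obtained with Bayes' rule."""
    scores = {label: log_joint(model, label, age, smoker) for label in (0, 1)}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    return weights[1] / (weights[0] + weights[1])


model = fit_naive_bayes(TOY_RECORDS)
older_smoker = predict_proba(model, age=70, smoker=1)
younger_non_smoker = predict_proba(model, age=36, smoker=0)
```

On this toy data the older smoker receives a higher posterior probability than the younger non-smoker. The values only reflect the invented records and say nothing about the results reported in this study.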
CNN:
CNNs are used to analyze the medical images loaded in the dataset (CIA images), such as chest X-rays or CT scans, to detect signs of lung cancer at all stages. After preprocessing, feature selection and extraction, the system starts to identify the prediction levels. The LC prediction steps when implemented through CNNs are as follows:

Input layer: The input to the model is a medical image of the lungs from the CIA dataset, such as a PET/DICOM or CT scan image.
Convolutional layer: The first layer applies a set of filters to the input image to extract features that are relevant for detecting LC.
ReLUAF activation: A rectified linear unit (ReLUAF) activation function is applied to the output of the convolutional layer to introduce non-linearity into the CNN model.
Pooling layer - PL: The output of the activation function is passed through a PL to downsample the image taken from the CIA dataset.
Dropout layer - DO: A dropout layer is added to prevent overfitting by randomly dropping out a fraction of the units in the layer during training.
Fully connected layer - FCL: The pooled output is flattened and passed through one or more fully connected layers, which use the extracted features to make a final prediction.

SVM:
1. Split the CIA data into training and test sets.
2. Select a subset of features from the training data that are most relevant for lung cancer prediction, using a feature selection technique such as mutual information or L1 regularization.
3. Train an SVM on the selected features using the training set, optimizing the hyperparameters using cross-validation.
4. Evaluate the performance of the trained SVM on the test set, using the PEM shown below.
5. If the SVM's performance is not adequate, adjust the hyperparameters or feature selection criteria and repeat steps 3 and 4 until an acceptable performance is reached.
6. After training, the model may be used to predict the presence or absence of lung cancer in CIA medical images by extracting the pertinent features and applying the learnt decision boundary.

The SVM shows remarkable performance in LC prediction compared to NBA and CNN, as shown in the diagram.

KNN:
In the case of KNN, the following steps are followed for LC prediction:

Preprocessing
Extracting from CIA dataset