Lung cancer detection and nodule type classification using image processing and machine learning


2023 International Wireless Communications and Mobile Computing (IWCMC) | 979-8-3503-3339-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/IWCMC58020.2023.10183237

Dorsaf Hrizi
Cosim Research Labs,
Carthage University,
Higher School of Communication,
Tunisia
[email protected]

Khaoula Tbarki
Researcher in the LR-SITI Imaging
research laboratory ENIT,
Private Higher School of Technology
and Engineering,
Tek-Up University,
Tunisia
[email protected]

Monia Attia
Professor
Tunis University El Manar
Tunisia

Sadok Elasmi
Cosim Research Labs,
Carthage University,
Higher School of Communication,
Tunisia
[email protected]

Abstract—Lung cancer is one of the most frequent types of cancer worldwide. It is a disease in which abnormal cells form in the lungs and grow out of control. These cells do not function like normal cells because of deoxyribonucleic acid (DNA) mutations caused by various genetic factors. However, early detection and treatment of cases can reduce cancer-related mortality. In this paper we propose a novel system for cancer detection and classification based on image processing methods and machine learning algorithms. Our system is composed of two main processes: the first is the preprocessing process, which detects features, and the second is the discrimination process, which determines the type of lung cancer (benign or malignant).

keywords: segmentation, watershed, Machine Learning, lung cancer detection, feature extraction

I. INTRODUCTION
In 2020, approximately 10 million people worldwide died of cancer, of whom 2.21 million died from lung cancer [1]. Lung cancer results from changes in lung cell behaviour or growth that lead to the development of lung nodules, which can be categorized into two types: non-cancerous (benign) or cancerous (malignant) nodules. There are two main categories of lung cancer, non-small cell and small cell lung cancer, representing more than 80% and less than 15% of cases respectively [2].
Effective and early detection of lung cancer is the only way to cure it. However, the diagnosis of lung cancer requires four main steps. The first step is the chest X-ray, which reveals abnormalities but cannot specify the nodule type (malignant/benign). The second is the fiberscope, a flexible tube with a light source that is inserted through the nostril into the trachea to check for abnormalities. The third diagnosis step is the biopsy, which consists of taking tissue samples that seem abnormal and analyzing them to classify the type of cells. Using image processing and machine learning approaches, one can automatically detect the cancer from Computed Tomography (CT) scan images of the lungs. This technology saves time by removing manual steps and improves outcomes at each stage of chemotherapy, radiotherapy, surgery, and immunotherapy. A lot of effort has been dedicated to creating more effective automated systems.
N. Kalaivani et al. [3] developed a deep neural network for detecting lung cancer from CT images; for the classification of the lung image (normal or malignant), a densely connected convolutional neural network (DenseNet) and an adaptive boosting algorithm were used. This method achieved an accuracy of 90.85%. R. Idrees, M. Abid et al. [4] proposed a segmentation method based on the watershed algorithm controlled by threshold and marker, and a binary classifier for the classification method. The resulting accuracy is 88.5%. A network of artificial neurons that detects the existence of lung cancer through the symptoms that appear in the body is developed in [5]. The model provides an accuracy of 96.67%. In [6] the authors used a ground-breaking interdisciplinary process that combines metabolic and machine learning techniques to find biomarkers early in the diagnosis of lung cancer. To distinguish between people who have stage 1 lung cancer and healthy people, the machine learning method Naive Bayes



Authorized licensed use limited to: Angadi Institute of Technology & Management. Downloaded on December 03,2024 at 04:06:41 UTC from IEEE Xplore. Restrictions apply.
is suggested in this study as a useful tool for early lung tumor prediction. Dechao Chen et al. [7] recommended a CNN optimization technique based on the Beetle Antennae Search (BAS) optimization algorithm. The BAS optimization algorithm is used to improve the initial CNN parameters. A new Convolutional Neural Network (CNN) model with a pre-trained BAS optimization algorithm was created for the analysis and diagnosis of medical imaging data for intracranial hemorrhage. The suggested method achieves a recall of 100% and a diagnostic accuracy of 93.94%. R. Dhiaa and O. Awad [8] aimed to predict and discover lung cancer diagnoses and to predict relapse after surgery from changes in the transcriptome associated with cancer and gene expression, using the eXtreme Gradient Boosting (XGBoost) model.
In this paper we propose a segmentation, detection and classification method for Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC) from CT images.
The remainder of this paper is organised as follows. In section II we present the structure of our proposed method. In subsection II.A we explore the datasets used in our work; in subsections II.B and II.C we describe the preprocessing and image segmentation steps. Subsections II.D and II.E present the nodule detection and the feature extraction. In subsection II.F we cover the classification part, where we compare different machine learning classification algorithms. Section III presents the experimental results of our study. Finally, section IV presents the conclusion.

II. LUNG CANCER DETECTION AND DISCRIMINATION PROPOSED SYSTEM
Our proposed method is composed of two main processes. The first process is composed of four steps, including preprocessing, segmentation and feature extraction, in order to select the best features. The second process discriminates between the detected nodules. These steps are shown in Fig.1.

Fig. 1. Workflow of our proposed method

Two datasets containing lung scintigraphy images are used in this study.
1) DATASET 1: The first database is composed of 1887 chest CT scans of a single patient diagnosed with probable malignant cancer at a public Tunisian hospital. Our method is able to recognize the lung scan slices among the whole chest scans. The process was based on the windowing value, also known as grayscale mapping and contrast stretching [9]. The first four steps of the proposed method need high image quality, obtained with the injection scanner technique.
2) DATASET 2: The second database is a collection of data gathered over a three-month period. It includes 110 CT scans of cancer patients who were diagnosed at various stages [10]. The CT sections can be divided into three nodule type categories: 40 normal, 15 benign, and 55 cancerous. This dataset is used for the classification step.

A. PREPROCESSING
The purpose of image preprocessing is to improve the image quality for subsequent analysis steps. This improvement is based on eliminating distortions or on enhancing image characteristics. In our work, the preprocessing step aims to prepare the image for segmentation and to make feature extraction easier. The preprocessing architecture is shown in Fig.2.

Fig. 2. Preprocessing architecture

• Blue channel extraction: The scan image is coded in Red, Green and Blue (RGB) mode, where each image pixel consists of 3 channels and each channel represents a color. In order to perform the cell count, we extract only the blue channel to get nuclear staining.
• Morphological operation: We used the opening morphological operation to remove unnecessary noise in the image, then the closing morphological operation to fill the holes, aiming to facilitate the feature extraction step.

B. SEGMENTATION WITH OTSU’S THRESHOLDING
The image segmentation step includes the removal of grains touching the edges and the counting of the total number of regions detected before and after the grain removal process. The final step is marking the regions at their centroids. In our case, we extract some regions that represent the nodules. We apply a distance transformation in order to separate touching objects. Then, we apply pixel thresholding and expansion (which increases the limit of each cell in the background) to identify the background area of the image.
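As a rough illustration of the preprocessing and thresholding steps described above, the following sketch extracts the blue channel, applies a morphological opening followed by a closing, and computes Otsu's threshold by maximising the between-class variance. It is a minimal pure-Python sketch, not the implementation used in this work: the function names, the 3x3 structuring element and the nested-list image representation are assumptions made for illustration, and the distance transform and watershed steps are not covered.

```python
from typing import Callable, List, Sequence, Tuple

Gray = List[List[int]]                   # 2-D image of 0..255 intensities
RGB = List[List[Tuple[int, int, int]]]   # 2-D image of (R, G, B) pixels


def blue_channel(image: RGB) -> Gray:
    """Keep only the blue component of every pixel."""
    return [[pixel[2] for pixel in row] for row in image]


def _filter3x3(image: Gray, pick: Callable[[Sequence[int]], int]) -> Gray:
    """Apply `pick` (min or max) over each pixel's 3x3 neighbourhood.

    The 3x3 window is an assumed structuring element; windows are
    truncated at the image border.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(pick(window))
        out.append(row)
    return out


def opening(image: Gray) -> Gray:
    """Erosion then dilation: removes small bright specks (noise)."""
    return _filter3x3(_filter3x3(image, min), max)


def closing(image: Gray) -> Gray:
    """Dilation then erosion: fills small dark holes."""
    return _filter3x3(_filter3x3(image, max), min)


def otsu_threshold(image: Gray) -> int:
    """Return the threshold t that maximises the between-class variance.

    Pixels <= t form one class (background), pixels > t the other.
    """
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue              # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                 # every pixel is already background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: Gray, t: int) -> Gray:
    """Map pixels above the threshold to 1 (foreground), others to 0."""
    return [[1 if value > t else 0 for value in row] for row in image]
```

On a toy image with a dark background, one bright 3x3 region and an isolated bright pixel, the opening removes the isolated pixel and the thresholded mask keeps only the 3x3 region.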

C. BAS AND WATERSHED COMBINED ALGORITHMS FOR NODULES DETECTION
The watershed algorithm extracts the layer between the foreground and background layers in the CT image, detects the lung regions by creating markers and labels inside them, then fills all detected regions and extracts the features of the pulmonary regions. Since not all of these regions are tumor nodules, which is an over-segmentation issue, we resolved this confusion by applying the BAS (Beetle Antennae Search) algorithm [11], which takes three features of the region (diameter, circularity and solidity) as parameters to distinguish the pulmonary nodules from the other detected regions.
The BAS technique for texture nodule cancer discovery is divided in two steps: the Beetle behavior for benign nodule detection is shown in Fig.3 and the Beetle behavior for malignant nodule detection is shown in Fig.4.

Fig. 3. Beetle behavior for benign nodule detection

Fig. 4. Beetle behavior for malignant nodule detection

The Beetle takes the parameter value of a region, compares it with the value of the next region, and takes a step in the direction of increasing or decreasing values to reach the goal (the nodule searched for).
d(x): denotes the diameter of region x.
c(x): denotes the circularity of region x.
s(x): denotes the solidity of region x.

D. FEATURES EXTRACTION
It is important to extract the lung nodule attributes because they are needed to classify the nodule type and the cancer stage. These features are:
1) THE NODULE SIZE: The staging of non-small cell lung cancer depends on two thresholds on the size of the tumor: 2 and 5 cm represent appropriate threshold diameters that define subgroups with significantly different prognosis [12]. Generally, a large nodule is more suspicious, but there are many exceptions that can be seen from other features. The size-related features that we detected are: area, perimeter, diameter, and volume.
2) THE NODULE TEXTURE: The nodule texture is a very important factor in the classification of cancer grades. The texture-related characteristics that have been detected are: the texture, the intensity (the average intensity, the minimum intensity and the maximum intensity), and the density per pixel, to ensure that we treat the right region.
3) THE NODULE SHAPE: We detect the nodule shape and localization. In this context we extract the sphericity, which indicates how close the nodule is to a sphere; the compacity, which indicates the amount of space occupied by the nodule compared to the rest of the lungs; and the centroid of each nodule.

E. CLASSIFICATION
This stage involves classifying the cancer type as malignant or benign. During this final step we tested four supervised classification algorithms.
1) SUPPORT VECTOR MACHINES: Support vector machines (SVM) are supervised learning algorithms used to build classifiers. When the data has precisely two classes, we can use an SVM, which separates the data elements of the two classes using a hyperplane [13].
2) XGBOOST: XGBoost is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models. The XGBoost algorithm performs well in machine learning competitions because of its robust handling of a variety of data types, relationships and distributions, and the variety of hyperparameters that can be fine-tuned [14].
3) RANDOM FOREST: Random forest is a supervised learning algorithm. It can be used both for classification and regression [15]. It is also a flexible and easy-to-use algorithm. A random forest builds decision trees on randomly selected data samples, gets a prediction from each tree and selects the best solution by voting. It also provides a good indicator of feature importance.
4) DECISION TREE: The decision tree is a non-parametric supervised learning algorithm used for classification and regression problems. It is organized hierarchically and has a root node, branches, internal nodes, and leaf nodes [16].

III. RESULTS AND DISCUSSION
This section is composed of two subsections. In the first subsection, we present the results of pre-processing, segmentation and detection using the two datasets mentioned in subsection II.A. The second subsection details the results of the classification process.

A. PRE-PROCESSING, SEGMENTATION AND DETECTION RESULTS
Fig.5 presents the result after the blue channel extraction, which is a gray image. In our case the blue channel

is equivalent to a gray level. Fig.6 illustrates the output of the closing operation, where the image was dilated and then eroded.

Fig. 5. Image after blue channel extraction

Fig. 6. Image after opening application

In Fig.7 we present the image after the application of Otsu thresholding, where the image is clustered into white and black classes (a), the foreground (b) and background (c) separation, and the image after the watershed filling (d).

Fig. 7. Segmentation and nodule detection result. (a) Image after Otsu thresholding, (b) the foreground, (c) the background, (d) watershed filling

B. FEATURES EXTRACTION
Tables I, II and III show the features extracted from DATASET 1: the nodule size, the nodule texture and the nodule shape.

TABLE I
nodule size

Area    diameter             volume
2215    53.10579621461659    111.2245195005841

TABLE II
nodule texture

texture        intensity      density
0.906672124    131.1467268    51.58823529

96.58823529
141.58823529

TABLE III
nodule shape

sphericity    compacity    centroid
0.655045      1.526611     (206.3309255, 283.538045)

C. CLASSIFICATION RESULTS
Each nodule is characterized by eight features and a single class variable (benign or malignant). We start by splitting DATASET 2 into 70% for the training step and 30% for the test step. Four classification algorithms were compared. The comparison is based on the evaluation metrics explained below.
The confusion matrix is an array of four different combinations of predicted and real values. Each row displays the real class, while each column displays the total number of nodules predicted for each class.

Fig. 8. Confusion Matrix

TP: An instance is positive and it is classified correctly as positive.
FP: An instance is negative and it is classified incorrectly as positive.
FN: An instance is positive and it is classified incorrectly as negative.
TN: An instance is negative and it is classified correctly as negative.
The confusion matrix results for SVM, Random Forest, Decision Tree and XGBoost are shown respectively in

Fig.9, Fig.10, Fig.11 and Fig.12:

Fig. 9. The SVM Confusion Matrix

Fig. 10. The Random Forest Confusion Matrix

Fig. 11. The Decision Tree Confusion Matrix

Fig. 12. The XGBoost Confusion Matrix

A Receiver Operating Characteristic (ROC) curve shows the performance of a classification model at all classification thresholds [17]. Sensitivity and specificity are the two metrics used by the ROC curve to characterize the performance of a classification model. The ROC curve results are shown in Fig.13, Fig.14, Fig.15 and Fig.16 below.

Fig. 13. The SVM ROC CURVE

Fig. 14. The Random Forest ROC CURVE

We also use the accuracy, recall, precision and F1 score, defined respectively by equations (1)-(4).

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{F1 Score} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \tag{4}$$

The results are given in Table IV. From these tables and figures we can clearly see that Random Forest gives better results than SVM, XGBoost and Decision Tree. This is explained by the fact that Random Forest is an ensembling

and bagging algorithm, so it is adopted for the classification step.

Fig. 15. The Decision Tree ROC CURVE

Fig. 16. The XGBoost ROC CURVE

TABLE IV
classification results

Algorithm        Accuracy    F1 score    Recall    Precision
SVM              95.1%       97.41%      97.33%    97.5%
Random Forest    98.6%       99.26%      99.1%     99.43%
XGBoost          98.1%       98.99%      98.79%    99.2%
Decision Tree    95.6%       97.65%      98.76%    96.57%

IV. CONCLUSION
The method for lung cancer detection proposed in this paper comprises preprocessing, nodule identification, nodule segmentation, feature extraction, and image classification. Feature extraction takes place after the detection and segmentation of nodules: the features required for classification are recovered from the segmented nodule using feature extraction algorithms. A classifier is then employed to determine whether the nodule is benign or malignant based on the extracted features. In future work we propose a method of cancer state prediction based on reinforcement learning.

References
[1] E. Sebastian, M. Leitzmann, M. Bahls, C. Meisinger, C. Amos, R. Hung, I. Consortium, L. Consortium, A. Teumer, and H. Baurecht, “Physical activity does not lower the risk of lung cancer,” Cancer Research, vol. 80, p. canres.1127.2020, 07 2020.
[2] J. Molina, P. Yang, S. Cassivi, S. Schild, and A. Adjei, “Non-small cell lung cancer: Epidemiology, risk factors, treatment, and survivorship,” Mayo Clinic proceedings. Mayo Clinic, vol. 83, pp. 584–94, 06 2008.
[3] N. Kalaivani, N. Manimaran, D. Sophia, and D. Devi, “Deep learning based lung cancer detection and classification,” IOP Conference Series: Materials Science and Engineering, vol. 994, p. 012026, 12 2020.
[4] R. Idrees, M. Abid, S. Raza, M. Kashif, M. Waqas, M. Ali, and L. Rehman, “Lung cancer detection using supervised machine learning techniques,” 10 2022.
[5] C. Shankara and H. S. A. ., “Artificial neural network for lung cancer detection using ct images,” International journal of health sciences, pp. 2708–2724, 04 2022.
[6] Y. Xie, W.-Y. Meng, R.-Z. Li, Y. Wang, X. Qian, C. Chan, Z.-F. Yu, X.-X. Fan, H. Pan, C. Xie, Q. Wu, P.-Y. Yan, L. Liu, Y.-J. Tang, X.-J. Yao, M.-F. Wang, and L.-H. Leung, “Early lung cancer diagnostic biomarker discovery by machine learning methods,” Translational Oncology, vol. 14, p. 100907, 01 2021.
[7] D. Chen, X. Li, and S. Li, “A novel convolutional neural network model based on beetle antennae search optimization algorithm for computerized tomography diagnosis,” IEEE Transactions on Neural Networks and Learning Systems, vol. PP, pp. 1–12, 08 2021.
[8] R. Dhiaa and O. Awad, “A comparative analysis study of lung cancer detection and relapse prediction using xgboost classifier,” IOP Conference Series: Materials Science and Engineering, vol. 1076, p. 012048, 02 2021.
[9] J. Ying, T. Yu, J. Liu, D. Huang, H. Yan, and Y. Zhuang, “Clinical comparison of the “windowing” technique and the “open book” technique in schatzker type ii tibial plateau fracture,” Orthopaedic Surgery, vol. 14, 09 2022.
[10] H. F. Al-Yasriy, “The iq-oth/nccd lung cancer dataset,” 01 2021.
[11] X. Jiang and S. Li, “Bas: Beetle antennae search algorithm for optimization problems,” International Journal of Robotics and Control, vol. 1, 10 2017.
[12] C. Christian, S. Erica, and U. Morandi, “The prognostic impact of tumor size in resected stage i non-small cell lung cancer: Evidence for a two thresholds tumor diameters classification,” Lung cancer (Amsterdam, Netherlands), vol. 54, pp. 185–91, 11 2006.
[13] “Sparse and robust svm classifier for large scale classification,” Applied Intelligence, pp. 1–25, 03 2023.
[14] M. Babyomi, O. Olagbaju, and A. Kadiri, “Convolutional xgboost (c-xgboost) model for brain tumor detection,” 01 2023.
[15] J. Lötsch and B. Mayer, “A biomedical case study showing that tuning random forests can fundamentally change the interpretation of supervised data structure exploration aimed at knowledge discovery,” BioMedInformatics, vol. 2, 10 2022.
[16] P. Sarang, Decision Tree: A Supervised Learning Algorithm for Classification, 03 2023, pp. 75–96.
[17] D. Chicco and G. Jurman, “The matthews correlation coefficient (mcc) should replace the roc auc as the standard metric for assessing binary classification,” BioData Mining, vol. 16, 02 2023.
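Equations (1)-(4) can be checked numerically with a short sketch. The code below computes the four metrics from the confusion-matrix counts TP, FP, TN and FN. The class and function names, and the convention of returning 0.0 when a denominator is zero, are assumptions made for illustration and are not taken from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container)."""
    tp: int  # positive instances classified as positive
    fp: int  # negative instances classified as positive
    tn: int  # negative instances classified as negative
    fn: int  # positive instances classified as negative


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely; returning 0.0 on an empty denominator is an assumption."""
    return numerator / denominator if denominator else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Equation (1): share of all instances that are classified correctly."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def recall(c: ConfusionCounts) -> float:
    """Equation (2): share of real positives that are found."""
    return _ratio(c.tp, c.tp + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Equation (3): share of predicted positives that are real positives."""
    return _ratio(c.tp, c.tp + c.fp)


def f1_from_rates(recall_value: float, precision_value: float) -> float:
    """Equation (4): harmonic mean of recall and precision."""
    return _ratio(2 * recall_value * precision_value,
                  recall_value + precision_value)


def f1_score(c: ConfusionCounts) -> float:
    """Equation (4) evaluated directly from the counts."""
    return f1_from_rates(recall(c), precision(c))
```

For example, plugging the SVM recall (97.33%) and precision (97.5%) from Table IV into `f1_from_rates` gives an F1 score of about 97.41%, which agrees with the reported value.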
