
Lung Cancer Detection Based On CT-Scan Images With Detection Features Using Gray Level Co-Occurrence Matrix GLCM and Support Vector Machine SVM Methods

The document discusses a study on lung cancer detection using CT-scan images, employing Gray Level Co-Occurrence Matrix (GLCM) and Support Vector Machine (SVM) methods. The proposed system includes stages of pre-processing, segmentation, feature extraction, and classification, achieving an accuracy of 83.33% in distinguishing between benign and malignant tumors. The study highlights the importance of early diagnosis for effective treatment and the potential benefits of automated detection systems in medical applications.


2020 International Electronics Symposium (IES)

Lung Cancer Detection Based On CT-Scan Images With Detection Features Using Gray Level Co-Occurrence Matrix (GLCM) and Support Vector Machine (SVM) Methods

Qurina Firdaus
Department of Informatics and Computer Engineering
Politeknik Elektronika Negeri Surabaya
Surabaya, Indonesia
[email protected]

Riyanto Sigit
Department of Informatics and Computer Engineering
Politeknik Elektronika Negeri Surabaya
Surabaya, Indonesia
[email protected]

Tri Harsono
Department of Informatics and Computer Engineering
Politeknik Elektronika Negeri Surabaya
Surabaya, Indonesia
[email protected]

Anwar Anwar
Department of Informatics and Computer Engineering
Politeknik Elektronika Negeri Surabaya
Surabaya, Indonesia
anwar@pasca.student.pens.ac.id

Abstract: Lung cancer comprises all malignant diseases of the lungs, including malignancies originating in the lungs themselves (primary) and those originating in other organs (metastasis). Lung cancer is one of the leading causes of death worldwide. It is a tumor that grows rapidly and can spread to other organs. The onset of cancer is characterized by abnormal cell growth that can damage normal tissue cells. Computerized Tomography (CT) is an imaging technique often used to diagnose lung cancer. Lung cancer can be classified as benign or malignant. It is very important to diagnose lung cancer at an early stage to speed up treatment and the actions that will be taken. This study aims to develop a lung cancer detection system based on CT-scan images. The detection system has 4 main stages: pre-processing of CT-scan images to improve image quality; segmentation to identify and separate the desired cancer object from the background; feature extraction based on area, contrast, energy, entropy, and homogeneity; and classification of lung cancer as benign or malignant. In the system trial, the accuracy of the system's decision on whether a lung cancer is benign or malignant was 83.33%.

Keywords: CT-Scan Image, Classification, Lung Cancer, Segmentation, System Detection

I. INTRODUCTION

Lung cancer is a disease of the lungs that includes malignancies originating in the lungs themselves (primary) and those originating in other organs (metastasis). It is one of the leading causes of death worldwide: cancer caused 9.6 million deaths in 2018, of which 1.76 million were due to lung cancer (WHO, 2018). According to WHO statistics, in Indonesia lung cancer accounts for 21.8% of cancer deaths in men and 9.1% in women, with an average of 22,475 men and 8,390 women affected by lung cancer each year (WHO, 2018).

Lung cancer is a tumor located in the lung that grows rapidly and spreads to other organs. Cancer is characterized by abnormal cell growth that can damage normal tissue cells. Based on histopathology, lung cancer is divided into two types: Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC), the latter consisting of squamous cell cancer (SCC), adenocarcinoma (ADC), and large cell cancer. Of these types, NSCLC accounts for 80-90% of lung cancer deaths worldwide. Based on pathology, lung cancer is further classified as benign or malignant. Most cases of lung cancer are found at an advanced stage, making treatment difficult (Dandil, 2014).

Computerized Tomography (CT) is one of the imaging techniques most often used in the diagnosis of lung cancer. Lung cancer with multiple pathological lesions can be seen on CT. Lung cancer is categorized as benign or malignant. During diagnosis, cancer of a certain density can in some cases be categorized as benign. In many cases, however, a lung that tends to be congested is classified as malignant. It is very important to diagnose lung cancer at an early stage to improve treatment. Systems designed for medical applications provide a variety of benefits for lung cancer detection. Such a system allows patients to start treatment early and can simplify doctors' decision-making (Riti, 2016).

Segmentation aims to make the image easier to analyze by distinguishing the objects in it from one another. Labeling each pixel in an image serves to distinguish the characteristics it has. Image segmentation produces a set of contours separated from the background. Each pixel in an image has a different color, intensity, and

978-1-7281-9530-8/20/$31.00 ©2020 IEEE 643


Authorized licensed use limited to: Alliance University. Downloaded on March 13,2025 at 15:18:04 UTC from IEEE Xplore. Restrictions apply.
texture value. This study uses a watershed image segmentation approach, in which the result is a group of segments covering several fully described contours. Segmenting the image and separating it from its background facilitates feature extraction using the GLCM process. With watershed segmentation, the system detects unwanted nodules with abnormal pixels, which are then stored in the database (Jony, 2019).

This research, entitled "Lung Cancer Detection Based On CT-Scan Images With Detection Features Using Gray Level Co-Occurrence Matrix (GLCM) and Support Vector Machine (SVM) Methods", differs from the previous study in the technique used in the segmentation process: the previous paper uses semi-automatic techniques, while this study uses automatic techniques.

In this study, a lung cancer detection procedure based on CT-scan images has been implemented. The system can help determine whether a lung cancer seen on an initial CT scan is benign or malignant, and can thereby assist the medical field in diagnosing lung cancer.

II. RELATED WORK

A previous study by R. Wulandari, R. Sigit, and S. Wardhana, entitled "Automatic lung cancer detection using color histogram calculation" (2017), discusses how to detect the size of lung cancer from CT-scan images. The images used for lung cancer detection are generally CT-scan results, which are formed by gray values in different ranges in each image. Areas without tissue (air) appear black in the CT scan. The image is processed to obtain the cancer region and to calculate the size of the cancer as a percentage of the whole lung detected in the scan. This is done with the color histogram method, which exploits the difference in gray values between cancer objects and other objects. The histogram method collects the values of the selected channel in the input image and finds the appropriate histogram value. The output is data with a predetermined gray value, from which the gray value of the cancer object is obtained. The trial shows that this method can detect the size of lung cancer, with an average error of 12.75% for the cavity area and 31.74% for the cancer area (Wulandari, 2017).

The pre-processing method used in the study by L. Anifah, Haryanto, R. Harimurti, Z. Permatasari, P. W. Rusimamto, and A. R. Muhamad, entitled "Cancer lung detection on CT scan image using artificial neural network backpropagation based gray level co-occurrence matrices feature" (2017), is Thresholding and Median Filtering. The method was chosen because it can separate the lung cancer object from noise in the form of disturbing lines and dots in the original CT-scan image. The output of pre-processing with this method is an image in which the lung cancer object can be seen more clearly (Anifah, 2017).

The study by Y. F. Riti, H. A. Nugroho, S. Wibirama, B. Windarta, and L. Choridah, entitled "Feature extraction for lesion margin characteristic classification from CT Scan lungs image" (2016), presents the segmentation method. Segmentation here aims to divide the image into several areas and to separate the objects from background areas that are not used. The segmentation method is a thresholding method that divides the histogram of an image with different gray levels into two areas without requiring a threshold value to be entered. The approach uses discriminant analysis, which determines the variables that can differentiate between two or more groups, to maximize the separation of the cancer object from its background. The resulting segmentation is a binary image with intensity values 0 and 1 only: the value 0 is black (background), while the value 1 is white (object) (Riti, 2016).

The study by E. Dandil, M. Cakiroglu, Z. Eksi, M. Ozkan, O. K. Kurt, and A. Canan, entitled "Artificial neural network-based classification system for lung nodules on computed tomography scans" (2014), presents feature extraction with GLCM (gray-level co-occurrence matrix). GLCM is a texture-based feature extraction method used for the classification of benign or malignant cancers. After the GLCM process, the most suitable features are selected through Principal Component Analysis (PCA). PCA is a statistical method used to reduce the dimensionality of complex input data while retaining its information. The 6 features that gave the best performance in the experiment were selected from 88 features using the PCA method (Dandil, 2014).

In the study by D. P. Kaucha, P. W. C. Prasad, A. Alsadoon, A. Elchouemi, and S. Sreedharan, entitled "Early detection of lung cancer using SVM classifier in biomedical image processing" (2017), the classifier used is SVM (Support Vector Machine), which classifies images as cancerous or not. SVM is a classifier defined by a separating hyperplane that is found by the learning algorithm. Each data item is plotted as a point in n-dimensional space, where n is the number of features and the value of each feature is the value of a coordinate, and classification is performed by finding the hyperplane that separates the classes. SVM is a supervised learning model used to analyze data for classification. SVM uses an optimal linear separating hyperplane that can be used for classification and regression. Here, a linear SVM kernel is used to classify images into normal or cancerous images (Kaucha, 2017).

III. PROPOSED METHODOLOGY

A. Problem definition
Computerized Tomography (CT) is one of the imaging techniques often used in diagnosing lung cancer. Lung cancer with pathological lesions of various diameters and sizes can be seen on a CT scan. Lung cancer is categorized as benign or malignant. During diagnosis, a cancer of a certain density and atypical appearance can in some cases be assessed as benign. In many cases, however, congestion of the lung is categorized as malignant. It is important to diagnose lung cancer at an early stage to speed up treatment. Systems designed for medical applications can provide a variety of benefits in detecting lung cancer. With the help of such a system, treatment can be started early and the doctor's decision-making can be made faster and more accurate.

B. General system design
The system design used in this research consists of 6 main parts: lung CT-scan image input, pre-processing, segmentation, feature extraction, classification, and decision making, as shown in Figure 1.

[Flowchart: training process (input image, thresholding, segmentation with find contours, feature extraction, database) and testing process (input image, thresholding, segmentation with find contours, feature extraction, classification with Support Vector Machine, output)]
Fig. 1. General Proposed Method

1) Load image
This is the initial stage of the system. This sub-section explains the type of CT-scan image file and the input loading process. In this study, the input is an offline CT-scan image of a patient's lung, stored as a .jpg file. Figure 2 is an example of a lung cancer CT-scan image.

Fig. 2. Lung Cancer CT Scan Image.

2) Pre-processing
The second process, carried out after the image is successfully loaded, is pre-processing. Two stages are carried out: grayscale conversion to improve the quality of the gray levels, and conversion of the grayscale image to a binary image using the thresholding method.

Fig. 3. Diagram of preprocessing

a. Gray Scale
The input CT-scan image is converted to grayscale to make further processing easier. The quality of the grayscale image is then improved to ease the next step. The output of this stage is the grayscale image.

b. Thresholding
Thresholding is a segmentation technique well suited to images with significant differences in intensity between the background and the main object, and is used to separate the desired object from the background. Here it is used to obtain the areas that contain cancer and to convert the grayscale image into a binary image. Thresholding requires a boundary value between the main object and the background; this value is called the threshold.

The thresholding rule used in pre-processing is as follows:

\[ g(x, y) = \begin{cases} 1, & \text{if } f(x, y) > T \\ 0, & \text{if } f(x, y) \le T \end{cases} \qquad (1) \]

Where:
g(x, y) = binary image of the gray image f(x, y)
T = threshold

One of the simplest ways to extract objects from the background is to choose a threshold T that separates the two modes of the histogram. Every point (x, y) at which f(x, y) is greater than T is called an object point; otherwise, the point is a background point. In other words, thresholding partitions the image by setting all pixels with intensity greater than T to foreground and all others to background.

3) Segmentation
The segmentation stage identifies and separates the desired cancer object from the background. It uses the find contour method on the result of the preceding thresholding process and looks for the widest cancer area. The segmentation process takes the pre-processed image and uses it to extract the area detected as cancer from the original image: the system takes the largest region of pixels with value 1 within the pre-processed lung CT scan.

4) Feature extraction
Shape-based feature extraction is done by calculating the area of the segmented cancer object, which is then reconstructed with the original image color. After the cancer object is detected, its area is calculated using the contourArea function in OpenCV.

The contourArea function returns the pixel area of the contour. This value is then converted to millimeters using the formula

\[ \text{mm} = \frac{\text{pixels} \times 25.4}{\text{dpi}} \qquad (1) \]

At 96 dpi there are 96 pixels per inch, where 1 inch = 25.4 mm, so 1 pixel equals 0.2645833 millimeters.

Texture-based feature extraction is carried out using the Gray Level Co-occurrence Matrix (GLCM) method. The GLCM method calculates the contrast, energy, entropy, and homogeneity of the cancer object. The formulas used to calculate these values, where p(i, j) is the GLCM entry for gray levels i and j, are:

Contrast:
\[ Con = \sum_{i,j} (i - j)^2 \, p(i, j) \qquad (2) \]

Energy:
\[ E = \sum_{i,j} p(i, j)^2 \qquad (3) \]

Entropy:
\[ En = -\sum_{i,j} p(i, j) \log p(i, j) \qquad (4) \]

Homogeneity:
\[ H = \sum_{i,j} \frac{p(i, j)}{1 + |i - j|} \qquad (5) \]

5) Classification
The classification stage is carried out using the Support Vector Machine (SVM) method. The inputs are the parameter values: cancer area, contrast, energy, entropy, and homogeneity. The output is a decision: normal, benign, or malignant. Classification with SVM has 2 stages, training and testing.

A. Training process
The training process in OpenCV can be done using the SVM::train library function. This process builds the SVM model. Training with the SVM method is done in two stages: creation of a .txt file, and the training run itself, which is done through the command prompt.

• Creation of a .txt File
The first step in training is the creation of a .txt file. This file stores the feature extraction values of images that have been labeled as normal, benign lung cancer, or malignant lung cancer. In this process 35 CT-scan images are used: 10 normal, 20 malignant lung cancer, and 5 benign lung cancer.

The .txt file uses numeric labels. In this research there are three labels: label 0 for the parameter values of normal lungs, label 1 for malignant lung cancer, and label 2 for benign lung cancer. Each record consists of five parameter values: the area of the lung cancer, contrast, energy, entropy, and homogeneity.

• SVM Training
After the .txt file is created, the next step is the training process, which is run from the command prompt using the existing library in OpenCV.

Training through the command prompt generates a .model file, which stores the model parameters learned by SVM-train. This .model file is used for prediction in the testing process.

B. Testing Process
The testing process in OpenCV can be done using the SVM::predict library function. This process classifies input samples using the trained SVM. Figure 4 illustrates how the testing process works using the SVM-Predict method.

Fig. 4. How the Testing Process Works

In the testing process, the input is the set of parameter values produced by the feature extraction stage: cancer area, contrast, energy, entropy, and homogeneity of the input CT-scan image. These values are matched against the parameter values from the training process stored in the .model file. The output of this process is 0, 1, or 2, where 0 is normal, 1 is malignant lung cancer, and 2 is benign lung cancer, using equation (6).

\[ y_i (w \cdot x_i + b) > 0, \quad \text{for } i = 1, 2, \ldots, n \qquad (6) \]

Where x_i is the input data, y_i is the output, which has a value of +1 or +2, and w and b are the parameter values. If the output y_i = +1, the result is malignant lung cancer, whereas if y_i = +2, the result is benign lung cancer.

C. Decision Making
The last stage is decision making. It is carried out after obtaining the area, contrast, energy, entropy, and homogeneity of the input image, which are then matched with the parameter values in the database produced by the training process. The decision of the system can be normal, benign lung cancer, or malignant lung cancer.
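
The computations described in this section (the thresholding rule, the pixel-to-millimeter conversion, the four GLCM texture features, and the numeric class labels) can be illustrated with a minimal sketch. It is not the authors' OpenCV/C++ implementation. The function names, the single horizontal pixel offset, the normalization of the GLCM to probabilities, and the base-2 logarithm are all assumptions made for illustration, since the paper does not state them.

```python
import math

# Class labels as described in the paper: 0 normal, 1 malignant, 2 benign.
LABELS = {0: "normal", 1: "malignant", 2: "benign"}


def threshold(gray, t):
    """Binarize a grayscale image: 1 where f(x, y) > T, otherwise 0."""
    return [[1 if value > t else 0 for value in row] for row in gray]


def pixels_to_mm(pixels, dpi=96):
    """Convert a pixel count to millimeters with mm = pixels * 25.4 / dpi."""
    return pixels * 25.4 / dpi


def glcm(gray, levels, dx=1, dy=0):
    """Co-occurrence matrix for one offset (dx, dy), normalized to sum to 1.

    Assumes every pixel is an integer in [0, levels). The offset and the
    normalization are illustrative choices, not taken from the paper.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(gray), len(gray[0])
    total = 0
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                counts[gray[y][x]][gray[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for the chosen offset")
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Contrast, energy, entropy, and homogeneity of a normalized GLCM."""
    contrast = energy = entropy = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            contrast += (i - j) ** 2 * pij
            energy += pij ** 2
            homogeneity += pij / (1 + abs(i - j))
            if pij > 0:  # skip empty cells, where log is undefined
                entropy -= pij * math.log2(pij)  # base 2 is an assumption
    return {
        "contrast": contrast,
        "energy": energy,
        "entropy": entropy,
        "homogeneity": homogeneity,
    }


if __name__ == "__main__":
    tiny = [[0, 0, 1], [1, 2, 2]]  # hypothetical 3-level image
    print(threshold([[10, 200], [128, 129]], 128))
    print(pixels_to_mm(1))
    print(texture_features(glcm(tiny, levels=3)))
    print(LABELS[1])
```

In a full pipeline, the area in millimeters and the four texture values would form the five-element feature vector passed to the trained SVM, and the predicted label would be mapped to a decision through a table such as LABELS.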

IV. EXPERIMENT RESULT AND ANALYSIS

CT-scan images taken from the hospital are still too unclear to be processed, so pre-processing is needed to remove noise from the image and clarify it.

Fig. 5. Grayscale processing results

The output of the preceding grayscale process is the input to the next stage: to facilitate further processing, the grayscale image is converted to a binary image. Thresholding renders the objects that are needed in white and everything else in black. The output of the thresholding process can be seen in Figure 6.

Fig. 6. Thresholding process results

After segmentation with find contour, an image of the separated cancer area is produced. The result of the find contour process can be seen in Figure 7.

Fig. 7. Result Find Contour

After the cancer contour area is known, a reconstruction process is carried out to restore the original color within the contour found. The output of the reconstruction process can be seen in Figure 8.

Fig. 8. Cancer Area Results

After the segmentation process, the next step is feature extraction, which takes the pixel area from the contour and calculates the contrast, energy, entropy, and homogeneity of the cancer object using the Gray Level Co-occurrence Matrix (GLCM) method.

TABLE I
VALUE OF FEATURE EXTRACTION

No   Area (mm)   Contrast    Energy      Entropy    Homogeneity
1    0           0           0           0          0
2    812.1       40523       153977.4    4173.1     155471.8
3    480.2       31318.5     154793.1    3137.7     155995.5
4    6243.3      441635.5    131294.3    32499      140949.9
5    5644.8      481262.3    129497.5    34689.5    139839.8

Table I shows the feature extraction values, where row 1 uses normal data, rows 2 and 3 use benign data, and rows 4 and 5 use malignant data. GLCM feature extraction on the image produces the contrast, energy, entropy, and homogeneity parameters. These feature values are passed to the training process to build the knowledge of the system; training uses 35 data. The knowledge is built using the SVM classification method. In this study, the SVM algorithm is taken from a library for the C++ programming language, and the parameters used are the cancer area (mm), contrast, energy, entropy, and homogeneity. The trained model is then tested to determine the accuracy of the SVM classification algorithm: the SVM method produces an accuracy of 83.33% on 30 testing data.

V. CONCLUSION
This paper discusses the development of a lung cancer detection system based on CT-scan images. The system can help determine whether a lung cancer seen on CT-scan images is benign or malignant, and can thus contribute to the medical field by facilitating the diagnosis of lung cancer. In the system trial, the accuracy of the system's decision in diagnosing benign or malignant lung cancer was 83.33%.

REFERENCES
[1] https://www.who.int/en/news-room/fact-sheets/detail/cancer (accessed 28 June 2019)
[2] E. Dandil, M. Cakiroglu, Z. Eksi, M. Ozkan, O. K. Kurt, and A. Canan, "Artificial neural network-based classification system for lung nodules on computed tomography scans," 2014 6th International Conference of Soft Computing and Pattern Recognition (SoCPaR), Tunis, 2014, pp. 382-386.
[3] Y. F. Riti, H. A. Nugroho, S. Wibirama, B. Windarta, and L. Choridah, "Feature extraction for lesion margin characteristic classification from CT Scan lungs image," 2016 1st International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, 2016, pp. 54-58.
[4] R. Wulandari, R. Sigit, and S. Wardhana, "Automatic lung cancer detection using color histogram calculation," 2017 International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC), Surabaya, 2017, pp. 120-126.
[5] L. Anifah, Haryanto, R. Harimurti, Z. Permatasari, P. W. Rusimamto, and A. R. Muhamad, "Cancer lung detection on CT scan image using artificial neural network backpropagation based gray level co-occurrence matrices feature," 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Bali, 2017, pp. 327-332.
[6] D. P. Kaucha, P. W. C. Prasad, A. Alsadoon, A. Elchouemi, and S. Sreedharan, "Early detection of lung cancer using SVM classifier in biomedical image processing," 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), Chennai, 2017, pp. 3143-3148.
[7] F. Taher, N. Werghi, and H. Al-Ahmad, "Computer-aided diagnosis system for early lung cancer detection," 2015 International Conference on Systems, Signals and Image Processing (IWSSIP), London, 2015, pp. 5-8.
[8] E. Rendon-Gonzalez and V. Ponomaryov, "Automatic Lung nodule segmentation and classification in CT images based on SVM," 2016 9th International Kharkiv Symposium on Physics and Engineering of Microwaves, Millimeter and Submillimeter Waves (MSMW), Kharkiv, 2016, pp. 1-4.
[9] A. Kulkarni and A. Panditrao, "Classification of lung cancer stages on CT scan images using image processing," 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies, Ramanathapuram, 2014, pp. 1384-1388.
[10] S. A. El-Regaily, M. A. M. Salem, M. H. A. Aziz and M. I. Roushdy, "Lung nodule segmentation and detection in computed tomography," 2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, 2017, pp. 72-78.
[11] M. H. Jony, F. Tuj Johora, P. Khatun and H. K. Rana, "Detection of Lung Cancer from CT Scan Images using GLCM and SVM," 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, 2019, pp. 1-6.
