
Prediction Lung Cancer – In Machine Learning Perspective

1st Nikita Banerjee
Computer Science And Engineering
College of Engineering and Technology
Bhubaneswar, India
[email protected]

2nd Subhalaxmi Das
Computer Science And Engineering
College of Engineering and Technology
Bhubaneswar, India
[email protected]

Abstract: Recent years have seen an increasing mortality rate due to lung cancer, so it has become crucial to predict whether a tumor has turned cancerous. If the prediction is made at an early stage, many lives can be saved, and an accurate prediction also helps doctors start treatment. Computed tomography plays a vital role in assessing the condition of a tumor by checking its size, location, etc. In this paper, we propose a framework for predicting cancer at an early stage so that many endangered lives can be saved. Our focus is on two domains of computer science: Digital Image Processing (DIP) and Machine Learning. Digital image processing is used for the pre-processing phase. In the next stage, the pre-processed image is segmented, the segmented image is passed on for feature extraction, and finally the extracted features are used to train machine learning classification algorithms such as SVM (Support Vector Machine), Random Forest, and ANN (Artificial Neural Network). Based on the classification results, a prediction is made as to whether the tumor is benign or malignant. Parameters such as accuracy, recall, and precision are calculated to determine which algorithm has the highest predictive accuracy.

Keywords: Lung Cancer, Edge detection, Segmentation, SVM, Random Forest, ANN

I. INTRODUCTION

With the rapid increase in population, the rates of diseases such as cancer, chikungunya, and cholera are also increasing. Among them, cancer is becoming a common cause of death. Cancer can start almost anywhere in the human body, which is made up of trillions of cells. Normally, human cells grow and divide to form new cells as the body needs them. When cells grow older or become damaged, they die, and new cells take their place. When cancer cells develop, however, this orderly process breaks down. As cells become more and more abnormal, old or damaged cells survive when they should die, and new cells form when they are not needed. These extra cells can divide without stopping and may form growths called tumors. Such a tumor can then start spreading to different parts of the body.

Tumors are of two types, benign and malignant. A benign (non-cancerous) tumor is a mass of cells that lacks the ability to spread to other parts of the body, whereas a malignant (cancerous) tumor is a growth of cells that can spread to other parts of the body; this spreading is called metastasis. There are various types of cancer, such as lung cancer, leukemia, and colon cancer. The incidence of lung cancer has increased significantly since the early 19th century. Lung cancer has various causes, such as smoking, exposure to radon gas, secondhand smoke, and exposure to asbestos. It is of two types: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC is more common than SCLC and generally grows and spreads more slowly. SCLC is almost always related to smoking; it grows more quickly and forms large tumors that can spread widely through the body. These tumors often start in the bronchi near the center of the chest. The lung cancer death rate is related to the total amount of cigarettes smoked [1]. Symptoms that may suggest lung cancer include:

• dyspnoea (shortness of breath with activity),
• haemoptysis (coughing up blood),
• chronic coughing or a change in regular coughing pattern,
• wheezing, chest pain or pain in the abdomen,
• cachexia (weight loss, fatigue, and loss of appetite),
• dysphonia (hoarse voice),
• clubbing of the fingernails (uncommon),
• dysphagia (difficulty swallowing),
• pain in the shoulder, chest, or arm; bronchitis or pneumonia,
• decline in health and unexplained weight loss [1].

Various techniques are used to diagnose lung cancer, such as chest X-ray, computed tomography (CT scan), and magnetic resonance imaging (MRI). With these, the doctor can determine the location of the tumor and decide on treatment accordingly. It is important that the disease is diagnosed at an early stage so that many lives can be saved. Medical images contain a lot of noise, and the presence of noise makes prediction very difficult. For that reason, image processing techniques are applied to the medical image for pre-processing, and a machine learning algorithm is then applied to the pre-processed image to predict lung cancer.

II. ENABLING TERMINOLOGY

A. Segmentation
The objective of lung image segmentation is to extract the size of the lung parenchyma from the pre-processed image and to remove the windpipe, tubular branches, alveoli, and muscles from the image, in order to improve accuracy and reduce complexity during feature extraction. There are


Authorized licensed use limited to: K K Wagh Inst of Engg Education and Research. Downloaded on November 04,2022 at 07:17:11 UTC from IEEE Xplore. Restrictions apply.
numerous ways to perform segmentation, such as the active contour model, edge detection, and watershed segmentation. For example, the steps for segmenting a lung CT scan image are: (1) apply edge detection to the pre-processed image; (2) apply a threshold to the edges, so that edge intensities below the threshold are removed and those above the threshold are kept; (3) apply a morphological technique such as morphological closing or opening; (4) apply morphological segmentation to obtain the volume of the lung [2].

B. Machine Learning in the Context of Lung Cancer
Machine learning is used to classify whether a tumor is benign or malignant. The algorithms used in the context of lung cancer prediction are as follows:

1. Support Vector Machine- SVM helps predict cancer by separating the dataset into two classes using kernel functions, since image data are high dimensional. Because the image data are not linearly separable, they are mapped into a higher-dimensional space (for example, a 3-D space) using a kernel function such as the polynomial kernel, Gaussian kernel, or radial basis function (RBF), and the classes are then separated by a hyperplane. For example, a CT scan of the lung is first pre-processed, and the pre-processed image is then used to train an SVM with the RBF kernel. During training, the images are labeled 1 and 2 for normal and abnormal tumors. After training and testing, a confusion matrix is formed that shows the predictions as correct classifications and misclassifications; from this classification table the accuracy of the prediction can be computed [3].

2. Artificial Neural Network- The artificial neural network is a concept derived from the biological neural network. A multilayer feed-forward neural network consists of an input layer, a hidden layer, and an output layer. The image is first fed into the input layer and forwarded to compute the activation values; at the output layer the activation function is applied and the results are aggregated to obtain O(x). The error, that is, the difference between O(x) and the desired output, is then propagated backward by the backpropagation algorithm, which adjusts the assigned weights to reach an optimal error value [4].

3. Random Forest- A random forest is a collection of decision trees. It uses the concept of bagging and can handle a large number of variables without deleting any of them.

III. SUMMARY OF LITERATURE SURVEY

Many approaches to cancer prediction have already been proposed by various researchers. Among them, Palani et al. [5] proposed IoT-based predictive modeling that uses fuzzy C-means clustering for segmentation and an incremental classification algorithm, based on association rule mining and decision trees, to classify the tumor sets. Based on the output of the incremental classification model, a convolutional neural network is applied together with other features to predict whether the tumor is benign or malignant.

Lynch et al. [6] implemented various machine learning algorithms to predict patient survivability, with performance measured by root mean square error. Each model was trained using 10-fold cross-validation; because the parameters were pre-processed by assigning default values, cross-validation was used to avoid overfitting.

Fenwa et al. [3] proposed a model in which features such as contrast and brightness are extracted from the image dataset using texture-based feature extraction. Two machine learning algorithms, an artificial neural network and a support vector machine, are then applied, and the performance of both is evaluated to compare which algorithm gives higher accuracy.

Öztürk et al. [7] proposed a model in which five feature extraction techniques were each used with individual classification algorithms, to determine which machine learning algorithm gives the highest accuracy with which feature extraction technique.

Jin et al. [8] proposed a model in which the original image is first converted into a binary image, and erosion and dilation are applied to it. The image is then segmented, and region-of-interest extraction is applied to the segmented image to identify the volume or size of the tumor. After extraction, a convolutional neural network with a softmax classification layer is applied to recognize whether the tumor is cancerous.

Sumathipala et al. [9] proposed a model in which the image data are taken from LIDC-IDRI. After the image data are collected, they are filtered to patients who underwent biopsy and whose nodule level is equal to 30. The images whose nodule level is equal to 30 are then segmented, and logistic regression and random forest are applied for prediction.

IV. PROPOSED FRAMEWORK FOR CANCER PREDICTION
Based on the literature survey, a novel model is proposed that consists of a pre-processing block, a segmentation block, a feature extraction block, and a classification block. CT scans are generally used for cancer prediction. However, a CT scan contains noise that cannot be seen by the human eye, so digital image processing plays an important role in obtaining a noise-free image. Digital image processing is the analysis and manipulation of an image to extract useful information from it. It involves several steps. In image pre-processing, the image can be enhanced using histogram equalization, spatial filters, etc. Image restoration can then be performed, in which various kinds of noise, such as salt-and-pepper noise and Gaussian noise, are applied and filters such as the median filter or mean filter are applied to the pre-processed image. After that, color conversion is applied only if the image is a color image, in which case it is converted to gray level. Fig. 1 shows the proposed novel framework.

Image segmentation is a process that divides the image into several segments based on the pixels; once image segmentation is complete, feature extraction can be applied. Feature extraction is a type of dimensionality reduction

where a set of raw data is reduced to more manageable groups of image data for extracting features such as region and texture. After the features are extracted, different machine learning techniques are used to classify the image.

Fig 1. Proposed framework for lung cancer prediction

A. Pre-Processing Layer
The images were collected from LIDC-IDRI. The original images contained a lot of noise, so we first applied histogram equalization to enhance each image and then applied a median filter to the equalized image to remove the noise already present. After obtaining the noise-free image, we added further noise to it and removed that noise again with the median filter, which yielded a clearer picture. The median filter is a non-linear digital filtering technique; it is also used for smoothing images because it does not blur the edges as much as other filtering techniques such as the Gaussian filter or the average filter.

B. Segmentation Layer
Image segmentation is a method of partitioning the image into various parts. After pre-processing, segmentation is applied to the pre-processed image to acquire information from it. We first applied edge detection, which segments the boundary of the image, using the Prewitt operator. A threshold was applied to the operator output, so that intensity values below the threshold are removed and values greater than or equal to the threshold are kept for further segmentation. After obtaining the edge-detected image, we applied watershed segmentation to it. Watershed segmentation uses the concept of a topographical landscape with ridges and valleys, defined by the gray level of the respective pixels or by their gradient magnitude. There are various ways to perform watershed segmentation; here we used watershed segmentation on the gradient. The gradient magnitude is used to pre-process the grayscale image; it has high pixel values along object edges and low pixel values everywhere else. In this way we obtain the final segmented image, from which features can be extracted.

C. Feature Extraction Layer
The output of segmentation is used for feature extraction. We extracted two types of features: region-based and texture-based. The region-based features are area (the number of pixels in the region), perimeter (a vector containing the distance around the boundary of each region in the image), and centroid (the centre of mass of the region, as a 1 × 2 vector). The texture-based features, obtained with a statistical approach, are mean (the average intensity), standard deviation (a measure of average contrast), smoothness (the relative smoothness of the intensity in the region), and entropy (a measure of randomness).

D. Classification Layer
After feature extraction, we apply the classification techniques to both feature sets to compare which machine learning algorithm gives higher accuracy with which type of feature. The machine learning algorithms used are support vector machine, artificial neural network, and random forest. After classification, it can be predicted whether the tumour is cancerous, and with which features the prediction is more accurate. The proposed algorithm is as follows:

Input: Image Data (ID)
Output: Classification as benign or malignant

Step 1: Input the image data (ID)
Step 2: Pre-process the image
   Step 2.1: If the image is noise free
                Go to Step 3
             Else
                Go to Step 2.2
   Step 2.2: Apply image enhancement method
   Step 2.3: Apply filter to enhanced image to reduce noise
Step 3: Segment the image
   Step 3.1: Segment the boundary of the output image generated at Step 2.3 using edge detection
   Step 3.2: After edge detection, apply watershed gradient segmentation
Step 4: Feature extraction
   Step 4.1: Region-based features are extracted, such as area, perimeter, centroid
   Step 4.2: Statistical features are extracted, such as mean, standard deviation, smoothness
Step 5: Apply classification algorithm for training and prediction of tumour as benign or malignant
Step 6: Evaluate parameters such as accuracy, precision, recall
Step 7: End
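Steps 2.2 to 3.1 of the algorithm can be illustrated with a small sketch. This is not the authors' implementation (the paper reports using MATLAB R2017a for image processing); it is a minimal NumPy version under stated assumptions: an 8-bit grayscale input, a 3 × 3 median window, a 3 × 3 Prewitt kernel, and an arbitrary threshold of 100. The function names and the synthetic test image are hypothetical, and the watershed step (3.2) is not covered.

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale image (Step 2.2)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size            # cumulative distribution, ends at 1
    return np.round(255 * cdf[img]).astype(np.uint8)

def median_filter3(img):
    """3x3 median filter with edge replication (Step 2.3)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted copies of the image and take the per-pixel median.
    windows = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)

def prewitt_edges(img, threshold):
    """Prewitt gradient magnitude; values below the threshold are zeroed (Step 3.1)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    # Horizontal gradient: right column minus left column of each 3x3 window.
    gx = sum(p[r:r + h, 2:2 + w] - p[r:r + h, 0:w] for r in range(3))
    # Vertical gradient: bottom row minus top row of each 3x3 window.
    gy = sum(p[2:2 + h, c:c + w] - p[0:h, c:c + w] for c in range(3))
    magnitude = np.hypot(gx, gy)
    return np.where(magnitude >= threshold, magnitude, 0.0)

# Synthetic 8x8 "scan" (assumption): bright square, dark background, one noisy pixel.
image = np.full((8, 8), 20, dtype=np.uint8)
image[2:6, 2:6] = 200
image[0, 7] = 255                              # isolated salt-noise pixel

denoised = median_filter3(equalize_histogram(image))
edges = prewitt_edges(denoised, threshold=100)  # threshold value is an assumption
```

On this toy input the isolated bright pixel is removed by the median filter, and non-zero responses in `edges` appear only around the boundary of the square, which is the behaviour the thresholding step is meant to produce.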

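The features of Step 4 can be sketched in the same way. The paper names the features but does not give formulas, so the definitions below are assumptions: smoothness uses the common textbook form R = 1 - 1/(1 + variance) with the variance normalised to [0, 1], entropy is the Shannon entropy of the grey-level histogram, and the centroid is returned as (row, column). Perimeter is omitted, and both function names are hypothetical.

```python
import numpy as np

def texture_features(region):
    """First-order texture statistics of an 8-bit grayscale region."""
    pixels = region.ravel().astype(float)
    mean = pixels.mean()                       # average intensity
    std = pixels.std()                         # average contrast
    # Smoothness R = 1 - 1/(1 + variance), variance normalised to [0, 1].
    variance = (std / 255.0) ** 2
    smoothness = 1.0 - 1.0 / (1.0 + variance)
    # Shannon entropy (in bits) of the grey-level histogram.
    counts = np.bincount(region.ravel(), minlength=256)
    p = counts[counts > 0] / pixels.size
    entropy = float(-(p * np.log2(p)).sum())
    return {"mean": mean, "std": std, "smoothness": smoothness, "entropy": entropy}

def region_features(mask):
    """Area (pixel count) and centroid (row, col) of a binary region mask."""
    rows, cols = np.nonzero(mask)
    return {"area": int(rows.size), "centroid": (rows.mean(), cols.mean())}

# Toy inputs (assumptions): a two-level patch and a 2x2 region mask.
patch = np.zeros((2, 2), dtype=np.uint8)
patch[0] = 255
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 2:4] = True

texture = texture_features(patch)
region = region_features(mask)
```

A perfectly uniform patch gives zero standard deviation, smoothness, and entropy, while a patch split evenly between two grey levels gives an entropy of one bit; the resulting feature vectors are what a classifier in Step 5 would be trained on.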
V. PERFORMANCE EVALUATION

Model evaluation metrics are used to evaluate the performance of the model. The choice of metrics depends on the machine learning task. The model evaluation parameters are:

A. Confusion Matrix
A confusion matrix gives a detailed description of the classifications and misclassifications in the form of a matrix. It consists of true positives (TP, positive class correctly predicted), true negatives (TN, negative class correctly predicted), false positives (FP, positive class incorrectly predicted), and false negatives (FN, negative class incorrectly predicted).

B. Classification Accuracy
It is used to measure the performance of our prediction. It is the number of correct predictions divided by the total number of predictions made, that is, (TP + TN) / (TP + TN + FP + FN).

C. Recall
It measures the proportion of actual positives that are correctly identified, that is, TP / (TP + FN).

D. Precision
It measures the proportion of positive identifications that are actually correct, that is, TP / (TP + FP).

E. F1 score
The F1 score is the harmonic mean of precision and recall, that is, 2 × Precision × Recall / (Precision + Recall).

VI. RESULT AND DISCUSSION

In the proposed model for classifying a tumour as malignant or benign, the machine learning algorithms used are artificial neural network, random forest, and support vector machine. For both feature sets, region-based and texture-based, the artificial neural network gives the highest accuracy. Comparing the metrics of the proposed model, it can be seen that accuracy increased whereas recall remained lower. Digital image processing was implemented in MATLAB R2017a, and classification using machine learning was implemented in a Jupyter notebook. A comparison between both feature sets is shown below.

TABLE I. REGION BASED FEATURE

Classifier      Accuracy   Precision   Recall   F1 score
Random Forest   79%        100%        50%      67%
SVM             86%        100%        67%      80%
ANN             92%        100%        69%      81%

Fig 2. Performance measure based on region based extraction

TABLE II. TEXTURE BASED FEATURE

Classifier      Accuracy   Precision   Recall   F1 score
Random Forest   70%        89%         47%      62%
SVM             80%        90%         57%      69%
ANN             96%        100%        69%      81%

Fig 3. Performance measure based on texture based extraction

VII. CONCLUSION AND FUTURE SCOPE

The proposed model gives an overview of the prediction of lung cancer at an early stage. After predicting whether the tumour is malignant or benign, we generate a confusion matrix for each machine learning technique, and from the confusion matrix we calculate accuracy, recall, precision, and F1 score.
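How accuracy, recall, precision, and F1 score follow from a confusion matrix can be shown with a short sketch. The function name and the counts passed to it are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts (not taken from the paper): no false positives,
# but half of the malignant cases are missed.
metrics = classification_metrics(tp=5, tn=9, fp=0, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts precision is 1.0 and recall is 0.5, so the F1 score is about 0.67. This shows how a classifier can report perfect precision while still missing many positive cases, which is why recall is examined alongside accuracy.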
From the results we can say that the proposed model can distinguish between benign and malignant tumours. It can be seen that the artificial neural network provides the highest accuracy for both texture-based and region-based features, and from

the recall value we can say that it has correctly identified the largest number of malignant tumours.

In the near future, deep learning is expected to outperform traditional machine learning in the fields of image classification, object recognition, and feature extraction. CNNs are well known for providing high accuracy with a larger number of hidden layers.

REFERENCES

[1] Krishnaiah, V., G. Narsimha, and Dr N. Subhash Chandra. "Diagnosis of lung cancer prediction system using data mining classification techniques." International Journal of Computer Science and Information Technologies 4.1 (2013): 39-45.
[2] Zhang, Junjie, et al. "Pulmonary nodule detection in medical images: a survey." Biomedical Signal Processing and Control 43 (2018): 138-147.
[3] Fenwa, Olusayo D., Funmilola A. Ajala, and A. Adigun. "Classification of cancer of the lungs using SVM and ANN." Int. J. Comput. Technol. 15.1 (2016): 6418-6426.
[4] Daoud, Maisa, and Michael Mayo. "A survey of neural network-based cancer prediction models from microarray data." Artificial intelligence in medicine (2019).
[5] Palani, D., and K. Venkatalakshmi. "An IoT based predictive modelling for predicting lung cancer using fuzzy cluster based segmentation and classification." Journal of medical systems 43.2 (2019): 21.
[6] Lynch, Chip M., et al. "Prediction of lung cancer patient survival via supervised machine learning classification techniques." International journal of medical informatics 108 (2017): 1-8.
[7] Öztürk, Şaban, and Bayram Akdemir. "Application of feature extraction and classification methods for histopathological image using GLCM, LBP, LBGLCM, GLRLM and SFTA." Procedia computer science 132 (2018): 40-46.
[8] Jin, Xin-Yu, Yu-Chen Zhang, and Qi-Liang Jin. "Pulmonary nodule detection based on CT images using convolution neural network." 2016 9th International symposium on computational intelligence and design (ISCID). Vol. 1. IEEE, 2016.
[9] Sumathipala, Yohan, et al. "Machine learning to predict lung nodule biopsy method using CT image features: A pilot study." Computerized Medical Imaging and Graphics 71 (2019): 1-8.
