Boban 2020
Abstract—Machine learning (ML) is a significant subset of Artificial Intelligence (AI) that plays a key role in medical diagnosis. Its advantage is that it can automatically learn, extract and translate features from data sets such as images, text or video, without relying on traditional hand-coded rules. This paper focuses on recognizing and classifying lung diseases with ML algorithms. The data set comprises 400 lung CT scan images covering bronchitis, emphysema, pleural effusion, cancer, and normal lungs. The input image is analyzed and classified using the MLP, KNN and SVM classifiers. After feature extraction, the output is segmented and the accuracy of the classifiers is compared. A raw CT scan image given to a classifier as input contains irrelevant information, so the Gray-Level Co-occurrence Matrix (GLCM) is used to extract the most relevant features. The MLP classifier achieves 98% accuracy, the SVM 70.45% and the KNN 99.2%. These classifiers will help doctors prescribe the most effective treatment for a patient.
Index Terms—Machine learning (ML), Artificial Intelligence (AI), Gray-Level Co-occurrence Matrix (GLCM), Multilayer perceptron (MLP), K-nearest neighbors (KNN), Support vector machine (SVM)
I. INTRODUCTION
Computed tomography (CT) images can be used to distinguish many body tissues. A medical diagnosis performed using traditional X-rays provides multiple images from within the body. Cross-sectional CT scan images provide a variety of body planes from which a 3D view can be generated. CT scans include high-resolution pictures of the lungs that can be viewed on a PC or printed on film. The lungs are responsible for oxygen supply and carbon dioxide exhalation. Many individuals have a smoking habit, which leads to infection and biological disorders that cause pulmonary diseases.
This paper covers four disease types (bronchitis, emphysema, pleural effusion, and cancer) as well as normal lung CT scans. Inflammation of the airways between the nose and the lung tissue causes bronchitis, which can in turn cause pneumonia. Emphysema is a form of chronic obstructive pulmonary disease (COPD) that damages the lung air sacs when germs affect the pleural space; cigarette smoking triggers it. Pleural effusion, otherwise referred to as lung water, is due to the accumulation of excess fluid between the layers of the pleura. It impairs inhalation and exhalation and reduces lung tissue growth. Lung cancer is caused by uncontrolled cell division in the lung and affects breathing.
When the CT scan image itself is used as an input, we require a
Authorized licensed use limited to: Carleton University. Downloaded on October 04,2020 at 04:57:11 UTC from IEEE Xplore. Restrictions apply.
The proposed method is carried out in four phases. In the first phase, pre-processing is done using median filters and morphological smoothing. In the second phase, features are derived from the pre-processed image using the GLCM (Gray-Level Co-occurrence Matrix) methodology. In the third phase, detection and separation of lung ailments is accomplished using the MLP (Multilayer Perceptron), SVM (Support Vector Machine) and KNN (k-nearest neighbor) classifiers. The final phase is the performance evaluation of the classifiers. Software such as MATLAB or Python can be used to implement these algorithms.
The rest of the paper is organized as follows. Sections II and III describe the literature review and the methodology of the work. The simulation results are discussed in Section IV. Finally, Section V concludes the paper.
II. LITERATURE REVIEW
The paper [1] discusses the potential of an artificial neural network (ANN) for medical diagnosis and for predicting osteoporosis from risk factors. The ANN is developed in tandem with Probabilistic Neural Networks (PNN) and is based on MLPs with back propagation. In [2], the authors proposed an MLP backpropagation neural network to predict heart disease. Various MLP training functions are compared and the best one is chosen for training; an MLP with the TRAINBR training algorithm gives 96.3% accuracy in heart disease prediction. In [3], the authors developed an ANN with histogram-based gradient genomic features for predicting lung cancer; it provides 95.90% accuracy and a mean square error of 0.0159. In [4], the researchers suggested two forms of ANN to identify and diagnose Parkinson's disease: MLP (Multilayer Perceptron) and RBF (Radial Basis Function). In the accuracy comparison the MLP is the best classifier with 93.22% accuracy, while the RBF classifier offers just 86.44% on the same set of data. This can assist neurologists in their medical diagnosis. In [5], the researchers suggested a diagnostic method to assist doctors in the diagnosis of heart disease, based on patient clinical conditions translated into a numerical representation. Two classifiers were proposed: a Multi-Layer Perceptron neural network (MLP) and a Support Vector Machine (SVM). They considered the classification of two heart diseases and used the collected database to evaluate the performance of the classifiers.
In [6], the authors proposed a Convolutional Neural Network (CNN) for classifying lung tumors as malignant or benign. With the CNN as classifier the accuracy reached 96%, which is better than that of a traditional neural classifier. In [7], the authors focus on early lung cancer detection and propose a computational method, namely particle swarm optimization (PSO) with a neural network. In [8], the authors concentrated on detecting lung cancer at an early stage. For the identification, a non-parametric method, the genetic K-Nearest Neighbor (GKNN) algorithm, is suggested. In this method K (50-100) is chosen for each iteration using genetic algorithms, and performance tests give an accuracy in the range of 90%. In [9], the researchers introduced a K-nearest neighbors classifier to classify cancer images as benign or malignant. The overall classification accuracy obtained is 97%, the learning time is 3 seconds and the nearest neighbor distance is 0.20889. In [10], the authors applied an SVM-based approach to the diagnosis of lung cancer. The CLAHE equalization technique improved the contrast of the CT scan image, after which the method of walk segmentation was implemented. In [11], the authors used median filters in pre-processing to minimize noise without affecting performance. Feature extraction was then carried out, the extracted features were selected with the PSO (particle swarm optimization) algorithm, and the lung diseases were classified. In [12], the authors proposed a KNN-based classifier combined with a genetic algorithm for heart disease detection, recording results for different values of k.
In [13], features were derived with the GLCM method and a back propagation neural network was used to classify the images. The classifier reaches 95% accuracy in the training stage and 81.25% in the evaluation stage. The paper [14] explores the use of a neural network to diagnose various patterns of rubella, German measles and chickenpox based on skin symptoms. The ANN examines the signs and is reported to provide better predictions and credibility than a human doctor, so patients can be monitored entirely on the basis of the signs found for skin problems. In [15], a novel approach is suggested to achieve better classification rates by integrating the predictive t-test and absolute ranking. Classification with linear SVM, proximal SVM and Newton SVM is also explored, and a descriptive study of the various extraction techniques is presented. In [16], the authors describe fractal image compression, its properties, and a method to improve its performance.
III. METHODOLOGY
A. Multilayer perceptron (MLP)
A neuron, also known as an artificial neuron, is the basic building block of a neural network such as the MLP. It takes a certain number of weighted input signals and a bias and produces an output based on an activation function, as shown in Fig. 1. A neuron with 5 inputs has 5 weights that can be adjusted during training.
Back propagation - After forward propagation, a predicted value is obtained at the output. To find the error, the actual output value is compared with the predicted one, usually through a loss function; their difference is the error. To reduce the error, the derivative of the error with respect to each weight in the network is calculated. The gradient calculation starts from the last-layer weights and moves backwards until it reaches the first layer. Each gradient value, typically scaled by a learning rate, is then subtracted from the current weight, and the result becomes the new weight. The input is then applied again to check whether the error has reduced. This continues until the error reaches a minimum value.
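The update rule described above can be sketched for a very small network. The example below is only an illustration under stated assumptions: a 2-2-1 sigmoid network, a squared-error loss, a learning rate of 0.5, hand-picked initial weights and a toy OR data set, none of which come from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation: hidden activations h and network output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def mean_squared_error(data, W1, b1, W2, b2):
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2
               for x, t in data) / len(data)

def train_step(x, t, W1, b1, W2, b2, lr):
    """One backpropagation update; returns the new output bias."""
    h, y = forward(x, W1, b1, W2, b2)
    # Gradient at the output layer for E = 0.5 * (y - t)^2.
    delta_out = (y - t) * y * (1.0 - y)
    # Propagate backwards to the hidden layer (uses W2 before its update).
    delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
    # Subtract the scaled gradients from the current weights.
    for j in range(len(h)):
        W2[j] -= lr * delta_out * h[j]
        for i in range(len(x)):
            W1[j][i] -= lr * delta_hid[j] * x[i]
        b1[j] -= lr * delta_hid[j]
    return b2 - lr * delta_out

# Toy OR data set and hand-picked starting weights (assumptions).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
W1 = [[0.15, -0.2], [0.3, 0.25]]
b1 = [0.0, 0.0]
W2 = [0.2, -0.1]
b2 = 0.0

initial_loss = mean_squared_error(data, W1, b1, W2, b2)
for epoch in range(2000):
    for x, t in data:
        b2 = train_step(x, t, W1, b1, W2, b2, lr=0.5)
final_loss = mean_squared_error(data, W1, b1, W2, b2)
print(initial_loss, final_loss)
```

The loop repeats the forward pass and the weight update until a fixed number of epochs is reached; a stopping test on the error itself would match the description more literally.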
B. K-nearest neighbors (KNN)
The K-nearest neighbors (KNN) algorithm uses a similarity function to estimate values for new data points: a new data point is assigned a value depending on how closely it matches the points in the training set.
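The nearest-neighbor procedure of this subsection (compute the Euclidean distance to every training row, sort in ascending order, take the top k rows and pick the most common class) can be sketched as follows. The feature vectors, the labels and the value k = 3 are hypothetical and serve only to make the example runnable.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Distance from the query to every training row.
    distances = [(euclidean_distance(row, query), label)
                 for row, label in zip(train_rows, train_labels)]
    # Sort in ascending order of distance.
    distances.sort(key=lambda pair: pair[0])
    # Take the top k rows and return the most common class among them.
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical two-feature training set with two classes.
train_rows = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
train_labels = ["normal", "normal", "normal",
                "cancer", "cancer", "cancer"]

print(knn_predict(train_rows, train_labels, [1.0, 1.0]))
print(knn_predict(train_rows, train_labels, [5.0, 5.0]))
```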
... training row is then calculated. The list is then sorted in ascending order of distance, with the Euclidean distance as the distance metric.
Phase 3: The top k rows are taken from the sorted list. The most common class among them is the predicted class.
C. Support Vector Machine
A multi-class SVM assigns labels to instances using support vector machines, where the labels are drawn from a finite set of several elements. The approach used here reduces the single multi-class problem to several binary classification problems via a one-vs-all approach, which builds binary classifiers that each distinguish one label from the rest.
From Fig. 2, the input image (i.e. an RGB image) is first converted to grayscale and passed through a median filter to remove noise and smooth the image. The filtered image is then passed to GLCM so that certain parameters (Contrast, Correlation, Energy, Homogeneity, Mean, Standard deviation, Entropy, RMS) can be extracted. Segmentation is then performed to identify the affected area. Finally, the images are passed to the classifier, where the classification takes place. After applying the classification techniques to the same dataset, the KNN classifier is found to have higher accuracy than the simple MLP and SVM classifiers.
Fig. 3. CT scan image.
... affecting the sharpness of the image. Fig. 3 shows the CT scan images given as input: a-bronchitis, b-emphysema, c-pleural effusion, d-normal, e-lung cancer. The first column corresponds to the original image, the second to the gray image and the last to the filtered image.
Fig. 4. Feature matrix.
Fig. 4 shows the features extracted using the GLCM function. Only eight features are taken, and these are given to the classifier to identify the disease and classify it correctly.
Fig. 5. Result.
tp - True positive (the actual class is correctly predicted).
tn - True negative (a sample from another class is correctly predicted as not belonging to the class).
fp - False positive (a sample from another class is wrongly predicted as belonging to the class).
The output is obtained as a probability, as shown in Fig. 9. In this figure the value in the third row is the highest, i.e. the CT scan image belongs to the cancer class.
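The true/false positive and negative counts used in this section can be tallied per class by treating one class as "positive" and all others as "negative". The sketch below is illustrative only: the label lists are invented, and the false-negative count and the accuracy formula (tp + tn) / (tp + tn + fp + fn) are standard definitions assumed here because they do not appear in the surviving text.

```python
def confusion_counts(actual, predicted, positive_class):
    """Count tp, tn, fp, fn for one class against all the others."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive_class and a == positive_class:
            tp += 1  # class present and predicted
        elif p != positive_class and a != positive_class:
            tn += 1  # class absent and not predicted
        elif p == positive_class:
            fp += 1  # class predicted but absent
        else:
            fn += 1  # class present but missed (assumed definition)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly (assumed standard formula)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical labels for six CT images.
actual = ["cancer", "cancer", "normal", "bronchitis", "normal", "cancer"]
predicted = ["cancer", "normal", "normal", "cancer", "normal", "cancer"]

counts = confusion_counts(actual, predicted, "cancer")
print(counts, accuracy(*counts))
```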
V. CONCLUSION
In this project, a CT scan image of the lungs in JPG format is given as input to the program. After pre-processing, i.e. conversion to a gray image and noise removal, the image is fed to feature extraction using GLCM. This yields a matrix that contains only the needed features, which reduces the number of variables and so saves time and memory. The matrix is then given to the trained classifiers and their performances are compared. Segmentation is done using masking and thresholding. The comparison shows that KNN (K-nearest neighbor) is more accurate than the MLP (Multilayer perceptron) and Support Vector Machine (SVM) classifiers.
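To make the pre-processing and feature-extraction steps summarized above concrete, the following sketch applies a 3x3 median filter and computes a normalized GLCM with a few texture features on toy images. It is an illustration under assumptions, not the implementation used in this work: the images, the four gray levels, the horizontal neighbor offset and the feature formulas (contrast as the sum of p(i,j)(i-j)^2, energy as the sum of p(i,j)^2, homogeneity as the sum of p(i,j)/(1+|i-j|), entropy in bits) are all chosen for the example, and only four of the eight features are shown.

```python
import math
from statistics import median

def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are left unchanged
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = int(median(window))
    return out

def glcm(img, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix for the pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                counts[img[r][c]][img[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """A subset of texture features computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v ** 2 for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }

# Toy example: a single bright noise pixel is removed by the median filter.
noisy = [[1, 1, 1], [1, 3, 1], [1, 1, 1]]
denoised = median_filter_3x3(noisy)

# Toy 4-level gray image used for the texture features.
img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
p = glcm(img, levels=4)
features = glcm_features(p)
print(denoised[1][1], features)
```

In a full pipeline the resulting feature values for each image would form one row of the feature matrix passed to the classifiers.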
... infrastructures required for completing this project to Dr. Rajesh Kannan Megalingam. I appreciate everyone who helped me get this project done in good time.
REFERENCES
[1] Dimitrios H. Mantzaris, George C. Anastassopoulos, Dimitrios K. Lymberopoulos, "Medical disease prediction using artificial neural networks", 2008 8th IEEE International Conference on BioInformatics and BioEngineering, Oct. 2008, DOI: 10.1109/BIBE.2008.4696782.
[2] Durairaj M, Revathi V, "Prediction of heart disease using back propagation mlp algorithm", International Journal of Scientific Technology Research, volume 4, issue 08, August 2015.
[3] Emmanuel Adetiba, Oludayo O. Olugbara, "Lung cancer prediction using neural network ensemble with histogram of oriented gradient genomic features", The Scientific World Journal, Volume 2015, Article ID 786013, http://dx.doi.org/10.1155/2015/786013.
[4] Farhad Soleimanian Gharehchopogh, Peyman Mohammadi, "A case study of parkinson's disease diagnosis using artificial neural networks", International Journal of Computer Applications (0975 – 8887), vol. 73, No. 19, July 2013.
[5] Tabreer T. Hasan, Manal H. Jasim, Ivan A. Hashim, "Heart disease diagnosis system based on multi-layer perceptron neural network and support vector machine", International Journal of Current Engineering and Technology, vol. 7, Oct 2017.
[6] S. Sasikala, M. Bharathi, B. R. Sowmiya, "Lung cancer detection and classification using deep cnn", International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-8 Issue-2S, December, 2018.
[7] Dr. S. Senthil, B. Ayshwarya, "Lung cancer prediction using feed forward back propagation neural networks with optimal features", International Journal of Applied Engineering Research, ISSN 0973-4562, vol. 13, Number 1, pp. 318-325, 2018.
[8] P. Bhuvaneswari, Dr. A. Brintha Therese, "Detection of cancer in lung with k-nn classification using genetic algorithm", 2nd International Conference on Nanomaterials and Technologies, 2014.
[9] P. Thamilselvan, Dr. J. G. R. Sathiaseelan, "An enhanced k nearest neighbor method to detecting and classifying mri lung cancer images for large amount data", International Journal of Applied Engineering Research, ISSN 0973-4562, vol. 11, Number 6, pp. 4223-4229, 2016.
[10] R Sathishkumar, Kalaiarasan K, Prabakaran A, Aravind M, "Detection of lung cancer using svm classifier and knn algorithm", International Journal of Scientific Research and Review, Volume 8, Issue 3, 2019.
[11] Tejinder Kaur, Neelakshi Gupta, "A new algorithm for classification of lung diseases", International Journal of Advances in Electronics and Computer Science, ISSN: 2393-2835, Volume-2, Issue-9, Sept.-2015.
[12] M. Akhil Jabbar, B. L. Deekshatulu, Priti Chandra, "Classification of heart disease using k-nearest neighbor and genetic algorithm", International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA), 2013.
[13] Kusworo Adi, Catur Edi Widodo, Aris Puji Widodo, Rahmat Gernowo, Adi Pamungkas, Rizky Ayomi Syifa, "Detection lung cancer using gray level co-occurrence matrix (glcm) and backpropagation neural network classification", Journal of Engineering Science and Technology Review, March 2018.
[14] Monisha M, Suresh A, Rashmi M R, "Artificial Intelligence Based Skin Classification Using GMM", Journal of Medical Systems, vol. 43, no. 1, p. 3, 2018.
[15] Arunkumar Chinnaswamy, Ramakrishnan S, "Two Step Feature Extraction Method for Microarray Cancer Data using Support Vector Machines", International Journal of Computer Applications, vol. 85, no. 8, pp. 34-42, 2014.
[16] Loganathan D, Amudha J, Mehata K.M, "Classification and feature vector techniques to improve fractal image coding", IEEE Region 10 Annual International Conference, Proceedings/TENCON, Volume 4, Bangalore, pp. 1503-1507, 2003.