Lung Disease Prediction Using K-Means Clustering and Naïve Bayes Algorithm
Introduction
Lung cancer accounts for more deaths than any other cancer in both men and women. It has
been the fifth leading cause of death in the world over the past 10 years (World Health
Organization 2016). According to the World Health Organization (WHO) report, lung cancer
is the leading cause of cancer death worldwide, accounting for 1.58 million deaths, or
about 27% of all cancer deaths. The death rate began declining in 1991 in men and in 2003
in women.
Early detection of lung cancer is essential to reducing loss of life. Early treatment,
however, requires the ability to detect lung cancer in its early stages, and early
diagnosis requires an accurate and reliable diagnostic procedure that allows physicians to
distinguish benign lung disease from malignant disease.
Health data is growing rapidly worldwide. Because it is very large and complex, processing
it with traditional data processing techniques is very difficult. To simplify the task,
machine learning techniques such as k-nearest neighbors (KNN), support vector machines
(SVM) and decision trees (DT) have been used. Tools such as Python (pandas) and Weka are
widely used in the data analytics field.
Objective
To design a system that predicts lung disease with higher accuracy than existing
systems.
Literature Review
Rucha Shinde et al. (2015) observe that people nowadays work on computers for hours on end
and have little time to take care of themselves. Hectic schedules and the consumption of
junk food affect people’s health, the heart in particular. The authors implement a heart
disease prediction system that combines two data mining techniques: Naïve Bayes and
k-means clustering. The paper gives an overview of this approach. The system predicts
heart disease from various attributes: the k-means algorithm groups the attributes and the
Naïve Bayes algorithm makes the prediction.
V. Krishnaiah et al. (2013) examined the potential use of classification-based data mining
techniques, such as rule-based, decision tree and Naïve Bayes methods, on massive volumes
of healthcare data. The healthcare industry collects huge amounts of data which,
unfortunately, are not mined to discover hidden information. For data preprocessing and
effective decision making, the One Dependency Augmented Naïve Bayes classifier (ODANB) and
the Naive Credal Classifier 2 (NCC2) are used. NCC2 is an extension of Naïve Bayes to
imprecise probabilities that aims to deliver robust classification even when dealing with
small or incomplete data sets.
S. Sudha et al. (2013) define data mining as sifting through very large amounts of data
for useful information. Some of the most important and popular data mining techniques are
association rules, classification, clustering, prediction and sequential patterns. Data
mining techniques are used for a variety of applications. In the health care industry,
data mining plays an important role in predicting diseases. Detecting a disease normally
requires a number of tests from the patient, but data mining techniques can reduce the
number of tests needed. This reduction plays an important role in time and performance.
T. Karthikeyan et al. (2014) presented an extraction algorithm to improve the prediction
accuracy of classification. The paper applies Principal Component Analysis (PCA) as a
feature evaluator with a ranker as the search method, and uses the Naive Bayes algorithm
as the classifier. It analyzes hepatitis patient data from the UCI (University of
California, Irvine) machine learning repository. The classification model is evaluated in
terms of accuracy and time. The paper concludes that the proposed PCA-NB algorithm
performs better than other classification techniques for hepatitis patients.
Pallavi Mirajkar et al. (2011) note that cancer identification and prediction are a huge
challenge for researchers. The use of various data mining techniques has revolutionized
the whole process of cancer diagnosis and prognosis. The authors propose an integrated
system based on a combination of data mining techniques, such as the analytic hierarchy
process, rule-based association and classification, that helps predict the patient’s
disease status. Cancer risk can be discovered by analyzing and identifying various factors
and symptoms of the patient before recommending treatments. The main aim of the system is
to help oncologists and medical practitioners diagnose the patient by analyzing available
data and relevant information.
Priyanka D et al. (2014) note that lung cancer is one of the major causes of death in both
genders compared with all other cancers and has become one of the most hazardous types of
cancer in the world. Early detection of lung cancer is essential to reducing loss of life.
The paper presents lung disease prediction using the k-means algorithm. The project
comprises three modules. The first is the admin module, where the administrator logs in
and the patient’s details are generated; users are then authenticated based on their
credentials. The second is the user module, where the patient enters a username and
password to request a cancer prediction. The third is the cancer prediction module, in
which the result is predicted at the last stage with the help of the k-means algorithm,
which groups the input features into two classes of cancer type (benign and malignant).
The project is implemented with Java as the front end and MySQL as the back end. It aims
to implement effective lung cancer prediction so that the user can learn their cancer
status. The authors infer that k-means is suitable for lung cancer prediction.
Research Methodology
Data related to lung diseases will be analyzed for data mining through Weka.
K-means clustering can handle massive data and cluster it efficiently and
quickly.
A simple and straightforward iterative method will be used to partition the data set
into k clusters.
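The iterative partitioning step can be illustrated with a minimal sketch of the standard
k-means (Lloyd's) procedure in plain Python. This is only an illustration under stated
assumptions, not the Weka implementation the study intends to use: the function names,
the two attributes (age and pack-years of smoking) and all record values are invented for
the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Partition points into k clusters with Lloyd's iterative algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct records as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed, so the method has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical records: (age, pack-years of smoking). All values are invented.
records = [(35, 2), (40, 5), (38, 3), (66, 40), (70, 45), (68, 38)]
labels, centroids = k_means(records, k=2)
print(labels)
print(centroids)
```

The two steps (assign each record to the nearest centroid, then recompute the centroids)
repeat until the assignments stop changing. On real patient data the attributes would
normally be scaled first so that no single attribute dominates the distance.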
Tentative Outcomes
A lung disease prediction system will be developed by combining the Naïve Bayes and
K-Means algorithms. Weka tools will be used to reduce the execution time of the
algorithms. The prediction system is expected to be faster, less computationally
expensive and more accurate. The proposed system will help doctors efficiently predict
lung diseases in their initial stages for better treatment.
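One possible way to combine the two algorithms is sketched below in plain Python. The
proposal does not fix how the combination is done, so the design here is an assumption:
k-means groups the numeric attributes, and the resulting cluster id is passed, together
with a categorical symptom, to a Naïve Bayes classifier with Laplace smoothing. All
function names, attributes and record values are hypothetical.

```python
import math
import random
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])),
    )


def fit_centroids(points, k, iters=20, seed=0):
    """Plain k-means with a fixed number of iterations; returns the centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(col) / len(groups[c]) for col in zip(*groups[c]))
            if groups[c] else centroids[c]  # an empty cluster keeps its centroid
            for c in range(k)
        ]
    return centroids


def train_nb(rows, labels):
    """Categorical Naive Bayes: count classes and per-feature value frequencies."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)  # (feature index, value, class) -> count
    values = defaultdict(set)        # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for i, v in enumerate(row):
            value_counts[(i, v, y)] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def predict_nb(model, row):
    """Return the class with the highest log-posterior under Laplace smoothing."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior
        for i, v in enumerate(row):
            count = value_counts.get((i, v, y), 0)
            score += math.log((count + 1) / (n_y + len(values[i])))
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical, invented data: (age, pack-years), chronic cough flag, risk label.
numeric = [(35, 2), (40, 5), (38, 3), (66, 40), (70, 45), (68, 38)]
cough = ["no", "no", "yes", "yes", "yes", "no"]
risk = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(numeric, k=2)
train_rows = [(nearest(p, centroids), c) for p, c in zip(numeric, cough)]
model = train_nb(train_rows, risk)

new_patient = ((67, 42), "yes")
prediction = predict_nb(model, (nearest(new_patient[0], centroids), new_patient[1]))
print(prediction)  # "high" for this toy data
```

Turning the numeric attributes into a single cluster id keeps the Naïve Bayes step purely
categorical, which is one reason such a combination can be cheap to compute. Whether it
also improves accuracy would have to be measured on real lung disease data.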
References
[1] World Health Organization (2011). The top ten causes of death.
[2] World Health Organization (2013). Deaths from coronary heart disease.
[3] Rucha Shinde, Sandhya Arjun, Priyanka Patil, “An intelligent heart disease prediction
system using k-means clustering and naïve Bayes algorithm,” IJCSIT, vol. 6(1), 2015.
[4] S. Sudha, S. Vijayarani, “Disease Prediction in Data Mining Technique,” Vol. II, Issue
I, January 2013 (ISSN: 2278-7720).
[6] Ankit Agrawal, Sanchit Misra, Ramanathan Narayanan, Lalith Polepeddi, Alok
Choudhary, “A Lung Cancer Outcome Calculator Using Ensemble Data Mining on
SEER Data,” BIOKDD 2011, August 2011, San Diego, CA, USA.
[8] Ms. Mehdi Khundmir Iliyas, “Heart disease prediction using naïve Bayes and k-means
techniques,” IJRPET, vol. 3, issue 6, June 2017, ISSN: 2454-7875.
[10] Priyanka D, Ms. S Shehar Bano, “Prediction on lung disease using k-means
algorithm,” IJERT, vol. 1, issue 11, 2014.
[13] Ada, Rajneet Kaur, “A Study of Detection of Lung Cancer Using Data
Mining Classification Techniques,” IJARCSSE, vol. 3, issue 3, 2013.