Lung Disease Prediction Using K-Means Clustering and Naïve Bayes Algorithm

This document proposes using k-means clustering and naive Bayes algorithms to develop a system for predicting lung diseases. The system would analyze lung disease data using Weka tools to classify data into clusters and predict disease status. This approach aims to create a faster and more accurate prediction system to help doctors detect lung diseases earlier for better treatment outcomes.

Uploaded by

Mohammad Farhan

Introduction

Lung cancer accounts for more deaths than any other cancer in both men and women. Lung
cancer has been the fifth leading cause of death in the world over the past 10 years
(World Health Organization 2016). According to the World Health Organization (WHO) report,
lung cancer is the leading cause of cancer death across the world, accounting for 1.58
million deaths, or about 27% of all cancer deaths. The death rate began declining in 1991
in men and in 2003 in women.

Early detection of lung cancer is essential in reducing loss of life. Earlier treatment,
however, requires the ability to detect lung cancer in its early stages, and early diagnosis
requires an accurate and reliable diagnostic procedure that allows physicians to distinguish
benign lung disease from malignant disease.

Health data is growing rapidly worldwide. It is so large and complex that processing it with
traditional data processing techniques is very difficult. To simplify this task, machine
learning techniques such as k-nearest neighbors (KNN), support vector machines (SVM) and
decision trees (DT) have been used. Tools such as Python (pandas) and Weka are widely used
in the data analytics field.

Objective

• To study different disease prediction algorithms and review the literature.

• To design a system for lung disease prediction based on patient data.

• To design a system that achieves higher accuracy in lung disease prediction than
existing systems.

• To implement a system that uses multiple algorithms for increased time efficiency.


Literature survey

Rucha Shinde et al. (2015) observe that people now work on computers for hours at a time
and have little time to take care of themselves. Hectic schedules and consumption of junk
food affect people's health, mainly the heart. The authors therefore implement a heart
disease prediction system that combines two data mining techniques, naïve Bayes and k-means
clustering, and the paper gives an overview of it. The system predicts heart disease from
various patient attributes: it uses the k-means algorithm to group the attributes and the
naïve Bayes algorithm to make the prediction.

V. Krishnaiah et al. (2013) proposed applying classification-based data mining techniques
such as rule-based, decision tree and naïve Bayes classifiers to massive volumes of
healthcare data. The healthcare industry collects huge amounts of data which, unfortunately,
are not mined to discover hidden information. For data preprocessing and effective decision
making, the One-Dependency Augmented Naïve Bayes classifier (ODANB) and the Naïve Credal
Classifier 2 (NCC2) are used. NCC2 is an extension of naïve Bayes to imprecise probabilities
that aims to deliver robust classification even when dealing with small or incomplete data
sets.

S. Sudha et al. (2013) define data mining as sifting through very large amounts of data
for useful information. Some of the most important and popular data mining techniques are
association rules, classification, clustering, prediction and sequential patterns. Data
mining techniques are used for a variety of applications. In the health care industry, data
mining plays an important role in predicting diseases. Detecting a disease normally requires
a number of tests from the patient, but data mining techniques can reduce the number of
tests needed. This reduction plays an important role in time and performance.

T. Karthikeyan et al. (2014) presented a feature extraction algorithm used to improve the
predictive accuracy of classification. The paper applies principal component analysis (PCA)
as the feature evaluator, with a ranker as the search method, and uses naïve Bayes (NB) as
the classification algorithm. It analyzes hepatitis patient data from the UCI (University of
California, Irvine) machine learning repository. The classification model is evaluated on
accuracy and time. The paper concludes that the proposed PCA-NB algorithm performs better
than other classification techniques for hepatitis patients.

Pallavi Mirajkar et al. (2011) note that cancer identification and prediction are a huge
challenge for researchers, and that data mining techniques have revolutionized the whole
process of cancer diagnosis and prognosis. The authors propose an integrated system based on
a combination of data mining techniques such as the analytical hierarchy process, rule-based
association and classification, which helps predict the patient's disease status. Cancer
risk can be discovered by analyzing and identifying various factors and symptoms of the
patient before recommending treatments. The main aim of their system is to help oncologists
and medical practitioners diagnose the patient by analyzing available data and relevant
information.

Priyanka D et al. (2014) state that lung cancer is one of the major causes of death in both
genders compared with all other cancers, and has become one of the most hazardous types of
cancer in the world. Early detection of lung cancer is essential in reducing loss of life.
The paper presents lung disease prediction using the k-means algorithm. The project
comprises three modules. The first is the admin module, where the administrator logs in and
the patient's details are generated; users are then authenticated based on their
credentials. The second is the user module, where the patient enters a username and password
to predict cancer. The third is the cancer prediction module, in which the result is
predicted at the last stage with the help of the k-means algorithm, which groups the input
features into two classes of cancer type (benign and malignant). The project is implemented
with Java as the front end and MySQL as the back end. It aims to implement effective lung
cancer prediction so that the user can know the cancer status, and the authors infer that
k-means is suitable for lung cancer prediction.

Research Methodology

• Data related to lung diseases will be analyzed for data mining using Weka.

• K-means clustering and naïve Bayes techniques will be used.

• The naïve Bayes algorithm will be used as the classification algorithm.

• K-means clustering can handle massive data sets and cluster them efficiently and
quickly.

• A simple and straightforward iterative method will be used to partition the data set into
k clusters.
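
The iterative partitioning step listed above can be sketched as follows. This is a minimal
pure-Python illustration of the standard k-means loop (assign each record to its nearest
centroid, then recompute each centroid as the mean of its members), not the Weka
implementation the proposal intends to use. The function names, the toy records (age,
cigarettes per day) and the fixed seed are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Partition points into k clusters; returns (centroids, assignment)."""
    rng = random.Random(seed)
    # Start from k distinct records chosen at random (an assumed init strategy).
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the partition is stable.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical patient records: (age, cigarettes per day).
records = [(35, 0), (38, 2), (41, 1), (62, 25), (66, 30), (70, 28)]
centroids, labels = k_means(records, k=2)
print(centroids, labels)
```

On this toy data the first three and the last three records end up in separate clusters.
Real patient attributes would normally be scaled first so that no single attribute
dominates the distance.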

Tentative Outcomes

A lung disease prediction system will be developed by combining the naïve Bayes and
k-means algorithms. Weka tools will be used to reduce the execution time of the
algorithms. The prediction system may be faster, less computationally expensive and
more accurate. The proposed system will help doctors efficiently predict lung diseases
in the initial stages for better treatment.
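
One possible way to combine the two algorithms is to treat the cluster number produced by
k-means as an extra attribute and let naïve Bayes predict the disease status from all
attributes together. The proposal does not specify how the combination is done, so this is
only a sketch under that assumption. It uses a categorical naïve Bayes with add-one
(Laplace) smoothing written in pure Python; the attribute names, the cluster values and the
records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    attribute_values = defaultdict(set)  # attribute -> values seen in training
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(attribute, label)][value] += 1
            attribute_values[attribute].add(value)
    return class_counts, value_counts, attribute_values


def predict(model, row):
    """Return the class with the highest log-posterior for one record."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(class)
        for attribute, value in row.items():
            count = value_counts[(attribute, label)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((count + 1) / (n + len(attribute_values[attribute])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical records; "cluster" is assumed to come from a prior k-means step.
rows = [
    {"smoker": "yes", "cough": "yes", "cluster": 1},
    {"smoker": "yes", "cough": "no", "cluster": 1},
    {"smoker": "no", "cough": "yes", "cluster": 1},
    {"smoker": "no", "cough": "no", "cluster": 0},
    {"smoker": "no", "cough": "yes", "cluster": 0},
    {"smoker": "yes", "cough": "no", "cluster": 0},
]
labels = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]

model = train_naive_bayes(rows, labels)
print(predict(model, {"smoker": "yes", "cough": "yes", "cluster": 1}))
```

Working in log space avoids numerical underflow when many attributes are multiplied
together.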

References

• [1] World Health Organization (2011), The top ten causes of death. World Health
Organization (2013), Deaths from coronary heart disease.

• [2] V. Krishnaiah, G. Narsimha, N. Subhash Chandra, “Diagnosis of Lung Cancer
Prediction System Using Data Mining Classification Techniques,” International Journal
of Computer Science and Information Technologies, Vol. 4 (1), 2013, 39–45.

• [3] Rucha Shinde, Sandhya Arjun, Priyanka Patil, “An intelligent heart disease prediction
system using k-means clustering and naïve Bayes algorithm,” IJCSIT, Vol. 6 (1), 2015.

• [4] S. Sudha, S. Vijayarani, “Disease Prediction in Data Mining Technique,” Vol. II,
Issue I, January 2013 (ISSN: 2278-7720).

• [5] T. Karthikeyan, P. Thangaraju, “PCA-NB Algorithm to Enhance the Predictive
Accuracy,” IJET, Vol. 6 (1), 2014.

• [6] Ankit Agrawal, Sanchit Misra, Ramanathan Narayanan, Lalith Polepeddi, Alok
Choudhary, “A Lung Cancer Outcome Calculator Using Ensemble Data Mining on
SEER Data,” BIOKDD 2011, August 2011, San Diego, CA, USA.

• [7] S. S. Mohamed and M. M. A. Salama, “Computer-aided diagnosis for prostate cancer
using support vector machine,” Proceedings SPIE Med. Imag., vol. 5744, pp. 898–906,
2005.

• [8] Mehdi Khundmir Iliyas, “Heart disease prediction using naïve Bayes and k-means
techniques,” IJRPET, Volume 3, Issue 6, June 2017, ISSN: 2454-7875.

• [9] S. Vijayarani and S. Sudha, “An Efficient Clustering Algorithm for Predicting
Diseases from Hemogram Blood Test Samples,” Vol. 8 (17), DOI:
10.17485/ijst/2015/v8i17/52123, August 2015.

• [10] Priyanka D, S. Shehar Bano, “Prediction on lung disease using k-means
algorithm,” IJERT, Vol. 1, Issue 11, 2014.

• [11] Tanupriya Choudhury, Vivek Kumar, “Intelligent Classification & Clustering
of Lung & Oral Cancer through Decision Tree & Genetic Algorithm,”
IJARCSSE, Volume 5, Issue 12, December 2015, ISSN: 2277 128X.

• [12] P. Ramachandran, N. Girija and T. Bhuvaneswari, “Early Detection and
Prevention of Cancer using Data Mining Techniques,” IJCA, Vol. 97, No. 13, 2014.

• [13] Ada, Rajneet Kaur, “A Study of Detection of Lung Cancer Using Data
Mining Classification Techniques,” IJARCSSE, Vol. 3, Issue 3, 2013.
