
Diabetes Prediction Using Data Mining

This document describes a research project aimed at predicting diabetes using data mining techniques. It discusses motivations for the research due to the growing problem of diabetes. The research methodology will collect diabetes-related data, preprocess it, use classification algorithms like Naive Bayes, Decision Trees and Random Forest to predict diabetes. Previous related work that used similar techniques is also reviewed. The predicted outcomes will help physicians make more informed decisions to potentially diagnose and manage diabetes earlier.

Uploaded by

Fariha Tabassum

Diabetes Prediction Using Data Mining

Prepared for:
Thesis Committee, Dept. of CSE, IUBAT

Prepared by:
Fahima Afroz Rozy – 17103091
Fariha Tabassum – 17103092

Supervised by:
Nusrath Tabassum
Lecturer, Dept. of CSE, IUBAT
Contents
• Introduction
• Motivation
• Problem Statement
• Literature Review
• Research Methodology
• Conclusion
Introduction
• Data Mining
• It can be used to analyze large volumes of medical data.
• In medical science, data mining can be used in predictive medicine, healthcare management, disease prediction, etc.
• Diabetes is an incurable disease.
• Data mining algorithms are used to predict diabetes, and the accuracy of their predictions is tested.
Motivation
• According to the WHO, more than 422 million people are suffering from diabetes.
• Diabetes is the seventh leading cause of death.
• Young people are now the most affected by it.
• Type-1 diabetes: IDDM (insulin-dependent diabetes mellitus).
• Type-2 diabetes: NIDDM (non-insulin-dependent diabetes mellitus).
• Gestational diabetes: the third main type, which develops during pregnancy.
• There is no permanent cure, but it can be managed with proper treatment.
Problem Statement
• Diabetes is growing at an alarming rate.
• Diabetes is incurable, but if it is predicted at an early stage, it can be managed with treatment.
• Clinical decisions are often based on doctors' experience rather than on the rich data available in databases.
• The objective of this research is to find new features and factors that can change the prediction of diabetes.
• The proposed system will predict an outcome based on a given input.
• The algorithm analyses the input and produces a prediction.
Literature Review
1. “Prediction on Diabetes Using Data mining Approach” by Pardha Repalli, Oklahoma State University.
• In this paper they used:
   • Variable selection node
   • Cross Industry Standard Process
   • Decision Tree Algorithm
   • Regression Model
Literature Review
• The reported average squared error is 0.043.
• According to their research, people above 45 years of age are the most affected by diabetes.
• They reworked existing information from different databases into new research and results.
Literature Review
2. “Prediction of Diabetes Using Bayesian Network” by Mukesh Kumari, Dr. Rajan Vohra and Anshul Arora.
• In this paper they used:
   • Decision Tree Algorithm
   • Naïve Bayes Algorithm
   • Random Tree
   • NBTree
   • Weka Tool
   • Bayesian Network
Literature Review
• In this paper they used 206 records.
• The reported accuracy of the Bayesian network is 99.51%, which is high.
• The framework includes some initial components, such as login, entering side effects into the system, and recommending medications.
• When symptoms occur, the patient needs a specialist's help, but a specialist may not be accessible for some reason. This can be a limitation of the paper.
Research Methodology
Figure: Framework for Diabetes Prediction
Stages shown: Data Collection, Data Pre-processing, Training Dataset, Test Dataset, Classifier, and the outputs Positive and Negative.

Research Methodology
• The dataset we are going to use contains records of 769 patients.
• The features are:
   • Pregnancies
   • Glucose level
   • Blood pressure
   • BMI (Body Mass Index)
   • Skinfold thickness
   • 2-hour insulin value
   • Diabetes pedigree function
   • Age
   • Outcome (the class label to be predicted)
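
The feature list above can be pictured as one record per patient. The following is only a minimal sketch: the class name `PatientRecord`, the field names, the 0/1 coding of the outcome, and the sample values are all assumptions made for illustration, not details taken from the dataset.

```python
from dataclasses import dataclass, astuple


@dataclass
class PatientRecord:
    """One row of the dataset; field names are illustrative assumptions."""
    pregnancies: int
    glucose: float
    blood_pressure: float
    bmi: float
    skinfold_thickness: float
    insulin_2h: float
    diabetes_pedigree: float
    age: int
    outcome: int  # class label: 1 = positive, 0 = negative (assumed coding)


def split_features_label(record: PatientRecord) -> tuple[list[float], int]:
    """Separate the eight input features from the outcome label."""
    *features, label = astuple(record)
    return [float(v) for v in features], label


# Hypothetical values, made up for illustration only.
example = PatientRecord(2, 120.0, 70.0, 28.5, 25.0, 90.0, 0.45, 35, 0)
features, label = split_features_label(example)
```

Keeping the outcome separate from the eight inputs mirrors the framework figure, where the classifier receives the features and produces a Positive or Negative result.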
Research Methodology
• A classification technique assigns the items in a collection to target categories.
• It begins with records whose class labels are known.
• Classification models are tested by comparing the predicted values to the known target values in a set of test data.
• It is a simple and efficient technique for data mining research.
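
The testing step described above (comparing predicted values to known target values on held-out data) might look roughly like this. It is a sketch under stated assumptions: the helper names `split_train_test` and `accuracy`, the 25% test fraction, the sample readings, and the fixed threshold that stands in for a trained classifier are all hypothetical.

```python
import random


def split_train_test(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and hold out a fraction of the records for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(rows[i], labels[i]) for i in train_idx]
    test = [(rows[i], labels[i]) for i in test_idx]
    return train, test


def accuracy(predicted, actual):
    """Fraction of predictions that match the known target values."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical glucose readings and outcomes, made up for illustration.
glucose = [85, 90, 100, 110, 140, 150, 165, 180]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

train, test = split_train_test(glucose, outcome)

# Stand-in "classifier": a fixed threshold, not a trained model.
predicted = [1 if g >= 125 else 0 for g, _ in test]
actual = [y for _, y in test]
test_accuracy = accuracy(predicted, actual)
```

Accuracy is only one possible measure; it is used here because it matches the comparison of predicted and known values described in the bullets.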
Research Methodology
1. Naïve Bayes Classifier
• It is based on Bayes' theorem.
• It is a classification algorithm.
• It assumes conditional independence: given the class, each attribute value is independent of the values of the other attributes.
• It predicts the class of a test data set easily and quickly.
• It performs well with categorical input variables.
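
To make the independence assumption concrete, here is a minimal sketch of a Gaussian Naïve Bayes classifier. The Gaussian variant is chosen here only because the listed features are numeric; it is an assumption, not necessarily the variant the project will use. The function names and the sample (glucose, BMI) values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and a per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small constant avoids division by zero
        model[label] = (prior, stats)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class with the highest log-posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            # Independence assumption: per-feature log-likelihoods simply add up.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical (glucose, BMI) pairs and outcomes, made up for illustration.
rows = [(85, 22.0), (90, 24.5), (100, 26.0), (150, 33.0), (165, 35.5), (180, 31.0)]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
low_risk = predict_gaussian_nb(model, (95, 25.0))
high_risk = predict_gaussian_nb(model, (170, 34.0))
```

Because each feature contributes its own term to the score, training only needs one pass over the data per class, which is why the method is fast.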
Research Methodology
2. Decision Tree Algorithm
• A decision tree can be used as a classification or a regression model.
• It has a tree-like structure.
• It breaks a large data set down into smaller subsets.
• It shows all the possible outcomes and follows each path to a conclusion.
• It can handle both categorical and numerical data.
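
The way a tree breaks a data set into smaller subsets can be sketched with a single split. The example below picks one threshold on one numeric feature using Gini impurity; the choice of Gini as the split criterion, the function names, and the sample values are assumptions for illustration, and a full tree would apply the same step recursively to each subset.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return the threshold that minimises the weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical glucose readings and outcomes, made up for illustration.
glucose = [85, 90, 100, 110, 140, 150, 165, 180]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_split(glucose, outcome)
```

On this made-up data the two subsets are perfectly separated, so no further splitting would be needed; real data usually requires several levels.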
Research Methodology
3. Random Forest Algorithm
• It is a classification algorithm based on many decision trees.
• It is used to obtain better predictive performance than a single tree.
• It combines the predictions of multiple decision trees.
• It runs efficiently on large data sets.
• It can handle a large number of input variables without variable deletion.
• Its predictions are often highly accurate.
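
The idea of combining many trees can be sketched as follows. This is a simplified illustration, not a full Random Forest: each "tree" is a one-split stump trained on a bootstrap sample of a single feature, and the random feature selection used by real Random Forests is omitted because there is only one feature. All function names, the number of trees, and the sample values are hypothetical.

```python
import random
from collections import Counter


def train_stump(values, labels):
    """Fit a one-split tree: pick the threshold with the fewest training errors."""
    majority = Counter(labels).most_common(1)[0][0]
    ordered = sorted(set(values))
    best = (len(labels) + 1, None, majority, majority)  # (errors, threshold, left, right)
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, value):
    """Classify one value with a single stump."""
    threshold, left_label, right_label = stump
    if threshold is None:  # sample had one distinct value: always predict the majority
        return left_label
    return left_label if value <= threshold else right_label


def train_forest(values, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the records."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(values) records with replacement.
        idx = [rng.randrange(len(values)) for _ in range(len(values))]
        forest.append(train_stump([values[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, value):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(predict_stump(s, value) for s in forest)
    return votes.most_common(1)[0][0]


# Hypothetical glucose readings and outcomes, made up for illustration.
glucose = [85, 90, 100, 110, 140, 150, 165, 180]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(glucose, outcome)
```

Calling `predict_forest(forest, 170)` returns the class that most stumps vote for. Because each stump sees a slightly different sample, individual mistakes tend to be outvoted, which is the intuition behind the better predictive performance mentioned above.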
Conclusion
• The system will be capable of predicting diabetes effectively, efficiently and in a timely manner.
• This will help physicians in making decisions.
• It generates results that are close to real-life situations.
• It can lead to large savings in medical expenses.
• We hope to improve the accuracy of the prediction by increasing the amount of training data.
Thank You
