Diabetes Prediction Using Data Mining

This document describes a research project that aims to predict diabetes using data mining techniques. It explains the motivation for the research, which is the growing problem of diabetes. The methodology is to collect diabetes-related data, preprocess it, and apply classification algorithms such as Naive Bayes, Decision Tree and Random Forest to predict diabetes. Previous related work that used similar techniques is also reviewed. The predicted outcomes are intended to help physicians make more informed decisions and potentially diagnose and manage diabetes earlier.

Uploaded by

Fariha Tabassum


Diabetes Prediction Using Data Mining
Prepared for:
Thesis Committee, Dept. of CSE, IUBAT

Prepared by:
Fahima Afroz Rozy – 17103091
Fariha Tabassum – 17103092

Supervised by:
Nusrath Tabassum
Lecturer, Dept. of CSE, IUBAT
Contents
• Introduction
• Motivation
• Problem Statement
• Literature Review
• Research Methodology
• Conclusion
Introduction
• Data Mining
• It can be used to analyze large volumes of medical data.
• In medical science, data mining can be used in predictive medicine, healthcare management, disease prediction, etc.
• Diabetes is an incurable disease.
• Data mining algorithms are applied to predict diabetes, and their prediction accuracy is then tested.
Motivation
• According to the WHO, more than 422 million people are suffering from diabetes.
• Diabetes is the seventh leading cause of death.
• Young people are now among the most affected by it.
• Type 1 diabetes: insulin-dependent diabetes mellitus (IDDM).
• Type 2 diabetes: non-insulin-dependent diabetes mellitus (NIDDM).
• Gestational diabetes: diabetes first diagnosed during pregnancy.
• There is no permanent cure, but it can be managed with proper treatment.
Problem Statement
• Diabetes is growing at an alarming rate.
• Diabetes is incurable, but if it is predicted at an early stage, it can be managed with treatment.
• Clinical decisions are often based on doctors’ experience rather than on the rich data available in databases.
• The objective of this research is to find new features and factors that can affect the prediction of diabetes.
• The proposed system will predict an outcome based on a given input.
• The algorithm analyses the input and produces a prediction.
Literature Review
1. “Prediction on Diabetes Using Data mining Approach” by Pardha Repalli, Oklahoma State University.
• In this paper, the author used:
Variable selection node
Cross Industry Standard Process for Data Mining (CRISP-DM)
Decision Tree Algorithm
Regression Model
Literature Review
• The reported average squared error is 0.043.
• According to this research, people aged above 45 are the most affected by diabetes.
• The study reworked information already existing in different databases into new research and results.
Literature Review
2. “Prediction of Diabetes Using Bayesian Network” by Mukesh Kumari, Dr. Rajan Vohra and Anshul Arora.
• In this paper, the authors used:
Decision Tree Algorithm
Naïve Bayes Algorithm
Random Tree
NBTree
Weka Tool
Bayesian Network
Literature Review
• The authors used 206 records.
• The accuracy of the Bayesian network is 99.51%, which is high.
• The framework includes some initial parts, such as login, entering side effects into the system, and recommending medications.
• When symptoms occur, the patient needs a specialist's help, but a specialist may not be accessible for some reason. This can be a limitation of this paper.
Research Methodology
Figure: Framework for Diabetes Prediction
(Flowchart stages: Data Collection, Data Pre-processing, Training Dataset, Test Dataset, Classifier, output Positive / Negative)
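
The stages of the framework can be sketched end to end as follows. This is only an illustrative outline, not the project's implementation: the rows are made-up (glucose, outcome) pairs, the pre-processing step simply drops rows with a missing value, and `MeanThresholdClassifier` is a hypothetical stand-in for the real classifiers discussed later.

```python
import random

def preprocess(rows):
    """Placeholder pre-processing: drop rows that contain a missing value."""
    return [r for r in rows if None not in r]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into a training and a test dataset."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

class MeanThresholdClassifier:
    """Hypothetical stand-in classifier.

    It thresholds the first feature at the midpoint between the two class
    means and assumes the positive class has the higher mean.
    """

    def fit(self, rows):
        pos = [r[0] for r in rows if r[-1] == 1]
        neg = [r[0] for r in rows if r[-1] == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, features):
        return "Positive" if features[0] > self.threshold else "Negative"

# Data collection: made-up (glucose, outcome) rows, for illustration only.
raw = [(148, 1), (85, 0), (183, 1), (89, 0), (None, 1),
       (116, 0), (168, 1), (99, 0), (155, 1), (92, 0)]

clean = preprocess(raw)                                   # data pre-processing
train, test = train_test_split(clean, test_fraction=0.3, seed=1)
model = MeanThresholdClassifier().fit(train)              # train the classifier
predictions = [model.predict(r[:-1]) for r in test]       # Positive / Negative
```

Any classifier with the same `fit` / `predict` shape could be plugged into the last two steps.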

Research Methodology
• The dataset we are going to use contains records of 769 patients.
• The attributes are:
Pregnancies
Glucose level
Blood Pressure
BMI (Body Mass Index)
Skinfold thickness
2-hour insulin value
Diabetes Pedigree function
Age
Outcome (the class label: positive or negative)
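
One possible way to hold such a record in code is sketched below. The field names, the CSV layout and the two sample rows are assumptions made for illustration; they are not taken from the actual dataset file.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class PatientRecord:
    # Field names are hypothetical; they mirror the attribute list above.
    pregnancies: int
    glucose: float
    blood_pressure: float
    bmi: float
    skin_thickness: float
    insulin: float
    pedigree: float
    age: int
    outcome: int  # class label: 1 = positive, 0 = negative

# Made-up rows, for illustration only.
SAMPLE_CSV = """pregnancies,glucose,blood_pressure,bmi,skin_thickness,insulin,pedigree,age,outcome
2,120,70,28.5,30,100,0.45,35,0
5,165,80,33.1,35,150,0.60,50,1
"""

def load_records(text):
    """Parse CSV text into a list of typed PatientRecord objects."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append(PatientRecord(
            pregnancies=int(row["pregnancies"]),
            glucose=float(row["glucose"]),
            blood_pressure=float(row["blood_pressure"]),
            bmi=float(row["bmi"]),
            skin_thickness=float(row["skin_thickness"]),
            insulin=float(row["insulin"]),
            pedigree=float(row["pedigree"]),
            age=int(row["age"]),
            outcome=int(row["outcome"]),
        ))
    return records

def feature_vector(record):
    """Return the eight input attributes, leaving out the class label."""
    return (record.pregnancies, record.glucose, record.blood_pressure,
            record.bmi, record.skin_thickness, record.insulin,
            record.pedigree, record.age)

records = load_records(SAMPLE_CSV)
```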
Research Methodology
• A classification technique assigns each item in a collection to a target category.
• It begins with records whose class labels are known.
• Classification models are tested by comparing the predicted values to known target values in a set of test data.
• It is a simple and efficient technique for data mining research.
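
Comparing predicted values to known target values can be sketched as below. The label lists are made-up examples, and the helper names are chosen here for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(actual, predicted):
    """Fraction of test records whose predicted label matches the known label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Made-up known labels and predictions for eight test records.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
acc = accuracy(actual, predicted)  # 6 of 8 correct -> 0.75
```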
Research Methodology
1. Naïve Bayes Classifier
• It is based on Bayes' theorem.
• It is a classification algorithm.
• It assumes conditional independence: given the class, each attribute value is independent of the values of the other attributes.
• It predicts the class of a test data set easily and quickly.
• It performs well with categorical input variables.
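
A minimal from-scratch sketch of the idea is given below. It uses the Gaussian variant of Naive Bayes, chosen here as an assumption because the dataset attributes are numeric; the (glucose, BMI) training rows are made up for illustration.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, attribute) pair."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for features, label in zip(X, y):
            by_class[label].append(features)
        self.priors = {}
        self.stats = {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            self.stats[label] = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                # Small constant avoids division by zero for constant columns.
                self.stats[label].append((mean, var + 1e-9))
        return self

    def _log_posterior(self, features, label):
        # Conditional independence: per-attribute log-likelihoods are summed.
        total = math.log(self.priors[label])
        for value, (mean, var) in zip(features, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, features):
        # Pick the class with the highest (unnormalised) log posterior.
        return max(self.priors, key=lambda c: self._log_posterior(features, c))

# Made-up (glucose, BMI) rows; 1 = positive, 0 = negative.
X = [(85, 22.0), (90, 24.5), (95, 26.0), (150, 33.0), (160, 35.5), (170, 31.0)]
y = [0, 0, 0, 1, 1, 1]

nb = GaussianNaiveBayes().fit(X, y)
low_risk = nb.predict((88, 23.0))
high_risk = nb.predict((165, 34.0))
```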
Research Methodology
2. Decision Tree Algorithm
• A decision tree can be used for classification or regression.
• It has a tree-like structure.
• It breaks down a big data set into progressively smaller subsets.
• It shows all the possible outcomes and follows each path to a conclusion.
• It can handle both categorical and numerical data.
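
How a tree breaks the data into smaller subsets can be sketched as follows. This is a simplified illustration that assumes numeric attributes, Gini impurity as the split criterion and a fixed maximum depth; the (glucose, BMI) rows are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subset)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (impurity, feature index, threshold) of the best binary split."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Return a leaf label or a node (feature, threshold, left, right)."""
    split = None
    if len(set(y)) > 1 and depth < max_depth:
        split = best_split(X, y)
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    _, f, t = split
    left = [(row, label) for row, label in zip(X, y) if row[f] <= t]
    right = [(row, label) for row, label in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left],
                       depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right],
                       depth + 1, max_depth))

def predict(tree, row):
    """Follow one path from the root to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Made-up (glucose, BMI) rows; 1 = positive, 0 = negative.
X = [(85, 22.0), (90, 31.0), (95, 26.0), (150, 33.0), (160, 24.0), (170, 31.0)]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
```

On this toy data a single split on glucose already separates the two classes, so the tree has one internal node and two leaves.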
Research Methodology
3. Random Forest Algorithm
• It is a classification algorithm based on many decision trees.
• It is used to obtain better predictive performance than a single tree.
• The predictions of the individual decision trees are combined, typically by majority vote.
• This algorithm runs efficiently on large data sets.
• It can handle a large number of input variables without variable deletion.
• Its output is typically highly accurate.
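
The combination of many trees can be sketched as below. This is a simplified illustration, assuming bootstrap sampling of the training rows, a random subset of features at each split, Gini impurity and majority voting; the parameter names and the (glucose, BMI) rows are made up for the example.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow_tree(rows, rng, n_features, depth=0, max_depth=3):
    """rows: list of (features, label). Returns a leaf label or a node tuple."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    best = None
    # Only a random subset of the features is considered at each split.
    for f in rng.sample(range(len(rows[0][0])), n_features):
        values = sorted({features[f] for features, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            score = (len(left) * gini([l for _, l in left])
                     + len(right) * gini([l for _, l in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split on the sampled features
        return majority(labels)
    _, f, t, left, right = best
    return (f, t,
            grow_tree(left, rng, n_features, depth + 1, max_depth),
            grow_tree(right, rng, n_features, depth + 1, max_depth))

def tree_predict(tree, features):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if features[f] <= t else right
    return tree

def train_forest(rows, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap sample
        forest.append(grow_tree(sample, rng, n_features))
    return forest

def forest_predict(forest, features):
    """Majority vote over the predictions of all trees."""
    return majority([tree_predict(tree, features) for tree in forest])

# Made-up (glucose, BMI) rows; 1 = positive, 0 = negative.
rows = [((85, 22.0), 0), ((90, 24.5), 0), ((95, 26.0), 0), ((100, 23.0), 0),
        ((150, 33.0), 1), ((160, 35.5), 1), ((170, 31.0), 1), ((155, 34.0), 1)]
forest = train_forest(rows)
```

Because each tree sees a different bootstrap sample and a different feature subset, individual trees may disagree; the vote is what gives the combined prediction.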
Conclusion
• The system will be capable of predicting diabetes effectively, efficiently and in a timely manner.
• This will help physicians in making decisions.
• It generates results that come closer to real-life situations.
• It can bring large savings in medical expenses.
• We hope to improve the accuracy of the prediction by increasing the amount of training data.
Thank You
