This document describes a research project that aims to predict diabetes using data mining techniques. It is motivated by the growing prevalence of diabetes. The methodology is to collect diabetes-related data, preprocess it, and apply classification algorithms such as Naive Bayes, Decision Tree, and Random Forest to predict diabetes. Related work that used similar techniques is also reviewed. The predictions are intended to help physicians make more informed decisions and potentially diagnose and manage diabetes earlier.
Diabetes Prediction Using Data Mining
Prepared for: Thesis Committee, Dept. of CSE, IUBAT
Prepared by: Fahima Afroz Rozy – 17103091; Fariha Tabassum – 17103092
Supervised by: Nusrath Tabassum, Lecturer, Dept. of CSE, IUBAT

Contents
• Introduction
• Motivation
• Problem Statement
• Literature Review
• Research Methodology
• Conclusion

Introduction
• Data mining can be used to analyze large volumes of medical data.
• In medical science, data mining can be used in predictive medicine, healthcare management, disease prediction, etc.
• Diabetes is an incurable disease.
• In this work, data mining algorithms are used to test the accuracy of diabetes prediction.

Motivation
• According to WHO, more than 422 million people have diabetes.
• Diabetes is the seventh leading cause of death.
• Young people are increasingly affected.
• Type 1 diabetes: insulin-dependent diabetes mellitus (IDDM).
• Type 2 diabetes: non-insulin-dependent diabetes mellitus (NIDDM).
• Gestational diabetes: a third type, which occurs during pregnancy.
• There is no permanent cure, but diabetes can be managed with proper treatment.

Problem Statement
• Diabetes is growing at an alarming rate.
• Although diabetes is incurable, it can be managed with treatment if it is predicted at an early stage.
• Clinical decisions are often based on doctors' experience rather than on the rich data available in databases.
• The objective of this research is to identify new features and factors that can change the prediction of diabetes.
• The proposed system will predict an outcome based on a given input.
• The algorithm analyzes the input and produces a prediction.

Literature Review
1. “Prediction on Diabetes Using Data mining Approach” by Pardha Repalli, Oklahoma State University.
• The paper uses:
  • Variable selection node
  • Cross-Industry Standard Process for Data Mining (CRISP-DM)
  • Decision Tree algorithm
  • Regression model
• The reported average squared error is 0.043.
• According to this research, people above 45 years of age are the most affected by diabetes.
• The authors reworked information already available in different databases into new analyses and results.

2. “Prediction of Diabetes Using Bayesian Network” by Mukesh Kumari, Dr. Rajan Vohra and Anshul Arora.
• The paper uses:
  • Decision Tree algorithm
  • Naïve Bayes algorithm
  • Random Tree
  • NBTree
  • Weka tool
  • Bayesian Network
• The paper uses 206 records.
• The reported accuracy of the Bayesian network is 99.51%, which is high.
• The framework includes components such as login, entering symptoms into the system, and recommending medications.
• When symptoms occur, the patient needs a specialist's help, but a specialist may not be accessible. This can be a limitation of the paper.

Research Methodology
Figure: Framework for Diabetes Prediction (boxes, in order: Data Collection; Data Pre-processing; Training Dataset and Test Dataset; Classifier; outputs Positive and Negative)
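The pre-processing and train/test stages of the framework in the figure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the rows are made-up placeholder values, the column names are hypothetical, treating a zero glucose or BMI reading as missing and replacing it with the column median is one assumed pre-processing rule, and the 80/20 split ratio is also an assumption.

```python
import random
from statistics import median

# Placeholder rows (hypothetical column names, made-up values, not real patient data).
RECORDS = [
    {"glucose": 148, "bmi": 33.6, "age": 50, "outcome": 1},
    {"glucose": 85,  "bmi": 26.6, "age": 31, "outcome": 0},
    {"glucose": 0,   "bmi": 23.3, "age": 32, "outcome": 1},
    {"glucose": 89,  "bmi": 28.1, "age": 21, "outcome": 0},
    {"glucose": 137, "bmi": 0,    "age": 33, "outcome": 1},
    {"glucose": 116, "bmi": 25.6, "age": 30, "outcome": 0},
    {"glucose": 78,  "bmi": 31.0, "age": 26, "outcome": 1},
    {"glucose": 0,   "bmi": 35.3, "age": 29, "outcome": 0},
    {"glucose": 197, "bmi": 30.5, "age": 53, "outcome": 1},
    {"glucose": 125, "bmi": 29.0, "age": 54, "outcome": 1},
]

def impute_zeros(records, columns):
    """Assumed pre-processing rule: treat 0 as missing and replace it
    with the median of the non-zero values in the same column."""
    cleaned = [dict(r) for r in records]  # copy so the input stays untouched
    for col in columns:
        observed = [r[col] for r in cleaned if r[col] != 0]
        fill = median(observed)
        for r in cleaned:
            if r[col] == 0:
                r[col] = fill
    return cleaned

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle reproducibly, then hold out a fraction of rows for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

cleaned = impute_zeros(RECORDS, ["glucose", "bmi"])
train, test = train_test_split(cleaned)
print(len(train), "training rows,", len(test), "test rows")
```

The training rows would then be passed to a classifier, and the held-out test rows used only to check its positive/negative predictions.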
Research Methodology
• The dataset we are going to use contains records of 769 patients.
• The features are:
  • Pregnancies
  • Glucose level
  • Blood pressure
  • BMI (Body Mass Index)
  • Skinfold thickness
  • 2-hour insulin value
  • Diabetes pedigree function
  • Age
  • Outcome
• A classification technique assigns each item in a collection to a target category.
• It begins with records whose class labels are known.
• Classification models are tested by comparing the predicted values with the known target values in a set of test data.
• Classification is a simple and efficient technique for data mining research.

1. Naïve Bayes Classifier
• It is a classification algorithm based on Bayes' theorem.
• It assumes conditional independence: each attribute value is independent of the others given the class.
• It predicts the class of a test data set easily and quickly.
• It performs well with categorical input variables.

2. Decision Tree Algorithm
• A decision tree can be used as a classification or regression model.
• It has a tree structure.
• It breaks a large data set down into smaller subsets.
• It shows all the possible outcomes and traces each path to a conclusion.
• It can handle both categorical and numerical data.

3. Random Forest Algorithm
• It is a classification algorithm that combines many decision trees.
• Combining multiple decision trees is used to obtain better predictive performance than a single tree.
• It runs efficiently on large data sets.
• It handles many input variables without variable deletion.
• It often achieves high accuracy.

Conclusion
• The system will be capable of predicting diabetes effectively, efficiently, and in a timely manner.
• This will help physicians in making decisions.
• It generates results that are close to real-life situations.
• It can bring large savings in medical expenses.
• We hope to improve the accuracy of the prediction by increasing the amount of training data.

Thank You
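As an illustration of two points made under Research Methodology, the Naïve Bayes classifier and the testing of a model by comparing predicted values with known target values, the following is a minimal sketch in plain Python. It rests on several assumptions that are not in the slides: the Gaussian variant of Naïve Bayes is chosen here because the listed features are numeric, the two features and all data values are made-up placeholders, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and each feature's mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, row):
    """Pick the class maximizing log P(c) + sum_i log P(x_i | c).
    The sum over features is where conditional independence is assumed."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, X, y):
    """Fraction of test rows whose predicted label equals the known label."""
    correct = sum(predict(model, row) == label for row, label in zip(X, y))
    return correct / len(y)

# Placeholder data: [glucose, bmi]; outcome 1 = positive, 0 = negative.
X_train = [[85, 26.6], [89, 28.1], [78, 31.0], [92, 24.0],
           [148, 33.6], [183, 35.3], [197, 30.5], [168, 38.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[90, 27.0], [175, 34.0]]
y_test = [0, 1]

model = fit_gaussian_nb(X_train, y_train)
print("predictions:", [predict(model, row) for row in X_test])
print("accuracy:", accuracy(model, X_test, y_test))
```

The same `accuracy` check could be applied unchanged to a Decision Tree or Random Forest model, which is what makes a side-by-side comparison of the three classifiers straightforward.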