DIAPRO - Diabetes Prediction Application
Project
Branch : CSE
Introduction
1 Analysis Of Our Project Title
Abstract
⊹ Diabetes is a chronic disease with the potential to cause a worldwide health care crisis. According to the International
Diabetes Federation, 382 million people are living with diabetes worldwide, and this number is projected to rise to
592 million by 2035. Diabetes mellitus, or simply diabetes, is a disease caused by an elevated level of blood glucose.
Various traditional methods, based on physical and chemical tests, are available for diagnosing diabetes. However, early
prediction of diabetes is a challenging task for medical practitioners because of the complex interdependence of various
factors, as diabetes affects organs such as the kidneys, eyes, heart, nerves, and feet. Data science methods have the
potential to benefit other scientific fields by shedding new light on common questions. One such task is to help make
predictions on medical data. Machine learning is an emerging field in data science dealing with the ways in
which machines learn from experience. The aim of this project is to develop a system which can perform early
prediction of diabetes for a patient with higher accuracy by combining the results of different machine learning
techniques. This project aims to predict diabetes via 10 different supervised and ensemble machine learning methods:
SVM, K Nearest Neighbor, Naive Bayes, Logistic Regression, Random Forest Classifier, AdaBoost,
XGBoost, Gradient Boost, LightGBM, and Extra Trees Classifier. This project also aims to propose an effective technique for
earlier detection of diabetes.
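As a rough illustration of how several of the listed methods could be compared, the sketch below scores six of them with 5-fold cross-validation. It is only a sketch under stated assumptions: it uses scikit-learn, the data is a synthetic stand-in generated with `make_classification` (the project's real dataset is not shown here), and the helper name `compare_models` and all hyperparameters are illustrative. XGBoost and LightGBM are left out because they need separate packages.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: a synthetic binary dataset with 8 numeric features stands in
# for the real diabetes data, which is not part of this document.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Distance- and margin-based models are wrapped with a scaler so that
# features on different scales do not dominate.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "K Nearest Neighbor": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boost": GradientBoostingClassifier(random_state=0),
}


def compare_models(models, X, y, folds=5):
    """Return {model name: mean cross-validated accuracy}."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
# Print the models from best to worst mean accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

The ranking on synthetic data says nothing about the real task; the point is only the shape of the comparison loop.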
Proposed System
⊹ The whole project will be completed in 3 steps:
⊹ a. Creating a model using machine learning.
⊹ b. Creating a web app using Flask and connecting it with the model.
⊹ c. Uploading the project to GitHub, connecting Heroku to the GitHub account, naming the application, and clicking
Deploy Branch. The application is then live.
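Step b can be sketched as a small Flask app that accepts patient features as JSON and returns a prediction. This is a minimal sketch, not the project's actual app: the `/predict` route, the feature names in `FEATURES`, and the `predict_diabetes` function are all hypothetical, and the glucose rule inside it is a made-up placeholder standing in for the trained model, not a medical criterion.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input fields; the real model's features are not given here.
FEATURES = ["glucose", "bmi", "age"]


def predict_diabetes(features):
    """Placeholder for the trained model (made-up rule, illustration only)."""
    return 1 if features["glucose"] >= 140 else 0


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a bad or missing body.
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    try:
        features = {name: float(payload[name]) for name in FEATURES}
    except (TypeError, ValueError):
        return jsonify({"error": "all fields must be numeric"}), 400
    # 1 = diabetic, 0 = non-diabetic
    return jsonify({"diabetic": predict_diabetes(features)})
```

In a real deployment the placeholder would be replaced by loading the saved model once at start-up and calling its prediction method inside `predict`. Calling `app.run()` starts Flask's development server for local testing.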
⊹ Classification is one of the most important decision-making techniques in
many real-world problems.
⊹ In this work, the main objective is to classify the data as diabetic or
non-diabetic and improve the classification accuracy. For many classification
problems, choosing a larger number of samples does not lead to higher
classification accuracy.
⊹ In many cases, algorithms perform well in terms of speed
but their classification accuracy is low. The main objective of our model
is to achieve high accuracy.
⊹ Classification accuracy can be increased if we use most of the data set for
training and a small part of it for testing. This survey has analyzed various
classification techniques for classification of diabetic and non-diabetic data.
Thus, it is observed that techniques like Gradient Boosting and K Nearest
Neighbor are most suitable for implementing the diabetes prediction system.
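The train/test split and the accuracy measure discussed above can be illustrated with plain Python. This is a minimal sketch: the function names, the 80/20 split, and the toy rows are assumptions chosen for illustration, not the project's actual settings.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable splits
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy rows of (feature value, label); 1 = diabetic, 0 = non-diabetic.
rows = [(i, i % 2) for i in range(10)]
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))                 # 8 2
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Accuracy should always be computed on the held-out test rows, never on the rows used for training.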
Current System and its Limitations
Existing problems:
⊹ Still no effective solution
⊹ Time consuming
⊹ High cost
⊹ Requires experienced manpower
Proposed system:
⊹ To develop an intelligent system to classify pd patients
⊹ To contribute to the medical clinical analysis sector
⊹ Reduce the overall cost of clinical analysis
⊹ Diagnose patients in early stages
⊹ Reduce mortality rate
Hardware and Software Requirements
a) Python programming language.
b) Jupyter Notebook.
c) Google Colab.
d) Windows 7 / 10 operating system.
e) RAM: minimum 4 GB.
System Flow Chart
Overall Workflow
[Figure: graphical workflow diagram. Stages shown: Feature Extraction; Data Pre-Processing (Data Standardization); Feature Selection; Classification (Naïve Bayes, Logistic Regression, K-nearest neighbors, Random Forest, SVM (Linear), SVM (RBF)); Ensemble Voting; Result.]
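The ensemble voting stage of the workflow can be sketched as hard (majority) voting over the labels predicted by the individual classifiers. This is a minimal sketch assuming hard voting; the source does not say whether hard or soft voting is used, and the function name `majority_vote` and the example predictions are made up for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    predictions maps a model name to its list of predicted labels; all
    lists must have the same length. On a tie, the label that appears
    first in model order wins.
    """
    columns = list(predictions.values())
    if not columns or len({len(c) for c in columns}) != 1:
        raise ValueError("need at least one model and equal-length predictions")
    # zip(*columns) yields, for each sample, the votes from every model.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*columns)]


# Illustrative predictions for four samples (1 = diabetic, 0 = non-diabetic).
preds = {
    "Naive Bayes": [1, 0, 1, 0],
    "Logistic Regression": [1, 0, 0, 0],
    "K-nearest neighbors": [0, 0, 1, 1],
}
print(majority_vote(preds))  # [1, 0, 1, 0]
```

Using an odd number of models avoids ties for a two-class problem.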