Diabetes Prediction System

This document describes a project to develop a Diabetes Prediction System using machine learning techniques. The system aims to predict diabetes earlier for better patient outcomes. Several machine learning classification models including KNN, Logistic Regression, Decision Tree, SVM, Gradient Boosting, and Random Forest are trained and compared on a diabetes dataset. The Random Forest model achieved the highest accuracy for diabetes prediction. The project aims to help medical professionals detect diabetes early to help patients better manage the disease.

BMS INSTITUTE OF TECHNOLOGY & MANAGEMENT

YELAHANKA, BENGALURU - 560064

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

PROJECT BASED LEARNING

2020-21 Even Semester

Report of System Software and Compilers – 18CS61 project work

“Diabetes Prediction System”

Submitted By
YESWANTH C 1BY18CS192
SHARATH MK 1BY18CS149
VACHAN C RANNORE 1BY18CS182

Under the guidance of

Dr. Archana R.A (Assistant Professor)
Mrs. Mari Kirthima (Assistant Professor)

2020-2021

BMSIT&M Department of CSE(2020-21) Page 1

INSTITUTE VISION
To emerge as one of the finest technical institutions of higher learning, to
develop engineering professionals who are technically competent, ethical and
environment friendly for betterment of the society.

INSTITUTE MISSION
Accomplish stimulating learning environment through high quality academic
instruction, innovation and industry-institute interface.

DEPARTMENT VISION
To develop technical professionals acquainted with recent trends and
technologies of computer science to serve as valuable resource for the
nation/society.

DEPARTMENT MISSION
Facilitating and exposing the students to various learning opportunities through
dedicated academic teaching, guidance and monitoring.

PROGRAM EDUCATIONAL OBJECTIVES

1. Lead a successful career by designing, analyzing and solving various
problems in the field of Computer Science & Engineering.
2. Pursue higher studies for enduring edification.
3. Exhibit professional and team building attitude along with effective
communication.
4. Identify and provide solutions for sustainable environmental
development.

Web Technology and its applications – 18CS63 - Course Outcomes (COs) w.r.t this PBL

CO      CO DEFINED
CO 1    Explain software system

Subject Name – Code - Course Outcomes (COs) w.r.t this PBL

CO      CO DEFINED
CO 5    Inspect JavaScript frameworks like jQuery and Backbone, which facilitate
        the developer to focus on core features.

Project to Program Outcomes (PO) Mapping

Project Name: Diabetes Prediction System
COURSE PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
SSCD ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
WTA ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

Program outcomes (POs):

PO1 Engineering knowledge: Apply the knowledge of Mathematics, Science,
Engineering fundamentals and an engineering specialization to the solution of
complex engineering problems
PO2 Problem analysis: Identify, formulate, review research literature, and analyse
complex Engineering problems reaching substantiated conclusions using first
principles of mathematics, Natural sciences and engineering sciences
PO3 Design/development of solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs
with appropriate consideration for the public health and safety, and the cultural,
societal, and environmental considerations.
PO4 Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of
data, and synthesis of the Information to provide valid conclusions
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern Engineering and IT tools including prediction and modelling to complex
engineering activities with an understanding of the limitations.
PO6 The engineer and society: Apply reasoning informed by the contextual knowledge
to assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional engineering practice.
PO7 Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for Sustainable development
PO8 Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.
PO9 Individual and team work: Function effectively as an individual, and as a member
or leader in diverse teams, and in multidisciplinary settings
PO10 Communication: Communicate effectively on complex engineering activities with
the engineering community and with society at large, such as, being able to
comprehend and write effective reports and design documentation, make effective
presentations, and give and receive clear instructions.
PO11 Project management and finance: Demonstrate knowledge and understanding of
the Engineering and management principles and apply these to one’s own work, as a
member and Leader in a team, to manage projects and in multidisciplinary
environments.
PO12 Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.

Project to Program Specific Outcomes (PSO) Mapping

Project Name: Diabetes Prediction System

COURSE PSO1 PSO2

SSCD ✓ ✓
WTA ✓ ✓

Program Specific Outcomes (PSOs):

PSO1 Analyze the problem and identify computing requirements appropriate to its
solution.
PSO2 Apply design and development principles in the construction of software systems of
varying complexity.

Table Of Contents:

1. Abstract
2. Motivation
3. Introduction
4. Existing system
5. Proposed system
6. System requirement specifications
7. Proposed Methodology
8. Outputs
9. Conclusion
10. References

1.ABSTRACT

Diabetes is an illness caused by a high glucose level in the human body. It
should not be ignored: if left untreated, diabetes may cause major
complications such as heart-related problems, kidney problems, high blood
pressure and eye damage, and it can also affect other organs of the body.
Diabetes can be controlled if it is predicted early. To achieve this goal,
this project work performs early prediction of diabetes in a patient, with
higher accuracy, by applying various machine learning techniques. Machine
learning techniques provide better prediction results by constructing models
from datasets collected from patients. In this work we use machine learning
classification and ensemble techniques on a dataset to predict diabetes:
K-Nearest Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT),
Support Vector Machine (SVM), Gradient Boosting (GB) and Random Forest (RF).
The accuracy differs from model to model. The project identifies the model
with the highest accuracy, which shows that it is capable of predicting
diabetes effectively. Our results show that Random Forest achieved higher
accuracy than the other machine learning techniques.

2.MOTIVATION

The drastic increase in diabetes calls for new research. Our main source of
motivation is the current state of people suffering from this disease. Lifestyle
is the main cause of type 2 diabetes. We want to create a system which could act as
a resource for medical professionals to detect diabetes in time, so that the patient
can manage his/her diabetes effectively.

3.INTRODUCTION

Diabetes is one of the most harmful diseases in the world. It is associated
with obesity, high blood glucose levels, and so forth. It affects the hormone
insulin, resulting in abnormal metabolism of carbohydrates and raised levels of
sugar in the blood. Diabetes occurs when the body does not make enough insulin.
According to the World Health Organization (WHO), about 422 million people
suffer from diabetes, particularly in low- or middle-income countries, and this
could increase to 490 million by the year 2030.

Diabetes is prevalent in various countries such as Canada, China and India.
India's population is now more than one billion, and the number of diabetics in
India is about 40 million. Diabetes is a major cause of death in the world.
Early prediction of a disease like diabetes allows it to be controlled and can
save lives. To accomplish this, this work explores the prediction of diabetes
using various attributes related to the disease. For this purpose we use the
Pima Indian Diabetes Dataset and apply various machine learning classification
and ensemble techniques to predict diabetes.

Machine learning is a method used to train computers or machines from data
rather than programming them explicitly. Various machine learning techniques
extract knowledge efficiently by building classification and ensemble models
from the collected dataset. Such collected data can be useful for predicting
diabetes. Many machine learning techniques are capable of prediction; however,
it is tough to choose the best one. For this purpose we apply popular
classification and ensemble methods to the dataset for prediction.

4.EXISTING SYSTEM

The existing system consists of a project model that can calculate only some
particular parameters and does not take into consideration all the remaining
parameters and networks. The available smart watches are very expensive, and they
are not available for hardware usage purposes or for daily-wage activities.
Another disadvantage is that the systems need to be connected very invasively in
order to work properly. There is a complete absence of automated systems in the
currently existing model.

Existing work looks at only a limited set of machine learning techniques, with
which the diabetes prediction process cannot be determined accurately, so it
needs some modification. XGBoost, however, is considerably harder to apply than
techniques such as decision trees, support vector machines, random forests and
linear regression. Creating profiles of clients and patients and managing their
storage all involve the use of real-time communication.

The E-health scheme manages the real-time retrieval and gathering of database
information. The application services consist of three main parts: the web services,
the emergency response systems and the hospital services. The oximeter comes under
the information perception tasks.

Fig: Architecture Diagram

5.PROPOSED SYSTEM

Fig: Different Phases

The proposed work deals with diabetes prediction: our purpose is to use the
Pima Indian Diabetes Dataset to predict whether a person will suffer from
diabetes or not, based on the values recorded for him/her. The dataset has 768
data points and 9 features (8 attributes plus the outcome). The result is
binary, 0 or 1: 0 denotes that the person will not suffer from diabetes and 1
means that he/she will. Of the 768 data points, 500 are marked 0 and the
remaining 268 are marked 1. The work relies mainly on a train-test split of the
accumulated dataset to determine the individual contribution of each data value.
Training on the data is vital because it ensures the stability of the data taken
from the accumulated dataset, avoids data redundancy and increases the overall
efficiency of the system's algorithms. The necessary training data files are
imported before the code segments begin; the result is then sorted into a
separate database, which is sent for validation, followed by the splitting of
the overall trained attributes, which are then stable.
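
The class counts and the train-test split described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the 20% test fraction, the seed and the helper name `stratified_split` are assumptions, and only the counts 500 and 268 come from the report.

```python
import random

# Class counts reported for the Pima Indian Diabetes Dataset in this report.
N_NEGATIVE, N_POSITIVE = 500, 268


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), keeping the 0/1 ratio similar in both parts.

    test_fraction and seed are illustrative assumptions, not values from the report.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * N_NEGATIVE + [1] * N_POSITIVE
train_idx, test_idx = stratified_split(labels)

positive_share = N_POSITIVE / len(labels)
print(f"positive share: {positive_share:.3f}")
print(f"train size: {len(train_idx)}, test size: {len(test_idx)}")
```

Splitting each class separately keeps roughly the same share of diabetic cases in the training and test parts, which matters because the two classes are not equally frequent.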

6.SYSTEM REQUIREMENT SPECIFICATIONS

In the design and development of the architecture for the diabetes management
system, the clinical requirements and design analysis of the system were based on
discussions with collaborators from the Department of Nutrition and Food Science of
the University of Ghana and Kwame Nkrumah University of Science and Technology
(KNUST). From these discussions, the diet type of patients was determined to be an
essential approach suitable for the diabetes management system. The following
functionalities were mentioned: (1) Scheduling and reminding diabetic patients to
take their medication and blood glucose readings, (2) recommending healthy meals
for diabetics to keep their blood glucose levels in check, (3) encouraging and
tracking the activity of diabetic patients, (4) providing a visual interface to help them
make meaning of their readings and establishing a sufficient connection between the
doctor and the diabetic patient using e-mail.

Providing the diabetic patient with a data visualization tool to display the data in
tables, charts, and an educational program for newly diagnosed and ongoing
diabetes treatment is valuable for the treatment and management of diabetes.

SYSTEM ARCHITECTURE

The system architecture for the Diabetes Management System presented below in
Figure 1 is the conceptual model that defines the structure, behavioural interactions,
and multiple system views that underpin the system development. It presents the
formal descriptions of the system, captured graphically, that support reasoning, and
the submodules developed as well as the dataflows between the developed
modules.

Fig: System architecture for the implemented system with all submodules.

7.PROPOSED METHODOLOGY

The goal of this work is to find a model that predicts diabetes with better accuracy.
We experimented with different classification and ensemble algorithms to predict
diabetes. In the following, we briefly discuss each phase.

Fig: Comparing Glucose with the outcome.

A. Dataset Description- The data is gathered from the UCI repository and is named
the Pima Indian Diabetes Dataset. The dataset has several attributes for 768
patients.

Table: Dataset Description

Sno. Attributes
1. Pregnancy
2. Glucose
3. Blood Pressure
4. Skin Thickness
5. Insulin
6. BMI(Body Mass Index)
7. Diabetes Pedigree Function
8. Age

The 9th attribute is the class variable of each data point. This class variable
takes the outcome 0 or 1, which indicates negative or positive for diabetes
respectively.

Fig: 1v1 characteristics.

Distribution of Diabetic patients- We made a model to predict diabetes; however,
the dataset was slightly imbalanced, having 500 instances labeled 0 (negative,
no diabetes) and 268 labeled 1 (positive, diabetic).

Fig: Ratio of Diabetic and Non Diabetic Patient

Fig: Correlation matrix between the parameters.

A correlation matrix is simply a table which displays the correlation coefficients
between variables. The measure is best used for variables that demonstrate a linear
relationship with each other. The fit of the data can be visually represented in a
scatterplot.
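
As a rough illustration of what each cell of such a table holds, the sketch below computes Pearson correlation coefficients in plain Python. The column values are made up for the example and are not taken from the dataset; the function names are hypothetical, and the code assumes no column is constant (a constant column would give a division by zero).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values, used only to show the shape of the result.
columns = {
    "Glucose": [85, 90, 120, 150, 180],
    "BMI": [24.0, 31.0, 27.5, 33.0, 30.0],
    "Outcome": [0, 0, 0, 1, 1],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Values close to 1 or -1 indicate a strong linear relationship, and the diagonal is always 1 because every column is perfectly correlated with itself.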

B. Data preprocessing:- Data preprocessing is the most important process.
Healthcare-related data mostly contains missing values and other impurities that
can reduce the effectiveness of the data. Data preprocessing is done to improve
the quality and effectiveness of the results obtained from the mining process.
To use machine learning techniques on the dataset effectively, this process is
essential for accurate results and successful prediction.
For the Pima Indian Diabetes Dataset we need to perform preprocessing in two steps.

1). Missing values removal- Remove all the instances that have zero (0) as a
value. Having zero as a value is not possible, so such instances are eliminated.
By eliminating irrelevant features/instances we make a feature subset; this
process is called feature subset selection, which reduces the dimensionality of
the data and helps the models work faster.
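
A minimal sketch of this removal step is shown below. The column names, the choice of which columns treat zero as a missing reading, and the sample rows are all assumptions made for illustration; the report does not list them.

```python
# Columns where a zero is treated as a missing reading.
# These names and this selection are assumptions, not taken from the report.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def drop_zero_rows(rows, columns=ZERO_AS_MISSING):
    """Keep only the rows whose listed columns are all non-zero."""
    return [row for row in rows if all(row[col] != 0 for col in columns)]


# Made-up rows, used only to exercise the function.
rows = [
    {"Glucose": 120, "BloodPressure": 70, "SkinThickness": 30, "Insulin": 100, "BMI": 28.5, "Outcome": 0},
    {"Glucose": 0, "BloodPressure": 66, "SkinThickness": 29, "Insulin": 90, "BMI": 26.6, "Outcome": 0},
    {"Glucose": 150, "BloodPressure": 72, "SkinThickness": 35, "Insulin": 0, "BMI": 33.6, "Outcome": 1},
    {"Glucose": 165, "BloodPressure": 80, "SkinThickness": 32, "Insulin": 140, "BMI": 35.1, "Outcome": 1},
]
clean_rows = drop_zero_rows(rows)
print(len(rows), "rows before,", len(clean_rows), "rows after")
```

The outcome column is deliberately left out of the check, since 0 is a valid class label there.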

2). Splitting of data- After cleaning, the data is normalized and split for
training and testing the model. Once the data is split, we train the algorithm
on the training data set and keep the test data set aside. This training process
produces the trained model based on the logic of the algorithm and the values of
the features in the training data. The aim of normalization is to bring all the
attributes onto the same scale.
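
One common way to bring attributes onto the same scale is min-max normalization, sketched below. The report does not say which normalization was used, so this choice and the helper name `min_max_scale` are assumptions.

```python
def min_max_scale(train_column, test_column):
    """Scale both columns using the minimum and maximum of the training column only."""
    lo, hi = min(train_column), max(train_column)
    span = hi - lo
    if span == 0:
        # A constant training column carries no information; map everything to 0.
        return [0.0 for _ in train_column], [0.0 for _ in test_column]

    def scale(value):
        return (value - lo) / span

    return [scale(v) for v in train_column], [scale(v) for v in test_column]


# Made-up readings: three training values and two held-out test values.
train_scaled, test_scaled = min_max_scale([10, 20, 30], [15, 40])
print(train_scaled)  # [0.0, 0.5, 1.0]
print(test_scaled)   # [0.25, 1.5]
```

The scaling bounds are taken from the training data alone so that nothing about the test set leaks into training; as a result a test value outside the training range, such as 40 here, can fall outside [0, 1].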

Fig: Feature Importance.

C. Apply Machine Learning- When the data is ready, we apply machine learning
techniques. We use different classification and ensemble techniques to predict
diabetes, applied to the Pima Indians Diabetes Dataset. The main objective of
applying these techniques is to analyze the performance of the methods, find
their accuracy, and identify the important features that play a major role in
prediction. The techniques are as follows:-

1. Support Vector Machine- Support Vector Machine, also known as SVM, is a
supervised machine learning algorithm and one of the most popular classification
techniques. SVM creates a hyperplane that separates two classes. It can create a
hyperplane or a set of hyperplanes in a high-dimensional space, which can be
used for classification or regression. SVM differentiates instances of specific
classes and can also classify entities that are not represented in the training
data. A good separation is achieved by the hyperplane that has the largest
distance to the closest training point of any class.
Algorithm-
• Select the hyperplane which divides the classes better.
• To find the better hyperplane, calculate the distance between the
plane and the data points, which is called the margin.
• If the distance between the classes is low, then the chance of
misclassification is high, and vice versa. So we need to select the hyperplane
which has the highest margin.
• Margin = distance to the nearest positive point + distance to the nearest
negative point.
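
The margin rule in the steps above can be checked numerically. The sketch below is not an SVM trainer: it only measures the margin of two hand-picked candidate hyperplanes on made-up 2-D points, so the points, the weights `w` and the offsets `b` are all assumptions for illustration.

```python
import math


def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin(points, labels, w, b):
    """Distance to the nearest positive point plus distance to the nearest negative point.

    Returns 0.0 if the hyperplane does not put class 1 strictly on the positive
    side and class 0 strictly on the negative side.
    """
    pos = [signed_distance(p, w, b) for p, y in zip(points, labels) if y == 1]
    neg = [-signed_distance(p, w, b) for p, y in zip(points, labels) if y == 0]
    if min(pos) <= 0 or min(neg) <= 0:
        return 0.0
    return min(pos) + min(neg)


# Made-up, linearly separable points.
points = [(1, 1), (2, 2), (4, 4), (5, 5)]
labels = [0, 0, 1, 1]

margin_a = margin(points, labels, w=(1, 1), b=-6)  # hyperplane x + y = 6
margin_b = margin(points, labels, w=(1, 0), b=-3)  # hyperplane x = 3
print(round(margin_a, 3), round(margin_b, 3))
```

Both hyperplanes separate the two classes, but the first one leaves more room on each side, so it is the one the rule above would select.
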
2. K-Nearest Neighbor - KNN is also a supervised machine learning algorithm.
KNN helps to solve both classification and regression problems. KNN is a lazy
learning technique. KNN assumes that similar things are near to each other:
data points which are similar are often very close to each other. KNN helps
to group new cases based on a similarity measure. The KNN algorithm stores all
the records and classifies them according to their similarity measure. To find
the distance between points efficiently, it can use a tree-like structure. To
make a prediction for a new data point, the algorithm finds the closest data
points in the training data set, its nearest neighbors. Here K = number of
nearby neighbors, and it is always a positive integer. The neighbors are chosen
from a set of points whose class is known. Closeness is mainly defined in terms
of Euclidean distance. The Euclidean distance between two points P and Q, i.e.
P (p1, p2, ..., pn) and Q (q1, q2, ..., qn), is defined by the following
equation:-

$$d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

Algorithm-
• Take a sample dataset of columns and rows, the Pima Indian Diabetes
data set.
• Take a test dataset of attributes and rows.
• Find the Euclidean distance with the help of the formula above.
• Then decide a value of K, the number of nearest neighbors.
• Then, using these distances, find the K training rows with the minimum
Euclidean distance and take the nth column (the outcome) of each.
• Find the output value shared by most of these neighbors.
If the majority value is 1, then the patient is diabetic, otherwise not.
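
The steps above can be sketched as a small classifier in plain Python. The training rows, the two features shown and the value K = 3 are made-up assumptions, not values from the project.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority class among the k training points closest to query."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up (glucose, BMI) pairs with outcome labels, for illustration only.
train_x = [(85, 26.6), (89, 28.1), (116, 25.6), (148, 33.6), (183, 23.3), (137, 43.1)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (150, 35.0)))  # neighbors are mostly class 1
print(knn_predict(train_x, train_y, (90, 27.0)))   # neighbors are mostly class 0
```

Because glucose and BMI are on different scales, glucose dominates the distance in this toy example; normalizing the attributes first avoids that.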

3. Decision Tree- A decision tree is a basic classification method. It is a
supervised learning method, used when the response variable is categorical.
A decision tree is a tree-structured model which describes the classification
process based on the input features. Input variables can be of any type, such
as graph, text, discrete or continuous. Steps for Decision Tree
Algorithm-
• Construct the tree with nodes as input features.
• Select the feature whose information gain is highest to predict the output.
• The information gain is calculated for each attribute at each node of the
tree.
• Repeat step 2 to form a subtree using the features not used in the nodes
above.
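
Information gain, the quantity used to pick the feature in step 2, can be computed as in the sketch below. It assumes entropy as the impurity measure and a single numeric threshold per split; the sample values and thresholds are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the rows into value <= threshold and value > threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Made-up readings: glucose separates the classes cleanly, age does not.
labels = [0, 0, 1, 1]
glucose = [90, 100, 150, 160]
age = [25, 50, 30, 55]

print(information_gain(glucose, labels, threshold=120))
print(information_gain(age, labels, threshold=40))
```

The feature and threshold with the larger gain would be chosen for the node, and the same calculation is then repeated on each resulting subset.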

4. Logistic Regression- Logistic regression is also a supervised learning
classification algorithm. It is used to estimate the probability of a binary
response based on one or more predictors, which can be continuous or discrete.
Logistic regression is used when we want to classify or distinguish data
items into categories.
It classifies the data in binary form, meaning only 0 and 1, which here refers
to classifying a patient as negative or positive for diabetes.
The main aim of logistic regression is to find the best fit describing
the relationship between the target and the predictor variables. Logistic
regression is based on the linear regression model and uses the sigmoid
function to predict the probability of the positive and negative classes.
Sigmoid function:
$$P = \frac{1}{1 + e^{-(a + bx)}}$$
Here P = probability, and a and b = parameters of the model.
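
To make the formula concrete, the sketch below evaluates the sigmoid for a single predictor x. The parameter values a = -6 and b = 0.05 and the 0.5 decision threshold are made-up assumptions; in practice a and b are learned from the training data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x, a, b):
    """Probability of the positive class for one predictor x, as in P = 1 / (1 + e^-(a + bx))."""
    return sigmoid(a + b * x)


def predict_class(x, a, b, threshold=0.5):
    """Turn the probability into a 0/1 label using an assumed decision threshold."""
    return 1 if predict_proba(x, a, b) >= threshold else 0


# Hypothetical parameters, chosen only so the example has a visible crossover.
a, b = -6.0, 0.05
for glucose in (100, 120, 160):
    print(glucose, round(predict_proba(glucose, a, b), 3), predict_class(glucose, a, b))
```

With these parameters the probability is exactly 0.5 at x = 120, below it for smaller x and above it for larger x.
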
Ensembling- Ensembling is a machine learning technique. Ensemble means
using multiple learning algorithms together for some task. It usually provides
better prediction than any individual model alone, which is why it is used. The
main causes of error are noise, bias and variance, and ensemble methods help to
reduce or minimize these errors. There are several popular ensemble methods,
such as bagging, boosting, AdaBoost, gradient boosting, voting and averaging.
In this work we have used the bagging (Random Forest) and gradient boosting
ensemble methods for predicting diabetes.

5. Random Forest - It is a type of ensemble learning method used for
classification and regression tasks. In this work it gives greater accuracy
than the other models. This method can easily handle large datasets.
Random Forest was developed by Leo Breiman. It is a popular ensemble learning
method. Random Forest improves on the performance of a decision tree by reducing
variance. It operates by constructing a multitude of decision trees at training
time and outputs the class that is the mode of the classes (classification) or
the mean prediction (regression) of the individual trees.
Algorithm-
• The first step is to select “R” features from the total features “m”, where
R << m.
• Among the “R” features, calculate the node using the best split point.
• Split the node into sub-nodes using the best split.
• Repeat the first three steps until “l” number of nodes has been reached.
• Build the forest by repeating the first four steps “n” times to create “n”
trees.

To make a prediction, the first step is to take the test features and use the
rules of each randomly created decision tree to predict the outcome, storing
each predicted outcome (target). Secondly, calculate the votes for each
predicted target, and finally accept the highest-voted predicted target as the
final prediction of the random forest algorithm. Random Forest gives accurate
predictions for a wide range of applications.
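
The bagging and voting idea can be illustrated with a deliberately simplified forest. In this sketch each "tree" is only a one-split stump, the random feature subset is drawn once per tree rather than at every node, and the data, tree count and seed are made up, so it is a toy under those assumptions and not the Random Forest model used in the project.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_indices):
    """One-split tree: pick the (feature, threshold) with the fewest training errors."""
    best = None
    for f in feature_indices:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        fallback = majority(labels)
        return lambda row: fallback
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # sampling with replacement
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in sample], [labels[i] for i in sample], features))
    return forest


def forest_predict(forest, row):
    """Final prediction: the class with the most votes across all trees."""
    return majority([tree(row) for tree in forest])


# Made-up, well-separated two-feature data.
rows = [(1, 1), (2, 2), (3, 3), (7, 7), (8, 8), (9, 9)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(forest_predict(forest, (1, 1)), forest_predict(forest, (9, 9)))
```

Individual stumps can be wrong near the class boundary because each sees a different bootstrap sample; the vote across many of them is what makes the combined prediction more stable.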

Fig: Algorithm’s accuracies.

6. Gradient Boosting - Gradient boosting is a powerful ensemble technique
used for prediction, and here it is used as a classification technique. It
combines weak learners to make a strong learner model for prediction. It uses
the decision tree model as the weak learner. It can classify complex data sets
and is a very effective and popular method. In gradient boosting, model
performance improves over iterations.
Algorithm-
• Consider a sample of target values as P.
• Estimate the error in the target values.
• Update and adjust the weights to reduce the error M.
• P[x] = P[x] + alpha M[x]
• Model learners are analyzed and evaluated by the loss function F.
• Repeat the steps until the desired target result P is reached.
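
The update P[x] = P[x] + alpha M[x] can be illustrated with a least-squares version of boosting, where each weak learner M is a one-split regression stump fitted to the current errors (residuals). The squared-error loss, the single feature, the learning rate alpha = 0.1, the 50 rounds and the data are all assumptions of this sketch; it also assumes at least two distinct x values.

```python
def fit_residual_stump(xs, residuals):
    """Weak learner M: one threshold, predicting the mean residual on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, alpha=0.1):
    """Start from the mean target, then repeatedly apply P[x] = P[x] + alpha * M[x]."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    learners = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # current error in the targets
        m = fit_residual_stump(xs, residuals)
        learners.append(m)
        preds = [p + alpha * m(x) for p, x in zip(preds, xs)]
    return lambda x: base + alpha * sum(m(x) for m in learners)


# Made-up glucose readings with 0/1 outcomes.
xs = [80, 90, 100, 140, 150, 160]
ys = [0, 0, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
print(round(model(80), 3), round(model(160), 3))
```

Each round removes a fraction alpha of the remaining error, so the scores move towards 0 for the low readings and towards 1 for the high ones; comparing the score with 0.5 would turn it into a class label.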

Fig: Overview of the Process

Fig: Cross-validated classification metrics.

Fig: PIMA Indian Dataset.

8.OUTPUTS
1. Home Page:

2. With values added:

3. Final Output:

9.CONCLUSION
This research paper has presented a meal recommendation system with food
recognition capabilities which focused on generating daily personalized meal plans
for the users, according to their nutritional necessities and previous meal
preferences. The reviewed literature presented some gaps which informed the
design and development of an integrated diabetes management platform for patients,
comprising (1) a food recommendation system for diabetics using the K-Nearest
Neighbour (KNN) algorithm, a supervised machine learning model, (2) scheduling and reminding diabetic
patients to take their medication and blood glucose readings for doctor’s intervention
via mobile app, (3) encouraging and tracking the activity of diabetic patients, and (4)
providing an interactive visual interface to help them make meaning of their readings
and establishing a sufficient connection between the doctor and the diabetic patient
using e-mail and chatbots. These integrated technologies present state-of-the-art
solutions for the effective management of diabetes. This research paper required us
to provide a framework with a user-friendly interface for people with diabetes to
monitor their diet, medication, and activity levels. The task has been solved using
state of the art algorithms in artificial intelligence. The proposed framework factors
the diabetes management problem into subgoals: building a Tensorflow neural
network model for food classification; thus, it allows users to upload an image to
determine if a meal is recommended for consumption; implementing K-Nearest
Neighbour (KNN) algorithm to recommend meals; using cognitive sciences to build a
diabetes question and answer chatbot; tracking user activity, user geolocation and
generating pdfs of logged blood sugar readings. The food recognition model was
evaluated with cross-entropy metrics that support validation using neural networks
with a backpropagation algorithm. The model learned features of the images fed
from local Ghanaian dishes with specific nutritional value and essence in managing
diabetics and provided accurate image classification with given labels and
corresponding accuracy. The model achieved specified goals by predicting with high
accuracy, labels of unseen new images. The food recognition and classification
model achieved over 95% accuracy levels for specific calorie intakes. The
performance of the meal recommender model and question and answer chatbot was
tested with a designed cross-platform user-friendly interface using Cordova and Ionic
Frameworks for software development for both mobile and web applications. The
system recommended meals to meet the calorific needs of users successfully using
KNN (with k = 5) and answered questions asked in a human-like way. The
implemented system would solve the problem of managing activity, dieting
recommendations, and medication notification for diabetics. The critical limitation of
this work is that it does not address corresponding hardware modules for insulin
pumps and control, as discussed by others in the review, and that may constitute a
fatal limitation since insulin control is crucial. It concentrates principally on
developing software for diabetes management with a machine learning algorithm.
Other supervised and unsupervised machine learning algorithms, such as Support
Vector Machines, random forests, K-Means, and Fuzzy C-Means, could be explored
as well. Finally, there is hope that this system will be useful to people with diabetes
now and in the future. The focus of this work has been on implementing a software
system that will take into consideration the various factors that affect diabetics. The
most crucial issue was to get different models to work; consequently, there are
improvements to make. The nonfatal limitations include a lack of wearables for
physical activity tracking and associated model to determine the number of calories
burned from each activity undertaken. The present system tracks the user’s walk and
saves the route but does not relate the saved route to the calories burned. In the
future, the calories burned would be determined, and the various modules will work
together to predict the user’s future blood glucose readings.

10.REFERENCES

• Stackoverflow
• Flask
• Debadri Dutta, Debpriyo Paul, Parthajeet Ghosh, "Analyzing Feature
Importance’s for Diabetes Prediction using Machine Learning". IEEE, pp 942-
928, 2018.
• Md. Faisal Faruque, Asaduzzaman, Iqbal H. Sarker, "Performance Analysis of
Machine Learning Techniques to Predict Diabetes Mellitus". International
Conference on Electrical, Computer and Communication Engineering
(ECCE), 7-9 February, 2019.

• Tejas N. Joshi, Prof. Pramila M. Chawan, "Diabetes Prediction Using Machine
Learning Techniques". Int. Journal of Engineering Research and Application,
Vol. 8, Issue 1, (Part -II) January 2018, pp.-09-13

4 pages
Projects 2021 B4
No ratings yet
Projects 2021 B4
96 pages
Cse320 Srs
No ratings yet
Cse320 Srs
20 pages
Unit Iii: Software Testing and Maintenance
No ratings yet
Unit Iii: Software Testing and Maintenance
34 pages
Application For The Grant of Partial Financial TN State Council - Assistance For Conference/ Seminar/ Symposia/Workshop
0% (1)
Application For The Grant of Partial Financial TN State Council - Assistance For Conference/ Seminar/ Symposia/Workshop
2 pages
Major Project Report - AKTU
No ratings yet
Major Project Report - AKTU
15 pages
Ensemble Learning Methods
100% (1)
Ensemble Learning Methods
24 pages
Blood Bank Report
No ratings yet
Blood Bank Report
106 pages
Final BBMS
No ratings yet
Final BBMS
16 pages
Introduction To Parallel Databases
No ratings yet
Introduction To Parallel Databases
24 pages
Flight Price Prediction Project Presentation
No ratings yet
Flight Price Prediction Project Presentation
15 pages
Diabetes Disease Prediction Using Significant Attribute Selection and Classification Approach
No ratings yet
Diabetes Disease Prediction Using Significant Attribute Selection and Classification Approach
37 pages
MCA211 Software Testing
No ratings yet
MCA211 Software Testing
2 pages
Salary Prediction Using Machine Learning
No ratings yet
Salary Prediction Using Machine Learning
4 pages
Deepfake Detection Synopsis
No ratings yet
Deepfake Detection Synopsis
28 pages
Chapter 8 - Software Testing
No ratings yet
Chapter 8 - Software Testing
20 pages
Breast Cancer
No ratings yet
Breast Cancer
20 pages
09 117292 Final
100% (1)
09 117292 Final
8 pages
Hospital Managemen T System: Oose LAB File
No ratings yet
Hospital Managemen T System: Oose LAB File
62 pages
Application of Machine Learning in High Frequency Trading of Stocks
No ratings yet
Application of Machine Learning in High Frequency Trading of Stocks
12 pages
SOFTWARE TESTING Question Paper 21 22
No ratings yet
SOFTWARE TESTING Question Paper 21 22
3 pages
ML Unit 3
No ratings yet
ML Unit 3
49 pages
CS341Tut3 PDF
100% (1)
CS341Tut3 PDF
3 pages
7 - Classification
No ratings yet
7 - Classification
71 pages
JNTUA-B.Tech.2-2 CSE-R15-SYLLABUS PDF
No ratings yet
JNTUA-B.Tech.2-2 CSE-R15-SYLLABUS PDF
24 pages
Lecturenote - 1938410780chapter 9 - Emerging Trends in Software Engineering (Lecture 15)
No ratings yet
Lecturenote - 1938410780chapter 9 - Emerging Trends in Software Engineering (Lecture 15)
20 pages
LightGBM - An In-Depth Guide Python
No ratings yet
LightGBM - An In-Depth Guide Python
26 pages
Assignment SE
No ratings yet
Assignment SE
1 page
It8076 Software Testing
No ratings yet
It8076 Software Testing
2 pages
Self-Diagnosis With Advanced Hospital Management Abstract
No ratings yet
Self-Diagnosis With Advanced Hospital Management Abstract
4 pages
Career Guidance With AI
No ratings yet
Career Guidance With AI
10 pages
CIIE (Centre For Innovation Incubation and Entrepreneurship)
No ratings yet
CIIE (Centre For Innovation Incubation and Entrepreneurship)
2 pages
Unit: 2 (Test Case Design) : 1. Define Smart Tester
No ratings yet
Unit: 2 (Test Case Design) : 1. Define Smart Tester
5 pages
ML Unit-V
No ratings yet
ML Unit-V
161 pages
Multi-Label Feature Aware XGBoost Model For Student Performance Assessment Using Behavior Data in Online Learning Environment
No ratings yet
Multi-Label Feature Aware XGBoost Model For Student Performance Assessment Using Behavior Data in Online Learning Environment
7 pages
Intraday Market Preditability. A Machine Learning Approach
No ratings yet
Intraday Market Preditability. A Machine Learning Approach
56 pages
Housing Price Prediction
No ratings yet
Housing Price Prediction
87 pages
ML Unit 1
No ratings yet
ML Unit 1
27 pages
Mini Project - Merged
No ratings yet
Mini Project - Merged
48 pages
Employee Attrition Analysis of Data Driven Models
No ratings yet
Employee Attrition Analysis of Data Driven Models
10 pages
Temporal Fusion VMD Windpower
No ratings yet
Temporal Fusion VMD Windpower
18 pages
BoostingDEA and R Language
No ratings yet
BoostingDEA and R Language
8 pages
Interview - Preparation-Machine Learning Questions & Answers
No ratings yet
Interview - Preparation-Machine Learning Questions & Answers
37 pages
Supplementary Materials For: Improving Refugee Integration Through Data-Driven Algorithmic Assignment
No ratings yet
Supplementary Materials For: Improving Refugee Integration Through Data-Driven Algorithmic Assignment
37 pages
Slides
No ratings yet
Slides
39 pages
Fake News and Message Detection Project Report: September 2021
No ratings yet
Fake News and Message Detection Project Report: September 2021
13 pages
Azure AutoML
No ratings yet
Azure AutoML
28 pages
XG Boosting Reference
No ratings yet
XG Boosting Reference
6 pages
Capec
No ratings yet
Capec
18 pages
Report On ML NEW Project
No ratings yet
Report On ML NEW Project
5 pages
02 ruchiJWoo35-49
No ratings yet
02 ruchiJWoo35-49
16 pages
IQBAL Fresher 19
No ratings yet
IQBAL Fresher 19
3 pages

Diabetes Prediction System

Uploaded by

Diabetes Prediction System

Uploaded by

Diabetes Prediction System

BMS INSTITUTE OF TECHNOLOGY & MANAGEMENT

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

PROJECT BASED LEARNING

2020-21 Even Semester

Report of System Software and Compilers – 18CS61 project work

“Diabetes Prediction System”

Under the guidance of

Dr.Archana.R.A Mrs.Mari Kirthima

BMSIT&M Department of CSE(2020-21) Page 1

PROGRAM EDUCATIONAL OBJECTIVES

BMSIT&M Department of CSE(2020-21) Page 2

Web Technology and its applications– 18CS63 - Course Outcomes (COs)

Subject Name– Code - Course Outcomes (COs) w.r.t this PBL

Project to Program Outcomes (PO) Mapping

Program outcomes (POs):

BMSIT&M Department of CSE(2020-21) Page 3

Project to Program Specific Outcomes (PSO) Mapping

Project Name: Diabetes Prediction System

COURSE PSO1 PSO2

Program Specific Outcomes (PSOs):

BMSIT&M Department of CSE(2020-21) Page 4

Contents Page no.

6. System requirement specifications 10

BMSIT&M Department of CSE(2020-21) Page 5

Diabetes is an illness caused because of high glucose level in a human body.

BMSIT&M Department of CSE(2020-21) Page 6

Diabetes is noxious diseases in the world. Diabetes caused because of

However prevalence of diabetes is found among various Countries like

Machine Learning Is a method that is used to train computers or machines

BMSIT&M Department of CSE(2020-21) Page 7

Fig: Architecture Diagram

BMSIT&M Department of CSE(2020-21) Page 8

.Fig: Different Phases

BMSIT&M Department of CSE(2020-21) Page 9

6.SYSTEM REQUIREMENT SPECIFICATIONS

BMSIT&M Department of CSE(2020-21) Page 10

BMSIT&M Department of CSE(2020-21) Page 11

Fig: Comparing Glucose with the outcome.

BMSIT&M Department of CSE(2020-21) Page 12

Fig: 1v1 characteristics.

BMSIT&M Department of CSE(2020-21) Page 13

Distribution of Diabetic patient- We made a model to predict diabetes however

Fig: Ratio of Diabetic and Non Diabetic Patient

Fig: Corelation matrix between the parameters.

BMSIT&M Department of CSE(2020-21) Page 14

B. Data preprocessing:- is most important process. Mostly healthcare related data

Fig: Feature Importance.

BMSIT&M Department of CSE(2020-21) Page 15

apply Machine Learning Techniques to analyze the performance of these methods

1. Support Vector Machine- Support Vector Machine also known as svm is a

BMSIT&M Department of CSE(2020-21) Page 16

d(P,Q) = summation of (Pi-Qi)^2

3. Decision Tree- Decision tree is a basic classification method. It is supervised

BMSIT&M Department of CSE(2020-21) Page 17

4. Logistic Regression- Logistic regression is also a supervised learning

BMSIT&M Department of CSE(2020-21) Page 18

Fig: Algorithm’s accuracies.

6. Gradient Boosting - Gradient Boosting is most powerful ensemble technique

BMSIT&M Department of CSE(2020-21) Page 19

Fig: Overview of the Process

Fig: Cross-Validates classification metrics.

BMSIT&M Department of CSE(2020-21) Page 20

Fig: PIMA Indian Dataset.

BMSIT&M Department of CSE(2020-21) Page 21

2. With values added:

BMSIT&M Department of CSE(2020-21) Page 22

BMSIT&M Department of CSE(2020-21) Page 23

BMSIT&M Department of CSE(2020-21) Page 24

• Tejas N. Joshi, Prof. Pramila M. Chawan, "Diabetes Prediction Using Machine

BMSIT&M Department of CSE(2020-21) Page 25

You might also like