Diabetes Prediction System
2020-2021
INSTITUTE VISION
To emerge as one of the finest technical institutions of higher learning, to
develop engineering professionals who are technically competent, ethical and
environmentally friendly for the betterment of society.
INSTITUTE MISSION
Accomplish a stimulating learning environment through high quality academic
instruction, innovation and industry-institute interface.
DEPARTMENT VISION
To develop technical professionals acquainted with recent trends and
technologies of computer science to serve as a valuable resource for the
nation/society.
DEPARTMENT MISSION
Facilitating and exposing the students to various learning opportunities through
dedicated academic teaching, guidance and monitoring.
comprehend and write effective reports and design documentation, make effective
presentations, and give and receive clear instructions.
PO11 Project management and finance: Demonstrate knowledge and understanding of
the engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and in multidisciplinary
environments.
PO12 Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.
Table Of Contents:
1. Abstract
2. Motivation
3. Introduction
4. Existing system
5. Proposed system
7. Proposed Methodology
8. Outputs
9. Conclusion
10. Reference
1.ABSTRACT
2.MOTIVATION
The drastic increase in diabetes calls for new research. Our main motivation is
the current condition of people suffering from this disease. Lifestyle is the main
cause of type 2 diabetes. We want to create a system that medical professionals
can use as an aid to detect diabetes in time, so that the patient can manage
his/her diabetes effectively.
3.INTRODUCTION
4.EXISTING SYSTEM
The existing system consists of a project model that evaluates only a few
particular parameters and does not take the remaining parameters and networks
into consideration. The available smart watches are very expensive, and they are
not practical for hardware use or for everyday activities. Another disadvantage is
that the systems need to be connected very invasively in order to work properly.
Automated systems are completely absent from the currently existing model.
Only a limited set of machine learning techniques is considered, so the existing
work cannot predict diabetes accurately and therefore needs some modification.
XGBoost, however, is considerably more demanding than techniques such as decision
trees, support vector machines, random forests and linear regression. Creating the
profiles of clients and patients and managing their storage both rely on real-time
communication.
The E-health scheme manages the real-time retrieval and gathering of database
information. The application services consist of three main parts: the web
services, the emergency response systems and the hospital services. The oximeter
falls under the information perception tasks.
5.PROPOSED SYSTEM
The proposed work deals with diabetes prediction: we use the Pima Indians Diabetes
dataset to predict whether a person will suffer from diabetes or not, based on the
values recorded for him/her in the dataset. The dataset has 768 data points and 9
features. The required result is binary, 0 or 1: 0 denotes that the person will
not suffer from diabetes and 1 means that he/she will. Of these 768 data points,
500 are labelled 0 and the remaining 268 are labelled 1. The work focuses mainly
on the train-test split of the dataset, which is used to determine the individual
contribution of each data value. Training on separate data segments is vital
because it ensures the stability of the data drawn from the dataset, avoids data
redundancy and increases the overall efficiency of the algorithms. The necessary
training data files are imported before the code segments begin. The result is
stored in a separate database, which is then sent for validation, followed by the
splitting of the trained attributes, which are then stable.
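
The train-test split described above can be illustrated with a minimal sketch. It assumes a stratified split (each class keeps the same share in both parts), a test fraction of 20% and a fixed seed; none of these choices is stated in the report, and the function name stratified_split is hypothetical. Synthetic labels with the stated class counts (500 zeros, 268 ones) stand in for the real dataset.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels with the class counts given in the text (assumption:
# the real data would be loaded from the dataset file instead).
labels = [0] * 500 + [1] * 268
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Stratifying is one reasonable choice here because the two classes are imbalanced; a plain random split could leave the test set with a noticeably different share of positive cases.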
In the design and development of the architecture for the diabetes management
system, the clinical requirements and design analysis of the system were based on
discussions with collaborators from the Department of Nutrition and Food Science of
the University of Ghana and Kwame Nkrumah University of Science and Technology
(KNUST). From these discussions, the diet type of patients was determined to be an
essential approach suitable for the diabetes management system. The following
functionalities were mentioned: (1) Scheduling and reminding diabetic patients to
take their medication and blood glucose readings, (2) recommending healthy meals
for diabetics to keep their blood glucose levels in check, (3) encouraging and
tracking the activity of diabetic patients, (4) providing a visual interface to help them
make meaning of their readings and establishing a sufficient connection between the
doctor and the diabetic patient using e-mail.
Providing the diabetic patient with a data visualization tool to display the data in
tables, charts, and an educational program for newly diagnosed and ongoing
diabetes treatment is valuable for the treatment and management of diabetes.
SYSTEM ARCHITECTURE
The system architecture for the Diabetes Management System presented below in
Figure 1 is the conceptual model that defines the structure, behavioural
interactions, and multiple system views that underpin the system development. It
presents the formal descriptions of the system, captured graphically to support
reasoning, and the submodules developed as well as the dataflows between the
developed modules.
Fig: System architecture for the implemented system with all submodules.
7.PROPOSED METHODOLOGY
The goal of this paper is to find a model that predicts diabetes with better
accuracy. We experimented with different classification and ensemble algorithms to
predict diabetes. In the following, we briefly discuss each phase.
A. Dataset Description- The data is gathered from the UCI repository and is named
the Pima Indians Diabetes Dataset. The dataset has several attributes for 768
patients.
Table: Dataset Description
Sno. Attributes
1. Pregnancy
2. Glucose
3. Blood Pressure
4. Skin Thickness
5. Insulin
6. BMI(Body Mass Index)
7. Diabetes Pedigree Function
8. Age
The 9th attribute is the class variable of each data point. This class variable
takes the outcome 0 or 1, which indicates negative or positive for diabetes
respectively.
A correlation matrix is simply a table which displays the correlation coefficients
between pairs of variables. The measure is best used for variables that have a
linear relationship with each other. The fit of the data can be visually
represented in a scatterplot.
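
As an illustration of what such a table contains, the following minimal sketch computes Pearson correlation coefficients for a few columns. The column values are made up for the example and are not taken from the dataset; the helper names pearson and correlation_matrix are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Illustrative values only (not real patient records).
columns = {
    "Glucose": [85, 100, 120, 150, 180],
    "BMI": [22.0, 26.5, 30.1, 33.4, 35.0],
    "Outcome": [0, 0, 1, 1, 1],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

The diagonal is always 1 and the matrix is symmetric, so in practice only one triangle needs to be inspected.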
1). Missing Values removal- Remove all the instances that have zero (0) as a
value in an attribute where a zero value is not possible. Such an instance is
therefore eliminated. By eliminating irrelevant features/instances we build a
feature subset; this process is called feature subset selection, which reduces the
dimensionality of the data and helps the model work faster.
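
A minimal sketch of this removal step is given below. Which attributes count as "zero is not possible" is an assumption of the sketch (a zero pregnancy count or a zero class label, for instance, is valid), and the column names are hypothetical spellings of the attributes in the table above.

```python
# Attributes where a zero reading is treated as a missing value.
# This selection is an assumption of the sketch, not stated in the report.
ZERO_MEANS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def drop_rows_with_zeros(rows, columns=ZERO_MEANS_MISSING):
    """Keep only the rows whose listed columns are all non-zero."""
    return [row for row in rows if all(row[c] != 0 for c in columns)]


# Illustrative rows only (not real patient records).
rows = [
    {"Pregnancies": 0, "Glucose": 137, "BloodPressure": 40,
     "SkinThickness": 35, "Insulin": 168, "BMI": 43.1, "Outcome": 1},
    {"Pregnancies": 1, "Glucose": 85, "BloodPressure": 66,
     "SkinThickness": 29, "Insulin": 0, "BMI": 26.6, "Outcome": 0},
    {"Pregnancies": 3, "Glucose": 0, "BloodPressure": 70,
     "SkinThickness": 20, "Insulin": 90, "BMI": 31.0, "Outcome": 0},
]
clean = drop_rows_with_zeros(rows)
print(len(rows), "->", len(clean))
```

Note that the first row is kept even though its pregnancy count and nothing else is zero, because that column is not in the checked list.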
2). Splitting of data- After cleaning, the data is normalized and split for
training and testing the model. Once the data is split, we train the algorithm on
the training data set and keep the test data set aside. The training process
produces a trained model based on the logic of the algorithm and the feature
values in the training data. The aim of normalization is to bring all the
attributes onto the same scale.
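
The report does not say which normalization is used; the sketch below assumes min-max scaling to the range [0, 1], with the minimum and maximum learned from the training rows only and then reused for the test rows. The function names and sample values are hypothetical.

```python
def fit_min_max(train_rows):
    """Learn (min, max) for every column from the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, ranges):
    """Scale every value with the ranges learned on the training data."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
            for value, (lo, hi) in zip(row, ranges)
        ])
    return scaled


# Illustrative rows: [glucose, BMI, age] (made-up values).
train = [[85.0, 26.6, 31.0], [183.0, 23.3, 32.0], [137.0, 43.1, 33.0]]
test = [[110.0, 30.0, 29.0]]

ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
test_scaled = apply_min_max(test, ranges)
print(train_scaled)
print(test_scaled)
```

Because the ranges come from the training data alone, a test value can fall outside [0, 1], as the age column of the test row does here. This keeps information about the test set out of the training process.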
C. Apply Machine Learning- Once the data is ready, we apply machine learning
techniques. We use different classification and ensemble techniques to predict
diabetes. The methods are applied to the Pima Indians Diabetes dataset. Main objective to
The Euclidean distance between two points P and Q, i.e. P(p1, p2, ..., pn) and
Q(q1, q2, ..., qn), is defined by the following equation:
$$ d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} $$
Algorithm-
• Take the Pima Indians Diabetes data set as the sample (training) dataset of
rows and columns.
• Take a test dataset with the same attributes.
• Find the Euclidean distance between each test row and every training row.
• Then choose a value of K, the number of nearest neighbours.
• Using these distances, find the K training rows with the minimum distance and
read the nth column (the class label) of each.
• Compare these output values and assign the class that most of the K neighbours
share.
If most of the neighbours are diabetic, then the patient is predicted to be
diabetic, otherwise not.
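
The steps above can be sketched as follows. This is a minimal illustration with majority voting over the K nearest neighbours, assuming the features have already been scaled; the function names, the choice k=3 and the sample points are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Illustrative, already-scaled points: label 0 = not diabetic, 1 = diabetic.
train_x = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
           [0.80, 0.90], [0.90, 0.80], [0.85, 0.75]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, [0.82, 0.80]))
print(knn_predict(train_x, train_y, [0.12, 0.18]))
```

An odd K avoids tied votes when there are only two classes, which is one reason small odd values are a common starting point.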
5. Random Forest – It is a type of ensemble learning method used for
classification and regression tasks. It often gives greater accuracy than
other models, and it can easily handle large datasets. Random Forest was
developed by Leo Breiman and is a popular ensemble learning method. It
improves on the performance of a single decision tree by reducing variance.
It operates by constructing a multitude of decision trees at training time
and outputs the class that is the mode of the classes (classification) or
the mean prediction (regression) of the individual trees.
Algorithm-
a. Select “R” features at random from the total “m” features, where R << m.
b. Among the “R” features, calculate the node using the best split point.
c. Split the node into sub nodes using the best split.
d. Repeat steps a to c until “l” number of nodes has been reached.
e. Build the forest by repeating steps a to d “n” times to create “n” trees.
To make a prediction, the first step is to take the test features and use the
rules of each randomly created decision tree to predict the outcome, storing each
predicted outcome (target). Secondly, calculate the votes for each predicted
target. Finally, take the predicted target with the most votes as the final
prediction of the random forest algorithm. Random Forest gives accurate
predictions for a wide range of applications.
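
The training and voting procedure can be illustrated with a deliberately simplified sketch. It assumes that every tree is a single split (a decision stump) trained on a bootstrap sample with a random subset of features, so the "repeat until l nodes" step is reduced to one node per tree. All function names, parameters and sample values are hypothetical; this is not the implementation used in the project.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) among feature_ids with the fewest errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        label = majority(labels)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and random features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feature_ids))
    return forest


def predict_forest(forest, row):
    """Collect one vote per tree and return the class with the most votes."""
    return majority([predict_stump(stump, row) for stump in forest])


# Illustrative rows: [glucose, BMI] (made-up values), 1 = diabetic.
rows = [[90, 22], [100, 24], [95, 21], [105, 25],
        [160, 35], [170, 38], [150, 33], [180, 36]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict_forest(forest, [175, 37]))
print(predict_forest(forest, [85, 20]))
```

Individual stumps can be wrong when their bootstrap sample is unrepresentative; the vote across many trees is what reduces that variance.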
model. It classifies complex data sets and is a very effective and popular method.
In gradient boosting, model performance improves over iterations.
Algorithm-
• Consider a sample of target values as P.
• Estimate the error M in the target values.
• Update and adjust the weights to reduce the error M.
• P[x] = P[x] + alpha M[x]
• The model learners are analyzed and evaluated by the loss function F.
• Repeat the steps until the desired target result P is reached.
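
The update rule P[x] = P[x] + alpha M[x] can be illustrated with a minimal sketch. It assumes a squared-error loss, a single input feature and a one-split weak learner fitted to the current errors (residuals); these choices, the function names and the sample data are assumptions made for illustration and are not taken from the report.

```python
def fit_residual_stump(xs, residuals):
    """Find the threshold on x that best fits the residuals with two constants."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=20, alpha=0.5):
    """Return the final predictions and the mean squared error after each round."""
    predictions = [sum(ys) / len(ys)] * len(ys)  # P[x]: start from the mean
    losses = []
    for _ in range(n_rounds):
        # Current error of the model on every sample.
        residuals = [y - p for y, p in zip(ys, predictions)]
        # M[x]: a weak learner fitted to that error.
        threshold, left_mean, right_mean = fit_residual_stump(xs, residuals)
        for i, x in enumerate(xs):
            step = left_mean if x <= threshold else right_mean
            predictions[i] += alpha * step  # P[x] = P[x] + alpha * M[x]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return predictions, losses


# Illustrative data: one feature, targets 0 or 1 (made-up values).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

predictions, losses = gradient_boost(xs, ys)
print([round(p, 3) for p in predictions])
print(losses[0], losses[-1])
```

The learning rate alpha trades speed for stability: smaller values need more rounds but make each correction less likely to overshoot.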
8.OUTPUTS.
1. Home Page:
3. Final Output:
9.CONCLUSION
This research paper has presented a meal recommendation system with food
recognition capabilities which focused on generating daily personalized meal plans
for the users, according to their nutritional necessities and previous meal
preferences. The reviewed literature presented some gaps which informed the
design and development of an integrated diabetes management platform for patients:
(1) using the K-Nearest Neighbour (KNN) algorithm, a supervised machine learning
model, as a food recommendation system for diabetics, (2) scheduling and reminding
diabetic patients to take their medication and blood glucose readings for doctor’s
intervention via mobile app, (3) encouraging and tracking the activity of diabetic
patients, and (4) providing an interactive visual interface to help them make
meaning of their readings and establishing a sufficient connection between the
doctor and the diabetic patient using e-mail and chatbots. These integrated
technologies present state-of-the-art solutions for the effective management of
diabetes.

This research paper required us
to provide a framework with a user-friendly interface for people with diabetes to
monitor their diet, medication, and activity levels. The task has been solved using
state of the art algorithms in artificial intelligence. The proposed framework factors
the diabetes management problem into subgoals: building a TensorFlow neural
network model for food classification; thus, it allows users to upload an image to
determine if a meal is recommended for consumption; implementing K-Nearest
Neighbour (KNN) algorithm to recommend meals; using cognitive sciences to build a
diabetes question and answer chatbot; tracking user activity, user geolocation and
generating PDFs of logged blood sugar readings.

The food recognition model was
evaluated with cross-entropy metrics that support validation using neural networks
with a backpropagation algorithm. The model learned features of the images fed
from local Ghanaian dishes with specific nutritional value and essence in managing
diabetics and provided accurate image classification with given labels and
corresponding accuracy. The model achieved specified goals by predicting with high
accuracy, labels of unseen new images. The food recognition and classification
model achieved over 95% accuracy levels for specific calorie intakes. The
performance of the meal recommender model and question and answer chatbot was
tested with a designed cross-platform user-friendly interface using Cordova and Ionic
Frameworks for software development for both mobile and web applications. The
system recommended meals to meet the calorific needs of users successfully using
KNN (with k = 5) and answered questions asked in a human-like way. The
implemented system would solve the problem of managing activity, dieting
recommendations, and medication notification for diabetics.

The critical limitation of
this work is that it does not address corresponding hardware modules for insulin
pumps and control, as discussed by others in the review, and that may constitute a
fatal limitation since insulin control is crucial. It concentrates principally on
developing software for diabetes management with a machine learning algorithm.
Other supervised and unsupervised machine learning algorithms, such as Support
Vector Machines, random forests, K-Means, and Fuzzy C-Means, could be explored
as well. Finally, there is hope that this system will be useful to people with diabetes
now and in the future. The focus of this work has been on implementing a software
system that will take into consideration the various factors that affect diabetics. The
most crucial issue was to get different models to work; consequently, there are
improvements to make. The nonfatal limitations include a lack of wearables for
physical activity tracking and associated model to determine the number of calories
burned from each activity undertaken. The present system tracks the user’s walk and
saves the route but does not relate the saved route to the calories burned. In the
future, the calories burned would be determined, and the various modules will work
together to predict the user’s future blood glucose readings.
10.REFERENCES
• Stack Overflow
• Flask
• Debadri Dutta, Debpriyo Paul, Parthajeet Ghosh, "Analyzing Feature
Importances for Diabetes Prediction using Machine Learning". IEEE, pp 942-928,
2018.
• Md. Faisal Faruque, Asaduzzaman, Iqbal H. Sarker, "Performance Analysis of
Machine Learning Techniques to Predict Diabetes Mellitus". International
Conference on Electrical, Computer and Communication Engineering
(ECCE), 7-9 February, 2019.