Disease Prediction Using Machine Learning
https://doi.org/10.22214/ijraset.2022.41230
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue IV Apr 2022- Available at www.ijraset.com
Abstract: This paper presents a system that gives users information and guidance for looking after their health and that
identifies a likely disease from the symptoms they report. The health industry plays a major role in treating patients, and
such a system can support it. It is also useful for users who do not want to travel to a hospital or clinic: by entering their
symptoms and other relevant information, they can learn which disease they may be affected by. Health professionals can
likewise benefit by asking a patient for symptoms, entering them into the system, and obtaining within a few seconds a
specific and reasonably accurate prediction of the condition. The system is implemented entirely with machine learning in
the Python programming language and uses a dataset previously made available by hospitals to predict diseases.
Keywords: Prediction, Decision Tree, Random forest, Naive Bayes
I. INTRODUCTION
The purpose of this project, “Disease Prediction Using Machine Learning”, is to predict a patient's disease accurately from
their general information and symptoms. If the prediction is made at an early stage of the disease and the necessary measures
are taken, the disease can be cured, which makes such a prediction system very useful to the health industry. The overall aim
is to provide predictions for the various commonly occurring diseases that, when left unchecked or ignored, can turn fatal and
cause serious problems for patients and their family members. The system predicts the most probable disease based on the
symptoms. The health industry is information rich yet knowledge poor, and it is a vast industry in which a lot of work remains
to be done. With the help of these algorithms, techniques and methodologies, we have built this project to help people in
need.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 417
3) SVM is a good choice when little is known about the data, and it works well even with unstructured and semi-structured
data such as text, images and trees. Its drawback is that several key parameters must be set correctly to achieve the best
classification results for a given problem. Decision tree: it is easy to understand and yields explicit decision rules.
However, decision trees are unstable, in that a minor modification of the data can cause a large change in the structure of
the optimal tree, and they are often relatively inaccurate. Naive Bayes: it is robust and handles missing values by ignoring
them during probability estimation. However, it is sensitive to how the inputs are prepared and is prone to bias as the size
of the training dataset increases. ANN: it gives good predictions and is easy to implement. However, it has difficulty
handling big data with complex models and requires a large amount of processing time.
4) Diabetes is caused by an excessive amount of sugar in the blood, and it is currently considered one of the most lethal
diseases in the world. People all around the globe are affected by this severe disease, knowingly or unknowingly. Diabetes
can also lead to other conditions such as heart attack, paralysis, kidney disease and blindness. Numerous computer-based
detection systems have been designed for predicting and analysing diabetes. The usual process for identifying diabetic
patients requires considerable time and money, but the rise of machine learning makes it possible to develop a solution to
this problem. Therefore, we have developed an architecture capable of predicting whether or not a patient has diabetes.
The main aim of this exploration is to build a web application based on the high prediction accuracy of a powerful machine
learning algorithm. We have used a benchmark dataset, the Pima Indian dataset, which supports predicting the onset of
diabetes from diagnostic measurements. With a prediction accuracy of 82.35%, the Artificial Neural Network (ANN) shows a
significant improvement in accuracy, which drives us to develop an interactive web application for diabetes prediction.
III. ALGORITHM
A. Decision Tree Algorithm
A decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into
smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with
decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each
representing a value of the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target.
The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can
handle both categorical and numerical data. Depending on the outcome of the test at a node, the classification algorithm
branches towards the appropriate child node, where the process of testing and branching repeats until it reaches a leaf node.
The leaf, or terminal, nodes correspond to the decision outcomes. DTs are straightforward to interpret and quick to learn, and
they are a common component of many medical diagnostic protocols [25]. When traversing the tree to classify a sample, the
outcomes of all tests at each node along the path provide sufficient information to infer its class. An illustration of a DT
with its components and rules is portrayed.
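The test-and-branch traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree, the attribute names (fever, cough) and the class labels (disease A, B, C) are hypothetical and are not taken from the dataset or the trained model used in this project.

```python
# Hypothetical decision tree: each decision node tests one attribute,
# each leaf holds a class label. All names here are illustrative only.
TREE = {
    "attribute": "fever",
    "branches": {
        "yes": {
            "attribute": "cough",
            "branches": {
                "yes": {"label": "disease A"},
                "no": {"label": "disease B"},
            },
        },
        "no": {"label": "disease C"},
    },
}


def classify(node, sample):
    """Follow the branch matching the sample's value at each decision node
    until a leaf is reached, then return the leaf's label."""
    while "label" not in node:
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node["label"]


def path_rules(node, sample):
    """Return the (attribute, value) tests applied on the way to the leaf."""
    rules = []
    while "label" not in node:
        value = sample[node["attribute"]]
        rules.append((node["attribute"], value))
        node = node["branches"][value]
    return rules


patient = {"fever": "yes", "cough": "no"}
print(classify(TREE, patient))
print(path_rules(TREE, patient))
```

The list returned by path_rules is the sequence of test outcomes along the path, which is what makes a decision tree's prediction easy to explain.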
Random Forest is a supervised learning algorithm. It extends machine learning classifiers with bagging to improve the
performance of the Decision Tree. It combines tree predictors such that each tree depends on the values of a random vector
sampled independently and with the same distribution for all trees. Instead of splitting each node on the best of all
variables, a Random Forest splits each node using the best among a subset of predictors chosen at random at that node. The
worst-case time complexity of learning with Random Forests is O(M · d · n log n), where M is the number of trees grown, n is
the number of instances, and d is the data dimension. It can be used for both classification and regression, and it is a
flexible and easy-to-use algorithm. A forest consists of trees, and it is said that the more trees it has, the more robust the
forest is. Random Forests create Decision Trees on randomly selected data samples, get a prediction from each tree, and select
the best solution by means of voting. It also provides a fairly good indicator of feature importance.
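The three ingredients described above (bootstrap samples, a random subset of predictors, and majority voting) can be sketched as follows. This is a simplified illustration, not the implementation used in this project: it assumes binary (0/1) symptom features, uses one-level trees (stumps) in place of full decision trees to stay short, and runs on a tiny hypothetical dataset.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows, feature_indices):
    """Train a one-level tree: among the given random subset of features,
    pick the one whose split misclassifies the fewest training rows."""
    best = None
    for f in feature_indices:
        sides = {}
        for value in (0, 1):
            labels = [label for features, label in rows if features[f] == value]
            # Majority label on this side of the split, or None if it is empty.
            sides[value] = Counter(labels).most_common(1)[0][0] if labels else None
        errors = sum(1 for features, label in rows if sides[features[f]] != label)
        if best is None or errors < best[0]:
            best = (errors, f, sides)
    _, feature, sides = best
    # Fallback label for a side that received no training rows.
    fallback = Counter(label for _, label in rows).most_common(1)[0][0]
    return feature, sides, fallback


def predict_stump(stump, features):
    feature, sides, fallback = stump
    label = sides[features[feature]]
    return fallback if label is None else label


def train_forest(rows, n_trees, n_features_per_split, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample and feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        subset = rng.sample(range(n_features), n_features_per_split)
        forest.append(train_stump(sample, subset))
    return forest


def predict_forest(forest, features):
    """Each tree votes; the most common label wins."""
    votes = Counter(predict_stump(stump, features) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (symptom indicators, label).
rows = [((1, 1, 1), "A")] * 4 + [((0, 0, 0), "B")] * 4
forest = train_forest(rows, n_trees=25, n_features_per_split=2)
print(predict_forest(forest, (1, 1, 1)))
print(predict_forest(forest, (0, 0, 0)))
```

An individual stump trained on an unlucky bootstrap sample can be wrong, which is why the final answer is taken from the vote over all trees.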
B. Naive Bayes
Naive Bayes is a set of supervised learning algorithms based on Bayes’ theorem with the “naive” assumption of conditional
independence between every pair of features given the class. Despite its simplicity, it often outperforms more sophisticated
classification methods. Given input variables x and an output variable y, Bayes’ theorem states the following relationship:
p(y | x) = p(y) p(x | y) / p(x)
In this project, the Gaussian Naive Bayes algorithm has been implemented. In Gaussian Naive Bayes, the likelihood of the
features is assumed to be Gaussian, i.e., all continuous values x associated with class y are distributed according to a
Gaussian distribution. Given a continuous attribute x in the training data, the data is first segmented by the class y. Then,
the mean and variance of x in each class are computed.
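As a minimal sketch of this procedure for a single continuous attribute, the code below segments the training values by class, computes the per-class mean and variance, and scores a new value with the normal density multiplied by the class prior. The function names and the temperature readings are hypothetical and serve only as an illustration; they are not the project's code or data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(values, labels):
    """Segment a continuous attribute by class, then compute the per-class
    prior, mean and (population) variance."""
    by_class = defaultdict(list)
    for v, y in zip(values, labels):
        by_class[y].append(v)
    model = {}
    for y, xs in by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        model[y] = {"prior": len(xs) / len(values), "mean": mean, "var": var}
    return model


def gaussian_pdf(v, mean, var):
    """Normal density at v; var must be strictly positive."""
    return math.exp(-((v - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def predict(model, v):
    """Return the class maximizing prior * likelihood (the unnormalized posterior)."""
    scores = {
        y: p["prior"] * gaussian_pdf(v, p["mean"], p["var"])
        for y, p in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical body-temperature readings in degrees Celsius.
values = [36.5, 36.8, 37.0, 36.7, 38.5, 39.0, 38.2, 39.3]
labels = ["healthy"] * 4 + ["ill"] * 4
model = fit_gaussian_nb(values, labels)
print(predict(model, 36.9))
print(predict(model, 38.8))
```

With several attributes, the naive independence assumption means the per-attribute densities are simply multiplied together for each class.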
Let μ be the mean and σ² the variance of the values in x associated with class y. Given some observed value v, the probability
density of v given class y, p(x = v | y), can be computed by plugging v into the equation for a normal distribution with
parameters μ and σ²:
p(x = v | y) = (1 / √(2πσ²)) · exp(−(v − μ)² / (2σ²))
In the example considered in this figure, the likelihood of ‘white’ given ‘green’ is 0.025 (1 ÷ 40) and the likelihood of
‘white’ given ‘red’ is 0.15 (3 ÷ 20). Although the prior probability indicates that the new ‘white’ object is more likely to
belong to the ‘green’ class, the likelihood shows that it is more likely to be in the ‘red’ class. In Bayesian analysis, the
final classifier is created by combining both sources of information (i.e., the prior probability and the likelihood value).
Multiplication is used to combine these two types of information, and the product is termed the ‘posterior’ probability.
Finally, the posterior probability of ‘white’ being ‘green’ is 0.017 (0.67 × 0.025) and the posterior probability of ‘white’
being ‘red’ is 0.049 (0.33 × 0.15). Thus, the new ‘white’ object should be classified as a member of the ‘red’ class according
to the NB technique.
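The arithmetic of this worked example can be reproduced directly. The short sketch below uses only the numbers stated above (priors 0.67 and 0.33, class-conditional values 1 ÷ 40 and 3 ÷ 20); the variable names are illustrative. The final normalization step is an addition that is not part of the text: it rescales the two products so that they sum to 1.

```python
# Prior probability of each class, as given in the text.
priors = {"green": 0.67, "red": 0.33}

# Probability of observing the new 'white' object given each class.
likelihoods = {"green": 1 / 40, "red": 3 / 20}

# Unnormalized posterior: prior multiplied by likelihood.
posteriors = {c: priors[c] * likelihoods[c] for c in priors}

# Optional normalization so that the posteriors sum to 1.
total = sum(posteriors.values())
normalized = {c: p / total for c, p in posteriors.items()}

# The predicted class is the one with the largest posterior.
predicted = max(posteriors, key=posteriors.get)

print(posteriors)
print(normalized)
print(predicted)
```

The exact products are 0.01675 and 0.0495, which the text reports rounded as 0.017 and 0.049. After normalization, the ‘red’ class receives roughly three quarters of the probability mass, so the ranking of the two classes is unchanged.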
V. RESULT
The prediction system presents a convenient user interface with fields for details such as the name and symptoms, and a
button for each algorithm used for prediction; the result is predicted based on the selected algorithm.
It also displays the accuracy of each algorithm. Comparing the decision tree, random forest and naive Bayes algorithms, random
forest has the best accuracy, 0.96, which makes it the best-suited algorithm for this model.
Fig 5. Accuracy Rate of Algorithm
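For reference, the accuracy used in such a comparison is the fraction of test samples whose predicted label matches the true label. The sketch below shows one way to compute it and to pick the best-scoring model. The labels and predictions are hypothetical placeholders chosen for illustration; they do not reproduce the results reported above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(predictions_by_model, y_true):
    """Return the name of the most accurate model and all accuracy scores."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical true labels and model outputs, for illustration only.
y_true = ["A", "B", "A", "B"]
predictions_by_model = {
    "model_1": ["A", "B", "B", "B"],
    "model_2": ["A", "B", "A", "B"],
}
name, scores = best_model(predictions_by_model, y_true)
print(name, scores)
```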
VI. CONCLUSION
The Prediction Engine allows the user to check whether or not he/she has a disease or disorder based on the given symptoms.
The user interacts with the Prediction Engine by filling in a set of symptoms, which forms the parameter set provided as input
to the trained models. The Prediction Engine makes use of three algorithms to predict the presence of a disease, namely
Decision Tree, Random Forest and Naive Bayes.
REFERENCES
[1] Kaveeshwar, S.A., and Cornwall, J., 2014, “The current state of diabetes mellitus in India”. AMJ, 7(1), pp. 45-48.
[2] Dean, L., McEntyre, J., 2004, “The Genetic Landscape of Diabetes” [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); Chapter 1,
Introduction to Diabetes. 2004 Jul 7.
[3] Y. Zhang, M. Qiu, C.-W. Tsai, M. M. Hassan and A. Alamri, "Health-CPS: Healthcare cyber-physical system assisted by cloud and big data", IEEE Syst. J.,
vol. 11, no. 1, pp. 88-95, Mar. 2017.
[4] Allen Daniel Sunny, Sajal Kulshreshtha, Satyam Singh, Srinabh, Mohan Ba and H Sarojadevi, "Disease Diagnosis System By Exploring Machine
Learning Algorithms", International Journal of Innovations in Engineering and Technology (IJIET), vol. 10, no. 2, May