Chronic Disease Prediction Using Machine Learning
https://doi.org/10.22214/ijraset.2022.46166
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue VIII August 2022- Available at www.ijraset.com
Abstract: Technological advances, including machine learning, have a significant impact on health by enabling more accurate
diagnosis and treatment of various chronic diseases. Accurate prediction is critical in the biomedical and healthcare communities
for determining patients' risk of disease. Predicting chronic disease early, so that prevention can be put in place, is key to
reducing chronic disease mortality. Patients need such predictive models, and machine learning is well suited to building them.
However, it is difficult for a doctor to make an exact forecast based on symptoms alone, and making an accurate diagnosis remains
the most challenging task. Data mining plays a crucial role in predicting disease and addressing this problem. Using chronic
disease datasets from the UCI Machine Learning Repository, this study assesses chronic diseases with machine learning techniques.
To create accurate prediction models for various chronic diseases with data mining approaches, we use datasets for heart disease,
kidney disease, cancer, and diabetes. To increase accuracy and shorten training time, the most relevant features of each dataset
are selected. The system takes the user's symptoms as input and outputs the likelihood that the disease will occur. Logistic
regression is implemented to predict disease, and diabetes, heart disease, cancer, and kidney disease are predicted using
logistic regression, random forest, and decision trees. Different models, methodologies, and algorithms are used to forecast and
analyse each chronic disease. The study includes a conceptual model that covers the prediction of most chronic diseases.
Keywords: Disease Prediction and Accuracy, Logistic Regression, Chronic Diseases, Machine Learning
I. INTRODUCTION
Machine learning is the process of programming computers to optimise their performance based on previous data or examples; more
broadly, it is the study of computer systems that learn from data and experience. ML can be supervised (i.e., output variables are
predicted from input variables) or unsupervised (i.e., there are no output variables to predict, and the task deals with, for
example, clustering the data into different groups). Machine learning is used to build complex models and to extract medical
information, revealing new insights to practitioners and specialists. In clinical practice, machine learning predictive models can
support better-informed decisions about individual patient treatment, and they are also capable of diagnosing many diseases
independently based on clinical guidelines. Incorporating these models into medical practice can save doctors time and money
while also opening new possibilities for disease identification. Machine learning has been shown to be useful in supporting
decision-making and forecasting from the enormous amounts of data generated by the healthcare industry. We optimise machine
learning methods for accurate prediction of chronic illness. Existing research covers only a fraction of what machine learning can
do to forecast disease. We present an approach that uses machine learning techniques such as the K-Nearest Neighbour (KNN)
algorithm, Decision Trees (DT), Logistic Regression, Random Forest, and Naive Bayes (NB) to uncover meaningful characteristics,
resulting in improved disease prediction accuracy. Several such algorithms are applied to improve the accuracy of the learning
process, and they are then evaluated on the available datasets. The prediction model is built using a variety of feature
combinations and well-known classification approaches.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 693
Machine learning is the study and development of algorithms that can learn from data and make predictions on it. It is closely
related to (and frequently overlaps with) computational statistics, which also focuses on making predictions using computers. It
has strong links to mathematical optimization, which provides the discipline with methods, theory, and application domains.
Unsupervised learning, which focuses on exploratory data analysis, is closely tied to data mining, and the two are often confused.
Machine learning tasks are typically divided into several categories, including:
1) Supervised learning: Supervised learning is the machine learning task of learning a function that maps an input to an output
based on example input-output pairs. It infers a function from labelled training data consisting of a set of training examples.
Each example in supervised learning consists of an input object (usually a vector) and a desired output value (also called the
supervisory signal).
2) Unsupervised learning: Unsupervised learning is a type of machine learning that searches a dataset for previously undetected
patterns with no pre-existing labels and minimal human supervision. In contrast to supervised learning, which usually relies on
human-labelled data, unsupervised learning, also known as self-organization, allows probability densities over the inputs to be
modelled.
V. SYSTEM DESIGN
A. Design Objectives
The design objectives cover the set of designs used in our "Chronic Disease Prediction Using Machine Learning" system: data flow
diagrams, sequence diagrams, class diagrams, use case diagrams, and activity diagrams were all used to construct this system. The
system is set up so that the registration procedure is handled entirely by the administrator. Users, such as doctors, can log into
the system with their credentials once registration is complete. Doctors can then predict chronic diseases based on the
inputs/attributes they provide.
C. Activity Diagram
Figure 2 depicts the activity diagram, which shows the sequence of steps a system task follows to produce a result. The
administrator is in charge of the user/doctor registration process. After registration, the user, in this case a doctor, logs into
the system using the credentials provided by the administrator. When a user logs in successfully, the system directs them to the
appropriate page based on their specialty. The user must enter the attributes (independent variables) in the required order to
obtain the desired prediction. To generate the predictions and visualizations, the system uses a machine learning model built from
the available datasets with several ML methods (classification algorithms).
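The requirement that a doctor's inputs reach the model in a fixed order can be made concrete with a short sketch. The Python snippet below is illustrative only: the specialty names, disease keys, attribute names, and the `MODEL_INPUTS` and `build_feature_vector` names are all hypothetical and are not taken from the described system. It routes a specialty to a disease model and assembles named form inputs into the order the model is assumed to have been trained on.

```python
# Hypothetical mapping (names invented for illustration): specialty -> (disease model,
# attribute order the model was trained on).
MODEL_INPUTS = {
    "cardiology": ("heart", ["age", "resting_bp", "cholesterol", "max_heart_rate"]),
    "nephrology": ("kidney", ["age", "blood_pressure", "specific_gravity", "albumin"]),
}


def build_feature_vector(specialty, attributes):
    """Return (disease, features) with features in the model's training order."""
    if specialty not in MODEL_INPUTS:
        raise ValueError(f"no model registered for specialty {specialty!r}")
    disease, order = MODEL_INPUTS[specialty]
    missing = [name for name in order if name not in attributes]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    # Order comes from the model definition, not from how the form was filled in.
    return disease, [float(attributes[name]) for name in order]


# Form fields may arrive in any order; the vector is rebuilt in training order.
form = {"cholesterol": 233, "age": 63, "max_heart_rate": 150, "resting_bp": 145}
print(build_feature_vector("cardiology", form))  # ('heart', [63.0, 145.0, 233.0, 150.0])
```

Building the vector from named fields, rather than trusting the order in which values were typed, is one way to guard against silently swapped attributes.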
VI. ALGORITHM
A. KNN
K-Nearest Neighbour (KNN) is one of the most basic, easy-to-understand, and versatile machine learning algorithms. With it, the
user can predict disease in the healthcare system, that is, forecast whether or not an illness will be detected. The proposed
method divides diseases into distinct classes, indicating which disease will occur based on the symptoms. KNN can be used for both
classification and regression problems and is based on feature similarity: a case is classified by a majority vote of its
neighbours, being assigned to the most common class among its K closest neighbours as measured by a distance function. If K is
equal to 1, the case is simply assigned to the class of its single closest neighbour.
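The majority-vote rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming numeric attributes and Euclidean distance; the toy data points and labels are invented for the example and are not drawn from the study's datasets. Ties in the vote are resolved in favour of the label that appears first among the nearest neighbours.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` to the most common class among its k nearest training cases."""
    # Rank training cases from nearest to farthest from the query.
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    # Majority vote over the labels of the k closest neighbours.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy two-attribute cases (invented for illustration): class 0 vs class 1.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [1.1, 1.0], k=3))  # 0
print(knn_predict(train_X, train_y, [4.0, 4.0], k=1))  # 1 (single nearest neighbour)
```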
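Euclidean Distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$$
where $x_i$ and $y_i$ are the values of the $i$-th attribute of the two cases being compared and $k$ is the number of attributes.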
It is also worth noting that each of the three distance measures is only valid for continuous variables. The Hamming distance must
be used when categorical variables are present. This also raises the issue of standardizing numerical variables to the range 0 to 1
when the dataset contains a mixture of numerical and categorical variables.
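Hamming Distance:
$$D_H = \sum_{i=1}^{k} |x_i - y_i|$$
where, for categorical attributes, $|x_i - y_i| = 0$ if $x_i = y_i$ and $1$ otherwise.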
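The two distance functions above, together with the 0-to-1 standardization mentioned for mixed data, can be sketched as follows. This is a minimal Python illustration; min-max scaling is assumed here as the way of mapping a numerical column to the range 0 to 1, and the example values and categorical attribute values are invented.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences (numeric attributes)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def hamming_distance(x, y):
    """Count of categorical attributes whose values differ (0 if equal, 1 otherwise)."""
    return sum(0 if xi == yi else 1 for xi, yi in zip(x, y))


def min_max_scale(column):
    """Rescale a numeric column to [0, 1] so it is on the same scale as 0/1 mismatches."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: nothing to spread out
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(hamming_distance(["yes", "normal", "male"], ["no", "normal", "female"]))  # 2
print(min_max_scale([100, 150, 200]))  # [0.0, 0.5, 1.0]
```

Without scaling, a numerical attribute measured in the hundreds would dominate the 0/1 contributions of the categorical attributes in any combined distance.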
B. Naïve Bayes
Naive Bayes is a simple yet remarkably powerful algorithm for predictive modelling. It is one of the easiest approaches: it selects
the most likely hypothesis given the data, using what we already know about the problem as prior knowledge.
Bayes' theorem explains how to determine the probability of a hypothesis given the information we already have. The Naive Bayes
classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Bayes'
theorem allows the posterior probability P(b|a) to be calculated from P(b), P(a), and P(a|b), as in the following equation:
$$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)}$$
where:
1) P(b|a) is the posterior probability of the class b (the target) given the predictor a (the attributes).
2) P(b) is the prior probability of the class.
3) P(a|b) is the likelihood, i.e., the probability of the predictor given the class.
4) P(a) is the prior probability of the predictor.
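Bayes' theorem and the naive independence assumption can be illustrated with a small categorical classifier. The sketch below is a minimal Python version under stated assumptions: attributes are categorical, Laplace smoothing (the hypothetical `alpha` parameter) is added so that an unseen value does not force a probability of zero, and the yes/no toy records are invented. Because P(a) is the same for every class, it is dropped when choosing the most probable class.

```python
from collections import Counter, defaultdict


def bayes_posterior(p_a_given_b, p_b, p_a):
    """Bayes' theorem: P(b|a) = P(a|b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a


def fit_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
    # Number of distinct values per attribute, used by Laplace smoothing.
    n_values = [len({row[j] for row in rows}) for j in range(len(rows[0]))]
    return class_counts, value_counts, n_values


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class b maximising P(b) * prod_j P(a_j | b)."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for b, n_b in class_counts.items():
        score = n_b / total  # prior P(b)
        for j, value in enumerate(row):
            # Naive assumption: attributes are independent given the class.
            score *= (value_counts[(b, j)][value] + alpha) / (n_b + alpha * n_values[j])
        if score > best_score:
            best_class, best_score = b, score
    return best_class


# Invented records: (smoker, high blood pressure) -> outcome.
rows = [("yes", "yes"), ("yes", "yes"), ("yes", "no"),
        ("no", "no"), ("no", "no"), ("no", "yes")]
labels = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]
model = fit_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("yes", "yes")))  # disease
print(bayes_posterior(0.9, 0.1, 0.2))  # approximately 0.45
```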
C. Logistic Regression
Logistic regression is a supervised classification technique used to predict the probability of a target variable, here the
presence of a disease. Because the target variable is dichotomous, there are only two possible classes. In simple terms, the
variable is binary, with data coded as either 1 (meaning success) or 0 (meaning failure). A logistic regression model predicts
P(y = 1) as a function of x.
Logistic regression can be expressed as:
$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$
Here the left-hand side is the logit, or log-odds, function, and $p(X)/(1 - p(X))$ is the odds. The odds are the ratio of the
probability of success to the probability of failure. Logistic regression therefore maps a linear combination of the inputs to the
log-odds of the output being equal to 1.
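The relationship between the log-odds and p(X) can be made concrete with a small sketch that fits β0 and β1 for a single predictor. This is a minimal plain-Python illustration, assuming batch gradient descent on the mean log-loss (a common fitting choice, not necessarily the one used in this study) and an invented one-feature dataset; the learning rate and epoch count are arbitrary values chosen for the toy data.

```python
import math


def sigmoid(z):
    """Inverse of the logit: turns log-odds z into a probability p(X)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for large negative z
    return ez / (1.0 + ez)


def logit(p):
    """Log-odds log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def fit_logistic(xs, ys, lr=0.1, epochs=20000):
    """Fit log(p/(1-p)) = b0 + b1*x by batch gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # predicted probability minus label
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Invented single-predictor data with overlapping classes (1 = disease, 0 = no disease).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
for x in (1, 8):
    p = sigmoid(b0 + b1 * x)
    print(x, round(p, 3), p >= 0.5)  # classify as 1 when P(y=1) >= 0.5
```

The invented labels are arranged symmetrically around x = 4.5, so the fitted decision boundary -β0/β1 ends up close to 4.5. Because the classes overlap, the maximum-likelihood coefficients are finite; on perfectly separable data, plain gradient descent would keep increasing β1 without converging.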
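Recall, also known as sensitivity, is the ratio of the number of chronic disease patients who are correctly identified to the total
number of chronic disease patients:
$$\text{Recall} = \frac{TP}{TP + FN}$$
where TP (true positives) is the number of patients correctly identified as having the disease and FN (false negatives) is the
number of patients with the disease who are missed.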
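F-Measure: It assesses the test's accuracy by combining precision and recall; it is the harmonic mean of the two.
$$F\text{-Measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$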
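Accuracy is the ratio of correctly predicted cases to all cases in the dataset:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.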
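The measures above, together with precision, can be computed directly from confusion-matrix counts. The sketch below is a minimal Python helper; precision is taken as TP / (TP + FP), its standard definition, the `safe_div` guard is an illustrative choice for empty denominators, and the counts in the example are invented rather than taken from Table I.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Recall, precision, F-measure and accuracy from confusion-matrix counts."""
    recall = safe_div(tp, tp + fn)      # share of diseased patients that are detected
    precision = safe_div(tp, tp + fp)   # share of positive predictions that are correct
    f_measure = safe_div(2 * precision * recall, precision + recall)  # harmonic mean
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)  # share of all cases predicted correctly
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "accuracy": accuracy}


# Invented counts: 30 TP, 10 FP, 20 FN, 40 TN.
print(classification_metrics(tp=30, fp=10, fn=20, tn=40))
```

For these counts, recall is 0.6, precision is 0.75, the F-measure is about 0.667, and accuracy is 0.7, which shows how the F-measure is pulled toward the weaker of precision and recall.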
Table I: Accuracy results with the numbers of correctly and incorrectly classified instances.
Disease | Accuracy (%) | Correctly Classified Instances | Incorrectly Classified Instances
Cancer  | 80.8246      | 107                            | 23