Heart Disease Prediction Using Machine Learning Algorithm

Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2, February 2021, Pdf Url: https://www.ijtsrd.com/papers/ijtsrd38358.pdf Paper Url: https://www.ijtsrd.com/computer-science/other/38358/heart-disease-prediction-using-machine-learning-algorithm/ravi-kumar-singh

International Journal of Trend in Scientific Research and Development (IJTSRD)

Volume 5 Issue 2, January-February 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470

Heart Disease Prediction using Machine Learning Algorithm

Ravi Kumar Singh, Dr. A Rengarajan
Department of Master of Computer Applications, Jain Deemed to be University, Bengaluru, Karnataka, India

ABSTRACT
Heart disease has become one of the most dangerous conditions affecting the human body, and it can lead to blood clotting. Predicting heart disease is a difficult task in medical science. It has been estimated that 12 million people die every year as a result of heart disease. In this paper, we propose an approach based on the k-Nearest Neighbors (KNN) algorithm to improve the accuracy of heart disease diagnosis. We show that KNN achieves better accuracy than the random forest algorithm for detecting heart disease and gives more precise results. The dataset has 13 attributes and one target attribute; by applying machine learning to it, we achieved 84% accuracy in heart disease detection.

KEYWORDS: Machine Learning, k-Nearest Neighbors classifier, Decision Tree classifier, Random Forest Classifier, Jupyter

How to cite this paper: Ravi Kumar Singh | Dr. A Rengarajan "Heart Disease Prediction using Machine Learning Algorithm" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2, February 2021, pp.183-187, URL: www.ijtsrd.com/papers/ijtsrd38358.pdf

Copyright © 2021 by author(s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)

INTRODUCTION
Heart disease prediction is one of the most prominent applications of machine learning. The heart pumps blood to all parts of the body. If blood is not pumped to every part of the body, the brain and other organs stop working and the person may die. Heart disease is hard to identify because of several contributing factors, such as diabetes, hypertension, high cholesterol, abnormal heart rate and various others. According to the World Health Organization, heart-related diseases are responsible for 17.7 million deaths every year, 31% of all deaths worldwide. In India, heart disease has become the leading cause of mortality: it killed 1.7 million Indians in 2016, according to the 2016 Global Burden of Disease report.

In medical science, heart disease is one of the major challenges, because many parameters and technicalities are involved in predicting it. Machine learning can be a good choice for achieving high accuracy in predicting heart disease as well as other diseases, since it handles diverse data types under different conditions. Algorithms such as Naive Bayes, Decision Tree, KNN and Neural Network are used to predict the risk of heart disease, and each has its own speciality: Naive Bayes is used for predicting heart disease, Decision Tree is used to provide a classified report for heart disease, and Neural Network offers the opportunity to minimize the error in predicting heart disease. All these techniques use old patient records to make predictions about new patients. Such prediction helps doctors detect heart disease at an early stage and thereby save millions of lives.

This survey paper is dedicated to a review of machine learning techniques for heart disease. Later sections discuss different machine learning algorithms for heart disease and compare them on different parameters. The paper also outlines the future of machine learning algorithms in heart disease and gives an in-depth analysis of the field of heart disease prediction.

RELATED WORKS
The heart is one of the main organs of the human body. It performs the vital function of pumping blood, which is as essential as oxygen, so it always needs protection. This is one of the main reasons researchers work on it, and many researchers are doing so. There is always a need for analysis of heart-related matters, whether diagnosis or prediction of heart disease. Fields such as artificial intelligence, machine learning and data mining have contributed to this work. Here, we discuss some of these contributions.

Some researchers have worked on the prediction of heart disease. Kaur et al. worked on this and described how interesting patterns and knowledge are derived from a large dataset. They performed an accuracy comparison of different machine learning and data mining approaches to find which one is best, and the result was in favor of SVM.

@ IJTSRD | Unique Paper ID – IJTSRD38358 | Volume – 5 | Issue – 2 | January-February 2021 Page 183
Zhao et al. (2017) developed a framework for heart disease classification using two datasets, one from Shanghai Shuguang Hospital and another from the UCI heart disease dataset. The model uses the Support Vector Machine algorithm together with PCA, CCA and DMPCCA, which are used for feature extraction and fusion. The overall analysis showed that DMPCCA gave the best result.

Ganesan et al. (2019) used IoT technology for the prediction and diagnosis of heart disease, taking the UCI dataset and applying the J48 classifier, Logistic Regression (LR), Multilayer Perceptron (MLP) and SVM, using Java on the Amazon cloud. In this study J48 gave 91.48%, SVM gave 84.07%, LR gave 83.07% and MLP gave 78.14% accuracy, and the authors concluded that J48 outperforms the other algorithms.

PROJECT SCOPE AND OBJECTIVES
The primary goal of this study is to develop a heart disease prediction system. The system finds information related to heart disease in a historical heart data set and uses it to implement a classifier that classifies the disease according to the user's input, reducing the cost of medical tests. The scope of the project is to apply machine learning algorithms to a larger dataset, which helps improve the accuracy of the results. Machine learning techniques can give more accurate results than even experienced doctors. Clinical decisions supported by computer-based patient records could reduce medical errors and improve patient outcomes.

Literature survey
Sl. no | Authors | Year | Description
1 | Palaniappan and Awang | 2008 | The authors proposed a model, the Intelligent Heart Disease Prediction System (IHDPS), built using data mining techniques, namely Naive Bayes, Decision Tree, and Neural Network.
2 | Bhatla and Jyoti | 2012 | The authors reported that the neural network performed best among the data mining techniques surveyed for predicting heart disease.
3 | Chaurasia and Pal | 2013 | The authors proposed three popular data mining algorithms, CART (Classification and Regression Tree), ID3 (Iterative Dichotomiser 3) and Decision Table (DT), derived from decision trees, to predict heart disease.
4 | Boshra Bahrami et al. | 2015 | The authors proposed using different classification techniques in heart disease diagnosis, such as J48 Decision Tree, K-Nearest Neighbors (KNN), Naive Bayes (NB) and SMO, to classify the dataset.
5 | K. Vembandasamy et al. | 2015 | The authors proposed the Naive Bayes algorithm as a data mining technique for the diagnosis of heart disease patients.
6 | S. Seema et al. | 2016 | The authors proposed an efficient mechanism to predict heart disease by mining data from health records.
7 | K. Gomathi et al. | 2016 | The authors analysed data mining techniques to predict various kinds of diseases, such as heart disease, diabetes and breast cancer.
8 | Ayon Dey et al. | 2016 | The purpose of this study was to analyse supervised machine learning algorithms for predicting heart disease.

Requirement Analysis
Tools
Anaconda
Anaconda is an open-source distribution of the Python and R programming languages. It is used for data science, machine learning, deep learning, and so on. With more than 300 libraries available for data science, it is well suited to any programmer working in data science. Anaconda simplifies package management and deployment. It comes with a wide variety of tools to easily collect data from different sources using various machine learning algorithms. It is developed and maintained by Anaconda, Inc., which was founded by Peter Wang and Travis Oliphant in 2012.

Hardware Requirements
Operating System: Windows 10
Processor: Intel(R) Pentium(R) CPU N3710 @ 1.60GHz 1.60GHz
System Type: 64-bit operating system, x64-based processor
Installed RAM: 4.00 GB

Software Requirements
Jupyter Notebook
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Python
Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. It was developed by Guido van Rossum during 1985-1990. Like Perl, Python source code is freely available under an open-source license that is compatible with the GNU General Public License (GPL). Its object-oriented approach aims to help programmers write clear, logical code for small and large-scale projects.

Python Libraries
NumPy
Pandas
Matplotlib
Scikit-learn (sklearn)

Material and Methods
Dataset Used for Research
The dataset consists of 303 individual records. There are 14 columns in the dataset, which are described below.

1. Age: displays the age of the individual.
2. Sex: displays the gender of the individual using the following format:
   1 = male
   0 = female
3. Chest Pain type: displays the type of chest pain experienced by the individual using the following format:
   1 = typical angina
   2 = atypical angina
   3 = non-anginal pain
   4 = asymptomatic
4. Resting Blood Pressure: displays the resting blood pressure value of an individual in mmHg (unit).
5. Serum Cholesterol: displays the serum cholesterol in mg/dl (unit).
6. Fasting Blood Sugar: compares the fasting blood sugar value of an individual with 120 mg/dl. If fasting blood sugar > 120 mg/dl then: 1 (true), else: 0 (false).
7. Resting ECG: displays resting electrocardiographic results:
   0 = normal
   1 = having ST-T wave abnormality
   2 = left ventricular hypertrophy
8. Max heart rate achieved: displays the maximum heart rate achieved by an individual.
9. Exercise induced angina:
   1 = yes
   0 = no
10. ST depression induced by exercise relative to rest: displays the value, which is an integer or float.
11. Peak exercise ST segment:
   1 = upsloping
   2 = flat
   3 = downsloping
12. Number of major vessels (0-3) colored by fluoroscopy: displays the value as an integer or float.
13. Thal: displays the thalassemia:
   3 = normal
   6 = fixed defect
   7 = reversible defect
14. Diagnosis of heart disease: displays whether the individual is suffering from heart disease or not:
   0 = absence
   1 = present

Classification Techniques
In machine learning and statistics, classification is a supervised learning approach in which the computer program learns from the data and then uses this learning to classify new observations. In other words, the training dataset is used to obtain better boundary conditions that can be used to determine each target class; once such boundary conditions are determined, the next task is to predict the target class.

Machine learning is a field of study concerned with algorithms that learn from examples. There are many different types of classification tasks that you may encounter in machine learning, and specialized modelling approaches that may be used for each.

K-Nearest Neighbor Algorithm (KNN)
K-nearest neighbors is one of the simplest machine learning algorithms and is based on the supervised learning technique. The K-NN algorithm assumes similarity between the new case and the available cases, and puts the new case into the category that is most similar to the available categories. K-NN can be used for regression as well as for classification problems. K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.

Pros:
Simple algorithm, and hence easy to interpret the prediction.
Quick calculation time.
Used for both classification and regression.

Cons:
Does not work well for large datasets.
Prediction is very costly.
Poor at classifying data points on a boundary where they can be classified one way or another.

Random Forest Classifier
Random Forest is one of the most popular and most powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bagging, or Bootstrap Aggregation. The bootstrap is a powerful statistical method for estimating a quantity, such as a mean, from a data sample. Many samples of the data are taken, the mean of each is calculated, and then all the mean values are averaged to give a better estimate of the true mean. In bagging, the same approach is used, but instead of estimating the mean of each data sample, decision trees are generally used.

Advantages of Random Forest:
Random Forest is an accurate ensemble learning algorithm.
Random Forest runs efficiently on large data sets.
It can handle hundreds of input variables.

Disadvantages of Random Forest:
Features need to have some predictive power, else they won't work.
Predictions of the trees should be uncorrelated.
Appears as a black box.

Decision Tree Classifier
The Decision Tree Classifier is a simple and widely used classification technique. It applies a straightforward idea to solve the classification problem. A decision tree classifier poses a series of carefully crafted questions about the attributes of the test record. Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression, in which the data is continuously split according to a certain parameter.

A decision tree consists of:
Nodes: test for the value of a certain attribute.
Edges/Branches: correspond to the outcome of a test and connect to the next node or leaf.
Leaf nodes: terminal nodes that predict the outcome (represent class labels or class distribution).

Experiment
The Proposed Method
Heart disease is the leading cause of death among all diseases, even ahead of cancer. The number of people facing heart disease rises every year, which calls for its early diagnosis and treatment. Because of a lack of resources in the medical field, predicting heart disease can be a problem. The use of suitable technology can benefit the medical community and patients, and the problem can be addressed by adopting machine learning techniques. In this project, I work with basic machine learning classification models. The models are trained on data comprising attributes such as age, sex, cp, blood pressure, skin thickness and so on, and based on these attributes they predict whether a patient is suffering from heart disease or not. This paper uses three techniques for the effective prediction of heart disease: the Random Forest classifier, the KNN (K-Nearest Neighbour) classifier and the Decision Tree classifier. It analyses the efficiency and accuracy of the three techniques to choose the best among them.

The figure below shows the number of heart disease cases.

[Figure: number of heart disease cases by target class (0 = absence, 1 = present)]
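
A comparison of the Random Forest, KNN and Decision Tree classifiers could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in this paper: the data is a synthetic stand-in generated with scikit-learn's `make_classification` (303 rows, 13 features, binary target), and the helper name `compare_classifiers`, the 5-fold cross-validation and all hyperparameters are choices made here for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart dataset (assumption): 303 rows, 13 features,
# binary target. Replace X and y with the real attributes and diagnosis column.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=0
)


def compare_classifiers(X, y, cv=5):
    """Return the mean cross-validated accuracy of each classifier."""
    models = {
        # KNN is distance based, so features are standardized first.
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "decision_tree": DecisionTreeClassifier(random_state=0),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in models.items()
    }


scores = compare_classifiers(X, y)
best = max(scores, key=scores.get)  # name of the highest-scoring classifier
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
print("best:", best)
```

The scores printed for synthetic data say nothing about the real dataset; the point is only the shape of the comparison. Cross-validation is used here instead of a single train/test split because, with about 300 records, a single split can give noticeably different accuracies from run to run.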

Result and Discussion

Correlation Matrix
Let’s see the correlation matrix of features. From this graph, we can observe that some features are highly correlated and some
are not.

This figure shows the correlation matrix
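
A correlation matrix of this kind can be computed with pandas, as in the minimal sketch below. The data here is synthetic and purely illustrative (an assumption, not the patient records): the column names `age`, `thalach`, `chol` and `target` are borrowed as hypothetical labels, and `thalach` is deliberately constructed to fall with `age` so that one pair is clearly correlated. The helper `strongly_correlated` and its 0.5 threshold are also choices made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (assumption): a few columns named after the dataset's
# attributes, filled with synthetic values rather than real patient records.
rng = np.random.default_rng(0)
age = rng.integers(29, 78, size=303)
df = pd.DataFrame({
    "age": age,
    # Built to fall with age so that one pair is clearly correlated.
    "thalach": 220 - age + rng.normal(0, 5, size=303),
    "chol": rng.normal(245, 50, size=303),
    "target": rng.integers(0, 2, size=303),
})

corr = df.corr()  # Pearson correlation between every pair of columns


def strongly_correlated(corr, threshold=0.5):
    """List column pairs whose absolute correlation exceeds the threshold."""
    cols = list(corr.columns)
    return [
        (a, b, float(corr.loc[a, b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if abs(corr.loc[a, b]) > threshold
    ]


pairs = strongly_correlated(corr)
print(corr.round(2))
print(pairs)
```

On the real dataset the same `df.corr()` call would be applied to all 14 columns, and the resulting matrix can then be drawn as a heat map with Matplotlib.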

K-Nearest Neighbors Classifier:
K-Nearest Neighbors is a non-parametric method used for classification. It is a lazy learning algorithm in which all computation is deferred until classification. It is also known as an instance-based learning algorithm, where the function is approximated locally. This algorithm is used when the amount of data is large and there are non-linear decision boundaries between classes. KNN predicts a categorical value using the majority vote of the nearest neighbors. Besides classification, KNN can also be used for function approximation problems.
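
The majority-vote idea described above can be illustrated with a small from-scratch sketch. This is not the implementation used in this paper (scikit-learn's `KNeighborsClassifier` is the more likely tool in practice); the function name `knn_predict`, the Euclidean distance and the toy records are assumptions made for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Lazy learning: no model is fitted; distances are computed at query time.
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance
    nearest = np.argsort(distances)[:k]  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy records (assumption): two already-scaled features per patient.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
           [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
y_train = [0, 0, 0, 1, 1, 1]  # 0 = absence, 1 = present

print(knn_predict(X_train, y_train, [0.2, 0.2], k=3))  # prints 0
print(knn_predict(X_train, y_train, [0.9, 0.9], k=3))  # prints 1
```

With two classes, an odd value of k avoids tied votes. Because every prediction scans all training points, the cost of a query grows with the size of the training set.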

This figure shows the K Neighbors Classifier scores

Random Forest Classifier:
Random forest is a supervised learning algorithm. It can be used for both classification and regression. It is simple and easy to implement. A forest is made up of trees. This classifier builds decision trees on randomly selected data samples, gets a prediction from each tree and selects the best solution by means of voting. A random forest is thus composed of multiple decision trees.

This figure shows the Random Forest Classifier scores.

Decision Tree Classifier
This classifier falls under the category of supervised learning. It can be used to solve regression and classification problems. We can use this algorithm for problems where we have continuous as well as categorical input and target features. It is well suited to representing the decision process graphically as a tree.

This figure shows the Decision Tree Classifier scores

Conclusion
Machine learning plays an important role in various fields such as healthcare, stocks and marketing, banking, weather forecasting and so on. With the help of the KNN algorithm it becomes easy to evaluate data and extract meaningful information from it. By trying various K values in the K-NN classifier, the accuracy of the model can be increased. This study aims to accurately predict whether a given patient is suffering from heart disease or not. Finally, the accuracy of my model comes close to 84%, and for any new patient it can predict whether the patient has heart disease or not.

Bibliography
[1] F. Ali et al., "A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion," 2020.
[2] S. Mohan et al., "Effective heart disease prediction using hybrid machine learning techniques," 2019.
[3] S. Nashif et al., "Heart disease detection by using machine learning algorithms and a real-time cardiovascular health monitoring system," 2018.
[4] M. Chen et al., "Disease prediction by machine learning over big data from healthcare communities," 2017.
[5] A. A. Soofi et al., "Classification techniques in machine learning: applications and issues," 2017.
[6] K. Deepika and S. Seema, "Predictive analytics to prevent and control chronic diseases," 2016.
[7] K. Gomathi et al., "Multi Disease Prediction using Data Mining Techniques," 2016.
[8] A. Dey et al., "Analysis of supervised machine learning algorithms for heart disease prediction with reduced number of attributes using principal component analysis," 2016.
[9] B. Bahrami et al., "Prediction and Diagnosis of Heart Disease by Data Mining Techniques," 2015.
[10] K. Vembandasamy et al., "Heart diseases detection using Naive Bayes algorithm," 2015.
[11] A. F. Otoom et al., "Effective diagnosis and monitoring of heart disease," 2015.
[12] V. Chaurasia and S. Pal, "Early prediction of heart diseases using data mining techniques," 2013.
[13] G. Parthiban et al., "Applying machine learning methods in diagnosing heart disease for diabetic patients," 2012.
[14] N. Bhatla and K. Jyoti, "An analysis of heart disease prediction using different data mining techniques," 2012.
[15] S. Palaniappan and R. Awang, "Intelligent heart disease prediction system using data mining techniques," 2008.
