Heart Disease Prediction Using Machine Learning
All content following this page was uploaded by Md. Rubel Rana on 01 June 2021.
Abstract--- Heart disease cases are rising at an alarming rate, and it is critical to be able to predict these diseases in advance. This project focuses on predicting which patients are more likely to have heart disease based on a variety of medical factors. To predict and identify patients with heart disease, we used two algorithms, logistic regression and K-Nearest Neighbors (KNN). The proposed models performed reasonably well, with KNN reaching an accuracy of 75.409%, and were able to predict signs of heart disease in a person. This predictive method can improve patient treatment and make the disease easier to diagnose, while allowing large amounts of data to be explored at once.

Keywords – Heart Disease, Machine Learning, Logistic Regression, KNN, Prediction.

I. INTRODUCTION

Machine learning is a form of artificial intelligence that enables a machine to build and apply algorithms that learn from previous experience. We applied supervised learning classifiers to the dataset to predict heart disease and measured the accuracy of each.

Cardiovascular diseases (CVDs) refer to various disorders that affect the heart and circulatory system. Heart disease has been common for a long time and is still one of the most severe diseases today. According to the WHO, CVDs are the leading cause of death globally. Four out of five CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age [1]. Our research can identify people who are more likely to be diagnosed with heart disease based on their medical history, using factors such as blood sugar level, blood pressure, chest pain, and cholesterol. This way, people can learn about their risk beforehand and take the necessary precautions. To verify the accuracy and explore different models, we used both KNN and logistic regression. We used a dataset of people with and without heart disease, with 14 different attributes per patient, to predict whether a patient is susceptible to heart disease. The more accurate of the two algorithms is KNN, which gives an accuracy of 75.409%. We used several graphs to present the results of the project's predictions.

II. RELATED WORKS

This project was inspired by a significant amount of work on detecting CVDs with machine learning (ML) algorithms, which have been used to make several efficient heart disease predictions. Using previous and current machine learning and deep learning models, the model incorporating IHDPS was able to predict the likelihood of a person getting heart disease fairly accurately [2]. It focused on basic attributes such as age, sex, blood pressure, and blood sugar. Newer models that use deep learning and neural networks, however, are more efficient, accurate, and reliable. One recent neural network model correctly classified 77% of Coronary Heart Disease (CHD) cases and 81.8% of non-CHD cases on testing data that made up 85.70% of the total dataset [3]. The authors state that other machine learning models, such as SVM and random forest, produce recall values comparable to their proposed Convolutional Neural Network (CNN) model, but the CNN model predicts negative cases with greater precision.

III. DATA SOURCE

We collected the dataset used to train the model from Kaggle [4]. The data was gathered from multiple sources, and the database is taken from the UCI repository [5]. It originally had 76 attributes, but a subset of 14 attributes was used. The dataset includes a variety of individuals and their histories of heart disease, as well as other medical conditions: it consists of the medical records of 303 patients, each described by these attributes. The dataset provides detailed information about each patient's medical characteristics, such as age, chest pain type, blood pressure, blood sugar level, and angina, which enables us to determine whether or not the patient has been diagnosed with heart disease. The attributes used are listed in Table 1.
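As a rough illustration of the dataset layout described in Section III, the sketch below reads a 14-column CSV into attribute names, feature vectors, and labels using only Python's standard library. It assumes a header row, 13 numeric attribute columns, and a final 0/1 diagnosis column; the function name `load_heart_csv` and that column order are hypothetical, since Table 1 defines the actual attributes.

```python
import csv
from typing import List, TextIO, Tuple


def load_heart_csv(stream: TextIO) -> Tuple[List[str], List[List[float]], List[int]]:
    """Split a heart-disease CSV into attribute names, feature rows, and labels.

    Hypothetical layout (an assumption, not taken from the paper): a header
    row, 13 numeric attribute columns, then a 0/1 label (1 = heart disease).
    """
    reader = csv.reader(stream)
    header = next(reader)
    if len(header) != 14:
        raise ValueError(f"expected 14 columns, got {len(header)}")
    features: List[List[float]] = []
    labels: List[int] = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        values = [float(v) for v in row]
        features.append(values[:-1])    # the 13 patient attributes
        labels.append(int(values[-1]))  # the diagnosis label
    return header[:-1], features, labels
```

For a file on disk, pass an open file object, e.g. `with open(path, newline="") as f: names, X, y = load_heart_csv(f)`; the file path itself is not given in the paper.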
Table 1. Used Attributes

IV. METHODOLOGY
We analyzed two machine learning algorithms, K-Nearest Neighbors (KNN) and Logistic Regression, both of which are fairly accurate for this prediction task. The proposed approach is organized as follows: the first phase is data collection, the second is extraction of significant values, and the third is data exploration. Depending on the algorithm used, data preprocessing handles missing values, data cleaning, and normalization. The preprocessed data is then classified by each proposed model's classifier. Finally, we tested the proposed models, evaluating their accuracy and performance with a variety of performance metrics. We used 20% of the full dataset to test the models.

A. Logistic Regression
Logistic regression is a supervised learning method that computes the probability of each outcome in a classification problem with two outcomes; it can also be extended to predict multiple classes. Our Logistic Regression model uses the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

This function maps any real number to a value between 0 and 1, which we used as the probability that a sample belongs to the positive class. In our case there are two classes of patients: those who have heart disease and those who do not.

B. K-Nearest Neighbors
K-Nearest Neighbors (KNN) is a nonparametric, lazy, and simple classifier, and it is preferred when all of the features are continuous. KNN, also known as case-based reasoning, has been employed in a wide range of applications, including pattern recognition and statistical estimation. To determine the class of an unknown sample, its k nearest neighbors in the training data must be identified; the sample is then assigned the majority class among them. Because of its simplicity and the fact that it needs no explicit training phase, KNN is often favored over other classification methods.

V. RESULTS & DISCUSSION
The project produced several graphs of its results. The accuracy of each algorithm is compared below.
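Section IV describes normalization, a 20% held-out test split, and accuracy, the metric reported in the comparison table. A minimal standard-library sketch of those three steps follows. Min-max scaling, the fixed random seed, rounding the test size up, and all function names are assumptions; the paper does not say which normalization or splitting procedure it used.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = List[float]


def train_test_split(X: Sequence[Row], y: Sequence[int],
                     test_fraction: float = 0.2, seed: int = 0):
    """Shuffle sample indices and hold out `test_fraction` of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)        # fixed seed -> reproducible split
    n_test = math.ceil(len(X) * test_fraction)  # assumed: test size rounded up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def fit_min_max(rows: Sequence[Row]) -> Tuple[Row, Row]:
    """Per-column minimum and maximum, computed on the training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows: Sequence[Row], lows: Row, highs: Row) -> List[Row]:
    """Scale columns with training statistics; constant columns map to 0.0.

    Training rows land in [0, 1]; test rows can fall slightly outside it.
    """
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of correct predictions, as a percentage."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)
```

Fitting the scaling statistics on the training rows alone keeps test information out of preprocessing. With 303 patients, rounding 20% up gives 61 test samples; if the paper used a split of that size, the reported 73.770% and 75.409% are consistent with 45 and 46 correct predictions (45/61 ≈ 73.770%, 46/61 ≈ 75.4098%).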
Algorithm Accuracy Comparison

SL   Algorithm              Accuracy (%)
1.   Logistic Regression    73.770
2.   K-Nearest Neighbors    75.409
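The two classifiers compared in the table can be sketched from scratch in plain Python: logistic regression built on the sigmoid from Section IV-A and trained by batch gradient descent on the log-loss, and KNN with Euclidean distance and a majority vote among the k closest training samples. The learning rate, epoch count, default k = 5, Euclidean distance, and the 0.5 decision threshold are assumptions; the paper does not report its hyperparameters or the library it used, so this is an illustration rather than a reproduction of its results.

```python
import math
from collections import Counter
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """sigma(z) = 1 / (1 + e^(-z)), evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LogisticRegression:
    """Binary logistic regression fitted by batch gradient descent (sketch)."""

    def __init__(self, lr: float = 0.1, epochs: int = 1000):
        self.lr, self.epochs = lr, epochs  # assumed hyperparameters
        self.w: List[float] = []
        self.b = 0.0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # dLoss/dz for the log-loss
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            # One averaged gradient step per epoch over the whole training set.
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        """Probability that x belongs to class 1 (heart disease present)."""
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

    def predict(self, x: Sequence[float]) -> int:
        return int(self.predict_proba(x) >= 0.5)


class KNN:
    """k-nearest-neighbors classifier: lazy, so fit() only stores the data."""

    def __init__(self, k: int = 5):
        self.k = k

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "KNN":
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x: Sequence[float]) -> int:
        # Rank training samples by Euclidean distance, then vote among the k closest.
        nearest = sorted(range(len(self.X)),
                         key=lambda i: math.dist(x, self.X[i]))[: self.k]
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]
```

Both classes expose `fit` and `predict`, so they plug into a loop such as `[model.predict(x) for x in X_test]` followed by an accuracy computation. Choosing an odd k avoids tied votes in a two-class problem like this one.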