Heart Disease Prediction Using Machine Learning Method
All content following this page was uploaded by Muhammad Adnan Khan on 06 January 2023.
Abstract— Heart disease, also known as coronary artery disease, covers many conditions that affect the heart; it is very common nowadays and is a frequent cause of death. Diagnosing heart disease without an intelligent diagnostic system is a challenging task. Many researchers have studied the problem and developed diagnostic systems for it. Predicting cardiovascular disease requires a brief medical history of the patient, including genetic information. The need for a system that predicts heart disease has become acute. Data mining and machine learning are common techniques used in health care to process large and complex data. This paper presents the causes of heart disease and a prediction model based on machine learning algorithms.

Keywords—coronary, Data mining, cardiovascular, algorithm

I. INTRODUCTION
The heart is the most crucial organ for human survival and is essential to the proper functioning of the body. It pumps oxygenated blood to the other parts of the body. The heart receives deoxygenated blood carrying metabolic waste products from the body and sends it to the lungs for oxygenation [1]. If the heart functions properly, it supports a healthy life; if it stops performing normally, it causes death. Heart disease causes inflammation of the blood vessels and abnormal blood flow, which can be very dangerous [2].

Many factors cause heart disease, e.g., smoking, high blood pressure, high cholesterol, and diabetes. Smoking is common nowadays among young and old people and is becoming a trend among youth. Smoking tightens the arteries of the heart and causes an abnormal heartbeat; it also increases blood pressure. High blood pressure causes many problems: it forces the heart to work harder to deliver blood to the body, and it thickens the lower-left heart chamber, which increases the risk of heart failure. High blood sugar levels can harm the nerves or vessels that control the heart.

Cholesterol builds up inside the walls of the arteries, causing atherosclerosis, which leads to coronary heart disease. Due to high cholesterol, arteries become narrowed, and blood flow is reduced or blocked. These are among the most widespread illnesses; they are very common nowadays, and four out of five individuals have them [3].

In the past, people were unaware of the seriousness of heart diseases, even though some of them are very dangerous to health and can even cause death. Today, most deaths take place because of heart failure, and the rate of such deaths is growing every day. According to the WHO report, about 17.9 million people died due to cardiovascular diseases in 2019 [4], which accounts for 32% of all global deaths; 85% of these deaths were due to heart attacks and strokes. The reason behind this is that people are unaware of the causes of heart diseases.
Authorized licensed use limited to: Gachon University. Downloaded on January 06,2023 at 06:32:00 UTC from IEEE Xplore. Restrictions apply.
The detection of heart diseases through ordinary clinical methods is becoming very difficult, slow, and complex in developing countries because of the lack of professional doctors and of modern examination tools. It is becoming a major problem all around the world [5]. Examination procedures are carried out after studying the medical history of the patient; after analyzing the symptoms, doctors suggest various tests according to the patient's condition, such as blood tests, electrocardiogram (ECG), coronary angiogram, exercise stress test, echocardiogram (ultrasound), and nuclear cardiac stress test.

An ECG checks a person's heartbeat by reading the electrical impulses of the heart. It is performed with small sticky dots placed on the arms, legs, and chest; leads connect them to an ECG machine, which records the heartbeat in the form of electrical impulses and prints it on paper. A doctor recommends it when the heart beats abnormally, to diagnose heart failure. When a clear or detailed image of the heart is needed, MRI is used; it relies on magnets and radio waves to take moving pictures of the heart on a computer, and it is suggested when the physician wants to know how well the heart is working. A coronary angiogram is taken after a heart attack. A small tube called a catheter is put into an artery in the wrist, arm, or groin and then moved inside the artery. In this test, an X-ray of the heart is taken so that a doctor can see blocked arteries [6].

But these methods have flaws and cannot predict the complexity of the disease. According to worldwide research in 2019, heart disease, with its diverse variants, has become the No. 1 disease, because the entire body relies on the proper functioning of the heart; if it does not work properly, the person dies [7]. Treatment can be very costly because of late diagnosis and the lack of medical experts; the later the disease is diagnosed, the more critical it becomes. The tests carried out nowadays cannot identify the entire cause of the disease; an ECG, for example, is not always sufficient to diagnose heart disease, and more tests are needed to detect heart problems [8]. A coronary angiogram can affect a patient's kidneys [9], may injure the artery, and might cause allergic reactions.

Clinical methods are crucial for diagnosing heart disease. The aim of this paper is to test, with accurate methods, whether or not a patient has heart disease. Machine learning is one such approach [10]. It is used in the development of computer systems in which algorithms and statistical models analyze data without following explicit instructions. Applications of machine learning to biological databases are increasing day by day. It is helping various medical departments and building models that associate a wide range of variables with a disease. SVM can work on small as well as complex datasets and is a powerful basis for building machine learning models, while the confusion matrix is used to summarize the performance of a classification algorithm, showing what the classification model gets right and what kinds of errors it makes.

In this paper, machine learning methods and the SVM classification algorithm are used for the detection of heart disease. To measure accuracy, a dataset with patient history and attributes is selected from Kaggle and then preprocessed. By applying classification algorithms to this dataset, it can be predicted whether a person has heart disease or not. Data mining is the process of finding anomalies, patterns, and associations in large datasets to predict results.

Data preprocessing is the data mining technique that transforms raw data into useful data. The first step is data cleaning: data may have missing parts, which are filled with the most probable value, and noisy data is smoothed by clustering, regression, or binning. Data transformation is the next step; it transforms the data into a suitable form through normalization and attribute selection. Data reduction is the third and last step; it involves data cube aggregation, attribute subset selection, numerosity reduction, and dimensionality reduction [11].

The preprocessed data is used to train the SVM classification algorithm, whose accuracy in classifying a patient as at risk of heart disease or not is then checked. This method is both cost and memory efficient [12].

II. LITERATURE REVIEW
A number of studies have focused on the diagnosis of heart disease because most deaths occur due to heart failure. Researchers applied different techniques for the diagnosis of heart diseases and achieved different levels of accuracy with different methods. The researchers used data mining methods and classification algorithms [13]: Decision Tree, Naive Bayes, and Neural Network for the prediction of heart diseases. In one experiment, the researcher built a model using a neural network and hybrid intelligent techniques on a dataset; the results show that the hybrid intelligent techniques give the best result and improve the accuracy of prediction. Machine learning techniques predict risk at an early stage and are very useful in some cases.

Researchers used machine learning techniques for the prediction of heart disease, including support vector machine (SVM), naive Bayes, neural network, decision tree, and regression classifiers. The researcher shows that SVM [14] is the best technique, giving an accuracy of about 92.1%, while neural networks
give an accuracy of about 91% and decision trees show a lower accuracy of about 89.6%. The researchers used the backpropagation algorithm [15] as the best classification technique for the prediction of heart disease. They also proposed a genetic algorithm optimizer as an alternative to the backpropagation algorithm, whose drawback is that it gets stuck in local minima. They suggested that this methodology could give 100% accuracy in the future with fewer errors.

S. Prakash et al. carried out research in 2017 on heart disease prediction in which they compared two methods: Optimality Criterion Feature Selection (OCFS) and rough set feature selection on information entropy (RSFS-IE). The researchers evaluated them on different types of datasets in terms of computational time, prediction quality, and error rate. They proposed that OCFS is better than RSFS-IE because it takes less execution time [16].

In another study, researchers took a sample database of patient records. They trained and tested a neural network with 13 attributes such as age, blood pressure, and angiography report. They recommended a supervised network for the diagnosis of heart disease and used a backpropagation algorithm for training [17]. Whenever the doctor fed unknown data, the system identified it, compared it with the trained data, and produced a list of probable diseases to which the patient is prone. The desired output is close to 100%.

Kim and Kang used a neural network [18] to develop a system for the diagnosis of heart disease. Sensitivity analysis was used to detect the more important features: features with high sensitivity were considered more important than those with low sensitivity. After irrelevant features were removed, correlated features were found by analyzing how the sensitivity of the features changed when the value of one feature changed. Two features were considered correlated if a change in the value of one feature changed the sensitivity of the other by more than the average change.

K. Polaraj et al. compared different algorithms and models for the prediction of heart diseases [19] and found that multiple linear regression is better for predicting the risk of cardiovascular disease. The study used a dataset consisting of 1000 values. The data was divided into two parts: 70% was used for training and the remaining 30% for testing. The results confirmed that the regression algorithm performs best compared with the other models. In 2020, MafizurRehman [20] used the Random Forest algorithm to predict heart disease more effectively, with an accuracy of about 97%.

In recent years, various systems using wearable sensors have been proposed to improve the prediction of heart diseases. Al-Makhadmeh and Tolba presented a wearable medical device-based system that collects details about cardiac patients before and after heart failure. For appropriate classification and useful feature extraction, they applied feature extraction techniques and deep learning models, and then transmitted the collected information to the healthcare system.

That system has limitations because it used 23 attributes for training, which increases system complexity and dimensionality. The tool was not accurate because of an inefficient feature extraction and feature weighting approach. A system called HealthFog, which uses the Internet of Things (IoT) and a deep learning model, was also presented; its aim was to automatically handle cardiac patient data coming from IoT devices [21]. Researchers used different algorithms, namely Naïve Bayes, Neural Network, Decision Tree, and genetic algorithm, for the prediction of heart disease; Naïve Bayes showed good results, with an accuracy of about 96.6% [22]-[23].

III. LIMITATION OF PREVIOUS WORK AND OUR CONTRIBUTION
In previous research, researchers did a great job of predicting heart disease using different techniques. Table 1 shows the previous work of researchers.

TABLE I. Analysis of previous Methods and results

| Approach | Year | Method used | Results |
|---|---|---|---|
| Vincy Cherian et al. [23] | 2017 | Naïve Bayes | 86% |
| Rani et al. [18] | 2021 | Logistic Regression | 86.60% |

But that previous work has a few deficiencies, which are overcome by the methodology used in this article. SVM is used in this article to obtain the desired result for the prediction of heart disease. SVM is a computer algorithm that has been successfully applied to an increasingly wide variety of biological applications. As an effective machine learning method, the Support Vector Machine is used in various classification problems, including bioinformatics [24]. Furthermore, this article presents a machine learning model to predict heart disease.

IV. METHODOLOGY
The method used in this article for the prediction of heart disease is SVM. It is currently a very active research area in the medical field, and in the future it will be widely used in biomedical systems. SVMs are a set of supervised learning methods that work on small as well as complex datasets. One benefit of the support vector machine is that it
is effective in high-dimensional spaces. Mainly because of this, the solution is not restricted to linearity. It is well suited to identifying diseases and to the use of network scans, since no specific set of rules is needed on how the disease is diagnosed [25]. The Federated Learning approach differs from traditional machine learning techniques; it is an emerging approach that can be very useful for cost saving and security. In the traditional approach, all datasets are submitted to a single server [26]. Federated learning instead trains a model across several decentralized edge devices and servers that hold local data samples without sharing them.

A. Preprocessing
Data preprocessing is the first step of the method. A suitable dataset from Kaggle was taken for this process. In ML, preprocessing refers to cleaning and transforming raw data, which is not directly usable, into readable data suitable for training machine learning models. Data cleaning, data transformation, and data reduction take place during preprocessing. Data cleaning is the process of filling missing values, smoothing noisy data, and removing outliers; data transformation includes normalization and aggregation; and data reduction reduces the amount of data while the result remains the same. The execution of preprocessing necessitates the use of several Kaggle datasets. During this process, records are deleted and missing values and dots are removed to prevent inefficiencies and achieve higher accuracy. The accuracy depends on the given dataset.

B. Dataset Description
The dataset, selected from different datasets available on Kaggle, has 919 rows and 12 columns: age, sex, chest pain type, resting BP, cholesterol, fasting BS, resting ECG, max HR, exercise angina, old peak, ST_Slope, and heart disease. The heart disease column has the value "1", which indicates that the patient has heart disease, or "0", which indicates that the patient does not. The data taken from the dataset is imbalanced, so preprocessing is carried out on it. Table 2 below describes the dataset.

TABLE II. Attributes & description of heart disease Data set

| Feature Name | Type | Detail |
|---|---|---|
| Sex | M: male, F: female | Sex of the patient |
| Age | Integer | Age of the patient |
| Chest pain type | TA, ATA, NAP, ASY | Type of chest pain the patient has |
| Resting BP | Integer | Patient blood pressure level |
| Cholesterol | Integer | Patient cholesterol level |
| Fasting BS | 0 or 1 | Patient fasting blood sugar level |
| Resting ECG | Either normal or ST | Patient resting electrocardiogram result |
| Max. HR | Integer | Patient's maximum heart rate |
| Exercise angina | Y: yes, or N: no | Exercise-induced angina |
| Old peak | Numeric value | Patient ST value (measured in depression) |
| ST_Slope | Up, Flat, Down | Slope of peak exercise |
| Heart Disease | 0 or 1 | Heart disease or not |

C. Proposed Model
The proposed model applied in this article uses the SVM method to obtain better accuracy. Compared with previous implementations and studies, the model considered in this article gives a better-optimized result.

Fig. 1 Proposed Model for Heart Disease Prediction

In this model, as shown in Figure 1, preprocessing is first applied to the dataset and the data is labeled; the SVM model is then trained on it for better accuracy; and the results are finally analyzed and verified using the confusion matrix.

D. Implementation Detail
All results were obtained on a computer with a Core i6 processor, 16 GB RAM, and Super GPU at 3.60 GHz. MATLAB 2020a is used for the implementation.

V. SIMULATIONS AND RESULTS
The method implemented here is SVM. Support Vector Machines (SVMs) are a set of supervised learning methods for classification, regression, and outlier detection. An advantage of support vector machines is that they are effective in high-dimensional spaces; they still work when the number of dimensions exceeds the number of samples. Figure 2 and Table 3 below show the results of applying SVM to the dataset.
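The preprocessing and SVM training steps described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the records are invented toy values rather than rows of the Kaggle dataset, the column names are assumed, missing values are filled with the column median as a stand-in for the "most probable value", and the classifier is a linear SVM trained with Pegasos-style sub-gradient descent, because the paper does not state which kernel or solver was used. The function names `preprocess`, `train_linear_svm`, and `predict` are hypothetical.

```python
import random
from statistics import median

# Hypothetical toy records with a few Table II attributes. The values are
# invented for illustration; they are NOT rows from the Kaggle dataset.
RECORDS = [
    {"Age": 40, "Sex": "M", "RestingBP": 140, "Cholesterol": 289, "ExerciseAngina": "N", "HeartDisease": 0},
    {"Age": 49, "Sex": "F", "RestingBP": 160, "Cholesterol": None, "ExerciseAngina": "N", "HeartDisease": 1},
    {"Age": 37, "Sex": "M", "RestingBP": 130, "Cholesterol": 283, "ExerciseAngina": "N", "HeartDisease": 0},
    {"Age": 58, "Sex": "M", "RestingBP": 150, "Cholesterol": 214, "ExerciseAngina": "Y", "HeartDisease": 1},
    {"Age": 54, "Sex": "F", "RestingBP": 120, "Cholesterol": 195, "ExerciseAngina": "N", "HeartDisease": 0},
    {"Age": 63, "Sex": "M", "RestingBP": 145, "Cholesterol": 233, "ExerciseAngina": "Y", "HeartDisease": 1},
]
NUMERIC = ["Age", "RestingBP", "Cholesterol"]
BINARY = {"Sex": {"M": 1.0, "F": 0.0}, "ExerciseAngina": {"Y": 1.0, "N": 0.0}}


def preprocess(records):
    """Clean (median imputation), encode categories, and min-max normalize."""
    columns = {}
    for name in NUMERIC:
        known = [r[name] for r in records if r[name] is not None]
        fill = median(known)  # data cleaning: fill missing values (assumed strategy)
        values = [fill if r[name] is None else r[name] for r in records]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # guard against constant columns
        columns[name] = [(v - lo) / span for v in values]  # transformation: scale to [0, 1]
    for name, mapping in BINARY.items():
        columns[name] = [mapping[r[name]] for r in records]  # encode categories as 0/1
    order = NUMERIC + list(BINARY)
    X = [[columns[name][i] for name in order] for i in range(len(records))]
    y = [1 if r["HeartDisease"] == 1 else -1 for r in records]  # SVM labels in {-1, +1}
    return X, y


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrink weights (regularization)
            if margin < 1.0:  # margin violated: take a hinge-loss step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 (heart disease) or -1 (no heart disease)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X, y = preprocess(RECORDS)
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

With real data, the fitted scaling and the trained weights would normally be derived from a training split only and then applied to a held-out test split, so that the reported test accuracy reflects unseen patients.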
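Fig. 2 Scatter Plot Diagram of SVM Trained Model

TABLE III. Accuracy obtained using SVM

| Training Time | Training Accuracy | Test Time | Test Accuracy |
|---|---|---|---|
| 30.533 sec | 90.5% | 6.932 sec | 78.7% |

A confusion matrix is an N × N matrix used to evaluate the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with the predictions made by the machine learning model. This gives a holistic view of how well the classification model is performing and what kinds of errors it makes. Table 4 shows the results of the confusion matrix. In the equations below, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives; PPV is the positive predictive value (precision), TNR the true negative rate (specificity), TPR the true positive rate (recall), and MCC the Matthews correlation coefficient.

$$\mathrm{Accuracy} = \frac{TP + TN}{P + N} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$PPV = \frac{TP}{TP + FP} \tag{2}$$

$$TNR = \frac{TN}{TN + FP} \tag{3}$$

$$TPR = \frac{TP}{TP + FN} \tag{4}$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{5}$$

$$F1\ \mathrm{Score} = 2 \times \frac{PPV \times TPR}{PPV + TPR} \tag{6}$$

Accuracy 90.47%
F1 Score 88.81%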
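The metrics of Eqs. (1)-(6) follow directly from the four confusion-matrix counts. The sketch below computes them in Python using the standard definitions, i.e. TNR = TN/(TN+FP) and F1 = 2·PPV·TPR/(PPV+TPR). The label vectors are hypothetical and chosen only for illustration; they are not the paper's predictions, and the function names `confusion_counts` and `metrics` are assumed.

```python
from math import sqrt


def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = heart disease, 0 = none)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (1)-(6); assumes no denominator is zero."""
    ppv = tp / (tp + fp)  # Eq. (2): precision
    tpr = tp / (tp + fn)  # Eq. (4): recall / sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. (1)
        "ppv": ppv,
        "tnr": tn / (tn + fp),  # Eq. (3): specificity
        "tpr": tpr,
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),  # Eq. (5)
        "f1": 2 * ppv * tpr / (ppv + tpr),  # Eq. (6)
    }


# Hypothetical labels for ten patients (illustration only).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
scores = metrics(tp, tn, fp, fn)
```

Reporting MCC alongside accuracy is useful when the classes are imbalanced, as the dataset description notes, because accuracy alone can look high even when one class is predicted poorly.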
VI. CONCLUSION
According to experiments, many people suffered heart damage as a result of the major coronavirus epidemic. As a result, research is warranted to develop a suitable diagnostic method that focuses on the incidence of heart failure and can detect it early enough to prevent death. Such a method assists in diagnosing heart illness using medical information from past heart disease diagnoses. The SVM approach was used to build this model. The model has a 90.47 percent accuracy. Using additional training data raises the chance of the model correctly detecting cardiac illness. To simplify data and compare outcomes, several methods might be applied. Additional techniques to link trained ML and DL cardiac models with specific multimedia can be developed for the convenience of patients and clinicians.

REFERENCES
[1] Heart and Circulatory System (for teens) - Nemours. Kidshealth.org. https://kidshealth.org/Nemours/en/teens/heart.html [accessed 2 December 2021].
[2] Sahoo, P. & Jeripothula, P. (2020) Heart Failure Prediction Using Machine Learning Techniques. SSRN Electronic Journal.
[3] Javeed, A., Rizvi, S., Zhou, S., Riaz, R., Khan, S. & Kwon, S. (2020) Heart Risk Failure Prediction Using a Novel Feature Selection Method for Feature Refinement and Neural Network for Classification. Mobile Information Systems 2020, 1-11.
[4] Chicco, D. & Jurman, G. (2020) Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making.
[5] Anon. (2021). https://www.healthywa.wa.gov.au/Articles/A_E/Common-medical-tests-to-diagnose-heart-conditions [accessed 2 December 2021].
[6] Shu, T., Zhang, B. & Tang, Y. (2017) Effective Heart Disease Detection Based on Quantitative Computerized Traditional Chinese Medicine Using Representation Based Classifiers. Evidence-Based Complementary and Alternative Medicine 2017, 1-10.
[7] Anon. (2021) The top 10 causes of death. Who.int. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death [accessed 2 December 2021].
[8] Risks. Stanfordhealthcare.org. https://stanfordhealthcare.org/medical tests/e/ekg/risks.html [accessed 2 December 2021].
[9] Coronary angiogram - Mayo Clinic. Mayoclinic.org. https://www.mayoclinic.org/tests-procedures/coronary-angiogram/about/pac-20384904 [accessed 2 December 2021].
[10] Machine learning - Wikipedia. En.wikipedia.org. https://en.wikipedia.org/wiki/Machine_learning [accessed 2 December 2021].
[11] Data Preprocessing in Data Mining - GeeksforGeeks. GeeksforGeeks. https://www.geeksforgeeks.org/data-preprocessing-in-data-mining/ [accessed 2 December 2021].
[12] Goel, R. (2021) Heart Disease Prediction Using Various Algorithms of Machine Learning. SSRN Electronic Journal.
[13] Anon. (2022) WCECS2014 pp809-. Academia.edu. https://www.academia.edu/35720965/WCECS2014_pp809_ [accessed 1 January 2022].
[14] Latah, C. & Jeeva, S. (2019) Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Informatics in Medicine Unlocked 16, 100203.
[15] Nanekar, G. (2021) Heart Disease Prediction using Neural Network. International Journal for Research in Applied Science and Engineering Technology 9, 1907-1910.
[16] Anon. (2022) Improving Heart Disease Prediction Using Feature Selection Approaches. Ieeexplore.ieee.org. https://ieeexplore.ieee.org/abstract/document/8667106/ [accessed 1 January 2022].
[17] Gavhane, A., Kokkula, G., Pandya, I. & Devadkar, K. (2018) Prediction of heart disease using machine learning. In 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 1275-1278. IEEE.
[18] Rani, P., Kumar, R., Ahmed, N. & Jain, A. (2021) A decision support system for heart disease prediction based upon machine learning. Journal of Reliable Intelligent Environments 7, 263-275.
[19] Diwakar, M., Tripathi, A., Joshi, K., Memoria, M., Singh, P. & Kumar, N. (2021) Latest trends on heart disease prediction using machine learning and image fusion. Materials Today: Proceedings 37, 3213-3218.
[20] Pavithra M., M. (2022) Effective Heart Disease Prediction Systems Using Data Mining Techniques. Annalsofrscb.ro. https://www.annalsofrscb.ro/index.php/journal/article/view/2172 [accessed 2 January 2022].
[21] Ali, F., El-Sappagh, S., Islam, S., Kwak, D., Ali, A., Imran, M. & Kwak, K. (2020) A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Information Fusion 63, 208-222.
[22] https://www.researchgate.net/profile/Anbarasi Masilamani/publication/50361284_Enhanced_Prediction_of_Heart_Disease_with_Feature_Subset_Selection_using_Genetic_Algorithm/links/54accada0cf2479c2ee853b1/Enhanced-Prediction-of-Heart-Disease-with-Feature-Subset-Selection-using-Genetic-Algorithm.pdf [accessed 6 January 2022].
[23] Anon. (2022). Ijcstjournal.org. http://www.ijcstjournal.org/volume-5/issue-2/IJCST-V5I2P13.pdf [accessed 6 January 2022].
[24] Noble, W.S. (2006) What is a support vector machine? Nature Biotechnology 24(12), 1565-1567.
[25] Matveeva, N. (2021) Artificial Neural Networks in Medical Diagnosis. System technologies 2, 33-41.
[26] Anon. (2022) Analysis of Neural Networks Based Heart Disease Prediction System. Ieeexplore.ieee.org. https://ieeexplore.ieee.org/abstract/document/8431153/ [accessed 9 January 2022].