e-ISSN: 2582-5208
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:06/Issue:05/May-2024 Impact Factor- 7.868 www.irjmets.com
HEART DISEASE PREDICTION USING MACHINE LEARNING ALGORITHMS
Mukesh Raj*1, Om Katiyar*2, Manan Agarwal*3, Manvendra Pathya*4
*1Professor, Department of Computer Science and Engineering, JSS Academy of Technical Education, Noida,
Uttar Pradesh, India.
*2,3,4Student, Department of Computer Science and Engineering, JSS Academy of Technical Education, Noida,
Uttar Pradesh, India.
DOI : https://www.doi.org/10.56726/IRJMETS56192
ABSTRACT
The increasing incidence of cardiovascular disease is a significant problem and highlights the importance of
disease prediction. This diagnostic work is difficult and must be performed accurately and efficiently. Data
science focuses on predicting which patients will be at risk of heart disease based on various medical conditions.
A heart disease screening tool is designed to use a patient's medical history to determine whether a patient is
likely to have heart disease. Different machine learning algorithms such as logistic regression and KNN are used
for prediction and classification. The methods used are aimed at improving prediction accuracy and the models
have been shown to be effective. Both KNN and logistic regression show good accuracy compared to other
classification methods such as Naive Bayes. These models can reduce the burden of diagnosing heart disease,
improve health outcomes, and reduce costs. The program provides information on predicting heart disease in
patients and is implemented as a Jupyter notebook (.ipynb format).
Keywords: Heart disease, machine learning, k-nodes, classification, multilayer perceptron, model evaluation.
I. INTRODUCTION
"Machine Learning involves the manipulation and extraction of implicit, previously unknown, and
potentially useful information from data" [1]. It is a vast and diverse field, with increasing scope and
implementation. Machine learning encompasses supervised, unsupervised, and ensemble learning methods,
which are used to make predictions from data and whose accuracy can then be assessed. This knowledge can be
applied to projects like the Heart Disease Prediction System (HDPS), benefiting many people.
Cardiovascular diseases are prevalent today, describing a range of conditions that affect the heart. The World
Health Organization estimates 17.9 million deaths worldwide each year from cardiovascular diseases (CVDs) [2], making them the
leading cause of death in adults. Our project aims to predict individuals likely to be diagnosed with a heart disease
based on their medical history [6]. It identifies symptoms such as chest pain or high blood pressure and aids in
diagnosing diseases with fewer medical tests and effective treatments for better outcomes.
This project mainly focuses on three data mining techniques: Logistic regression, KNN, and Random Forest
Classifier. Our project achieved an accuracy of 87.5%, better than previous systems using only one data mining
technique. Employing multiple data mining techniques increased the accuracy and efficiency of HDPS. Logistic
regression is a supervised learning method that predicts discrete class labels.
The objective of this project is to determine if a patient is likely to be diagnosed with any cardiovascular diseases
based on attributes such as gender, age, chest pain, fasting sugar level, etc. We selected a dataset from the UCI
repository containing patients' medical histories and attributes. Using this dataset, we predict whether a patient
may have a heart disease. We utilize 14 attributes (13 medical predictors plus the diagnosis label) to classify patients' likelihood of having a heart disease
using Logistic regression, KNN, and Random Forest Classifier algorithms. KNN, the most efficient algorithm,
achieved an accuracy of 88.52%. This method is cost-efficient and helps classify patients at risk of heart diseases.
RELATED WORK
The extensive work done on diagnosing cardiovascular disease using Machine Learning algorithms has
been a significant motivator for this research. This paper incorporates a comprehensive literature survey that
delves into various methods of efficient Cardiovascular disease prediction. Several algorithms, including Logistic
Regression, KNN, and Random Forest Classifier, have been utilized, each showcasing its strengths in achieving
defined objectives [7].
The Intelligent Heart Disease Prediction System (IHDPS) has demonstrated the ability to calculate decision
boundaries using both traditional machine learning and deep learning models. It emphasizes crucial factors
such as a family history of heart disease. However, the accuracy obtained by the IHDPS model was lower than
that of newer models, such as those that detect coronary heart disease using artificial neural networks and
other advanced machine and deep learning algorithms.
McPherson et al. [8] identified the risk factors of coronary heart disease or atherosclerosis using a built-in
algorithm based on neural network techniques. Their model accurately predicted whether a test patient was
suffering from the disease.
Furthermore, R. Subramanian et al. [24] introduced a method for the diagnosis and prediction of heart disease
and blood pressure, along with other attributes, using neural networks. They built a deep neural network
incorporating disease-related attributes, with approximately 120 hidden layers and an output perceptron that
produces the final result. This extensive architecture is reported to give accurate results in identifying
heart disease when applied to the test dataset.
A supervised network has been recommended for the diagnosis of heart diseases [16]. To test the model, a doctor
applied the network, trained on previously seen data, to unfamiliar data, and the accuracy of the model was
calculated from the resulting predictions.
Expanding upon these ideas, it's crucial to understand the role of Machine Learning algorithms in healthcare.
Cardiovascular diseases are among the leading causes of death globally, making accurate prediction and
diagnosis imperative. Machine Learning offers a promising approach due to its ability to analyze vast amounts of
data and identify patterns that humans might miss.
Logistic Regression, one of the foundational algorithms in Machine Learning, is widely used in healthcare for
binary classification tasks. Its ability to model the probability of a certain outcome based on input variables
makes it suitable for predicting the likelihood of cardiovascular diseases based on patient attributes.
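As a minimal illustration of how logistic regression turns patient attributes into a probability, the sketch below applies the sigmoid function to a weighted sum of inputs. The weights, bias, and the two example features are hypothetical values chosen for illustration; they are not parameters fitted in this study.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, features):
    """Probability of the positive class under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_label(weights, bias, features, threshold=0.5):
    """Return 1 (disease likely) if the probability reaches the threshold, else 0."""
    return 1 if predict_probability(weights, bias, features) >= threshold else 0

# Hypothetical weights for two already-scaled features (assumed: age, cholesterol).
weights = [0.8, 1.2]
bias = -1.0
patient = [0.5, 0.9]

print(predict_probability(weights, bias, patient))
print(predict_label(weights, bias, patient))
```

The 0.5 threshold is a common default; a screening setting might lower it to favor catching more positive cases.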
K-Nearest Neighbors (KNN) is another algorithm commonly used in healthcare. It works by finding the most
similar instances in the dataset to the given data point and making predictions based on the majority class among
its neighbors. KNN is particularly useful when dealing with non-linear relationships between features and
outcomes.
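The neighbor-voting idea described above can be sketched in a few lines. This is an illustrative implementation using Euclidean distance and k = 3, both of which are assumptions, and the toy records are invented for the example rather than taken from the dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    distances = sorted(
        (euclidean(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: two scaled features per patient; label 1 = disease, 0 = no disease.
train_rows = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
              [0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_rows, train_labels, [0.2, 0.2]))
print(knn_predict(train_rows, train_labels, [0.9, 0.9]))
```

Because the vote depends on raw distances, features should be brought to a comparable scale before KNN is applied.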
Random Forest Classifier, an ensemble learning method, combines multiple decision trees to improve prediction
accuracy. It is robust to overfitting and works well with high-dimensional data, making it suitable for complex
healthcare datasets.
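To make the ensemble idea concrete, the following simplified sketch trains several one-split trees (decision stumps), each on a bootstrap sample and a single randomly chosen feature, and combines them by majority vote. This is a deliberately reduced stand-in: a full random forest grows deeper trees and samples candidate features at every split. The data, the number of trees, and the seed are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Fit a one-split tree on one feature; returns (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    # Fallback when no split helps: always predict the majority label.
    best = (feature, float("inf"), majority, majority)
    best_correct = labels.count(majority)
    values = sorted({row[feature] for row in rows})
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        correct = left.count(left_label) + right.count(right_label)
        if correct > best_correct:
            best, best_correct = (feature, threshold, left_label, right_label), correct
    return best

def fit_forest(rows, labels, n_trees=15, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy, well-separated data (invented for illustration).
rows = [[1, 2], [2, 1], [3, 3], [2, 2], [1, 3],
        [11, 12], [12, 11], [13, 13], [12, 12], [11, 13]]
labels = [0] * 5 + [1] * 5
forest = fit_forest(rows, labels)

print(predict_forest(forest, [0, 0]))
print(predict_forest(forest, [20, 20]))
```

Averaging over many trees trained on different resamples is what reduces the variance of any single tree.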
The Intelligent Heart Disease Prediction System (IHDPS) leverages these algorithms to predict the likelihood of
heart diseases based on a patient's medical history. By incorporating family history and other relevant attributes,
the model aims to provide accurate predictions, aiding in early detection and prevention of heart diseases.
However, as technology advances, newer models like artificial neural networks (ANNs) are gaining traction in
healthcare. ANNs are inspired by the structure and function of the human brain, consisting of interconnected
nodes organized in layers. Deep Neural Networks (DNNs), a type of ANN with multiple hidden layers, have shown
promising results in various healthcare applications, including the diagnosis of heart diseases.
McPherson et al.'s work on identifying risk factors for coronary heart disease using neural networks highlights
the effectiveness of these advanced models. By training neural networks on large datasets, they were able to
accurately predict the presence of coronary heart disease, helping clinicians make informed decisions.
Similarly, R. Subramanian et al.'s deep Neural Network approach demonstrates the potential of DNNs in
diagnosing heart diseases. With its extensive architecture and incorporation of relevant attributes, the model
achieves high accuracy in identifying patients at risk of heart diseases.
In conclusion, the integration of Machine Learning algorithms in healthcare, particularly for predicting and
diagnosing cardiovascular diseases, holds immense potential. From traditional methods like Logistic Regression
to advanced techniques like deep Neural Networks, these algorithms offer valuable insights into patients' health
conditions, aiding in early detection and personalized treatment plans. As research in this field progresses, we
can expect even more accurate and efficient models, ultimately improving patient outcomes and reducing
healthcare costs.
II. METHODOLOGY
This paper presents an analysis of several machine learning algorithms, including K nearest neighbors (KNN),
Logistic Regression, and Random Forest Classifiers, which can aid practitioners or medical analysts in accurately
diagnosing Heart Disease. The methodology outlined in this paper provides a framework for the proposed model
[13]. This methodology involves a series of steps that transform the given data into recognizable patterns for
user understanding. In the proposed methodology (Figure 1), the first step involves data collection, followed by
the extraction of significant values in the second stage. The third stage involves preprocessing, where data
exploration, handling missing values, data cleaning, and normalization, depending on the algorithms used, are
performed [15]. After preprocessing, the data is classified using classifiers such as KNN, Logistic Regression, and
Random Forest Classifier. Finally, the proposed model is evaluated based on accuracy and performance using
various performance metrics. This model, known as the Effective Heart Disease Prediction System (EHDPS),
employs 13 medical parameters such as chest pain, fasting sugar, blood pressure, cholesterol, age, and sex for
prediction [17].
Figure 1 : Flow Chart of the methodology used
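The preprocessing stage described above can be sketched as follows. The paper does not state which imputation or normalization method was used, so mean imputation and min-max scaling are assumptions here, and the small table of values (assumed columns: age, resting blood pressure, cholesterol) is illustrative only. The sketch also assumes every column has at least one observed value.

```python
def impute_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - low) / (high - low) for v in column]

def preprocess(rows):
    """Impute and scale every column of a row-major table."""
    columns = [list(col) for col in zip(*rows)]
    cleaned = [min_max_scale(impute_missing(col)) for col in columns]
    return [list(row) for row in zip(*cleaned)]

# Illustrative records with two missing entries.
rows = [[63, 145, 233], [37, None, 250], [41, 130, 204], [56, 120, None]]
processed = preprocess(rows)

for row in processed:
    print(row)
```

Scaling matters most for distance-based classifiers such as KNN, where an unscaled attribute with a large numeric range would dominate the distance.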
III. DATA SOURCE
A meticulously organized dataset of individuals has been selected, focusing on their history of heart problems
and other medical conditions [2]. Heart diseases encompass a range of conditions affecting the heart. According
to the World Health Organization (WHO), cardiovascular diseases are responsible for the highest number of
deaths in middle-aged people. For our analysis, we utilized a dataset comprising the medical histories of 303
patients across different age groups. This dataset provides essential information, including medical attributes
such as age, resting blood pressure, and fasting sugar level, aiding in the detection of patients diagnosed with
heart diseases.
The dataset comprises 13 medical attributes of 303 patients, assisting in identifying patients at risk of heart
diseases. It enables the classification of patients into those at risk and those not at risk of heart diseases. This
Heart Disease dataset was obtained from the UCI repository. Analysis of this dataset reveals patterns indicative
of patients prone to heart diseases.
The dataset is divided into two parts: Training and Testing. It contains 303 rows and 14 columns, with each row
representing a single record. All attributes are listed in Table 1.
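A reproducible split into training and testing parts might look like the sketch below. The split ratio is not stated in the text, so the 80/20 proportion and the seed are assumptions, and integer indices stand in for the 303 dataset rows.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle records reproducibly and split them into train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(303))  # stand-in for the 303 dataset rows
train, test = train_test_split(records)

print(len(train), len(test))
```

Fixing the seed keeps the split identical across runs, so accuracy figures from different classifiers are compared on the same test records.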
IV. RESULTS AND DISCUSSION
These results indicate that while many researchers employ various algorithms such as SVC and Decision Trees
to detect patients diagnosed with heart disease, KNN, Random Forest Classifier, and Logistic Regression yield
superior results [23]. The algorithms we utilized are not only more accurate but also more cost-efficient and
faster than those used by previous researchers. Furthermore, the maximum accuracy of 88.5%, obtained by KNN and
Logistic Regression, is either greater than or nearly equal to the accuracies reported in previous studies.
Therefore, we conclude that our improved accuracy is attributed to the increased number of medical attributes
used from the dataset we collected.
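For reference, accuracy and two related metrics can be computed from true and predicted labels as in the sketch below. The label vectors are hypothetical and do not reproduce the results reported here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    """Fraction of true positives that the model found."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print(accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
```

In a diagnostic setting recall deserves attention alongside accuracy, since a missed positive case (false negative) is usually the costlier error.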
Our project also reveals that Logistic Regression and KNN outperform Random Forest Classifier in predicting
patients diagnosed with heart disease. This suggests that KNN and Logistic Regression are more effective in
diagnosing heart diseases. The following figures (2, 3, 4, and 5) illustrate the distribution and prediction of
patients based on age group, resting blood pressure, sex, and chest pain.
Figure 2. Risk of heart attack by age.
Figure 3. Risk of heart attack by resting blood pressure.
Figure 4. Patients with and without heart disease, by sex.
Figure 5. Patients with and without heart disease, by type of chest pain.
V. CONCLUSION
A cardiovascular disease detection model has been developed utilizing three machine learning classification
techniques. This project aims to predict individuals with cardiovascular disease by analyzing their medical
history, including factors such as chest pain, sugar levels, and blood pressure. This detection system is
particularly useful for patients with a history of heart disease. The algorithms employed in building this model
are Logistic Regression, Random Forest Classifier, and KNN, resulting in an accuracy of 87.5% [22]. Increasing
the training data enhances the model's ability to accurately predict whether a person has a heart disease or not
[9]. By utilizing these computer-aided techniques, we can predict patients' conditions faster and more accurately,
leading to significant cost reductions. Machine learning techniques outperform humans in prediction, making
them valuable for both patients and doctors.
This project marks a significant advancement in the field of cardiovascular disease detection, offering a reliable
and efficient method for identifying individuals at risk. With the rising prevalence of heart diseases globally, such
predictive models are crucial for early intervention and prevention. Cardiovascular diseases encompass various
conditions affecting the heart and blood vessels, with heart attacks and strokes being the most common
manifestations. According to the World Health Organization (WHO), cardiovascular diseases are responsible for
the highest number of deaths worldwide, making early detection and management paramount.
The dataset used in this project comprises the medical records of 303 patients, providing valuable insights into
the factors contributing to heart diseases. This dataset includes 13 medical attributes, such as age, resting blood
pressure, and fasting sugar level, enabling accurate prediction of heart disease risk. The data was obtained from
the UCI repository, a reputable source for machine learning datasets.
The methodology employed in this project follows a systematic approach to data analysis and model
development. Figure 1 illustrates the steps involved, beginning with data collection and extraction of significant
values, followed by preprocessing to handle missing data and normalize the features. The preprocessed data is
then fed into classifiers such as Logistic Regression, Random Forest Classifier, and KNN for training and testing.
Logistic Regression is a widely used classification algorithm that models the probability of a binary outcome
based on input features. It is particularly suitable for predicting the likelihood of cardiovascular disease based
on patient attributes. Random Forest Classifier, on the other hand, is an ensemble learning method that combines
multiple decision trees to improve prediction accuracy. It is robust to overfitting and works well with high-
dimensional data, making it suitable for complex datasets like medical records. KNN, or K nearest neighbors, is a
non-parametric method that classifies data points based on the majority class of their nearest neighbors. It is
effective for pattern recognition and works well with small to medium-sized datasets.
The results of the model evaluation demonstrate its effectiveness in predicting heart disease with an accuracy of
87.5%. This accuracy rate is a significant improvement over previous models, indicating the efficacy of the
employed algorithms. Furthermore, the analysis reveals that KNN outperforms the other algorithms with an
accuracy of 88.52%, making it the preferred choice for heart disease prediction in this context.
It is important to note that the accuracy of the model can be further improved by incorporating more training
data and fine-tuning the algorithms. By continuously updating the dataset with new patient records and refining
the model, we can enhance its predictive capabilities and ensure its reliability in clinical settings.
The significance of this project lies in its potential to revolutionize the way cardiovascular diseases are diagnosed
and managed. By leveraging machine learning techniques, we can identify individuals at risk of heart disease
earlier, allowing for timely intervention and personalized treatment plans. This not only improves patient
outcomes but also reduces healthcare costs associated with treating advanced stages of the disease.
Moreover, the development of accurate predictive models for cardiovascular diseases opens up new avenues for
research and innovation in healthcare. With access to large-scale medical databases and advanced machine
learning algorithms, we can gain deeper insights into the underlying mechanisms of heart diseases and develop
targeted interventions to prevent them.
In conclusion, the development of a cardiovascular disease detection model using machine learning algorithms
represents a significant milestone in healthcare. By leveraging the power of data analytics and predictive
modeling, we can improve the early detection and management of heart diseases, ultimately saving lives and
reducing healthcare burdens. This project underscores the potential of machine learning in transforming
healthcare delivery and highlights the importance of interdisciplinary collaboration in addressing complex
medical challenges.
Figure 6. Total number of patients with and without heart disease.
VI. REFERENCES
[1] Soni J, Ansari U, Sharma D & Soni S (2011). Predictive data mining for medical diagnosis: an overview of
heart disease prediction. International Journal of Computer Applications, 17(8), 43-8.
[2] Dangare C S & Apte S S (2012). Improved study of heart disease prediction system using data
mining classification techniques. International Journal of Computer Applications, 47(10), 44-8.
[3] Ordonez C (2006). Association rule discovery with the train and test approach for heart disease
prediction. IEEE Transactions on Information Technology in Biomedicine, 10(2), 334-43.
[4] Shinde R, Arjun S, Patil P & Waghmare J (2015). An intelligent heart disease prediction system
using k-means clustering and Naïve Bayes algorithm. International Journal of Computer Science
and Information Technologies, 6(1), 637-9.
[5] Bashir S, Qamar U & Javed M Y (2014, November). An ensemble-based decision support framework
for intelligent heart disease diagnosis. In International Conference on Information Society (i-
Society 2014) (pp. 259-64). IEEE.
[6] Jee S H, Jang Y, Oh D J, Oh B H, Lee S H, Park S W & Yun Y D (2014). A coronary heart disease prediction
model: the Korean Heart Study. BMJ open, 4(5), e005025.
[7] Ganna A, Magnusson P K, Pedersen N L, de Faire U, Reilly M, Ärnlöv J & Ingelsson E (2013).
[8] Jabbar M A, Deekshatulu B L & Chandra P (2013, March). Heart disease prediction using lazy associative
classification. In 2013 International Multi-Conference on Automation, Computing, Communication,
Control and Compressed Sensing (iMac4s) (pp. 40-6). IEEE.
[9] Dangare Chaitrali S and Sulabha S Apte. "Improved study of heart disease prediction system
using data mining classification techniques." International Journal of Computer Applications
47.10 (2012): 44-8.
[10] Soni Jyoti. "Predictive data mining for medical diagnosis: An overview of heart disease prediction."
International Journal of Computer Applications 17.8 (2011): 43-8.
[11] Chen A H, Huang S Y, Hong P S, Cheng C H & Lin E J (2011, September). HDPS: Heart disease prediction
system. In 2011 Computing in Cardiology (pp. 557-60). IEEE.
[11] Parthiban, Latha and R Subramanian. "Intelligent heart disease prediction system using CANFIS
and genetic algorithm." International Journal of Biological, Biomedical and Medical Sciences 3.3
(2008).
[12] Wolgast G, Ehrenborg C, Israelsson A, Helander J, Johansson E & Manefjord H (2016). Wireless body area
network for heart attack detection [Education Corner]. IEEE antennas and propagation magazine, 58(5),
84-92.
[13] Patel S & Chauhan Y (2014). Heart attack detection and medical attention using motion sensing
device -kinect. International Journal of Scientific and Research Publications, 4(1), 1-4.
[14] Zhang Y, Fogoros R, Thompson J, Kenknight B H, Pederson M J, Patangay A & Mazar S T (2011). U.S.
Patent No. 8,014,863. Washington, DC: U.S. Patent and Trademark Office.
[15] Raihan M, Mondal S, More A, Sagor M O F, Sikder G, Majumder M A & Ghosh K (2016, December).
Smartphone based ischemic heart disease (heart attack) risk prediction using clinical data and data
mining approaches, a prototype design. In 2016 19th International Conference on Computer and
Information Technology (ICCIT) (pp. 299-303). IEEE.
[16] Buechler K F & McPherson P H (1999). U.S. Patent No. 5,947,124. Washington, DC: U.S. Patent
and Trademark Office.
[17] Takci H (2018). Improvement of heart attack prediction by the selection methods. Turkish Journal of
Electrical Engineering & Computer Sciences, 26(1), 1-10.
[18] Worthen W J, Evans S M, Winter S C & Balding D (2002). U.S. Patent No. 6,432,124. Washington, DC: U.S.
Patent and Trademark Office.
[19] Acharya U R, Fujita H, Oh S L, Hagiwara Y, Tan J H & Adam M (2017). Application of deep convolutional
neural network for automated detection of myocardial infarction using ECG signals. Information Sciences,
415, 190-8.
[20] Brown N, Young T, Gray D, Skene A M & Hampton J R (1997). Inpatient deaths from acute myocardial
infarction, 1982-92: analysis of data in the Nottingham heart attack register. BMJ, 315(7101), 159-64.
[21] Piller L B, Davis B R, Cutler J A, Cushman W C, Wright J T, Williamson J D & Haywood L J (2002).
Validation of heart failure events in the Antihypertensive and Lipid Lowering Treatment to Prevent
Heart Attack Trial (ALLHAT) participants assigned to doxazosin and chlorthalidone. Current controlled
trials in cardiovascular medicine, 3(1), 10.
[23] Folsom A R, Prineas R J, Kaye S A & Soler J T (1989). Body fat distribution and self-reported prevalence
of hypertension, heart attack, and other heart disease in older women. International journal of
epidemiology, 18(2), 361-7.
[24] Kiyasu J Y (1982). U.S. Patent No. 4,338,396. Washington, DC: U.S. Patent and Trademark Office.