e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science

( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:04/Issue:06/June-2022 Impact Factor- 6.752 www.irjmets.com
PREDICTION OF HEART DISEASE USING MACHINE
LEARNING TECHNIQUES
Prasad Thombare*1, Madhvaj Ghalme*2, Saurabh Raut*3,
Narendra Dhakne*4, Ms. Poonam R. Dholi*5
*1,2,3,4Student, Department Of Computer Science And Engineering, Matoshri College Of Engineering
And Research Centre, Nashik, Maharashtra, India.
*5Project Guide, Department Of Computer Science And Engineering, Matoshri College Of Engineering
And Research Centre, Nashik, Maharashtra, India.
ABSTRACT
Cardiovascular diseases have been the most common cause of death worldwide over the last few decades, in developed as well as underdeveloped and developing countries. Early detection of cardiac diseases and continuous supervision by clinicians can reduce the mortality rate. However, it is not possible to monitor every patient accurately every day, and round-the-clock consultation with a doctor is not available, since it requires more knowledge, time and expertise. In this project, we developed and studied models that predict heart disease from various heart-related attributes of a patient and detect impending heart disease using machine learning techniques such as the backward elimination algorithm, KNN and RFECV on a dataset publicly available on the Kaggle website, and we evaluated the results using a confusion matrix and cross-validation. The early prognosis of cardiovascular diseases can aid in making decisions on lifestyle changes in high-risk patients and in turn reduce complications, which can be a great milestone in the field of medicine.
Keywords: Machine Learning, Heart Disease, Random Forest Classifier, Gradient Boost Classifier, K-Nearest Neighbors, Dataset.
I. INTRODUCTION
According to the World Health Organization, 12 million deaths occur worldwide every year due to heart disease. The burden of cardiovascular disease has been increasing rapidly all over the world in the past few years. Many studies have been conducted in an attempt to pinpoint the most influential risk factors of heart disease and to accurately predict the overall risk. Heart disease is even described as a silent killer, since it can lead to death without obvious symptoms. The early diagnosis of heart disease plays a vital role in making decisions on lifestyle changes in high-risk patients and in turn reduces complications. This project aims to predict future heart disease by analyzing patient data and classifying whether or not a patient has heart disease using machine learning algorithms.
II. LITERATURE SURVEY
With the growing development of medical science alongside machine learning, various experiments and studies have been carried out in recent years, resulting in several significant papers. The paper [1] proposes heart disease prediction using KStar, J48, SMO, Bayes Net and Multilayer Perceptron in the WEKA software. Based on performance across different factors, SMO (89 [...] KStar, Multilayer Perceptron and J48 techniques using k-fold cross-validation. The accuracy achieved by those algorithms is still not satisfactory, so accuracy needs to be improved further to support better diagnostic decisions. In [2], a study conducted on the Cleveland heart disease dataset, which contains 303 instances, used 10-fold cross-validation, considered 13 attributes and implemented 3 different algorithms; it concluded that Gradient Boost and Random Forest gave the maximum accuracy of 74.0 percent. In [3], using the similar Framingham, Massachusetts dataset, experiments were carried out with 4 models, which were trained and tested with maximum accuracy K Neighbors Classifier: 87 [...] Classifier: 84.
III. METHODOLOGY
The proposed system architecture gives an overview of how the system works. The system starts by gathering data and selecting the important attributes that yield good accuracy. The data is then preprocessed into a format suitable for machine learning algorithms.
After preprocessing, the data is divided into two parts: training data and testing data. The algorithms are trained using the training data, and the accuracy is obtained by testing the system on the testing data. The system is implemented using the following modules.
1.) Dataset Collection
2.) Attributes Selection
3.) Data Pre-Processing
4.) Data Balancing
5.) Disease Prediction
Figure 1: Flowchart of Proposed System. The stages, in order, are: Collection of Patient Details, Dataset, Data Processing, Machine Learning Algorithms (Gradient Boosting Classifier, Random Forest Classifier, KNN Classifier), Prediction of Heart Disease, Measure of Accuracy.

1) Dataset Collection:
First, we collected a dataset for the prediction of heart disease and divided it into training data and testing data. The training data is used to fit the prediction model, and the testing data is used to evaluate and analyze it. For this project, 70% of the data is used for training and 30% for testing. The dataset used for our project is Heart Disease UCI. Out of the 76 heart-related attributes available, we selected 14 for the system.
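The 70/30 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn is used, and the feature matrix is a synthetic stand-in (the row count, the random values and the `random_state` are arbitrary placeholders) with 13 feature columns and one binary target, matching the 14 selected attributes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 300 patients,
# 13 feature columns and a binary target (1 = heart disease, 0 = none).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
y = rng.integers(0, 2, size=300)

# Hold out 30% of the rows for testing; stratify keeps the class ratio
# roughly the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Stratifying on the target is a design choice added here so that both parts contain a similar share of positive cases; the paper does not state whether the original split was stratified.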
2) Attributes Selection:
Attribute or feature selection is the selection of appropriate attributes for the prediction system, with the aim of increasing its efficiency and accuracy. Several parameters of the patient, such as gender, chest pain type, resting blood pressure, fasting blood sugar and serum cholesterol, among others, are selected for our prediction system.
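One way to automate this kind of selection is recursive feature elimination with cross-validation, available in scikit-learn as `RFECV`. The sketch below is illustrative only: the data is synthetic, and the choice of logistic regression as the ranking estimator, the 5 folds and the accuracy scoring are assumptions rather than settings reported in this paper.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): 13 candidate features, 5 informative.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, n_redundant=2, random_state=0
)

# Repeatedly drop the weakest feature and keep the subset with the best
# cross-validated accuracy.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=5,
    scoring="accuracy",
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("mask of kept features:", selector.support_)
```

The boolean mask in `support_` can then be used to keep only the selected columns before training the classifiers.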

3) Data Pre-Processing:
Data pre-processing is an important step when building a system with machine learning models. Raw data may not be clean or in the format the model requires, which can cause misleading outcomes. In pre-processing, we transform the data into a format that is compatible with the model and deal with noise, duplicates and missing values in the dataset. Pre-processing also includes activities such as data reduction and data transformation. It is required for improving the accuracy of the model.
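The pre-processing steps named above (duplicates, missing values, transformation) might look like the following sketch. The tiny table, its column names and its values are hypothetical; median imputation and standard scaling are assumed choices, since the paper does not say which techniques were applied.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical patient records: one exact duplicate row and one missing value.
df = pd.DataFrame(
    {
        "age": [63, 37, 37, 41, 56],
        "chol": [233.0, 250.0, 250.0, np.nan, 236.0],
        "target": [1, 1, 1, 0, 1],
    }
)

# 1. Remove duplicate rows.
clean = df.drop_duplicates().reset_index(drop=True)

# 2. Fill the missing cholesterol value with the column median.
clean["chol"] = clean["chol"].fillna(clean["chol"].median())

# 3. Transform the features to zero mean and unit variance.
scaled = StandardScaler().fit_transform(clean[["age", "chol"]])

print(clean)
print(scaled.round(2))
```

In a full pipeline the scaler would normally be fitted on the training part only and then applied to the testing part, so that no information from the test data leaks into training.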
4) Data Balancing:
Data balancing is very important in systems that deal with large amounts of data, so we make sure our dataset is balanced. An imbalanced dataset can be balanced in two ways: under-sampling and over-sampling.
1. Under Sampling: the dataset is balanced by reducing the size of the majority (ample) class. This is done when there is an adequate amount of data.
2. Over Sampling: the dataset is balanced by increasing the number of samples of the minority (scarce) class. This is done when the amount of data is inadequate.
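Both balancing strategies can be illustrated with random resampling. This is a sketch under stated assumptions: the 80/20 class split and the feature values are made up, and `sklearn.utils.resample` is used here as one simple option; the paper does not specify how balancing was implemented.

```python
import numpy as np
from sklearn.utils import resample

# Made-up imbalanced data: 80 samples of class 0, 20 samples of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 80 + [1] * 20)

X_major, X_minor = X[y == 0], X[y == 1]

# Over-sampling: draw minority rows with replacement until they match
# the size of the majority class.
X_minor_up = resample(X_minor, replace=True, n_samples=len(X_major), random_state=0)
X_over = np.vstack([X_major, X_minor_up])
y_over = np.array([0] * len(X_major) + [1] * len(X_minor_up))

# Under-sampling: keep a random subset of majority rows, without
# replacement, equal in size to the minority class.
X_major_down = resample(X_major, replace=False, n_samples=len(X_minor), random_state=0)
X_under = np.vstack([X_major_down, X_minor])
y_under = np.array([0] * len(X_major_down) + [1] * len(X_minor))

print(np.bincount(y_over), np.bincount(y_under))
```

Resampling is usually applied to the training data only, so that the testing data keeps the original class distribution.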
5) Disease Prediction:
Machine learning models such as KNN, Gradient Boosting and Random Forest are used for classification. A comparative analysis is performed among the algorithms, and the algorithm with the highest accuracy is used for heart disease prediction in the proposed system.
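The comparative analysis can be sketched with scikit-learn as below. The data is synthetic and every hyperparameter (5 neighbors, 100 trees, the random seeds) is an assumption made for illustration, so the printed accuracies are not the results reported in this paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the heart disease data (assumption).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Train every model on the same training data and score it on the same test data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
print("selected model:", best)
```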
IV. MACHINE LEARNING ALGORITHMS
Machine learning is the systematic study of algorithms that give a system the ability to replicate human learning activities without being explicitly programmed. The machine learning models used in our project are:
1) Gradient Boosting Classifier:
Gradient boosting machines are a family of powerful machine learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. In gradient boosting machines, or simply GBMs, the learning procedure consecutively fits new models to provide a more accurate estimate of the response variable. The principal idea behind this algorithm is to construct the new base-learners to be maximally correlated with the negative gradient of the loss function associated with the whole ensemble. The loss function can be arbitrary, but to give a better intuition, if the error function is the classic squared-error loss, the learning procedure results in consecutive error-fitting. In general, the choice of the loss function is up to the researcher, with a rich variety of loss functions derived so far and the possibility of implementing one's own task-specific loss. This high flexibility makes GBMs highly customizable to any particular data-driven task. It also introduces a lot of freedom into the model design, making the choice of the most appropriate loss function a matter of trial and error. However, boosting algorithms are relatively simple to implement, which allows one to experiment with different model designs.
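The "consecutive error-fitting" behaviour under squared-error loss can be made concrete with a few lines of code. The following is a simplified regression sketch, not the classifier used in this project: the sine-shaped data, the tree depth, the learning rate and the number of rounds are all illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative regression data (assumption): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

# Start from a constant model that predicts the mean of the targets.
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(n_rounds):
    # For the loss 0.5 * (y - F)^2 the negative gradient with respect to F
    # is the residual y - F, so each new base-learner is fitted to the
    # current errors of the ensemble.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(f"training MSE before boosting: {initial_mse:.4f}")
print(f"training MSE after boosting:  {final_mse:.4f}")
```

With another loss function only the line that computes the negative gradient changes, which is the flexibility the text refers to.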
2) Random Forest:
The Random Forest classifier is a supervised learning technique. It is based on ensemble learning, which combines multiple classifiers to solve a complex problem and improve the performance of the model. A random forest consists of several decision trees built on various subsets of the given dataset and combines their outputs to improve predictive accuracy. Rather than relying on a single decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority of votes. A larger number of trees generally leads to higher accuracy and also reduces the problem of overfitting. For a classification problem, the final output is decided by majority voting; in the case of a regression problem, the final output is the mean of all the tree outputs.
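The voting mechanism can be sketched by hand with bootstrap samples and individual decision trees. This is a simplified illustration on synthetic data, assuming binary labels 0 and 1; the number of trees (25, odd so that votes cannot tie) and the other settings are arbitrary, and in practice `RandomForestClassifier` from scikit-learn would be used instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
rng = np.random.default_rng(0)

trees = []
for i in range(25):
    # Each tree is trained on a bootstrap sample: rows drawn with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Collect one prediction per tree and take the majority vote per sample.
votes = np.stack([tree.predict(X) for tree in trees])  # shape: (25, 300)
majority = (votes.mean(axis=0) > 0.5).astype(int)

print("training accuracy of the vote:", (majority == y).mean())
```

For regression the same structure applies, except that the per-tree outputs are averaged instead of voted on.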
3) K-Nearest Neighbor:
The K-Nearest Neighbor (K-NN) algorithm is one of the simplest machine learning algorithms based on the supervised learning technique. The K-NN algorithm measures the similarity between a new data point and the available data and places the new point into the category it is most similar to. The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into the best-suited category using the K-NN algorithm. The K-NN algorithm can be used for regression as well as for classification, but it is mostly used for classification problems. K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately. In the training phase, the K-NN algorithm just stores the training data; when it gets new data, it classifies that data into the category most similar to it.
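The store-then-classify behaviour can be written in a few lines. This is a minimal sketch with made-up two-dimensional points; the function name `knn_predict`, the Euclidean distance and k = 3 are illustrative choices, not details taken from this paper.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # "Training" is only storage: all the work happens at prediction time.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Made-up stored data: two well-separated groups labelled 0 and 1.
X_train = np.array(
    [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
)
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # prints 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.0])))  # prints 1
```

Because distances are computed on raw feature values, features measured on larger scales dominate the result, which is one reason scaling during pre-processing matters for K-NN.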
V. RESULTS AND DISCUSSION
For the comparative study, we trained 3 different models and tested which model gives the highest accuracy on the same dataset.
Table 1. Algorithm Accuracy Table
Algorithm         Accuracy
KNN               0.74
Random Forest     0.72
Gradient Boost    0.74
After observing the above results, we conclude that the Gradient Boosting algorithm and KNN have higher accuracy than the Random Forest algorithm.
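The abstract mentions evaluating the results with a confusion matrix and cross-validation. A minimal sketch of both is given below, assuming scikit-learn and synthetic stand-in data; the choice of Gradient Boosting as the evaluated model and of 10 folds is illustrative, and the printed numbers are not the figures in Table 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Rows of the matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
print(cm)
print(f"hold-out accuracy: {accuracy:.2f}")

# 10-fold cross-validation gives one accuracy per fold on the full data.
cv_scores = cross_val_score(
    GradientBoostingClassifier(random_state=0), X, y, cv=10, scoring="accuracy"
)
print(f"cross-validated accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")
```

For a medical screening task the false negatives (`fn`) are usually the most costly errors, so the confusion matrix is worth inspecting alongside the single accuracy figure.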
VI. CONCLUSION
The early prognosis of cardiovascular diseases can aid in making decisions on lifestyle changes in high-risk patients and in turn reduce complications, which can be a great milestone in the field of medicine. This project aims to predict the disease on the basis of symptoms. The system is designed to take symptoms from the user as input and produce a disease prediction as output. An average prediction accuracy of 74 percent is obtained.
VII. REFERENCES
[1] A. H. M. S. U. Marjia Sultana, “Analysis of Data Mining Techniques for Heart Disease Prediction,” 2018.
[2] M. I. K. A. I.S. Musfiq Ali, “Heart Disease Prediction Using Machine Learning Algorithms.”
[3] M. A. K. S. H. K. M. A. V. P. M Marimuthu, “A Review on Heart Disease Prediction using Machine Learning and Data Analytics Approach.”
[4] K. Bhanot, towardsdatascience.com, https://towardsdatascience.com/predicting-presence-of-heart-diseases-usingmachinelearning-36f00f3edb2c.
[5] Senthil Kumar Mohan, Chandrasekar Thirumalai and Gautam Srivastava, “Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques,” IEEE Access, 2019.
[6] M. Nikhil Kumar, K. V. S. Koushik, K. Deepak, “Prediction of Heart Diseases Using Data Mining and Machine Learning Algorithms and Tools,” International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), 2019.
[7] M. A. Jabbar, P. Chandra, and B. L. Deekshatulu, “Prediction of risk score for heart disease using associative classification and hybrid feature subset selection,” Int. Conf. Intell. Syst. Des. Appl. ISDA, pp. 628–634, 2012.
