Title: Heart disease prediction using different machine learning algorithm
Created by: Prabal Tripathi, Aniket Patel, Pritam Bunker, Vikram Singh Parihar

Abstract: Heart-related diseases, or cardiovascular diseases (CVDs), have been the leading cause
of death worldwide over the last few decades and have emerged as the most life-threatening class
of disease, not only in India but across the world.
There is therefore a need for a reliable, accurate and feasible system to diagnose such diseases
in time for proper treatment. Machine learning algorithms and techniques have been applied to
various medical datasets to automate the analysis of large and complex data.
Many researchers have recently used machine learning techniques to help the health care industry
and its professionals diagnose heart-related diseases. This paper presents a study of various
models based on such algorithms and techniques and analyzes their performance.
1. Introduction
The heart is a vital organ of the human body. It pumps blood to every part of the body.
If it fails to function correctly, the brain and other organs stop working, and the person
can die within a few minutes. Changes in lifestyle, work-related stress and poor eating
habits contribute to the rising rate of several heart-related diseases.

Heart disease remains the leading cause of death worldwide. According to the World Health
Organization (WHO), cardiovascular diseases (CVDs) accounted for 17.9 million deaths in
2019, representing 32% of all global deaths. Most of these deaths were due to heart attacks
and strokes. In India, heart diseases continue to be a major health concern, accounting for
over 25% of all deaths. The country faces a rising burden of CVD, with non-communicable
diseases such as heart conditions causing 60% of total adult deaths.
The economic impact of heart disease is equally significant. Globally, CVD cost $863 billion
in 2010, with projections suggesting this figure could rise to over $1 trillion by 2030. In India,
WHO estimates reveal that between 2005 and 2015, the nation may have lost around $237
billion due to heart-related illness.
Medical organisations around the world collect data on a wide range of health-related issues.
Machine learning techniques can be applied to these data to gain useful insights.

However, the data collected are massive and often noisy. Datasets that are too large for
humans to analyse manually can be explored readily with machine learning techniques. As a
result, these algorithms have recently become very useful for accurately predicting the
presence or absence of heart-related diseases.
2. Methodology
This paper analyses several machine learning algorithms: K-Nearest Neighbours (KNN),
Logistic Regression, Random Forest, Naive Bayes, Support Vector Machine, Neural Network,
XGBoost and Decision Tree. These classifiers can help practitioners and medical analysts
diagnose heart disease accurately.
The work included examining journals, published papers and recent cardiovascular disease
data. The methodology provides a framework for the proposed model: it is a process whose
steps transform the given data into recognized data patterns that are useful to users.
The proposed methodology (Figure 1) consists of several steps. The first step is data
collection, the second extracts the significant values, and the third is the preprocessing
stage, where we explore the data.
Data preprocessing handles missing values, data cleaning and normalization, depending on
the algorithm used. After preprocessing, a classifier is used to classify the preprocessed
data. The classifiers used in the proposed model are K-Nearest Neighbours (KNN), Logistic
Regression, Random Forest, Naive Bayes, Support Vector Machine, Neural Network, XGBoost and
Decision Tree. Finally, the proposed model is evaluated for accuracy and performance using
various performance metrics. The outcome of this process is an effective Heart Disease
Prediction System.
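
The preprocessing step described above can be illustrated with a minimal sketch. It assumes
mean imputation for missing values and min-max scaling for normalization; the paper does not
state which variants were used, and the function names and sample readings are hypothetical.

```python
from statistics import mean

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical cholesterol readings with one missing value (illustration only).
chol = [200, None, 240, 280]
chol_filled = impute_mean(chol)          # the gap is filled with the mean of 200, 240, 280
chol_scaled = min_max_scale(chol_filled)
print(chol_filled)
print(chol_scaled)
```

Scaling matters mainly for distance-based or gradient-based classifiers such as KNN, SVM and
neural networks; tree-based models are largely insensitive to it.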

Fig. 1: Proposed methodology
3. Algorithms and Techniques Used
3.1. Naïve Bayes
Naive Bayes is a simple but effective classification technique based on Bayes' theorem.
In this study, Naive Bayes achieved an accuracy of 85.25% using the 14 most significant
features of the Cleveland dataset.
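
As an illustration of how a Naive Bayes classifier applies Bayes' theorem, the sketch below
implements a Gaussian variant in plain Python. It assumes each feature is normally distributed
within a class; the paper does not say which variant was used, and the function names and toy
data (age, maximum heart rate) are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density, added under the independence assumption.
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical toy data: [age, maximum heart rate]; 1 = disease, 0 = no disease.
X = [[63, 150], [67, 108], [41, 172], [37, 187]]
y = [1, 1, 0, 0]
nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, [65, 120]))
print(predict_gaussian_nb(nb_model, [40, 180]))
```

The "naive" part is the sum over features: each feature contributes its log-likelihood
independently of the others.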
3.2. Support Vector Machine
Support Vector Machine is an extremely popular supervised machine learning technique
(one trained with a pre-defined target variable) that can be used for both classification
and regression. For classification, it finds a hyperplane in the feature space that
separates the classes. The accuracy score achieved using Support Vector Machine is: 81.97 %.
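
The role of the hyperplane can be shown with a short sketch of the decision rule only. It
assumes a linear kernel and takes the weights `w` and bias `b` as already learned; the values
below are hypothetical, and mapping a non-negative score to class 1 is a convention chosen
for illustration. Training (finding the maximum-margin hyperplane) is not shown.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign says which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_classify(w, b, x):
    """Assign class 1 on the non-negative side of the hyperplane, class 0 otherwise."""
    return 1 if decision_value(w, b, x) >= 0 else 0

# Hypothetical hyperplane x1 - x2 = 0 in a two-feature space.
w, b = [1.0, -1.0], 0.0
print(svm_classify(w, b, [2.0, 1.0]))
print(svm_classify(w, b, [1.0, 3.0]))
```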

Fig. 1: Support Vector Machine


3.3. K-Nearest Neighbour
The K-Nearest Neighbour technique is one of the most elementary yet effective classification
techniques. It makes no assumptions about the data and is generally used for classification
tasks when there is little or no prior knowledge about the data distribution.
The accuracy score achieved using K-Nearest Neighbour is: 67.21 %.
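
A minimal sketch of the technique follows. It assumes Euclidean distance and a simple
majority vote with k = 3; the paper does not report the distance metric or the value of k,
and the training points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-normalized training points with two features each.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.7]]
y_train = [0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.15, 0.15]))
print(knn_predict(X_train, y_train, [0.9, 0.9]))
```

Because the prediction depends directly on distances, features on very different scales
should be normalized first, otherwise the largest-valued feature dominates the vote.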
3.4. Decision Tree
A decision tree is a type of supervised learning algorithm that is mostly used in
classification problems. It handles both continuous and categorical attributes.

The algorithm divides the population into two or more homogeneous sets based on the most
significant predictors. It first calculates the entropy of every attribute.
The dataset is then split on the variable or predictor with the maximum information gain,
that is, the minimum entropy after the split. These two steps are repeated recursively on
the remaining attributes.
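
The two quantities that drive the split can be computed as in the sketch below. It assumes
Shannon entropy measured in bits; the function names and the example splits are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Six hypothetical patients: three with disease (1), three without (0).
labels = [1, 1, 1, 0, 0, 0]
perfect_split = [[1, 1, 1], [0, 0, 0]]   # each child group is pure
poor_split = [[1, 1, 0], [1, 0, 0]]      # each child group is still mixed

print(information_gain(labels, perfect_split))
print(information_gain(labels, poor_split))
```

The tree-building procedure picks, at each node, the attribute whose split gives the largest
gain, so the first split above would be preferred over the second.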

Fig. 2: Decision Tree


The accuracy score achieved using Decision Tree is: 81.97 %.

3.5. Random Forest (Chosen Model)


Random Forest is also a popular supervised machine learning algorithm. It can be used for
both regression and classification tasks but generally performs better in classification
tasks.
As the name suggests, the Random Forest technique considers multiple decision trees before
giving an output; it is essentially an ensemble of decision trees.
The technique rests on the idea that a larger number of trees is more likely to converge to
the right decision. For classification, it uses a voting system to decide the class, whereas
for regression it takes the mean of the outputs of the individual decision trees. It works
well with large, high-dimensional datasets.
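
The aggregation step can be sketched as follows. The code only shows how the outputs of
already-trained trees are combined; building the individual trees is not shown, and the
function names and the per-tree outputs are hypothetical.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_predictions):
    """Majority vote across the class labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_regress(tree_predictions):
    """Mean of the numeric outputs of the individual trees."""
    return mean(tree_predictions)

# Hypothetical outputs of five trees for one patient (1 = disease, 0 = no disease).
votes = [1, 0, 1, 1, 0]
print(forest_classify(votes))

# Hypothetical numeric outputs of three regression trees.
print(forest_regress([2.0, 3.0, 4.0]))
```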
Fig. 3: Random Forest

The accuracy score achieved using Random Forest is: 95.08 %.


3.6. Neural Network
A neural network is a type of machine learning model inspired by the structure and
functioning of the human brain. It consists of interconnected units called neurons (or nodes)
that work together to process information, learn patterns, and make decisions. Neural
networks are widely used in tasks like image recognition, natural language processing, and
complex decision-making problems. The accuracy score achieved using Neural Network is:
80.33 %.
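
How interconnected neurons process an input can be illustrated with a forward pass through a
tiny network. This is a sketch under stated assumptions: one hidden layer, sigmoid
activations, and hand-picked hypothetical weights. The paper does not describe the
architecture it used, and training by backpropagation is not shown.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum, then a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

def forward(x, layers):
    """Pass the input through each layer in turn and return the final activations."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, -0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 2.0], layers)
print(output)
```

The single output can be read as a score between 0 and 1, which a classifier would compare
against a threshold such as 0.5.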
3.7. Logistic Regression
Logistic Regression is a popular statistical method used for binary classification problems,
where the goal is to predict one of two possible outcomes. Despite its name, logistic
regression is actually a classification algorithm, not a regression algorithm. It models the
probability that a given input belongs to a certain class. The accuracy score achieved using
Logistic Regression is: 85.25 %.
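
The way logistic regression models a class probability can be sketched in a few lines. The
example assumes plain stochastic gradient descent on the log-loss with a fixed learning rate;
the function names, hyperparameters and one-feature toy data are hypothetical and do not
reflect the settings used in this study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability that x belongs to class 1: sigmoid of the linear score."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights by stochastic gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for row, target in zip(X, y):
            error = predict_proba(w, b, row) - target  # gradient w.r.t. the linear score
            w = [wi - lr * error * xi for wi, xi in zip(w, row)]
            b -= lr * error
    return w, b

# Hypothetical one-feature data: low values are class 0, high values are class 1.
X = [[0.0], [1.0], [3.0], [4.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.0]))
print(predict_proba(w, b, [4.0]))
```

A sample is assigned to class 1 when its predicted probability exceeds a chosen threshold,
commonly 0.5.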
The accuracy of the models built with the different algorithms on our dataset, in which
45.54% of patients have no heart problems and 54.46% have heart problems, is shown in the
accompanying figure, which gives the accuracy score obtained when predicting heart disease
with each algorithm.
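
For reference, the accuracy scores quoted in Section 3 can be tabulated and compared with a
simple baseline. The sketch below assumes the usual definition of accuracy (correct
predictions divided by total predictions) and uses the class balance stated above as the
baseline, that is, the score of a model that always predicts "heart problem". No XGBoost
score is listed because the text does not report one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Accuracy scores (%) as reported in Section 3 of this paper.
reported = {
    "Naive Bayes": 85.25,
    "Support Vector Machine": 81.97,
    "K-Nearest Neighbour": 67.21,
    "Decision Tree": 81.97,
    "Random Forest": 95.08,
    "Neural Network": 80.33,
    "Logistic Regression": 85.25,
}

# Share of patients with heart problems, i.e. the majority-class baseline (%).
majority_baseline = 54.46

ranked = sorted(reported.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name:<24}{score:6.2f}  (+{score - majority_baseline:.2f} over baseline)")
```

Every reported score is above the baseline, with Random Forest highest and K-Nearest
Neighbour lowest.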
4. Conclusion
Based on the above review, it can be concluded that there is considerable scope for machine
learning algorithms in predicting cardiovascular or heart-related diseases. Each of the
above-mentioned algorithms has performed extremely well in some cases but poorly in others.
Alternating decision trees, when used with PCA, have performed extremely well, but decision
trees have performed very poorly in some other cases, which could be due to overfitting.
Random Forest has performed very well because it mitigates overfitting by combining multiple
decision trees, and Logistic Regression has also performed very well. Models based on the
Naïve Bayes classifier were computationally very fast and have also performed well. SVM
performed extremely well in most cases.
Systems based on machine learning algorithms and techniques have been very accurate in
predicting heart-related diseases, but there is still a lot of scope for research on how to
handle high-dimensional data and overfitting.

