Title: Heart Disease Prediction Using Different Machine Learning Algorithm
Created by: Prabal Tripathi, Aniket Patel, Pritam Bunker, Vikram Singh Parihar
Abstract: Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a
huge number of deaths worldwide over the last few decades and have emerged as the most
life-threatening diseases, not only in India but across the whole world.
So, there is a need for a reliable, accurate and feasible system to diagnose such diseases in time for
proper treatment. Machine Learning algorithms and techniques have been applied to various
medical datasets to automate the analysis of large and complex data.
Many researchers, in recent times, have been using several machine learning techniques to help
the health care industry and the professionals in the diagnosis of heart related diseases. This
paper presents a study of various models based on such algorithms and techniques and analyzes
their performance.
1. Introduction
The heart is an important organ of the human body. It pumps blood to every part of the body.
If it fails to function correctly, the brain and other organs stop working, and the person
dies within a few minutes. Changes in lifestyle, work-related stress and poor eating habits
contribute to the rising rate of heart-related diseases.
Heart disease remains the leading cause of death worldwide. According to the World Health
Organization (WHO), cardiovascular diseases (CVDs) accounted for 17.9 million deaths in
2019, representing 32% of all global deaths. Most of these deaths were due to heart attacks
and strokes. In India, heart diseases continue to be a major health concern, accounting for
over 25% of all deaths. The country faces a rising burden of CVD, with non-communicable
diseases such as heart conditions causing 60% of total adult deaths.
The economic impact of heart disease is equally significant. Globally, CVD cost $863 billion
in 2010, with projections suggesting this figure could rise to over $1 trillion by 2030. In India,
WHO estimates reveal that between 2005 and 2015, the nation may have lost around $237
billion due to heart-related illness.
Medical organisations all around the world collect data on various health-related issues.
These data can be exploited using machine learning techniques to gain useful insights.
However, the collected data is massive and often very noisy. Such datasets, which are too
overwhelming for the human mind to comprehend, can be explored easily using machine
learning techniques. As a result, these algorithms have recently become very useful for
accurately predicting the presence or absence of heart-related diseases.
2. Methodology
This paper analyzes various machine learning algorithms: K Nearest Neighbours (KNN),
Logistic Regression, Random Forest classifiers, Naive Bayes, Support Vector Machine,
Neural Network, XGBoost and Decision Tree. These algorithms can help practitioners and
medical analysts diagnose heart disease accurately.
This work includes examining journals, published papers and recent data on
cardiovascular disease. The methodology provides a framework for the proposed
model. It is a process whose steps transform the given data into
recognized data patterns for the knowledge of the users.
The proposed methodology (Figure 1) consists of several steps. The first step is the
collection of the data, the second stage extracts significant values, and the third is the
preprocessing stage, where we explore the data.
Data preprocessing deals with missing values, cleaning of data and normalization,
depending on the algorithms used. After preprocessing, a classifier is used to classify the
preprocessed data. The classifiers used in the proposed model are K Nearest Neighbours
(KNN), Logistic Regression, Random Forest, Naive Bayes, Support Vector Machine,
Neural Network, XGBoost and Decision Tree. Finally, the proposed model is evaluated
on the basis of accuracy and performance using various performance metrics. The outcome
of this model is an effective heart disease prediction system.
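The preprocessing, classification and evaluation steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the toy records, the column meanings (age, resting blood pressure, cholesterol), the choice of mean imputation and min-max scaling, and all function names are assumptions made for illustration, and KNN stands in for any of the listed classifiers.

```python
from collections import Counter


def impute_missing(rows):
    """Replace None entries in each column with the mean of that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_normalize(rows):
    """Rescale every column to the range [0, 1]."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
         for j in range(n_cols)]
        for r in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to the query."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy records: (age, resting blood pressure, cholesterol).
# None marks a missing value; label 1 = disease present, 0 = absent.
records = [
    [63, 145, 233], [67, 160, 286], [62, 140, None],
    [37, 120, 215], [41, None, 204], [44, 120, 220],
    [60, 150, 270], [39, 118, 210],
]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

# For brevity the statistics are computed on the whole toy table;
# a real study would fit them on the training split only.
features = min_max_normalize(impute_missing(records))
train_x, train_y = features[:6], labels[:6]
test_x, test_y = features[6:], labels[6:]

predictions = [knn_predict(train_x, train_y, q) for q in test_x]
print("accuracy:", accuracy(test_y, predictions))
```

Normalization matters for distance-based classifiers such as KNN because otherwise a column with a large numeric range (cholesterol here) would dominate the distance.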
Figure 1: Proposed methodology
3. Algorithms and Techniques Used
3.1. Naïve Bayes
Naive Bayes is a simple but effective classification technique based on Bayes'
theorem. Naive Bayes achieved an accuracy of 85.25% when the 14 most significant
features of the Cleveland dataset were used.
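To make the idea concrete, the sketch below applies Bayes' theorem with the "naive" assumption that features are independent given the class. The Gaussian likelihood, the two toy features (age, maximum heart rate), the sample values and the function names are all assumptions for illustration; the text does not state which Naive Bayes variant produced the reported accuracy.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_posterior(model, row):
    """Unnormalized log P(class | row): log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def nb_predict(model, row):
    """Pick the class with the highest posterior score."""
    scores = log_posterior(model, row)
    return max(scores, key=scores.get)


# Hypothetical toy data: (age, maximum heart rate); 1 = disease, 0 = no disease.
X = [[63, 120], [67, 108], [60, 130], [37, 187], [41, 172], [44, 178]]
y = [1, 1, 1, 0, 0, 0]

nb_model = fit_gaussian_nb(X, y)
print(nb_predict(nb_model, [65, 115]), nb_predict(nb_model, [40, 180]))
```

Working with log probabilities avoids numerical underflow when many per-feature likelihoods are multiplied together.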
3.2. Support Vector Machine
Support Vector Machine is an extremely popular supervised machine learning technique
(having a pre-defined target variable) which can be used as a classifier as well as a predictor.
For classification, it finds a hyper-plane in the feature space that differentiates between the
classes. The accuracy score achieved using Support Vector Machine is 81.97%.
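The way a hyperplane separates two classes can be sketched as follows. This is a simplified linear soft-margin SVM trained by full-batch subgradient descent on the hinge loss, chosen here only because it fits in a few lines; it is an assumption for illustration, not the solver or kernel used to obtain the accuracy reported above. The toy points, hyperparameters and function names are hypothetical.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=500):
    """Fit w and b by subgradient descent on the regularized hinge loss.

    Labels must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy points in a two-dimensional feature space.
points = [[2, 2], [3, 3], [3, 2], [0, 0], [1, 0], [0, 1]]
targets = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, targets)
print([svm_predict(w, b, p) for p in points])
```

The learned pair (w, b) defines the hyperplane; a new record is classified purely by the sign of w.x + b.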
3.3. Decision Tree
The Decision Tree algorithm divides the population into two or more similar sets based on
the most significant predictors. It first calculates the entropy of each
attribute.
The dataset is then split on the variable or predictor with the maximum
information gain, that is, the minimum entropy. These two steps are performed recursively
with the remaining attributes.
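The entropy and information-gain calculation described above can be sketched for a single split as follows. The categorical attributes ("chest_pain", "high_bp"), the toy rows and the function names are hypothetical and serve only to show the arithmetic; a full tree would repeat the selection recursively on each resulting subset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on an attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Choose the attribute whose split yields the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records; outcome 1 = disease present, 0 = absent.
rows = [
    {"chest_pain": "typical", "high_bp": "yes"},
    {"chest_pain": "typical", "high_bp": "no"},
    {"chest_pain": "typical", "high_bp": "yes"},
    {"chest_pain": "atypical", "high_bp": "yes"},
    {"chest_pain": "atypical", "high_bp": "no"},
    {"chest_pain": "atypical", "high_bp": "no"},
]
outcomes = [1, 1, 1, 0, 0, 0]

for name in ("chest_pain", "high_bp"):
    print(name, round(information_gain(rows, outcomes, name), 4))
print("split on:", best_attribute(rows, outcomes, ["chest_pain", "high_bp"]))
```

In this toy table, splitting on "chest_pain" leaves both subsets pure, so it has the higher information gain and would be chosen as the root of the tree.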