Predicting Heart Failure Using ML Algorithms
A Report submitted in partial fulfilment of the requirements for the degree of
Master of Business Administration
Submitted by,
REGISTER NUMBER
2028151
JANUARY 2022
Predicting Heart Failure using ML Algorithms and Comparing the
Classifiers in terms of Accuracy
Problem Statement
Heart failure due to cardiovascular diseases (CVDs) is the leading cause of death
globally. Conventional techniques for predicting heart failure are not sufficiently accurate,
and they demand considerable time, cost and technical expertise. Data mining and ML
techniques can help overcome these difficulties. However, it is important to determine
which attributes and which classifier to use when applying ML techniques to prediction.
The problem addressed here is to build machine learning models for heart failure
prediction using R Studio and to compare the classifiers used for modelling in order to
find the best one in terms of accuracy and reliability.
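The comparison step can be illustrated with a minimal sketch. The report's models were built in R Studio; the Python below is only an illustration of the idea, and the classifier names, labels and helper functions (`accuracy`, `rank_by_accuracy`) are hypothetical and not taken from the report. Accuracy is taken here to mean the fraction of cases in which the predicted label matches the actual label.

```python
from typing import Dict, List, Tuple


def accuracy(actual: List[str], predicted: List[str]) -> float:
    """Fraction of cases where the predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def rank_by_accuracy(
    actual: List[str], predictions: Dict[str, List[str]]
) -> List[Tuple[str, float]]:
    """Score every classifier and sort from most to least accurate."""
    scores = {name: accuracy(actual, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up labels for illustration only; these are NOT results from the report.
actual = ["Y", "N", "N", "Y", "N", "Y", "N", "N"]
predictions = {
    "classifier_a": ["Y", "N", "N", "N", "N", "Y", "N", "Y"],  # hypothetical model
    "classifier_b": ["Y", "N", "N", "Y", "N", "Y", "N", "Y"],  # hypothetical model
}

ranking = rank_by_accuracy(actual, predictions)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Accuracy alone can hide differences between classifiers when the two classes are unevenly represented, which is one reason the problem statement pairs it with reliability.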
Objective
The goal of predicting heart failure is to avoid severe episodes of heart disease through
preventive therapy. Predicting heart failure from a minimal number of attributes would
be valuable to the healthcare industry in saving lives. Experts have already used machine
learning techniques to predict the early signs of heart failure, but the classifiers used
must be highly accurate and reliable. This work compares the classifiers currently
available for heart failure prediction to find the one with the highest accuracy. This will
help the healthcare industry select the best of the existing machine learning
algorithms for cardiovascular disease prediction.
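The work is organised into the following six phases, which correspond to the CRISP-DM
process model:
1. Business understanding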
2. Data understanding
3. Data preparation
4. Modelling
5. Evaluation
6. Deployment
Business Understanding
The project aims to predict the likelihood of heart failure for a person by building
machine learning models. This is a binary classification problem, since the target has
only two possible values, Y and N; the model output is therefore categorical. Based on
the model's prediction, the person can take appropriate action for their health. From a
business perspective, this would enable the healthcare industry to identify the best
algorithm for heart failure prediction among various classifiers in terms of performance
parameters.
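As a small illustration of what the binary, categorical output means in practice, the sketch below encodes the Y/N target as 1/0 and converts a predicted probability back into a Y/N label. It is written in Python for illustration only (the report's models were built in R Studio). The 0.5 cut-off and the helper names `encode_target` and `to_label` are assumptions and do not come from the report.

```python
from typing import List

POSITIVE, NEGATIVE = "Y", "N"  # the two target values named in the report


def encode_target(labels: List[str]) -> List[int]:
    """Map Y/N labels to 1/0 so that a classifier can be trained on them."""
    mapping = {POSITIVE: 1, NEGATIVE: 0}
    try:
        return [mapping[label] for label in labels]
    except KeyError as exc:
        raise ValueError(f"unexpected label: {exc.args[0]!r}") from None


def to_label(probability: float, threshold: float = 0.5) -> str:
    """Turn a predicted probability of heart failure into a Y/N label.

    The default threshold of 0.5 is an assumption made for this sketch.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return POSITIVE if probability >= threshold else NEGATIVE


# Made-up values for illustration only.
encoded = encode_target(["Y", "N", "N", "Y"])
labels = [to_label(p) for p in (0.91, 0.12, 0.50, 0.49)]
print(encoded)
print(labels)
```

In a screening setting, a lower threshold would flag more people as Y at the cost of more false alarms, so the cut-off is a design choice rather than a fixed value.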
Data Understanding
The dataset comes from an IBM repository for heart failure prediction. It has ten columns and
10,800 rows, with both categorical and numerical variables, including the target variable
HEARTFAILURE. The columns in this dataset are well suited to predicting heart disease. Various
features leading to heart failure, including alterable and unalterable risk factors, are used as the
independent variables, and the dependent variable is whether or not the person has heart failure.
The following are the columns used in the dataset:
For the machine learning models built using R Studio, the following steps were taken to prepare
the data: