HEART DISEASE PREDICTION USING
MACHINE LEARNING
DEPARTMENT : COMPUTER SCIENCE AND ENGINEERING
GUIDE : Dr. D. MAGDALENE DELIGHTA ANGELINE
BATCH : 03
NAMES : P. PRASHANTH - 18J25A0583
: MD. ASIF - 20J25A0505
: P. VENKATESH - 20J25A0507
: S. VAMSHI - 20J25A0508
ABSTRACT :
Heart disease prediction is one of the most complicated tasks in the
medical field. In the modern era, approximately one person dies per minute
due to heart disease. Data science plays a crucial role in processing the huge
amounts of data generated in healthcare. Because heart disease prediction is a
complex task, there is a need to automate the prediction process to avoid
the risks associated with it and to alert the patient well in advance. This paper
makes use of the heart disease dataset available in the UCI Machine Learning
Repository. The proposed work predicts the chances of heart disease and
classifies the patient’s risk level by implementing different data mining
techniques such as Naive Bayes, Decision Tree, Logistic Regression and
Random Forest. Thus, this paper presents a comparative study by analysing
the performance of different machine learning algorithms.
INTRODUCTION :
According to the World Health Organization, 12 million deaths occur
worldwide every year due to heart disease.
Prediction of cardiovascular disease is regarded as one of the most
important subjects in the field of data analysis.
The burden of cardiovascular disease has been increasing rapidly all over
the world over the past few years.
Heart disease is often described as a silent killer because it can lead to
the death of a person without obvious symptoms.
DOMAIN INTRODUCTION :
Machine learning (ML) is a field of inquiry devoted to understanding and
building methods that ‘learn’, that is, methods that leverage data to
improve performance on some set of tasks.
It is seen as a part of artificial intelligence. Machine learning algorithms
build a model based on sample data, known as training data, in order to
make predictions or decisions without being explicitly programmed to do
so.
Machine learning algorithms are used in a wide variety of
applications, such as in medicine, email filtering, speech recognition,
and computer vision, where it is difficult or unfeasible to develop
conventional algorithms to perform the needed tasks.
EXISTING SYSTEM :
In this system, the input details are obtained from the patient. Heart
disease is then analyzed from the user inputs using ML techniques.
The main methods used for prediction are Random Forest, Decision
Tree and Naive Bayes.
This system uses 13 medical attributes as input, and the data mining
techniques are applied to the resulting dataset.
The main prediction task is carried out using these three techniques.
DISADVANTAGES OF EXISTING SYSTEM :
Cardiovascular disease predictions are not accurate.
The data mining techniques do not help to provide effective decision
making.
The system cannot handle enormous datasets of patient records.
PROPOSED SYSTEM :
After evaluating the results from the existing methodologies, we have
used Python and pandas operations to perform heart disease
classification on the data obtained from the UCI repository.
It provides an easy-to-use visual representation of the dataset and a
working environment for building the predictive analytics.
The ML process starts with a data preprocessing phase, followed by
feature selection based on data cleaning, classification modelling and
performance evaluation.
The Random Forest technique is used to improve the accuracy of the
result.
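As an illustration of the preprocessing phase, the sketch below parses a few CSV rows and fills missing values with the column median. It uses only the Python standard library so that it runs anywhere. The sample rows, the column names and the use of "?" as the missing-value marker are assumptions made for the example, not the project's actual data or code.

```python
import csv
import io
import statistics

# Hypothetical sample rows; "?" is assumed to mark a missing value.
RAW = """age,chol,thalach,target
63,233,150,1
41,?,172,0
56,236,?,0
57,354,163,1
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting "?" to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value == "?" else float(value)) for key, value in record.items()}
        for record in reader
    ]

def impute_median(rows, columns):
    """Replace missing entries in each column with that column's median."""
    for col in columns:
        known = [r[col] for r in rows if r[col] is not None]
        median = statistics.median(known)
        for r in rows:
            if r[col] is None:
                r[col] = median
    return rows

rows = impute_median(load_rows(RAW), ["age", "chol", "thalach"])
print(rows[1]["chol"], rows[2]["thalach"])
```

Median imputation is only one possible cleaning choice. Dropping incomplete rows or using the mean are equally common alternatives.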
ADVANTAGES OF PROPOSED SYSTEM :
Increased accuracy for effective heart disease diagnosis.
Handles enormous amounts of data using the Random Forest
algorithm and feature selection.
Reduces the time doctors need to reach a diagnosis.
Cost effective for patients.
SYSTEM REQUIREMENTS :
Software Requirements:
MS Windows 11
PyCharm IDE
Hardware Requirements:
Hard Disk: Greater than 15GB
RAM: Greater than 1 GB
Processor: Core 2 Duo and Above
ALGORITHM
Random forest
Logistic regression
Naive Bayes
Decision tree
DECISION TREE
Decision Tree is a Supervised learning technique that can be
used for both classification and Regression problems, but
mostly it is preferred for solving Classification problems. It is a
tree-structured classifier, where internal nodes represent the
features of a dataset, branches represent the decision rules
and each leaf node represents the outcome.
In a Decision tree, there are two types of nodes, which are the
Decision Node and the Leaf Node. Decision nodes are used to
make a decision and have multiple branches, whereas Leaf
nodes are the output of those decisions and do not contain
any further branches.
It is called a decision tree because, similar to a tree, it starts
with the root node, which expands on further branches and
constructs a tree-like structure.
A decision tree simply asks a question, and based on the
answer (Yes/No), it further splits the tree into subtrees.
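One common way to choose the question asked at a decision node is to pick the split that minimises Gini impurity. The sketch below shows this for a single numeric feature. The feature values and labels are hypothetical toy data, and the choice of Gini impurity as the criterion is an assumption made for illustration. Entropy is another widely used option.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold?' question."""
    best = (None, float("inf"))
    # The largest value is skipped: splitting there would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: a blood pressure reading and a 0/1 disease label.
pressure = [110, 120, 125, 140, 150, 160]
disease = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(pressure, disease)
print(threshold, impurity)
```

A full decision tree repeats this search recursively on the left and right subsets until the leaves are pure or a depth limit is reached.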
Why use Decision Trees?
Decision Trees usually mimic human thinking ability while making a
decision, so they are easy to understand.
The logic behind the decision tree can be easily understood because it
shows a tree-like structure.
RANDOM FOREST
Random Forest is a popular machine learning algorithm that
belongs to the supervised learning technique. It can be used for
both Classification and Regression problems in ML. It is based on
the concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the
performance of the model.
Random Forest is a classifier that builds a number of decision
trees on various subsets of the given dataset and combines their outputs
to improve predictive accuracy.
Instead of relying on one decision tree, the random forest takes the
prediction from each tree and predicts the final output based on the
majority vote of those predictions.
A greater number of trees in the forest generally leads to higher accuracy
and reduces the risk of overfitting.
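The majority-vote idea can be sketched with very small trees. In the example below, each "tree" is a one-question stump trained on a bootstrap sample (rows drawn with replacement), and the forest returns the class chosen by most stumps. This is a deliberately simplified illustration on hypothetical data. A real random forest grows deeper trees and also samples a random subset of features at each split.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-question tree: pick the threshold with the fewest training errors.

    The stump predicts class 1 when the feature value exceeds the threshold.
    """
    best = None
    for t in sorted({x for x, _ in sample}):
        errors = sum((x > t) != bool(y) for x, y in sample)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

def train_forest(data, n_trees, seed=0):
    """Train each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    """Each stump votes; the class with the most votes is the final output."""
    votes = Counter(int(x > t) for t in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature value, 0/1 disease label).
data = [(110, 0), (120, 0), (125, 0), (140, 1), (150, 1), (160, 1)]
forest = train_forest(data, n_trees=25)  # odd count avoids tied votes
print(predict(forest, 100), predict(forest, 170))
```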
Why use Random Forest?
1. It takes less training time than many other algorithms.
2. It predicts output with high accuracy and runs efficiently even on
large datasets.
3. It can maintain accuracy even when a large proportion of the data is
missing.
LOGISTIC REGRESSION
Logistic regression is one of the most popular Machine Learning
algorithms, which comes under the Supervised Learning technique. It
is used for predicting the categorical dependent variable using a
given set of independent variables.
Logistic regression predicts the output of a categorical dependent
variable. Therefore the outcome must be a categorical or discrete
value. It can be either Yes or No, 0 or 1, True or False, etc. But instead
of giving the exact value as 0 or 1, it gives probabilistic values
which lie between 0 and 1.
Logistic Regression is very similar to Linear Regression except
in how they are used. Linear Regression is used for solving
regression problems, whereas Logistic Regression is used for solving
classification problems.
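The step from a linear score to a probability between 0 and 1 is done by the sigmoid function. The sketch below shows that step and the final thresholding into a class. The weights, bias and patient values are hypothetical numbers chosen for illustration, not coefficients fitted to the heart disease data.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one patient."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 decision using a cut-off."""
    return int(predict_proba(features, weights, bias) >= threshold)

# Hypothetical model parameters and (already scaled) patient features.
weights = [0.8, -0.5]
bias = -0.2
patient = [1.5, 0.4]

p = predict_proba(patient, weights, bias)
print(round(p, 3), predict_class(patient, weights, bias))
```

The 0.5 cut-off is a convention. In a medical setting a lower threshold may be preferred so that fewer true cases are missed.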
Logistic regression diagram
NAIVE BAYES
Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes theorem and used for solving classification problems.
It is mainly used in text classification, which involves a
high-dimensional training dataset.
Popular applications of the Naïve Bayes algorithm include spam
filtering, sentiment analysis, and article classification.
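The formula for Bayes’ theorem is given as:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
where P(A|B) is the posterior probability of hypothesis A given evidence B, P(B|A) is the likelihood of the evidence given the hypothesis, P(A) is the prior probability of the hypothesis, and P(B) is the marginal probability of the evidence.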
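As a small worked example of Bayes' theorem in a diagnostic setting, the sketch below computes the probability that a patient has a disease given a positive test result. The prior, the detection rate and the false positive rate are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(disease | positive) = P(positive | disease) * P(disease) / P(positive)."""
    # P(positive) sums over both ways of getting a positive result.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical values: 10% of patients have the disease, the test detects
# 90% of true cases, and it wrongly flags 20% of healthy patients.
p = posterior(prior=0.10, likelihood=0.90, false_positive_rate=0.20)
print(round(p, 3))
```

Naïve Bayes applies the same rule to many features at once, under the simplifying assumption that the features are independent of each other given the class.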
MODULES
DATA PRE-PROCESSING
DATA TRAINING
DATA TESTING
DATA PREDICTION
PERFORMANCE ANALYSIS
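The testing and performance analysis modules can be illustrated with a minimal sketch: the data is split into training and test portions, and the predictions on the test portion are scored. The function names, the 25% test fraction and the example labels are assumptions made for illustration, not the project's actual code.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def evaluate(actual, predicted):
    """Accuracy, precision and recall for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)  # disease correctly found
    tn = sum(a == 0 and p == 0 for a, p in pairs)  # healthy correctly cleared
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # healthy wrongly flagged
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # disease missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

train, test = train_test_split(list(range(20)))
metrics = evaluate(actual=[1, 0, 1, 1, 0, 0], predicted=[1, 0, 0, 1, 1, 0])
print(len(train), len(test), metrics)
```

Recall is worth reporting alongside accuracy in this setting, since a missed case of heart disease is usually more costly than a false alarm.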
THANK YOU