A PROJECT REPORT
Submitted by
Shivank [Reg. No.: RA201100301386]
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
Certified that the 18CSP109L / 18CSP111L project report titled “EARLY HEART DISEASE
…” is the bonafide work of Shivank [Reg. No.: RA201100301386], who carried out the project
work under my supervision. Certified further that, to the best of my knowledge, the work
reported herein does not form part of any other thesis or dissertation on the basis of which a
degree or award was conferred on an earlier occasion on this or any other candidate.
Dr. M. PUSHPALATHA
HEAD OF THE DEPARTMENT
Department of Computing Technologies
I/We hereby certify that this assessment complies with the University’s Rules and
Regulations relating to Academic misconduct and plagiarism, as listed on the University
Website, Regulations, and the Education Committee guidelines.
I / We confirm that all the work contained in this assessment is our own except where
indicated, and that we have met the following conditions:
▪ Clearly referenced / listed all sources as appropriate
▪ Referenced and put in inverted commas all quoted text (from books, web, etc.)
▪ Given the sources of all pictures, data, etc. that are not my own
▪ Not made any use of the report(s) or essay(s) of any other student(s), either past or present
▪ Acknowledged in appropriate places any help that I have received from others (e.g., fellow students, technicians, statisticians, external sources)
▪ Complied with any other plagiarism criteria specified in the Course handbook / University website
I understand that any false claim for this work will be penalized in accordance with the
University policies and regulations.
DECLARATION:
I am aware of and understand the University’s policy on Academic misconduct and
plagiarism, and I certify that this assessment is my / our own work, except where
indicated by referencing, and that I have followed the good academic practices noted above.
Student 1 Signature:
Student 2 Signature:
Date:
If you are working in a group, please write your registration numbers and sign with the
date for every student in your group.
ACKNOWLEDGEMENT
Institute of Science and Technology, for the facilities extended for the project work and his
continued support.
We extend our sincere thanks to Dean-CET, SRM Institute of Science and Technology, Dr. T.
Computing, SRM Institute of Science and Technology, for her support throughout the project
work.
We are incredibly grateful to our Head of the Department, Dr. M. Pushpalatha, Professor,
Department of Computing Technologies, SRM Institute of Science and Technology, for her
support.
We want to convey our thanks to our Project Coordinators, Dr. A. Anbarasi, Dr. T. K.
Sivakumar and Dr. P. Saravanan; Panel Head, Dr. Jeyasekar A, Associate Professor; and
Panel Members, Dr. M. Vijayalakshmi, Assistant Professor, and Mrs. A. Mariya Nancy,
Assistant Professor, Department of Computing Technologies, SRM Institute of Science and
Technology, for their inputs during the project.
We register our immeasurable thanks to our Faculty Advisor, Dr. Jagadeesan, Assistant
Professor, Department of Computing Technologies, SRM Institute of Science and Technology,
for the constant support and guidance.
Our inexpressible respect and thanks to our guide, Dr. A. Anbarasi, Associate Professor,
Department of Computing Technologies, SRM Institute of Science and Technology, for
providing us with an opportunity to pursue our project under her mentorship. She provided us
with the freedom and support to explore the research topics of our interest. Her passion for
solving problems and making a difference in the world has always been inspiring.
We sincerely thank all the staff and students of the Department of Computing Technologies,
School of Computing, SRM Institute of Science and Technology, for their help during our project.
Finally, we would like to thank our parents, family members, and friends for their support
and encouragement.
ABSTRACT
The incidence of heart disease is steadily on the rise, and it is of utmost importance
to be able to predict such diseases in advance. The process of diagnosis is quite
challenging, requiring precision and efficiency. The primary focus of this research
paper is to determine which patients are at a higher risk of developing heart disease
based on various medical attributes. To accomplish this, we have developed a heart
disease prediction system that utilizes a patient's medical history. We employed
various machine learning algorithms, including logistic regression and K-nearest
neighbors (KNN), to predict and classify patients with heart disease. An effective
approach was adopted to fine-tune the model and enhance the accuracy of heart
attack prediction in individuals.
The results demonstrated the model's robust performance, particularly when using
KNN and Logistic Regression, showcasing improved accuracy compared to
previously employed classifiers such as naïve Bayes. This model has significantly
alleviated the pressure associated with accurately identifying heart disease, offering
a valuable tool for assessing the probability of heart disease in individuals. The
heart disease prediction system presented here not only enhances medical care but
also reduces associated costs. This project equips us with valuable insights that can
aid in the early detection of heart disease, and the implementation is available in the
.ipynb (Jupyter notebook) format.
INTRODUCTION
Fortunately, with the passage of time, substantial research data and hospital patient
records have become available. Open sources offer access to patient records, facilitating
research to leverage computer technologies for accurate disease diagnosis and
prevention. Machine learning and artificial intelligence are now widely acknowledged
for their substantial contributions to the medical field. These technologies enable the
development of various models for diagnosing and predicting heart disease. They also
permit in-depth analysis of complete genomic data, prediction of pandemics, and improved
understanding of medical records for more accurate predictions.
Numerous studies have explored machine learning models for heart disease
classification and prediction. Notable examples include a classifier by Melillo et al.,
which detected congestive heart failure with a 93.3% sensitivity and 63.5% specificity
using the CART algorithm. Rahhal et al. improved performance using deep neural
networks and electrocardiogram data. Guidi et al. introduced a clinical decision
support system, comparing various machine learning and deep learning models to
achieve an 87.6% accuracy with random forest and CART algorithms, outperforming
other classifiers.
The data used in this work was sourced from Kaggle. Data collection is the systematic
process of gathering and measuring information from diverse sources so that it can be
used effectively.
METHODOLOGY
The dataset utilized for the research comprises the Public Health Dataset from 1988,
encompassing four distinct databases: Cleveland, Hungary, Switzerland, and Long
Beach V. This dataset contains a total of 76 attributes, including the predictive
attribute. However, all published experiments primarily focus on a subset of 14 specific
attributes. The "target" field within this dataset pertains to the presence of heart
disease in the patient and is represented by integer values, where 0 signifies the
absence of disease and 1 denotes the presence of disease.
Here's a description of the attributes used in this research and their respective
significance:
1. Age: The patient’s age in years.
2. Sex: The patient’s sex, where “1” denotes male and “0” denotes female.
3. Cp (Chest Pain Type): This attribute characterizes the type of chest pain experienced.
4. Trestbps (Resting Blood Pressure): The patient’s resting blood pressure, measured in
mm Hg on admission to the hospital.
5. Chol (Serum Cholesterol): This attribute reflects the serum cholesterol level in mg/dL;
a desirable total cholesterol level is below 200 mg/dL.
6. Fbs (Fasting Blood Sugar): It categorizes fasting blood sugar levels, where values
larger than 120 mg/dL are labeled as "1" for true. Normal values are below 100 mg/dL.
7. Restecg (Resting Electrocardiographic Results): This attribute captures the results of
resting electrocardiography.
8. Thalach (Maximum Heart Rate Achieved): It records the maximum heart rate
achieved, which can be roughly estimated as 220 minus the patient's age.
9. Exang (Exercise-Induced Angina): This attribute is marked as “1” for yes; angina is a
type of chest pain caused by reduced blood flow to the heart.
10. Oldpeak: ST depression induced by exercise relative to rest.
11. Slope (Slope of Peak Exercise ST Segment): This attribute describes the slope of the
peak exercise ST segment.
12. Ca (Number of Major Vessels): It quantifies the number of major vessels, ranging
from 0 to 3, colored by fluoroscopy.
13. Thal: Although not explicitly explained, this attribute is likely associated with
thalassemia and contains values such as "3" for normal, "6" for fixed defects, and "7"
for reversible defects.
14. Target (T): This attribute indicates the patient's disease status, where "0" signifies
the absence of disease, and "1" denotes the presence of angiographic disease.
Machine Learning
Supervised Learning
The dataset used for this analysis is the Heart Disease Dataset, a
combination of four different databases, although only the UCI Cleveland
dataset was utilized. This dataset contains a total of 76 attributes, but all
published experiments focus on a subset of only 14 features. Consequently,
we utilized the pre-processed UCI Cleveland dataset available on Kaggle for
our analysis. A detailed description of the 14 attributes employed in our work is provided
in the attribute list in the Methodology section above.
The dataset does not contain any null values; however, it required proper handling of
outliers and of its uneven distribution. Two contrasting preprocessing settings were
examined: one without outlier handling or feature selection, which yielded unsatisfactory
results, and another using a normally distributed dataset to mitigate overfitting, combined
with Isolation Forest for outlier detection, which produced more promising outcomes.
Various visualization techniques were used to assess data skewness, detect outliers, and
evaluate data distribution. These preprocessing techniques play a crucial role when the
data is applied for classification or prediction purposes.
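As an illustration of the outlier-handling step just described, the following minimal Python sketch removes outliers with scikit-learn's Isolation Forest; the file name heart.csv and the contamination rate are assumptions for illustration, not values from the report.

import pandas as pd
from sklearn.ensemble import IsolationForest

# Load the pre-processed Kaggle dataset (hypothetical file name).
df = pd.read_csv("heart.csv")

# Fit an Isolation Forest on the feature columns and keep only the inliers.
features = df.drop(columns=["target"])
iso = IsolationForest(contamination=0.05, random_state=42)  # assumed contamination rate
mask = iso.fit_predict(features) == 1  # 1 = inlier, -1 = outlier
df_clean = df[mask]
print(f"Removed {len(df) - len(df_clean)} outlier rows out of {len(df)}")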
To check the attribute values and determine the skewness of the data (the asymmetry of a
distribution), distribution plots were drawn for each attribute so that the data could be
interpreted at a glance. The distributions of age and sex, chest pain and trestbps,
cholesterol and fasting blood sugar, resting ECG and thalach, exang and oldpeak, slope
and ca, and thal and target were all analyzed. These plots show that thal and fasting
blood sugar are not uniformly distributed and need to be handled; otherwise, they will
cause overfitting or underfitting of the data.
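A sketch of how such distribution plots and a skewness check might be produced with pandas and seaborn, reusing the df dataframe from the previous sketch:

import matplotlib.pyplot as plt
import seaborn as sns

# One distribution plot per attribute, arranged in a grid.
cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
        "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]
fig, axes = plt.subplots(7, 2, figsize=(10, 22))
for ax, col in zip(axes.ravel(), cols):
    sns.histplot(df[col], kde=True, ax=ax)
    ax.set_title(col)
plt.tight_layout()
plt.show()

# Numeric skewness per column (0 means symmetric).
print(df.skew())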
The proposed approach first analyzed the dataset thoroughly and then applied different
machine learning algorithms. From the linear models, Logistic Regression was used; for
the neighbor-based technique, the KNeighbors classifier was used; for the tree-based
technique, the Decision Tree classifier was used; and from the highly popular ensemble
methods, the Random Forest classifier was used. To check and handle the high
dimensionality of the data, a Support Vector Machine was used. Finally, the XGBoost
classifier, which combines ensemble methods with decision trees, was also applied, as
sketched below.
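A hedged sketch of that classifier line-up using scikit-learn and the xgboost package, reusing df_clean from the earlier sketch; the train/test split ratio and the default hyperparameters are assumptions.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier  # requires the xgboost package

X = df_clean.drop(columns=["target"])
y = df_clean["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)  # assumed split

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNeighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")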
The deep learning model can be summarized by the following procedure:
• Input: training dataset, testing dataset
• Repeat: add dense layers with dropout layers and ReLU activation functions
• Add a final dense layer with one output and a binary (sigmoid) activation function
• End repeat
• Output: L (the predicted label)
• End procedure
There are two ways a deep learning approach can be applied: using a sequential model or
using a functional deep learning approach. In this research the first one is used. A
sequential model with fully connected dense layers is used, together with flatten and
dropout layers to prevent overfitting. The results of the machine learning and deep
learning approaches are then compared, and the variations in learning, including
computational time and accuracy, can be analyzed in the figures further discussed in
the Results section.
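A minimal Keras sketch of the sequential model described above (128 ReLU neurons, dropout, a single sigmoid output, binary cross-entropy loss, and the Adam optimizer, as stated in the Results section); the dropout rate, epoch count, and batch size are assumptions.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, activation="relu"),   # 128 neurons as stated in the text
    layers.Dropout(0.2),                    # assumed dropout rate
    layers.Dense(1, activation="sigmoid"),  # single output for binary prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=100, batch_size=32, verbose=0)  # assumed training schedule
print(f"Best validation accuracy: {max(history.history['val_accuracy']):.3f}")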
For the evaluation process, the confusion matrix, accuracy score, precision, recall
(sensitivity), and F1 score are used. A confusion matrix is a table that compares
predicted values against true values. It has four parts: true positives (TP), where values
identified as positive are actually positive; false positives (FP), where values identified
as positive are actually negative; false negatives (FN), where values identified as
negative are actually positive; and true negatives (TN), where values identified as
negative are actually negative.
To check how well a model is performing, the accuracy score is used. It is defined as the
number of true positives plus true negatives divided by the total of all four counts:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
After accuracy there is specificity, the proportion of actual negative cases that were
classified as negative; thus, it measures how well a classifier identifies negative cases.
It is also known as the true negative rate:
Specificity = TN / (TN + FP)
Then there is sensitivity, the proportion of actual positive cases that were predicted as
positive (true positives); sensitivity is also termed recall. In other words, an unhealthy
person is predicted as unhealthy:
Sensitivity = TP / (TP + FN)
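These metrics can be computed directly from the confusion matrix, as in this sketch, which reuses the fitted models and test split from the earlier sketches:

from sklearn.metrics import confusion_matrix, f1_score

y_pred = models["KNeighbors"].predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)   # true negative rate
sensitivity = tp / (tp + fn)   # recall / true positive rate
print(f"Accuracy: {accuracy:.3f}, Specificity: {specificity:.3f}, "
      f"Sensitivity: {sensitivity:.3f}, F1: {f1_score(y_test, y_pred):.3f}")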
RESULTS
Three approaches were used to compare different machine learning algorithms and to see
what difference deep learning makes when applied to the data. In the first approach, the
dataset as acquired is used directly for classification. In the second approach, feature
selection is performed but there is no outlier detection, and the results achieved are
quite promising. In the third approach, the dataset is normalized with both outlier
handling and feature selection, and the results achieved are much better than with the
previous techniques; compared with accuracies reported in other research, our results are
quite promising.
Using the First Approach (without Doing Feature Selection and Outliers Detection):
As can be seen, the dataset is not normalized and there is no equal distribution of the
target class; this can further be seen when a correlation heatmap is plotted, where there
are many negative values. So, even if feature selection is done, outliers remain.
With the first approach, the accuracy achieved by Random Forest is 76.7%, Logistic
Regression 83.64%, KNeighbors 82.27%, Support Vector Machine 84.09%, Decision Tree 75.0%,
and XGBoost 70.0%. SVM has the highest accuracy here, achieved by using cross-validation
and grid search to find the best parameters, in other words, hyperparameter tuning (see
the sketch below). After machine learning, deep learning is applied using the sequential
model approach: the model uses 128 neurons with the ReLU activation function and, since
the output is a single-class binary prediction, a sigmoid activation function in the
output layer, with binary cross-entropy loss and the Adam gradient descent optimizer. The
accuracy achieved is 76.7%.
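A sketch of the cross-validation and grid search used for the SVM's hyperparameter tuning; the parameter grid shown is an assumption, not the report's exact grid.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {  # assumed candidate values
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", 0.01, 0.1],
    "kernel": ["rbf", "linear"],
}
grid = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")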
Using the Second Approach (Doing Feature Selection and No Outliers Detection):
After selecting the features and scaling the data, and because there are outliers, the
robust scaler (RobustScaler) is used; it is suited to datasets containing outliers (see
the sketch below). In the second approach, the accuracy achieved by Random Forest is 88%,
Logistic Regression 85.9%, KNeighbors 79.69%, Support Vector Machine 84.26%, Decision
Tree 76.35%, and XGBoost 71.1%. Here Random Forest is the clear winner, with a precision
of 88.4% and an F1 score of 86.5%. Deep learning is then applied with the same parameters
as before; the training accuracy achieved is 86.8% and the evaluation accuracy is 81.9%,
which is better than in the first approach.
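A sketch of that scaling step with scikit-learn's RobustScaler, which centres the data on the median and scales by the interquartile range and is therefore resistant to outliers:

from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics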
Using the Third Approach (by Doing Feature Selection and Also Outliers Detection):
In this approach, the dataset is normalized, feature selection is done, and the outliers
are handled using Isolation Forest. The correlation comparison can be seen in Figure 10.
The accuracy of Random Forest is 80.3%, Logistic Regression 83.31%, KNeighbors 84.86%,
Support Vector Machine 83.29%, Decision Tree 82.33%, and XGBoost 71.4%. Here the winner
is KNeighbors, with a precision of 77.7% and a specificity of 80%. A lot of tips and
tricks for selecting different algorithms are given by Garate-Escamila et al. [38]. Using
deep learning in the third approach, the accuracy achieved is 94.2%. So the maximum
accuracy achieved by a machine learning model is KNeighbors (84.86%) in the third
approach, while for deep learning the maximum accuracy achieved is 94.2%. The conclusion
can be drawn that, for this dataset, the deep learning algorithm achieved 94.2% accuracy,
which is greater than that of the machine learning models. We also compared with other
deep learning research: Ramprakash et al. [39] achieved 84% accuracy and Das et al. [33]
achieved 92.7% accuracy, so our algorithm produced greater accuracy and is more promising
than the other approaches. The comparison of the different ML and DL classifiers bears
this out.
• Machine learning makes it possible to build models that quickly analyze data and
deliver results. By leveraging historical and real-time data, machine learning helps
healthcare service providers make better decisions about a patient’s disease diagnosis.
• By analyzing the data we can predict the occurrence of the disease in our
project.
• This intelligent system for disease prediction plays a major role in controlling the
disease and maintaining the good health status of people by predicting accurate
disease risk.
• Machine learning algorithms can also help provide vital statistics, real-time data
and advanced analytics in terms of the patient’s disease, lab test results, blood
pressure, family history, clinical trial data, etc., to doctors.
• The goal is to predict the patient’s health from collective data, to detect
configurations that put the patient at risk and, in cases requiring emergency medical
assistance, to alert the appropriate medical staff to the patient’s situation.
• The results of the predictions, derived from the predictive models generated by
machine learning, will be presented through several distinct graphical interfaces
according to the datasets considered. We will then critically assess the scope of our
results.
Common heart conditions include heart arrhythmias, heart failure, and cardiomyopathy.
Eating a diet high in saturated fats, trans fat, and cholesterol has been linked to heart
disease and related conditions, such as atherosclerosis. Also, too much salt (sodium) in
the diet can raise blood pressure. Not getting enough physical activity can lead to heart
disease.
Coronary artery disease
• The usual cause is the build-up of plaque, which causes the coronary arteries to
narrow, limiting blood flow to the heart.
High blood pressure
• A condition in which the force of the blood against the artery walls is too high.
Cardiac arrest
• In cardiac arrest, the heart abruptly stops beating. Without prompt intervention,
it can result in the person's death.
Supervised Learning
In this study, an effective heart disease prediction system (EHDPS) was developed using a
neural network to predict the risk level of heart disease. The system uses 15 medical
parameters, such as age, sex, blood pressure, cholesterol, and obesity, for prediction.
Data insight: As mentioned, we work with the heart disease detection dataset and draw
inferences from the data to derive meaningful results.
EDA: Exploratory data analysis is the key step for obtaining meaningful results.
Feature engineering: After getting insights from the data, we alter the features so that
they can move forward to the model building phase.
Model building: In this phase, we build our machine learning model for heart disease
detection. A compact end-to-end sketch follows.
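The sketch below ties the four phases together; the file name, the one-hot-encoded columns, and the choice of classifier are assumptions for illustration.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("heart.csv")                  # data insight: load the dataset
print(df.describe())                           # EDA: summary statistics
X = pd.get_dummies(df.drop(columns=["target"]),
                   columns=["cp", "restecg", "slope", "thal"])  # feature engineering (assumed encoding)
y = df["target"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = RobustScaler()
clf = KNeighborsClassifier().fit(scaler.fit_transform(X_tr), y_tr)  # model building
print(f"Test accuracy: {clf.score(scaler.transform(X_te), y_te):.3f}")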
CONCLUSION
The conclusion we reached is that machine learning algorithms performed better in this
analysis. Many researchers have previously suggested using ML where the dataset is not
that large, which is borne out in this work. We proposed three methods, carried out a
comparative analysis, and achieved promising results. The measures used for comparison
are the confusion matrix, precision, specificity, sensitivity, and F1 score. For the 13
features in the dataset, the KNeighbors classifier performed best among the ML approaches
when data preprocessing was applied.
The computational time was also reduced, which is helpful when deploying a model. It was
also found that the dataset should be normalized; otherwise, the training model sometimes
becomes overfitted and the accuracy achieved is insufficient when the model is evaluated
on real-world data, which can differ drastically from the dataset on which the model was
trained. Statistical analysis is likewise important when a dataset is analyzed: it should
have a Gaussian distribution, and outlier detection also matters, for which the Isolation
Forest technique was used. The main difficulty encountered is that the sample size of the
dataset is not large; with a larger dataset, the results could improve considerably for
both deep learning and ML. The algorithm we applied in the ANN architecture increased the
accuracy, which we compared against the results of other researchers. The dataset size
can be increased, and deep learning with various other optimizations can then be used to
achieve more promising results. Machine learning with other optimization techniques can
also be used to further improve the evaluation results, and different ways of normalizing
the data can be tried and compared. Finally, ways could be found to integrate
heart-disease-trained ML and DL models with multimedia applications for the convenience
of patients and doctors.
REFERENCES
58381e0602d2.
https://webfocusinfocenter.informationbuilders.com/wfappent/TLs/TL_rstat/source/DecisionTree
https://towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d
https://www.investopedia.com/terms/n/neuralnetwork.asp
topics/heart-attack/angina-chest-pain.
M., Hetts, S., English, J., & Wilson, M. (2012, January). MR fluoroscopy in vascular and cardiac interventions (review). Retrieved March 14, 2020, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3275732/
“What Can I Do To Avoid A Heart Attack Or A Stroke?”. World Health Organization, 2020, https://www.who.int/features/qa/27/en/.