Final Report 2024 - Merged


NOIDA INSTITUTE OF ENGINEERING AND TECHNOLOGY, GREATER NOIDA DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

DECLARATION

We hereby declare that the work presented in this report entitled “HEART DISEASE
PREDICTION USING CLASSIFICATION ALGORITHM”, was carried out by us. We
have not submitted the matter embodied in this report for the award of any other degree or
diploma of any other University or Institute. We have given due credit to the original
authors/sources for all the words, ideas, diagrams, graphics, computer programs, experiments,
and results that are not our original contribution. We have used quotation marks to identify
verbatim sentences and given credit to the original authors/sources.

We affirm that no portion of our work is plagiarized, and the experiments and results reported
in the report are not manipulated. In the event of a complaint of plagiarism and the manipulation
of the experiments and results, we shall be fully responsible and answerable.

Name : Aditya Tandon


Roll Number : 2001330100019

Name : Abhishek Pratap Singh


Roll Number : 2001330100019

Name : Amit Rawat


Roll Number : 2001330100047

Name : Ashutosh Tyagi


Roll Number : 2001330100077


CERTIFICATE

Certified that Aditya Tandon (Roll No_1: 2001330100027), Abhishek Pratap Singh (Roll
No_2: 2001330100019), Amit Rawat (Roll No_3: 2001330100047), and Ashutosh Tyagi (Roll
No_4: 2001330100071) have carried out the research work presented in this Project Report
entitled “HEART DISEASE PREDICTION USING CLASSIFICATION ALGORITHM”
for the award of Bachelor of Technology, Computer Science & Engineering from Dr. APJ
Abdul Kalam Technical University, Lucknow under our supervision. The Project Report
embodies results of original work, and the studies were carried out by the students themselves.
The contents of the Project Report do not form the basis for the award of any other degree to
the candidates or to anybody else from this or any other University/Institution.

Signature Signature

Dr. Kumud Saxena Dr. Kumud Saxena

HoD HoD
CSE & IT CSE & IT
NIET Greater Noida NIET Greater Noida

Date:


ACKNOWLEDGEMENTS

We would like to express our gratitude to Dr. Kumud Saxena for their guidance and constant
supervision, for providing the necessary information regarding the project, and for their
support in completing it.
Our thanks and appreciation also go to the respected HOD and Dy. HOD for their motivation
and support throughout.


ABSTRACT

Cardiovascular disease can cause sudden death. It arises when the heart does not work properly,
and many factors contribute to it, such as obesity, high blood pressure, and high cholesterol.
The number of deaths due to heart disease has increased, and there is a need for methods that
help predict the disease, aid early diagnosis, and help doctors treat patients. The current
study aims to estimate the risk of heart attack from patient data. In practice, prediction and
interpretation are the main goals of data mining. Predictive data mining uses attributes or
variables in a dataset to determine unknown or future values of other variables, while
interpretation refers to finding patterns that describe the data in a form humans can
understand. Machine learning is now used in many fields, and healthcare is no exception. This
work uses machine learning classification algorithms such as K-nearest neighbours, logistic
regression, and random forest. Medical care concerns people's lives, so its predictions must
be reliable. Therefore, we need to create a system that can predict the disease accurately.


TABLE OF CONTENTS
Declaration
Certificate
Acknowledgements
Abstract
List of Figures
List of Abbreviations
List of Tables

CHAPTER 1: INTRODUCTION
CHAPTER 2: LITERATURE SURVEY
CHAPTER 3: SYSTEM DESIGN AND METHODOLOGY
CHAPTER 4: IMPLEMENTATION AND RESULTS
CHAPTER 5: CONCLUSION


LIST OF FIGURES

Fig No Caption


1 Dataset
2 EDA
3 Heatmap
4 Correlation graph
5 Sex analysis
6 Data Splitting
7 KNN model
8 Logistic Regression Model
9 Random Forest
10 Flow Diagram
11 System Design


LIST OF ABBREVIATIONS

Abbreviation Full Form


ML Machine learning
EDA Exploratory Data Analysis
KNN K-Nearest Neighbours


LIST OF TABLES

Table No. Table Caption

1 Precision, recall table

2 Result table


CHAPTER 1

INTRODUCTION

Heart attacks remain a leading cause of death worldwide, including in South Africa, and timely
identification can help prevent them. Cardiac datasets hold a large amount of recorded
information that is readily accessible but of little practical use for prediction in its raw
form. This study applies several data mining methods to turn such unused data into predictive
models. People often succumb to symptoms they never anticipated. To prevent this, physicians
need to estimate the likelihood of a heart attack in a patient. Data mining is a technique for
collecting data, analysing it, and making it useful. The present study estimates the
probability of a heart attack from patient data. In practice, prediction and interpretation
are the main goals of data mining. Predictive data mining uses attributes or variables in a
dataset to determine unknown or future values of other variables, while interpretation is the
discovery of patterns that help humans understand the data.

With the increasing number of deaths due to heart disease, it has become necessary to develop
a system that predicts heart disease effectively and accurately. The motivation for the project
was to find the most effective ML algorithm for detecting heart disease.


CHAPTER 2

LITERATURE SURVEY

Problem Identification
After studying heart disease, we found that certain parameters affect the heart, such as
trestbps, age, sex, cholesterol, fbs, thalach, exang, and oldpeak. Using these parameters, we
can predict whether or not a person is suffering from heart disease.

Proposed Solution
To address this problem, we propose training a Machine Learning model on the
parameters that affect the heart. The trained model then predicts the likelihood
of an individual having heart disease. An important part of the approach is to
represent the data visually through graphs. Through data analysis, we aim to
identify the relationships between the attributes and so understand which
attributes are significant. We employ the classification algorithms K-nearest
neighbor, Logistic Regression, and Random Forest. Each produces an output of
1 or 0, indicating the presence or absence of heart disease in an individual.
After processing the data with these classification algorithms, we select the
most accurate model and draw conclusions accordingly.

Cardiac disease prediction encompasses several crucial elements:

• Firstly, the process involves extracting valuable insights from the heart
disease detection dataset. These insights are then used to derive meaningful
conclusions.
• Secondly, the exploration of the data, known as exploratory data analysis
(EDA), plays a pivotal role in obtaining significant outcomes.
• Thirdly, feature engineering becomes imperative once we have acquired a deep
understanding of the data. This involves modifying the features in order to
proceed to the subsequent phase of model building.
• Lastly, the model building phase focuses on constructing a machine learning
model specifically designed to detect heart disease.

The dataset used for this prediction is derived from patient data gathered by
physicians in South Africa. From this large database, only 14 attributes were
considered essential for heart disease prediction. The attributes and their
value encodings are listed below.


No.  Attribute  Description
1    age        Patient's age
2    sex        Patient's gender (1 = male; 0 = female)
3    cp         Type of chest pain (0: typical angina; 1: atypical angina; 2: non-anginal pain; 3: asymptomatic)
4    trestbps   Patient's resting blood pressure (in mm Hg)
5    col        Cholesterol level (in mg/dl)
6    fbs        Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7    restecg    Resting electrocardiographic results (0: normal; 1: having ST-T wave abnormality; 2: showing probable left ventricular hypertrophy)
8    thalach    Maximum heart rate achieved
9    exang      Exercise-induced angina (1 = yes; 0 = no)
10   oldpeak    ST depression induced by exercise relative to rest
11   slope      Slope of the peak exercise ST segment (1 = upsloping; 2 = flat; 3 = downsloping)
12   ca         Number of major vessels (0-3) colored by fluoroscopy
13   thal       Thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)
14   target     0 = healthy heart; 1 = heart with chance of heart disease

FIGURE 1.
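The value encodings listed in the attribute table can be checked programmatically before any modelling. The following is a minimal sketch, assuming each patient record is held as a Python dictionary keyed by the attribute names in the table. The names `ALLOWED_VALUES` and `invalid_fields` and the sample record are hypothetical and are introduced only for illustration.

```python
# Allowed codes for the categorical attributes, copied from the attribute table.
ALLOWED_VALUES = {
    "sex": {0, 1},
    "cp": {0, 1, 2, 3},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {1, 2, 3},
    "ca": {0, 1, 2, 3},
    "thal": {3, 6, 7},
    "target": {0, 1},
}


def invalid_fields(record):
    """Return the categorical attributes whose value is missing or not an allowed code."""
    return sorted(
        name
        for name, allowed in ALLOWED_VALUES.items()
        if record.get(name) not in allowed
    )


# Hypothetical record for illustration only; not taken from the dataset.
patient = {
    "age": 54, "sex": 1, "cp": 2, "trestbps": 130, "col": 246, "fbs": 0,
    "restecg": 1, "thalach": 150, "exang": 0, "oldpeak": 1.0, "slope": 2,
    "ca": 0, "thal": 3, "target": 1,
}

print(invalid_fields(patient))                 # every categorical field is valid here
print(invalid_fields({**patient, "thal": 5}))  # "thal" is reported because 5 is not an allowed code
```

Only the categorical attributes are checked here; the numeric attributes (age, trestbps, thalach, oldpeak and so on) would need range checks chosen by the analyst.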


Data Preparation and Exploration:

FIGURE 2.

FIGURE 3.


Correlation of each attribute with the target value:

FIGURE 4.

Inference:
The above correlation graph provides valuable insights into the relationships
between various features and the target feature. It is evident from the graph that
four specific features, namely "cp", "restecg", "thalach", and "slope", exhibit a
positive correlation with the target feature. This means that as the values of these
features increase, the value of the target feature also tends to increase. On the other
hand, the remaining features demonstrate a negative correlation with the target
feature. This implies that as the values of these features increase, the value of the
target feature tends to decrease.
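The sign of each correlation can be reproduced with a short calculation. The sketch below computes the Pearson correlation coefficient between a feature and the target in plain Python. The six records are hypothetical toy values chosen for illustration, not rows of the dataset, and the function name `pearson` is likewise an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy records for illustration only.
rows = [
    {"thalach": 170, "oldpeak": 0.0, "target": 1},
    {"thalach": 165, "oldpeak": 0.4, "target": 1},
    {"thalach": 150, "oldpeak": 1.0, "target": 1},
    {"thalach": 130, "oldpeak": 2.2, "target": 0},
    {"thalach": 120, "oldpeak": 3.0, "target": 0},
    {"thalach": 140, "oldpeak": 1.8, "target": 0},
]

target = [row["target"] for row in rows]
correlations = {
    name: pearson([row[name] for row in rows], target)
    for name in ("thalach", "oldpeak")
}

# A positive value means the feature tends to rise with the target, a negative value the opposite.
for name, value in correlations.items():
    print(f"{name}: {value:+.2f}")
```

In this toy sample "thalach" comes out positively correlated with the target and "oldpeak" negatively correlated, which matches the direction described in the inference.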


Sex Analysis:

FIGURE 5.

Here 0 denotes female and 1 denotes male. The bar graph above shows that males are more likely
than females to be diagnosed with heart disease; the ratio is about 2:1.
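A comparison of this kind can be made either on raw counts of positive cases or on the proportion of positive cases within each sex, and the two can differ when one group has more records. The sketch below shows both calculations on hypothetical (sex, target) pairs; the data and variable names are assumptions for illustration and do not come from the dataset.

```python
from collections import Counter

# Hypothetical (sex, target) pairs: sex 1 = male, 0 = female; target 1 = heart disease.
records = [(1, 1), (1, 1), (1, 0), (1, 1), (0, 1), (0, 0), (1, 1), (0, 1), (1, 0), (0, 0)]

# Raw number of positive cases per sex.
positives_by_sex = Counter(sex for sex, target in records if target == 1)

# Total number of records per sex, needed to turn counts into rates.
totals_by_sex = Counter(sex for sex, _ in records)

# Proportion of each sex that has a positive target.
rate_by_sex = {sex: positives_by_sex[sex] / totals_by_sex[sex] for sex in totals_by_sex}

print(positives_by_sex[1], positives_by_sex[0])  # 4 positive males, 2 positive females
print(rate_by_sex)
```

In this toy sample the positive counts are in a 2:1 ratio, while the per-group rates are closer together because there are more male records.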

Data Splitting (Features and Target)


The dataset is divided in an 80-20 ratio into training and testing datasets respectively. The
model is trained on the training dataset and then evaluated on the testing dataset to determine
whether it predicts correctly and with what accuracy.

FIGURE 6.


After data splitting, we moved on to model training and trained several machine learning
models on the training dataset. Each model learns the patterns in the training dataset and
uses them to make predictions for the testing dataset.

FIGURE 7.

FIGURE 8.

FIGURE 9.


CHAPTER 3

SYSTEM DESIGN AND METHODOLOGY

3.1. System Design

3.1.1. Flow Diagram

Data Collection → Data Cleaning → Data Visualization (EDA) → Data Processing → Splitting Data → Training Data / Testing Data → Classifier → Less chance / High chance

FIGURE 10.


3.1.2. System Design

Input

Dataset Model Result

FIGURE 11.

3.2. Algorithm(s)
3.2.1. Logistic Regression: It is one of the most common Machine Learning models and is
widely used in data science. It estimates a categorical dependent variable from a set
of independent variables by modelling the probability of each outcome, so the result
must be categorical. In this case the dependent variable is the ‘Target’ variable,
which indicates whether a person is likely to have heart disease, and the independent
variables include cholesterol, ECG, blood pressure, age, gender, and other factors.
The model helps predict the disease by establishing a relationship between the
dependent and independent variables.

FIGURE 12.
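To make the relationship between the independent variables and the categorical output concrete, the sketch below trains a logistic regression model by plain gradient descent. It is a minimal illustration under stated assumptions, not the implementation used in this project: the single feature is a hypothetical, already-scaled value, and the function names, learning rate, and epoch count are chosen only for the example.

```python
from math import exp


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, xi)))
            err = p - yi  # gradient of the log-loss with respect to the score
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def predict(weights, bias, xi):
    """Return 1 if the predicted probability is at least 0.5, otherwise 0."""
    score = bias + sum(w * v for w, v in zip(weights, xi))
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical, already-scaled single feature; labels follow the 0/1 target encoding.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(X, y)
predictions = [predict(weights, bias, xi) for xi in X]
print(predictions)
```

The model outputs a probability, and the 0.5 threshold turns it into the categorical 0 or 1 result described above.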


3.2.2. KNN: K-Nearest Neighbors is a supervised method for assigning items to groups
(classes). For a new item, it looks at the items that have already been labelled,
finds the K items that are most similar to the new one, and assigns the new item to
the class that is most common among them. The method can be used for classification
or, by averaging the neighbours' values, for predicting a numeric value. Unlike a
clustering algorithm, KNN does not form groups on its own; it classifies each new
input from the labelled training examples that are nearest to it in terms of the
independent variables.

FIGURE 13.
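The neighbour-voting step can be sketched in a few lines. The example below is a minimal illustration, assuming Euclidean distance and two hypothetical numeric features; the function name `knn_predict`, the value K = 3, and the toy points are assumptions and not taken from the project.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    # Sort the labelled points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature points; labels follow the 0/1 target encoding.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # 0: the three nearest points are all class 0
print(knn_predict(train_X, train_y, (4.0, 4.0)))  # 1: the three nearest points are all class 1
```

Because the vote depends on distances, features measured on very different scales (for example cholesterol in mg/dl and oldpeak) are usually rescaled before KNN is applied.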

3.2.3. Random Forest: Random Forest is an ensemble learning method that can be used for
classification or for predicting numeric values. The algorithm builds multiple
decision trees, each on a random sample of the data and variables, and combines
their outputs: for classification the final output is the class chosen by the
majority of trees, and for regression it is the average of the trees' outputs.


FIGURE 14.
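The idea of combining many trees by voting can be illustrated with a deliberately simplified sketch. Each "tree" below is only a single-split decision stump trained on a bootstrap sample and one randomly chosen feature, whereas a full Random Forest grows deeper trees; the function names, the toy data, and the number of trees are assumptions for illustration only.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Find the threshold on one feature that misclassifies the fewest samples."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        for left_label in (0, 1):
            preds = [left_label if row[feature] <= threshold else 1 - left_label for row in X]
            errors = sum(p != t for p, t in zip(preds, y))
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return feature, threshold, left_label


def stump_predict(stump, row):
    """Predict 0 or 1 for one row using a single-split stump."""
    feature, threshold, left_label = stump
    return left_label if row[feature] <= threshold else 1 - left_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))        # pick a random feature
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def forest_predict(forest, row):
    """Return the class chosen by the majority of stumps."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data; labels follow the 0/1 target encoding.
X = [(1.0, 1.2), (1.5, 1.0), (2.0, 1.8), (1.2, 2.0), (5.0, 5.5), (5.5, 5.0), (6.0, 6.0), (5.2, 5.8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(forest_predict(forest, (0.5, 0.5)), forest_predict(forest, (6.5, 6.5)))
```

Individual stumps can be wrong when their bootstrap sample is unrepresentative, but the majority vote across many of them is far more stable, which is the property the algorithm relies on.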

3.3. Working Process


The process starts with gathering the data, followed by data cleaning, data analysis, model
building, and model training, and ends with predicting the result and comparing the outputs.
3.3.1. Data Cleaning
It is the process of removing bad or unnecessary data from the dataset, such as duplicate
records, attributes with no value, and outliers, in order to make the data more useful and
suitable for statistical analysis.
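Two of the cleaning steps named above, dropping records with missing values and dropping duplicate records, can be sketched as follows. This is a minimal illustration assuming records are Python dictionaries; the function name `clean_records` and the sample records are hypothetical, and outlier handling is left out because it depends on thresholds the analyst must choose.

```python
def clean_records(records, required):
    """Drop records with a missing required value, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Skip records where any required attribute is absent or None.
        if any(record.get(column) is None for column in required):
            continue
        # Use the tuple of required values to detect duplicates.
        key = tuple(record[column] for column in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical records for illustration only.
raw = [
    {"age": 54, "sex": 1, "target": 1},
    {"age": 54, "sex": 1, "target": 1},     # duplicate of the first record
    {"age": 61, "sex": 0, "target": None},  # missing target value
    {"age": 47, "sex": 0, "target": 0},
]

cleaned = clean_records(raw, required=("age", "sex", "target"))
print(len(raw), len(cleaned))  # 4 records before cleaning, 2 after
```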
3.3.2. EDA

It is the process of analyzing the dataset using data visualization to find the characteristics
of the attributes and the relationships (correlations) among them.


FIGURE 15.

3.3.3. Data Splitting

The dataset is divided in an 80-20 ratio into training and testing datasets respectively. The
model is trained on the training dataset and then evaluated on the testing dataset to determine
whether it predicts correctly and with what accuracy.
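An 80-20 split of this kind can be sketched in plain Python as below. The function name `split_train_test`, the seed, and the placeholder records are assumptions for illustration; the records are shuffled first so that the test set is not simply the last rows of the file.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for 100 patient records.
records = list(range(100))

train, test = split_train_test(records)
print(len(train), len(test))  # 80 20
```

Fixing the seed makes the split reproducible, so that different models are compared on the same training and testing records.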

3.3.4. Model Training

After data splitting, we moved on to model training and trained several machine learning
models on the training dataset. Each model learns the patterns in the training dataset and
uses them to make predictions for the testing dataset.


3.3.5. Model Evaluation

Next, we evaluate the models by measuring the accuracy of each one. The most accurate model
is used for prediction.
TABLE 1.
Model                Precision  Recall  F1-score
KNN                  0.84       0.76    0.79
Logistic Regression  0.83       0.93    0.88
Random Forest        0.97       0.96    0.95
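For reference, the metrics in the table above can be computed from a model's predictions as sketched below. The function name `classification_metrics` and the label lists are hypothetical and serve only to show the arithmetic; they are not the project's test data, and the sketch treats class 1 as the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical true labels and predictions for ten test records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision is the share of predicted positives that are truly positive, recall is the share of true positives that were found, and the F1-score is their harmonic mean.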


CHAPTER 4

IMPLEMENTATION AND RESULTS

4.1. Software and Hardware Requirements


1. Hardware requirements:
• Laptop
• High-speed internet
2. Software requirements:
• Anaconda (Jupyter Notebook)

4.2. Implementation Details

4.2.1. Snapshots Of Interfaces


4.2.2. Test Cases

4.2.3. Results

TABLE 2.

Model                Accuracy
Logistic Regression  0.8634
KNN                  0.795
Random Forest        0.97
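Selecting the most accurate model from these results is a simple ranking step. The short sketch below uses the accuracy values from Table 2; the variable names are assumptions for illustration.

```python
# Accuracy values copied from Table 2.
accuracy = {"Logistic Regression": 0.8634, "KNN": 0.795, "Random Forest": 0.97}

# Rank the models from most to least accurate and take the first one.
ranking = sorted(accuracy, key=accuracy.get, reverse=True)
best_model = ranking[0]

print(best_model)  # Random Forest
```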


CHAPTER 5

CONCLUSION

5.1. Using several machine learning models, we have built a program that estimates whether
a person is at risk of heart disease. It may also support doctors in managing related
conditions such as chest pain, diabetes, and high blood pressure. The program looks at a
person's medical history and compares it with other people's histories to see whether
the person might be at risk of heart disease. We used the supervised machine learning
algorithms KNN, random forest, and logistic regression, and we conclude that random
forest was the most accurate of the three, with an accuracy of 0.97 (Table 2).

5.2. After training the Machine Learning models on 80 percent of the dataset and testing
them on the remaining 20 percent, we observed that the Logistic Regression and Random
Forest models predicted the outcomes more accurately than the KNN model (as shown in
Table 2). A likely reason is that KNN is a 'lazy learner': it does not build a model
from the input dataset but simply stores the training examples and compares each new
input against them. Its performance therefore tends to suffer on datasets with many
features. In contrast, the logistic regression model learns parameters from the dataset
that capture the underlying patterns and relationships. The Random Forest model emerges
as the most suitable option because it combines multiple decision trees, which gave the
most accurate predictions of the three models.
5.3. In light of the rising mortality associated with cardiovascular disease, it has become
important to develop a mechanism that can forecast the onset of such disease effectively
and accurately. In future, this approach can be extended by developing a web-based
application built on the Random Forest algorithm that can accommodate a significantly
larger dataset than the one used in the present analysis. A larger dataset is expected
to improve the results and help healthcare practitioners predict heart disease more
effectively and efficiently.




