
TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING
ADVANCED COLLEGE OF ENGINEERING AND MANAGEMENT

DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING

KALANKI, KATHMANDU

A Minor Project Final Defense Report

On

“COMPARISON OF CLASSIFICATION ALGORITHMS FOR

HEART DISEASE PREDICTION”

[CT-654]

SUBMITTED BY:
Nhyuumila Kasaa [30147]
Pradip Sapkota [30155]
Prajwal Kunwar [30158]

Under Supervision of:


Er. Laxmi Prasad Bhatt
Er. Sameep Dhakal

A Minor Project Final Report submitted to the Department of
Electronics and Computer Engineering in partial fulfillment of the
requirements for the degree of Bachelor of Engineering in Computer Engineering
Kathmandu, Nepal
April 30, 2023
ADVANCED COLLEGE OF ENGINEERING AND MANAGEMENT
DEPARTMENT OF COMPUTER AND ELECTRONICS ENGINEERING

APPROVAL LETTER

The undersigned certify that they have read and recommended to the Institute of
Engineering for acceptance, a project report entitled “COMPARISON OF
CLASSIFICATION ALGORITHMS FOR HEART DISEASE PREDICTION”

SUBMITTED BY:
Nhyuumila Kasaa [30147]
Pradip Sapkota [30155]
Prajwal Kunwar [30158]

In partial fulfillment of the requirements for the degree of Bachelor of Engineering
in Computer Engineering.

.................................. ..................................
Project Supervisor Project Supervisor

..................................
External Examiner

..................................
Er. Laxmi Prasad Bhatt
Academic Project coordinator

Department of Computer and Electronics Engineering


April 30, 2023

ii
ACKNOWLEDGEMENT
We take this opportunity to express our deepest and sincere gratitude to our project
supervisors Er. Laxmi Prasad Bhatt and Er. Sameep Dhakal, Department of
Electronics and Computer Engineering, for their insightful advice, motivating
suggestions, invaluable guidance, help and support in the successful completion of this
project. We are also grateful for their constant encouragement and advice throughout
our Bachelor's program.

We express our deep gratitude to Er. Ajaya Shrestha, Head of Department of


Electronics and Computer Engineering, Er. Bikash Acharya, Deputy Head,
Department of Electronics and Computer Engineering, Er. Laxmi Prasad Bhatt,
Academic Project Coordinator, Department of Electronics and Computer Engineering
for their regular support, co-operation, and coordination.

We also gratefully acknowledge the timely facilities provided by the department
throughout the Bachelor's program.

We would like to convey our thanks to the teaching and non-teaching staff of the
Department of Electronics & Communication and Computer Engineering, ACEM for
their invaluable help and support throughout the period of Bachelor’s Degree. We are
also grateful to all our classmates for their help, encouragement and invaluable
suggestions.

Nhyuumila Kasaa [30147]

Pradip Sapkota [30155]

Prajwal Kunwar [30158]

iii
ABSTRACT
The term ‘heart disease’ is often used interchangeably with the term ‘cardiovascular
disease’(CVD). Cardiovascular disease is the combination of heart and vascular
diseases and is responsible for premature death and chronic disability worldwide.
According to the World Health Organization (WHO), 17.9 million people die every
year due to heart-related conditions.[1] It is estimated that 80% of these deaths could
be prevented if detected early.[2] Since CVD has a slow onset and a long incubation
period, it is generally at a more serious stage by the time of diagnosis. Therefore, early
identification is particularly essential for its prevention and control. In recent years,
many projects have been developed in the fields of medicine and engineering to detect
such conditions in their early stages, and many artificial intelligence (AI) systems are
already in use in hospitals and other medical organizations. Our aim is to build a
similar system at a smaller scale, focused mainly on the detection of heart disease so
as to prevent further deterioration. Unlike traditional statistical methods, machine
learning (ML) algorithms can effectively handle non-linearity, variable redundancy
and interactions between variables. In this project we explore potential risk factors for
CVD to improve predictive performance. Studies have shown that predictions made
by ML models were far more accurate than those of traditional statistical methods.

Keywords: Cardiovascular Diseases, Machine Learning, Artificial Intelligence

iv
TABLE OF CONTENTS

Title Page

ACKNOWLEDGEMENT iii
ABSTRACT iv
LIST OF ABBREVIATIONS/ACRONYMS viii
CHAPTER 1 1
INTRODUCTION 1
1.1 Background 1
1.2 Motivation 1
1.3 Statement of the problem 2
1.4 Project objective 2
1.5 Significance of the study 2
CHAPTER 2 LITERATURE REVIEW 3
CHAPTER 3 SYSTEM REQUIREMENTS AND ANALYSIS 4
3.1 System Requirements 4
3.2 Requirement Analysis 4
3.3 Feasibility Study 5
CHAPTER 4 SYSTEM DESIGN AND ARCHITECTURE 7
4.1 System Design 7
4.2 Use case diagram 7
4.3 DFD Diagram 8
4.4 System Architecture 9
CHAPTER 5 METHODOLOGY 11
5.1 Data Collection and Preprocessing 11
5.2 Project Completion Plan 16
CHAPTER 6 17
RESULT AND ANALYSIS 17
6.1 Result 17
6.2 Output 18
CHAPTER 7 22
CONCLUSION, LIMITATION AND FUTURE ENHANCEMENTS 22

v
REFERENCES 24
APPENDICES 25

vi
LIST OF FIGURES

Title Page No
FIGURE 4.1 USE CASE DIAGRAM 8
FIGURE 4.2 DFD LEVEL 0 9
FIGURE 4.3 DFD LEVEL 1 9
FIGURE 4.4 SYSTEM ARCHITECTURE 10
FIGURE 5.1 SVM HYPERPLANE CONCEPT 12
FIGURE 5.2 KNN ALGORITHM 14
FIGURE 5.3 FLOW OF GRADIENT DESCENT ALGORITHM 15
FIGURE 5.4 ITERATIVE MODEL 16
FIGURE 6.1 ACCURACY OF FOUR ALGORITHMS 17
FIGURE 6.2 EVALUATION OF ALGORITHMS USING DIFFERENT
PERFORMANCE METRICS 18

vii
LIST OF ABBREVIATIONS/ACRONYMS

ML Machine learning
CVD Cardiovascular Disease
AI Artificial Intelligence
WHO World Health Organization
ECG Electrocardiogram
HD Heart Disease
CSS Cascading Style Sheets
HTML Hypertext Markup Language
UI User Interface
JS JavaScript
DFD Data Flow Diagram
UCI University of California, Irvine
RF Random Forest
GB Gradient Boost
KNN K-Nearest Neighbors
SVM Support Vector Machine

DMLC Distributed Machine Learning Community

viii
CHAPTER 1
INTRODUCTION
1.1 Background

Heart disease, or cardiovascular disease, refers to conditions involving narrowed or
blocked blood vessels that can lead to a heart attack, chest pain or stroke.[3] CVDs are
considered one of the leading causes of premature death worldwide. Risk factors for
heart disease include metabolic syndrome, hypertension, obesity, an unhealthy diet,
and so on. Prediction of CVDs is one of the most important subjects in the field of
clinical data analysis, because the risk factors alone make it difficult to identify heart
disease on time. Furthermore, hospital devices such as ECG machines are expensive
and the manual process is time-consuming. Hence, scientists have turned towards
approaches like data mining and machine learning, which can learn from large
quantities of data, for predicting disease. Our aim is likewise to use machine learning
(ML) algorithms to create a heart disease prediction system for the early detection of
CVDs and enable people to decide on a suitable approach for further treatment and
prevention. The algorithms used in this system are Support Vector Machine (SVM),
Extreme Gradient Boosting (XGBoost), Random Forest, and K-Nearest Neighbors
(KNN). We train and test these algorithms and finally use the one that gives the most
accurate result to generate the output.

1.2 Motivation

Although there have been advancements in every sector of our nation, the medical
field still makes little use of artificial intelligence in its systems. Getting a medical
checkup done and waiting for the result is a time-consuming procedure, and some
factors go unnoticed during diagnosis. Our plan is to develop an easily available
application where users can enter their data and receive predictions regarding their
heart condition.

1
1.3 Statement of the problem

Heart disease is a major health problem. Many CVDs go undetected or are detected
late, and people then have to spend time and money on treatment that could have been
avoided had they been alerted earlier. Ample data for projects like this can readily be
obtained from hospitals and test results, so little additional data collection is needed.
Such projects would benefit people even with low incomes and busy schedules.

1.4 Project objective

● To compare four classification algorithms, namely SVM, Random Forest, KNN
and Extreme Gradient Boosting (XGBoost), for heart disease prediction.

1.5 Significance of the study

The project can have the following significance:


● To improve an intelligent clinical decision support system for the prediction of
heart disease.
● To enhance the result set of predictions by making it more relevant and decrease
the rate of heart disease.
● To boost the accuracy achieved by individual machine learning algorithms.

2
CHAPTER 2
LITERATURE REVIEW
Many developers, machine learning as well as deep learning experts have contributed
a lot and proposed different models for prediction of heart disease. We can find
numerous approaches on how these problems are solved using several machine
learning and deep learning algorithms.
The prediction depends on several features that are extracted from the data and how
different algorithms work on those features. Several works have been done related to
disease and heart prediction systems using different algorithms such as Naïve Bayes,
KNN, Logistic Regression and Classification tree in order to identify the high
performance for predicting heart disease.
Mai Shouman, Tim Turner and Rob Stocker applied Logistic Regression in
diagnosing heart disease patients.[4] Their results show that Logistic Regression
could achieve higher accuracy than a neural network ensemble in the diagnosis of
heart disease patients.
K. Polaraju proposed prediction of heart disease using a Multiple Regression Model
and showed that Multiple Linear Regression is appropriate for predicting the chance
of heart disease.[5] The work was performed using a training data set consisting of
3000 instances with 13 different attributes, as mentioned earlier. The data set was
divided into two parts: 70% of the data was used for training and 30% for testing.
Based on the results, the classification accuracy of the Regression algorithm is better
compared to the other algorithms.
Chala Beyene proposed a methodology to foretell the occurrence of HD in order to
overcome the problem of diagnosing it.[6] It improved on existing methodology by
choosing Naïve Bayes, J48 and SVM to predict the occurrence of HD for early
automatic diagnosis in a short time, in order to support quality of service and reduce
costs to save individual lives. This methodology uses various attributes of HD to
identify whether a patient has HD or not.

3
CHAPTER 3
SYSTEM REQUIREMENTS AND ANALYSIS

3.1 System Requirements

3.1.1 Software Requirements

3.1.1.1 Operating System


• Windows/Linux

3.1.1.2 Applications software


• Microsoft Office products such as MS-WORD, PowerPoint for documentation
and representation.
• HTML, JavaScript, CSS, Flask
• Visual Studio Code
• Sklearn, NumPy, Python, Google Colab
• Browser: Google Chrome/Firefox/Brave

3.2 Requirement Analysis

Requirement analysis results in the specification of the software's operational
characteristics, indicates the software's interface with other system elements, and
establishes constraints that the software must meet. The requirements analysis task is
a process of discovery, refinement, modeling and specification. The scope, initially
established by us and refined during project planning, is refined in detail. Models of
the required data, information and control flow, and operational behavior are created.

3.2.1 Functional Requirements

The functional requirements of our system are:

● The system allows users to predict heart disease.


● The system allows users to contact admin.

4
3.2.2 Non-Functional Requirements

These requirements are not strictly needed by the system but are essential for its
better performance. The points below list the non-functional requirements of the
proposed system.
● Performance: The system will predict the result with good accuracy.
● Functionality: The system will fulfill functional requirements.
● Availability: The system will give an accurate outcome if the input is genuine.
● Reliability: The system is reliable to provide promising output.
● Usability: System is very easy to use. It has a very simple and clean UI.

3.3 Feasibility Study

The feasibility study determines if the system can be built successfully with available
cost, time and effort. The study is conducted by analyzing the collected requirements.

3.3.1 Technical Feasibility

A number of issues have to be considered while doing a technical analysis. Before
commencing the project, we have to be very clear about the technologies required for
the development of the new system. In our case, a computer with medium
specifications and an internet connection can run the web application, which is
created using the Python language along with HTML, CSS and JS. The ML model
used for training and testing can be implemented with some basic research.

3.3.2 Operational Feasibility

In operational feasibility we measure how well our system solves the problems, takes
advantage of the opportunities identified during scope definition, and satisfies the
requirements identified in the requirements analysis phase of system development.
The developed system is reliable, maintainable, usable, sustainable, supportable and
affordable, so our system is considered operationally feasible.

5
3.3.3 Economic Feasibility

In economic feasibility we study the cost-benefit analysis of our proposed system to
make sure that it is possible to implement. We created the system with pre-existing
resources, meaning the system is cost-efficient. A user in possession of a computer
with internet access can use the proposed system, making it economically feasible.

6
CHAPTER 4
SYSTEM DESIGN AND ARCHITECTURE
4.1 System Design

This chapter of the project document provides a system design of this project. System
design focuses on transforming the analysis model into the design model that takes into
account the non-functional requirements and constraints described in the problem
statement and requirement analysis sections discussed earlier. Up to now we were in
the problem domain. System design is the first part to get into the solution domain in
software development. This chapter contains and describes use case diagrams, DFD
diagrams etc.

4.2 Use case diagram

Use-case diagrams describe the high-level functions and scope of a system. These
diagrams also identify the interactions between the system and its actors. The use cases
and actors in use-case diagrams describe what the system does and how the actors use
it, but not how the system operates internally. We can explain and design our concept
in a more structured way with the help of these diagrams.

7
Use case Diagram for Heart Disease Prediction is discussed below:

Figure 4.1 Use Case Diagram

4.3 DFD Diagram

A data-flow diagram is a way of representing a flow of data through a process or a


system. The DFD also provides information about the outputs and inputs of each entity
and the process itself. A data-flow diagram has no control flow — there are no decision
rules and no loops. There are three levels of DFD diagrams. They are level 0, level 1
and level 2. But we have only discussed level 0 and level 1 diagram which are
discussed below:

8
Figure 4.2 DFD Level 0

Figure 4.3 DFD Level 1

4.4 System Architecture

Our system can be accessed with or without logging in. Users who just seek results
rather than keeping them for future use can proceed directly by providing the input

9
to the system. Users who would like to keep a record of their results can log in. The
input from either kind of user goes into the model, which processes the data and
produces the outcome: the probability that the patient or user has heart disease. The
user can then view the result. If the user has logged in, the result is also stored in the
database; otherwise it is only displayed and no record is kept.
The simple architecture of the system is shown below:

Figure 4.4 System Architecture

10
CHAPTER 5
METHODOLOGY

5.1 Data Collection and Preprocessing

5.1.1 Data Collection

Data collection is key in any system. To train the model, a large amount of reliable
data is required. We collected our data from a heart disease data set under the UCI
Machine Learning Repository on Kaggle.[7] The data we use in this model has 1024
instances. The collected data was split into 80% training and 20% testing to provide a
reliable result.
14 parameters have been taken into account for research purposes in our project. They
are:
1. Age
2. Sex
3. Chest pain type
4. Resting blood pressure
5. Serum cholesterol
6. Fasting blood sugar
7. Resting ECG
8. Max heart rate achieved
9. Exercise induced angina
10. ST depression induced by exercise relative to rest
11. Peak exercise ST segment
12. Number of major vessels colored by fluoroscopy
13. Thalassemia
14. Diagnosis of heart disease (target)
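As an illustrative sketch (not the project's actual code), the 80/20 split described above could be performed with scikit-learn's `train_test_split`; the synthetic array here merely stands in for the Kaggle CSV, and the random seeds are arbitrary:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset's 1024 instances and 13 predictor
# features; the real values would come from the Kaggle heart disease CSV.
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 13))
y = rng.integers(0, 2, size=1024)  # the 14th parameter: the diagnosis target

# 80% of the rows train the model, 20% are held out for testing; stratify
# keeps the proportion of diseased/healthy cases the same in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))  # 819 205
```

Stratifying on the target is a common precaution for medical data, where class imbalance in the test split would distort the reported accuracy.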

5.1.2 Data Preprocessing

The dataset used was the Heart disease Dataset which is a combination of 4 different
databases, but only the UCI Cleveland dataset was used. This database consists of a
total of 76 attributes but all published experiments refer to using a subset of only 14

11
features. The data set does not contain any missing or repeated values. Therefore, we
have used the already processed UCI Cleveland dataset, available on the Kaggle
website, for our analysis.

5.1.3 Description of Algorithm Use

5.1.3.1 Support Vector Machine

SVM is an approximate implementation of structural risk minimization and classifies
both linear and non-linear data. SVMs maximize the margin around the separating
hyperplane, and a subset of the training samples (the support vectors) specifies the
decision boundary function. SVM comprises three steps: identifying the support
vectors, forming the maximal distance between the points found, and constructing the
perpendicular decision boundary. This is the basic type of SVM, called linear SVM.
The linear SVM hyperplane concept is shown in the figure below.

Figure 5.1 SVM Hyperplane Concept

[Source: https://www.mdpi.com/2411-9660/6/5/87] [Accessed Dec. 13, 2022]


The maximal margin linear classifier is an inapplicable approach for many practical
problems. For practical application, where a non-linear separable data set is to be
separated by hyperplane, the non-linear data are to be mapped to another feature space
through Kernel functions. In the original input space, the data points are separated by
hyperplane, and Kernel functions map the non-linear training samples to high
dimensional space. Then, an algorithm search for the best hyperplane to separate the
transformed data into two different classes is carried out. The margin of the hyperplane
is maximized for classification purposes while minimizing the classification errors. The
algorithm predicts the risk of heart disease in a multi-dimensional hyperplane and

12
categorizes the data into different labels optimally by creating the margin between data
clusters.
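A minimal sketch of the kernelized SVM described above, using scikit-learn's `SVC`. The toy data and the parameter values (`C`, `gamma`) are illustrative only, not the project's tuned settings:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data standing in for the 13 heart-disease features.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# kernel="rbf" implicitly maps samples into a higher-dimensional space via a
# kernel function, as described above; C trades margin width against
# classification errors on the training data.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```

With a linear kernel (`kernel="linear"`) the same class reduces to the basic linear SVM described first.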

5.1.3.2 Random Forest


Random Forest, one of the most accurate machine learning algorithms, is a decision
tree-based ensemble classifier with a flowchart-like tree structure. RF is a
combination of tree-structured classifiers {h(x, n)}, where x is the data input and n
indexes the independently distributed random trees that classify the data. Each
decision tree in the random forest casts a vote indicating its decision about the class
of the data. RF uses the Gini index to determine the final class in each tree. The
algorithm chooses, at random for each tree, optimal attributes from the M total input
attributes. With these selected attributes, the best possible split is created using the
Gini index to develop a decision tree model. This process is repeated for each branch
until the terminating nodes are too small to split further. For a data set X having n
classes, the Gini index Gini(X) is defined by:

Gini(X) = 1 − Σ (Rj)², summing over classes j = 1 … n

where Rj is the relative frequency of class j in data set X. In RF, the split at which the
Gini index is lowest is chosen as the split value.
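The Gini computation can be sketched in plain Python (a hypothetical helper for illustration, not the project's code):

```python
from collections import Counter

def gini(labels):
    """Gini index: one minus the sum of squared class relative frequencies."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node (a single class) scores 0; an even 50/50 split scores 0.5,
# so the candidate split with the lowest Gini index is the purest.
print(gini([1, 1, 1, 1]))  # 0.0
print(gini([0, 0, 1, 1]))  # 0.5
```

A decision tree compares this value across candidate splits and keeps the one that lowers impurity the most.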

5.1.3.3 K-Nearest Neighbor

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised


machine learning algorithm that can be used to solve both classification and regression
problems. A supervised machine learning algorithm (as opposed to an unsupervised
machine learning algorithm) relies on labeled input data to learn a function that
produces an appropriate output when given new unlabeled data.

A classification problem has a discrete value as its output. Here, the dependent variable
‘Target’ is the output. The KNN algorithm assumes that similar things exist nearby. In
other words, similar things are near each other. We use the Euclidean distance
formula to measure the distance (similarity) between two points:

d(x, y) = √( Σ (xi − yi)² ), summing over features i = 1 … n
13
Here, the ‘k’ in the k-nearest neighbor is the number of neighbors that our model
takes into consideration to predict the outcome.

Figure 5.2 KNN Algorithm

[Source: https://www.datacamp.com/tutorial/k-nearest-neighbor-classification-
scikit-learn] [Accessed Dec. 13, 2022]
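The distance-and-vote idea above can be written out in a few lines of plain Python. This is a simplified, hypothetical sketch (real use would rely on scikit-learn's `KNeighborsClassifier`):

```python
import math
from collections import Counter

def euclidean(x, y):
    # d(x, y) = sqrt of the sum of squared per-feature differences
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two toy clusters: label 0 near the origin, label 1 around (5, 5).
train = [((0, 0), 0), ((1, 0), 0), ((5, 5), 1), ((6, 5), 1), ((5, 6), 1)]
print(knn_predict(train, (0.5, 0.2)))  # 0
print(knn_predict(train, (5.5, 5.5)))  # 1
```

Here k is the number of neighbors considered; an odd k avoids ties in binary classification.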

5.1.3.4 Extreme Gradient Boosting (XG Boost)

XGBoost (Extreme Gradient Boosting) and Gradient Boosting(GB) are both ensemble
tree methods that use the gradient descent architecture to boost weak learners.
XGBoost, however, strengthens the basic GB architecture through system optimization
and algorithmic improvements. XGboost is a package that belongs to the Distributed
Machine Learning Community (DMLC). GB is a stagewise additive modeling
technique: first, a weak classifier is fit to the data; then another weak classifier is fit
to improve the performance of the current model, without changing the previous
classifiers, and this process continues. Each new classifier has to focus on where the
previous classifiers did not perform well. The general boosting flow is shown in the
figure below: first we estimate y1 by fitting a decision tree to the data, then a second
tree is fitted to the residual from the previous step, y − y1, and the process continues.
In this way the ensemble's error can be decreased efficiently.

14
Figure 5.3 Flow of Gradient Descent Algorithm

[Source: https://www.sciencedirect.com/science/article/pii/S1319157820304936]
[Accessed Dec. 13, 2022]
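As a minimal sketch of this stagewise idea (illustrative data and parameters, not the project's configuration), each shallow tree below is fit to the residual of the running ensemble:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem standing in for the real feature matrix.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])

pred = np.zeros_like(y)
lr = 0.1  # learning rate shrinks each tree's contribution
for _ in range(50):
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, y - pred)         # fit the next tree to the current residual
    pred += lr * tree.predict(X)  # additive update of the ensemble

print(round(float(np.mean((y - pred) ** 2)), 4))  # remaining squared error
```

XGBoost builds on this same loop but adds regularization, second-order gradient information, and systems-level optimizations.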

15
5.2 Project Completion Plan

The tasks of this project will be assigned to individual members based on their interest
in the particular field. There will be weekly meetings to evaluate the project's progress
and propose project restructuring if required. Regular discussions with the project
mentor will be held to gather suggestions related to the project.

We will be using the iterative development model for this project.

Figure 5.4 Iterative Model

[Source: https://wishdesk.com/blog/what-is-iterative-design-approach]

[Accessed Dec. 13, 2022]

16
CHAPTER 6
RESULT AND ANALYSIS
6.1 Result

The aim of the project was to compare the performance of four classification
algorithms (SVM, RF, XGBoost and KNN) and use the algorithm with the highest
accuracy for predicting heart disease.
After implementing and evaluating the algorithms using various performance metrics
such as accuracy, precision, recall, and F1-score, it was found that XGBoost
outperformed the other three algorithms in terms of accuracy. Specifically, XGBoost
achieved an accuracy of 98%, while SVM, RF, and KNN achieved accuracies of 67%,
96%, and 74%, respectively.
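The four metrics used in this evaluation can be computed with scikit-learn as sketched below; the label vectors here are illustrative only, not the project's actual test results:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels (1 = heart disease, 0 = healthy).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # 0.75
print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # 0.75
```

Accuracy counts all correct predictions, precision and recall weigh false positives and false negatives separately, and F1 is their harmonic mean, which matters in medical screening where a missed case (false negative) is costlier than a false alarm.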

Figure 6.1 Accuracy of four algorithms

17
Figure 6.2 Evaluation of algorithms using different Performance Metrics

6.2 Output

We got the output as follows:

1. Landing Page

18
2. Home Page

3. Contact Us

19
4. About Us

5. Quick Predict

20
6. Final Report

21
CHAPTER 7

CONCLUSION, LIMITATION AND FUTURE ENHANCEMENTS

7.1 Conclusion
Based on the comparison of classification algorithms for heart disease prediction in
this project, four algorithms (SVM, RF, XGBoost and KNN) were evaluated. Among
them, XGBoost was found to have the highest accuracy. Therefore, XGBoost can be
considered the most suitable algorithm for predicting heart disease. However, it is
important to note that an algorithm's performance may depend on the specific dataset
and the features used in the analysis.

7.2 Limitation
One limitation of the comparison of classification algorithms for heart disease
prediction is that the study was based on a single dataset. Therefore, the generalizability
of the results to other datasets and real-world scenarios may be limited. Another
limitation is that only four performance metrics were considered: accuracy, precision,
recall and F1 score. Other metrics such as AUC-ROC and the Matthews correlation
coefficient could have been used to provide a more comprehensive evaluation of the
algorithms. Lastly, the project did not consider the interpretability of the algorithms,
which is important in medical applications where explanations for the predictions are
required.

22
7.3 Future Enhancements
There are several possible future enhancements that could be considered for the
comparison of classification algorithms for heart disease prediction:
• Collecting and using a larger, more recent dataset to train and evaluate the
models.
• Incorporating additional performance metrics, such as AUC-ROC and the
Matthews correlation coefficient, for a more comprehensive evaluation.
• Evaluating other classification algorithms, such as Naive Bayes, Decision
Trees, or Neural Networks, to determine if they could potentially outperform
the current set of algorithms.
• Deploying the model as a mobile application to make it more accessible to the
general public.

23
REFERENCES

[1] World Health Organization, “Cardiovascular Diseases,”

https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)

[Accessed Dec. 13, 2022]

[2] “Frontiers in Cardiovascular Medicine,”

https://www.frontiersin.org/articles/10.3389/fcvm.2022.854287/full

[Accessed Dec. 13, 2022]

[3] “Project on Heart Disease Prediction Using Machine Leaning,”

https://www.projectpro.io/article/heart-disease-prediction-using-machine-
learning-project/615 [Accessed Dec. 13, 2022]

[4] M. Shouman, T. Turner and R. Stocker, "ijiet," 2012. [Online]. Available:

http://www.ijiet.org/papers/114-K [Accessed Nov. 9, 2020].

[5] K. Polaraju, "Prediction of Heart Disease using Multiple Linear Regression

Model," 2017.

[6] C. Beyene and P. Kamat, "Survey on Prediction and Analysis the Occurrence of

Heart Disease Using Data Mining Techniques," International Journal of Pure and

Applied Mathematics, 2018.

[7] "Heart Disease UCI," Kaggle, 2018. [Online]. Available:

https://www.kaggle.com/ronitf/heart-disease-uci [Accessed Jan. 5, 2021].

24
APPENDICES
Dataset Description
The dataset used in this project is the UCI Cleveland Heart Disease dataset, which
contains 1026 samples with 14 features. The features are as follows:
• age: age in years
• sex: sex (1 = male; 0 = female)
• cp: chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal
pain; 4 = asymptomatic)
• trestbps: resting blood pressure (in mm Hg on admission to the hospital)
• chol: serum cholesterol in mg/dl
• fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
• restecg: resting electrocardiographic results (0 = normal; 1 = having ST-T wave
abnormality; 2 = showing probable or definite left ventricular hypertrophy by
Estes' criteria)
• thalach: maximum heart rate achieved
• exang: exercise induced angina (1 = yes; 0 = no)
• oldpeak: ST depression induced by exercise relative to rest
• slope: the slope of the peak exercise ST segment (1 = upsloping; 2 = flat; 3 =
downsloping)
• ca: number of major vessels (0-3) colored by fluoroscopy
• thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
• target: result (1=true; 0=false)
The target variable is the presence of heart disease, with a value of 0 indicating the
absence of heart disease and a value of 1 indicating the presence of heart disease.

25
Model Evaluation Results
Model Accuracy Precision Recall F1 Score
KNN 0.74 0.74 0.77 0.75
Random Forest 0.96 0.97 0.96 0.97
SVM 0.67 0.66 0.75 0.70
XGBoost 0.98 1.00 0.97 0.99

26
