Project Report
Submitted by:
Jigeishu Srivastava
(2201100220016)
PRAYAGRAJ
Session: 2024-25
Abstract
In this digital world, data is an asset, and enormous amounts of data are generated in every field. Data in the healthcare
industry consists of all the information related to patients. Nowadays, cardiovascular diseases are growing rapidly
due to busy and stressful lifestyles. All age groups are under threat from these chronic diseases, so there
is a need to detect them from symptoms or reports. Early identification and treatment are the
best available options for affected people. The main objective behind developing this system is to help doctors
cross-verify their diagnostic results, which offers a promising way to reduce existing death rates.
Recent developments in medical supportive technologies based on data mining and deep learning play an
important role in detecting cardiovascular diseases by taking many factors into consideration,
such as age, type of chest pain, blood pressure, cholesterol levels, etc.
The proposed work tries to implement a promising solution for the detection of heart disease. The given
heart disease prediction system enhances medical care and reduces its cost. This project gives us significant
knowledge that can help us predict patients with heart disease.
Acknowledgement
I would like to express my deepest gratitude to Dr. Alka Verma, who provided invaluable
guidance and support throughout the development of this mini project. Her expertise and
encouragement have been instrumental in the successful completion of this work.
I am also grateful to my peers and family for their constant support and motivation during this
project. Their belief in my abilities has been a source of strength and inspiration.
Finally, I extend my thanks to the Institute of Engineering and Rural Technology, Prayagraj, for
providing the resources and environment necessary for learning and experimentation, which
significantly contributed to the completion of this project.
Contents
1. Introduction
1.1 Preface
1.2 Motivation
1.3 Problem Statement
1.4 Objectives
1.5 Scope and Limitations
1.6 Organization of Project
2. Literature Review
3. Methodology
3.1 System Architecture
3.2 Dataset Details
3.3 Machine Learning
3.3.1 Supervised Machine Learning
3.3.2 Unsupervised Machine Learning
3.4 Supervised Algorithms
3.4.1 Random Forest
3.4.2 K-Nearest Neighbour
3.4.3 Logistic Regression
3.4.4 XGBoost
4. Implementation
4.1 Existing System
4.2 Proposed System
4.2.1 Data Collection
4.2.2 Data Pre-Processing
4.2.3 Feature Selection
4.2.4 Model Selection
5. Results
5.1 Hardware Platform Used
5.2 Libraries and Software Platform Used
5.3 Visualization Results
6. Conclusion
1. INTRODUCTION
1.1. PREFACE
Machine Learning is a powerful tool that enables us to extract valuable information from data that
was previously unknown or implicit. The domain of machine learning is extensive and multifaceted;
it encompasses supervised, unsupervised, and ensemble learning approaches that can be employed to
make predictions and assess their precision on a particular dataset. The adoption of machine
learning is increasing day by day, and it has the potential to revolutionize many fields,
including healthcare. Cardiovascular disease (CVD) is an area of healthcare that can gain significantly
from the application of machine learning techniques. With 17.9 million fatalities globally, as per
the World Health Organization, CVD is currently the primary cause of death in adults. To help
address this problem, our project aims to predict which patients are likely to be diagnosed with CVD
based on their medical history. By recognizing patients who exhibit symptoms, for example chest pain
or elevated blood pressure, we can help diagnose the illness with fewer medical examinations and
provide more efficient treatment. Our project focuses on three data mining techniques: XGBoost,
KNN, and the Random Forest Classifier. By using these techniques in combination, we are able to
achieve an accuracy rate of above 95%, which is better than previous systems that relied on only one
data mining technique. The objective of our project is to examine patients' medical
characteristics, such as age, gender, fasting sugar levels, chest pain, and more, and predict
whether a person is likely to have heart disease or not.
To accomplish this, we selected a dataset from the Kaggle repository. This dataset was created by
combining different datasets that were already available independently but had not been combined before,
and it contains the medical history and characteristics of each patient. We trained our algorithms using the 12 medical
attributes of each patient and used XGBoost, Random Forest and KNN to classify the patients based
on their medical history. We found that XGBoost was the most efficient algorithm.
1.2 MOTIVATION
The main motivation for this work is to present a heart disease prediction model for predicting the
occurrence of heart disease. Further, this work aims to identify the best classification algorithm
for detecting the possibility of heart disease in a patient. This is done through a comparative study
and analysis of classification algorithms, namely XGBoost, Logistic Regression, KNN, and Random Forest,
at different levels of evaluation. Although these are commonly used machine learning algorithms,
heart disease prediction is a vital task that demands the highest possible accuracy. Hence, these
algorithms are evaluated using numerous levels and types of evaluation strategies.
1.3 PROBLEM STATEMENT
The major challenge in heart disease is its detection. Instruments that can predict heart disease
are available, but they are either expensive or not efficient at estimating the chance of heart
disease in a person. Early detection of cardiac disease can decrease the mortality rate and overall
complications. However, it is not possible to monitor patients accurately every day in all cases,
and round-the-clock consultation by a doctor is not available, since it requires more attention, time
and expertise. Since a large amount of data is available today, various machine learning algorithms
can be used to analyse this data for hidden patterns, and these hidden patterns can be used for
health diagnosis from medical data.
1.4 OBJECTIVES
3. To determine significant risk factors, based on the medical dataset, which may lead to heart disease.
1.5 SCOPE AND LIMITATIONS
1. The system will help identify the important factors that lead to heart disease.
3. It will help patients obtain results quickly and be diagnosed as early as possible.
2. LITERATURE REVIEW
Bo Jin, Chao Che et al. (2018) proposed a “Predicting the Risk of Heart Failure with EHR Sequential Data
Modeling” model designed by applying neural networks. The paper used electronic health record (EHR) data
from real-world datasets related to congestive heart disease to perform the experiment and to predict heart
disease in advance. The authors used one-hot encoding and word vectors to model the diagnosis events and
predicted heart failure events using the basic principles of a long short-term memory (LSTM) network model.
By analysing the results, they reveal the importance of respecting the sequential nature of clinical records.
Aakash Chauhan et al. (2018) presented “Heart Disease Prediction using Evolutionary Rule Learning”. This
study eliminates the manual task and also helps in extracting the information directly from the
electronic records. To generate strong association rules, frequent pattern growth association mining was
applied to the patients' dataset. This helps in decreasing the number of services and shows that the
overwhelming majority of the rules help in the best prediction of coronary disease.
Ashir Javeed, Shijie Zhou et al. (2017) designed “An Intelligent Learning System based on Random Search
Algorithm and Optimized Random Forest Model for Improved Heart Disease Detection”. This paper uses a
random search algorithm (RSA) for feature selection and a random forest model for diagnosing cardiovascular
disease. The model is optimized using a grid search algorithm. Two forms of experiments are used for
cardiovascular disease prediction: in the first, only the random forest model is developed, and in the
second, the proposed random search algorithm based random forest model is developed. This methodology is
efficient and less complex than the conventional random forest model, producing 3.3% higher accuracy than
the conventional random forest. The proposed learning system can help physicians improve the quality of
heart failure detection.
“Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques”, proposed by Senthilkumar
Mohan, Chandrasegar Thirumalai et al. (2019), presented an efficient technique using a hybrid machine
learning methodology. The hybrid approach is a combination of random forest and a linear method. The
dataset and subsets of attributes were collected for prediction, and a subset of attributes was chosen
from the pre-processed cardiovascular disease dataset. After pre-processing, the hybrid technique was
applied to diagnose cardiovascular disease.
K. Prasanna Lakshmi and Dr. C. R. K. Reddy (2015) designed “Fast Rule-Based Heart Disease Prediction using
Associative Classification Mining”. The proposed Stream Associative Classification Heart Disease Prediction
(SACHDP) system uses associative classification mining over a landmark window of data streams. The approach
has two phases: generating rules with associative classification mining, and then pruning the rules using
chi-square testing and arranging them in order to form a classifier.
M. Satish et al. (2015) used different data mining techniques such as rule-based methods, Decision Tree,
Naive Bayes, and Artificial Neural Networks. An efficient approach called pruning classification association
rule (PCAR) was used to generate association rules from a cardiovascular disease warehouse for the
prediction of heart disease. The heart disease data warehouse was pre-processed for mining, and all of the
above data mining techniques were described.
Lokanath Sarangi, Mihir Narayan Mohanty and Srikanta Pattnaik (2015), in “An Intelligent Decision Support
System for Cardiac Disease Detection”, designed a cost-efficient model using a genetic algorithm optimizer.
The weights were optimized and fed as input to the network. The hybrid technique of GA and neural networks
achieved an accuracy of 90%.
“Prediction and Diagnosis of Heart Disease by Data Mining Techniques”, by Boshra Bahrami and
Mirsaeid Hosseini Shirvani, uses various classification methodologies for diagnosing cardiovascular
disease. Classifiers such as KNN, SVM and Decision Tree are applied to the dataset. After classification
and performance evaluation, the Decision Tree is found to be the best classifier for cardiovascular
disease prediction from the dataset.
Mamatha Alex P and Shaicy P Shaji (2019) designed “Prediction and Diagnosis of Heart Disease Patients using
Data Mining Technique”. This paper uses Artificial Neural Networks, KNN, Random Forest and Support Vector
Machines, and among these classification techniques the Artificial Neural Network gives the highest
accuracy for diagnosing heart disease.
3. METHODOLOGY
3.1 SYSTEM ARCHITECTURE
The system architecture gives an overview of the working of the system: the data is collected and
pre-processed, the important attributes are selected, the classification models are trained on the
training data, and their accuracy is evaluated on the testing data.
3.2 DATASET DETAILS
Dataset Attributes
1. Age: age of the patient [years]
2. Sex: sex of the patient [M: Male, F: Female]
3. ChestPainType: chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-
Anginal Pain, ASY: Asymptomatic]
4. RestingBP: resting blood pressure [mm Hg]
5. Cholesterol: serum cholesterol [mg/dl]
6. FastingBS: fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
7. RestingECG: resting electrocardiogram results [Normal: Normal, ST: having ST-T wave
abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH:
showing probable or definite left ventricular hypertrophy by Estes' criteria]
8. MaxHR: maximum heart rate achieved [Numeric value between 60 and 202]
9. ExerciseAngina: exercise-induced angina [Y: Yes, N: No]
10. Oldpeak: ST depression induced by exercise relative to rest [Numeric value]
11. ST_Slope: the slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down:
downsloping]
12. HeartDisease: output class [1: heart disease, 0: Normal]
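A minimal loading sketch in Python (pandas) for the dataset described above is given below; the file name heart.csv and the exact column names are assumptions based on the attribute list and should be adjusted to match the downloaded Kaggle file.

import pandas as pd

# Load the Kaggle heart-disease dataset (918 rows, 12 columns expected).
df = pd.read_csv("heart.csv")
print(df.shape)                           # e.g. (918, 12)
print(df["HeartDisease"].value_counts())  # class balance of the output label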
3.3 MACHINE LEARNING
In machine learning, classification refers to a predictive modelling problem where a class label is
predicted for a given example of input data.
3.3.1 SUPERVISED MACHINE LEARNING
Supervised machine learning can be divided into two types of problems, which are given below:
a) Classification
b) Regression
a) Classification
Classification algorithms are used to solve classification problems, in which the output variable is
categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. Classification algorithms
predict the categories present in the dataset.
b) Regression
Regression algorithms are used to solve regression problems, in which the output variable is a
continuous numeric value; they model the relationship between the input variables and the output.
They are used to predict continuous quantities such as market trends, weather conditions, etc.
Some popular regression algorithms are given below:
• Lasso Regression
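As a brief illustration of the distinction, the hedged Python sketch below instantiates one classifier (categorical output) and one regressor (continuous output) from scikit-learn; the estimators and parameter values are generic examples rather than the project's final configuration.

from sklearn.linear_model import LogisticRegression, Lasso

clf = LogisticRegression(max_iter=1000)  # classification: predicts a class label such as 0/1
reg = Lasso(alpha=0.1)                   # regression: predicts a continuous numeric value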
3.3.2 UNSUPERVISED MACHINE LEARNING
Unsupervised learning is different from the supervised learning technique; as its name suggests,
there is no need for supervision. In unsupervised machine learning, the machine is trained
using an unlabeled dataset, and it predicts the output without any supervision.
In unsupervised learning, the models are trained with data that is neither classified nor
labeled, and the model acts on that data without any supervision. The main aim of an
unsupervised learning algorithm is to group or categorize the unsorted dataset according to
similarities, patterns, and differences; the machine is instructed to find the hidden patterns in
the input dataset.
3.4 SUPERVISED ALGORITHMS
3.4.1 RANDOM FOREST
Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both classification and regression problems in ML. It is based on
the concept of ensemble learning, which is the process of combining multiple classifiers to solve a
complex problem and improve the performance of the model.
As the name suggests, a random forest is a classifier that builds a number of decision trees on
various subsets of the given dataset and combines their outputs to improve the predictive accuracy
on that dataset. Instead of relying on one decision tree, the random forest takes the prediction from
each tree and predicts the final output based on the majority vote of those predictions. A greater
number of trees in the forest generally leads to higher accuracy and reduces the problem of
overfitting.
Since the random forest combines multiple trees to predict the class of the dataset, it is possible
that some decision trees predict the correct output while others do not; together, however, the
trees predict the correct output.
Therefore, two assumptions lead to a better Random Forest classifier:
1. There should be some actual values in the feature variables of the dataset so that the
classifier can predict accurate results rather than guessed results.
2. The predictions from the individual trees must have very low correlations.
Advantages:
• It enhances the accuracy of the model and reduces the overfitting issue.
Disadvantages:
• Although Random Forest can be used for both classification and regression tasks, it is less
suitable for regression tasks.
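The following is a minimal sketch of a Random Forest classifier using scikit-learn; the variables X_train, X_test, y_train and y_test are assumed to come from the pre-processing and splitting step described in Section 4.2.2, and the hyper-parameter values are illustrative rather than tuned.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Ensemble of decision trees; the final class is the majority vote of the trees.
rf = RandomForestClassifier(
    n_estimators=100,   # number of decision trees in the forest
    max_depth=None,     # let the trees grow fully; tune to control overfitting
    random_state=42,
)
rf.fit(X_train, y_train)
print(accuracy_score(y_test, rf.predict(X_test)))   # accuracy on the held-out test data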
3.4.2 K-NEAREST NEIGHBOUR
K-Nearest Neighbour is one of the simplest machine learning algorithms and is based on the supervised
learning technique. The K-NN algorithm assumes similarity between the new case and the available
cases and puts the new case into the category that is most similar to the available categories.
It stores all the available data and classifies a new data point based on similarity, which means
that when new data appears it can easily be classified into a well-suited category. KNN can be used
for regression as well as classification, but it is mostly used for classification problems. K-NN is
a non-parametric algorithm, which means it does not make any assumption about the underlying data.
It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead it stores the dataset and, at the time of classification, performs the
computation on the stored data.
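A hedged sketch of K-NN with scikit-learn is shown below. Because K-NN is distance based, a scaler is included so that all attributes contribute comparably; X_train, X_test, y_train and y_test are again assumed from the earlier split, and k = 5 is only an illustrative choice.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Lazy learner: stores the training data and classifies new points by their
# k nearest neighbours in the scaled feature space.
knn = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5),
)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # mean accuracy on the test data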
3.4.3 LOGISTIC REGRESSION
Logistic regression is one of the most popular machine learning algorithms and comes under the
supervised learning technique. It is used for predicting a categorical dependent variable from a
given set of independent variables. Because logistic regression predicts a categorical dependent
variable, the outcome must be a categorical or discrete value such as Yes or No, 0 or 1, True or
False; however, instead of giving the exact values 0 and 1, it gives probabilistic values that lie
between 0 and 1. Logistic regression is similar to linear regression except in how it is used:
linear regression is used for solving regression problems, whereas logistic regression is used for
solving classification problems. In logistic regression, instead of fitting a regression line, we
fit an "S"-shaped logistic (sigmoid) function, which maps any input to a value between the two
extremes 0 and 1. The curve from the logistic function indicates the likelihood of something, for
example whether cells are cancerous, or whether a mouse is obese based on its weight. Logistic
regression is a significant machine learning algorithm because it can provide probabilities and
classify new data using both continuous and discrete datasets.
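A minimal scikit-learn sketch of logistic regression is given below; the 0.5 decision threshold and the variable names X_train, X_test and y_train are assumptions for illustration.

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

probs = logreg.predict_proba(X_test)[:, 1]   # probability of heart disease for each patient
labels = (probs >= 0.5).astype(int)          # map the S-curve output to a 0/1 class label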
3.4.4 XGBOOST
One of the strengths of XGBoost is its built-in L1 and L2 regularization, which helps prevent
overfitting and makes it a regularized form of GBM. When using the Scikit Learn library, the alpha
and lambda hyper-parameters related to regularization are passed to XGBoost: alpha is used for
L1 regularization, while lambda is used for L2 regularization.
Another strength of XGBoost is its ability to leverage parallel processing to execute models
much faster than GBM. When using the Scikit Learn library, the nthread hyper-parameter is used
for parallel processing, representing the number of CPU cores to be used. If you want to use all
available cores, don't specify a value for nthread, and the algorithm will detect them
automatically.
XGBoost also has built-in capabilities to handle missing values. When the algorithm encounters
a missing value at a node, it tries both the left-hand and right-hand splits and learns the
direction that leads to the lower loss for that node. It then applies the same learned default
direction when working on the testing data.
Cross-validation is another feature of XGBoost: it allows the user to run cross-validation at
each iteration of the boosting process, making it easy to obtain the exact optimum number of
boosting iterations in a single run. This is unlike GBM, where a grid search must be run and only
a limited number of values can be tested.
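The hedged sketch below illustrates these XGBoost features with the xgboost Python package: L1/L2 regularization, parallel training, native handling of missing values, and built-in cross-validation. The hyper-parameter values are illustrative, and X_train and y_train are assumed from the earlier split.

import xgboost as xgb

# Regularized, parallel gradient-boosted trees.
model = xgb.XGBClassifier(
    n_estimators=200,
    reg_alpha=0.1,    # L1 regularization (the "alpha" discussed above)
    reg_lambda=1.0,   # L2 regularization (the "lambda" discussed above)
    n_jobs=-1,        # use all available CPU cores
)
model.fit(X_train, y_train)   # missing feature values are handled natively

# Built-in cross-validation to choose the number of boosting rounds.
params = {"objective": "binary:logistic", "eval_metric": "logloss",
          "reg_alpha": 0.1, "reg_lambda": 1.0}
dtrain = xgb.DMatrix(X_train, label=y_train)
cv_results = xgb.cv(params, dtrain, num_boost_round=200, nfold=5,
                    early_stopping_rounds=10)
print(len(cv_results))        # boosting rounds kept after early stopping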
4. IMPLEMENTATION
Heart disease is often described as a silent killer, since it can lead to death without obvious
symptoms. The nature of the disease is the cause of growing anxiety about the disease and its
consequences. Hence, continued efforts are being made to predict the possibility of this deadly
disease in advance, and various tools and techniques are regularly being experimented with to suit
present-day health needs. Machine learning techniques can be a boon in this regard. Even though
heart disease can occur in different forms, there is a common set of core risk factors that
influence whether someone will ultimately be at risk for heart disease or not. By collecting data
from various sources, classifying it under suitable headings, and finally analysing it to extract
the desired information, conclusions can be drawn. This approach can be adapted very well to the
prediction of heart disease. As the well-known quote says, "Prevention is better than cure":
early prediction and control can help prevent and decrease the death rates due to heart disease.
4.2 PROPOSED SYSTEM
The working of the system starts with the collection of data and the selection of the important
attributes. The required data is then pre-processed into the required format and divided into two
parts, training and testing data. The algorithms are applied and the model is trained using the
training data; the accuracy of the system is obtained by testing it on the testing data. The system
is implemented using the following modules:
1. Data Collection
2. Data Pre-Processing
3. Feature Selection
4. Model Selection
4.2.1 DATA COLLECTION
Data collection is the primary and most crucial step when applying machine learning and
analytics. The data required in this project is the patient's medical data. We collected the
dataset from Kaggle, which includes all the information required for prediction. The features the
dataset includes are medical information such as age, sex, chest pain type, resting blood
pressure, cholesterol, fasting blood sugar, oldpeak, etc. The dataset consists of 918 observations
with 12 attributes.
4.2.2 DATA PRE-PROCESSING
Data pre-processing is one of the most crucial tasks in the analytics process; it is often observed
that more than half of the total time of the analytics process is taken by the pre-processing
phase. It is an important step in the creation of a machine learning model. Initially, the data may
not be clean or in the format required by the model, which can cause misleading outcomes. In data
pre-processing we transform the data into the required format and deal with noise, duplicates, and
missing values in the dataset. Data pre-processing includes activities such as importing the
dataset, splitting the dataset, attribute scaling, etc. Pre-processing of data is required to
improve the accuracy of the model.
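A brief sketch of these pre-processing activities in Python is given below; the column names follow the dataset attributes listed in Section 3.2, and the 80/20 split ratio and random seed are assumptions.

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")
df = df.drop_duplicates()   # remove duplicate records

# One-hot encode the categorical attributes so the models receive numeric input.
categorical = ["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]
df = pd.get_dummies(df, columns=categorical, drop_first=True)

# Split into training and testing data.
X = df.drop(columns=["HeartDisease"])
y = df["HeartDisease"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)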
4.2.3 FEATURE SELECTION
Once we have the required data, the next step is feature selection. It often happens that some
features do not contribute to the evaluation or even have a negative impact on the accuracy.
Feature selection is the step where we try to reduce the number of features and, where useful,
create new features from the existing ones; these new features should summarize the information
contained in the existing features. The final features to be considered for prediction can be
identified using a correlation matrix of the attributes, as sketched below:
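One common way to inspect the correlation matrix is a heat-map; the sketch below assumes the encoded DataFrame df from the pre-processing step and uses seaborn purely for illustration.

import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr(numeric_only=True)            # pairwise correlations of all attributes
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation matrix of dataset attributes")
plt.show()

# Attributes with near-zero correlation to HeartDisease are candidates to drop.
print(corr["HeartDisease"].sort_values(ascending=False))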
4.2.4 MODEL SELECTION
Model selection is the process of selecting one final algorithm for the given purpose. It is
decided by observing the accuracy obtained by applying multiple algorithms; we can use logistic
regression, XGBoost, KNN, random forest, etc. The final accuracy depends on the type of model we
select. A comparative analysis is performed among the algorithms, and the algorithm that gives the
highest accuracy is used for heart disease prediction.
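A comparative-analysis sketch is shown below: each candidate model is trained, its test-set accuracy is reported, and the best one is kept. The model settings are illustrative, and X_train, X_test, y_train and y_test are assumed from the pre-processing step.

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "XGBoost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")

best = max(scores, key=scores.get)   # algorithm selected for heart disease prediction
print("Selected model:", best)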
5. RESULTS
5.1 HARDWARE PLATFORM USED
The hardware requirement may serve as the basis for a contract for the implementation of the
system and should therefore be complete and consistent in its specification.
The hardware used for the system is mentioned below.
It should be noted that the better the hardware facilities available, the better the response time
of the system.
5.2 LIBRARIES AND SOFTWARE PLATFORM USED
The software requirement document is the specification of the system; it provides a basis for
creating the software requirements specification.
OPERATING SYSTEM: Windows
Based on the findings obtained from the various algorithms used for identifying patients diagnosed
with heart disease, it is observed that Logistic Regression, Random Forest Classifier, and KNN
provided better results compared to other techniques such as XGBoost, SVM, and Decision Tree, as
well as compared to the algorithms used in previous research studies. The highest accuracy achieved
by Random Forest and Logistic Regression is greater than or nearly equal to the accuracy obtained
in earlier research studies. It can be inferred that the improvement in accuracy is due to the
increased number of attributes used from the medical dataset in this project.
Additionally, the study revealed that Logistic Regression and Random Forest outperform KNN in
detecting patients who may have heart disease, indicating that Logistic Regression and the Random
Forest Classifier are more effective in diagnosing heart disease.
In this project, the data was prepared in different formulations and the model was trained using
the Logistic Regression algorithm, achieving above 87% accuracy.
6. CONCLUSION
Cardiovascular disease (CVD) is one of the leading causes of death worldwide, making early
detection and intervention crucial for improving patient outcomes. To address this need, machine
learning techniques were used to develop a model that uses patient medical history data to predict
the probability of heart disease. The dataset includes variables such as chest pain, blood sugar
levels, and blood pressure, which are important indicators of heart health.
The classification algorithms Logistic Regression, Random Forest Classifier, and KNN were utilized
to develop the model, which achieved an accuracy rate of over 87%. The accuracy of the model was
further improved by increasing the size of the dataset, enabling the identification of more subtle
patterns and risk factors.
The application of machine learning techniques in medical diagnosis has several benefits, including
increased speed and accuracy of diagnoses, reduced costs, and improved patient outcomes. By
analyzing large amounts of data and identifying complex patterns, machine learning algorithms can
provide valuable insights into patient health that may not be immediately apparent to human
clinicians.