The document summarizes previous work on using machine learning to predict heart disease. It discusses several papers that tested different classification algorithms like logistic regression, random forest, SVM, and neural networks on heart disease datasets. Many of the papers found that random forest and neural networks achieved the highest prediction accuracies of over 85%. Feature selection was often used to increase accuracy. The document also reviews different optimization algorithms, activation functions, and their effects on neural network performance for heart disease prediction.

Uploaded by

Rishi rao Kulakarni

THE FINAL REVIEW

A WEBSITE TO PREDICT HEART DISEASE USING MACHINE LEARNING

Done by:
SIDDU BALA MALLIKARJUN REDDY
19bit0006
List of contents:

INTRODUCTION
LITERATURE SURVEY
REQUIREMENTS
ANALYSIS & DESIGN
IMPLEMENTATION & TESTING
RESULTS
CONCLUSIONS AND FUTURE WORK
REFERENCES
1
ABSTRACT

Heart disease is a major cause of death worldwide. The ability to accurately
predict the risk of heart disease can help individuals take preventive measures
and lead a healthier life. Machine learning (ML) algorithms have shown
promising results in predicting heart disease risk. In this paper, we explore the
use of ML techniques for heart disease prediction. We use a dataset containing
clinical and demographic information of patients to train and evaluate various
ML models. We experiment with several classification algorithms such as
logistic regression, random forest, and support vector machines to predict the
risk of heart disease. Our results show that ML models can accurately predict
the risk of heart disease, with an accuracy of up to 90%. The proposed
approach can be a useful tool for healthcare professionals to identify high-risk
individuals and provide early interventions.
INTRODUCTION

BACKGROUND:
Machine learning, an integral part of artificial intelligence, has begun to
penetrate various industries, and healthcare is an obvious one. Current work in
this field focuses on algorithms that reliably predict the presence or absence
of lung cancer, heart disease (HD), and other ailments. Such predictions, if
made ahead of time, can provide valuable insights to clinicians, allowing
them to tailor their diagnosis and treatment to the individual patient. At
present the healthcare industry collects vast amounts of data, but not all of
it is mined to uncover hidden patterns and support effective decisions. As a
result, projections can vary widely from the true value.
MOTIVATION

In the United States and many other developed countries, cardiovascular
diseases account for a large share of deaths, and in many countries heart
disease is the leading cause of death. Among the many types of heart disease,
coronary heart disease leads to the highest number of deaths. These diseases
often strike suddenly or, in most cases, are diagnosed only at a late stage,
when patients and doctors can do little to cure them. We therefore came up
with the idea of creating a website with a good UI and more accurate
prediction of these diseases, so that users can recognize the disease at an
early stage and take measures accordingly. Technology should be used not only
for business but also to improve people's lives.
PROJECT STATEMENT

At the end of the day, it is our health that matters, and being proactive is
the best way to take care of it. Heart-related problems in particular tend to
occur suddenly and can be severe, and various health factors contribute to
their occurrence. Our project is a website that predicts the probability of
coronary heart disease occurrence. A dataset with attributes that contribute
to heart problems has been considered. Various ML models are trained and
tested on the pre-processed data, and a comparative analysis of the algorithms
has been made. The ML model is implemented at the backend, and a Flask server
connects the frontend and the backend. Input features on the website are
selected based on their impact on the accuracy of the model. Validation of the
data the user provides is implemented using JavaScript. Ensuring that valid
data is entered helps reduce outliers and increases the accuracy of the
product. A website like this keeps us informed about our health condition and
helps us change our lifestyle and habits to improve our health.
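The slides state that input validation is done in JavaScript on the frontend. As a rough illustration of the same idea on the Python side, the following sketch rejects missing, non-numeric, or out-of-range values before they reach a model. The field names and ranges are assumptions made for this example and are not taken from the project.

```python
# Hypothetical field names and plausible ranges; the source does not list them.
VALID_RANGES = {
    "age": (18, 120),
    "systolic_bp": (60, 300),
    "bmi": (10, 80),
    "heart_rate": (20, 250),
}


def validate_inputs(form):
    """Return a dict of field -> error message; an empty dict means valid input."""
    errors = {}
    for field, (low, high) in VALID_RANGES.items():
        raw = form.get(field)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # Missing fields (None) and non-numeric strings end up here.
            errors[field] = "must be a number"
            continue
        if not low <= value <= high:
            errors[field] = f"must be between {low} and {high}"
    return errors


print(validate_inputs({"age": "500", "systolic_bp": "120", "bmi": "24.5"}))
```

Checking ranges on the server as well as in the browser is a common precaution, since client-side checks can be bypassed.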
OBJECTIVE

The objective of this project is to develop a website on which users can
provide data on health factors such as blood pressure levels and medication,
smoking/drinking habits, body mass index, heart rate, anxiety, yellow fingers,
and other features that play an important role in the prediction of heart
disease, along with other factors that are common nowadays. The data collected
is used to predict the “10 Year Risk” of the occurrence of heart disease.
SCOPE OF THE PROJECT

The scope of the project "A Website to Predict Coronary Heart Disease
Occurrence Using ML in real life" would involve designing and developing a
website that utilizes machine learning algorithms to predict the occurrence of
coronary heart disease in real life. The website would need to collect relevant
data from users, including personal and medical information, and then use this
data to train a machine learning model that can accurately predict the
likelihood of developing coronary heart disease. The final product could be
used by individuals who are concerned about their risk of developing coronary
heart disease, as well as healthcare professionals who could use the
predictions to guide treatment and prevention strategies. The project has the
potential to make a significant impact in the field of healthcare and could help
reduce the incidence of coronary heart disease.
2
SUMMARY OF THE EXISTING WORKS

[1] Here, various classifiers are applied to a heart disease (HD) dataset to find the most
accurate classifier for the dataset, and they are compared based on accuracy score. As there
are many attributes, the authors reduced their number and prioritized them. The algorithms
used to analyze the dataset and predict the disease are KNN, SVM, AdaBoost, SGD and
Decision Table (DT) classifiers.

[2] This paper gives a clear explanation of pre-processing an unbalanced dataset, training
machine learning models on it, and predicting the risk of occurrence of coronary heart
disease. The Random Forest algorithm scored 96.80%, the highest among the algorithms
compared. After a comparative analysis of three supervised ML algorithms, the K-Fold cross
validation technique is carried out to create randomness in the data.

[3] Heart disorder occurrence is predicted by applying several algorithms, and the ROC
curve is used to validate these methods. Logistic Regression achieved the highest accuracy.
To make sure that the model works for diverse datasets, it should be trained and tested
on high-dimensional datasets.
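To make the K-Fold cross validation mentioned in [2] concrete, here is a minimal sketch of how sample indices can be split into k train/test folds. It uses only the standard library; the function name and the fixed seed are choices made for this example, not details from the cited paper.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so the folds are random
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(10, 5):
    print("train:", sorted(train), "test:", sorted(test))
```

Each sample lands in the test set exactly once, so every model is evaluated on data it did not see during training.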

[4] In this paper, a Support Vector classifier and a KNN classifier are first applied
together, giving 85% accuracy. Following this, a combination of a neural network and
a Naïve Bayes classifier is applied. To improve the accuracy of the model, Associative
classification is applied, as its output is an association of various models. The paper
reports that Associative classification combined with a Naïve Bayes classifier, decision
tree and neural network is more reliable and can also handle unstructured data.

[5] Classifiers applied together give 85% accuracy. Following this, a combination of a
neural network and a Naïve Bayes classifier is applied. To improve the accuracy of the
model, Associative classification is applied, as its output is an association of various
models. The paper reports that Associative classification combined with a Naïve Bayes
classifier, decision tree and neural network is more reliable and can also handle
unstructured data. To make sure that the model works for diverse datasets, it should be
trained and tested on high-dimensional datasets.

[6] This paper surveys several research papers on the prediction of cardiovascular
diseases using Data Mining, ML and DL techniques. Feature selection is used to increase
accuracy in many of the classification algorithms. When feature selection is applied,
greedy sequential forward and backward selection is used to reduce the search space. The
authors also list the algorithms and their accuracies in a table. Artificial Neural
Network, regression, classification and clustering techniques are discussed.

[7] This paper applies machine learning to predict cardiovascular disease in patients
who are undergoing dialysis. Many ML algorithms are trained and tested on American and
Italian datasets. However, as the Italian dataset is biased, the prediction results might
differ in accuracy.

[8] This paper proposes an approach for predicting occurrence in which various optimization
algorithms and weight initialization techniques are analyzed and their accuracy levels are
compared. In the neural network, activation functions such as ReLU are used. A comparative
analysis of ReLU combined with various optimization algorithms such as Adam and Adagrad is
carried out. The Adagrad optimization algorithm with ReLU showed 85% accuracy, the highest
among the combinations compared.
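As a rough illustration of the Adagrad optimizer discussed in [8], the sketch below applies the standard Adagrad update to a one-dimensional toy function. The objective, learning rate and step count are assumptions chosen for the example and are unrelated to the cited experiments.

```python
import math


def adagrad_minimize(grad, x0, lr=0.5, steps=500, eps=1e-8):
    """Minimize a 1-D function, given its gradient, with the Adagrad update."""
    x = x0
    grad_sq_sum = 0.0  # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += g * g
        # The effective step size shrinks as past gradients accumulate.
        x -= lr * g / (math.sqrt(grad_sq_sum) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adagrad_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

In a neural network the same update is applied to each weight separately, so weights that receive large gradients take smaller steps over time.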

[9] In this paper, a limited dataset is used. The authors discuss how every algorithm
used works and why it was chosen for this dataset. The Artificial Neural Network consists
of 3 layers, and an activation function is applied in the hidden layer. The ReLU
activation function is applied to predict the target label. The ANN achieved the highest
accuracy, 85%, compared to the other algorithms.

[10] This paper applies machine learning classification techniques to obtain accurate
results, which in turn helps the medical industry detect heart disease faster. The
authors implemented “Deep Neural Network” classifiers, analyzing various optimization
algorithms and weight initialization techniques and comparing their accuracy levels. In
the neural network, activation functions such as ReLU are used. A comparative analysis of
ReLU combined with various optimization algorithms such as Adam and Adagrad is carried
out. The Adagrad optimization algorithm with ReLU showed 85% accuracy, the highest among
the combinations compared.
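The 3-layer network with a ReLU hidden layer described in [9] can be pictured with a small forward pass. This is only a sketch: the weights are made up, and the sigmoid output unit is an assumption, since the slides do not say which output activation the cited papers used.

```python
import math


def relu(z):
    """ReLU activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one ReLU hidden layer -> sigmoid output probability."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


# Made-up weights: 2 inputs, 2 hidden units, 1 output.
p = forward([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [1.0, 1.0], 0.0)
print(p)
```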
CHALLENGES PRESENT IN EXISTING SYSTEM

Here are some challenges present in the existing system for a website to predict coronary heart disease
occurrence using ML:
1. Limited availability of data: One of the main challenges in developing a website to predict coronary heart
disease occurrence using ML is the limited availability of high-quality data. Machine learning algorithms
require a large amount of accurate and diverse data to make accurate predictions.
2. Data quality: The quality of the data used to train the machine learning models is essential for accurate
predictions. The data collected from different sources may have errors, missing values, and inconsistencies that
can affect the accuracy of the model.
3. Interpretability: The interpretability of machine learning models is a significant concern in the healthcare
industry. The ability to understand how a model arrives at its predictions is crucial to building trust in the
model's recommendations.
4. Legal and ethical considerations: Collecting and using sensitive medical data comes with legal and ethical
considerations. Websites that collect personal data are required to comply with data privacy laws and
regulations, and healthcare data has additional protections due to its sensitive nature.
5. Bias in the data: Machine learning algorithms can amplify biases present in the data. If the training data is
biased, the machine learning model will learn and perpetuate that bias. This could lead to inaccurate predictions
for certain populations, such as minorities or underrepresented groups.
6. Limited user adoption: Developing a website to predict coronary heart disease occurrence using ML may not
be sufficient if users are not adopting it. Encouraging users to provide accurate and complete data can be
challenging. Additionally, if the website is not user-friendly or accessible, users may not use it at all.
3
HARDWARE REQUIREMENTS

• Laptop
• Internet/Wi-Fi Hotspot
• i3 Processor Based Computer or higher

SOFTWARE REQUIREMENTS:
• Front-End : HTML, CSS, BOOTSTRAP
• Back-End : MYSQL
• ML model training : PYTHON – Scikit-Learn/Keras (Google Colab)
• ML model Deployment : Flask
GANTT CHART
4
ANALYSIS & DESIGN
PROPOSED METHODOLOGY

At the end of the day, it is our health that matters, and being proactive
is the best way to take care of it. Heart-related problems in particular
tend to occur suddenly and can be severe, and various health factors
contribute to their occurrence. Our project is a website that predicts the
probability of coronary heart disease occurrence. Early detection of
common diseases such as diabetes, heart disease and lung cancer may help
control them and reduce the likelihood of a fatal outcome. As machine
learning and artificial intelligence progress, this is achieved by using
several classifiers and clustering algorithms. This work presents a
machine learning approach for the prevention of coronary heart disease,
which for many people is the leading cause of death. We would also like
to apply some ensemble methods in this prediction.
SYSTEM ARCHITECTURE
MODULE DESCRIPTIONS

ALGORITHMS USED:
1. SUPPORT VECTOR CLASSIFIER: This classifier works better on small datasets than on
large ones. The data is divided into 2 classes. The goal is to find a hyperplane with the
maximum margin from the nearest data points of the 2 classes, where the margin is the
distance between a data point and the hyperplane. SVM is a good choice for problems that
come down to separating the data into such subsets.

2. RANDOM FOREST CLASSIFIER: Random forest is a supervised learning algorithm. It can be
used for both regression and classification. As the name suggests, a “Forest” is made up
of trees: the more trees, the denser and more robust the forest. This classifier builds
trees called “Decision Trees” on samples of the data, and the result of every tree is
considered. The result with the majority of votes is treated as the best solution. Random
forest is widely applied in recommendation engines, image classification and feature
selection.

3. GRADIENT BOOSTING CLASSIFIER: Gradient boosting can be applied to both regression
and classification problems. It generally produces an ensemble of weak hypotheses and
mostly tries to minimize the cost function produced by the decision trees.

4. XGBOOST CLASSIFIER: Extreme Gradient Boosting is an algorithm that is highly
efficient and provides parallel tree boosting. The main objective of gradient boosting
is to minimize the loss function by adding weak learners using a gradient descent
optimization algorithm.

5. LOGISTIC REGRESSION: It is an algorithm that estimates the probability of an event
occurring. The outcome variable is categorical, here a binary outcome with 2 classes.
The logit link function is used to fit the data values.
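The core computations behind three of the algorithms above can be sketched in a few lines each: the point-to-hyperplane distance used for the margin (1), the majority vote of a random forest (2), and the probability given by the inverse of the logit link (5). This is a minimal illustration with made-up inputs and helper names chosen for the example; in the project the models come from Scikit-Learn rather than hand-written code.

```python
import math
from collections import Counter


def margin_distance(w, b, x):
    """Distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def majority_vote(tree_predictions):
    """Class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def logistic_probability(weights, bias, x):
    """P(y = 1 | x), obtained by inverting the logit link."""
    z = sum(wi * xi for wi, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


print(margin_distance([3.0, 4.0], -5.0, [3.0, 4.0]))
print(majority_vote([1, 0, 1, 1, 0]))
print(logistic_probability([0.0, 0.0], 0.0, [1.0, 2.0]))
```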
5
DATA SET

We use “pandas”, a data analysis library, in our project to load the dataset
for further processing. Collecting the dataset is the primary task. It is very
important to collect a dataset containing credible, diverse, and large amounts
of data. As we give this data to the machine to learn from, and it predicts
future inputs based on what it has learned, ensuring the quality of the data is
very important. We collected a dataset with a certain number of samples. The
training dataset contains various attributes, ranging from basic attributes
such as age, gender and education to more specific health attributes such as
diabetes, heart rate, total cholesterol level, systolic blood pressure,
diastolic blood pressure, cigarettes per day, etc. Normally the number of these
attributes varies from one dataset to another. The chosen dataset contains all
types of factors, from small to big, that influence coronary heart disease.
SAMPLE CODE

DATA PRE-PROCESSING:
It is an important process, as it ensures that valid data is given to the
machine to learn from. To ensure that the machine learns from quality data, we
need to clean the data: we check whether there are null values or values that
are impossible for an attribute to have (called outliers), for example, a
person's age of 500. Such values are replaced with other values such as the
mean, median or mode, which balances the dataset and helps keep the result
unbiased. As the dataset contains attributes with imbalanced values, we need to
fill in the null values and make sure that no gaps are left in the dataset, so
that the values stay close to reality. We also had to do ‘Feature Scaling’
using the ‘Standardization’ technique, which rescales the values so that they
have a distribution with mean equal to 0 and variance equal to 1.
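The two steps described above, filling missing values and standardizing, can be sketched with the standard library alone. The values are made up for illustration, and the project itself lists Scikit-Learn, so treat this as an outline of the arithmetic rather than the project's code.

```python
from statistics import mean, pstdev


def impute_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to mean 0 and variance 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


heart_rates = [60.0, None, 80.0, 100.0]  # made-up values for illustration
clean = impute_with_mean(heart_rates)
scaled = standardize(clean)
print(clean)
print(scaled)
```

The population standard deviation is used here so that the scaled values have variance exactly 1, as the text describes.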
Checking if there are any Null values in data:
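The code from the original slide is not preserved in this text. A minimal sketch of such a check with pandas might look like the following; the rows and column names are made up for this example and stand in for the real dataset.

```python
import io

import pandas as pd

# Made-up rows standing in for the real dataset; column names are assumptions.
csv_text = """age,totChol,heartRate
39,195,80
46,,95
48,245,
"""
df = pd.read_csv(io.StringIO(csv_text))

null_counts = df.isnull().sum()  # number of missing values per column
print(null_counts)
print("Any nulls:", bool(df.isnull().values.any()))
```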
