
Bala

The document summarizes previous work on using machine learning to predict heart disease. It discusses several papers that tested different classification algorithms like logistic regression, random forest, SVM, and neural networks on heart disease datasets. Many of the papers found that random forest and neural networks achieved the highest prediction accuracies of over 85%. Feature selection was often used to increase accuracy. The document also reviews different optimization algorithms, activation functions, and their effects on neural network performance for heart disease prediction.

THE FINAL REVIEW

A WEBSITE TO PREDICT HEART DISEASE USING MACHINE LEARNING
Done by:
SIDDU BALA MALLIKARJUN REDDY
19bit0006
List of contents:

INTRODUCTION
LITERATURE SURVEY
REQUIREMENTS
ANALYSIS & DESIGN
IMPLEMENTATION & TESTING
RESULTS
CONCLUSIONS AND FUTURE WORK
REFERENCES
1
ABSTRACT

Heart disease is a major cause of death worldwide. The ability to accurately
predict the risk of heart disease can help individuals take preventive measures
and lead a healthier life. Machine learning (ML) algorithms have shown
promising results in predicting heart disease risk. In this paper, we explore the
use of ML techniques for heart disease prediction. We use a dataset containing
clinical and demographic information of patients to train and evaluate various
ML models. We experiment with several classification algorithms such as
logistic regression, random forest, and support vector machines to predict the
risk of heart disease. Our results show that ML models can accurately predict
the risk of heart disease, with an accuracy of up to 90%. The proposed
approach can be a useful tool for healthcare professionals to identify high-risk
individuals and provide early interventions.
INTRODUCTION

BACKGROUND:
Machine Learning, an integral part of Artificial Intelligence, has begun
penetrating various industries, and healthcare is an obvious one among them.
Currently, this field is working on algorithms that reliably predict the
presence or absence of lung cancer, heart disease (HD), and other ailments.
Such predictions, if made ahead of time, can provide valuable insights to
clinicians, allowing them to tailor their diagnosis and treatment to the
individual patient. At present, the healthcare industry collects vast amounts
of data, but not all of it is mined to uncover hidden patterns and support
effective decisions. As a result, projections can deviate widely from the
true values.
MOTIVATION

In the United States and many other developed countries, cardiovascular
diseases are a leading cause of death. Among the many types of heart disease,
coronary heart disease leads to the highest number of deaths. These diseases
often occur suddenly or are diagnosed only at a late stage, when patients and
doctors can do little to cure them. So, we came up with this project idea of
creating a website with a good UI and more accurate prediction of these
diseases, so that users can recognize the disease at an early stage and take
measures accordingly. Technology should be used not only for business but
also for the better living of people.
PROJECT STATEMENT

At the end of the day, it is our health that matters. Being proactive is the
best approach when it comes to taking care of our health. Heart-related
problems are mostly the ones that occur suddenly, and they can sometimes be
severe. Various health factors contribute to the occurrence of the disease.
Our project is a website that predicts the probability of coronary heart
disease occurrence. A dataset with attributes that contribute to heart
problems has been considered. Various ML models are trained and tested on the
pre-processed data, and a comparative analysis of the algorithms has been
made. The ML model runs at the backend, and a Flask server connects the
frontend and the backend. Input features on the website are selected based on
their impact on the accuracy of the model. Validation of the data the user
provides is implemented using JavaScript. Ensuring that valid data is entered
helps reduce outliers and increases the accuracy of the product. A website
like this keeps us updated about our health condition and helps us change our
lifestyle and habits in ways that improve our health.
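The project validates user input in the browser with JavaScript. As a complementary sketch, the same kind of range check could be repeated in Python on the server before the model is called. The field names and ranges below are hypothetical, since the project's actual form fields and limits are not given here.

```python
# Hypothetical field names and plausible ranges; the project's actual
# form fields and limits are not given in the source.
VALID_RANGES = {
    "age": (1, 120),
    "sys_bp": (50, 300),      # systolic blood pressure, mmHg
    "heart_rate": (20, 250),  # beats per minute
    "bmi": (10, 80),
}


def validate_input(form):
    """Return a dict mapping each invalid field to an error message."""
    errors = {}
    for field, (low, high) in VALID_RANGES.items():
        raw = form.get(field)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors[field] = "missing or not a number"
            continue
        if not low <= value <= high:
            errors[field] = f"must be between {low} and {high}"
    return errors


# An impossible age is rejected before it can reach the model.
print(validate_input({"age": "500", "sys_bp": "120", "heart_rate": "72", "bmi": "24.5"}))
```

An empty result means every field passed; a Flask view could return the error dict to the page instead of calling the model.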
OBJECTIVE

The objective of this project is to develop a website in which users can
provide data on health factors such as blood pressure levels and medication,
smoking/drinking habits, Body Mass Index, heart rate, anxiety, yellow fingers
and various other features that play an important role in the prediction of
heart disease, along with other factors that are common nowadays. The data
collected is used to predict the “10 Year Risk” of heart disease occurrence.
SCOPE OF THE PROJECT

The scope of the project "A Website to Predict Coronary Heart Disease
Occurrence Using ML in real life" would involve designing and developing a
website that utilizes machine learning algorithms to predict the occurrence of
coronary heart disease in real life. The website would need to collect relevant
data from users, including personal and medical information, and then use this
data to train a machine learning model that can accurately predict the
likelihood of developing coronary heart disease. The final product could be
used by individuals who are concerned about their risk of developing coronary
heart disease, as well as healthcare professionals who could use the
predictions to guide treatment and prevention strategies. The project has the
potential to make a significant impact in the field of healthcare and could help
reduce the incidence of coronary heart disease.
2
SUMMARY OF THE EXISTING WORKS

[1] Here, various classifiers are applied to a heart disease (HD) dataset to find the
most accurate classifier for that dataset, and they are compared based on accuracy
score. As there are many attributes, the authors reduced their number and prioritized
them. The algorithms used to analyze the dataset and predict the disease are KNN, SVM,
AdaBoost, SGD and Decision Table (DT) classifiers.

[2] This paper clearly explains the pre-processing of an unbalanced dataset, the
training of machine learning models on it, and the prediction of the risk of coronary
heart disease. In a comparative analysis of three supervised ML algorithms, Random
Forest achieved 96.80% accuracy, the highest of the three. K-Fold cross validation is
then carried out to create randomness in the data.

[3] The occurrence of heart disorders is predicted by applying several algorithms, and
the ROC curve is used to validate these methods. Logistic Regression achieved the
highest accuracy. To make sure that the model works across diverse datasets, it should
be trained and tested on high-dimensional datasets.

[4] In this paper, a Support vector classifier and a KNN classifier applied
together first achieved 85% accuracy. Following this, a combination of a neural
network and a Naïve Bayes classifier is applied. To improve the accuracy of the
model, Associate classification is applied, as its output is an association of
various models. The authors report that Associate classification along with a
Naïve Bayes classifier, decision tree and neural network is more reliable and
can also handle unstructured data.

[5] Here, classifiers applied together achieved 85% accuracy. Following this, a
combination of a neural network and a Naïve Bayes classifier is applied. To
improve the accuracy of the model, Associate classification is applied, as its
output is an association of various models. The authors report that Associate
classification along with a Naïve Bayes classifier, decision tree and neural
network is more reliable and can also handle unstructured data. To make sure
that the model works across diverse datasets, it should be trained and tested
on high-dimensional datasets.

[6] This paper surveys several research papers on the prediction of cardiovascular
diseases using Data Mining, ML and DL techniques. Feature selection is used to increase
accuracy in many of the classification algorithms. When feature selection is applied, greedy
sequential forward and backward selection is used to reduce the search space. The authors
also tabulate the algorithms and their accuracies. Artificial Neural Networks, regression,
classification and clustering techniques are discussed.

[7] In this paper, Machine Learning is applied to predict cardiovascular disease in
patients who are undergoing dialysis. Many ML algorithms are trained and tested on American
and Italian datasets. However, as the Italian dataset is biased, the prediction results might
differ in accuracy.

[8] This paper proposes an approach to predicting disease occurrence in which various
optimization algorithms and weight initialization techniques are analyzed and their accuracy
levels compared. In the neural network, activation functions such as ReLU are used. A
comparative analysis of ReLU combined with various optimization algorithms such as Adam and
Adagrad is carried out. The Adagrad optimizer with ReLU achieved 85% accuracy, the highest
among the combinations compared.
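The greedy sequential forward selection mentioned in [6] can be illustrated with a short sketch. This is not the surveyed authors' code: the scoring function and the feature names are assumptions made up for illustration, and in practice the score would typically be a cross-validated accuracy.

```python
def forward_selection(features, score, max_features):
    """Greedily add the feature that most improves score(subset)."""
    selected = []
    best_score = score(selected)
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Score each candidate subset and keep the best one.
        trial_score, trial_feature = max(
            (score(selected + [f]), f) for f in candidates
        )
        if trial_score <= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(trial_feature)
        best_score = trial_score
    return selected, best_score


# Toy scoring function (an assumption for illustration): each feature has a
# fixed usefulness, and every selected feature costs a small penalty.
USEFULNESS = {"age": 0.30, "sys_bp": 0.25, "cholesterol": 0.10, "education": 0.01}


def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset) - 0.05 * len(subset)


selected, best = forward_selection(list(USEFULNESS), toy_score, max_features=3)
print(selected, round(best, 2))
```

Because only one feature is added per round, the search evaluates far fewer subsets than an exhaustive search over all combinations, which is the reduction in search space the survey refers to.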

[9] In this paper, a limited dataset is used. The authors discuss how every
algorithm they use works and why it was chosen for this dataset. The Artificial
Neural Network consists of 3 layers, and an activation function is applied in
the hidden layer. The ReLU activation function is applied to predict the
target label. The ANN achieved the highest accuracy, 85%, compared to the
other algorithms.

[10] In this paper, Machine Learning classification techniques are applied
to obtain accurate results, which in turn helps the medical industry detect
heart disease faster. The authors implemented “Deep Neural Network” classifiers
and analyzed various optimization algorithms and weight initialization
techniques, comparing their accuracy levels. In the neural network, activation
functions such as ReLU are used. A comparative analysis of ReLU combined with
various optimization algorithms such as Adam and Adagrad is carried out. The
Adagrad optimizer with ReLU achieved 85% accuracy, the highest among the
combinations compared.
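To make the two ingredients named in [8] and [10] concrete, here is a minimal sketch of the ReLU activation and the Adagrad update rule in plain Python. It is not the networks from those papers: the one-dimensional objective, the learning rate and the step count are assumptions chosen only for illustration.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def adagrad_minimize(grad, w, lr=0.5, steps=200, eps=1e-8):
    """Minimize a 1-D function with Adagrad, given its gradient."""
    g_sq_sum = 0.0  # running sum of squared gradients
    for _ in range(steps):
        g = grad(w)
        g_sq_sum += g * g
        # The effective step size shrinks as squared gradients accumulate.
        w -= lr * g / (math.sqrt(g_sq_sum) + eps)
    return w


# Toy objective (an assumption for illustration): f(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
w_final = adagrad_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(relu(-2.0), relu(1.5), round(w_final, 3))
```

In a neural network the same update is applied to every weight separately, so weights that receive large gradients get smaller steps over time.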
CHALLENGES PRESENT IN EXISTING SYSTEM

Here are some challenges present in the existing system for a website to predict coronary heart disease
occurrence using ML:
1. Limited availability of data: One of the main challenges in developing a website to predict coronary heart
disease occurrence using ML is the limited availability of high-quality data. Machine learning algorithms
require a large amount of accurate and diverse data to make accurate predictions.
2. Data quality: The quality of the data used to train the machine learning models is essential for accurate
predictions. The data collected from different sources may have errors, missing values, and inconsistencies that
can affect the accuracy of the model.
3. Interpretability: The interpretability of machine learning models is a significant concern in the healthcare
industry. The ability to understand how a model arrives at its predictions is crucial to building trust in the
model's recommendations.
4. Legal and ethical considerations: Collecting and using sensitive medical data comes with legal and ethical
considerations. Websites that collect personal data are required to comply with data privacy laws and
regulations, and healthcare data has additional protections due to its sensitive nature.
5. Bias in the data: Machine learning algorithms can amplify biases present in the data. If the training data is
biased, the machine learning model will learn and perpetuate that bias. This could lead to inaccurate predictions
for certain populations, such as minorities or underrepresented groups.
6. Limited user adoption: Developing a website to predict coronary heart disease occurrence using ML may not
be sufficient if users are not adopting it. Encouraging users to provide accurate and complete data can be
challenging. Additionally, if the website is not user-friendly or accessible, users may not use it at all.
3
HARDWARE REQUIREMENTS

• Laptop
• Internet/Wi-Fi Hotspot
• i3 Processor Based Computer or higher

SOFTWARE REQUIREMENTS:
• Front-End : HTML, CSS, BOOTSTRAP
• Back-End : MYSQL
• ML model training : PYTHON – Scikit-Learn/Keras (Google Colab)
• ML model Deployment : Flask
GANTT CHART
4
ANALYSIS & DESIGN
PROPOSED METHODOLOGY

At the end of the day, it is our health that matters. Being proactive is the
best approach when it comes to taking care of our health. Heart-related
problems are mostly the ones that occur suddenly, and they can sometimes be
severe. Various health factors contribute to the occurrence of the disease.
Our project is a website that predicts the probability of coronary heart
disease occurrence. Early detection of common diseases such as diabetes,
heart disease and pulmonary cancer may help control them and reduce the
likelihood of a fatal outcome for the patient. As machine learning and
artificial intelligence progress, this is achieved using several
classification and clustering algorithms. This work presents a machine
learning approach to the prevention of coronary heart disease, which for
many people is the leading cause of death. We would also like to apply
ensemble methods to this prediction.
SYSTEM ARCHITECTURE
MODULE DESCRIPTIONS

ALGORITHMS USED:
1. SUPPORT VECTOR CLASSIFIER: This classifier works better on small datasets than on
large ones. The data is divided into 2 classes. The goal is to find a hyperplane that has
the maximum margin from the nearest data points of the 2 classes. The margin is the distance
between a data point and the hyperplane. For problems that come down to separating data
into two such sets, SVM is a good choice.

2. RANDOM FOREST CLASSIFIER: Random forest is a supervised learning algorithm. It can be
used for both regression and classification. As the name suggests, a “forest” is made up
of trees; the more trees, the denser and more robust the forest. This classifier builds
“Decision Trees” on samples of the data and considers the result of every tree. The result
with the majority of votes is treated as the final prediction. The random forest algorithm
is widely applied in recommendation engines, image classification and feature selection.

3. GRADIENT BOOSTING CLASSIFIER: Gradient boosting can be applied to both regression
and classification problems. It builds an ensemble of weak learners, usually decision
trees, and tries to minimize the cost function of that ensemble.
4. XGBOOST CLASSIFIER: Extreme Gradient Boosting is a highly efficient
implementation of gradient boosting that provides parallel tree boosting. The
main objective of gradient boosting is to minimize the loss function by adding
weak learners using a gradient descent optimization algorithm.

5. LOGISTIC REGRESSION: It is an algorithm that estimates the probability of an
event occurring. It is a regression technique for a categorical outcome
variable, here a binary outcome with 2 classes. The logit link function is
used to fit the data values.
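As an illustration of how such classifiers can be compared, the following is a minimal sketch using scikit-learn, which the software requirements list for model training. It runs on synthetic data generated by `make_classification` rather than the project's dataset, all hyperparameters are assumptions, and XGBoost is left out because it ships as a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption): the project's dataset is not used here.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

models = {
    # SVM and logistic regression are scale-sensitive, so they get a scaler.
    "Support vector classifier": make_pipeline(StandardScaler(), SVC()),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean 5-fold cross-validated accuracy for each classifier.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, acc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

The ranking printed here says nothing about the real dataset; it only shows the mechanics of training several models under the same cross-validation split and comparing their accuracy.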
5
DATA SET

We use “pandas”, a data analysis library, in our project to load the dataset
for further processing. Collecting the dataset is the primary task, and it is
very important that the dataset contains credible, diverse and sufficiently
large data. As we give this data to the machine to learn from, and it predicts
future inputs based on what it has learned from the current data, ensuring the
quality of the data is very important. We collected a dataset with a certain
number of samples. The training dataset contains various attributes, ranging
from basic attributes such as age, gender and education to more specific
attributes such as diabetes, heart rate, total cholesterol level, systolic
blood pressure, diastolic blood pressure, cigarettes per day, etc. The number
of these attributes normally varies from one dataset to another. The chosen
dataset contains all types of factors, from minor to major, that influence
coronary heart disease.
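A minimal sketch of loading such a dataset with pandas is shown below. The rows are made up and the column names are hypothetical; they only mirror the attributes listed above, since the actual file and its headers are not given here.

```python
import io

import pandas as pd

# A few made-up rows standing in for the real CSV file; the column names
# are hypothetical and only mirror the attributes listed in the text.
csv_text = """age,gender,education,diabetes,heart_rate,total_chol,sys_bp,dia_bp,cigs_per_day,ten_year_chd
39,M,4,0,80,195,106,70,0,0
46,F,2,0,95,250,121,81,0,0
61,F,3,0,65,225,150,95,30,1
"""

# With a file on disk, the path to the CSV would be passed to pd.read_csv
# in place of the in-memory buffer used here.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # inferred type of each attribute
```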
SAMPLE CODE

DATA PRE-PROCESSING:
Data pre-processing is an important step, as it ensures that valid data is
given to the machine to learn from. To make sure the machine learns from
quality data, we need to clean it: we check whether there are null values or
values that are impossible for an attribute to have (outliers), for example
a person's age recorded as 500. Such values are replaced with other values
such as the mean, median or mode, so that the dataset is balanced and the
result is unbiased. As the dataset contains attributes with missing values,
we make sure that no gaps are left in the dataset and that the filled-in
values stay close to reality. We also had to apply ‘Feature Scaling’ using
the ‘Standardization’ technique, which rescales the values of an attribute
so that their distribution has mean 0 and variance 1.
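The two operations described above, filling gaps and standardizing, can be sketched with the Python standard library alone. The readings are made up, and using the median as the fill value is one of the options named in the text, not necessarily the one the project used.

```python
from statistics import mean, median, pstdev


def impute_with_median(values):
    """Replace None entries with the median of the known values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to mean 0 and variance 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up heart-rate readings with one missing entry.
heart_rate = [60, 72, None, 80, 88]
filled = impute_with_median(heart_rate)
scaled = standardize(filled)

print(filled)
print([round(z, 2) for z in scaled])
```

Each standardized value is z = (x - mean) / standard deviation, so the rescaled column has mean 0 and variance 1. The sketch assumes the column is not constant, since a zero standard deviation would make the division fail.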
Checking if there are any Null values in data:
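The code for this step is not reproduced here. A minimal sketch of such a check with pandas, on made-up rows with hypothetical column names, might look like this:

```python
import numpy as np
import pandas as pd

# Made-up rows with hypothetical column names; np.nan marks missing entries.
df = pd.DataFrame({
    "age": [39, 46, 61, 52],
    "total_chol": [195.0, np.nan, 225.0, 240.0],
    "heart_rate": [80.0, 95.0, np.nan, np.nan],
})

# Count missing values per column.
null_counts = df.isnull().sum()
print(null_counts)

# True if any value in the whole table is missing.
print(df.isnull().values.any())
```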
