Heart Disease Prediction: Leveraging Machine Learning

PRESENTED BY:
HARITIMA SINHA
2022UGCM004
Index
1. Importance of Heart Health
2. Heart Health in India
3. Our Solution
4. Initial Parameters
5. New Parameters Added
6. Models: LR and RF
7. Model Explanation
8. Web App
Importance of Heart Health
1. Essential for Overall Well-being
2. Prevents Life-threatening Diseases
3. Enhances Quality of Life
4. Supports Longevity
5. Impact on Other Systems

Heart Health in India
• India accounts for one-fifth of global deaths due to heart disease.
• 30% of adult deaths are caused by cardiovascular disease (CVD), with an increasing trend in both urban and rural populations.
• Ischemic heart disease and stroke are the leading causes, contributing to over 80% of cardiovascular mortality.
• Annual heart disease cases in India doubled from 1990 to 2020, especially in younger populations.
Our Solution

We propose an ML model-based solution to predict and prevent heart disease by leveraging the power of machine learning and deep learning. This solution analyzes patient data and provides accurate predictions on the likelihood of heart disease, enabling early interventions and better healthcare outcomes.

Our solution provides an innovative, data-driven approach to preventing heart disease, utilizing both classical and quantum technologies for more effective, timely interventions.
Methodology: Data Preprocessing

Cleaning
Handle missing values and inconsistencies to improve data quality.

Normalization
Scale numerical features to ensure equal contribution to the model.

Encoding
Convert categorical features into a format suitable for machine learning algorithms.
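
The three preprocessing steps above can be sketched in plain Python. This is only an illustration under assumptions: the slides do not say which imputation, scaling, or encoding methods were used, so median filling, min-max scaling, and one-hot encoding are stand-ins, and the toy columns are hypothetical.

```python
from statistics import median


def fill_missing(values):
    """Cleaning: replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encoding: turn each category into a 0/1 indicator vector."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical toy columns, not taken from the presentation's dataset.
chol = [200, None, 300, 250]
cp = [0, 2, 1, 2]

chol_clean = fill_missing(chol)          # None is replaced by the median
chol_scaled = min_max_scale(chol_clean)  # values now lie in [0, 1]
cp_encoded = one_hot(cp)                 # one indicator column per category
```

In practice the scaling parameters would be computed on the training split only and reused on the test split, so that no information leaks from test data into the model.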
INITIAL PARAMETERS USED:
1. Age
2. Sex (0 for female, 1 for male)
3. Chest pain type (cp: 0-3)
4. Fasting blood sugar (fbs: 0-No, 1-Yes)
5. Maximum heart rate achieved (thalach)
6. Exercise-induced angina (exang: 0-No, 1-Yes)
7. ST depression (oldpeak)
8. The slope of peak exercise ST segment (slope: 0-2)
9. Number of major vessels (ca: 0-3)
10. Thalassemia (thal: 0-Normal, 1-Fixed defect, 2-Reversible defect)
11. Resting blood pressure (trestbps)
12. Cholesterol level (chol)
13. Resting electrocardiographic value (restecg: 0-Normal, 1-ST-T wave abnormality, 2-Left ventricular hypertrophy)
NEW PARAMETERS ADDED:
14. body_stress
body_stress = trestbps * chol

15. heart_risk_index
heart_risk_index = thalach - oldpeak * 10 - exang * 20

16. cp_thalach_interaction
cp_thalach_interaction = cp * thalach

17. chol_to_thalach
chol_to_thalach = chol / thalach

18. bp_to_thalach
bp_to_thalach = trestbps / thalach
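
The five formulas above translate directly into code. The sketch below assumes each patient is a dictionary keyed by the parameter names from the slides. The function name, the zero check on thalach, and the sample record are illustrative additions, not part of the original presentation.

```python
def add_engineered_features(patient):
    """Return a copy of the patient record with the five derived features."""
    thalach = patient["thalach"]
    if thalach <= 0:
        # The two ratio features divide by thalach, so it must be positive.
        raise ValueError("thalach must be positive to form the ratio features")

    out = dict(patient)
    out["body_stress"] = patient["trestbps"] * patient["chol"]
    out["heart_risk_index"] = thalach - patient["oldpeak"] * 10 - patient["exang"] * 20
    out["cp_thalach_interaction"] = patient["cp"] * thalach
    out["chol_to_thalach"] = patient["chol"] / thalach
    out["bp_to_thalach"] = patient["trestbps"] / thalach
    return out


# Hypothetical patient record for illustration only.
patient = {"trestbps": 120, "chol": 240, "thalach": 160,
           "oldpeak": 1.5, "exang": 1, "cp": 2}
features = add_engineered_features(patient)
```

Because body_stress is a product of two large raw values, it sits on a much larger scale than the ratio features, which is one reason the normalization step matters once these columns are added.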
WHY THESE PARAMETERS?

1. Body Stress:
Calculated by multiplying resting blood pressure (trestbps) and cholesterol (chol), this metric represents the strain on the heart due to these two factors, often indicating cardiovascular stress.

2. Heart Risk Index:
Derived from maximum heart rate (thalach), ST depression (oldpeak), and exercise-induced angina (exang), this index quantifies heart disease risk, with higher values suggesting greater risk.

3. CP Thalach Interaction:
This feature represents the interaction between chest pain type (cp) and maximum heart rate (thalach), which can give insights into the severity of heart conditions and exercise tolerance.

4. Chol to Thalach Ratio:
By dividing cholesterol (chol) by maximum heart rate (thalach), this ratio provides a measure of how efficiently the cardiovascular system is working under stress or during exercise.

5. BP to Thalach Ratio:
The ratio of resting blood pressure (trestbps) to maximum heart rate (thalach) helps identify potential cardiovascular issues by assessing the relationship between two key heart health indicators.
Models Used

Random Forest
Ensemble learning method using multiple decision trees for enhanced accuracy.

Logistic Regression
Statistical method for binary classification with straightforward interpretation.
Logistic Regression
Logistic regression is a supervised machine learning algorithm widely used for binary classification tasks, such as identifying whether an email is spam or diagnosing a disease by assessing the presence or absence of a condition based on patient test results.

Key Points:
• Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value.

• The outcome can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact value 0 or 1, the model gives a probability that lies between 0 and 1.

• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function whose output is bounded between 0 and 1; applying a threshold to this probability yields the predicted class.
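
A minimal sketch of the S-shaped logistic function and the thresholding step, using only the standard library. The weights, bias, and the 0.5 threshold are hypothetical values chosen for illustration; a trained model would learn the weights from data.

```python
import math


def sigmoid(z):
    """S-shaped logistic function mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # this branch avoids overflow for very negative z
    return ez / (1.0 + ez)


def predict_proba(features, weights, bias):
    """Probability of class 1: sigmoid of the weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 class with a decision threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical weights and inputs, not taken from the presentation's model.
weights = [1.0, 1.0]
p = predict_proba([2.0, 1.0], weights, bias=-1.0)    # z = 2, so p is above 0.5
label = predict_label([2.0, 1.0], weights, bias=-1.0)
```

Lowering the threshold trades more false alarms for fewer missed cases, which can be a reasonable choice in a screening setting.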
RANDOM FOREST MODEL
• Ensemble Method: Random Forest is an ensemble learning technique that combines multiple decision trees to improve prediction accuracy and reduce overfitting.
• Bootstrap Aggregating (Bagging): Each tree is trained on a different bootstrap sample (a random subset of the training data drawn with replacement), which reduces variance and helps avoid overfitting.
• Randomness in Feature Selection: During training, only a random subset of features is considered at each split, which creates diverse decision trees and makes the model more robust.
• Voting for Classification: For classification tasks, Random Forest uses majority voting across all decision trees to make the final prediction.
• Averaging for Regression: For regression tasks, Random Forest takes the average of the predictions from all decision trees.
• Out-of-Bag Error Estimation: The model can estimate its accuracy using out-of-bag samples (samples left out of a given tree's bootstrap sample), helping assess performance without separate validation data.
• Handles Missing Data: Some Random Forest implementations can handle missing values, for example by imputing them from similar samples; others require missing values to be filled in during preprocessing.
• Feature Importance: It can calculate feature importance, showing which variables have the most influence on the model's predictions, which is useful for feature selection.
• Non-linear Relationships: Random Forest can model complex, non-linear relationships between features and outcomes, making it suitable for various real-world problems.
• Less Prone to Overfitting: Because it averages many trees, Random Forest is less likely to overfit than a single decision tree, especially with a large number of trees.
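
The sampling and aggregation mechanics listed above (bootstrap sampling, out-of-bag rows, random feature subsets, and majority voting) can be sketched without training any trees. All helper names here are hypothetical, and the vote list stands in for the outputs of already trained trees. This is not a full Random Forest implementation.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Bagging: draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def out_of_bag(n_samples, sampled):
    """Rows never drawn for a tree; they can be used to validate that tree."""
    drawn = set(sampled)
    return [i for i in range(n_samples) if i not in drawn]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct candidate features to consider at one split."""
    return rng.sample(range(n_features), k)


def majority_vote(tree_predictions):
    """Classification: the most common label among the trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)                       # fixed seed for repeatability
sample = bootstrap_indices(10, rng)          # training rows for one tree
oob = out_of_bag(10, sample)                 # rows this tree never saw
candidates = random_feature_subset(13, 3, rng)
label = majority_vote([1, 0, 1, 1, 0])       # three votes for 1, two for 0
```

Because each bootstrap sample is drawn with replacement, some rows repeat and others are left out, and those left-out rows are what make out-of-bag evaluation possible.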
Evaluation Metrics of Logistic Regression

82%
Accuracy
Percentage of correct predictions made by the model.

0.88
Precision of 0
Fraction of samples predicted as class 0 that actually belong to class 0.

0.78
Precision of 1
Fraction of samples predicted as class 1 that actually belong to class 1.
Evaluation Metrics of Random Forest Model

99%
Accuracy
Percentage of correct predictions made by the model.

0.98
Precision of 0
Fraction of samples predicted as class 0 that actually belong to class 0.

1.00
Precision of 1
Fraction of samples predicted as class 1 that actually belong to class 1.
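
For reference, accuracy and per-class precision can be computed from true and predicted labels as in the sketch below. The label lists are hypothetical toy data and do not reproduce the figures reported on the slides.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_for_class(y_true, y_pred, cls):
    """Of the samples predicted as cls, the fraction that truly are cls."""
    truths = [t for t, p in zip(y_true, y_pred) if p == cls]
    if not truths:
        return 0.0  # convention used here when cls is never predicted
    return sum(1 for t in truths if t == cls) / len(truths)


# Hypothetical labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 1, 0, 1, 0]

acc = accuracy(y_true, y_pred)
prec_0 = precision_for_class(y_true, y_pred, 0)
prec_1 = precision_for_class(y_true, y_pred, 1)
```

Precision alone does not show how many true cases were missed, so recall per class is usually reported alongside it.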
Key Findings
1. Random Forest Performance
Achieved an accuracy of 99% (class 0 precision 0.98), demonstrating its effectiveness.
2. Logistic Regression Interpretability
Achieved an accuracy of 82% (class 0 precision 0.88), demonstrating its effectiveness.
Future Directions
1. Explore Advanced Algorithms
Utilize gradient boosting or deep learning for potential performance improvements.

2. Integrate Real-Time Data
Incorporate data from wearable devices for dynamic and personalized predictions.

3. Develop User-Friendly Tools
Create intuitive interfaces for healthcare professionals to easily utilize the model.
