Heart Disease Prediction: Leveraging Machine Learning
Presented By: Haritima Sinha (2022UGCM004)
Index
1. Importance of Heart Health
2. Heart Health in India
3. Our Solution
4. Initial Parameters
4. Supports Longevity
Cleaning
Handle missing values and inconsistencies to improve data quality.
Normalization
Scale numerical features to ensure equal contribution to the model.
Encoding
Convert categorical features into a format suitable for machine learning algorithms.
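The three preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the project's pipeline: the slides do not say which imputation, scaling, or encoding methods were used, so median imputation, z-score standardization, and one-hot encoding are assumptions here, and the toy columns are hypothetical.

```python
from statistics import mean, median, pstdev

def impute_median(values):
    """Cleaning: replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Normalization: z-score scaling (subtract the mean, divide by the std dev)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mu) / sigma for v in values]

def one_hot(values, categories):
    """Encoding: map each categorical code to a 0/1 indicator vector."""
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical toy columns (not taken from the project's dataset)
chol = [250, None, 200, 300]
cp = [0, 2, 3, 2]

chol_clean = impute_median(chol)
chol_scaled = standardize(chol_clean)
cp_encoded = one_hot(cp, categories=[0, 1, 2, 3])
print(chol_clean)  # [250, 250, 200, 300]
print(cp_encoded)  # [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
```

In a real pipeline the median, mean, and standard deviation should be computed on the training split only and then reused on the test split, so that no information leaks from the test data into training.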
INITIAL PARAMETERS USED:
1. Age
2. Sex (0 for female, 1 for male)
3. Chest pain type (cp: 0-3)
4. Fasting blood sugar (fbs: 0-No, 1-Yes)
5. Maximum heart rate achieved (thalach)
6. Exercise-induced angina (exang: 0-No, 1-Yes)
7. ST depression (oldpeak)
8. The slope of peak exercise ST segment (slope: 0-2)
9. Number of major vessels (ca: 0-3)
10. Thalassemia (thal: 0-Normal, 1-Fixed defect, 2-Reversible defect)
11. Resting blood pressure (trestbps)
12. Cholesterol level (chol)
13. Resting electrocardiographic value (restecg: 0-Normal, 1-ST-T wave abnormality, 2-Left ventricular hypertrophy)
NEW PARAMETERS ADDED:
14. Body_stress
body_stress = trestbps * chol
15. Heart_risk_index
heart_risk_index = thalach - oldpeak * 10 - exang * 20
16. cp_thalach_interaction
cp_thalach_interaction = cp * thalach
17. chol_to_thalach
chol_to_thalach = chol / thalach
18. bp_to_thalach
bp_to_thalach = trestbps / thalach
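The five formulas translate directly into code. The sketch below applies them to a single patient record stored as a dict keyed by the dataset's column names; the function name `add_engineered_features` and the sample values are hypothetical.

```python
def add_engineered_features(row):
    """Return a copy of `row` with the five engineered features added.

    `row` must contain trestbps, chol, thalach, oldpeak, exang and cp.
    """
    out = dict(row)  # keep the original 13 parameters untouched
    out["body_stress"] = row["trestbps"] * row["chol"]
    out["heart_risk_index"] = row["thalach"] - row["oldpeak"] * 10 - row["exang"] * 20
    out["cp_thalach_interaction"] = row["cp"] * row["thalach"]
    out["chol_to_thalach"] = row["chol"] / row["thalach"]
    out["bp_to_thalach"] = row["trestbps"] / row["thalach"]
    return out

# Hypothetical patient record
patient = {"trestbps": 130, "chol": 250, "thalach": 150,
           "oldpeak": 2.0, "exang": 1, "cp": 2}
features = add_engineered_features(patient)
print(features["body_stress"])       # 32500
print(features["heart_risk_index"])  # 110.0
```

Because the engineered features are built from raw values, they would normally be computed before the normalization step so that each new column is scaled like the others.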
WHY THESE PARAMETERS?
1. Body Stress: Calculated by multiplying resting blood pressure (trestbps) and cholesterol (chol), this metric represents the combined strain these two factors place on the heart, often indicating cardiovascular stress.
2. Heart Risk Index: Derived from maximum heart rate (thalach), oldpeak (exercise-induced ST depression), and exercise-induced angina (exang), this index summarizes heart disease risk. Because oldpeak and exang are subtracted from thalach, lower values suggest greater risk.
3. CP Thalach Interaction:
This feature represents the interaction between chest pain type (cp) and maximum heart rate (thalach), which can give insight into the severity of heart conditions and exercise tolerance.
5. BP to Thalach Ratio:
The ratio of blood pressure (trestbps) to heart rate (thalach) helps identify potential
cardiovascular issues by assessing the relationship between two key heart health indicators.
Model Used: Logistic Regression
Key Points:
Logistic regression predicts the output of a categorical dependent variable. Therefore, the
outcome must be a categorical or discrete value.
It can be Yes or No, 0 or 1, True or False, etc.; however, instead of outputting exactly 0 or 1, the model gives probabilities that lie between 0 and 1.
In logistic regression, instead of fitting a straight regression line, we fit an S-shaped logistic (sigmoid) function whose output is bounded by 0 and 1; thresholding this probability (commonly at 0.5) yields the predicted class.
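The key points above can be made concrete with a minimal from-scratch sketch: a sigmoid squashes a linear score into a probability, the weights are fitted by batch gradient descent on the log loss, and a 0.5 threshold turns the probability into a class label. The function names, learning rate, epoch count, and toy data are assumptions for illustration; the slides do not state how the project's model was trained, and library implementations typically use more sophisticated solvers.

```python
import math

def sigmoid(z):
    """S-shaped logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the average log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # gradient of the log loss w.r.t. the linear score is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability of class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def predict(w, b, x, threshold=0.5):
    """Convert the probability into a 0/1 class label."""
    return 1 if predict_proba(w, b, x) >= threshold else 0

# Hypothetical, already-scaled toy data with one feature
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1, 1, 1]
```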
RANDOM FOREST MODEL
Ensemble Method: Random Forest is an ensemble learning technique that combines multiple decision trees
to improve the accuracy of predictions and reduce overfitting.
Bootstrap Aggregating (Bagging): It uses bagging, which means it trains each tree on a different bootstrap sample (a random sample of the training data drawn with replacement) to reduce variance and avoid overfitting.
Randomness in Feature Selection: During training, at each split, a random subset of features is selected to
create diverse decision trees, making the model more robust.
Voting for Classification: For classification tasks, Random Forest uses majority voting from all decision trees to
make the final prediction.
Averaging for Regression: For regression tasks, Random Forest takes the average of predictions from all
decision trees.
Out-of-Bag Error Estimation: The model estimates its accuracy using out-of-bag samples (training samples left out of a given tree's bootstrap sample), helping assess performance without a separate validation set.
Handles Missing Data: Some Random Forest implementations can handle missing data, for example by iteratively imputing missing values using proximities between samples computed from the trees.
Feature Importance: It can calculate feature importance, showing which variables have the most influence on
the model’s predictions, which is useful for feature selection.
Non-linear Relationships: Random Forest can model complex, non-linear relationships between features and
outcomes, making it suitable for various real-world problems.
Less Prone to Overfitting: Since it averages multiple trees, Random Forest is less likely to overfit compared to
a single decision tree, especially with a large number of trees.
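The core mechanics listed above (bootstrap sampling, a random feature subset at each split, majority voting, and out-of-bag scoring) can be sketched in plain Python. This is a simplified, hypothetical implementation for illustration only: it grows small depth-limited trees using Gini impurity, does not implement missing-value handling or feature importance, and is not the implementation used in the project. `n_trees`, `max_depth`, and `n_sub` (the number of features tried per split) are assumed names and values.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label (ties go to the first label encountered)."""
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, max_depth, n_sub, rng):
    """Grow a tree, drawing a fresh random subset of n_sub features at each split."""
    if max_depth == 0 or len(set(y)) == 1:
        return majority(y)  # leaf node
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority(y)
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    left_tree = build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, n_sub, rng)
    right_tree = build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, n_sub, rng)
    return (f, t, left_tree, right_tree)

def predict_tree(node, x):
    """Walk to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=3, n_sub=1, seed=0):
    """Train each tree on a bootstrap sample and remember its out-of-bag rows."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        oob = set(range(n)) - set(idx)              # rows this tree never saw
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, n_sub, rng)
        forest.append((tree, oob))
    return forest

def predict_forest(forest, x):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, x) for tree, _ in forest])

def oob_accuracy(forest, X, y):
    """Score each row using only the trees whose bootstrap sample excluded it."""
    correct = total = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        votes = [predict_tree(tree, xi) for tree, oob in forest if i in oob]
        if votes:
            total += 1
            correct += majority(votes) == yi
    return correct / total if total else float("nan")

# Hypothetical toy data: two features, two well-separated classes
X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))
print(oob_accuracy(forest, X, y))
```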
Evaluation Metrics of Logistic Regression
Accuracy: 82%
Percentage of correct predictions made by the model.
Precision of class 0: 0.88
Fraction of class 0 (negative) predictions that are correct.
Precision of class 1: 0.78
Fraction of class 1 (positive) predictions that are correct.
Evaluation Metrics of Random Forest Model
Accuracy: 99%
Percentage of correct predictions made by the model.
Precision of class 0: 0.98
Fraction of class 0 (negative) predictions that are correct.
Precision of class 1: 1.00
Fraction of class 1 (positive) predictions that are correct.
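Accuracy and per-class precision, as reported for both models, can be computed from true and predicted labels as follows. The labels below are hypothetical and only illustrate the definitions; class 1 is treated as the positive class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_for_class(y_true, y_pred, cls):
    """Of all rows predicted as `cls`, the fraction whose true label is `cls`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == cls]
    if not predicted:
        return 0.0  # no predictions of this class; conventions for this case vary
    return sum(t == cls for t in predicted) / len(predicted)

# Hypothetical labels (not the project's test set)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
print(accuracy(y_true, y_pred))                # 0.625
print(precision_for_class(y_true, y_pred, 0))  # 0.6666666666666666
print(precision_for_class(y_true, y_pred, 1))  # 0.6
```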
Key Findings
1. Random Forest Performance
Achieved an accuracy of 99%, demonstrating its effectiveness.
2. Logistic Regression Interpretability
Achieved an accuracy of 82% while remaining a simpler, more interpretable model.
Future Directions
1. Explore Advanced Algorithms
Utilize gradient boosting or deep learning for potential performance improvements.
2. Integrate Real-Time Data
Incorporate data from wearable devices for dynamic and personalized predictions.
3. Develop User-Friendly Tools
Create intuitive interfaces for healthcare professionals to easily utilize the model.