Project Deliverable 3

This document outlines an analysis aimed at identifying factors associated with cardiovascular disease, focusing on demographic and health indicators like age, blood pressure, and cholesterol levels. It describes the dataset, hypotheses for testing, and the statistical methods used, including t-tests and logistic regression. The findings suggest significant differences in resting blood pressure and maximum heart rate between patients with and without heart disease, with recommendations for improved health monitoring and lifestyle interventions.

Uploaded by

zille.huma


1. Problem Statement
The primary objective of this analysis is to investigate factors associated with
cardiovascular disease. Specifically, we test whether selected demographic and
health indicators (age, resting blood pressure, serum cholesterol, and maximum
heart rate) are significantly associated with the presence of cardiovascular
disease. The analysis uses hypothesis tests to compare group means and a predictive
model to identify key risk factors.

2. Dataset Description
2.1. Dataset Overview
The dataset contains the following variables. The descriptions follow the
conventions of typical cardiovascular datasets:
• patientid: Unique identifier for each patient.
• age: Patient's age in years.
• gender: Gender of the patient (1 = Male, 0 = Female).
• chestpain: Type of chest pain experienced (0–3, with different types indicating
  various risks of heart disease).
• restingBP: Resting blood pressure in mm Hg.
• serumcholestrol: Serum cholesterol level in mg/dL.
• fastingbloodsugar: Whether fasting blood sugar > 120 mg/dL (1 = Yes, 0 = No).
• restingrelectro: Resting electrocardiographic results (0–2, with higher values
  possibly indicating abnormalities).
• maxheartrate: Maximum heart rate achieved.
• exerciseangia: Exercise-induced angina (1 = Yes, 0 = No).
• oldpeak: ST depression induced by exercise relative to rest.
• slope: The slope of the peak exercise ST segment (0–2).
• noofmajorvessels: Number of major vessels (0–3) colored by fluoroscopy.
• target: Outcome variable (1 = Heart disease, 0 = No heart disease).
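
Several of the variables above are integer codes for categories rather than measured quantities. The sketch below shows one way such codes could be converted to labelled factors in R. It is an illustration only: the three-row `sample_data` frame is hypothetical and its values are not taken from the dataset.

```r
# Hypothetical three-row sample that reuses the column names listed above.
# The values are illustrative only and do not come from the dataset.
sample_data <- data.frame(
  gender    = c(1, 0, 1),
  chestpain = c(0, 2, 3),
  target    = c(1, 0, 1)
)

# Recode the integer codes as factors so that R treats them as categories
# rather than as numeric quantities.
sample_data$gender <- factor(sample_data$gender, levels = c(0, 1),
                             labels = c("Female", "Male"))
sample_data$chestpain <- factor(sample_data$chestpain, levels = 0:3)
sample_data$target <- factor(sample_data$target, levels = c(0, 1),
                             labels = c("No heart disease", "Heart disease"))

str(sample_data)
```

Labelled factors make frequency tables and plot legends easier to read than raw 0/1 codes.
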
3. Hypotheses
Based on the research objectives, we define three hypotheses for the analysis:

Hypothesis 1:
• There is a significant difference in the mean resting blood pressure between
  patients with and without heart disease.
• Null Hypothesis (H0): There is no difference in mean resting blood pressure
  between patients with and without heart disease.
• Alternative Hypothesis (H1): Patients with heart disease have a different mean
  resting blood pressure than those without.

Hypothesis 2:
• There is a significant difference in the mean maximum heart rate between
  patients with and without heart disease.
• Null Hypothesis (H0): There is no difference in mean maximum heart rate between
  patients with and without heart disease.
• Alternative Hypothesis (H1): Patients with heart disease have a different mean
  maximum heart rate than those without.

Hypothesis 3:
• Age and cholesterol levels are associated with the risk of heart disease.
• This can be tested with regression analysis where age and serum cholesterol are
  predictors, and the outcome variable is the target (heart disease status).
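
The hypotheses can also be written in symbols. The notation below is an assumption introduced for illustration and does not appear in the original report: \mu_1 and \mu_0 are the population means for patients with and without heart disease, \bar{x}, s^2 and n are the sample mean, variance and size of each group, and p is the probability that target = 1. The test statistic shown is the Welch form, which is what R's t.test uses by default.

```latex
% Hypotheses 1 and 2 (two-sided comparison of group means)
H_0:\ \mu_1 = \mu_0 \qquad \text{vs.} \qquad H_1:\ \mu_1 \neq \mu_0

% Welch two-sample t statistic (group 0 minus group 1, as R reports it)
t = \frac{\bar{x}_0 - \bar{x}_1}{\sqrt{\dfrac{s_0^2}{n_0} + \dfrac{s_1^2}{n_1}}}

% Hypothesis 3 (logistic regression on age and serum cholesterol)
\log\frac{p}{1-p} = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{serumcholestrol}
```

Under this formulation, Hypothesis 3 amounts to testing whether \beta_1 or \beta_2 differs from zero.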

4. Conducting Significance Testing and Regression Analysis
The goal of this analysis is to understand relationships between various health
indicators and the likelihood of heart disease (the target variable). We will use
statistical hypothesis testing and regression modeling to:
1. Compare Means: Identify if there are significant differences in specific health
metrics (e.g., resting blood pressure, maximum heart rate) between patients
with and without heart disease.
2. Predictive Modeling: Investigate if variables such as age and serum
cholesterol levels are associated with a higher likelihood of heart disease.
For this purpose, we will conduct t-tests for comparing means and logistic regression
for predictive modeling.
# Load necessary libraries
library(ggplot2)
library(ggcorrplot)
library(pscl)

# Load the dataset into `data` (skip this step if it is already in the workspace)
data <- read.csv("CardiovascularDisease.csv")

# --- Section 1: Descriptive Statistics and Visualizations ---

# 1.1 Descriptive statistics for numerical variables

summary(data)

# 1.2 Frequency counts for categorical variables

table(data$gender)
table(data$chestpain)
table(data$target)

# 1.3 Proportion of target outcomes (Heart Disease vs. No Heart Disease)

prop.table(table(data$target))

# 1.4 Graphs to illustrate descriptive statistics

# Histogram for Age Distribution

ggplot(data, aes(x = age)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  labs(title = "Age Distribution", x = "Age", y = "Frequency")

# Bar Plot for Chest Pain Types

ggplot(data, aes(x = factor(chestpain))) +
  geom_bar(fill = "lightgreen") +
  labs(title = "Chest Pain Type Distribution", x = "Chest Pain Type", y = "Count")

# Boxplot for Resting Blood Pressure by Target

ggplot(data, aes(x = factor(target), y = restingBP, fill = factor(target))) +
  geom_boxplot() +
  labs(title = "Resting BP by Heart Disease Status",
       x = "Heart Disease (1 = Yes, 0 = No)", y = "Resting Blood Pressure")

# Scatter Plot for Age vs. Max Heart Rate by Target

ggplot(data, aes(x = age, y = maxheartrate, color = factor(target))) +
  geom_point() +
  labs(title = "Age vs Max Heart Rate by Heart Disease Status",
       x = "Age", y = "Max Heart Rate")

# Bar Plot for Exercise-Induced Angina vs Heart Disease

ggplot(data, aes(x = factor(exerciseangia), fill = factor(target))) +
  geom_bar(position = "dodge") +
  labs(title = "Exercise-Induced Angina vs Heart Disease",
       x = "Exercise-Induced Angina", y = "Count")
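
# Correlation Heatmap for Numerical Variables
# (patientid is excluded because an identifier carries no clinical information)
numeric_cols <- sapply(data, is.numeric) & names(data) != "patientid"
cor_matrix <- cor(data[numeric_cols])
ggcorrplot(cor_matrix, lab = TRUE)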

# --- Section 2: Hypothesis Testing ---

# 2.1 T-test for Resting Blood Pressure by Heart Disease Status

t_test_restingBP <- t.test(restingBP ~ target, data = data)
print(t_test_restingBP)

# 2.2 T-test for Maximum Heart Rate by Heart Disease Status

t_test_maxheartrate <- t.test(maxheartrate ~ target, data = data)
print(t_test_maxheartrate)

# --- Section 3: Logistic Regression Model ---

# Logistic Regression with age and serum cholesterol as predictors

log_reg_model <- glm(target ~ age + serumcholestrol, data = data, family = "binomial")
summary(log_reg_model)

# --- Section 4: Model Diagnostics and Goodness of Fit ---

# 4.1 Calculate Pseudo R-squared

pR2(log_reg_model)

# 4.2 Confusion Matrix for Model Predictions

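# Classify as heart disease when the fitted probability exceeds 0.5
predicted_class <- ifelse(predict(log_reg_model, type = "response") > 0.5, 1, 0)
# Fixing the factor levels keeps the table 2 x 2 even if one class is never predicted;
# log_reg_model$y holds the outcomes of the rows actually used in the fit
table(Predicted = factor(predicted_class, levels = c(0, 1)), Actual = log_reg_model$y)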

5. Results and Discussion

5.1. T-Tests Interpretation
5.1.1. Resting Blood Pressure:
o The t-test comparing resting blood pressure (BP) between patients with heart
disease (target = 1) and those without (target = 0) yields a t-value of -17.342 with
a p-value < 2.2e-16. Because the p-value is far below 0.05, we reject the null
hypothesis and conclude that mean resting BP differs between the two groups.
o Patients with heart disease have a higher mean resting BP (164.04 mm Hg) than
those without (134.77 mm Hg), so higher resting BP is associated with the presence
of heart disease in this dataset.
5.1.2. Maximum Heart Rate:
o The t-test yields a t-value of -7.0404 with a p-value of 4.488e-12, so we again
reject the null hypothesis. Mean maximum heart rate differs significantly between
the two groups.
o The mean maximum heart rate is lower in patients without heart disease (136.31)
than in those with heart disease (152.12). In this dataset, a higher maximum heart
rate is therefore associated with the presence of heart disease.
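
Both t-values above are negative because, with the formula interface, R's t.test subtracts the mean of the second group (target = 1) from the mean of the first (target = 0). The sketch below illustrates this on simulated values. The `demo` data frame is synthetic and invented for this example; only the two group means are chosen to resemble those reported above.

```r
set.seed(1)

# Synthetic illustration only: these values are simulated, not the report's data.
demo <- data.frame(
  target       = rep(c(0, 1), each = 200),
  maxheartrate = c(rnorm(200, mean = 136, sd = 20),
                   rnorm(200, mean = 152, sd = 20))
)

# Welch two-sample t-test (the default, var.equal = FALSE)
res <- t.test(maxheartrate ~ target, data = demo)

res$estimate   # group means, ordered target = 0 then target = 1
res$statistic  # t, based on mean(target = 0) minus mean(target = 1)
res$p.value    # two-sided p-value
```

A negative t therefore means that the target = 1 group has the larger mean, which matches the interpretation given for both tests.
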

5.2. Logistic Regression Interpretation

The logistic regression model examined age and serum cholesterol as predictors of
heart disease status (target):
• Age: The p-value for age is 0.966, so age is not a significant predictor of
  heart disease in this model.
• Serum Cholesterol: The coefficient for serum cholesterol is positive (0.0031)
  and highly significant (p < 0.001), indicating that higher serum cholesterol
  is associated with an increased likelihood of heart disease.
The pseudo R-squared (McFadden's R² = 0.028) is close to zero, which indicates that
the model improves only slightly on a model with no predictors.
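
A logistic regression coefficient is on the log-odds scale, so exponentiating it gives an odds ratio. The short calculation below applies this to the serum cholesterol coefficient reported above. The 10 mg/dL step is an assumption chosen only to make the effect size easier to read.

```r
beta_chol <- 0.0031               # coefficient reported in the text above

or_per_1  <- exp(beta_chol)       # odds ratio per 1 mg/dL
or_per_10 <- exp(10 * beta_chol)  # odds ratio per 10 mg/dL (illustrative step)

round(c(per_1_mg_dL = or_per_1, per_10_mg_dL = or_per_10), 4)
```

The result is about 1.0031 per mg/dL and about 1.0315 per 10 mg/dL, meaning the odds of heart disease rise by roughly 3% for each additional 10 mg/dL of serum cholesterol in this model.
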

5.3. Model Evaluation (Confusion Matrix)

The confusion matrix shows that:
• The model correctly classified 82 of the 420 patients without heart disease
  and 480 of the 580 patients with heart disease, an overall accuracy of
  562/1000 = 56.2%.
• It misclassified 338 patients without heart disease as positive (false
  positives) and 100 patients with heart disease as negative (false negatives).
  Because always predicting heart disease would already be correct for
  580/1000 = 58% of patients, the model adds little predictive value, most likely
  because age and serum cholesterol alone have limited predictive power.
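
The counts above can be turned into standard summary metrics. The sketch below rebuilds the confusion matrix from the four reported counts, assuming the same layout as the earlier table() call (rows are predictions, columns are actual outcomes), and computes accuracy, sensitivity and specificity.

```r
# Confusion matrix rebuilt from the counts reported above.
# matrix() fills column by column: first Actual = 0, then Actual = 1.
cm <- matrix(c(82, 338, 100, 480), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)         # (82 + 480) / 1000
sensitivity <- cm["1", "1"] / sum(cm[, "1"])   # 480 / 580, true positive rate
specificity <- cm["0", "0"] / sum(cm[, "0"])   # 82 / 420, true negative rate
baseline    <- max(colSums(cm)) / sum(cm)      # always predict the larger class

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, baseline = baseline), 3)
```

Sensitivity is high while specificity is low, which shows that the model labels most patients as having heart disease regardless of their true status.
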

5.4. Decision-Making Suggestions

• Resting Blood Pressure and Maximum Heart Rate: Both metrics differ
  significantly between patients with and without heart disease, so they could
  be valuable for clinical evaluation and early risk assessment.
• Serum Cholesterol: The logistic regression shows that serum cholesterol is a
  significant predictor of heart disease. This is an association in observational
  data rather than evidence of causation, but it is consistent with the view that
  lowering serum cholesterol through lifestyle changes or medication may reduce
  heart disease risk.
• Model Improvement: The current logistic regression model could be improved by
  including additional predictors, such as other health indicators or demographic
  factors, to increase predictive accuracy.
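
One possible way to check whether an additional predictor improves the model is to fit both versions and compare them. The sketch below does this on simulated data. Everything in `sim` is synthetic: the variable names mirror the dataset, but the values and the assumed effect of restingBP are invented for illustration and say nothing about the real data.

```r
set.seed(42)
n <- 500

# Synthetic data for illustration only; values and effect sizes are invented.
sim <- data.frame(
  age             = round(runif(n, 30, 80)),
  serumcholestrol = rnorm(n, mean = 250, sd = 50),
  restingBP       = rnorm(n, mean = 150, sd = 20)
)
# In this simulation the outcome depends on restingBP only (an assumption).
sim$target <- rbinom(n, 1, plogis(-7.5 + 0.05 * sim$restingBP))

# Model from the report, and an extended model with one extra predictor
base_model <- glm(target ~ age + serumcholestrol,
                  data = sim, family = binomial)
full_model <- glm(target ~ age + serumcholestrol + restingBP,
                  data = sim, family = binomial)

AIC(base_model, full_model)                    # lower AIC indicates a better trade-off
anova(base_model, full_model, test = "Chisq")  # likelihood ratio test
```

On the real data, the same comparison could be run with `data` in place of `sim` and with whichever additional predictors are being considered.
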

6. Improvement and Suggestions for Decision Making
Health Screening and Monitoring:
• Given the strong associations, implement regular monitoring of resting BP and
  maximum heart rate for patients, particularly those in high-risk categories, as
  early indicators of cardiovascular risk.
• Serum cholesterol management, including lifestyle adjustments and, if
  necessary, medication, is recommended as a preventative measure against heart
  disease.

Enhanced Risk Models:

• Develop more comprehensive models by including additional variables beyond
  age and cholesterol (e.g., lifestyle factors, genetic predisposition) to improve
  predictive accuracy for heart disease.

Preventative Care and Lifestyle Counseling:

• Educate patients, especially those with elevated BP, heart rate, and cholesterol,
  on lifestyle changes (such as diet, exercise, and stress management) that can
  lower their risk. This proactive approach could reduce the incidence of heart
  disease over time.

Tailored Intervention Programs:

• For high-risk patients, consider creating personalized health programs focusing
  on BP, heart rate, and cholesterol control, which may reduce their likelihood of
  developing heart disease.
