Lab 2
2023-11-10
Introduction
This report aims to analyze the relationship between the number of calories consumed and
the corresponding weight gained in grams. The dataset provided contains information on
calories consumed and weight gained over a period. We will employ simple linear
regression to understand and model this relationship.
Analysis
a. Steps Involved in Building a Simple Linear Regression Model
Data Collection: The dataset includes two variables, “Weight_gained_grams” and
“Calories_Consumed.”
Data Exploration: Examine the dataset for any anomalies, missing values, or patterns.
Data Visualization: Create visualizations such as scatter plots to visualize the relationship
between calories consumed and weight gained.
Correlation Analysis: Calculate the correlation coefficient to quantify the strength and
direction of the relationship.
Model Training: Split the dataset into training and testing sets. Train the model on the
training set.
Model Evaluation: Evaluate the model’s performance on the testing set using appropriate
metrics.
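The exploration, training, and evaluation steps listed above can be sketched as follows. This is only an illustration under stated assumptions: the data frame `sim` is simulated (it is not the lab dataset), and the 30% hold-out fraction and the use of RMSE as the evaluation metric are choices made here, not taken from the lab.

```r
# Simulated stand-in data (NOT the lab dataset); column names follow the report.
set.seed(1)
n <- 14
sim <- data.frame(Calories_Consumed = round(runif(n, 1400, 3900)))
sim$Weight_gained_grams <- -600 + 0.4 * sim$Calories_Consumed + rnorm(n, sd = 100)

# Data exploration: count missing values and inspect basic summaries
colSums(is.na(sim))
summary(sim)

# Model training: hold out roughly 30% of the rows for testing (assumed fraction)
test_idx <- sample(seq_len(n), size = round(0.3 * n))
train <- sim[-test_idx, ]
test <- sim[test_idx, ]
fit_train <- lm(Weight_gained_grams ~ Calories_Consumed, data = train)

# Model evaluation: root mean squared error on the held-out rows
pred <- predict(fit_train, newdata = test)
rmse <- sqrt(mean((test$Weight_gained_grams - pred)^2))
rmse
```

With only 14 observations a hold-out set is very small, so an estimate like `rmse` would be quite noisy.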
library(readr)
data <- read_csv("C:/Users/Admin/Downloads/calories_consumed.csv")
## Rows: 14 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Weight_gained_grams, Calories_Consumed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# a. Scatter Diagram and Coefficient of Correlation
plot(data$Calories_Consumed, data$Weight_gained_grams, main="Scatter Plot",
xlab="Calories Consumed", ylab="Weight Gained (grams)")
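The coefficient of correlation named in this step can be computed with `cor()`, and `cor.test()` adds a significance test. The sketch below uses simulated vectors `calories` and `weight` as stand-ins (they are assumptions, not the lab data).

```r
# Simulated stand-in data (NOT the lab dataset)
set.seed(2)
calories <- runif(14, 1400, 3900)
weight <- -600 + 0.4 * calories + rnorm(14, sd = 100)

# Pearson correlation coefficient: strength and direction of the linear relationship
r <- cor(calories, weight)

# Test of the null hypothesis that the true correlation is zero
ct <- cor.test(calories, weight)

r
ct$p.value
```

On the lab data the equivalent call would be `cor(data$Calories_Consumed, data$Weight_gained_grams)`.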
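# Fit the simple linear regression model (call taken from the output below)
model <- lm(Weight_gained_grams ~ Calories_Consumed, data = data)
summary(model)
##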
## Call:
## lm(formula = Weight_gained_grams ~ Calories_Consumed, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -158.67 -107.56 36.70 81.68 165.53
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -625.75236 100.82293 -6.206 4.54e-05 ***
## Calories_Consumed 0.42016 0.04115 10.211 2.86e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 111.6 on 12 degrees of freedom
## Multiple R-squared: 0.8968, Adjusted R-squared: 0.8882
## F-statistic: 104.3 on 1 and 12 DF, p-value: 2.856e-07
## Residuals:
## 103.5174 -140.6079 97.21979 -98.59224 -124.6392 63.50174 165.5331 -110.5453
## 49.31377 87.14147 24.09077 -22.54525 -158.6706 65.28245
# R-squared value
r_squared <- summary(model)$r.squared
cat("R-squared Value:", r_squared, "\n\n")
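# Fitted values from the model (assumed definition; `fit` is not defined in the code shown)
fit <- fitted(model)
R <- cor(data$Weight_gained_grams, fit)
R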
## [1] 0.946991
R^2
## [1] 0.896792
The correlation between the observed and fitted values is R = 0.946991, and its square,
0.896792, matches the Multiple R-squared reported in the model summary, as expected for a
least-squares fit with an intercept. An R-squared of approximately 0.8968 means that around
89.68% of the variability in weight gained is explained by calories consumed. This is a
relatively high value, indicating a strong linear relationship.
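The link between the correlation and R-squared can be checked numerically. The sketch below uses simulated data (`x` and `y` are assumptions, not the lab data) and compares four ways of obtaining R-squared for a simple linear regression with an intercept.

```r
# Simulated stand-in data (NOT the lab dataset)
set.seed(3)
x <- runif(14, 1400, 3900)
y <- -600 + 0.4 * x + rnorm(14, sd = 100)
m <- lm(y ~ x)

r2_summary <- summary(m)$r.squared     # value reported by summary()
r2_fitted <- cor(y, fitted(m))^2       # squared correlation of observed vs fitted
r2_xy <- cor(x, y)^2                   # squared correlation of predictor vs response
r2_ss <- 1 - sum(residuals(m)^2) / sum((y - mean(y))^2)  # 1 - SSE / SST

# All four agree up to floating-point error
all.equal(r2_summary, r2_fitted)
all.equal(r2_summary, r2_xy)
all.equal(r2_summary, r2_ss)
```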
summary(model)
##
## Call:
## lm(formula = Weight_gained_grams ~ Calories_Consumed, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -158.67 -107.56 36.70 81.68 165.53
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -625.75236 100.82293 -6.206 4.54e-05 ***
## Calories_Consumed 0.42016 0.04115 10.211 2.86e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 111.6 on 12 degrees of freedom
## Multiple R-squared: 0.8968, Adjusted R-squared: 0.8882
## F-statistic: 104.3 on 1 and 12 DF, p-value: 2.856e-07
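The coefficient estimates in the summary above define the fitted line, which can be used for prediction. In the sketch below the helper `predict_weight` and the intake of 2500 calories are hypothetical and introduced only for illustration. The coefficients are copied from the summary output.

```r
# Coefficient estimates copied from the summary output
b0 <- -625.75236   # intercept
b1 <- 0.42016      # slope: grams of weight gained per additional calorie

# Hypothetical helper: predicted weight gained (grams) for a given calorie intake
predict_weight <- function(calories) b0 + b1 * calories

predict_weight(2500)                          # about 424.65 g
predict_weight(2600) - predict_weight(2500)   # 100 extra calories add 100 * b1 grams
```

Predictions are only meaningful within the range of calorie values observed in the dataset. With the fitted model object available, `predict(model, newdata = data.frame(Calories_Consumed = 2500))` would give the same kind of estimate.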
The critical t-value for a two-tailed test with 12 degrees of freedom at the 0.05
significance level (0.025 in each tail) is approximately 2.179. Since the t-value of 10.211
for “Calories_Consumed” is much larger than 2.179, we reject the null hypothesis that its
coefficient is zero. The p-value of 2.86e-07 leads to the same conclusion: the effect of
“Calories_Consumed” on weight gained is statistically significant, and the positive estimate
(0.42016) indicates that weight gained increases with calories consumed.
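The critical value and the test decision can be reproduced from the numbers in the summary output. This is a minimal sketch: the estimate, standard error, and degrees of freedom are copied from the output, and the 0.05 significance level is an assumption corresponding to 0.025 in each tail.

```r
df <- 12         # residual degrees of freedom from the summary
alpha <- 0.05    # assumed two-tailed significance level

# Critical value: upper 0.025 quantile of the t distribution
t_crit <- qt(1 - alpha / 2, df)

# Test statistic: estimate divided by its standard error
t_stat <- 0.42016 / 0.04115

# Two-sided p-value
p_two_sided <- 2 * pt(-abs(t_stat), df)

t_crit
t_stat
p_two_sided
abs(t_stat) > t_crit   # TRUE means the null hypothesis of a zero slope is rejected
```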
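Conclusion
This report has demonstrated the application of simple linear regression to understand the
relationship between calories consumed and weight gained. On the 14 observations provided,
the fitted model estimates about 0.42 grams of additional weight gained per extra calorie
consumed, explains roughly 89.68% of the variability in weight gained, and has a
statistically significant slope (t = 10.211, p = 2.86e-07). These results indicate a strong
positive linear relationship and a good quality of fit for the given dataset.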