Practical Session 2 Linear Regression Model Assumptions
This session builds on the previous one, where we derived the regression equation:
Ŷt = α̂ + β̂ Xt
From the previous session, we have calculated the following values:
• α̂ = 3.7
• β̂ = 0.74
Thus, the estimated regression equation is:
Ŷt = 3.7 + 0.74 Xt
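As a quick illustration of how the estimated equation is used, the sketch below computes fitted stock returns for a few market returns. The helper name fitted_return and the example market returns are hypothetical and chosen only for illustration; the two coefficients are the values quoted above.

```r
# Coefficients taken from the estimated regression equation
alpha_hat <- 3.7
beta_hat  <- 0.74

# Hypothetical helper: fitted stock return for a given market return
fitted_return <- function(market_return) {
  alpha_hat + beta_hat * market_return
}

# Hypothetical market returns, not taken from the session data
example_market_returns <- c(0, 5, 10)
fitted_return(example_market_returns)
```

For instance, a market return of 10 gives a fitted stock return of 3.7 + 0.74 × 10 = 11.1.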
In this session, we will focus on verifying whether the model satisfies the core
assumptions of the classical linear regression framework, which are critical for
ensuring the reliability of our results. The assumptions we will test are:
1. The errors have zero mean: E(ut)=0
2. Homoscedasticity: Var(ut)=σ²
3. No autocorrelation of errors: Cov(ui,uj)=0 for i≠j
4. No correlation between the errors and the independent variable:
Cov(ut,Xt)=0
5. Normality of the residuals: ut∼N(0,σ²)
For each assumption, we will:
• Provide the manual steps of the test,
• Verify the results using R,
• Interpret the findings.
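I. The errors have zero mean: E(ut)=0
1.1 Manual Calculation:
• Calculate the residuals ût as the difference between the observed values Yt and
the fitted values Ŷt = 3.7 + 0.74 Xt, i.e. ût = Yt − Ŷt.
• Check if the mean of the residuals is approximately zero:
ū = (1/n) Σ ût, summing over t = 1, ..., n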
Interpretation:
• Mean of residuals: If the mean is close to zero, the data are consistent with the
assumption E(ut)=0. This means that, on average, the fitted model neither
systematically overestimates nor underestimates the dependent variable (stock return).
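It may help to see why this check usually succeeds. The short derivation below is a sketch that assumes the model is fitted by ordinary least squares with an intercept, and it writes the intercept and slope estimates as \hat{\alpha} and \hat{\beta} (notation assumed here). Under those assumptions, the first-order condition for the intercept forces the residuals to sum to zero, so a sample mean far from zero points to a calculation error rather than to a failure of the assumption.

```latex
\begin{aligned}
S(\alpha,\beta) &= \sum_{t=1}^{n} \left(Y_t - \alpha - \beta X_t\right)^2 \\
\left.\frac{\partial S}{\partial \alpha}\right|_{\hat{\alpha},\hat{\beta}}
  &= -2 \sum_{t=1}^{n} \left(Y_t - \hat{\alpha} - \hat{\beta} X_t\right) = 0 \\
\Rightarrow \quad \sum_{t=1}^{n} \hat{u}_t &= 0
\quad \Rightarrow \quad
\frac{1}{n} \sum_{t=1}^{n} \hat{u}_t = 0
\end{aligned}
```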
1.2 R Code
# Data: import your data into R
# Linear regression
model <- lm(stock_return ~ market_return)
# Residuals
residuals <- model$residuals
# Mean of residuals
mean_residuals <- mean(residuals)
mean_residuals
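The code above needs the session data to run. As a self-contained sketch, the example below repeats the same check on simulated data. The simulated series, the seed, and the object names model_sim and u_hat are assumptions made for illustration and are not the session data. With an intercept in the model, the mean of the OLS residuals should be zero up to floating-point error.

```r
# Simulated (hypothetical) data, generated only to make the example runnable
set.seed(42)
n <- 100
market_return <- rnorm(n, mean = 1, sd = 2)
stock_return  <- 3.7 + 0.74 * market_return + rnorm(n, sd = 1)

# Fit the linear regression and extract the residuals
model_sim <- lm(stock_return ~ market_return)
u_hat <- resid(model_sim)

# Mean of residuals: expected to be numerically indistinguishable from zero
mean_u_hat <- mean(u_hat)
mean_u_hat
abs(mean_u_hat) < 1e-10
```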
II. Homoscedasticity: Var(ut)=σ²
2.1 Manual Calculation:
• Calculate the residuals ût: The residuals ût are the difference between the
observed values Yt and the predicted values Ŷt, i.e. ût = Yt − Ŷt.
• Visual Inspection: Residuals vs. Fitted (Predicted Y) Values Plot: Plot the
residuals ût on the y-axis and the predicted (fitted) values Ŷt on the x-axis.
Interpretation: If the residuals are spread evenly across all predicted values (i.e., the
spread does not systematically increase or decrease), the variance is constant,
indicating homoscedasticity. If the spread shows patterns, such as a funnel shape
(increasing or decreasing spread), this suggests heteroscedasticity.
2.2 R Code
# Plot residuals vs. fitted values to visually inspect homoscedasticity
plot(model$fitted.values, residuals, xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")
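To show what an even spread and a funnel shape look like in practice, the sketch below simulates one data set with constant error variance and one whose error standard deviation grows with X, then draws the two residual plots side by side. All data, the seed, and the helper spread_ratio are hypothetical. The ratio of residual standard deviations between the upper and lower halves of the fitted values is only an informal numerical companion to the plot, not a formal test.

```r
# Simulated (hypothetical) data: one homoscedastic, one heteroscedastic series
set.seed(123)
n <- 200
x <- runif(n, min = 1, max = 10)
y_const  <- 3.7 + 0.74 * x + rnorm(n, sd = 1)        # constant error variance
y_funnel <- 3.7 + 0.74 * x + rnorm(n, sd = 0.3 * x)  # error spread grows with x

fit_const  <- lm(y_const ~ x)
fit_funnel <- lm(y_funnel ~ x)

# Residuals vs. fitted values for both fits, side by side
op <- par(mfrow = c(1, 2))
plot(fitted(fit_const), resid(fit_const), main = "Constant variance",
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red")
plot(fitted(fit_funnel), resid(fit_funnel), main = "Funnel shape",
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red")
par(op)

# Hypothetical helper: residual spread above vs. below the median fitted value
spread_ratio <- function(fit) {
  f <- fitted(fit)
  r <- resid(fit)
  upper <- f > median(f)
  sd(r[upper]) / sd(r[!upper])
}

spread_ratio(fit_const)   # near 1 suggests a similar spread in both halves
spread_ratio(fit_funnel)  # well above 1 suggests the spread is increasing
```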
V. Normality of the residuals: ut∼N(0,σ²)
5.2 R Code
# Plot histogram of residuals
hist(residuals, breaks=5, main="Histogram of Residuals", xlab="Residuals")
Interpretation:
Normality (Shapiro-Wilk test and Q-Q plot): The null hypothesis of the Shapiro-Wilk
test is that the residuals are normally distributed. If the p-value is greater than
0.05, we fail to reject normality at the 5% level; if the Q-Q plot also shows points
close to the line, the residuals can be treated as approximately normally distributed.
Normality of the error terms underpins exact statistical inference (e.g., t-tests,
F-tests), particularly in small samples.
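The interpretation refers to the Shapiro-Wilk test and the Q-Q plot, which can both be produced with base R. The sketch below runs them on simulated data so that it is self-contained. The simulated series, the seed, and the object names model_sim, u_hat and sw are assumptions made for illustration; with the session data, the same calls would be applied to the residuals of the fitted model.

```r
# Simulated (hypothetical) data, generated only to make the example runnable
set.seed(7)
n <- 100
market_return <- rnorm(n, mean = 1, sd = 2)
stock_return  <- 3.7 + 0.74 * market_return + rnorm(n, sd = 1)

# Fit the regression and extract the residuals
model_sim <- lm(stock_return ~ market_return)
u_hat <- resid(model_sim)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(u_hat)
sw$p.value

# Q-Q plot: points close to the reference line are consistent with normality
qqnorm(u_hat, main = "Normal Q-Q Plot of Residuals")
qqline(u_hat, col = "red")
```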