
Practical Session 2: Testing the Assumptions of the Linear Regression Model


Objective:
In this session, students will test the assumptions of the classical linear regression
model using both manual methods and R. The model investigates the relationship
between market return (independent variable X) and Apple Inc.’s stock return
(dependent variable Y).
Data:
Observation   Market Return (X)   Stock Return (Y)
    1                 5                  7
    2                10                 12
    3                15                 14
    4                20                 19
    5                25                 22

This session builds on the previous one, where we derived the regression equation:

Ŷt = α̂ + β̂ Xt

From the previous session, we have calculated the following values:

• α̂ = 3.7
• β̂ = 0.74

Thus, the estimated regression equation is:

Ŷt = 3.7 + 0.74 Xt
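(For reference, these values follow from the OLS formulas applied to the data above: β̂ = Σ(Xt − X̄)(Yt − Ȳ) / Σ(Xt − X̄)² = 185/250 = 0.74, and α̂ = Ȳ − β̂ X̄ = 14.8 − 0.74 × 15 = 3.7.)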
In this session, we will focus on verifying whether the model satisfies the core
assumptions of the classical linear regression framework, which are critical for
ensuring the reliability of our results. The assumptions we will test are:
1. The errors have zero mean: E(ut) = 0
2. Homoscedasticity: Var(ut) = σ²
3. No autocorrelation of errors: Cov(ui, uj) = 0 for i ≠ j
4. No correlation between the errors and the independent variable:
Cov(ut, Xt) = 0
5. Normality of the residuals: ut ~ N(0, σ²)
For each assumption, we will:
• Provide the manual steps of the test,
• Verify the results using R,
• Interpret the findings.

I. The errors have zero mean: E(ut) = 0


1.1 Manual Calculation
• Calculate the residuals:

ût = Yt − Ŷt = Yt − (3.7 + 0.74 Xt)
• Check if the mean of the residuals is approximately zero:

Mean of the residuals = (Σ ût) / n
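For example, with the data above the fitted values Ŷt are 7.4, 11.1, 14.8, 18.5 and 22.2, so the residuals are:

Observation   ût = Yt − Ŷt
    1          7 − 7.4  = −0.4
    2         12 − 11.1 =  0.9
    3         14 − 14.8 = −0.8
    4         19 − 18.5 =  0.5
    5         22 − 22.2 = −0.2

Σ ût = 0, so the mean of the residuals is 0/5 = 0. (With OLS and an intercept, the residuals sum to exactly zero by construction; the assumption itself concerns the unobserved errors ut.)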
Interpretation:
• Mean of residuals: If the mean is close to zero, the assumption E(ut)=0 holds.
This means that, on average, the errors do not systematically overestimate or
underestimate the dependent variable (stock return).
1.2 R Code
# Data: enter the observations from the table above
market_return <- c(5, 10, 15, 20, 25)
stock_return <- c(7, 12, 14, 19, 22)

# Linear regression
model <- lm(stock_return ~ market_return)

# Residuals
residuals <- model$residuals

# Mean of residuals
mean_residuals <- mean(residuals)
mean_residuals
II. Homoscedasticity: Var(ut) = σ²
2.1 Manual Calculation:

• Calculate the residuals ût: the residuals ût are the difference between the
observed values Yt and the predicted values Ŷt.

• Visual inspection (residuals vs. fitted values plot): plot the residuals ût on the
y-axis and the predicted (fitted) values Ŷt on the x-axis.

Interpretation: If the residuals are spread evenly across all predicted values (i.e., the
spread does not systematically increase or decrease), the variance is constant,
indicating homoscedasticity. If the spread shows patterns, such as a funnel shape
(increasing or decreasing spread), this suggests heteroscedasticity.
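A manual counterpart to the formal test in 2.2 is the auxiliary-regression idea behind the Breusch-Pagan test: regress the squared residuals on the regressor and compute n·R². A minimal sketch of this (the studentized/Koenker form, which bptest uses by default; with only 5 observations the result is illustrative at best):

# Auxiliary regression: squared residuals on the independent variable
aux <- lm(I(residuals^2) ~ market_return)
bp_stat <- length(residuals) * summary(aux)$r.squared  # n * R^2 ~ chi-squared(1) under H0
bp_stat
1 - pchisq(bp_stat, df = 1)  # p-value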

2.2 R Code
# Plot residuals vs. fitted values to visually inspect homoscedasticity
plot(model$fitted.values, residuals, xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")

# Perform Breusch-Pagan test for homoscedasticity
library(lmtest)  # install.packages("lmtest") if it is not already installed
bptest(model)
Interpretation:
Breusch-Pagan Test: This is a formal statistical test for heteroscedasticity.
• H0: The residuals are homoscedastic (constant variance).
• H1: The residuals are heteroscedastic (variance is not constant).
Homoscedasticity (Breusch-Pagan test): If the p-value of the Breusch-Pagan test is
greater than 0.05, there is no evidence of heteroscedasticity. This means the residuals
have a constant variance, and the model’s estimates (coefficients) are reliable.

III. No autocorrelation of errors: Cov(ui, uj) = 0 for i ≠ j


This assumption checks if the residuals are independent of each other. For time
series data, this is typically done with tests like the Durbin-Watson test.
3.1 Manual Calculation:
For independent data points (not time series), autocorrelation is usually not a concern.
For time series data, however, you can examine how each residual relates to the
previous one; the Durbin-Watson statistic below summarizes this.
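The Durbin-Watson statistic is computed from the residuals as:

DW = Σt=2..n (ût − ût−1)² / Σt=1..n ût²

Values near 2 indicate no first-order autocorrelation; values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. (With only 5 observations the test has very little power, so treat the result as illustrative.)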
3.2 R Code
# Durbin-Watson test for autocorrelation of residuals
library(lmtest)  # already loaded if the Breusch-Pagan step above was run
dwtest(model)
Interpretation:
• Autocorrelation (Durbin-Watson test): If the Durbin-Watson test statistic is
close to 2, there is no autocorrelation in the residuals. This means that the errors
for one observation do not influence or relate to the errors of another
observation.

IV. Cov(ut, Xt) = 0: No Correlation Between Errors and the Independent Variable
This assumption tests that the residuals are not correlated with the independent
variable. If they are, it indicates a problem with the model specification.
4.1 Manual Calculation:
• Calculate the correlation coefficient between the residuals ût and the
independent variable Xt.

• If the correlation is close to zero, the assumption holds (a worked check
follows below).
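With our data, Σ ût Xt = (−0.4)(5) + (0.9)(10) + (−0.8)(15) + (0.5)(20) + (−0.2)(25) = −2 + 9 − 12 + 10 − 5 = 0, and the residuals also sum to zero, so the sample correlation is exactly 0. This is no accident: OLS with an intercept forces the residuals to be uncorrelated with the regressor in-sample, so this check mainly guards against calculation errors; the substantive assumption concerns the unobserved errors ut.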


Interpretation:
No correlation between residuals and independent variable: a correlation close to 0
indicates that the residuals are uncorrelated with the independent variable (market
return), i.e. the error term contains no systematic information about the regressor.
4.2 R Code
# Correlation between residuals and independent variable (market return)
cor(residuals, market_return)

V. Normality of the residuals: ut ~ N(0, σ²)


The residuals should be normally distributed for the hypothesis tests to be valid.
5.1 Manual Calculation:
• Plot a histogram of the residuals: after calculating the residuals ût, plot a
histogram.
Interpretation: If the histogram shows a bell-shaped curve (similar to a normal
distribution), it suggests the residuals might follow a normal distribution. If the
histogram is skewed or shows extreme peaks or flatness, this may indicate a
departure from normality.
➢ Calculate skewness and kurtosis to assess whether the residuals follow a normal
distribution (a short R sketch follows the interpretation below).
How to Interpret the Results:
• Skewness ≈ 0 and Kurtosis ≈ 3: This suggests that the residuals are normally
distributed.
• Significant deviation from 0 (skewness) or 3 (kurtosis): This indicates that
the residuals deviate from normality, which may violate assumptions of the
classical linear regression model (such as normality of errors).
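A minimal base-R sketch of these moment calculations (using population formulas; add-on packages such as moments apply slightly different sample adjustments):

# Skewness and kurtosis of the residuals from first principles
n <- length(residuals)
m <- mean(residuals)
s <- sqrt(mean((residuals - m)^2))     # population standard deviation
skew <- mean((residuals - m)^3) / s^3  # approximately 0 for a normal distribution
kurt <- mean((residuals - m)^4) / s^4  # approximately 3 for a normal distribution
skew
kurt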

5.2 R Code
# Plot histogram of residuals
hist(residuals, breaks = 5, main = "Histogram of Residuals", xlab = "Residuals")

# Q-Q plot to check normality
qqnorm(residuals)
qqline(residuals, col = "red")

# Shapiro-Wilk test for normality
shapiro.test(residuals)

Interpretation:
Normality (Shapiro-Wilk test and Q-Q plot): If the p-value of the Shapiro-Wilk test is
greater than 0.05, and the Q-Q plot shows points close to the line, the residuals can
be considered normally distributed. This means the error terms follow a normal
distribution, which is necessary for valid statistical inference (e.g., t-tests, F-tests).
