Regression Script

The document outlines the process of fitting a linear regression model using the mtcars dataset and performing various assumption checks. It covers linearity, normality of residuals, homoscedasticity, multicollinearity, and independence of residuals, providing interpretations for each check. The document emphasizes the importance of these assumptions in ensuring the validity of the regression model.

Uploaded by

rgulati005

# Load necessary libraries

library(ggplot2)
library(car) # for VIF
library(lmtest) # for Breusch-Pagan and Durbin-Watson tests
library(nortest) # for normality tests
library(ggfortify) # for diagnostic plots

# Load dataset
data(mtcars)

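# Fit the linear regression model
# Y, X1..X4, and DATA are placeholders: substitute your response, predictors,
# and data frame (the comments below assume mpg from mtcars as the response).
model <- lm(Y ~ X1 + X2 + X3 + X4, data = DATA)
summary(model)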

# Interpretation:
# The summary shows coefficient estimates, R-squared, and significance.
# Predictors with p < 0.05 are significantly associated with mpg, holding the other predictors fixed.
# R-squared is the proportion of the variation in mpg that the model explains.
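
# A minimal sketch of reading the summary programmatically. The predictors
# (wt, hp, disp, drat) and the name demo_model are assumptions chosen for
# illustration; the original script leaves the formula as placeholders.

```r
data(mtcars)

# Hypothetical model: mpg regressed on four assumed predictors
demo_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
demo_summary <- summary(demo_model)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(demo_summary)

# Names of the terms (intercept included) with p < 0.05
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]

# Share of mpg variance explained, raw and adjusted for model size
r_squared <- demo_summary$r.squared
adj_r_squared <- demo_summary$adj.r.squared

print(round(coef_table, 4))
print(significant)
```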

# --- Assumption Checks with Interpretations ---

## 1. Linearity Assumption
plot(model, which = 1)

# Interpretation:
# The Residuals vs Fitted plot should show no distinct pattern.
# A random scatter around zero suggests a linear relationship between predictors and the dependent variable.
# A curved pattern suggests linearity is violated; a funnel shape points to non-constant variance instead (see check 3).
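
# A numeric companion to the Residuals vs Fitted plot, sketched in the spirit
# of a RESET-style check: add the squared fitted values to the model and test
# whether they improve the fit. The model formula and all object names here
# are assumptions for illustration, not part of the original script.

```r
data(mtcars)

# Hypothetical model with assumed predictors
demo_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)

# Add the squared fitted values as an extra column
aug_data <- transform(mtcars, fitted_sq = fitted(demo_model)^2)

# Refit with the extra term; a significant term hints at unmodelled curvature
aug_model <- update(demo_model, . ~ . + fitted_sq, data = aug_data)

# F-test comparing the original model with the augmented one
curvature_test <- anova(demo_model, aug_model)
curvature_p <- curvature_test[2, "Pr(>F)"]

print(curvature_test)
```

# A small curvature_p would be read as evidence of non-linearity; a large one
# does not prove linearity, it only fails to detect this kind of curvature.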

## 2. Normality of Residuals
residual <- residuals(model)
shapiro.test(residual)
qqnorm(residual, col = "red")   # Q-Q plot of the residuals
qqline(residual, col = "green") # reference line through the quartiles

# Interpretation:
# Q-Q plot should show points roughly along the diagonal.
# Shapiro-Wilk test p-value > 0.05 gives no evidence against normality of the residuals.
# If p < 0.05, residuals deviate significantly from normality.
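
# A minimal sketch of pulling the normality results out as numbers, assuming
# a hypothetical model (mpg on wt, hp, disp, drat). The Q-Q correlation is an
# informal summary added here for illustration, not a formal test.

```r
data(mtcars)

# Hypothetical model with assumed predictors
demo_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
demo_resid <- residuals(demo_model)

# Shapiro-Wilk test: keep the statistic and p-value for later use
sw <- shapiro.test(demo_resid)
sw_p <- sw$p.value

# Q-Q coordinates without drawing; a correlation near 1 means the
# points lie close to a straight line
qq <- qqnorm(demo_resid, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

print(sw)
print(qq_cor)
```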

# Plot the histogram
hist(residual,
     breaks = 10,
     col = "skyblue",
     main = "Histogram of Residuals",
     xlab = "Residuals",
     ylab = "Frequency")

# Shape:
# Look for a bell-shaped curve that is roughly symmetrical around zero.
# If the shape is skewed or multi-peaked, it suggests non-normality.
# Center:
# The bulk of residuals should be centered around zero.
# Spread:
# Residuals should be spread reasonably without extreme outliers.
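
# The center and spread checks can also be done numerically. This is a sketch
# under assumptions: the model formula is hypothetical, and the cutoff of 3
# for standardized residuals is a common rule of thumb, not a fixed standard.

```r
data(mtcars)

# Hypothetical model with assumed predictors
demo_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)

# Center: with an intercept in the model, OLS residuals average to zero
# by construction, so this mainly confirms the fit ran as expected
resid_mean <- mean(residuals(demo_model))

# Spread: standardized residuals put every observation on a common scale
std_resid <- rstandard(demo_model)

# Observations whose standardized residual exceeds the assumed cutoff
outliers <- names(std_resid)[abs(std_resid) > 3]

print(resid_mean)
print(summary(std_resid))
print(outliers)
```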

## 3. Homoscedasticity (Constant Variance)


bptest(model)

# Interpretation:
# Breusch-Pagan test checks for equal residual variance (homoscedasticity).
# p-value > 0.05 = no evidence against constant variance → assumption is plausible.
# p < 0.05 = heteroscedasticity (non-constant variance): coefficient estimates stay unbiased, but their standard errors and p-values become unreliable.
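
# To show what the test measures, here is a sketch of the studentized
# (Koenker) form of the Breusch-Pagan statistic computed by hand in base R:
# regress the squared residuals on the predictors and use n * R-squared.
# The model formula and object names are assumptions for illustration; the
# result should be close to what bptest() reports with its default settings.

```r
data(mtcars)

# Hypothetical model with assumed predictors
demo_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux_data <- transform(mtcars, sq_resid = residuals(demo_model)^2)
aux_model <- lm(sq_resid ~ wt + hp + disp + drat, data = aux_data)

# Test statistic: sample size times the auxiliary R-squared
n_obs <- nobs(demo_model)
bp_stat <- n_obs * summary(aux_model)$r.squared

# Degrees of freedom: number of predictors (intercept excluded)
bp_df <- length(coef(demo_model)) - 1

# Upper-tail chi-squared p-value
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(statistic = bp_stat, df = bp_df, p.value = bp_p))
```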

# Generate the Scale-Location plot


plot(model, which = 3)

# --- INTERPRETATION GUIDE ---


# Scale-Location Plot (aka Spread-Location or √|Standardized Residuals| vs Fitted values):
# - X-axis: Fitted values (predicted mpg)
# - Y-axis: Square root of the absolute standardized residuals
# - Each point = one observation
# - Red smooth line helps identify any trend in the spread

# What you want to see:


# Points scattered randomly around the red line with no clear pattern.
# The red line is mostly flat.
# This suggests **homoscedasticity**: the variance of residuals is roughly constant.

# What you DON'T want to see:


# A funnel shape (spread increases or decreases across fitted values)
# A curved or sloped red line
# This suggests **heteroscedasticity**, meaning the model's residuals have **non-constant variance**.
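
# The quantities behind the Scale-Location plot can be rebuilt directly,
# which helps when the trend is hard to judge by eye. A sketch with an assumed
# model formula; the Spearman correlation is an informal summary added here,
# not something plot() reports.

```r
data(mtcars)

# Hypothetical model with assumed predictors
demo_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)

# Axes of the Scale-Location plot
sl_x <- fitted(demo_model)
sl_y <- sqrt(abs(rstandard(demo_model)))

# Smooth trend through the points, analogous to the red line
sl_trend <- lowess(sl_x, sl_y)

# How far the smooth line moves: near 0 means a flat line
trend_range <- diff(range(sl_trend$y))

# Rank correlation between spread and fitted values: near 0 is reassuring
spread_cor <- cor(sl_x, sl_y, method = "spearman")

print(trend_range)
print(spread_cor)
```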

## 4. Multicollinearity
vif(model)
# Interpretation:
# VIF values above 5 (some say 10) indicate high multicollinearity.
# High VIF means predictors are correlated, which can inflate SEs and distort the model.
# Try removing or combining variables with high VIF.
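
# For reference, VIF can be computed by hand: regress each predictor on the
# others and take 1 / (1 - R-squared). This sketch uses an assumed set of
# predictors (wt, hp, disp, drat); for a model with only numeric predictors
# the values should agree with vif() from the car package.

```r
data(mtcars)

# Hypothetical predictor set, chosen for illustration
predictors <- c("wt", "hp", "disp", "drat")

manual_vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Auxiliary regression of one predictor on all the others
  aux <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

print(round(manual_vif, 2))

# Predictors exceeding the rule-of-thumb threshold of 5
high_vif <- names(manual_vif)[manual_vif > 5]
print(high_vif)
```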

## 5. Independence of Residuals
dwtest(model)

# Interpretation:
# Durbin-Watson test detects autocorrelation in residuals.
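# DW ≈ 2 and p > 0.05 = no evidence of first-order autocorrelation in the residuals.
# p < 0.05 = positive autocorrelation present (the default alternative in dwtest), common in time series data.
# The test depends on row order, so it is only meaningful when observations have a natural ordering.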
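
# The Durbin-Watson statistic itself is simple to compute by hand: the sum of
# squared successive differences of the residuals divided by their sum of
# squares. A sketch with an assumed model formula; it reproduces the
# statistic only, not the p-value that dwtest() reports.

```r
data(mtcars)

# Hypothetical model with assumed predictors
demo_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
e <- residuals(demo_model)

# DW statistic: always between 0 and 4, with 2 meaning no lag-1 correlation
dw_stat <- sum(diff(e)^2) / sum(e^2)

# Rough implied lag-1 autocorrelation, using DW ≈ 2 * (1 - rho)
rho_hat <- 1 - dw_stat / 2

print(c(DW = dw_stat, rho_hat = rho_hat))
```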

## Optional: Residual Plot


plot(model$residuals, main = "Residuals", ylab = "Residuals", xlab = "Index")
abline(h = 0, col = "red")

# Interpretation:
# Residuals should appear randomly scattered around zero.
# Patterns
