# Regression Script
library(ggplot2)
library(car) # for VIF
library(lmtest) # for Breusch-Pagan and Durbin-Watson tests
library(nortest) # for normality tests
library(ggfortify) # for diagnostic plots
# Load dataset
data(mtcars)
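# Fit the model (a minimal sketch).
# ASSUMPTION: the predictors wt, hp and disp are chosen only for illustration.
# The rest of the script establishes just two things: mpg is the response and
# the fitted object is called `model`.
model <- lm(mpg ~ wt + hp + disp, data = mtcars)
summary(model)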
# Interpretation:
# The summary shows coefficient estimates, R-squared, and significance.
# Predictors with p < 0.05 are significantly associated with mpg, holding the other predictors fixed.
# R-squared is the share of the variance in mpg that the model explains.
## 1. Linearity Assumption
plot(model, which = 1)
# Interpretation:
# The Residuals vs Fitted plot should show no distinct pattern.
# A random scatter suggests a linear relationship between predictors and the dependent variable.
# A curved trend suggests linearity is violated; a funnel shape instead points to non-constant variance (heteroscedasticity).
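# A ggplot2 version of the same check, as a rough sketch. It plots residuals
# against fitted values with a loess smoother; a smoother that stays close to
# the dashed zero line is consistent with linearity.
# ASSUMPTION: `plot_resid_fitted` is a hypothetical helper name, not part of any package.
plot_resid_fitted <- function(fit) {
  df <- data.frame(fitted = fitted(fit), resid = residuals(fit))
  ggplot2::ggplot(df, ggplot2::aes(x = fitted, y = resid)) +
    ggplot2::geom_point() +
    ggplot2::geom_hline(yintercept = 0, linetype = "dashed") +
    ggplot2::geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    ggplot2::labs(x = "Fitted values", y = "Residuals")
}
# Draw the plot only if a fitted model named `model` is available
if (exists("model", mode = "list")) print(plot_resid_fitted(model))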
## 2. Normality of Residuals
residual <- residuals(model)
shapiro.test(residual)
qqnorm(residual, col = "red")    # Q-Q plot of the residuals
qqline(residual, col = "green")  # reference line through the first and third quartiles
# Interpretation:
# Q-Q plot should show points roughly along the diagonal.
# A Shapiro-Wilk p-value > 0.05 gives no evidence against normality of the residuals.
# If p < 0.05, residuals deviate significantly from normality.
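# Histogram of the residuals, as a sketch.
# ASSUMPTION: the plotting call is a guess; the shape, center and spread notes
# that follow describe a distribution plot without showing the code for it.
# `plot_resid_hist` is a hypothetical helper name.
plot_resid_hist <- function(fit, breaks = 10) {
  res <- residuals(fit)
  s <- sd(res)
  h <- hist(res, breaks = breaks, freq = FALSE,
            main = "Histogram of residuals", xlab = "Residual")
  # Overlay a normal density centered at zero with the residuals' own spread
  curve(dnorm(x, mean = 0, sd = s), add = TRUE, lty = 2)
  invisible(h)
}
# Draw the plot only if a fitted model named `model` is available
if (exists("model", mode = "list")) plot_resid_hist(model)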
# Shape:
# Look for a bell-shaped curve that is roughly symmetrical around zero.
# If the shape is skewed or multi-peaked, it suggests non-normality.
# Center:
# The bulk of residuals should be centered around zero.
# Spread:
# Residuals should be spread reasonably without extreme outliers.
## 3. Homoscedasticity
bptest(model)
# Interpretation:
# The Breusch-Pagan test checks for equal residual variance (homoscedasticity).
# p > 0.05 = no evidence against constant variance → assumption is reasonable.
# p < 0.05 = heteroscedasticity (non-constant variance), which can bias standard errors.
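# What the Breusch-Pagan test computes, as a rough sketch. The studentized
# (Koenker) form, which is the default in lmtest::bptest, regresses the squared
# residuals on the predictors and uses n * R-squared as a chi-squared statistic.
# ASSUMPTION: `manual_bp` is a hypothetical helper name; it expects a model with an intercept.
manual_bp <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # predictors without the intercept column
  e2 <- residuals(fit)^2
  aux <- lm(e2 ~ X)                           # auxiliary regression of squared residuals
  stat <- length(e2) * summary(aux)$r.squared
  df <- ncol(X)
  c(statistic = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}
# Run only if a fitted model named `model` is available
if (exists("model", mode = "list")) print(manual_bp(model))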
## 4. Multicollinearity
vif(model)
# Interpretation:
# VIF values above 5 (some say 10) indicate high multicollinearity.
# High VIF means a predictor is largely explained by the others, which inflates standard errors and makes coefficient estimates unstable.
# Try removing or combining variables with high VIF.
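# Where the VIF number comes from, as a sketch: VIF_j = 1 / (1 - R2_j), where
# R2_j is the R-squared from regressing predictor j on the remaining predictors.
# ASSUMPTION: `manual_vif` is a hypothetical helper name; it expects an intercept
# and at least two numeric predictors.
manual_vif <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # predictors without the intercept column
  sapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    1 / (1 - summary(lm(X[, v] ~ others))$r.squared)
  })
}
# Run only if a fitted model named `model` is available
if (exists("model", mode = "list")) print(manual_vif(model))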
## 5. Independence of Residuals
dwtest(model)
# Interpretation:
# Durbin-Watson test detects autocorrelation in residuals.
# DW ≈ 2 and p > 0.05 = no evidence of autocorrelation in the residuals.
# p < 0.05 = autocorrelation present, common in time series data.
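# The Durbin-Watson statistic is simple to compute by hand, sketched here:
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). It ranges from 0 to 4.
# ASSUMPTION: the rows are in a meaningful order (time or collection sequence).
# `manual_dw` is a hypothetical helper name.
manual_dw <- function(fit) {
  e <- residuals(fit)
  sum(diff(e)^2) / sum(e^2)
}
# Run only if a fitted model named `model` is available
if (exists("model", mode = "list")) {
  print(manual_dw(model))
  # Residuals in observation order, with a dashed zero reference line
  plot(residuals(model), type = "b", xlab = "Observation order", ylab = "Residual")
  abline(h = 0, lty = 2)
}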
# Interpretation:
# Residuals should appear randomly scattered around zero.
# Patterns