Multiple Linear Regression
Sanjay Fuloria
Model and Equation
Estimated Multiple Regression Equation
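For reference, the standard textbook form of the model and of its estimated equation can be sketched as follows. The notation (p independent variables, coefficients beta, error term epsilon, sample estimates b) is the conventional one and is assumed here, not taken from the slides.

```latex
% Multiple regression model with p independent variables
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon

% Estimated multiple regression equation (each b_j estimates \beta_j)
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p
```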
Problem
Solution
Multiple R Squared and Adjusted R Squared
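A minimal sketch of how the two measures are usually computed, assuming the standard definitions R^2 = 1 - SSE/SST and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1). The function name and the sample values are hypothetical and serve only as illustration.

```python
import numpy as np

def r_squared(y, y_hat, n_predictors):
    """Return (R^2, adjusted R^2) for observed y and fitted y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    sse = np.sum((y - y_hat) ** 2)      # unexplained (error) sum of squares
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - sse / sst
    # The adjustment penalizes each additional predictor.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj_r2

# Hypothetical observed and fitted values, for illustration only.
y_obs = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
y_fit = [9.5, 12.5, 15.5, 19.5, 23.5, 29.5]
r2, adj_r2 = r_squared(y_obs, y_fit, n_predictors=2)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

Adjusted R^2 is never larger than R^2, which is why it is preferred when comparing models with different numbers of predictors.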
Problem
Solution
Problem
Solution
Model Diagnostics
Model diagnostics in multiple regression are essential steps taken after fitting a
regression model to assess its validity and to check whether the underlying
assumptions of the regression analysis are met.
Proper diagnostics help ensure that the conclusions drawn from the model are
reliable and that the predictions made are accurate.
This process involves examining residuals (the differences between observed and
predicted values) and other statistical measures to detect any anomalies or
violations of regression assumptions.
Key Assumptions in Multiple Regression
Linearity: The relationship between each predictor and the response variable is
linear.
Independence: Observations are independent of each other.
Homoscedasticity: The variance of residuals is constant across all levels of the
independent variables.
Normality: The residuals are normally distributed.
No Multicollinearity: Independent variables are not highly correlated with each other.
No Influential Outliers: There are no extreme values unduly influencing the model.
Example
Dependent Variable: House Price (in thousands of dollars)
Independent Variables:
Square Footage
Number of Bedrooms
Age of the House (in years)
Distance to City Center (in miles)
The analyst collects data from 200 houses and fits a multiple regression model.
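The fitting step in this example might look like the sketch below. The data are synthetic: the predictor ranges, the coefficients used to generate prices, and the noise level are all invented for illustration and do not come from the analyst's data set. Ordinary least squares is done with NumPy's `lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of houses, as in the example

# Synthetic predictors (invented ranges, for illustration only).
sqft = rng.uniform(800, 3500, n)
bedrooms = rng.integers(1, 6, n).astype(float)
age = rng.uniform(0, 60, n)
distance = rng.uniform(1, 30, n)

# Synthetic prices (thousands of dollars) from made-up coefficients plus noise.
price = (50 + 0.12 * sqft + 8 * bedrooms - 0.9 * age - 2.5 * distance
         + rng.normal(0, 20, n))

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), sqft, bedrooms, age, distance])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
fitted = X @ coef
residuals = price - fitted   # observed minus predicted

for name, b in zip(["intercept", "sqft", "bedrooms", "age", "distance"], coef):
    print(f"{name:>10}: {b: .4f}")
```

The `fitted` and `residuals` arrays are the inputs to every diagnostic step that follows.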
Model Diagnostics Steps
Residual Analysis
Residuals vs. Fitted Values Plot: Plotting residuals against the predicted (fitted) values
helps check for homoscedasticity and linearity.
Ideal Outcome: Residuals are randomly scattered around zero with no discernible
pattern.
Potential Issue: A funnel shape indicates heteroscedasticity (non-constant variance).
Normal Q-Q Plot: This plot assesses the normality of residuals.
Ideal Outcome: Residuals fall approximately along the reference line.
Potential Issue: Systematic deviations from the line suggest non-normality.
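Both plots can be summarized numerically, as in the sketch below. It assumes stand-in fitted values and residuals drawn at random; the helper `qq_points`, the plotting positions (i - 0.5)/n, and the half-split spread ratio are illustrative choices, not a standard test.

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    # Plotting positions (i - 0.5) / n: one common convention among several.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    sample = (r - r.mean()) / r.std(ddof=1)   # standardized, sorted residuals
    return theoretical, sample

rng = np.random.default_rng(1)
fitted = rng.uniform(100, 500, 200)    # stand-in fitted values (assumed)
residuals = rng.normal(0, 20, 200)     # stand-in residuals (assumed)

# Residuals vs. fitted: compare residual spread in the lower and upper halves
# of the fitted values; a ratio far from 1 hints at a funnel shape.
order = np.argsort(fitted)
low, high = residuals[order[:100]], residuals[order[100:]]
print("spread ratio (high/low):", high.std(ddof=1) / low.std(ddof=1))

# Q-Q: a correlation near 1 means the points lie close to a straight line.
theo, samp = qq_points(residuals)
print("Q-Q correlation:", np.corrcoef(theo, samp)[0, 1])
```

Plotting `theo` against `samp`, or `fitted` against `residuals`, with any plotting library gives the two charts described above.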
Linearity Check
Multicollinearity Detection
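Variance Inflation Factor (VIF): Measures how much the variance of an estimated
regression coefficient is inflated by correlation between that predictor and the others.
Ideal Outcome: VIF values close to 1, indicating little correlation among predictors.
Potential Issue: High VIF values (a common rule of thumb is above 5 or 10) suggest
multicollinearity, which can be addressed by removing or combining correlated variables.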
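The VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the other predictors. A minimal sketch of that calculation follows; the function name `vif` and the synthetic square-footage and bedroom data are assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic data: bedrooms is built to track square footage closely,
# while age is generated independently.
rng = np.random.default_rng(2)
sqft = rng.uniform(800, 3500, 200)
bedrooms = sqft / 700 + rng.normal(0, 0.3, 200)
age = rng.uniform(0, 60, 200)
print(vif(np.column_stack([sqft, bedrooms, age])))
```

In this synthetic setup the first two values come out much larger than the third, reflecting the built-in correlation between square footage and bedrooms.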
Outliers and Influential Points
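Leverage and Cook's Distance: Leverage measures how unusual an observation's predictor
values are; Cook's Distance measures how much the fitted model changes when that
observation is removed.
Ideal Outcome: Most observations have low leverage and low Cook's Distance.
Potential Issue: Points with both high leverage and high Cook's Distance may unduly
influence the model.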
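Both quantities can be computed directly from the design matrix, as sketched below. Leverage is the diagonal of the hat matrix, and Cook's Distance combines each residual with its leverage. The function name and the six-point data set (with one deliberately extreme point) are hypothetical.

```python
import numpy as np

def leverage_and_cooks(X, y):
    """X includes the intercept column. Returns (leverage, cooks_distance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                       # k = number of coefficients
    XtX_inv = np.linalg.inv(X.T @ X)
    # Hat matrix diagonal: h_i = x_i' (X'X)^{-1} x_i
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    s2 = resid @ resid / (n - k)         # residual variance estimate
    cooks = resid**2 * h / (k * s2 * (1.0 - h) ** 2)
    return h, cooks

# Hypothetical data: the last point lies far from the others in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 8.0])
X = np.column_stack([np.ones_like(x), x])
h, d = leverage_and_cooks(X, y)
for i, (hi, di) in enumerate(zip(h, d)):
    print(f"obs {i}: leverage = {hi:.3f}, Cook's D = {di:.3f}")
```

The leverages always sum to the number of coefficients, so an observation whose leverage is several times the average k/n deserves a closer look.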
Independence of Errors
Durbin-Watson Test: Checks for first-order autocorrelation in residuals. The statistic
ranges from 0 to 4; values near 2 suggest no autocorrelation.
Examples:
Temporal Autocorrelation: If house sales are recorded over consecutive months or years,
the residuals from houses sold in one month might be similar to those in the next month
due to market trends.
Spatial Autocorrelation: Houses located in the same neighborhood might have similar
errors due to unobserved local factors.
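The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch follows; the residual series are simulated, and the autocorrelation value 0.8 is an arbitrary choice for illustration. The residuals must be in time order for the statistic to be meaningful.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation (time) order."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(4)
n = 200

# Independent residuals: the statistic should land near 2.
independent = rng.normal(0, 1, n)

# Positively autocorrelated residuals (AR(1) with coefficient 0.8, assumed):
# each residual carries over most of the previous one.
correlated = np.empty(n)
correlated[0] = rng.normal()
for t in range(1, n):
    correlated[t] = 0.8 * correlated[t - 1] + rng.normal()

print("independent:", durbin_watson(independent))
print("correlated: ", durbin_watson(correlated))
```

Values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.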
Residuals vs. Observations (Autocorrelation)
Homoscedasticity Tests
Breusch-Pagan Test: Statistically tests the null hypothesis of constant residual variance.
Example:
Low-Priced Houses: Prices are tightly clustered around the predicted values.
High-Priced Houses: Prices vary widely around the predicted values.
Reason: Market factors (high-priced houses might have more unique features).
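One way to sketch the test is the studentized (Koenker) variant: regress the squared residuals on the predictors and use n times the R^2 of that auxiliary regression as the test statistic, compared against a chi-square distribution with one degree of freedom per predictor. The function name and the synthetic data, in which noise grows with square footage, are assumptions for illustration.

```python
import numpy as np

def breusch_pagan_lm(X, residuals):
    """Studentized Breusch-Pagan statistic. X includes the intercept column.

    Returns (LM statistic, degrees of freedom).
    """
    X = np.asarray(X, dtype=float)
    e2 = np.asarray(residuals, dtype=float) ** 2
    n, k = X.shape
    # Auxiliary regression of squared residuals on the predictors.
    coef, *_ = np.linalg.lstsq(X, e2, rcond=None)
    aux_fit = X @ coef
    r2 = 1.0 - np.sum((e2 - aux_fit) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2, k - 1

# Synthetic data: the noise standard deviation grows with square footage,
# so larger (pricier) houses scatter more widely around the fitted line.
rng = np.random.default_rng(3)
n = 200
sqft = rng.uniform(800, 3500, n)
X = np.column_stack([np.ones(n), sqft])
price = 50 + 0.12 * sqft + rng.normal(0, 0.02 * sqft)

coef, *_ = np.linalg.lstsq(X, price, rcond=None)
resid = price - X @ coef
lm, df = breusch_pagan_lm(X, resid)

CRITICAL_5PCT_DF1 = 3.841  # chi-square 5% critical value for 1 degree of freedom
print(f"LM = {lm:.2f} on {df} df; reject constant variance: {lm > CRITICAL_5PCT_DF1}")
```

A statistic above the critical value is evidence against constant variance; with more predictors the degrees of freedom, and therefore the critical value, change accordingly.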
Heteroscedasticity
Addressing Detected Issues