Multiple Linear Regression-Part V Exhaustive Search: Dr. Gaurav Dixit
EXHAUSTIVE SEARCH
LECTURE 26
MULTIPLE LINEAR REGRESSION
• Exhaustive Search
– Large no. of subsets (with p predictors there are 2^p possible subsets)
– Criteria to compare models
• Adjusted R2
MULTIPLE LINEAR REGRESSION
• Adjusted R2
$R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}$

where $R^2$ is the proportion of explained variability in the model:

$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, with $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
• R2 is called the coefficient of determination
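As a quick sketch (not from the lecture's materials), both quantities can be computed directly in R; the built-in mtcars dataset and the three chosen predictors are assumptions made purely for illustration:

```r
# Fit a linear model; mtcars and the predictor choice are illustrative
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

n <- nrow(mtcars)   # number of observations
p <- 3              # number of predictors in this model

sse <- sum(residuals(fit)^2)                   # SSE: sum of squared errors
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # SST: total sum of squares

r2     <- 1 - sse / sst                          # coefficient of determination
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusted R^2

# Should match summary(fit)$r.squared and summary(fit)$adj.r.squared
c(r2, r2_adj)
```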
MULTIPLE LINEAR REGRESSION
• Exhaustive Search
– Criteria to compare models
• Mallows's Cp
• Mallows's Cp

$C_p = \frac{SSR}{\hat{\sigma}^2_{f}} + 2(p+1) - n$

where SSR is the sum of squared residuals of the subset model with p predictors and $\hat{\sigma}^2_{f}$ is the error-variance estimate from the full model
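A minimal R sketch of this formula, under the assumption that the full model's error variance is estimated by its residual standard error squared (summary(...)$sigma^2); mtcars is again used only for illustration:

```r
# Full model (all predictors) supplies the error variance estimate
full <- lm(mpg ~ ., data = mtcars)
sigma2_f <- summary(full)$sigma^2   # sigma_hat_f^2 from the full model

# Candidate subset model with p = 2 predictors (illustrative choice)
cand <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
p <- 2

ssr <- sum(residuals(cand)^2)   # subset model's sum of squared residuals
cp  <- ssr / sigma2_f + 2 * (p + 1) - n
cp   # a value near p + 1 suggests little bias in the subset model
```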
MULTIPLE LINEAR REGRESSION
• Mallows's Cp
– Assumption: the full model with all predictors is unbiased
• Eliminating predictors can then only reduce variance
– A good subset model has Cp ≈ p + 1, with p small
– Requires a large n for the training partition relative to p
• Open RStudio
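One way to run the exhaustive search in R is regsubsets() from the leaps package; the sketch below is an assumption about how the demo might look, not the lecture's own script, and uses the built-in mtcars dataset:

```r
library(leaps)

# Exhaustive search: evaluate every subset of up to nvmax predictors
search <- regsubsets(mpg ~ ., data = mtcars,
                     nvmax = 10, method = "exhaustive")
res <- summary(search)

# Best model of each size, compared by adjusted R^2 and Mallows's Cp
data.frame(size = 1:10, adjr2 = res$adjr2, cp = res$cp)
which.max(res$adjr2)   # subset size maximizing adjusted R^2
which.min(res$cp)      # subset size minimizing Cp

# Coefficients of the model chosen by adjusted R^2
coef(search, id = which.max(res$adjr2))
```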
MULTIPLE LINEAR REGRESSION
• Partial, iterative search
– Computationally cheaper
– Best subset is not guaranteed
• Potential of missing “good” sets of predictors
– Produce close-to-best subsets
– Preferred approach for large no. of predictors
– For moderate no. of predictors, exhaustive search is better
• Trade-off between computation cost and the potential of finding the best subset
MULTIPLE LINEAR REGRESSION
• Open RStudio
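Here the demo presumably turns to partial, iterative search; below is a sketch using base R's step(), which adds or drops one predictor at a time by AIC rather than enumerating all subsets (again an assumption, not the lecture's own script):

```r
# Intercept-only start; the full formula bounds the search space
null <- lm(mpg ~ 1, data = mtcars)
full <- lm(mpg ~ ., data = mtcars)

# Stepwise search in both directions: computationally cheap, but the
# result is close-to-best, not guaranteed to be the best subset
fit <- step(null, scope = formula(full), direction = "both", trace = 0)
summary(fit)
```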
Thanks…