Multiple Linear Regression Part-4: Lecture 25
MULTIPLE LINEAR REGRESSION
• Variable Selection
– A large number of candidate variables is often available for selecting a set of predictors
– The main idea is to select the most useful set of predictors for a given outcome variable of interest
– Including all the available variables in the model is not recommended, due to:
• Data collection issues in future
• Measurement accuracy issues for some variables
• Missing values
• Parsimony
• Multicollinearity: two or more predictors sharing a near-exact linear relationship with each other, which makes the coefficient estimates unstable
• Sample size issues: rule of thumb
n > 5*(p+2)
where n = no. of observations and p = no. of predictors
• The variance of predictions might increase due to the inclusion of predictors that are uncorrelated with the outcome variable
• The average error of predictions might increase due to the exclusion of predictors that are correlated with the outcome variable
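The multicollinearity and sample-size points above can be illustrated with a minimal numpy sketch. The data here is simulated for illustration (not from the lecture): one predictor is built as a near-copy of another, and a large condition number of the predictor matrix is used as a simple collinearity signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: x2 is (almost) a linear function of x1,
# so two predictors share a near-exact linear relationship.
n, p = 40, 3
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.01, size=n)  # near-duplicate of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Rule-of-thumb sample size check: n > 5*(p+2)
assert n > 5 * (p + 2), "too few observations for this many predictors"

# A large condition number of the predictor matrix signals multicollinearity;
# with independent predictors it would be small (order of 1-10).
cond = np.linalg.cond(X)
print(f"condition number: {cond:.0f}")
```

In practice, variance inflation factors per predictor are a more common diagnostic; the condition number is used here only to keep the sketch dependency-free.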
• Bias-variance trade-off
– too few vs. too many predictors
• Fewer predictors -> higher bias but lower variance
– Drop variables whose coefficient is smaller than the std. dev. of the noise and which have moderate or high correlation with other variables
• This reduces variance at only a small cost in bias
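The trade-off above can be seen in a small simulation. This is an illustrative sketch (data and coefficients are made up, not from the lecture): a one-predictor model omits a relevant predictor and so carries higher bias, which shows up as a larger test error than the full two-predictor fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated outcome depending on two predictors plus noise.
n = 200
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Simple train/test split.
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

def test_mse(X_train, X_test, y_train, y_test):
    """Least-squares fit on the training part, MSE on the test part."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return np.mean((y_test - X_test @ beta) ** 2)

# Biased model: drops the second (relevant, correlated-with-y) predictor.
mse_small = test_mse(X_tr[:, :1], X_te[:, :1], y_tr, y_te)
mse_full = test_mse(X_tr, X_te, y_tr, y_te)
print(mse_small, mse_full)  # excluding a relevant predictor raises average error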
• Exhaustive Search
– Large no. of subsets: 2^p possible models for p predictors
– Criteria to compare models
• Adjusted R² (unlike plain R², it penalizes the use of additional predictors)
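Exhaustive search with adjusted R² can be sketched in plain numpy. The data below is simulated (only two of four predictors actually matter); the subset enumeration and the adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - k - 1), are standard, but everything else is illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: only the first two of four predictors matter.
n, p = 100, 4
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.3, size=n)

def adjusted_r2(X_sub, y):
    """Adjusted R^2 for an OLS fit with intercept on the given predictors."""
    n_obs, k = X_sub.shape
    A = np.column_stack([np.ones(n_obs), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - k - 1)

# Exhaustive search: score every non-empty subset (2^p - 1 models).
best = max(
    (subset
     for r in range(1, p + 1)
     for subset in itertools.combinations(range(p), r)),
    key=lambda s: adjusted_r2(X[:, list(s)], y),
)
print("best subset (column indices):", best)
```

The two truly relevant predictors always end up in the winning subset; whether a noise predictor sneaks in depends on the sample, which is exactly why criteria that penalize model size more heavily (e.g. BIC) are often preferred for larger p.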
Thanks…