Multiple Linear Regression Part-4: Lecture 25

This document discusses variable selection in multiple linear regression. It explains that selecting all available variables is not recommended due to issues like multicollinearity, sample size constraints, and increased variance. It describes bias-variance tradeoffs in variable selection and recommends dropping uncorrelated or highly correlated variables. Statistical methods like exhaustive and partial-iterative search can be used to reduce variables, comparing models based on criteria like adjusted R-squared. Domain knowledge and practical reasons also guide variable selection.


MULTIPLE LINEAR REGRESSION Part-4

LECTURE 25

DR. GAURAV DIXIT


DEPARTMENT OF MANAGEMENT STUDIES

MULTIPLE LINEAR REGRESSION

• Variable Selection
– Availability of a large number of variables to choose a set of predictors from
– The main idea is to select the most useful set of predictors for a given outcome
variable of interest
– Selecting all the available variables in the model is not recommended, for
several reasons:
• Data collection issues in the future
• Measurement accuracy issues for some variables
• Missing values
• Parsimony

MULTIPLE LINEAR REGRESSION

• Variable Selection
– Selecting all the available variables in the model is not recommended
• Multicollinearity: two or more predictors that are highly linearly related to each
other, which makes individual coefficient estimates unstable
• Sample size issues: rule of thumb
n > 5*(p+2)
where n = no. of observations
and p = no. of predictors
• Variance of predictions might increase due to inclusion of predictors which are
uncorrelated with the outcome variable
• Average error (bias) of predictions might increase due to exclusion of predictors
which are correlated with the outcome variable
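The sample-size rule of thumb above can be expressed directly in code. This is a minimal sketch; the function names are illustrative, not from the lecture:

```python
def min_sample_size(p):
    """Smallest n satisfying the slide's rule of thumb n > 5*(p+2)."""
    return 5 * (p + 2) + 1

def enough_observations(n, p):
    """True if n observations suffice for p predictors under the rule."""
    return n > 5 * (p + 2)

# With 10 predictors, the rule asks for more than 60 observations,
# i.e. at least 61.
print(min_sample_size(10))
```

So a model with 10 candidate predictors should be fit on at least 61 observations under this heuristic; with fewer observations, the rule suggests reducing the predictor set first.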

MULTIPLE LINEAR REGRESSION

• Bias-variance trade-off
– Too few vs. too many predictors
• Fewer predictors -> higher bias but lower variance; more predictors -> lower
bias but higher variance
– Drop variables whose coefficient is smaller than the std. dev. of the noise and
which have moderate or high correlation with other predictors
• Lower variance, at the cost of a small increase in bias

• Steps to reduce the no. of predictors


– Domain knowledge
– Practical reasons
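The "drop variables with moderate or high correlation with other variables" step above can be sketched as a simple screening pass over the correlation matrix. A minimal sketch using NumPy; the function name, threshold, and simulated data are illustrative assumptions, not from the lecture:

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.8):
    """Return predictor pairs whose absolute pairwise correlation
    meets or exceeds the threshold (candidates for dropping one of the pair)."""
    corr = np.corrcoef(X, rowvar=False)  # columns of X are predictors
    pairs = []
    p = corr.shape[0]
    for i in range(p):
        for j in range(i + 1, p):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], corr[i, j]))
    return pairs

# Simulated example: x2 is a near-duplicate of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
flagged = high_correlation_pairs(X, ["x1", "x2", "x3"])
print(flagged)
```

Only the (x1, x2) pair is flagged, suggesting one of the two can be dropped with little loss of information.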

MULTIPLE LINEAR REGRESSION

• Steps to reduce the no. of predictors


– Summary statistics and graphs
– Statistical methods using computational power
• Exhaustive search: evaluate all possible combinations of predictors
• Partial-iterative search: algorithm based (e.g., forward, backward, or stepwise
selection)

• Exhaustive Search
– Large no. of subsets: 2^p possible models for p predictors
– Criteria to compare models
• Adjusted R²
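Exhaustive search over predictor subsets, scored by adjusted R², can be sketched as follows. This is a minimal illustration with OLS fit by least squares; the function names and simulated data are assumptions, not from the lecture:

```python
import numpy as np
from itertools import combinations

def adjusted_r2(y, X):
    """Fit OLS with an intercept and return adjusted R-squared,
    which penalizes R-squared for the number of predictors p."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def exhaustive_search(y, X, names):
    """Fit every non-empty subset of predictors (2^p - 1 models);
    return the best adjusted R-squared and the chosen predictor names."""
    best_score, best_subset = -np.inf, None
    p = X.shape[1]
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            score = adjusted_r2(y, X[:, list(subset)])
            if score > best_score:
                best_score = score
                best_subset = tuple(names[i] for i in subset)
    return best_score, best_subset

# Simulated example: y depends on x1 and x2; x3 is pure noise.
rng = np.random.default_rng(1)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 2 * x1 - x2 + 0.5 * rng.normal(size=n)
score, chosen = exhaustive_search(y, np.column_stack([x1, x2, x3]),
                                  ["x1", "x2", "x3"])
print(score, chosen)
```

Because adjusted R² penalizes extra predictors, the search favors the informative predictors x1 and x2 rather than blindly including the noise variable. For large p, the 2^p model count makes exhaustive search expensive, which is exactly why the partial-iterative methods above exist.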

Key References

• Data Science and Big Data Analytics: Discovering, Analyzing,
Visualizing and Presenting Data by EMC Education Services (2015)
• Data Mining for Business Intelligence: Concepts, Techniques,
and Applications in Microsoft Office Excel with XLMiner by
Shmueli, G., Patel, N. R., & Bruce, P. C. (2010)

Thanks…
