Week 6 presentation draft
Linear regression is widely used to model the relationship between a response and
several predictors. However, when some of the predictors are highly correlated, the
regression results can become unstable and misleading. This issue is known as
multicollinearity.
Highly correlated predictors carry redundant information, which makes it difficult to isolate the individual effect of each one. As a result, the standard errors of the regression coefficients become inflated, which can produce non-significant t-tests even for predictors that are truly important. In severe cases, a coefficient may change direction or magnitude drastically depending on which other variables are included, inviting misinterpretation. Multicollinearity also reduces the statistical power and the overall credibility of the model, so detecting and addressing it is essential before interpreting the regression output.
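To make the instability concrete, here is a small simulation sketch (not part of the draft's analysis) in which one predictor is nearly a copy of another; the variable names and noise level are arbitrary choices for illustration. Fitting OLS with both predictors yields much larger standard errors than fitting with either one alone.

```python
# Illustrative simulation: two nearly identical predictors inflate OLS standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)      # x2 is almost a copy of x1
y = 2 * x1 + x2 + rng.normal(size=n)

both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
single = sm.OLS(y, sm.add_constant(x1)).fit()

print(both.bse)    # large standard errors on both slope coefficients
print(single.bse)  # far smaller standard error once the near-duplicate is dropped
```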
Correlation and VIF
A good starting point is to examine the correlation matrix of the predictors. If some
pairs of variables show correlation coefficients above 0.70 or 0.80, this signals a
potential multicollinearity problem. However, pairwise correlations alone are not sufficient, because a predictor can be a near-linear combination of several others without any single pair being strongly correlated. A more comprehensive diagnostic is the Variance Inflation Factor (VIF). For the j-th predictor, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that predictor on all the other predictors; it measures how much the variance of the corresponding coefficient is inflated by multicollinearity. A VIF above 10 indicates a serious problem, while values between 5 and 10 suggest moderate concern.
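As a rough illustration of these two diagnostics, the sketch below computes the correlation matrix and the VIFs with pandas and statsmodels; the DataFrame and column names are placeholders, not data from this project.

```python
# Sketch of the correlation-matrix and VIF checks described above.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

def vif_table(predictors: pd.DataFrame) -> pd.Series:
    """Return the VIF of each predictor column (intercept included in the design)."""
    X = add_constant(predictors)
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    return pd.Series(vifs, name="VIF")

# Hypothetical usage:
# df = pd.read_csv("data.csv")
# predictors = df[["x1", "x2", "x3"]]
# print(predictors.corr())      # flag pairs above roughly 0.70-0.80
# print(vif_table(predictors))  # values above 10 signal a serious problem
```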
Summary
Multicollinearity is a serious threat to model interpretability. As emphasized by Hair
et al. (p. 314), it must be detected and corrected before drawing conclusions from
regression coefficients. We have shown three ways to handle it: eliminating redundant
predictors, combining variables through PCA, and applying Ridge regression. Each
has its own trade-offs. The key is to choose the approach that aligns with your
analysis goals — be it clarity of interpretation or prediction performance.
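As a closing illustration, one of the remedies named above, Ridge regression, could look roughly like the following scikit-learn sketch; the candidate penalty values and the standardization step are assumptions, and in practice the penalty is tuned by cross-validation on your own data.

```python
# Minimal sketch of the Ridge remedy: shrink coefficients to stabilize them
# when predictors are highly correlated.
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ridge = make_pipeline(
    StandardScaler(),                     # Ridge is sensitive to predictor scale
    RidgeCV(alphas=[0.1, 1.0, 10.0]),     # choose the penalty by cross-validation
)
# ridge.fit(X_train, y_train)                 # X_train / y_train are placeholders
# print(ridge.named_steps["ridgecv"].alpha_)  # selected penalty strength
```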