Assumptions of Linear Regression
Assumptions
Linearity
Assumption that the model has the correct functional form: Y should actually be a linear function of the parameters. A plot of residuals against fitted values can reveal a misspecified form.
Homoskedasticity
The error variance should be constant across observations; when it instead varies with X, the errors are heteroskedastic. The issue that arises because of heteroskedasticity is that the standard errors in the output cannot be relied upon. The Breusch-Pagan test can be used to detect it. As a remedy, taking the log of X and/or Y can be used.
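The Breusch-Pagan test above can be sketched by hand: regress the squared OLS residuals on X and compare n·R² to a chi-squared critical value. A minimal numpy sketch on simulated data (the data-generating process and the 3.84 cutoff for 1 degree of freedom at the 5% level are illustrative assumptions, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
# Simulated heteroskedastic data: the error spread grows with x
y = 2 + 3 * x + rng.normal(0, x, n)

# Ordinary least squares fit of y on [1, x]
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Breusch-Pagan: regress squared residuals on X; LM = n * R^2 ~ chi2(k)
u2 = resid ** 2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ g
r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
lm = n * r2  # compare to the chi2(1) 5% critical value, about 3.84
```

Here `lm` comes out far above 3.84, so the null of constant variance is rejected, which is what we expect given how the data were simulated. In practice `statsmodels.stats.diagnostic.het_breuschpagan` does the same computation.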
Independent error terms
Each error term should be independent of the preceding ones. If this is violated, the errors are said to be autocorrelated.
The issue if autocorrelation exists is, again, that the standard errors in the output become unreliable. The Durbin-Watson test can be used to detect it.
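The Durbin-Watson statistic is simple enough to compute directly: the sum of squared successive residual differences over the sum of squared residuals. A minimal sketch on simulated AR(1) errors (the 0.8 autocorrelation and the rough "near 2 means no autocorrelation" reading are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = np.linspace(0, 10, n)

# AR(1) errors: e_t = 0.8 * e_{t-1} + noise (positive autocorrelation)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Durbin-Watson: ~2 means no autocorrelation, <2 positive, >2 negative
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
```

With strongly positive autocorrelation, `dw` lands well below 2 (roughly 2·(1 − 0.8) = 0.4 in expectation). `statsmodels.stats.stattools.durbin_watson` computes the same quantity.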
Normality of errors
The errors should be normally distributed: most of the data points are centred around the best-fit line, with fewer at the extremes.
The Shapiro-Wilk test can be used to detect violations. We can also use a histogram or a Q-Q plot of the residuals.
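The Shapiro-Wilk check can be run in one call via scipy. A minimal sketch comparing residual-like samples that are normal versus clearly skewed (the samples are simulated for illustration, not from the source):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
resid_normal = rng.normal(size=200)               # plausible "good" residuals
resid_skewed = rng.exponential(size=200) - 1       # clearly non-normal residuals

# Shapiro-Wilk: null hypothesis is that the sample is normally distributed
stat_n, p_n = stats.shapiro(resid_normal)
stat_s, p_s = stats.shapiro(resid_skewed)
# A tiny p-value (as for the skewed sample) rejects normality of the errors
```

For a visual check, `scipy.stats.probplot` produces the points for a Q-Q plot of the same residuals.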
No multicollinearity (a.k.a truly independent X terms)
Multicollinearity occurs when the X variables are strongly related to each other.
Consider, for example, a model Yi = β0 + β1X1i + β2X2i + εi in which X1 and X2 are highly correlated.
The issue is that the coefficients and standard errors of the affected variables are unreliable. To detect it, we can look at the correlations between the X variables or use Variance Inflation Factors (VIF). The remedy is to remove one of the collinear variables.
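A VIF can be computed by hand: regress each X on the other X's and take 1/(1 − R²). A minimal numpy sketch on simulated predictors (the near-collinear construction and the common "VIF above 5 or 10 is a problem" rule of thumb are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                  # independent of the others

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the other columns."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    fitted = A @ coef
    r2 = 1 - np.sum((X[:, j] - fitted) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

X = np.column_stack([x1, x2, x3])
# vif(X, 0) and vif(X, 1) are very large; vif(X, 2) stays near 1
```

`statsmodels.stats.outliers_influence.variance_inflation_factor` implements the same idea; dropping one of the collinear columns brings the remaining VIFs back down.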
Exogeneity
No omitted variable bias: the X variables must be uncorrelated with the error term.
Consider, for example, a model regressing income on education.
Socio-economic status affects both education (X) and income (Y), which causes omitted variable bias if it is left out of the model. Education is then no longer wholly exogenous, because the omitted influence of socio-economic status is absorbed into εi, with which education is now correlated.
In the absence of exogeneity, the model can still be used for prediction, but not for causal claims. There is no clear-cut test for detecting the problem; intuition about which variables belong in the model is the only real solution.
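The education example can be made concrete with a simulation: generate income from both education and an unobserved socio-economic status, then omit the latter and watch the education coefficient come out biased. All coefficients and the data-generating process below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
ses = rng.normal(size=n)                  # unobserved socio-economic status
educ = 1.0 * ses + rng.normal(size=n)     # SES raises education
# True model: income depends on education (effect = 2) AND on SES (effect = 3)
income = 2.0 * educ + 3.0 * ses + rng.normal(size=n)

# Omit SES and regress income on education alone
X = np.column_stack([np.ones(n), educ])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
# beta[1] is biased upward: educ is correlated with the omitted SES,
# so it soaks up part of the SES effect (here about 3.5 instead of 2)
```

No diagnostic on the fitted model flags this; only knowledge of the data-generating process reveals that the estimate of 3.5 is not the causal effect of education.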