Assumptions of Linear Regression

Linear regression models make several key assumptions, including linearity, constant error variance, independent error terms, normality of errors, no multicollinearity, and exogeneity. If these assumptions are violated, it can lead to unreliable coefficient estimates and standard errors in the regression output. Various statistical tests can be used to detect assumption violations, such as residual plots and likelihood ratio tests for linearity, Breusch-Pagan tests for constant error variance, Durbin-Watson tests for independent error terms, and variance inflation factors for multicollinearity. Meeting the model assumptions is important for obtaining accurate and reliable results from linear regression analysis.


Linear regression

Assumptions
Linearity
The assumption of a correct functional form:

● The model is simple and additive.

● The parameters are linear, i.e. the model is linear in its coefficients.


What is the issue if linearity does not hold?
If the functional form is incorrect, then the coefficients and the standard errors in the
output are not reliable.

How can we check this?

● Violations can be detected with residual plots or a likelihood ratio (LR) test (see the sketch below).

● Both can be used as statistical measures to compare the goodness of
fit of distinct models.
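A minimal sketch of both checks, assuming simulated data and the statsmodels/matplotlib libraries; all variable names here are hypothetical and only illustrate the idea:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 200)  # data that really is linear

X = sm.add_constant(x)  # add the intercept column
model = sm.OLS(y, X).fit()

# Residual plot: if the functional form is correct, the residuals scatter
# randomly around zero with no curvature against the fitted values.
plt.scatter(model.fittedvalues, model.resid, s=10)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# LR test: compare the linear fit against a richer (quadratic) fit.
X2 = sm.add_constant(np.column_stack([x, x ** 2]))
model2 = sm.OLS(y, X2).fit()
lr_stat, p_value, df_diff = model2.compare_lr_test(model)
print(f"LR test p-value: {p_value:.4f}")  # large p => quadratic term adds nothing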
Constant error variance (a.k.a. homoskedasticity, or no heteroskedasticity)
"Homo" means same and "scedasticity" means variance: the spread of the errors
should stay constant across observations.
The issue that arises because of heteroskedasticity is that the standard errors in the
output cannot be relied upon.

The Breusch-Pagan test can be used to detect it. As a remedy, a log transformation of X and/or Y can be used.
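A minimal sketch of the Breusch-Pagan test with statsmodels, on simulated data whose error variance grows with X (the variable names are hypothetical):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5 + 0.3 * x)  # spread widens with x

model = sm.OLS(y, sm.add_constant(x)).fit()

# Null hypothesis: homoskedasticity. A small p-value suggests
# heteroskedasticity, so the reported standard errors are suspect.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")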
Independent error terms
Each error term is independent of the previous one. If this is violated, it is
called autocorrelation.

So, no autocorrelation should exist.

This violation typically occurs in time-series data.


If autocorrelation exists, the standard errors in the output become
unreliable. The Durbin-Watson test can be used to detect it.
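A minimal sketch of the Durbin-Watson check, using simulated AR(1) errors so that autocorrelation is present by construction (names are hypothetical):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
n = 200
t = np.arange(n)
e = np.zeros(n)
for i in range(1, n):  # AR(1): each error depends on the previous one
    e[i] = 0.8 * e[i - 1] + rng.normal(0, 1)
y = 1.0 + 0.5 * t + e

model = sm.OLS(y, sm.add_constant(t)).fit()
# Values near 2 suggest no autocorrelation; values toward 0 indicate
# positive autocorrelation, values toward 4 negative autocorrelation.
print(f"Durbin-Watson statistic: {durbin_watson(model.resid):.2f}")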
Normality of errors
Most residuals are centred close to the best-fit line, with fewer at the
extremes.

If it is violated and n is small, the standard errors in the output are affected.

The Shapiro-Wilk test can be used to detect it. A histogram or Q-Q plot can
also be used.
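A minimal sketch of both normality checks, applied to the residuals of a simulated fit (uses scipy and statsmodels; names are hypothetical):

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 100)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)
model = sm.OLS(y, sm.add_constant(x)).fit()

# Shapiro-Wilk: the null hypothesis is that the residuals are normal,
# so a small p-value indicates a violation.
stat, p_value = stats.shapiro(model.resid)
print(f"Shapiro-Wilk p-value: {p_value:.4f}")

# Q-Q plot: points hugging the 45-degree line indicate normality.
sm.qqplot(model.resid, line="45", fit=True)
plt.show()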
No multicollinearity (a.k.a. truly independent X terms)
Multicollinearity occurs when the X's are related to each other.

The issue is that the coefficients and standard errors of the affected variables are
unreliable. To detect it, we can look at the correlations between the X variables or use
Variance Inflation Factors (VIF); see the sketch below. The remedy is to remove one of the correlated variables.
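A minimal sketch of a VIF check, with one regressor deliberately constructed as a near-copy of another (names are hypothetical):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.1, n)  # x2 is almost a copy of x1
x3 = rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
# A VIF above roughly 5-10 is a common rule of thumb for multicollinearity;
# x1 and x2 should come out very high, x3 close to 1.
for i, name in enumerate(["const", "x1", "x2", "x3"]):
    print(f"{name}: VIF = {variance_inflation_factor(X, i):.2f}")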
Exogeneity
No omitted variable bias.

Consider a model where an outcome Y is regressed on education (X). Socio-economic
status affects both X and Y, which can cause omitted variable bias: education is no
longer wholly exogenous, because the omitted socio-economic status ends up in the
error term εi, which is then correlated with education.
In the absence of exogeneity, the model can be used for prediction but not for
causal claims. There is no clear statistical way of detecting this violation;
correlations can offer hints, but intuition about the data-generating process is the
main tool. A small simulation of the problem follows.
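A minimal simulation of omitted variable bias (an illustrative sketch with invented numbers, not a model from the slides): a confounder drives both education and the outcome, and leaving it out biases the coefficient on education.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
ses = rng.normal(0, 1, n)                # unobserved socio-economic status
educ = 2.0 * ses + rng.normal(0, 1, n)   # X depends on the confounder
y = 1.0 * educ + 3.0 * ses + rng.normal(0, 1, n)  # true effect of educ is 1.0

# Omitting ses: educ is correlated with the error term, so its
# coefficient absorbs part of the effect of ses and is biased upward.
short = sm.OLS(y, sm.add_constant(educ)).fit()
full = sm.OLS(y, sm.add_constant(np.column_stack([educ, ses]))).fit()
print(f"educ coefficient, ses omitted:  {short.params[1]:.2f}  (biased)")
print(f"educ coefficient, ses included: {full.params[1]:.2f}  (close to 1.0)")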
