Introduction To Econometrics
ECONOMETRICS
By : W. CHIWOKO
[Figure: graphical representation of the residuals — observed values scattered around the fitted line ŷ = 11.329 + 1.0616x]
UNILIL BSc in Applied and Development Economics 04/30/2025 13
Under the assumptions below, the OLS estimators are unbiased estimators.
Assumption 1: The regression model
is linear in parameters and variables
• This assumption addresses the functional form of the model.
• In statistics, linearity of a model can be expressed in two ways:
• Linearity in variables, and
• Linearity in parameters.
• Linearity in variables means that the conditional expectation E(Y|Xi) of Y is a linear function of Xi.
• That is, geometrically the regression curve in this case is a straight line, y = mxi + c.
• The powers of the variables are always one. That is,
• E(Y|Xi) = β0 + β1Xi is linear in the variables, whereas E(Y|Xi) = β0 + β1Xi² is not.
• In other words, a function y = f(x) is said to be linear in the variables if x appears with a power (index) of 1 only and is not multiplied or divided by any other variable.
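The distinction can be sketched numerically: a model such as E(Y|Xi) = β0 + β1Xi² is nonlinear in the variable but still linear in the parameters, so OLS applies after transforming the regressor. A minimal sketch with simulated, hypothetical data:

```python
import numpy as np

# Hypothetical data: Y depends on X squared (nonlinear in the variable X,
# but still linear in the parameters b0 and b1).
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = 2.0 + 0.5 * x**2 + rng.normal(0, 1, size=x.size)

# OLS on the transformed regressor X^2: the model is linear in
# parameters, so ordinary least squares still applies.
X = np.column_stack([np.ones_like(x), x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # estimates close to the true parameters (2.0, 0.5)
```

Linearity in parameters, not in variables, is what OLS estimation actually requires.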
OLS Assumption 2: The conditional mean of the error should be zero.
• The expected value of the error term of the OLS regression should be zero given the values of the independent variables.
• Mathematically, E(ε∣X)=0. This is sometimes just written
as E(ε)=0.
• In other words, the distribution of error terms has zero
mean and doesn’t depend on the independent variables X′s.
Thus, there must be no relationship between the X′s and the error term.
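A small simulation can illustrate this assumption. The data below are generated so that the errors are independent of X with mean zero; note that with an intercept, the fitted OLS residuals average to zero and are uncorrelated with X by construction, so the assumption really concerns the unobserved errors. The model and data are hypothetical:

```python
import numpy as np

# Simulated data satisfying E(e|X) = 0: the errors are drawn
# independently of X with mean zero (hypothetical true model).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
e = rng.normal(0, 1, 200)
y = 1.0 + 2.0 * x + e

# Fit OLS with an intercept.
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# With an intercept, the fitted residuals sum to zero and are
# uncorrelated with X by construction.
print(abs(resid.mean()) < 1e-9)                 # True
print(abs(np.corrcoef(x, resid)[0, 1]) < 1e-7)  # True
```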
Assumption 3: Independent variables should not be correlated (no multicollinearity)
• In a simple linear regression model, there is only one independent variable and
hence, by default, this assumption will hold true.
• However, in the case of multiple linear regression models, there is more than one independent variable.
• The OLS assumption of no multi-collinearity says that there should be no linear
relationship between the independent variables.
• For example, suppose you spend your 24 hours in a day on three things – sleeping, studying, or
playing. Now, if you run a regression with dependent variable as exam score/performance and
independent variables as time spent sleeping, time spent studying, and time spent playing, then
this assumption will not hold.
• This is because there is perfect collinearity between the three independent variables.
• Time spent sleeping = 24 – Time spent studying – Time spent playing.
• In such a situation, it is better to drop one of the three independent
variables from the linear regression model.
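The time-use example can be verified numerically: because the three regressors always sum to 24, the design matrix is rank-deficient and X′X cannot be inverted. A sketch with simulated, hypothetical hours:

```python
import numpy as np

# Hypothetical time-use data: the three activities always sum to
# 24 hours, so the regressors are perfectly collinear.
rng = np.random.default_rng(2)
sleep = rng.uniform(5, 9, 100)
study = rng.uniform(2, 8, 100)
play = 24 - sleep - study          # exact linear dependence

X = np.column_stack([np.ones(100), sleep, study, play])

# X has 4 columns but rank 3: X'X is singular, so OLS cannot
# separate the effects of the three activities.
print(np.linalg.matrix_rank(X))    # 3
```

Dropping any one of the three activity variables restores full column rank and makes the remaining coefficients estimable.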
• If the relationship (correlation) between independent variables is
strong (but not exactly perfect), it still causes problems in OLS
estimators.
• Hence, this OLS assumption says that you should select independent variables
that are not correlated with each other.
• An important implication of this assumption of OLS regression is that there should be sufficient variation in the X′s. The greater the variability in the X′s, the better the OLS estimates are at determining the impact of the X′s on Y.
OLS Assumption 4: Spherical errors: There is
homoscedasticity and no autocorrelation.
• When the residuals form a funnel shape, this suggests heteroscedasticity (non-constant variance).
• In simple terms, this OLS assumption means that the error
terms should be IID (Independent and Identically
Distributed).
• A residual plot illustrates the difference between homoscedasticity and heteroscedasticity: the variance of the errors is constant under homoscedasticity, while it is not when the errors are heteroscedastic.
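A short simulation can reproduce the funnel pattern: if the error standard deviation grows with X, the residual variance in the high-X half of the sample is much larger than in the low-X half. The data-generating process below is hypothetical:

```python
import numpy as np

# Simulated heteroscedastic errors: the error spread grows with X,
# producing the funnel-shaped residual pattern described above.
rng = np.random.default_rng(3)
x = np.linspace(1, 10, 500)
e = rng.normal(0, 0.2 * x)         # error std. dev. proportional to X
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Compare residual variance in the low-X half vs the high-X half:
# a large gap signals non-constant variance (heteroscedasticity).
low, high = resid[:250].var(), resid[250:].var()
print(high > 2 * low)              # True: variance grows with X
```

Under homoscedasticity the two halves would have roughly equal residual variance.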
• In the example regression below, R² = 0.08.
• R² = ESS/TSS = 1 − RSS/TSS
• Where:
• RSS = ∑(Yᵢ − Ŷᵢ)² = residual sum of squares
• TSS = ∑(Yᵢ − Ȳ)² = total sum of squares
• ESS = ∑(Ŷᵢ − Ȳ)² = explained sum of squares
. reg Y X
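The sums-of-squares decomposition can be checked numerically with simulated, hypothetical data: computing the total, explained, and residual sums of squares directly shows that the explained share of the total variation and one minus the residual share give the same R²:

```python
import numpy as np

# Hypothetical data; compute R-squared from the sums of squares.
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 3.0 + 1.5 * x + rng.normal(0, 2, 100)

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

tss = ((y - y.mean()) ** 2).sum()      # total sum of squares
ess = ((y_hat - y.mean()) ** 2).sum()  # explained sum of squares
rss = ((y - y_hat) ** 2).sum()         # residual sum of squares

r2 = ess / tss
print(abs(r2 - (1 - rss / tss)) < 1e-10)  # True: the two forms agree
```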
Is the R-squared enough?
• However, R² is less useful in measuring the goodness of fit of a multiple regression model. This is because it increases each time you add a new independent variable, even if the variation explained by it is not statistically significant.
• An overfitted model shows a deceptively high multiple R² and thus has a decreased ability to make precise predictions.
• In such cases it is more appropriate to use the adjusted R-squared.
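The adjusted R², R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1), penalizes extra regressors. A sketch with hypothetical data showing that plain R² cannot fall when an irrelevant variable is added, while the adjustment applies a degrees-of-freedom penalty:

```python
import numpy as np

def r2_and_adj(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit; X includes an intercept."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    n, k = X.shape[0], X.shape[1] - 1  # k = regressors excluding intercept
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 60)
y = 3.0 + 1.5 * x + rng.normal(0, 2, 60)
junk = rng.normal(0, 1, 60)            # irrelevant regressor

X1 = np.column_stack([np.ones(60), x])
X2 = np.column_stack([np.ones(60), x, junk])

r2_1, adj_1 = r2_and_adj(X1, y)
r2_2, adj_2 = r2_and_adj(X2, y)
print(r2_2 >= r2_1)  # True: plain R^2 never falls when a variable is added
```

The adjusted value is always at most the plain R², and the gap widens as more regressors are added for a given sample size.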
Adjusted R²