Econometrics Chap - 2
Linear Regression
What is the chapter about?
Model assumptions
Goodness of fit
Hypothesis testing
Multicollinearity
Reading
Obligatory reading:
Additional reading:
OLS problem
Suppose we observe a sample {y_i, x_ik}, i = 1, …, N, k = 2, …, K (with x_i1 = 1 for the constant), and want to approximate y_i by a linear combination x_i′β̃.
Then we can write the approximation error for the i-th observation as

e_i = y_i − x_i′β̃,

and the least squares criterion as the sum of squared approximation errors,

S(β̃) = Σ_{i=1}^N (y_i − x_i′β̃)².

The vector β̃ that minimizes this criterion is called the ordinary least squares (OLS) vector.
To minimize S(β̃) we have to set its first derivative w.r.t. β̃ equal to zero:
∂S(β̃)/∂β̃ = −2 Σ_{i=1}^N x_i (y_i − x_i′β̃) = 0,

(or) (Σ_{i=1}^N x_i x_i′) β̃ = Σ_{i=1}^N x_i y_i.
The existence of the OLS vector requires that the K × K matrix Σ_{i=1}^N x_i x_i′ is invertible.
This in turn requires that the K x-variables are linearly independent (no-multicollinearity assumption).
The second derivative,

∂²S(β̃)/∂β̃∂β̃′ = 2 Σ_{i=1}^N x_i x_i′,

is then positive definite, so the solution

b = (Σ_{i=1}^N x_i x_i′)^{−1} Σ_{i=1}^N x_i y_i

is indeed a minimum.
The difference between the observed and the approximated value, y_i − ŷ_i, is the OLS residual e_i, and we can write

y_i = ŷ_i + e_i = x_i′b + e_i.
Let

    X = [ 1  x_{12} ⋯ x_{1K} ]   [ x_1′ ]
        [ ⋮    ⋮    ⋱   ⋮   ] = [  ⋮  ]    (N × K),
        [ 1  x_{N2} ⋯ x_{NK} ]   [ x_N′ ]

and

    y = (y_1, …, y_N)′    (N × 1).
In matrix notation the first-order condition

∂S(β̃)/∂β̃ = −2(X′y − X′Xβ̃) = 0

gives the solution

b = (X′X)^{−1}X′y,

so that we can write

y = Xb + e = ŷ + e.
ŷ = Xb = X(X′X)^{−1}X′ y = P y,   with P = X(X′X)^{−1}X′,

e = y − Xb = y − P y = (I − P) y = M y,   with M = I − P.
These projection matrices satisfy
P = P′, M = M′ (symmetric),
P² = P, M² = M (idempotent).
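A minimal numerical sketch of these formulas (all variable names and the simulated data are assumptions, not part of the slides): OLS via the normal equations, the projection matrices P and M, and checks of the algebraic properties above.

```python
# Sketch: OLS in matrix form with numpy; verify symmetry/idempotency of P and M.
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # constant + regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)      # b = (X'X)^{-1} X'y (normal equations)
P = X @ np.linalg.solve(X.T @ X, X.T)      # P = X (X'X)^{-1} X'
M = np.eye(N) - P                          # M = I - P
y_hat, e = P @ y, M @ y                    # fitted values and residuals

assert np.allclose(P, P.T) and np.allclose(M, M.T)      # symmetric
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)  # idempotent
assert np.allclose(y, y_hat + e)                        # y = y_hat + e
```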
So far we have been concerned with how to get the best linear approximation of y_i by x_i′β̃ for some observed sample values {y_i, x_i}, i = 1, …, N.
To begin with, we restrict our attention to linear models and specify our model as

y_i = x_i′β + ε_i,   i = 1, …, N,

(or) y = Xβ + ε.
In the LRM,

y_i = x_i′β + ε_i,

assume that E{ε_i | x_i} = 0. It then follows that the regression line x_i′β gives the conditional expectation of y given x, i.e.

E{y_i | x_i} = x_i′β + E{ε_i | x_i} = x_i′β.
The coefficients β_k measure by how much E{y_i | x_i} changes given a one-unit change in x_ik, holding all other variables in x_i constant (known as the ceteris paribus (c.p.) condition).
A prerequisite for β_k to measure the causal effect of x_ik on y_i (and not just the correlation, or the causal effect overlaid by additional correlation) is that we can reasonably assume that E{ε_i | x_i} = 0, to the effect that

E{y_i | x_i} = x_i′β,   with   ∂E{y_i | x_i}/∂x_ik = β_k   (if x_ik is continuous).
A rule by which we compute a certain value from our sample data is called
an estimator, which gives us an estimate.
Here b is now a random variable (with specific realized values for each
sample) and we are interested in how well b approximates the true value of
the population parameters β.
How well OLS works for estimating β depends on the assumptions we are willing to make about the behavior of ε_i and x_i.
Here we start to consider the so-called Gauss-Markov (GM) assumptions,
under which OLS has some desirable properties. (Later we will relax some
of these assumptions.)
E{ε_i} = 0, i = 1, …, N.  (A1)
{ε_1, …, ε_N} and {x_1, …, x_N} are independent.  (A2)
V{ε_i} = σ², i = 1, …, N (homoskedasticity).  (A3)
cov{ε_i, ε_j} = 0, i, j = 1, …, N, i ≠ j (no autocorrelation).  (A4)
Substituting y = Xβ + ε gives b = β + (X′X)^{−1}X′ε, so that

E{b} = β + (X′X)^{−1}X′E{ε} = β,

where the first equality uses (A2) and the second uses (A1): OLS is unbiased.
Under (A1)-(A4), b is the best linear unbiased estimator (BLUE) of β (Gauss-Markov theorem). Best means that there is no other linear unbiased estimator that has a smaller variance than b; stated mathematically, V{β̃} − V{b} is positive semidefinite for every linear unbiased estimator β̃.
An unbiased estimator of σ² is

s² = (1/(N − K)) Σ_{i=1}^N e_i² = e′e/(N − K),   e = y − Xb.
It follows that

e = M y = M(Xβ + ε) = Mε,

so that

E{e′e} = E{ε′Mε} = σ² tr(M) = σ²(N − K),

and hence

E{s²} = E{e′e}/(N − K) = σ².
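A hedged Monte Carlo sketch of the last two results (the data-generating process, sample size, and all names are assumptions for illustration): across replications with a fixed X, the sample averages of b and s² should approach β and σ².

```python
# Sketch: under (A1)-(A4), E{b} = beta and E{s^2} = sigma^2.
import numpy as np

rng = np.random.default_rng(1)
N, K, sigma2 = 50, 3, 4.0
beta = np.array([1.0, 0.5, -0.3])
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # kept fixed

b_draws, s2_draws = [], []
for _ in range(5000):
    eps = rng.normal(scale=np.sqrt(sigma2), size=N)   # iid errors: (A1)-(A4) hold
    y = X @ beta + eps
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    b_draws.append(b)
    s2_draws.append(e @ e / (N - K))                  # s^2 = e'e / (N - K)

print(np.mean(b_draws, axis=0))   # approx. beta   -> E{b} = beta
print(np.mean(s2_draws))          # approx. 4.0    -> E{s^2} = sigma^2
```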
This result provides the basis of statistical tests w.r.t. β based upon b.
Under the additional normality assumption (A5), the estimator

s² = e′e/(N − K),   where e = M y = Mε,

follows a scaled χ²-distribution, i.e.

((N − K)/σ²) s² ∼ χ²_{N−K}.
This follows from the fact that we can rewrite this variable as

((N − K)/σ²) s² = ε′Mε/σ² = (ε/σ)′ M (ε/σ),

so that it is a quadratic form (in the idempotent matrix M of rank N − K) of a vector of NID(0, 1) variates (ε/σ).
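To see this distributional result at work, here is a small simulation sketch (all names and the data-generating process are assumptions): the simulated statistic (N − K)s²/σ² should match the χ²_{N−K} benchmarks, mean N − K and variance 2(N − K).

```python
# Sketch: simulate (N-K) s^2 / sigma^2 under normal errors and check its moments.
import numpy as np

rng = np.random.default_rng(2)
N, K, sigma = 40, 4, 2.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

stats = []
for _ in range(20000):
    eps = rng.normal(scale=sigma, size=N)             # normal errors (A5)
    y = X @ np.ones(K) + eps
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    stats.append((e @ e) / sigma**2)                  # = (N-K) s^2 / sigma^2

print(np.mean(stats), N - K)          # sample mean vs chi2 mean N-K = 36
print(np.var(stats), 2 * (N - K))     # sample var  vs chi2 var 2(N-K) = 72
```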
The R² measures the share of the variation in y_i explained by the model:

R² = V̂{ŷ_i} / V̂{y_i}   (V̂{·} denotes the sample variance),

which for a model with an intercept can equivalently be written as

R² = 1 − Σ_{i=1}^N e_i² / Σ_{i=1}^N (y_i − ȳ)²,   where ȳ = (1/N) Σ_i y_i.
Interpretation of the R²
Note that 0 ≤ R² ≤ 1. If R² = 0 this implies that

Σ_{i=1}^N e_i² = Σ_{i=1}^N (y_i − ȳ)²,

i.e. the model explains none of the variation in y beyond the sample mean.
In practice there is no general rule for which values of the R² are 'good'.
In particular, a small R² does not automatically imply that the LRM is incorrect or useless: it just indicates that there is a lot of heterogeneity in y not captured by x.
The adjusted R²
The R² can never decrease when a regressor is added to the model; the reason is that Σ_i e_i² in the R² can only decrease (increasing the R²) when adding a variable. The adjusted R² corrects for this by penalizing the number of regressors:

R̄² = 1 − [e′e/(N − K)] / [Σ_{i=1}^N (y_i − ȳ)² / (N − 1)].
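A minimal sketch of both goodness-of-fit measures (function and variable names are assumptions; X is assumed to contain a constant):

```python
# Sketch: R^2 = 1 - SSR/SST and the adjusted R^2 with its df correction.
import numpy as np

def r2_stats(y, X):
    N, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    sst = np.sum((y - y.mean())**2)
    r2 = 1.0 - (e @ e) / sst                            # R^2
    r2_adj = 1.0 - (e @ e / (N - K)) / (sst / (N - 1))  # adjusted R^2
    return r2, r2_adj
```

By construction r2_adj ≤ r2, and r2_adj can fall when a variable with little explanatory power is added.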
Hypothesis testing: t-test
This result can be used to test hypotheses about the regression coefficients.
Consider the case that we want to test

H0 : β_k = 0 against H1 : β_k ≠ 0.

Under H0 the t-statistic t_k = b_k / se(b_k) follows a t-distribution with N − K degrees of freedom.

[Figure: pdf of t_k under H0, with the non-rejection region between −t_{N−K,1−α/2} and t_{N−K,1−α/2} (probability 1 − α) and rejection regions of probability α/2 in each tail.]
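A hedged implementation sketch of this two-sided t-test (names are assumptions; scipy supplies the t-distribution):

```python
# Sketch: t-test for H0: beta_k = 0 against a two-sided alternative.
import numpy as np
from scipy import stats

def t_test(y, X, k, alpha=0.05):
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (N - K)                          # estimate of sigma^2
    se_k = np.sqrt(s2 * XtX_inv[k, k])            # standard error of b_k
    t_k = b[k] / se_k                             # t-statistic under H0
    crit = stats.t.ppf(1 - alpha / 2, df=N - K)   # t_{N-K, 1-alpha/2}
    p = 2 * stats.t.sf(abs(t_k), df=N - K)        # two-sided p-value
    return t_k, crit, p                           # reject H0 if |t_k| > crit
```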
Next, consider testing the joint hypothesis that the last J coefficients are zero:

H0 : β_{K−J+1} = ⋯ = β_K = 0.
For a formal test we can exploit that under (A2) and (A5) it holds that

ξ1 = (S0 − S1)/σ² ∼ χ²_J under H0,

where S0 and S1 denote the sums of squared residuals of the restricted and the unrestricted model, respectively. But σ² is unknown, so ξ1 cannot be used for testing.
However, we know that the scaled estimator s² for σ² in the unrestricted LRM is χ²-distributed, i.e.

ξ2 = (N − K)s²/σ² = S1/σ² ∼ χ²_{N−K} under H0 and H1.

So combining this with the result that the ratio of two independent χ²-variables scaled by their degrees of freedom is F-distributed, we find that

F = (ξ1/J) / (ξ2/(N − K)) = [(S0 − S1)/J] / [S1/(N − K)] ∼ F_{J,N−K} under H0.
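A minimal sketch of this F-statistic, assuming S0 and S1 have already been computed (names are assumptions):

```python
# Sketch: F-test from restricted (S0) and unrestricted (S1) residual sums of squares.
from scipy import stats

def f_test(S0, S1, J, N, K):
    F = ((S0 - S1) / J) / (S1 / (N - K))   # F ~ F_{J, N-K} under H0
    p = stats.f.sf(F, J, N - K)            # P(F_{J, N-K} > F)
    return F, p
```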
An important special case is the test of the joint significance of all regressors except the intercept:

H0 : β2 = ⋯ = βK = 0.

Points to note: more general linear restrictions can also be tested, e.g.

β2 + ⋯ + βK = 1 and β2 = β3.
For testing

H0 : Rβ = q against H1 : Rβ ≠ q,

we can estimate the model under the restriction imposed by H0 and then use an F-test which compares the sum of squared residuals of the restricted model (S0) with the sum of squared residuals of the unrestricted model (S1) (see above).
Example: in the model

y_i = β1 + β2 x_{2i} + β3 x_{3i} + ε_i,

the hypothesis H0 : β2 + β3 = 1 is equivalent to H0 : β3 = 1 − β2. Substituting the restriction gives

y_i = β1 + β2 x_{2i} + (1 − β2) x_{3i} + ε_i
⇒ y_i − x_{3i} = β1 + β2 (x_{2i} − x_{3i}) + ε_i,

which can be estimated by OLS to obtain S0 (see the sketch below).
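A worked sketch of exactly this example on simulated data (the data-generating process and all names are assumptions; the restriction holds in the DGP, so the test should not reject):

```python
# Sketch: test H0: beta2 + beta3 = 1 via the transformed restricted regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N = 200
x2, x3 = rng.normal(size=N), rng.normal(size=N)
y = 0.5 + 0.7 * x2 + 0.3 * x3 + rng.normal(size=N)   # beta2 + beta3 = 1 holds

def ssr(y, X):
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    return e @ e

X1 = np.column_stack([np.ones(N), x2, x3])    # unrestricted model
X0 = np.column_stack([np.ones(N), x2 - x3])   # restricted: regress y - x3 on (x2 - x3)
S1, S0 = ssr(y, X1), ssr(y - x3, X0)
J, K = 1, 3
F = ((S0 - S1) / J) / (S1 / (N - K))
print(F, stats.f.sf(F, J, N - K))             # large p-value: do not reject H0
```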
Alternatively, under normality,

b ∼ N(β, σ²(X′X)^{−1})
⇒ Rb − q ∼ N(Rβ − q, σ²R(X′X)^{−1}R′).

So under H0 : Rβ − q = 0,

Rb − q ∼ N(0, σ²R(X′X)^{−1}R′)
⇒ ξ3 = (Rb − q)′ [σ²R(X′X)^{−1}R′]^{−1} (Rb − q) ∼ χ²_J.
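A sketch of the χ² statistic ξ3 for H0 : Rβ = q (names are assumptions; in practice the unknown σ² is replaced by s², and dividing by J then yields the usual F-statistic):

```python
# Sketch: Wald-type chi-squared statistic for H0: R beta = q.
import numpy as np
from scipy import stats

def wald_test(b, XtX_inv, sigma2, R, q):
    d = R @ b - q
    V = sigma2 * R @ XtX_inv @ R.T     # Var{Rb - q} under H0
    xi3 = d @ np.linalg.solve(V, d)    # (Rb-q)' V^{-1} (Rb-q) ~ chi2_J
    J = R.shape[0]
    return xi3, stats.chi2.sf(xi3, J)  # statistic and p-value
```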
There are four possible (random) events that can happen when testing a hypothesis:

                       H0 true            H0 false
    do not reject H0   correct decision   type II error
    reject H0          type I error       correct decision

Reducing the size of the test will typically also reduce its power, so that there is a tradeoff between type I and type II errors.
Multicollinearity
The term

VIF(b_k) = 1 / (1 − R_k²)

is the so-called variance inflation factor, where R_k² denotes the R² of the auxiliary regression of x_k on all other regressors.
It tells us by how much the variance of b_k is inflated compared to the hypothetical situation in which the regressors are uncorrelated.
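A minimal sketch of this computation (names and column layout are assumptions; column 0 of X is taken to be the constant, and k indexes one of the other regressors):

```python
# Sketch: VIF(b_k) via the auxiliary regression of x_k on the remaining regressors.
import numpy as np

def vif(X, k):
    xk = X[:, k]
    Z = np.delete(X, k, axis=1)                   # all other columns (incl. constant)
    e = xk - Z @ np.linalg.solve(Z.T @ Z, Z.T @ xk)
    Rk2 = 1.0 - (e @ e) / np.sum((xk - xk.mean())**2)  # R^2 of auxiliary regression
    return 1.0 / (1.0 - Rk2)                      # variance inflation factor
```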
In the general case (R_k² < 1 but large) there is no easy solution, and the appropriate remedy depends on the situation.
Here we have highly inaccurate estimates, meaning that our sample does not provide enough information about the parameters.
Hence we would need to use more information, either by extending the sample (how?) or by imposing a-priori restrictions on the parameters.
The latter commonly means excluding regressors from the model (which ones?).
Application: The capital asset pricing model
The CAPM states that

E{r_jt} − r_f = β_j (E{r_mt} − r_f),

where r_jt is the risky return on asset j in period t, r_mt is the return on the market portfolio, and r_f is the riskless return (which usually also depends on t but is deterministic).
Assume rational expectations and define the unexpected returns for asset j as

u_jt = r_jt − E{r_jt} ⇔ E{r_jt} = r_jt − u_jt,

and likewise for the market portfolio. Substituting these into the CAPM equation yields the regression

r_jt − r_f = β_j (r_mt − r_f) + ε_jt,   with   ε_jt = u_jt − β_j u_mt.
The error term can be shown to satisfy the requirements for a regression error, so we can estimate the beta-factor β_j by OLS, as in the sketch below.
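A hedged sketch of this estimation on simulated excess returns (r_f, the return processes, and all names are illustrative assumptions, not real market data):

```python
# Sketch: estimate the CAPM beta-factor by OLS on excess returns.
import numpy as np

rng = np.random.default_rng(4)
T, beta_j, rf = 250, 1.2, 0.0001
rm = rf + 0.0004 + rng.normal(scale=0.01, size=T)        # simulated market returns
rj = rf + beta_j * (rm - rf) + rng.normal(scale=0.015, size=T)

X = np.column_stack([np.ones(T), rm - rf])               # constant + excess market return
b = np.linalg.solve(X.T @ X, X.T @ (rj - rf))            # OLS on excess asset return
print(b[1])                                              # estimate of beta_j, approx. 1.2
```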