Econometrics Chap - 2

The document provides an overview of linear regression and ordinary least squares (OLS) estimation. It discusses that OLS finds the linear combination of variables that best predicts the variable of interest by minimizing the sum of squared errors. It describes how the OLS estimator is obtained by solving the normal equations, and that the OLS estimator exists if the variables are not perfectly collinear. It also explains that OLS residuals are orthogonal to the predictor variables.


H:/Lehre/Econometrics Master PO 2021/lecture slides/chap 2.tex (October 7, 2021)

Linear Regression

2. Linear Regression 1
What is the chapter about?

Principle of ordinary least squares (OLS)

The linear regression model

Model assumptions

Small-sample properties of the OLS estimator

Goodness of fit

Hypothesis testing

Multicollinearity

Illustration: Capital asset pricing model

2 Linear Regression 2
Reading

Obligatory reading

Verbeek: Sections 2.1-2.5, 2.8, 2.10.

Additional reading:

Wooldridge: Chapters 2-4, Sections 6.3, 6.4.

Greene: Chapter 2, Sections 3.1-3.3, 3.5, 4.1-4.3, 5.1-5.3.

Hayashi: Sections 1.1-1.4.

2 Linear Regression 3
OLS problem

Consider having N observations on a variable of interest, y, and a set of


other variables x2 , . . . , xK that are believed to be related to y.

The question at hand is what linear combination of a constant and


x2 , . . . , xK of the form

β̃1 + β̃2 x2 + · · · + β̃K xK

provides a good approximation to y.

Assign to each observation an index i, denoting our sample

{yi , xik }, i = 1, . . . , N, k = 2, . . . , K.

Put all x values and β̃ coefficients into K × 1 vectors

xi = (1, xi2 , xi3 , . . . , xiK )′ , β̃ = (β̃1 , . . . , β̃K )′ .

2.1 Ordinary least squares as an algebraic instrument 4


OLS problem

Then we can write the approximation error for the ith observation as

ei = yi − x′i β̃.

We want this error to be on average as small as possible.

One (but not the only) possibility to achieve that is to pick β̃ so as to


minimize
S(β̃) ≡ ∑_{i=1}^{N} (yi − x′i β̃)2 .

The vector β̃ that minimizes this criterion is called the ordinary least
squares (OLS) vector.

2.1 Ordinary least squares as an algebraic instrument 5


OLS vector

To minimize S(β̃) we have to set its first derivative w.r.t. β̃ equal to zero:

∂S(β̃)/∂ β̃ = −2 ∑_{i=1}^{N} xi (yi − x′i β̃) = 0,

(or) (∑_{i=1}^{N} xi x′i ) β̃ = ∑_{i=1}^{N} xi yi .

Those are the first-order conditions (focs) for the OLS-minimization


problem and are known as the normal equations.
They define a system of K linear equations in K unknowns.

Solving these equations and denoting the solution as b gives

b = (∑_{i=1}^{N} xi x′i )−1 ∑_{i=1}^{N} xi yi (OLS vector).

2.1 Ordinary least squares as an algebraic instrument 6
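
A minimal Python/numpy sketch (not part of the slides; the data are simulated and all names and values are illustrative) of computing the OLS vector by solving the normal equations numerically:

    import numpy as np

    rng = np.random.default_rng(0)
    N, K = 100, 3                                  # hypothetical sample size and number of regressors
    X = np.column_stack([np.ones(N),               # constant term
                         rng.normal(size=(N, K - 1))])
    beta_true = np.array([1.0, 0.5, -2.0])         # assumed 'true' coefficients for the simulation
    y = X @ beta_true + rng.normal(size=N)

    XtX = X.T @ X                                  # corresponds to sum_i x_i x_i'
    Xty = X.T @ y                                  # corresponds to sum_i x_i y_i
    b = np.linalg.solve(XtX, Xty)                  # OLS vector: solves the normal equations
    e = y - X @ b                                  # OLS residuals
    print(b)                                       # close to beta_true
    print(X.T @ e)                                 # normal equations imply X'e = 0 (up to rounding)

Solving the normal equations with np.linalg.solve avoids forming the inverse explicitly, which is numerically preferable to computing (X′X)−1 directly.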


Existence of the OLS vector

The existence of the OLS vector requires that the K × K matrix ∑_{i=1}^{N} xi x′i is
invertible.
This in turn requires that the K x-variables are linearly independent
(no-multicollinearity assumption).

The second-order condition for the OLS-minimization problem requires

that the Hessian matrix of second-order derivatives,

∂ 2 S(β̃)/(∂ β̃ ∂ β̃ ′ ) = 2 ∑_{i=1}^{N} xi x′i ,

is positive definite. In this case b is indeed a minimizer of S(β̃).


This is the case as long as the no-multicollinearity assumption is satisfied.

2.1 Ordinary least squares as an algebraic instrument 7


Residual sum of squares

With b we can now approximate yi in terms of the xi variables as follows:

ŷi = x′i b (best linear approximation).

The difference between the observed and the approximated value, yi − ŷi ,
is the OLS residual ei and we can write

yi = ŷi + ei = x′i b + ei .

As a result, our minimized criterion function can be written as the sum of

squared residuals, which is known as the residual sum of squares:

S(β̃)|β̃=b = S(b) = ∑_{i=1}^{N} (yi − x′i b)2 = ∑_{i=1}^{N} e2i .

2.1 Ordinary least squares as an algebraic instrument 8


Properties of the OLS residuals

The normal equations (for β̃ = b) imply that

∑_{i=1}^{N} xi (yi − x′i b) = ∑_{i=1}^{N} xi ei = 0, i.e. ∑_{i=1}^{N} xik ei = 0, k = 1, . . . , K.

This means that the residuals have the following properties:

The residual vector is orthogonal to each vector of observations on xk ,
i.e.

(x1k , . . . , xN k ) (e1 , . . . , eN )′ = 0, k = 1, . . . , K;

If xi1 = 1 for all i (constant) then

∑_{i=1}^{N} ei = 0.

2.1 Ordinary least squares as an algebraic instrument 9


OLS in matrix notation

The use of matrix notation is convenient because it simplifies an otherwise


cumbersome notation with a lot of sums and indices.

Let

X = (x1 , . . . , xN )′ (N × K), with ith row x′i = (1, xi2 , . . . , xiK ),

and

y = (y1 , . . . , yN )′ (N × 1).

2.1 Ordinary least squares as an algebraic instrument 10


OLS in matrix notation

The OLS objective function can then be written as

S(β̃) = (y − X β̃)′ (y − X β̃) = y ′ y − 2y ′ X β̃ + β̃ ′ X ′ X β̃.

Taking the first derivative w.r.t. β̃ and setting it equal to zero,

∂S(β̃)/∂ β̃ = −2(X ′ y − X ′ X β̃) = 0,
gives the solution

b = (X ′ X)−1 X ′ y (OLS vector).

This allows us to decompose the observed y-vector as

y = Xb + e = ŷ + e.

2.1 Ordinary least squares as an algebraic instrument 11


Geometric interpretation of OLS

The predicted y-vector and the residual vector can be written as

ŷ = Xb = X(X ′ X)−1 X ′ y = P y, with P ≡ X(X ′ X)−1 X ′ ,

e = y − Xb = y − P y = (I − P ) y = M y, with M ≡ I − P.

P and M are projection matrices with the following special properties:

P = P ′ , M = M ′ (symmetric)

P 2 = P , M 2 = M (idempotent)

P M = P ′ M = M ′ P = M P = 0 (orthogonal to each other).

P projects y into the column space of X, and it holds that P X = X;


M (‘residual-maker matrix’) projects y into the space which is orthogonal
to the column space of X so that ŷ ′ e = 0 and X ′ e = 0.

2.1 Ordinary least squares as an algebraic instrument 12
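
A short numpy sketch (simulated data, purely illustrative) that checks the stated properties of the projection matrices P and M numerically:

    import numpy as np

    rng = np.random.default_rng(1)
    N, K = 50, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = rng.normal(size=N)

    P = X @ np.linalg.inv(X.T @ X) @ X.T   # projects onto the column space of X
    M = np.eye(N) - P                      # residual-maker matrix

    yhat, e = P @ y, M @ y
    print(np.allclose(P, P.T), np.allclose(P @ P, P))    # symmetric, idempotent
    print(np.allclose(P @ M, 0))                         # P and M orthogonal to each other
    print(np.allclose(P @ X, X), np.allclose(M @ X, 0))  # P X = X, M X = 0
    print(np.isclose(yhat @ e, 0))                       # yhat'e = 0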


The linear regression model

So far we have been concerned with how to get the best linear
approximation of yi by x′i β̃ for some observed sample values {yi , xi }_{i=1}^{N} .

However, economists are typically interested in a relationship that is more


generally valid, i.e. for the whole population of interest and not just for the
incidentally observed sample {yi , xi }_{i=1}^{N} .

For this we need a statistical model, which

(1) ideally reflects a fundamental economic relationship between the y-


and x-variables and

(2) which is supposed to be valid for all possible y- and x-observations


(not just for the sample incidentally observed).

2.2 The linear regression model 13


The linear regression model

To begin with, we restrict our attention to linear models and specify our
model as

yi = β1 + β2 xi2 + · · · + βK xiK + εi , i = 1, ..., N,

(or) yi = x′i β + εi ,

(or) y = Xβ + ε.

yi is the dependent variable and the variables in xi are the regressors


or explanatory variables or covariates,

β is the vector of regression coefficients,

εi is the error term or disturbance term.


It captures the influences of all variables that are relevant for yi but
not explicitly included in the model.

2.2 The linear regression model 14


The linear regression model

The linear regression model (LRM)

yi = x′i β + εi ,

is supposed to hold for the whole (y, x)-population, of which we observe a


random sample of size N .

Thus yi , εi and possibly even xi are random variables.

The elements in β are unknown population parameters that need to be


estimated based on the realization of the random sample.

2.2 The linear regression model 15


Deterministic versus random regressors

In the LRM,
yi = x′i β + εi ,

xi may be deterministic (fixed in repeated samples).


Then the only source of randomness is εi leading to different yi -values
across repeated samples.
This is the case for experimental data in laboratory settings where the
xi -values are determined by the researcher.

In empirical economic studies, we typically rely on observational data.


In this case, we commonly consider yi as well as xi to be random, so both
yi and xi have different values in repeated samples.
In this context we need certain assumptions about the joint distribution of
(xi , εi ) in order to give the LRM a meaning.

2.2 The linear regression model 16


Exogeneity assumption

In this context, a common assumption is

E{εi |xi } = 0 (cond. mean independence),

which is known as the assumption that the x variables are exogenous.

Then it holds that E{E{εi |xi }} = E{εi } = 0 and cov{xi , εi } = 0.

It follows that the regression line x′i β gives the conditional expectation of y
given x, i.e.
E{yi |xi } = x′i β + E{εi |xi } = x′i β.

The coefficients βk measure by how much E{yi |xi } changes given a one-unit
change in xik , holding all other variables in xi constant (known as the
ceteris paribus (c.p.) condition).

2.2 The linear regression model 17


Exogeneity assumption

Often we would like to interpret the model as describing a causal


relationship between yi and one or more regressors in xi .

A prerequisite for βk to measure the causal effect of xik on yi (and not just
the correlation or the causal effect overlaid by additional correlation) is
that we can reasonably assume that E{εi |xi } = 0,
to the effect that

E{yi |xi } = x′i β, with ∂E{yi |xi }/∂xik = βk (if xik is continuous).

2.2 The linear regression model 18


OLS estimator

After we have specified a LRM and have a sample of N observations, we


would like to make some statements about the values of the unknown β.

A rule by which we compute a certain value from our sample data is called
an estimator, which gives us an estimate.

We already saw the OLS estimator,


b = (∑_{i=1}^{N} xi x′i )−1 ∑_{i=1}^{N} xi yi = (X ′ X)−1 X ′ y,

which is the most frequently used estimator for β.

Here b is now a random variable (with specific realized values for each
sample) and we are interested in how well b approximates the true value of
the population parameters β.

2.3 The OLS estimator and its small sample properties 19


Gauss-Markov assumptions

How good OLS works for estimating β depends on the assumptions we are
willing to make on the behavior of εi and xi .
Here we start to consider the so-called Gauss-Markov (GM) assumptions,
under which OLS has some desirable properties. (Later we will relax some
of these assumptions.)

The GM assumptions are:

· The regression line holds on average:

E{εi } = 0, i = 1, . . . , N. (A1)

· {ε1 , . . . , εN } and {x1 , . . . , xN } are independent. (A2)

2.3 The OLS estimator and its small sample properties 20


Gauss-Markov assumptions

· All the errors have the same variance, known as homoscedasticity:

V {εi } = σ 2 , i = 1, . . . , N. (A3)

· The errors are mutually uncorrelated, thereby excluding


autocorrelation:

cov{εi , εj } = 0, i, j = 1, . . . , N, i ̸= j. (A4)

The GM assumptions imply that


E{ε|X} = E{ε} = 0 (by (A2) and (A1)),

V {ε|X} = V {ε} = σ 2 IN (by (A2), (A3) and (A4)).

Hence, the regressors in X do not convey any information about the


expected values of the εi ’s and their (co)variances.

2.3 The OLS estimator and its small sample properties 21


Gauss-Markov assumptions

To derive the properties of the OLS estimator under the GM assumptions,


for the sake of simplicity we assume that X is a deterministic
(non-stochastic) matrix.
This implies that we can take X as given without affecting the εi ’s so that
(A2) is satisfied.
Alternatively, one could derive all properties conditional upon X.

2.3 The OLS estimator and its small sample properties 22


The OLS estimator under the GM assumptions

The OLS estimator b is unbiased for β, i.e. E{b} = β.


This means that in repeated sampling the OLS estimates are on average
equal to the true value β.

This can be shown as follows:

E{b} = E{(X ′ X)−1 X ′ y} = E{β + (X ′ X)−1 X ′ ε}

= β + E{(X ′ X)−1 X ′ ε}

= β + (X ′ X)−1 X ′ E{ε} (by (A2))

= β (by (A1)).

Note that we only used (A1) and (A2).


Hence, unbiasedness even holds under heteroscedastic and correlated errors!

2.3 The OLS estimator and its small sample properties 23
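
A small Monte Carlo sketch (Python; the fixed design, true β and number of replications are arbitrary choices for illustration) of the unbiasedness result E{b} = β under repeated sampling:

    import numpy as np

    rng = np.random.default_rng(2)
    N, R = 100, 5000                                 # sample size and number of replications
    X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # X held fixed across samples
    beta = np.array([1.0, 0.5, -2.0])                # assumed true parameter vector

    b_draws = np.empty((R, 3))
    for r in range(R):
        eps = rng.normal(scale=2.0, size=N)          # errors satisfying (A1)-(A4)
        y = X @ beta + eps
        b_draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

    print(b_draws.mean(axis=0))                      # close to beta = (1.0, 0.5, -2.0)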


The OLS estimator under the GM assumptions

The covariance matrix of b is:

V {b} = E{(b − β)(b − β)′ } = E{(X ′ X)−1 X ′ εε′ X(X ′ X)−1 }


= (X ′ X)−1 X ′ (σ 2 IN )X(X ′ X)−1 = σ 2 (X ′ X)−1 .

The variance of each element bj of b can be found on the main diagonal


and the covariance of bj and bk is the corresponding off-diagonal element.
The smaller the variances, the smaller the probability that the estimate b
is far from β.

This covariance matrix is unknown since it depends on the unknown σ 2 .

2.3 The OLS estimator and its small sample properties 24


The OLS estimator under the GM assumptions

Gauss-Markov Theorem: Under the assumptions (A1)-(A4), the OLS estima-


tor b is the best linear unbiased estimator (BLUE) for β.

Linear means that we consider estimators of the form b̃ = Ay with a


K × N matrix A.

Unbiasedness means that E{Ay} = β.

Best means that there is no other linear unbiased estimator that has a
smaller variance than b – stated mathematically,

V {d′ b̃} ≥ V {d′ b} for any vector d,

or, equivalently, V {b̃} − V {b} is a positive semi-definite matrix.

b is thus the most accurate in the class of linear unbiased estimators.

2.3 The OLS estimator and its small sample properties 25


Estimating the variance of b

In order to estimate V {b} = σ 2 (X ′ X)−1 we need to estimate the error


variance E{ε2i } = σ 2 .

Under the GM assumptions an unbiased estimator for σ 2 is the sample


variance of the OLS residuals ei = yi − x′i b (which have a sample mean
equal to 0):

s2 = 1/(N − K) ∑_{i=1}^{N} e2i = e′ e/(N − K), e = y − Xb.

2.3 The OLS estimator and its small sample properties 26


Estimating the variance of b

In order to show that E{s2 } = σ 2 we use the residual maker matrix,

M = I − X(X ′ X)−1 X ′ , with M M = M, M X = 0.

With M we can write the OLS residuals as

e = M y = M (Xβ + ε) = M ε.

It follows that

E{e′ e} = E{ε′ M ε} = E{tr(ε′ M ε)}

= E{tr(M εε′ )} = tr(M E{εε′ })

= tr(σ 2 M ) = σ 2 tr(I − X(X ′ X)−1 X ′ ) = σ 2 (N − K),

so that

E{s2 } = E{e′ e}/(N − K) = σ 2 .

2.3 The OLS estimator and its small sample properties 27


Estimating the variance of b

With s2 the variance of b, i.e. V {b} = σ 2 (X ′ X)−1 , can be estimated by

V̂ {b} = s2 (X ′ X)−1 = s2 (∑_{i=1}^{N} xi x′i )−1 .

The estimated variances of each bk are on the main diagonal of V̂ {b}:

V̂ {bk } = s2 ckk , ckk : kth diagonal element of (X ′ X)−1 .

The square root of V̂ {bk } is the standard error of bk :

se(bk ) = s √ckk .

2.3 The OLS estimator and its small sample properties 28
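
A numpy sketch (simulated data, illustrative values) of computing s2, the estimated covariance matrix V̂{b} and the standard errors se(bk):

    import numpy as np

    rng = np.random.default_rng(3)
    N, K = 200, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=N)

    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (N - K)                       # s^2 = e'e / (N - K)
    V_hat = s2 * np.linalg.inv(X.T @ X)        # estimated covariance matrix of b
    se = np.sqrt(np.diag(V_hat))               # standard errors se(b_k) = s * sqrt(c_kk)
    print(b, se)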


Distribution of b

If we assume a law of distribution for ε we get a corresponding law of


distribution for b = β + (X ′ X)−1 X ′ ε.

The most common distributional assumption is:

ε ∼ N (0, σ 2 IN ), i.e. εi ∼ N ID(0, σ 2 ), (A5)

where N ID stands for ‘normally and independently distributed’.

From (A5) together with (A2) it follows that

b ∼ N (β, σ 2 (X ′ X)−1 ), and bk ∼ N (βk , σ 2 ckk ), k = 1, . . . , K.

This result provides the basis of statistical tests w.r.t. β based upon b.

2.3 The OLS estimator and its small sample properties 29


Distribution of s2

Under (A2) and (A5) the unbiased estimator for σ 2 ,

s2 = e′ e/(N − K), where e = M y = M ε,

follows a scaled χ2 -distribution, i.e.

((N − K)/σ 2 ) s2 ∼ χ2N −K .

This follows from the fact that we can rewrite this variable as

((N − K)/σ 2 ) s2 = ε′ M ε/σ 2 = (ε/σ)′ M (ε/σ),

so that it is a quadratic form of a vector of N ID(0, 1) variates (ε/σ).

Finally it can be shown that b and s2 are independent random variables.

2.3 The OLS estimator and its small sample properties 30


How well does the estimated regression line fit the data?

A goodness-of-fit measure is the proportion of variation in the yi ’s that is


explained by the regression line represented by the ŷi ’s.
This proportion defines the R2 statistic,

R2 = V̂ {ŷi }/V̂ {yi } (V̂ {·} denotes the sample variance).

Since yi = ŷi + ei , it holds in LRMs with intercept that

V̂ {yi } = V̂ {ŷi } + V̂ {ei } (as ŷ and e are orthogonal).

Hence, we can write the R2 as

R2 = (V̂ {yi } − V̂ {ei })/V̂ {yi } = 1 − [1/(N − 1) ∑_{i=1}^{N} e2i ] / [1/(N − 1) ∑_{i=1}^{N} (yi − ȳ)2 ],

where ȳ = (1/N ) ∑_i yi .

2.4 Goodness-of-fit 31
Interpretation of the R2

Note that 0 ≤ R2 ≤ 1.
If R2 = 0 this implies that

∑_{i=1}^{N} e2i = ∑_{i=1}^{N} (yi − ȳ)2 ,

so the LRM explains none of the variation in the yi ’s.

If R2 = 1 then

∑_{i=1}^{N} e2i = 0 ⇒ ei = 0 for all i,

and so there is a perfect fit, i.e. yi = x′i b for all i.

In practice there is no general rule for which values of the R2 are ‘good’.
In particular, a small R2 does not automatically imply that the LRM is
incorrect or useless: it just indicates that there is a lot of heterogeneity in
y not captured by x.

2.4 Goodness-of-fit 32
The adjusted R2

A drawback of the R2 is that when adding a variable to the LRM the R2


will never decrease, even if this variable is unrelated to y.

The reason is that ∑_i e2i in the R2 can only decrease (increasing R2 ) when
adding a variable.

This issue is addressed by the adjusted R2 :

R̄2 = 1 − [1/(N − K) ∑_{i=1}^{N} e2i ] / [1/(N − 1) ∑_{i=1}^{N} (yi − ȳ)2 ],

which involves a ‘degrees-of-freedom correction’ when estimating the


variance of ei .
So adding a new variable may increase that variance estimate and
therefore decrease R̄2 .

2.4 Goodness-of-fit 33
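
A sketch (Python, simulated data) of computing R2 and the adjusted R̄2 from the residual and total sums of squares; the model contains an intercept, so the variance decomposition used above applies:

    import numpy as np

    rng = np.random.default_rng(4)
    N, K = 120, 4
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ np.array([1.0, 0.5, -2.0, 0.0]) + rng.normal(size=N)

    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    ss_res = e @ e                           # sum of squared residuals
    ss_tot = ((y - y.mean()) ** 2).sum()     # total sum of squares around the mean

    r2     = 1 - ss_res / ss_tot
    r2_adj = 1 - (ss_res / (N - K)) / (ss_tot / (N - 1))   # degrees-of-freedom correction
    print(r2, r2_adj)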
Hypothesis testing: t-test

Under the GM condition (A2) and normality (A5) we have


z = (bk − βk )/(σ √ckk ) ∼ N (0, 1), (ckk : diag. element of (X ′ X)−1 ).

Furthermore, it holds for the estimator s2 of σ 2 that

(N − K)s2 /σ 2 ∼ χ2N −K , and b ⊥⊥ s2 .

Consequently, the random variable

tk = (bk − βk )/(s √ckk ) = (bk − βk )/se(bk )
is the ratio of a standard normal variable and the root of an independent
χ2 -variable divided by its degrees of freedom.
So tk follows a t-distribution with N − K degrees of freedom (tk ∼ tN −K ).

2.5 Hypothesis testing: t and F -tests 34


Hypothesis testing: t-test

This result can be used to test hypotheses about the regression coefficients.
Consider the case that we want to test

H0 : βk = βk0 (null hypothesis),


against H1 : βk ̸= βk0 (alternative hypothesis).

For this we would use the t-statistic


tk = (bk − βk0 )/se(bk ) ∼ tN −K under H0 .

The decision rule is: Reject H0 in favor of H1 at significance level α, if

|tk | > tN −K;α/2 (tN −K;α/2 : critical value),

where tN −K;α/2 is the (1 − α/2)-quantile of a tN −K -distribution.

2.5 Hypothesis testing: t and F -tests 35


Hypothesis testing: t-test

[Figure: density of tk under H0 , with the central non-rejection region and the two rejection regions (area α/2 each) beyond ±tN −K;α/2 .]

pdf of the t-statistic tk under H0 : βk = βk0 ; Non-rejection and rejection


regions are those for the two-sided test with H1 : βk ̸= βk0 .

The critical value for a given level α is implicitly defined by

P {|tk | > tN −K;α/2 } = α.

A typically selected value for α is 5%.

2.5 Hypothesis testing: t and F -tests 36


Hypothesis testing: t-test

As N − K → ∞, we have tN −K → N (0, 1).


Thus, for a large N − K, H0 is rejected at the 5% level, if |tk | > 1.96.

A common hypothesis to test is

H0 : βk = 0 against H1 : βk ̸= 0,

for which the t-statistic (then also called t-ratio) is simply


tk = bk /se(bk ).

Regression software typically reports this t-statistic, because it tests

whether xik has a statistically significant impact on yi .

2.5 Hypothesis testing: t and F -tests 37
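
A sketch (Python with scipy assumed available; simulated data) of computing the t-ratios and their two-sided p-values from the tN −K distribution:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    N, K = 150, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=N)   # third coefficient truly zero

    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (N - K)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

    t_ratios = b / se                                        # tests H0: beta_k = 0
    p_values = 2 * stats.t.sf(np.abs(t_ratios), df=N - K)    # two-sided p-values
    print(t_ratios, p_values)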


Testing the joint significance of regression coefficients

Often it is important to test whether a subset (or all) of the regression


coefficients are zero.
If this test is for the last J coefficients, H0 becomes

H0 : βK−J+1 = · · · = βK = 0,

and H1 is that at least one of these coefficients is not equal to zero.

2.5 Hypothesis testing: t and F -tests 38


Testing the joint significance of regression coefficients

A simple test approach compares the fitted restricted LRM under H0 ,

yi = b1 + b2 x2i + · · · + bK−J xK−Ji + ẽi ,

with the fitted unrestricted full LRM,

yi = b1 + b2 x2i + · · · + bK−J xK−Ji


+ bK−J+1 xK−J+1i + · · · + bK xKi + ei ,

by using their respective sum of squared OLS residuals,

S0 = ∑_{i=1}^{N} ẽ2i and S1 = ∑_{i=1}^{N} e2i .

Under H0 : βK−J+1 = · · · = βK = 0, we expect that S0 is not much larger


than S1 .
A point to note: S0 ≥ S1 .

2.5 Hypothesis testing: t and F -tests 39


Testing the joint significance of regression coefficients

For a formal test we can exploit that under (A2) and (A5) it holds that

ξ1 = (S0 − S1 )/σ 2 ∼ χ2J under H0 .

But σ 2 is unknown and so ξ1 cannot be used for testing.
However, we know that the scaled estimator s2 for σ 2 in the unrestricted
LRM is χ2 -distributed, i.e.

ξ2 = (N − K)s2 /σ 2 = S1 /σ 2 ∼ χ2N −K under H0 and H1 .

So combining this with the result that the ratio of two independent
χ2 -variables scaled by their degrees of freedom is F -distributed, we find
that

F = (ξ1 /J)/(ξ2 /(N − K)) = [(S0 − S1 )/J]/[S1 /(N − K)] ∼ FJ,N −K under H0 .

2.5 Hypothesis testing: t and F -tests 40


Testing the joint significance of regression coefficients

The decision rule is: Reject H0 at a significance level α, if

F > FJ,N −K;α (FJ,N −K;α : critical value),

where FJ,N −K;α is the (1 − α)-quantile of an FJ,N −K -distribution.

A common use of this F -test is to test the joint significance of all


regressors except the intercept, i.e.

H0 : β2 = · · · = βK = 0.

Regression software typically report the F -statistic for this test.

2.5 Hypothesis testing: t and F -tests 41
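
A sketch (Python/scipy, simulated data) of the F-test of joint significance of all regressors except the intercept, computed from the restricted and unrestricted residual sums of squares S0 and S1:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    N, K = 150, 4
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=N)

    b = np.linalg.solve(X.T @ X, X.T @ y)
    S1 = ((y - X @ b) ** 2).sum()          # unrestricted residual sum of squares
    S0 = ((y - y.mean()) ** 2).sum()       # restricted RSS (intercept-only model)

    J = K - 1                              # number of restrictions under H0
    F = ((S0 - S1) / J) / (S1 / (N - K))
    p_value = stats.f.sf(F, J, N - K)
    print(F, p_value)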


Testing the joint significance of regression coefficients

Points to note:

The significance of the regressors can be tested individually by a


sequence of corresponding t-tests.
However, this says nothing about their joint significance.

The conclusions from an F -test of joint significance and t-tests of


individual significance can differ since the explanatory power of the
regressors overlaps.

2.5 Hypothesis testing: t and F -tests 42


The general F -test

The F -test can be generalized to jointly test J linear restrictions on β of


the form
H0 : Rβ = q against H1 : Rβ ̸= q,
where R is a J × K full-rank matrix and q is a J × 1 vector.

For example suppose we wish to test

β2 + · · · + βK = 1 and β2 = β3 .

In this case we have

R: the 2 × K matrix with rows (0, 1, 1, . . . , 1) and (0, 1, −1, 0, . . . , 0), and q = (1, 0)′ .

2.5 Hypothesis testing: t and F -tests 43


The general F -test

For testing
H0 : Rβ = q, against H1 : Rβ ̸= q,

we can

fit the LRM, both with and without imposing the restrictions to be


tested,

and then use an F -test which compares the sum of squared residuals
of the restricted model (S0 ) and sum of squared residuals of the
unrestricted model (S1 ) (see above).

2.5 Hypothesis testing: t and F -tests 44


The general F -test

An example for imposing linear restrictions:

Consider the LRM

yi = β1 + β2 x2i + β3 x3i + εi ,

with H0 : β2 + β3 = 1 ⇒ H0 : β3 = 1 − β2 .

Then the restricted model is

yi = β1 + β2 x2i + (1 − β2 )x3i + εi
⇒ yi − x3i = β1 + β2 (x2i − x3i ) + εi .

There is an alternative formulation of this F -test based on the sum of


squared residuals that does not require the estimation of the restricted
model.

2.5 Hypothesis testing: t and F -tests 45
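
A small sketch (Python, simulated data with illustrative coefficient values) of imposing the restriction β2 + β3 = 1 by the variable transformation shown above and recovering β3 from the restriction:

    import numpy as np

    rng = np.random.default_rng(10)
    N = 200
    x2, x3 = rng.normal(size=N), rng.normal(size=N)
    y = 1.0 + 0.7 * x2 + 0.3 * x3 + rng.normal(size=N)   # true coefficients satisfy beta_2 + beta_3 = 1

    # Restricted regression: (y - x3) on a constant and (x2 - x3)
    Zr = np.column_stack([np.ones(N), x2 - x3])
    br = np.linalg.solve(Zr.T @ Zr, Zr.T @ (y - x3))
    beta2_r = br[1]
    beta3_r = 1 - beta2_r                                # recovered from the restriction
    print(br[0], beta2_r, beta3_r)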


The general F -test

This alternative exploits that under (A2) and (A5)

b ∼ N (β, σ 2 (X ′ X)−1 )
⇒ Rb − q ∼ N (Rβ − q, σ 2 R(X ′ X)−1 R′ ).

So under H0 : Rβ − q = 0,

Rb − q ∼ N (0, σ 2 R(X ′ X)−1 R′ )
⇒ ξ3 = (Rb − q)′ [σ 2 R(X ′ X)−1 R′ ]−1 (Rb − q) ∼ χ2J .

But σ 2 is unknown and so ξ3 cannot be used for testing.

However, we know that under H0 and H1

ξ2 = (N − K)s2 /σ 2 ∼ χ2N −K .

2.5 Hypothesis testing: t and F -tests 46


The general F -test

So we find that under H0

F = (ξ3 /J)/(ξ2 /(N − K)) = (Rb − q)′ [R(X ′ X)−1 R′ ]−1 (Rb − q)/(Js2 ) ∼ FJ,N −K .

This F -statistic is algebraically identical to the one which is formulated in


terms of the sum of squared residuals of the restricted model (S0 ) and sum
of squared residuals of the unrestricted model (S1 )!

2.5 Hypothesis testing: t and F -tests 47
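
A sketch (Python/scipy, simulated data generated so that H0 holds) of the Wald form of the F-test for the earlier example restrictions β2 + · · · + βK = 1 and β2 = β3:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    N, K = 200, 4
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ np.array([0.2, 0.5, 0.5, 0.0]) + rng.normal(size=N)   # restrictions hold for these values

    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (N - K)

    R = np.array([[0.0, 1.0,  1.0, 1.0],     # beta_2 + beta_3 + beta_4 = 1
                  [0.0, 1.0, -1.0, 0.0]])    # beta_2 - beta_3 = 0
    q = np.array([1.0, 0.0])
    J = R.shape[0]

    middle = R @ np.linalg.inv(X.T @ X) @ R.T
    F = (R @ b - q) @ np.linalg.solve(middle, R @ b - q) / (J * s2)
    print(F, stats.f.sf(F, J, N - K))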


Size and Power of a test

There are four possible (random) events that can happen when testing a
hypothesis:

1. Not rejecting H0 when it is true : correct decision

2. Rejecting H0 when it is not true : correct decision

3. Rejecting H0 when it is true : Type I error

4. Not rejecting H0 when it is not true : Type II error

Ideally we want a test to make no type I and no type II errors.


But since we are dealing with randomness this is impossible.

2.5 Hypothesis testing: t and F -tests 48


Size and Power of a test

However, by selecting the significance level α we can control the

probability of a type I error.
So for a test at significance level α, we have that P (type I error) = α.
The significance level is also called the (nominal) size of a test.

The probability that a test correctly rejects H0 when H1 is true is the


power of a test which is given by 1 − P (type II error).
It tells us how powerful a test is at detecting deviations from H0 and
depends upon the true parameter value.

Reducing the size of the test will typically also reduce its power, so that
there is a tradeoff between type I and type II errors.

2.5 Hypothesis testing: t and F -tests 49
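
The size/power trade-off can be illustrated by simulation. The following sketch (Python/scipy; the design and the alternative value β2 = 0.3 are arbitrary choices) approximates the rejection frequency of the two-sided t-test under H0 and under one alternative:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    N, R, alpha = 100, 2000, 0.05
    X = np.column_stack([np.ones(N), rng.normal(size=N)])   # fixed design
    crit = stats.t.ppf(1 - alpha / 2, df=N - 2)              # two-sided critical value

    def rejection_rate(beta2):
        rejections = 0
        for _ in range(R):
            y = 1.0 + beta2 * X[:, 1] + rng.normal(size=N)
            b = np.linalg.solve(X.T @ X, X.T @ y)
            e = y - X @ b
            s2 = e @ e / (N - 2)
            se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
            rejections += abs(b[1] / se) > crit
        return rejections / R

    print(rejection_rate(0.0))   # close to the nominal size alpha = 0.05
    print(rejection_rate(0.3))   # approximate power against beta_2 = 0.3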


p-values

A useful number when doing hypothesis tests is the p-value.


It is the smallest (marginal) significance level at which we can reject H0 .

Conceptually, the p-value is the probability of observing a value of the test


statistic as extreme or more extreme than the one observed, assuming that
H0 holds.

Thus in practice, a p-value smaller than α means that we reject H0 at the


α significance level.

2.5 Hypothesis testing: t and F -tests 50


Multicollinearity

Regressors in LRMs are typically correlated.


E.g. in a LRM for workers’ earnings that uses their age and experience as
regressors, we have two regressors which are highly correlated.

If regressors are highly correlated, X ′ X can be near singular so that:

X ′ X may be numerically difficult to invert;

Estimates for β tend to be inaccurate with large se’s so that regressors


may (falsely) appear to be insignificant.

Essentially, the empirical identification of the individual effects of


highly correlated regressors on y is difficult.

This problem is known as multicollinearity.

2.6 Multicollinearity 51
Multicollinearity

To illustrate the problem, consider the variance of the OLS estimator bk in a

LRM,

V {bk } = (σ 2 /N ) (1/(1 − Rk2 )) [ (1/N ) ∑_{i=1}^{N} (xik − x̄k )2 ]−1 ,

where Rk2 is the R2 obtained from regressing xk on the remaining


regressors and an intercept.
So the larger Rk2 , the larger the variance.

The term
1
V IF (bk ) =
1 − Rk2
is the so-called variance inflation factor.
It tells us by how much the variance is inflated compared to the
hypothetical situation that the regressors are uncorrelated.

2.6 Multicollinearity 52
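
A sketch (Python, simulated data with two highly correlated regressors) of computing VIF(bk) via the auxiliary regression of xk on the remaining regressors and an intercept:

    import numpy as np

    rng = np.random.default_rng(9)
    N = 200
    x2 = rng.normal(size=N)
    x3 = 0.95 * x2 + 0.3 * rng.normal(size=N)      # x3 highly correlated with x2
    X = np.column_stack([np.ones(N), x2, x3])

    def vif(X, k):
        xk = X[:, k]
        Z = np.delete(X, k, axis=1)                # remaining regressors (incl. intercept)
        g = np.linalg.solve(Z.T @ Z, Z.T @ xk)     # auxiliary OLS regression of x_k on Z
        resid = xk - Z @ g
        r2_k = 1 - resid @ resid / ((xk - xk.mean()) ** 2).sum()
        return 1 / (1 - r2_k)                      # variance inflation factor

    print(vif(X, 1), vif(X, 2))                    # both well above 1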
Multicollinearity

In the extreme case (Rk2 → 1) when there is a perfect linear relationship


between regressors, we have exact multicollinearity.
In this case a regressor is a linear combination of one or more other
regressors (e.g. price in cents and price in euros) and the OLS estimator is no
longer uniquely defined.
This problem can easily be solved by omitting redundant regressors.

2.6 Multicollinearity 53
Multicollinearity

In the general case (Rk2 < 1 but large) there is no easy solution; what to do
depends on the situation.

Here we have highly inaccurate estimates, meaning that our sample does
not provide enough information about the parameters.
Hence we would need to use more information, either by extending the
sample (how?) or by imposing a priori restrictions on the parameters.
The latter commonly means excluding regressors from the model (which
ones?).

2.6 Multicollinearity 54
Application: The capital asset pricing model

The CAPM states that

E{rjt } − rf = βj (E{rmt } − rf ),

where rjt is the risky return on asset j in period t, rmt is the return on the
market portfolio and rf is the riskless return (which usually also depends on t
but is deterministic).

The so-called beta-factor


βj = cov{rjt , rmt }/V {rmt }
measures the systematic (or market) risk of asset j, i.e. how strong
fluctuations in the returns are related to market movements.

The model tells us that a larger βj leads to a higher risk premium.

2.7 Illustration: Capital asset pricing model 55


Application: The capital asset pricing model

Assume rational expectations and define the unexpected returns for asset j
as
ujt = rjt − E{rjt } ⇔ E{rjt } = rjt − ujt ,
and likewise for the market portfolio.

Then we can rewrite the CAPM as

rjt − rf = βj (rmt − rf ) + εjt ,

with
εjt = ujt − βj umt .

The error can be shown to satisfy the requirements for a regression error
term and we can estimate the beta-factor by OLS.

2.7 Illustration: Capital asset pricing model 56


Application: The capital asset pricing model

For the empirical analysis of the CAPM with gretl


· use the data set from the 2nd edition of Verbeek’s textbook (available
through gretl’s website in the file capm2.gdt).
It contains monthly returns from 1960/1 to 2002/12 (N = 516) of
three US-industry portfolios (food, consumer durables, construction
industry) and of a value-weighted US stock index (as proxy for the
market portfolio); returns are measured in deviations from the
risk-free rate.

· Estimate the CAPM without intercept.

· Test the theoretical implication of the CAPM that there is no


intercept.

· Test for an anomaly, the January effect.

2.7 Illustration: Capital asset pricing model 57
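
The gretl exercise could be reproduced along the following lines in Python. This is only a sketch: the file name 'capm2.csv' and the column names are hypothetical stand-ins for the variables in Verbeek's capm2.gdt, which would first have to be exported to CSV.

    import numpy as np
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("capm2.csv")              # assumed: monthly excess returns (hypothetical file)
    y = df["food"].to_numpy()                  # excess return of the food portfolio (assumed column)
    rm = df["rmrf"].to_numpy()                 # excess return of the market proxy (assumed column)
    N = len(y)

    # CAPM without intercept: the slope is the beta-factor
    beta_hat = (rm @ y) / (rm @ rm)

    # CAPM with intercept, to test the theoretical implication of a zero intercept
    X = np.column_stack([np.ones(N), rm])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (N - 2)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    t_intercept = b[0] / se[0]                 # t-ratio for H0: intercept = 0
    print(beta_hat, b, t_intercept, 2 * stats.t.sf(abs(t_intercept), df=N - 2))

A January dummy could be added to X in the same way to test for the January effect mentioned in the exercise.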
