
Hypothesis Testing for LRM

Ratjomose P. Machema
[email protected]

Department of Economics
National University of Lesotho (NUL)

EC6041:
Econometric Theory and Applications



Table of contents

1 Introduction

2 Testing Linear Restrictions
      A Simple t-Test
      Linear Restrictions
      Wald-type tests
      LM-type tests
      Likelihood ratio tests

3 Non-linear transformations of the estimators: the delta method


Introduction

We begin the analysis with the regression model as a statement of a proposition,

    y = Xβ + ε.

The model is thus a general statement, and a hypothesis is a proposition that narrows that statement.
We define the null hypothesis as the statement that narrows the model and the alternative hypothesis as the broader one.
Our primary goal is to develop a systematic method for testing restrictions, which allows us to distinguish between nested models.


Introduction

The formal procedure of hypothesis testing is that we formulate a “null hypothesis” H0 about the parameter vector θ, as well as an “alternative hypothesis” H1.
On the assumption that H0 is true, we can derive the sampling distribution (or asymptotic distribution) of a given estimator θ̂.
The procedure itself is a rule, stated in terms of the data, that dictates whether the null hypothesis should be rejected or not.
The decision rule might state that the hypothesis should be rejected if a sample estimate of the parameter is too far away from its hypothesized value (where “far” remains to be defined).
If the observed data (i.e., the test statistic) fall in the rejection region (or critical region), the null hypothesis is rejected; if they fall in the acceptance region, it is not.


Introduction

Since the sample is random, the test statistic, however defined, is also random. The same test procedure can lead to different conclusions in different samples.
We can summarise the possible outcomes of the test in the following table:

                                  State of the “world”
                              H0 is true       H1 is true
    Test        Accept H0     Correct          Type II error
    decision    Reject H0     Type I error     Correct

The probability of a Type I error is the size of the test. It is conventionally denoted α and is also called the significance level.


Introduction

There are broadly three approaches to testing:

Wald tests: We can base the test on the unrestricted model and investigate how different the unrestricted estimates are from the values given by the null hypothesis. Typical examples of Wald-type tests are t-tests and F-tests run on unrestricted regressions.

Likelihood ratio principle: We can base the test on how much the “fit” of the regression changes from the unrestricted to the restricted model. The important point is that we have to estimate both the restricted and the unrestricted model and compare their log likelihoods or residual sums of squares.

Lagrange multiplier (LM) tests: We can estimate the restricted model and investigate whether the restrictions appear to be binding, i.e. whether we would get very different estimates if we relaxed the restrictions.
Introduction

These three general principles have a certain symmetry, which has revolutionized the teaching of hypothesis tests and the development of new procedures.
Essentially, the Lagrange multiplier approach starts at the null and asks whether movement toward the alternative would be an improvement, while
the Wald approach starts at the alternative and considers movement toward the null.
The likelihood ratio method compares the two hypotheses directly on an equal basis.


The distribution of β̂

Since the disturbances are normally distributed,

    ε|X ∼ N(0, σ²In)                                         (1)

and β̂ = (X′X)⁻¹X′y is a linear combination of ε, then

    β̂|X ∼ N(β, σ²(X′X)⁻¹)                                    (2)

and the standard normal distribution for the OLS estimators can be written as

    (β̂j − βj0) / sd(β̂j) ∼ Normal(0, 1)

The standardized random variable always has mean zero and variance one.
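To make eq. (2) concrete, here is a minimal Python/numpy sketch (not part of the original slides) that simulates repeated samples for a fixed design matrix and compares the empirical covariance of β̂ with σ²(X′X)⁻¹; the design, β, and σ are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design, k = 3
    beta, sigma = np.array([1.0, 0.5, -0.3]), 2.0

    XtX_inv = np.linalg.inv(X.T @ X)
    draws = []
    for _ in range(5000):
        y = X @ beta + sigma * rng.normal(size=n)   # fresh disturbances each replication
        draws.append(XtX_inv @ X.T @ y)             # OLS: (X'X)^{-1} X'y

    # Empirical covariance of beta-hat vs. the theoretical sigma^2 (X'X)^{-1}
    print(np.cov(np.array(draws).T))
    print(sigma**2 * XtX_inv)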


t Distribution

If we replace the standard deviation of the estimator, which depends on the unknown σ, with the standard error, which depends on the estimate σ̂ and so varies across samples, the appropriate probability distribution becomes t instead of standard normal.
This can be written as

    (β̂j − βj0) / se(β̂j) ∼ t(n − k) = t(df)                   (3)

The t distribution also has a bell shape, but is more spread out than the Normal(0, 1).
As df → ∞, t(df) → Normal(0, 1). The difference is practically small for df > 120.
The quantity in eq. (3) is a test statistic and is computed from the estimate β̂j, its standard error se(β̂j), and the hypothesized value βj0.
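As an illustrative sketch (my addition, not from the slides), the t statistic of eq. (3) can be computed directly; the simulated data and the coefficient being tested are assumptions chosen for the example:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, k = 60, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

    betahat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ betahat
    sigma2hat = e @ e / (n - k)                     # sigma-hat^2 = e'e / (n - k)
    se = np.sqrt(sigma2hat * np.diag(np.linalg.inv(X.T @ X)))

    j, beta_j0 = 2, 0.0                             # H0: the third coefficient equals 0
    t = (betahat[j] - beta_j0) / se[j]              # eq. (3)
    pval = 2 * stats.t.sf(abs(t), df=n - k)         # two-sided p-value
    print(t, pval)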


Introduction

The test discussed above involves a restriction on a single coefficient. Often, a hypothesis of economic interest implies a linear restriction on more than one coefficient, such as β2 + β3 + ··· + βk = 1.
We will represent a set of J testable linear restrictions on y = Xβ + ε as

    H0 : Rβ = c

against the alternative

    H1 : Rβ ≠ c

where R is a (J × k) restriction matrix with full row rank and c is a (J × 1) vector of constants.


Introduction

Examples of restrictions written in the form H0 : Rβ = c, with β = (β1, β2, ..., βk)′:

H0 : β1 = 0

    R = [ 1  0  ···  0 ]  (1 × k),    c = 0

H0 : β2 = β3 = ··· = βk = 0

    R = [ 0  1  0  ···  0 ]
        [ 0  0  1  ···  0 ]
        [ ·  ·  ·  ···  · ]
        [ 0  0  0  ···  1 ]  (J × k),    c = (0, 0, ..., 0)′  (J × 1)

H0 : β2 + β3 = 1

    R = [ 0  1  1  0  ···  0 ]  (1 × k),    c = 1

H0 : β2 + β3 + ··· + βk = 1 and β2 = β3

    R = [ 0  1   1  ···  ···  1 ]
        [ 0  1  −1  0   ···   0 ]  (2 × k),    c = (1, 0)′
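A small numpy sketch (added for illustration, with k chosen arbitrarily) showing how two of these restriction sets can be assembled:

    import numpy as np

    k = 5  # illustrative number of coefficients

    # H0: beta_2 + beta_3 = 1  (J = 1)
    R1 = np.zeros((1, k))
    R1[0, 1] = R1[0, 2] = 1.0       # picks out beta_2 and beta_3 (0-based indexing)
    c1 = np.array([1.0])

    # H0: beta_2 + ... + beta_k = 1 and beta_2 = beta_3  (J = 2)
    R2 = np.zeros((2, k))
    R2[0, 1:] = 1.0                 # beta_2 + ... + beta_k
    R2[1, 1], R2[1, 2] = 1.0, -1.0  # beta_2 - beta_3
    c2 = np.array([1.0, 0.0])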


A Wald test

The operating principle of the Wald test procedure is to fit the regression without the restrictions, and then assess whether the results appear, within sampling variability, to agree with the hypothesis.
What is the sampling distribution of Rβ̂?

    E[Rβ̂] = Rβ

and then

    Var[Rβ̂] = E[ R(β̂ − β)(β̂ − β)′R′ ] = R Var[β̂] R′
             = σ² R(X′X)⁻¹R′

therefore Rβ̂ is normally distributed as follows:

    Rβ̂ ∼ N( Rβ, σ² R(X′X)⁻¹R′ )


A Wald test

If H0 is true and we define m = Rβ̂ − c, then

    E[m|X] = R E[β̂|X] − c = Rβ − c = 0

and

    Var[m|X] = Var[Rβ̂ − c|X] = R Var[β̂|X] R′ = σ² R(X′X)⁻¹R′

therefore

    Rβ̂ − c ∼ N( 0, σ² R(X′X)⁻¹R′ )

and the Wald statistic will be distributed as

    W = (Rβ̂ − c)′ [ σ² R(X′X)⁻¹R′ ]⁻¹ (Rβ̂ − c) ∼ χ²(J)      (4)
F test

The only unknown quantity in eq. (4) is σ², but we know that σ̂² = e′e/(n − k), so in large samples we could base our Wald statistic on

    Ŵ = (Rβ̂ − c)′ [ σ̂² R(X′X)⁻¹R′ ]⁻¹ (Rβ̂ − c)

If we know that the errors are normally distributed, then

    (n − k)σ̂²/σ² ∼ χ²(n − k)                                 (5)

Since β̂ and σ̂² are independent, the Wald statistic in eq. (4) will be independent of σ̂²/σ², and we can form an F statistic by dividing each chi-square variable by its degrees of freedom:

    F = (W/J) / [ ((n − k)σ̂²/σ²) / (n − k) ]
F test

This will be distributed as an F(J, n − k) variable, and it simplifies to

    F = (1 / (J σ̂²)) (Rβ̂ − c)′ [ R(X′X)⁻¹R′ ]⁻¹ (Rβ̂ − c)     (6)

where σ̂²(X′X)⁻¹ is the estimated variance matrix of β̂. The test procedure is then to reject the hypothesis Rβ = c if the computed F value exceeds a preselected critical value:

    F = (Rβ̂ − c)′ [ R(X′X)⁻¹R′ ]⁻¹ (Rβ̂ − c) / (J σ̂²) ∼ F(J, n − k)
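A hedged Python sketch (not from the slides) of eq. (6); the helper name wald_F and the use of scipy for the p-value are my own choices:

    import numpy as np
    from scipy import stats

    def wald_F(X, y, R, c):
        """F statistic of eq. (6) for H0: R beta = c."""
        n, k = X.shape
        J = R.shape[0]
        XtX_inv = np.linalg.inv(X.T @ X)
        betahat = XtX_inv @ X.T @ y
        e = y - X @ betahat
        sigma2hat = e @ e / (n - k)
        m = R @ betahat - c                               # R beta-hat - c
        F = m @ np.linalg.solve(R @ XtX_inv @ R.T, m) / (J * sigma2hat)
        return F, stats.f.sf(F, J, n - k)                 # statistic and p-value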


t tests

In the particular case where our test involves only one restriction (e.g., H0 : βi = 0), the test statistic can equivalently be formulated as a t-test.
In these cases the R matrix is a row vector (i.e., Rβ̂ picks out β̂i) and the matrix σ̂² R(X′X)⁻¹R′ is a 1 × 1 matrix, i.e. a scalar (it picks out the i-th diagonal element of (X′X)⁻¹).
We can therefore rewrite the F(1, n − k) statistic given in eq. (6) equivalently as

    F = (Rβ̂ − c)² / V̂ar[Rβ̂]
      = ( (Rβ̂ − c) / sê[Rβ̂] )²
      = t²

where

    t = (Rβ̂ − c) / sê[Rβ̂]
t tests

If we take the square root of t² ∼ F(1, n − k), we get t ∼ t(n − k). Formally, we have

    t = (Rβ̂ − c) / sê[Rβ̂]

Since the distribution of a t variable with n − k degrees of freedom is exactly equal to the distribution of the square root of an F(1, n − k) variable, these two tests are statistically and numerically equivalent.


LM type tests

Lagrange multiplier (or score) tests are based on estimating the restricted model and then comparing the goodness-of-fit of the model with and without the restrictions imposed.
The problem of a restricted regression, therefore, is to minimise the residual sum of squares subject to the restrictions Rβ = c. Formally, the problem is

    min_β (y − Xβ)′(y − Xβ)   subject to   Rβ = c

with the Lagrangian

    L = (y − Xβ)′(y − Xβ) + 2(Rβ − c)′λ                      (7)

where the multiplier has been scaled by 2 in order to simplify the algebra later on.
LM type tests

The first order conditions are

    −X′(y − Xβ̂_R) + R′λ̂ = 0                                  (8)
    Rβ̂_R − c = 0

From eq. (8) we see that

    R′λ̂ = X′(y − Xβ̂_R)
         = X′e_R

It is clear that if the restriction is valid, the term on the right hand side should asymptotically converge to zero. It is also plausible that we should be able to apply some central limit theorem to this vector to show that it is asymptotically normal with covariance matrix σ²(X′X).
LM type tests

We should therefore be able to base a test of the hypothesis that λ is zero on the statistic

    LM = (1/σ̂²) λ̂′ R(X′X)⁻¹R′ λ̂

Equivalently, we can use the fact that R′λ̂ = X′(y − Xβ̂_R) to write the statistic as

    LM = (1/σ̂²) (y − Xβ̂_R)′ X(X′X)⁻¹X′ (y − Xβ̂_R)

The residuals from the restricted regression are standardised (by dividing through by the estimated standard error of the restricted regression) and then regressed on the full set of explanatory variables. If the restriction is valid, the explained sum of squares should be small.
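A minimal sketch (my addition), assuming the restrictions are exclusion restrictions so that the restricted model simply drops some columns of X; the function name and interface are illustrative:

    import numpy as np
    from scipy import stats

    def lm_test(X, X_restricted, y, J):
        """LM statistic: restricted residuals projected on the full X."""
        n = X.shape[0]
        bR = np.linalg.lstsq(X_restricted, y, rcond=None)[0]
        eR = y - X_restricted @ bR                  # restricted residuals e_R
        sigma2_R = eR @ eR / n                      # variance estimate from the restricted fit
        XteR = X.T @ eR
        LM = XteR @ np.linalg.solve(X.T @ X, XteR) / sigma2_R
        return LM, stats.chi2.sf(LM, J)             # asymptotically chi-square(J) under H0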
Asymptotic LR test

In order to implement this, we will initially consider the case where we assume normality of the errors. Assume also that we know σ² but need to estimate β.
Under the assumption of normality, the maximum likelihood estimator will be the least squares estimator β̂.
The LR test compares the log likelihoods of the two models and tests whether the difference is statistically significant. If it is, then the less restrictive model (the one with more variables) is said to fit the data significantly better than the more restrictive model.
The likelihood ratio statistic is given by 2(l_U − l_R), that is,

    LR = [ (y − Xβ̂_R)′(y − Xβ̂_R) − (y − Xβ̂)′(y − Xβ̂) ] / σ²
       = (e_R′e_R − e′e) / σ²

Under the assumption of normality (as well as asymptotically), it is distributed as χ²(J).
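Under the same assumptions (σ² known, restricted model a subset of the columns of X), a short illustrative sketch:

    import numpy as np
    from scipy import stats

    def lr_test(X, X_restricted, y, J, sigma2):
        """LR statistic (e_R'e_R - e'e)/sigma^2, with sigma^2 treated as known."""
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        eR = y - X_restricted @ np.linalg.lstsq(X_restricted, y, rcond=None)[0]
        LR = (eR @ eR - e @ e) / sigma2
        return LR, stats.chi2.sf(LR, J)             # chi-square(J) under H0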
The delta method

Thus far we have looked at tests of linear restrictions on the parameters. We may, however, also wish to investigate nonlinear functions of the estimators. This turns out to be fairly straightforward.
In this case we will write our set of hypotheses in the form

    R(β̂) = c

where R(·) is a set of functions, such as

    R(β) = [ β1β2          ]   =   [ c1 ]
           [ β1/(β2 + β3)  ]       [ c2 ]


The delta method

The corresponding Wald statistic is

    W = (R(β̂) − c)′ [ (∂R(β)/∂β′) Var(β̂) (∂R(β)/∂β′)′ ]⁻¹ (R(β̂) − c)

and for the above example we have

    ∂R(β)/∂β′ = [ β2             β1                0              ]
                [ 1/(β2 + β3)   −β1/(β2 + β3)²    −β1/(β2 + β3)²  ]
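To close with an illustration (my addition), a sketch that evaluates this Wald statistic for the example above, given an estimate β̂ = (b1, b2, b3)′ and its estimated variance matrix V:

    import numpy as np
    from scipy import stats

    def delta_wald(betahat, V, c):
        """Wald statistic for H0: R(beta) = c with
        R(beta) = (beta1*beta2, beta1/(beta2 + beta3))'."""
        b1, b2, b3 = betahat
        Rb = np.array([b1 * b2, b1 / (b2 + b3)])
        # Jacobian dR/dbeta' evaluated at beta-hat (the matrix above)
        G = np.array([[b2,            b1,                 0.0],
                      [1/(b2 + b3),  -b1/(b2 + b3)**2,   -b1/(b2 + b3)**2]])
        m = Rb - c
        W = m @ np.linalg.solve(G @ V @ G.T, m)
        return W, stats.chi2.sf(W, len(c))          # chi-square with J = len(c) df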
