
The Matrix Approach to the Classical Linear Regression Model (CLRM)
(Gujarati, Appendix C)

We will begin with the k-variable model. The Population Regression Function (PRF) is as follows:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i ; \quad i = 1, 2, 3, \dots, n \qquad (1)$$

where:
$\beta_1$: intercept
$\beta_2, \dots, \beta_k$: partial slope coefficients
$u$: stochastic disturbance term
$n$: population size

Remember that the PRF is a conditional expectation $E(Y \mid X_{2i}, X_{3i}, \dots, X_{ki})$.

Expanding (1):

$$\begin{aligned}
Y_1 &= \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \dots + \beta_k X_{k1} + u_1 \\
Y_2 &= \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \dots + \beta_k X_{k2} + u_2 \\
Y_3 &= \beta_1 + \beta_2 X_{23} + \beta_3 X_{33} + \dots + \beta_k X_{k3} + u_3 \\
&\vdots \\
Y_n &= \beta_1 + \beta_2 X_{2n} + \beta_3 X_{3n} + \dots + \beta_k X_{kn} + u_n
\end{aligned} \qquad (2)$$

Writing (2) in matrix notation:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}
1 & X_{21} & X_{31} & \dots & X_{k1} \\
1 & X_{22} & X_{32} & \dots & X_{k2} \\
1 & X_{23} & X_{33} & \dots & X_{k3} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{2n} & X_{3n} & \dots & X_{kn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix}$$

$$\underset{(n \times 1)}{y} = \underset{(n \times k)}{X}\;\underset{(k \times 1)}{\beta} + \underset{(n \times 1)}{u}$$

where:
$y$ is an $(n \times 1)$ column vector of observations on the dependent variable $Y$;
$X$ is an $(n \times k)$ data matrix of the $(k-1)$ variables $X_2$ to $X_k$, with the column of 1's representing the intercept term;
$\beta$ is a $(k \times 1)$ column vector of the unknown parameters $\beta_1, \dots, \beta_k$;
$u$ is an $(n \times 1)$ column vector of the $n$ disturbances $u_i$.

So, in vector notation, the PRF is written as $y = X\beta + u$. (3)
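As a concrete illustration of this layout, here is a minimal sketch (in Python with NumPy, using invented numbers) of how the data matrix $X$ is assembled with its leading column of 1's:

```python
import numpy as np

# Invented data: n = 5 observations on Y and on two regressors X2, X3 (so k = 3).
Y = np.array([10.0, 12.0, 15.0, 11.0, 14.0])
X2 = np.array([2.0, 3.0, 5.0, 2.5, 4.0])
X3 = np.array([1.0, 1.5, 2.0, 1.2, 1.8])

# The (n x k) data matrix X: a column of 1's for the intercept, then X2 and X3.
X = np.column_stack([np.ones_like(X2), X2, X3])
print(X.shape)  # (5, 3), i.e. (n x k)
```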

Assumptions of the CLRM in Matrix Notation

First, recall the assumptions in scalar notation:

1. E ui   0, for each i




0 for i  j (no auto/serial correlation)
 
2. E ui u j   2
 for i  j (homoscedasticity)



3. X 2, X 3,..., Xk are non-stochastic or fixed.
4. No exact linear relationship among the X varia variables
bles (no multicollinearity)

5. ui  N 0,  2 
Now we will look at those exact same assumptions expressed in vector-matrix notation.

1. $E(u) = E\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} E(u_1) \\ E(u_2) \\ \vdots \\ E(u_n) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \mathbf{0}$

2. $E(uu') = E\left\{ \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix} \right\} = E\begin{bmatrix} u_1^2 & u_1 u_2 & \dots & u_1 u_n \\ u_2 u_1 & u_2^2 & \dots & u_2 u_n \\ \vdots & \vdots & & \vdots \\ u_n u_1 & u_n u_2 & \dots & u_n^2 \end{bmatrix}$

$$= \begin{bmatrix} E(u_1^2) & E(u_1 u_2) & \dots & E(u_1 u_n) \\ E(u_2 u_1) & E(u_2^2) & \dots & E(u_2 u_n) \\ \vdots & \vdots & & \vdots \\ E(u_n u_1) & E(u_n u_2) & \dots & E(u_n^2) \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix} = \sigma^2 I$$

This is called the variance-covariance matrix of the disturbances $u_i$. The diagonal elements are the (constant) variances, while the off-diagonal elements are the (zero) covariances. The matrix is symmetric.
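One way to make this concrete is a quick simulation: averaging the outer product $uu'$ over many independent draws of $u$ should approach $\sigma^2 I$. A minimal sketch (the values of $n$, $\sigma$ and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 4, 2.0, 100_000

# Each row of U is one independent draw of the disturbance vector u ~ N(0, sigma^2 I).
U = rng.normal(0.0, sigma, size=(reps, n))

# Average of u u' over all draws: diagonal -> sigma^2 = 4.0, off-diagonal -> 0.
print(np.round(U.T @ U / reps, 2))
```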

3. The $(n \times k)$ data matrix $X$ is non-stochastic (fixed).

4. The data matrix $X$ has full column rank. This means that none of the columns of $X$ is linearly dependent on the others, i.e. there is no exact linear relationship among the $X$ variables. This implies no multicollinearity. (Recall that the rank of a matrix is the maximum number of linearly independent rows or columns.)

5. $u \sim N(\mathbf{0}, \sigma^2 I)$; no change here, though note the zero vector $\mathbf{0}$.

OLS Estimation in Matrix Notation

First, write down the k-variable Sample Regression Function (SRF):

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \dots + \hat{\beta}_k X_{ki} + \hat{u}_i$$

In matrix notation:

$$y = X\hat{\beta} + \hat{u}$$

$$\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}
1 & X_{21} & X_{31} & \dots & X_{k1} \\
1 & X_{22} & X_{32} & \dots & X_{k2} \\
1 & X_{23} & X_{33} & \dots & X_{k3} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{2n} & X_{3n} & \dots & X_{kn}
\end{bmatrix}
\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} +
\begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \hat{u}_3 \\ \vdots \\ \hat{u}_n \end{bmatrix}$$

Recall that OLS estimators are found by minimising the Residual Sum of Squares

$$\min \sum_i \hat{u}_i^2 = \sum_i \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_k X_{ki} \right)^2 \qquad (4)$$

Expression (4) in matrix notation can be written as $\min \hat{u}'\hat{u}$, since

$$\hat{u}'\hat{u} = \begin{bmatrix} \hat{u}_1 & \hat{u}_2 & \dots & \hat{u}_n \end{bmatrix} \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_n \end{bmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \dots + \hat{u}_n^2 = \sum_i \hat{u}_i^2$$

Now, $\hat{u} = y - X\hat{\beta}$, so

$$\hat{u}'\hat{u} = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$$

remembering that $(AB)' = B'A'$, and that $\hat{\beta}'X'y$, being a scalar, equals its transpose $y'X\hat{\beta}$.

So, we are required to minimise $\hat{u}'\hat{u} = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$ with respect to $\hat{\beta}$. To do so, we set the partial derivative of $\hat{u}'\hat{u}$ with respect to $\hat{\beta}$ equal to zero. Doing so, we get the following result:

$$\frac{\partial (\hat{u}'\hat{u})}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0$$
$$\Rightarrow X'X\hat{\beta} = X'y$$
$$\Rightarrow (X'X)^{-1}(X'X)\hat{\beta} = (X'X)^{-1}X'y$$
$$\Rightarrow \underset{(k \times 1)}{\hat{\beta}} = \underset{(k \times k)}{(X'X)^{-1}}\,\underset{(k \times n)}{X'}\,\underset{(n \times 1)}{y}$$

This result is the matrix counterpart of the scalar OLS estimator of $\hat{\beta}_2$. You will recall that for the two-variable sample regression function $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$, the OLS estimator is

$$\hat{\beta}_2 = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$

where lowercase letters denote deviations from sample means.
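The matrix formula translates directly into code. Below is a minimal sketch on simulated data (all numbers invented); note that it solves the normal equations $X'X\hat{\beta} = X'y$ rather than inverting $X'X$ explicitly, which is numerically safer:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
beta_true = np.array([1.0, 0.5, -2.0])  # invented "population" parameters

# Design matrix with an intercept column, plus normally distributed disturbances.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# beta_hat = (X'X)^{-1} X'y, computed by solving X'X b = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true
```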

We will now express, in vector-matrix notation, some further standard results in econometrics.

Variance-covariance matrix of $\hat{\beta}$:

Recall $\hat{\beta} = (X'X)^{-1}X'y$ and $y = X\beta + u$. Substitute and obtain

$$\hat{\beta} = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'u = \beta + (X'X)^{-1}X'u$$

which means

$$\hat{\beta} - \beta = (X'X)^{-1}X'u$$

Now, by definition,

$$\operatorname{Var-Cov}(\hat{\beta}) = E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right]$$
$$= E\left[(X'X)^{-1}X'u\left((X'X)^{-1}X'u\right)'\right]$$
$$= E\left[(X'X)^{-1}X'u\,u'X(X'X)^{-1}\right]$$

Since the $X$'s are non-stochastic (i.e. constant), the terms involving $X$ can be taken outside the expectation, so we have

$$\operatorname{Var-Cov}(\hat{\beta}) = (X'X)^{-1}X'\,E(uu')\,X(X'X)^{-1}$$
$$= (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1}$$
$$= \sigma^2 (X'X)^{-1}(X'X)(X'X)^{-1}$$
$$= \sigma^2 (X'X)^{-1}$$

R-squared and population variance:

Recall, in the scalar case, $\hat{\sigma}^2 = \dfrac{\sum_i \hat{u}_i^2}{n-k}$; in the $k$-variable case, in vector notation, we write

$$\hat{\sigma}^2 = \frac{\hat{u}'\hat{u}}{n-k}$$

$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}'X'y - n\bar{Y}^2}{y'y - n\bar{Y}^2}$$
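Continuing in the same spirit, here is a minimal, self-contained sketch (invented data) of $\hat{\sigma}^2$, the variance-covariance matrix of $\hat{\beta}$ derived above, and $R^2$:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

sigma2_hat = (u_hat @ u_hat) / (n - k)         # u'u / (n - k)
var_cov = sigma2_hat * np.linalg.inv(X.T @ X)  # sigma^2_hat (X'X)^{-1}
se = np.sqrt(np.diag(var_cov))                 # standard errors of beta_hat

ybar = y.mean()
R2 = (beta_hat @ X.T @ y - n * ybar**2) / (y @ y - n * ybar**2)
print(sigma2_hat, se, R2)
```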

Hypothesis Testing:

We have the following distributions, $u \sim N(\mathbf{0}, \sigma^2 I)$ and $\hat{\beta} \sim N\!\left(\beta, \sigma^2 (X'X)^{-1}\right)$, which are the starting points for hypothesis testing. Since, in practice, $\sigma^2$ is unknown, it has to be replaced by its sample estimate $\hat{\sigma}^2$.

Recall that $t = \dfrac{\hat{\beta}_i - \beta_i}{se(\hat{\beta}_i)}$ with $(n-k)$ degrees of freedom, and that the F test for overall significance is accomplished by calculating the statistic

$$F = \frac{(RSS_R - RSS_{UR})/(k-1)}{RSS_{UR}/(n-k)}$$

In the $k$-variable case, for the null hypothesis $H_0: \beta_2 = \beta_3 = \dots = \beta_k = 0$, we can express the F statistic in vector notation as

$$F = \frac{(\hat{\beta}'X'y - n\bar{Y}^2)/(k-1)}{(y'y - \hat{\beta}'X'y)/(n-k)}$$

Notice that this bears a close resemblance to the scalar result

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$

which, as you will recall, is another way of calculating the F statistic using R-squared.

When we have linear restrictions, the general procedure for hypothesis testing is as
follows:

If $\hat{u}_R$ are the residuals from the restricted least-squares regression, then we can calculate $RSS_R = \sum \hat{u}_R^2 = \hat{u}_R'\hat{u}_R$; in similar fashion, if $\hat{u}_{UR}$ are the residuals from the unrestricted least-squares regression, we have $RSS_{UR} = \sum \hat{u}_{UR}^2 = \hat{u}_{UR}'\hat{u}_{UR}$. Then

$$F = \frac{(\hat{u}_R'\hat{u}_R - \hat{u}_{UR}'\hat{u}_{UR})/m}{\hat{u}_{UR}'\hat{u}_{UR}/(n-k)}$$

where $m$ is the number of linear restrictions, $k$ is the number of parameters (including the intercept) in the unrestricted regression, and $n$ is the number of observations.
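Here is a minimal sketch of this procedure on invented data, testing $m = 2$ restrictions ($\beta_2 = \beta_3 = 0$, so the restricted regression is intercept-only); with all slopes restricted to zero, this reproduces the overall-significance F above:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, m = 100, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=1.5, size=n)

def rss(X, y):
    """Residual sum of squares u_hat'u_hat from an OLS fit."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    return u @ u

RSS_UR = rss(X, y)        # unrestricted: intercept, X2, X3
RSS_R = rss(X[:, :1], y)  # restricted: intercept only (beta2 = beta3 = 0)

F = ((RSS_R - RSS_UR) / m) / (RSS_UR / (n - k))
print(F)  # compare with the critical value of F(m, n - k)
```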

Generalised Least Squares (GLS):

In order to simplify matters we will work with a 3 x 3 matrix.

In OLS we assumed $E(uu') = \sigma^2 I$, where $I$ is just the identity matrix. In the 3 x 3 case, if we expand $E(uu')$ we obtain

$$E(uu') = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix}$$

which represents the OLS homoscedasticity and no-serial-correlation assumptions.

Let us depart from the OLS assumption and now assume that $E(uu') = \sigma^2 V$, where $V$ is a known $(n \times n)$ variance-covariance matrix. The elements on the main diagonal of $V$ are the variances (possibly not all the same), and the off-diagonal elements are the autocovariances of the error terms (possibly not all equal to zero). $V$ could be of three types.

1 0 0
 
1. 0 1 0 which is the same as I and would yield the OLS assumption. In fact
 
0 0 1
OLS is just a special case of GLS.

 2 0 0 
 1
 
2.  0 22 0  , now we have heteroscedasticity but no serial correlation.
 
 0
 0 32 

3. $\begin{bmatrix} \sigma_1^2 & \operatorname{cov}(u_1 u_2) & \operatorname{cov}(u_1 u_3) \\ \operatorname{cov}(u_2 u_1) & \sigma_2^2 & \operatorname{cov}(u_2 u_3) \\ \operatorname{cov}(u_3 u_1) & \operatorname{cov}(u_3 u_2) & \sigma_3^2 \end{bmatrix}$; here we have both heteroscedasticity and serial correlation.

Now, if $y = X\beta + u$ with $E(u) = \mathbf{0}$ and $\operatorname{var-cov}(u) = \sigma^2 V$, and if $\sigma^2$ is unknown, $V$ represents the assumed underlying structure of the variances and covariances among the random errors $u_i$. Then

$$\hat{\beta}_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}y \quad \text{and} \quad \operatorname{var-cov}(\hat{\beta}_{GLS}) = \sigma^2 (X'V^{-1}X)^{-1}.$$

In practice, we may not know either $\sigma^2$ or, indeed, the structure of $V$; then we have to estimate both. Estimated GLS is known as EGLS or Feasible GLS (FGLS).

1 1

 EGLS  XVˆ1X   
XVˆ1y and var cov  EGLS  ˆ2 XVˆ1X   , where Vˆ is an
estimator of V.
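As a minimal sketch of the GLS formula for case 2 above (known heteroscedasticity, no serial correlation), with invented data; for FGLS one would instead plug in an estimate $\hat{V}$ built from OLS residuals:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Assumed known variance structure: Var(u_i) = sigma^2 * w_i (heteroscedastic,
# type 2 above), with sigma = 1 for simplicity.
w = rng.uniform(0.5, 4.0, size=n)
u = rng.normal(scale=np.sqrt(w))
y = X @ np.array([1.0, 2.0]) + u

V_inv = np.diag(1.0 / w)  # V is diagonal here, so V^{-1} is cheap to form

# beta_GLS = (X' V^{-1} X)^{-1} X' V^{-1} y
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
print(beta_gls)
```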

BLUE Properties of OLS Estimators in Matrix Notation

When is an estimator (like ̂


 ) BLUE (Best Linear Unbiased Estimator)?

1. When it is a linear function of a random variable, like $Y$.
2. When it is unbiased, i.e. $E(\hat{\beta}) = \beta$.
3. When it is "best" in the sense that it has minimum variance in the class of all linear unbiased estimators.

Let us see these properties expressed in matrix notation.

1 1
1. We know that ̂  XX X y , XX X is just a matrix of fixed numbers, so
̂ is a linear function of y.. It is, therefore, a linear estimator.

1
2. The PRF is y = X  u , substitute for y in ̂  X X X y and get
1
̂  X X X  X  u 
1
   X X  X u
1
Taking expectations we obtain, E (ˆ)  E ( )  XX XE (u)  E (ˆ)   , which
means ̂ is an unbiased estimator of  .

3. Let $\hat{\beta}^*$ be any other linear estimator of $\beta$, which we can write as

$$\hat{\beta}^* = \left[(X'X)^{-1}X' + C\right]y, \quad \text{where } C \text{ is a matrix of constants}$$

$$\Rightarrow \hat{\beta}^* = \left[(X'X)^{-1}X' + C\right](X\beta + u) = \beta + CX\beta + (X'X)^{-1}X'u + Cu$$

If $\hat{\beta}^*$ is to be an unbiased estimator of $\beta$, we must have $CX = 0$. If that is the case, then

$$\hat{\beta}^* - \beta = (X'X)^{-1}X'u + Cu$$

Now,

$$\operatorname{Var-Cov}(\hat{\beta}^*) = E\left[(\hat{\beta}^* - \beta)(\hat{\beta}^* - \beta)'\right] = E\left[\left((X'X)^{-1}X'u + Cu\right)\left((X'X)^{-1}X'u + Cu\right)'\right]$$

Simplify (the cross terms vanish because $CX = 0$, hence $X'C' = (CX)' = 0$) and obtain:

$$\operatorname{Var-Cov}(\hat{\beta}^*) = \sigma^2 (X'X)^{-1} + \sigma^2 CC'$$

But $\sigma^2 (X'X)^{-1}$ is $\operatorname{Var-Cov}(\hat{\beta})$, and $CC'$ is a positive semi-definite matrix, so

$$\operatorname{Var-Cov}(\hat{\beta}^*) = \operatorname{Var-Cov}(\hat{\beta}) + \sigma^2 CC' \quad \Rightarrow \quad \operatorname{Var-Cov}(\hat{\beta}^*) \geq \operatorname{Var-Cov}(\hat{\beta})$$

Hence, $\hat{\beta}$ has the smallest variance, making it BLUE.
