
INTRODUCTION TO VECTOR AND MATRIX DIFFERENTIATION
Econometrics C, Lecture Note 3
Heino Bohn Nielsen
February 6, 2012

In this note we expand on Verbeek (2004, Appendix A.7) on matrix differentiation.
We first present the conventions for derivatives of scalar and vector functions;
then we present the derivatives of a number of special functions particularly useful
in econometrics; and, finally, we apply the ideas to derive the ordinary least squares
(OLS) estimator in a linear regression model. It should be emphasized that this note
is cursory reading; the particular results needed in this course are indicated with a
(∗).

Outline
§1 Conventions for Scalar Functions
§2 Conventions for Vector Functions
§3 Some Special Functions
§4 The Linear Regression Model

1 Conventions for Scalar Functions
Let β = (β_1, ..., β_k)' be a k × 1 vector and let f(β) = f(β_1, ..., β_k) be a real-valued function
that depends on β, i.e. f(·) : R^k → R maps the vector β into a single number, f(β).
Then the derivative of f(·) with respect to β is defined as

\[
\frac{\partial f(\beta)}{\partial \beta} =
\begin{pmatrix}
\dfrac{\partial f(\beta)}{\partial \beta_1} \\
\vdots \\
\dfrac{\partial f(\beta)}{\partial \beta_k}
\end{pmatrix}. \tag{1}
\]

This is a k × 1 column vector with typical element i given by the partial derivative ∂f(β)/∂β_i.
Sometimes this vector is referred to as the gradient. It is useful to remember that the
derivative of a scalar function with respect to a column vector gives a column vector as
the result¹.

¹ Note that Wooldridge (2006, p. 815) does not follow this convention, and lets ∂f(β)/∂β be a row vector.

Similarly, the derivative of a scalar function with respect to a row vector yields the
1 × k row vector

\[
\frac{\partial f(\beta)}{\partial \beta'} =
\left( \dfrac{\partial f(\beta)}{\partial \beta_1} \;\; \cdots \;\; \dfrac{\partial f(\beta)}{\partial \beta_k} \right).
\]

2 Conventions for Vector Functions


Now let

\[
g(\beta) =
\begin{pmatrix}
g_1(\beta) \\
\vdots \\
g_n(\beta)
\end{pmatrix}
\]

be a vector function depending on β = (β_1, ..., β_k)', i.e. g(·) : R^k → R^n maps the k × 1
vector into an n × 1 vector, where g_i(β) = g_i(β_1, ..., β_k), i = 1, 2, ..., n, is a real-valued
function.
Since g(·) is a column vector it is natural to consider the derivative with respect to a
row vector, β', i.e.

\[
\frac{\partial g(\beta)}{\partial \beta'} =
\begin{pmatrix}
\dfrac{\partial g_1(\beta)}{\partial \beta_1} & \cdots & \dfrac{\partial g_1(\beta)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial g_n(\beta)}{\partial \beta_1} & \cdots & \dfrac{\partial g_n(\beta)}{\partial \beta_k}
\end{pmatrix}, \tag{2}
\]

where each row, i = 1, 2, ..., n, contains the derivative of the scalar function g_i(·) with
respect to the elements in β. The result is therefore an n × k matrix of derivatives with
typical element (i, j) given by ∂g_i(β)/∂β_j. If the vector function is defined as a row vector, it
is natural to take the derivative with respect to the column vector, β.
We can note that it holds in general that

\[
\frac{\partial \left( g(\beta)' \right)}{\partial \beta} =
\left( \frac{\partial g(\beta)}{\partial \beta'} \right)', \tag{3}
\]

which in the case above is a k × n matrix.
Applying the conventions in (1) and (2) we can define the Hessian matrix of second
derivatives of a scalar function f(β) as

\[
\frac{\partial^2 f(\beta)}{\partial \beta \, \partial \beta'} =
\frac{\partial}{\partial \beta'} \left( \frac{\partial f(\beta)}{\partial \beta} \right) =
\begin{pmatrix}
\dfrac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_1} & \cdots & \dfrac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_k} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_1} & \cdots & \dfrac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_k}
\end{pmatrix},
\]

which is a k × k matrix with typical element (i, j) given by the second derivative ∂²f(β)/(∂β_i ∂β_j).
Note that it does not matter if we first take the derivative with respect to the column or
the row.
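As a numerical companion to the Hessian definition, the following sketch (assuming numpy and a hypothetical example function f(β) = β_1²β_2 + exp(β_1)) differentiates the analytic k × 1 gradient once more with respect to the row vector β', column by column; the result is a k × k matrix that is symmetric, in line with the remark that the order of differentiation does not matter.

    import numpy as np

    def grad_f(beta):
        # analytic k x 1 gradient of f(beta) = beta_1^2 * beta_2 + exp(beta_1)
        return np.array([2.0 * beta[0] * beta[1] + np.exp(beta[0]),
                         beta[0] ** 2])

    def hessian(beta, h=1e-6):
        # differentiate the gradient with respect to the row vector beta',
        # one column per element of beta, following the layout in (2)
        return np.column_stack([(grad_f(beta + h * e) - grad_f(beta - h * e)) / (2 * h)
                                for e in np.eye(beta.size)])

    beta = np.array([0.5, 2.0])
    H = hessian(beta)
    print(H.shape)              # (2, 2): a k x k matrix
    print(np.allclose(H, H.T))  # True: the Hessian is symmetric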

3 Some Special Functions


First, let a be a k × 1 vector and let β be a k × 1 vector of parameters. Next define the
scalar function f(β) = a'β, which maps the k parameters into a single number. It holds
that

\[
\frac{\partial (a'\beta)}{\partial \beta} = a. \tag{4∗}
\]

To see this, we can write the function as

\[
f(\beta) = a'\beta = a_1 \beta_1 + a_2 \beta_2 + \cdots + a_k \beta_k.
\]

Taking the derivative with respect to β yields

\[
\frac{\partial f(\beta)}{\partial \beta} =
\begin{pmatrix}
\dfrac{\partial (a_1 \beta_1 + a_2 \beta_2 + \cdots + a_k \beta_k)}{\partial \beta_1} \\
\vdots \\
\dfrac{\partial (a_1 \beta_1 + a_2 \beta_2 + \cdots + a_k \beta_k)}{\partial \beta_k}
\end{pmatrix} =
\begin{pmatrix}
a_1 \\
\vdots \\
a_k
\end{pmatrix} = a,
\]

which is a k × 1 vector as expected. Also note that since β'a = a'β, it holds that

\[
\frac{\partial (\beta' a)}{\partial \beta} = a. \tag{5∗}
\]

Now, let A be an n × k matrix and let β be a k × 1 vector of parameters. Furthermore
define the vector function g(β) = Aβ, which maps the k parameters into n function values.
g(β) is an n × 1 vector and the derivative with respect to β' is an n × k matrix given by

\[
\frac{\partial (A\beta)}{\partial \beta'} = A. \tag{6∗}
\]

To see this, write the function as

\[
g(\beta) = A\beta =
\begin{pmatrix}
a_{11} \beta_1 + a_{12} \beta_2 + \cdots + a_{1k} \beta_k \\
\vdots \\
a_{n1} \beta_1 + a_{n2} \beta_2 + \cdots + a_{nk} \beta_k
\end{pmatrix}
\]

and find the derivative

\[
\frac{\partial g(\beta)}{\partial \beta'} =
\begin{pmatrix}
\dfrac{\partial (a_{11}\beta_1 + \cdots + a_{1k}\beta_k)}{\partial \beta_1} & \cdots & \dfrac{\partial (a_{11}\beta_1 + \cdots + a_{1k}\beta_k)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial (a_{n1}\beta_1 + \cdots + a_{nk}\beta_k)}{\partial \beta_1} & \cdots & \dfrac{\partial (a_{n1}\beta_1 + \cdots + a_{nk}\beta_k)}{\partial \beta_k}
\end{pmatrix} =
\begin{pmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \ddots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{pmatrix} = A.
\]

Similarly, if we consider the transposed function, g(β)' = β'A', which is a 1 × n row vector,
we can find the k × n matrix of derivatives as

\[
\frac{\partial (\beta' A')}{\partial \beta} = A'. \tag{7∗}
\]

This is just an application of the result in (3).
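Result (6∗) admits the same kind of numerical check, shown in the sketch below (assuming numpy; A and β are arbitrary example values): the finite-difference Jacobian of g(β) = Aβ reproduces A.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 3, 2
    A = rng.normal(size=(n, k))
    beta = rng.normal(size=k)

    def g(b):
        # g(beta) = A beta, an n x 1 vector function
        return A @ b

    h = 1e-6
    # column j of the Jacobian collects dg_i/dbeta_j, stacked as in (2)
    J = np.column_stack([(g(beta + h * e) - g(beta - h * e)) / (2 * h)
                         for e in np.eye(k)])
    print(np.allclose(J, A))  # True: d(A beta)/d beta' = A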
Finally, consider a quadratic function f(β) = β'Vβ for some k × k matrix V. This
function maps the k parameters into a single number. Here we find the derivatives as the
k × 1 column vector

\[
\frac{\partial (\beta' V \beta)}{\partial \beta} = (V + V')\beta \tag{8∗}
\]

or the row variant

\[
\frac{\partial (\beta' V \beta)}{\partial \beta'} = \beta'(V + V'). \tag{9∗}
\]

If V is symmetric this reduces to 2Vβ and 2β'V, respectively. To see how this works,
consider the simple case k = 3 and write the function as

\[
\beta' V \beta =
\begin{pmatrix} \beta_1 & \beta_2 & \beta_3 \end{pmatrix}
\begin{pmatrix}
v_{11} & v_{12} & v_{13} \\
v_{21} & v_{22} & v_{23} \\
v_{31} & v_{32} & v_{33}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= v_{11}\beta_1^2 + v_{22}\beta_2^2 + v_{33}\beta_3^2 + (v_{12}+v_{21})\beta_1\beta_2 + (v_{13}+v_{31})\beta_1\beta_3 + (v_{23}+v_{32})\beta_2\beta_3.
\]

Taking the derivative with respect to β, we get

\[
\begin{aligned}
\frac{\partial (\beta' V \beta)}{\partial \beta} &=
\begin{pmatrix}
\dfrac{\partial (\beta' V \beta)}{\partial \beta_1} \\
\dfrac{\partial (\beta' V \beta)}{\partial \beta_2} \\
\dfrac{\partial (\beta' V \beta)}{\partial \beta_3}
\end{pmatrix}
=
\begin{pmatrix}
2 v_{11}\beta_1 + (v_{12}+v_{21})\beta_2 + (v_{13}+v_{31})\beta_3 \\
2 v_{22}\beta_2 + (v_{12}+v_{21})\beta_1 + (v_{23}+v_{32})\beta_3 \\
2 v_{33}\beta_3 + (v_{13}+v_{31})\beta_1 + (v_{23}+v_{32})\beta_2
\end{pmatrix} \\
&=
\begin{pmatrix}
2 v_{11} & v_{12}+v_{21} & v_{13}+v_{31} \\
v_{12}+v_{21} & 2 v_{22} & v_{23}+v_{32} \\
v_{13}+v_{31} & v_{23}+v_{32} & 2 v_{33}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} \\
&=
\left(
\begin{pmatrix}
v_{11} & v_{12} & v_{13} \\
v_{21} & v_{22} & v_{23} \\
v_{31} & v_{32} & v_{33}
\end{pmatrix}
+
\begin{pmatrix}
v_{11} & v_{21} & v_{31} \\
v_{12} & v_{22} & v_{32} \\
v_{13} & v_{23} & v_{33}
\end{pmatrix}
\right)
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= (V + V')\beta.
\end{aligned}
\]

4 The Linear Regression Model
To illustrate the use of matrix differentiation consider the linear regression model in matrix
notation,

\[
Y = X\beta + \epsilon,
\]

where Y is a T × 1 vector of stacked left-hand-side variables, X is a T × k matrix of
explanatory variables, β is a k × 1 vector of parameters to be estimated, and ε is a T × 1
vector of error terms. Here k is the number of explanatory variables and T is the number
of observations.
One way to motivate the ordinary least squares (OLS) principle is to choose the esti-
mator, β̂, as the value of β that minimizes the sum of squared residuals, i.e.

\[
\hat{\beta} = \arg\min_{\beta} \sum_{t=1}^{T} \epsilon_t^2 = \arg\min_{\beta} \epsilon'\epsilon.
\]

Looking at the function to be minimized, we find that

\[
\begin{aligned}
\epsilon'\epsilon &= (Y - X\beta)'(Y - X\beta) \\
&= \left( Y' - \beta'X' \right)(Y - X\beta) \\
&= Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta \\
&= Y'Y - 2\, Y'X\beta + \beta'X'X\beta,
\end{aligned}
\]

where the last line uses the fact that Y'Xβ and β'X'Y are identical scalar variables.
Note that ε'ε is a scalar function and taking the first derivative with respect to β yields
the k × 1 vector

\[
\frac{\partial (\epsilon'\epsilon)}{\partial \beta} =
\frac{\partial \left( Y'Y - 2\, Y'X\beta + \beta'X'X\beta \right)}{\partial \beta} = -2 X'Y + 2 X'X\beta,
\]

where we have used the results in (4∗) and (8∗) for X'X symmetric. Solving the k
equations,

\[
\frac{\partial (\epsilon'\epsilon)}{\partial \beta} = -2 X'Y + 2 X'X\hat{\beta} = 0,
\]

yields the OLS estimator

\[
\hat{\beta} = \left( X'X \right)^{-1} X'Y,
\]

provided that X'X is non-singular.
To make sure that β̂ is a minimum of ε'ε and not a maximum, we should formally
ensure that the second derivative is positive definite. The k × k Hessian matrix of second
derivatives is given by

\[
\frac{\partial^2 (\epsilon'\epsilon)}{\partial \beta \, \partial \beta'} =
\frac{\partial \left( -2 X'Y + 2 X'X\beta \right)}{\partial \beta'} = 2 X'X,
\]

which is a positive definite matrix by construction (given that X'X is non-singular).
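As a final illustration, the sketch below (assuming numpy and simulated example data) computes the OLS estimator from the closed-form expression derived above and cross-checks it against numpy's least-squares solver.

    import numpy as np

    rng = np.random.default_rng(3)
    T, k = 100, 3
    X = rng.normal(size=(T, k))
    beta_true = np.array([1.0, -2.0, 0.5])
    Y = X @ beta_true + rng.normal(size=T)

    # closed-form OLS estimator: beta_hat = (X'X)^{-1} X'Y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

    # cross-check against numpy's built-in least-squares routine
    beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(np.allclose(beta_hat, beta_lstsq))  # True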

References
Verbeek, M. (2004): A Guide to Modern Econometrics. John Wiley & Sons, 2nd edn.

Wooldridge, J. M. (2006): Introductory Econometrics: A Modern Approach. Thomson South-Western Publishing, 3rd edn.
