
Vector and Matrix Calculus

Herman Kamper
[email protected]

30 January 2013

1 Introduction

As explained in detail in [1], there unfortunately exist multiple competing notations for the layout of matrix derivatives. This can cause a lot of difficulty when consulting several sources, since different sources might use different conventions. Some sources, for example [2] (from which I use a lot of identities), even use a mixed layout (according to [1, Notes]). Identities for both the numerator layout (sometimes called the Jacobian formulation) and the denominator layout (sometimes called the Hessian formulation) are given in [1], which makes it easy to check what layout a particular source uses. I will aim to stick to the denominator layout, which seems to be the most widely used in the fields of statistics and pattern recognition (e.g. [3] and [4, pp. 327–332]). Other useful references concerning matrix calculus include [5] and [6]. In this document column vectors are assumed in all cases except where specifically stated otherwise.
Table 1: Derivatives of scalars, vector functions and matrices [1, 6].

                        | scalar y                    | column vector y ∈ R^m     | matrix Y ∈ R^{m×n}
scalar x                | scalar ∂y/∂x                | row vector ∂y/∂x ∈ R^m    | matrix ∂Y/∂x (only numerator layout)
column vector x ∈ R^n   | column vector ∂y/∂x ∈ R^n   | matrix ∂y/∂x ∈ R^{n×m}    |
matrix X ∈ R^{p×q}      | matrix ∂y/∂X ∈ R^{p×q}      |                           |

2 Definitions

Table 1 indicates the six possible kinds of derivatives when using the denominator layout. Using
this layout notation consistently, we have the following definitions.
The derivative of a scalar function f : R^n → R with respect to vector x ∈ R^n is

\frac{\partial f(x)}{\partial x} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} \\
\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}    (1)
This is the transpose of the gradient (some authors simply call this the gradient, irrespective of
whether numerator or denominator layout is used).

The derivative of a vector function f : R^n → R^m, where f(x) = [f_1(x) f_2(x) ... f_m(x)]^T and x ∈ R^n, with respect to scalar x_i is

\frac{\partial f(x)}{\partial x_i} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f_1(x)}{\partial x_i} & \frac{\partial f_2(x)}{\partial x_i} & \cdots & \frac{\partial f_m(x)}{\partial x_i}
\end{bmatrix}    (2)

The derivative of a vector function f : R^n → R^m, where f(x) = [f_1(x) f_2(x) ... f_m(x)]^T, with respect to vector x ∈ R^n is

\frac{\partial f(x)}{\partial x} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} \\
\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_1} & \cdots & \frac{\partial f_m(x)}{\partial x_1} \\
\frac{\partial f_1(x)}{\partial x_2} & \frac{\partial f_2(x)}{\partial x_2} & \cdots & \frac{\partial f_m(x)}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_1(x)}{\partial x_n} & \frac{\partial f_2(x)}{\partial x_n} & \cdots & \frac{\partial f_m(x)}{\partial x_n}
\end{bmatrix}    (3)

This is just the transpose of the Jacobian matrix.
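To make the layout convention concrete, here is a small numerical sketch (not part of the original notes, and assuming NumPy is available): a finite-difference approximation of (3) for an arbitrary test function. The helper name derivative_wrt_vector is hypothetical. The result is an n × m array, i.e. the transpose of the Jacobian.

```python
import numpy as np

def f(x):
    # Arbitrary smooth test function f : R^3 -> R^2
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

def derivative_wrt_vector(f, x, eps=1e-6):
    """Denominator-layout derivative as in (3): entry (i, j) approximates d f_j / d x_i."""
    f0 = f(x)
    D = np.zeros((x.size, f0.size))
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += eps
        D[i, :] = (f(x_step) - f0) / eps
    return D

x = np.array([0.5, -1.0, 2.0])
print(derivative_wrt_vector(f, x).shape)  # (3, 2): n x m, the transpose of the Jacobian
```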


The derivative of a scalar function f : R^{m×n} → R with respect to matrix X ∈ R^{m×n} is

\frac{\partial f(X)}{\partial X} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f(X)}{\partial X_{11}} & \frac{\partial f(X)}{\partial X_{12}} & \cdots & \frac{\partial f(X)}{\partial X_{1n}} \\
\frac{\partial f(X)}{\partial X_{21}} & \frac{\partial f(X)}{\partial X_{22}} & \cdots & \frac{\partial f(X)}{\partial X_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f(X)}{\partial X_{m1}} & \frac{\partial f(X)}{\partial X_{m2}} & \cdots & \frac{\partial f(X)}{\partial X_{mn}}
\end{bmatrix}    (4)

Observe that (1) is just a special case of (4) for column vectors. Often (as in [3]) the gradient notation is used as an alternative to the notation used above, for example:

\nabla_x f(x) = \frac{\partial f(x)}{\partial x}    (5)

\nabla_X f(X) = \frac{\partial f(X)}{\partial X}    (6)
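As a quick sanity check of definition (4), the sketch below (not from the original notes; it assumes NumPy) compares a finite-difference approximation of ∂f(X)/∂X with the known result for the hypothetical example f(X) = a^T X b, whose derivative in this layout is the outer product a b^T.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
a, b = rng.standard_normal(m), rng.standard_normal(n)
X = rng.standard_normal((m, n))

f = lambda X: a @ X @ b   # scalar function of a matrix
eps = 1e-6

# Element-wise finite differences, laid out as in (4)
D = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        X_step = X.copy()
        X_step[i, j] += eps
        D[i, j] = (f(X_step) - f(X)) / eps

print(np.allclose(D, np.outer(a, b), atol=1e-4))  # True: derivative is a b^T
```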

3 Identities

3.1 Scalar-by-vector product rule

If a ∈ R^m, b ∈ R^n and C ∈ R^{m×n} then

a^T C b = \sum_{i=1}^{m} a_i (Cb)_i = \sum_{i=1}^{m} a_i \left( \sum_{j=1}^{n} C_{ij} b_j \right) = \sum_{i=1}^{m} \sum_{j=1}^{n} C_{ij} a_i b_j    (7)

Now assume we have vector functions u : R^q → R^m and v : R^q → R^n, and a matrix A ∈ R^{m×n}. The vector functions u and v are functions of x ∈ R^q, but A is not. We want to find an identity for

\frac{\partial\, u^T A v}{\partial x}    (8)

From (7), we have:
\begin{aligned}
\left( \frac{\partial\, u^T A v}{\partial x} \right)_l
&= \frac{\partial\, u^T A v}{\partial x_l}
 = \frac{\partial}{\partial x_l} \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i v_j \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{\partial}{\partial x_l} A_{ij} u_i v_j \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \left( \frac{\partial u_i}{\partial x_l} v_j + u_i \frac{\partial v_j}{\partial x_l} \right) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i \frac{\partial v_j}{\partial x_l}
\end{aligned}    (9)

Now we can show (by writing out the elements [Notebook, 2012-05-22]) that:
\begin{aligned}
\left( \frac{\partial u}{\partial x} A v + \frac{\partial v}{\partial x} A^T u \right)_l
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^{m} \sum_{j=1}^{n} (A^T)_{ji} \frac{\partial v_j}{\partial x_l} u_i \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \frac{\partial u_i}{\partial x_l} v_j + \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i \frac{\partial v_j}{\partial x_l}
\end{aligned}    (10)

A comparison of (9) and (10) completes the proof that

\frac{\partial\, u^T A v}{\partial x} = \frac{\partial u}{\partial x} A v + \frac{\partial v}{\partial x} A^T u    (11)
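A numerical spot check of (11) (not part of the original notes; it assumes NumPy and uses arbitrary test functions u and v):

```python
import numpy as np

rng = np.random.default_rng(1)
q, m, n = 3, 4, 2
A = rng.standard_normal((m, n))

u = lambda x: np.array([np.sin(x[0]), x[1] * x[2], x[0] ** 2, np.cos(x[1])])  # R^q -> R^m
v = lambda x: np.array([x[0] + x[2], np.exp(0.1 * x[1])])                     # R^q -> R^n

def ddx(g, x, eps=1e-6):
    """Finite-difference derivative w.r.t. x, rows indexed by the components of x (denominator layout)."""
    g0 = g(x)
    return np.array([(g(x + eps * e) - g0) / eps for e in np.eye(x.size)])

x = rng.standard_normal(q)
lhs = ddx(lambda x: u(x) @ A @ v(x), x)              # left-hand side of (11)
rhs = ddx(u, x) @ A @ v(x) + ddx(v, x) @ A.T @ u(x)  # right-hand side of (11)
print(np.allclose(lhs, rhs, atol=1e-4))              # True
```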

3.2 Useful identities from scalar-by-vector product rule

From (11) it follows, with vectors and matrices b ∈ R^m, d ∈ R^q, x ∈ R^n, B ∈ R^{m×n}, C ∈ R^{m×q} and D ∈ R^{q×n}, that

\frac{\partial (Bx+b)^T C (Dx+d)}{\partial x} = \frac{\partial (Bx+b)}{\partial x}\, C (Dx+d) + \frac{\partial (Dx+d)}{\partial x}\, C^T (Bx+b)    (12)

resulting in the identity

\frac{\partial (Bx+b)^T C (Dx+d)}{\partial x} = B^T C (Dx+d) + D^T C^T (Bx+b)    (13)

by using the easily verifiable identities:

\frac{\partial (u(x) + v(x))}{\partial x} = \frac{\partial u(x)}{\partial x} + \frac{\partial v(x)}{\partial x}    (14)

\frac{\partial Ax}{\partial x} = A^T    (15)

\frac{\partial a}{\partial x} = 0    (16)

Some other useful special cases of (11):

\frac{\partial\, x^T A b}{\partial x} = Ab    (17)

\frac{\partial\, x^T A x}{\partial x} = (A + A^T) x    (18)

\frac{\partial\, x^T A x}{\partial x} = 2Ax \quad \text{if } A \text{ is symmetric}    (19)
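These special cases can be spot-checked the same way. A sketch (not from the original notes, assuming NumPy) verifying (13) and (18) with random matrices and central differences:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q = 3, 4, 2
B, C, D = rng.standard_normal((m, n)), rng.standard_normal((m, q)), rng.standard_normal((q, n))
b, d = rng.standard_normal(m), rng.standard_normal(q)
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def grad(f, x, eps=1e-6):
    """Central-difference gradient, laid out as a column vector as in (1)."""
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(x.size)])

f13 = lambda x: (B @ x + b) @ C @ (D @ x + d)
print(np.allclose(grad(f13, x), B.T @ C @ (D @ x + d) + D.T @ C.T @ (B @ x + b), atol=1e-5))  # (13)

f18 = lambda x: x @ A @ x
print(np.allclose(grad(f18, x), (A + A.T) @ x, atol=1e-5))  # (18)
```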

3.3 Derivatives of determinant

See [7, p. 374] for the definition of cofactors. Also see [Notebook, 2012-05-22].
We can write the determinant of matrix X ∈ R^{n×n} as

|X| = X_{i1} C_{i1} + X_{i2} C_{i2} + \ldots + X_{in} C_{in} = \sum_{j=1}^{n} X_{ij} C_{ij}    (20)

Thus the derivative will be

\begin{aligned}
\left( \frac{\partial |X|}{\partial X} \right)_{kl}
&= \frac{\partial}{\partial X_{kl}} \left\{ X_{i1} C_{i1} + X_{i2} C_{i2} + \ldots + X_{in} C_{in} \right\} \\
&= \frac{\partial}{\partial X_{kl}} \left\{ X_{k1} C_{k1} + X_{k2} C_{k2} + \ldots + X_{kn} C_{kn} \right\}
\qquad \text{(we can expand along any row } i \text{, so choose } i = k \text{)} \\
&= C_{kl}
\end{aligned}    (21)

Thus (see [7, p. 386])

\frac{\partial |X|}{\partial X} = \operatorname{cofactor} X = (\operatorname{adj} X)^T    (22)

But we know that the inverse of X is given by [7, p. 387]

X^{-1} = \frac{1}{|X|} \operatorname{adj} X    (23)

thus

\operatorname{adj} X = |X|\, X^{-1}    (24)

which, when substituted into (22), results in the identity

\frac{\partial |X|}{\partial X} = |X| (X^{-1})^T    (25)

From (25) we can also write

\left( \frac{\partial \ln |X|}{\partial X} \right)_{kl} = \frac{\partial \ln |X|}{\partial X_{kl}} = \frac{1}{|X|} \frac{\partial |X|}{\partial X_{kl}} = \frac{1}{|X|} \left( |X| (X^{-1})^T \right)_{kl}    (26)

giving the identity

\frac{\partial \ln |X|}{\partial X} = (X^{-1})^T    (27)
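Finally, a numerical check of (27) (not part of the original notes; it assumes NumPy), comparing finite differences of ln|X| with (X^{-1})^T for a random, well-conditioned matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n)) + 3 * np.eye(n)   # diagonal shift keeps the determinant away from zero

logdet = lambda X: np.log(np.abs(np.linalg.det(X)))
eps = 1e-6

D = np.zeros((n, n))
for k in range(n):
    for l in range(n):
        X_step = X.copy()
        X_step[k, l] += eps
        D[k, l] = (logdet(X_step) - logdet(X)) / eps

print(np.allclose(D, np.linalg.inv(X).T, atol=1e-4))  # True: matches (27)
```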

References

[1] Matrix calculus. [Online]. Available: http://en.wikipedia.org/wiki/Matrix_calculus

[2] K. B. Petersen and M. S. Pedersen, "The Matrix Cookbook," 2008.

[3] A. Ng, Machine Learning. Class notes for CS229, Stanford Engineering Everywhere, Stanford University, 2008. [Online]. Available: http://see.stanford.edu

[4] S. R. Searle, Matrix Algebra Useful for Statistics. New York, NY: John Wiley & Sons, 1982.

[5] J. R. Schott, Matrix Analysis for Statistics. New York, NY: John Wiley & Sons, 1996.

[6] T. P. Minka, "Old and New Matrix Algebra Useful for Statistics," 2000. [Online]. Available: http://research.microsoft.com/en-us/um/people/minka/papers/matrix

[7] D. G. Zill and M. R. Cullen, Advanced Engineering Mathematics, 3rd ed. Jones and Bartlett, 2006.
