Matrix Calculus 2

This document provides notation and rules for calculating derivatives of matrix operations and functions. It defines notation for real and complex matrices and vectors. It then details derivatives of linear, quadratic and cubic functions of matrices as well as derivatives of inverses, traces and determinants of matrices.

Matrix Reference Manual: Matrix Calculus 11/30/11 9:03 AM

Matrix Calculus

Contents of Calculus Section


Notation
Differentials of Linear, Quadratic and Cubic Products
Differentials of Inverses, Trace and Determinant
Hessian matrices

Notation
j is the square root of -1
X^R and X^I are the real and imaginary parts of X = X^R + j X^I
X^C is the complex conjugate of X
X: denotes the long column vector formed by concatenating the columns of X (see vectorization).
A ⊗ B = KRON(A,B), the Kronecker product
A • B is the Hadamard or elementwise product
Matrices and vectors A, B, C do not depend on X
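
As a rough illustration of this notation, the sketch below maps the vectorization operator, the Kronecker product and the Hadamard product onto NumPy calls. The helper name vec and the example matrices are assumptions made for the example, not part of the manual. The final check uses the standard identity (A X B): = (B^T ⊗ A) X:, which is how the two operators usually interact.

```python
import numpy as np

def vec(M):
    """The ':' operator: stack the columns of M into one long vector."""
    return np.asarray(M).reshape(-1, order="F")

A = np.array([[1.0, 2.0], [3.0, 4.0]])
X = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
B = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])

x_vec = vec(X)             # columns of X, one after another
kron_AX = np.kron(A, X)    # Kronecker product, a (2*2) x (2*3) block matrix
hadamard = A * A           # Hadamard (elementwise) product

# Standard identity linking vectorization and the Kronecker product
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))
```

Note that NumPy stores arrays row by row by default, so order="F" is needed to obtain the column-stacking convention used on this page.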

Derivatives
In the main part of this page we express results in terms of differentials rather than derivatives for two
reasons: they avoid notational disagreements and they cope easily with the complex case. In most cases
however, the differentials have been written in the form dY: = dY/dX dX: so that the corresponding
derivative may be easily extracted.

Derivatives with respect to a real matrix

If X is p#q and Y is m#n, then dY: = dY/dX dX: where the derivative dY/dX is a large mn#pq matrix. If
X and/or Y are column vectors or scalars, then the vectorization operator : has no effect and may be
omitted. dY/dX is also called the Jacobian Matrix of Y: with respect to X: and det(dY/dX) is the
corresponding Jacobian. The Jacobian occurs when changing variables in an integration:
Integral(f(Y)dY:)=Integral(f(Y(X)) det(dY/dX) dX:).
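
The definition above can be checked numerically. The following is a minimal sketch, assuming a central-difference helper num_jacobian (a name invented for this example) and random test matrices. It builds the mn#pq matrix dY/dX column by column for Y = A^T X B and compares it with (B ⊗ A)^T, the value given by the linear rule d(A^T X B): = (B ⊗ A)^T dX: listed later on this page.

```python
import numpy as np

def vec(M):
    return np.asarray(M).reshape(-1, order="F")

def num_jacobian(f, X, h=1e-6):
    """Central-difference estimate of dY/dX, defined by dY: = dY/dX dX:."""
    X = np.asarray(X, dtype=float)
    cols = []
    for k in range(X.size):
        step = np.zeros(X.size)
        step[k] = h
        E = step.reshape(X.shape, order="F")   # perturb the k-th entry of X:
        cols.append((vec(f(X + E)) - vec(f(X - E))) / (2 * h))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
p, q, m, n = 2, 3, 4, 2
A = rng.standard_normal((p, m))
B = rng.standard_normal((q, n))
X = rng.standard_normal((p, q))

J = num_jacobian(lambda M: A.T @ M @ B, X)
print(J.shape)                                        # (8, 6), i.e. mn x pq
print(np.allclose(J, np.kron(B, A).T, atol=1e-6))
```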

Although they do not generalise so well, other authors use alternative notations for the cases when X and
Y are both vectors or when one is a scalar. In particular:

dy/dx is sometimes written as a column vector rather than a row vector
dy/dx is sometimes transposed from the above definition or else is sometimes written dy/dx^T to
emphasise the correspondence between the columns of the derivative and those of x^T.
dY/dx and dy/dX are often written as matrices rather than, as here, a column vector and row vector
respectively. The matrix form may be converted to the form used here by appending : or :^T
respectively.

http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html Page 1 of 6

Derivatives with respect to a complex matrix

If X is complex then dY: = dY/dX dX: holds if and only if Y(X) is an analytic function, which normally
implies that Y(X) does not depend on X^C or X^H.

Even for non-analytic functions we can write uniquely dY: = dY/dX dX: + dY/dX^C dX^C: provided that
Y is analytic with respect to X and X^C individually (or equivalently with respect to X^R and X^I individually).
dY/dX is the Generalized Complex Derivative and dY/dX^C is the Complex Conjugate Derivative [R.4,
R.9].

We define the generalized derivatives in terms of partial derivatives with respect to XR and XI:

dY/dX = ½ (dY/dX^R - j dY/dX^I)
dY/dX^C = (dY^C/dX)^C = ½ (dY/dX^R + j dY/dX^I)

We have the following relationships for both analytic and non-analytic functions Y(X):

Cauchy-Riemann equations: The following are equivalent:
  Y(X) is an analytic function of X
  dY: = dY/dX dX:
  dY/dX^C = 0 for all X
  dY/dX^R + j dY/dX^I = 0 for all X
dY: = dY/dX dX: + dY/dX^C dX^C:
dY/dX^R = dY/dX + dY/dX^C
dY/dX^I = j (dY/dX - dY/dX^C)
Chain rule: If Z is a function of Y which is itself a function of X, then dZ/dX = dZ/dY dY/dX. This
is the same as for real derivatives.
Real-valued: If Y(X) is real for all complex X, then
  dY/dX^C = (dY/dX)^C
  dY: = 2 (dY/dX dX:)^R
  If W(X) is analytic with W(X) = Y(X) for all real X, then dW/dX = 2 (dY/dX)^R for all real X
    Example: If C = C^H, y(x) = x^H C x and w(x) = x^T C x, then dy/dx = x^H C and dw/dx = 2 x^T C^R
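
The example above can be reproduced numerically. This is only a sketch: the helper wirtinger and the particular C and x are assumptions chosen for illustration. It estimates the partial derivatives with respect to the real and imaginary parts by central differences and combines them using the two defining formulas for the generalized derivatives.

```python
import numpy as np

def wirtinger(f, x, h=1e-6):
    """Estimate (df/dx, df/dx^C) for a scalar f of a complex vector x.

    Combines partials w.r.t. the real and imaginary parts:
    df/dx = 1/2 (df/dx^R - j df/dx^I), df/dx^C = 1/2 (df/dx^R + j df/dx^I).
    """
    x = np.asarray(x, dtype=complex)
    d_re = np.zeros(x.size, dtype=complex)
    d_im = np.zeros(x.size, dtype=complex)
    for k in range(x.size):
        e = np.zeros(x.size)
        e[k] = h
        d_re[k] = (f(x + e) - f(x - e)) / (2 * h)            # partial w.r.t. real part
        d_im[k] = (f(x + 1j * e) - f(x - 1j * e)) / (2 * h)  # partial w.r.t. imaginary part
    return 0.5 * (d_re - 1j * d_im), 0.5 * (d_re + 1j * d_im)

C = np.array([[2.0, 1.0 + 1.0j], [1.0 - 1.0j, 3.0]])   # Hermitian: C equals its conjugate transpose
y = lambda x: x.conj() @ C @ x                          # x^H C x, real-valued, not analytic
w = lambda x: x @ C @ x                                 # x^T C x, analytic

x = np.array([1.0 + 2.0j, -1.0 + 0.5j])
dy_dx, dy_dxc = wirtinger(y, x)
dw_dx, dw_dxc = wirtinger(w, x)

print(np.allclose(dy_dx, x.conj() @ C, atol=1e-6))   # dy/dx = x^H C
print(np.allclose(dw_dxc, 0, atol=1e-6))             # analytic: conjugate derivative vanishes
```

For a real x, the same helper can be used to confirm that dw/dx equals twice the real part of dy/dx.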

Complex Gradient Vector

If f(x) is a real function of a complex vector then df/dx^C = (df/dx)^C and we can define grad(f(x)) = 2
(df/dx)^H = (df/dx^R + j df/dx^I)^T as the Complex Gradient Vector [R.9] with the following properties:

grad(f(x)) is zero at an extreme value of f.
grad(f(x)) points in the direction of steepest slope of f(x)
The magnitude of the steepest slope is equal to |grad(f(x))|. Specifically, if g(x) = grad(f(x)), then
lim_{a→0} a^{-1} (f(x + a g(x)) - f(x)) = |g(x)|^2
grad(f(x)) is normal to the surface f(x) = constant which means that it can be used for gradient
ascent/descent algorithms.
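
A small sketch of these properties for the least-squares cost f(x) = (Ax-b)^H (Ax-b), for which df/dx = (Ax-b)^H A and hence grad f = 2 A^H (Ax-b). The matrices, the starting point and the step size mu are assumptions picked for this example; the step size in particular is only known to be small enough for this specific A.

```python
import numpy as np

A = np.array([[1.0, 1.0j], [0.0, 2.0], [1.0 - 1.0j, 1.0]])
b = np.array([1.0, 2.0j, -1.0])

def f(x):
    r = A @ x - b
    return np.vdot(r, r).real            # (Ax-b)^H (Ax-b), real for all complex x

def grad(x):
    # df/dx = (Ax-b)^H A, so grad f = 2 (df/dx)^H = 2 A^H (Ax-b)
    return 2 * A.conj().T @ (A @ x - b)

x = np.zeros(2, dtype=complex)
g0 = grad(x)
a = 1e-6
slope = (f(x + a * g0) - f(x)) / a       # approaches |g0|^2 as a -> 0

mu = 0.1                                 # hypothetical step size, small enough for this A
history = [f(x)]
for _ in range(200):
    x = x - mu * grad(x)                 # steepest descent step
    history.append(f(x))

print(slope, np.vdot(g0, g0).real)
print(np.linalg.norm(grad(x)))           # close to zero at the minimum
```

The descent iterates can be compared against np.linalg.lstsq(A, b, rcond=None)[0], which solves the same least-squares problem directly.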

Basic Properties
We may write the following differentials unambiguously without parentheses:


Transpose: dY^T = d(Y^T) = (dY)^T
Hermitian Transpose: dY^H = d(Y^H) = (dY)^H
Conjugate: dY^C = d(Y^C) = (dY)^C
Linearity: d(Y+Z) = dY + dZ
Chain Rule: If Z is a function of Y which is itself a function of X, then for both the normal and the
generalized complex derivative: dZ: = dZ/dY dY: = dZ/dY dY/dX dX:
Product Rule: d(YZ) = Y dZ + dY Z
d(YZ): = (I ⊗ Y) dZ: + (Z^T ⊗ I) dY: = ((I ⊗ Y) dZ/dX + (Z^T ⊗ I) dY/dX) dX:
Hadamard Product: d(Y • Z) = Y • dZ + dY • Z
Kronecker Product: d(Y ⊗ Z) = Y ⊗ dZ + dY ⊗ Z
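
To make the product rule concrete, the sketch below takes Y = AX and Z = XB (an arbitrary choice for illustration, as are the random matrices), applies a small perturbation dX, and compares the matrix form, the vectorized Kronecker form, and the actual change in YZ. The last of these agrees only to first order in dX.

```python
import numpy as np

def vec(M):
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
X = rng.standard_normal((3, 3))
dX = 1e-6 * rng.standard_normal((3, 3))   # a small perturbation of X

Y, Z = A @ X, X @ B                        # both depend on X
dY, dZ = A @ dX, dX @ B                    # exact differentials of the linear maps

# Product rule in matrix form and in vectorized (Kronecker) form
d_matrix = Y @ dZ + dY @ Z
d_vector = np.kron(np.eye(2), Y) @ vec(dZ) + np.kron(Z.T, np.eye(2)) @ vec(dY)

# Actual change in YZ; it differs from the differential only at second order in dX
actual = (A @ (X + dX)) @ ((X + dX) @ B) - Y @ Z

print(np.max(np.abs(vec(d_matrix) - d_vector)))
print(np.max(np.abs(actual - d_matrix)))
```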

Differentials of Linear Functions


d(Ax) = d(x^T A^T): = A dx
d(x^T a) = d(a^T x) = a^T dx
d(A^T X B): = (A^T dX B): = (B ⊗ A)^T dX:
d(a^T X b) = (b ⊗ a)^T dX: = (a b^T):^T dX:
d(a^T X a) = d(a^T X^T a) = (a ⊗ a)^T dX: = (a a^T):^T dX:
d(XB): = (dX B): = (B^T ⊗ I) dX:
d(x b^T): = (dx b^T): = (b ⊗ I) dx
d(a^T X^T b) = (a ⊗ b)^T dX: = (b a^T):^T dX:
[x: Complex]
d(x^H A): = A^T dx^C
Writing I_n = I[n#n] and T_{q,m} = TVEC(q,m),
d(X[m#n] ⊗ A[p#q]): = (I_n ⊗ T_{q,m} ⊗ I_p)(I_{mn} ⊗ A:) dX: = (I_{nq} ⊗ T_{m,p})(I_n ⊗ A: ⊗ I_m) dX:
d(A[p#q] ⊗ X[m#n]): = (I_q ⊗ T_{n,p} ⊗ I_m)(A: ⊗ I_{mn}) dX: = (T_{m,n} ⊗ I_{pq})(I_n ⊗ A: ⊗ I_m) dX:

Differentials of Quadratic Products


d((Ax+b)^T C (Dx+e)) = ((Ax+b)^T C D + (Dx+e)^T C^T A) dx
d(x^T C x) = x^T (C + C^T) dx = [C=C^T] 2 x^T C dx
d(x^T x) = 2 x^T dx
d((Ax+b)^T (Dx+e)) = ((Ax+b)^T D + (Dx+e)^T A) dx
d((Ax+b)^T (Ax+b)) = 2 (Ax+b)^T A dx
d((Ax+b)^T C (Ax+b)) = [C=C^T] 2 (Ax+b)^T C A dx
d((Ax+b)^H C (Dx+e)) = (Ax+b)^H C D dx + (Dx+e)^T C^T A^C dx^C
d(x^H C x) = x^H C dx + x^T C^T dx^C = [C=C^H] 2 (x^H C dx)^R
d(x^H x) = 2 (x^H dx)^R
d(a^T X^T X b) = (X (a b^T + b a^T)):^T dX:
d(a^T X^T X a) = 2 (X a a^T):^T dX:
d(a^T X^T C X b) = (C^T X a b^T + C X b a^T):^T dX:
d(a^T X^T C X a) = ((C + C^T) X a a^T):^T dX: = [C=C^T] 2 (C X a a^T):^T dX:

d((Xa+b)^T C (Xa+b)) = ((C + C^T)(Xa+b) a^T):^T dX:
d(X^2): = (X dX + dX X): = (I ⊗ X + X^T ⊗ I) dX:
d(X^T C X): = (X^T C dX): + (d(X^T) C X): = (I ⊗ X^T C) dX: + (X^T C^T ⊗ I) dX^T:
d(X^H C X): = (X^H C dX): + (d(X^H) C X): = (I ⊗ X^H C) dX: + (X^T C^T ⊗ I) dX^H:
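
Results of the form df = G:^T dX: say that the matrix G holds the partial derivatives of the scalar f with respect to each entry of X. As a sketch of how one of the quadratic rules can be checked, the code below compares the rule for a^T X^T C X b against a finite-difference gradient. The helper num_grad and the random test data are assumptions made for this example.

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient G with G[i, j] = df/dX[i, j]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 2))
C = rng.standard_normal((3, 3))          # deliberately not symmetric
a = rng.standard_normal(2)
b = rng.standard_normal(2)

f = lambda M: a @ M.T @ C @ M @ b        # the scalar a^T X^T C X b
G_rule = C.T @ X @ np.outer(a, b) + C @ X @ np.outer(b, a)
G_num = num_grad(f, X)

print(np.allclose(G_rule, G_num, atol=1e-6))
```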

Differentials of Cubic Products


d(x x^T A x) = (x x^T (A + A^T) + (x^T A x) I) dx

Differentials of Inverses
d(X^{-1}) = -X^{-1} dX X^{-1} [2.1]
d(X^{-1}): = -(X^{-T} ⊗ X^{-1}) dX:
d(a^T X^{-1} b) = -(X^{-T} a b^T X^{-T}):^T dX: = -(a b^T):^T (X^{-T} ⊗ X^{-1}) dX: [2.6]
d(tr(A^T X^{-1} B)) = d(tr(B^T X^{-T} A)) = -(X^{-T} A B^T X^{-T}):^T dX: = -(A B^T):^T (X^{-T} ⊗ X^{-1}) dX:
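
The first two rules can be illustrated with a small perturbation. In this sketch the matrix X and the perturbation dX are arbitrary values chosen for the example. The differential is compared with the true change in the inverse, which it should match up to terms of second order in dX.

```python
import numpy as np

def vec(M):
    return np.asarray(M).reshape(-1, order="F")

X = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])   # det(X) = 13
dX = 1e-6 * np.array([[1.0, -2.0, 0.5], [0.0, 1.0, 3.0], [-1.0, 2.0, 1.0]])
Xi = np.linalg.inv(X)

d_matrix = -Xi @ dX @ Xi                   # matrix form of the differential of the inverse
d_vector = -np.kron(Xi.T, Xi) @ vec(dX)    # vectorized (Kronecker) form
actual = np.linalg.inv(X + dX) - Xi        # true change, equal to first order

print(np.max(np.abs(vec(d_matrix) - d_vector)))
print(np.max(np.abs(actual - d_matrix)))
```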

Differentials of Trace
Note: matrix dimensions must result in an n#n argument for tr().

d(tr(Y)) = tr(dY)
d(tr(X)) = d(tr(X^T)) = I:^T dX: [2.4]
d(tr(X^k)) = k (X^{k-1})^T:^T dX:
d(tr(A X^k)) = (SUM_{r=0:k-1} (X^r A X^{k-r-1})^T):^T dX:
d(tr(A X^{-1} B)) = -(X^{-1} B A X^{-1})^T:^T dX: = -(X^{-T} A^T B^T X^{-T}):^T dX: [2.5]
d(tr(A X^{-1})) = d(tr(X^{-1} A)) = -(X^{-T} A^T X^{-T}):^T dX:
d(tr(A^T X B^T)) = d(tr(B X^T A)) = (AB):^T dX: [2.4]
d(tr(X A^T)) = d(tr(A^T X)) = d(tr(X^T A)) = d(tr(A X^T)) = A:^T dX:
d(tr(A^T X^{-1} B^T)) = d(tr(B X^{-T} A)) = -(X^{-T} A B X^{-T}):^T dX: = -(AB):^T (X^{-T} ⊗ X^{-1}) dX:
d(tr(A X B X^T C)) = (A^T C^T X B^T + C A X B):^T dX:
d(tr(X A X^T)) = d(tr(A X^T X)) = d(tr(X^T X A)) = (X (A + A^T)):^T dX:
d(tr(X^T A X)) = d(tr(A X X^T)) = d(tr(X X^T A)) = ((A + A^T) X):^T dX:
d(tr(A X B X)) = (A^T X^T B^T + B^T X^T A^T):^T dX:
d(tr((AXb+c)(AXb+c)^T)) = 2 (A^T (AXb+c) b^T):^T dX:
d(tr((X^T C X)^{-1} A)) = [C: symmetric] d(tr(A (X^T C X)^{-1})) = -((C X (X^T C X)^{-1})(A + A^T)(X^T C X)^{-1}):^T dX:
d(tr((X^T C X)^{-1} (X^T B X))) = [B,C: symmetric] d(tr((X^T B X)(X^T C X)^{-1})) = 2 (B X (X^T C X)^{-1} - (C X (X^T C X)^{-1}) X^T B X (X^T C X)^{-1}):^T dX:
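
One inexpensive way to sanity-check a trace rule is a directional derivative: for any direction D, the change in f along D should equal G:^T D:, which is simply the sum of the elementwise product of G and D. The sketch below does this for tr(A X B X^T C); the matrix sizes and random data are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 3, 2, 4
A = rng.standard_normal((p, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, p))
X = rng.standard_normal((m, n))
D = rng.standard_normal((m, n))           # an arbitrary direction dX

f = lambda M: np.trace(A @ M @ B @ M.T @ C)
G = A.T @ C.T @ X @ B.T + C @ A @ X @ B   # gradient matrix from the rule

h = 1e-6
numeric = (f(X + h * D) - f(X - h * D)) / (2 * h)   # directional derivative along D
predicted = np.sum(G * D)                           # inner product of the vectorized G and D

print(numeric, predicted)
```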

Differentials of Determinant
Note: matrix dimensions must result in an n#n argument for det(). Some of the expressions below involve
inverses: these forms apply only if the quantity being inverted is square and non-singular; alternative
forms involving the adjoint, ADJ(), do not have the non-singular requirement.

d(det(X)) = d(det(X^T)) = ADJ(X^T):^T dX: = det(X) (X^{-T}):^T dX: [2.7]
d(det(A^T X B)) = d(det(B^T X^T A)) = (A ADJ(A^T X B)^T B^T):^T dX: = [A,B: nonsingular] det(A^T X B) × (X^{-T}):^T dX: [2.8]
d(ln(det(A^T X B))) = [A,B: nonsingular] (X^{-T}):^T dX: [2.9]
d(ln(det(X))) = (X^{-T}):^T dX:
d(det(X^k)) = d(det(X)^k) = k × det(X^k) × (X^{-T}):^T dX: [2.10]
d(ln(det(X^k))) = k × (X^{-T}):^T dX:
d(det(X^T C X)) = [C=C^T] 2 det(X^T C X) × (C X (X^T C X)^{-1}):^T dX: [2.11]
    = [C=C^T, CX: nonsingular] 2 det(X^T C X) × (X^{-T}):^T dX:
d(ln(det(X^T C X))) = [C=C^T] 2 (C X (X^T C X)^{-1}):^T dX:
    = [C=C^T, CX: nonsingular] 2 (X^{-T}):^T dX:
d(det(X^H C X)) = det(X^H C X) × ((C^T X^C (X^T C^T X^C)^{-1}):^T dX: + (C X (X^H C X)^{-1}):^T dX^C:) [2.12]
d(ln(det(X^H C X))) = (C^T X^C (X^T C^T X^C)^{-1}):^T dX: + (C X (X^H C X)^{-1}):^T dX^C: [2.13]
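
The two most commonly used determinant rules, for det(X) and ln(det(X)), can be checked along a direction D in the same spirit. This is a sketch with an arbitrary 3#3 matrix chosen so that det(X) is positive; np.linalg.slogdet returns the log of the absolute determinant, which coincides with ln(det(X)) only under that assumption.

```python
import numpy as np

X = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])    # det(X) = 13
D = np.array([[1.0, -2.0, 0.5], [0.0, 1.0, 3.0], [-1.0, 2.0, 1.0]])  # direction dX
h = 1e-6

G_logdet = np.linalg.inv(X).T             # gradient matrix of ln(det(X)): the inverse transpose
G_det = np.linalg.det(X) * G_logdet       # gradient matrix of det(X)

logdet = lambda M: np.linalg.slogdet(M)[1]   # ln|det(M)|; det stays positive near this X
num_logdet = (logdet(X + h * D) - logdet(X - h * D)) / (2 * h)
num_det = (np.linalg.det(X + h * D) - np.linalg.det(X - h * D)) / (2 * h)

print(num_logdet, np.sum(G_logdet * D))
print(num_det, np.sum(G_det * D))
```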

Jacobian
dY/dX is called the Jacobian Matrix of Y: with respect to X: and J_X(Y) = det(dY/dX) is the corresponding
Jacobian. The Jacobian occurs when changing variables in an integration:
Integral(f(Y)dY:)=Integral(f(Y(X)) det(dY/dX) dX:).

J_X(X[n#n]^{-1}) = (-1)^n det(X)^{-2n}
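
The closed form for the Jacobian of the matrix inverse follows from the differential d(X^{-1}): = -(X^{-T} ⊗ X^{-1}) dX: given earlier. The sketch below, with two hand-picked matrices and a hypothetical helper name, evaluates the determinant of that Jacobian matrix directly and prints it next to the closed form for n = 2 and n = 3, which also shows the sign change with n.

```python
import numpy as np

def jacobian_of_inverse(X):
    """Determinant of the Jacobian matrix of the inverse, -(X^-T kron X^-1)."""
    Xi = np.linalg.inv(X)
    return np.linalg.det(-np.kron(Xi.T, Xi))

X2 = np.array([[2.0, 1.0], [1.0, 3.0]])                              # n = 2, det = 5
X3 = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])   # n = 3, det = 13

for X in (X2, X3):
    n = X.shape[0]
    closed_form = (-1) ** n * np.linalg.det(X) ** (-2 * n)
    print(n, jacobian_of_inverse(X), closed_form)
```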

Hessian matrix
If f is a real function of x then the Hermitian matrix H_x f = (d/dx (df/dx)^H)^T is the Hessian matrix of f(x).
A value of x for which grad f(x) = 0 corresponds to a minimum, maximum or saddle point according to
whether H_x f is positive definite, negative definite or indefinite.

[Real] H_x f = d/dx (df/dx)^T
  H_x f is symmetric
  H_x (a^T x) = 0
  H_x ((Ax+b)^T C (Dx+e)) = A^T C D + D^T C^T A
  H_x ((Ax+b)^T (Dx+e)) = A^T D + D^T A
  H_x ((Ax+b)^T C (Ax+b)) = A^T (C + C^T) A = [C=C^T] 2 A^T C A
  H_x ((Ax+b)^T (Ax+b)) = 2 A^T A
  H_x (x^T C x) = C + C^T = [C=C^T] 2C
  H_x (x^T x) = 2I
[x: Complex] H_x f = (d/dx (df/dx)^H)^T = d/dx^C (df/dx)^T
  H_x f is hermitian

  H_x ((Ax+b)^H C (Ax+b)) = [C=C^H] (A^H C A)^T [2.14]
  H_x (x^H C x) = [C=C^H] C^T
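
For the real case, the Hessian rules can be compared with second differences. The following sketch uses a hypothetical helper num_hessian and random test data to check the rule for (Ax+b)^T C (Dx+e), then uses the eigenvalues of 2 A^T A to illustrate the positive-definite test for a minimum of (Ax+b)^T (Ax+b).

```python
import numpy as np

def num_hessian(f, x, h=1e-3):
    """Central second differences: H[i, j] approximates d^2 f / dx_i dx_j."""
    n = x.size
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            ei, ej = h * I[i], h * I[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

rng = np.random.default_rng(4)
A, D = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
C = rng.standard_normal((3, 3))
b, e = rng.standard_normal(3), rng.standard_normal(3)
x = rng.standard_normal(2)

f = lambda v: (A @ v + b) @ C @ (D @ v + e)     # general quadratic form in x
H_rule = A.T @ C @ D + D.T @ C.T @ A
H_num = num_hessian(f, x)

g = lambda v: (A @ v + b) @ (A @ v + b)         # squared norm of Ax+b
eigs = np.linalg.eigvalsh(2 * A.T @ A)          # all positive: a stationary point is a minimum

print(np.allclose(H_num, H_rule, atol=1e-6))
print(eigs)
```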

This page is part of The Matrix Reference Manual. Copyright © 1998-2005 Mike Brookes, Imperial
College, London, UK. See the file gfl.html for copying instructions. Please send any comments or
suggestions to "mike.brookes" at "imperial.ac.uk".
Updated: $Id: calculus.html,v 1.30 2011/01/14 16:28:04 dmb Exp $
