Appendix C
This appendix briefly addresses various matrix/linear-algebra basics, attempting to be a reference for the reader rather than a development. Many of the operations reviewed here, like matrix/tensor definition, basic operations, inversion, and factorization, are nominally implemented by routines in implementation packages for design (like Matlab). Further, this section attempts to enumerate and explain briefly the salient aspects of vector/matrix calculus that support the designs in Chapters ?? - 5 and the adaptive design implementations in Chapter 7.
Section C.1 reviews matrix basics. Many early topics are likely already familiar to this text's reader, but this section reviews them in terminology common to this text's other chapters and thus removes some ambiguities. Some reference is made to tensors, and to alternative products like the Hadamard and Kronecker products, to help avert direct consideration of tensors. Common, heavily used design factorizations like QR, Cholesky, singular-value, and eigenvalue decompositions appear in common terminology, along with some common matrix-inversion recursions. Ordering is also addressed, along with the Frobenius norm and inner product. Section C.2 progresses to matrix calculus, focusing first on real matrices and functions, but expanding then to complex matrices. Various useful derivatives are listed in Section C.2, with an attempt also to explain and hopefully avert some common mistakes with complex convex optimization that sometimes appear in widely used software packages. Section C.3 visits optimization of scalar and matrix functions, particularly looking at convex procedures and the appropriate descent algorithms' necessary "gradient" or "steepest descent" directional calculations. The Matrix Cookbook [3] (https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf) provides a more exhaustive list of matrix results and derivatives.
C.1 Vectors and Matrices
This text most often uses matrices in conjunction with MMSE designs (Appendix D addresses MMSE) and many related matrix and system concepts. In that context, and in all this text's main chapters, the notation x^∗ or A^∗ for the conjugate transpose of a vector or matrix, respectively, is sufficient, averting special separate notation for each of conjugate (no transpose), transpose, and both. However, general matrix calculus creates situations where the extra notation is necessary. This notation may be necessary in software algorithms' internal code (see Appendix G) that supplies results for use in the main chapters, where subsequent use only needs the consistent conjugate-and-transpose notation x^∗. For this Appendix (C), and for understanding detailed use in software, the notation x^t and A^t means transpose with no conjugate, while x^{t∗} and A^{t∗} means take the transpose and then the conjugate transpose of that, or equivalently just replace every element with its conjugate.
An m × n matrix A ∈ C^{m×n} has column-vector characterization1
where it is clear that often ã_i^t ≠ a_i^t and that the reordering causes any deviation. This ordering can become important when tensors effectively become elongated vectors.
A tensor adds additional indices (third, fourth, etc.) to a matrix so that there are columns, rows, pages, and so on. A vector is a special case of a matrix where either m or n is one. Similarly, a matrix is a special case of a tensor, and so on. Matlab calls tensors "arrays."
Multiplication of a vector by a matrix produces another vector, where the initial vector specifies the weights of the matrix's column vectors in a linear combination. Through multiplication, a matrix transforms the unit hypercube, for instance, into a polytope. The determinant of a square matrix, |A|, measures that polytope's volume. The determinant is the sum of the products of any row (column) vector's elements with their cofactors. The cofactors are equal to (−1)^{i+j} times the determinant of the submatrix remaining after the i-th row and j-th column have been deleted. This creates a recursive way to compute
1 The ordering here is similar to Matlab with the (1,1) entry in the upper left and the (m,n) entry in the lower right, which differs from this text's time/frequency and space orderings in various chapters. The text's order may simply map all column vectors according to a → J · a, where J is the Hankel identity with all ones on the antidiagonal and zeros elsewhere.
the determinant because a simple scalar is its own determinant, giving for example the 2 × 2 determinant |[a b ; c d]| = a · d − b · c. The adjoint (adjugate) matrix adj(A) replaces each element with its cofactor and takes the transpose.
The inverse of a square matrix, A^{−1}, exists when |A| ≠ 0 and is defined by A^{−1} ≜ adj(A)/|A|, so that A · A^{−1} = A^{−1} · A = I.
The trace of square matrix, trace{A}, is the sum of its diagonal elements.
The following statements are all true (with α ∈ C):
1. trace{A + B} = trace{A} + trace{B},
2. trace{α · A} = α · trace{A},
3. trace{A · B} = trace{B · A},
4. |A · B| = |A| · |B|,
5. |α · A| = α^m · |A|,
6. |A^{−1}| = 1/|A|, and
7. |I + u · v^∗| = 1 + v^∗ · u = (1 + u^∗ · v)^∗ .
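As a quick numerical spot-check of these properties, the following Matlab fragment can be run (the sizes and random test matrices are arbitrary, purely illustrative choices):

m = 4;
A = randn(m) + 1j*randn(m);   B = randn(m) + 1j*randn(m);
alpha = 2 - 3j;
u = randn(m,1) + 1j*randn(m,1);   v = randn(m,1) + 1j*randn(m,1);
abs(trace(A*B) - trace(B*A))           % property 3: ~0
abs(det(A*B) - det(A)*det(B))          % property 4: ~0
abs(det(alpha*A) - alpha^m*det(A))     % property 5: ~0
abs(det(inv(A)) - 1/det(A))            % property 6: ~0
abs(det(eye(m) + u*v') - (1 + v'*u))   % property 7: ~0 (v' is v∗ in Matlab)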
A symmetric (square) matrix has A = A^t, while a conjugate-symmetric, or Hermitian-symmetric, matrix has A = A^∗. A unitary matrix has A^{−1} = A^∗.
The rank of a matrix, ρ_A, is the maximum number of linearly independent columns or rows (see Appendix D for linear independence) of A, and ρ_A ≤ min(m, n).
\[
A^{-1} = \frac{1}{|A|}
\begin{bmatrix}
(-1)^{1+1}\,|A_{1,1}| & (-1)^{2+1}\,|A_{2,1}| & \cdots & (-1)^{n+1}\,|A_{n,1}| \\
(-1)^{1+2}\,|A_{1,2}| & (-1)^{2+2}\,|A_{2,2}| & \cdots & (-1)^{n+2}\,|A_{n,2}| \\
\vdots & \vdots & \ddots & \vdots \\
(-1)^{1+n}\,|A_{1,n}| & (-1)^{2+n}\,|A_{2,n}| & \cdots & (-1)^{n+n}\,|A_{n,n}|
\end{bmatrix} \tag{C.6}
\]
where A_{i,j} is the minor matrix formed by deleting the i-th row and j-th column of A. So, basically, replace each element by ± times its minor matrix's determinant and take the transpose before dividing by the overall determinant. The ± sign is always positive on the diagonal and alternates in sign in moving each position horizontally (or vertically) from the diagonal. The minor-matrix determinants without the plus/minus sign are often called minors, and with the sign attached they are the cofactors.
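A small Matlab fragment makes the cofactor construction of (C.6) concrete; the explicit double loop is purely illustrative (its cost grows far faster than O(n^3), so inv(A) is always preferred in practice):

n = 4;  A = randn(n);            % arbitrary nonsingular test matrix
adjA = zeros(n);
for i = 1:n
    for j = 1:n
        M = A;  M(i,:) = [];  M(:,j) = [];     % minor matrix A_{i,j}
        adjA(j,i) = (-1)^(i+j) * det(M);       % cofactor, placed in transposed position
    end
end
Ainv = adjA / det(A);
norm(Ainv - inv(A))                            % ~0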
The matrix inversion lemma (Sherman-Morrison-Woodbury identity) states that

(A + B · C · D)^{−1} = A^{−1} − A^{−1} · B · (C^{−1} + D · A^{−1} · B)^{−1} · D · A^{−1} ,   (C.7)

which of course requires A^{−1} and C^{−1} to exist (are nonsingular). This formula provides a means to update an inverse when more information becomes available or the matrix changes. For instance, if B = x, C = I, and D = y^∗, then this reduces to

(A + x · y^∗)^{−1} = A^{−1} − (A^{−1} · x) · (y^∗ · A^{−1}) / (1 + y^∗ · A^{−1} · x) ,   (C.8)
which considerably reduces the complexity of updating the inverse compared with performing a full matrix inversion.
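The following Matlab fragment illustrates the rank-one update (C.8) against a full re-inversion; the test data are arbitrary, and the diagonal loading of A merely keeps the example well-conditioned:

n = 5;
A = randn(n) + n*eye(n);   x = randn(n,1);   y = randn(n,1);
Ainv = inv(A);
Ainv_new = Ainv - (Ainv*x)*(y'*Ainv) / (1 + y'*Ainv*x);   % O(n^2) update
norm(Ainv_new - inv(A + x*y'))                            % ~0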
Another recursive form follows from
\[
\begin{bmatrix} A & B \\ D & C \end{bmatrix}^{-1} =
\begin{bmatrix}
(A - B\cdot C^{-1}\cdot D)^{-1} & -A^{-1}\cdot B\cdot (C - D\cdot A^{-1}\cdot B)^{-1} \\
-C^{-1}\cdot D\cdot (A - B\cdot C^{-1}\cdot D)^{-1} & (C - D\cdot A^{-1}\cdot B)^{-1}
\end{bmatrix} \tag{C.9}
\]
and builds larger-size inverses from smaller ones. Computation of an m × m inverse without any recursion requires on the order of O(m^3) calculations. The Matlab command inv(A) provides the matrix inverse for nonsingular inputs.
The pseudoinverse A^+ can be used when |A| = 0 or when m ≠ n, recognizing that such a situation corresponds to the equation A · x = b having either no exact solution (over-determined) or many solutions (under-determined). In the over-determined case, the pseudoinverse provides the x that minimizes the norm ‖A · x − b‖^2, while in the under-determined case it finds the solution with minimum norm ‖x‖^2 among the many solutions. The unique pseudoinverse satisfies the following:
1. A · A+ · A = A,
2. A+ · A · A+ = A+ ,
3. A · A+ = A+∗ · A∗ , and
4. A+ · A = A∗ · A+∗ .
A^+ = A^{−1} when A is square nonsingular. The Matlab command for the pseudoinverse is pinv(A). When solving the equation A · x = b, Matlab's x = A \ b also returns a least-squares solution; in the under-determined case, however, backslash returns a basic solution rather than the minimum-norm solution x = pinv(A) · b.
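The following Matlab fragment contrasts the two solvers on arbitrary random systems (the sizes are illustrative only):

A_over  = randn(6,3);   b_over  = randn(6,1);    % over-determined
A_under = randn(3,6);   b_under = randn(3,1);    % under-determined
norm(pinv(A_over)*b_over - A_over\b_over)        % ~0: both give the least-squares solution
x_pinv = pinv(A_under)*b_under;                  % minimum-norm solution
x_back = A_under\b_under;                        % a basic (sparser) solution
[norm(x_pinv)  norm(x_back)]                     % pinv's solution has the smaller norm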
C.1.1.2 Factorizations
There are many useful matrix factorizations in data-transmission design.

QR Factorization: Any m × n matrix H factors as

H = Q · R ,   (C.10)

where Q is m × m unitary and R is m × n upper triangular. For example, in Matlab:
[Q , R] = qr([ 1 2 3 ; 4 5 6]’)
Q =
-0.2673 0.8729 0.4082
-0.5345 0.2182 -0.8165
-0.8018 -0.4364 0.4082
R =
-3.7417 -8.5524
0 1.9640
0 0
This text (see Appendix G) also has an rq.m Matlab command that produces H = R · Q^∗ for convenience of use in certain designs. QR factorizations most often use either Householder transformations or Givens rotations; Chapter 7 has more on the Givens approach. These algorithms proceed sequentially with residual columns of H and can reorder them in a hidden way, so there are many H = Q · R factorizations. Matlab's QR factorization can track this ambiguity with respect to the order of diagonal elements because it places the largest-magnitude diagonal elements in the upper left (something useful with sparse matrices). More explicitly, while H = Q · R, it is possible also that (with J as a permutation matrix whose inverse J^∗ = J^t reverses the reordering)
H = (Q · J) · (J^∗ · R) = Q_1 · R_1 ,   (C.11)
which, with upper-triangular R_1, essentially hides the reordering by J. If separate processing uses Q (or Q_1), it is sometimes useful to know the reordering that occurred within the algorithm. Matlab records this reordering explicitly with the command [Q,R,J] = qr(H), so that H · J = Q · R and the user can then back out the reordering. This operation is particularly important with so-called multiuser broadcast channels, where the reordering would otherwise confuse the labels of different user-receiver locations.
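A short Matlab fragment illustrates backing out the reordering (the matrix H below is arbitrary):

H = randn(4,3);
[Q,R,J] = qr(H);        % pivoted form: H*J = Q*R with J a permutation matrix
norm(H*J - Q*R)         % ~0
norm(Q*R*J' - H)        % ~0: J^t = J^{-1} undoes the reordering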
Singular Value Decomposition (SVD): Any m × n matrix H has a singular value decomposition
(SVD) such that
H = F · Λ · M∗ (C.12)
where the m × m unitary matrix F satisfies F · F ∗ = F ∗ · F = Im and the n × n unitary matrix M
satisfies M · M ∗ = M ∗ · M = In , while Λ is an m × n diagonal matrix with real nonnegative entries
(even if H is complex). The singular values are unique, and Matlab’s SVD command [F, Lambda, M] =
svd(H) arranges them largest to smallest from the upper left corner down. Matlab’s “economy” option
will keep only the columns of F and M that correspond to nonzero singular values, but H = F · Λ · M ∗
still holds. An algorithm for computing the SVD using MMSE methods appears in Chapter 7. When H is square nonsingular, then |H| = e^{jθ} · ∏_{i=1}^{m} λ_i, with the phase ambiguity arising from the fact that a unitary matrix may have a determinant equal to e^{jθ_F} where 0 ≤ θ_F < 2π, and thus θ = θ_F − θ_M.
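For example, with an arbitrary complex H:

H = randn(4,2) + 1j*randn(4,2);
[F,Lambda,M] = svd(H);          % F is 4x4, Lambda is 4x2, M is 2x2
norm(H - F*Lambda*M')           % ~0
[Fe,Le,Me] = svd(H,'econ');     % economy form: Fe is 4x2, Le is 2x2
norm(H - Fe*Le*Me')             % ~0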
Eigendecomposition: Eigenvalues and eigenvectors are useful in many engineering areas. However,
this text uses SVD for matrices that are not conjugate symmetric autocorrelation matrices. This text
only uses eigendecomposition for positive semi-definite autocorrelation matrices (see Appendix D). In
this case such a matrix has eigendecomposition
R=V ·E ·V∗ (C.13)
where V · V ∗ = V ∗ · V = I is unitary and E is a positive semi-definite diagonal matrix of eigenvalues
such that
R · v n = En · v n (C.14)
where V = [v_1 · · · v_{ρ_R}]. More generally, V^∗ is replaced by V^{−1} for any square matrix R. For the autocorrelation matrix, all the eigenvalues are nonnegative and then |R| = ∏_{i=1}^{m} E_i. In general, trace{R} = Σ_{i=1}^{m} E_i.
The nonzero eigenvalues of the symmetric positive semidefinite matrices H · H ∗ and H ∗ · H are the
same and equal to the squared singular values of H.
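A quick Matlab check of this relationship (with an arbitrary H):

H  = randn(3,5);
sv = svd(H);                            % singular values, largest first
ev1 = sort(eig(H*H'), 'descend');       % 3 eigenvalues
ev2 = sort(eig(H'*H), 'descend');       % 5 eigenvalues; the extra 2 are ~0
norm(ev1 - sv.^2)                       % ~0
norm(ev2(1:3) - sv.^2)                  % ~0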
is used where G is monic (has ones on the diagonal) and upper triangular (so then |G| = 1). The
diagonal elements of Sx are all positive real and are the matrix R's Cholesky factors. Note then
|R| = |Sx | when R is nonsingular. Chapter 5 generalizes Cholesky factorization to the singular case. The
factorization in this form (lower times upper) is the inverse of what matlab’s chol command produces, so
this course’s website has an lohc.m matlab command that provides the desired factorization. Matlab’s
chol command is however also directly useful in Chapter 5’s canonical factorization of the backward
channel model. Clearly R−1 = G−∗ · G−1 .
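The following Matlab sketch shows one way to obtain a monic/diagonal split from Matlab's chol output; it assumes the intended form is R = G^∗ · S_x · G with monic upper-triangular G, and it is not the text's lohc.m routine:

R  = [4 2; 2 3];                 % an arbitrary positive-definite example
U  = chol(R);                    % Matlab convention: R = U'*U, U upper triangular
d  = diag(U);
G  = diag(1./d)*U;               % monic (unit-diagonal) upper triangular
Sx = diag(d.^2);                 % positive diagonal of "Cholesky factors"
norm(R - G'*Sx*G)                % ~0
abs(det(R) - prod(diag(Sx)))     % ~0, i.e. |R| = |Sx|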
Schur Complement: Schur complements essentially are a block Cholesky factorization of a matrix:
\[
\begin{bmatrix} A & B \\ D & C \end{bmatrix} =
\begin{bmatrix} I & B\cdot C^{-1} \\ 0 & I \end{bmatrix}\cdot
\begin{bmatrix} A - B\cdot C^{-1}\cdot D & 0 \\ 0 & C \end{bmatrix}\cdot
\begin{bmatrix} I & 0 \\ C^{-1}\cdot D & I \end{bmatrix} \tag{C.17}
\]
and so then
\[
\begin{bmatrix} A & B \\ D & C \end{bmatrix}^{-1} =
\begin{bmatrix} I & 0 \\ -C^{-1}\cdot D & I \end{bmatrix}\cdot
\begin{bmatrix} (A - B\cdot C^{-1}\cdot D)^{-1} & 0 \\ 0 & C^{-1} \end{bmatrix}\cdot
\begin{bmatrix} I & -B\cdot C^{-1} \\ 0 & I \end{bmatrix} . \tag{C.18}
\]
The Schur method lends itself readily to recursive implementation and determination of a block Cholesky
factorization by adding successive blocks to R (and enlarging its dimensionality to include increasingly
larger sets of dimensions).
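The following Matlab sketch illustrates that recursion on arbitrary blocks, forming the enlarged inverse from the already-known smaller inverse A^{−1} and the Schur complement (the block sizes are illustrative only):

n = 4;
A = randn(n) + n*eye(n);   Ainv = inv(A);
B = randn(n,2);   D = randn(2,n);   C = randn(2) + 2*eye(2);
S   = C - D*Ainv*B;                    % Schur complement of A
M11 = Ainv + Ainv*B*(S\(D*Ainv));      % = (A - B*inv(C)*D)^{-1}
M12 = -(Ainv*B)/S;
M21 = -(S\(D*Ainv));
M22 = inv(S);
norm([M11 M12; M21 M22] - inv([A B; D C]))    % ~0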
A permutation matrix has exactly one unit entry in each row and column. The nonzero entries determine the reordering after application to a column-vector input.
C.1.2.1 Hadamard Product
The Hadamard matrix product is

(A ⊙ B)_{i,j} = (A)_{ij} · (B)_{ij} .   (C.24)

It has properties

A ⊙ B = B ⊙ A   (C.25)
A ⊙ (B ⊙ C) = (A ⊙ B) ⊙ C   (C.26)
A ⊙ (B + C) = A ⊙ B + A ⊙ C   (C.27)
A ⊙ 0 = 0 ⊙ A = 0 .   (C.28)
Of particular importance is the relation (with D_x being a matrix with vector x's elements along its diagonal)

x^∗ · (A ⊙ B) · y = trace{D_{x^∗} · A · D_y · B^t} .   (C.29)

In particular, by setting x = y, this form shows that the Hadamard product of two positive semi-definite matrices must be positive semi-definite (the Schur product theorem). Other properties are
and (with λ_i(A) as the i-th largest eigenvalue of positive-definite A and also a positive-definite B)

∏_{i=k}^{n} λ_i(A ⊙ B) ≥ ∏_{i=k}^{n} λ_i(A · B)   ∀ k = 1, ..., n .   (C.31)
Also,

|A ⊙ B| ≥ |A| · |B|   (C.32)

for positive semi-definite A and B. If the matrices are vectors, then

a ⊙ b = D_a · b = D_b · a .   (C.33)
Also

Diag{a} = (a · 1^∗) ⊙ I ,   (C.34)

where a_i ≠ a_j for i ≠ j.
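A short Matlab spot-check of (C.29) and (C.33) with arbitrary complex data (.* is Matlab's Hadamard product):

n = 4;
A = randn(n) + 1j*randn(n);   B = randn(n) + 1j*randn(n);
x = randn(n,1) + 1j*randn(n,1);   y = randn(n,1) + 1j*randn(n,1);
lhs = x'*(A.*B)*y;                          % x∗ · (A ⊙ B) · y
rhs = trace(diag(conj(x))*A*diag(y)*B.');   % trace{D_{x∗} · A · D_y · B^t}
abs(lhs - rhs)                              % ~0
a = randn(n,1);   b = randn(n,1);
norm(a.*b - diag(a)*b)                      % ~0, i.e. (C.33)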
Vandermonde-matrix elements can be from any field, ℝ, ℂ, continuous or finite, GF(q^m) = F_{q^m}. Examples include Chapter 4's Discrete Fourier Transform matrices (within a scale constant) and the square submatrices of parity matrices for cyclic codes. The Vandermonde matrix also appears in interpolation, where the rows can be viewed as powers of a variable D for the different a_i = D, and thus the equation

A · x = y   (C.38)

for a known y finds the polynomial coefficients x_i in x(D) = x_0 + x_1 · D + ... + x_{n−1} · D^{n−1} to match a set of measured values y with a polynomial fit, where the solution is A^{−1} · y. When the elements are from a finite field with n = 2^m − 1, the a_i value in each row becomes the GF(q^m) element a_i for i = 0, ..., 2^m − 2, leading to an (n − 1) × (n − 1) square Vandermonde matrix, which is useful in Chapter 7's decoders.
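A small Matlab example of this interpolation use, with hypothetical sample points and the increasing-power convention used above:

a = [0.5 1.0 1.5 2.0].';     % distinct a_i, so A is nonsingular
n = length(a);
A = a.^(0:n-1);              % rows are [1  a_i  a_i^2 ... a_i^{n-1}]
y = sin(a);                  % "measured" values to be fit
x = A\y;                     % polynomial coefficients x_0, ..., x_{n-1}
norm(A*x - y)                % ~0: the cubic passes through all four samples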
Proof: The proof will be constructive and will find a monic triangular decomposition of A. Simple column operations, adding to one column a constant multiple of another column, do not change the determinant; that is,
\[
\underbrace{\begin{bmatrix}
1 & a_0 & (a_0)^2 & \cdots & (a_0)^{n-1} \\
1 & a_1 & (a_1)^2 & \cdots & (a_1)^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & a_{n-1} & (a_{n-1})^2 & \cdots & (a_{n-1})^{n-1}
\end{bmatrix}}_{A}
\cdot
\underbrace{\begin{bmatrix}
1 & -a_0 & 0 & \cdots & 0 \\
0 & 1 & -a_0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & -a_0 \\
0 & 0 & \cdots & 0 & 1
\end{bmatrix}}_{U_0^{-1}} \tag{C.40}
\]
produces
\[
\underbrace{\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & a_1 - a_0 & a_1\cdot(a_1-a_0) & \cdots & (a_1)^{n-2}\cdot(a_1-a_0) \\
1 & a_2 - a_0 & a_2\cdot(a_2-a_0) & \cdots & (a_2)^{n-2}\cdot(a_2-a_0) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & a_{n-1} - a_0 & a_{n-1}\cdot(a_{n-1}-a_0) & \cdots & (a_{n-1})^{n-2}\cdot(a_{n-1}-a_0)
\end{bmatrix}}_{A\cdot U_0^{-1}} \tag{C.41}
\]
leaving
\[
\underbrace{\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & a_1 - a_0 & 0 & \cdots & 0 \\
1 & a_2 - a_0 & (a_2-a_0)\cdot(a_2-a_1) & \cdots & (a_2)^{n-3}\cdot(a_2-a_0)\cdot(a_2-a_1) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & a_{n-1} - a_0 & (a_{n-1}-a_0)\cdot(a_{n-1}-a_1) & \cdots & (a_{n-1})^{n-3}\cdot(a_{n-1}-a_0)\cdot(a_{n-1}-a_1)
\end{bmatrix}}_{A\cdot U_0^{-1}\cdot U_1^{-1}} \tag{C.43}
\]
Repeating this exercise until fully triangularized leaves
\[
A\cdot\prod_{i=0}^{n-2} U_i^{-1} =
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & a_1 - a_0 & 0 & \cdots & 0 \\
1 & a_2 - a_0 & (a_2-a_0)\cdot(a_2-a_1) & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & a_{n-1}-a_0 & (a_{n-1}-a_0)\cdot(a_{n-1}-a_1) & \cdots & \prod_{i=0}^{n-2}(a_{n-1}-a_i)
\end{bmatrix} . \tag{C.44}
\]
The Ui matrices are monic upper triangular matrices and thus all have determinant 1.
Thus the determinant is in (C.39). Further, as long as no two ai values are the same,
this determinant is nonzero, meaning A is nonsingular. QED.
C.2 Matrix Calculus
This section reviews derivatives with respect to vectors and matrices. Good sources for this material are Petersen and Pedersen's The Matrix Cookbook [3], the online Complex Analysis text [2], Zhang's matrix-algebra approach to artificial intelligence [4], and the more in-depth references therein. The reader presumably understands well the derivative of a real scalar function f(x) with respect to a real scalar variable x, df(x)/dx ≜ f_x(x), and its inverse, indefinite integration f(x) = ∫ f_x(x) · dx + C (C a constant that disappears with definite integrals that have limits). Most communication engineers will also recall the partial derivatives of a real scalar function with respect to multiple real inputs, say f(x, y, z), as ∂f/∂x ≜ f_x(x, y, z), ∂f/∂y ≜ f_y(x, y, z), and ∂f/∂z ≜ f_z(x, y, z), often organized in a gradient vector (with x now a column vector containing x, y, and z)
\[
\frac{df}{d\mathbf{x}} \triangleq \nabla f = \begin{bmatrix} f_x \\ f_y \\ f_z \end{bmatrix} . \tag{C.45}
\]
Subsection C.2.1 further pursues this and expands to when the function itself is also a vector. Subsection C.2.2 yet further expands to when either or both of the function and the inputs are matrices. The transpose of this matrix is sometimes called the Jacobian matrix J_f = [∇_x f]^t. Block vectors or "cell arrays" follow the same definitions but with possibly variable elements corresponding to the individual element dimensionalities. The gradient of a scalar linear combination is

d(x^t · B)/dx = B .   (C.50)
C.2.2 Derivative with respect to a matrix
A scalar function’s derivative with respect to a matrix is a matrix of the same dimension with the partial
derivative with respect to each element of the matrix in its corresponding (same) position. A vector of
course immediately also corresponds to a special case of a matrix. Again in this text: capitalized letters
denote matrices (in context, sometimes capitals also correspond to Fourier, Laplace, or D transforms).
Vectors are boldface lower case.
When the function itself is a vector or a matrix, the derivatives essentially become potentially 3-dimensional and 4-dimensional tensors, respectively. In these situations, typically the last section's Kronecker product instead characterizes the tensors as matrices. When this happens, typically the indices for the function output Y are k, ℓ, while those for the function input X are i, j. For unconstrained matrices (so NOT symmetric, positive definite, Toeplitz, etc.), direct expressions can be obtained when the matrix variable X has its entries x_{k,ℓ} chosen so that, when taking partial derivatives with respect to any matrix element x_{i,j},

∂x_{k,ℓ} / ∂x_{i,j} = δ_{ik} · δ_{ℓj} .   (C.51)
3. Quotient:

∇_X [f(X)/g(X)] = [1/g²(X)] · [g(X) · ∇_X f − f(X) · ∇_X g(X)]   (C.54)

4. Chain Rule:

∇_X f(g(X)) = (df/dg) · ∇_X g(X)   (C.55)

5. Chain Rule for a matrix function G that is p × q:

[∇_X f(G(X))]_{ij} = Σ_{k=1}^{p} Σ_{ℓ=1}^{q} [∂f(G)/∂g_{kℓ}] · [∂g_{kℓ}/∂x_{ij}] .   (C.56)
This text has interest in some common scalar functions and their derivatives with respect to a matrix X (square whenever X is directly the argument of a trace or determinant):

f(X) | ∇_X f   (C.57)
−−−−−−−−−−−− | −−−−−−−−−−−−−−−   (C.58)
trace{X} | I (square X only)   (C.59)
trace{X^{−1}} | −(X^{−2})^t (square X only)   (C.60)
trace{A · X} | A^t   (C.61)
trace{X^k} | k · (X^t)^{k−1} (square X only)   (C.62)
trace{X · A · X · B} | B^t · X^t · A^t + A^t · X^t · B^t   (C.63)
trace{X · A · X^t · B} | B^t · X · A^t + B · X · A   (C.64)
trace{A · X · X^t · B} | (B · A + A^t · B^t) · X   (C.65)
trace{A · X^{−1} · B} | −X^{−t} · A^t · B^t · X^{−t}   (C.66)
|X| | X^{−t} · |X| (square X only)   (C.67)
ln |X| | X^{−t}   (C.68)
|X| | X^{−t} · |X|   (C.69)
|X^{−1}| | −X^{−t} / |X|   (C.70)
|X|^k | k · |X|^k · X^{−t}   (C.71)
ln |X · X^t| | 2 · (X · X^t)^{−1} · X   (C.72)
ln |X · A · X^t| | (X · A · X^t)^{−t} · X · A^t + (X · A · X^t)^{−1} · X · A   (C.73)
|A · X · B| | |A · X · B| · A^t · (A · X · B)^{−t} · B^t   (C.74)
If A, B, and/or X are such that any trace above becomes a scalar, then this is also the gradient of that
scalar function because trace of a scalar is that scalar.
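Any entry of this table can be sanity-checked by finite differences; the Matlab fragment below checks the trace{A · X} row with arbitrary test matrices:

m = 3;  n = 4;  h = 1e-6;
A = randn(n,m);  X = randn(m,n);
G = zeros(m,n);
for i = 1:m
    for j = 1:n
        E = zeros(m,n);  E(i,j) = h;
        G(i,j) = (trace(A*(X+E)) - trace(A*X))/h;   % partial w.r.t. x_{ij}
    end
end
norm(G - A.')        % ~0 (up to finite-difference accuracy)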
For derivatives with respect to a matrix, the function vec stacks the columns of its matrix input into one long column vector. There are also some vector and matrix gradients of interest (in the matrix case, Kronecker products implicitly expand the dimensionality):

F(X) | ∇_X F   (C.75)
−−−−−−−−−−−− | −−−−−−−−−−−−−−−   (C.76)
X^t · A · X | (A^t · X) ⊗ I + K_{nm} · [(A · X) ⊗ I]   (C.77)
A · X^t · B | K_{nm} · (A^t ⊗ B)   (C.78)
A · X^t · B · X^t · C | K_{nm} · [A^t ⊗ (B · X^t · C)] + (B^t · X · A^t) ⊗ C   (C.79)
X^{−1} | −(X^{−t} ⊗ X^{−1}) (square X only)   (C.80)
X^k | Σ_{j=1}^{k} (X^t)^{j−1} ⊗ X^{k−j} (square X only)   (C.81)
The chain rule as well as some basic principles below can be used with these to construct more sophisticated matrix derivatives.
Another use of the vec function is to write the gradient of a scalar with respect to a matrix as an enlarged vector, creating the mn × 1 gradient ∇_{vec(X)} f(vec(X)). This rearranges the gradient from an m × n matrix to an mn × 1 vector. This may seem merely semantic until viewing the gradient of a matrix function of a matrix, like the p × q matrix function F(X), where then

∇_{vec(X)} F(X) ≜ ∂vec(F(X))^t / ∂vec(X) ,   (C.85)
Chain Rule Expressions and other Derivative-Calculating Aids: chain rule, products, division, distributive.
z = x + j · y .   (C.86)
A function f may correspondingly have nonzero real and imaginary parts, f(z) = u(x, y) + j · v(x, y), so that

df/dz (z) ≜ lim_{δz→0} [f(z + δz) − f(z)] / δz   (C.88)
has an ambiguity in terms of the direction from which δz → 0. When exclusively real, the approach is limited to one direction; when complex, the approach could be from any angle toward z; see for instance [2]. Equating the derivative computed with a real-axis increment (real h) to the derivative computed with an imaginary-axis increment leads to

∂u(x, y)/∂x = ∂v(x, y)/∂y   and   ∂u(x, y)/∂y = −∂v(x, y)/∂x ,   (C.93)
the Cauchy-Riemann equations [3], [2]. Satisfaction of the Cauchy-Riemann equations implies a certain symmetry of f(z) with respect to all directions (which would be linear combinations of the real and imaginary paths), so that all directional derivatives are the same. Figure C.1 illustrates this.
(Figure C.1: the real-direction increment δℜ f and the imaginary-direction increment δℑ f used in the difference quotient.)
Not all functions f(z) satisfy the Cauchy-Riemann equations, and then the definition of a complex derivative becomes confused. Figure C.1 also shows that if the two approach directions are δz and δz^∗, which are also orthogonal directions (clearly one is a 90-degree rotation of the other), then the magnitude of δz remains constant even if |h| ≠ |g|. Thus a formal derivative definition that uses these two directions is an improvement and would always have two orthogonal components, one along δz and the other along δz^∗. For instance, the function f(z) = z² satisfies CR with z² = (x² − y²) + j · (2xy), so that u_x = 2x = v_y and u_y = −2y = −v_x. However, the function f(z) = |z|² = x² + y² clearly has v = 0 and u_x = 2x, u_y = 2y, which satisfies CR only at the origin. The latter function thus would have a confused derivative.
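A short numerical illustration of this contrast (the test point z is arbitrary):

z = 0.7 + 0.3j;   h = 1e-6;
f = @(z) z.^2;          g = @(z) abs(z).^2;
(f(z+h)    - f(z))/h          % real-axis approach: ~2z
(f(z+1j*h) - f(z))/(1j*h)     % imaginary-axis approach: also ~2z (analytic)
(g(z+h)    - g(z))/h          % real-axis approach: ~2x
(g(z+1j*h) - g(z))/(1j*h)     % imaginary-axis approach: ~ -2yj, so no single derivative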
When the CR equations are satisfied, the function is said to be analytic (see Appendix D for more on analytic functions), which means that over any input domain where CR holds, the function and all its derivatives exist (are bounded).4
The derivative generalization needed is the Complex Derivative and the Conjugate Derivative, showing explicitly the dependency on both z and z^∗ as per [4],
4 Only the first (partial) derivatives need to be checked, a somewhat striking property left here to mathematicians to explain further.
which has second terms zero for analytic f .
The rules in (C.52) - (C.56) apply to both generalized gradients, as do the F(X) vec-function notational generalizations.
The complex gradient vector then follows by using the complex derivative for each complex-input dimension5 (column vectors, both):

∇f_z(z, z^{t∗}) ≜ ∂f(z, z^{t∗})/∂ℜz − j · ∂f(z, z^{t∗})/∂ℑz   (C.99)
∇f_{z^{t∗}}(z, z^{t∗}) ≜ ∂f(z, z^{t∗})/∂ℜz + j · ∂f(z, z^{t∗})/∂ℑz .   (C.100)

Similarly for a matrix Z:

∇f_Z(Z, Z^{t∗}) ≜ ∂f(Z, Z^{t∗})/∂ℜZ − j · ∂f(Z, Z^{t∗})/∂ℑZ   (C.101)
∇f_{Z^{t∗}}(Z, Z^{t∗}) ≜ ∂f(Z, Z^{t∗})/∂ℜZ + j · ∂f(Z, Z^{t∗})/∂ℑZ .   (C.102)
These complex forms are used in gradient descent (Chapters 3 - 6) and other optimization, particularly also the conjugate-gradient form ∇f_{z^{t∗}}(z, z^{t∗}) for descent with nonanalytic functions.
The earlier gradients generalize in the complex case to

f(Z) | ∇_Z f | ∇_{Z^{t∗}} f   (C.103)
−−−−−−−−−− | −−−−−−−−−− | −−−−−−−−−−   (C.104)
trace{A · Z} | A^t | 0   (C.105)
trace{A · Z^∗} | 0 | A   (C.106)
trace{A · Z^{−1}} | −Z^{−t} · A^t · Z^{−t} | 0   (C.107)
trace{A · Z^{−∗}} | 0 | −Z^{−∗} · A · Z^{−∗}   (C.108)
trace{Z · A · Z^t · B} | B^t · Z · A^t + B · Z · A | 0   (C.109)
trace{Z · A · Z · B} | (A · Z · B + B · Z · A)^t | 0   (C.110)
trace{Z^∗ · A · Z^t · B} | B · Z^∗ · A | A · Z^t · B   (C.111)
trace{Z · A · Z^∗ · B} | B^t · Z^{t∗} · A^t | B · Z · A   (C.112)
trace{Z^k} | k · (Z^t)^{k−1} | 0   (C.113)
trace{(Z^∗)^k} | 0 | k · (Z^∗)^{k−1}   (C.114)
|Z| | Z^{−t} · |Z| | 0   (C.115)
|Z^∗| | 0 | Z^{−∗} · |Z^∗|   (C.116)
|Z · Z^t| | 2 · (Z · Z^t)^{−1} · Z · |Z · Z^t| | 0   (C.117)
|Z · Z^∗| | (Z · Z^∗)^{−t} · Z^{t∗} · |Z · Z^∗| | (Z · Z^∗)^{−1} · Z · |Z · Z^∗|   (C.118)
|Z^∗ · Z| | Z^{t∗} · (Z^∗ · Z)^{−t} · |Z^∗ · Z| | Z · (Z^∗ · Z)^{−1} · |Z^∗ · Z|   (C.119)
|Z|^k | k · |Z|^k · Z^{−t} | 0   (C.120)
5 Note the factor of 1/2 is already implicit in the conjugate derivative definition so it need not be repeated.
C.3 Matrix Functional Optimization
Global and local minima in the matrix cases (a scalar function of a vector/matrix, or a matrix function of a matrix) follow the intuition of scalar-function minimization: basically, the first derivative (gradient) needs to be zero and the second derivative needs to be positive/negative (semi-)definite for a minimum/maximum. Iterative algorithmic generation of such optima usually increments in the gradient's revealed direction of maximum descent/ascent.
Section C.2’s use of the vec function expands here to simplify generalization to the matrix case.
∇²_{vec(X)} f(X) ≜ ∂²f(X) / [∂vec(X) ∂vec(X)^t]   (real case) .   (C.126)
For the complex case with generalized derivatives, this Hessian expands properly to
\[
\nabla^2_{vec(Z),\,vec(Z^{t*})} f(Z, Z^{t*}) \triangleq
\begin{bmatrix}
\dfrac{\partial^2 f(vec(Z),vec(Z^{t*}))}{\partial vec(Z^{t*})\,\partial vec(Z)^t} &
\dfrac{\partial^2 f(vec(Z),vec(Z^{t*}))}{\partial vec(Z^{t*})\,\partial vec(Z)^*} \\[6pt]
\dfrac{\partial^2 f(vec(Z),vec(Z^{t*}))}{\partial vec(Z)\,\partial vec(Z)^t} &
\dfrac{\partial^2 f(vec(Z),vec(Z^{t*}))}{\partial vec(Z)\,\partial vec(Z)^*}
\end{bmatrix} \tag{C.127}
\]
(unique if strictly negative definite). The simplification to the real case is obvious. When the Hessians are positive (negative) definite for all Z in a domain, then the resultant functional values are the global minimum (maximum) for that domain.
where the updating term is added if the algorithm is steepest ascent to a maximum. So-called Newton's methods deflect the gradient by the Hessian's pseudoinverse, so that

Z ← Z − µ · [∇²_{vec(Z),vec(Z^{t∗})} f(Z, Z^{t∗})]^{−1} · ∇_{vec(Z^{t∗})} f(Z, Z^{t∗}) ,   (C.132)
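As a simple illustration of such descent, the Matlab fragment below minimizes the nonanalytic cost f(z) = ‖A · z − b‖² by stepping along the conjugate-gradient direction A^∗ · (A · z − b); the data, step size, and iteration count are arbitrary choices, and any factor-of-two from the derivative convention is simply absorbed into µ:

A = randn(8,3) + 1j*randn(8,3);
b = randn(8,1) + 1j*randn(8,1);
z = zeros(3,1);
mu = 0.9/norm(A)^2;                  % step small enough for convergence
for k = 1:2000
    z = z - mu*(A'*(A*z - b));       % step along the conjugate gradient direction
end
norm(z - A\b)                        % ~0: converges to the least-squares solution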
Bibliography
[1] Gabriel Cramer. Introduction à l'Analyse des lignes Courbes algébriques. Geneva: Europeana, 1750, pp. 656-659.
[2] Juan Carlos Ponce Campuzano. Complex Analysis: A Visual and Interactive Introduction. Chapter 2, 2019-2022. https://complex-analysis.com/content/complex_differentiation.html
[3] Kaare Brandt Petersen and Michael Syskind Pedersen. "The Matrix Cookbook". November 15, 2012. https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
[4] Xian-Da Zhang. "A Matrix Algebra Approach to Artificial Intelligence". Springer, Singapore. ISBN 978-981-15-2769-2. 2020.