SUMMARY OF VECTOR/MATRIX OPERATIONS
A.1 DEFINITION
A.1.1 Vectors
A vector is a linear collection of elements. We use a lower case bold letter to
denote vectors, which by default are assumed to be column vectors. For
example,
$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = [\,a_1 \;\; a_2 \;\; a_3\,]^T.$$
A vector is called unit or normalized when the sum of the squared elements is equal to 1: $\sum_{i=1}^{n} a_i^2 = 1$ for an n-element vector a.
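As a quick numerical illustration (not part of the original text), the following NumPy sketch normalizes an arbitrary vector so that the sum of its squared elements equals 1; the data and variable names are hypothetical.

```python
import numpy as np

a = np.array([3.0, 4.0, 12.0])       # an arbitrary 3-element vector
a_unit = a / np.linalg.norm(a)       # divide by the square root of the sum of squares

print(np.sum(a_unit**2))             # 1.0 (within round-off): a_unit is a unit (normalized) vector
```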
A.1.2 Matrices
A matrix is a two-dimensional collection of elements. We use bold upper case letters
to denote matrices. A square matrix is symmetric if Aij = Aji for all i and j, that is, if A = AT. For example,
$$\mathbf{A} = \begin{bmatrix} 2 & 5 & 9 \\ 5 & 1 & 3 \\ 9 & 3 & 4 \end{bmatrix}$$

is symmetric.
A.1.2.3 Toeplitz Matrix A square matrix is Toeplitz if all elements along the
upper left to lower right diagonals are equal: Ai,j = Ai−1,j−1. For example,
$$\mathbf{A} = \begin{bmatrix} 1 & 2 & -1 & 4 \\ 3 & 1 & 2 & -1 \\ 5 & 3 & 1 & 2 \\ -2 & 5 & 3 & 1 \end{bmatrix}$$
is Toeplitz.
A.1.2.4 Identity Matrix A square matrix that is all zero except for ones along
the main diagonal is the identity matrix, denoted as I. For example,
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
is a 4 × 4 identity matrix. A subscript is often added to indicate the dimension; for example, In denotes an n × n identity matrix.
A.1.2.5 Triangular Matrix All elements of a lower triangular matrix above the
main diagonal are zero. All elements of an upper triangular matrix below the main
diagonal are zero. For example,
$$\begin{bmatrix} 1 & 7 & 4 & -4 \\ 0 & 2 & 6 & 1 \\ 0 & 0 & 3 & 9 \\ 0 & 0 & 0 & 5 \end{bmatrix}$$
is upper triangular.
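The special structures above are easy to test numerically. The sketch below (illustrative only; the matrices repeat the examples given above and the checks are not from the book) verifies symmetry, the Toeplitz property, and upper triangularity.

```python
import numpy as np

A_sym = np.array([[2., 5., 9.],
                  [5., 1., 3.],
                  [9., 3., 4.]])
A_toep = np.array([[ 1.,  2., -1.,  4.],
                   [ 3.,  1.,  2., -1.],
                   [ 5.,  3.,  1.,  2.],
                   [-2.,  5.,  3.,  1.]])
U = np.array([[1., 7., 4., -4.],
              [0., 2., 6.,  1.],
              [0., 0., 3.,  9.],
              [0., 0., 0.,  5.]])

print(np.allclose(A_sym, A_sym.T))          # symmetric: A equals its transpose
# Toeplitz: every diagonal is constant, i.e., A[i, j] == A[i-1, j-1]
print(all(A_toep[i, j] == A_toep[i - 1, j - 1]
          for i in range(1, 4) for j in range(1, 4)))
print(np.allclose(U, np.triu(U)))           # upper triangular: nothing below the main diagonal
```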
A.2.1 Transpose
The transpose of a matrix, denoted with superscript T, is formed by interchanging
row and column elements: B = AT is the transpose of A where Bji = Aij. For
example,
$$\mathbf{B} = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{bmatrix}^T = \begin{bmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \\ A_{13} & A_{23} \end{bmatrix}.$$
A.2.2 Addition
Two or more vectors or matrices of the same dimensions may be added or subtracted by adding/subtracting individual elements. For example, if
$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 3 & 7 & 5 \\ 2 & 1 & -2 \end{bmatrix}$$

then

$$\mathbf{C} = \mathbf{A} + \mathbf{B} = \begin{bmatrix} 4 & 9 & 8 \\ 6 & 6 & 4 \end{bmatrix}.$$
Matrix addition is commutative; that is, C = A + B = B + A.
A.2.5 Multiplication
Two matrices, where the column dimension of the first (m) is equal to the row dimension of the second, may be multiplied by forming the dot products of the rows of the first matrix and the columns of the second; that is, $C_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj}$. For example, if
$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 3 & 2 \\ 0 & 1 \\ 5 & -2 \end{bmatrix}$$

then

$$\mathbf{C} = \mathbf{A}\mathbf{B} = \begin{bmatrix} 18 & -2 \\ 42 & 1 \end{bmatrix}.$$
Matrix multiplication is not commutative; that is, C = AB ≠ BA. A matrix multiplying or multiplied by the identity I is unchanged; that is, AI = IA = A.
The transpose of the product of two matrices is the reversed product of the
transposes: (AB)T = BTAT.
Vector-matrix multiplication is defined in the same way as matrix-matrix multiplication. If matrix A is m × n and vector x has m elements, y = xTA, or

$$y_j = \sum_{i=1}^{m} x_i A_{ij} \quad \text{for } j = 1, 2, \ldots, n.$$
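A short NumPy sketch (illustrative, not from the text; the matrices repeat the example above) reproduces the product C = AB, checks the transpose-of-a-product rule (AB)T = BTAT, and forms the vector-matrix product y = xTA for an arbitrary x.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[3.,  2.],
              [0.,  1.],
              [5., -2.]])

C = A @ B
print(C)                                   # [[18. -2.] [42.  1.]]
print(np.allclose((A @ B).T, B.T @ A.T))   # transpose of a product: (AB)^T = B^T A^T

x = np.array([1., -1.])                    # x has m = 2 elements, A is 2 x 3
y = x @ A                                  # y = x^T A, an n = 3 element row vector
print(y)
```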
The inverse of a square matrix A, denoted A−1, satisfies AA−1 = A−1A = I and exists only when A is nonsingular. The inverse of the product of two matrices is the reversed product of the inverses:
(AB)−1 = B−1A−1. Nonsquare matrices generally do not have an inverse, but left or
right inverses can be defined; for example, for m × n matrix A, ((ATA)−1AT)A = In,
so (ATA)−1AT is a left inverse provided that (ATA)−1 exists, and A(AT(AAT)−1) = Im
so AT(AAT)−1 is a right inverse provided that (AAT)−1 exists.
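For a tall matrix of full column rank, the left inverse (ATA)−1AT can be formed directly. A minimal sketch (illustrative only; the data are arbitrary):

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])                     # 3 x 2, full column rank

left_inv = np.linalg.inv(A.T @ A) @ A.T      # (A^T A)^{-1} A^T, a 2 x 3 left inverse
print(np.allclose(left_inv @ A, np.eye(2)))  # left_inv @ A = I_2
```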
A square matrix is called orthogonal when ATA = AAT = I. Thus the transpose
is also the inverse: A−1 = AT. If rectangular matrix A is m × n, it is called column
orthogonal when ATA = I since the columns are orthonormal. This is only possible
when m ≥ n. If AAT = I for m ≤ n, matrix A is called row orthogonal because the
rows are orthonormal.
A symmetric positive definite matrix is a square symmetric matrix for which xTAx > 0
for all nonzero vectors x; positive definiteness guarantees that the matrix is invertible. A symmetric positive semi-definite or non-negative definite
matrix is one for which xTAx ≥ 0 for all x.
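Positive definiteness is usually tested numerically through the eigenvalues or a Cholesky factorization rather than by checking xTAx directly. A small illustrative check (not from the text; the test matrix is arbitrary):

```python
import numpy as np

A = np.array([[4., 1.],
              [1., 3.]])                      # symmetric test matrix

print(np.all(np.linalg.eigvalsh(A) > 0))      # all eigenvalues positive => positive definite

try:
    np.linalg.cholesky(A)                     # succeeds only for (numerically) positive definite A
    print("Cholesky factorization exists")
except np.linalg.LinAlgError:
    print("not positive definite")
```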
Inverses of partitioned matrices are frequently needed. Consider a square matrix partitioned as

$$\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}$$

where the four bold letters indicate smaller matrices. We express the inverse as

$$\begin{bmatrix} \mathbf{E} & \mathbf{F} \\ \mathbf{G} & \mathbf{H} \end{bmatrix}$$

and write

$$\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix} \begin{bmatrix} \mathbf{E} & \mathbf{F} \\ \mathbf{G} & \mathbf{H} \end{bmatrix} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix}$$
or
AE + BG = I (A3-1)
AF + BH = 0 (A3-2)
CE + DG = 0 (A3-3)
CF + DH = I. (A3-4)
Using equations (A3-2), (A3-4), (A3-1), and (A3-3) in that order, we obtain

$$\begin{aligned}
\mathbf{F} &= -\mathbf{A}^{-1}\mathbf{B}\mathbf{H} \\
\mathbf{H} &= (\mathbf{D} - \mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1} \\
\mathbf{E} &= \mathbf{A}^{-1}(\mathbf{I} - \mathbf{B}\mathbf{G}) \quad \text{(intermediate)} \\
\mathbf{G} &= -(\mathbf{D} - \mathbf{C}\mathbf{A}^{-1}\mathbf{B})^{-1}\mathbf{C}\mathbf{A}^{-1} = -\mathbf{H}\mathbf{C}\mathbf{A}^{-1} \\
\mathbf{E} &= \mathbf{A}^{-1} + \mathbf{A}^{-1}\mathbf{B}\mathbf{H}\mathbf{C}\mathbf{A}^{-1} \qquad \text{(A3-5)}
\end{aligned}$$

Alternately, using equations (A3-3), (A3-1), (A3-4), and (A3-2) in that order,

$$\begin{aligned}
\mathbf{G} &= -\mathbf{D}^{-1}\mathbf{C}\mathbf{E} \\
\mathbf{E} &= (\mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C})^{-1} \\
\mathbf{H} &= \mathbf{D}^{-1}(\mathbf{I} - \mathbf{C}\mathbf{F}) \\
\mathbf{F} &= -(\mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C})^{-1}\mathbf{B}\mathbf{D}^{-1} = -\mathbf{E}\mathbf{B}\mathbf{D}^{-1} \\
\mathbf{H} &= \mathbf{D}^{-1} + \mathbf{D}^{-1}\mathbf{C}\mathbf{E}\mathbf{B}\mathbf{D}^{-1} \qquad \text{(A3-6)}
\end{aligned}$$
or

$$\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}^{-1} = \begin{bmatrix} (\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C})^{-1} & -(\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C})^{-1}\mathbf{B}\mathbf{D}^{-1} \\ -\mathbf{D}^{-1}\mathbf{C}(\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C})^{-1} & \mathbf{D}^{-1}+\mathbf{D}^{-1}\mathbf{C}(\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C})^{-1}\mathbf{B}\mathbf{D}^{-1} \end{bmatrix}. \qquad \text{(A3-8)}$$
When the partitioned matrix is symmetric so that C = BT,

$$\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^T & \mathbf{D} \end{bmatrix}^{-1} = \begin{bmatrix} (\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{B}^T)^{-1} & -(\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{B}^T)^{-1}\mathbf{B}\mathbf{D}^{-1} \\ -\mathbf{D}^{-1}\mathbf{B}^T(\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{B}^T)^{-1} & \mathbf{D}^{-1}+\mathbf{D}^{-1}\mathbf{B}^T(\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{B}^T)^{-1}\mathbf{B}\mathbf{D}^{-1} \end{bmatrix} \qquad \text{(A3-10)}$$

where A and D are also symmetric.
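The partitioned inverse in equation (A3-8) can be verified numerically. The sketch below (illustrative only; the random test matrix and block sizes are arbitrary) assembles the four blocks of the inverse and compares them against a direct inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 2
M = rng.standard_normal((n1 + n2, n1 + n2)) + 5 * np.eye(n1 + n2)   # well-conditioned test matrix
A, B = M[:n1, :n1], M[:n1, n1:]
C, D = M[n1:, :n1], M[n1:, n1:]

Dinv = np.linalg.inv(D)
E = np.linalg.inv(A - B @ Dinv @ C)          # (A - B D^-1 C)^-1
F = -E @ B @ Dinv                            # -(A - B D^-1 C)^-1 B D^-1
G = -Dinv @ C @ E                            # -D^-1 C (A - B D^-1 C)^-1
H = Dinv + Dinv @ C @ E @ B @ Dinv           # D^-1 + D^-1 C (A - B D^-1 C)^-1 B D^-1

M_inv_blocks = np.block([[E, F], [G, H]])
print(np.allclose(M_inv_blocks, np.linalg.inv(M)))   # True: matches equation (A3-8)
```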
A.3.4 Determinant
The determinant of a square matrix is a measure of scale change when the matrix
is viewed as a linear transformation. When the determinant of a matrix is zero, the
matrix is indeterminate or singular, and cannot be inverted. The rank of matrix A
is the dimension of the largest square array in A that has a nonzero determinant.
The determinant of matrix A is denoted as det(A) or |A|. Laplace’s method for
computing determinants uses cofactors, where a cofactor of a given matrix element
ij is Cij = (−1)^(i+j) |Mij| and Mij, called the minor of ij, is the matrix formed by deleting
the i-th row and j-th column of matrix A. For the 2 × 2 matrix

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$
the cofactors are
C11 = A22, C12 = − A21, C21 = − A12, C22 = A11.
The determinant is the sum of the products of matrix elements and cofactors for
any row or column. Thus
$$|\mathbf{A}| = A_{11}C_{11} + A_{12}C_{12} = A_{11}C_{11} + A_{21}C_{21} = A_{21}C_{21} + A_{22}C_{22} = A_{12}C_{12} + A_{22}C_{22} = A_{11}A_{22} - A_{12}A_{21}.$$
For a 3 × 3 matrix A, cofactor expansion along the first row gives

$$|\mathbf{A}| = A_{11}(A_{22}A_{33} - A_{23}A_{32}) - A_{12}(A_{21}A_{33} - A_{23}A_{31}) + A_{13}(A_{21}A_{32} - A_{22}A_{31}).$$
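The Laplace (cofactor) expansion just described can be coded directly, although it is far too slow for large matrices. A small recursive sketch (illustrative only, not from the book), compared against numpy.linalg.det:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (O(n!) cost; illustration only)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 0 and column j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)       # cofactor C_0j = (-1)^(0+j) |M_0j|
    return total

A = np.array([[2., 5., 9.],
              [5., 1., 3.],
              [9., 3., 4.]])
print(det_cofactor(A), np.linalg.det(A))    # both give the same value (within round-off)
```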
A.3.5 Trace
The trace of a square matrix, denoted tr(A), is the sum of its diagonal elements. For an n-element vector a,

$$\sum_{i=1}^{n} a_i^2 = \mathbf{a}^T\mathbf{a} = \mathrm{tr}[\mathbf{a}\mathbf{a}^T]. \qquad \text{(A3-20)}$$

This rearrangement of vector order often allows solutions for minimum variance problems; it has been used repeatedly in previous chapters.
Since the trace only involves diagonal elements, tr(AT) = tr(A).
Unlike the determinant, the trace of a matrix product is not the product of the traces.
If A is an n × m matrix and B is an m × n matrix,

$$\mathrm{tr}(\mathbf{AB}) = \sum_{i=1}^{n}\sum_{j=1}^{m} A_{ij}B_{ji} = \sum_{j=1}^{m}\sum_{i=1}^{n} B_{ji}A_{ij} = \mathrm{tr}(\mathbf{BA}). \qquad \text{(A3-23)}$$
However, this commutative property only works for pairs of matrices, or interchange of "halves" of the matrix product:

$$\mathrm{tr}(\mathbf{ABC}) = \mathrm{tr}(\mathbf{C}(\mathbf{AB})) = \mathrm{tr}(\mathbf{BCA}) \ne \mathrm{tr}(\mathbf{ACB}) \ne \mathrm{tr}(\mathbf{CBA}). \qquad \text{(A3-24)}$$

When the three individual matrices are square and symmetric, any permutation works: tr(ABC) = tr(ACB) = tr(BAC) = tr(BCA) = tr(CAB) = tr(CBA). This permutation property does not extend to four or more symmetric matrices.
Permutation can also be used to express the weighted quadratic form aTWa as the trace tr(WaaT).
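The trace identities above are easy to confirm numerically. A minimal sketch (illustrative only; random test matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))            # tr(AB) = tr(BA), equation (A3-23)
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))    # cyclic permutation, equation (A3-24)

a = rng.standard_normal(4)
W = rng.standard_normal((4, 4))
print(np.isclose(a @ W @ a, np.trace(W @ np.outer(a, a))))     # a^T W a = tr(W a a^T)
```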
A.3.6 Derivatives
When matrix A is a function of a scalar variable t and B is constant, the chain rule gives

$$\frac{\partial(\mathbf{BA})}{\partial t} = \frac{\partial(\mathbf{BA})}{\partial \mathbf{A}}\,\frac{\partial \mathbf{A}}{\partial t}. \qquad \text{(A3-28)}$$

The differential of the matrix inverse, d(A−1) = dB = −A−1(dA)A−1, is two-dimensional but has perturbations with respect to all elements of dA; that is, each i, j element (dB)ij is a matrix of derivatives for all perturbations dA. Thus

$$\frac{\partial(\mathbf{A}^{-1})}{\partial t} = -\mathbf{A}^{-1}\left(\frac{\partial \mathbf{A}}{\partial t}\right)\mathbf{A}^{-1}. \qquad \text{(A3-29)}$$
The derivative of the trace of matrix products is computed by examining the derivative with respect to an individual element:

$$\frac{\partial\,\mathrm{tr}(\mathbf{AB})}{\partial B_{ij}} = \frac{\partial}{\partial B_{ij}}\left(\sum_{k=1}^{m}\sum_{l=1}^{m} A_{kl}B_{lk}\right) = A_{ji}.$$

Thus

$$\frac{\partial\,\mathrm{tr}(\mathbf{AB})}{\partial \mathbf{B}} = \mathbf{A}^T. \qquad \text{(A3-30)}$$

For products of three matrices,

$$\frac{\partial\,\mathrm{tr}(\mathbf{ABC})}{\partial \mathbf{B}} = \frac{\partial\,\mathrm{tr}(\mathbf{CAB})}{\partial \mathbf{B}} = (\mathbf{CA})^T. \qquad \text{(A3-31)}$$
The derivative of the determinant is computed by rearranging equation (A3-18) as

$$|\mathbf{A}|\,\mathbf{I} = \mathbf{A}\,[\mathbf{C}]^T. \qquad \text{(A3-32)}$$

Thus for any diagonal element i = 1, 2, …, n,

$$|\mathbf{A}| = \sum_{k=1}^{n} A_{ik}C_{ik}.$$

Since the cofactor Cik does not depend on Aik, the derivative with respect to the full matrix is

$$\frac{\partial|\mathbf{A}|}{\partial \mathbf{A}} = [\mathbf{C}] = |\mathbf{A}|\,\mathbf{A}^{-T}. \qquad \text{(A3-33)}$$
By a similar development,

$$\frac{\partial|\mathbf{A}|}{\partial t} = |\mathbf{A}|\;\mathrm{tr}\!\left(\mathbf{A}^{-1}\frac{\partial \mathbf{A}}{\partial t}\right). \qquad \text{(A3-34)}$$
A.3.7 Norms
Norms of vectors or matrices are often useful when analyzing the growth of numerical
errors. The Hölder p-norms for vectors are defined as

$$\|\mathbf{x}\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}. \qquad \text{(A3-35)}$$
Induced matrix norms measure the ability of matrix A to modify the magnitude of a vector; that is,

$$\|\mathbf{A}\| = \max_{\|\mathbf{x}\|=1}\left(\frac{\|\mathbf{Ax}\|}{\|\mathbf{x}\|}\right).$$

The l1-norm is $\|\mathbf{A}\|_1 = \max_j \|\mathbf{a}_{:j}\|_1$, where a:j is the j-th column of A, and the l∞-norm is $\|\mathbf{A}\|_\infty = \max_i \|\mathbf{a}_{i:}\|_1$, where ai: is the i-th row of A. It is more difficult to compute an l2-norm based on this definition than a Frobenius norm: ‖A‖2 is equal to the square root of the maximum eigenvalue of ATA, or equivalently the largest singular value of A. These terms are defined in Section A.4.
Norms of matrix products obey inequality conditions:
$$\|\mathbf{Ax}\| \le \|\mathbf{A}\|\,\|\mathbf{x}\| \quad \text{or} \quad \|\mathbf{AB}\| \le \|\mathbf{A}\|\,\|\mathbf{B}\|. \qquad \text{(A3-38)}$$
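NumPy provides these norms directly. A short sketch (illustrative, not from the text) relating the induced l1, l2, and l∞ norms to column sums, singular values, and row sums, and checking the inequality conditions:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

print(np.isclose(np.linalg.norm(A, 1),      np.abs(A).sum(axis=0).max()))             # max column sum
print(np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max()))             # max row sum
print(np.isclose(np.linalg.norm(A, 2),      np.linalg.svd(A, compute_uv=False)[0]))   # largest singular value

# inequality conditions of equation (A3-38)
print(np.linalg.norm(A @ x) <= np.linalg.norm(A, 2) * np.linalg.norm(x))
```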
A.4.1 LU Decomposition
LU decomposition has been mentioned previously. Crout reduction is used to factor
square matrix A = LU where L is unit lower triangular and U is upper triangular.
This is often used for matrix inversion or when repeatedly solving equations of the form Ax = y for x. The equation Lz = y is first solved for z using forward substitution, and then Ux = z is solved for x using backward substitution.
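In SciPy the factor-once, solve-many pattern described above can be written roughly as follows (an illustrative sketch, not from the book; scipy.linalg.lu_factor performs the LU factorization with partial pivoting):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4., 3., 0.],
              [3., 4., -1.],
              [0., -1., 4.]])
lu, piv = lu_factor(A)            # factor A once (LU with partial pivoting)

for y in (np.array([1., 2., 3.]), np.array([0., 1., 0.])):
    x = lu_solve((lu, piv), y)    # solve Ax = y by forward then backward substitution
    print(np.allclose(A @ x, y))  # True for each right-hand side
```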
For a square n × n matrix A with eigenvalues λi and corresponding eigenvectors xi (Axi = λi xi), the eigen decomposition can be written

$$\mathbf{AM} = \mathbf{M}\boldsymbol{\Lambda} \qquad \text{(A4-1)}$$

or

$$\mathbf{A} = \mathbf{M}\boldsymbol{\Lambda}\mathbf{M}^{-1} \qquad \text{(A4-2)}$$

where

$$\mathbf{M} = [\,\mathbf{x}_1 \;\; \mathbf{x}_2 \;\; \cdots \;\; \mathbf{x}_n\,], \qquad \boldsymbol{\Lambda} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$$
Matrix M is called the modal matrix and the λi are the eigenvalues. The eigenvalues are
the roots of the characteristic polynomial p(s) = |sI − A|, and they define the spectral
response of the linear system ẋ(t) = Ax(t), where x(t) is the system state vector (not an
eigenvector). Eigen decomposition is a similarity transformation, and thus

$$|\mathbf{A}| = \lambda_1 \lambda_2 \cdots \lambda_n. \qquad \text{(A4-3)}$$

Also

$$\mathrm{tr}(\mathbf{A}) = \lambda_1 + \lambda_2 + \cdots + \lambda_n. \qquad \text{(A4-4)}$$

When real A is symmetric and nonsingular, the λi are all real and the eigenvectors
can be chosen mutually orthogonal. Thus M−1 = MT and A = MΛMT.
Eigenvectors and eigenvalues are computed in LAPACK using either generalized QR decomposition or a divide-and-conquer approach.
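A quick numerical illustration (not from the text; the symmetric test matrix is arbitrary) of equations (A4-2) through (A4-4) using numpy.linalg.eig:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])                      # symmetric, so the eigenvalues are real

lam, M = np.linalg.eig(A)                         # eigenvalues and modal matrix M of eigenvectors
L = np.diag(lam)

print(np.allclose(A, M @ L @ np.linalg.inv(M)))   # A = M Lambda M^-1          (A4-2)
print(np.isclose(np.linalg.det(A), np.prod(lam))) # |A| = product of eigenvalues (A4-3)
print(np.isclose(np.trace(A), np.sum(lam)))       # tr(A) = sum of eigenvalues   (A4-4)
print(np.allclose(M.T @ M, np.eye(3)))            # symmetric A: eigenvectors orthonormal, M^-1 = M^T
```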
A related factorization, the singular value decomposition (SVD), factors any m × n matrix as

$$\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}^T \qquad \text{(A4-5)}$$

where U and V are orthogonal and S is diagonal with nonnegative singular values.
A.4.6 Pseudo-Inverse
When A is rectangular or singular, A does not have an inverse. However, Penrose
(1955) defined a pseudo-inverse A# uniquely determined by four properties:
$$\begin{aligned}
\mathbf{A}\mathbf{A}^{\#}\mathbf{A} &= \mathbf{A} \\
\mathbf{A}^{\#}\mathbf{A}\mathbf{A}^{\#} &= \mathbf{A}^{\#} \\
(\mathbf{A}\mathbf{A}^{\#})^T &= \mathbf{A}\mathbf{A}^{\#} \\
(\mathbf{A}^{\#}\mathbf{A})^T &= \mathbf{A}^{\#}\mathbf{A} \qquad \text{(A4-6)}
\end{aligned}$$
For the least-squares problem y = Hx, decompose H = USVT and let S1# be the pseudo-inverse of the nonzero square portion of S: each nonzero singular value is replaced by its reciprocal, and singular values that are exactly zero are replaced with zero in the same location when forming S1#. Thus

$$\hat{\mathbf{x}} = \mathbf{V}\,[\,\mathbf{S}_1^{\#} \;\; \mathbf{0}\,]\,\mathbf{U}^T\mathbf{y}.$$

This shows that a pseudo-inverse can be computed even when HTH is singular.
The pseudo-inverse provides the minimal norm solution for a rank-deficient
(rank < min(m, n)) H matrix.
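numpy.linalg.pinv computes the Moore-Penrose pseudo-inverse via the SVD. The sketch below (illustrative only; the rank-deficient test matrix is arbitrary) builds the pseudo-inverse explicitly from U, S, and V and checks the four Penrose properties of equation (A4-6).

```python
import numpy as np

H = np.array([[1., 2., 3.],
              [2., 4., 6.],     # row 2 = 2 x row 1, so H is rank deficient (rank 2)
              [1., 0., 1.],
              [0., 1., 1.]])

U, s, Vt = np.linalg.svd(H, full_matrices=False)
s_inv = np.where(s > 1e-12 * s.max(), 1.0 / s, 0.0)      # reciprocate nonzero singular values, keep zeros
H_pinv = Vt.T @ np.diag(s_inv) @ U.T                     # pseudo-inverse assembled from the SVD

print(np.allclose(H_pinv, np.linalg.pinv(H, rcond=1e-12)))   # matches numpy's pinv (same cutoff)
print(np.allclose(H @ H_pinv @ H, H))                    # A A# A = A
print(np.allclose(H_pinv @ H @ H_pinv, H_pinv))          # A# A A# = A#
print(np.allclose((H @ H_pinv).T, H @ H_pinv))           # (A A#)^T = A A#
print(np.allclose((H_pinv @ H).T, H_pinv @ H))           # (A# A)^T = A# A
```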
The condition number of a matrix measures the sensitivity of the solution of a linear system to errors in the data. It is defined as

$$\kappa_p(\mathbf{A}) = \|\mathbf{A}\|_p\,\|\mathbf{A}^{-1}\|_p \qquad \text{(A4-7)}$$

when using an lp induced matrix norm for A. Because it is generally not convenient to compute condition numbers by inverting a matrix, they are most often
computed from the singular values of matrix A. Decomposing A = USVT, where U
and V are orthogonal and the Si are the singular values in S, then

$$\kappa_2(\mathbf{A}) = \frac{\max_i(S_i)}{\min_i(S_i)}. \qquad \text{(A4-8)}$$
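A brief numerical illustration (not from the text; the nearly singular test matrix is arbitrary) of equations (A4-7) and (A4-8) using NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])                      # nearly singular, so poorly conditioned

s = np.linalg.svd(A, compute_uv=False)             # singular values, largest first
print(s.max() / s.min())                           # kappa_2(A) from equation (A4-8)
print(np.linalg.cond(A, 2))                        # same value from numpy's cond
print(np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))   # definition (A4-7)
```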