3. Matrices

Outline

• notation and terminology
• matrix operations
• linear and affine functions
• complexity

Matrices

a rectangular array of numbers, for example

A = \begin{bmatrix} 0 & 1 & -2.3 & 0.1 \\ 1.3 & 4 & -0.1 & 0 \\ 4.1 & -1 & 0 & 1.7 \end{bmatrix}

• numbers in array are the elements (components, entries, coefficients)
• Aij is the i, j element of A; i is its row index, j the column index
• size of the matrix is (#rows) × (#columns), e.g., A is a 3 × 4 matrix
Block matrix

a block matrix is a rectangular array of matrices

• elements in the array are the blocks or submatrices of the block matrix

Example

A = \begin{bmatrix} B & C \\ D & E \end{bmatrix}

is a 2 × 2 block matrix; if the blocks are

B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 2 & 3 \\ 5 & 4 & 7 \end{bmatrix}, \quad D = \begin{bmatrix} 1 \end{bmatrix}, \quad E = \begin{bmatrix} -1 & 6 & 0 \end{bmatrix}

then

A = \begin{bmatrix} 2 & 0 & 2 & 3 \\ 1 & 5 & 4 & 7 \\ 1 & -1 & 6 & 0 \end{bmatrix}

Note: dimensions of the blocks must be compatible!

Columns and rows

a matrix can be viewed as a block matrix with row/column vector blocks

• m × n matrix A as 1 × n block matrix

A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}

each aj is an m-vector (the jth column of A)

• m × n matrix A as m × 1 block matrix

A = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}

each bi is a 1 × n row vector (the ith row of A)
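the block matrix in the example can be assembled directly in MATLAB, which enforces the compatibility of the block dimensions (a minimal sketch; the variable names follow the example):

    % blocks of the 2 x 2 block matrix from the example
    B = [2; 1];            % 2 x 1
    C = [0 2 3; 5 4 7];    % 2 x 3
    D = [1];               % 1 x 1
    E = [-1 6 0];          % 1 x 3

    % concatenation requires compatible blocks: [B C] needs equal row counts,
    % [B C; D E] needs equal column counts in the two block rows
    A = [B C; D E]         % the 3 x 4 matrix from the example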
Identity matrix

• square matrix with Aij = 1 if i = j and Aij = 0 if i ≠ j
• notation: I (usually) or In (if dimension is not clear from context)
• columns of In are unit vectors e1, e2, . . . , en; for example,

I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix}

Hermitian matrix

square with Aij = Āji (the complex conjugate of Aji); for example,

\begin{bmatrix} 4 & 3-2j & -1+j \\ 3+2j & -1 & 2j \\ -1-j & -2j & 3 \end{bmatrix}

note: diagonal elements are real (since Aii = Āii)
• lower triangular matrix: square with Aij = 0 for i < j; for example,

\begin{bmatrix} 4 & 0 & 0 \\ 3 & -1 & 0 \\ 0 & 5 & -2 \end{bmatrix}

Example (sparse matrix)

[figure: sparsity pattern of the Freescale/FullChip matrix from the Univ. of Florida Sparse Matrix Collection]

• 2,987,012 rows and columns
• 26,621,983 nonzeros
Scalar-matrix multiplication

scalar-matrix product of m × n matrix A with scalar β:

\beta A = \begin{bmatrix} \beta A_{11} & \beta A_{12} & \cdots & \beta A_{1n} \\ \beta A_{21} & \beta A_{22} & \cdots & \beta A_{2n} \\ \vdots & \vdots & & \vdots \\ \beta A_{m1} & \beta A_{m2} & \cdots & \beta A_{mn} \end{bmatrix}

A and β can be real or complex

Addition

sum of two m × n matrices A and B (real or complex):

A + B = \begin{bmatrix} A_{11}+B_{11} & A_{12}+B_{12} & \cdots & A_{1n}+B_{1n} \\ A_{21}+B_{21} & A_{22}+B_{22} & \cdots & A_{2n}+B_{2n} \\ \vdots & \vdots & & \vdots \\ A_{m1}+B_{m1} & A_{m2}+B_{m2} & \cdots & A_{mn}+B_{mn} \end{bmatrix}
Transpose

the transpose of an m × n matrix A is the n × m matrix

A^T = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{m1} \\ A_{12} & A_{22} & \cdots & A_{m2} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{mn} \end{bmatrix}

• A may be complex, but the transpose of complex matrices is rarely needed
• transpose of matrix-scalar product and matrix sum:

(\beta A)^T = \beta A^T, \qquad (A + B)^T = A^T + B^T

Conjugate transpose

the conjugate transpose of an m × n matrix A is the n × m matrix

A^H = \begin{bmatrix} \bar{A}_{11} & \bar{A}_{21} & \cdots & \bar{A}_{m1} \\ \bar{A}_{12} & \bar{A}_{22} & \cdots & \bar{A}_{m2} \\ \vdots & \vdots & & \vdots \\ \bar{A}_{1n} & \bar{A}_{2n} & \cdots & \bar{A}_{mn} \end{bmatrix}

• a Hermitian matrix satisfies A = AH
• conjugate transpose of matrix-scalar product and matrix sum:

(\beta A)^H = \bar{\beta} A^H, \qquad (A + B)^H = A^H + B^H
Matrix-matrix multiplication

product of m × n matrix A and n × p matrix B (A, B real or complex):

C = AB

is the m × p matrix with elements

Cij = Ai1B1j + Ai2B2j + · · · + AinBnj

dimensions must be compatible: #columns in A = #rows in B

Properties

• not commutative: AB ≠ BA in general; there are exceptions, e.g., AI = IA for square A
• transpose and conjugate transpose of product:

(AB)^T = B^T A^T, \qquad (AB)^H = B^H A^H

• the product of the transpose of column vector b and column vector a is the inner product b^T a = b_1 a_1 + \cdots + b_n a_n; the product of the conjugate transpose of b and a is b^H a = \bar{b}_1 a_1 + \cdots + \bar{b}_n a_n

Example (directed graph)

directed graph with n = 5 vertices and matrix representation

A = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}

Aij = 1 indicates an edge j → i

Question: give a graph interpretation of A^2 = AA, A^3 = AAA, . . .

A^2 = \begin{bmatrix} 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 2 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad A^3 = \begin{bmatrix} 1 & 1 & 0 & 1 & 2 \\ 2 & 0 & 1 & 2 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \end{bmatrix}
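the powers can be checked in MATLAB (a minimal sketch; the entries of A^k count the directed paths of length k from vertex j to vertex i, which is one way to answer the question above):

    % adjacency matrix: A(i,j) = 1 indicates an edge j -> i
    A = [0 1 0 0 1;
         1 0 1 0 0;
         0 0 0 1 1;
         1 0 0 0 0;
         0 0 0 1 0];

    A2 = A^2    % (A^2)(i,j) = number of length-2 paths from j to i
    A3 = A^3    % (A^3)(i,j) = number of length-3 paths from j to i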
Matrix-vector product function

for fixed A ∈ Rm×n, define a function f : Rn → Rm as

f(x) = Ax

think of f in terms of its effect on x:

[block diagram: x → A → y = f(x) = Ax]

• any function of this type is linear: A(αx + βy) = α(Ax) + β(Ay)
• every linear function can be written as a matrix-vector product function

Example (projection and reflection)

[figure: line through a, with a point x, its projection y, and its reflection z]

• projection of x on the line through a:

y = \frac{a^T x}{\|a\|^2}\, a = Ax \quad \text{with} \quad A = \frac{1}{\|a\|^2}\, a a^T

• reflection with respect to the line through a:

z = x + 2(y - x) = Bx \quad \text{with} \quad B = \frac{2}{\|a\|^2}\, a a^T - I
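in MATLAB, the projection and reflection matrices are easy to form and check (a minimal sketch with illustrative a and x):

    % projection on, and reflection in, the line through a
    a = [2; 1];
    x = [1; 3];

    P = (a*a') / (a'*a);    % A = (1/||a||^2) a a'
    Q = 2*P - eye(2);       % B = (2/||a||^2) a a' - I

    y = P*x;                % projection of x on the line through a
    z = Q*x;                % reflection of x with respect to the line

    a' * (y - x)            % zero: the residual is orthogonal to a
    Q*z - x                 % zero: reflecting twice recovers x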
Example (node-arc incidence matrix)

directed graph with m vertices and n arcs (directed edges)

[figure: directed graph with m = 4 vertices and n = 5 arcs]

n-vector x = (x1, x2, . . . , xn) with xj the current through arc j:

A = \begin{bmatrix} -1 & -1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & -1 \\ 0 & 1 & 0 & 0 & 1 \end{bmatrix}, \qquad Ax = \begin{bmatrix} -x_1 - x_2 + x_4 \\ x_1 - x_3 \\ x_3 - x_4 - x_5 \\ x_2 + x_5 \end{bmatrix}

((Ax)i is the net current arriving at node i)

m-vector y = (y1, y2, . . . , ym) with yi the potential at node i:

(A^T y)_j = y_k - y_l \quad \text{if edge } j \text{ goes from node } l \text{ to node } k

Example (polynomial evaluation)

polynomial of degree n − 1 or less with coefficients x1, x2, . . . , xn:

p(t) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}

evaluating p at m points t1, . . . , tm is a matrix-vector product y = Ax with Aij = t_i^{j-1}
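for the node-arc incidence example above, a minimal MATLAB sketch (the currents and potentials are illustrative values):

    % node-arc incidence matrix of the 4-node, 5-arc graph
    A = [-1 -1  0  1  0;
          1  0 -1  0  0;
          0  0  1 -1 -1;
          0  1  0  0  1];

    x = [1; 2; 3; 4; 5];    % arc currents
    A*x                     % net current arriving at each node

    y = [10; 20; 30; 40];   % node potentials
    A'*y                    % potential difference y_k - y_l across each arc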
Discrete Fourier transform

the DFT maps a complex n-vector (x1, x2, . . . , xn) to the complex n-vector y = Wx:

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{-1} & \omega^{-2} & \cdots & \omega^{-(n-1)} \\ 1 & \omega^{-2} & \omega^{-4} & \cdots & \omega^{-2(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \omega^{-(n-1)} & \omega^{-2(n-1)} & \cdots & \omega^{-(n-1)(n-1)} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}

where ω = e^{2πj/n} (and j = √−1)

• DFT matrix W ∈ Cn×n has k, l element Wkl = ω^{−(k−1)(l−1)} (see the MATLAB check below)

Affine functions

a function f : Rn → Rm is affine if it satisfies

f(αx + βy) = αf(x) + βf(y)

for all n-vectors x, y and all scalars α, β with α + β = 1

Extension: if f is affine, then

f(α1u1 + α2u2 + · · · + αmum) = α1f(u1) + α2f(u2) + · · · + αmf(um)

for all n-vectors u1, . . . , um and all scalars α1, . . . , αm with α1 + α2 + · · · + αm = 1
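returning to the DFT above: with that sign convention, Wx matches MATLAB's built-in fft (a minimal sketch; n and x are arbitrary test values):

    n = 8;
    omega = exp(2i*pi/n);       % omega = e^(2*pi*j/n)
    k = (0:n-1)';               % zero-based indices
    W = omega.^(-k*k');         % W(k,l) = omega^(-(k-1)(l-1)) via outer product

    x = randn(n,1) + 1i*randn(n,1);
    norm(W*x - fft(x))          % numerically zero: W*x is the DFT of x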
Affine functions and matrix-vector product

for fixed A ∈ Rm×n, b ∈ Rm, define a function f : Rn → Rm by

f(x) = Ax + b

i.e., a matrix-vector product plus a constant

• any function of this type is affine: if α + β = 1, then

A(αx + βy) + b = α(Ax + b) + β(Ay + b)

• every affine function can be written as f(x) = Ax + b with

A = \begin{bmatrix} f(e_1) - f(0) & f(e_2) - f(0) & \cdots & f(e_n) - f(0) \end{bmatrix}

and b = f(0)

First-order Taylor approximation

first-order Taylor approximation of differentiable f : Rn → Rm around z:

\hat{f}_i(x) = f_i(z) + \frac{\partial f_i}{\partial x_1}(z)(x_1 - z_1) + \cdots + \frac{\partial f_i}{\partial x_n}(z)(x_n - z_n), \quad i = 1, \ldots, m

in matrix-vector notation: \hat{f}(x) = f(z) + Df(z)(x − z), where

Df(z) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(z) & \frac{\partial f_1}{\partial x_2}(z) & \cdots & \frac{\partial f_1}{\partial x_n}(z) \\ \frac{\partial f_2}{\partial x_1}(z) & \frac{\partial f_2}{\partial x_2}(z) & \cdots & \frac{\partial f_2}{\partial x_n}(z) \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1}(z) & \frac{\partial f_m}{\partial x_2}(z) & \cdots & \frac{\partial f_m}{\partial x_n}(z) \end{bmatrix} = \begin{bmatrix} \nabla f_1(z)^T \\ \nabla f_2(z)^T \\ \vdots \\ \nabla f_m(z)^T \end{bmatrix}

• Df(z) is called the derivative matrix or Jacobian matrix of f at z
• \hat{f} is a local affine approximation of f around z
Example

f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \end{bmatrix} = \begin{bmatrix} e^{2x_1 + x_2} - x_1 \\ x_1^2 - x_2 \end{bmatrix}

first-order approximation of f around z = 0:

\hat{f}(x) = \begin{bmatrix} \hat{f}_1(x) \\ \hat{f}_2(x) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
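the approximation can be checked numerically in MATLAB (a minimal sketch for the example above):

    % f and its first-order approximation around z = 0
    f    = @(x) [exp(2*x(1) + x(2)) - x(1); x(1)^2 - x(2)];
    fhat = @(x) [1; 0] + [1 1; 0 -1]*x;

    x = [0.1; -0.05];       % a point near z = 0
    f(x) - fhat(x)          % error is small for small x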
Complexity

Matrix-vector product y = Ax (A of size m × n)

• m elements in y; each element requires an inner product of length n
• approximately 2mn flops for large n

Matrix-matrix product C = AB (A of size m × n, B of size n × p)

• mp elements in C; each element requires an inner product of length n
• approximately 2mnp flops for large n

Special cases: flop count is lower for structured matrices

• A diagonal: n flops
• A lower triangular: n^2 flops
• A sparse: #flops ≪ 2mn

Example: two methods for computing y = Ax with A = I + uv^T (u, v n-vectors)

• form A = I + uv^T, then compute y = Ax (order n^2 flops)

  in MATLAB: y = (eye(n) + u*v') * x

• compute w = (v^T x)u, then y = x + w (approximately 4n flops)

  in MATLAB: y = x + (v'*x) * u
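the gap between the two methods is easy to see experimentally (a minimal sketch; the size n is illustrative):

    n = 2000;
    u = randn(n,1); v = randn(n,1); x = randn(n,1);

    tic; y1 = (eye(n) + u*v') * x; toc    % forms an n x n matrix: order n^2
    tic; y2 = x + (v'*x) * u;      toc    % never forms the matrix: order n

    norm(y1 - y2)                         % same result up to rounding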
4. Matrix inverses

Outline

• left and right inverse
• linear independence
• nonsingular matrices
• matrices with linearly independent columns
• matrices with linearly independent rows

Left and right inverse

AB ≠ BA in general, so we have to distinguish two types of inverses

• X is a left inverse of A if XA = I; A is then called left invertible
• X is a right inverse of A if AX = I; A is then called right invertible

Dimensions: a left or right inverse of an m × n matrix must have size n × m

Example

A = \begin{bmatrix} -3 & -4 \\ 4 & 6 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}

• A is left invertible; for example,

\frac{1}{9} \begin{bmatrix} -11 & -10 & 16 \\ 7 & 8 & -11 \end{bmatrix}

is a left inverse of A

• B is right invertible; the following matrices are right inverses:

\frac{1}{2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 1 & -1 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}

• X is a left inverse of A if and only if X^T is a right inverse of A^T:

A^T X^T = (XA)^T = I

• X is a left inverse of A if and only if X^H is a right inverse of A^H:

A^H X^H = (XA)^H = I
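the inverses in the example can be verified in MATLAB (a minimal sketch):

    A = [-3 -4; 4 6; 1 1];
    B = [1 0 1; 0 1 1];

    X = [-11 -10 16; 7 8 -11] / 9;    % a left inverse of A
    X*A                               % 2 x 2 identity

    X1 = [1 -1; -1 1; 1 1] / 2;       % three right inverses of B
    X2 = [1 0; 0 1; 0 0];
    X3 = [1 -1; 0 0; 0 1];
    B*X1, B*X2, B*X3                  % each is the 2 x 2 identity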
Invertible matrix

if A has a left inverse X and a right inverse Y, then they are equal and unique:

X = X(AY) = (XA)Y = Y

in this case A is called invertible, and X = Y is the inverse of A (notation: A^{−1})

Example

A = \begin{bmatrix} -1 & 1 & -3 \\ 1 & -1 & 1 \\ 2 & 2 & 2 \end{bmatrix}, \qquad A^{-1} = \frac{1}{4} \begin{bmatrix} 2 & 4 & 1 \\ 0 & -2 & 1 \\ -2 & -2 & 0 \end{bmatrix}

Linear equations

set of m linear equations in n variables

• in matrix form: Ax = b
• may have no solution, a unique solution, infinitely many solutions

Linear equations and matrix inverses

Left invertible matrix: if X is a left inverse of A, then

Ax = b =⇒ x = XAx = Xb

there is at most one solution (if there is a solution, it must be equal to Xb)

Right invertible matrix: if X is a right inverse of A, then

x = Xb =⇒ Ax = AXb = b

there is at least one solution (namely, x = Xb)

Invertible matrix: if A is invertible, then

Ax = b ⇐⇒ x = A^{−1}b

there is a unique solution
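a MATLAB check of the inverse in the example, and its use in solving Ax = b (the right-hand side is illustrative):

    A    = [-1 1 -3; 1 -1 1; 2 2 2];
    Ainv = [2 4 1; 0 -2 1; -2 -2 0] / 4;

    A*Ainv, Ainv*A        % both equal the 3 x 3 identity

    b = [1; 2; 3];
    x = A \ b;            % solves Ax = b; preferred over inv(A)*b in practice
    norm(A*x - b)         % numerically zero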
Linear combination

a linear combination of vectors a1, . . . , an is a sum of scalar-vector products

x1a1 + x2a2 + · · · + xnan

• the scalars xi are the coefficients of the linear combination
• the trivial linear combination has coefficients x1 = · · · = xn = 0

Linearly dependent vectors

a collection of vectors a1, a2, . . . , an is linearly dependent if

x1a1 + x2a2 + · · · + xnan = 0

for some scalars x1, . . . , xn, not all zero

• if xi ≠ 0, then ai can be expressed as a linear combination of the other vectors

Example: the vectors

a_1 = \begin{bmatrix} 0.2 \\ -7 \\ 8.6 \end{bmatrix}, \qquad a_2 = \begin{bmatrix} -0.1 \\ 2 \\ -1 \end{bmatrix}, \qquad a_3 = \begin{bmatrix} 0 \\ -1 \\ 2.2 \end{bmatrix}

are linearly dependent:

0 = a1 + 2a2 − 3a3

• a1 can be expressed as a linear combination of a2, a3:

a1 = −2a2 + 3a3

(and similarly a2 and a3)

Linearly independent vectors

vectors a1, . . . , an are linearly independent if they are not linearly dependent

• the zero vector cannot be written as a nontrivial linear combination:

x1a1 + x2a2 + · · · + xnan = 0 =⇒ x1 = x2 = · · · = xn = 0
Example: the vectors

a_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix}, \qquad a_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad a_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}

are linearly independent:

x_1 a_1 + x_2 a_2 + x_3 a_3 = \begin{bmatrix} x_1 - x_2 \\ -2x_1 + x_3 \\ x_2 + x_3 \end{bmatrix} = 0

only if x1 = x2 = x3 = 0

Matrix with linearly independent columns

A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}

has linearly independent columns if Ax = 0 only for x = 0

if n vectors a1, a2, . . . , an of length m are linearly independent, then n ≤ m (proof is in textbook); consequently,

• if an m × n matrix has linearly independent columns then m ≥ n
• if an m × n matrix has linearly independent rows then m ≤ n
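both examples above can be checked numerically via the rank (a minimal sketch; note that rank is a floating-point test, not an exact one):

    A1 = [0.2 -0.1 0; -7 2 -1; 8.6 -1 2.2];   % columns a1, a2, a3 (dependent)
    rank(A1)                                  % 2: the columns are dependent

    A2 = [1 -1 0; -2 0 1; 0 1 1];             % columns a1, a2, a3 (independent)
    rank(A2)                                  % 3: the columns are independent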
Left invertible matrices

for a matrix A, the following are equivalent:

1. A is left invertible
2. the columns of A are linearly independent

(1 ⇒ 2: if X is a left inverse of A and Ax = 0, then x = XAx = 0)

Nonsingular matrices

a square matrix with linearly independent columns is invertible (nonsingular); in particular it is right invertible: for any n-vector b, the n + 1 vectors a1, . . . , an, b of length n are linearly dependent, so

x1a1 + · · · + xnan + xn+1b = 0

for some x1, . . . , xn+1, not all zero; we must have xn+1 ≠ 0 because a1, . . . , an are linearly independent; hence b = Ax with x = −(1/xn+1)(x1, . . . , xn), and solving Ax = ei for each unit vector ei gives a right inverse

Example: the Vandermonde matrix below is nonsingular because its columns are linearly independent
A = \begin{bmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-1} \end{bmatrix} \quad \text{with } t_i \neq t_j \text{ for } i \neq j

we show that A is nonsingular by showing that Ax = 0 only if x = 0

• Ax = 0 means p(t1) = p(t2) = · · · = p(tn) = 0, where p(t) = x1 + x2t + · · · + xnt^{n−1}
• if x ≠ 0, then p(t) cannot have more than n − 1 distinct real roots
• since t1, . . . , tn are n distinct roots of p, we must have x = 0

Transpose and conjugate transpose

if A is nonsingular, then A^T and A^H are nonsingular and

(A^T)^{−1} = (A^{−1})^T, \qquad (A^H)^{−1} = (A^{−1})^H

we write these as A^{−T} and A^{−H}

Inverse of product: if A and B are nonsingular matrices of equal size, then AB is nonsingular with

(AB)^{−1} = B^{−1}A^{−1}
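a numerical illustration of the Vandermonde matrix above (a minimal sketch; the points and values are illustrative, and t.^(0:n-1) uses implicit expansion, available in MATLAB R2016b and later):

    t = [0; 1; 2; 3];            % distinct points t1, ..., tn
    n = length(t);
    A = t .^ (0:n-1);            % A(i,j) = t(i)^(j-1): Vandermonde matrix

    rank(A)                      % n: nonsingular since the points are distinct

    y = [1; 2; 0; 5];            % values to interpolate
    x = A \ y;                   % coefficients of p, ascending powers
    polyval(flip(x), t) - y      % zero: p(t(i)) = y(i) (polyval is descending)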
Gram matrix

• if the columns of A ∈ Rm×n are linearly independent, then the Gram matrix A^T A is nonsingular:

A^T Ax = 0 =⇒ x^T A^T Ax = ‖Ax‖^2 = 0 =⇒ Ax = 0 =⇒ x = 0

• conversely, suppose the columns of A ∈ Rm×n are linearly dependent; then

∃x ≠ 0, Ax = 0 =⇒ ∃x ≠ 0, A^T Ax = 0

therefore A^T A is singular

(for A ∈ Cm×n, replace A^T with A^H and x^T with x^H)

Pseudo-inverse

the pseudo-inverse of a matrix A with linearly independent columns is

A† = (A^T A)^{−1}A^T

• this matrix exists, because the Gram matrix A^T A is nonsingular
• A† is a left inverse of A:

A†A = (A^T A)^{−1}(A^T A) = I

(for complex A with linearly independent columns, A† = (A^H A)^{−1}A^H)
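the pseudo-inverse in MATLAB (a minimal sketch, reusing the left-invertible example from earlier):

    A = [-3 -4; 4 6; 1 1];        % linearly independent columns

    Adag = (A'*A) \ A';           % (A^T A)^{-1} A^T, via backslash
    Adag * A                      % 2 x 2 identity: a left inverse of A

    norm(Adag - pinv(A))          % agrees with the built-in pinv here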
Summary

for a matrix A, the following three statements are equivalent:

1. A is left invertible
2. the columns of A are linearly independent
3. the Gram matrix A^T A is nonsingular

• 1 ⇒ 2: proved above
• 2 ⇒ 1: we have seen that the pseudo-inverse is a left inverse
• 2 ⇔ 3: proved above (Gram matrix)