Matrix in English
Marian Vavra
e-mail: [email protected]
September 2007
Contents
1 Matrix algebra
1.1 Introduction to matrix algebra
1.2 Trace
1.3 Rank
1.4 Determinant
1.5 Inverse
1.6 Eigenvalues
1.7 Quadratic forms
1.8 Orthogonality
1.9 Special matrices
1.10 Special operators
1.11 VEC and VECH operators
1.12 Matrix decomposition
1.13 Matrix differentiation
∗ These lecture notes are intended for teaching purposes only.
1 Matrix algebra
1.1 Introduction to matrix algebra
Definition 1.1 (Matrix): A matrix is a rectangular array of (real) numbers.
More generally,
$$\mathbf{A} = (a_{ij}) = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$
is a matrix of dimension (m × n), that is, with m rows and n columns. The numbers aij are called the elements or components of the matrix A. We assume implicitly that all elements aij ∈ R, unless otherwise stated. Following the usual conventions of matrix algebra, matrices are denoted by capital letters (e.g. Am×n or simply A) and vectors by lower-case letters (e.g. bm×1 or simply b). If not otherwise noted, all vectors are assumed to be column vectors.
A square matrix whose elements above (below) the main diagonal are all zero is called a lower (upper) triangular matrix. The transpose of a matrix A = (aij) of dimension (m × n) is the matrix of dimension (n × m) denoted A′ = (aji). The matrix A is said to be symmetric if A = A′.
Rule 1.1 (Matrix operations): Assume conformable matrices A, B, C; then the following hold (a numerical sketch follows the list):
• A + B = B + A
• A(B + C) = AB + AC
• AB ≠ BA in general
• AI = IA = A
• (AB)′ = B′A′
• (A + B)′ = A′ + B′
• AA′ = A′A for symmetric matrices
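A minimal numerical sketch of these rules in Python with NumPy; the matrices A and B and the seed are arbitrary illustrative choices, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
I = np.eye(3)

print(np.allclose((A @ B).T, B.T @ A.T))             # (AB)' = B'A': True
print(np.allclose(A @ B, B @ A))                     # AB = BA fails in general: False
print(np.allclose(A @ I, A), np.allclose(I @ A, A))  # AI = IA = A: True True
```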
1.2 Trace
Definition 1.2 (Trace): The trace of a square (n×n) matrix A is the sum of its diagonal elements:
tr(A) = a11 + · · · + ann .
Rule 1.2 (Trace operations): Assume matrices An×n, Bn×n, Cm×n, Dn×m, and let λ1, . . . , λn be the eigenvalues of the matrix A (a numerical sketch follows the list):
• tr(A + B) = tr(A) + tr(B)
• tr(A) = tr(A′)
• tr(CD) = tr(DC)
• tr(A) = λ1 + · · · + λn
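A small NumPy sketch of the trace rules with arbitrary random matrices; note that tr(CD) = tr(DC) holds even though CD and DC have different dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
C = rng.standard_normal((3, 4))
D = rng.standard_normal((4, 3))

# tr(CD) = tr(DC), although CD is (3 x 3) and DC is (4 x 4)
print(np.isclose(np.trace(C @ D), np.trace(D @ C)))              # True
# tr(A) equals the sum of the eigenvalues of A (real up to rounding)
print(np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real))  # True
```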
1.3 Rank
Definition 1.3 (Rank): The rank of a matrix is the maximum number of linearly independent rows or columns of the matrix.
This notion is frequently used when solving systems of linear equations; see the discussion below.
Rule 1.3 (Rank operations): Assume a matrix A of dimension (m × n):
• rk(A) ≤ min(m, n)
• rk(A) = rk(A′)
• rk(A′A) = rk(AA′) = rk(A)
• rk(AB) = rk(A) if B is a nonsingular matrix of dimension (n × n)
• rk(AB) ≤ min(rk(A), rk(B)) if B is a matrix of dimension (n × r)
• rk(C) = n if the matrix C has dimension (n × n) and det(C) ≠ 0
A system of linear equations can be written compactly in matrix form as Ax = b, where A is an (m × n) matrix. Define the augmented matrix Ab, which consists of the original matrix A with the vector b appended as an additional column. It is worth noting that rk(Ab) ≤ rk(A) + 1. Using the rank, we can classify the solutions of the system as follows (a numerical sketch follows the list):
• rk(A) = rk(Ab) = n =⇒ the system has exactly one solution
• rk(A) = rk(Ab) = k < n =⇒ the system has infinitely many solutions, depending on the values of n − k free variables
• rk(A) ≠ rk(Ab) =⇒ the system has no solution
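This classification can be sketched numerically; the system below is a hypothetical example chosen only for illustration, and np.linalg.matrix_rank computes the rank.

```python
import numpy as np

# hypothetical system Ax = b, for illustration only
A = np.array([[1., 2., 3.],
              [0., 1., 4.],
              [5., 6., 0.]])
b = np.array([6., 5., 11.])

Ab = np.column_stack([A, b])        # augmented matrix (A | b)
rk_A = np.linalg.matrix_rank(A)
rk_Ab = np.linalg.matrix_rank(Ab)

if rk_A == rk_Ab == A.shape[1]:
    print("unique solution:", np.linalg.solve(A, b))
elif rk_A == rk_Ab:
    print("infinitely many solutions")
else:
    print("no solution")
```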
1.4 Determinant
Determinants are mathematical objects that are very useful in the analysis and solution of systems of linear equations. Geometrically, the absolute value of the determinant gives the area of the parallelogram spanned by the row vectors (for a matrix of order (2 × 2)), or the volume of the parallelepiped spanned by those vectors (for a matrix of order (n × n) with n > 2); see the example below. Determinants are defined only for square matrices. If the determinant of a matrix is 0, the matrix is said to be singular, and if the determinant is 1, the matrix is said to be unimodular. The determinant of a general (n × n) matrix A is defined as:
$$\det(\mathbf{A}) = \sum_{j=1}^{n} a_{ij} C_{ij} \quad \text{for some } i \in \{1, \dots, n\},$$
$$\det(\mathbf{A}) = \sum_{i=1}^{n} a_{ij} C_{ij} \quad \text{for some } j \in \{1, \dots, n\},$$

where $C_{ij}$ is the cofactor of $a_{ij}$, which is the adjusted (signed) minor: $C_{ij} = (-1)^{i+j} M_{ij}$.
Example 1.2 (Interpretation of determinants): Find the area of the triangle ABC having vertices A(1, 2), B(−3, 4), C(2, 4). Since the area of the triangle ABC is half of the area of the parallelogram spanned by the two edge vectors, the area of the triangle is half of the absolute value of the determinant of the matrix E formed by those vectors. The two vectors spanning the parallelogram are u = AB = (−3 − 1, 4 − 2) = (−4, 2) and v = AC = (2 − 1, 4 − 2) = (1, 2). Hence,

$$\text{area of } ABC = \frac{1}{2}\,|\det \mathbf{E}| = \frac{1}{2}\left|\det \begin{pmatrix} \mathbf{u} \\ \mathbf{v} \end{pmatrix}\right| = \frac{1}{2}\left|\det \begin{pmatrix} -4 & 2 \\ 1 & 2 \end{pmatrix}\right| = \frac{|-10|}{2} = 5.$$
Rule 1.4 (Determinant operations): Assume matrices A, B of dimension (n × n) and a scalar c (a numerical sketch follows the list):
• det(In) = 1
• det(A) = a11 · · · ann if the matrix A is diagonal, lower triangular, or upper triangular
• det(A) = 0 if some rows and/or columns are linearly dependent
• det(cA) = c^n det(A)
• det(A) = det(A′)
• det(A + B) ≠ det(A) + det(B) in general
• det(AB) = det(A) det(B)
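A minimal NumPy sketch of selected determinant rules, with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
c = 2.0

print(np.isclose(np.linalg.det(c * A), c**n * np.linalg.det(A)))              # det(cA) = c^n det(A)
print(np.isclose(np.linalg.det(A), np.linalg.det(A.T)))                       # det(A) = det(A')
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # det(AB) = det(A)det(B)
```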
Figure 1: Interpretation of the determinant
Some special determinants are used in the mathematical literature, for instance for checking the definiteness of a particular matrix or quadratic form. They are called principal minors and leading principal minors, denoted ∆k and Dk, respectively. The leading principal minors of an (n × n) matrix A = (aij) are calculated as:

$$D_1 = |a_{11}|, \quad D_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad D_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \dots, \quad D_n = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$
The principal minors are the various determinants of order k, k = 1, . . . , n, generated from A by selecting k rows and columns with the same indices:

$$\Delta_1^{(1)} = |a_{11}|, \quad \Delta_1^{(2)} = |a_{22}|, \quad \dots, \quad \Delta_1^{(n)} = |a_{nn}|,$$
$$\Delta_2^{(1)} = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \Delta_2^{(2)} = \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}, \quad \Delta_2^{(3)} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \quad \dots,$$
$$\Delta_3^{(1)} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \dots, \quad \Delta_n^{(1)} = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$
An arbitrary leading principal minor of order k is calculated as the determinant of the submatrix formed by the first (leading) k rows and columns of the square matrix. An arbitrary principal minor of order k, on the other hand, is obtained as the determinant of the submatrix that remains after deleting all but k rows and columns with the same indices.
1.5 Inverse
Definition 1.4 (Inverse matrix): An (n × n) square matrix A is called invertible (regular, nonsingular) if there exists a unique matrix B of the same dimension such that AB = In. The matrix B is then called the inverse matrix of A.
The inverse matrix of A is denoted A⁻¹. In practice, an inverse matrix is calculated using the determinant and the adjoint matrix A∗, which is the transposed matrix of cofactors:

$$\mathbf{A}^{-1} = \det(\mathbf{A})^{-1}\,\mathbf{A}^{*}.$$
Rule 1.5 (Inverse operations): Assume invertible matrices A, B of dimension (n × n) and a nonzero scalar c (a numerical sketch follows the list):
• (A′)⁻¹ = (A⁻¹)′
• (AB)⁻¹ = B⁻¹A⁻¹
• (cA)⁻¹ = (1/c)A⁻¹
• In⁻¹ = In
• if A is a diagonal matrix, then A⁻¹ is also a diagonal matrix, with diagonal elements 1/aii
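A short sketch of the inverse rules with arbitrary (almost surely nonsingular) random matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (AB)^(-1) = B^(-1) A^(-1)
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))  # True
# (A')^(-1) = (A^(-1))'
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))                     # True
```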
Theorem 1.1 (Generalized inverse operations): Let A and B be matrices of dimensions (m × n) and (n × m), respectively. Then B is a (Moore–Penrose) generalized inverse of A if:
• ABA = A
• BAB = B
• (AB)′ = AB
• (BA)′ = BA
1.6 Eigenvalues
Definition 1.5 (Eigenvalues): The eigenvalues (characteristic values, characteristic roots) of an (m × m) matrix A are the roots of the polynomial in λ given by:

$$\det(\mathbf{A} - \lambda \mathbf{I}_m) = 0.$$

Rule 1.6 (Eigenvalue properties): Some basic properties of eigenvalues are as follows (a numerical sketch follows the list):
• if A is symmetric, then λi ∈ R for all i
• the eigenvalues of a diagonal, upper triangular, or lower triangular matrix are its diagonal elements
• a matrix of dimension (m × m) has at most m distinct eigenvalues
• if λ1, . . . , λm are the eigenvalues of a particular matrix A, then the determinant of the matrix is given by det(A) = λ1 · · · λm
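A small sketch of these properties for a hand-picked symmetric (2 × 2) matrix:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])    # symmetric, so the eigenvalues are real

lam = np.linalg.eigvals(A)
print(np.sort(lam.real))                                # [1. 3.]
print(np.isclose(np.prod(lam).real, np.linalg.det(A)))  # det(A) = product of eigenvalues
```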
1.7 Quadratic forms

Definition (Quadratic form): A quadratic form is a function of the type

$$Q(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\, x_i x_j,$$

where A = (aij) is an (n × n) matrix of constants and x = (x1, . . . , xn)′ is an (n × 1) vector. Moreover, if aij = aji, Q(x) is called a symmetric quadratic form.
Example 1.3 (Quadratic form): Let us consider the quadratic form

$$Q(\mathbf{x}) = 3x_1^2 + x_2^2 + 8x_3^2 + 6x_1x_3 - 4x_2x_3,$$

which can be written in matrix form as Q(x) = x′Ax, where both components are defined as:

$$\mathbf{x}_{3 \times 1} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad \mathbf{A}_{3 \times 3} = \begin{pmatrix} 3 & 0 & 3 \\ 0 & 1 & -2 \\ 3 & -2 & 8 \end{pmatrix}.$$
We are often interested in certain properties of Q(x). One of them is definiteness, the property ensuring that Q(x) has the same sign for all x.
Definition 1.6 (Definiteness of the quadratic form): A quadratic form Q(x) = x′Ax, as well as the associated matrix A, is said to be:
• positive definite (PD) iff Q(x) > 0 for all x ≠ 0
• positive semidefinite (PSD) iff Q(x) ≥ 0 for all x
• negative definite (ND) iff Q(x) < 0 for all x ≠ 0
• negative semidefinite (NSD) iff Q(x) ≤ 0 for all x
• indefinite (IND) iff Q(x) > 0 for some x and Q(y) < 0 for some y
Theorem 1.2 (Definiteness of the quadratic form): A quadratic form Q(x) = x′Ax with an associated symmetric matrix A is:
• PD ⇐⇒ Dk > 0 for k = 1, . . . , n
• PSD ⇐⇒ ∆k ≥ 0 for all principal minors of order k = 1, . . . , n
• ND ⇐⇒ (−1)^k Dk > 0 for k = 1, . . . , n
• NSD ⇐⇒ (−1)^k ∆k ≥ 0 for all principal minors of order k = 1, . . . , n
Theorem 1.3 (Definiteness of the quadratic form): Let Q(x) = x′Ax be a quadratic form, where A is a symmetric matrix with real eigenvalues λ = (λ1, . . . , λn)′. Then (a numerical sketch follows the list):
• Q is PD ⇐⇒ λi > 0 for all i
• Q is PSD ⇐⇒ λi ≥ 0 for all i
• Q is ND ⇐⇒ λi < 0 for all i
• Q is NSD ⇐⇒ λi ≤ 0 for all i
• Q is IND ⇐⇒ λi > 0 for some i and λj < 0 for some j
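A sketch of the eigenvalue criterion applied to the matrix from Example 1.3, cross-checked against the leading principal minors:

```python
import numpy as np

# the symmetric matrix from Example 1.3
A = np.array([[3., 0., 3.],
              [0., 1., -2.],
              [3., -2., 8.]])

lam = np.linalg.eigvalsh(A)    # eigvalsh: real eigenvalues of a symmetric matrix
print("positive definite:", bool(np.all(lam > 0)))    # True

# cross-check: all leading principal minors D1, D2, D3 are positive
D = [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]
print("all Dk > 0:", all(d > 0 for d in D))           # True
```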
1.8 Orthogonality
Definition 1.7 (Orthogonal vectors): Two (n × 1) vectors x and y are said to be orthogonal if it holds that x′y = y′x = 0.
Definition 1.8 (Orthonormal vectors): Vectors x and y are called orthonormal if they are orthogonal and each has unit length under the standard (Euclidean) norm: ‖x‖ = ‖y‖ = 1.
1.9 Special matrices

A square matrix A is called idempotent if AA = A, and nilpotent if Ak = 0 for some positive integer k.

Rule 1.7 (Idempotent and nilpotent matrices):
• if a matrix A is idempotent, then the matrix I − A is also idempotent
• if a matrix A is idempotent, then all its eigenvalues are 0 or 1
• if a matrix A is idempotent and symmetric, then rk(A) = tr(A)
• if an (m × m) matrix A is idempotent and rk(A) = m, then it holds that A = Im
• if a matrix A is nilpotent, then all its eigenvalues are 0
1.10 Special operators

Definition 1.9 (Kronecker product): Let A = (aij) be an (m × n) matrix and B a (p × q) matrix; then the Kronecker product A ⊗ B is the (mp × nq) block matrix (aij B).

Rule 1.8 (Kronecker product operations): Basic operations with the Kronecker product are as follows (a numerical sketch follows the list):
• A ⊗ B ≠ B ⊗ A in general
• (A ⊗ B)′ = A′ ⊗ B′
• A ⊗ (B + C) = A ⊗ B + A ⊗ C
• (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹ if both matrices are invertible
• det(A ⊗ B) = det(A)^n det(B)^m, where the matrix A has dimension (m × m) and the matrix B has dimension (n × n)
• tr(A ⊗ B) = tr(A) tr(B) if A, B are square matrices
• the eigenvalues of A ⊗ B are equal to the products λi μj of the eigenvalues λi of A and μj of B
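A sketch of the determinant and trace rules for the Kronecker product via np.kron; here A is (2 × 2), so m = 2, and B is (3 × 3), so n = 3:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 2))    # m = 2
B = rng.standard_normal((3, 3))    # n = 3

K = np.kron(A, B)
# det(A (x) B) = det(A)^n det(B)^m
print(np.isclose(np.linalg.det(K), np.linalg.det(A)**3 * np.linalg.det(B)**2))  # True
# tr(A (x) B) = tr(A) tr(B)
print(np.isclose(np.trace(K), np.trace(A) * np.trace(B)))                       # True
```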
Definition 1.10 (Kronecker sum): Let A, B, C be matrices of arbitrary dimensions; then the operation producing the block-diagonal matrix

$$\mathbf{A} \oplus \mathbf{B} \oplus \mathbf{C} = \begin{pmatrix} \mathbf{A} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{C} \end{pmatrix}$$

is called the Kronecker (direct) sum.
Rule 1.9 (Kronecker sum operations): Basic operations with the Kronecker sum are as follows (a numerical sketch follows the Hadamard product below):
• (A ⊕ B) + (C ⊕ D) = (A + C) ⊕ (B + D) if all matrices are compatible
• (A ⊕ B)(C ⊕ D) = AC ⊕ BD if all matrices are compatible
• (A ⊕ B)⁻¹ = A⁻¹ ⊕ B⁻¹
• det(A ⊕ B) = det(A) det(B) for square matrices only
• A ⊕ (−A) ≠ 0
Definition 1.11 (Hadamard product): Let A and B be matrices of the same order (m × n); then the new matrix of dimension (m × n) generated as

$$\mathbf{A} \odot \mathbf{B} = \begin{pmatrix} a_{11}b_{11} & \cdots & a_{1n}b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1}b_{m1} & \cdots & a_{mn}b_{mn} \end{pmatrix}$$

is called the Hadamard product of the matrices A and B.
This operation is sometimes called element-by-element multiplication. Element-by-element division, the analogous operation with division in place of multiplication, is defined for matrices of the same order in the same way.
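A sketch of the two element-by-element operations and of the Kronecker (direct) sum; NumPy's `*` and `/` act element by element, and SciPy's block_diag (assumed available) builds the block-diagonal matrix:

```python
import numpy as np
from scipy.linalg import block_diag   # assumed available, used for the direct sum

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

print(A * B)              # Hadamard product: element-by-element multiplication
print(A / B)              # element-by-element division
print(block_diag(A, B))   # the Kronecker (direct) sum of A and B
```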
1.11 VEC and VECH operators

Definition 1.12 (VEC operator): For an (m × n) matrix A, the vec operator stacks the columns of A, one below the other, into an (mn × 1) vector vec(A).

The vech operator is closely related to the vec operator: it stacks only the elements on and below the main diagonal of a square matrix.

Definition 1.13 (VECH operator): Let A be an (n × n) matrix; then the vech operator is defined as follows:
$$\mathrm{vech}(\mathbf{A}) = \mathrm{vech}\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \\ a_{22} \\ \vdots \\ a_{n2} \\ \vdots \\ a_{nn} \end{pmatrix}.$$
Rule 1.10 (VEC and VECH operations): Let A, B, C be matrices of appropriate dimensions (a numerical sketch follows the list):
• vec(A + B) = vec(A) + vec(B)
• vec(ABC) = (C′ ⊗ A) vec(B)
• vec(AB) = vec(ABI) = (I′ ⊗ A) vec(B) = (I ⊗ A) vec(B)
• vec(AB) = vec(IAB) = (B′ ⊗ I) vec(A)
• vec(B′)′ vec(A) = vec(A′)′ vec(B) = tr(AB) = tr(BA)
• tr(ABC) = vec(A′)′ (C′ ⊗ I) vec(B) = · · · = vec(C′)′ (I ⊗ A) vec(B)
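A sketch of the key identity vec(ABC) = (C′ ⊗ A) vec(B); the helper vec below stacks columns, which is NumPy's column-major ("F") order:

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a column vector."""
    return M.reshape(-1, 1, order="F")

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# vec(ABC) = (C' (x) A) vec(B)
print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))   # True
```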
1.12 Matrix decomposition

Theorem 1.4 (Matrix decomposition): Assume compatible matrices A and P, with P invertible; then the matrices A and P⁻¹AP have the same eigenvalues.
Rule 1.11 (Decomposition rules): Assume matrices A, P of order (n × n), where P is an invertible matrix whose columns are eigenvectors of A, so that AP = PΛ; then it holds that:

$$\mathbf{P}^{-1}\mathbf{A}\mathbf{P} = \mathbf{\Lambda} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$
However, it should be noted that this transformation (diagonalization) does not work in all cases. In general, a matrix A is said to be diagonalizable iff it has a set of n linearly independent eigenvectors, collected as the columns of the matrix P. If the matrix A is symmetric, then all its eigenvalues are real numbers and eigenvectors corresponding to different eigenvalues are orthogonal. Moreover, if the matrix A is diagonalizable, then the relation between the matrices A and P can be used for computing powers of A (a numerical sketch follows):

$$\mathbf{A}^m = \mathbf{P}\mathbf{\Lambda}^m\mathbf{P}^{-1}.$$
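A sketch of diagonalization and the power formula for a hand-picked matrix with distinct eigenvalues (5 and 2):

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])

lam, P = np.linalg.eig(A)    # the columns of P are eigenvectors of A
Lam = np.diag(lam)

# P^(-1) A P = Lambda
print(np.allclose(np.linalg.inv(P) @ A @ P, Lam))                                # True
# A^3 = P Lambda^3 P^(-1)
print(np.allclose(np.linalg.matrix_power(A, 3), P @ Lam**3 @ np.linalg.inv(P)))  # True
```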
For a given symmetric positive definite matrix A, the Cholesky decomposition is an upper triangular matrix U such that A = U′U. Since U′ is a lower triangular matrix, the Cholesky decomposition can also be read as an LU decomposition of the form A = LU with L = U′. Given a matrix A, its QR decomposition is a matrix decomposition of the form A = QR, where R is an upper triangular matrix and Q is an orthogonal matrix satisfying Q′Q = I; both are illustrated in the sketch below.
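A sketch of both decompositions; note that np.linalg.cholesky returns the lower triangular factor L = U′:

```python
import numpy as np

A = np.array([[4., 2.],
              [2., 3.]])           # symmetric positive definite

L = np.linalg.cholesky(A)          # lower triangular factor, L = U'
print(np.allclose(A, L @ L.T))     # A = LL' = U'U: True

Q, R = np.linalg.qr(A)             # QR decomposition
print(np.allclose(A, Q @ R), np.allclose(Q.T @ Q, np.eye(2)))   # True True
```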
1.13 Matrix differentiation
In the following it is assumed that all derivatives exist and are continuous. Let f(β) be a scalar function depending on the (n × 1) vector β. Then the derivative of f(·) with respect to β is defined as:

$$\frac{\partial f}{\partial \boldsymbol{\beta}} = \begin{pmatrix} \frac{\partial f}{\partial \beta_1} \\ \vdots \\ \frac{\partial f}{\partial \beta_n} \end{pmatrix}, \qquad \frac{\partial f}{\partial \boldsymbol{\beta}'} = \left( \frac{\partial f}{\partial \beta_1}, \cdots, \frac{\partial f}{\partial \beta_n} \right).$$
An (n × n) matrix of second derivatives of the function f is called the Hessian matrix:

$$\frac{\partial^2 f}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = \begin{pmatrix} \frac{\partial^2 f}{\partial \beta_1 \partial \beta_1} & \cdots & \frac{\partial^2 f}{\partial \beta_1 \partial \beta_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial \beta_n \partial \beta_1} & \cdots & \frac{\partial^2 f}{\partial \beta_n \partial \beta_n} \end{pmatrix} = \frac{\partial^2 f}{\partial \boldsymbol{\beta}'\,\partial \boldsymbol{\beta}}.$$
If f(A) is a scalar function of an (m × n) matrix A, then:

$$\frac{\partial f}{\partial \mathbf{A}} = \begin{pmatrix} \frac{\partial f}{\partial a_{11}} & \cdots & \frac{\partial f}{\partial a_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial a_{m1}} & \cdots & \frac{\partial f}{\partial a_{mn}} \end{pmatrix}.$$
If y(β) is an (m × 1) vector that depends on the (n × 1) vector β, then:

$$\frac{\partial \mathbf{y}}{\partial \boldsymbol{\beta}'} = \begin{pmatrix} \frac{\partial y_1}{\partial \beta_1} & \cdots & \frac{\partial y_1}{\partial \beta_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial \beta_1} & \cdots & \frac{\partial y_m}{\partial \beta_n} \end{pmatrix}, \qquad \frac{\partial \mathbf{y}'}{\partial \boldsymbol{\beta}} = \left( \frac{\partial \mathbf{y}}{\partial \boldsymbol{\beta}'} \right)'.$$
Let A be an (m × n) matrix and β an (n × 1) vector, with A independent of β; then:

$$\frac{\partial (\mathbf{A}\boldsymbol{\beta})}{\partial \boldsymbol{\beta}'} = \mathbf{A} \qquad \text{and} \qquad \frac{\partial (\mathbf{A}\boldsymbol{\beta})'}{\partial \boldsymbol{\beta}} = \frac{\partial (\boldsymbol{\beta}'\mathbf{A}')}{\partial \boldsymbol{\beta}} = \mathbf{A}'.$$
Rule 1.12 (Basic results of matrix differentiation): Compatible matrices are always assumed; a numerical sketch follows the list.
• Let A be an (m × m) matrix and β an (m × 1) vector, with A independent of β; then

$$\frac{\partial (\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = (\mathbf{A} + \mathbf{A}')\boldsymbol{\beta} \qquad \text{and} \qquad \frac{\partial (\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta})}{\partial \boldsymbol{\beta}'} = \boldsymbol{\beta}'(\mathbf{A}' + \mathbf{A}),$$

$$\frac{\partial^2 (\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = (\mathbf{A} + \mathbf{A}'),$$

which reduces to

$$\frac{\partial^2 (\boldsymbol{\beta}'\mathbf{A}\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = 2\mathbf{A} \qquad \text{if } \mathbf{A} \text{ is symmetric.}$$
• Let α and β be (m × 1) and (n × 1) vectors, and suppose h(α) is a (p × 1) vector function and g(β) is (m × 1). Then differentiating h(α) with respect to β′, where α = g(β), gives the following chain rule:

$$\frac{\partial \mathbf{h}(\boldsymbol{\alpha})}{\partial \boldsymbol{\beta}'} = \frac{\partial \mathbf{h}(\boldsymbol{\alpha})}{\partial \boldsymbol{\alpha}'}\,\frac{\partial \boldsymbol{\alpha}}{\partial \boldsymbol{\beta}'}.$$
• Let c(β) be a vector function of the (m × 1) vector β and Ω a symmetric matrix of constants; then

$$\frac{\partial\, \mathbf{c}(\boldsymbol{\beta})'\boldsymbol{\Omega}\mathbf{c}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}'} = 2\mathbf{c}(\boldsymbol{\beta})'\boldsymbol{\Omega}\,\frac{\partial \mathbf{c}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}'},$$

$$\frac{\partial^2\, \mathbf{c}(\boldsymbol{\beta})'\boldsymbol{\Omega}\mathbf{c}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = 2\left[ \frac{\partial \mathbf{c}(\boldsymbol{\beta})'}{\partial \boldsymbol{\beta}}\,\boldsymbol{\Omega}\,\frac{\partial \mathbf{c}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}'} + \left( \mathbf{c}(\boldsymbol{\beta})'\boldsymbol{\Omega} \otimes \mathbf{I}_m \right) \frac{\partial\, \mathrm{vec}(\partial \mathbf{c}(\boldsymbol{\beta})'/\partial \boldsymbol{\beta})}{\partial \boldsymbol{\beta}'} \right].$$
• If the matrix B depends on β while A and C do not, then:

$$\frac{\partial\, \mathrm{vec}(\mathbf{ABC})}{\partial \boldsymbol{\beta}'} = (\mathbf{C}' \otimes \mathbf{A})\,\frac{\partial\, \mathrm{vec}(\mathbf{B})}{\partial \boldsymbol{\beta}'}.$$
• For an invertible matrix A:

$$\frac{\partial\, \mathrm{vec}(\mathbf{A}^{-1})}{\partial\, \mathrm{vec}(\mathbf{A})'} = -(\mathbf{A}^{-1})' \otimes \mathbf{A}^{-1}.$$
• If A is an (m × m) matrix, then:

$$\frac{\partial\, \mathrm{tr}(\mathbf{A})}{\partial \mathbf{A}} = \mathbf{I}_m.$$

• If A is an (m × n) matrix and B is an (n × m) matrix, then:

$$\frac{\partial\, \mathrm{tr}(\mathbf{AB})}{\partial \mathbf{A}} = \mathbf{B}'.$$
• If A is an (m × m) matrix with adjoint matrix A∗, then:

$$\frac{\partial \det(\mathbf{A})}{\partial \mathbf{A}} = \mathbf{A}^{*\prime}.$$
• If A is a nonsingular (m × m) matrix whose elements depend on a scalar x, then:

$$\frac{\partial \mathbf{A}^{-1}}{\partial x} = -\mathbf{A}^{-1}\,\frac{\partial \mathbf{A}}{\partial x}\,\mathbf{A}^{-1}.$$
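The analytic derivatives above can be sketched numerically; here the gradient ∂(β′Aβ)/∂β = (A + A′)β is checked against a central finite-difference approximation (the matrix, vector, and step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3))
beta = rng.standard_normal(3)

f = lambda b: b @ A @ b            # the quadratic form beta' A beta

grad = (A + A.T) @ beta            # analytic gradient

# central finite-difference approximation of the gradient
h = 1e-6
num = np.zeros(3)
for i in range(3):
    e = np.zeros(3)
    e[i] = h
    num[i] = (f(beta + e) - f(beta - e)) / (2 * h)

print(np.allclose(grad, num, atol=1e-5))   # True
```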