HKU MATH1853 - Brief Linear Algebra Notes
September - December 2021 Semester
1 Eigenvalues and Eigenvectors
1.1 Eigenvalues and Eigenvectors
Given a square matrix A, an eigenvector is a non-zero vector x that satisfies
Ax = λx for some scalar λ. In other words, an eigenvector is a vector whose
direction is unchanged when multiplied by the matrix (or exactly reversed, when
λ is negative); only its magnitude is scaled. We call this scalar λ the
eigenvalue associated with x.
1.2 Finding Eigenvalues
λ being an eigenvalue means (A − λI)x = 0 for some non-zero x. This means
x is in the kernel (or null space) of (A − λI). But we know the kernel contains
non-zero vectors iff det(A − λI) = 0. Hence the eigenvalues of an n × n matrix A
are precisely the roots of the polynomial pA (λ) = det(A − λI). This polynomial
is known as the characteristic polynomial. The equation pA (λ) = 0 is called the
characteristic equation.
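As a quick numerical illustration, here is a minimal sketch in Python (assuming numpy is available; the matrix is an arbitrary example) comparing the roots of the characteristic polynomial with the eigenvalues returned by a library routine:

import numpy as np

# Arbitrary example matrix.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.poly of a square matrix gives the coefficients of its
# characteristic polynomial; its roots are the eigenvalues.
coeffs = np.poly(A)
roots = np.roots(coeffs)

eigvals = np.linalg.eigvals(A)   # eigenvalues computed directly

print(np.sort(roots))            # [2. 5.]
print(np.sort(eigvals))          # [2. 5.]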
1.3 Multiplicity of Eigenvalues
If (x − a)^k is a factor of the polynomial p(x) but (x − a)^{k+1} is not, we say
the root x = a has multiplicity k.
For an eigenvalue λ, we call the multiplicity of the root λ in the characteristic
equation λ’s algebraic multiplicity.
The vector space spanned by the eigenvectors corresponding to λ is called
the eigenspace of λ. Its dimension (the number of linearly independent
eigenvectors corresponding to λ) is called the geometric multiplicity of λ.
The geometric multiplicity is always less than or equal to the algebraic
multiplicity of an eigenvalue. We call the difference between the algebraic
multiplicity and the geometric multiplicity of an eigenvalue its defect.
A set of eigenvectors that each correspond to distinct eigenvalues is always
linearly independent.
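As a sketch of how these quantities can be computed numerically (Python with numpy assumed; the matrix is a deliberately defective example), the geometric multiplicity is the dimension of the null space of A − λI:

import numpy as np

# Example of a defective matrix: eigenvalue 2 has algebraic
# multiplicity 2 but only one independent eigenvector.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0
n = A.shape[0]

# Algebraic multiplicity: how many times lam appears as an eigenvalue.
alg_mult = int(np.sum(np.isclose(np.linalg.eigvals(A), lam)))

# Geometric multiplicity: dimension of the null space of (A - lam I),
# i.e. n minus the rank of (A - lam I).
geo_mult = n - np.linalg.matrix_rank(A - lam * np.eye(n))

print(alg_mult, geo_mult, alg_mult - geo_mult)   # 2 1 1, so the defect is 1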
1.4 Diagonalization
Two n × n matrices P and Q are said to be similar if there exists an invertible
matrix U such that:
P = U^{-1} Q U
Suppose the n × n matrix A has eigenvectors x1, x2, ..., xn with corresponding
eigenvalues λ1, λ2, ..., λn. Let X be the matrix with these eigenvectors as
its columns, i.e.

X = (x1, x2, ..., xn)
Then we have:
AX = (λ1 x1, λ2 x2, ..., λn xn) = X \begin{pmatrix} λ1 & 0 & \cdots & 0 \\ 0 & λ2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λn \end{pmatrix}
Let

D = \begin{pmatrix} λ1 & 0 & \cdots & 0 \\ 0 & λ2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λn \end{pmatrix}
We say D is a diagonal matrix, as it is zero everywhere except on its diagonal.
If a matrix is similar to a diagonal matrix, we say it is diagonalizable.
If the eigenvectors are linearly independent, then X is invertible, so we have:

A = X D X^{-1}

and A is diagonalized by X; that is, A is diagonalizable.
In general, a matrix is diagonalizable iff its eigenvalues all have 0 defect.
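A minimal numpy sketch of this factorization (the matrix is an arbitrary example with distinct eigenvalues, so its eigenvectors are automatically independent):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig returns the eigenvalues and a matrix X whose
# columns are the corresponding eigenvectors.
eigvals, X = np.linalg.eig(A)
D = np.diag(eigvals)

# Since X is invertible here, A = X D X^{-1}.
A_rebuilt = X @ D @ np.linalg.inv(X)
print(np.allclose(A, A_rebuilt))   # True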
2 Inner Product and Quadratic Forms
2.1 Inner Product
Given two vectors a, b in R^n, the inner product is defined as:

a · b = a^T b = \begin{pmatrix} a1 & a2 & \cdots & an \end{pmatrix} \begin{pmatrix} b1 \\ b2 \\ \vdots \\ bn \end{pmatrix} = a1 b1 + a2 b2 + ... + an bn
Note that this will always be a scalar value and not a vector, i.e. it has a
magnitude, but no direction.
From the definition, it is easy to show that the inner product has the follow-
ing properties, for any real vectors a, b, c, d.
a·b=b·a
(λa) · b = λ(a · b)
a · (b + c) = a · b + a · c
(a + b) · (c + d) = a · c + a · d + b · c + b · d
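These properties are easy to check numerically; a minimal Python sketch with numpy (the vectors are arbitrary examples):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 0.5])

# a . b = a^T b = a1 b1 + a2 b2 + ... + an bn
print(np.dot(a, b))                              # 3.5
print(np.isclose(a @ b, np.sum(a * b)))          # True
print(np.isclose(np.dot(a, b), np.dot(b, a)))    # True: a . b = b . a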
2.2 Norms
We define the norm (or length) of a vector a to be:
|a| = \sqrt{a · a} = \sqrt{a1^2 + a2^2 + ... + an^2}
(Note that this is often written with two vertical lines instead, as ||a||). If
|a| = 1, we say a is a unit vector.
We define the distance between two vectors a,b to be:
dist(a, b) = |a − b| = \sqrt{(a1 − b1)^2 + (a2 − b2)^2 + ... + (an − bn)^2}
We can show the following properties are true:
|a| ≥ 0, with |a| = 0 if and only if a = 0
dist(a, b) = dist(b, a)
|a · b| ≤ |a||b| (Cauchy-Schwarz Inequality)
|a + b| ≤ |a| + |b| (Triangle Inequality)
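A quick numerical check of the norm, the distance, and the two inequalities (numpy assumed; example vectors):

import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, -2.0])

print(np.sqrt(np.dot(a, a)), np.linalg.norm(a))   # 5.0 5.0, i.e. |a| = sqrt(a . a)

# dist(a, b) = |a - b| is symmetric in a and b.
print(np.isclose(np.linalg.norm(a - b), np.linalg.norm(b - a)))   # True

# Cauchy-Schwarz and triangle inequalities.
print(abs(np.dot(a, b)) <= np.linalg.norm(a) * np.linalg.norm(b))        # True
print(np.linalg.norm(a + b) <= np.linalg.norm(a) + np.linalg.norm(b))    # True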
2.3 Angle between Vectors
Let a and b be non-zero vectors in R^2 or R^3. Draw them starting from the
same point, so that the vector from a to b is b − a, and write the angle
between them as θ.
From the cosine rule we have:

|b − a|^2 = |a|^2 + |b|^2 − 2|a||b| cos θ
We rewrite |b − a|^2 using the inner product, to make the substitution:

|b − a|^2 = (b − a) · (b − a) = |a|^2 + |b|^2 − 2(a · b)
Hence,

a · b = |a||b| cos θ

Rearranging gives us:

θ = arccos((a · b) / (|a||b|))
Note that this always gives a value between 0 and 180 degrees (or 0 and π
radians), i.e. the smaller angle between a and b, not the reflex angle.
Using what we know about the cosine function, we can deduce that, for non-zero
vectors a and b:
a · b = 0 ⇐⇒ a and b are perpendicular.
0 < a · b < |a||b| ⇐⇒ a and b are at an acute angle to each other.
−|a||b| < a · b < 0 ⇐⇒ a and b are at an obtuse angle to each other.
a · b = |a||b| ⇐⇒ a and b are parallel and in the same direction.
a · b = −|a||b| ⇐⇒ a and b are parallel and in the opposite direction.
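A small numpy sketch of the angle formula (example vectors; np.clip guards against rounding pushing the ratio just outside [−1, 1]):

import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))   # angle in radians

print(np.degrees(theta))   # approximately 45 degrees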
2.4 Orthogonal Vectors
A set of non-zero vectors {u1, u2, ..., up} in R^n is said to be orthogonal if
ui · uj = 0 for all i ≠ j. In R^2 and R^3 this can be thought of as the vectors
being at right angles to each other.
If, additionally, the ui are unit vectors, then ui · uj = 1 if i = j and 0
otherwise, and we say that the set is orthonormal.
Any set of orthogonal vectors is linearly independent. However, the converse
is not true: Linearly independent vectors are not necessarily orthogonal.
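A quick check of orthogonality and orthonormality via the matrix of pairwise inner products (a sketch with numpy; the vectors are an example orthonormal pair in R^2):

import numpy as np

u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0]) / np.sqrt(2)
U = np.column_stack([u1, u2])

# Entry (i, j) of U^T U is the inner product ui . uj.
G = U.T @ U
print(np.allclose(G, np.eye(2)))   # True: the set is orthonormal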
2.5 Projection
Suppose we have a vector y in R^n, and we want to see 'how far it goes' in some
direction u. More formally, we want to write y as:
y = αu + z
where z is orthogonal to u. α is called the scalar projection of y onto u.
Taking the inner product with u on both sides of the equation eliminates z,
and gives:
y · u = α|u|^2
Rearranging gives:
α = (y · u) / |u|^2
In particular, if we have an orthogonal basis of {u1 , u2 , ..., up }, we can write
any vector y in its span by:
y = c1 u1 + c2 u2 + ... + cp up
where
ci = (y · ui) / |ui|^2.
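A short numpy sketch of the projection formula and of expanding a vector in an orthogonal basis (example vectors in R^2):

import numpy as np

y = np.array([3.0, 1.0])
u = np.array([1.0, 1.0])

# Scalar projection of y onto u: alpha = (y . u) / |u|^2.
alpha = np.dot(y, u) / np.dot(u, u)
z = y - alpha * u
print(np.isclose(np.dot(z, u), 0.0))   # True: the remainder is orthogonal to u

# Expansion in the orthogonal basis {u1, u2}: y = c1 u1 + c2 u2.
u1 = np.array([1.0, 1.0])
u2 = np.array([1.0, -1.0])
c1 = np.dot(y, u1) / np.dot(u1, u1)
c2 = np.dot(y, u2) / np.dot(u2, u2)
print(np.allclose(y, c1 * u1 + c2 * u2))   # True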
2.6 Orthonormal Matrices
If an n × n matrix U satisfies U U^T = I, we say it is an orthonormal matrix.
Equivalently, a square matrix U is orthonormal if its columns form an
orthonormal set of vectors.
The following are true iff the n × n matrix U is orthonormal:

For any x in R^n: |Ux| = |x|
For any x, y in R^n: (Ux)^T (Uy) = x^T y
In other texts, orthonormal matrices may be called orthogonal matrices.
Despite this, note that orthogonal and orthonormal vectors are not the same.
The columns of an orthogonal matrix need to be orthonormal, which is a stricter
condition than them being orthogonal!
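A numpy check of the defining property and the two equivalent conditions above, using a rotation matrix as an example of an orthonormal matrix:

import numpy as np

theta = 0.3   # any angle; 2D rotation matrices are orthonormal
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(U @ U.T, np.eye(2)))   # True: U U^T = I

x = np.array([2.0, -1.0])
y = np.array([0.5, 3.0])
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # True: |Ux| = |x|
print(np.isclose((U @ x) @ (U @ y), x @ y))                   # True: (Ux)^T (Uy) = x^T y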
2.7 Symmetric Matrices
A square matrix A is symmetric if A^T = A, and anti-symmetric if A^T = −A.
The diagonal entries of an anti-symmetric matrix are all 0, while the diagonal
entries of a symmetric matrix can take any value.
Any square matrix B can be written as a sum of symmetric and anti-symmetric
parts by:

B = (1/2)(B + B^T) + (1/2)(B − B^T)

Here (1/2)(B + B^T) is symmetric, while (1/2)(B − B^T) is anti-symmetric.
Eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are
always orthogonal, and in fact a real symmetric matrix always has a full set of
orthonormal eigenvectors, though the proof of this is challenging. An immediate
consequence is that a symmetric matrix is always diagonalizable.
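A numpy sketch of the symmetric / anti-symmetric decomposition and of the orthogonality of the eigenvectors of a symmetric matrix (example matrix):

import numpy as np

B = np.array([[1.0, 2.0],
              [5.0, 3.0]])   # arbitrary square matrix

S = 0.5 * (B + B.T)          # symmetric part
A = 0.5 * (B - B.T)          # anti-symmetric part
print(np.allclose(S, S.T), np.allclose(A, -A.T), np.allclose(B, S + A))
# True True True

# For a symmetric matrix, np.linalg.eigh returns an orthonormal
# matrix of eigenvectors.
eigvals, V = np.linalg.eigh(S)
print(np.allclose(V.T @ V, np.eye(2)))   # True: the eigenvectors are orthonormal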
2.8 Quadratic Forms
Given a square matrix B and a vector x in R^n, we may be interested in the
quantity x^T B x.
Decomposing the matrix B into its symmetric part S = (1/2)(B + B^T) and
anti-symmetric part A = (1/2)(B − B^T) changes our expression to
x^T (S + A) x = x^T S x, since x^T A x = 0.
Expressions of the form:

Q(x) = x^T S x

where S is a real symmetric matrix, are known as quadratic forms.
We can diagonalize S and write S = U D U^T, with U an orthonormal matrix and
D a diagonal matrix with the eigenvalues of S along its diagonal. Thus Q(x)
becomes:

Q(x) = x^T U D U^T x = (U^T x)^T D (U^T x)
The columns of U are called the principal axes. This substitution can be
thought of intuitively as a change of basis.
If we write y = U^T x, then we get

Q(x) = λ1 y1^2 + λ2 y2^2 + ... + λn yn^2

where the λi are the eigenvalues of S. Since U is orthonormal, |y| = |x|, so
this also tells us that the maximum and minimum of the quadratic form subject
to |x| = 1 are the largest and smallest eigenvalues of S respectively.
When all λi > 0 or all λi < 0, we say S is positive definite or negative
definite respectively. If there are both positive and negative eigenvalues, we say
S is indefinite. As an extension to the above, if all λi ≥ 0 or all λi ≤ 0 we say
S is positive semidefinite or negative semidefinite respectively.
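A numpy sketch tying these pieces together: evaluating a quadratic form, checking the principal-axes expression, and reading off definiteness from the eigenvalues (example symmetric matrix):

import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # example real symmetric matrix

eigvals, U = np.linalg.eigh(S)   # S = U D U^T with U orthonormal
x = np.array([0.6, 0.8])         # a unit vector, |x| = 1

Q = x @ S @ x                    # the quadratic form x^T S x
y = U.T @ x                      # coordinates along the principal axes
print(np.isclose(Q, np.sum(eigvals * y**2)))   # True: Q = sum of lambda_i y_i^2

# The extreme values of Q over |x| = 1 are the extreme eigenvalues.
print(eigvals.min() <= Q <= eigvals.max())     # True

# Definiteness from the signs of the eigenvalues.
if np.all(eigvals > 0):
    print("S is positive definite")
elif np.all(eigvals < 0):
    print("S is negative definite")
else:
    print("S is indefinite or semidefinite")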