Algebraic Methods in Data Science: Lesson 3: Dan Garber


Algebraic Methods in Data Science: Lesson 3

Faculty of Industrial Engineering and Management


Technion - Israel Institute of Technology

Dan Garber
https://dangar.net.technion.ac.il/

Winter Semester 2020-2021


Range and Nullspace of Matrices

We focus our attention on real-valued m × n matrices. Let us begin by recalling some of the most basic definitions.

Fix an m × n real matrix A. We denote the range (or image) and nullspace (kernel) as

$$R(A) = \mathrm{Im}(A) := \{Ax : x \in \mathbb{R}^n\}; \qquad N(A) = \mathrm{Ker}(A) := \{x \in \mathbb{R}^n : Ax = 0\}.$$

Recall R(A) is a subspace of R^m and N(A) is a subspace of R^n. Recall the rank-nullity theorem:

$$\dim R(A) + \dim N(A) = n; \qquad \dim R(A^\top) + \dim N(A^\top) = m.$$
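As a quick numerical check (a sketch assuming NumPy and SciPy are available; the matrix below is an arbitrary illustrative example), one can compare dim R(A) and dim N(A) with the quantities returned by numpy.linalg.matrix_rank and scipy.linalg.null_space:

```python
import numpy as np
from scipy.linalg import null_space

# An arbitrary 3x5 example matrix (m = 3, n = 5), chosen only for illustration.
A = np.array([[1., 2., 0., 1., 3.],
              [0., 1., 1., 0., 2.],
              [1., 3., 1., 1., 5.]])   # third row = first + second, so rank < 3

m, n = A.shape
rank = np.linalg.matrix_rank(A)        # dim R(A)
nullity = null_space(A).shape[1]       # dim N(A): number of basis vectors of the kernel
print(rank, nullity, rank + nullity == n)            # rank-nullity for A

rank_T = np.linalg.matrix_rank(A.T)
nullity_T = null_space(A.T).shape[1]
print(rank_T, nullity_T, rank_T + nullity_T == m)    # rank-nullity for A^T
```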



Matrix Inner Products and Norms
Let X = R^{m×n}. The standard matrix inner product is defined (similarly to the standard inner product for vectors) as:

$$\langle A, B \rangle = \sum_{i \in [m],\, j \in [n]} A_{i,j} B_{i,j}.$$

It is also common to write A • B. It is not hard to show that (HW):

$$\langle A, B \rangle = \mathrm{Tr}(A^\top B).$$

The Euclidean norm for R^{m×n} is defined similarly to the vector case, and as in the vector case it is induced by the standard inner product.
It is called the Frobenius norm and it is given by

$$\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2} = \sqrt{\mathrm{Tr}(A^\top A)}.$$
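As a sanity check (a small sketch assuming NumPy; the matrices are random and purely illustrative), the identities ⟨A, B⟩ = Tr(A^⊤B) and ‖A‖_F = √Tr(A^⊤A) can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))

inner_elementwise = np.sum(A * B)          # <A, B> = sum_{i,j} A_ij B_ij
inner_trace = np.trace(A.T @ B)            # Tr(A^T B)
print(np.isclose(inner_elementwise, inner_trace))        # True

fro_direct = np.sqrt(np.sum(A**2))         # sqrt(sum_{i,j} A_ij^2)
fro_trace = np.sqrt(np.trace(A.T @ A))     # sqrt(Tr(A^T A))
fro_numpy = np.linalg.norm(A, 'fro')       # NumPy's built-in Frobenius norm
print(np.allclose([fro_direct, fro_trace], fro_numpy))    # True
```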


Matrix Inner Products and Norms


Note that the ℓ_p norms defined for vectors also naturally extend to matrices, i.e., the following are norms:

$$\|A\|_{(p)} = \Big(\sum_{i,j} |A_{i,j}|^p\Big)^{1/p}, \qquad 1 \le p \le \infty.$$

Theorem (HW)
For every (p, q) ∈ [1, ∞] × [1, ∞], the following function is a norm:

$$\|A\|_{p \to q} = \max_{x \in \mathbb{R}^n : \|x\|_p = 1} \|Ax\|_q.$$

‖A‖_{p→q} measures how much A can "amplify", in q-norm, a vector of unit length with respect to the p-norm.

In particular, we will use the notation ‖A‖_p := ‖A‖_{p→p}.


Matrix Inner Products and Norms
Examples (HW): Denote by A_j the jth column of A and by A^j the jth row. Then,

$$\|A\|_1 := \|A\|_{1\to 1} = \max_{\|x\|_1 = 1} \|Ax\|_1 = \max_{j = 1,\dots,n} \|A_j\|_1,$$

$$\|A\|_\infty := \|A\|_{\infty\to\infty} = \max_{\|x\|_\infty = 1} \|Ax\|_\infty = \max_{j = 1,\dots,m} \|A^j\|_1.$$
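A small numerical illustration (a sketch assuming NumPy; the matrix is arbitrary) of the max-column-sum and max-row-sum formulas, compared against NumPy's built-in induced norms:

```python
import numpy as np

A = np.array([[ 1., -2.,  3.],
              [ 0.,  4., -1.],
              [-5.,  2.,  0.]])

max_col_sum = np.max(np.sum(np.abs(A), axis=0))   # max_j ||A_j||_1 over columns
max_row_sum = np.max(np.sum(np.abs(A), axis=1))   # max_j ||A^j||_1 over rows

print(np.isclose(max_col_sum, np.linalg.norm(A, 1)))        # ||A||_{1->1}
print(np.isclose(max_row_sum, np.linalg.norm(A, np.inf)))   # ||A||_{inf->inf}
```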

Of particular interest (as we'll discuss in the sequel) is the so-called spectral norm, which is given by

$$\|A\|_2 := \|A\|_{2\to 2} = \max_{\|x\|_2 = 1} \|Ax\|_2.$$

In particular, as we shall see, this norm is related to the eigenvalues of A^⊤A, since it holds that

$$\|A\|_2 = \sqrt{\max_i \lambda_i(A^\top A)}.$$
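The relation between the spectral norm and the largest eigenvalue of A^⊤A can likewise be checked numerically (a sketch assuming NumPy; the matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

spectral = np.linalg.norm(A, 2)                          # ||A||_{2->2}
via_eigs = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # sqrt(max eigenvalue of A^T A)
print(np.isclose(spectral, via_eigs))                    # True
```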


Recap on Eigenvalues and Eigenvectors of Square Matrices

Definition
Given a real square matrix A ∈ R^{n×n} we say λ is an eigenvalue of A, if there exists a vector u ≠ 0 such that Au = λu.
We say u is an eigenvector corresponding to eigenvalue λ.

Recall that even when A is real, both eigenvalues and eigenvectors need
not be real (i.e., can be complex).

The eigenvalues of a square real matrix A are the roots of the characteristic polynomial: p_A(λ) = det(λI − A).

Recall this is a polynomial of degree n with real coefficients.
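For example (a sketch assuming NumPy; the 3 × 3 matrix below, containing a 2D rotation block, is chosen only to produce complex eigenvalues), the roots of the characteristic polynomial coincide with the eigenvalues returned by a numerical eigensolver:

```python
import numpy as np

# A real matrix with a complex-conjugate pair of eigenvalues (a 2D rotation block).
A = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 2.]])

char_poly = np.poly(A)        # coefficients of det(lambda*I - A)
roots = np.roots(char_poly)   # roots of the characteristic polynomial
eigs = np.linalg.eigvals(A)   # eigenvalues computed directly

# The eigenvalues are 2 and +-i; both printouts agree up to ordering and rounding.
print(np.sort_complex(roots))
print(np.sort_complex(eigs))
```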

Theorem (Fundamental theorem of algebra)


Any matrix A ∈ Rn×n has n (not necessarily real) eigenvalues, counting
multiplicities.



Recap on Eigenvalues and Eigenvectors of Square Matrices

Theorem (Fundamental theorem of algebra)


Any matrix A ∈ Rn×n has n (not necessarily real) eigenvalues, counting
multiplicities.

Recall that for any eigenvalue λ of a real matrix A, the eigenvectors of A that correspond to the eigenvalue λ are the nonzero solutions of the linear system

(λI − A)v = 0,

or equivalently, the nonzero vectors of the nullspace

N(λI − A) = Ker(λI − A).

Hence, the set of all eigenvectors associated with an eigenvalue λ, together with the zero vector, forms a subspace of C^n.

Recap on Eigenvalues and Eigenvectors of Square Matrices

For any eigenvalue λ, denote by µ its number of appearances as a root of the characteristic polynomial det(λI − A) (its algebraic multiplicity), and by v the dimension of the corresponding eigenspace, i.e., v = dim(N(λI − A)) (its geometric multiplicity). Then v ≤ µ.
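A classical example where the inequality is strict is a 2 × 2 Jordan block; the sketch below (assuming NumPy and SciPy) computes both multiplicities numerically:

```python
import numpy as np
from scipy.linalg import null_space

# Jordan block: eigenvalue 2 appears twice as a root of the characteristic polynomial,
# but its eigenspace is only one-dimensional.
A = np.array([[2., 1.],
              [0., 2.]])

lam = 2.0
mu = np.sum(np.isclose(np.linalg.eigvals(A), lam))   # algebraic multiplicity = 2
v = null_space(lam * np.eye(2) - A).shape[1]         # geometric multiplicity = dim N(lam*I - A) = 1
print(mu, v, v <= mu)                                # 2 1 True
```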

Theorem
Let λi , i = 1, . . . , k be the distinct eigenvalues of a matrix A ∈ Rn×n , and
let φi = Ker(λi I − A), i = 1, . . . , k be the corresponding eigenspaces.
Then, any k nonzero vectors ui ∈ φi , i = 1, . . . , k, are linearly independent.



Recap on Eigenvalues and Eigenvectors of Square Matrices

Theorem (Diagonalization via eigenvectors)

Let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of a matrix A ∈ R^{n×n}, let µ_i, i = 1, . . . , k, denote the corresponding algebraic multiplicities, and let φ_i = N(λ_i I_n − A). Let further U^{(i)} = (u_1^{(i)} · · · u_{v_i}^{(i)}) be a matrix containing by columns a basis of φ_i, where v_i = dim(φ_i), i = 1, . . . , k. If v_i = µ_i for all i = 1, . . . , k, then the matrix U = (U^{(1)} · · · U^{(k)}) ∈ R^{n×n} is invertible, and A = UΛU^{−1}, where

$$\Lambda = \begin{pmatrix} \lambda_1 I_{\mu_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{\mu_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k I_{\mu_k} \end{pmatrix}.$$

Here, I_{µ_j} denotes the µ_j × µ_j identity matrix.


Recap on Eigenvalues and Eigenvectors of Square Matrices

Theorem (Diagonalization via eigenvectors)

Let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of a matrix A ∈ R^{n×n}, let µ_i, i = 1, . . . , k, denote the corresponding algebraic multiplicities, and let φ_i = N(λ_i I_n − A). Let further U^{(i)} = (u_1^{(i)} · · · u_{v_i}^{(i)}) be a matrix containing by columns a basis of φ_i, where v_i = dim(φ_i), i = 1, . . . , k. If v_i = µ_i for all i = 1, . . . , k, then the matrix U = (U^{(1)} · · · U^{(k)}) ∈ R^{n×n} is invertible, and A = UΛU^{−1}.

Proof: The fact that U is invertible follows since its columns are eigenvectors associated with distinct eigenvalues. Since, by the previous theorem, eigenvectors of distinct eigenvalues are linearly independent, the columns of U are linearly independent; hence U has full rank and is therefore invertible.



Recap on Eigenvalues and Eigenvectors of Square Matrices
Theorem (Diagonalization via eigenvectors)
Let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of a matrix A ∈ R^{n×n}, let µ_i, i = 1, . . . , k, denote the corresponding algebraic multiplicities, and let φ_i = N(λ_i I_n − A). Let further U^{(i)} = (u_1^{(i)} · · · u_{v_i}^{(i)}) be a matrix containing by columns a basis of φ_i, where v_i = dim(φ_i), i = 1, . . . , k. If v_i = µ_i for all i = 1, . . . , k, then the matrix U = (U^{(1)} · · · U^{(k)}) is invertible, and A = UΛU^{−1}.

Proof cont.: It follows from a simple calculation that A = UΛU^{−1}:

$$AU = (AU^{(1)}, AU^{(2)}, \dots, AU^{(k)}) = (Au_1^{(1)}, \dots, Au_{v_1}^{(1)}, Au_1^{(2)}, \dots, Au_{v_2}^{(2)}, \dots, Au_1^{(k)}, \dots, Au_{v_k}^{(k)})$$
$$= (\lambda_1 u_1^{(1)}, \dots, \lambda_1 u_{v_1}^{(1)}, \lambda_2 u_1^{(2)}, \dots, \lambda_2 u_{v_2}^{(2)}, \dots, \lambda_k u_1^{(k)}, \dots, \lambda_k u_{v_k}^{(k)})$$
$$= (\lambda_1 U^{(1)}, \lambda_2 U^{(2)}, \dots, \lambda_k U^{(k)}) = (U^{(1)}, U^{(2)}, \dots, U^{(k)})\,\Lambda = U\Lambda.$$

Now, multiplying both sides by U^{−1} on the right, we get the result.
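A short numerical illustration of the theorem (a sketch assuming NumPy; the non-symmetric matrix below is an arbitrary diagonalizable example): collecting eigenvectors as the columns of U recovers A = UΛU^{−1}.

```python
import numpy as np

# A diagonalizable (non-symmetric) example: eigenvalues 4, 2, 2, with the
# geometric multiplicity of 2 equal to its algebraic multiplicity.
A = np.array([[4., 1., 0.],
              [0., 2., 0.],
              [0., 0., 2.]])

eigvals, U = np.linalg.eig(A)   # columns of U are eigenvectors (bases of the eigenspaces)
Lam = np.diag(eigvals)          # Lambda with the eigenvalues on the diagonal

# Since v_i = mu_i for every eigenvalue, U is invertible and A = U Lam U^{-1}.
print(np.allclose(A, U @ Lam @ np.linalg.inv(U)))   # True
```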

Eigendecomposition of Real Symmetric Matrices


We turn to discuss real symmetric matrices (recall A is symmetric if A_{i,j} = A_{j,i}). We shall see that these matrices have a special structure (w.r.t. eigenvalues and eigenvectors) which will be the basis for much of the material in this course.
We denote by S^n the space of real n × n symmetric matrices.

Theorem (Eigendecomposition of a symmetric matrix)

Let A ∈ S^n, and let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of A. Let further µ_i denote the algebraic multiplicity of λ_i, and let φ_i = Ker(λ_i I_n − A). Then, for all i = 1, . . . , k:
1. λ_i ∈ R, and corresponding eigenvectors can always be chosen to be in R^n;
2. φ_i ⊥ φ_j (∀ u_i ∈ φ_i, u_j ∈ φ_j, i ≠ j: u_i^⊤ u_j = 0);
3. dim φ_i = µ_i.



Proof of the Eigen-decomposition Theorem

Proof of part 1 - real eigenvalues and eigenvectors:


Let λ, u be any eigenvalue/eigenvector pair for A, i.e., Au = λu.
Taking the conjugate transpose of both sides, we have u* A* = λ* u*.
Multiplying the first equation on the left by u* and the second equation on the right by u, we have

$$u^* A u = \lambda u^* u, \qquad u^* A^* u = \lambda^* u^* u. \qquad (1)$$

Since u* u = ‖u‖_2² ≠ 0 (recall from the tutorial that u* v is an inner product over C^n), and recalling that A is real so that A* = A^⊤, subtracting the two equalities in (1) yields

$$u^* A u - u^* A^* u = u^*(A - A^\top)u = (\lambda - \lambda^*)\|u\|_2^2.$$

Now, since A is symmetric, A − A^⊤ = 0, so it must hold that λ − λ* = 0, which implies that λ must be a real number.

Proof of the Eigen-decomposition Theorem


Proof of part 1 - real eigenvalues and eigenvectors:
Let λ, u be any eigenvalue/eigenvector pair for A, i.e., Au = λu.
We have shown λ must be real.
Let us now show that without loss of generality, we can always choose u
to be real, i.e., u ∈ Rn .
Suppose u ≠ 0 is complex and suppose Re(u) ≠ 0 (if Re(u) = 0 we can always take iu instead, and then iu ∈ R^n).
On one hand we have:

Re(Au) = Re(λu) = λRe(u).

On the other hand we also have that

Re(Au) = Re(A(Re(u) + i · Im(u))) = Re(A Re(u) + i · A Im(u)) = A Re(u).

Thus, we have ARe(u) = λRe(u), which means that Re(u) is an


eigenvector of A associated with λ.
Proof of the Eigen-decomposition Theorem

Proof of part 2 - φi ⊥ φj : Recall φi = Ker(λi In − A).

Let v_i ∈ φ_i, v_j ∈ φ_j, i ≠ j.
Since Av_i = λ_i v_i we have

$$v_j^\top A v_i = \lambda_i v_j^\top v_i. \qquad (2)$$

Since Av_j = λ_j v_j we have

$$v_j^\top A v_i = v_i^\top A^\top v_j = v_i^\top A v_j = \lambda_j v_i^\top v_j = \lambda_j v_j^\top v_i. \qquad (3)$$

Subtracting Eq. (3) from Eq. (2), we obtain

$$0 = (\lambda_i - \lambda_j)\, v_j^\top v_i.$$

Since λ_i ≠ λ_j, it must hold that v_j^⊤ v_i = 0.


Proof of part 3 of Theorem: dim φi = µi

Recall φ_i = Ker(λ_i I_n − A). Fix an eigenvalue λ = λ_i. We will prove the claim by constructing an orthonormal basis for φ = φ_i composed of µ = µ_i elements.
Lemma (Auxiliary Lemma)
Let B ∈ Sm and let λ be an eigenvalue of B. Then, there exists an
orthogonal matrix U = [u Q] ∈ Rm×m , Q ∈ Rm×(m−1) , such that
 
$$Bu = \lambda u, \quad \|u\|_2 = 1, \quad U^\top B U = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix}, \quad Q^\top B Q \in S^{m-1}.$$

Recall an orthogonal matrix is a square matrix whose columns are


orthonormal vectors.

We will prove the lemma later on.



Proof of part 3 of Theorem: dim φi = µi

Lemma (Auxiliary Lemma)


Let B ∈ Sm and let λ be an eigenvalue of B. Then, there exists an
orthogonal matrix U = [u Q] ∈ Rm×m , Q ∈ Rm×(m−1) , such that
 
$$Bu = \lambda u, \quad \|u\|_2 = 1, \quad U^\top B U = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix}, \quad Q^\top B Q \in S^{m-1}.$$

Throughout the proof we fix some eigenvalue λ of A.

We first apply the lemma to A ∈ S^n: since µ ≥ 1, there exists an orthogonal matrix U_1 = [u_1 Q_1] ∈ R^{n×n} such that Au_1 = λu_1, and

$$U_1^\top A U_1 = \begin{pmatrix} \lambda & 0 \\ 0 & A_1 \end{pmatrix}, \qquad A_1 = Q_1^\top A Q_1 \in S^{n-1}.$$

Now, if µ = 1 we have finished the proof, since we have found a subspace of φ of dimension one (the subspace span(u_1)), and the geometric multiplicity cannot exceed µ = 1.

Proof of part 3 of Theorem: dim φi = µi

Suppose now µ > 1.


Recall that there exists an orthogonal matrix U_1 = [u_1 Q_1] ∈ R^{n×n} such that Au_1 = λu_1, and

$$U_1^\top A U_1 = \begin{pmatrix} \lambda & 0 \\ 0 & A_1 \end{pmatrix}, \qquad A_1 = Q_1^\top A Q_1 \in S^{n-1}.$$

Note that since U_1 is orthogonal, we have U_1^{−1} = U_1^⊤.
Thus, the matrices A and U_1^⊤ A U_1 are similar! In particular, they have the same eigenvalues (including algebraic multiplicities).
Because of the block-diagonal structure of U_1^⊤ A U_1, λ is an eigenvalue of A_1 of multiplicity µ − 1 (in particular, note that the characteristic polynomial of A is given by p_A(σ) = (σ − λ) · p_{A_1}(σ)).



Proof of part 3 of Theorem: dim φi = µi

Suppose now µ > 1.


Recall that there exists an orthogonal matrix U_1 = [u_1 Q_1] ∈ R^{n×n} such that Au_1 = λu_1, and

$$U_1^\top A U_1 = \begin{pmatrix} \lambda & 0 \\ 0 & A_1 \end{pmatrix}, \qquad A_1 = Q_1^\top A Q_1 \in S^{n-1}.$$

We showed λ is an eigenvalue of A1 of multiplicity µ − 1.


We hence apply the same reasoning to the symmetric matrix A_1 ∈ S^{n−1}: there exists an orthogonal matrix U_2 = [ũ_2 Q_2] ∈ R^{(n−1)×(n−1)} such that A_1 ũ_2 = λ ũ_2, ‖ũ_2‖_2 = 1, and

$$U_2^\top A_1 U_2 = \begin{pmatrix} \lambda & 0 \\ 0 & A_2 \end{pmatrix}, \qquad A_2 = Q_2^\top A_1 Q_2 \in S^{n-2}.$$


Proof of part 3 of Theorem: dim φi = µi


 
We next show that the vector $u_2 = U_1 \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix}$ is a unit-norm eigenvector of A corresponding to the eigenvalue λ, and it is orthogonal to u_1. Indeed,

$$A u_2 = U_1 \begin{pmatrix} \lambda & 0_{n-1}^\top \\ 0_{n-1} & A_1 \end{pmatrix} U_1^\top \, U_1 \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = U_1 \begin{pmatrix} \lambda & 0 \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = U_1 \begin{pmatrix} 0 \\ A_1 \tilde{u}_2 \end{pmatrix} = U_1 \begin{pmatrix} 0 \\ \lambda \tilde{u}_2 \end{pmatrix} = \lambda u_2.$$

Moreover,

$$\|u_2\|_2^2 = u_2^\top u_2 = \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix}^\top U_1^\top U_1 \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = \|\tilde{u}_2\|_2^2 = 1,$$

and

$$u_1^\top u_2 = u_1^\top U_1 \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = u_1^\top [u_1\ Q_1] \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = [1\ \ 0_{n-1}^\top] \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = 0.$$

If µ = 2, then the proof is finished, since we have found an orthonormal basis of dimension two for φ (the vectors u_1, u_2).
Proof of part 3 of Theorem: dim φi = µi
Otherwise, if µ > 2, we iterate the same reasoning on the matrix A_2 ∈ R^{(n−2)×(n−2)}: we find an eigenvector u_3 orthogonal to u_1, u_2.
We do this by taking a unit-norm eigenvector ũ′_3 satisfying A_2 ũ′_3 = λ ũ′_3.
Then, by re-iterating the above arguments, we have that $\tilde{u}_3 = U_2 \begin{pmatrix} 0 \\ \tilde{u}'_3 \end{pmatrix}$ is a unit-length eigenvector of A_1 ∈ R^{(n−1)×(n−1)} corresponding to eigenvalue λ and orthogonal to ũ_2 (the previously found eigenvector of A_1).
Using the same argument once more, we have that the vector $u_3 = U_1 \begin{pmatrix} 0 \\ \tilde{u}_3 \end{pmatrix}$ is a unit-length eigenvector of A corresponding to eigenvalue λ, and orthogonal to both u_1, u_2.

We can continue this process until we reach the actual value of µ (notice that, by the above, for each A_i, λ is an eigenvalue of algebraic multiplicity µ − i), at which point we exit the procedure with an orthonormal basis of φ composed of exactly µ vectors.

Eigendecomposition of Real Symmetric Matrices

Theorem (Eigendecomposition of a symmetric matrix)


Let A ∈ S^n, and let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of A. Let further µ_i denote the algebraic multiplicity of λ_i, and let φ_i = Ker(λ_i I_n − A). Then, for all i = 1, . . . , k:
1. λ_i ∈ R, and corresponding eigenvectors can always be chosen to be in R^n;
2. φ_i ⊥ φ_j (∀ u_i ∈ φ_i, u_j ∈ φ_j, i ≠ j: u_i^⊤ u_j = 0);
3. dim φ_i = µ_i.



The Spectral Theorem / Eigen-decomposition
Combining the Eigen-decomposition theorem and the diagonalization-via-eigenvectors theorem we have:

Theorem (Spectral theorem)

Let A ∈ R^{n×n} be symmetric, and let λ_i ∈ R, i = 1, . . . , n, be the eigenvalues of A (counting multiplicities). Then, there exists a set of orthonormal vectors u_i ∈ R^n, i = 1, . . . , n, such that Au_i = λ_i u_i. Equivalently, there exists an orthogonal matrix U = [u_1 · · · u_n] (i.e., UU^⊤ = U^⊤U = I_n) such that

$$A = U \Lambda U^\top = \sum_{i=1}^{n} \lambda_i u_i u_i^\top, \qquad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n).$$

In particular, any symmetric matrix can be decomposed as a weighted sum of simple rank-one matrices of the form u_i u_i^⊤, where the weights are given by the eigenvalues λ_i.

Convention: from now on we consider the eigenvalues in non-increasing order, i.e., λ_1 ≥ λ_2 ≥ · · · ≥ λ_n, and the eigenvectors as an orthonormal basis of R^n.
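A short numerical illustration (a sketch assuming NumPy; the symmetric matrix is randomly generated): numpy.linalg.eigh returns real eigenvalues and orthonormal eigenvectors, and the rank-one expansion reconstructs A.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a random symmetric matrix

eigvals, U = np.linalg.eigh(A)         # eigh is the solver for symmetric/Hermitian matrices
print(np.allclose(U @ U.T, np.eye(4)))                 # U is orthogonal
print(np.allclose(A, U @ np.diag(eigvals) @ U.T))      # A = U Lam U^T

# Rank-one expansion: A = sum_i lambda_i u_i u_i^T.
A_sum = sum(lam * np.outer(U[:, i], U[:, i]) for i, lam in enumerate(eigvals))
print(np.allclose(A, A_sum))                           # True

# Convention used from now on: eigenvalues in non-increasing order.
idx = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[idx], U[:, idx]
```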

Proof of the Auxiliary Lemma

Recall that in order to prove that µi = dim φi we have used the following
lemma:
Lemma
Let B ∈ Sm and let λ be an eigenvalue of B. Then, there exists an
orthogonal matrix U = [u Q] ∈ Rm×m , Q ∈ Rm×(m−1) , such that
 
$$Bu = \lambda u, \quad \|u\|_2 = 1, \quad U^\top B U = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix}, \quad Q^\top B Q \in S^{m-1}.$$

Proof: Let u be any unit-norm eigenvector of B associated with λ. We can now take Q to be a matrix whose columns form an orthonormal basis of the subspace orthogonal to u. Hence, U = [u Q] is orthogonal by construction.



Proof of the Auxiliary Lemma
Lemma
Let B ∈ Sm and let λ be an eigenvalue of B. Then, there exists an
orthogonal matrix U = [u Q] ∈ Rm×m , Q ∈ Rm×(m−1) , such that
 
$$Bu = \lambda u, \quad \|u\|_2 = 1, \quad U^\top B U = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix}, \quad Q^\top B Q \in S^{m-1}.$$

Proof cont.: By calculation:

$$U^\top B U = [u\ Q]^\top B [u\ Q] = \begin{pmatrix} u^\top B u & u^\top B Q \\ Q^\top B u & Q^\top B Q \end{pmatrix} = \begin{pmatrix} u^\top B u & (Q^\top B u)^\top \\ Q^\top B u & Q^\top B Q \end{pmatrix} = \begin{pmatrix} u^\top (\lambda u) & (Q^\top \lambda u)^\top \\ Q^\top \lambda u & Q^\top B Q \end{pmatrix} = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix},$$

where the last equality follows since the columns of Q are orthogonal to u and ‖u‖_2 = 1.

Some Applications of the Spectral Theorem

Theorem (inverse and matrix power (HW))

Let A ∈ S^n, and write its eigen-decomposition A = UΛU^⊤. Then
1. if A is invertible, then A^{−1} = UΛ^{−1}U^⊤;
2. for any k ∈ N_+: A^k = UΛ^k U^⊤.

Observation
Let A ∈ S^n. Let λ_1, . . . , λ_n denote its eigenvalues and let u_1, . . . , u_n denote the corresponding eigenvectors. Then,

$$A = \sum_{i \in [n]} \lambda_i u_i u_i^\top = \sum_{i \in [n] : \lambda_i \neq 0} \lambda_i u_i u_i^\top.$$

Observation
Let A ∈ S^n. rank(A) is equal to the number of non-zero eigenvalues of A (counted with multiplicity).
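These facts are easy to verify numerically (a sketch assuming NumPy; the matrix below is a small symmetric example constructed to have one zero eigenvalue):

```python
import numpy as np

# Symmetric example with eigenvalues 3, 1, 0 (so rank 2).
U0, _ = np.linalg.qr(np.random.default_rng(3).standard_normal((3, 3)))  # random orthogonal matrix
A = U0 @ np.diag([3., 1., 0.]) @ U0.T

eigvals, U = np.linalg.eigh(A)

# Matrix power: A^2 = U Lam^2 U^T.
print(np.allclose(np.linalg.matrix_power(A, 2), U @ np.diag(eigvals**2) @ U.T))

# rank(A) equals the number of non-zero eigenvalues.
print(np.linalg.matrix_rank(A), np.sum(~np.isclose(eigvals, 0.)))   # 2 2

# The inverse formula A^{-1} = U Lam^{-1} U^T applies only when A is invertible;
# A above is singular, so we check it on the invertible matrix A + I instead.
eigvals2, U2 = np.linalg.eigh(A + np.eye(3))
print(np.allclose(np.linalg.inv(A + np.eye(3)), U2 @ np.diag(1.0 / eigvals2) @ U2.T))
```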



Computational Consequences of the Spectral Theorem

We now give some motivation for why the eigen-decomposition (and related decompositions that we will see in the sequel) is so important in modern data-intensive applications.

For the following, think of the case rank(A) = k ≪ n (this indeed holds for many matrices representing data that we see in real life).

Let A ∈ S^n and suppose that the eigen-decomposition $A = U\Lambda U^\top = \sum_{i=1}^{k} \lambda_i u_i u_i^\top$ is given, where rank(A) = k. Then,
Informally speaking, A can be stored in a computer's memory using only k(1 + n) memory units: storing the k non-zero eigenvalues and the k corresponding eigenvectors (assuming each memory unit can store a scalar).

On the other hand, an explicit n × n symmetric matrix requires n + (n² − n)/2 = Θ(n²) memory units (independent of k).


Computational Consequences of the Spectral Theorem

Let A ∈ S^n and suppose that the eigen-decomposition $A = U\Lambda U^\top = \sum_{i=1}^{k} \lambda_i u_i u_i^\top$ is given, where rank(A) = k. Then,

Multiplying a vector x ∈ R^n by A takes O(kn) time:
1. first, compute the scalars u_i^⊤ x, i = 1, . . . , k (O(kn) time);
2. then, compute the sum $\sum_{i=1}^{k} \lambda_i u_i (u_i^\top x)$ (O(kn) time).

On the other hand, computing Ax when A is given explicitly (entry by entry) takes O(n²) time (n row-column inner products).

Multiplying a matrix B ∈ R^{n×n} by A takes O(kn²) time, instead of O(n³) time for an explicitly given n × n matrix (apply the same matrix-vector argument to each of the n columns of B).
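The sketch below (assuming NumPy; the dimensions, the rank-k factors, and the helper name matvec_lowrank are illustrative choices, not part of the lecture) stores only the k non-zero eigenvalues and eigenvectors and multiplies a vector in O(kn) time, without ever forming the full n × n matrix:

```python
import numpy as np

n, k = 1000, 5
rng = np.random.default_rng(4)

# Low-rank symmetric A = sum_{i=1}^k lambda_i u_i u_i^T, stored via (lambdas, U_k):
# k eigenvalues plus k eigenvectors of length n, i.e. k(1 + n) stored scalars.
lambdas = rng.standard_normal(k)
U_k, _ = np.linalg.qr(rng.standard_normal((n, k)))   # n x k matrix with orthonormal columns

def matvec_lowrank(lambdas, U_k, x):
    """Compute Ax in O(kn) time from the factored form."""
    coeffs = U_k.T @ x                  # the k inner products u_i^T x  -- O(kn)
    return U_k @ (lambdas * coeffs)     # weighted combination of the u_i  -- O(kn)

x = rng.standard_normal(n)
A_full = U_k @ np.diag(lambdas) @ U_k.T   # explicit n x n matrix, built only to check correctness
print(np.allclose(matvec_lowrank(lambdas, U_k, x), A_full @ x))   # True
```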

