Algebraic Methods in Data Science: Lesson 3: Dan Garber


Algebraic Methods in Data Science: Lesson 3

Faculty of Industrial Engineering and Management


Technion - Israel Institute of Technology

Dan Garber
https://dangar.net.technion.ac.il/

Winter Semester 2020-2021


Range and Nullspace of Matrices

We focus our attention on real-valued m × n matrices. Let us begin by recalling some of the most basic definitions.

Fix an m × n real matrix A. We denote the range (or image) and nullspace (kernel) as

$$R(A) = \mathrm{Im}(A) := \{Ax : x \in \mathbb{R}^n\}; \qquad N(A) = \mathrm{Ker}(A) := \{x \in \mathbb{R}^n : Ax = 0\}.$$

Recall R(A) is a subspace of R^m and N(A) is a subspace of R^n. Recall the rank-nullity theorem:

$$\dim R(A) + \dim N(A) = n; \qquad \dim R(A^\top) + \dim N(A^\top) = m.$$
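As a quick numerical check (a sketch assuming NumPy and SciPy are available; the matrix below is an arbitrary illustrative example), one can compare dim R(A) and dim N(A) with the quantities returned by numpy.linalg.matrix_rank and scipy.linalg.null_space:

```python
import numpy as np
from scipy.linalg import null_space

# An arbitrary 3x5 example matrix (m = 3, n = 5), chosen only for illustration.
A = np.array([[1., 2., 0., 1., 3.],
              [0., 1., 1., 0., 2.],
              [1., 3., 1., 1., 5.]])   # third row = first + second, so rank < 3

m, n = A.shape
rank = np.linalg.matrix_rank(A)        # dim R(A)
nullity = null_space(A).shape[1]       # dim N(A): number of basis vectors of the kernel
print(rank, nullity, rank + nullity == n)            # rank-nullity for A

rank_T = np.linalg.matrix_rank(A.T)
nullity_T = null_space(A.T).shape[1]
print(rank_T, nullity_T, rank_T + nullity_T == m)    # rank-nullity for A^T
```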



Matrix Inner Products and Norms
Let X = R^{m×n}. The standard matrix inner product is defined (similarly to the standard inner product for vectors) as:

$$\langle A, B \rangle = \sum_{i \in [m],\, j \in [n]} A_{i,j} B_{i,j}.$$

It is also common to write A • B. It is not hard to show that (HW):

$$\langle A, B \rangle = \mathrm{Tr}(A^\top B).$$

The Euclidean norm for R^{m×n} is defined similarly to the vector case, and as in the vector case it is induced by the standard inner product.
It is called the Frobenius norm and it is given by

$$\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2} = \sqrt{\mathrm{Tr}(A^\top A)}.$$
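As a sanity check (a small sketch assuming NumPy; the matrices are random and purely illustrative), the identities ⟨A, B⟩ = Tr(A^⊤B) and ‖A‖_F = √Tr(A^⊤A) can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))

inner_elementwise = np.sum(A * B)          # <A, B> = sum_{i,j} A_ij B_ij
inner_trace = np.trace(A.T @ B)            # Tr(A^T B)
print(np.isclose(inner_elementwise, inner_trace))        # True

fro_direct = np.sqrt(np.sum(A**2))         # sqrt(sum_{i,j} A_ij^2)
fro_trace = np.sqrt(np.trace(A.T @ A))     # sqrt(Tr(A^T A))
fro_numpy = np.linalg.norm(A, 'fro')       # NumPy's built-in Frobenius norm
print(np.allclose([fro_direct, fro_trace], fro_numpy))    # True
```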


Matrix Inner Products and Norms


Note that the ℓ_p norms defined for vectors also naturally extend to matrices, i.e., the following are norms:

$$\|A\|_{(p)} = \Big(\sum_{i,j} |A_{i,j}|^p\Big)^{1/p}, \qquad 1 \le p \le \infty.$$

Theorem (HW)
For every (p, q) ∈ [1, ∞] × [1, ∞], the following function is a norm:

$$\|A\|_{p \to q} = \max_{x \in \mathbb{R}^n : \|x\|_p = 1} \|Ax\|_q.$$

‖A‖_{p→q} measures how much A can "amplify", in q-norm, a vector of unit length with respect to the p-norm.

In particular, we will use the notation ‖A‖_p := ‖A‖_{p→p}.


Matrix Inner Products and Norms
Examples (HW): Denote by A_j the jth column of A and by A^j the jth row. Then,

$$\|A\|_1 := \|A\|_{1\to 1} = \max_{\|x\|_1 = 1} \|Ax\|_1 = \max_{j = 1,\dots,n} \|A_j\|_1,$$

$$\|A\|_\infty := \|A\|_{\infty\to\infty} = \max_{\|x\|_\infty = 1} \|Ax\|_\infty = \max_{j = 1,\dots,m} \|A^j\|_1.$$
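A small numerical illustration (a sketch assuming NumPy; the matrix is arbitrary) of the max-column-sum and max-row-sum formulas, compared against NumPy's built-in induced norms:

```python
import numpy as np

A = np.array([[ 1., -2.,  3.],
              [ 0.,  4., -1.],
              [-5.,  2.,  0.]])

max_col_sum = np.max(np.sum(np.abs(A), axis=0))   # max_j ||A_j||_1 over columns
max_row_sum = np.max(np.sum(np.abs(A), axis=1))   # max_j ||A^j||_1 over rows

print(np.isclose(max_col_sum, np.linalg.norm(A, 1)))        # ||A||_{1->1}
print(np.isclose(max_row_sum, np.linalg.norm(A, np.inf)))   # ||A||_{inf->inf}
```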

Of particular interest (as we'll discuss in the sequel) is the so-called spectral norm, which is given by

$$\|A\|_2 := \|A\|_{2\to 2} = \max_{\|x\|_2 = 1} \|Ax\|_2.$$

In particular, as we shall see, this norm is related to the eigenvalues of A^⊤A, since it holds that

$$\|A\|_2 = \sqrt{\max_i \lambda_i(A^\top A)}.$$
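The relation between the spectral norm and the largest eigenvalue of A^⊤A can likewise be checked numerically (a sketch assuming NumPy; the matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

spectral = np.linalg.norm(A, 2)                          # ||A||_{2->2}
via_eigs = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # sqrt(max eigenvalue of A^T A)
print(np.isclose(spectral, via_eigs))                    # True
```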


Recap on Eigenvalues and Eigenvectors of Square Matrices

Definition
Given a real square matrix A ∈ R^{n×n} we say λ is an eigenvalue of A, if there exists a vector u ≠ 0 such that Au = λu.
We say u is an eigenvector corresponding to eigenvalue λ.

Recall that even when A is real, both eigenvalues and eigenvectors need
not be real (i.e., can be complex).

The eigenvalues of a square real matrix A are the roots of the characteristic polynomial: p_A(λ) = det(λI − A).

Recall this is a polynomial of degree n with real coefficients.
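For example (a sketch assuming NumPy; the 3 × 3 matrix below, containing a 2D rotation block, is chosen only to produce complex eigenvalues), the roots of the characteristic polynomial coincide with the eigenvalues returned by a numerical eigensolver:

```python
import numpy as np

# A real matrix with a complex-conjugate pair of eigenvalues (a 2D rotation block).
A = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 2.]])

char_poly = np.poly(A)        # coefficients of det(lambda*I - A)
roots = np.roots(char_poly)   # roots of the characteristic polynomial
eigs = np.linalg.eigvals(A)   # eigenvalues computed directly

# The eigenvalues are 2 and +-i; both printouts agree up to ordering and rounding.
print(np.sort_complex(roots))
print(np.sort_complex(eigs))
```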

Theorem (Fundamental theorem of algebra)


Any matrix A ∈ Rn×n has n (not necessarily real) eigenvalues, counting
multiplicities.



Recap on Eigenvalues and Eigenvectors of Square Matrices

Theorem (Fundamental theorem of algebra)


Any matrix A ∈ Rn×n has n (not necessarily real) eigenvalues, counting
multiplicities.

Recall that for any eigenvalue λ of a real matrix A, the eigenvectors of A that correspond to the eigenvalue λ are the nonzero solutions of the linear system

(λI − A)v = 0,

or equivalently, the nonzero vectors of the nullspace

N(λI − A) = Ker(λI − A).

Hence, the set of all eigenvectors associated with an eigenvalue λ, together with the zero vector, forms a subspace of C^n.

Recap on Eigenvalues and Eigenvectors of Square Matrices

For any eigenvalue λ, denote by µ its number of appearances as a root of the characteristic polynomial det(λI − A) (its algebraic multiplicity), and by v the dimension of the corresponding eigenspace, i.e., v = dim(N(λI − A)) (its geometric multiplicity). Then v ≤ µ.
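A classical example where the inequality is strict is a 2 × 2 Jordan block; the sketch below (assuming NumPy and SciPy) computes both multiplicities numerically:

```python
import numpy as np
from scipy.linalg import null_space

# Jordan block: eigenvalue 2 appears twice as a root of the characteristic polynomial,
# but its eigenspace is only one-dimensional.
A = np.array([[2., 1.],
              [0., 2.]])

lam = 2.0
mu = np.sum(np.isclose(np.linalg.eigvals(A), lam))   # algebraic multiplicity = 2
v = null_space(lam * np.eye(2) - A).shape[1]         # geometric multiplicity = dim N(lam*I - A) = 1
print(mu, v, v <= mu)                                # 2 1 True
```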

Theorem
Let λi , i = 1, . . . , k be the distinct eigenvalues of a matrix A ∈ Rn×n , and
let φi = Ker(λi I − A), i = 1, . . . , k be the corresponding eigenspaces.
Then, any k nonzero vectors ui ∈ φi , i = 1, . . . , k, are linearly independent.



Recap on Eigenvalues and Eigenvectors of Square Matrices

Theorem (Diagonalization via eigenvectors)

Let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of a matrix A ∈ R^{n×n}, let µ_i, i = 1, . . . , k, denote the corresponding algebraic multiplicities, and let φ_i = N(λ_i I_n − A). Let further U^{(i)} = (u_1^{(i)} · · · u_{v_i}^{(i)}) be a matrix containing by columns a basis of φ_i, where v_i = dim(φ_i), i = 1, . . . , k. If v_i = µ_i for all i = 1, . . . , k, then the matrix U = (U^{(1)} · · · U^{(k)}) ∈ R^{n×n} is invertible, and A = UΛU^{−1}, where

$$\Lambda = \begin{pmatrix} \lambda_1 I_{\mu_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{\mu_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k I_{\mu_k} \end{pmatrix}.$$

Here, I_{µ_j} denotes the µ_j × µ_j identity matrix.


Recap on Eigenvalues and Eigenvectors of Square Matrices

Theorem (Diagonalization via eigenvectors)

Let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of a matrix A ∈ R^{n×n}, let µ_i, i = 1, . . . , k, denote the corresponding algebraic multiplicities, and let φ_i = N(λ_i I_n − A). Let further U^{(i)} = (u_1^{(i)} · · · u_{v_i}^{(i)}) be a matrix containing by columns a basis of φ_i, where v_i = dim(φ_i), i = 1, . . . , k. If v_i = µ_i for all i = 1, . . . , k, then the matrix U = (U^{(1)} · · · U^{(k)}) ∈ R^{n×n} is invertible, and A = UΛU^{−1}.

Proof: The fact that U is invertible follows since its columns are eigenvectors associated with distinct eigenvalues. Since, by the previous theorem, eigenvectors of distinct eigenvalues are linearly independent, the columns of U are linearly independent; hence U has full rank and is therefore invertible.



Recap on Eigenvalues and Eigenvectors of Square Matrices
Theorem (Diagonalization via eigenvectors)
Let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of a matrix A ∈ R^{n×n}, let µ_i, i = 1, . . . , k, denote the corresponding algebraic multiplicities, and let φ_i = N(λ_i I_n − A). Let further U^{(i)} = (u_1^{(i)} · · · u_{v_i}^{(i)}) be a matrix containing by columns a basis of φ_i, where v_i = dim(φ_i), i = 1, . . . , k. If v_i = µ_i for all i = 1, . . . , k, then the matrix U = (U^{(1)} · · · U^{(k)}) is invertible, and A = UΛU^{−1}.

Proof cont.: It follows from a simple calculation that A = UΛU^{−1}:

$$AU = (AU^{(1)}, AU^{(2)}, \dots, AU^{(k)}) = (Au_1^{(1)}, \dots, Au_{v_1}^{(1)}, Au_1^{(2)}, \dots, Au_{v_2}^{(2)}, \dots, Au_1^{(k)}, \dots, Au_{v_k}^{(k)})$$
$$= (\lambda_1 u_1^{(1)}, \dots, \lambda_1 u_{v_1}^{(1)}, \lambda_2 u_1^{(2)}, \dots, \lambda_2 u_{v_2}^{(2)}, \dots, \lambda_k u_1^{(k)}, \dots, \lambda_k u_{v_k}^{(k)})$$
$$= (\lambda_1 U^{(1)}, \lambda_2 U^{(2)}, \dots, \lambda_k U^{(k)}) = (U^{(1)}, U^{(2)}, \dots, U^{(k)})\,\Lambda = U\Lambda.$$

Now, multiplying both sides by U^{−1} on the right, we get the result.
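A short numerical illustration of the theorem (a sketch assuming NumPy; the non-symmetric matrix below is an arbitrary diagonalizable example): collecting eigenvectors as the columns of U recovers A = UΛU^{−1}.

```python
import numpy as np

# A diagonalizable (non-symmetric) example: eigenvalues 4, 2, 2, with the
# geometric multiplicity of 2 equal to its algebraic multiplicity.
A = np.array([[4., 1., 0.],
              [0., 2., 0.],
              [0., 0., 2.]])

eigvals, U = np.linalg.eig(A)   # columns of U are eigenvectors (bases of the eigenspaces)
Lam = np.diag(eigvals)          # Lambda with the eigenvalues on the diagonal

# Since v_i = mu_i for every eigenvalue, U is invertible and A = U Lam U^{-1}.
print(np.allclose(A, U @ Lam @ np.linalg.inv(U)))   # True
```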

Eigendecomposition of Real Symmetric Matrices


We turn to discuss real symmetric matrices (recall A is symmetric if A_{i,j} = A_{j,i}). We shall see that these matrices have a special structure (w.r.t. eigenvalues and eigenvectors) which will be the basis for much of the material in this course.
We denote by S^n the space of real n × n symmetric matrices.

Theorem (Eigendecomposition of a symmetric matrix)

Let A ∈ S^n, and let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of A. Let further µ_i denote the algebraic multiplicity of λ_i, and let φ_i = Ker(λ_i I_n − A). Then, for all i = 1, . . . , k:
1. λ_i ∈ R, and corresponding eigenvectors can always be chosen to be in R^n;
2. φ_i ⊥ φ_j (∀ u_i ∈ φ_i, u_j ∈ φ_j, i ≠ j: u_i^⊤ u_j = 0);
3. dim φ_i = µ_i.



Proof of the Eigen-decomposition Theorem

Proof of part 1 - real eigenvalues and eigenvectors:


Let λ, u be any eigenvalue/eigenvector pair for A, i.e., Au = λu.
Taking the conjugate transpose of both sides, we have u* A* = λ* u*.
Multiplying the first equation on the left by u* and the second equation on the right by u, we have

$$u^* A u = \lambda u^* u, \qquad u^* A^* u = \lambda^* u^* u. \qquad (1)$$

Since u* u = ‖u‖_2² ≠ 0 (recall from the tutorial that u* v is an inner product over C^n), and recalling that A is real so that A* = A^⊤, subtracting the two equalities in (1) yields

$$u^* A u - u^* A^* u = u^*(A - A^\top)u = (\lambda - \lambda^*)\|u\|_2^2.$$

Now, since A is symmetric, A − A^⊤ = 0, so it must hold that λ − λ* = 0, which implies that λ must be a real number.

Proof of the Eigen-decomposition Theorem


Proof of part 1 - real eigenvalues and eigenvectors:
Let λ, u be any eigenvalue/eigenvector pair for A, i.e., Au = λu.
We have shown λ must be real.
Let us now show that without loss of generality, we can always choose u
to be real, i.e., u ∈ Rn .
Suppose u ≠ 0 is complex and suppose Re(u) ≠ 0 (if Re(u) = 0 we can always take iu instead, and then iu ∈ R^n).
On one hand we have:

Re(Au) = Re(λu) = λRe(u).

On the other hand we also have that

Re(Au) = Re(A(Re(u) + i · Im(u))) = Re(A Re(u) + i · A Im(u)) = A Re(u).

Thus, we have ARe(u) = λRe(u), which means that Re(u) is an


eigenvector of A associated with λ.
Proof of the Eigen-decomposition Theorem

Proof of part 2 - φi ⊥ φj : Recall φi = Ker(λi In − A).

Let v_i ∈ φ_i, v_j ∈ φ_j, i ≠ j.
Since Av_i = λ_i v_i we have

$$v_j^\top A v_i = \lambda_i v_j^\top v_i. \qquad (2)$$

Since Av_j = λ_j v_j we have

$$v_j^\top A v_i = v_i^\top A^\top v_j = v_i^\top A v_j = \lambda_j v_i^\top v_j = \lambda_j v_j^\top v_i. \qquad (3)$$

Subtracting Eq. (3) from Eq. (2), we obtain

$$0 = (\lambda_i - \lambda_j)\, v_j^\top v_i.$$

Since λ_i ≠ λ_j, it must hold that v_j^⊤ v_i = 0.


Proof of part 3 of Theorem: dim φi = µi

Recall φ_i = Ker(λ_i I_n − A). Fix an eigenvalue λ = λ_i. We will prove the claim by constructing an orthonormal basis for φ = φ_i composed of µ = µ_i elements.
Lemma (Auxiliary Lemma)
Let B ∈ Sm and let λ be an eigenvalue of B. Then, there exists an
orthogonal matrix U = [u Q] ∈ Rm×m , Q ∈ Rm×(m−1) , such that
 
$$Bu = \lambda u, \quad \|u\|_2 = 1, \quad U^\top B U = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix}, \quad Q^\top B Q \in S^{m-1}.$$

Recall an orthogonal matrix is a square matrix whose columns are


orthonormal vectors.

We will prove the lemma later on.



Proof of part 3 of Theorem: dim φi = µi

Lemma (Auxiliary Lemma)


Let B ∈ Sm and let λ be an eigenvalue of B. Then, there exists an
orthogonal matrix U = [u Q] ∈ Rm×m , Q ∈ Rm×(m−1) , such that
 
$$Bu = \lambda u, \quad \|u\|_2 = 1, \quad U^\top B U = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix}, \quad Q^\top B Q \in S^{m-1}.$$

Throughout the proof we fix some eigenvalue λ of A.

We first apply the lemma to A ∈ S^n: since µ ≥ 1, there exists an orthogonal matrix U_1 = [u_1 Q_1] ∈ R^{n×n} such that Au_1 = λu_1, and

$$U_1^\top A U_1 = \begin{pmatrix} \lambda & 0 \\ 0 & A_1 \end{pmatrix}, \qquad A_1 = Q_1^\top A Q_1 \in S^{n-1}.$$

Now, if µ = 1 we have finished the proof, since we have found a subspace of φ of dimension one (the subspace span(u_1)), and the geometric multiplicity cannot exceed µ = 1.

Proof of part 3 of Theorem: dim φi = µi

Suppose now µ > 1.


Recall that there exists an orthogonal matrix U_1 = [u_1 Q_1] ∈ R^{n×n} such that Au_1 = λu_1, and

$$U_1^\top A U_1 = \begin{pmatrix} \lambda & 0 \\ 0 & A_1 \end{pmatrix}, \qquad A_1 = Q_1^\top A Q_1 \in S^{n-1}.$$

Note that since U_1 is orthogonal, we have U_1^{−1} = U_1^⊤.
Thus, the matrices A and U_1^⊤ A U_1 are similar! In particular, they have the same eigenvalues (including algebraic multiplicities).
Because of the block-diagonal structure of U_1^⊤ A U_1, λ is an eigenvalue of A_1 of multiplicity µ − 1 (in particular, note that the characteristic polynomial of A is given by p_A(σ) = (σ − λ) · p_{A_1}(σ)).



Proof of part 3 of Theorem: dim φi = µi

Suppose now µ > 1.


Recall that there exists an orthogonal matrix U_1 = [u_1 Q_1] ∈ R^{n×n} such that Au_1 = λu_1, and

$$U_1^\top A U_1 = \begin{pmatrix} \lambda & 0 \\ 0 & A_1 \end{pmatrix}, \qquad A_1 = Q_1^\top A Q_1 \in S^{n-1}.$$

We showed λ is an eigenvalue of A1 of multiplicity µ − 1.


We hence apply the same reasoning to the symmetric matrix A_1 ∈ S^{n−1}: there exists an orthogonal matrix U_2 = [ũ_2 Q_2] ∈ R^{(n−1)×(n−1)} such that A_1 ũ_2 = λ ũ_2, ‖ũ_2‖_2 = 1, and

$$U_2^\top A_1 U_2 = \begin{pmatrix} \lambda & 0 \\ 0 & A_2 \end{pmatrix}, \qquad A_2 = Q_2^\top A_1 Q_2 \in S^{n-2}.$$


Proof of part 3 of Theorem: dim φi = µi


 
We next show that the vector $u_2 = U_1 \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix}$ is a unit-norm eigenvector of A corresponding to the eigenvalue λ, and it is orthogonal to u_1. Indeed,

$$A u_2 = U_1 \begin{pmatrix} \lambda & 0_{n-1}^\top \\ 0_{n-1} & A_1 \end{pmatrix} U_1^\top \, U_1 \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = U_1 \begin{pmatrix} \lambda & 0 \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = U_1 \begin{pmatrix} 0 \\ A_1 \tilde{u}_2 \end{pmatrix} = U_1 \begin{pmatrix} 0 \\ \lambda \tilde{u}_2 \end{pmatrix} = \lambda u_2.$$

Moreover,

$$\|u_2\|_2^2 = u_2^\top u_2 = \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix}^\top U_1^\top U_1 \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = \|\tilde{u}_2\|_2^2 = 1,$$

and

$$u_1^\top u_2 = u_1^\top U_1 \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = u_1^\top [u_1\ Q_1] \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = [1\ \ 0_{n-1}^\top] \begin{pmatrix} 0 \\ \tilde{u}_2 \end{pmatrix} = 0.$$

If µ = 2, then the proof is finished, since we have found an orthonormal basis of dimension two for φ (the vectors u_1, u_2).
Proof of part 3 of Theorem: dim φi = µi
Otherwise, if µ > 2, we iterate the same reasoning on the matrix A_2 ∈ R^{(n−2)×(n−2)}: we find an eigenvector u_3 orthogonal to u_1, u_2.
We do this by taking a unit-norm eigenvector ũ′_3 satisfying A_2 ũ′_3 = λ ũ′_3.
Then, by re-iterating the above arguments, we have that $\tilde{u}_3 = U_2 \begin{pmatrix} 0 \\ \tilde{u}'_3 \end{pmatrix}$ is a unit-length eigenvector of A_1 ∈ R^{(n−1)×(n−1)} corresponding to eigenvalue λ and orthogonal to ũ_2 (the previously found eigenvector of A_1).
Using the same argument once more, we have that the vector $u_3 = U_1 \begin{pmatrix} 0 \\ \tilde{u}_3 \end{pmatrix}$ is a unit-length eigenvector of A corresponding to eigenvalue λ, and orthogonal to both u_1, u_2.

We can continue this process until we reach the actual value of µ (notice that, by the above, for each A_i, λ is an eigenvalue of algebraic multiplicity µ − i), at which point we exit the procedure with an orthonormal basis of φ composed of exactly µ vectors.

Eigendecomposition of Real Symmetric Matrices

Theorem (Eigendecomposition of a symmetric matrix)


Let A ∈ S^n, and let λ_i, i = 1, . . . , k ≤ n, be the distinct eigenvalues of A. Let further µ_i denote the algebraic multiplicity of λ_i, and let φ_i = Ker(λ_i I_n − A). Then, for all i = 1, . . . , k:
1. λ_i ∈ R, and corresponding eigenvectors can always be chosen to be in R^n;
2. φ_i ⊥ φ_j (∀ u_i ∈ φ_i, u_j ∈ φ_j, i ≠ j: u_i^⊤ u_j = 0);
3. dim φ_i = µ_i.



The Spectral Theorem / Eigen-decomposition
Combining the Eigen-decomposition theorem and the diagonalization-via-eigenvectors theorem we have:

Theorem (Spectral theorem)

Let A ∈ R^{n×n} be symmetric, and let λ_i ∈ R, i = 1, . . . , n, be the eigenvalues of A (counting multiplicities). Then, there exists a set of orthonormal vectors u_i ∈ R^n, i = 1, . . . , n, such that Au_i = λ_i u_i. Equivalently, there exists an orthogonal matrix U = [u_1 · · · u_n] (i.e., UU^⊤ = U^⊤U = I_n) such that

$$A = U \Lambda U^\top = \sum_{i=1}^{n} \lambda_i u_i u_i^\top, \qquad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n).$$

In particular, any symmetric matrix can be decomposed as a weighted sum of simple rank-one matrices of the form u_i u_i^⊤, where the weights are given by the eigenvalues λ_i.

Convention: from now on we consider the eigenvalues in non-increasing order, i.e., λ_1 ≥ λ_2 ≥ · · · ≥ λ_n, and the eigenvectors as an orthonormal basis of R^n.
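A short numerical illustration (a sketch assuming NumPy; the symmetric matrix is randomly generated): numpy.linalg.eigh returns real eigenvalues and orthonormal eigenvectors, and the rank-one expansion reconstructs A.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a random symmetric matrix

eigvals, U = np.linalg.eigh(A)         # eigh is the solver for symmetric/Hermitian matrices
print(np.allclose(U @ U.T, np.eye(4)))                 # U is orthogonal
print(np.allclose(A, U @ np.diag(eigvals) @ U.T))      # A = U Lam U^T

# Rank-one expansion: A = sum_i lambda_i u_i u_i^T.
A_sum = sum(lam * np.outer(U[:, i], U[:, i]) for i, lam in enumerate(eigvals))
print(np.allclose(A, A_sum))                           # True

# Convention used from now on: eigenvalues in non-increasing order.
idx = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[idx], U[:, idx]
```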

Proof of the Auxiliary Lemma

Recall that in order to prove that µi = dim φi we have used the following
lemma:
Lemma
Let B ∈ Sm and let λ be an eigenvalue of B. Then, there exists an
orthogonal matrix U = [u Q] ∈ Rm×m , Q ∈ Rm×(m−1) , such that
 
$$Bu = \lambda u, \quad \|u\|_2 = 1, \quad U^\top B U = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix}, \quad Q^\top B Q \in S^{m-1}.$$

Proof: Let u be any unit-norm eigenvector of B associated with λ. We can now take Q to be a matrix whose columns form an orthonormal basis of the subspace orthogonal to u. Hence, U = [u Q] is orthogonal by construction.



Proof of the Auxiliary Lemma
Lemma
Let B ∈ Sm and let λ be an eigenvalue of B. Then, there exists an
orthogonal matrix U = [u Q] ∈ Rm×m , Q ∈ Rm×(m−1) , such that
 
$$Bu = \lambda u, \quad \|u\|_2 = 1, \quad U^\top B U = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix}, \quad Q^\top B Q \in S^{m-1}.$$

Proof cont.: By calculation:

$$U^\top B U = [u\ Q]^\top B [u\ Q] = \begin{pmatrix} u^\top B u & u^\top B Q \\ Q^\top B u & Q^\top B Q \end{pmatrix} = \begin{pmatrix} u^\top B u & (Q^\top B u)^\top \\ Q^\top B u & Q^\top B Q \end{pmatrix} = \begin{pmatrix} u^\top (\lambda u) & (Q^\top \lambda u)^\top \\ Q^\top \lambda u & Q^\top B Q \end{pmatrix} = \begin{pmatrix} \lambda & 0 \\ 0 & Q^\top B Q \end{pmatrix},$$

where the last equality follows since the columns of Q are orthogonal to u and ‖u‖_2 = 1.

Some Applications of the Spectral Theorem

Theorem (inverse and matrix power (HW))

Let A ∈ S^n, and write its eigen-decomposition A = UΛU^⊤. Then
1. if A is invertible, then A^{−1} = UΛ^{−1}U^⊤;
2. for any k ∈ N_+: A^k = UΛ^k U^⊤.

Observation
Let A ∈ S^n. Let λ_1, . . . , λ_n denote its eigenvalues and let u_1, . . . , u_n denote the corresponding eigenvectors. Then,

$$A = \sum_{i \in [n]} \lambda_i u_i u_i^\top = \sum_{i \in [n] : \lambda_i \neq 0} \lambda_i u_i u_i^\top.$$

Observation
Let A ∈ S^n. rank(A) is equal to the number of non-zero eigenvalues of A (counted with multiplicity).
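These facts are easy to verify numerically (a sketch assuming NumPy; the matrix below is a small symmetric example constructed to have one zero eigenvalue):

```python
import numpy as np

# Symmetric example with eigenvalues 3, 1, 0 (so rank 2).
U0, _ = np.linalg.qr(np.random.default_rng(3).standard_normal((3, 3)))  # random orthogonal matrix
A = U0 @ np.diag([3., 1., 0.]) @ U0.T

eigvals, U = np.linalg.eigh(A)

# Matrix power: A^2 = U Lam^2 U^T.
print(np.allclose(np.linalg.matrix_power(A, 2), U @ np.diag(eigvals**2) @ U.T))

# rank(A) equals the number of non-zero eigenvalues.
print(np.linalg.matrix_rank(A), np.sum(~np.isclose(eigvals, 0.)))   # 2 2

# The inverse formula A^{-1} = U Lam^{-1} U^T applies only when A is invertible;
# A above is singular, so we check it on the invertible matrix A + I instead.
eigvals2, U2 = np.linalg.eigh(A + np.eye(3))
print(np.allclose(np.linalg.inv(A + np.eye(3)), U2 @ np.diag(1.0 / eigvals2) @ U2.T))
```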



Computational Consequences of the Spectral Theorem

We now give some motivation for why the eigen-decomposition (and related decompositions that we will see in the sequel) is so important in modern data-intensive applications.

For the following, think of the case rank(A) = k ≪ n (this indeed holds for many matrices representing data that we see in real life).

Let A ∈ S^n and suppose that the eigen-decomposition $A = U\Lambda U^\top = \sum_{i=1}^{k} \lambda_i u_i u_i^\top$ is given, where rank(A) = k. Then,
Informally speaking, A can be stored in a computer's memory using only k(1 + n) memory units: storing the k non-zero eigenvalues and the k corresponding eigenvectors (assuming each memory unit can store a scalar).

On the other hand, an explicit n × n symmetric matrix requires n + (n² − n)/2 = Θ(n²) memory units (independent of k).


Computational Consequences of the Spectral Theorem

Let A ∈ S^n and suppose that the eigen-decomposition $A = U\Lambda U^\top = \sum_{i=1}^{k} \lambda_i u_i u_i^\top$ is given, where rank(A) = k. Then,

Multiplying a vector x ∈ R^n by A takes O(kn) time:
1. first, compute the scalars u_i^⊤ x, i = 1, . . . , k (O(kn) time);
2. then, compute the sum $\sum_{i=1}^{k} \lambda_i u_i (u_i^\top x)$ (O(kn) time).

On the other hand, computing Ax when A is given explicitly (entry by entry) takes O(n²) time (n row-column inner products).

Multiplying a matrix B ∈ R^{n×n} by A takes O(kn²) time, instead of O(n³) time for an explicitly given n × n matrix (apply the same matrix-vector argument to each of the n columns of B).
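The sketch below (assuming NumPy; the dimensions, the rank-k factors, and the helper name matvec_lowrank are illustrative choices, not part of the lecture) stores only the k non-zero eigenvalues and eigenvectors and multiplies a vector in O(kn) time, without ever forming the full n × n matrix:

```python
import numpy as np

n, k = 1000, 5
rng = np.random.default_rng(4)

# Low-rank symmetric A = sum_{i=1}^k lambda_i u_i u_i^T, stored via (lambdas, U_k):
# k eigenvalues plus k eigenvectors of length n, i.e. k(1 + n) stored scalars.
lambdas = rng.standard_normal(k)
U_k, _ = np.linalg.qr(rng.standard_normal((n, k)))   # n x k matrix with orthonormal columns

def matvec_lowrank(lambdas, U_k, x):
    """Compute Ax in O(kn) time from the factored form."""
    coeffs = U_k.T @ x                  # the k inner products u_i^T x  -- O(kn)
    return U_k @ (lambdas * coeffs)     # weighted combination of the u_i  -- O(kn)

x = rng.standard_normal(n)
A_full = U_k @ np.diag(lambdas) @ U_k.T   # explicit n x n matrix, built only to check correctness
print(np.allclose(matvec_lowrank(lambdas, U_k, x), A_full @ x))   # True
```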

