My Notes On Tensors
Hermitian Matrices
It is simplest to begin with matrices over the complex numbers. Let x = a + ib, where a, b are real numbers and $i = \sqrt{-1}$. Then $x^* = a - ib$ is the complex conjugate of x. In the discussion below, all matrices and numbers are complex-valued unless stated otherwise.
Let M be an n × n square matrix with complex entries. Then λ is an eigenvalue of M if there is a non-zero vector $\vec v$ such that
$$M\vec v = \lambda\vec v$$
This implies $(M - \lambda I)\vec v = 0$, which means the determinant of $M - \lambda I$ is zero. Since this determinant is a degree-n polynomial in λ, any M has n real or complex eigenvalues, counted with multiplicity.
A complex-valued matrix M is said to be Hermitian if for all i, j we have $M_{ij} = M_{ji}^*$. If the entries are all real, this reduces to the definition of a symmetric matrix.
In the discussion below, we will need the notion of an inner product. Let $\vec v$ and $\vec w$ be two vectors with complex entries. Define their inner product as
$$\langle \vec v, \vec w\rangle = \sum_{i=1}^{n} v_i^* w_i$$
Furthermore, we define
$$\langle \vec v, \vec v\rangle = \sum_{i=1}^{n} v_i^* v_i = \|\vec v\|^2$$
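For concreteness, here is a minimal NumPy sketch of these definitions (not part of the original notes); it relies on `np.vdot`, which conjugates its first argument, matching the convention above.

```python
import numpy as np

# Complex inner product <v, w> = sum_i conj(v_i) * w_i, matching np.vdot(v, w).
v = np.array([1 + 2j, 3 - 1j])
w = np.array([2 - 1j, 1j])

inner = np.sum(np.conj(v) * w)          # direct translation of the definition
assert np.isclose(inner, np.vdot(v, w))

# <v, v> = ||v||^2 is real and non-negative.
norm_sq = np.vdot(v, v).real
assert np.isclose(norm_sq, np.linalg.norm(v) ** 2)
```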
Claim 1. If M is Hermitian, then all its eigenvalues are real. If further M is real and symmetric, then its eigenvectors can be chosen to have real entries as well.
Proof. Using the fact that $M_{ij}^* = M_{ji}$, we have the following relations:
$$\langle M\vec v, \vec v\rangle = \sum_i \sum_j (M_{ij} v_j)^* v_i = \sum_i \sum_j M_{ji} v_j^* v_i = \sum_j v_j^* \Big(\sum_i M_{ji} v_i\Big) = \langle \vec v, M\vec v\rangle$$
On the other hand, since $M\vec v = \lambda\vec v$,
$$\langle M\vec v, \vec v\rangle = \langle \lambda\vec v, \vec v\rangle = \lambda^* \|\vec v\|^2 \quad\text{and}\quad \langle \vec v, M\vec v\rangle = \langle \vec v, \lambda\vec v\rangle = \lambda \|\vec v\|^2$$
Since these two expressions are equal and $\vec v \neq 0$, this means $\lambda^* = \lambda$, so λ is real.
Now suppose M is real and symmetric. Let λ be some eigenvalue, which by the above argument is real. Then $M\vec v = \lambda\vec v$ for some non-zero complex $\vec v$. Since M and λ are real, the same relation holds separately for the real and imaginary parts of $\vec v$. Therefore, if $\vec w$ is the real part of $\vec v$ (or the imaginary part, if the real part is zero), then $M\vec w = \lambda\vec w$. This shows that when M is real and symmetric, its eigenvectors can be chosen to be real.
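As a small numerical illustration of the claim (a sketch, not part of the original argument), the following builds a random Hermitian matrix and checks that its eigenvalues come out real, and that a real symmetric matrix additionally gets real eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# A + A^H is Hermitian by construction.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = A + A.conj().T

# Its eigenvalues should be real up to floating-point noise.
eigvals = np.linalg.eigvals(H)
assert np.allclose(eigvals.imag, 0, atol=1e-10)

# A real symmetric matrix also has real, orthonormal eigenvectors.
S = A.real + A.real.T
w, V = np.linalg.eigh(S)          # eigh returns real w and real V for symmetric S
assert np.allclose(S @ V, V @ np.diag(w))
```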
From now on, we will only focus on matrices with real entries.
Claim 2. For a real, symmetric matrix M, let $\lambda \neq \lambda'$ be two eigenvalues. Then the corresponding eigenvectors are orthogonal.
Proof. Let $M\vec v = \lambda\vec v$ and $M\vec w = \lambda'\vec w$. Since M is symmetric, it is easy to check that
$$\langle M\vec v, \vec w\rangle = \langle \vec v, M\vec w\rangle = \sum_{i,j} M_{ij} v_i w_j$$
But
$$\langle M\vec v, \vec w\rangle = \lambda\langle \vec v, \vec w\rangle$$
and
$$\langle \vec v, M\vec w\rangle = \lambda'\langle \vec v, \vec w\rangle$$
Since $\lambda \neq \lambda'$, this forces $\langle \vec v, \vec w\rangle = 0$.
Theorem 1. Let M be an n × n real symmetric matrix, and let $\lambda_1, \ldots, \lambda_n$ denote its eigenvalues. Then there exist n real-valued vectors $\vec v_1, \vec v_2, \ldots, \vec v_n$ such that:
• $\|\vec v_i\| = 1$ for all $i = 1, 2, \ldots, n$;
• $\langle \vec v_i, \vec v_j\rangle = 0$ for all $i \neq j \in \{1, 2, \ldots, n\}$; and
• $M\vec v_i = \lambda_i \vec v_i$ for all $i = 1, 2, \ldots, n$.
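A minimal NumPy sketch of this statement (an illustration, not a proof): `np.linalg.eigh` returns exactly such a family of unit-length, mutually orthogonal eigenvectors for a symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
M = (A + A.T) / 2                       # a random real symmetric matrix

w, V = np.linalg.eigh(M)                # columns of V play the role of v_1, ..., v_n

assert np.allclose(np.linalg.norm(V, axis=0), 1)   # unit norm
assert np.allclose(V.T @ V, np.eye(5))             # pairwise orthogonal
assert np.allclose(M @ V, V @ np.diag(w))          # M v_i = lambda_i v_i
```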
Computing Eigenvalues
In this section, we assume M is real and symmetric.
Lemma 1. Let M be a real symmetric matrix. Let $\lambda_1$ denote its largest eigenvalue and $\vec v_1$ denote the corresponding eigenvector with unit norm. Then
$$\lambda_1 = \sup_{\vec x \in \mathbb{R}^n,\, \|\vec x\| = 1} \vec x^T M \vec x$$
Proof. Since $M\vec v_1 = \lambda_1 \vec v_1$ and $\|\vec v_1\| = 1$, we have $\vec v_1^T M \vec v_1 = \lambda_1$. This means
$$\lambda_1 \le \sup_{\vec x \in \mathbb{R}^n,\, \|\vec x\| = 1} \vec x^T M \vec x$$
Suppose the supremum is achieved at a vector $\vec y$. Let $\vec v_1, \vec v_2, \ldots, \vec v_n$ denote the orthogonal eigenvectors of unit length corresponding to the eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$ respectively. Then we can write $\vec y$ in this basis as:
$$\vec y = \sum_{i=1}^{n} \alpha_i \vec v_i$$
Since $\vec y$ has unit length, this means
$$\langle \vec y, \vec y\rangle = \sum_{i=1}^{n} \alpha_i^2 = 1$$
Note next that
$$M\vec y = \sum_{i=1}^{n} \alpha_i \lambda_i \vec v_i$$
Therefore
$$\vec y^T M \vec y = \Big\langle \sum_{i=1}^{n} \alpha_i \vec v_i,\; \sum_{i=1}^{n} \alpha_i \lambda_i \vec v_i \Big\rangle = \sum_{i=1}^{n} \alpha_i^2 \lambda_i$$
It follows that
$$\sup_{\vec x \in \mathbb{R}^n,\, \|\vec x\| = 1} \vec x^T M \vec x = \vec y^T M \vec y = \sum_{i=1}^{n} \alpha_i^2 \lambda_i \le \lambda_1 \sum_{i=1}^{n} \alpha_i^2 = \lambda_1$$
Combined with the earlier inequality, the supremum has value exactly $\lambda_1$, and it is achieved for $\vec x = \vec v_1$.
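The following sketch checks Lemma 1 numerically: random unit vectors never exceed $\lambda_1$ in the quadratic form, and $\vec v_1$ attains it. (`np.linalg.eigh` sorts eigenvalues in ascending order, so the last one is $\lambda_1$.)

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
M = (A + A.T) / 2

w, V = np.linalg.eigh(M)
lam1, v1 = w[-1], V[:, -1]          # largest eigenvalue and its unit eigenvector

# No random unit vector beats lambda_1 ...
for _ in range(1000):
    x = rng.normal(size=6)
    x /= np.linalg.norm(x)
    assert x @ M @ x <= lam1 + 1e-9

# ... and v_1 attains it.
assert np.isclose(v1 @ M @ v1, lam1)
```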
Corollary 3. Let M be a real symmetric matrix, and let $\lambda_n$ denote its smallest eigenvalue. Then
$$\lambda_n = \inf_{\vec x \in \mathbb{R}^n,\, \|\vec x\| = 1} \vec x^T M \vec x$$
Next, call a square matrix P orthonormal if $P^T P = I$. In order to interpret what P does, note the following. For any vectors $\vec v$ and $\vec w$,
$$\langle P\vec v, P\vec w\rangle = \vec v^T P^T P \vec w = \vec v^T \vec w = \langle \vec v, \vec w\rangle$$
Applying P therefore preserves the angles between vectors, as well as their lengths; in other words, applying P performs a rotation of the space.
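A short sketch illustrating this point (the QR construction of a random orthonormal matrix is mine, not from the notes): such a P preserves inner products and norms.

```python
import numpy as np

rng = np.random.default_rng(3)

# Build an orthonormal matrix P via the QR decomposition of a random matrix.
P, _ = np.linalg.qr(rng.normal(size=(4, 4)))

v, w = rng.normal(size=4), rng.normal(size=4)

# P^T P = I, so inner products and lengths are preserved.
assert np.allclose(P.T @ P, np.eye(4))
assert np.isclose((P @ v) @ (P @ w), v @ w)
assert np.isclose(np.linalg.norm(P @ v), np.linalg.norm(v))
```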
Given a real, symmetric matrix M with eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn , let Q denote the
matrix whose rows are the corresponding eigenvectors of unit length. Since these eigenvectors are
orthogonal, this implies Q is orthonormal. Let D be the matrix whose entries along the diagonal
are the n eigenvalues, and other entries are zero. It is easy to check that:
$$M Q^T = Q^T D \;\Rightarrow\; M = Q^T D Q$$
Therefore, applying M to a vector $\vec v$ is the same as applying the rotation Q, then stretching dimension i by a factor $\lambda_i$, and finally rotating back by $Q^T$. Note that $\lambda_i$ could be negative, which flips the sign of the corresponding coordinate.
If we were to do this, which direction stretches the most? The dimension corresponding to $\lambda_1$ after the rotation by Q. But in the original space, this would be the direction corresponding to the first eigenvector $\vec v_1$. Now, for a unit vector $\vec x$, we have
$$\vec x^T M \vec x = \vec x^T Q^T D Q \vec x = (Q\vec x)^T D (Q\vec x)$$
This quantity therefore measures how much the rotated vector $Q\vec x$ is stretched (note that $Q\vec x$ is also a unit vector), and it equals $\lambda_1$ when $\vec x = \vec v_1$. This is the interpretation of the previous result.
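A small sketch of the decomposition and the stretching interpretation, using `np.linalg.eigh` to build Q and D (the construction here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5))
M = (A + A.T) / 2

w, V = np.linalg.eigh(M)     # ascending eigenvalues; columns of V are eigenvectors
Q = V.T                      # rows of Q are the unit eigenvectors
D = np.diag(w)

# M = Q^T D Q: rotate, stretch each coordinate by lambda_i, rotate back.
assert np.allclose(M, Q.T @ D @ Q)

# For the top eigenvector, x^T M x equals the largest eigenvalue.
v1 = V[:, -1]
assert np.isclose(v1 @ M @ v1, w[-1])
```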
Power Iteration
Assuming $\lambda_1$ is strictly larger than $\lambda_2$, there is a simple algorithm to compute the corresponding eigenvector $\vec v_1$. Note that
$$M^k = (Q^T D Q)^k = Q^T D^k Q$$
(using $Q Q^T = I$). This means $M^k$ has eigenvectors $\vec v_1, \vec v_2, \ldots$, and the corresponding eigenvalues are $\lambda_1^k, \lambda_2^k, \ldots$. Suppose $\vec x = \sum_{i=1}^{n} \alpha_i \vec v_i$; then
$$M^k \vec x = \sum_{i=1}^{n} \alpha_i \lambda_i^k \vec v_i$$
The claim is that as k becomes large, this vector points more and more in the direction of $\vec v_1$. Suppose we choose the initial vector $\vec x$ at random; since it is a random vector, each $\alpha_i \sim N(0, 1/n)$. This means that, with large probability, $|\alpha_i| \in [1/n, 1]$. If we assume $\lambda_1 \ge c\lambda_2$, then for $i \ge 2$ we have $\alpha_i \lambda_i^k \le \lambda_1^k/c^k$, while $\alpha_1 \lambda_1^k \ge \lambda_1^k/n$. If we choose $c^k \ge n$, so that $k \ge \log_c n$, then the terms for $i \ge 2$ have vanishingly small coefficients, and the dominant term corresponds to $\vec v_1$. Assuming $c = 1 + \epsilon$, we need to choose $k \approx (\log n)/\epsilon$ for the second and later eigenvectors to have vanishing contributions compared to the first.
This method is termed power iteration. It can be improved by repeated squaring, so that k grows in powers of 2. The number of iterations then drops to around $\log(1/\epsilon) + \log\log n$. So even an exponentially small gap, say $\epsilon = 1/2^n$, between $\lambda_1$ and $\lambda_2$ is sufficient for the algorithm to converge to the direction $\vec v_1$ in polynomially many iterations.
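Below is a minimal sketch of power iteration in the plain multiply-and-renormalize form (not the repeated-squaring variant described above); the helper name `power_iteration` and the chosen spectrum are illustrative only. Note that, as written, it converges to the eigenvector whose eigenvalue has the largest magnitude.

```python
import numpy as np

def power_iteration(M, num_iters=200, seed=0):
    """Minimal power iteration: approximate the eigenvector of M whose
    eigenvalue has the largest magnitude (lambda_1 here, since it dominates)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=M.shape[0])     # random start: nonzero alpha_1 with high probability
    for _ in range(num_iters):
        x = M @ x
        x /= np.linalg.norm(x)          # renormalize to avoid overflow
    return x

# Build a symmetric matrix with a known, well-separated top eigenvalue.
rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))       # random rotation; columns are eigenvectors
M = Q @ np.diag([5.0, 2.0, 1.0, 0.5]) @ Q.T

v = power_iteration(M)
assert np.isclose(abs(v @ Q[:, 0]), 1.0, atol=1e-8)   # aligns with the top eigenvector
assert np.isclose(v @ M @ v, 5.0)                      # Rayleigh quotient recovers lambda_1
```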
Once we have computed $\vec v_1$, to compute $\vec v_2$ we simply pick a random vector and project it onto the subspace perpendicular to $\vec v_1$. Call this projected vector $\vec x$. Since $\vec x = \sum_{i=2}^{n} \alpha_i \vec v_i$, we again have
$$M^k \vec x = \sum_{i=2}^{n} \alpha_i \lambda_i^k \vec v_i$$
In the limit, this converges to the second eigenvector assuming the second eigenvalue is well-
separated from the third. And so on.
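A sketch of this deflation step (again with an illustrative, hand-picked spectrum): in exact arithmetic a single projection suffices, as in the argument above, but in floating point it is safer to re-project at every iteration, which the sketch does.

```python
import numpy as np

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
M = Q @ np.diag([5.0, 2.0, 1.0, 0.5]) @ Q.T     # known spectrum; columns of Q are the eigenvectors

def power_iteration(M, x, deflate=None, num_iters=200):
    """Power iteration from start x; if `deflate` is given, keep x orthogonal to it.
    Re-projecting each step guards against rounding error reintroducing a v_1 component."""
    for _ in range(num_iters):
        x = M @ x
        if deflate is not None:
            x -= (x @ deflate) * deflate
        x /= np.linalg.norm(x)
    return x

v1 = power_iteration(M, rng.normal(size=4))
v2 = power_iteration(M, rng.normal(size=4), deflate=v1)

assert np.isclose(abs(v1 @ Q[:, 0]), 1.0, atol=1e-8)   # top eigenvector
assert np.isclose(abs(v2 @ Q[:, 1]), 1.0, atol=1e-8)   # second eigenvector
```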
Positive Semi-Definite Matrices
A real symmetric matrix M is said to be positive semi-definite (PSD) if $\vec x^T M \vec x \ge 0$ for every vector $\vec x$. In particular, if M is PSD and $\vec v_i$ is a unit-norm eigenvector with eigenvalue $\lambda_i$, then
$$\vec v_i^T M \vec v_i = \lambda_i \langle \vec v_i, \vec v_i\rangle \ge 0 \;\Rightarrow\; \lambda_i \ge 0$$
The above characterization is in fact an if and only if: a real, symmetric matrix is PSD iff all its eigenvalues are non-negative.
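A short sketch of this characterization: a matrix of the form $BB^T$ is PSD by construction (its quadratic form is $\|B^T\vec x\|^2$), and its eigenvalues are indeed non-negative.

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.normal(size=(4, 4))
M = B @ B.T                            # B B^T is PSD: x^T B B^T x = ||B^T x||^2 >= 0

eigenvalues = np.linalg.eigvalsh(M)
assert np.all(eigenvalues >= -1e-10)   # all eigenvalues non-negative (up to rounding)

# Conversely, the quadratic form is non-negative on random vectors.
for _ in range(100):
    x = rng.normal(size=4)
    assert x @ M @ x >= -1e-10
```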