➤ Standard eigenvalue problem:

    Ax = λx

  Often: A is symmetric real (or Hermitian complex).

➤ Generalized problem: Ax = λBx. Often: B is symmetric positive definite, A is symmetric or nonsymmetric.

➤ Quadratic problems: (A + λB + λ²C)u = 0

➤ Nonlinear eigenvalue problems (NEVP):

    [ A0 + λB0 + Σ_{i=1}^{n} fi(λ) Ai ] u = 0

➤ Machine learning problems often require a (partial) Singular Value Decomposition. Somewhat different issues in this case:
  • Very large matrices; update the SVD
  • Compute dominant singular values/vectors
  • Many problems of approximating a matrix (or a tensor) by one of lower rank (dimension reduction, ...)

➤ But: methods for computing the SVD are often based on those for standard eigenvalue problems.
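A minimal illustration of these problem classes in Python/scipy (the small random matrices here are placeholders, not from the lecture): the standard and generalized problems are solved directly with scipy.linalg, and the quadratic problem is handled by the usual companion linearization to a generalized problem of size 2n.

```python
import numpy as np
from scipy.linalg import eig, eigh

rng = np.random.default_rng(0)
n = 5

# Standard problem Ax = lambda*x, A symmetric real
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
lam, X = eigh(A)                       # eigh exploits symmetry

# Generalized problem Ax = lambda*Bx, B symmetric positive definite
M = rng.standard_normal((n, n)); B = M @ M.T + n * np.eye(n)
lam_g, X_g = eigh(A, B)                # eigenvectors come out B-orthogonal

# Quadratic problem (A + lambda*B + lambda^2*C)u = 0 via companion linearization:
#   [ 0   I ] [   u    ]            [ I  0 ] [   u    ]
#   [-A  -B ] [lambda*u]  =  lambda [ 0  C ] [lambda*u]
C = np.eye(n)
Z = np.zeros((n, n)); I = np.eye(n)
L1 = np.block([[Z, I], [-A, -B]])
L2 = np.block([[I, Z], [Z, C]])
lam_q, _ = eig(L1, L2)                 # the 2n eigenvalues of the quadratic problem
```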
➤ λ̃ = Ritz value, ũ = V y = Ritz vector.

➤ Two common choices for K:
  1) Power subspace K = span{A^k X0}, or span{Pk(A) X0};
  2) Krylov subspace K = span{v, Av, · · · , A^{k−1}v}

Shift-and-invert:

➤ Works well for computing a few eigenvalues near σ.

➤ Used in the commercial package NASTRAN (for decades!)

➤ Requires factoring (A − σI) (or A − σB in the generalized case). But convergence will be much faster.

➤ A solve each time – the factorization is done once (ideally).
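A sketch of the shift-and-invert mechanics under the assumptions above, using scipy's sparse LU: A − σI is factored once, and each iteration then costs only one solve; the iteration converges to the eigenvalue of A closest to σ. The 1-D Laplacian test matrix is only for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def shift_invert_eig(A, sigma, tol=1e-10, maxit=200):
    """Inverse iteration with shift sigma: eigenvalue of A closest to sigma."""
    n = A.shape[0]
    lu = splu((A - sigma * sp.identity(n)).tocsc())  # factorization done once
    v = np.random.default_rng(0).standard_normal(n)
    v /= np.linalg.norm(v)
    lam = sigma
    for _ in range(maxit):
        w = lu.solve(v)                  # one sparse triangular solve per iteration
        v = w / np.linalg.norm(w)
        lam = v @ (A @ v)                # Rayleigh quotient estimate
        if np.linalg.norm(A @ v - lam * v) < tol:
            break
    return lam, v

# Example: 1-D Laplacian; find the eigenvalue nearest sigma = 0.5
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(100, 100), format="csc")
lam, v = shift_invert_eig(A, sigma=0.5)
```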
Background. The main tools (cont)

Algorithm: Subspace Iteration with Projection
1. Start: Choose an initial system of vectors X = [x0, . . . , xm] and an initial polynomial Ck.
2. Iterate: Until convergence do:
   (a) Compute Ẑ = Ck(A) Xold.
   (b) Orthonormalize Ẑ into Z.
   (c) Compute B = Z^H A Z and use the QR algorithm to compute the Schur vectors Y = [y1, . . . , ym] of B.
   (d) Compute Xnew = Z Y.
   (e) Test for convergence. If satisfied stop. Else select a new polynomial C′_{k′} and continue.
(A Python sketch of this iteration follows the theorem below.)

THEOREM: Let S0 = span{x1, x2, . . . , xm} and assume that S0 is such that the vectors {P xi}, i = 1, . . . , m, are linearly independent, where P is the spectral projector associated with λ1, . . . , λm. Let Pk be the orthogonal projector onto the subspace Sk = span{Xk}. Then for each eigenvector ui of A, i = 1, . . . , m, there exists a unique vector si in the subspace S0 such that P si = ui. Moreover, the following inequality is satisfied:

    ‖(I − Pk)ui‖2 ≤ ‖ui − si‖2 ( |λm+1 / λi| + εk )^k ,      (1)

where εk tends to zero as k tends to infinity.
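A minimal Python sketch of the subspace iteration with projection above, with two simplifications: the polynomial Ck(A) is taken to be plain A^k with k fixed, and the Schur-vector step is replaced by a symmetric eigendecomposition of B (valid here since the test matrix is symmetric). No per-eigenpair convergence test is done.

```python
import numpy as np

def subspace_iteration(A, m, k=8, iters=100):
    """Approximate the m dominant eigenpairs of a symmetric matrix A."""
    n = A.shape[0]
    X, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, m)))
    for _ in range(iters):
        Z = X
        for _ in range(k):               # Z_hat = C_k(A) X_old with C_k(t) = t^k
            Z = A @ Z
        Z, _ = np.linalg.qr(Z)           # orthonormalize Z_hat into Z
        B = Z.T @ A @ Z                  # small projected matrix
        theta, Y = np.linalg.eigh(B)     # symmetric case: eigenvectors instead of Schur vectors
        X = Z @ Y                        # X_new = Z Y
    return theta, X

A = np.diag(np.arange(1.0, 101.0))       # toy symmetric matrix with eigenvalues 1..100
theta, X = subspace_iteration(A, m=3)
print(theta)                             # approximations to the 3 largest eigenvalues
```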
Current state-of-the-art in eigensolvers

Krylov subspace methods

Principle: Projection methods on Krylov subspaces, i.e., on

    Km(A, v1) = span{v1, Av1, · · · , A^{m−1}v1}

Arnoldi's Algorithm

➤ Goal: to compute an orthogonal basis of Km.

➤ Input: Initial vector v1, with ‖v1‖2 = 1, and m.
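A straightforward Arnoldi sketch (modified Gram-Schmidt, no restarting, no deflation): it builds an orthonormal basis V of Km(A, v1) together with the (m+1) × m Hessenberg matrix H; the Ritz values are the eigenvalues of the leading m × m block.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Orthonormal basis V of K_m(A, v1) and Hessenberg H with A V_m = V_{m+1} H."""
    n = len(v1)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                   # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:                  # 'happy breakdown': K_j is invariant
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

A = np.random.default_rng(0).standard_normal((200, 200))
V, H = arnoldi(A, np.ones(200), 30)
ritz = np.linalg.eigvals(H[:H.shape[1], :])      # Ritz values = eigenvalues of the square part
```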
➤ Example (Arnoldi on the matrix Mark(10)): convergence of the dominant eigenvalue as m increases.

    m     Re(λ)               Im(λ)   Res. Norm
    10    0.9987435899D+00    0.0     0.246D-01
    20    0.9999523324D+00    0.0     0.144D-02
    30    0.1000000368D+01    0.0     0.221D-04
    40    0.1000000025D+01    0.0     0.508D-06
    50    0.9999999996D+00    0.0     0.138D-07

➤ Accumulate each new converged eigenvector in columns 2, 3, . . . [the 'locked' set of eigenvectors].

➤ Thus, for k = 2:

    Vm = [ v1, v2 | v3, . . . , vm ]
          (locked)     (active)

➤ The corresponding Hm has the form

    Hm =  [ ∗  ∗  ∗  ∗  ∗  ∗ ]
          [    ∗  ∗  ∗  ∗  ∗ ]
          [       ∗  ∗  ∗  ∗ ]
          [       ∗  ∗  ∗  ∗ ]
          [          ∗  ∗  ∗ ]
          [             ∗  ∗ ]
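One simple way to realize the locking idea, sketched below (explicit deflation against the locked block Q; this is a simplified stand-in, not the exact implicitly deflated scheme of the slides): every new Arnoldi vector is also orthogonalized against Q, so the iteration effectively works with (I − QQ^H)A.

```python
import numpy as np

def deflated_arnoldi_step(A, V, H, j, Q):
    """One Arnoldi step that also orthogonalizes against the locked vectors in Q."""
    w = A @ V[:, j]
    w -= Q @ (Q.T @ w)                  # explicit deflation against the locked set
    for i in range(j + 1):              # usual Gram-Schmidt against the active basis
        H[i, j] = V[:, i] @ w
        w -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    V[:, j + 1] = w / H[j + 1, j]
    return V, H

# Usage sketch: lock the (known) dominant eigenvector of a diagonal test matrix,
# then take a few deflated Arnoldi steps from a random start orthogonal to it.
A = np.diag(np.arange(1.0, 51.0))
Q = np.zeros((50, 1)); Q[-1, 0] = 1.0            # locked eigenvector
V = np.zeros((50, 6)); H = np.zeros((6, 5))
v0 = np.random.default_rng(1).standard_normal(50)
v0 -= Q @ (Q.T @ v0)
V[:, 0] = v0 / np.linalg.norm(v0)
for j in range(5):
    V, H = deflated_arnoldi_step(A, V, H, j, Q)
```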
➤ Computing the next 2 eigenvalues of Mark(10).

    Eig.   Mat-Vec's   Re(λ)           Im(λ)   Res. Norm
    2      60          0.9370509474    0.0     0.870D-03
           69          0.9371549617    0.0     0.175D-04
           78          0.9371501442    0.0     0.313D-06
           87          0.9371501564    0.0     0.490D-08
    3      96          0.8112247133    0.0     0.210D-02
           104         0.8097553450    0.0     0.538D-03
           112         0.8096419483    0.0     0.874D-04
           152         0.8095717167    0.0     0.444D-07

Hermitian case: The Lanczos Algorithm

➤ The Hessenberg matrix becomes tridiagonal:

    A = A^H  and  Vm^H A Vm = Hm   →   Hm = Hm^H

➤ We can write

    Hm =  [ α1  β2                 ]
          [ β2  α2  β3             ]
          [     β3  α3  β4         ]        (2)
          [         ·   ·   ·      ]
          [             βm  αm     ]

➤ Consequence: three-term recurrence

    βj+1 vj+1 = A vj − αj vj − βj vj−1
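The three-term recurrence translates almost line for line into code. This sketch does no reorthogonalization, so for larger m the computed basis loses orthogonality (the topic of the next slides); the eigenvalues of the tridiagonal matrix in (2) are the Ritz values.

```python
import numpy as np

def lanczos(A, v1, m):
    """Plain Lanczos: beta_{j+1} v_{j+1} = A v_j - alpha_j v_j - beta_j v_{j-1}."""
    n = len(v1)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    V = np.zeros((n, m + 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    v_prev, b_prev = np.zeros(n), 0.0
    for j in range(m):
        w = A @ V[:, j] - b_prev * v_prev
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]
        v_prev, b_prev = V[:, j], beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return T, V[:, :m]

A = np.diag(np.linspace(0.0, 1.0, 500))       # simple symmetric test matrix
T, V = lanczos(A, np.ones(500), 30)
ritz = np.linalg.eigvalsh(T)                  # good approximations at both ends of the spectrum
```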
➤ Partial reorthogonalization: reorthogonalize only when deemed necessary.

➤ Main question: when?

➤ Uses an inexpensive recurrence relation.

➤ Work done in the 80's [Parlett, Simon, and co-workers] + more recent work [Larsen, '98].

➤ Package: PROPACK [Larsen]. V 1: 2001, most recent: V 2.1 (Apr. 05).

➤ Often, the need for reorthogonalization is not too strong.

Assume the eigenvalues are sorted increasingly:

    λ1 ≤ λ2 ≤ · · · ≤ λn

➤ Orthogonal projection method onto Km;

➤ To derive error bounds, use the Courant characterization

    λ̃1 = min_{u ∈ K, u ≠ 0} (Au, u)/(u, u) = (Aũ1, ũ1)/(ũ1, ũ1)

    λ̃j = min_{u ∈ K, u ≠ 0, u ⊥ ũ1, . . . , ũj−1} (Au, u)/(u, u) = (Aũj, ũj)/(ũj, ũj)
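Numerically, the characterization says that the Ritz values are the eigenvalues of the projected matrix V^T A V for an orthonormal basis V of K; in particular they always lie inside [λ1, λn]. A small check on a random symmetric matrix (the Krylov basis is built naively from explicit powers, which is fine only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 300, 20
A = rng.standard_normal((n, n)); A = (A + A.T) / 2        # symmetric test matrix

# Orthonormal basis of K_m(A, v) built from explicit powers (illustration only)
v = np.ones(n)
K = np.column_stack([np.linalg.matrix_power(A, j) @ v for j in range(m)])
V, _ = np.linalg.qr(K)

ritz = np.linalg.eigvalsh(V.T @ A @ V)        # lambda~_1 <= ... <= lambda~_m
eigs = np.linalg.eigvalsh(A)                  # lambda_1 <= ... <= lambda_n
print(eigs[0] <= ritz[0], ritz[-1] <= eigs[-1])   # Ritz values lie inside the spectrum
```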
➤ Bounds for λ1 easy to find – similar to linear systems.

➤ Ritz values approximate eigenvalues of A inside out:

    λ1      λ2     · · ·     λn−1      λn
       λ̃1      λ̃2    · · ·      λ̃n−1     λ̃n

A-priori error bounds

Theorem [Kaniel, 1966]:

    0 ≤ λ1^(m) − λ1 ≤ (λN − λ1) [ tan ∠(v1, u1) / Tm−1(1 + 2γ1) ]²

where γ1 = (λ2 − λ1)/(λN − λ2), and ∠(v1, u1) = angle between v1 and u1. Here λ1^(m) denotes the first Ritz value at step m, and Tk is the Chebyshev polynomial of the first kind of degree k.

Theorem:

    0 ≤ λi^(m) − λi ≤ (λN − λ1) [ κi^(m) tan ∠(v1, ui) / Tm−i(1 + 2γi) ]²

where γi = (λi+1 − λi)/(λN − λi+1),  κi^(m) = Π_{j<i} (λj^(m) − λN)/(λj^(m) − λi).
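The bounds can be evaluated directly. The sketch below only computes the right-hand side of the first bound for a hypothetical uniform spectrum on [0, 1] and a 45-degree starting angle, using Tk(x) = cosh(k·arccosh x) for x ≥ 1; it does not run Lanczos.

```python
import numpy as np

def kaniel_bound_lambda1(eigs, angle, m):
    """Right-hand side of the first bound (eigs sorted increasingly)."""
    l1, l2, lN = eigs[0], eigs[1], eigs[-1]
    gamma1 = (l2 - l1) / (lN - l2)
    T = np.cosh((m - 1) * np.arccosh(1.0 + 2.0 * gamma1))   # T_{m-1}(1 + 2*gamma1)
    return (lN - l1) * (np.tan(angle) / T) ** 2

eigs = np.linspace(0.0, 1.0, 1000)            # hypothetical spectrum in [0, 1]
for m in (10, 20, 40, 80):
    print(m, kaniel_bound_lambda1(eigs, angle=np.pi / 4, m=m))   # bound decays with m
```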
If the algorithm does not break down before step m, then the vectors vi, i = 1, . . . , m, and wj, j = 1, . . . , m, are biorthogonal, i.e.,

    (vj, wi) = δij ,   1 ≤ i, j ≤ m .

Moreover, {vi} i=1,...,m is a basis of Km(A, v1) and {wi} i=1,...,m is a basis of Km(A^H, w1), and

    A Vm    = Vm Tm + δm+1 vm+1 em^H ,
    A^H Wm  = Wm Tm^H + β̄m+1 wm+1 em^H ,
    Wm^H A Vm = Tm .

➤ If θj is an eigenvalue of Tm, with associated right and left eigenvectors yj and zj respectively, then the corresponding approximations for A are

    Ritz value | Right Ritz vector | Left Ritz vector
    θj         | Vm yj             | Wm zj

[Note: the terminology is abused slightly – Ritz values and vectors normally refer to the Hermitian case.]
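A compact sketch of the two-sided (non-Hermitian) Lanczos biorthogonalization in its simplest form (real arithmetic, no look-ahead, no safeguard against breakdown), checking the relation Wm^T A Vm = Tm at the end:

```python
import numpy as np

def two_sided_lanczos(A, v, w, m):
    """Two-sided Lanczos (real, no look-ahead): returns V, W, T with W^T A V = T."""
    n = len(v)
    v = v / (w @ v)                                  # enforce (v1, w1) = 1
    V, W, T = np.zeros((n, m)), np.zeros((n, m)), np.zeros((m, m))
    v_old, w_old = np.zeros(n), np.zeros(n)
    beta = delta = 0.0
    for j in range(m):
        V[:, j], W[:, j] = v, w
        alpha = w @ (A @ v)
        T[j, j] = alpha
        vhat = A @ v - alpha * v - beta * v_old      # unnormalized next right vector
        what = A.T @ w - alpha * w - delta * w_old   # unnormalized next left vector
        if j == m - 1:
            break
        delta = np.sqrt(abs(vhat @ what))            # zero => breakdown (not handled here)
        beta = (vhat @ what) / delta
        T[j, j + 1], T[j + 1, j] = beta, delta
        v_old, w_old = v, w
        v, w = vhat / delta, what / beta
    return V, W, T

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 60))                    # small nonsymmetric test matrix
V, W, T = two_sided_lanczos(A, rng.standard_normal(60), rng.standard_normal(60), 8)
print(np.linalg.norm(W.T @ A @ V - T))               # small when biorthogonality holds
```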