Lec 02
Gradient Method
Xiaojing Ye
Department of Mathematics & Statistics
Georgia State University
• αk ≥ 0: step size.
We will first discuss some properties of the steepest descent method, and consider other (inexact) line search methods.
Proof. Define φ(α) := f(x^(k) − α g^(k)). Since αk = arg min_{α≥0} φ(α), we have

0 = φ′(αk) = −∇f(x^(k) − αk g^(k))⊤ g^(k) = −g^(k+1)⊤ g^(k)

Hence g^(k+1)⊤ g^(k) = 0, i.e., consecutive search directions are orthogonal.
On the other hand, we have
For a prescribed ε > 0, terminate the iteration if one of the following is met:
• ‖g^(k)‖ < ε;
[Figure: plot of φ(α) over α ∈ [0, 0.01] in the first iteration]
Xiaojing Ye, Math & Stat, Georgia State University 9
In the 2nd iteration:
[Figures: iterates on the contour plot, and the plot of φ(α) in the 2nd iteration]
Then we need to find the step size αk = arg min_α φ(α) where

φ(α) := f(x^(k) − α g^(k)) = (1/2)(x^(k) − α g^(k))⊤ Q (x^(k) − α g^(k)) − b⊤(x^(k) − α g^(k))

Solving φ′(α) = −(x^(k) − α g^(k))⊤ Q g^(k) + b⊤ g^(k) = 0, we obtain

αk = ((Qx^(k) − b)⊤ g^(k)) / (g^(k)⊤ Q g^(k)) = (g^(k)⊤ g^(k)) / (g^(k)⊤ Q g^(k))
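The closed-form step size above can be sketched in code. Below is a minimal NumPy implementation of steepest descent with exact line search on a quadratic; the test matrix Q and vector b are hypothetical, not the lecture's example.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-8, max_iter=1000):
    """Steepest descent for f(x) = 0.5 x^T Q x - b^T x with exact line search.

    Uses the closed-form step alpha_k = (g^T g) / (g^T Q g) derived above.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = Q @ x - b                      # gradient g^(k) = Q x^(k) - b
        if np.linalg.norm(g) < tol:        # stopping criterion ||g^(k)|| < tol
            break
        alpha = (g @ g) / (g @ (Q @ g))    # exact line search step size
        x = x - alpha * g
    return x, k

# Hypothetical test problem with symmetric positive definite Q.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star, iters = steepest_descent_quadratic(Q, b, np.zeros(2))
```

At the returned point the gradient vanishes, so x_star solves Qx = b.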
Since ∇²f(x) = Q ≻ 0, f is strictly convex and has a unique minimizer, denoted by x∗.
V(x^(k+1)) = (1 − γk) V(x^(k))

where

γk = 0,  if ‖g^(k)‖ = 0;

γk = αk · (g^(k)⊤ Q g^(k)) / (g^(k)⊤ Q^(−1) g^(k)) · (2 (g^(k)⊤ g^(k)) / (g^(k)⊤ Q g^(k)) − αk),  if ‖g^(k)‖ ≠ 0.
If ‖g^(k)‖ ≠ 0, then

V(x^(k+1)) = (1/2)(x^(k+1) − x∗)⊤ Q (x^(k+1) − x∗)
           = (1/2)(x^(k) − x∗ − αk g^(k))⊤ Q (x^(k) − x∗ − αk g^(k))
           = V(x^(k)) − αk g^(k)⊤ Q (x^(k) − x∗) + (1/2) αk² g^(k)⊤ Q g^(k)

Therefore

(V(x^(k)) − V(x^(k+1))) / V(x^(k)) = (αk g^(k)⊤ Q (x^(k) − x∗) − (1/2) αk² g^(k)⊤ Q g^(k)) / V(x^(k))
γ ≤ − log(1 − γ) ≤ 2γ
which hold for γ ≥ 0 close to 0. Then use the squeeze theorem.
for any x.
First recall that αk = (g^(k)⊤ g^(k)) / (g^(k)⊤ Q g^(k)). Then there is

γk = αk · (g^(k)⊤ Q g^(k)) / (g^(k)⊤ Q^(−1) g^(k)) · (2 (g^(k)⊤ g^(k)) / (g^(k)⊤ Q g^(k)) − αk)
   = (g^(k)⊤ g^(k))² / ((g^(k)⊤ Q g^(k)) (g^(k)⊤ Q^(−1) g^(k)))
   = ‖g^(k)‖⁴ / (‖g^(k)‖²_Q ‖g^(k)‖²_{Q^(−1)})
   ≥ λmin(Q) / λmax(Q) > 0
Therefore ∑_k γk = +∞.
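The lower bound γk ≥ λmin(Q)/λmax(Q) can be checked numerically. The sketch below, assuming NumPy, evaluates γk = (g⊤g)² / ((g⊤Qg)(g⊤Q^(−1)g)) for many random gradient directions against a hypothetical random SPD matrix (not the lecture's example).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric positive definite test matrix Q.
A = rng.standard_normal((5, 5))
Q = A @ A.T + 5.0 * np.eye(5)
lam = np.linalg.eigvalsh(Q)            # eigenvalues in ascending order
lower_bound = lam[0] / lam[-1]         # lambda_min(Q) / lambda_max(Q)

# gamma_k for the exact line-search step, per the derivation above:
#   gamma_k = (g^T g)^2 / ((g^T Q g)(g^T Q^{-1} g))
gammas = []
for _ in range(200):
    g = rng.standard_normal(5)
    gamma = (g @ g) ** 2 / ((g @ Q @ g) * (g @ np.linalg.solve(Q, g)))
    gammas.append(gamma)
```

Every sampled γk should lie in [λmin(Q)/λmax(Q), 1]; the upper bound follows from Cauchy–Schwarz applied to ⟨Q^(1/2)g, Q^(−1/2)g⟩.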
Theorem. If the step size α > 0 is fixed, then the gradient method converges
if and only if
0 < α < 2 / λmax(Q)
γk = α · (‖g^(k)‖²_Q / ‖g^(k)‖²_{Q^(−1)}) · (2 ‖g^(k)‖² / ‖g^(k)‖²_Q − α)
   ≥ α · (λmin(Q) ‖g^(k)‖² / (λmax(Q^(−1)) ‖g^(k)‖²)) · (2 / λmax(Q) − α)
   = α λmin(Q)² (2 / λmax(Q) − α) > 0

Therefore ∑_k γk = ∞ and hence GM converges.
Solution. First rewrite f into the standard quadratic form with symmetric Q:

f(x) = (1/2) x⊤ [8, 2√2; 2√2, 10] x + x⊤ [3; 6] + 24

Then we compute the eigenvalues of Q = [8, 2√2; 2√2, 10]:

|λI − Q| = |λ − 8, −2√2; −2√2, λ − 10| = (λ − 8)(λ − 10) − 8 = (λ − 6)(λ − 12)

Hence λmax(Q) = 12, and the range of α should be (0, 2/12) = (0, 1/6).
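The eigenvalue computation and the resulting step-size range can be verified with a few lines of NumPy:

```python
import numpy as np

# The example's matrix Q, with symmetric off-diagonal entries 2*sqrt(2).
Q = np.array([[8.0, 2.0 * np.sqrt(2.0)],
              [2.0 * np.sqrt(2.0), 10.0]])

eigvals = np.linalg.eigvalsh(Q)    # ascending order; should be [6, 12]
alpha_max = 2.0 / eigvals[-1]      # upper end of the range, 2 / lambda_max(Q)
```

This confirms the eigenvalues 6 and 12 and the admissible range α ∈ (0, 1/6).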
Remark. λmax(Q) / λmin(Q) = ‖Q‖ ‖Q^(−1)‖ is called the condition number of Q.
0 < lim_{k→∞} ‖x^(k+1) − x∗‖ / ‖x^(k) − x∗‖^p < ∞
It can be shown that p ≥ 1, and the larger p is, the faster the convergence is.
• x^(k) = 1/k → 0; then

|x^(k+1)| / |x^(k)|^p = k^p / (k + 1) < ∞

if p ≤ 1. Therefore x^(k) → 0 with order 1.

• x^(k) = q^k → 0 (for some q ∈ (0, 1)); then

|x^(k+1)| / |x^(k)|^p = q^(k+1) / q^(kp) = q^(k(1−p)+1) < ∞

if p ≤ 1. Therefore x^(k) → 0 with order 1.

• x^(k) = q^(2^k) → 0; then

|x^(k+1)| / |x^(k)|^p = q^(2^(k+1)) / q^(p·2^k) = q^(2^k (2−p)) < ∞

if p ≤ 2. Therefore x^(k) → 0 with order 2.
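The sequences above can be contrasted numerically: for an order-p sequence the ratio |x^(k+1)|/|x^(k)|^p stays bounded as k grows. A small NumPy sketch with q = 0.5 (an illustrative choice):

```python
import numpy as np

q = 0.5
k = np.arange(1, 20)

# x_k = q^k converges with order 1: |x_{k+1}| / |x_k|^1 is the constant q.
x_lin = q ** k
ratio_lin = x_lin[1:] / x_lin[:-1] ** 1.0

# x_k = q^(2^k) converges with order 2: |x_{k+1}| / |x_k|^2 is constant (= 1),
# since the exponent doubles at every step.
x_quad = q ** (2.0 ** np.arange(1, 10))
ratio_quad = x_quad[1:] / x_quad[:-1] ** 2.0
```

With p raised above the true order (e.g. p = 1.5 for x_lin), the same ratio blows up, which is how the order is read off in practice.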
Instead, we prefer inexact line search. That is, we do not exactly solve min_α φk(α); rather, we seek a cheaply computed step size αk that
• guarantees convergence.
Backtracking: choose initial guess α(0) and τ ∈ (0, 1) (e.g., τ = 0.5), then
set α = α(0) and repeat:
1. Check whether φk(α) ≤ φk(0) + εαφ′k(0) (first Armijo condition). If yes, then terminate.
2. Shrink α to τ α.
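The backtracking loop above is a few lines of code. Below is a minimal sketch, assuming NumPy and the steepest descent direction −g; the quadratic test function at the end is purely illustrative.

```python
import numpy as np

def backtracking(f, grad_f, x, alpha0=1.0, tau=0.5, eps=1e-4):
    """Backtracking line search along the steepest descent direction -g.

    Shrinks alpha by the factor tau until the Armijo condition
        phi(alpha) <= phi(0) + eps * alpha * phi'(0)
    holds, where phi(alpha) = f(x - alpha * g).
    """
    g = grad_f(x)
    phi0 = f(x)
    dphi0 = -g @ g                     # phi'(0) = -||g||^2 along -g
    alpha = alpha0
    while f(x - alpha * g) > phi0 + eps * alpha * dphi0:
        alpha *= tau                   # step 2: shrink alpha to tau * alpha
    return alpha

# Hypothetical smooth test function: f(x) = 0.5 ||x||^2, gradient x.
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
alpha = backtracking(f, grad_f, np.array([3.0, -4.0]))
```

For this test function the initial guess α = 1 already satisfies the Armijo condition, so no shrinking occurs; starting from a larger α(0) the loop halves the step until the condition holds.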
Proof of Claim. The second Wolfe condition φ′k(αk) ≥ ηφ′k(0) implies φ′k(αk) − φ′k(0) ≥ (η − 1)φ′k(0), which is

−⟨∇f(x^(k+1)) − ∇f(x^(k)), g^(k)⟩ ≥ (1 − η)‖g^(k)‖²

On the other hand, the L-Lipschitz continuity of ∇f implies

−⟨∇f(x^(k+1)) − ∇f(x^(k)), g^(k)⟩ ≤ (L/αk)‖x^(k+1) − x^(k)‖² = Lαk ‖g^(k)‖²

Combining the two inequalities above yields the claim.
(ε(1 − η)/L) ∑_{k=0}^{K−1} ‖g^(k)‖² ≤ f(x^(0)) − f(x^(K)) < ∞