
MATH 4211/6211 – Optimization

Gradient Method

Xiaojing Ye
Department of Mathematics & Statistics
Georgia State University



Consider x(k) and compute g(k) := ∇f(x(k)). Set the descent direction to d(k) = −g(k).

Now we want to find α ≥ 0 such that x(k) − αg(k) improves on x(k).

Define φ(α) := f(x(k) − αg(k)); then φ has the Taylor expansion

f(x(k) − αg(k)) = f(x(k)) − α‖g(k)‖² + o(α)

For α sufficiently small, we have

f(x(k) − αg(k)) ≤ f(x(k))



Gradient Descent Method (or Gradient Method):

x(k+1) = x(k) − α_k g(k)


Set an initial guess x(0), and iterate the scheme above to obtain {x(k) : k = 0, 1, . . . }.

• x(k): current estimate;

• g(k) := ∇f(x(k)): gradient at x(k);

• α_k ≥ 0: step size.
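
For concreteness, here is a minimal Python sketch of this scheme with a fixed step size (the function names, tolerance, and iteration cap are illustrative assumptions, not part of the lecture):

```python
import numpy as np

def gradient_method(grad_f, x0, step_size, num_iters=1000, tol=1e-8):
    """Iterate x(k+1) = x(k) - alpha * g(k) with a fixed step size alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad_f(x)                  # g(k) = grad f(x(k))
        if np.linalg.norm(g) < tol:    # simple stopping criterion
            break
        x = x - step_size * g          # gradient step
    return x
```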



Steepest Descent Method: choose α_k such that

α_k = arg min_{α≥0} f(x(k) − αg(k))

The steepest descent method is an exact line search method.

We will first discuss some properties of the steepest descent method, and then consider other (inexact) line search methods.



Proposition. Let {x(k)} be obtained by the steepest descent method. Then

(x(k+2) − x(k+1))ᵀ(x(k+1) − x(k)) = 0

Proof. Define φ(α) := f(x(k) − αg(k)). Since α_k = arg min φ(α), we have

0 = φ′(α_k) = −∇f(x(k) − α_k g(k))ᵀg(k) = −g(k+1)ᵀg(k),

so g(k+1)ᵀg(k) = 0. On the other hand, we have

x(k+2) = x(k+1) − α_{k+1} g(k+1)

x(k+1) = x(k) − α_k g(k)

Therefore, we have

(x(k+2) − x(k+1))ᵀ(x(k+1) − x(k)) = α_{k+1} α_k g(k+1)ᵀg(k) = 0.



Proposition. Let {x(k)} be obtained by the steepest descent method and g(k) ≠ 0. Then f(x(k+1)) < f(x(k)).

Proof. Define φ(α) := f(x(k) − αg(k)). Then

φ′(0) = −∇f(x(k) − 0·g(k))ᵀg(k) = −‖g(k)‖² < 0.

Since φ′(0) < 0, we have φ(α) < φ(0) for all sufficiently small α > 0. As α_k is a minimizer of φ over α ≥ 0, it follows that

f(x(k+1)) = φ(α_k) < φ(0) = f(x(k)).



Stopping Criterion.

For a prescribed ε > 0, terminate the iteration if one of the following is met:

• ‖g(k)‖ < ε;

• |f(x(k+1)) − f(x(k))| < ε;

• ‖x(k+1) − x(k)‖ < ε.

More preferable are the choices using “relative change”:

• |f(x(k+1)) − f(x(k))|/|f(x(k))| < ε;

• ‖x(k+1) − x(k)‖/‖x(k)‖ < ε.
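
These tests are straightforward to code; a minimal sketch of the checks follows (the helper name, the default ε, and the assumption that the denominators are nonzero are illustrative choices, not part of the lecture):

```python
import numpy as np

def should_stop(x_prev, x_new, f_prev, f_new, g_new, eps=1e-6):
    """Illustrative stopping tests; assumes f_prev != 0 and x_prev != 0."""
    small_grad = np.linalg.norm(g_new) < eps
    rel_f = abs(f_new - f_prev) < eps * abs(f_prev)
    rel_x = np.linalg.norm(x_new - x_prev) < eps * np.linalg.norm(x_prev)
    return small_grad or rel_f or rel_x
```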



Example. Use the steepest descent method for 3 iterations on

f(x₁, x₂, x₃) = (x₁ − 4)⁴ + (x₂ − 3)² + 4(x₃ + 5)⁴

with initial point x(0) = [4, 2, −1]ᵀ.

Solution. We will repeatedly use the gradient, so let's compute it first:

∇f(x) = [4(x₁ − 4)³, 2(x₂ − 3), 16(x₃ + 5)³]ᵀ

We keep in mind that x∗ = [4, 3, −5]ᵀ.



In the 1st iteration:

• Current iterate: x(0) = [4, 2, −1]ᵀ;

• Current gradient: g(0) = ∇f(x(0)) = [0, −2, 1024]ᵀ;

• Find step size:

α₀ = arg min_{α≥0} f(x(0) − αg(0)) = arg min_{α≥0} [ 0 + (2 + 2α − 3)² + 4(−1 − 1024α + 5)⁴ ]

and use the secant method to get α₀ = 3.967 × 10⁻³.

• Next iterate: x(1) = x(0) − α₀ g(0) = · · · = [4.000, 2.008, −5.062]ᵀ.



[Figure: plot of the line search objective f(x(0) − αg(0)) for α ∈ [0, 0.010]; its values are on the order of 10³ and the minimizer is near α₀ ≈ 3.967 × 10⁻³.]
In the 2nd iteration:

• Current iterate: x(1) = [4.000, 2.008, −5.062]ᵀ;

• Current gradient: g(1) = ∇f(x(1)) = [0.001, −1.984, −0.003875]ᵀ;

• Find step size:

α₁ = arg min_{α≥0} f(x(1) − αg(1)) = arg min_{α≥0} [ 0 + (2.008 + 1.984α − 3)² + 4(−5.062 + 0.003875α + 5)⁴ ]

and use the secant method to get α₁ = 0.500.

• Next iterate: x(2) = x(1) − α₁ g(1) = · · · = [4.000, 3.000, −5.060]ᵀ.



[Figure: plot of the line search objective f(x(1) − αg(1)) for α ∈ [0, 1.0]; the minimizer is near α₁ ≈ 0.500.]


In the 3rd iteration:

• Current iterate: x(2) = [4.000, 3.000, −5.060]ᵀ;

• Current gradient: g(2) = ∇f(x(2)) = [0.000, 0.000, −0.003525]ᵀ;

• Find step size:

α₂ = arg min_{α≥0} f(x(2) − αg(2)) = arg min_{α≥0} [ 0 + 0 + 4(−5.060 + 0.003525α + 5)⁴ ]

and use the secant method to get α₂ = 16.29.

• Next iterate: x(3) = x(2) − α₂ g(2) = · · · = [4.000, 3.000, −5.002]ᵀ.



[Figure: plot of the line search objective f(x(2) − αg(2)) for α ∈ [10, 20]; its values are on the order of 10⁻⁶ and the minimizer is near α₂ ≈ 16.29.]
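
The three iterations above can be reproduced with a short script. This is a sketch only: it uses SciPy's bounded scalar minimizer for the exact line search instead of the secant method used in the slides, and the search interval (0, 20) is an assumption chosen to contain the step sizes reported above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    return (x[0] - 4)**4 + (x[1] - 3)**2 + 4*(x[2] + 5)**4

def grad_f(x):
    return np.array([4*(x[0] - 4)**3, 2*(x[1] - 3), 16*(x[2] + 5)**3])

x = np.array([4.0, 2.0, -1.0])                      # x(0)
for k in range(3):
    g = grad_f(x)
    phi = lambda a: f(x - a * g)                    # line search objective
    alpha = minimize_scalar(phi, bounds=(0.0, 20.0), method="bounded").x
    x = x - alpha * g
    print(k, alpha, x)                              # alpha_k and x(k+1)
```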


A quadratic function f of x can be written as

f(x) = xᵀAx − bᵀx

where A is not necessarily symmetric.

Note that xᵀAx = xᵀAᵀx, and hence xᵀAx = (1/2) xᵀ(A + Aᵀ)x, where A + Aᵀ is symmetric.

Therefore, a quadratic function can always be rewritten as

f(x) = (1/2) xᵀQx − bᵀx

where Q is symmetric. In this case, the gradient and Hessian are:

∇f(x) = Qx − b and ∇²f(x) = Q.
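
A quick numerical check of the symmetrization identity (illustrative, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))     # not necessarily symmetric
x = rng.standard_normal(3)
Q = A + A.T                         # symmetric Q with x^T A x = (1/2) x^T Q x
print(np.allclose(x @ A @ x, 0.5 * x @ Q @ x))   # True
```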



Now let’s see what happens when we apply the steepest descent method to a
quadratic function f :
1 >
f (x) = x Qx − b>x
2
where Q  0.

At k-th iteration, we have x(k) and g (k) = ∇f (x(k)) = Qx(k) − b.

Then we need to find the step size αk = arg minα φ(α) where
1 (k)
(x − αg (k) )> Q(x(k) − αg (k) ) − b> (x(k) − αg (k) )
φ(α) := f (x(k) − αg (k) ) =
2
Solving φ0(α) = −(x(k) − αg (k))>Qg (k) + b>g (k) = 0, we obtain
>
(Qx(k) − b)>g (k) g (k) g (k)
αk = =
g (k) > Qg (k)
>
g (k) Qg (k)



Therefore, the steepest descent method applied to f(x) = (1/2) xᵀQx − bᵀx with Q ≻ 0 yields

x(k+1) = x(k) − ( g(k)ᵀg(k) / (g(k)ᵀQg(k)) ) g(k)
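
A minimal code sketch of this iteration (the function name and the iteration count are illustrative assumptions):

```python
import numpy as np

def sd_quadratic(Q, b, x0, num_iters=50):
    """Steepest descent for f(x) = 0.5 x^T Q x - b^T x with the closed-form step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = Q @ x - b                   # g(k) = Q x(k) - b
        denom = g @ Q @ g
        if denom == 0.0:                # g(k) = 0: already at the minimizer
            break
        alpha = (g @ g) / denom         # alpha_k = g^T g / (g^T Q g)
        x = x - alpha * g
    return x
```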



Several concepts about algorithms and convergence:

• Iterative algorithm: an algorithm that generates a sequence x(0), x(1), x(2), . . . , each based on the points preceding it.

• Descent method: a method/algorithm such that f(x(k+1)) ≤ f(x(k)).

• Globally convergent: an algorithm that generates a sequence x(k) → x∗ starting from ANY x(0).

• Locally convergent: an algorithm that generates a sequence x(k) → x∗ if x(0) is sufficiently close to x∗.

• Rate of convergence: how fast the convergence is (more on this later).



Now we come back to the convergence of the steepest descent method applied to the quadratic function f(x) = (1/2) xᵀQx − bᵀx where Q ≻ 0.

Since ∇²f(x) = Q ≻ 0, f is strictly convex and has a unique minimizer, denoted by x∗.

By the FONC, ∇f(x∗) = Qx∗ − b = 0, i.e., Qx∗ = b.



To examine the convergence, we consider

V(x) := f(x) + (1/2) x∗ᵀQx∗ = · · · = (1/2)(x − x∗)ᵀQ(x − x∗)

(show this as an exercise).

Since Q ≻ 0, we have V(x) = 0 iff x = x∗.



Lemma. Let {x(k)} be generated by the steepest descent method. Then

V(x(k+1)) = (1 − γ_k) V(x(k))

where γ_k = 0 if ‖g(k)‖ = 0, and

γ_k = α_k · ( g(k)ᵀQg(k) / (g(k)ᵀQ⁻¹g(k)) ) · ( 2 g(k)ᵀg(k) / (g(k)ᵀQg(k)) − α_k ) if ‖g(k)‖ ≠ 0.



Proof. If ‖g(k)‖ = 0, then x(k+1) = x(k) and V(x(k+1)) = V(x(k)). Hence γ_k = 0.

If ‖g(k)‖ ≠ 0, then

V(x(k+1)) = (1/2)(x(k+1) − x∗)ᵀQ(x(k+1) − x∗)
= (1/2)(x(k) − x∗ − α_k g(k))ᵀQ(x(k) − x∗ − α_k g(k))
= V(x(k)) − α_k g(k)ᵀQ(x(k) − x∗) + (1/2) α_k² g(k)ᵀQg(k)

Therefore

( V(x(k)) − V(x(k+1)) ) / V(x(k)) = ( α_k g(k)ᵀQ(x(k) − x∗) − (1/2) α_k² g(k)ᵀQg(k) ) / V(x(k))



Note that:

Q(x(k) − x∗) = Qx(k) − b = ∇f(x(k)) = g(k)

x(k) − x∗ = Q⁻¹g(k)

V(x(k)) = (1/2)(x(k) − x∗)ᵀQ(x(k) − x∗) = (1/2) g(k)ᵀQ⁻¹g(k)

Then we obtain

( V(x(k)) − V(x(k+1)) ) / V(x(k)) = ( α_k a − (1/2) α_k² b ) / ( (1/2) c ) = α_k (b/c) ( 2a/b − α_k )

where

a := g(k)ᵀg(k), b := g(k)ᵀQg(k), c := g(k)ᵀQ⁻¹g(k)



Now we have obtained V(x(k+1)) = (1 − γ_k)V(x(k)), from which we have

V(x(k)) = [ ∏_{i=0}^{k−1} (1 − γ_i) ] V(x(0))

Since x(0) is given and fixed, we can see

V(x(k)) → 0 ⟺ ∏_{i=0}^{k−1} (1 − γ_i) → 0 ⟺ ∑_{i=0}^{k−1} −log(1 − γ_i) → +∞ ⟺ ∑_{i=0}^{k−1} γ_i → +∞



We summarize the result below:

Theorem. Let {x(k)} be generated by the gradient algorithm for a quadratic function f(x) = (1/2) xᵀQx − bᵀx (where Q ≻ 0) with step sizes α_k. Then {x(k)} converges, i.e., x(k) → x∗, iff ∑_{k=0}^{∞} γ_k = +∞.

Proof. (Sketch) Use the inequalities

γ ≤ − log(1 − γ) ≤ 2γ
which hold for γ ≥ 0 close to 0. Then use the squeeze theorem.



Rayleigh’s inequality: given a symmetric Q  0, there is

λmin(Q)kxk2 ≤ x>Qx =: kxk2


Q ≤ λmax ( Q )kx k2

for any x.

Here λmin(Q) (λmax(Q)) are the minimum (maximum) eigenvalue of Q.

In addition, we can get the min/max eigenvalues of Q−1:


1 1
λmin(Q−1) = and λmax(Q−1) =
λmax(Q) λmin(Q)



Lemma. If Q ≻ 0, then for any x, we have

λ_min(Q)/λ_max(Q) ≤ ‖x‖⁴ / ( ‖x‖_Q² ‖x‖_{Q⁻¹}² ) ≤ λ_max(Q)/λ_min(Q)

Proof. By Rayleigh's inequality, we have

λ_min(Q)‖x‖² ≤ ‖x‖_Q² ≤ λ_max(Q)‖x‖² and ‖x‖²/λ_max(Q) ≤ ‖x‖_{Q⁻¹}² ≤ ‖x‖²/λ_min(Q)

These imply

1/λ_max(Q) ≤ ‖x‖²/‖x‖_Q² ≤ 1/λ_min(Q) and λ_min(Q) ≤ ‖x‖²/‖x‖_{Q⁻¹}² ≤ λ_max(Q)

Multiplying the two yields the claim.



We can show that the steepest descent method has α_k set to satisfy ∑_k γ_k = +∞:

First recall that α_k = g(k)ᵀg(k) / (g(k)ᵀQg(k)).

Then

γ_k = α_k · ( g(k)ᵀQg(k) / (g(k)ᵀQ⁻¹g(k)) ) · ( 2 g(k)ᵀg(k) / (g(k)ᵀQg(k)) − α_k )
= (g(k)ᵀg(k))² / ( g(k)ᵀQg(k) · g(k)ᵀQ⁻¹g(k) )
= ‖g(k)‖⁴ / ( ‖g(k)‖_Q² ‖g(k)‖_{Q⁻¹}² )
≥ λ_min(Q)/λ_max(Q) > 0

Therefore ∑_k γ_k = +∞.



Now let’s consider the gradient method with fixed step size α > 0:

Theorem. If the step size α > 0 is fixed, then the gradient method converges
if and only if
2
0<α<
λmax(Q)

Proof. “⇐” Suppose 0 < α < λ 2(Q) , then


max

kg (k)k2
Q

kg (k) k2 
γk = α 2
2 2
−α
(k)
kg kQ−1 (k)
kg kQ
λmin(Q)kg (k)k2 2
 
≥α −α
−1 (k) 2
λmax(Q )kg k λmax(Q)
2
 
= αλ2
min (Q) −α >0
λmax(Q)
Therefore k γk = ∞ and hence GM converges.
P



“⇒” Suppose the GM converges but α ≤ 0 or α ≥ 2/λ_max(Q). If x(0) is chosen such that x(0) − x∗ is an eigenvector corresponding to the eigenvalue λ_max(Q) of Q, then we have

x(k+1) − x∗ = x(k) − αg(k) − x∗
= x(k) − α(Qx(k) − b) − x∗
= x(k) − α(Qx(k) − Qx∗) − x∗
= (I − αQ)(x(k) − x∗)
= (I − αQ)^(k+1) (x(0) − x∗)
= (1 − αλ_max(Q))^(k+1) (x(0) − x∗)

Taking norms on both sides yields

‖x(k+1) − x∗‖ = |1 − αλ_max(Q)|^(k+1) ‖x(0) − x∗‖

where |1 − αλ_max(Q)| ≥ 1 if α ≤ 0 or α ≥ 2/λ_max(Q). Contradiction.
max



Example. Find an appropriate α for the GM with fixed step size α for

f(x) = xᵀ [[4, 2√2], [0, 5]] x + xᵀ [3, 6]ᵀ + 24

Solution. First rewrite f into the standard quadratic form with symmetric Q:

f(x) = (1/2) xᵀ [[8, 2√2], [2√2, 10]] x + xᵀ [3, 6]ᵀ + 24

Then we compute the eigenvalues of Q = [[8, 2√2], [2√2, 10]]:

|λI − Q| = det [[λ − 8, −2√2], [−2√2, λ − 10]] = (λ − 8)(λ − 10) − 8 = (λ − 6)(λ − 12)

Hence λ_max(Q) = 12, and the range of α should be (0, 2/12), i.e., (0, 1/6).
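
A quick numerical check of this eigenvalue computation (illustrative, not part of the slides):

```python
import numpy as np

Q = np.array([[8.0, 2*np.sqrt(2)],
              [2*np.sqrt(2), 10.0]])
lam_max = np.linalg.eigvalsh(Q).max()        # largest eigenvalue of the symmetric Q
print(lam_max)                               # 12.0
print("alpha must lie in (0, %.4f)" % (2.0 / lam_max))
```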



Convergence rate of the steepest descent method:

Recall that applying SD to f(x) = (1/2) xᵀQx − bᵀx with Q ≻ 0 yields

V(x(k+1)) ≤ (1 − κ) V(x(k))

where V(x) := (1/2)(x − x∗)ᵀQ(x − x∗) and κ = λ_min(Q)/λ_max(Q).

Remark. λ_max(Q)/λ_min(Q) = ‖Q‖‖Q⁻¹‖ is called the condition number of Q.



Order of convergence

We say x(k) → x∗ with order p if

0 < lim_{k→∞} ‖x(k+1) − x∗‖ / ‖x(k) − x∗‖^p < ∞

It can be shown that p ≥ 1, and the larger p is, the faster the convergence is.



Example.

• x(k) = 1/k → 0. Then

|x(k+1)| / |x(k)|^p = k^p/(k + 1) < ∞

if p ≤ 1. Therefore x(k) → 0 with order 1.

• x(k) = q^k → 0 for some q ∈ (0, 1). Then

|x(k+1)| / |x(k)|^p = q^(k+1)/q^(kp) = q^(k(1−p)+1) < ∞

if p ≤ 1. Therefore x(k) → 0 with order 1.



Example.

• x(k) = q^(2^k) → 0 for some q ∈ (0, 1). Then

|x(k+1)| / |x(k)|^p = q^(2^(k+1)) / q^(p·2^k) = q^(2^k (2−p)) < ∞

if p ≤ 2. Therefore x(k) → 0 with order 2.
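
The order can also be estimated numerically from consecutive errors e_k = |x(k) − x∗| via the standard heuristic p ≈ log(e_{k+1}/e_k) / log(e_k/e_{k−1}). A small sketch for the sequence above (the value q = 0.5 and the number of terms are illustrative assumptions):

```python
import numpy as np

q = 0.5
e = [q**(2**k) for k in range(1, 6)]      # e_k = q^(2^k), expected order p = 2
for k in range(1, len(e) - 1):
    p_est = np.log(e[k+1] / e[k]) / np.log(e[k] / e[k-1])
    print(p_est)                           # prints 2.0 for each k
```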



In general, we have the following result:

Theorem. If ‖x(k+1) − x∗‖ = O(‖x(k) − x∗‖^p), then the convergence is of order at least p.

Remark. Note that p ≥ 1.



Descent method and line search

Given a descent direction d(k) of f : Rⁿ → R at x(k) (e.g., d(k) = −g(k)), we need to decide the step size α_k in order to compute

x(k+1) = x(k) + α_k d(k).

Exact line search computes α_k by solving

α_k = arg min_α φ_k(α), where φ_k(α) := f(x(k) + αd(k)).

Notice that φ_k : R₊ → R and φ′_k(α) = ∇f(x(k) + αd(k))ᵀd(k). Hence we can use the secant method:

α^(l+1) = α^(l) − [ (α^(l) − α^(l−1)) / (φ′_k(α^(l)) − φ′_k(α^(l−1))) ] φ′_k(α^(l))

with initial guesses α^(0), α^(1), and set α_k to lim_{l→∞} α^(l).
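
A sketch of this secant iteration applied to φ′_k (the helper name, initial guesses, and tolerances are illustrative assumptions, not part of the lecture):

```python
import numpy as np

def secant_line_search(dphi, a0=0.0, a1=1.0, num_iters=20, tol=1e-10):
    """Approximately solve dphi(alpha) = 0, where dphi(a) = phi_k'(a)."""
    for _ in range(num_iters):
        d0, d1 = dphi(a0), dphi(a1)
        if abs(d1 - d0) < tol:                 # avoid dividing by (near) zero
            break
        a0, a1 = a1, a1 - (a1 - a0) / (d1 - d0) * d1
        if abs(a1 - a0) < tol:
            break
    return a1

# Usage sketch: alpha_k = secant_line_search(lambda a: grad_f(x + a*d) @ d)
```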



In practice, it is not computationally economical to use exact line search.

Instead, we prefer inexact line search. That is, we do not solve

α_k = arg min_α φ_k(α), where φ_k(α) := f(x(k) + αd(k)),

exactly, but only require α_k to satisfy certain conditions such that the resulting choice:

• is easy to compute in practice;

• guarantees convergence;

• performs well in practice.



There are several commonly used conditions for α_k:

• Armijo condition: let ε ∈ (0, 1), γ > 1 and

φ_k(α_k) ≤ φ_k(0) + ε α_k φ′_k(0) (so α_k not too large)

φ_k(γα_k) ≥ φ_k(0) + ε γ α_k φ′_k(0) (so α_k not too small)

• Armijo-Goldstein condition: let 0 < ε < η < 1 and

φ_k(α_k) ≤ φ_k(0) + ε α_k φ′_k(0) (so α_k not too large)

φ_k(α_k) ≥ φ_k(0) + η α_k φ′_k(0) (so φ_k(α_k) not too small)

• Wolfe condition: let 0 < ε < η < 1 and

φ_k(α_k) ≤ φ_k(0) + ε α_k φ′_k(0) (so α_k not too large)

φ′_k(α_k) ≥ η φ′_k(0) (so φ_k not too steep at α_k)

The strong Wolfe condition replaces the second condition with |φ′_k(α_k)| ≤ η|φ′_k(0)|.
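
For instance, the Wolfe conditions above can be checked with a small helper like the following (a sketch; the defaults ε = 10⁻⁴ and η = 0.9 are common choices and are assumptions here, not values from the slides):

```python
def wolfe_satisfied(phi, dphi, alpha, eps=1e-4, eta=0.9):
    """Check the Wolfe conditions, with phi(a) = f(x + a*d) and dphi(a) = grad f(x + a*d) @ d."""
    sufficient_decrease = phi(alpha) <= phi(0) + eps * alpha * dphi(0)
    curvature = dphi(alpha) >= eta * dphi(0)
    return sufficient_decrease and curvature
```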



Backtracking line search

In practice, we often use the following backtracking line search:

Backtracking: choose an initial guess α^(0) and τ ∈ (0, 1) (e.g., τ = 0.5), then set α = α^(0) and repeat:

1. Check whether φ_k(α) ≤ φ_k(0) + ε α φ′_k(0) (first Armijo condition). If yes, then terminate.

2. Shrink α to τα.

In other words, we find the smallest integer m ∈ ℕ₀ such that α_k = τ^m α^(0) satisfies the first Armijo condition φ_k(α_k) ≤ φ_k(0) + ε α_k φ′_k(0).
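
A minimal sketch of this procedure, assuming d is a descent direction so that the loop terminates (the defaults α^(0) = 1, τ = 0.5, ε = 10⁻⁴ are illustrative):

```python
import numpy as np

def backtracking(f, grad_f, x, d, alpha0=1.0, tau=0.5, eps=1e-4):
    """Shrink alpha until the first Armijo condition holds."""
    fx = f(x)
    slope = grad_f(x) @ d            # phi_k'(0) = grad f(x)^T d, negative for a descent direction
    alpha = alpha0
    while f(x + alpha * d) > fx + eps * alpha * slope:
        alpha *= tau                 # shrink alpha to tau * alpha
    return alpha
```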



Why does line search guarantee convergence?

First, note that here by convergence we mean ‖∇f(x(k))‖ → 0.

We take the Wolfe conditions and d(k) = −g(k) for simplicity. Assume ∇f is L-Lipschitz continuous. Now

x(k+1) = x(k) − α_k g(k)

φ_k(α_k) = f(x(k+1))

φ′_k(α_k) = −∇f(x(k+1))ᵀg(k)

φ_k(0) = f(x(k))

φ′_k(0) = −∇f(x(k))ᵀg(k)

Moreover, L-Lipschitz continuity of ∇f implies

±⟨∇f(x) − ∇f(y), x − y⟩ ≤ ‖∇f(x) − ∇f(y)‖‖x − y‖ ≤ L‖x − y‖²

for any x, y.



Claim. α_k ≥ (1 − η)/L.

Proof of Claim. The second Wolfe condition φ′_k(α_k) ≥ η φ′_k(0) implies φ′_k(α_k) − φ′_k(0) ≥ (η − 1)φ′_k(0), which is

−⟨∇f(x(k+1)) − ∇f(x(k)), g(k)⟩ ≥ (1 − η)‖g(k)‖².

Noting that g(k) = −(x(k+1) − x(k))/α_k, we know

−⟨∇f(x(k+1)) − ∇f(x(k)), g(k)⟩ ≤ (L/α_k)‖x(k+1) − x(k)‖² = L α_k ‖g(k)‖²
Combining the two inequalities above yields the claim.



The first Wolfe condition (Armijo condition) implies

f(x(k+1)) ≤ f(x(k)) − ε α_k ‖g(k)‖² ≤ f(x(k)) − (ε(1 − η)/L) ‖g(k)‖².

Taking the telescoping sum yields

f(x(K)) ≤ f(x(0)) − (ε(1 − η)/L) ∑_{k=0}^{K−1} ‖g(k)‖²,

which implies

(ε(1 − η)/L) ∑_{k=0}^{K−1} ‖g(k)‖² ≤ f(x(0)) − f(x(K)) < ∞

for any K (we assume f is bounded below). Notice that ε(1 − η)/L > 0.

Therefore ‖g(k)‖ = ‖∇f(x(k))‖ → 0.

