SYS 6003: Optimization Fall 2016

Lecture 17
Instructor: Quanquan Gu Date: Oct 26th
Today we are going to study the projected gradient descent algorithm.
Consider the following constrained optimization problem:

$$\min_{x \in D} f(x). \tag{1}$$

If we apply the gradient descent algorithm directly, we cannot guarantee that in each iteration $x_{t+1} = x_t - \eta_t \nabla f(x_t)$ will be in $D$. In other words, we may end up with infeasible solutions. To ensure that the new point $x_{t+1}$ obtained in each iteration is always in $D$, one way is to project the new point back onto the feasible set.
Let us first define the projection of a point onto a set.
Definition 1 (Projection) The projection of a point $x$ onto a set $C$ is defined as $\Pi_C(x) := \arg\min_{y \in C} \frac{1}{2}\|x - y\|_2^2$.
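
For many simple convex sets the projection has a closed form. The following Python sketch is an illustration (not part of the lecture); it assumes NumPy, and the function names, radius $r$, and box bounds are arbitrary choices:

    import numpy as np

    def project_ball(x, r=1.0):
        # Projection onto the Euclidean ball {y : ||y||_2 <= r}:
        # if x is outside, rescale it onto the sphere of radius r.
        norm = np.linalg.norm(x)
        return x if norm <= r else (r / norm) * x

    def project_box(x, lo=-1.0, hi=1.0):
        # Projection onto the box {y : lo <= y_i <= hi}: clip coordinate-wise.
        return np.clip(x, lo, hi)

    x = np.array([3.0, 4.0])
    print(project_ball(x))  # [0.6, 0.8]: rescaled onto the unit sphere
    print(project_box(x))   # [1.0, 1.0]: each coordinate clipped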

Theorem 1 (Projection Theorem) Let $C \subseteq \mathbb{R}^d$ be a convex set. For any $x \in \mathbb{R}^d$ and $y \in C$, it holds that

(1) $(\Pi_C(x) - y)^\top (\Pi_C(x) - x) \le 0$;

(2) $\|\Pi_C(x) - y\|_2^2 + \|\Pi_C(x) - x\|_2^2 \le \|x - y\|_2^2$.

Proof: (1) Let $f(y) = \frac{1}{2}\|x - y\|_2^2$. By the first-order necessary condition at the local minimum $y^* = \Pi_C(x)$, we have $\nabla f(y^*)^\top d \ge 0$ for any feasible direction $d$ at $y^*$. Let $d = y - \Pi_C(x)$. For any $y \in C$, it then follows that

$$\nabla f(y^*)^\top (y - \Pi_C(x)) \ge 0. \tag{2}$$

Note that $\nabla f(y^*) = -(x - y^*) = y^* - x$ and $y^* = \Pi_C(x)$. From (2), it then follows that $(\Pi_C(x) - x)^\top (y - \Pi_C(x)) \ge 0$, i.e.,

$$(\Pi_C(x) - x)^\top (\Pi_C(x) - y) \le 0.$$

(2) We have

$$\begin{aligned}
\|x - y\|_2^2 &= \|x - \Pi_C(x) + \Pi_C(x) - y\|_2^2 \\
&= \|x - \Pi_C(x)\|_2^2 + \|\Pi_C(x) - y\|_2^2 - 2(\Pi_C(x) - y)^\top (\Pi_C(x) - x) \\
&\ge \|x - \Pi_C(x)\|_2^2 + \|\Pi_C(x) - y\|_2^2,
\end{aligned}$$

where the inequality follows from part (1). This completes the proof.

Remark 1 Geometrically, the projection theorem says that the angle between the vectors $y - \Pi_C(x)$ and $\Pi_C(x) - x$ is either acute or right.

Algorithm 1 Projected Gradient Descent
1: Input: $\eta_t$
2: Initialize: $x_1 \in D$
3: for $t = 1$ to $T - 1$ do
4:   $x_{t+1} = \Pi_D[x_t - \eta_t \nabla f(x_t)]$
5: end for

So we modify the updating rule of gradient descent to be $x_{t+1} = \Pi_D[x_t - \eta_t \nabla f(x_t)]$, where $\Pi_D(x)$ is the projection of $x$ onto $D$. Then we have the projected gradient descent algorithm shown in Algorithm 1. It is worth noting that if the gradient of $f$ does not exist at $x_t$, then in the fourth line of Algorithm 1 we can use any subgradient of $f$ at $x_t$ instead of its gradient.
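
A minimal Python sketch of Algorithm 1 follows; this is an illustration, not from the lecture, and the names projected_gradient_descent, grad, and proj are assumptions. It uses the step size $\eta_t = 1/\sqrt{t}$ analyzed in Theorem 2 below:

    import numpy as np

    def projected_gradient_descent(grad, proj, x1, T):
        # Algorithm 1: x_{t+1} = Pi_D[x_t - eta_t * grad f(x_t)].
        # grad(x) may return any subgradient of f at x if f is not
        # differentiable there; proj(x) computes the projection Pi_D(x).
        xs = [np.asarray(x1, dtype=float)]
        for t in range(1, T):
            eta = 1.0 / np.sqrt(t)  # step size eta_t = 1/sqrt(t), see Theorem 2
            xs.append(proj(xs[-1] - eta * grad(xs[-1])))  # line 4 of Algorithm 1
        return xs  # iterates x_1, ..., x_T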
The following theorem provides the convergence rate for the projected gradient descent
algorithm.

Theorem 2 Suppose that $f$ is a convex function, and its subgradient $g(x)$ is bounded by $G$, i.e., $\|g(x)\|_2 \le G$, for any $x \in D$. Then for the projected gradient descent with $\eta_t = 1/\sqrt{t}$, it holds that

$$f\left(\frac{1}{T}\sum_{t=1}^{T} x_t\right) - f(x^*) \le \left(\frac{R^2}{2} + G^2\right)\frac{1}{\sqrt{T}},$$

where $x^*$ is the optimal solution to problem (1) and $R = \max_{x,y \in D} \|x - y\|_2$ is the diameter of the convex set $D$.
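
As an illustrative check of this rate (again not part of the lecture), the two sketches above can be combined on a problem whose solution is known: minimize $f(x) = \|x - c\|_2^2$ over the unit ball with $c$ outside the ball, so that $x^* = c/\|c\|_2$. The instance below is an arbitrary choice and reuses project_ball and projected_gradient_descent from the earlier sketches:

    c = np.array([2.0, 0.0])
    f = lambda x: np.sum((x - c) ** 2)   # convex, gradient bounded on the unit ball
    grad = lambda x: 2.0 * (x - c)
    xs = projected_gradient_descent(grad, project_ball, x1=np.zeros(2), T=1000)
    x_avg = np.mean(xs, axis=0)          # Theorem 2 bounds f at the averaged iterate
    print(f(x_avg) - f(np.array([1.0, 0.0])))  # optimality gap, shrinking like 1/sqrt(T)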
