
Projected Gradient Algorithm

Andersen Ang
ECS, Uni. Southampton, UK
[email protected]
Homepage: angms.science

First draft: August 2, 2017
Version: July 13, 2023

Content
▶ Unconstrained vs constrained problem
▶ Problem setup
▶ Understanding the geometry of projection
▶ PGD is a special case of proximal gradient
▶ Theorem 1. An inequality of PGD with constant stepsize α:

      f( (1/(K+1)) Σ_{k=0}^K xk ) − f* ≤ ∥x0 − x*∥₂² / (2α(K+1)) + (α/(2(K+1))) Σ_{k=0}^K ∥∇f(xk)∥₂²

▶ Theorem 2. PGD converges ergodically at rate O(1/√k) on Lipschitz functions:

      f(x̄K) − f* ≤ L∥x0 − x*∥ / √(K+1)
Unconstrained minimization:   min_{x∈Rⁿ} f(x).
▶ All x ∈ Rⁿ are feasible.
▶ Any x ∈ Rⁿ can be a solution.

Constrained minimization:   min_{x∈Q} f(x).
▶ Not all x ∈ Rⁿ are feasible.
▶ Not all x ∈ Rⁿ can be a solution: the solution has to be inside the set Q.
▶ An example:

      min_{x∈Rⁿ} ∥Ax − b∥₂²  s.t.  ∥x∥₂ ≤ 1

  can be expressed as

      min_{∥x∥₂≤1} ∥Ax − b∥₂².

  Here Q := {v ∈ Rⁿ : ∥v∥₂ ≤ 1} is known as the unit ℓ2 ball.

2 / 21
Approaches for solving constrained minimization problems

▶ Duality / Lagrangian approach


▶ Not our focus here.
▶ Although the Lagrange multiplier approach is usually taught in standard calculus classes, the standard
explanation (that the gradient of the objective has to be anti-parallel to the gradient of the constraint) is not
intuitive, and it is not the deep reason why the method works.
▶ It requires a deep understanding of convex conjugates, constraint qualifications and duality to appreciate the
Lagrangian approach, which is outside the scope here.

▶ First-order method / gradient-based method


▶ Simple.
▶ Our focus.

▶ Second-order method, Zero-order method, Higher-order method


▶ Not our focus here.

3 / 21
Solving the unconstrained problem min_{x∈Rⁿ} f(x) by gradient descent

▶ Gradient descent (GD) is a simple and intuitive way to solve the unconstrained optimization problem min_{x∈Rⁿ} f(x).

▶ Starting from an initial point x0 ∈ Rⁿ, GD iterates the following until a stopping condition is met:

      xk+1 = xk − αk ∇f(xk),

  where
  ▶ k ∈ N: the current iteration counter
  ▶ k + 1 ∈ N: the next iteration counter
  ▶ xk: the current variable
  ▶ xk+1: the next variable
  ▶ ∇f: the gradient of f with respect to x
  ▶ ∇f(xk): the gradient ∇f evaluated at the current variable xk
  ▶ αk ∈ (0, +∞): the gradient stepsize

▶ Question: how about the constrained problem? Is it possible to adapt GD to the constrained setting?
  Answer: yes, and the key is the Euclidean projection operator proj : Rⁿ ⇒ Rⁿ.

▶ Remark
▶ We assume f is differentiable (i.e., ∇f exists).
▶ If f is not differentiable, we can replace gradient by subgradient, and we get the so-called subgradient method.
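The GD iteration above can be sketched in a few lines of Python (an illustrative sketch: the least-squares objective, the matrix A, the vector b, the stepsize, and the fixed iteration budget are all assumptions for the example, not from the slides):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * ||A x - b||^2 (example objective);
# its gradient is grad f(x) = A^T (A x - b).
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad_f(x):
    return A.T @ (A @ x - b)

x = np.zeros(2)          # initial point x0
alpha = 0.1              # constant stepsize alpha_k = 0.1
for k in range(500):     # stopping condition: a fixed iteration budget
    x = x - alpha * grad_f(x)

# x approaches the unconstrained minimizer x* solving A x* = b, i.e. [0.5, 1.0]
```

Any stopping condition works in place of the fixed budget, e.g. stopping once ∥∇f(xk)∥ falls below a tolerance.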

4 / 21
Problem setup of constrained problem

      min_{x∈Q} f(x).

▶ We focus on the Euclidean space Rn

▶ f : Rn → R is the objective / cost function


▶ f is assumed to be continuously differentiable (f ∈ C¹), i.e., ∇f(x) exists for all x
▶ we assume f is globally L-Lipschitz: |f(x) − f(y)| ≤ L∥x − y∥
▶ we do not assume ∇f is globally Lipschitz, i.e., ∥∇f(x) − ∇f(y)∥ ≤ L∥x − y∥ is not assumed here

▶ ∅ ̸= Q ⊂ Rn is convex and compact


▶ The constraint is represented by a set Q
▶ Q ⊂ Rⁿ means Q is a subset of Rⁿ, the domain of f
▶ Q ≠ ∅ means Q is not an empty set (if Q is empty there is nothing useful to discuss)
▶ Q is a convex set: x ∈ Q, y ∈ Q, λ ∈ (0, 1) =⇒ λx + (1 − λ)y ∈ Q
▶ Q is compact (compact = bounded + closed)

▶ For details on convexity and Lipschitz continuity, see here.

5 / 21
Solving the constrained problem by projected gradient descent
▶ Projected gradient descent (PGD) = GD + projection

▶ Starting from an initial point x0 ∈ Q, PGD iterates the following equation until a stopping condition is met:

      xk+1 = PQ( xk − αk ∇f(xk) ),

  where
  ▶ k ∈ N: the current iteration counter
  ▶ k + 1 ∈ N: the next iteration counter
  ▶ xk: the current variable
  ▶ xk+1: the next variable
  ▶ ∇f: the gradient of f with respect to x
  ▶ ∇f(xk): the gradient ∇f evaluated at the current variable xk
  ▶ αk ∈ (0, +∞): the gradient stepsize
  ▶ PQ: shorthand for projQ

▶ projQ( · ) is called the Euclidean projection operator, and it is itself an optimization problem:

      PQ(x0) = projQ(x0) = argmin_{x∈Q} ∥x − x0∥₂.   (∗)

  i.e., given a point x0, PQ finds a point x ∈ Q which is "closest" to x0.
  ▶ The measure of "closeness" here is the Euclidean distance ∥x − x0∥₂.

▶ (∗) is equivalent to

      argmin_{x∈Q} ½∥x − x0∥₂²,

  where we square the cost so that the objective becomes differentiable.
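A minimal PGD sketch for the earlier example min ∥Ax − b∥₂² s.t. ∥x∥₂ ≤ 1, where the projection onto the unit ℓ2 ball has the closed form PQ(x) = x / max(1, ∥x∥₂); the particular A, b, stepsize, and iteration count below are assumptions for the illustration:

```python
import numpy as np

def proj_unit_ball(x):
    # Euclidean projection onto Q = {v : ||v||_2 <= 1},
    # closed form: x / max(1, ||x||_2)
    return x / max(1.0, np.linalg.norm(x))

A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
b = np.array([3.0, 2.0])   # the unconstrained minimizer [1, 2] lies outside Q

def grad_f(x):
    return 2.0 * A.T @ (A @ x - b)   # gradient of ||A x - b||_2^2

x = np.zeros(2)                      # initial point x0 in Q
alpha = 0.05                         # constant stepsize
for k in range(2000):
    y = x - alpha * grad_f(x)        # gradient update
    x = proj_unit_ball(y)            # project back onto Q

# since the unconstrained minimizer is infeasible, the PGD iterate
# settles on the boundary of Q, where ||x||_2 = 1
```

The only change from plain GD is the one-line projection after each gradient update.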
6 / 21
Comparing PGD to GD

GD
1. Pick an initial point x0 ∈ Rⁿ
2. Loop until a stopping condition is met:
   2.1 Descent direction: compute −∇f(xk)
   2.2 Stepsize: pick an αk
   2.3 Update: xk+1 = xk − αk ∇f(xk)

PGD
1. Pick an initial point x0 ∈ Q
2. Loop until a stopping condition is met:
   2.1 Descent direction: compute −∇f(xk)
   2.2 Stepsize: pick an αk
   2.3 Update: yk+1 = xk − αk ∇f(xk)
   2.4 Projection: xk+1 = argmin_{x∈Q} ½∥x − yk+1∥₂²

▶ PGD = GD + projection.
  ▶ If the point xk − αk ∇f(xk) after the gradient update is leaving the set Q, project it back.
  ▶ If the point xk − αk ∇f(xk) after the gradient update is within the set Q, keep the point and do nothing.

▶ Projection PQ( · ) : Rⁿ ⇒ Rⁿ
  ▶ It is a mapping from Rⁿ to Rⁿ, i.e., a point-to-point mapping.
  ▶ In general, for a nonconvex set Q, such a mapping is possibly non-unique (this is the ⇒).
  ▶ PQ( · ) is an optimization problem:

        PQ(x0) := argmin_{x∈Q} ½∥x − x0∥₂².   (⋆)

    If Q is a convex compact set, this optimization problem has a unique solution, and we have PQ( · ) : Rⁿ → Rⁿ.

▶ PGD is economical if (⋆) is easy to solve, i.e., has a closed-form expression that is cheap to compute.
▶ PGD is possibly not economical if Q is nonconvex, (⋆) has no closed-form expression, or (⋆) is expensive to compute.
7 / 21
Understanding the geometry of projection ... (1/4)
Consider a convex set Q ⊂ Rⁿ and a point x0 ∈ Rⁿ.

Case 1. x0 ∈ Q.
▶ As x0 ∈ Q, the closest point to x0 in Q is x0 itself.
▶ The distance from a point to itself is zero.
▶ Mathematically: ∥x − x0∥₂ = 0 gives x = x0.
▶ This is the trivial case and therefore not interesting.

Case 2. x0 ∉ Q.
▶ Now x0 is outside Q.
▶ We need to find a point x such that
  ▶ x ∈ Q
  ▶ ∥x − x0∥₂ is smallest
▶ This is the case that is interesting.

(Figure: the set Q with x0 inside it in Case 1 and outside it in Case 2.)
8 / 21
Understanding the geometry of projection ... (2/4)
▶ The circles are ℓ2-norm balls centered at x0 with different radii.

▶ Points on the same circle are equidistant to x0 (with a different ℓ2 distance on each circle).

▶ Note that some points on the blue circle are inside Q; those are feasible points.

(Figure: the set Q, the point x0 outside Q, and concentric ℓ2 balls around x0.)

9 / 21
Understanding the geometry of projection ... (3/4)
▶ The point inside Q which is closest to x0 is the point where the ℓ2-norm ball "touches" Q.

▶ In this example, the blue point y is the solution to

      PQ(x0) = argmin_{x∈Q} ½∥x − x0∥₂².

(Figure: the set Q, the point x0 outside Q, and the projection y on the boundary of Q.)

▶ In fact, such a point is always located on the boundary of Q when x0 ∉ Q.
  That is, mathematically, if x0 ∉ Q, then

      argmin_{x∈Q} ½∥x − x0∥₂² ∈ bdry Q.

10 / 21
Understanding the geometry of projection ... (4/4)
Note that the projection is orthogonal: the blue point y always lies on a straight line that is tangent to both the
norm ball and Q.

(Figure: the tangent line at y, with x0 outside Q.)

The normal to the tangent is exactly x0 − y = x0 − projQ(x0).
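For the unit ℓ2 ball this orthogonality can be checked directly: if x0 is outside the ball, projQ(x0) = x0/∥x0∥₂ lies on the boundary, and the normal x0 − projQ(x0) is parallel to projQ(x0), which is the outward normal direction at that boundary point (a small sketch; the specific test point is an assumption):

```python
import numpy as np

def proj_unit_ball(x):
    # closed-form projection onto the unit l2 ball
    return x / max(1.0, np.linalg.norm(x))

x0 = np.array([2.0, 1.0])        # a point outside the unit ball
y = proj_unit_ball(x0)

# y is on the boundary of Q ...
assert np.isclose(np.linalg.norm(y), 1.0)

# ... and x0 - y is parallel to the outward normal at y (which, for the
# l2 ball, is y itself): the 2-D cross product vanishes
cross = (x0 - y)[0] * y[1] - (x0 - y)[1] * y[0]
assert abs(cross) < 1e-12
```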

11 / 21
Property of projection: Bourbaki-Cheney-Goldstein inequality

See the details here

12 / 21
PGD is a special case of proximal gradient

▶ The indicator function ιQ(x) of a set Q is defined as follows:

      ιQ(x) = 0 if x ∈ Q,   +∞ if x ∉ Q.

▶ With the indicator function, the constrained problem has two equivalent expressions:

      min_{x∈Q} f(x)  ≡  min_x f(x) + ιQ(x).

▶ Proximal gradient is a method to solve an optimization problem whose objective is the sum of a differentiable and a
  non-differentiable function:

      min_x f(x) + g(x),

  where g is non-differentiable.

▶ PGD is in fact the special case of proximal gradient where g(x) is the indicator function ιQ(x). See here for more
  about proximal gradient.
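This connection can be verified numerically: the proximal operator of g = ιQ, prox_g(v) = argmin_x { ιQ(x) + ½∥x − v∥₂² }, is exactly the projection onto Q, since any infeasible x incurs infinite cost. A sketch with Q the unit ℓ2 ball (the set and the test point are assumptions for the illustration):

```python
import numpy as np

def proj_unit_ball(v):
    # closed-form Euclidean projection onto Q = {x : ||x||_2 <= 1}
    return v / max(1.0, np.linalg.norm(v))

def prox_indicator(v):
    # prox of g = iota_Q: the argmin of iota_Q(x) + 0.5*||x - v||^2 must
    # be feasible (infinite cost otherwise), hence it is the closest
    # feasible point, i.e. the projection onto Q.
    return proj_unit_ball(v)

v = np.array([3.0, 4.0])             # ||v||_2 = 5, so v is outside Q
p = prox_indicator(v)                # = v / 5 = [0.6, 0.8]

# sanity check: p beats random feasible points on 0.5*||x - v||^2
rng = np.random.default_rng(1)
for _ in range(1000):
    z = proj_unit_ball(rng.standard_normal(2))   # a random point in Q
    assert 0.5 * np.sum((p - v) ** 2) <= 0.5 * np.sum((z - v) ** 2) + 1e-12
```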

13 / 21
On PGD ergodic convergence rate
▶ Theorem 1. If f is convex, PGD with constant stepsize α satisfies

      f( (1/(K+1)) Σ_{k=0}^K xk ) − f* ≤ ∥x0 − x*∥₂² / (2α(K+1)) + (α/(2(K+1))) Σ_{k=0}^K ∥∇f(xk)∥₂²

  where
  ▶ x* is the (global) minimizer
  ▶ f* := f(x*) is the optimal cost value
  ▶ α is the constant stepsize
  ▶ K is the total number of iterations performed

▶ Interpretation:
  ▶ the term (1/(K+1)) Σ_{k=0}^K xk is the "average" of the sequence xk after K iterations
  ▶ denote (1/(K+1)) Σ_{k=0}^K xk as x̄
  ▶ denote f(x̄) as f̄
  Then the theorem reads:

      f̄ − f* ≤ ∥x0 − x*∥₂² / (2α(K+1)) + something positive.

  Hence the convergence rate is like O(1/K).

▶ The term (α/(2(K+1))) Σ_{k=0}^K ∥∇f(xk)∥₂² converges to zero
  ▶ as long as Σ_{k=0}^K ∥∇f(xk)∥₂² is not diverging to infinity, or
  ▶ the growth of Σ_{k=0}^K ∥∇f(xk)∥₂² is slower than K.
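Theorem 1 can be sanity-checked numerically on a toy problem where everything is known in closed form: f(x) = ½∥x∥₂² with Q the unit ℓ2 ball, so x* = 0 ∈ Q and f* = 0 (the dimension, stepsize, and iteration count are assumptions for this experiment):

```python
import numpy as np

def proj_unit_ball(x):
    return x / max(1.0, np.linalg.norm(x))

def f(x): return 0.5 * (x @ x)       # convex, minimized at x* = 0 in Q
def grad(x): return x                # gradient of f

rng = np.random.default_rng(0)
x = proj_unit_ball(rng.standard_normal(5))   # x0 in Q
x0 = x.copy()
alpha, K = 0.1, 50

xs = [x.copy()]
g2 = [grad(x) @ grad(x)]
for k in range(K):                   # produce x1, ..., xK
    x = proj_unit_ball(x - alpha * grad(x))
    xs.append(x.copy())
    g2.append(grad(x) @ grad(x))

x_bar = np.mean(xs, axis=0)          # ergodic average of x0, ..., xK
lhs = f(x_bar) - 0.0                 # f(x_bar) - f*
rhs = (x0 @ x0) / (2 * alpha * (K + 1)) + alpha * sum(g2) / (2 * (K + 1))
assert lhs <= rhs                    # the inequality of Theorem 1 holds
```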

14 / 21
What is ergodic convergence?

▶ Ergodic convergence = “The centroid of a point cloud moving towards the limit point”

▶ Sequence convergence: each of x1, x2, ..., xk gets closer and closer to x*

▶ Ergodic convergence: the average of x1, x2, ..., xk converges to x*

  ▶ which doesn't imply each of x1, x2, ..., xk is getting closer and closer to x*
  ▶ some of them can be moving away from x*, as long as the centroid is getting closer

15 / 21
Proof of theorem 1 ... (1/3)

As f is convex,

      f(z) ≥ f(x) + ⟨∇f(x), z − x⟩
⇐⇒  f(x) − f(z) ≤ ⟨∇f(x), x − z⟩
=⇒  f(xk) − f* ≤ ⟨∇f(xk), xk − x*⟩        (x = xk, z = x*, f(x*) = f*)

By PGD, yk+1 = xk − αk ∇f(xk), so xk − yk+1 = αk ∇f(xk). With constant stepsize αk = α,

      f(xk) − f* ≤ ⟨ (xk − yk+1)/α , xk − x* ⟩ = (1/α) ⟨xk − yk+1, xk − x*⟩.

A not-so-trivial trick:

      (a − b)(a − c) = a² − ac − ab + bc
                     = (2a² − 2ac − 2ab + 2bc)/2
                     = ((a² − 2ac + c²) + (a² − 2ab + b²) − (b² − 2bc + c²))/2
                     = ((a − c)² + (a − b)² − (b − c)²)/2.

In inner-product form, with a = xk, b = yk+1, c = x*:

      ⟨xk − yk+1, xk − x*⟩ = ( ∥xk − x*∥₂² + ∥xk − yk+1∥₂² − ∥yk+1 − x*∥₂² ) / 2.

Combining the two boxes,

      f(xk) − f* ≤ ( ∥xk − x*∥₂² + ∥xk − yk+1∥₂² − ∥yk+1 − x*∥₂² ) / (2α).

Since xk − yk+1 = α∇f(xk), we have ∥xk − yk+1∥₂² = ∥α∇f(xk)∥₂² = α²∥∇f(xk)∥₂², so

      f(xk) − f* ≤ ( ∥xk − x*∥₂² − ∥yk+1 − x*∥₂² ) / (2α) + (α/2) ∥∇f(xk)∥₂².
16 / 21
Proof of theorem 1 ... (2/3)

Now we have

      f(xk) − f* ≤ ( ∥xk − x*∥₂² − ∥yk+1 − x*∥₂² ) / (2α) + (α/2) ∥∇f(xk)∥₂².

Next we make use of the fact that the projection is non-expansive:

      ∥yk+1 − x*∥₂² ≥ ∥xk+1 − x*∥₂²,   where xk+1 = projQ(yk+1).

▶ This is known as "the projection operator is non-expansive".
▶ The post-projection distance is at most the pre-projection distance.
▶ This follows from the Bourbaki-Cheney-Goldstein inequality; details here.

Pictorially: xk is the current variable, yk+1 is the gradient-updated point, xk+1 = PQ(yk+1) is the projected point,
and x* ∈ Q.

Focus on the term ∥xk − x*∥₂² − ∥yk+1 − x*∥₂². We wish to replace ∥yk+1 − x*∥₂² by ∥xk+1 − x*∥₂², using the fact
that the projection operator is non-expansive. Hence −∥yk+1 − x*∥₂² ≤ −∥xk+1 − x*∥₂² and

      f(xk) − f* ≤ ( ∥xk − x*∥₂² − ∥yk+1 − x*∥₂² ) / (2α) + (α/2) ∥∇f(xk)∥₂²
                 ≤ ( ∥xk − x*∥₂² − ∥xk+1 − x*∥₂² ) / (2α) + (α/2) ∥∇f(xk)∥₂².

It forms a telescoping series!
17 / 21
Proof of theorem 1 ... (3/3)

Write the inequality for each k:

k = 0:  f(x0) − f* ≤ ( ∥x0 − x*∥₂² − ∥x1 − x*∥₂² ) / (2α) + (α/2) ∥∇f(x0)∥₂²
k = 1:  f(x1) − f* ≤ ( ∥x1 − x*∥₂² − ∥x2 − x*∥₂² ) / (2α) + (α/2) ∥∇f(x1)∥₂²
        ...
k = K:  f(xK) − f* ≤ ( ∥xK − x*∥₂² − ∥xK+1 − x*∥₂² ) / (2α) + (α/2) ∥∇f(xK)∥₂²

Summing all of them, the middle terms telescope:

      Σ_{k=0}^K ( f(xk) − f* ) ≤ ( ∥x0 − x*∥₂² − ∥xK+1 − x*∥₂² ) / (2α) + (α/2) Σ_{k=0}^K ∥∇f(xk)∥₂².

As 0 ≤ (1/(2α)) ∥xK+1 − x*∥₂²,

      Σ_{k=0}^K ( f(xk) − f* ) ≤ ∥x0 − x*∥₂² / (2α) + (α/2) Σ_{k=0}^K ∥∇f(xk)∥₂².

Divide the whole inequality by K + 1:

      (1/(K+1)) Σ_{k=0}^K f(xk) − f* ≤ ∥x0 − x*∥₂² / (2α(K+1)) + (α/(2(K+1))) Σ_{k=0}^K ∥∇f(xk)∥₂².

On the left-hand side, as f is convex, Jensen's inequality gives

      f( (1/(K+1)) Σ_{k=0}^K xk ) ≤ (1/(K+1)) Σ_{k=0}^K f(xk).

Therefore

      f( (1/(K+1)) Σ_{k=0}^K xk ) − f* ≤ ∥x0 − x*∥₂² / (2α(K+1)) + (α/(2(K+1))) Σ_{k=0}^K ∥∇f(xk)∥₂².   ∎
18 / 21
 
PGD converges ergodically at rate O(1/√k) on Lipschitz functions

Theorem 2. If f is L-Lipschitz, then for the point x̄K = (1/(K+1)) Σ_{k=0}^K xk and constant stepsize
α = ∥x0 − x*∥ / (L√(K+1)), we have

      f(x̄K) − f* ≤ L∥x0 − x*∥ / √(K+1).

Proof
▶ f is L-Lipschitz means ∇f is bounded: ∥∇f∥ ≤ L, where L is the Lipschitz constant.
▶ Put x̄K, α, and ∥∇f∥ ≤ L into theorem 1.

Remarks
▶ In the stepsize α, note that it is K (the total number of steps), not k (the current iteration number).
▶ α requires knowing x*, so this theorem is practically useless: knowing x* already solves the problem.
▶ Although we do not know x* in general, the theorem tells us that the ergodic convergence speed of PGD is O(1/√k).
19 / 21
Discussion
In the convergence analysis of GD:
1. f is convex and β-smooth (the gradient is β-Lipschitz)
2. Convergence rate O(1/k).
3. The convergence rate is not ergodic

In the convergence analysis of PGD:
1. f is convex and L-Lipschitz (the gradient is bounded above)
2. Convergence rate O(1/√k).
3. The convergence rate is ergodic: it works on x̄K

If f is convex and β-smooth, the convergence of PGD is the same as that of GD.
▶ The theoretical convergence rate of PGD on convex and β-smooth f is also O(1/k).
▶ However, in practice it depends on the complexity of the projection: some Q are difficult to project onto.

As PGD is a special case of the proximal gradient method, it is better to study the proximal gradient method. For
example here, here and here.
20 / 21
Last page - summary

▶ PGD = GD + projection

▶ PGD with constant stepsize α:

      f( (1/(K+1)) Σ_{k=0}^K xk ) − f* ≤ ∥x0 − x*∥₂² / (2α(K+1)) + (α/(2(K+1))) Σ_{k=0}^K ∥∇f(xk)∥₂²

▶ If f is Lipschitz (bounded gradient), then for the point x̄K = (1/(K+1)) Σ_{k=0}^K xk and constant stepsize
  α = ∥x0 − x*∥ / (L√(K+1)),

      f(x̄K) − f* ≤ L∥x0 − x*∥ / √(K+1).
End of document

21 / 21
