
CSE 599: Interplay between Convex Optimization and Geometry Winter 2018

Lecture 3: Composite Problem via Duality


Lecturer: Yin Tat Lee

Disclaimer: Please tell me about any mistakes you notice.

Some of the material here is taken from [21] and some is new and might be submitted somewhere. If you have
any interesting applications, please do not hesitate to contact me.

3.1 Composite Problem via Duality

3.1.1 Motivations

Up to now, duality is still magic to me, and I can only explain some of its power through applications. Here,
we focus on the algorithmic power of duality. Suppose we have a difficult convex problem min_x f(x). Often,
we can split the difficult problem as f(x) = g(x) + h(Ax) such that min_x g(x) and min_x h(Ax) are easy.
This form is usually called a composite problem. We can compute its dual as follows:
    min_x g(x) + h(Ax) = min_x max_θ  g(x) + θ^T Ax − h*(θ)
                       = max_θ min_x  g(x) + (A^T θ)^T x − h*(θ)
                       = max_θ  −g*(−A^T θ) − h*(θ)

where we used the following minimax theorem:


Theorem 3.1.1 (Sion's minimax theorem). Let X ⊂ R^n be a compact convex set and Y ⊂ R^m be a convex
set. Let f : X × Y → R ∪ {+∞} be such that f(x, ·) is upper semi-continuous and quasi-concave on Y for all
x ∈ X, and f(·, y) is lower semi-continuous and quasi-convex on X for all y ∈ Y. Then, we have

    min_{x∈X} sup_{y∈Y} f(x, y) = sup_{y∈Y} min_{x∈X} f(x, y).

Remark. Compactness is necessary. Consider f(x, y) = x + y on R × R.

We call “g(x) + h(Ax)” the primal problem and “−g ∗ (−A> θ) − h∗ (θ)” the dual problem. Often, the dual
problem gives us some insight on the primal problem. However, we note that there are many ways to
separate a problem and hence many dual problems.
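To make the identity above concrete, here is a hedged numerical check of min_x g(x) + h(Ax) = max_θ −g*(−A^T θ) − h*(θ) on an assumed quadratic example: g(x) = ||x||²/2 and h(y) = ||y − b||²/2, so g*(z) = ||z||²/2 and h*(θ) = ||θ||²/2 + θ^T b. The matrix A and vector b are random illustrative data, not from the lecture.

```python
import numpy as np

# Check  min_x g(x) + h(Ax)  =  max_theta -g*(-A^T theta) - h*(theta)
# for g(x) = ||x||^2/2, h(y) = ||y - b||^2/2 (illustrative assumption).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

# Primal: min_x ||x||^2/2 + ||Ax - b||^2/2, optimality (I + A^T A) x = A^T b.
x = np.linalg.solve(np.eye(5) + A.T @ A, A.T @ b)
primal = 0.5 * x @ x + 0.5 * (A @ x - b) @ (A @ x - b)

# Dual: max_theta -||A^T theta||^2/2 - ||theta||^2/2 - theta^T b,
# optimality (I + A A^T) theta = -b.
theta = np.linalg.solve(np.eye(3) + A @ A.T, -b)
dual = -0.5 * (A.T @ theta) @ (A.T @ theta) - 0.5 * theta @ theta - theta @ b

assert abs(primal - dual) < 1e-10  # strong duality, zero gap
```

Both sides are smooth quadratics, so each optimum comes in closed form from its first-order condition.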
As an example, consider the flow problem on a graph G = (V, E):

    max_{Af=d, −1≤f≤1}  c^T f

where f ∈ R^E is the flow vector, A ∈ R^{V×E} is the flow conservation constraint matrix, d is the demand
vector and c is the cost vector. We can take the dual as follows:
    max_{Af=d, −1≤f≤1} c^T f = max_{−1≤f≤1} min_φ  c^T f − φ^T(Af − d)
                             = min_φ max_{−1≤f≤1}  φ^T d + (c − A^T φ)^T f
                             = min_φ  φ^T d + Σ_{e∈E} |c − A^T φ|_e


For the case c = 0 and d = F · 1st (the maximum flow problem), the dual problem is the minimum s-t
cut problem, with the cut given by {v ∈ V : φ(v) ≥ t}. Note that there are |E| variables in the primal
and |V| variables in the dual. In this sense, the dual problem is easier to solve for dense graphs.
Although we do not have a way to turn a minimum s-t cut into a maximum s-t flow in general, we will
teach various tricks to reconstruct the primal solution from the dual solution by perturbing the problem.
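As a sanity check of this duality, the sketch below solves the maximum flow LP on a tiny digraph with scipy and compares it against the minimum s-t cut found by brute-force enumeration. The graph, node labels, and unit capacities are illustrative assumptions.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Tiny made-up digraph: s = 0, t = 3, all capacities 1.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
cap = np.ones(len(edges))

# linprog minimizes, so negate the objective "flow out of s".
c = np.array([-1.0 if u == 0 else 0.0 for u, v in edges])

# Flow conservation at the internal nodes 1 and 2 (inflow - outflow = 0).
A_eq = np.zeros((2, len(edges)))
for j, (u, v) in enumerate(edges):
    for row, node in enumerate([1, 2]):
        if v == node:
            A_eq[row, j] += 1.0
        if u == node:
            A_eq[row, j] -= 1.0
res = linprog(c, A_eq=A_eq, b_eq=np.zeros(2), bounds=[(0, ci) for ci in cap])
max_flow = -res.fun

# Min cut by brute force: S ranges over vertex sets with s in S, t not in S.
def cut_value(S):
    return sum(ci for (u, v), ci in zip(edges, cap) if u in S and v not in S)

min_cut = min(cut_value({0} | set(extra))
              for r in range(3)
              for extra in itertools.combinations([1, 2], r))

assert abs(max_flow - min_cut) < 1e-8  # both equal 2 on this graph
```

The enumeration is exponential in |V| and is only there to certify the LP answer on a toy instance.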

3.1.2 Example: Semidefinite Programming

Here, we illustrate how cutting plane methods can be used to obtain both primal and dual solutions via a
concrete problem: semidefinite programming. Consider the semidefinite programming (SDP) problem:

    max_{X ⪰ 0}  C • X   s.t.  A_i • X = b_i  for i = 1, 2, · · · , m        (3.1)

and its dual

    min_y  b^T y   s.t.  Σ_{i=1}^m y_i A_i ⪰ C        (3.2)

where X, C, A_i are n × n symmetric matrices and b, y ∈ R^m. If we apply the current best cutting plane
method naively to the primal problem, we would get an O*(n^2(Z + n^4))-time algorithm (because
there are n^2 variables), and O*(m(Z + m^2)) for the dual, where Z is the total number of non-zeros in the A_i and
O* indicates that we ignore logarithmic terms in the running time. In general, n^2 ≫ m, and hence it takes much
less time to solve the dual.
We note that

    min_{Σ_{i=1}^m y_i A_i ⪰ C}  b^T y  =  min_{v^T(Σ_{i=1}^m y_i A_i − C)v ≥ 0  ∀ ||v||_2 = 1}  b^T y.

In each step of the cutting plane method, the gradient oracle either outputs b or outputs one of the cutting
planes

    v^T(Σ_{i=1}^m y_i A_i − C)v ≥ 0.

Let S be the set of directions v used in the cutting planes during the algorithm. Then, the proof of the cutting plane
method shows that

    min_{Σ_{i=1}^m y_i A_i ⪰ C}  b^T y  =  min_{v^T(Σ_{i=1}^m y_i A_i − C)v ≥ 0  ∀v∈S}  b^T y ± ε.        (3.3)

The crux of the proof is to take the dual of the right-hand side; then we have that

    min_{v^T(Σ_{i=1}^m y_i A_i − C)v ≥ 0 ∀v∈S}  b^T y
        = min_y max_{λ_v ≥ 0}  b^T y − Σ_{v∈S} λ_v v^T(Σ_{i=1}^m y_i A_i − C)v
        = max_{λ_v ≥ 0} min_y  C • Σ_{v∈S} λ_v vv^T + b^T y − Σ_{i=1}^m y_i (Σ_{v∈S} λ_v vv^T) • A_i
        = max_{X = Σ_{v∈S} λ_v vv^T, λ_v ≥ 0} min_y  C • X + Σ_{i=1}^m y_i (b_i − X • A_i)
        = max_{X = Σ_{v∈S} λ_v vv^T, λ_v ≥ 0, X • A_i = b_i}  C • X.

Note that this is exactly the primal SDP problem, except that we restrict the set of solutions to the form
Σ_{v∈S} λ_v vv^T with λ_v ≥ 0. Also, we can write the problem as a linear program:

    max_{Σ_v λ_v v^T A_i v = b_i for all i, λ_v ≥ 0}  Σ_v λ_v v^T C v.        (3.4)

Therefore, we can simply solve this linear program and recover an approximate solution Σ_{v∈S} λ_v vv^T for the SDP.
By (3.3), we know that this is an approximate solution with the same guarantee as the dual SDP.
Now, we analyze the runtime of this algorithm. The algorithm contains two phases: solve the dual SDP
via the cutting plane method, then solve the primal linear program. Note that each step of the cutting plane
method involves finding a separating hyperplane for Σ_{i=1}^m y_i A_i ⪰ C.
Exercise 3.1.2. Let Ω = {y ∈ R^m : Σ_{i=1}^m y_i A_i ⪰ C}. Show that one can implement the separating oracle
in time O*(Z + n^ω) via eigenvalue calculations.
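A minimal sketch of such a separating oracle, assuming dense numpy matrices (so it illustrates the eigenvalue logic rather than the O*(Z + n^ω) running time): given y, either certify Σ_i y_i A_i − C ⪰ 0 or return the violated inequality Σ_i (v^T A_i v) y_i ≥ v^T C v for the most negative eigendirection v.

```python
import numpy as np

def separation_oracle(y, As, C):
    # Form S = sum_i y_i A_i - C and test positive semidefiniteness.
    S = sum(yi * Ai for yi, Ai in zip(y, As)) - C
    w, V = np.linalg.eigh(S)          # eigenvalues in ascending order
    if w[0] >= -1e-9:
        return None                   # y is feasible, no cut needed
    v = V[:, 0]                       # eigenvector of the most negative eigenvalue
    coeffs = np.array([v @ Ai @ v for Ai in As])
    return coeffs, v @ C @ v          # cutting plane: coeffs @ y >= rhs

# Toy instance: one constraint matrix A_1 = I and C = diag(1, 2),
# so y = (3,) is feasible while y = (0,) is not.
As = [np.eye(2)]
C = np.diag([1.0, 2.0])
assert separation_oracle(np.array([3.0]), As, C) is None
cut = separation_oracle(np.array([0.0]), As, C)
assert cut is not None and cut[0] @ np.array([0.0]) < cut[1]  # plane separates y
```

The toy instance at the end is made up; with sparse A_i and an iterative extreme-eigenvalue routine, the same logic would give the running time asked for in the exercise.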

Since the cutting plane method takes O*(m) steps, the first phase takes O*(m(Z + n^ω + m^2)) time in total,
and we have |S| = O*(m). In the second phase, we need to solve the linear program (3.4) with
O*(m) variables and O(m) constraints. It is known how to solve such linear programs in time O*(m^2.5)
[20]. Hence, the total cost is dominated by the first phase:

    O*(mZ + mn^ω + m^3).

Problem 3.1.3. In the first phase, each step involves computing an eigenvector of a matrix close to that of
the previous step. It is natural to ask whether matrix update formulas can be used to decrease the cost per step
of the cutting plane method to O*(Z + n^2). Namely, can we solve SDP in time

    O*(mZ + m^3)?

3.1.3 Duality and Convex Hull

The composite problem min_x g(x) + h(Ax) can in general be solved by a similar trick. To make the
geometric picture clear, we consider its "convex programming" version: min_{(x,t_1)∈epi g, (Ax,t_2)∈epi h} t_1 + t_2. To
simplify the notation, we consider the problem

    min_{x∈K_1, Mx∈K_2}  c^T x

where M ∈ R^{m×n}, and K_1 ⊂ R^n, K_2 ⊂ R^m are convex sets.


To be concrete, let us consider the following example. Let V_1 be the set of students and V_2 the set of schools.
Each edge e ∈ E represents a choice of a student. Let w_e be the happiness of the school/student pair if the student
is assigned to that school. Suppose that every student can be assigned to only one school and school b can
accept c_b many students. Then, the problem can be formulated as

    min_{x_e ≥ 0}  Σ_{e∈E} w_e x_e   subject to   Σ_{(a,b)∈E} x_{(a,b)} ≤ 1 ∀a ∈ V_1,   Σ_{(a,b)∈E} x_{(a,b)} ≤ c_b ∀b ∈ V_2.

This is called the weighted b-matching problem. Obviously, the number of students is much larger than the
number of schools. Therefore, an algorithm with running time linear in the number of students is preferable.
To apply our framework, we let

    K_1 = {x ∈ R^E : x_e ≥ 0,  Σ_{(a,b)∈E} x_{(a,b)} ≤ 1 ∀a ∈ V_1},
    K_2 = {y ∈ R^{V_2} : y_b ≤ c_b},

and M : R^E → R^{V_2} is the map (Mx)_b = Σ_{(a,b)∈E} x_{(a,b)}.
To further emphasize its importance, let me give some general examples here:
• Linear programming min_{Ax=b, x≥0} c^T x: K_1 = {x ≥ 0}, K_2 = {b} and M = A.
• Semidefinite programming min_{A_i•X=b_i, X⪰0} C • X: K_1 = {X ⪰ 0}, K_2 = {b} and M : R^{n×n} → R^m
defined by (MX)_i = A_i • X.
• Matroid intersection min_{x∈M_1∩M_2} 1^T x: K_1 = P_{M_1} and K_2 = P_{M_2} (the matroid polytopes), M = I.
• Submodular minimization: K_1 = {y ∈ R^n : Σ_{i∈S} y_i ≤ f(S) for all S ⊂ [n]}, K_2 = {y ≤ 0}, M = I.
• Submodular flow: K_1 = {ϕ ∈ R^E : ℓ_e ≤ ϕ_e ≤ u_e for all e ∈ E}, K_2 = {y ∈ R^V : Σ_{i∈S} y_i ≤
f(S) for all S ⊂ V}, M is the incidence matrix.
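For the first bullet, the framework's dual can be written out and checked numerically: with K_1 = {x ≥ 0} and K_2 = {b}, the support functions are ℓ*_{K_2}(θ) = θ^T b and ℓ*_{K_1}(z) = 0 if z ≤ 0 (+∞ otherwise), so max_θ −ℓ*_{K_1}(−c − A^T θ) − ℓ*_{K_2}(θ) becomes, after substituting y = −θ, the familiar LP dual max b^T y s.t. A^T y ≤ c. A hedged check on a made-up instance:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up standard-form LP: min c^T x s.t. Ax = b, x >= 0.
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 3.0, 1.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

# LP dual: max b^T y  <=>  min -b^T y  s.t.  A^T y <= c, y free.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2)

assert abs(primal.fun - (-dual.fun)) < 1e-8  # zero duality gap
```

The same substitution carried out for the SDP bullet recovers the dual pair (3.1)-(3.2).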
In all of these examples, it is easy to compute the gradient of ℓ*_{K_1} and ℓ*_{K_2}. For the last three examples, it is
not clear how to compute the gradient of ℓ_{K_1} and/or ℓ_{K_2} directly. Furthermore, in all examples, M maps from
a larger space to the same or a smaller space. Therefore, it is good to take advantage of the smaller space.
Before our result in [21], the standard way was to use the equivalence of ∇ℓ*_{K_1} and ∇ℓ_{K_1} and apply cutting
plane methods. Combined with the running time of cutting plane methods at that time, such algorithms usually
had theoretical running time at least n^5, with terrible practical performance.
Now, we rewrite the problem as we did in the beginning:

    min_{x∈K_1, Mx∈K_2} c^T x = min_{x∈R^n}  c^T x + ℓ_{K_1}(x) + ℓ_{K_2}(Mx)
                              = min_{x∈R^n} max_{θ∈R^m}  c^T x + ℓ_{K_1}(x) + θ^T Mx − ℓ*_{K_2}(θ)
                              = max_{θ∈R^m} min_{x∈R^n}  c^T x + ℓ_{K_1}(x) + θ^T Mx − ℓ*_{K_2}(θ)
                              = max_{θ∈R^m}  −ℓ*_{K_1}(−c − M^T θ) − ℓ*_{K_2}(θ).
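As a concrete instance of this rewriting, take K_1 = [−1, 1]^n (so ℓ*_{K_1}(z) = ||z||_1) and K_2 = {b}: the dual becomes max_θ −||c + M^T θ||_1 − θ^T b, a problem in only m variables. The sketch below solves both sides on a made-up instance and checks the values agree; the nonsmooth dual is rewritten as an LP with auxiliary variables u bounding the absolute values.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance with n = 3 box variables and m = 1 coupling constraint.
n, m = 3, 1
M = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, -2.0, 0.5])

# Primal: min c^T x s.t. M x = b, x in [-1, 1]^n.
primal = linprog(c, A_eq=M, b_eq=b, bounds=[(-1, 1)] * n)

# Dual as an LP in (theta, u): min b^T theta + 1^T u
#   s.t.  u >= c + M^T theta  and  u >= -(c + M^T theta).
obj = np.concatenate([b, np.ones(n)])
A_ub = np.block([[M.T, -np.eye(n)], [-M.T, -np.eye(n)]])
b_ub = np.concatenate([-c, c])
dual = linprog(obj, A_ub=A_ub, b_ub=b_ub,
               bounds=[(None, None)] * m + [(0, None)] * n)

# The dual value is -dual.fun; it should match the primal value.
assert abs(primal.fun - (-dual.fun)) < 1e-8
```

Here solving in the θ-space means working with m variables instead of n, which is exactly the advantage discussed above.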

Taking the dual has two benefits. First, the number of variables is smaller. Second, the gradient oracle is
something we can compute efficiently. Hence, cutting plane methods can be used to solve it in time O*(mT + m^3),
where T is the time to evaluate ∇ℓ*_{K_1} and ∇ℓ*_{K_2}. The only problem left is to recover the primal x.
The key observation is the following lemma:

Lemma 3.1.4. Let {x_i} ⊂ K_1 be the set of points output by the oracle ∇ℓ*_{K_1} during the cutting plane
method, and define {y_i} ⊂ K_2 similarly. Suppose that the cutting plane method ends with the guarantee that the
additive error is less than ε. Then, we have that

    min_{x∈K_1, Mx∈K_2} c^T x  ≤  min_{x∈K̃_1, Mx∈K̃_2} c^T x  ≤  min_{x∈K_1, Mx∈K_2} c^T x + ε

where K̃_1 = Conv(x_i) and K̃_2 = Conv(y_i).

Proof. Let θ_i be the directions queried by the oracle for ∇ℓ*_{K_1} and ϕ_i the directions queried by
the oracle for ∇ℓ*_{K_2}. We claim that x_i ∈ ∇ℓ*_{K̃_1}(θ_i) and y_i ∈ ∇ℓ*_{K̃_2}(ϕ_i). Given this, the algorithm cannot
distinguish between K_1 and K̃_1, or between K_2 and K̃_2. Hence, the algorithm runs exactly the same,
namely it produces the same sequence of points, and therefore we get the same value c^T x. However, by the
guarantee of the cutting plane method, we have that

    min_{x∈K̃_1, Mx∈K̃_2} c^T x  ≤  c^T x  ≤  min_{x∈K_1, Mx∈K_2} c^T x + ε.

To prove the claim, note that K̃_1 ⊂ K_1 and hence min_{x∈K̃_1} θ_i^T x ≥ min_{x∈K_1} θ_i^T x. Also, note that

    x_i = arg min_{x∈K_1} θ_i^T x ∈ K̃_1

and hence min_{x∈K̃_1} θ_i^T x ≤ min_{x∈K_1} θ_i^T x. Therefore, min_{x∈K̃_1} θ_i^T x = min_{x∈K_1} θ_i^T x, and so
x_i ∈ arg min_{x∈K̃_1} θ_i^T x. This proves the claim for K̃_1; the proof for K̃_2 is the same.

This reduces the problem to the form min_{x∈K̃_1, Mx∈K̃_2} c^T x. For the second phase, we let z_i = M x_i ∈ R^m.
Then, we have

    min_{x∈K̃_1, Mx∈K̃_2} c^T x = min_{t_i≥0, s_i≥0, M Σ_i t_i x_i = Σ_i s_i y_i}  c^T (Σ_i t_i x_i)
                               = min_{t_i≥0, s_i≥0, Σ_i t_i z_i = Σ_i s_i y_i}  Σ_i t_i · c^T x_i.

Note that it takes O∗ (mZ) time to write down this linear program where Z is the number of non-zeros in
M . Next, we note that this linear program has O∗ (m) variables and m constraints. Therefore, we can solve
it in O∗ (m2.5 ) time.
Therefore, the total running time is

O∗ (m(T + m2 ) + (mZ + m2.5 )).

To conclude, we have the following theorem:


Theorem 3.1.5. Given convex sets K_1 ⊂ R^n and K_2 ⊂ R^m with m ≤ n, and a matrix M : R^n → R^m with
Z non-zeros, let T be the cost to compute ∇ℓ*_{K_1} and ∇ℓ*_{K_2}. Then, we can solve the problem

    min_{x∈K_1, Mx∈K_2}  c^T x

in time

    O*(mT + mZ + m^3).

Remark. The O* hides various terms in the logarithmic factors, such as the diameter of the sets.

Going back to the school/student problem, this algorithm gives a running time of

    O*(|V_2||E| + |V_2|^3),

which is linear in the number of students!

In general, this statement says that if we can split a convex problem into two parts, both easy to solve,
where one part has fewer variables, then we can solve it in time proportional to the smaller dimension.
Exercise 3.1.6. What is the complexity of the problem min_{x ∈ ∩_{i=1}^k K_i} c^T x given the oracles ∇ℓ_{K_i}?

References

[20] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs
in Õ(√rank) iterations and faster algorithms for maximum flow. In Foundations of Computer Science
(FOCS), 2014 IEEE 55th Annual Symposium on, pages 424–433. IEEE, 2014.
[21] Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implica-
tions for combinatorial and convex optimization. In Foundations of Computer Science (FOCS), 2015
IEEE 56th Annual Symposium on, pages 1049–1065. IEEE, 2015.
