Lecture 3: Composite Problem Via Duality: 3.1.1 Motivations
Some materials here are taken from [21] and some are new and might be submitted somewhere. If you have
any interesting applications, please do not hesitate to contact me.
3.1.1 Motivations
Up to now, duality is still magic to me and I can only explain some of its power through applications. Here,
we focus on the algorithmic power of duality. Suppose we have a difficult convex problem min_x f(x). Often,
we can split the difficult problem into f(x) = g(x) + h(Ax) such that min_x g(x) and min_x h(Ax) are easy.
This form is usually called a composite problem. We can compute its dual as follows:
    min_x g(x) + h(Ax) = min_x max_θ g(x) + θᵀAx − h*(θ)
                       = max_θ min_x g(x) + (Aᵀθ)ᵀx − h*(θ)
                       = max_θ −g*(−Aᵀθ) − h*(θ).
We call “g(x) + h(Ax)” the primal problem and “−g*(−Aᵀθ) − h*(θ)” the dual problem. Often, the dual
problem gives us some insight into the primal problem. However, we note that there are many ways to
separate a problem and hence many dual problems.
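To make the primal–dual relation concrete, here is a minimal numerical sanity check of my own (not from the lecture), assuming the smooth choices g(x) = ‖x‖²/2 and h(z) = ‖z − b‖²/2, for which both sides can be computed in closed form:

```python
import numpy as np

# Illustrative example (my own, not from the lecture): take
#   g(x) = ||x||^2/2  and  h(z) = ||z - b||^2/2,
# so that g*(u) = ||u||^2/2 and h*(theta) = b^T theta + ||theta||^2/2,
# and check numerically that min_x g(x) + h(Ax) equals
# max_theta -g*(-A^T theta) - h*(theta).

rng = np.random.default_rng(0)
n, k = 4, 3
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# Primal optimum: the gradient condition is (I + A^T A) x = A^T b.
x = np.linalg.solve(np.eye(n) + A.T @ A, A.T @ b)
primal = 0.5 * x @ x + 0.5 * (A @ x - b) @ (A @ x - b)

# Dual objective: -||A^T theta||^2/2 - ||theta||^2/2 - b^T theta,
# maximized where (I + A A^T) theta = -b.
theta = np.linalg.solve(np.eye(k) + A @ A.T, -b)
dual = -0.5 * (A.T @ theta) @ (A.T @ theta) - 0.5 * theta @ theta - b @ theta

assert abs(primal - dual) < 1e-8  # strong duality holds for this instance
```

Here strong duality holds because both terms are convex quadratics, so the primal and dual optima agree exactly.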
As an example, consider the flow problem on a graph G = (V, E):
    max_{Af=d, −1≤f≤1} cᵀf
where f ∈ R^E is the flow vector, A ∈ R^{V×E} is the matrix of flow conservation constraints, d is the demand vector
and c is the cost vector. We can take the dual as follows:
    max_{Af=d, −1≤f≤1} cᵀf = min_φ max_{−1≤f≤1} cᵀf − φᵀ(Af − d).
For the case c = 0 and d = F · 1_{st} (the maximum flow problem), the dual problem is the minimum s-t
cut problem, with the cut given by {v ∈ V : φ(v) ≥ t}. Note that there are |E| variables in the primal
and |V| variables in the dual. In this sense, the dual problem is easier to solve for dense graphs.
Although we do not have a way to turn a minimum s-t cut into a maximum s-t flow in general, we will
teach various tricks to reconstruct the primal solution from the dual solution by perturbing the problem.
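The max-flow/min-cut duality above can be checked on a toy instance. The sketch below (the graph, capacities, and the helper `max_flow` are all illustrative, not from the lecture) computes a maximum flow by augmenting paths and reads off a minimum cut as the set of vertices reachable from s in the final residual graph:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on an adjacency-matrix of capacities; returns the flow
    value and the source side of a minimum cut (illustrative sketch)."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    value = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        # Find the bottleneck along the path, then augment.
        bottleneck = float("inf")
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        value += bottleneck
    cut = {v for v in range(n) if parent[v] != -1}  # residual-reachable side
    return value, cut

# Toy graph: vertex 0 = s, vertex 3 = t (capacities made up).
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 2],
       [0, 0, 0, 0]]
value, cut = max_flow(cap, 0, 3)
cut_capacity = sum(cap[u][v] for u in cut for v in range(4) if v not in cut)
assert value == cut_capacity  # max-flow value = min-cut capacity
```

This is one instance where the primal solution (the flow) and the dual solution (the cut) are recovered from the same computation.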
Here, we illustrate how cutting plane methods can be used to obtain both primal and dual solutions via a
concrete problem: semidefinite programming. Consider the semidefinite programming (SDP) problem

    max_{A_i•X=b_i ∀i, X⪰0} C•X    with dual    min_{∑_{i=1}^m y_i A_i ⪰ C} bᵀy,

where X, C, A_i are n × n symmetric matrices and b, y ∈ R^m. If we apply the current best cutting plane
method naively on the primal problem, we would get an O*(n²(Z + n⁴))-time algorithm for the primal (because
there are n² variables) and O*(m(Z + m²)) for the dual, where Z is the total number of non-zeros in the A_i, and
O* indicates that we are ignoring log terms in the running time. In general, n² ≫ m, and hence it takes much
less time to solve the dual.
We note that

    min_{∑_{i=1}^m y_i A_i ⪰ C} bᵀy = min_{vᵀ(∑_{i=1}^m y_i A_i − C)v ≥ 0 ∀‖v‖_2=1} bᵀy.
In each step of the cutting plane method, the gradient oracle either outputs b or outputs one of the cutting
planes
    vᵀ(∑_{i=1}^m y_i A_i − C)v ≥ 0.
Let S be the set of all vectors v used in the cutting planes during the algorithm. Then, the proof of the cutting plane method
shows that

    min_{∑_{i=1}^m y_i A_i ⪰ C} bᵀy = min_{vᵀ(∑_{i=1}^m y_i A_i − C)v ≥ 0 ∀v∈S} bᵀy ± ε.        (3.3)
The crux of the proof is to take the dual of the right-hand side; then we have that

    min_{vᵀ(∑_{i=1}^m y_i A_i − C)v ≥ 0 ∀v∈S} bᵀy = max_{A_i•X=b_i ∀i, X=∑_{v∈S} λ_v vvᵀ, λ_v≥0} C•X.

Note that this is exactly the primal SDP problem, except that we restrict the set of solutions to the form
∑_{v∈S} λ_v vvᵀ with λ_v ≥ 0. Also, we can write the problem as a linear program:
    max_{∑_{v∈S} λ_v vᵀA_i v = b_i ∀i, λ_v ≥ 0} ∑_{v∈S} λ_v vᵀCv.        (3.4)
Therefore, we can simply solve this linear program and recover an approximate solution ∑_{v∈S} λ_v vvᵀ for the SDP.
By (3.3), we know that this is an approximate solution with the same guarantee as the dual SDP.
Now, we analyze the runtime of this algorithm. The algorithm contains two phases: solving the dual SDP
via the cutting plane method, and solving the primal linear program. Note that each step of the cutting plane
method involves finding a separating hyperplane for the set ∑_{i=1}^m y_i A_i ⪰ C.
Exercise 3.1.2. Let Ω = {y ∈ R^m : ∑_{i=1}^m y_i A_i ⪰ C}. Show that one can implement the separating oracle
in time O*(Z + n^ω) via eigenvalue computations.
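A sketch of what such an oracle could look like (the helper `separation_oracle` and the toy matrices are illustrative assumptions, not from the lecture): compute the smallest eigenvalue of ∑_i y_i A_i − C; a negative eigenvalue certifies infeasibility, and its eigenvector supplies the violated cutting plane.

```python
import numpy as np

# Sketch of the separating oracle from Exercise 3.1.2 (test data made up).
# Given y, form S(y) = sum_i y_i A_i - C. If its smallest eigenvalue is
# >= 0, y is feasible for the dual SDP; otherwise the corresponding
# eigenvector v yields the violated plane v^T (sum_i y_i A_i - C) v >= 0.

def separation_oracle(y, As, C):
    S = sum(yi * Ai for yi, Ai in zip(y, As)) - C
    eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric; ascending order
    lam, v = eigvals[0], eigvecs[:, 0]     # smallest eigenpair
    if lam >= 0:
        return None                        # feasible: no cut needed
    return v                               # v^T S(y) v = lam < 0: a cut

# Toy instance: one constraint matrix, C = identity.
A1 = np.diag([1.0, 2.0])
C = np.eye(2)
assert separation_oracle([1.0], [A1], C) is None       # A1 - I is PSD
v = separation_oracle([0.25], [A1], C)                 # 0.25*A1 - I is not
assert v is not None and v @ (0.25 * A1 - C) @ v < 0
```

The O*(Z + n^ω) bound in the exercise comes from forming S(y) (Z non-zeros) and the eigenvalue computation; the dense `eigh` call above is only a stand-in for that.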
Therefore, the first phase takes O*(m(Z + n^ω + m²)) time in total. Since the cutting plane method takes
O*(m) steps, we have |S| = O*(m). In the second phase, we need to solve the linear program (3.4) with
O*(m) variables and O(m) constraints. It is known how to solve such linear programs in time O*(m^2.5)
[20]. Hence, the total cost is dominated by the first phase:

    O*(mZ + mn^ω + m³).
Problem 3.1.3. In the first phase, each step involves computing an eigenvector of a matrix that differs little
from the previous one. It is natural to ask if some matrix update formulas are useful to decrease the cost per step of the
cutting plane method to O*(Z + n²). Namely, can we solve SDP in time

    O*(mZ + m³)?
The composite problem min_x g(x) + h(Ax) can in general be solved by a similar trick. To make the
geometric picture clear, we consider its “convex programming” version: min_{(x,t_1)∈epi g, (Ax,t_2)∈epi h} t_1 + t_2. To
simplify the notation, we consider the problem

    min_{x∈K_1, Mx∈K_2} cᵀx.
This is called the weighted b-matching problem: think of a bipartite graph with vertex sets V_1 (students) and V_2
(schools) and edge set E, where each student is matched to at most one school and each school b has capacity c_b.
Obviously, the number of students is much larger than the number of schools. Therefore, an algorithm with running
time linear in the number of students is preferable.
To apply our framework, we let

    K_1 = {x ∈ R^E : x_e ≥ 0 ∀e ∈ E, ∑_{b:(a,b)∈E} x_{(a,b)} ≤ 1 ∀a ∈ V_1},
    K_2 = {y ∈ R^{V_2} : y_b ≤ c_b ∀b ∈ V_2},

and M : R^E → R^{V_2} is the map (Mx)_b = ∑_{a:(a,b)∈E} x_{(a,b)}.
To further emphasize its importance, let me give some general examples here:
• Linear programming min_{Ax=b, x≥0} cᵀx: K_1 = {x : x ≥ 0}, K_2 = {b} and M = A.
• Semidefinite programming min_{A_i•X=b_i, X⪰0} C•X: K_1 = {X : X ⪰ 0}, K_2 = {b} and M : R^{n×n} → R^m
  defined by (MX)_i = A_i•X.
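For the linear programming example, one can check directly that the composite dual reduces to the familiar LP dual. A short derivation of my own (writing ℓ_K for the indicator of K and ℓ*_K for its support function):

```latex
\ell^*_{K_1}(u) = \sup_{x \ge 0} u^\top x =
  \begin{cases} 0, & u \le 0, \\ +\infty, & \text{otherwise,} \end{cases}
\qquad
\ell^*_{K_2}(\theta) = b^\top \theta,
```

so that, with M = A,

```latex
\max_{\theta} \; -\ell^*_{K_1}(-c - A^\top\theta) - \ell^*_{K_2}(\theta)
  \;=\; \max_{-c - A^\top\theta \le 0} \; -b^\top\theta
  \;=\; \max_{A^\top\theta \le c} \; b^\top\theta
  \qquad (\text{substituting } \theta \to -\theta),
```

which is the standard LP dual.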
Writing ℓ_K for the indicator function of K and ℓ*_K for its conjugate (the support function of K), we take the dual as before:

    min_{x∈K_1, Mx∈K_2} cᵀx = min_{x∈R^n} max_{θ∈R^m} cᵀx + ℓ_{K_1}(x) + θᵀMx − ℓ*_{K_2}(θ)
                            = max_{θ∈R^m} min_{x∈R^n} cᵀx + ℓ_{K_1}(x) + θᵀMx − ℓ*_{K_2}(θ)
                            = max_{θ∈R^m} −ℓ*_{K_1}(−c − Mᵀθ) − ℓ*_{K_2}(θ).
Taking the dual has two benefits. First, the number of variables is smaller. Second, the gradient oracle is
something we can compute efficiently. Hence, cutting plane methods can be used to solve it in O*(mT + m³) time,
where T is the time to evaluate ∇ℓ*_{K_1} and ∇ℓ*_{K_2}. The only problem left is to recover the primal x.
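To see why the gradient oracle is cheap, recall that a (sub)gradient of the support function ℓ*_K at θ is a maximizer of θᵀx over x ∈ K. A minimal sketch for the box K = [−1, 1]^n (an illustrative choice of K, not one used above):

```python
import numpy as np

# For the support function l*_K(theta) = max_{x in K} theta^T x, any
# maximizer x in K is a (sub)gradient at theta. For the box K = [-1, 1]^n
# (an illustrative choice), the maximizer is sign(theta), so the oracle
# costs O(n) per evaluation.

def support_grad_box(theta):
    # argmax_{-1 <= x <= 1} theta^T x, breaking ties toward +1
    return np.where(theta >= 0, 1.0, -1.0)

theta = np.array([0.5, -2.0, 0.0])
x = support_grad_box(theta)
assert (x == np.array([1.0, -1.0, 1.0])).all()
# The support-function value is theta^T x = ||theta||_1 for this K.
assert abs(theta @ x - np.abs(theta).sum()) < 1e-12
```

For the sets K_1, K_2 arising in the b-matching example, the analogous maximization also decomposes coordinate-by-coordinate, which is what makes T small.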
The key observation is the following lemma:
Lemma 3.1.4. Let x_i ∈ K_1 be the set of points outputted by the oracle ∇ℓ*_{K_1} during the cutting plane
method. Define y_i ∈ K_2 similarly. Suppose that the cutting plane method ends with the guarantee that the
additive error is less than ε. Then, we have that

    min_{x∈K̃_1, Mx∈K̃_2} cᵀx = min_{x∈K_1, Mx∈K_2} cᵀx ± ε,

where K̃_1 = Conv(x_i) and K̃_2 = Conv(y_i).
Proof. Let θ_i be the set of directions queried by the oracle for ∇ℓ*_{K_1} and φ_i be the directions queried by
the oracle for ∇ℓ*_{K_2}. We claim that x_i ∈ ∇ℓ*_{K̃_1}(θ_i) and y_i ∈ ∇ℓ*_{K̃_2}(φ_i). Having this, the algorithm cannot
distinguish between K_1 and K̃_1, and between K_2 and K̃_2. Hence, the algorithm runs exactly the same,
namely, it produces the same sequence of points. Therefore, we get the same value cᵀx. However, by the guarantee of
the cutting plane method, this value is within ε additive error of min_{x∈K_1, Mx∈K_2} cᵀx.

This reduces the problem to the form min_{x∈K̃_1, Mx∈K̃_2} cᵀx. For the second phase, we let z_i = Mx_i ∈ R^m.
Then, we have

    min_{x∈K̃_1, Mx∈K̃_2} cᵀx = min_{t_i≥0, s_i≥0, M(∑_i t_i x_i)=∑_i s_i y_i} cᵀ(∑_i t_i x_i)
                             = min_{t_i≥0, s_i≥0, ∑_i t_i z_i=∑_i s_i y_i} ∑_i t_i · cᵀx_i.
Note that it takes O∗ (mZ) time to write down this linear program where Z is the number of non-zeros in
M . Next, we note that this linear program has O∗ (m) variables and m constraints. Therefore, we can solve
it in O∗ (m2.5 ) time.
Therefore, the total running time to solve

    min_{x∈K_1, Mx∈K_2} cᵀx

is

    O*(mT + mZ + m³).
Remark. We hid all sorts of terms, such as the diameter of the sets, in the log factors inside O*.
Going back to the school/student problem, this algorithm gives a running time of
References
[20] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs
in Õ(√rank) iterations and faster algorithms for maximum flow. In Foundations of Computer Science
(FOCS), 2014 IEEE 55th Annual Symposium on, pages 424–433. IEEE, 2014.
[21] Yin Tat Lee, Aaron Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implica-
tions for combinatorial and convex optimization. In Foundations of Computer Science (FOCS), 2015
IEEE 56th Annual Symposium on, pages 1049–1065. IEEE, 2015.