L. Vandenberghe, EE236C (Spring 2013-14)
Cutting-plane methods
• cutting planes
• localization methods
Cutting-plane oracle
provides a black-box description of a convex set C
• when queried at x, oracle either asserts x ∈ C or returns a ≠ 0, b with
$$a^T x \ge b, \qquad a^T z \le b \quad \forall z \in C$$
$a^T z = b$ defines a cutting plane, separating x and C
• cut is neutral if $a^T x = b$: query point is on boundary of halfspace
• cut is deep if $a^T x > b$: query point in interior of halfspace that is cut
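example (a minimal sketch, not from the lecture): an oracle for a Euclidean ball, one of the simplest sets for which a cut can be written in closed form. The names `ball_oracle` and `dot` are illustrative. For x outside the ball the returned cut is deep, since $a^T x$ exceeds b by the distance from x to the ball.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def ball_oracle(x, center, radius):
    """Cutting-plane oracle for C = {z : ||z - center||_2 <= radius}.

    Returns None if x is in C; otherwise returns (a, b) with a != 0,
    a^T x >= b, and a^T z <= b for every z in C.
    """
    diff = [xi - ci for xi, ci in zip(x, center)]
    dist = math.sqrt(dot(diff, diff))
    if dist <= radius:
        return None                      # oracle asserts x is in C
    a = [d / dist for d in diff]         # unit normal pointing from center to x
    b = dot(a, center) + radius          # largest value of a^T z over the ball
    return a, b

a, b = ball_oracle([3.0, 4.0], [0.0, 0.0], 1.0)
# a^T x - b equals ||x - center|| - radius > 0, so this cut is deep
```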
Localization method
goal: find a point in convex set C described by cutting-plane oracle
algorithm: choose bounded set $P_0$ containing C; repeat for k ≥ 1:
• choose a point $x^{(k)}$ in $P_{k-1}$ and query the cutting-plane oracle at $x^{(k)}$
• if $x^{(k)} \in C$, return $x^{(k)}$; else, add cutting plane $a_k^T z \le b_k$ to $P_{k-1}$:
$$P_k = P_{k-1} \cap \{z \mid a_k^T z \le b_k\}$$
terminate if $P_k = \emptyset$
variation: to keep $P_k$ simple, choose $P_k \supseteq P_{k-1} \cap \{z \mid a_k^T z \le b_k\}$
we’ll discuss specific algorithms later
geometry
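[figure: localization set $P_{k-1}$, query point $x^{(k)}$, cut normal $a_k$, and the resulting set $P_k$]
$P_k$ gives the uncertainty in the location of C after iteration k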
Unconstrained minimization
C is optimal set for convex f
neutral cut: if $f(x) > f^\star$ and $g \in \partial f(x)$, then a neutral cut at x is
$$g^T z \le g^T x$$
proof: $g^T z > g^T x$ implies $f(z) \ge f(x) + g^T (z - x) > f(x) > f^\star$
interpretation: by evaluating $g \in \partial f(x)$
• we rule out a halfspace in the search for a point in C
• we get one ‘bit’ of info on C
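example (a minimal sketch, not from the lecture): a neutral cut for $f(x) = \|x\|_1$, using the sign vector as subgradient. The function names and the test points are illustrative assumptions.

```python
def l1_norm(x):
    return sum(abs(xi) for xi in x)

def l1_subgradient(x):
    # sign(x_i) is a subgradient of |x_i|; at x_i = 0 we pick 0
    return [float((xi > 0) - (xi < 0)) for xi in x]

def neutral_cut(x, subgrad):
    """Return (g, b) describing the neutral cut g^T z <= b with b = g^T x."""
    g = subgrad(x)
    return g, sum(gi * xi for gi, xi in zip(g, x))

x = [2.0, -1.0]
g, b = neutral_cut(x, l1_subgradient)      # g = [1, -1], b = 3

z = [3.0, -2.0]                            # violates the cut: g^T z = 5 > 3
assert l1_norm(z) > l1_norm(x)             # hence z cannot be a minimizer
```

The minimizer z = 0 satisfies the cut, as it must: $g^T 0 = 0 \le 3$.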
Deep cut for unconstrained minimization
suppose we know a number $\bar f$ with
$$f(x) > \bar f \ge f^\star$$
for example, $\bar f$ is the smallest value of f found so far in an algorithm
deep cut: if $f(x) > f^\star$ and $g \in \partial f(x)$, then a deep cut at x is given by
$$g^T z \le g^T x - f(x) + \bar f$$
proof: $g^T z > g^T x - f(x) + \bar f$ implies
$$f(z) \ge f(x) + g^T (z - x) > \bar f \ge f^\star$$
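example (a minimal sketch, not from the lecture): neutral and deep cuts for the hypothetical objective $f(z) = z_1^2 + z_2^2$ at the same query point. The helper `subgradient_cut` and the value chosen for `f_bar` are assumptions made for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def subgradient_cut(x, f, subgrad, f_bar=None):
    """Return (g, b) for the cut g^T z <= b at the query point x.

    f_bar=None gives the neutral cut b = g^T x.  Otherwise f_bar is
    assumed to satisfy f(x) > f_bar >= f_star, and the deep cut
    b = g^T x - f(x) + f_bar is returned.
    """
    g = subgrad(x)
    b = dot(g, x)
    if f_bar is not None:
        b = b - f(x) + f_bar
    return g, b

f = lambda z: z[0] ** 2 + z[1] ** 2
grad = lambda z: [2.0 * z[0], 2.0 * z[1]]

x = [2.0, 0.0]
print(subgradient_cut(x, f, grad))             # neutral: 4 z1 <= 8
print(subgradient_cut(x, f, grad, f_bar=1.0))  # deep:    4 z1 <= 5
```

The deep cut removes the slab $1.25 < z_1 \le 2$ that the neutral cut keeps, while every z with $f(z) \le \bar f = 1$ still satisfies it.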
Feasibility problem
C is solution set of convex inequalities
$$f_i(x) \le 0, \quad i = 1, \ldots, m$$
deep cut: if x ∉ C, find j with $f_j(x) > 0$ and evaluate $g_j \in \partial f_j(x)$;
$$g_j^T z \le g_j^T x - f_j(x)$$
is a deep cut at x
proof: $g_j^T z > g_j^T x - f_j(x)$ implies z ∉ C because
$$f_j(z) \ge f_j(x) + g_j^T (z - x) > 0$$
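example (a minimal sketch, not from the lecture): a feasibility oracle that returns the deep cut for the first violated constraint. The constraint set (unit disk intersected with a halfspace) and the name `feasibility_cut` are hypothetical.

```python
def feasibility_cut(x, constraints):
    """constraints: list of (f_j, subgradient_j) pairs of callables.

    Returns None if f_j(x) <= 0 for all j.  Otherwise returns (g, b) for
    the deep cut g^T z <= b, with b = g^T x - f_j(x), where j is the
    first violated constraint.
    """
    for f, subgrad in constraints:
        fx = f(x)
        if fx > 0:
            g = subgrad(x)
            b = sum(gi * xi for gi, xi in zip(g, x)) - fx
            return g, b
    return None

# hypothetical example: z1^2 + z2^2 <= 1 and z1 + z2 <= 1
constraints = [
    (lambda z: z[0] ** 2 + z[1] ** 2 - 1.0, lambda z: [2.0 * z[0], 2.0 * z[1]]),
    (lambda z: z[0] + z[1] - 1.0, lambda z: [1.0, 1.0]),
]

print(feasibility_cut([2.0, 0.0], constraints))   # cut 4 z1 <= 5
print(feasibility_cut([0.2, 0.2], constraints))   # None: point is feasible
```

Picking the first violated constraint is only one choice; the slide allows any j with $f_j(x) > 0$.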
Inequality constrained problem
C is optimal set of convex problem
minimize $f_0(x)$
subject to $f_i(x) \le 0$, i = 1, ..., m
feasibility cut: if x is not feasible, say $f_j(x) > 0$, we have a deep cut
$$g_j^T z \le g_j^T x - f_j(x) \quad \text{where } g_j \in \partial f_j(x)$$
objective cut: if x is feasible, but $f_0(x) > p^\star + \epsilon$, we have a neutral cut
$$g_0^T z \le g_0^T x \quad \text{where } g_0 \in \partial f_0(x)$$
moreover, if $\bar f$ with $f_0(x) > \bar f \ge p^\star$ is known, we have a deep cut
$$g_0^T z \le g_0^T x - f_0(x) + \bar f$$
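example (a minimal sketch, not from the lecture): one oracle that combines the feasibility and objective cuts. The problem data (a linear objective over the unit disk) and the name `constrained_oracle` are hypothetical. The oracle does not test optimality, because $p^\star$ is not known to it; a zero subgradient $g_0$ at a feasible point would indicate that x is optimal.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def constrained_oracle(x, objective, constraints, f_bar=None):
    """Return (kind, g, b) for the cut g^T z <= b at the query point x.

    objective and each constraint are (function, subgradient) pairs.
    f_bar, if given, is assumed to satisfy f_bar >= p_star.
    """
    for f, subgrad in constraints:
        fx = f(x)
        if fx > 0:                               # infeasible: deep feasibility cut
            g = subgrad(x)
            return "feasibility", g, dot(g, x) - fx
    f0, subgrad0 = objective
    g0 = subgrad0(x)
    if f_bar is not None and f0(x) > f_bar:      # deep objective cut
        return "deep objective", g0, dot(g0, x) - f0(x) + f_bar
    return "neutral objective", g0, dot(g0, x)   # neutral objective cut

# hypothetical problem: minimize z1 + z2 subject to z1^2 + z2^2 <= 1
objective = (lambda z: z[0] + z[1], lambda z: [1.0, 1.0])
constraints = [(lambda z: z[0] ** 2 + z[1] ** 2 - 1.0,
                lambda z: [2.0 * z[0], 2.0 * z[1]])]

print(constrained_oracle([2.0, 0.0], objective, constraints))
print(constrained_oracle([0.5, 0.5], objective, constraints))
print(constrained_oracle([0.5, 0.5], objective, constraints, f_bar=0.0))
```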
Variational inequality
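monotone mapping: a mapping $F : \mathbf{R}^n \to \mathbf{R}^n$ is monotone if
$$(F(x) - F(y))^T (x - y) \ge 0 \quad \forall x, y$$
monotone variational inequality: given closed convex S, find $\hat x \in S$ with
$$F(\hat x)^T (x - \hat x) \ge 0 \quad \forall x \in S$$
[figure: the set S, a solution $\hat x$, and the vector $-F(\hat x)$]
equivalently, $\hat x = P_S(\hat x - F(\hat x))$ where $P_S$ is projection on S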
Convex optimization problem as variational inequality
variational inequality with $F(x) = \nabla f(x)$:
$$\hat x \in S, \qquad \nabla f(\hat x)^T (x - \hat x) \ge 0 \quad \forall x \in S$$
• F is monotone if f is convex (see p. 1-9)
• variational inequality is optimality condition for convex problem
minimize f (x)
subject to x ∈ S
(see EE236B page 4–9)
note: in the general variational inequality, F is not necessarily a gradient
Saddle-point problem
suppose f (u, v) is convex in u, concave in v, and U , V are convex sets
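saddle point: $(\hat u, \hat v) \in U \times V$ is a saddle point if
$$f(\hat u, v) \le f(\hat u, \hat v) \le f(u, \hat v) \quad \forall u \in U,\ v \in V$$
variational inequality formulation (for differentiable f):
$$\begin{bmatrix} \nabla_u f(\hat u, \hat v) \\ -\nabla_v f(\hat u, \hat v) \end{bmatrix}^T \begin{bmatrix} u - \hat u \\ v - \hat v \end{bmatrix} \ge 0 \quad \forall (u, v) \in U \times V$$
• $\hat u$ minimizes $f(u, \hat v)$ over u ∈ U; $\hat v$ minimizes $-f(\hat u, v)$ over v ∈ V
• a variational inequality with $F(u, v) = (\nabla_u f(u, v), -\nabla_v f(u, v))$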
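monotonicity of $F(u, v) = (\nabla_u f(u, v), -\nabla_v f(u, v))$
if f is convex-concave, then for all $w = (u, v)$, $\hat w = (\hat u, \hat v)$
$$\begin{aligned}
(F(w) - F(\hat w))^T (w - \hat w)
&= (\nabla_u f(w) - \nabla_u f(\hat w))^T (u - \hat u) - (\nabla_v f(w) - \nabla_v f(\hat w))^T (v - \hat v) \\
&\ge -f(\hat u, v) + f(u, v) - f(u, \hat v) + f(\hat u, \hat v) + f(u, \hat v) - f(u, v) + f(\hat u, v) - f(\hat u, \hat v) \\
&= 0
\end{aligned}$$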
Cutting planes for variational inequality
to generate cutting plane at x
• if x ∉ S, use feasibility cut (cutting plane that separates x from S)
• if x ∈ S and not a solution, use the cutting plane
$$F(x)^T z \le F(x)^T x$$
proof: if $F(x)^T z > F(x)^T x$ then, by monotonicity,
$$F(z)^T (z - x) \ge F(x)^T (z - x) > 0$$
therefore z is not a solution of the variational inequality: since x ∈ S, a solution z would have to satisfy $F(z)^T (x - z) \ge 0$
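example (a minimal sketch, not from the lecture): the cut for a hypothetical affine map $F(x) = Mx + q$ with $M + M^T = 2I$, so that $(F(x) - F(y))^T (x - y) = \|x - y\|_2^2 \ge 0$ and F is monotone. Taking $S = \mathbf{R}^2$ (an assumption that makes the solution the zero of F, here $\hat x = (0.2, 0.4)$), the solution stays on the kept side of the cut.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def F(x):
    # hypothetical monotone affine map F(x) = M x + q
    M = [[1.0, 2.0], [-2.0, 1.0]]
    q = [-1.0, 0.0]
    return [dot(row, x) + qi for row, qi in zip(M, q)]

def vi_cut(x, F):
    """Return (g, b) for the cut F(x)^T z <= F(x)^T x at a point x in S."""
    Fx = F(x)
    return Fx, dot(Fx, x)

x = [1.0, 1.0]
g, b = vi_cut(x, F)                       # cut: 2 z1 - z2 <= 1

z = [2.0, 0.0]                            # a point removed by the cut
step = [zi - xi for zi, xi in zip(z, x)]
# the chain of inequalities used in the proof
assert dot(F(z), step) >= dot(g, step) > 0
```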
Outline
• cutting planes
• cutting-plane methods
Choice of query point
should be near center of $P_{k-1}$
[figure: localization sets $P_{k-1}$ with different query points $x^{(k)}$ and the resulting sets $P_k$]
want to pick $x^{(k)}$ so that $P_k$ is as small as possible, for any cut
Example: bisection in R
for minimizing differentiable convex f : R → R
given: interval P0 = [l, u] containing x⋆
repeat:
x := (l + u)/2;
if f ′(x) < 0, l := x; else u := x
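the loop above can be sketched in Python as follows (a minimal sketch; the stopping tolerance `tol`, the iteration counter, and the test function are assumptions added for illustration):

```python
def bisection(fprime, l, u, tol=1e-8):
    """Localize a minimizer of a differentiable convex f on [l, u].

    fprime is the derivative of f; a minimizer is assumed to lie in [l, u].
    Returns the final interval and the number of iterations.
    """
    k = 0
    while u - l > tol:
        x = 0.5 * (l + u)
        if fprime(x) < 0:
            l = x          # minimizer lies to the right of x
        else:
            u = x          # minimizer lies to the left of x (or at x)
        k += 1
    return l, u, k

# hypothetical example: f(x) = (x - 1)^2, so f'(x) = 2 (x - 1)
l, u, k = bisection(lambda x: 2.0 * (x - 1.0), -2.0, 6.0, tol=1e-6)
print(l, u, k)
```

Each pass through the loop is one oracle query: the sign of $f'(x)$ is the neutral cut at the midpoint.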
iteration complexity
$$\mathrm{length}(P_k) = \frac{\mathrm{length}(P_{k-1})}{2} = \frac{\mathrm{length}(P_0)}{2^k}$$
• $\mathrm{length}(P_k)$ measures uncertainty in $x^\star$
• uncertainty is halved at each iteration (exactly one bit of info)
number of steps required to reduce uncertainty (in $x^\star$) to below r:
$$k = \log_2 \frac{\mathrm{length}(P_0)}{r} = \log_2 \frac{\text{initial uncertainty}}{\text{final uncertainty}}$$
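worked example (illustrative numbers, not from the lecture): for $P_0 = [0, 1]$ and a target uncertainty $r = 10^{-3}$,

```latex
k = \log_2 \frac{\mathrm{length}(P_0)}{r}
  = \log_2 \frac{1}{10^{-3}}
  = \log_2 1000 \approx 9.97
```

so 10 bisection steps suffice, since $2^{10} = 1024 > 1000$.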
Specific cutting-plane algorithms
methods vary in choice of query point
center of gravity (CG) algorithm
$x^{(k)}$ is center of gravity of $P_{k-1}$
maximum volume ellipsoid (MVE) cutting-plane method
$x^{(k)}$ is center of maximum volume ellipsoid contained in $P_{k-1}$
Chebyshev center cutting-plane method
$x^{(k)}$ is center of largest ball contained in $P_{k-1}$
analytic center cutting-plane method (ACCPM) (next lecture)
$x^{(k)}$ is analytic center of (inequalities defining) $P_{k-1}$
Lower bound on complexity
problem class: find $x \in C \subseteq \mathbf{R}^n$
• C is convex
• C is contained in $\{x \mid \|x\|_\infty \le R\}$
• C contains an $\ell_\infty$-norm ball of radius r
• C is described by a cutting-plane oracle
[figure: the set C, with the widths 2r and 2R indicated]
bound on complexity
no localization algorithm can guarantee a complexity lower than
$$n \log_2 \frac{R}{2r}$$
iterations (queries to oracle)
proof: suppose we run a localization algorithm for $k < n \log_2(R/(2r))$ iterations
we will construct a ‘resisting oracle’ for a hyperrectangle
$$C = \{x \mid c - d \preceq x \preceq c + d\}$$
that does not contain any of the k query points and satisfies
$$\max_i\, (|c_i| + d_i) \le R, \qquad \min_i d_i \ge \frac{R}{2^{\lceil k/n \rceil}} \ge r$$
therefore, the algorithm failed to find a point in C in k steps even though
$$\{x \mid \|x - c\|_\infty \le r\} \subseteq C \subseteq \{x \mid \|x\|_\infty \le R\}$$
the oracle and c, d are constructed as follows: initially, c = 0, $d = R\mathbf{1}$ (every $d_i = R$)
at iteration j,
• define $i = j - n\lfloor (j - 1)/n \rfloor$ (i.e., cycle through the n coordinates)
• if x is the query point at iteration j, then
– if $x_i \ge c_i$, update c, d as
$$c_i := c_i - d_i/2, \qquad d_i := d_i/2$$
and return the cut $e_i^T (z - x) \le 0$
– if $x_i < c_i$, update c, d as
$$c_i := c_i + d_i/2, \qquad d_i := d_i/2$$
and return the cut $-e_i^T (z - x) \le 0$
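the construction above can be sketched as a small class (a minimal sketch; the class name, the 0-based coordinate index, and the sample queries are assumptions made for illustration):

```python
class ResistingOracle:
    """Builds the hyperrectangle C = {x | c - d <= x <= c + d} adaptively."""

    def __init__(self, n, R):
        self.n = n
        self.c = [0.0] * n
        self.d = [float(R)] * n
        self.j = 0                          # number of queries so far

    def query(self, x):
        """Return (a, b) for the cut a^T z <= b and shrink the box."""
        self.j += 1
        i = (self.j - 1) % self.n           # 0-based form of i = j - n*floor((j-1)/n)
        half = self.d[i] / 2.0
        a = [0.0] * self.n
        if x[i] >= self.c[i]:
            self.c[i] -= half               # keep the lower half in coordinate i
            a[i] = 1.0                      # cut  e_i^T (z - x) <= 0
        else:
            self.c[i] += half               # keep the upper half in coordinate i
            a[i] = -1.0                     # cut -e_i^T (z - x) <= 0
        self.d[i] = half
        return a, a[i] * x[i]

oracle = ResistingOracle(n=2, R=8.0)
queries = [[0.0, 0.0], [-4.0, 0.0], [-4.0, -4.0], [-6.0, -4.0]]
cuts = [oracle.query(x) for x in queries]
print(oracle.c, oracle.d)
```

After k = 4 queries in dimension n = 2 each $d_i$ has been halved twice, matching $R / 2^{\lceil k/n \rceil}$.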
Center of gravity algorithm
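choose as $x^{(k)}$ the center of gravity of $P_{k-1}$ (denoted $\mathrm{CG}(P_{k-1})$)
$$x^{(k)} = \mathrm{CG}(P_{k-1}) = \frac{\int_{P_{k-1}} x \, dx}{\int_{P_{k-1}} dx}$$
theorem: if $S \subseteq \mathbf{R}^n$ convex, $x_{\mathrm{cg}} = \mathrm{CG}(S)$, g ≠ 0,
$$\mathrm{vol}\left(S \cap \{x \mid g^T (x - x_{\mathrm{cg}}) \le 0\}\right) \le (1 - 1/e)\, \mathrm{vol}(S)$$
(independent of dimension n)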
Convergence of CG cutting-plane method
assumptions
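• $P_0 \subseteq \{x \mid \|x\|_\infty \le R\}$
• C contains an $\ell_\infty$-ball of radius r
iteration complexity
if $x^{(1)}, \ldots, x^{(k)} \notin C$, then $C \subseteq P_k$ (no part of C is cut) and
$$(2r)^n \le \mathrm{vol}(P_k) \le \left(1 - \frac{1}{e}\right)^k \mathrm{vol}(P_0) \le \left(1 - \frac{1}{e}\right)^k (2R)^n$$
therefore
$$k \le \frac{n \log(R/r)}{-\log(1 - 1/e)} \approx 1.51\, n \log_2 \frac{R}{r}$$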
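numerical check (a minimal sketch; the function names and the sample values of n, R, r are assumptions): the constant 1.51 is $1 / (-\log_2(1 - 1/e))$, and the CG bound can be compared with the lower bound $n \log_2(R/(2r))$ for the same problem class.

```python
import math

def cg_iteration_bound(n, R, r):
    """Upper bound n log(R/r) / (-log(1 - 1/e)) on CG iterations."""
    return n * math.log(R / r) / (-math.log(1.0 - 1.0 / math.e))

def localization_lower_bound(n, R, r):
    """Lower bound n log2(R / (2 r)) valid for any localization method."""
    return n * math.log2(R / (2.0 * r))

constant = 1.0 / (-math.log2(1.0 - 1.0 / math.e))   # about 1.51

n, R, r = 10, 1.0, 1e-3                              # illustrative values
print(constant)
print(localization_lower_bound(n, R, r), cg_iteration_bound(n, R, r))
```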
advantages of CG-method
• guaranteed convergence
• affine-invariance
• iteration complexity is near optimal (compare the lower bound on complexity)
disadvantage
finding $x^{(k)} = \mathrm{CG}(P_{k-1})$ is much harder than original problem
(but, can modify CG-method to work with approximate CG computation)
Maximum volume ellipsoid method
$x^{(k)}$ is center of maximum volume ellipsoid in $P_{k-1}$
• can be computed via convex optimization
• affine-invariant
complexity
• can show $\mathrm{vol}(P_{k+1}) \le (1 - 1/n)\, \mathrm{vol}(P_k)$
• hence can bound number of steps:
$$k \le \frac{n \log(R/r)}{-\log(1 - 1/n)} \approx n^2 \log(R/r)$$
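the approximation in the last step follows from the series for the logarithm (a short derivation added for completeness, for n ≥ 2):

```latex
-\log\left(1 - \frac{1}{n}\right)
  = \frac{1}{n} + \frac{1}{2n^2} + \frac{1}{3n^3} + \cdots
  \;\ge\; \frac{1}{n}
\quad\Longrightarrow\quad
\frac{n \log(R/r)}{-\log(1 - 1/n)} \;\le\; n^2 \log(R/r)
```

with near equality for large n, since the higher-order terms are then small compared with 1/n.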
if cutting-plane oracle cost is not small, MVE is a good practical method
Extensions
multiple cuts
• oracle returns set of linear inequalities instead of just one, e.g.,
– all violated inequalities
– all inequalities (including shallow cuts)
– multiple deep cuts
• at each iteration, append (set of) new inequalities to those defining Pk
nonlinear cuts
• use nonlinear convex inequalities instead of linear ones
• localization set no longer a polyhedron
• some methods (e.g., ACCPM) still work
Dropping constraints
the problem
• number of linear inequalities defining $P_k$ increases at each iteration
• hence, computational effort to compute $x^{(k+1)}$ increases
solutions
• drop redundant constraints
• keep only a fixed number of (the most relevant) constraints
• at each iteration, replace localization set by a simpler set that contains it (an outer approximation)
first two solutions discussed in lecture ; third solution in lecture
Epigraph cutting-plane method
cutting-plane method applied to epigraph form problem
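minimize t
subject to $f_0(x) \le t$
$f_i(x) \le 0$, i = 1, ..., m
cutting-plane oracle (queried at x)
• if x is infeasible for original problem (say, $f_j(x) > 0$), add cutting-plane
$$\begin{bmatrix} g_j \\ 0 \end{bmatrix}^T \begin{bmatrix} z \\ t \end{bmatrix} \le g_j^T x - f_j(x) \qquad (g_j \in \partial f_j(x))$$
• if x is feasible for original problem, add two cutting-planes
$$\begin{bmatrix} 0 \\ 1 \end{bmatrix}^T \begin{bmatrix} z \\ t \end{bmatrix} \le f_0(x), \qquad \begin{bmatrix} g_0 \\ -1 \end{bmatrix}^T \begin{bmatrix} z \\ t \end{bmatrix} \le g_0^T x - f_0(x) \qquad (g_0 \in \partial f_0(x))$$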
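example (a minimal sketch, not from the lecture): the epigraph oracle for a hypothetical problem, minimizing $z_1 + z_2$ over the unit disk. Each cut is stored as a triple (a, c, b) meaning $a^T z + c\,t \le b$; this representation and the name `epigraph_cuts` are assumptions made for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def epigraph_cuts(x, objective, constraints):
    """Return a list of cuts (a, c, b), each meaning a^T z + c * t <= b."""
    for f, subgrad in constraints:
        fx = f(x)
        if fx > 0:                                   # x infeasible: one cut, t is free
            g = subgrad(x)
            return [(g, 0.0, dot(g, x) - fx)]
    f0, subgrad0 = objective
    f0x, g0 = f0(x), subgrad0(x)
    zero = [0.0] * len(x)
    return [
        (zero, 1.0, f0x),                            # t <= f0(x)
        (g0, -1.0, dot(g0, x) - f0x),                # t >= f0(x) + g0^T (z - x)
    ]

objective = (lambda z: z[0] + z[1], lambda z: [1.0, 1.0])
constraints = [(lambda z: z[0] ** 2 + z[1] ** 2 - 1.0,
                lambda z: [2.0 * z[0], 2.0 * z[1]])]

print(epigraph_cuts([2.0, 0.0], objective, constraints))
print(epigraph_cuts([0.5, 0.5], objective, constraints))
```

For this problem the optimal pair $(z^\star, t^\star) = ((-1/\sqrt{2}, -1/\sqrt{2}), -\sqrt{2})$ satisfies both cuts returned at the feasible query point.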
References
• Yu. Nesterov, Introductory Lectures on Convex Optimization. A Basic
Course (2004) (§3.2.5 and §3.2.6)
• S. Boyd, course notes for EE364b, Convex Optimization II
Cutting-plane methods 28