
Localization and Cutting-Plane Methods

S. Boyd and L. Vandenberghe


April 9, 2018

Contents
1 Cutting-planes
2 Finding cutting-planes
  2.1 Unconstrained minimization
  2.2 Feasibility problem
  2.3 Inequality constrained problem
3 Localization algorithms
  3.1 Basic cutting-plane and localization algorithm
  3.2 Measuring uncertainty and progress
  3.3 Choosing the query point
4 Some specific cutting-plane methods
  4.1 Bisection method on R
  4.2 Center of gravity method
  4.3 MVE cutting-plane method
  4.4 Chebyshev center cutting-plane method
  4.5 Analytic center cutting-plane method
5 Extensions
  5.1 Multiple cuts
  5.2 Dropping or pruning constraints
  5.3 Nonlinear cuts
6 Epigraph cutting-plane method
7 Lower bounds and stopping criteria

In these notes we describe a class of methods for solving general convex and quasiconvex
optimization problems, based on the use of cutting-planes, which are hyperplanes that sepa-
rate the current point from the optimal points. These methods, called cutting-plane methods
or localization methods, are quite different from interior-point methods, such as the barrier
method or primal-dual interior-point method described in [?, §11]. Cutting-plane methods
are usually less efficient for problems to which interior-point methods apply, but they have
a number of advantages that can make them an attractive choice in certain situations.

• Cutting-plane methods do not require differentiability of the objective and constraint


functions, and can directly handle quasiconvex as well as convex problems. Each itera-
tion requires the computation of a subgradient of the objective or constraint functions.

• Cutting-plane methods can exploit certain types of structure in large and complex
problems. A cutting-plane method that exploits structure can be faster than a general-
purpose interior-point method for the same problem.

• Cutting-plane methods do not require evaluation of the objective and all the constraint
functions at each iteration. (In contrast, interior-point methods require evaluating all
the objective and constraint functions, as well as their first and second derivatives.)
This can make cutting-plane methods useful for problems with a very large number of
constraints.

• Cutting-plane methods can be used to decompose problems into smaller problems that
can be solved sequentially or in parallel.

To apply these methods to nondifferentiable problems, you need to know about subgra-
dients, which are described in a separate set of notes. More details of the analytic center
cutting-plane method are given in another separate set of notes.

1 Cutting-planes
The goal of cutting-plane and localization methods is to find a point in a convex set X ⊆
Rn , which we call the target set, or, in some cases, to determine that X is empty. In an
optimization problem, the target set X can be taken as the set of optimal (or ε-suboptimal)
points for the problem, and our goal is to find an optimal (or ε-suboptimal) point for the
optimization problem.
We do not have direct access to any description of the target set X (such as the objective
and constraint functions in an underlying optimization problem) except through an oracle.
When we query the oracle at a point x ∈ Rn , the oracle returns the following information
to us: it either tells us that x ∈ X (in which case we are done), or it returns a separating
hyperplane between x and X, i.e., a ≠ 0 and b such that

aT z ≤ b for z ∈ X, aT x ≥ b.


Figure 1: The inequality aT z ≤ b defines a cutting-plane at the query point x, for


the target set X, shown shaded. To find a point in the target set X we need only
search in the lightly shaded halfspace; the unshaded halfspace {z | aT z > b} cannot
contain any point in the target set.

This hyperplane is called a cutting-plane, or cut, since it ‘cuts’ or eliminates the halfspace
{z | aT z > b} from our search; no such point could be in the target set X. This is illustrated
in figure 1. We call the oracle that generates a cutting-plane at x (or the message that
x ∈ X) a cutting-plane oracle. We can assume ‖a‖2 = 1, since dividing a and b by ‖a‖2
defines the same cutting-plane.
When the cutting-plane aT z = b contains the query point x, we refer to it as a neutral
cut or neutral cutting-plane. When aT x > b, which means that x lies in the interior of the
halfspace that is being cut from consideration, the cutting-plane is called a deep cut. Figure 2
illustrates a neutral and deep cut. Intuition suggests that a deep cut is better, i.e., more
informative, than a neutral cut (with the same normal vector a), since it excludes a larger
set of points from consideration.

2 Finding cutting-planes
In this section we show how to find cutting-planes for several standard convex optimization
problems. We take the target set X to be the optimal set for the problem, so the oracle
must either declare the point x optimal, or produce a hyperplane that separates x from the
optimal set. It is straightforward to include equality constraints, so we leave them out to
simplify the exposition.


Figure 2: Left: a neutral cut for the point x and target set X. Here, the query
point x is on the boundary of the excluded halfspace. Right: a deep cut for the
point x and target set X.

2.1 Unconstrained minimization


We first consider the unconstrained optimization problem

minimize f0 (x), (1)

where f0 is convex. To find a cutting-plane for this problem, at the point x, we proceed as
follows. We find a subgradient g ∈ ∂f0 (x). (If f0 is differentiable at x, then our only choice is
g = ∇f0 (x).) If g = 0, then x is optimal, i.e., in the target set X. So we assume that g ≠ 0.
Recall that for all z we have
f0 (z) ≥ f0 (x) + g T (z − x)
(indeed, this is the definition of a subgradient). Therefore if z satisfies g T (z − x) > 0, then it
also satisfies f0 (z) > f0 (x), and so cannot be optimal (i.e., in X). In other words, we have

g T (z − x) ≤ 0 for z ∈ X,

and g T (z − x) = 0 for z = x. This shows that

g T (z − x) ≤ 0

is a (neutral) cutting-plane for (1) at x.


This cutting-plane has a simple interpretation: in our search for an optimal point, we can
remove the halfspace {z | g T (z − x) > 0} from consideration because all points in it have an
objective value larger than that of x, and therefore cannot be optimal. This is illustrated
in figure 3.
We can generate a deep cut if we know a number f̄ that satisfies

f0 (x) > f̄ ≥ f ⋆ ,


Figure 3: The curves show the level sets of a convex function f0 . In this example
the optimal set X is a singleton, the minimizer of f0 . The hyperplane given by
g T (z − x) = 0 separates the point x (which lies on the hyperplane) from the optimal
set X, hence defines a (neutral) cutting-plane. All points in the unshaded halfspace
can be ‘cut’ since in that halfspace we have f0 (z) ≥ f0 (x).

where f ⋆ = inf x f0 (x) is the optimal value of the problem (1). In this case we know that any
optimal point x⋆ must satisfy

f̄ ≥ f ⋆ ≥ f0 (x) + g T (x⋆ − x),

so we have the deep cut


g T (z − x) + f0 (x) − f̄ ≤ 0.
When the problem (1) is quasiconvex (i.e., when f0 is quasiconvex), we can find a cutting-
plane at x by finding a nonzero quasigradient of f0 at x. Essentially by definition, the
inequality g T (z − x) ≤ 0 is a cutting-plane when g is a nonzero quasigradient of f0 at x.
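
To make this concrete, here is a small Python sketch of such an oracle for problem (1). The interface and names are ours, not part of the notes; subgrad is assumed to return some element of ∂f0 (x), and f_bar, if supplied, is an upper bound with f0 (x) > f_bar ≥ f ⋆.

import numpy as np

def objective_cut(x, f0, subgrad, f_bar=None):
    # Returns None if x is optimal, else (a, b) with a^T z <= b valid
    # for every optimal z.
    g = subgrad(x)
    if np.linalg.norm(g) == 0:
        return None                      # 0 in subdifferential: x optimal
    if f_bar is None:
        return g, g @ x                  # neutral cut: g^T (z - x) <= 0
    return g, g @ x - (f0(x) - f_bar)    # deep cut: g^T (z-x) + f0(x) - f_bar <= 0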

2.2 Feasibility problem


We consider the feasibility problem

find x
subject to fi (x) ≤ 0, i = 1, . . . , m,

where fi are convex. Here the target set X is the feasible set.
To find a cutting-plane for this problem at the point x we proceed as follows. If x is
feasible, i.e., satisfies fi (x) ≤ 0 for i = 1, . . . , m, then x ∈ X. Now suppose x is not feasible.


Figure 4: The curves show level sets of two convex functions f1 , f2 ; the darker
curves show the level sets f1 = 0 and f2 = 0. The feasible set X, defined by f1 ≤ 0,
f2 ≤ 0, is at lower left. At the point x, the constraint f1 (x) ≤ 0 is violated, and the
hyperplane f1 (x) + ∇f1 (x)T (z − x) = 0 defines a deep cut.

This means that there is at least one index j for which fj (x) > 0, i.e., x violates the jth
constraint. Let gj ∈ ∂fj (x). From the inequality
fj (z) ≥ fj (x) + gjT (z − x),
we conclude that if
fj (x) + gjT (z − x) > 0,
then fj (z) > 0, and so z also violates the jth constraint. It follows that any feasible z
satisfies the inequality
fj (x) + gjT (z − x) ≤ 0,
which gives us the required cutting-plane. Since fj (x) > 0, this is a deep cut.
Here we remove from consideration the halfspace defined by fj (x) + gjT (z − x) > 0
because all points in it violate the jth inequality, as x does, hence are infeasible. This is
illustrated in figure 4.
Note that we can find a cutting-plane for every violated constraint, so if more than one
constraint is violated at x, we can generate multiple cutting-planes that separate x from X.
(We will see that some algorithms can make use of multiple cutting-planes at a given point.)
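
As an illustration, the following Python sketch (our own construction, not from the notes) returns a deep cut from the most violated constraint, or None when the query point is feasible:

import numpy as np

def feasibility_cut(x, fs, subgrads):
    # fs[i](x) evaluates f_i; subgrads[i](x) returns some g_i in df_i(x).
    vals = np.array([f(x) for f in fs])
    j = int(np.argmax(vals))             # most violated constraint
    if vals[j] <= 0:
        return None                      # x is feasible, i.e., x in X
    g = subgrads[j](x)
    # f_j(x) + g^T (z - x) <= 0  <=>  g^T z <= g^T x - f_j(x)
    return g, g @ x - vals[j]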

2.3 Inequality constrained problem


By combining the methods described above, we can find a cutting-plane for the problem
minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m,        (2)

where f0 , . . . , fm are convex. As above, the target set X is the optimal set.
Given the query point x, we first check for feasibility. If x is not feasible, then we can
construct a cut as
fj (x) + gjT (z − x) ≤ 0, (3)
where fj (x) > 0 (i.e., j is the index of any violated constraint) and gj ∈ ∂fj (x). This
defines a cutting-plane for the problem (2) since any optimal point must satisfy the jth
inequality, and therefore the linear inequality (3). The cut (3) is called a feasibility cut for
the problem (2), since we are cutting away a halfspace of points known to be infeasible (since
they violate the jth constraint).
Now suppose that the query point x is feasible. Find a g0 ∈ ∂f0 (x). If g0 = 0, then x
is optimal and we are done. So we assume that g0 ≠ 0. In this case we can construct a
cutting-plane as
g0T (z − x) ≤ 0,
which we refer to as an objective cut for the problem (2). Here, we are cutting out the
halfspace {z | g0T (z − x) > 0} because we know that all such points have an objective value
larger than that of x, hence cannot be optimal.
We can also find a deep objective cut, by keeping track of the best objective value fbest ,
among feasible points, found so far. In this case we can use the cutting-plane
g0T (z − x) + f0 (x) − fbest ≤ 0,
since all other points have objective value at least fbest . (If x is the best feasible point found
so far, then fbest = f0 (x), and this reduces to the neutral cut above.)
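
Putting the two cut types together, a sketch of a cutting-plane oracle for problem (2) might look as follows (again our own code and naming; f_best is maintained by the caller):

import numpy as np

def inequality_oracle(x, f0, g0_fn, fs, subgrads, f_best):
    # Feasibility cut at an infeasible x; deep objective cut at a feasible x.
    for f, sg in zip(fs, subgrads):
        if f(x) > 0:                       # constraint j is violated
            g = sg(x)
            return (g, g @ x - f(x)), f_best
    f_best = min(f_best, f0(x))            # x is feasible: update best value
    g0 = g0_fn(x)
    if np.linalg.norm(g0) == 0:
        return None, f_best                # 0 in subdifferential: x optimal
    return (g0, g0 @ x - (f0(x) - f_best)), f_best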

3 Localization algorithms
3.1 Basic cutting-plane and localization algorithm
We start with a set of initial linear inequalities

Cz ⪯ d,

where C ∈ Rq×n , that are known to be satisfied by any point in the target set X. One
common choice for this initial set of inequalities is the ℓ∞ -norm ball of radius R, i.e.,

−R ≤ zi ≤ R, i = 1, . . . , n,

where R is chosen large enough to contain X. At this point we know nothing more than

X ⊆ P0 = {z | Cz ⪯ d}.
Now suppose we have queried the oracle at points x(1) , . . . , x(k) , none of which were
announced by the oracle to be in the target set X. Then we have k cutting-planes
aTi z ≤ bi , i = 1, . . . , k,


Figure 5: Points x(1) , . . . , x(k) , shown as dots, and the associated cutting-planes,
shown as lines. From these cutting-planes we conclude that the target set X (shown
dark) lies inside the localization polyhedron Pk , shown lightly shaded. We can limit
our search for a point in X to Pk .

that separate x(1) , . . . , x(k) from X, respectively. Since every point in the target set must satisfy these
inequalities, we know that

X ⊆ Pk = {z | Cz ⪯ d, aTi z ≤ bi , i = 1, . . . , k}.

In other words, we have localized X to within the polyhedron Pk . In our search for a point
in X, we need only consider points in the localization polyhedron Pk . This is illustrated in
figure 5.
If Pk is empty, then we have a proof that the target set X is empty. If it is not, we choose
a new point x(k+1) at which to query the cutting-plane oracle. (There is no reason to choose
x(k+1) outside Pk , since we know that all target points lie in Pk .) If the cutting-plane oracle
announces that x(k+1) ∈ X, we are done. If not, the cutting-plane oracle returns a new
cutting-plane, and we can update the localization polyhedron by adding the new inequality.
This iteration gives the basic cutting-plane or localization algorithm:

Basic conceptual cutting-plane/localization algorithm

given an initial polyhedron P0 = {z | Cz ⪯ d} known to contain X.


k := 0.
repeat
Choose a point x(k+1) in Pk .
Query the cutting-plane oracle at x(k+1) .
If the oracle determines that x(k+1) ∈ X, quit.

Else, update Pk by adding the new cutting-plane: Pk+1 := Pk ∩ {z | aTk+1 z ≤ bk+1 }.
If Pk+1 = ∅, quit.
k := k + 1.

Provided we choose x(k+1) in the interior of Pk , this algorithm generates a strictly de-
creasing sequence of polyhedra, which contain X:

P0 ⊃ P1 ⊃ · · · ⊃ Pk ⊇ X.

These inclusions are strict since each query point x(j+1) is in the interior of Pj , but either
outside, or on the boundary of, Pj+1 .
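
The algorithm above is easy to express in code. The following Python skeleton is our own sketch; oracle and choose_point are assumed callables matching the oracle sketches in §2 and a centering rule like those in §3.3, and Pk is kept as a growing list of inequalities:

import numpy as np

def localization(oracle, choose_point, C, d, max_iters=200):
    # oracle(x) -> None if x in X, else a cut (a, b) with a^T z <= b on X.
    # choose_point(A, b) -> a point in {z | A z <= b}, or None if empty.
    A, b = np.asarray(C, float), np.asarray(d, float)
    for _ in range(max_iters):
        x = choose_point(A, b)
        if x is None:
            return None, "target set is empty"
        cut = oracle(x)
        if cut is None:
            return x, "point in target set found"
        a_new, b_new = cut
        A = np.vstack([A, a_new])        # P_{k+1} = P_k ∩ {z | a^T z <= b}
        b = np.append(b, b_new)
    return None, "iteration limit reached"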

3.2 Measuring uncertainty and progress


The polyhedron Pk summarizes what we know, after k calls to the cutting-plane oracle,
about the possible location of target points. The size of Pk gives a measure of our ignorance
or uncertainty about target points: if Pk is small, we have localized the target points to
within a small set; if Pk is large, we still have much uncertainty about where the target
points might be.
There are several useful scalar measures of the size of the localization set Pk . Perhaps
the most obvious is its diameter, i.e., the diameter d of the smallest ball that contains Pk . If
this ball is {x̂ + u | ‖u‖2 ≤ d/2}, we can say that the target set has been localized to within
a distance d/2 of the point x̂. Using this measure, we can judge the progress in a given
iteration by the reduction in the diameter of Pk . (Since Pk+1 ⊆ Pk , the diameter always
decreases.)
Another useful scalar measure of the size of the localization set is its volume. Using this
measure, we can judge the progress in iteration k by the fractional decrease in volume, i.e.,

vol(Pk+1 )/ vol(Pk ).
This volume ratio is affine-invariant: if the problem (and choice of query point) is transformed
by an affine change of coordinates, the volume ratio does not change, since if T ∈ Rn×n is
nonsingular,
vol(T Pk+1 )/ vol(T Pk ) = vol(Pk+1 )/ vol(Pk ).
(The diameter ratio does not have this property.)
The log of the volume of the uncertainty set, i.e., log vol(Pk ), has a nice interpretation
as a measure of uncertainty. Up to a scale factor and an additive constant, log vol(Pk ) gives
the number of bits required to specify any point in the set to an accuracy ε. To see this,
we first find a minimal cover of Pk with balls of radius ε. This can be done with approximately

N ≈ vol(Pk )/(an ε^n )

balls, where an is a constant that depends only on n. To specify one of these balls requires
an index with ⌈log2 N ⌉ bits. This has the form cn + log2 vol(Pk ), where cn depends on n and ε.
Using this measure of uncertainty, the log of the ratio of the volume of Pk to the volume of
Pk+1 is exactly the decrease in uncertainty.
Volume arguments can be used to show convergence of (some) cutting-plane methods. In
one standard method, we assume that the target set X contains a ball Br of radius r > 0,
and that P0 is contained in some ball BR of radius R. We show that at each step of the
cutting-plane method the volume of Pk is reduced at least by some factor γ < 1. If the
algorithm has not terminated in k steps, then we have

vol(Br ) ≤ vol(Pk ) ≤ γ^k vol(P0 ) ≤ γ^k vol(BR ),

since Br ⊆ Pk and since BR ⊇ P0 . It follows that


k ≤ n log(R/r) / log(1/γ).
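
To get a feel for the numbers, the bound is easy to evaluate. The figures below are hypothetical, with γ = 0.63 anticipating the center of gravity bound of §4.2:

import numpy as np

n, R_over_r, gamma = 10, 1e4, 0.63       # hypothetical problem parameters
k_max = n * np.log(R_over_r) / np.log(1 / gamma)
print(round(k_max))                      # about 199 oracle calls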

3.3 Choosing the query point


The cutting-plane algorithm described above is only conceptual, since the critical step, i.e.,
how we choose the next query point x(k+1) inside the current localization polyhedron Pk , is
not fully specified. Roughly speaking, our goal is to choose query points that result in small
localization polyhedra. We need to choose x(k+1) so that Pk+1 is as small as possible, or
equivalently, the new cut removes as much as possible from the current polyhedron Pk . The
reduction in size (say, volume) of Pk+1 compared to Pk gives a measure of how informative
the cutting-plane for x(k+1) is.
When we query the oracle at the point x(k+1) , we do not know which cutting-plane will
be returned; we only know that x(k+1) will be in the excluded halfspace. The informativeness
of the cut, i.e., how much smaller Pk+1 is than Pk , depends on the direction ak+1 of the cut,
which we do not know before querying the oracle. This is illustrated in figure 6, which shows
a localization polyhedron Pk and a query point x(k+1) , and two cutting-planes that could be
returned by the oracle. One of them gives a large reduction in the size of the localization
polyhedron, but the other gives only a small reduction in size.
Since we want our algorithm to work well no matter which cutting-plane is returned by
the oracle, we should choose x(k+1) so that, no matter which cutting-plane is returned by the
oracle, we obtain a good reduction in the size of our localization polyhedron. This suggests
that we should choose x(k+1) to be deep inside the polyhedron Pk , i.e., it should be some kind
of center of Pk . This is illustrated in figure 7, which shows the same localization polyhedron
Pk as in figure 6 with a more central query point x(k+1) . For this choice of query point, we
cut away a good portion of Pk no matter which cutting-plane is returned by the oracle.
If we measure the informativeness of the kth cut using the volume reduction ratio
vol(Pk+1 )/ vol(Pk ), we seek a point x(k+1) such that, no matter what cutting-plane is re-
turned by the oracle, we obtain a certain guaranteed volume reduction. For a cutting-plane
with normal vector a, the least informative is the neutral one, since a deep cut with the


Figure 6: A localization polyhedron Pk and query point x(k+1) , shown as a dot.


Two possible scenarios are shown. Left. Here the cutting-plane returned by the
oracle cuts a large amount from Pk ; the new polyhedron Pk+1 , shown shaded, is
small. Right. Here the cutting-plane cuts only a very small part of Pk ; the new
polyhedron Pk+1 is not much smaller than Pk .


Figure 7: A localization polyhedron Pk and a more central query point x(k+1) than
in the example of figure 6. The same two scenarios, with different cutting-plane
directions, are shown. In both cases we obtain a good reduction in the size of the
localization polyhedron; even the worst possible cutting-plane at x would result in
Pk+1 substantially smaller than Pk .

same normal vector leads to a smaller volume for Pk+1 . In a worst-case analysis, then, we
can assume that the cuts are neutral, i.e., have the form aT (z − x(k+1) ) ≤ 0. Let ρ denote
the volume ratio, as a function of a,
ρ(a) = vol(Pk ∩ {z | aT (z − x(k+1) ) ≤ 0}) / vol(Pk ).
The function ρ depends only on the direction of a (it is homogeneous of degree zero), and satisfies
ρ(a) + ρ(−a) = 1.
To see this, note that a neutral cut with normal a divides Pk into two polyhedra,
Pk+1 = Pk ∩ {z | aT (z − x(k+1) ) ≤ 0}, P̄k+1 = Pk ∩ {z | aT (z − x(k+1) ) ≥ 0}.
The first is the new localization polyhedron, and the second is the polyhedron of points
‘thrown out’ by the cut. The sum of their volumes is the volume of Pk . If the cut with
normal −a is considered, we get the same two polyhedra, with the new polyhedron and the
polyhedron of ‘thrown out’ points switched.
From ρ(a) + ρ(−a) = 1, we see that the worst-case volume reduction satisfies
sup_{a≠0} ρ(a) ≥ 1/2.

This means that the worst-case volume reduction (over all possible cutting-planes that can
be returned by the oracle) can never be better (smaller) than 1/2. The best possible (guar-
anteed) volume reduction we can have is 50% in each iteration.

4 Some specific cutting-plane methods


Many different choices for the query point have been proposed, which give different cutting-
plane or localization algorithms. These include:
• The center of gravity algorithm, also called the method of central sections. The query
point x(k+1) is chosen as the center of gravity of Pk .
• Maximum volume ellipsoid (MVE) cutting-plane method. The query point x(k+1) is
chosen as the center of the maximum volume ellipsoid contained in Pk .
• Chebyshev center cutting-plane method. The query point x(k+1) is chosen as the Cheby-
shev center of Pk , i.e., the center of the largest Euclidean ball that lies in Pk .
• Analytic center cutting-plane method (ACCPM). The query point x(k+1) is chosen as
the analytic center of the inequalities defining Pk .
Each of these methods has advantages and disadvantages, in terms of the computational
effort required to determine the next query point, the theoretical complexity of the method,
and the practical performance of the method.

4.1 Bisection method on R
We first describe a very important cutting-plane method: the bisection method. We consider
the special case n = 1, i.e., a one-dimensional search problem. We will describe the tradi-
tional setting in which the target set X is the singleton {x∗ }, and the cutting-plane oracle
always returns a neutral cut. The cutting-plane oracle, when queried with x ∈ R, tells us
either that x∗ ≤ x or that x∗ ≥ x. In other words, the oracle tells us whether the point x∗
we seek is to the left or right of the current point x.
The localization polyhedron Pk is an interval, which we denote [lk , uk ]. In this case, there
is an obvious choice for the next query point: we take x(k+1) = (lk + uk )/2, the midpoint of
the interval. The bisection algorithm is:

Bisection algorithm for one-dimensional search.

given an initial interval [l, u] known to contain x∗ ; a required tolerance r > 0


repeat
x := (l + u)/2.
Query the oracle at x.
If the oracle determines that x∗ ≤ x, u := x.
If the oracle determines that x∗ ≥ x, l := x.
until u − l ≤ 2r

In each iteration the localization interval is replaced by either its left or right half, i.e., it
is bisected. The volume reduction factor is the best it can be: it is always exactly 1/2. Let
2R = u0 − l0 be the length of the initial interval (i.e., 2R is its diameter). The length of the
localization interval after k iterations is then 2^−k (2R), so the bisection algorithm terminates
after exactly
k = ⌈log2 (R/r)⌉ (4)
iterations. Since x∗ is contained in the final interval, we are guaranteed that its midpoint
(which would be the next iterate) is no more than a distance r from x∗ . We can interpret
R/r as the ratio of the initial to final uncertainty. The equation (4) shows that the bisection
method requires exactly one iteration per bit of reduction in uncertainty.
It is straightforward to modify the bisection algorithm to handle the possibility of deep
cuts, and to check whether the updated interval is empty (which implies that X = ∅). In
this case, the number ⌈log2 (R/r)⌉ is an upper bound on the number of iterations required.
The bisection method can be used as a simple method for minimizing a differentiable
convex function on R, i.e., carrying out a line search. The cutting-plane oracle only needs
to determine the sign of f ′ (x), which determines whether the minimizing set is to the left (if
f ′ (x) ≥ 0) or right (if f ′ (x) ≤ 0) of the point x.
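
A minimal implementation of this line search (our own sketch) needs only the sign of the derivative:

def bisection(fprime, l, u, r):
    # Localize a minimizer of a differentiable convex f to within r,
    # given an initial interval [l, u] known to contain it.
    while u - l > 2 * r:
        x = 0.5 * (l + u)
        if fprime(x) >= 0:    # minimizer lies to the left (or at x)
            u = x
        else:                 # f'(x) < 0: minimizer lies to the right
            l = x
    return 0.5 * (l + u)

# Example: minimize f(x) = (x - 3)^2, with f'(x) = 2 (x - 3).
x_min = bisection(lambda x: 2 * (x - 3), l=-10.0, u=10.0, r=1e-6)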

4.2 Center of gravity method
The center of gravity method, or CG algorithm, was one of the first localization methods
proposed, by Newman [?] and Levin [?]. In this method we take the query point to be
x(k+1) = cg(Pk ), where the center of gravity of a set C ⊆ Rn is defined as

cg(C) = ( ∫C z dz ) / ( ∫C dz ),

assuming C is bounded and has nonempty interior. The center of gravity is invariant under
affine transformations, so the CG method is also affine-invariant.
The center of gravity turns out to be a very good point in terms of the worst-case volume
reduction factor: we always have

vol(Pk+1 )/ vol(Pk ) ≤ 1 − 1/e ≈ 0.63.

In other words, the volume of the localization polyhedron is reduced by at least 37% at each
step. Note that this guaranteed volume reduction is completely independent of all problem
parameters, including the dimension n.
This guarantee comes from the following result: suppose C ⊆ Rn is convex, bounded,
and has nonempty interior. Then for any nonzero a ∈ Rn , we have

vol( C ∩ {z | aT (z − cg(C)) ≤ 0} ) ≤ (1 − 1/e) vol(C).

In other words, a plane passing through the center of gravity of a convex set divides its volume
almost equally: the split can be no more uneven than the ratio (1 − 1/e) : 1/e, i.e.,
about 1.72 : 1.
In the CG algorithm we have

vol(Pk ) ≤ (1 − 1/e)^k vol(P0 ) ≈ 0.63^k vol(P0 ).

Now suppose the initial polyhedron lies inside a Euclidean ball of radius R (i.e., it has
diameter ≤ 2R), and the target set contains a Euclidean ball of radius r. Then we have

vol(P0 ) ≤ αn R^n ,

where αn is the volume of the unit Euclidean ball in Rn . Since X ⊆ Pk for each k (assuming
the algorithm has not yet terminated) we have

αn r^n ≤ vol(Pk ).

Putting these together we see that

αn r^n ≤ (1 − 1/e)^k αn R^n ,

so

k ≤ n log(R/r) / (− log(1 − 1/e)) ≈ 2.18 n log(R/r).
We can express this using base-2 logarithms as
k ≤ n (log 2) log2 (R/r) / (− log(1 − 1/e)) ≈ 1.51 n log2 (R/r),
in order to compare this complexity estimate with the similar one for the bisection algo-
rithm (4). We conclude that the CG algorithm requires at most 1.51n iterations per bit of
uncertainty reduction. (Of course, the CG algorithm reduces to the bisection method when
n = 1.)
Finally, we come to a very basic disadvantage of the CG algorithm: it is extremely
difficult to compute the center of gravity of a polyhedron in Rn , described by a set of
linear inequalities. (It is possible to efficiently compute the center of gravity in very low
dimensions, e.g., n = 2 or n = 3, by triangulation.) This means that the CG algorithm,
although interesting, is not a practical cutting-plane method.
Variants of the CG algorithm have been developed, in which an approximate center
of gravity, which can be efficiently computed, is used in place of the center of gravity,
but they are generally complicated. One recent and very interesting advance in this area
is the so-called ‘hit-and-run’ algorithm, which is a randomized method for computing an
approximation of the center of gravity of a convex set. This can be used to create a practical
CG method; see, e.g., [?].

4.3 MVE cutting-plane method


In the maximum volume inscribed ellipsoid method, due to Tarasov, Khachiyan, and Èrlikh
[?], we take the next iterate to be the center of the maximum volume ellipsoid that lies in
Pk , which can be computed by solving a convex optimization problem [?, §8.4.2]. Since the
maximum volume inscribed ellipsoid is affinely invariant, so is the resulting MVE cutting-
plane method.
For the MVE cutting-plane method, there is also a bound on the volume reduction factor:
vol(Pk+1 )/ vol(Pk ) ≤ 1 − 1/n.

In this case the volume reduction factor is dependent on n, and degrades (i.e., increases) for
increasing n. An analysis similar to the one given above for the CG algorithm shows that
the number of iterations is no more than
k ≤ n log(R/r) / (− log(1 − 1/n)) ≈ n^2 log(R/r),
provided the initial polyhedron lies in a ball of radius R, and the target set contains a ball
of radius r.
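
As a sketch, the MVE center can be computed with CVXPY. Our formulation below parametrizes the ellipsoid as {Bu + c | ‖u‖2 ≤ 1} with B ⪰ 0 and maximizes log det B; the formulation is standard, but the code itself is ours:

import cvxpy as cp

def mve_center(A, b):
    m, n = A.shape
    B = cp.Variable((n, n), PSD=True)   # ellipsoid shape matrix
    c = cp.Variable(n)                  # ellipsoid center
    # sup over ||u||<=1 of a_i^T (B u + c) = ||B a_i||_2 + a_i^T c <= b_i
    cons = [cp.norm(B @ A[i], 2) + A[i] @ c <= b[i] for i in range(m)]
    cp.Problem(cp.Maximize(cp.log_det(B)), cons).solve()
    return c.value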

4.4 Chebyshev center cutting-plane method
In the Chebyshev center cutting-plane method, due to Elzinga and Moore [?], the query point
x(k+1) is taken to be the Chebyshev center of the current polyhedron Pk , i.e., the center of
the largest Euclidean ball that lies inside it. This point can be computed by solving a linear
program [?, §8.5.1]. Unlike the other methods described here, the Chebyshev center cutting-
plane method is not affinely invariant. The Chebyshev center cutting-plane method can be
strongly affected by problem scaling or affine transformations of coordinates.
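
The linear program is simple enough to write out. Here is a sketch using scipy.optimize.linprog (our code): maximize r subject to aTi x + r ‖ai ‖2 ≤ bi .

import numpy as np
from scipy.optimize import linprog

def chebyshev_center(A, b):
    m, n = A.shape
    norms = np.linalg.norm(A, axis=1)
    c = np.zeros(n + 1); c[-1] = -1.0           # maximize r
    A_ub = np.hstack([A, norms[:, None]])       # a_i^T x + r ||a_i|| <= b_i
    bounds = [(None, None)] * n + [(0, None)]   # x free, r >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds)
    return res.x[:n], res.x[n]                  # center, inradius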

4.5 Analytic center cutting-plane method


The analytic center cutting-plane method (ACCPM) uses as query point the analytic center
of Pk , i.e., the solution of the problem
minimize − ∑_{i=1}^{m0} log(di − cTi x) − ∑_{i=1}^{mk} log(bi − aTi x),

with variable x, where

Pk = {z | cTi z ≤ di , i = 1, . . . , m0 , aTi z ≤ bi , i = 1, . . . , mk }.

(We have an implicit constraint that x ∈ int Pk .) ACCPM was developed by Goffin and
Vial [?] and analyzed by Nesterov [?] and Atkinson and Vaidya [?].
ACCPM seems to give a good trade-off in terms of simplicity and practical performance.
It will be described in much more detail in a separate set of notes.
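
As a rough sketch of the centering step only (a real ACCPM implementation does considerably more), the analytic center can be computed by Newton's method with a backtracking line search, starting from a strictly feasible point and assuming Pk is bounded:

import numpy as np

def analytic_center(A, b, x0, tol=1e-8, max_iters=50):
    def phi(x):                                 # log barrier for {Az <= b}
        s = b - A @ x
        return np.inf if s.min() <= 0 else -np.log(s).sum()

    x = np.asarray(x0, float)
    for _ in range(max_iters):
        s = b - A @ x                           # slacks (positive)
        g = A.T @ (1.0 / s)                     # gradient of phi
        H = A.T @ (A / s[:, None] ** 2)         # Hessian A^T diag(1/s^2) A
        dx = np.linalg.solve(H, -g)             # Newton step
        if -g @ dx < tol:                       # squared Newton decrement
            break
        t = 1.0                                 # backtracking line search
        while phi(x + t * dx) > phi(x) + 0.25 * t * (g @ dx):
            t *= 0.5
        x = x + t * dx
    return x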

5 Extensions
In this section we describe several extensions and variations on cutting-plane methods.

5.1 Multiple cuts


One simple extension is to allow the oracle to return a set of linear inequalities for each query,
instead of just one. When queried at x(k) , the oracle returns a set of linear inequalities which
are satisfied by every z ∈ X, and which (together) separate x(k) and X. Thus, the oracle can
return Ak ∈ Rpk ×n and bk ∈ Rpk , where Ak z ⪯ bk holds for every z ∈ X, and Ak x(k) ⊀ bk .
This means that at least one of the pk linear inequalities must be a valid cutting-plane by
itself. The inequalities which are not valid cuts by themselves are called shallow cuts.
It is straightforward to accommodate multiple cutting-planes in a cutting-plane method:
at each iteration, we simply append the entire set of new inequalities returned by the oracle
to our collection of valid linear inequalities for X. To give a simple example showing how
multiple cuts can be obtained, consider the convex feasibility problem

find x
subject to fi (x) ≤ 0, i = 1, . . . , m.

In §2 we showed how to construct a cutting-plane at x using any (one) violated constraint.
We can obtain a set of multiple cuts at x by using information from any set of inequalities,
provided at least one is violated. From the basic inequality

fj (z) ≥ fj (x) + gjT (z − x),

where gj ∈ ∂fj (x), we find that every z ∈ X satisfies

fj (x) + gjT (z − x) ≤ 0.

If x violates the jth inequality, this is a deep cut. If x satisfies the jth inequality, this is a
shallow cut, and can be used in a group of multiple cuts, as long as one neutral or deep cut
is present. Common choices for the set of inequalities used to form cuts are (the sketch after this list implements the third choice):
• the most violated inequality, i.e., argmaxj fj (x),

• any violated inequality (e.g., the first constraint found to be violated),

• all violated inequalities,

• all inequalities.
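
A sketch of the 'all violated inequalities' choice (our code, following the same conventions as the oracle sketches in §2):

import numpy as np

def multiple_cuts(x, fs, subgrads):
    rows, rhs = [], []
    for f, sg in zip(fs, subgrads):
        if f(x) > 0:                  # each violated constraint gives
            g = sg(x)                 # a deep cut f_j(x) + g^T (z-x) <= 0
            rows.append(g)
            rhs.append(g @ x - f(x))
    return np.array(rows), np.array(rhs)   # empty arrays if x is feasible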

5.2 Dropping or pruning constraints


The computation required to find the new query point x(k+1) grows with the number of
linear inequalities that describe Pk . This number, in turn, increases by one at each iteration
(for a single cut) or more (for multiple cuts). For this reason most practical cutting-plane
implementations include a mechanism for dropping or pruning the set of linear inequalities
as the algorithm progresses. In the conservative approach, constraints are dropped only
when they are known to be redundant. In this case dropping constraints does not change
Pk , and the convergence analysis for the cutting-plane algorithm without pruning still holds.
The progress, judged by volume reduction, is unchanged when we drop constraints that are
redundant.
To check if a linear inequality aTi z ≤ bi is redundant, i.e., implied by the linear inequalities
aTj z ≤ bj , j = 1, . . . , m, we can solve the linear program

maximize aTi z
subject to aTj z ≤ bj , j = 1, . . . , m, j 6= i.

The linear inequality is redundant if and only if the optimal value is less than or equal to
bi . Solving a linear program to check redundancy of each inequality is usually too costly,
and therefore not done.
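
In code, the check is one LP per inequality. Here is a sketch with scipy.optimize.linprog (ours; a practical code would add a numerical tolerance):

import numpy as np
from scipy.optimize import linprog

def is_redundant(A, b, i):
    # maximize a_i^T z subject to the other inequalities
    keep = np.arange(len(b)) != i
    res = linprog(-A[i], A_ub=A[keep], b_ub=b[keep],
                  bounds=[(None, None)] * A.shape[1])
    # redundant iff the LP is bounded with optimal value <= b_i
    return res.status == 0 and -res.fun <= b[i]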
In some cases there are other methods that can identify (some) redundant constraints
with less computational effort. For example, suppose that the ellipsoid

E = {F u + g | ‖u‖2 ≤ 1} (5)

is known to cover the current localization polyhedron P. (In the MVE cutting-plane method,
such an ellipsoid can be obtained by expanding the maximum volume ellipsoid inside Pk by
a factor of n about its center.) If the maximum value of aTi z over E is smaller than or equal
to bi , i.e.,
aTi g + ‖F T ai ‖2 ≤ bi ,
then the constraint aTi z ≤ bi is redundant.
In practical approaches, constraints are dropped even when they are not redundant, or
at least not known to be redundant. In this case the pruning can actually increase the size of
the localization polyhedron. Heuristics are used to rank the linear inequalities in relevance,
and the least relevant ones are dropped first. One method for ranking relevance is based on
a covering ellipsoid (5). For each linear inequality we form the fraction
(bi − aTi g) / ‖F T ai ‖2 ,
and then sort the inequalities by these factors, with the lowest numbers corresponding to
the most relevant, and the largest numbers corresponding to the least relevant. Note that
any inequality for which the fraction exceeds one is in fact redundant.
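
A sketch of pruning by this relevance ranking (our code; F and g describe the covering ellipsoid (5)):

import numpy as np

def prune(A, b, F, g, N):
    # relevance fraction (b_i - a_i^T g) / ||F^T a_i||_2 for each row
    frac = (b - A @ g) / np.linalg.norm(A @ F, axis=1)
    keep = np.argsort(frac)[:N]       # smallest fractions = most relevant
    return A[keep], b[keep]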
One common strategy is to keep a fixed number N of linear inequalities, by dropping
as many constraints as needed, in each iteration, to keep the total number fixed at N .
The number of inequalities kept (i.e., N ) is typically between 3n and 5n. Dropping non-
redundant constraints complicates the convergence analysis of a cutting-plane algorithm,
sometimes considerably, so we will not consider pruning in our analysis. In practice, pruning
often improves the performance, in terms of the number of iterations required to obtain a
given accuracy, despite the increase in the localization polyhedron that occurs when non-
redundant constraints are dropped. In terms of computational effort, which depends on the
number of iterations as well as the number of inequalities in the localization polyhedron, the
improvement due to pruning can be dramatic.

5.3 Nonlinear cuts


It is straightforward to extend the idea of cutting-planes, which are described by linear
inequalities that hold for each point in X, to cutting-sets, which are described by more
complex convex inequalities. For example, a quadratic cut has the form of a convex quadratic
inequality that every z ∈ X is known to satisfy. In this case the localization set is no longer
a polyhedron. However, many of the same ideas work; for example, the analytic center of
such a set is readily computed, so ACCPM can be extended to handle nonlinear cuts.

6 Epigraph cutting-plane method


For a convex optimization problem (as opposed to a quasiconvex problem), it is usually
better to apply a cutting-plane method to the epigraph form of the problem, rather than
directly to the problem.

We start with the problem

minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m,        (6)

where f0 , . . . , fm are convex. In the basic cutting-plane method outlined above, we take the
variable to be x, and the target set X to be the set of optimal points. Cutting-planes are
found using the methods described in §2.
Suppose instead we form the equivalent epigraph form problem

minimize t
subject to f0 (x) ≤ t (7)
fi (x) ≤ 0, i = 1, . . . , m,

with variables x ∈ Rn and t ∈ R. We take the target set to be the set of optimal points for
the epigraph problem (7), i.e.,

X = {(x, f0 (x)) | x optimal for (6)}.

Let us show how to find a cutting-plane, in Rn+1 , for this version of the problem, at the
query point (x, t). First suppose x is not feasible for the original problem, e.g., the jth
constraint is violated. Every feasible point satisfies

0 ≥ fj (z) ≥ fj (x) + gjT (z − x),

where gj ∈ ∂fj (x), so we can use the cut

fj (x) + gjT (z − x) ≤ 0

(which doesn’t involve the second variable). Now suppose that the query point x is feasible.
Evaluate a subgradient g ∈ ∂f0 (x). If g = 0, then x is optimal; otherwise, for any (z, s) ∈
Rn+1 feasible for the problem (7), we have

s ≥ f0 (z) ≥ f0 (x) + g T (z − x).

Since x is feasible, f0 (x) ≥ p⋆ , where p⋆ is the optimal value of the problem, i.e., the optimal value of the variable t in (7). Thus, we
can construct two cutting-planes in (z, s):

f0 (x) + g T (z − x) ≤ s, s ≤ f0 (x).
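
A sketch of the feasible-point case in code (ours), with the epigraph variable ordered as (z, s) ∈ Rn+1 and each cut returned as (a, rhs), meaning aT (z, s) ≤ rhs:

import numpy as np

def epigraph_objective_cuts(x, f0_x, g0):
    # f0(x) + g0^T (z - x) <= s  <=>  [g0, -1]^T (z, s) <= g0^T x - f0(x)
    cut1 = (np.append(g0, -1.0), g0 @ x - f0_x)
    # s <= f0(x)                 <=>  [0, 1]^T (z, s) <= f0(x)
    cut2 = (np.append(np.zeros_like(x), 1.0), f0_x)
    return [cut1, cut2]

(For an infeasible query point, one uses the feasibility cut of §2.2, padded with a zero coefficient for s.)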

7 Lower bounds and stopping criteria


In this section we describe a general method for constructing a lower bound on the optimal
value of a convex problem, assuming we have evaluated a subgradient of its objective and
constraint functions at some points. The method involves solving a linear program using

data collected from the subgradient evaluations. This method can be used in cutting-plane
methods to give a non-heuristic stopping criterion. In the analytic center cutting-plane
method, a lower bound based on this one can be computed very cheaply at each iteration.
Consider a convex function f . Suppose we have evaluated f and a subgradient of f at
points x(1) , . . . , x(q) . We have, for all z,

f (z) ≥ f (x(i) ) + g (i)T (z − x(i) ), i = 1, . . . , q,

and so
f (z) ≥ f̂(z) = max_{i=1,...,q} ( f (x(i) ) + g (i)T (z − x(i) ) ).

The function f̂ is a convex piecewise-linear global underestimator of f .


Now suppose that we use a cutting-plane method to solve the problem

minimize f0 (x)
subject to fi (x) ≤ 0, i = 1, . . . , m        (8)
Cx ⪯ d,

where fi are convex. After k steps, we have evaluated the objective or constraint functions,
along with a subgradient, k times (or more, if multiple cuts are used). As a result, we can
form piecewise-linear approximations f̂0 , . . . , f̂m of the objective and constraint functions.
Now we form the problem

minimize f̂0 (x)
subject to f̂i (x) ≤ 0, i = 1, . . . , m        (9)
Cx ⪯ d.

Since the objective and constraint functions are piecewise-linear, this problem can be trans-
formed to a linear program. Its optimal value is a lower bound on p⋆ , the optimal value of
the problem (8), since f̂i (x) ≤ fi (x) for all x and i = 0, . . . , m.
Computing this lower bound requires solving a linear program, and so is relatively ex-
pensive. In ACCPM, however, we can easily construct a lower bound on the problem (9), as
a by-product of the analytic centering computation, which in turn gives a lower bound on
the original problem (8).
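
As a sketch (our code, using CVXPY), the bound (9) can be computed from the stored points, values, and subgradients of each function:

import cvxpy as cp
import numpy as np

def lower_bound(data, C, d):
    # data[i] = (X, F, G): query points (rows of X), values F, and
    # subgradients (rows of G) collected for function f_i; f_hat_i(x)
    # is then max_j F[j] + G[j]^T (x - X[j]), a maximum of affine functions.
    n = C.shape[1]
    x = cp.Variable(n)

    def f_hat(X, F, G):
        return cp.max(G @ x + (F - np.sum(G * X, axis=1)))

    X0, F0, G0 = data[0]
    cons = [C @ x <= d]
    cons += [f_hat(Xi, Fi, Gi) <= 0 for (Xi, Fi, Gi) in data[1:]]
    prob = cp.Problem(cp.Minimize(f_hat(X0, F0, G0)), cons)
    prob.solve()
    return prob.value        # a lower bound on p*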

Acknowledgments
Lin Xiao and Joëlle Skaf helped develop the material here.
