

Local Search in Smooth Convex Sets

Ravi Kannan, Yale University, Department of Computer Science, New Haven, CT 06510, [email protected] — Andreas Nolte, Universität zu Köln, Institut für Informatik, 50931 Köln, [email protected]

Abstract
In this paper we analyse two very simple techniques to minimize a linear function over a convex set. The first is a deterministic algorithm based on gradient descent. The second is a randomized algorithm which makes a small local random change at every step. The second method can be used when the convex set is presented by just a membership oracle, whereas the first requires something similar to a separation oracle. We define a simple notion of smoothness of convex sets and show that both algorithms provide a near optimal solution for smooth convex sets in polynomial time. We describe several application examples from Linear and Stochastic Programming where the relevant sets are indeed smooth and thus our algorithms apply. The main point of the paper is that such simple algorithms yield good running time bounds for natural problems.

of steps determined in advance. We will develop a framework, namely the notion of smooth sets, in order to analyse these local search techniques, and give a number of application examples from Linear and Stochastic Programming. The surprising fact is that these simple methods work very well in a number of interesting cases. The first example where our techniques apply is to linear programs

max cx

subject to

Ax ≤ b

which have the special property that (here and below we use the notation that A_i denotes the i-th row of A ∈ ℝ^{m×n})

A_j · A_i ≥ (−1 + ε)·|A_i||A_j|

for a constant ε > 0. This stipulates that there must be at least a constant angle between any two constraints of A, i.e.,
that there are no sharp corners in the polyhedron. For such linear programs, we show that our algorithm finds a nearly feasible, nearly optimal point within additive error ε in Õ(m²nD²/ε²) arithmetic operations. D denotes the diameter of the set {x | Ax ≤ b} ∩ {x | cx ≥ cx_f}, and the Õ notation suppresses, as usual, logarithmic factors. Positive or up-monotone Linear Programming (i.e. where A_ij ≥ 0 for all i, j) is obviously a special case. In this special case, our algorithms can find an approximate solution z that satisfies the inequalities in Õ(m²n² log² D / (ε² · Opt)) arithmetic operations. Opt is the value of the objective function of the optimal solution, and D is here determined by the value of an initially given feasible solution x_f. If we do not have an initially feasible solution, we can find one by setting (x_f)_j = max_i b_i/a_{ij}; the algorithm is still polynomial. Our second example is from a general class of problems called Constrained Probabilistic Programming [14], on which there is a substantial literature. Here, as in the deterministic case, we are given a cost vector c and a matrix A. But here we have a random vector b with a probability distribution P, and the task is to minimize cx over the feasible

1. Introduction
In this paper we analyse two geometric approaches to minimize a linear function cx over a convex set. The first algorithm can be used when the set is given as a level set M = {x : F(x) ≤ β} of a convex function F : ℝⁿ → ℝ. A step of the algorithm is simply described: from the current feasible point x (inside M), we compute y = x − α∇F(x) − γc, where α, γ are small positive reals, and if y is in M, we replace x by y and repeat; otherwise we terminate. This of course requires the computation of the gradient of F. The second algorithm applies when the convex set is only given by a membership oracle; in the second algorithm, we generate a new point y in a ball with center x and a certain radius r uniformly at random. If y ∈ M and y has a better objective function value, we go to y; otherwise, we do not. The algorithm terminates after a number

set M = {x : P(Ax ≥ b) ≥ α} for a fixed α < 1. Under mild and natural conditions on the density of b, one can see that M is convex, and we can prove a running time bound of Õ((n^2.5·m/ε²) · min{min_{y∈M} cy, cx_f})

calls to the memberof O ship oracle of M . We will see later that such a membership oracle is naturally available. These bounds can be improved in the case of independent distribution of the components of b by a factor of n0:5m. A similar approach is possible for another problem in Constrained Probabilistic Programming, namely the Component Commonality Problem, where also a fast running time of our method can be proved [10]. Finally, we will describe a method to make a general class of convex sets smooth, that are given by a membership oracle. This implies that our simple optimization algorithm can also be applied to certain, not necessarily smooth sets with good running time bounds. Furthermore, this method improves the local conductance [8] of the convex set considerably without changing the set much. This might have some implications for the running time of the random sampling algorithm, that approximates the volume of a convex set, but this is subject of further research.


Each iteration involves the computation of F as well as the computation of the first and second derivatives and the solution of a linear system in n variables [15]. The parameter ν depends on the barrier function and is at least k if there exists an affine subspace in M that contains a vertex where precisely k linearly independent constraints are active.

The running time of our algorithm is not directly comparable to the running time of the method of centers. Our gradient descent algorithm does not involve the computation of the second derivative (i.e. the Hessian of F) and yields a solution which is approximately optimal in Õ(η²D²/ε²) evaluations of F. D denotes the diameter of M and η is a parameter that measures smoothness. Moreover, despite being much simpler, our technique can also be applied in the case where M is given only by a membership oracle. Here, the method of centers cannot be applied directly, since the first and second derivatives would have to be computed with high precision. In the case of Linear Programming it is of course possible to solve the optimization problem exactly in polynomial time with the ellipsoid method [6] or an interior point method [15]. However, the best known time bound for general linear programming is O(m^1.5·n·L) arithmetic operations (VAIDYA [16]). L is a parameter that is bounded by the size of the problem [15]. Given the assumptions on the LP described above, our algorithm outperforms VAIDYA's algorithm if ε is large (e.g. a constant) and the diameter is small. In the special case of Positive Linear Programming our algorithm cannot improve the best known running time Õ(mn/ε⁴) of LUBY and NISAN [11]. But their Lagrangian-based algorithm is not as simple and has to cleverly exploit the structure of the problem. In the case where our convex set is given by an oracle we could, as is common in non-linear optimization, also use a gradient descent approach by estimating the gradient. Especially in stochastic optimization, where these problems arise frequently, this is the usual approach in practice to solve these problems ([17]). Because of the difficulty of getting an accurate estimate of the components of the gradient in a reasonable amount of time, apart from asymptotic results, only empirical running time results are known [17].
One could also use the ellipsoid method to solve the problem, since membership and separation oracles are polynomially equivalent [6]. But the conversion needs a large number of steps, and our approach here is more efficient. In the special case of the Component Commonality Problem our approach yields, despite being much simpler, an improved running time in comparison to the best known result so far, from KANNAN, MOUNT and TAYUR [9].

2. Comparison
While of course gradient descent methods are very old, the analysis presented in this paper is new. This seems to be one of the first non-asymptotic results on gradient descent methods in a non-trivial geometric setting. Furthermore, we will develop a notion of smooth sets, for which these simple methods have provably good running time bounds, that is very generally applicable. This is, in addition to the simplicity of the technique, the reason for the interest in this approach. An alternative approach to the problem of minimizing a linear function over a convex set is the method of centers of NESTEROV and NEMIROVSKY [13]. It can be applied in the case where the convex set is given as a level set M = {x : F(x) ≤ β} of a given convex function F : ℝⁿ → ℝ. The method of centers is based on a logarithmic barrier function. It proceeds in stages, and the Newton method is used to construct a series of feasible points in the interior of M that follow approximately a certain central path [15]. The barrier function of M is tightly connected to F and has to be self-concordant (i.e. the third derivative is bounded above by a multiple of the second derivative), which implies fast convergence of the Newton method in a certain area, and ν-self-limiting, which implies a certain lower bound on the size of the area of fast convergence of the Newton method. Given a suitable starting point x_f, this method can provide a feasible point x_K with |cx_K − cx_opt| ≤ ε in K = O(√ν · ln((cx_f − cx_opt)/ε)) iterations.

3. Smoothness
In the following we will describe a suitable condition on our convex sets that will imply a fast running time of our local improvement algorithms.

Definition 1 Let F : ℝⁿ → ℝ be a partially differentiable function and μ, r, β ∈ ℝ₊. The set M = {x ∈ ℝⁿ : F(x) ≤ β} is called (μ, r)-smooth iff M is convex and for every x ∈ {x ∈ ℝⁿ : F(x) = β} the set

Before we describe our approximation algorithm, we want to make an observation that is used several times in the analysis. We prove a lower bound on the angle between the tangent hyperplane and the iso-hyperplane (same values of the objective function as x) at an arbitrary point x at the boundary of M that is still quite far away from the optimum in terms of the objective function. Let x be at the boundary of M (i.e. F(x) = β) and Δ = cx − min_{y∈M} cy. We assume c to be a unit vector.

Lemma 2 Suppose H₁ is the tangent hyperplane of M at x with normal vector t = −∇F(x)/|∇F(x)| and H₂ is the hyperplane with normal vector c and x ∈ H₂. Let D be the diameter of the set M ∩ {y ∈ ℝⁿ | cy ≤ cx}. Then

{y ∈ ℝⁿ : |y − x|₂ ≤ r and (x − y) · ∇F(x)/|∇F(x)|₂ ≥ μ}

is contained in M.

It is easy to see that this definition reflects the intuitive way to think about smoothness. If a set M = {x ∈ ℝⁿ | F(x) ≤ β} is (μ, r)-smooth, then at every border point the cap of a ball with radius r is contained in the set (see Figure 1).

t · c = cos γ ≥ Δ/D.
Proof: Elementary geometry.


Roughly speaking, Lemma 2 implies that if we are quite far away from the optimum in terms of the objective function, but not too far away in terms of L₂ distance, we can expect a quite large angle between the tangent hyperplane and the iso-hyperplane.


4. The gradient descent method


Here, we assume that F itself and the gradient ∇F can be computed efficiently. To avoid messy notation we assume further that both evaluations take the same amount of time, which is at least n. We will give a running time bound in terms of the evaluations of F. The following condition is our condition on the smoothness of M in order to optimize over M efficiently:

Figure 1. A smooth set

To be more precise, let H be the hyperplane with normal vector ∇F(x) and x − μ·∇F(x)/|∇F(x)| ∈ H. Then the part of the ball that lies, compared to x, on the other side of the hyperplane should be fully contained in M. This prevents the border of M from having sharp corners, if we assume μ ≤ r. Obviously M is just a halfspace if μ = 0 and r > 0, and the definition is meaningless if μ ≥ r. This implies that the ratio of μ and r is a measure of smoothness. The next lemma is a useful criterion for smoothness.

Lemma 1 Suppose F : ℝⁿ → ℝ is twice differentiable, μ, r ∈ ℝ₊, and x satisfies F(x) = β. Let y ∈ B(x, r) (the ball with center x and radius r according to the L₂ norm) and let G
G(λ) = F(x + λ(y − x)) for λ ∈ [0, 1]. If

2μ·|∇F(x)|₂ ≥ G″(λ)    (1)

for every λ ∈ [0, 1] and every such pair (x, y), then M is (μ, r)-smooth.

The smoothness condition that we require of M in this section is:

∃η ∀r ≤ 1 ∀μ ≥ ηr² : M is (μ, r)-smooth.    (2)
It says that if we choose μ to be a certain fixed fraction of r (i.e. we want to have a certain cap inside the convex set), then we have to choose the radius accordingly. Therefore η is a direct measure of the curvature. The more our level curve is bent, the smaller the chosen radius and the larger η have to be, and vice versa. The following theorem summarizes the running time analysis of our gradient descent algorithm.

Theorem 1 Let x_f ∈ M and D be the L₂-diameter of the set M ∩ {y ∈ ℝⁿ | cy ≤ cx_f}. The gradient descent algorithm will find a z ∈ M with cz ≤ min_{y∈M} cy + ε in

O((η²D²/ε²) · log²(D/ε)) evaluations of F.

Proof: Easy application of the second-order Taylor theorem. □
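As a quick sanity check of Lemma 1, consider the Euclidean unit ball; this worked example is an addition, not part of the original text:

```latex
F(x) = |x|_2^2,\qquad \beta = 1,\qquad |\nabla F(x)|_2 = 2 \text{ whenever } F(x)=\beta,
\qquad
G(\lambda) = |x+\lambda(y-x)|_2^2 \;\Rightarrow\; G''(\lambda) = 2|y-x|_2^2 \le 2r^2 \text{ for } y \in B(x,r).
```

Condition (1), 2μ|∇F(x)|₂ ≥ G″(λ), therefore holds as soon as 4μ ≥ 2r², i.e. μ ≥ r²/2: the unit ball is (μ, r)-smooth for every μ ≥ r²/2 and satisfies condition (2) with η = 1/2.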

Let min_{y∈M} cy ≥ C (this is only a technical assumption enabling us to do a binary search for the correct value of the minimum later). We assume x ∈ M to be a border


Procedure Decrease(x, Δ)
begin
  λ = Δ/D;

we are given an initially feasible point x_f (i.e. Ax_f ≥ b)

  z₀ = x − (λ/6) · ∇F(x)/|∇F(x)|;
  Find next border point z in direction
    −c + (c · ∇F(x)/|∇F(x)|²) · ∇F(x)   (line search);
  Return z;
end;

Figure 2. The procedure Decrease
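A runnable sketch of Procedure Decrease follows. The step-direction constants (the λ/6 inward step, the tangent-projected −c search direction) are reconstructed from the partly garbled figure and should be treated as assumptions; F, gradF, the level β and the diameter D are supplied by the caller.

```python
import numpy as np

def find_border(z, d, F, level, t_max=10.0, iters=60):
    """Bisection line search: largest t in [0, t_max] with F(z + t*d) <= level."""
    if F(z + t_max * d) <= level:
        return z + t_max * d
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(z + mid * d) <= level:
            lo = mid
        else:
            hi = mid
    return z + lo * d

def decrease(x, delta, c, F, gradF, level, D):
    """One step of Procedure Decrease (sketch): move from the border point x
    a little into the interior along -grad F, then line-search along the
    projection of -c onto the tangent hyperplane back to the border."""
    lam = delta / D
    g = gradF(x)
    n_hat = g / np.linalg.norm(g)      # outer unit normal at x
    z0 = x - (lam / 6.0) * n_hat       # step into the interior of M
    d = -c + (c @ n_hat) * n_hat       # -c projected onto the tangent plane
    d /= np.linalg.norm(d)             # (degenerate only when c is normal to M)
    return find_border(z0, d, F, level)
```

On the unit ball M = {x : |x|² ≤ 1} with c = e₁, starting at the border point (0, 1), one call already moves a constant fraction of the way toward the optimum (−1, 0).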

point of M with Δ = cx − C > 0 (we have an initially given feasible point x_f ∈ M, and finding a border point is easy using line search). In the following we will describe a single step of our algorithm and analyse the improvement. We try to optimize the objective function over that part of the ball with radius r ∈ ℝ₊ that is, due to the smoothness condition, guaranteed to lie inside M.

We will denote the set of feasible points {x | Ax ≥ b} by P. The basic idea behind applying the gradient descent algorithm is that we define a set that contains P, is not much larger than P, but is smooth, without sharp corners. We optimize over this larger set using the gradient descent method described above. In order to state our result we define for x ∈ ℝⁿ the quantity (A_i x − b_i)₋ = −A_i x + b_i if A_i x < b_i, and 0 otherwise, i.e. the distance of x to the hyperplane A_i y = b_i if x is not on the right side. We consider F(x) = Σ_{i=1}^m ((A_i x − b_i)₋)² and define M = {x ∈ ℝⁿ | F(x) ≤ β} for a given β > 0 as the extension of P, which is still convex. Given ε, β > 0, our gradient descent algorithm will find a point z s.t. cz ≤ min_{Ax≥b} cx + ε and F(z) ≤ β, i.e. the objective function value of z is within a small range of the actual optimum and the sum of the squared violations of the inequalities is bounded by β. The main object of our efforts in this section is to prove the following Proposition.
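The penalty function F and its gradient are straightforward to compute; a minimal NumPy sketch, assuming the covering form Ax ≥ b of this section:

```python
import numpy as np

def penalty(A, b, x):
    """F(x) = sum_i max(b_i - A_i x, 0)^2: squared violation of Ax >= b."""
    v = np.maximum(b - A @ x, 0.0)
    return float(v @ v)

def penalty_grad(A, b, x):
    """grad F(x) = -2 * A^T max(b - A x, 0)."""
    v = np.maximum(b - A @ x, 0.0)
    return -2.0 * (A.T @ v)
```

M = {x : penalty(A, b, x) ≤ β} then serves as the smooth outer approximation of P over which the gradient descent method runs.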

Proposition 1 Suppose x is a point at the border of M. Let λ = Δ/D and cx = C + Δ. Procedure Decrease (see Figure 2) finds a z ∈ M with cz ≤ cx − λ²/12 in O(log(D/ε)) evaluations of F.

Proof: (Sketch) By defining r = λ/3 and μ = r/2 we obtain (μ, r)-smoothness of M. Using this and the fact about the angle in Lemma 2, it is easy to prove the claim. □

By repeated application of Proposition 1, and by doing a binary search on C to find a C near the minimal value min_{y∈M} cy, the main result of this section, Theorem 1, follows.

Proposition 2 Suppose x_f ∈ P is the initially feasible solution and D is the (finite) diameter of the set {x | Ax ≥ b} ∩ {x | cx ≤ cx_f}. Let ε, β > 0. The gradient descent algorithm described in the last section can find a point z with c̃z ≤ min_{y∈P} c̃y + ε and Σ_j ((A_j z − b_j)₋)² ≤ β in Õ((m²nD²/ε²) · log² D) arithmetic operations.

As a first step we show that M is smooth.

Lemma 3 For all r ≤ 1 and all μ ≥ (m/(2√(βε)))·r², M is (μ, r)-smooth.

4.1. Application example: Linear Programming


In the following we will describe an application example that does not fit into this framework in the obvious way. We will give an approximation algorithm for certain types of linear programs, whose feasible regions are obviously not smooth in general. But it turns out that we can consider a slightly bigger, but smooth, set and apply our method to this set. In general we are interested in solving the following problem

Proof: Let r ≤ 1 and μ ≥ (m/(2√(βε)))·r² be given, and assume F(x) = β (i.e. x is at the border of M). An easy calculation yields ∇F(x) = −(Σ_{i=1}^m 2·(A_i x − b_i)₋ · a_{ik})_{k=1,…,n}. The assumptions on A allow us to get a lower bound on the length of the gradient: |∇F(x)|₂ ≥ 2√(βε). Let y ∈ B(x, r) and G(λ) = F(x + λ(y − x)), G : [0, 1] → ℝ. We get

G″(λ) ≤ Σ_{i=1}^m 2·(A_i(y − x))² ≤ 2m·|y − x|² ≤ 2mr².

Using Lemma 1 the result follows. □

With Theorem 1 the proof of Proposition 2 is complete.

4.2. Positive Linear Programming


In this subsection we are concerned with the special case where each entry of the matrix A is positive and P is contained in the positive orthant. This is a special case of the last section, since the scalar product of any two rows is even bounded below by 0. It is easy to see that we may assume that the cost vector c has all strictly positive components. After rescaling, if necessary, we may also assume that c = 1̃/√n, the normalized vector of 1s. In order to get hold of the

min cx   s.t.   Ax ≥ b,

where c ∈ ℝⁿ is a unit cost vector, A is an m × n matrix with normalized rows |A_i| = 1, and b ∈ ℝᵐ. In order to apply the gradient descent method we have to put some conditions on the instances. We assume

A_j · A_i ≥ −1 + ε   for a constant ε > 0

(i.e. there must be a constant angle between the hyperplanes)
diameter D of the set {x | Ax ≥ b} ∩ {x | cx ≤ cx_f}, we consider the initially given feasible solution x_f ∈ P. By defining D = 2·1̃·x_f it is easy to see that every solution y ∈ P with 1̃·y ≤ 1̃·x_f must satisfy |y − x_f|₂ ≤ D, since P is contained in the positive orthant. The following theorem is the main result of this section.

Theorem 2 Suppose x_f is an initially feasible point in P and D = 2·1̃·x_f. The gradient descent algorithm can find a z ∈ P with c̃z ≤ (1 + ε)·min_{y∈P} c̃y in Õ((m²n² log² D · log n · log(1/ε)) / (ε² · Opt)) arithmetic operations.

As a first step we are concerned with a proper choice of β in order to round a solution x ∈ M to a feasible solution without losing much in terms of the objective function. After the termination of our process, analogously to the last section, we get an x ∈ M with (1̃/√n)·x ≤ min_{y∈P} (1̃/√n)·y + ε. We

Procedure Randomgreedy(x_f, ε, membership oracle for M, diameter D)
begin
  x = x_f; U = c̃x; L = U − D;
  Repeat

have to round it to an x′ ∈ P with (1̃/√n)·x′ ≤ min_{y∈P} (1̃/√n)·y + 2ε. Before we round, we carry out a preprocessing step to get closer to the set of feasible points P without making our objective function much worse. The idea is to repeatedly calculate the gradient of F at the current point and in every step go a little bit along the negative gradient towards P.

    Repeat 50·D²·n/(U − L)² times:
      r = (U − L)/(50√n);
      Repeat (basic step)
        generate a random x′ ∈ B(x, r);
        check whether x′ ∈ M; check whether cx′ < cx;
      Until x′ ∈ M and cx′ < cx;
      Find border point x in direction x′;
    End (repeat);
    If c̃x ≤ (2/3)U + (1/3)L then U = (2/3)U + (1/3)L
    else L = (2/3)L + (1/3)U
  Until U − L ≤ ε;
end;
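A compact executable sketch of the random greedy procedure of Figure 3. The stage length, the radius schedule r = (U − L)/(50√n), and the omission of the border-point line search are simplifications of the partly garbled figure; `member` is the membership oracle.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_ball_point(n):
    """Uniform random point in the unit ball of R^n."""
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)
    return (rng.random() ** (1.0 / n)) * u

def random_greedy(x_f, c, member, D, eps, steps_per_stage=2000):
    """Binary search on the optimal value combined with random local search:
    U and L bracket the optimum, and each stage shrinks U - L by 2/3."""
    x = np.array(x_f, dtype=float)
    n = len(x)
    U, L = float(c @ x), float(c @ x) - D
    while U - L > eps:
        r = (U - L) / (50.0 * np.sqrt(n))
        for _ in range(steps_per_stage):
            y = x + r * random_ball_point(n)
            if member(y) and c @ y < c @ x:   # accept only feasible improvements
                x = y
        if c @ x <= (2.0 * U + L) / 3.0:
            U = (2.0 * U + L) / 3.0
        else:
            L = (2.0 * L + U) / 3.0
    return x
```

Since each stage shrinks U − L by a factor 2/3 regardless of the branch taken, termination is guaranteed after O(log(D/ε)) stages.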

Proposition 3 Assume β ≤ ε²/(16 log² n) and let x ∈ M with F(x) ≤ β. We can round x to a z ∈ P with (1̃/√n)·z ≤ (1̃/√n)·x + ε in O(m²n log n) arithmetic operations.

Figure 3. The algorithm random greedy

objective function value, we move in the direction of this point (binary search for the next border point) and iterate. Otherwise we repeat the random generation. This means we are just looking for a random direction in which the objective function might improve. The algorithm is described in detail in Figure 3. The following condition is again our condition on M in order to optimize efficiently.

Proof: (Sketch) Using the idea of repeatedly calculating the gradient we can find, starting from an x ∈ M with F(x) = β, a y ∈ M with |x − y|₂ ≤ 3√β·log n and F(y) ≤ β/(4n) in O(m²n log n) arithmetic steps. Using this and applying the facts that P is up-monotone (i.e. x ∈ P and y ≥ x componentwise implies y ∈ P) and that F(x) ≥ inf_{y∈P} |x − y|₂², the result follows. □

Finally we use a scaling approach to prove Theorem 2, i.e. we apply our gradient descent method repeatedly to a rescaled polyhedron. The analysis is technical and is omitted from this abstract.

∃η ∀r ≤ 1 ∀μ ≥ ηr² : M is (μ, r)-smooth.    (3)

In the following we are concerned with the proof of the main theorem.

Theorem 3 Suppose x_f ∈ M and D is the diameter of the set M ∩ {y ∈ ℝⁿ | cy ≤ cx_f}. The random greedy algorithm described in Figure 3 will find a z ∈ M with

5. Membership Oracle
In this section we do not assume that we can compute F and the gradient of F efficiently. We just assume that we are given a membership oracle for the set M that can decide whether a given point x ∈ ℝⁿ belongs to M or not. Our aim is, as in the last section, to minimize a linear function over M, while we assume the existence of an initially given point x_f ∈ M. Instead of computing the gradient at a feasible point x and moving perpendicular to it in the direction of the objective function, we just generate a point in a ball of a certain radius r and center x uniformly at random. If the point is feasible (i.e. in M) and has a better

cz ≤ min_{y∈M} cy + ε in Õ(η·D·n/ε²) calls to the membership oracle.

The simple greedy strategy was outlined above. We will proceed with the analysis of a single step. The aim is to show that we indeed get an improvement with a certain probability depending on the value of the objective function at the current point. We assume that our current point x is at the border of M. First, we try to get a lower bound on the probability of getting a feasible point in M by randomly picking a point in a ball B(x, r) with center x and radius r. Let

Δ = cx − min_{y∈M} cy > 0.

Lemma 4 Suppose λ = Δ/D, r = Δ/(50√n), and B_nf = {y ∈ ℝⁿ | |y − x| ≤ r and F(y) > F(x) = β}. Then vol(B_nf(x, r)) ≤ (1/2 + λ/8)·vol(B(x, r)).

Furthermore, we are given a cost vector c ∈ ℝⁿ and the problem is to minimize cx under the condition

Proof: Let μ = ηr². The tangent hyperplane cuts the generated ball in half. According to the definition of smoothness and (3), the volume of the part of B_nf(x, r) consisting of points y on the right side of the hyperplane with F(y) > F(x) (i.e. y ∉ M) is bounded above by μ · vol_{n−1}(B(x, r)) = μ·c̃·r^{n−1} ≤ (λ/8)·vol_n(B(x, r)), where c̃ is a certain ball-specific constant. Together with the half of the ball above the tangent hyperplane this yields the claim. □

P(Ax ≥ b) ≥ α

for a certain fixed α < 1. This natural extension of Linear Programming was already considered by DANTZIG [3], and there exists a substantial literature on this topic (see [14] for an overview). The general assumption is that membership queries can be answered efficiently. This means that we are given a subroutine which can decide for a given x ∈ ℝⁿ whether P(Ax ≥ b) ≥ α. In practical settings we could use a sampling approach, if a set of sample vectors b drawn according to the density function is given. Or we could use a Markov chain approach [1], if the density is known. But we do not go into the details here. In order to use our local search approach we have to put certain conditions on the instances to get smooth sets, since obviously a sharply concentrated density would cause our set to be non-smooth. We assume


We now consider the tangent hyperplane at x and the iso-hyperplane {y | cy = cx}. By Lemma 2 we know that there is at least an angle with cosine Δ/D between the corresponding normal vectors. We need the following obvious lemma.

Lemma 5 Let B_b = {y ∈ B(x, r) | −∇F(x)·(y − x) ≥ 0 and cy ≤ cx} be the set of better points in the ball that lie on the right side of the tangent hyperplane. Then vol(B_b) ≥ (λ/2)·vol(B(x, r)).
In the following we want to estimate how much progress we can expect in each step.

– A ≥ 0 and h ≥ 0

Lemma 6 Let B_p = {y ∈ B(x, r) | cx ≥ cy ≥ cx − λr/√n}. Then vol(B_p) ≤ (λ/8)·vol(B(x, r)).

Proof: Analogous to the proof of Lemma 4.

– h is log-concave and ∃c₁, c₂ ∈ ℝ₊ ∀x ∈ ℝᵐ ∀y ∈ B(x, c₁): h(x)/h(y) ≤ 1 + c₂·|x − y| (this is a Lipschitz condition for log h, which guarantees smoothness of h)
– h is also a density function on every (n − 1)-dimensional subspace

But these conditions do not put severe restrictions on the density function, since (with minor technical changes) all the most important distributions, like the exponential and the normal distribution, meet these requirements. As usual we also assume the existence of a given initial feasible point x_f (i.e. P(Ax_f ≥ b) ≥ α). Due to the unboundedness of M in every positive direction we may assume that the cost vector c has all strictly positive components. After rescaling, if necessary, we may also assume that c = 1̃/√n. To simplify the notation we may further assume that the row vectors of the matrix A are unit vectors. We define M = {x ∈ ℝⁿ | P(Ax ≥ b) ≥ α}. One can show that F(x) = P(Ax ≥ b) is a log-concave function [14], so that M is convex and fits into our framework. The following theorem summarizes the running time of the random greedy algorithm.
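The sampling approach to membership queries mentioned above can be sketched as follows; the fixed sample array and the comparison of the empirical frequency with α are assumptions about how such a subroutine would be set up, justified by Chernoff bounds on the concentration of the empirical frequency around P(Ax ≥ b).

```python
import numpy as np

def chance_member(x, A, b_samples, alpha):
    """Monte-Carlo membership test for M = {x : P(Ax >= b) >= alpha},
    given i.i.d. samples of the random right-hand side b.
    b_samples has shape (N, m); each row is one sample of b."""
    lhs = A @ x                                      # shape (m,)
    satisfied = np.all(lhs[None, :] >= b_samples, axis=1)
    return bool(satisfied.mean() >= alpha)
```

For exponentially distributed b in one dimension, P(x ≥ b) = 1 − e^{−x}, so the membership threshold for α = 1/2 sits at x = ln 2.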

– there is a constant bound for each component of h, i.e. h(x) > 0 iff 0 ≤ x_j ≤ κ for each j, for a constant κ ∈ ℝ₊

As a corollary we get a lower bound on the expected step size.

Corollary 1 Suppose x is a point at the border of M and Δ = cx − min_{y∈M} cy. Let λ = Δ/D and r = Δ/(50√n). The process of choosing a random point in B(x, r) hits a point y ∈ M with c̃y ≤ c̃x − λr/√n with probability at least ρ(λ) = λ/4 = Δ/(4D).

Thus, we have an analysis of one basic step of the algorithm. We will perform the algorithm in stages, taking care of the dependence of the radius on Δ. The proof is technical and omitted from this abstract. In the following subsection we give an application example from stochastic programming.

5.1. Application Example: Probabilistic Constrained Programming


The following problem is a standard problem in stochastic optimization [17]. We are given an m × n matrix A with real entries and a random vector b ∈ ℝᵐ with density h.

Theorem 4 The random greedy algorithm needs at most Õ((n^2.5·m/ε²) · min{min_{y∈M} cy, cx_f}) calls to the membership oracle of M to get a feasible approximate solution z with cz ≤ (1 + ε)·min_{y∈M} cy and P(Az ≥ b) ≥ α.

First, we prove a bound on the smoothness of M.

Lemma 7 There is a constant c such that for all r ≤ 1 and all μ ≥ c·n^1.5·m·r², M is (μ, r)-smooth.

The proof of this proposition requires a sequence of lemmata. Before we start we state a theorem [4], called the isoperimetric inequality, which is useful in the analysis.

Theorem 5 Let J ⊆ ℝⁿ be a convex body, h a log-concave function defined on J, and μ the induced measure. Let S₁, S₂ ⊆ J, t ≤ dist(S₁, S₂) = min_{x∈S₁, y∈S₂} |x − y|₂ and d ≥ diam(J) = max_{x,y∈J} |x − y|₂. If B = J \ (S₁ ∪ S₂), then

min{μ(S₁), μ(S₂)} ≤ (1/2)·(d/t)·μ(B).

With the help of the last two lemmata and Lemma 1, the proof of Lemma 7 is complete. Thus, after the verification of smoothness we could use the analysis of the last section. But, since the diameter of the set M ∩ {z | 1̃·z ≤ 1̃·x} is related to the objective function value (M ∩ {z | 1̃·z ≤ 1̃·x} ⊆ B(0, 2·1̃·x)), we can once again use a dynamic programming approach. The analysis is tedious and omitted in this abstract.

Remarks: If the components of the vector b are independently distributed and the density is bounded below by a constant, we can improve the running time of our algorithm to Õ((n²/ε²) · min{min_{y∈M} cy, cx_f}) by a direct bound on the length of the gradient and the second derivative, without using the isoperimetric inequality. The method can be applied in a similar fashion to the Component Commonality Problem [17], where the costs of buying raw materials should be minimized while the customer demands have to be satisfied with a certain fixed probability [10]. Our method also yields superior running time bounds in this case compared to the results known so far [9]. Furthermore, our method and analysis are much simpler than those in [9], which are based on the analysis of the conductance of a certain Markov chain. Moreover, our method allows Hit-and-Run steps (which [9] does not), a feature that is very likely to speed up the algorithm a lot in practice [18].
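For completeness, one Hit-and-Run step over a membership oracle looks as follows. This is a standard sketch, not taken from [9] or [18]; the chord through the current point is located by bisection and is assumed to lie within distance t_max of x.

```python
import numpy as np

rng = np.random.default_rng(3)

def hit_and_run_step(x, member, t_max=4.0, iters=50):
    """One Hit-and-Run move in a convex body given by a membership oracle:
    pick a uniform random direction, locate the chord through x by
    bisection, and return a uniform random point on that chord."""
    n = len(x)
    d = rng.normal(size=n)
    d /= np.linalg.norm(d)

    def chord_extent(sign):
        # largest t in [0, t_max] with x + sign*t*d still in the body
        lo, hi = 0.0, t_max
        if member(x + sign * t_max * d):
            return t_max
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if member(x + sign * mid * d):
                lo = mid
            else:
                hi = mid
        return lo

    a, b = chord_extent(-1.0), chord_extent(1.0)
    t = rng.uniform(-a, b)
    return x + t * d
```

By convexity of the body, every point between the two feasible chord endpoints is feasible, so the step never leaves the set.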



We are going to use the isoperimetric inequality in [4] to get a lower bound on the length of the gradient, since this gradient is related to an (n − 1)-dimensional surface area inside a convex body.

Lemma 8 |∇F(x)|₂ ≥ c/m for a certain constant c ∈ ℝ₊.

Proof: Let F_j = {y ∈ ℝᵐ | y_j = (Ax)_j and y ≤ Ax} and let i, 1 ≤ i ≤ n, be fixed. With A_i the i-th column of A we get ∂F/∂x_i = Σ_j A_{ji} ∫_{F_j} h. Therefore |∇F(x)|₂ is bounded below in terms of |(∫_{F_j} h)_j|₂. Let Θ be the surface of S_x = {y | y ≤ Ax} (inside the positive orthant); Σ_j ∫_{F_j} h is obviously the integral of h over Θ. Now ∫_Θ h ≥ (2/diam(B)) · min{∫_{S_x} h, ∫_{B\S_x} h} with B = {y ∈ ℝᵐ | h(y) > 0} is a direct implication of the isoperimetric inequality (Theorem 5). Because there is a constant bound for each component, we obtain diam(B) ≤ κ√m for the constant κ ∈ ℝ₊ (see the assumptions). Therefore, observing ∫_{S_x} h = α, it follows that |∇F(x)|₁ ≥ c/√m for a constant c ∈ ℝ₊. Applying the standard relationship between the L₁ and the L₂ norm we get the result of the lemma. □

6. Smoothing of convex sets


In this section we will describe a method to make certain convex sets smooth in order to apply our local search technique. This method does not apply to arbitrary convex sets; we therefore assume two conditions to hold for the given convex set S.

Now we prove an upper bound on the norm of the Hessian of F. F is almost everywhere twice differentiable, and we want to consider |H_F(x)|₂ = max_{|z|,|y|=1} |zᵀ H_F(x) y| for x ∈ M.

Lemma 9 |H_F(x)| = O(n^1.5).

Proof: (Sketch) Let i, j, 1 ≤ i, j ≤ n, be fixed. According to the last lemma we have

– S is up-monotone (i.e. x ∈ S and y ≥ x (componentwise) implies y ∈ S)
– S ⊆ ℝⁿ₊

H_F(x)_{ij} = ∂/∂x_j (Σ_k a_{ki} ∫_{F_k} h) = O(1),

because of our Lipschitz assumptions on the density h. As a consequence we obtain |H_F(x)|₂ = max_{|z|,|y|=1} |zᵀ H_F(x) y| ≤ O(n^1.5), and the lemma follows. □

The problem is again to optimize a given linear cost function c over S, while we assume we have an initial feasible point x_f. This problem has a lot of interesting applications; in fact, all the application examples described earlier are special cases of this general problem. The idea for getting a smooth set from a given up-monotone set is easy to describe. We will consider only the subset S′ of points x of S with the following property: the volume of the intersection of a certain cube with center x and S should be at least a constant fraction. This set is provably smooth. Furthermore, the set deviates not very much from S (if we

choose the cube to be small enough), so that optimizing over S′ yields a good solution for S. In the following we will give bounds on the smoothness and tackle the problem of computing the volume of the intersection of the cube and S. The following theorem summarizes the results of this section.

Theorem 6 Suppose ε ∈ ℝ₊ and x_f ∈ S. The random greedy algorithm described below, applied to a suitably smoothed set S′ ⊆ S, yields a z ∈ S with cz ≤ (1 + ε)·min_{y∈S} cy in Õ((n^4.5/ε²) · min{min_{y∈S} cy, cx_f}) calls to the membership oracle of S.

It is again easy to see that we may assume that every component of the cost vector is positive and, after rescaling, that c = 1̃/√n (the rescaled set is still up-monotone and contained in the positive orthant). Furthermore, D = 2·1̃·x_f is an upper bound on the diameter of S ∩ {x | 1̃·x ≤ 1̃·x_f}. We describe our smoothing procedure first. Suppose e₁ʳ, …, e_nʳ is an orthonormal basis of ℝⁿ with e₁ʳ = 1̃/√n, and C = {x ∈ ℝⁿ | −δ/(2n) ≤ x_i ≤ δ/(2n)} for a given δ ∈ ℝ₊ is the cube with center 0 and side length δ/n. Let Cʳ be the image of C under the linear mapping L : e_i → e_iʳ. Thus Cʳ is just the rotated cube of side length d = δ/n with center axes e₁ʳ, …, e_nʳ. For x ∈ ℝⁿ we define Cʳ(x) as the rotated cube with center x and F(x) = vol(Cʳ(x) ∩ S)/vol(Cʳ(x)), and we consider

This condition implies that there is at least an angle of λ between the normal vector and the cost vector 1̃/√n. It is a necessary condition for the smoothness of F, but if it does
not hold, we are near optimal anyway, as we will see later.

Lemma 10 Suppose (4) holds. Then every border plane of Cʳ(x) and every tangent hyperplane of S include an angle of at least λ/√n.

Proof: Easy calculation using the up-monotonicity of S, i.e. the components of a normal vector of a tangent hyperplane all have the same sign. □

To get a suitable bound on the smoothness of M_β we consider again the length of the gradient at an arbitrary border point x ∈ M_β.

Lemma 11 |∇F(x)|₂ ≥ 1/(2nd) (d is the side length of the rotated cube).

Proof: Let

the level sets M = fx 2 n jF x  g: One can show by an easy application of the B RUNN -M INKOWSKI Theorem [2], that M is convex, so that these denitions t into our framework of smooth sets. 2

 be the n , 1-dimensional surface between C r x S and C r x n S . We denote the normalvector of the tangent hyperplane of S (in direction of S) by vt . R This implies rF =  vt (as a vector). Because S is upmonotone, every component of vt is non-negative. We get jrF xj1 = voln,1 , since vt is a unit vector according to the L2 norm. The isoperimetric inequality [4] implies voln,1   diam1 r x min f dn ; 1 , dn g : ThereC
n,1 d 1 fore we get jrF xj2  pn jrF xj1 = 2 n : maxjzj;jyj=1 jz T HF xyj.
Lemma 12 Next, we will give a bound on the Hessian

jHF xj =

We will consider the problem of determining the membership of M of an arbitrary point x 2 n . Because computing the volume of M C r x is in general hard [4], we use a sampling approach instead. We will generate logc n points in C r x uniformly at random and say x 2 M , if at least a fraction of these points hit S . Using Chernoff bounds [12] it is easy to see that we can decide the membership with an error probability of O1=nc . We make a small error here (i.e. we compute membership of M + with a small constant instead of M ), but this is not important since we will show the smoothness of every level set M with 0 1. To make the analysis easier we replace the surface of S by a C 1 curve. We can do this without loss of generality, since this approximation can be made arbitrarily exact. For the smoothness analysis of M we consider the following special case rst. Suppose h is the normal vector of a tangent hyperplane of S . We assume the following condition to be true.

jHF xj  dn,2n1:5 .

Proof: (Sketch) The entries of the Hessian are related to the n , 2-dimensional border lines of S on the facets of C r x. Due to convexity of S we get an upper bound of these by considering the n , 2-dimensional surface area of these facets. Using the lower bound on the angle between tangent hyperplanes and the facets of the cube we can derive the desired result. The details are omitted from this abstract. 2 Proof of Theorem 6: (Sketch) Under the assumption that we never encounter a point y 2 C r x, where the assumption (4) does not hold, our analysis of the random greedy algorithm in the last section runs through analogously. In this case it is easy to see that the result of Theorem 6 is an implication of the general result in section 5. If we encounter a point, where (4) does not hold, we can argue, using a similar proof as in Lemma 2, that we are already near optimal.2

jh~ =pnj  1 , 1

(4)
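The smoothing construction and the sampling-based membership test above can be sketched numerically. The following is a rough illustration, not code from the paper: the helper names `rotated_basis`, `estimate_F`, and `in_M`, the example set S, and the fixed sample counts are our own choices (the paper prescribes log^c n sample points to get an O(1/n^c) error probability).

```python
import numpy as np

def rotated_basis(n):
    """ONB e_1^r, ..., e_n^r of R^n with e_1^r = (1,...,1)/sqrt(n),
    built by Gram-Schmidt on the all-ones vector followed by the
    standard basis vectors (dropping dependent ones)."""
    basis = [np.ones(n) / np.sqrt(n)]
    for i in range(n):
        if len(basis) == n:
            break
        v = np.zeros(n)
        v[i] = 1.0
        for b in basis:
            v -= (v @ b) * b          # project out existing directions
        norm = np.linalg.norm(v)
        if norm > 1e-10:
            basis.append(v / norm)
    return np.array(basis)            # rows are e_1^r, ..., e_n^r

def estimate_F(x, in_S, alpha, basis, num_samples, rng):
    """Monte Carlo estimate of F(x) = vol(C^r(x) ∩ S) / vol(C^r(x)):
    sample uniformly in the rotated cube of side length d = alpha/n
    centered at x and return the fraction of points hitting S."""
    n = len(x)
    d = alpha / n
    u = rng.uniform(-d / 2, d / 2, size=(num_samples, n))
    pts = x + u @ basis               # row u maps to sum_i u_i * e_i^r
    hits = sum(1 for p in pts if in_S(p))
    return hits / num_samples

def in_M(x, in_S, alpha, basis, lam, num_samples, rng):
    """Declare x a member of the level set M_lambda if at least a
    lambda-fraction of the sampled cube points lies in S."""
    return estimate_F(x, in_S, alpha, basis, num_samples, rng) >= lam
```

As a toy up-monotone instance one can take S = {x ≥ 0 : x₁ + x₂ ≥ 1} in the plane: at a boundary point such as (0.5, 0.5) the estimate of F is close to 1/2, while well inside S it is 1, matching the intuition behind the level sets M_λ.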

7. Current Research and Open Problems


We have described a very simple approach with good running time bounds in many application examples. One interesting question is, of course, whether these easily implementable methods are of value in real-world problems. There are some very promising empirical results for Hit-and-Run algorithms in global optimization [18], and we are currently testing our approach.

On the theoretical side there are still many open questions. The notion of smooth sets seems to be applicable in a number of different areas, and we are currently working on extending our results to the optimization of general convex functions over smooth convex sets. There are also interesting application examples, like the two-stage programming problems in stochastic optimization, which recently gained interest when applied to problems in computational finance.

Moreover, it seems natural to use a ball instead of a rotated cube for smoothing convex sets as described in the last section. Intuitively, this procedure should also yield a smooth set. There is, however, the problem of bounding the second derivative, which is related to the (n−2)-dimensional volume of the intersection of the convex set with the ball and seems hard. At the moment we do not know of any method to obtain suitable bounds.

References

[1] D. Applegate and R. Kannan. Sampling and integration of near log-concave functions. Proc. of 23rd ACM STOC, 1991.
[2] M. Berger. Geometry I. Springer Verlag, 1980.
[3] G. Dantzig. Linear programming under uncertainty. Management Science 1, 1955.
[4] M. Dyer and A. Frieze. Computing the volume of convex bodies: A case where randomness provably helps. Proceedings of the Symposium on Applied Mathematics, 1991.
[5] M. Dyer, A. Frieze, and R. Kannan. A random polynomial time algorithm for approximating the volume of convex bodies. Journal of the ACM 38, 1991.
[6] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer Verlag, 1988.
[7] J. Jayaraman, J. Srinivasan, and R. Roundy. Procurement of common components in a stochastic environment. IBM Technical Report, 1992.
[8] R. Kannan, L. Lovász, and M. Simonovits. Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures and Algorithms, 1997.
[9] R. Kannan, J. Mount, and S. Tayur. A randomized algorithm to optimize over certain convex sets. Mathematics of Operations Research 20, 1995.
[10] R. Kannan and A. Nolte. A fast random greedy algorithm for the component commonality problem. Proc. of ESA, 1998.
[11] M. Luby and N. Nisan. Positive linear programming. 25th ACM STOC, 1993.
[12] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[13] J. Nesterov and A. Nemirovsky. Interior Point Polynomial Methods in Convex Programming: Theory and Applications. SIAM, Philadelphia, 1994.
[14] A. Prékopa. Numerical solutions to probabilistic constrained programming problems. In Yu. Ermoliev and R. Wets (eds.), Numerical Techniques for Stochastic Optimization. Springer, 1980.
[15] T. Terlaky. Interior Point Methods of Mathematical Programming. Kluwer Academic Publishers, 1996.
[16] P. Vaidya. Speeding up linear programming using fast matrix multiplication. Proc. of 30th FOCS, 1989.
[17] R. Wets. Stochastic programming. In Handbooks of Operations Research and Management Science, Vol. 1. North Holland, 1989.
[18] Z. Zabinsky et al. Improving hit-and-run for global optimization. Journal of Global Optimization 3, 1993.