Abstract
This paper considers stochastic optimization problems whose objective functions involve powers of random variables. For a concrete example, consider the classic Stochastic ℓp Load Balancing Problem (StochLoadBal_p): There are m machines and n jobs, and we are given independent random variables $Y_{ij}$ describing the distribution of the load incurred on machine i if we assign job j to it. The goal is to assign each job to a machine in order to minimize the expected ℓp-norm of the total load incurred over the machines. That is, letting $J_i$ denote the jobs assigned to machine i, we want to minimize $\mathbb{E}\big(\sum_i (\sum_{j\in J_i} Y_{ij})^p\big)^{1/p}$. While convex relaxations represent one of the most powerful algorithmic tools, in problems such as StochLoadBal_p the main difficulty is to capture such an objective function in a way that only depends on each random variable separately.
In this paper, we show how to capture p-power-type objectives in such a separable way by using the L-function method. This method was introduced by Latała precisely to capture in a sharp way the moments of sums of random variables through the individual marginals. We first show how this quickly leads to a constant-factor approximation for a very general subset selection problem with p-moment objective.
Moreover, we give a constant-factor approximation for StochLoadBal_p, improving on the recent $O(\frac{p}{\ln p})$-approximation of [Gupta et al., SODA 18]. Here the application of the method is much more involved. In particular, we need to prove structural results connecting the expected ℓp-norm of a random vector with the p-moments of its coordinate marginals (machine loads) in a sharp way, taking into account simultaneously the different scales of the loads incurred on the different machines by an unknown assignment. Moreover, our starting convex (indeed linear) relaxation has exponentially many constraints that are not conducive to integral rounding; we need to use the solution of this LP to obtain a reduced LP which can then be used to obtain the desired assignment.
1 Introduction
This paper considers stochastic optimization problems whose objective functions are related to powers of sums of random variables. For a concrete example, consider the classic Stochastic ℓp Load Balancing Problem (StochLoadBal_p): There are m machines and n jobs, and we are given independent non-negative random variables $Y_{ij}$ (job sizes) describing the distribution of the load incurred on machine i if we assign job j to it. The goal is, knowing only these distributions, to assign each job to a machine in order to minimize the expected ℓp-norm of the realized total load incurred over the machines. That is, letting $J_i$ denote the jobs assigned to machine i, we want to minimize

$$\mathbb{E}\,\Big\|\Big(\sum_{j\in J_i} Y_{ij}\Big)_{i\in[m]}\Big\|_p = \mathbb{E}\bigg[\sum_{i\in[m]}\Big(\sum_{j\in J_i} Y_{ij}\Big)^p\bigg]^{1/p}, \qquad (1)$$

if $p \in [1,\infty)$, and to minimize the makespan $\mathbb{E}\|(\sum_{j\in J_i} Y_{ij})_{i\in[m]}\|_\infty = \mathbb{E}[\max_i \sum_{j\in J_i} Y_{ij}]$ when $p = \infty$.
Notice the entire assignment is done up-front without knowledge of the actual outcomes of the random variables, and hence there is no adaptivity. We remark that the ℓp-norms interpolate between considering only the most loaded machine (ℓ∞) and simply adding the loads of the machines (ℓ1), and have been used in this context since at least the 70's [CW75, CC76], since in some applications they better capture how well-balanced an allocation is [AAG+95]. This classic problem has been widely studied in its stochastic [KRT00, GI99, GKNS18, Pin04], deterministic [LST90, AERW04, AE05, KMPS09, MS14], and online versions [AAG+95, AAS01, BCK00, Car08, CFK+11, Mol17]. See [GKNS18], the work most relevant for us, for a comprehensive discussion and literature review on stochastic load balancing.
The deterministic versions of such problems can typically be well-approximated through the use of convex programs; for example, this method has provided constant-factor approximations for the deterministic version of StochLoadBal_p [AE05, KMPS09, MS14]. However, in the stochastic version of these problems the situation is much more complicated, since in principle terms like (1) require multi-dimensional integration due to the expectation involving powers of sums of random variables.
Thus, the main element for using convex programs to tackle such stochastic problems is to be able to approximately capture the objective function in a way that only depends on each random variable individually. The first idea is to replace the random variables by just their expectations, for example reducing (1) to $\|(\sum_{j\in J_i} \mathbb{E}Y_{ij})_{i\in[m]}\|_p$. Unfortunately, even basic examples show that too much is lost and this simple proxy is not enough. For the special case of StochLoadBal_p with p = ∞ and identical machines (i.e., the item sizes are independent of the machines), [KRT00] proposed to use the so-called effective size [Hui88] of a job as a proxy instead of its expectation: For a random variable X and parameter ℓ ∈ (1, ∞), its effective size (at scale ℓ) is

$$\beta_\ell(X) := \frac{1}{\ln \ell}\cdot \ln \mathbb{E}\, e^{X\cdot\ln\ell}; \qquad (2)$$

for ℓ = 1, it is defined as $\beta_1(X) := \mathbb{E}X$. Using this notion, [KRT00] obtained the first constant-factor approximation for this special case of StochLoadBal_p. They also used it to provide approximations for stochastic bin-packing and knapsack problems (all packing- or ℓ∞-type problems). Only recently, Gupta et al. [GKNS18] managed to use this fruitful notion to obtain a constant approximation for the unrelated machines case (but still p = ∞).
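To make the effective size concrete, the following is a minimal Python sketch computing $\beta_\ell$ from (2) for a discrete distribution given explicitly (as we assume in Section 1.2); the function name and the Bernoulli toy job are ours, for illustration only.

```python
import math

def effective_size(probs, values, ell):
    """Effective size beta_ell(X) from (2), for a discrete RV X given as
    parallel lists of probabilities and values; for ell = 1 it is E[X]."""
    if ell == 1:
        return sum(q * v for q, v in zip(probs, values))
    # beta_ell(X) = (1/ln ell) * ln E[ e^{X ln ell} ]
    mgf = sum(q * math.exp(v * math.log(ell)) for q, v in zip(probs, values))
    return math.log(mgf) / math.log(ell)

# A Bernoulli(1/2) job of size 1: the effective size interpolates between
# the mean E[X] = 0.5 (ell -> 1) and the worst case 1 (ell -> infinity).
for ell in (1, 2, 10, 1000):
    print(ell, round(effective_size([0.5, 0.5], [0.0, 1.0], ell), 4))
```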
However, suitable notions of effective size have not been used for p-power-type functions. For example, for StochLoadBal_p with general p only an $O(\frac{p}{\ln p})$-approximation is known, also due to [GKNS18], which relies on other techniques (expected size as a proxy plus Rosenthal's Inequality). Oddly, this approximation ratio goes to infinity as p → ∞, despite the constant-factor approximation known for p = ∞. This indicates our current shortcomings in algorithmic and analytical tools for dealing with such moment-type objectives.
where in the approximation only constant factors are lost. This result shows that the moment of sums of random variables does not depend much on the stochastic interaction between them, only on the interaction of the deterministic proxies $\nu_{p,\varepsilon}(X_i)$. One difficulty is that this deterministic interaction has an implicit form that depends on setting the parameter ε in the "right" way.
Quick application: Subset selection with p-moment objective. Nonetheless, to show how the L-function method yields approximations in a simple way in the context of optimization problems, we consider the following general subset selection problem with moment objective (SubsetSelection_p): There are n items, the value of item j is stochastic and given by the non-negative random variable $V_j$, and these random variables are independent. Given a subset $\mathcal{P} \subseteq \{0,1\}^n$ of the boolean cube representing the feasible sets of items, the goal is to find a feasible set that maximizes the p-moment of the sum of the selected items' values:

$$\max_{x\in\mathcal{P}}\ \Big(\mathbb{E}\Big(\sum_j V_j x_j\Big)^p\Big)^{1/p}.$$

Using the L-function method, we show that one can reduce this problem to that of optimizing a deterministic linear function over $\mathcal{P}$.
Theorem 1. Suppose there is a constant approximation for optimizing any non-negative linear function over $\mathcal{P}$ (i.e., for any non-negative vector $c \in \mathbb{R}^n_+$, we can find a point $\bar{x} \in \mathcal{P}$ satisfying $\langle c, \bar{x}\rangle \ge \Omega(1)\cdot\max_{x\in\mathcal{P}}\langle c, x\rangle$). Then there is a constant approximation for SubsetSelection_p over $\mathcal{P}$ for any p ∈ [1, ∞).
The proof is very simple: By standard binary search arguments, we can assume we know the optimal objective value OPT. Then based on equation (3), set ε = OPT. The "⇒" direction of this equation essentially shows that to get value ≈ OPT it suffices to find a solution $x \in \mathcal{P}$ with $\sum_j \nu_{p,\varepsilon}(V_j)\,x_j \ge 1$ (a deterministic linear feasibility/optimization problem), and the direction "⇐" essentially shows that the optimal solution satisfies this inequality, thus such a solution can indeed be found. We carry this out more formally in Appendix A.
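As an illustration of this reduction, here is a hedged Python sketch: it binary-searches the estimate of OPT and, for each guess, asks an (assumed given) linear optimization oracle over $\mathcal{P}$ to maximize the deterministic proxy weights $\nu_{p,\varepsilon}(V_j)$. For simplicity it sets ε to the current guess directly, ignoring the constant-factor rescaling of ε̄ performed in Appendix A; the oracle and the toy instance are hypothetical stand-ins.

```python
import math

def nu(probs, values, p, eps):
    # L-function functional nu_{p,eps}(X) = (1/p) ln E(1 + X/eps)^p
    # (Definition 1 in Section 2), for a discrete non-negative RV X.
    return math.log(sum(q * (1 + v / eps) ** p
                        for q, v in zip(probs, values))) / p

def subset_selection(items, p, linear_oracle, lo, hi, rounds=60):
    # Binary-search the estimate of OPT; for each guess, maximize the
    # deterministic proxy sum_j nu_{p,guess}(V_j) x_j over P and test
    # whether it reaches 1.
    best = None
    for _ in range(rounds):
        guess = (lo + hi) / 2
        weights = [nu(q, v, p, guess) for q, v in items]
        x = linear_oracle(weights)
        if sum(w * xi for w, xi in zip(weights, x)) >= 1:
            best, lo = x, guess   # guess is achievable: raise the estimate
        else:
            hi = guess            # guess is too ambitious: lower it
    return best

# Toy instance: choose exactly one of two Bernoulli-type items; the "oracle"
# for this P just picks the item of larger weight.
items = [([0.5, 0.5], [0.0, 2.0]), ([0.9, 0.1], [0.0, 10.0])]
oracle = lambda c: [1, 0] if c[0] >= c[1] else [0, 1]
print(subset_selection(items, p=3, linear_oracle=oracle, lo=1e-3, hi=100.0))
```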
¹ The name L-function is borrowed from [PnG99].
Another application: StochLoadBal_p. In our next and main result, we show that StochLoadBal_p admits a constant approximation for all p ∈ [1, ∞], improving over the $O(\frac{p}{\ln p})$-approximation of [GKNS18].

Theorem 2. For all p ∈ [1, ∞], the Stochastic ℓp Load Balancing Problem admits a constant-factor approximation.
In this case the application of the L-function method is much more involved. The first issue is that the objective function (1) is not of the form that can be tackled directly by the L-function method. To connect the two, we prove a bound relating the expected ℓp-norm of the sum of random variables and the p-moments of these sums. One direction is easy: given independent RVs $\{X_{ij}\}_{ij}$ and letting $S_i = \sum_j X_{ij}$, by the concavity of $x \mapsto x^{1/p}$ Jensen's inequality gives

$$\mathbb{E}\|(S_1,\dots,S_m)\|_p \le \Big(\mathbb{E}\|(S_1,\dots,S_m)\|_p^p\Big)^{1/p} = \Big(\sum_i \mathbb{E}S_i^p\Big)^{1/p}, \qquad (4)$$

so the expected ℓp-norm is upper bounded by the moments $\mathbb{E}S_i^p$. However, the other direction (with constant factor loss) is not true in general. Nonetheless, we prove such a converse inequality under additional assumptions on the moments $\mathbb{E}S_i^p$ (that are discharged later); this is done in Section 3.1.
Given this result, the idea is to write an assignment LP with additional linear constraints based on the $\nu_{p,\varepsilon_i}(Y_{ij})$'s to control the moment of the load on each of the machines, and thus the objective function (1). But then a second issue appears: even if we assume we know the optimal objective value OPT, we do not know the moment of the load on each machine in the optimal solution, which is needed to set the parameters $\varepsilon_i$. Thus, we need to write a valid constraint for each possible combination of $\varepsilon_i$'s. The general theory behind this is developed in Section 3.2, and the LP is presented in Section 4.1. Addressing a similar issue in the case p = ∞ was a main contribution of [GKNS18] and we borrow ideas from it, though in the case p < ∞ they need to be modified to avoid super-constant losses; see the discussion in Section 3.2.
Finally, as indicated, this LP has a large (exponential in m) number of inequalities, and thus it seems unlikely one can convert a fractional solution into an integral assignment satisfying all of the constraints. Thus, again inspired by [GKNS18], we use the optimal solution of this LP to obtain an estimate of the "right" $\varepsilon_i$'s for each of the machines and write a reduced LP based on them. This reduced LP is essentially one for the Generalized Assignment Problem, for which one can use the classic algorithm by Shmoys and Tardos [ST93] to obtain an approximate integral assignment.
We remark that even in the deterministic version of the problem, previous approximations relied on convex programs [AE05, KMPS09, MS14], so our techniques also give the first LP-based approach in this case.
1.2 Notation
Unless specified otherwise, the letter p always denotes a value in (1, ∞]. Given a vector $v \in \mathbb{R}^m$, its ℓp-norm is defined by $\|v\|_p := (\sum_i v_i^p)^{1/p}$. Given a subset of coordinates $I \subseteq [m]$, we use $v_I = (v_i)_{i\in I}$ to denote the restriction of v to these coordinates. When computationally relevant, we assume that the input distributions are discrete, supported on a finite set, and given explicitly, i.e., for each x in the support we are given Pr(X = x).
2 The L-function method
Definition 1. For any random variable X and parameters p, ε > 0, the functional $\nu_{\varepsilon,p}$ is defined as

$$\nu_{\varepsilon,p}(X) := \frac{1}{p}\,\ln\,\mathbb{E}\Big(1 + \frac{X}{\varepsilon}\Big)^p.$$

To simplify the notation, we omit the subscript p in $\nu_{\varepsilon,p}$. As mentioned above, the main property of this functional is the following:

$$\Big(\frac{\varepsilon^*}{10}\Big)^p \;\le\; \mathbb{E}S^p \;\le\; (e\,\varepsilon^*)^p.$$
Proof. This is the development on page 37 of [PnG99], which we reproduce for convenience. Using the inequality $1 + \sum_i a_i \le \prod_i (1 + a_i)$, valid for non-negative $a_i$'s, we have

$$\mathbb{E}\Big(\frac{1}{\varepsilon}\sum_j X_j\Big)^p \le \mathbb{E}\Big(1 + \frac{1}{\varepsilon}\sum_j X_j\Big)^p \le \mathbb{E}\prod_j \Big(1 + \frac{1}{\varepsilon}X_j\Big)^p \le \prod_j \mathbb{E}\Big(1 + \frac{1}{\varepsilon}X_j\Big)^p = e^{p\sum_j \nu_\varepsilon(X_j)}.$$
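For intuition, the following self-contained Python sketch locates the scale ε* at which the total ν-mass of the summands equals 1 (using that $\sum_j \nu_\varepsilon(X_j)$ is decreasing in ε) and checks the sandwich above by Monte Carlo; the Bernoulli instance and all names are ours, purely for illustration.

```python
import math, random

def nu(probs, values, p, eps):
    # nu_{eps,p}(X) = (1/p) ln E(1 + X/eps)^p  (Definition 1)
    return math.log(sum(q * (1 + v / eps) ** p
                        for q, v in zip(probs, values))) / p

def eps_star(items, p, lo=1e-9, hi=1e9, rounds=200):
    # Geometric bisection: the total nu-mass decreases in eps, so we
    # search for the scale where it crosses 1.
    for _ in range(rounds):
        mid = math.sqrt(lo * hi)
        if sum(nu(q, v, p, mid) for q, v in items) > 1:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# S = sum of 20 i.i.d. Bernoulli(0.1): check (eps*/10)^p <= E S^p <= (e eps*)^p.
p = 4
items = [([0.9, 0.1], [0.0, 1.0])] * 20
es = eps_star(items, p)
moment = sum(sum(random.random() < 0.1 for _ in range(20)) ** p
             for _ in range(200_000)) / 200_000
print((es / 10) ** p, moment, (math.e * es) ** p)
```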
3.1 Relating expected ℓp-norms and moments

The goal of this section is to relate the expected ℓp-norm $\mathbb{E}\|(S_1,\dots,S_m)\|_p$ of a random vector $S = (S_1,\dots,S_m)$ and the coordinate moments $\mathbb{E}S_i^p$. As mentioned in the introduction, Jensen's inequality (inequality (4)) gives the upper bound $\mathbb{E}\|S\|_p \le (\sum_i \mathbb{E}S_i^p)^{1/p}$; in this section we prove a partial converse to this inequality.
To see the difficulty in obtaining such a converse, suppose $S_1$ is a Poisson random variable with parameter λ = 1, and $S_2, \dots, S_m = 0$. It is known that $\mathbb{E}S_1^p \approx (\frac{p}{\ln p})^p$, and the Jensen's-based inequality (4) gives the upper bound $\mathbb{E}\|S\|_p \le (\sum_i \mathbb{E}S_i^p)^{1/p} \approx \frac{p}{\ln p}$. However, the actual expected norm is $\mathbb{E}\|S\|_p = \mathbb{E}S_1 = 1$. Thus, in general it is not possible to obtain a converse to the Jensen's-based inequality without losing a factor of $\Omega(\frac{p}{\ln p})$.² Nonetheless, we show that one can obtain tighter bounds as long as none of the $S_i$'s contributes too much to the sum $\sum_i \mathbb{E}S_i^p$ (and each $S_i$ is a sum of "small" random variables). For that we need the following sharp moment comparison from [HMS01], which is a vast generalization of Khinchine's Inequality; we simplify the statement for our purposes, and for a RV X use $|||X|||_p := (\mathbb{E}X^p)^{1/p}$ to denote its p-th moment.
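To see the Poisson gap above numerically, here is a quick Python check (our own illustration); it evaluates $\mathbb{E}S_1^p = e^{-1}\sum_{k\ge 0} k^p/k!$ by truncating the series.

```python
import math

def poisson1_moment(p, terms=200):
    # E S^p for S ~ Poisson(1), via the (truncated) series e^{-1} sum k^p / k!
    return math.exp(-1) * sum(k ** p / math.factorial(k) for k in range(terms))

# (E S^p)^{1/p} indeed grows like p/ln p, while E S = 1.
for p in (2, 4, 8, 16, 32):
    print(p, round(poisson1_moment(p) ** (1 / p), 2), round(p / math.log(p), 2))
```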
To upper bound the variance of $S_i^p$, first note that $\mathrm{Var}(S_i^p) = \mathbb{E}S_i^{2p} - (\mathbb{E}S_i^p)^2 \le \mathbb{E}S_i^{2p}$. To upper bound the right-hand side, the idea is to use the moment comparison Theorem 4 to obtain $\mathbb{E}S_i^{2p} \lesssim 2^{O(p)}(\mathbb{E}S_i^p)^2 \le 2^{O(p)}\,\mathbb{E}S_i^p$ (the last inequality by assumption). More precisely, for each i let $M_i = \max_j X_{i,j}$ denote the largest component of $S_i$ (in each scenario); applying Theorem 4 with q = 2p we have

$$\mathbb{E}S_i^{2p} \le c^{2p}\cdot 2^{2p}\Big[(\mathbb{E}S_i^p)^{1/p} + (\mathbb{E}M_i^{2p})^{1/2p}\Big]^{2p} \le c^{2p}\cdot 2^{4p}\Big[(\mathbb{E}S_i^p)^{2} + \mathbb{E}M_i^{2p}\Big],$$

where the last inequality uses $(a+b)^q \le (2\max\{a,b\})^q \le 2^q(a^q + b^q)$. Moreover, the assumption $\mathbb{E}S_i^p \le 1$ implies that $(\mathbb{E}S_i^p)^2 \le \mathbb{E}S_i^p$, and the assumption $X_{i,j} \in [0,1]$ implies $M_i^{2p} \le M_i^p \le S_i^p$. Therefore, we obtain that $\mathrm{Var}(S_i^p) \le 2\,c^{2p}\cdot 2^{4p}\cdot\mathbb{E}S_i^p$. Moreover, by assumption $\mu \ge \frac{1}{\alpha^p}$, and hence $\mu^2 \ge \frac{1}{\alpha^p}\sum_i \mathbb{E}S_i^p$. Employing these bounds, we obtain that the right-hand side of (5) is at most $8\alpha^p\cdot c^{2p}\cdot 2^{4p}$. But for α a sufficiently small constant ($1/(c^2 2^8)$ suffices), this upper bound is at most ½. This concludes the proof.

² This unavoidable gap of $\frac{p}{\ln p}$ between the moment $(\mathbb{E}S_1^p)^{1/p}$ and the expectation $\mathbb{E}S_1$ is one of the losses in the $O(\frac{p}{\ln p})$-approximation of [GKNS18], where it appears from the use of Rosenthal's Inequality.
We will also need the following corollary, which is essentially Claim 3 of [GKNS18] with a different
parametrization; its proof is presented in Appendix B.
2. This implies that for Ω(m) indices i we have $\tilde\nu_{100\mathrm{OPT}/m^{1/p}}(S_i) \ge 1$; recall this is around the "right" condition to apply the results from the L-function method.

3. More precisely, Lemma 2 implies that for each such i we have $\mathbb{E}S_i^p \ge \big(\frac{100\,\mathrm{OPT}}{10\,m^{1/p}}\big)^p = \frac{10^p\,\mathrm{OPT}^p}{m}$.

4. Assuming the requirements of Lemma 3 are met, we can use it to obtain $\mathbb{E}\|S\|_p > \frac{1}{4}\big(\Omega(m)\cdot\frac{10^p\,\mathrm{OPT}^p}{m}\big)^{1/p} > \mathrm{OPT}$ (the last inequality holds if we adjust the constants properly). This reaches the desired contradiction.
In fact, one can apply this argument to any subset $K \subseteq [m]$ of coordinates to obtain that

$$\sum_{i\in K}\sum_j \nu_{100\mathrm{OPT}/|K|^{1/p}}(X_{i,j}) \le |K|. \qquad (6)$$

Therefore, after guessing OPT, we can write the following IP enforcing these restrictions and be assured that the optimal solution is feasible for it:

$$\sum_{i\in K}\sum_\ell \nu_{100\mathrm{OPT}/|K|^{1/p}}(Y_{i,\ell})\cdot x_{i,\ell} \le |K| \qquad \forall K\subseteq[m]$$
$$x \in \text{assignment polytope} \cap \{0,1\}^{m\times n}.$$
In turn, suppose we can use this IP to obtain an integral assignment satisfying (6) (approximately). Then we can try to use the moment control from Lemma 1 and the Jensen's-based inequality (4) to reverse the process and argue that our solution has expected ℓp-load O(OPT). For that, since the L-function method is most effective when ε is such that $\tilde\nu_\varepsilon(S_i) \approx 1$, we can use the following idea inspired by [GKNS18]: assign to each machine i a size k such that $\tilde\nu_{\mathrm{OPT}/k^{1/p}}(S_i) \approx 1$ (or equivalently, $\mathbb{E}S_i^p \approx \frac{\mathrm{OPT}^p}{k}$); since by (6) there are at most ≈ k machines assigned to size k, we can hope to prove that $\sum_i \mathbb{E}S_i^p \lesssim \mathrm{OPT}^p$ and from the Jensen's-based inequality obtain $\mathbb{E}\|S\|_p \lesssim \mathrm{OPT}$. Unfortunately this argument is not enough because the former inequality is not true: for $j = 1,\dots,\ln m$ we could have $\frac{m}{2^j}$ machines with $\mathbb{E}S_i^p \approx \frac{\mathrm{OPT}^p}{m/2^j}$, thus assigned to size $\frac{m}{2^j}$, and get $\sum_i \mathbb{E}S_i^p \approx \sum_j \frac{m}{2^j}\cdot\frac{\mathrm{OPT}^p}{m/2^j} \approx (\ln m)\,\mathrm{OPT}^p$.
Multi-scale bound. The logarithmic loss in the previous example comes from the fact that we grouped the machines with similar scale of moment $\mathbb{E}S_i^p$ and applied the upper bound (6) separately for each group. To avoid this loss we will obtain a more refined upper bound that takes into account all scales simultaneously.
Theorem 5. Consider a scalar Ø and a sufficiently small constant α. Consider independent random variables $\{X_{i,j}\}_{i,j}$ in [0, αØ]. Let $S_i = \sum_j X_{i,j}$, and suppose $\mathbb{E}\|S\|_p \le Ø$. Consider the scaled-down variables $\tilde X_{i,j} := \frac{X_{i,j}}{44}$. Then for any sequence of values $v_1,\dots,v_m \ge (1/\alpha)^p$, we have

$$\sum_i \frac{1}{v_i}\Big(\sum_j \nu_{Ø/v_i^{1/p}}(\tilde X_{i,j}) - 1\Big) \le 3. \qquad (7)$$

(For example, when $v_i = m$ for all i this corresponds roughly to the bound (6) with K = [m].)
Proof. To simplify the notation let $\varepsilon_i := \frac{Ø}{v_i^{1/p}}$ and define $\tilde S_i := \sum_j \tilde X_{i,j}$. The high-level idea is to show that if (7) does not hold then Lemmas 2 and 3 imply that $\mathbb{E}\|S\|_p > Ø$, contradicting our assumption. To apply the former lemma effectively we need to break up the sums $\tilde S_i$ into subsums with $\nu_{\varepsilon_i}$-mass ≈ 1; for that, we need to take care of the $\tilde X_{i,j}$'s with big $\nu_{\varepsilon_i}$-mass first.

For each machine i, let $B_i$ be the set of indices j such that $\nu_{\varepsilon_i}(\tilde X_{i,j}) > 1$ ("big items"). We need to show that the big items do not contribute much to (7).

Claim 1. $\sum_i \frac{1}{v_i}\sum_{j\in B_i} \nu_{\varepsilon_i}(\tilde X_{i,j}) \le 1$.
Proof. First, from Corollary 1 we have that $\sum_{i,j}\mathbb{E}X_{i,j}^p \le (4Ø)^p$, so passing to the tilde version and restricting to the big items we have

$$\sum_{i,\,j\in B_i} \mathbb{E}\tilde X_{i,j}^p \le \frac{Ø^p}{11^p}. \qquad (8)$$

Moreover, for the big items we can relate $\nu_{\varepsilon_i}(\tilde X_{i,j})$ and $\mathbb{E}\tilde X_{i,j}^p$. For that, again recall $(a+b)^p \le (2\max\{a,b\})^p \le 2^p(a^p + b^p)$; so expanding the definition of $\nu_{\varepsilon_i}$ we have for any random variable X

$$\nu_{\varepsilon_i}(X) \le \frac{1}{p}\ln\Big(2^p\Big(1 + \mathbb{E}\frac{X^p}{\varepsilon_i^p}\Big)\Big) \le 1 + \frac{1}{p}\ln\Big(1 + \mathbb{E}\frac{X^p}{\varepsilon_i^p}\Big) \le 1 + \frac{1}{p}\,\mathbb{E}\frac{X^p}{\varepsilon_i^p} = 1 + \frac{1}{p\,\varepsilon_i^p}\,\mathbb{E}X^p,$$

where the last inequality uses that $\ln(1+x) \le x$ for all x. Moreover, for any big item we have by definition $\nu_{\varepsilon_i}(\tilde X_{i,j}) > 1$, so Lemma 2 gives $\mathbb{E}(\tilde X_{i,j})^p \ge (\frac{\varepsilon_i}{10})^p$, or equivalently $1 \le \frac{10^p}{\varepsilon_i^p}\,\mathbb{E}(\tilde X_{i,j})^p$. Applying this bound and the displayed inequality to $\tilde X_{i,j}$, we can relate $\nu_{\varepsilon_i}(\tilde X_{i,j})$ to $\mathbb{E}\tilde X_{i,j}^p$:

$$\nu_{\varepsilon_i}(\tilde X_{i,j}) \le \frac{10^p}{\varepsilon_i^p}\,\mathbb{E}(\tilde X_{i,j})^p + \frac{1}{p}\cdot\frac{1}{\varepsilon_i^p}\,\mathbb{E}(\tilde X_{i,j})^p \le \frac{11^p}{\varepsilon_i^p}\cdot\mathbb{E}(\tilde X_{i,j})^p = \frac{11^p\,v_i}{Ø^p}\cdot\mathbb{E}(\tilde X_{i,j})^p.$$

Dividing by $v_i$, adding this inequality over all big items, and employing (8) then concludes the proof.
Now assume by contradiction that (7) does not hold. Given this, and using the previous claim, if we remove the big items $\bigcup_i B_i$ from consideration we still have $\sum_i \frac{1}{v_i}\big(\sum_{j\notin B_i}\nu_{\varepsilon_i}(\tilde X_{i,j}) - 1\big) > 2$. So ignore the big items; to simplify the notation, we just assume there are no big items. Since $\nu_{\varepsilon_i}(\tilde X_{i,j}) \le 1$ for the remaining items, we can partition the sum $\tilde S_i = \sum_j \tilde X_{i,j}$ into subsums $\tilde S_i^0, \tilde S_i^1, \dots, \tilde S_i^{k_i}$ such that $\tilde S_i^w$ has $\nu_{\varepsilon_i}$-mass in [1,2] for all w ≥ 1 and the exceptional sum $\tilde S_i^0$ has $\nu_{\varepsilon_i}$-mass at most 1; formally, we consider a partition $J_0, J_1, \dots, J_{k_i}$ of the index set of $\{\tilde X_{i,j}\}_j$ such that $\tilde S_i^w := \sum_{j\in J_w}\tilde X_{i,j}$ has $\tilde\nu_{\varepsilon_i}(\tilde S_i^w) := \sum_{j\in J_w}\nu_{\varepsilon_i}(\tilde X_{i,j}) \in [1,2]$ for all w ≥ 1, and $\tilde\nu_{\varepsilon_i}(\tilde S_i^0) \le 1$.

Again $\|\tilde S\|_p$ can be lower bounded by ignoring the exceptional sums $\{\tilde S_i^0\}_i$ and assigning each of the other sums to their own coordinate, so

$$\mathbb{E}\|\tilde S\|_p \ge \mathbb{E}\Big(\sum_{i,\,w\ge 1}(\tilde S_i^w)^p\Big)^{1/p}. \qquad (9)$$
We now lower bound the right-hand side using Theorem 3. First, using Lemma 2 we have $\mathbb{E}(\tilde S_i^w)^p \ge (\frac{\varepsilon_i}{10})^p = \frac{Ø^p}{10^p}\cdot\frac{1}{v_i}$. By scaling the $\tilde S_i^w$'s down if necessary, assume this holds at equality: $\mathbb{E}(\tilde S_i^w)^p = \frac{Ø^p}{10^p}\cdot\frac{1}{v_i}$. Adding over all i and w ≥ 1,

$$\sum_{i,\,w\ge 1}\mathbb{E}(\tilde S_i^w)^p = \frac{Ø^p}{10^p}\sum_i \frac{k_i}{v_i}. \qquad (10)$$

Moreover, since from each machine we discarded only the exceptional sum, of $\tilde\nu_{\varepsilon_i}$-mass at most 1,

$$\sum_i \frac{1}{v_i}\sum_{w=1}^{k_i}\tilde\nu_{\varepsilon_i}(\tilde S_i^w) \ge \sum_i \frac{1}{v_i}\Big(\sum_j \nu_{\varepsilon_i}(\tilde X_{i,j}) - 1\Big) > 2.$$

Since the $\tilde\nu_{\varepsilon_i}$'s in the left-hand side are at most 2, this implies that $\sum_i \frac{k_i}{v_i} > 1$. So applying this to (10) we get

$$\sum_{i,\,w\ge 1}\mathbb{E}(\tilde S_i^w)^p \ge \frac{Ø^p}{10^p}.$$

Using (9) and recalling that $S = 44\cdot\tilde S$, we get $\mathbb{E}\|S\|_p > Ø$, which contradicts the assumption $\mathbb{E}\|S\|_p \le Ø$. This concludes the proof.
Converse bound. Crucially, we need a converse to the previous theorem: if inequalities (7) are satisfied, then the ℓp-norm of the loads is at most O(Ø). Indeed, one can show the following (with the additional control of the ℓ∞-norm), concluding that $\mathbb{E}\|S\|_p \le O(Ø)$.

We sketch a proof under simplifying assumptions (in which case we do not even need the condition $\mathbb{E}\|S\|_\infty \le O(Ø)$); while we will actually require a modified version of this theorem, the simplified proof is helpful to provide intuition.
Proof idea of Theorem 6. Assume the following slightly stronger version of (7) holds for all sequences $(v_i)_i$: $\sum_i \frac{1}{v_i}\sum_j \nu_{Ø/v_i^{1/p}}(\tilde X_{i,j}) \le 3$. Applying this to the sequence $(\bar v_i)_i$ where $\bar v_i$ is such that $\sum_j \nu_{Ø/\bar v_i^{1/p}}(\tilde X_{i,j}) \approx 1$, we get $\sum_i \frac{1}{\bar v_i} \lesssim 3$. By Theorem 3, $\frac{Ø^p}{\bar v_i} \approx \mathbb{E}\tilde S_i^p$, and so we get $\sum_i \mathbb{E}\tilde S_i^p \lesssim 3\,Ø^p$, and inequality (4) then gives $\mathbb{E}\|S\|_p \le O(Ø)$, concluding the proof.
The issue with this theorem is that it will be hard to satisfy inequality (7) for all the allowed sequences $(v_i)_i$ later when we round our linear program. However, note that in the proof of this theorem we only needed this inequality to hold for a single sequence $(\bar v_i)_i$ with specific properties, which will be easier to achieve. We will abstract out the properties needed. Actually, for technical reasons (controlling the size of the coefficients in the rounding phase of our algorithm) we will need to work with a capped version of ν, namely $\nu^+_\varepsilon(X) := \min\{\nu_\varepsilon(X),\,1\}$. In order to offset the loss introduced by this capping, we will also need a "coarse control" of the random variables (the result below holds without this coarse control if one uses ν instead of ν⁺). Following [GKNS18], we will also use the effective size (2) to control the ℓ∞-norm. This is then our main converse bound, whose proof is deferred to Appendix C.
Theorem 7. Consider a scalar Ø and a sufficiently small constant α. Consider independent random variables $\{X_{i,j}\}_{i,j}$ in [0, αØ], and let $S_i = \sum_j X_{i,j}$ and $\tilde X_{i,j} = \frac{X_{i,j}}{44}$. Suppose the following hold:

1. (ℓp control) There exists an integer sequence $\bar v_1,\dots,\bar v_m \ge (1/\alpha)^p$ such that for each i, either $\sum_j \nu^+_{Ø/\bar v_i^{1/p}}(\tilde X_{i,j}) \le 10$ or $\bar v_i = (1/\alpha)^p$, and $\sum_i \frac{1}{\bar v_i} \le 5$.

2. (coarse control) $\sum_{i,j}\mathbb{E}X_{i,j}^p \le O(Ø)^p$.

3. (ℓ∞ control) There exists a sequence $\bar\ell_1,\dots,\bar\ell_m \in [m]$ such that $\sum_j \beta_{\bar\ell_i}(X_{i,j}/Ø) \le \gamma$ for all i, for some constant γ, and for each ℓ ∈ [m] at most ℓ of the i's have $\bar\ell_i = \ell$.

Then $\mathbb{E}\|S\|_p \le O(Ø)$.
4 Stochastic ℓp Load Balancing: Algorithm and analysis

In this section we prove Theorem 2, namely we give a constant-factor approximation for StochLoadBal_p (please recall the definition of StochLoadBal_p from Section 1).
Let OPT denote the smallest expected ℓp-load (1) over all assignments of jobs to machines. The development of the algorithm mirrors that of the previous section and proceeds in 3 steps:
1. First we write an LP that essentially captures constraints (7) in a fractional way, which from Theorem 5 we know to hold (after some truncation) for the optimal assignment (we also include a control on the ℓ∞-norm using exponentially many constraints, as well as the coarse control guaranteed by Corollary 1).
2. Then, based on a fractional solution x̄ of this LP, we write a reduced LP that is feasible (a crucial point) and imposes the requirements of Theorem 7 in a fractional way. This reduces the exponentially many inequalities for ℓp (and ℓ∞) control to just one inequality per machine, by selecting the right $\bar v_i$'s (and $\bar\ell_i$'s) based on x̄; this is done using the ideas in the proof sketch of Theorem 6.
3. Since this reduced LP is much more structured and has fewer constraints, we can use an algorithm for the Generalized Assignment Problem to find an integral approximate solution. Then from Theorem 7 the corresponding assignment has expected ℓp-load O(OPT).
We make some simplifying assumptions. We consider the case p ∈ (1, ∞), since the case p = 1 is trivial (just assign job j to the machine i that gives the smallest expected job size $\mathbb{E}Y_{ij}$) and the case p = ∞ was solved in [GKNS18]. By using a standard binary search argument we assume throughout that we have an estimate of the optimal value OPT within a factor of 2 (i.e., if our starting LP is feasible we reduce the current estimate of OPT, and if it is infeasible we increase it). In fact, to simplify the notation we assume we know OPT exactly: the error in the estimation translates directly into the constants in the approximation factor.
4.1 Starting LP
As in [KRT00, GKNS18], we split each job into its truncated and exceptional parts: Let α be a sufficiently small constant (with 1/α integral, to simplify things); we then define the truncated part $Y'_{ij} = Y_{ij}\cdot\mathbf{1}(Y_{ij} \le \alpha\,\mathrm{OPT})$ and the exceptional part $Y''_{ij} = Y_{ij}\cdot\mathbf{1}(Y_{ij} > \alpha\,\mathrm{OPT})$, where $\mathbf{1}(E)$ is the indicator of the event E (notice $Y_{ij} = Y'_{ij} + Y''_{ij}$).
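For the explicitly given discrete distributions of Section 1.2, this split is a pointwise relabeling of the support; a minimal sketch (names ours):

```python
def split_job(probs, values, alpha, opt):
    """Split a discrete job-size distribution Y into its truncated part
    Y' = Y * 1(Y <= alpha*OPT) and exceptional part Y'' = Y * 1(Y > alpha*OPT);
    pointwise Y = Y' + Y'' since exactly one of the two parts is nonzero."""
    thresh = alpha * opt
    truncated = [(q, v if v <= thresh else 0.0) for q, v in zip(probs, values)]
    exceptional = [(q, v if v > thresh else 0.0) for q, v in zip(probs, values)]
    return truncated, exceptional

# Size 1 w.p. 0.9 and size 10 w.p. 0.1, with alpha*OPT = 5:
print(split_job([0.9, 0.1], [1.0, 10.0], alpha=0.5, opt=10.0))
```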
Our LP, with variable $x_{ij}$ denoting the amount of job j assigned to machine i, is then the following (as before we use tildes to denote the scaling $\tilde Y'_{ij} := Y'_{ij}/44$):

$$\sum_{i,j}(\mathbb{E}Y''_{ij})\,x_{ij} \le 2\,\mathrm{OPT} \qquad (11)$$

$$\sum_i \frac{1}{v_i}\Big(\sum_j \nu^+_{\mathrm{OPT}/v_i^{1/p}}(\tilde Y'_{ij})\,x_{ij} - 1\Big) \le 3 \qquad \forall\, v_i \in \{1/\alpha^p,\dots,m\},\ \forall i\in[m] \qquad (12)$$

$$\sum_{i\in K}\sum_j \beta_k(Y'_{ij}/\mathrm{OPT})\,x_{ij} \le C\cdot k \qquad \forall K\subseteq[m] \text{ with } |K| = k,\ \forall k\in[m] \qquad (13)$$

$$\sum_{i,j}\mathbb{E}(Y'_{ij})^p\,x_{ij} \le (4\,\mathrm{OPT})^p \qquad (14)$$

$$x \in \text{assignment polytope}, \qquad (15)$$
where C is a sufficiently large constant, and the assignment polytope is the standard one $\{x \in [0,1]^{n\times m} : \sum_i x_{ij} = 1\ \forall j\}$. Constraint (11) is borrowed from [GKNS18] and controls the contribution of the exceptional parts to the ℓp-norm. Constraints (12) capture a weakened version of the bounds guaranteed by Theorem 5 (notice ν⁺ ≤ ν); as mentioned earlier, what we gain from this weakening is better control on the size of the coefficients, important for the rounding step. Constraint (13) is also from [GKNS18] and controls the ℓ∞-norm of the truncated part. Constraint (14) imposes the bound guaranteed by Corollary 1 and is only required to control the loss incurred by using the capped quantity ν⁺ instead of ν.

Lemma 2.3 of [GKNS18] shows that the optimal (integral) assignment satisfies constraints (11) and (13) (notice that since $\|\cdot\|_\infty \le \|\cdot\|_p$, the loads of the optimal solution satisfy $\mathbb{E}\|S\|_\infty \le \mathrm{OPT}$). Applying Theorem 5 and Corollary 1 with $\{X_{ij}\}_j$ representing the truncated parts of the items assigned to machine i by this solution and with Ø = OPT, we see that constraints (12) and (14) are also satisfied by the optimal solution. Therefore, the LP is feasible.
About solving it in polynomial time: Notice that we can express the inequalities (12) by introducing an auxiliary variable $z_i$ with

$$z_i \ge \frac{1}{v_i}\Big(\sum_j \nu^+_{\mathrm{OPT}/v_i^{1/p}}(\tilde Y'_{ij})\,x_{ij} - 1\Big) \qquad \forall\, v_i \in \{1/\alpha^p,\dots,m\},$$

and replacing constraint (12) by just $\sum_i z_i \le 3$. Thus, we can capture all constraints except (13) with a poly-sized formulation. Since it is easy to see that we can separate inequalities (13) in poly-time (see Section 2.3 of [GKNS18]), we can use the ellipsoid method to solve the LP in polynomial time.
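To illustrate the poly-sized reformulation, here is a hedged sketch that assembles the (x, z) LP for a toy instance with scipy. It only includes constraints (12) (via the $z_i$'s) and the assignment constraints, omits (11), (13) (which needs the separation oracle) and (14), and skips the 1/44 tilde scaling; all names and the toy data are ours.

```python
import numpy as np
from scipy.optimize import linprog

def nu_plus(probs, values, p, eps):
    # capped functional nu^+ = min{nu, 1}, with nu from Definition 1
    nu = np.log(sum(q * (1 + v / eps) ** p for q, v in zip(probs, values))) / p
    return min(nu, 1.0)

def build_and_solve(jobs, m, p, opt, alpha):
    # Variables: x_{ij} (index i*n + j), then z_1..z_m. For each machine i
    # and each v in {1/alpha^p, ..., m} we add
    #   (1/v) sum_j nu^+ x_{ij} - z_i <= 1/v,
    # i.e. z_i >= (1/v)(sum_j nu^+ x_{ij} - 1), plus sum_i z_i <= 3 and the
    # assignment constraints sum_i x_{ij} = 1 (job sizes machine-independent
    # in this toy).
    n = len(jobs)
    nx = m * n
    A_ub, b_ub = [], []
    for i in range(m):
        for v in range(int(1 / alpha ** p), m + 1):
            row = np.zeros(nx + m)
            eps = opt / v ** (1 / p)
            for j, (probs, values) in enumerate(jobs):
                row[i * n + j] = nu_plus(probs, values, p, eps) / v
            row[nx + i] = -1.0
            A_ub.append(row); b_ub.append(1.0 / v)
    row = np.zeros(nx + m); row[nx:] = 1.0
    A_ub.append(row); b_ub.append(3.0)          # sum_i z_i <= 3
    A_eq = np.zeros((n, nx + m))
    for j in range(n):
        A_eq[j, [i * n + j for i in range(m)]] = 1.0
    return linprog(np.zeros(nx + m), A_ub=np.array(A_ub), b_ub=b_ub,
                   A_eq=A_eq, b_eq=np.ones(n),
                   bounds=[(0, 1)] * nx + [(0, None)] * m)

# Toy: 4 identical Bernoulli jobs (size 1 w.p. 1/2), 2 machines, alpha = 1.
res = build_and_solve([([0.5, 0.5], [0.0, 1.0])] * 4, m=2, p=2, opt=2.0, alpha=1.0)
print(res.status, np.round(res.x, 3))
```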
Summarizing this discussion we have the following.
2. Let $\bar\ell_i \in [m]$ be the largest value such that $\sum_j \beta_{\bar\ell_i}(Y'_{ij}/\mathrm{OPT})\,\bar x_{ij} \le C$, where C is the constant in constraints (13) (such an $\bar\ell_i$ exists since constraint (13) implies that setting it to 1 satisfies this inequality).
The reduced LP then becomes:

$$\sum_{i,j}(\mathbb{E}Y''_{ij})\,x_{ij} \le 2\,\mathrm{OPT} \qquad (16)$$

$$\sum_j \nu^+_{\mathrm{OPT}/\bar v_i^{1/p}}(\tilde Y'_{ij})\,x_{ij} \le 2 \qquad \forall i \in I \qquad (17)$$

$$\sum_j \beta_{\bar\ell_i}(Y'_{ij}/\mathrm{OPT})\,x_{ij} \le 1 \qquad \forall i \qquad (18)$$

$$\sum_{i,j}\mathbb{E}(Y'_{ij})^p\,x_{ij} \le (4\,\mathrm{OPT})^p \qquad (19)$$

$$x \in \text{assignment polytope}. \qquad (20)$$
In addition, any integral assignment approximately satisfying constraints (17)-(19) fulfills the requirements of Theorem 7 for the truncated parts, and thus we can control their expected ℓp-norm.

Lemma 5. Consider an integral assignment $x \in \{0,1\}^{n\times m}$ satisfying constraints (17)-(19) within a multiplicative factor of 5. For each i, let $\{X_{i,j}\}_j = \{Y'_{ij}x_{ij}\}_j$ (i.e., the truncated parts of the jobs assigned to machine i). Then $\{X_{i,j}\}_{i,j}$, $\{\bar v_i\}_i$, and $\{\bar\ell_i\}_i$ satisfy the requirements of Theorem 7 with Ø = OPT.

In particular, letting $S'_i = \sum_j Y'_{ij}x_{ij}$ be the load incurred on machine i by the truncated sizes of the jobs assigned to it, we have $\mathbb{E}\|S'\|_p \le O(\mathrm{OPT})$.
Proof. The second part of the lemma follows directly from Theorem 7, so we prove the first part. First, from the definition of the truncation we have $X_{i,j} \le \alpha\,\mathrm{OPT}$. We show that Item 1 (ℓp control) in Theorem 7 holds; since x satisfies constraints (17) within a multiplicative factor of 5, and by the choice of the $\bar v_i$'s, it suffices to show $\sum_i \frac{1}{\bar v_i} \le 5$.

We partition the indices i into 2 sets, depending on whether $\bar v_i$ hit the upper bound m or not: $U_{<m} = \{i \in [m] : \bar v_i < m\}$ and $U_m = \{i \in [m] : \bar v_i = m\}$. By definition $\sum_{i\in U_m}\frac{1}{\bar v_i} \le m\cdot\frac{1}{m} = 1$. For an index $i \in U_{<m}$, by maximality of $\bar v_i$ we have that $\bar v_i + 1$ satisfies

$$V_i := \sum_j \nu^+_{\mathrm{OPT}/(\bar v_i+1)^{1/p}}(\tilde Y'_{ij})\,\bar x_{ij} > 2,$$

and hence $V_i - 1 \ge 1$. But since x̄ satisfies constraints (12), we have $\sum_{i\in U_{<m}}\frac{1}{\bar v_i+1}(V_i - 1) \le 3$, and hence $\sum_{i\in U_{<m}}\frac{1}{\bar v_i+1} \le 3$. Finally, since $\bar v_i \ge (1/\alpha)^p \ge 100$ (as α is a sufficiently small constant), we have $\frac{1}{\bar v_i} \le 1.01\cdot\frac{1}{\bar v_i+1}$ and hence $\sum_{i\in U_{<m}}\frac{1}{\bar v_i} \le 4$. This shows $\sum_i \frac{1}{\bar v_i} \le 5$.
Item 2 (coarse control) in Theorem 7 is directly enforced by constraint (19). To show that Item 3 (ℓ∞ control) in Theorem 7 holds, we just need that for all ℓ ∈ [m], at most ℓ of the i's have $\bar\ell_i = \ell$. Since this is clearly true for ℓ = m, consider ℓ < m and suppose by contradiction that there is a set $K \subseteq [m]$ of size ℓ + 1 such that $\bar\ell_i = \ell$ for all $i \in K$. By maximality of $\bar\ell_i$, for all $i \in K$ we have $\sum_j \beta_{\ell+1}(Y'_{ij}/\mathrm{OPT})\,\bar x_{ij} > C$; adding this over all $i \in K$ and using that x̄ satisfies constraint (13) for K, we have

$$C\cdot(\ell+1) < \sum_{i\in K}\sum_j \beta_{\ell+1}(Y'_{ij}/\mathrm{OPT})\,\bar x_{ij} \overset{(13)}{\le} C\cdot(\ell+1),$$

a contradiction.
Since the total size of a job equals its truncated plus its exceptional part, the previous lemmas and the triangle inequality give the following.

Corollary 2. Consider an integral assignment $x \in \{0,1\}^{n\times m}$ satisfying constraints (16)-(19) within a factor of 5. Then letting $S_i = \sum_j Y_{ij}x_{ij}$ be the load incurred on machine i, we have $\mathbb{E}\|S\|_p \le O(\mathrm{OPT})$.
Shmoys and Tardos [ST93] designed an algorithm that, given any fractional solution to the above program, produces an integral assignment that satisfies (22) exactly, and satisfies constraints (21) with the RHSs increased to $A_i + \max_j a_{ij}$.

Notice that the reduced LP (16)-(20) is essentially an instance of GAP: the difference is that we have 2 cost-type constraints and 2 makespan-type constraints for some machines. But we can simply combine the inequalities of the same type to obtain a GAP instance: add $\frac{1}{2\,\mathrm{OPT}}$ of inequality (16) to $\frac{1}{(4\,\mathrm{OPT})^p}$ of inequality (19) to form a single cost constraint with RHS 2, and add $\frac{1}{2}$ of inequality (17) to (18) for each $i \in I$ to obtain a single makespan constraint with RHS 2 (for $i \notin I$ just keep the makespan constraint (18)).

Since this GAP instance is a relaxation of the LP (16)-(20), it is also feasible. Thus, consider any fractional solution to this GAP instance and let x̃ be the integral assignment produced by the Shmoys-Tardos algorithm [ST93].
Lemma 6. The integral assignment x̃ satisfies all the constraints (16)-(19) within a factor of 4.

Proof. Notice that for this GAP instance $A_i \ge 1$ and

$$\max_j a_{ij} \le \max\Big\{\nu^+_{\mathrm{OPT}/\bar v_i^{1/p}}(\tilde Y'_{ij})\,,\ \beta_{\bar\ell_i}(Y'_{ij}/\mathrm{OPT})\Big\};$$

by construction $\nu^+_\varepsilon \le 1$ (this is the only motivation for introducing this capped version of ν) and, since the truncated sizes have $Y'_{ij} \le \mathrm{OPT}$,

$$\beta_{\bar\ell_i}(Y'_{ij}/\mathrm{OPT}) = \frac{1}{\ln\bar\ell_i}\,\ln\,\mathbb{E}\exp\Big(\frac{Y'_{ij}}{\mathrm{OPT}}\,\ln\bar\ell_i\Big) \le \frac{1}{\ln\bar\ell_i}\,\ln e^{\ln\bar\ell_i} = 1,$$

and so $\max_j a_{ij} \le 1$ and hence $A_i + \max_j a_{ij} \le 2A_i$ for all i. Thus, by the guarantees of [ST93], x̃ satisfies constraints (21) and (22) within a multiplicative factor of 2. The fact that all the coefficients are non-negative then implies that x̃ satisfies the disaggregated constraints within a multiplicative factor of 4 (i.e., apply that for non-negative $u_{ij}$'s and $v_{ij}$'s, $\sum_{ij}(u_{ij}+v_{ij})\tilde x_{ij} \le 2\cdot 2$ implies $\sum_{ij}u_{ij}\tilde x_{ij} \le 4$, and the same for the $v_{ij}$'s). This concludes the proof.
Then from Corollary 2 the assignment x̃ has expected ℓp-load at most O(OPT). This proves Theorem 2.
References

[AAG+95] B. Awerbuch, Y. Azar, E. F. Grove, M.-Y. Kao, P. Krishnan, and J. S. Vitter. Load balancing in the ℓp norm. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS), pages 383-391, 1995.

[AAS01] A. Avidor, Y. Azar, and J. Sgall. Ancient and new algorithms for load balancing in the ℓp norm. Algorithmica, 29(3):422-441, 2001.

[AE05] Yossi Azar and Amir Epstein. Convex programming for scheduling unrelated parallel machines. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pages 331-337, 2005.

[AERW04] Yossi Azar, Leah Epstein, Yossi Richter, and Gerhard J. Woeginger. All-norm approximation algorithms. Journal of Algorithms, 52(2):120-133, 2004.

[BCK00] Piotr Berman, Moses Charikar, and Marek Karpinski. On-line load balancing for related machines. Journal of Algorithms, 35(1):108-121, 2000.

[Car08] Ioannis Caragiannis. Better bounds for online load balancing on unrelated machines. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 972-981, 2008.

[CC76] R. A. Cody and E. G. Coffman, Jr. Record allocation for minimizing expected retrieval costs on drum-like storage devices. Journal of the ACM, 23(1):103-115, 1976.

[CFK+11] Ioannis Caragiannis, Michele Flammini, Christos Kaklamanis, Panagiotis Kanellopoulos, and Luca Moscardelli. Tight bounds for selfish and greedy load balancing. Algorithmica, 61(3):606-637, 2011.

[CW75] Ashok K. Chandra and C. K. Wong. Worst-case analysis of a placement algorithm related to storage allocation. SIAM Journal on Computing, 4(3):249-263, 1975.

[GI99] A. Goel and P. Indyk. Stochastic load balancing and related problems. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS), pages 579-586, 1999.

[GKNS18] Anupam Gupta, Amit Kumar, Viswanath Nagarajan, and Xiangkun Shen. Stochastic load balancing on unrelated machines. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1274-1285, 2018.

[HMS01] Paweł Hitczenko and Stephen Montgomery-Smith. Measuring the magnitude of sums of independent random variables. Annals of Probability, 29(1):447-466, 2001.

[Hui88] J. Y. Hui. Resource allocation for broadband networks. IEEE Journal on Selected Areas in Communications, 6(9):1598-1608, 1988.

[KMPS09] V. S. Anil Kumar, Madhav V. Marathe, Srinivasan Parthasarathy, and Aravind Srinivasan. A unified approach to scheduling on unrelated parallel machines. Journal of the ACM, 56(5):28:1-28:31, 2009.

[KRT00] J. Kleinberg, Y. Rabani, and É. Tardos. Allocating bandwidth for bursty connections. SIAM Journal on Computing, 30(1):191-217, 2000.

[Lat97] Rafał Latała. Estimation of moments of sums of independent real random variables. Annals of Probability, 25(3):1502-1513, 1997.

[LST90] Jan Karel Lenstra, David B. Shmoys, and Éva Tardos. Approximation algorithms for scheduling unrelated parallel machines. Mathematical Programming, 46(1):259-271, 1990.

[Mol17] Marco Molinaro. Online and random-order load balancing simultaneously. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1638-1650, 2017.

[MS14] Konstantin Makarychev and Maxim Sviridenko. Solving optimization problems with diseconomies of scale via decoupling. In Proceedings of the 55th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 571-580, 2014.

[Pin04] Michael Pinedo. Offline deterministic scheduling, stochastic scheduling, and online deterministic scheduling. In Joseph Y-T. Leung, editor, Handbook of Scheduling: Algorithms, Models, and Performance Analysis. CRC Press, 2004.

[PnG99] Victor de la Peña and Evarist Giné. Decoupling: From Dependence to Independence. Springer-Verlag, New York, 1999.

[ST93] David B. Shmoys and Éva Tardos. An approximation algorithm for the generalized assignment problem. Mathematical Programming, 62(1):461-474, 1993.
A Proof of Theorem 1
Consider an instance of SubsetSelection_p, and suppose we have an α-approximate linear optimization oracle over $\mathcal{P}$, for a constant α. Let $x^*$ be an optimal solution for SubsetSelection_p, let $S = \sum_{j : x^*_j = 1} V_j$, and let $\mathrm{OPT} = (\mathbb{E}S^p)^{1/p}$ be the value of this optimal solution. By a standard binary search argument, assume OPT is known within a constant factor (i.e., if (24) is feasible we increase the estimate of OPT, if it is infeasible we decrease it). In fact, to simplify the notation we assume we know OPT exactly: the error in the estimation translates directly into the constants in the approximation factor.

Define $\bar\varepsilon = \frac{\mathrm{OPT}}{e^{1/\alpha}}$. From Lemma 1 we have that $\sum_{j : x^*_j = 1}\nu_{\bar\varepsilon}(V_j) \ge \frac{1}{\alpha}$. Therefore, the optimal solution $x^*$ is feasible for the program

$$\sum_j \nu_{\bar\varepsilon}(V_j)\,x_j \ge \frac{1}{\alpha} \qquad (24)$$
$$x \in \mathcal{P}.$$

Then use the α-approximate linear optimization oracle over $\mathcal{P}$ to find a solution $\bar x \in \mathcal{P}$ satisfying $\sum_j \nu_{\bar\varepsilon}(V_j)\,\bar x_j \ge 1$ (if we cannot find one, increase the estimate of OPT). Lemma 2 then implies that $\mathbb{E}(\sum_j V_j \bar x_j)^p \ge (\frac{\bar\varepsilon}{10})^p = \frac{\mathrm{OPT}^p}{(10\,e^{1/\alpha})^p}$; since α is a constant, this implies that $(\mathbb{E}(\sum_j V_j\bar x_j)^p)^{1/p} \ge \Omega(\mathrm{OPT})$ and concludes the proof of Theorem 1.
B Proof of Corollary 1
We prove the contrapositive: assume $\sum_{i,j}\mathbb{E}X_{i,j}^p > (4Ø)^p$; we want to show $\mathbb{E}\|S\|_p > Ø$. Notice the total load $\|S\|_p$ is at least the load of putting each of the $X_{i,j}$'s in their own coordinate, namely
C Proof of Theorem 7
Let

$$I := \Big\{\,i : \sum_j \nu^+_{Ø/\bar v_i^{1/p}}(\tilde X_{i,j}) \le 10\,\Big\}$$

and let $I^c = [m]\setminus I$ be its complement. Also, for each coordinate i, let $J_i = \{\,j : \nu^+_{Ø/\bar v_i^{1/p}}(\tilde X_{i,j}) = \nu_{Ø/\bar v_i^{1/p}}(\tilde X_{i,j})\,\}$ be the set of j's where the capping of ν did not make a difference, and let $J_i^c$ be its complement. Let $S^J$ be the vector with coordinates $S_i^J = \sum_{j\in J_i}X_{i,j}$ (so only contributions from j's in $J_i$), and let $S^{J^c} = S - S^J$ be the other contributions. We will break up S as $S = S_I^J + S_I^{J^c} + S_{I^c}$. From the triangle inequality it suffices to show that the expected ℓp-norm of each term is at most O(Ø).

We start with $S_I^J$. By definition of I and $J_i$, for each $i \in I$ we have

$$\sum_{j\in J_i}\nu_{Ø/\bar v_i^{1/p}}(\tilde X_{i,j}) = \sum_{j\in J_i}\nu^+_{Ø/\bar v_i^{1/p}}(\tilde X_{i,j}) \le 10,$$
and so Lemma 1 gives that $\mathbb{E}(\tilde S_i^J)^p \le \frac{Ø^p}{\bar v_i}\,e^{10p}$, where $\tilde S_i^J := \frac{S_i^J}{44}$ as usual. Moreover, since the sequence satisfies $\sum_i \frac{1}{\bar v_i} \le 5$ and all terms are positive, in particular we have $\sum_{i\in I}\frac{1}{\bar v_i} \le 5$; so adding over all $i \in I$ we have

$$\sum_{i\in I}\mathbb{E}(\tilde S_i^J)^p \le Ø^p e^{10p}\sum_{i\in I}\frac{1}{\bar v_i} \le 5\,Ø^p e^{10p}.$$

Then the Jensen's-based inequality (4) gives $\mathbb{E}\|\tilde S_I^J\|_p \le O(Ø)$, and hence $\mathbb{E}\|S_I^J\|_p \le O(Ø)$.
Now we upper bound $\mathbb{E}\|S_I^{J^c}\|_p$. Since for each $j \in J_i^c$ the capping of ν kicked in, we have $\nu^+_{Ø/\bar v_i^{1/p}}(\tilde X_{i,j}) = 1$. Thus, by definition of I, for $i \in I$ there can be at most 10 elements in $J_i^c$. Employing the elementary inequality

$$\Big(\sum_{u\in U}u\Big)^p \le \Big(|U|\,\max_{u\in U}u\Big)^p \le |U|^p\sum_{u\in U}u^p,$$

which holds for any set U of non-negative numbers, we obtain $\mathbb{E}(S_i^{J^c})^p \le |J_i^c|^p\sum_{j\in J_i^c}\mathbb{E}X_{i,j}^p \le 10^p\sum_{j\in J_i^c}\mathbb{E}X_{i,j}^p$ for each $i \in I$. Adding over all $i \in I$ and using the "coarse control" Item 2 of the lemma, we have $\sum_{i\in I}\mathbb{E}(S_i^{J^c})^p \le 10^p\,O(Ø)^p$. Again inequality (4) gives that $\mathbb{E}\|S_I^{J^c}\|_p \le O(Ø)$.
Finally we control $S_{I^c}$. First, there are not too many coordinates in $I^c$: again we have $\sum_{i\in I^c}\frac{1}{\bar v_i} \le 5$, and since for such i's $\bar v_i = (1/\alpha)^p$, this implies that $|I^c| \le 5/\alpha^p$. Moreover, because of Item 3 of the lemma (ℓ∞ control), the arguments from Lemma 2.4 of [GKNS18] show that $\mathbb{E}\|S_{I^c}\|_\infty \le O(Ø)$ (for completeness we provide a proof in Appendix D). Finally, from ℓp-ℓ∞ comparison (i.e., for any vector $x \in \mathbb{R}^d$, $\|x\|_p \le d^{1/p}\|x\|_\infty$ holds) we have $\|S_{I^c}\|_p \le |I^c|^{1/p}\|S_{I^c}\|_\infty$, and thus $\mathbb{E}\|S_{I^c}\|_p \le O(Ø)$. This concludes the proof of the theorem.

To prove this we need the following lemma (Lemma 2.1 of [GKNS18]), which follows directly from the Chernoff-Cramér method for concentration.
Proof of Lemma 7. Let $I = \{i : \ell_i \le 2\}$, and notice that by assumption $|I| \le 3$. Jensen's inequality shows that for any random variable X and any ℓ, $\beta_\ell(X) \ge \mathbb{E}X$. Therefore, our assumption implies

$$\mathbb{E}\|S_I\|_\infty \le \sum_{i\in I}\mathbb{E}S_i \le Ø\cdot\sum_{i\in I}\sum_j \beta_{\ell_i}(X_{i,j}/Ø) \le 3\gamma\,Ø.$$

To upper bound $\mathbb{E}\|S_{I^c}\|_\infty$, where $I^c = [m]\setminus I$, we apply Lemma 8 to $\frac{S_i}{Ø} = \sum_j \frac{X_{i,j}}{Ø}$ to obtain, for all $t \ge 3$,

$$\Pr\Big(\frac{S_i}{Ø} \ge \sum_j \beta_{\ell_i}(X_{i,j}/Ø) + t\Big) \le \ell_i^{-t},$$

and hence by a union bound

$$\Pr\big(\|S_{I^c}\|_\infty \ge Ø\,\gamma + Ø\,t\big) \le \sum_{i\in I^c}\Pr\Big(\frac{S_i}{Ø} \ge \gamma + t\Big) \le \sum_{i\in I^c}\ell_i^{-t} \le \sum_{\ell\ge 3}\ell\cdot\ell^{-t} \le 2^{-t+2},$$

where the third inequality uses our assumption. Using the non-negativity of $\|S_{I^c}\|_\infty$, we integrate the tail to obtain

$$\mathbb{E}\|S_{I^c}\|_\infty = \int_0^\infty \Pr(\|S_{I^c}\|_\infty \ge t)\,dt = \int_0^{Ø(\gamma+3)}\Pr(\|S_{I^c}\|_\infty \ge t)\,dt + Ø\int_3^\infty \Pr(\|S_{I^c}\|_\infty \ge Ø\,\gamma + Ø\,t)\,dt \le Ø\cdot(\gamma+3) + Ø\int_3^\infty 2^{-t+2}\,dt \le O(Ø).$$

Since by the triangle inequality $\|S\|_\infty \le \|S_I\|_\infty + \|S_{I^c}\|_\infty$, putting the above bounds together concludes the proof.