Nonconvex Optimization For Communication Systems
Mung Chiang
Electrical Engineering Department
Princeton University, Princeton, NJ 08544, USA
[email protected]
Summary. Convex optimization has provided both a powerful tool and an intriguing mentality to the analysis and design of communication systems over the last
few years. A main challenge today lies in nonconvex problems in these applications.
This paper presents an overview of some of the important nonconvex optimization
problems in point-to-point and networked communication systems. Three typical applications are covered: Internet congestion control through nonconcave network utility maximization, wireless network power control through geometric and sigmoidal
programming, and DSL spectrum management through distributed nonconvex optimization. A variety of nonconvex optimization techniques are showcased: from
standard dual relaxation to sum-of-squares programming through successive SDP
relaxation, signomial programming through successive GP relaxation, and leveraging
the specific structures in problems for efficient and distributed heuristics.
Key words: Nonconvex optimization, Geometric programming, Semidefinite programming, Sum of squares, Duality, Network utility maximization,
TCP/IP, Wireless network, Power control.
1 Introduction
There have been two major waves in the history of optimization theory: the first started with linear programming and the simplex method in the late 1940s, and the second with convex optimization and the interior-point method in the late 1980s. Each has been followed by a transforming period of an appreciation-application cycle: as more people appreciate the use of LP/convex optimization, more look for such formulations in various applications; then more work is done on the theory, efficient algorithms, and software; the tools become more powerful; and then still more people appreciate their usage. Communication systems have benefited significantly from both waves, including multicommodity flow solutions (e.g., the Bellman-Ford algorithm) from LP, and basic network utility maximization and robust transceiver design from convex optimization.
Much of the current research frontier is about the potential of the third wave, on nonconvex optimization. If one word is used to differentiate between easy and hard problems, convexity is probably the watershed. But if a longer description length is allowed, many useful conclusions can be drawn even for nonconvex optimization. Indeed, convexity is a very disturbing watershed, since it is not a topological invariant under change of variables (e.g., see geometric programming) or higher-dimension embedding (e.g., see the sum-of-squares method). A variety of approaches have been proposed: from nonlinear transformations that turn an apparently nonconvex problem into a convex one, to characterizations of attraction regions and systematic ways of jumping out of a local optimum; from successive convex approximation to dualization; from leveraging specific problem structures (e.g., difference of convex functions, concave minimization, low-rank nonconvexity) to developing more efficient branch-and-bound procedures.
Researchers in communications and networking have been examining nonconvex optimization using domain-specific structures in important problems
in the areas of wireless networking, Internet engineering, and communication
theory. Perhaps four typical topics best illustrate the variety of challenging
issues arising from nonconvex optimization in communication systems:
This chapter overviews the latest results in recent publications about the
first two topics, with a particular focus on showing the connections between
the engineering intuitions about important problems in communication systems and the state-of-the-art algorithms in nonconvex optimization theory.
Indeed, the basic NUM (1) is such a nice optimization problem that
its theoretical and computational properties have been well studied since the
1960s in the field of monotropic programming, e.g., as summarized in [41].
For network rate allocation problems, a dual-based distributed algorithm has
been widely studied (e.g., in [24, 32]), and is summarized below.
Zero duality gap for (1) states that solving the Lagrange dual problem is equivalent to solving the primal problem (1). The Lagrange dual problem is readily derived. We first form the Lagrangian of (1):
$$
L(x, \lambda) = \sum_s U_s(x_s) + \sum_l \lambda_l \Big(c_l - \sum_{s\in S(l)} x_s\Big)
= \sum_s \Big[ U_s(x_s) - \Big(\sum_{l\in L(s)} \lambda_l\Big) x_s \Big] + \sum_l c_l \lambda_l
= \sum_s L_s(x_s, \lambda^s) + \sum_l c_l \lambda_l,
$$
where $\lambda^s = \sum_{l\in L(s)} \lambda_l$. For each source $s$, $L_s(x_s, \lambda^s) = U_s(x_s) - \lambda^s x_s$ only depends on the local rate $x_s$ and the link prices $\lambda_l$ on those links used by source $s$.
The Lagrange dual function $g(\lambda)$ is defined as $L(x, \lambda)$ maximized over $x$. This net-utility maximization can obviously be conducted distributively by each source, as long as the aggregate link price $\lambda^s = \sum_{l\in L(s)} \lambda_l$ is made available to source $s$: source $s$ maximizes the strictly concave function $L_s(x_s, \lambda^s)$ over $x_s$ for the given $\lambda^s$:
$$
x_s^*(\lambda^s) = \operatorname*{argmax}_{x_s}\, \big[ U_s(x_s) - \lambda^s x_s \big], \quad \forall s. \tag{2}
$$
The Lagrange dual problem is
$$
\begin{array}{ll}
\text{minimize} & g(\lambda) \\
\text{subject to} & \lambda \succeq 0,
\end{array} \tag{3}
$$
where the optimization variable is $\lambda$. Any algorithm that finds a pair of primal-dual variables $(x^*, \lambda^*)$ satisfying the KKT optimality conditions solves (1) and its dual problem (3). One possibility is a distributed, iterative subgradient method, which updates the dual variables to solve the dual problem (3):
$$
\lambda_l(t+1) = \Big[ \lambda_l(t) - \alpha(t)\Big( c_l - \sum_{s\in S(l)} x_s(\lambda^s(t)) \Big) \Big]^+, \quad \forall l, \tag{4}
$$
where $t$ is the iteration number and $\alpha(t) > 0$ are step sizes. Certain choices of step sizes, such as $\alpha(t) = \alpha_0/t$, $\alpha_0 > 0$, guarantee that the sequence of dual variables $\lambda(t)$ converges to the dual optimum $\lambda^*$ as $t \rightarrow \infty$. The primal variable $x(\lambda(t))$ also converges to the primal optimal variable $x^*$. For a primal problem that is a convex optimization, the convergence is towards the global optimum.
The sequence of the pair of algorithmic steps (2), (4) forms a canonical distributed algorithm that globally solves the network utility maximization problem (1) and its dual (3), and computes the optimal rates $x^*$ and link prices $\lambda^*$.
Nonconcave Network Utility Maximization
Fig. 1. Some examples of utility functions Us (xs ): it can be concave or sigmoidal
as shown in the graph, or any general nonconcave function. If the bottleneck link
capacity used by the source is small enough, i.e., if the dotted vertical line is pushed
to the left, a sigmoidal utility function effectively becomes a convex utility function.
polynomial utilities) in [15] and this section, are complementary in the study
of distributed rate allocation by nonconcave NUM.
2.2 Global maximization of nonconcave network utility
Sum-of-squares method
We would like to bound the maximum network utility by some $\gamma$ in polynomial time and search for a tight bound. Even without link capacity constraints, maximizing a polynomial is an NP-hard problem, but it can be relaxed into an SDP [46]. This is because testing whether the bounding inequality $\gamma \geq p(x)$ holds, where $p(x)$ is a polynomial of degree $d$ in $n$ variables, is equivalent to testing the nonnegativity of $\gamma - p(x)$, which can in turn be relaxed into testing whether $\gamma - p(x)$ can be written as a sum of squares (SOS): $\gamma - p(x) = \sum_{i=1}^{r} q_i(x)^2$ for some polynomials $q_i$, where the degree of each $q_i$ is at most $d/2$. This is referred to as the SOS relaxation. If a polynomial can be written as a sum of squares, it must be nonnegative, but not vice versa. Conditions under which this relaxation is tight have been studied since Hilbert. Determining whether a sum-of-squares decomposition exists can be formulated as an SDP feasibility problem, and is thus polynomial-time solvable.
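To make the Gram-matrix mechanics concrete, the sketch below hand-codes the SDP feasibility test for a small, standard textbook polynomial (it is not one of the chapter's examples, and it uses cvxpy rather than the SOSTOOLS package discussed next). For $p(x,y) = 2x^4 + 2x^3y - x^2y^2 + 5y^4$ and the monomial basis $z = (x^2, y^2, xy)$, an SOS decomposition exists iff there is a positive semidefinite $Q$ with $p = z^{\mathsf T} Q z$.

```python
import cvxpy as cp
import numpy as np

# Search for a PSD Gram matrix Q with p = z^T Q z, z = [x^2, y^2, x*y].
Q = cp.Variable((3, 3), symmetric=True)
constraints = [
    Q >> 0,                           # Q positive semidefinite
    Q[0, 0] == 2,                     # coefficient of x^4
    Q[1, 1] == 5,                     # coefficient of y^4
    2 * Q[0, 2] == 2,                 # coefficient of x^3 y
    2 * Q[1, 2] == 0,                 # coefficient of x y^3
    Q[2, 2] + 2 * Q[0, 1] == -1,      # coefficient of x^2 y^2
]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)                    # 'optimal' => an SOS certificate exists
print(np.round(Q.value, 3))
```

Any factorization $Q = V^{\mathsf T} V$ of the returned matrix then yields the explicit squares $q_i(x,y)$; SOSTOOLS automates exactly this coefficient matching at scale.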
Constrained nonconcave NUM can be relaxed by a generalization of the
Lagrange duality theory, which involves nonlinear combinations of the constraints instead of linear combinations in the standard duality theory. The key
result is the Positivstellensatz, due to Stengle [48], in real algebraic geometry,
which states that for a system of polynomial inequalities, either there exists
a solution in $\mathbb{R}^n$ or there exists a polynomial which is a certificate that no solution exists. This infeasibility certificate has recently been shown to be computable by an SDP of sufficient size [38, 37], a process that is referred to as
the sum-of-squares method and automated by the software SOSTOOLS [39]
initiated by Parrilo in 2000. For a complete theory and many applications of
SOS methods, see [38] and references therein.
Furthermore, the bound itself can become an optimization variable in
the SDP and can be directly minimized. A nested family of SDP relaxations,
each indexed by the degree of the certificate polynomial, is guaranteed to
produce the exact global maximum. Of course, given that the problem is NP-hard, it is not surprising that the worst-case degree of the certificate (and thus the number of SDP relaxations needed) is exponential in the number of variables. What is interesting is the observation that, in applying SOSTOOLS to nonconcave utility maximization, a very low-order, often the minimum-order, relaxation already produces the globally optimal solution.
Application of SOS method to nonconcave NUM
Using sum-of-squares and the Positivstellensatz, we set up the following problem whose objective value converges to the optimal value of problem (1), where
{Ui } are now general polynomials, as the degree of the polynomials involved
is increased.
$$
\begin{array}{ll}
\text{minimize} & \gamma \\
\text{subject to} & \gamma - \sum_s U_s(x_s) - \sum_l \lambda_l(x)\Big(c_l - \sum_{s\in S(l)} x_s\Big) \\
& \quad - \sum_{j,k} \lambda_{jk}(x)\Big(c_j - \sum_{s\in S(j)} x_s\Big)\Big(c_k - \sum_{s\in S(k)} x_s\Big) \\
& \quad - \cdots - \lambda_{12\cdots L}(x)\Big(c_1 - \sum_{s\in S(1)} x_s\Big)\cdots\Big(c_L - \sum_{s\in S(L)} x_s\Big) \ \text{is SOS}, \\
& \lambda_l(x),\ \lambda_{jk}(x),\ \ldots,\ \lambda_{12\cdots L}(x) \ \text{are SOS},
\end{array} \tag{5}
$$
where $L$ denotes the number of links.
ations. The sufficient rank test checks a rank condition on this moment matrix and recovers (one or several) optimal $x^*$, as discussed in [19].
In summary, we have the following Algorithm for centralized computation
of a globally optimal rate allocation to nonconcave utility maximization, where
the utility functions can be written as or converted into polynomials.
Algorithm 1. Sum-of-squares for nonconcave utility maximization.
1) Formulate the relaxed problem (5) for a given degree D.
2) Use SDP to solve the Dth-order relaxation, which can be conducted using SOSTOOLS [39].
3) If the resulting dual SDP solution satisfies the sufficient rank condition, the Dth-order bound $\gamma^{(D)}$ is the globally optimal network utility, and a corresponding optimal rate vector $x^*$ can be obtained.²
4) Otherwise, increase D to D + 2, i.e., the next higher-order relaxation, and repeat.
In the following subsection, we give examples of the application of SOS
relaxation to nonconcave NUM. We also apply the above sufficient test to check whether the bound is exact and, if so, recover the optimal rate allocation $x^*$ that achieves this tightest bound.
2.3 Numerical Examples and Sigmoidal Utilities
Polynomial utility examples
First, consider quadratic utilities, i.e., $U_s(x_s) = x_s^2$, as a simple case to start with (this can be useful, for example, when the bottleneck link capacity limits sources to the convex region of a sigmoidal utility). We present examples that are typical, in our experience, of the performance of the relaxations.
Example 1. A small illustrative example. Consider the simple 2-link, 3-user network shown in Figure 2, with $c = [1, 2]$. The optimization problem is
Fig. 2. Network topology for example 1.
² Otherwise, $\gamma^{(D)}$ may still be the globally optimal network utility, but it is only provably an upper bound.
$$
\begin{array}{ll}
\text{maximize} & \sum_s x_s^2 \\
\text{subject to} & x_1 + x_2 \leq 1 \\
& x_1 + x_3 \leq 2 \\
& x_1, x_2, x_3 \geq 0.
\end{array} \tag{6}
$$
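For this tiny instance the global optimum can also be confirmed by brute force, since maximizing the convex function $\sum_s x_s^2$ over a polytope attains its optimum at a vertex. The sketch below (an illustrative sanity check, not the SOS method itself) simply enumerates the vertices of the feasible region of (6).

```python
import itertools
import numpy as np

# Feasible region of (6) in the form A x <= b.
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [-1, 0, 0],
              [0, -1, 0],
              [0, 0, -1]], dtype=float)
b = np.array([1, 2, 0, 0, 0], dtype=float)

best_val, best_x = -np.inf, None
for rows in itertools.combinations(range(len(b)), 3):
    Asub = A[list(rows)]
    if abs(np.linalg.det(Asub)) < 1e-9:
        continue                                   # not a vertex-defining set
    x = np.linalg.solve(Asub, b[list(rows)])
    if np.all(A @ x <= b + 1e-9):                  # feasible vertex
        val = float(np.sum(x ** 2))
        if val > best_val:
            best_val, best_x = val, x
print(best_x, best_val)                            # x = [0, 1, 2], objective 5
```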
Fig. 3. Network topology for example 2.
obtain the exact bound $\gamma = 321.11$ and recover an optimal rate allocation. For $n = 30$, $m = 2$, and capacities randomly chosen from $[0, 15]$, it turns out that the $D = 2$ relaxation yields the exact bound $816.95$ and a globally optimal rate allocation.
$$
U_s(x_s) = \frac{1}{1 + e^{-(a_s x_s + b_s)}},
$$
where $\{a_s, b_s\}$ are constant integers. Even though these sigmoidal functions are not polynomials, we show that the problem can be cast as one with polynomial objective and constraints through a change of variables.
Example 4. Sigmoidal utility. Consider the simple 2-link, 3-user example shown in Figure 2, with $a_s = 1$ and $b_s = -5$. The NUM problem is
$$
\begin{array}{ll}
\text{maximize} & \sum_s \dfrac{1}{1 + e^{-(x_s - 5)}} \\
\text{subject to} & x_1 + x_2 \leq c_1 \\
& x_1 + x_3 \leq c_2 \\
& x \geq 0.
\end{array} \tag{8}
$$
Let $y_s = \frac{1}{1 + e^{-(x_s - 5)}}$; then $x_s = -\log(1/y_s - 1) + 5$. Substituting for $x_1, x_2$ in the first constraint, rearranging terms and taking exponentials, and then multiplying both sides by $y_1 y_2$ (note that $y_1, y_2 > 0$), we get
$$
(1 - y_1)(1 - y_2) \geq e^{(10 - c_1)}\, y_1 y_2,
$$
which is polynomial in the new variables $y$. This applies to all capacity constraints, and the nonnegativity constraints for $x_s$ translate to $y_s \geq \frac{1}{1 + e^{5}}$. Therefore the whole problem can be written in polynomial form, and SOS methods apply. This transformation renders the problem polynomial for general sigmoidal utility functions, with any $a_s$ and $b_s$.
We present some numerical results using a small illustrative example. Here SOS relaxations of order 4 ($D = 4$) were used. For $c_1 = 4$, $c_2 = 8$, we find $\gamma = 1.228$, which turns out to be a global optimum, with $x^* = [0, 4, 8]$ as the optimal rate vector. For $c_1 = 9$, $c_2 = 10$, we find $\gamma = 1.982$ and $x^* = [0, 9, 10]$. Now placing a weight of 2 on $y_1$, while the other $y_s$ have weight one, we obtain $\gamma = 1.982$ and $x^* = [9, 0, 1]$.
In general, if $a_s \neq 1$ for some $s$, however, the degree of the polynomials in the transformed problem may be very high. If we write the general problem as
$$
\begin{array}{ll}
\text{maximize} & \sum_s \dfrac{1}{1 + e^{-(a_s x_s + b_s)}} \\
\text{subject to} & \sum_{s\in S(l)} x_s \leq c_l, \ \forall l, \\
& x \geq 0,
\end{array} \tag{9}
$$
each capacity constraint after the transformation becomes
$$
\prod_s (1 - y_s)^{r_{ls}\prod_{k\neq s} a_k} \;\geq\; \exp\Big(-\prod_s a_s \Big(c_l + \sum_s r_{ls}\, b_s / a_s\Big)\Big)\ \prod_s y_s^{r_{ls}\prod_{k\neq s} a_k},
$$
where $r_{ls} = 1$ if $l \in L(s)$ and $r_{ls} = 0$ otherwise. Since the product of the $a_s$ appears in the exponents, $a_s > 1$ significantly increases the degree of the polynomials appearing in the problem and hence the dimension of the SDP in the SOS method.
It is therefore also useful to consider alternative representations of sigmoidal functions, such as the following rational function:
$$
U_s(x_s) = \frac{x_s^n}{a + x_s^n},
$$
where the inflection point is $x^0 = \left(\frac{a(n-1)}{n+1}\right)^{1/n}$ and the slope at the inflection point is $U_s'(x^0) = \frac{(n-1)(n+1)}{4n}\left(\frac{n+1}{a(n-1)}\right)^{1/n}$. Let $y_s = U_s(x_s)$; the NUM problem in this case is equivalent to
$$
\begin{array}{ll}
\text{maximize} & \sum_s y_s \\
\text{subject to} & x_s^n y_s - x_s^n + a y_s = 0, \ \forall s, \\
& \sum_{s\in S(l)} x_s \leq c_l, \ \forall l, \\
& x \geq 0.
\end{array} \tag{10}
$$
Alternatively, for NUM with polynomial utilities one can use a relaxation in which the multipliers are nonnegative constants rather than SOS polynomials: find $\gamma$ and $\lambda \succeq 0$ such that
$$
\gamma - \sum_s U_s(x_s) = \sum_{\alpha} \lambda_\alpha \prod_{l=1}^{L}\Big(c_l - \sum_{s\in S(l)} x_s\Big)^{\alpha_l}, \tag{11}
$$
where the optimization variables are $\gamma$ and $\lambda$, and $\alpha$ denotes an ordered set of integers $\{\alpha_l\}$.
Fixing $D$, where $\sum_l \alpha_l \leq D$, and equating the coefficients on the two sides of the equality in (11) yields a linear program (LP). (Note that there are no SOS terms, and therefore no semidefiniteness conditions.) As before, increasing the degree $D$ gives higher-order relaxations and a tighter bound.
We provide a pricing interpretation for problem (11). First, normalize each capacity constraint as $1 - u_l(x) \geq 0$, where $u_l(x) = \sum_{s\in S(l)} x_s / c_l$. We can interpret $u_l(x)$ as link usage, or the probability that link $l$ is used at any given point in time. Then, in (11), we have terms linear in $u$, such as $\lambda_l(1 - u_l(x))$.
Yet another relaxation uses SOS polynomial multipliers but no products of the constraints:
$$
\begin{array}{ll}
\text{minimize} & \gamma \\
\text{subject to} & \gamma - \sum_s U_s(x_s) - \sum_l \lambda_l(x)\Big(c_l - \sum_{s\in S(l)} x_s\Big) \ \text{is SOS}, \\
& \lambda_l(x) \ \text{are SOS}, \ \forall l,
\end{array} \tag{12}
$$
where the optimization variables are the coefficients in $\lambda_l(x)$. Similar to the SOS relaxation (5), fixing the order $D$ of the expression in (12) results in an SDP. This relaxation has the nice property that no product terms of the constraints appear: it becomes exact with a high enough $D$ without the need for product terms. However, this degree might be much higher than what the previous SOS method requires.
2.5 Concluding Remarks and Future Directions
We consider the NUM problem in the presence of inelastic flows, i.e., flows with nonconcave utilities. Despite its practical importance, this problem has not been studied widely, mainly because it is a nonconvex problem. There has been no effective mechanism, centralized or distributed, to compute the globally optimal rate allocation for nonconcave utility maximization.
³ With an extra assumption that always holds for linear constraints, as in NUM problems.
A posynomial is a sum of monomials,
$$
f(x) = \sum_{k=1}^{K} d_k\, x_1^{a_k^{(1)}} x_2^{a_k^{(2)}} \cdots x_n^{a_k^{(n)}},
$$
where $d_k \geq 0$, $k = 1, 2, \ldots, K$, and $a_k^{(j)} \in \mathbb{R}$, $j = 1, 2, \ldots, n$, $k = 1, 2, \ldots, K$. For example, $2x_1^{0.5}x_2^{-1} + 3x_1 x_3^{100}$ is a posynomial in $x$, $x_1 - x_2$ is not a posynomial, and $x_1/x_2$ is a monomial, thus also a posynomial.
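The key property exploited below is that the log change of variables $y = \log x$ turns the log of any posynomial into a convex function of $y$ (compare Figure 5). A quick numerical illustration of this fact for the example posynomial above, via midpoint-convexity sampling, is sketched here; it is an assumed check, not from the text.

```python
import numpy as np

# log f(e^y) for f(x) = 2*x1**0.5/x2 + 3*x1*x3**100 should be convex in y.
def F(y):
    x1, x2, x3 = np.exp(y)
    return np.log(2 * x1 ** 0.5 / x2 + 3 * x1 * x3 ** 100)

rng = np.random.default_rng(1)
for _ in range(10000):
    ya, yb = rng.uniform(-1, 1, size=(2, 3))
    assert F((ya + yb) / 2) <= (F(ya) + F(yb)) / 2 + 1e-9   # midpoint convexity
print("midpoint convexity holds on all sampled pairs")
```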
$$
f_i(x) = \sum_{k=1}^{K_i} d_{ik}\, x_1^{a_{ik}^{(1)}} x_2^{a_{ik}^{(2)}} \cdots x_n^{a_{ik}^{(n)}}, \tag{13}
$$
Fig. 5. A bi-variate posynomial before (left graph) and after (right graph) the log
transformation. A non-convex function is turned into a convex one.
Allowed ranges of the constants, exponents, and variables in GP, PMoP, and SP:

             GP        PMoP      SP
  c          R_+       R         R
  a^(j)      R         Z_+       R
  x_j        R_++      R_++      R_++
Fig. 6. Ratio between two bi-variate posynomials before (left graph) and after (right
graph) the log transformation. It is a non-convex function in both cases.
The SIR for user $i$ is
$$
\mathrm{SIR}_i = \frac{P_i G_{ii} F_{ii}}{\sum_{j\neq i} P_j G_{ij} F_{ij} + n_i}. \tag{15}
$$
The outage probability of user $i$ is
$$
P_{o,i} = 1 - \prod_{j\neq i} \frac{1}{1 + \frac{\mathrm{SIR}_{\rm th} G_{ij} P_j}{G_{ii} P_i}}
$$
[25], which means that the upper bound $P_{o,i} \leq P_{o,i,\max}$ can be written as an upper bound on a posynomial in $\mathbf{P}$:
$$
\prod_{j\neq i}\left(1 + \frac{\mathrm{SIR}_{\rm th} G_{ij} P_j}{G_{ii} P_i}\right) \leq \frac{1}{1 - P_{o,i,\max}}. \tag{17}
$$
user we are optimizing for, must be greater than a common threshold SIR level. In different experiments, this threshold is varied to observe the effect on the optimized user's SIR. This is done independently for the near user at d = 1, a medium-distance user at d = 15, and the far user at d = 20. The results are plotted in Figure 7.
Fig. 7. Optimized SIR versus threshold SIR for the near, medium, and far users.
Several interesting effects are illustrated. First, when the required threshold SIR in the constraints is sufficiently high, there is no feasible power control solution. At moderate threshold SIR, as the threshold is decreased, the optimized SIR initially increases rapidly. This is because the optimized user is allowed to increase its own power by the sum of the power reductions of the four other users, and the noise is relatively insignificant. At low threshold SIR, the noise becomes more significant and the power trade-off from the other users less significant, so the curve starts to bend over. Eventually, the optimized user reaches its upper bound on power and cannot utilize the excess power allowed by the lower threshold SIR for the other users. This is exhibited by the transition from a sharp bend in the curve to a much shallower slope.
We now proceed to show that GP can also be applied to the problem
formulations with an overall system objective of total system throughput,
under both user data rate constraints and outage probability constraints.
The following constrained problem of maximizing system throughput is a
GP:
$$
\begin{array}{ll}
\text{maximize} & R_{\rm system}(\mathbf{P}) \\
\text{subject to} & \text{a combination of the constraints in Table 2, e.g., (a), (d), and (e)},
\end{array} \tag{18}
$$
where the optimization variables are the transmit powers $\mathbf{P}$. Solving (18) also yields sensitivity information on the constraints.
This is because most GP solution algorithms solve both the primal GP and its Lagrange dual problem, and by the complementary slackness condition, a resource constraint is tight at the optimal power allocation when the corresponding optimal dual variable is nonzero.
Table 2. Constraints for GP-based power control.

(a) $R_i \geq R_{i,\min}$ (rate constraint)
(b) $P_{i_1} G_{i_1} = P_{i_2} G_{i_2}$ (near-far constraint)
(c) $\sum_i R_i \geq R_{\rm system,\min}$ (total throughput constraint)
(d) $P_{o,i} \leq P_{o,i,\max}$ (outage probability constraint)
(e) $0 \leq P_i \leq P_{i,\max}$ (power constraint)
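As an illustration of how such a high-SIR power-control GP might be set up numerically, the sketch below maximizes $\sum_i \log\mathrm{SIR}_i$ (equivalently, minimizes the posynomial $\prod_i \mathrm{SIR}_i^{-1}$) subject only to per-user power limits, using cvxpy's geometric-programming mode rather than a dedicated GP solver. The channel gains, noise power, and power limits are illustrative values, and only constraint (e) from the table is included.

```python
import cvxpy as cp
import numpy as np

G = np.array([[1.0, 0.1, 0.2],      # illustrative channel gains G[i, j]
              [0.1, 1.0, 0.1],
              [0.2, 0.1, 1.0]])
noise = 0.5                          # illustrative receiver noise power
P_max = 1.0                          # illustrative per-user power limit

P = cp.Variable(3, pos=True)         # transmit powers (positive GP variables)
objective_expr = 1.0
for i in range(3):
    terms = [G[i, j] * P[j] for j in range(3) if j != i]
    interference = terms[0] + terms[1] + noise
    objective_expr *= interference / (G[i, i] * P[i])   # 1/SIR_i: posynomial / monomial

prob = cp.Problem(cp.Minimize(objective_expr), [P <= P_max])
prob.solve(gp=True)                  # solve as a geometric program
print("optimal powers:", np.round(P.value, 3))
print("max total log-SIR:", -np.log(prob.value))
```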
Extensions
In wireless multihop networks, system throughput may be measured either by
end-to-end transport layer utilities or by link layer aggregate throughput. GP
application to the first approach has appeared in [9], and those to the second
approach in [10]. Furthermore, delay and buffer overflow properties can also
be accommodated in the constraints or objective function of GP-based power
control.
3.4 Power Control by Geometric Programming: Non-convex Case
If we maximize the total throughput Rsystem in the medium to low SIR case,
i.e., when SIR is not much larger than 0dB, the approximation of log(1 + SIR)
as log SIR does not hold. Unlike SIR, which is an inverted posynomial, $1 + \mathrm{SIR}$ is not an inverted posynomial. Instead, $\frac{1}{1+\mathrm{SIR}}$ is a ratio between two posynomials:
$$
\frac{f(\mathbf{P})}{g(\mathbf{P})} = \frac{\sum_{j\neq i} G_{ij} P_j + n_i}{\sum_{j} G_{ij} P_j + n_i}. \tag{19}
$$
$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \leq 1, \ i = 1, 2, \ldots, m,
\end{array} \tag{20}
$$
where $f_0$ is convex without loss of generality⁶, but the $f_i(x)$, $\forall i$, are nonconvex. Since directly solving this problem is NP-hard, we want to solve it by a series of approximations $\tilde f_i(x) \approx f_i(x)$, $\forall x$, each of which can be optimally solved in an easy way. It is known [33] that if the approximations satisfy the following three properties, then the solutions of this series of approximations converge to a point satisfying the necessary Karush-Kuhn-Tucker (KKT) optimality conditions of the original problem:
(1) $f_i(x) \leq \tilde f_i(x)$ for all $x$,
(2) $\tilde f_i(x^0) = f_i(x^0)$, where $x^0$ is the optimal solution of the approximated problem in the previous iteration,
(3) $\nabla \tilde f_i(x^0) = \nabla f_i(x^0)$.
Given a method to approximate each $f_i(x)$ with $\tilde f_i(x)$ around some point of interest $x^0$, the following generic successive approximation algorithm outputs a vector that satisfies the KKT conditions of the original problem.
Algorithm 2. Successive approximation to a nonconvex problem.
1) Choose an initial feasible point $x^{(0)}$ and set $k = 1$.
2) Form the $k$th approximated problem of (20) based on the previous point $x^{(k-1)}$.
3) Solve the $k$th approximated problem to obtain $x^{(k)}$.
4) Increment $k$ and go to step 2 until convergence to a stationary point.
Single condensation method. Complementary GPs involve upper bounds on ratios of posynomials as in (19); they can be turned into GPs by approximating the denominator of the ratio of posynomials, $g(x)$, with a monomial $\tilde g(x)$, but leaving the numerator $f(x)$ as a posynomial.
Lemma 1. Let $g(x) = \sum_i u_i(x)$ be a posynomial. Then
$$
g(x) \geq \tilde g(x) = \prod_i \left(\frac{u_i(x)}{\alpha_i}\right)^{\alpha_i}. \tag{21}
$$
If, in addition, $\alpha_i = u_i(x_0)/g(x_0)$, $\forall i$, for any fixed positive $x_0$, then $\tilde g(x_0) = g(x_0)$, and $\tilde g(x)$ is the best local monomial approximation to $g(x)$ near $x_0$ in the sense of the first-order Taylor approximation.
Proof. The arithmetic-geometric mean inequality states that $\sum_i \alpha_i v_i \geq \prod_i v_i^{\alpha_i}$, where $v \succeq 0$, $\alpha \succeq 0$, and $\mathbf{1}^{\mathsf T}\alpha = 1$. Letting $u_i = \alpha_i v_i$, we can write this basic inequality as $\sum_i u_i \geq \prod_i (u_i/\alpha_i)^{\alpha_i}$. The inequality becomes an equality if we let $\alpha_i = u_i / \sum_i u_i$, $\forall i$, which satisfies the conditions $\alpha \succeq 0$ and $\mathbf{1}^{\mathsf T}\alpha = 1$. It can be readily verified that the best local monomial approximation of $g(x)$ near $x_0$ is $\tilde g(x)$.
Proposition 2. The approximation of a ratio of posynomials $f(x)/g(x)$ with $f(x)/\tilde g(x)$, where $\tilde g(x)$ is the monomial approximation of $g(x)$ using the arithmetic-geometric mean approximation of Lemma 1, satisfies the three conditions for the convergence of the successive approximation method.

Proof. Conditions (1) and (2) are clearly satisfied since $g(x) \geq \tilde g(x)$ and $\tilde g(x_0) = g(x_0)$ (Lemma 1). Condition (3) is easily verified by taking derivatives of $g(x)$ and $\tilde g(x)$.
Double condensation method. Another choice of approximation is to make a double monomial approximation, for both the denominator and the numerator in (19). However, in order to satisfy the three conditions for the convergence of the successive approximation method, a monomial approximation for the numerator $f(x)$ should satisfy $f(x) \leq \tilde f(x)$.
Applications to power control
Figure 8 shows a block diagram of the approach of GP-based power control for the general SIR regime. In the high-SIR regime, we need to solve only one GP. In the medium- to low-SIR regimes, we solve truly nonconvex power control problems, which cannot be turned into a convex formulation, through a series of GPs.
Fig. 8. Block diagram: in the high-SIR regime, the original problem is solved as a single GP; in the medium- to low-SIR regime, the original problem is an SP (complementary GP), which is condensed and solved through a series of GPs.
GP-based power control problems in the medium to low SIR regimes become SP (or, equivalently, Complementary GP), which can be solved by the
single or double condensation method. We focus on the single condensation
method here. Consider a representative problem formulation of maximizing
total system throughput in a cellular wireless network subject to user rate
to different optima over the entire set of experiments, achieving (or coming very close to) the global optimum at 5290 bps 96% of the time and a local optimum at 5060 bps 4% of the time. The average number of GP iterations required by the condensation method over the same set of experiments is 15 if an extremely tight exit condition is picked for the SP condensation iteration: $\epsilon = 1\times 10^{-10}$. This average can be substantially reduced by using a larger $\epsilon$; e.g., increasing $\epsilon$ to $1\times 10^{-2}$ requires on average only 4 GPs.
Fig. 9. Optimized total system throughput (bps) versus experiment index.
We have thus far discussed a power control problem (22) where the objective function needs to be condensed. The method is also applicable if some
constraint functions are signomials and need to be condensed [51].
3.5 Distributed Implementation
A limitation for GP-based power control in ad hoc networks without base
stations is the need for centralized computation (e.g., by interior-point methods). The GP formulations of power control problems can also be solved by a new distributed algorithm for GP. The basic idea is that each
user solves its own local optimization problem and the coupling among users
is taken care of by message passing among the users. Interestingly, the special structure of coupling for the problem at hand (all coupling among the
logical links can be lumped together using interference terms) allows one to
further reduce the amount of message passing among the users. Specifically,
we use a dual decomposition method to decompose a GP into smaller subproblems whose solutions are jointly and iteratively coordinated by the use
of dual variables. The key step is to introduce auxiliary variables and to add
extra equality constraints, thus transferring the coupling in the objective to
coupling in the constraints, which can be solved by introducing consistency
pricing (in contrast to congestion pricing). We illustrate this idea through
an unconstrained GP followed by an application of the technique to power
control.
Distributed algorithm for GP
Suppose we have the following unconstrained standard-form GP in $x \succ 0$:
$$
\text{minimize} \quad \sum_i f_i\big(x_i, \{x_j\}_{j\in I(i)}\big), \tag{23}
$$
where $x_i$ denotes the local variable of the $i$th user, $\{x_j\}_{j\in I(i)}$ denote the coupled variables from other users, and $f_i$ is either a monomial or a posynomial. Making a change of variables $y_i = \log x_i$, $\forall i$, in the original problem, we obtain
$$
\text{minimize} \quad \sum_i f_i\big(e^{y_i}, \{e^{y_j}\}_{j\in I(i)}\big).
$$
We now rewrite the problem by introducing auxiliary variables $y_{ij}$ for the coupled arguments, together with additional equality constraints to enforce consistency:
$$
\begin{array}{ll}
\text{minimize} & \sum_i f_i\big(e^{y_i}, \{e^{y_{ij}}\}_{j\in I(i)}\big) \\
\text{subject to} & y_{ij} = y_j, \ \forall j \in I(i), \ \forall i.
\end{array} \tag{24}
$$
Each $i$th user controls the local variables $(y_i, \{y_{ij}\}_{j\in I(i)})$. Next, the Lagrangian of (24) is formed as
$$
L\big(\{y_i\}, \{y_{ij}\}; \{\gamma_{ij}\}\big) = \sum_i f_i\big(e^{y_i}, \{e^{y_{ij}}\}_{j\in I(i)}\big) + \sum_i \sum_{j\in I(i)} \gamma_{ij}\,(y_j - y_{ij}) = \sum_i L_i\big(y_i, \{y_{ij}\}; \{\gamma_{ij}\}\big),
$$
where
$$
L_i\big(y_i, \{y_{ij}\}; \{\gamma_{ij}\}\big) = f_i\big(e^{y_i}, \{e^{y_{ij}}\}_{j\in I(i)}\big) + \Big(\sum_{j:\, i\in I(j)} \gamma_{ji}\Big) y_i - \sum_{j\in I(i)} \gamma_{ij}\, y_{ij}. \tag{25}
$$
The dual problem is
$$
\max_{\{\gamma_{ij}\}} \ g\big(\{\gamma_{ij}\}\big), \tag{26}
$$
where
$$
g\big(\{\gamma_{ij}\}\big) = \sum_i \min_{y_i, \{y_{ij}\}} L_i\big(y_i, \{y_{ij}\}; \{\gamma_{ij}\}\big).
$$
Note that the transformed primal problem (24) is convex with zero duality gap; hence the Lagrange dual problem indeed solves the original standard GP problem. A simple way to solve the maximization in (26) is with the following subgradient update for the consistency prices:
$$
\gamma_{ij}(t+1) = \gamma_{ij}(t) + \delta(t)\,\big(y_j(t) - y_{ij}(t)\big). \tag{27}
$$
An appropriate choice of the step size $\delta(t) > 0$, e.g., $\delta(t) = \delta_0/t$ for some constant $\delta_0 > 0$, leads to convergence of the dual algorithm.
Summarizing, the $i$th user has to: i) minimize the function $L_i$ in (25), which involves only local variables, upon receiving the updated dual variables $\{\gamma_{ji},\ j: i \in I(j)\}$, and ii) update the local consistency prices $\{\gamma_{ij},\ j \in I(i)\}$ with (27), and broadcast the updated prices to the coupled users.
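The sketch below runs this consistency-price decomposition on a deliberately tiny instance that is not from the chapter: two users minimize $f_1 + f_2$ with $f_1 = x_1 + 1/(x_1 x_2)$ and $f_2 = x_2 + 1/(x_1 x_2)$, so each user's objective depends on the other user's variable. After $y = \log x$, each user keeps a local copy of the other's variable, and the copies are reconciled by the price update (27). A box on $y$ is added so that each local subproblem stays bounded at intermediate prices; this bound, the objective, and the step sizes are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

BOUNDS = [(-3.0, 3.0), (-3.0, 3.0)]   # keeps each local subproblem bounded

def local_min(price_own, price_copy, z0):
    """Minimize L_i = exp(y_i) + exp(-y_i - y_copy) + price_own*y_i - price_copy*y_copy."""
    obj = lambda z: (np.exp(z[0]) + np.exp(-z[0] - z[1])
                     + price_own * z[0] - price_copy * z[1])
    return minimize(obj, z0, method="L-BFGS-B", bounds=BOUNDS).x

g12 = g21 = 0.0                        # consistency prices
z1 = np.zeros(2)                       # user 1's (y_1, y_12)
z2 = np.zeros(2)                       # user 2's (y_2, y_21)
for t in range(1, 301):
    z1 = local_min(g21, g12, z1)       # user 1's local problem, per (25)
    z2 = local_min(g12, g21, z2)
    step = 1.0 / t
    g12 += step * (z2[0] - z1[1])      # price update (27): y_12 tracks y_2
    g21 += step * (z1[0] - z2[1])      # price update (27): y_21 tracks y_1
print("x1, x2 ~", np.exp(z1[0]), np.exp(z2[0]))   # both should approach 2**(1/3)
```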
Applications to power control
As an illustrative example, we maximize the total system throughput in the high-SIR regime with constraints local to each user. If we directly applied the distributed approach described in the last subsection, the resulting algorithm would require knowledge by each user of the interfering channels and interfering transmit powers, which would translate into a large amount of message passing. To obtain a practical distributed solution, we can leverage the structure of the power control problems at hand, and instead keep a local copy of each of the effective received powers $P^R_{ij} = G_{ij} P_j$. Again using problem (18) as an example formulation and assuming high SIR, we can write the problem as follows (after the log change of variables $\tilde P_i = \log P_i$, $\tilde P^R_{ij} = \log P^R_{ij}$, $\tilde G_{ij} = \log G_{ij}$):
$$
\begin{array}{ll}
\text{minimize} & \displaystyle\sum_i \log\Big( G_{ii}^{-1} \exp(-\tilde P_i)\Big[\sum_{j\neq i}\exp(\tilde P^R_{ij}) + \sigma^2\Big]\Big) \\
\text{subject to} & \tilde P^R_{ij} = \tilde G_{ij} + \tilde P_j, \ \forall i, \ j\neq i, \\
& \text{constraints local to each user, e.g., (a), (d), and (e) in Table 2.}
\end{array} \tag{28}
$$
The partial Lagrangian is
$$
L = \sum_i \log\Big( G_{ii}^{-1} \exp(-\tilde P_i)\Big[\sum_{j\neq i}\exp(\tilde P^R_{ij}) + \sigma^2\Big]\Big) + \sum_i \sum_{j\neq i} \gamma_{ij}\,\big(\tilde P^R_{ij} - \tilde G_{ij} - \tilde P_j\big), \tag{29}
$$
and the local $i$th Lagrangian function $L_i$ in (29) is distributed to the $i$th user, from which the dual decomposition method can be used to determine the optimal power allocation $\mathbf{P}^*$. The distributed power control algorithm is summarized as follows.
Algorithm 4. Distributed power allocation update to maximize $R_{\rm system}$.
At each iteration $t$:
1) The $i$th user receives the term $\sum_{j\neq i}\gamma_{ji}(t)$ involving the dual variables from the interfering users by message passing, and minimizes the following local Lagrangian with respect to $\big(\tilde P_i(t), \{\tilde P^R_{ij}(t)\}_j\big)$, subject to the local constraints:
$$
L_i\Big(\tilde P_i(t), \{\tilde P^R_{ij}(t)\}_j; \{\gamma_{ij}(t)\}_j\Big) = \log\Big( G_{ii}^{-1} \exp(-\tilde P_i(t))\Big[\sum_{j\neq i}\exp(\tilde P^R_{ij}(t)) + \sigma^2\Big]\Big) + \sum_{j\neq i}\gamma_{ij}(t)\,\tilde P^R_{ij}(t) - \Big(\sum_{j\neq i}\gamma_{ji}(t)\Big)\tilde P_i(t).
$$
2) The $i$th user estimates the effective received power from each of the interfering users, $P^R_{ij}(t) = G_{ij} P_j(t)$ for $j \neq i$, updates the dual variables by
$$
\gamma_{ij}(t+1) = \gamma_{ij}(t) + \frac{\delta_0}{t}\Big(\tilde P^R_{ij}(t) - \log\big(G_{ij} P_j(t)\big)\Big), \tag{30}
$$
and then broadcasts them by message passing to all interfering users in the system.
Example 7. Distributed GP power control. We apply the distributed algorithm to solve the above power control problem for three logical links with $G_{ij} = 0.2$, $i \neq j$, $G_{ii} = 1$, $\forall i$, and maximal transmit powers of 6 mW, 7 mW, and 7 mW for links 1, 2, and 3, respectively. Figure 10 shows the convergence of the dual objective function towards the globally optimal total throughput of the network. Figure 11 shows the convergence of the two auxiliary variables in links 1 and 3 towards the optimal solutions.
3.6 Concluding Remarks and Future Directions
Power control problems with nonlinear objective and constraints may seem to be difficult, NP-hard problems to solve for global optimality. However, when SIR is much larger than 0 dB, GP can be used to turn these problems into intrinsically tractable convex formulations, accommodating a variety of possible combinations of objective and constraint functions involving data rate, delay, and outage probability. Then interior-point algorithms can efficiently compute the globally optimal power allocation even for a large network. Feasibility analysis of GP naturally leads to admission control and pricing schemes. When the high-SIR approximation cannot be made, these power control problems become SPs and may be solved by the heuristic of the condensation method.
Fig. 10. Convergence of the dual objective function through distributed algorithm
(Example 7).
Fig. 11. Convergence of the auxiliary variables $\log(P^R_{12}/G_{12})$ and $\log(P^R_{32}/G_{32})$, together with $\log(P_2)$ (Example 7).
Gaussian noise, and seeks to maximize its data rate by waterfilling over the aggregated noise plus interference. No information exchange is needed among users, and all the actions are completely autonomous. IW leads to a large performance improvement over the static approach, and enjoys a low complexity that is linear in N. However, the greedy nature of IW leads to performance far from optimal in near-far scenarios such as mixed CO/RT deployment and upstream VDSL.
To address this, an optimal spectrum balancing (OSB) algorithm [8] has
been proposed, which finds the best possible spectrum management solution
under the current capabilities of the DSL modems. OSB avoids the selfish
behaviors of individual users by aiming at the maximization of a total weighted
sum of the users' rates, which corresponds to a boundary point of the achievable rate region. On the other hand, OSB has a high computational complexity that is exponential in N, which quickly leads to intractability when N is larger
than 6. Moreover, it is a completely centralized algorithm where a spectrum
management center at the central office needs to know the global information
(i.e., all the noise PSDs and crosstalk channel gains in the same binder) to
perform the algorithm.
As an improvement to the OSB algorithm, an iterative spectrum balancing (ISB) algorithm [7] has been proposed, which is based on a weighted sum-rate maximization similar to OSB. Different from OSB, ISB performs the optimization iteratively through the users, which leads to a quadratic complexity in N. Close-to-optimal performance can be achieved by the ISB algorithm in most cases. However, each user still needs to know the global information as in OSB; thus ISB is still a centralized algorithm and is considered impractical in many cases.
This section presents the ASB algorithm [21], which further reduces the complexity compared with the ISB algorithm, and achieves close-to-optimal performance similar to ISB and OSB. The basic idea is to use the concept of a reference line to mimic a typical victim line in the current binder. By setting the power spectrum level to protect the reference line, a good balance between selfish and global maximization can be achieved. The ASB algorithm enjoys a complexity linear in N and K, and can be implemented in a completely autonomous way. We prove the convergence of ASB for both the 2-user and the N-user cases, under both sequential and parallel updates.
Table 3 compares various aspects of different DSM algorithms. Utilizing
the structures of the DSL problem, in particular, the lack of channel variation
and user mobility, is the key to providing a linear-complexity, distributed, convergent, and almost optimal solution to this coupled nonconvex optimization problem.
4.2 System Model
Using the notation as in [8, 7], we consider a DSL bundle with N = {1, ..., N }
modems (i.e., lines, users) and K = {1, ..., K} tones. Assume discrete multi-
Table 3. Comparison of different DSM algorithms.

                 IW            OSB           ISB            ASB
  Operation      Autonomous    Centralized   Centralized    Autonomous
  Complexity     O(KN)         O(K e^N)      O(K N^2)       O(KN)
  Performance    Suboptimal    Optimal       Near optimal   Near optimal
  Reference      [56]          [8]           [7]            [21]
where $\alpha^{n,m}_k = |h^{n,m}_k|^2 / |h^{n,n}_k|^2$ is the normalized crosstalk channel gain, and $\sigma^n_k$ is the noise power density normalized by the direct channel gain $|h^{n,n}_k|^2$. Here $\Gamma$ denotes the SINR gap to capacity, which is a function of the desired BER, coding gain, and noise margin [49]. Without loss of generality, we assume $\Gamma = 1$. The data rate on line $n$ is thus
$$
R^n = f_s \sum_{k\in K} b^n_k. \tag{32}
$$
Each user $n$ is subject to a total power constraint
$$
\sum_{k\in K} s^n_k \leq P^n. \tag{33}
$$
The spectrum management problem across all users in the binder can then be posed as maximizing a weighted sum of the users' rates,
$$
\begin{array}{ll}
\text{maximize} & \sum_n w^n R^n \\
\text{subject to} & \sum_{k\in K} s^n_k \leq P^n, \ \forall n,
\end{array} \tag{34}
$$
such that the nonnegative weight coefficient $w^n$ is adjusted to ensure that the target rate constraint of user $n$ is met. Without loss of generality, here we define $w^1 = 1$. By changing the rate constraints $R^{n,\rm target}$ for users $n > 1$ (or, equivalently, changing the weight coefficients $w^n$ for $n > 1$), every boundary point of the convex rate region can be traced.
We observe that at the optimal solution of (34), each user chooses a PSD level that strikes a good balance between maximizing its own rate and minimizing the damage it causes to the other users. To accurately calculate the latter, the user needs to know the global information of the noise PSDs and crosstalk channel gains. However, if we aim at a less aggressive objective and only require each user to give enough protection to the other users in the binder while maximizing its own rate, then global information may not be needed. Indeed, we can introduce the concept of a reference line, a virtual line that represents a typical victim in the current binder. Then, instead of solving (34), each user tries to maximize the achievable data rate
on the reference line, subject to its own data rate and total power constraints. Define the rate of the reference line as seen by user $n$ as
$$
R^{n,\rm ref} = \sum_{k\in K} \tilde b^n_k = \sum_{k\in K} \log\left(1 + \frac{\tilde s_k}{\tilde\alpha^n_k s^n_k + \tilde\sigma_k}\right). \tag{35}
$$
The coefficients $\{\tilde s_k, \tilde\sigma_k, \tilde\alpha^n_k, \forall k, n\}$ are parameters of the reference line and can be obtained from field measurements. They represent the conditions of a typical victim user in an interference channel (here a binder of DSL lines), and are known to the users a priori. They can be further updated on a much slower timescale through channel measurement data. User $n$ then wants to solve the following problem local to itself:
$$
\begin{array}{ll}
\text{maximize} & R^{n,\rm ref} \\
\text{subject to} & R^n \geq R^{n,\rm target}, \\
& \sum_{k\in K} s^n_k \leq P^n.
\end{array} \tag{36}
$$
For each user $n$, we replace the original optimization (36) with the Lagrange dual problem
$$
\max_{\lambda^n \geq 0}\ \ \max_{\{s^n_k \geq 0:\ \sum_{k\in K} s^n_k \leq P^n\}}\ \sum_{k\in K} J^n_k\big(w^n, \lambda^n, s^n_k, s^{-n}_k\big), \tag{38}
$$
where
$$
J^n_k\big(w^n, \lambda^n, s^n_k, s^{-n}_k\big) = w^n b^n_k + \tilde b^n_k - \lambda^n s^n_k, \tag{39}
$$
and $s^{-n}_k$ denotes the PSDs of all users other than $n$ on tone $k$.
The per-tone optimization in (38) is
$$
s^{n,I}_k\big(w^n, \lambda^n, s^{-n}_k\big) = \arg\max_{s^n_k \in [0, P^n]} J^n_k\big(w^n, \lambda^n, s^n_k, s^{-n}_k\big), \tag{40}
$$
which can be found by solving the first-order condition $\partial J^n_k\big(w^n, \lambda^n, s^n_k, s^{-n}_k\big)/\partial s^n_k = 0$, which leads to
$$
\frac{w^n}{s^{n,I}_k + \sum_{m\neq n}\alpha^{n,m}_k s^m_k + \sigma^n_k} \;-\; \frac{\tilde\alpha^n_k\,\tilde s_k}{\big(\tilde\alpha^n_k s^{n,I}_k + \tilde\sigma_k + \tilde s_k\big)\big(\tilde\alpha^n_k s^{n,I}_k + \tilde\sigma_k\big)} \;-\; \lambda^n = 0. \tag{41}
$$
Note that (41) can be simplified into a cubic equation, which has three solutions. The optimal PSD can be found by substituting these three solutions back into the objective function $J^n_k\big(w^n, \lambda^n, s^n_k, s^{-n}_k\big)$, as well as checking the boundary solutions $s^n_k = 0$ and $s^n_k = P^n$, and picking the one that yields the largest value of $J^n_k$.
The user then updates $\lambda^n$ to enforce the power constraint, and updates $w^n$ to enforce the target rate constraint. The complete algorithm (ASB-I) is given as follows, where $\epsilon_\lambda$ and $\epsilon_w$ are small step sizes for updating $\lambda^n$ and $w^n$.
For each user $n$:
repeat
    repeat
        for each tone $k$: set $s^n_k = s^{n,I}_k\big(w^n, \lambda^n, s^{-n}_k\big)$ as in (40);
        update $\lambda^n = \big[\lambda^n + \epsilon_\lambda\big(\sum_k s^n_k - P^n\big)\big]^+$;
    until convergence
    update $w^n = \big[w^n + \epsilon_w\big(R^{n,\rm target} - \sum_k b^n_k\big)\big]^+$;
until convergence
To further reduce the complexity, ASB-II replaces the reference-line rate
$$
\tilde b^n_k = \log\left(1 + \frac{\tilde s_k}{\tilde\alpha^n_k s^n_k + \tilde\sigma_k}\right)
$$
by an approximation that is linear in $s^n_k$, with slope $-\tilde\alpha^n_k/\tilde\sigma_k$. On any tone where the reference line is active, i.e., $k \in \tilde K = \{k \mid \tilde s_k > 0, k \in K\}$, the per-tone objective then becomes
$$
J^{n,II,1}_k\big(w^n, \lambda^n, s^n_k, s^{-n}_k\big) = w^n b^n_k - \Big(\lambda^n + \frac{\tilde\alpha^n_k}{\tilde\sigma_k}\Big)s^n_k + \text{const},
$$
whose maximizer is
$$
s^{n,II,1}_k\big(w^n, \lambda^n, s^{-n}_k\big) = \left[\frac{w^n}{\lambda^n + \tilde\alpha^n_k/\tilde\sigma_k} - \sum_{m\neq n}\alpha^{n,m}_k s^m_k - \sigma^n_k\right]^+. \tag{42}
$$
This is a waterfilling-type allocation against the interference plus noise $\sum_{m\neq n}\alpha^{n,m}_k s^m_k + \sigma^n_k$ on the current tone. It is different from conventional waterfilling in that the water level in each tone is determined not only by the dual variables $w^n$ and $\lambda^n$, but also by the parameters of the reference line, $\tilde\alpha^n_k/\tilde\sigma_k$.

On the other hand, on any tone where the reference line is inactive, i.e., $k \in \tilde K^C = \{k \mid \tilde s_k = 0, k \in K\}$, the objective function is
$$
J^{n,II,2}_k\big(w^n, \lambda^n, s^n_k, s^{-n}_k\big) = w^n b^n_k - \lambda^n s^n_k,
$$
whose maximizer is conventional waterfilling,
$$
s^{n,II,2}_k\big(w^n, \lambda^n, s^{-n}_k\big) = \left[\frac{w^n}{\lambda^n} - \sum_{m\neq n}\alpha^{n,m}_k s^m_k - \sigma^n_k\right]^+. \tag{43}
$$
Combining the two cases, the ASB-II PSD update is
$$
s^{n,II}_k\big(w^n, \lambda^n, s^{-n}_k\big) = \begin{cases} \left[\dfrac{w^n}{\lambda^n + \tilde\alpha^n_k/\tilde\sigma_k} - \sum_{m\neq n}\alpha^{n,m}_k s^m_k - \sigma^n_k\right]^+, & k \in \tilde K, \\[3mm] \left[\dfrac{w^n}{\lambda^n} - \sum_{m\neq n}\alpha^{n,m}_k s^m_k - \sigma^n_k\right]^+, & k \in \tilde K^C. \end{cases} \tag{44}
$$
This is essentially a waterfilling type of solution, with different water levels
for different tones (frequencies). We call it frequency selective waterfilling.
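A minimal sketch of one such ASB-II update for a single user is given below; it is illustrative only, with made-up per-tone interference and reference-line parameters, and it enforces the power budget by bisection on $\lambda^n$ rather than by the small-step update used in the algorithm above.

```python
import numpy as np

def asb2_psd(w, lam, interference, ref_penalty):
    # ref_penalty[k] = alpha_ref_k / sigma_ref_k on active tones, 0 on inactive tones
    return np.maximum(w / (lam + ref_penalty) - interference, 0.0)   # update (44)

def asb2_update(w, interference, ref_penalty, P_budget):
    lo, hi = 1e-9, 1e6
    for _ in range(100):                          # bisection on lambda
        lam = 0.5 * (lo + hi)
        if asb2_psd(w, lam, interference, ref_penalty).sum() > P_budget:
            lo = lam                              # too much power: raise the price
        else:
            hi = lam
    return asb2_psd(w, hi, interference, ref_penalty)

interference = np.array([0.2, 0.5, 1.0, 0.1, 0.3, 0.8, 0.05, 0.4])
ref_penalty = np.array([2.0, 2.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # 0 where reference line inactive
s = asb2_update(w=1.0, interference=interference, ref_penalty=ref_penalty, P_budget=2.0)
print(np.round(s, 3), "total power:", round(s.sum(), 3))
```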
4.5 Convergence Analysis
In this subsection, we show the convergence of both ASB-I and ASB-II for the case where users fix their weight coefficients $w^n$, which is also called Rate Adaptive (RA) spectrum balancing [49] and aims at maximizing the users' rates subject to power constraints.⁷
Convergence in the Two-user case
The first result is on the convergence of the ASB-I algorithm, with fixed $\mathbf{w} = (w^1, w^2)$ and $\boldsymbol{\lambda} = (\lambda^1, \lambda^2)$.
The proof of Theorem 3 uses supermodular game theory [53] and strategy
transformation similar to [20].
Now consider the ASB-II algorithm where two users sequentially optimize their PSD levels under fixed values of $\mathbf{w}$, but adjust $\boldsymbol{\lambda}$ to enforce the power constraints. The following lemma will be useful in proving the main convergence results.

Lemma 1. Consider any non-decreasing function $f(x)$ and non-increasing function $g(x)$, for which there exists a unique $x^*$ such that $f(x^*) = g(x^*)$, $\partial f(x)/\partial x|_{x=x^*} > 0$ and $\partial g(x)/\partial x|_{x=x^*} < 0$. Then
$$
x^* = \arg\min_x \big\{\max\{f(x), g(x)\}\big\}.
$$
⁷ The second main category of spectrum balancing problems is Fixed Margin (FM), which is concerned with finding a minimal power allocation such that a minimum target data rate for each user is satisfied. For example, problem (34) is a mixed RA/FM problem.
Denote $s^{n,t}_k$ as the PSD of user $n$ on tone $k$ after iteration $t$, where $\sum_k s^{n,t}_k = P^n$ is satisfied for any $n$ and $t$. One iteration is defined as one round of updates of all users. We can show the following.

Proposition 4. The ASB-II algorithm globally converges to the unique fixed point in a two-user system under fixed $\mathbf{w}$, if $\max_k \alpha^{1,2}_k \cdot \max_k \alpha^{2,1}_k < 1$.

The convergence result of iterative waterfilling in the two-user case [56] is a special case of Proposition 4, obtained by setting $\tilde s_k = 0$, $\forall k$.
Proof. Define $\hat\alpha^n_k$ as the equivalent interference channel gain from user $n$ to the reference line,
$$
\hat\alpha^n_k = \begin{cases} \tilde\alpha^n_k, & k \in \tilde K, \\ 0, & k \in \tilde K^C, \end{cases}
$$
and simplify (44) as
$$
s^{n,t+1}_k = \left[\frac{w^n}{\lambda^{n,t+1} + \hat\alpha^n_k/\tilde\sigma_k} - \alpha^{n,m}_k s^{m,t}_k - \sigma^n_k\right]^+, \quad n, m \in \{1,2\},\ m\neq n,\ \forall k, t,
$$
where $\lambda^{n,t+1}$ is chosen such that $\sum_k s^{n,t+1}_k = P^n$.

Define $[x]^+ = \max(x, 0)$ and $[x]^- = \max(-x, 0)$; then it is clear that
$$
\sum_k \big[s^{n,t}_k - s^{n,t'}_k\big]^+ = \sum_k \big[s^{n,t}_k - s^{n,t'}_k\big]^-, \quad \forall n, t, t', \tag{45}
$$
since the total power constraint is always satisfied after any iteration. Also define
$$
f^{n,t}(x) = \sum_k \left[\frac{w^n}{x + \hat\alpha^n_k/\tilde\sigma_k} - \alpha^{n,m}_k s^{m,t}_k - \sigma^n_k - s^{n,t}_k\right]^-, \qquad
g^{n,t}(x) = \sum_k \left[\frac{w^n}{x + \hat\alpha^n_k/\tilde\sigma_k} - \alpha^{n,m}_k s^{m,t}_k - \sigma^n_k - s^{n,t}_k\right]^+;
$$
it is clear that $f^{n,t}(x)$ ($g^{n,t}(x)$, respectively) is non-decreasing (non-increasing) in $x$, and strictly increasing (strictly decreasing) at $x = \lambda^{n,t+1}$. Also $f^{n,t}(\lambda^{n,t+1}) = g^{n,t}(\lambda^{n,t+1})$. Then by Lemma 1,
$$
\max\{f^{n,t}(x), g^{n,t}(x)\} \geq \max\{f^{n,t}(\lambda^{n,t+1}), g^{n,t}(\lambda^{n,t+1})\}, \quad \forall x.
$$
Taking $x = \lambda^{n,t}$, we have
$$
\max\{f^{n,t}(\lambda^{n,t}), g^{n,t}(\lambda^{n,t})\} \geq \max\{f^{n,t}(\lambda^{n,t+1}), g^{n,t}(\lambda^{n,t+1})\}.
$$
This leads to
$$
\begin{aligned}
\max\Big\{\sum_k\big[s^{1,t+1}_k - s^{1,t}_k\big]^+,\ \sum_k\big[s^{1,t+1}_k - s^{1,t}_k\big]^-\Big\}
&= \max\big\{f^{1,t}(\lambda^{1,t+1}),\ g^{1,t}(\lambda^{1,t+1})\big\} \qquad (46)\\
&\leq \max\big\{f^{1,t}(\lambda^{1,t}),\ g^{1,t}(\lambda^{1,t})\big\} \\
&= \max\Big\{\sum_k\Big[\tfrac{w^1}{\lambda^{1,t}+\hat\alpha^1_k/\tilde\sigma_k} - \alpha^{1,2}_k s^{2,t}_k - \sigma^1_k - s^{1,t}_k\Big]^-,\ \sum_k\Big[\tfrac{w^1}{\lambda^{1,t}+\hat\alpha^1_k/\tilde\sigma_k} - \alpha^{1,2}_k s^{2,t}_k - \sigma^1_k - s^{1,t}_k\Big]^+\Big\} \\
&= \max\Big\{\sum_k \alpha^{1,2}_k\big[s^{2,t-1}_k - s^{2,t}_k\big]^-,\ \sum_k \alpha^{1,2}_k\big[s^{2,t-1}_k - s^{2,t}_k\big]^+\Big\} \\
&\leq \max_k\big\{\alpha^{1,2}_k\big\}\ \max\Big\{\sum_k\big[s^{2,t}_k - s^{2,t-1}_k\big]^+,\ \sum_k\big[s^{2,t}_k - s^{2,t-1}_k\big]^-\Big\} \\
&\leq \max_k\big\{\alpha^{1,2}_k\big\}\ \max_k\big\{\alpha^{2,1}_k\big\}\ \max\Big\{\sum_k\big[s^{1,t}_k - s^{1,t-1}_k\big]^+,\ \sum_k\big[s^{1,t}_k - s^{1,t-1}_k\big]^-\Big\} \\
&< \max\Big\{\sum_k\big[s^{1,t}_k - s^{1,t-1}_k\big]^+,\ \sum_k\big[s^{1,t}_k - s^{1,t-1}_k\big]^-\Big\}. \qquad (47)
\end{aligned}
$$
The last inequality is due to the fact that $\max_k \alpha^{1,2}_k \cdot \max_k \alpha^{2,1}_k < 1$. This shows that the algorithm is a contraction mapping from any initial PSD values, and thus globally converges to a unique fixed point [4].
Fig. 12. DSL network topology used in the simulations: a CO-distributed line and RT-distributed lines (RT1, RT2, RT3), with loop lengths between 2 km and 5 km.
Fig. 13. Rate regions (CO rate versus RT3 rate, in Mbps) obtained by ASB, IW, OSB, and ISB.
Acknowledgment
The author would like to acknowledge collaborations with Raphael Cendrillon,
Maryam Fazel, Prashanth Hande, Jianwei Huang, Daniel Palomar, and Chee
Wei Tan on the publications related to this survey [12, 15, 51, 21], as well
as helpful general discussions on related topics with Stephen Boyd, Robert
Calderbank, John Doyle, David Julian, Jang-Won Lee, Ying Li, Steven Low,
Daniel O'Neill, Asuman Ozdaglar, Pablo Parrilo, Ness Shroff, R. Srikant, and
Ao Tang.
46
References
1. M. Avriel, Ed. Advances in Geometric Programming, Plenum Press, New York,
1980.
2. N. Bambos, Toward power-sensitive network architectures in wireless communications: Concepts, issues, and design aspects. IEEE Pers. Comm. Mag., vol.
5, no. 3, pp. 50-59, 1998.
3. D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.
4. D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: numerical methods. Prentice Hall, 1989.
5. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University
Press, 2004.
6. R. Cendrillon, G. Ginis, and M. Moonen, Improved linear crosstalk precompensation for downstream VDSL, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2004, pp. 1053-1056.
7. R. Cendrillon and M. Moonen, Iterative spectrum balancing for digital subscriber lines, in IEEE International Communications Conference (ICC), 2005.
8. R. Cendrillon, W. Yu, M. Moonen, J. Verlinden, and T. Bostoen, Optimal
multi-user spectrum management for digital subscriber lines, accepted by IEEE
Transactions on Communications, 2005.
9. M. Chiang, Balancing transport and physical layers in wireless multihop networks: Jointly optimal congestion control and power control, IEEE J. Sel. Area
Comm., vol. 23, no. 1, pp. 104-116, Jan. 2005.
10. M. Chiang, Geometric programming for communication systems, Foundations
and Trends in Communications and Information Theory, vol. 2, no. 1-2, pp. 1-156, Aug. 2005.
11. M. Chiang, S. H. Low, R. A. Calderbank, and J. C. Doyle, Layering as optimization decomposition, To appear in Proceedings of IEEE, 2006.
12. M. Chiang, S. Zhang, and P. Hande, Distributed rate allocation for inelastic
flows: Optimization framework, optimality conditions, and optimal algorithms,
Proc. IEEE Infocom, Miami, FL, March 2005.
13. S. T. Chung, Transmission schemes for frequency selective gaussian interference
channels, Ph.D. dissertation, Stanford University, 2003.
14. R. J. Duffin, E. L. Peterson, and C. Zener, Geometric Programming: Theory
and Applications. Wiley, 1967.
15. M. Fazel and M. Chiang, Nonconcave network utility maximization by sum of
squares programming, Proc. IEEE CDC, Dec. 2005.
16. G. Foschini and Z. Miljanic, A simple distributed autonomous power control
algorithm and its convergence, IEEE Trans. Veh. Tech., vol. 42, no. 4, 1993.
17. G. Ginis and J. Cioffi, Vectored transmission for digital subscriber line systems, IEEE Journal on Selected Areas of Communications, vol. 20, no. 5, pp.
1085-1104, 2002.
18. D. Handelman, Representing polynomials by positive linear functions on compact convex polyhedra, Pacific J. Math., vol. 132, pp. 35-62, 1988.
19. D. Henrion, J.B. Lasserre, Detecting global optimality and extracting solutions
in GloptiPoly, Research report, LAAS-CNRS, 2003.
20. J. Huang, R. Berry, and M. L. Honig, A game theoretic analysis of distributed
power control for spread spectrum ad hoc networks, Proc. IEEE ISIT, July
2005.
41. R. T. Rockafellar, Network Flows and Monotropic Programming, Athena Scientific, 1998.
42. R. T. Rockafellar, Lagrange multipliers and optimality, SIAM Review, vol.
35, pp. 183-283, 1993.
43. C. Saraydar, N. Mandayam, and D. Goodman, Pricing and power control in
a multicell wireless data network, IEEE J. Sel. Areas Comm., vol. 19, no. 10,
pp. 1883-1892, 2001.
44. K. Schmüdgen, The K-moment problem for compact semialgebraic sets, Math. Ann., vol. 289, pp. 203-206, 1991.
45. S. Shenker, Fundamental design issues for the future Internet, IEEE J. Sel.
Area Comm., vol. 13, no. 7, pp. 1176-1188, Sept. 1995.
46. N. Z. Shor, Quadratic optimization problems, Soviet J. Comput. Systems Sci.,
vol 25, pp. 1-11, 1987.
47. R. Srikant, The Mathematics of Internet Congestion Control, Birkhäuser, 2004.
48. G. Stengle, A Nullstellensatz and a Positivstellensatz in semialgebraic geometry, Math. Ann., vol. 207, pp.87-97, 1974.
49. T. Starr, J. Cioffi, and P. Silverman, Understanding digital Subscriber Line Technology. Prentice Hall, 1999.
50. C. Sung and W. Wong, Power control and rate management for wireless multimedia CDMA systems, IEEE Trans. Comm. vol. 49, no. 7, pp. 1215-1226,
2001.
51. C. W. Tan, D. Palomar, and M. Chiang, Solving non-convex power control
problems in wireless networks: Low SIR regime and distributed algorithms,
Proc. IEEE Globecom, St. Louis, MO, Nov. 2005.
52. C. W. Tan, D. Palomar and M. Chiang, Distributed Optimization of Coupled Systems with Applications to Network Utility Maximization, Proc. IEEE
ICASSP 2006, Toulouse, France, May 2006.
53. D. M. Topkis, Supermodularity and Complementarity. Princeton University
Press, 1998.
54. M. Xiao, N. B. Shroff, and E. K. P. Chong, Utility based power control (UBPC)
in cellular wireless systems, IEEE/ACM Trans. Networking, vol. 11, no. 10, pp.
210-221, March 2003.
55. R. Yates, A framework for uplink power control in cellular radio systems,
IEEE J. Sel. Areas Comm., vol. 13, no. 7, pp. 1341-1347, 1995.
56. W. Yu, G. Ginis, and J. Cioffi, Distributed multiuser power control for digital
subscriber lines, IEEE Journal on Selected Areas in Communication, vol. 20,
no. 5, pp. 1105-1115, June 2002.
57. W. Yu, R. Lui, and R. Cendrillon, Dual optimization methods for multiuser
orthogonal frequency division multiplex systems, in Proceedings of IEEE Globecom, vol. 1, 2004, pp. 225-229.