Stochastic Network Optimization with Application to Communication and Queueing Systems
Neely 2010
Copyright © 2010 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in
printed reviews, without the prior permission of the publisher.
DOI 10.2200/S00271ED1V01Y201006CNT007
Lecture #7
Series Editor: Jean Walrand, University of California, Berkeley
Series ISSN
Synthesis Lectures on Communication Networks
Print 1935-4185 Electronic 1935-4193
This material is supported in part by one or more of the following: the DARPA IT-MANET program grant
W911NF-07-0028, the NSF Career grant CCF-0747525, and continuing through participation in the Network
Science Collaborative Technology Alliance sponsored by the U.S. Army Research Laboratory.
Synthesis Lectures on
Communication Networks
Editor
Jean Walrand, University of California, Berkeley
Synthesis Lectures on Communication Networks is an ongoing series of 50- to 100-page publications
on topics on the design, implementation, and management of communication networks. Each lecture is
a self-contained presentation of one topic by a leading expert. The topics range from algorithms to
hardware implementations and cover a broad spectrum of issues from security to multiple-access
protocols. The series addresses technologies from sensor networks to reconfigurable optical networks.
The series is designed to:
• Help engineers and advanced students keep up with recent developments in a rapidly evolving
technology.
Stochastic Network Optimization
with Application to
Communication and
Queueing Systems
Michael J. Neely
University of Southern California
Morgan & Claypool Publishers
ABSTRACT
This text presents a modern theory of analysis, control, and optimization for dynamic networks.
Mathematical techniques of Lyapunov drift and Lyapunov optimization are developed and shown
to enable constrained optimization of time averages in general stochastic systems. The focus is on
communication and queueing systems, including wireless networks with time-varying channels,
mobility, and randomly arriving traffic. A simple drift-plus-penalty framework is used to optimize
time averages such as throughput, throughput-utility, power, and distortion. Explicit performance-
delay tradeoffs are provided to illustrate the cost of approaching optimality. This theory is also
applicable to problems in operations research and economics, where energy-efficient and profit-
maximizing decisions must be made without knowing the future.
Topics in the text include the following:
Detailed examples and numerous problem set questions are provided to reinforce the main
concepts.
KEYWORDS
dynamic scheduling, decision theory, wireless networks, Lyapunov optimization, con-
gestion control, fairness, network utility maximization, multi-hop, mobile networks,
routing, backpressure, max-weight, virtual queues
Contents

Preface

1 Introduction
  1.1 Example Opportunistic Scheduling Problem
    1.1.1 Example Problem 1: Minimizing Time Average Power Subject to Stability
    1.1.2 Example Problem 2: Maximizing Throughput Subject to Time Average Power Constraints
    1.1.3 Example Problem 3: Maximizing Throughput-Utility Subject to Time Average Power Constraints
  1.2 General Stochastic Optimization Problems
  1.3 Lyapunov Drift and Lyapunov Optimization
  1.4 Differences from our Earlier Text
  1.5 Alternative Approaches
  1.6 On General Markov Decision Problems
  1.7 On Network Delay
    1.7.1 Delay and Dynamic Programming
    1.7.2 Optimal O(√V) and O(log(V)) delay tradeoffs
    1.7.3 Delay-optimal Algorithms for Symmetric Networks
    1.7.4 Order-optimal Delay Scheduling and Queue Grouping
    1.7.5 Heavy Traffic and Decay Exponents
    1.7.6 Capacity and Delay Tradeoffs for Mobile Networks
  1.8 Preliminaries

2 Introduction to Queues
  2.1 Rate Stability
  2.2 Stronger Forms of Stability
  2.3 Randomized Scheduling for Rate Stability
    2.3.1 A 3-Queue, 2-Server Example
    2.3.2 A 2-Queue Opportunistic Scheduling Example
  2.4 Exercises

8 Conclusions

Bibliography
• Variable-V algorithms that provide exact optimality of time averages subject to a weaker form
of stability called “mean rate stability” (Section 4.7).
• Approximate scheduling and full throughput scheduling in interference networks via the Jiang-
Walrand theorem (Chapter 6).
• Treatment of problems with equality constraints and abstract set constraints (Section 5.4).
Finally, this text emphasizes the simplicity of the Lyapunov method, showing how all of the
results follow directly from four simple concepts: (i) telescoping sums, (ii) iterated expectations,
(iii) opportunistically minimizing an expectation, and (iv) Jensen’s inequality.
Michael J. Neely
September 2010
CHAPTER 1
Introduction
This text considers the analysis and control of stochastic networks, that is, networks with random
events, time variation, and uncertainty. Our focus is on communication and queueing systems.
Example applications include wireless mesh networks with opportunistic scheduling, cognitive radio
networks, ad-hoc mobile networks, internets with peer-to-peer communication, and sensor networks
with joint compression and transmission. The techniques are also applicable to stochastic systems
that arise in operations research, economics, transportation, and smart-grid energy distribution.
These problems can be formulated as problems that optimize the time averages of certain quantities
subject to time average constraints on other quantities, and they can be solved with a common
mathematical framework that is intimately connected to queueing theory.
Figure 1.1: The 2-user wireless system for the example of Section 1.1. Arrivals a1(t) and a2(t) enter queues Q1(t) and Q2(t), which transmit to a common receiver over channels S1(t) and S2(t) at rates b1(t) = b1(S(t), p(t)) and b2(t) = b2(S(t), p(t)).
Here we provide a simple wireless example to illustrate how the theory for optimizing time
averages can be used. Consider a 2-user wireless uplink that operates in slotted time t ∈ {0, 1, 2, . . .}.
Every slot new data randomly arrives to each user for transmission to a common receiver. Let
(a1 (t), a2 (t)) be the vector of new arrivals on slot t, in units of bits. The data is stored in queues
Q1 (t) and Q2 (t) to await transmission (see Fig. 1.1). We assume the receiver coordinates network
decisions every slot.
Channel conditions are assumed to be constant for the duration of a slot, but they can change
from slot to slot. Let S (t) = (S1 (t), S2 (t)) denote the channel conditions between users and the
receiver on slot t. The channel conditions represent any information that affects the channel on slot t,
such as fading coefficients and/or noise ratios. We assume the network controller can observe S (t) at
the beginning of each slot t before making a transmission decision. This channel-aware scheduling
is called opportunistic scheduling. Every slot t, the network controller observes the current S (t)
and chooses a power allocation vector p(t) = (p1 (t), p2 (t)) within some set P of possible power
allocations. This decision, together with the current S (t), determines the transmission rate vector
(b1 (t), b2 (t)) for slot t, where bk (t) represents the transmission rate (in bits/slot) from user k ∈ {1, 2}
to the receiver on slot t. Specifically, we have general transmission rate functions b̂k(p(t), S(t)):

bk(t) = b̂k(p(t), S(t)) for k ∈ {1, 2}

The precise form of these functions depends on the modulation and coding strategies used for transmission. The queueing dynamics are then:

Qk(t + 1) = max[Qk(t) − bk(t), 0] + ak(t) for k ∈ {1, 2}, t ∈ {0, 1, 2, . . .}
Several types of optimization problems can be considered for this simple system.
The problem of designing an algorithm to minimize time average power expenditure subject to
queue stability can be written mathematically as:
Minimize: p̄1 + p̄2
Subject to: 1) Queues Qk (t) are stable ∀k ∈ {1, 2}
2) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}
where queue stability is defined in the next chapter. It is shown in the next chapter that queue
stability ensures the time average output rate of the queue is equal to the time average input rate.
Our theory will allow the design of a simple algorithm that makes decisions p(t) ∈ P every slot
t, without requiring a-priori knowledge of the probabilities associated with the arrival and channel
processes a(t) and S (t). The algorithm meets all desired constraints in the above problem whenever
it is possible to do so. Further, the algorithm is parameterized by a constant V ≥ 0 that can be
chosen as desired to yield time average power within O(1/V ) from the minimum possible time
average power required for queue stability. Choosing a large value of V can thus push average power
arbitrarily close to optimal. However, this comes with a tradeoff in average queue backlog and delay
that is O(V ).
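The algorithm itself is developed in later chapters. As a rough numerical illustration of this O(1/V) power versus O(V) backlog tradeoff, the following sketch simulates a hypothetical drift-plus-penalty style rule for a toy version of this system (Bernoulli arrivals, two-state channels, on/off power — all illustrative assumptions, not the book's exact model): user k transmits on slot t only if Qk(t)Sk(t) > V.

```python
import random

def simulate(V, T=20000, lam=0.3, seed=0):
    """Toy drift-plus-penalty sketch: user k transmits iff Q_k(t) * S_k(t) > V."""
    rng = random.Random(seed)
    Q = [0.0, 0.0]
    total_power = 0.0
    total_backlog = 0.0
    for _ in range(T):
        for k in range(2):
            a_k = 1 if rng.random() < lam else 0      # Bernoulli packet arrival
            S_k = rng.choice([1, 2])                  # channel: packets served per unit power
            # Choosing p_k in {0, 1} to minimize V*p_k - Q_k * p_k * S_k
            # reduces to the threshold rule below:
            p_k = 1 if Q[k] * S_k > V else 0
            Q[k] = max(Q[k] - p_k * S_k, 0.0) + a_k   # queue update, as in the dynamics above
            total_power += p_k
        total_backlog += sum(Q)
    return total_power / T, total_backlog / T

power_small_V, backlog_small_V = simulate(V=1.0)
power_large_V, backlog_large_V = simulate(V=20.0)
print(power_small_V, backlog_small_V)
print(power_large_V, backlog_large_V)
```

With the larger V, average power drops (the controller waits for good channel states and transmits full batches) while average backlog grows, consistent with the tradeoff described above.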
1.2 GENERAL STOCHASTIC OPTIMIZATION PROBLEMS

The three example problems considered in the previous section all involved optimizing a time
average (or a function of time averages) subject to time average constraints. Here we state the
general problems of this type. Consider a stochastic network that operates in discrete time with
unit time slots t ∈ {0, 1, 2, . . .}. The network is described by a collection of queue backlogs, written
in vector form Q(t) = (Q1 (t), . . . , QK (t)), where K is a non-negative integer. The case K = 0
corresponds to a system without queues. Every slot t, a control action is taken, and this action affects
arrivals and departures of the queues and also creates a collection of real valued attribute vectors x(t), y(t), e(t):

x(t) = (x1(t), . . . , xM(t))
y(t) = (y0(t), y1(t), . . . , yL(t))
e(t) = (e1(t), . . . , eJ(t))

for some non-negative integers M, L, J (used to distinguish between equality constraints and two types of inequality constraints). The attributes can be positive or negative, and they represent penalties or rewards associated with the network on slot t, such as power expenditures, distortions, or packet drops/admissions. These attributes are given by general functions:

xm(t) = x̂m(α(t), ω(t)) for m ∈ {1, . . . , M}
yl(t) = ŷl(α(t), ω(t)) for l ∈ {0, 1, . . . , L}
ej(t) = êj(α(t), ω(t)) for j ∈ {1, . . . , J}
where ω(t) is a random event observed on slot t (such as new packet arrivals or channel conditions)
and α(t) is the control action taken on slot t (such as packet admissions or transmissions). The action
α(t) is chosen within an abstract set Aω(t) that possibly depends on ω(t). Let x̄m, ȳl, ēj represent the time average of xm(t), yl(t), ej(t) under a particular control algorithm. Our first objective is to
design an algorithm that solves the following problem:

Minimize:      ȳ0                                          (1.1)
Subject to: 1) ȳl ≤ 0 for all l ∈ {1, . . . , L}            (1.2)
            2) ēj = 0 for all j ∈ {1, . . . , J}            (1.3)
            3) α(t) ∈ Aω(t) ∀t                              (1.4)
            4) Stability of all Network Queues              (1.5)
Our second objective, more general than the first, is to optimize convex functions of time averages.¹ Specifically, let f(x), g1(x), . . . , gL(x) be convex functions from R^M to R, and let X be a closed and convex subset of R^M. Let x̄ = (x̄1, . . . , x̄M) be the vector of time averages of the xm(t) attributes under a given control algorithm. We desire a solution to the following problem:

Minimize:      ȳ0 + f(x̄)                                   (1.6)
Subject to: 1) ȳl + gl(x̄) ≤ 0 for all l ∈ {1, . . . , L}    (1.7)
            2) ēj = 0 for all j ∈ {1, . . . , J}            (1.8)
            3) x̄ ∈ X                                        (1.9)
            4) α(t) ∈ Aω(t) ∀t                              (1.10)
            5) Stability of all Network Queues              (1.11)
These problems (1.1)-(1.5) and (1.6)-(1.11) can be viewed as stochastic programs, and are
analogues of the classic linear programs and convex programs of static optimization theory. A solution
is an algorithm for choosing control actions over time in reaction to the existing network state, such
that all of the constraints are satisfied and the quantity to be minimized is as small as possible. These
problems have wide applications, and they are of interest even when there is no underlying queueing
network to be stabilized (so that the “Stability” constraints in (1.5) and (1.11) are removed). However,
it turns out that queueing theory plays a central role in this type of stochastic optimization. Indeed,
even if there are no underlying queues in the original problem, we can introduce virtual queues as
a strong method for ensuring that the required time average constraints are satisfied. Inefficient
control actions incur larger backlog in certain queues. These backlogs act as “sufficient statistics” on
which to base the next control decision. This enables algorithms that do not require knowledge of
the probabilities associated with the random network events ω(t).
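As a minimal sketch of this virtual queue idea (the setup below — an admission problem with rewards r(t) ~ Uniform(0,1) and the time average constraint x̄ ≤ 0.5 — is a hypothetical example, not one from the text): a virtual queue Z(t) accumulates constraint violations, and decisions are biased against actions that would grow it, with no knowledge of the distribution of r(t).

```python
import random

def run(V=10.0, c=0.5, T=100000, seed=1):
    """Maximize the time average of r(t)*x(t) subject to xbar <= c,
    using a virtual queue Z(t) in place of knowledge of the r(t) distribution."""
    rng = random.Random(seed)
    Z = 0.0                                   # virtual queue for the constraint xbar <= c
    total_x = 0.0
    total_reward = 0.0
    for _ in range(T):
        r = rng.random()                      # random event observed this slot
        x = 1 if V * r > Z else 0             # drift-plus-penalty style decision
        total_x += x
        total_reward += r * x
        Z = max(Z + x - c, 0.0)               # virtual queue update
    return total_x / T, total_reward / T

xbar, rbar = run()
print(xbar, rbar)
```

The virtual backlog Z(t) settles near a threshold that admits only the most rewarding slots, so the time average x̄ is held near the constraint value c while the reward approaches its constrained optimum.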
Δ(t) + V × penalty(t)

² The notation used in later chapters is slightly different. Simplified notation is used here to give the main ideas.
subject to individual average power constraints at each node. Our previous text (22) unified these
ideas for application to general problems of the type described in Section 1.2.
• Variable-V algorithms that provide exact optimality of time averages subject to a weaker form
of stability called “mean rate stability” (Section 4.7).
• Approximate scheduling and full throughput scheduling in interference networks via the Jiang-
Walrand theorem (Chapter 6).
• Treatment of problems with equality constraints (1.3) and abstract set constraints (1.9) (Section
5.4).
W/λ ≥ (1 − log(2)) (N − d)/(4d)
where λ is the per-user throughput, C is the number of cells, d = N/C is the node/cell density, and
log(·) denotes the natural logarithm. Thus, if the node/cell density d = Θ(1), then W/λ ≥ Ω(N).
The 2-hop relay algorithm meets this bound with λ = Θ(1) and W = Θ(N), and a relay algorithm
that redundantly transmits packets over multiple paths meets this bound with λ = Θ(1/√N) and
W = Θ(√N). Similar i.i.d. mobility models are considered in (119)(120)(121). The work (119)
shows that improved tradeoffs are possible if the transmission radius of each node can be scaled to
include a large number of users in each transmission (so that the d = Θ(1) assumption is relaxed).
The work (120)(121) quantifies the optimal tradeoff achievable under this type of radius scaling,
and it also shows improved tradeoffs are possible if the model is changed to allow time slot scaling
and network bit-pipelining. Related delay tradeoffs via transmission radius scaling for non-mobile
networks are in (122). Analysis of non-i.i.d. mobility models is more complex and considered in
(123)(124)(122)(125). Recent network coding approaches are in (126)(127)(128).
1.8 PRELIMINARIES
We assume the reader is comfortable with basic concepts of probability and random processes (such
as expectations, the law of large numbers, etc.) and with basic mathematical analysis. Familiarity
with queueing theory, Markov chains, and convex functions is useful but not required as we present
or derive results in these areas as needed in the text. For additional references on queueing theory
and Markov chains, including discussions of Little’s Theorem and the renewal-reward theorem,
see (129)(66)(130)(131)(132). For additional references on convex analysis, including discussions of
convex hulls, Caratheodory’s theorem, and Jensen’s inequality, see (133)(134)(135).
All of the major results of this text are derived directly from one or more of the following four
key concepts:
• Law of Telescoping Sums: For any function f (t) defined over integer times t ∈ {0, 1, 2, . . .}, we
have for any integer time t > 0:
Σ_{τ=0}^{t−1} [f(τ + 1) − f(τ)] = f(t) − f(0)
The proof follows by a simple cancellation of terms. This is the main idea behind Lyapunov
drift arguments: Controlling the change in a function at every step allows one to control the
ending value of the function.
• Law of Iterated Expectations: For any random variables X and Y for which the relevant expectations exist:³

E{X} = E{E{X|Y}}
3 Strictly speaking, the law of iterated expectations holds whenever the result of Fubini’s Theorem holds (which allows one to
switch the integration order of a double integral). This holds whenever any one of the following hold: (i) E {|X|} < ∞, (ii)
E {max[X, 0]} < ∞, (iii) E {min[X, 0]} > −∞.
where the outer expectation is with respect to the distribution of Y , and the inner expectation
is with respect to the conditional distribution of X given Y .
• Opportunistically Minimizing an Expectation: Let ω be a random event, let Aω be an action set that can depend on ω, and let c(α, ω) be a cost function. Suppose αωmin is an action in Aω that minimizes c(α, ω) over all α ∈ Aω. Then observing ω and choosing αωmin minimizes the expected cost over all alternative decision rules. This is easy to prove: If αω* represents any random control action chosen in the set Aω in response to the observed ω, we have c(αωmin, ω) ≤ c(αω*, ω). This is an inequality relationship concerning the random variables ω, αωmin, αω*. Taking expectations yields E{c(αωmin, ω)} ≤ E{c(αω*, ω)}, showing that the expectation under the policy αωmin is less than or equal to the expectation under any other policy. This is useful for designing drift minimizing algorithms.
• Jensen’s Inequality (not needed until Chapter 5): Let X be a convex subset of RM (possibly being
the full space RM itself ), and let f (x) be a convex function over X . Let X be any random
vector that takes values in X , and assume that E {X } is well defined and finite (where the
expectation is taken entrywise). Then:
E {X } ∈ X and f (E {X }) ≤ E {f (X )} (1.12)
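A quick Monte Carlo illustration of (1.12), using the convex function f(x) = x² and Gaussian samples (both arbitrary choices for demonstration):

```python
import random

rng = random.Random(7)
samples = [rng.gauss(1.0, 2.0) for _ in range(100000)]   # draws of the random variable X

mean_X = sum(samples) / len(samples)                     # empirical E{X}
mean_f_X = sum(x * x for x in samples) / len(samples)    # empirical E{f(X)} with f(x) = x^2
print(mean_X ** 2, mean_f_X)
```

Here f(E{X}) ≤ E{f(X)} holds exactly in every sample, since E{X²} − (E{X})² is the (non-negative) sample variance.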
This text also uses, in addition to regular limits of functions, the lim sup and lim inf. Using
(or not using) these limits does not impact any of the main ideas in this text, and readers who are
not familiar with these limits can replace all instances of “lim sup” and “lim inf” with regular limits
“lim,” without loss of rigor, under the additional assumption that the regular limit exists. For readers
interested in more details on this, note that a function f (t) may or may not have a well defined
limit as t → ∞ (consider, for example, a cosine function). We define lim supt→∞ f (t) as the largest
possible limiting value of f (t) over any subsequence of times tk that increase to infinity, and for
which the limit of f (tk ) exists. Likewise, lim inf t→∞ f (t) is the smallest possible limiting value. It
can be shown that these limits always exist (possibly being ∞ or −∞). For example, the lim sup and
lim inf of the cosine function are 1 and −1, respectively. The main properties of lim sup and lim inf
that we use in this text are:
• If f (t), g(t) are functions that satisfy f (t) ≤ g(t) for all t, then lim supt→∞ f (t) ≤
lim supt→∞ g(t). Likewise, lim inf t→∞ f (t) ≤ lim inf t→∞ g(t).
• For any function f (t), we have lim inf t→∞ f (t) ≤ lim supt→∞ f (t), with equality if and only
if the regular limit exists. Further, whenever the regular limit exists, we have lim inf t→∞ f (t) =
lim supt→∞ f (t) = limt→∞ f (t).
• For any function f(t), we have lim supt→∞ f(t) = − lim inf t→∞[−f(t)] and
lim inf t→∞ f(t) = − lim supt→∞[−f(t)].
• If f (t) and g(t) are functions such that limt→∞ g(t) = g ∗ , where g ∗ is a finite constant, then
lim supt→∞ [g(t) + f (t)] = g ∗ + lim supt→∞ f (t).
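The cosine example can be checked numerically: the supremum of cos(τ) over a long tail {τ ≥ t} of integer times stays near 1 for every t, so the lim sup is 1 even though the ordinary limit does not exist. (The tail length 20000 is an arbitrary illustrative horizon, not a quantity from the text.)

```python
import math

def tail_sup(t, horizon=20000):
    """Approximate sup{cos(tau) : tau >= t} using a finite tail of integer times."""
    return max(math.cos(tau) for tau in range(t, t + horizon))

s0 = tail_sup(0)
s1 = tail_sup(100000)
print(s0, s1)
```

Both tail suprema are close to 1, and shifting the starting time does not change this, which is exactly the lim sup behavior described above.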
CHAPTER 2
Introduction to Queues
Let Q(t) represent the contents of a single-server discrete time queueing system defined over integer
time slots t ∈ {0, 1, 2, . . .}. Specifically, the initial state Q(0) is assumed to be a non-negative real
valued random variable. Future states are driven by stochastic arrival and server processes a(t) and
b(t) according to the following dynamic equation:

Q(t + 1) = max[Q(t) − b(t), 0] + a(t) for t ∈ {0, 1, 2, . . .}     (2.1)
We call Q(t) the backlog on slot t, as it can represent an amount of work that needs to be done. The
stochastic processes {a(t)}_{t=0}^∞ and {b(t)}_{t=0}^∞ are sequences of real valued random variables defined
over slots t ∈ {0, 1, 2, . . .}.
The value of a(t) represents the amount of new work that arrives on slot t, and it is assumed
to be non-negative. The value of b(t) represents the amount of work the server of the queue can
process on slot t. For most physical queueing systems, b(t) is assumed to be non-negative, although
it is sometimes convenient to allow b(t) to take negative values. This is useful for the virtual queues
defined in future sections where b(t) can be interpreted as a (possibly negative) attribute.1 Because
we assume Q(0) ≥ 0 and a(t) ≥ 0 for all slots t, it is clear from (2.1) that Q(t) ≥ 0 for all slots t.
The units of Q(t), a(t), and b(t) depend on the context of the system. For example, in a
communication system with fixed size data units, these quantities might be integers with units of
packets. Alternatively, they might be real numbers with units of bits, kilobits, or some other unit of
unfinished work relevant to the system.
We can equivalently re-write the dynamics (2.1) without the non-linear max[·, 0] operator as
follows:
Q(t + 1) = Q(t) − b̃(t) + a(t) for t ∈ {0, 1, 2, . . .} (2.2)
where b̃(t) is the actual work processed on slot t (which may be less than the offered amount b(t)
if there is little or no backlog in the system on slot t). Specifically, b̃(t) is mathematically defined:
b̃(t)= min[b(t), Q(t)]
1 Assuming that the b(t) value in (2.1) is possibly negative also allows treatment of modified queueing models that place new
arrivals inside the max[·, 0] operator. For example, a queue with dynamics Q̂(t + 1) = max[Q̂(t) − β(t) + α(t), 0] is the same
as (2.1) with a(t) = 0 and b(t) = β(t) − α(t) for all t. Leaving a(t) outside the max[·, 0] is crucial for treatment of multi-hop
networks, where a(t) can be a sum of exogenous and endogenous arrivals.
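The equivalence of (2.1) and (2.2) can be sanity-checked numerically. The sketch below (with arbitrary illustrative arrival and service distributions) runs both recursions side by side, allowing negative b(t) as discussed above:

```python
import random

rng = random.Random(3)
Q_max_form = 0.0    # evolves by (2.1): Q(t+1) = max[Q(t) - b(t), 0] + a(t)
Q_tilde_form = 0.0  # evolves by (2.2): Q(t+1) = Q(t) - min[b(t), Q(t)] + a(t)
for _ in range(10000):
    a = rng.uniform(0, 2)        # non-negative arrivals
    b = rng.uniform(-1, 2)       # offered service, possibly negative
    Q_max_form = max(Q_max_form - b, 0.0) + a
    Q_tilde_form = Q_tilde_form - min(b, Q_tilde_form) + a
print(Q_max_form, Q_tilde_form)
```

The two sample paths coincide on every slot, since Q − min[b, Q] = max[Q − b, 0] whenever Q ≥ 0.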
Note by definition that b̃(t) ≤ b(t) for all t. The dynamic equation (2.2) yields a simple but important
property for all sample paths, described in the following lemma.
Lemma 2.1 (Sample Path Property) For any discrete time queueing system described by (2.1), and for
any two slots t1 and t2 such that 0 ≤ t1 < t2 , we have:
Q(t2) − Q(t1) = Σ_{τ=t1}^{t2−1} a(τ) − Σ_{τ=t1}^{t2−1} b̃(τ)     (2.3)

Further, for any slot t > 0:

Q(t)/t − Q(0)/t = (1/t) Σ_{τ=0}^{t−1} a(τ) − (1/t) Σ_{τ=0}^{t−1} b̃(τ)     (2.4)

Q(t)/t − Q(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a(τ) − (1/t) Σ_{τ=0}^{t−1} b(τ)     (2.5)

Proof. Fix any slot τ ≥ 0. From (2.2) we have:

Q(τ + 1) − Q(τ) = a(τ) − b̃(τ)
Summing the above over τ ∈ {t1, . . . , t2 − 1} and using the law of telescoping sums yields:

Q(t2) − Q(t1) = Σ_{τ=t1}^{t2−1} a(τ) − Σ_{τ=t1}^{t2−1} b̃(τ)
This proves (2.3). Equation (2.4) follows by substituting t1 = 0, t2 = t, and dividing by t. Inequality (2.5) follows because b̃(τ) ≤ b(τ) for all τ. □
Definition 2.2 A discrete time process Q(t) is rate stable if:

lim_{t→∞} Q(t)/t = 0 with probability 1
Definition 2.3 A discrete time process Q(t) is mean rate stable if:
lim_{t→∞} E{|Q(t)|}/t = 0
We use an absolute value of Q(t) in the mean rate stability definition, even though our queue
in (2.1) is non-negative, because later it will be useful to define mean rate stability for virtual queues
that can be possibly negative.
Theorem 2.4 (Rate Stability Theorem) Suppose Q(t) evolves according to (2.1), with a(t) ≥ 0 for all
t, and with b(t) real valued (and possibly negative) for all t. Suppose that the time averages of the processes
a(t) and b(t) converge with probability 1 to finite constants a^av and b^av, so that:

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} a(τ) = a^av with probability 1     (2.6)

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} b(τ) = b^av with probability 1     (2.7)
Then:
(a) Q(t) is rate stable if and only if a^av ≤ b^av.
(b) If a^av > b^av, then:

lim_{t→∞} Q(t)/t = a^av − b^av with probability 1

(c) Suppose there are finite constants ε > 0 and C > 0 such that E{[a(t) + b⁻(t)]^{1+ε}} ≤ C for all t, where b⁻(t) = −min[b(t), 0]. Then Q(t) is mean rate stable if and only if a^av ≤ b^av.
Proof. Here we prove only the necessary condition of part (a). Suppose that Q(t) is rate stable, so
that Q(t)/t → 0 with probability 1. Because (2.5) holds for all slots t > 0, we can take limits in
(2.5) as t → ∞ and use (2.6)-(2.7) to conclude that 0 ≥ a^av − b^av. Thus, a^av ≤ b^av is necessary for rate stability. The proof for sufficiency in part (a) and the proof of part (b) are developed in Exercises 2.3 and 2.4. The proof of part (c) is more complex and is omitted (see (136)). □
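Part (b) of the theorem is easy to see numerically. The sketch below (Bernoulli arrivals with a^av = 0.7 and constant service b(t) = 0.5, an illustrative choice not from the text) drives the queue into overload, and Q(t)/t approaches a^av − b^av = 0.2:

```python
import random

rng = random.Random(11)
T = 200000
Q = 0.0
for _ in range(T):
    a = 1.0 if rng.random() < 0.7 else 0.0   # Bernoulli arrivals, a^av = 0.7
    Q = max(Q - 0.5, 0.0) + a                # dynamics (2.1) with constant b(t) = 0.5
ratio = Q / T
print(ratio)   # close to a^av - b^av = 0.2
```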
The following theorem presents a more general necessary condition for rate stability that does
not require the arrival and server processes to have well defined limits.
Theorem 2.5 (Necessary Condition for Rate Stability) Suppose Q(t) evolves according to (2.1), with
any general processes a(t) and b(t) such that a(t) ≥ 0 for all t. Then:
(a) If Q(t) is rate stable, then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} [a(τ) − b(τ)] ≤ 0 with probability 1     (2.8)

(b) If Q(t) is mean rate stable and E{Q(0)} < ∞, then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{a(τ) − b(τ)} ≤ 0     (2.9)
Proof. The proof of (a) follows immediately by taking a lim sup of both sides of (2.5) and noting
that Q(t)/t → 0 because Q(t) is rate stable. The proof of (b) follows by first taking an expectation
of (2.5) and then taking limits. □
Definition 2.6 A discrete time process Q(t) is steady state stable if:

lim_{M→∞} g(M) = 0

where for each M ≥ 0:

g(M) = lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Pr[|Q(τ)| > M]     (2.10)
Definition 2.7 A discrete time process Q(t) is strongly stable if:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{|Q(τ)|} < ∞     (2.11)
Under mild boundedness assumptions, strong stability implies all of the other forms of stability,
as specified in Theorem 2.8 below.
Theorem 2.8 (Strong Stability Theorem) Suppose Q(t) evolves according to (2.1) for some general
stochastic processes {a(t)}_{t=0}^∞ and {b(t)}_{t=0}^∞, where a(t) ≥ 0 for all t, and b(t) is real valued for all t. Suppose Q(t) is strongly stable. Then:
(a) Q(t) is steady state stable.
(b) If there is a finite constant C such that either a(t) + b⁻(t) ≤ C with probability 1 for all t (where b⁻(t) = −min[b(t), 0]), or b(t) − a(t) ≤ C with probability 1 for all t, then Q(t) is rate stable, so that Q(t)/t → 0 with probability 1.
(c) If there is a finite constant C such that either E{a(t) + b⁻(t)} ≤ C for all t, or E{b(t) − a(t)} ≤ C for all t, then Q(t) is mean rate stable.
Proof. Part (a) is given in Exercise 2.5. Parts (b) and (c) are omitted (see (136)). □
Readers familiar with discrete time Markov chains (DTMCs) may be interested in the fol-
lowing connection: For processes Q(t) defined over an ergodic DTMC with a finite or countably
infinite state space and with the property that, for each real value M, the event {|Q(t)| ≤ M} corre-
sponds to only a finite number of states, steady state stability implies the existence of a steady state
distribution, and strong stability implies finite average backlog and (by Little’s theorem (129)) finite
average delay.
Figure 2.1: A 3-queue, 2-server system with arrivals a1(t), a2(t), a3(t) entering queues Q1(t), Q2(t), Q3(t). Every slot the network controller decides which 2 queues receive servers. A single queue cannot receive 2 servers on the same slot.
Assume the arrival processes have well defined time average rates (a1^av, a2^av, a3^av), in units of packets/slot. Design a server allocation algorithm to make all queues rate stable when arrival rates are given as follows:
a) (a1^av, a2^av, a3^av) = (0.5, 0.5, 0.9)
b) (a1^av, a2^av, a3^av) = (2/3, 2/3, 2/3)
c) (a1^av, a2^av, a3^av) = (0.7, 0.9, 0.4)
d) (a1^av, a2^av, a3^av) = (0.65, 0.5, 0.75)
e) Use (2.5) to prove that the constraints 0 ≤ ai^av ≤ 1 for all i ∈ {1, 2, 3}, and a1^av + a2^av + a3^av ≤ 2, are necessary for the existence of a rate stabilizing algorithm.
Solution:
a) Choose the service vector (b1(t), b2(t), b3(t)) to be independent and identically distributed (i.i.d.) every slot, choosing (0, 1, 1) with probability 1/2 and (1, 0, 1) with probability 1/2. Then {b1(t)}_{t=0}^∞ is i.i.d. over slots with b1^av = 0.5 by the law of large numbers. Likewise, b2^av = 0.5 and b3^av = 1. Then clearly ai^av ≤ bi^av for all i ∈ {1, 2, 3}, and so the Rate Stability Theorem ensures all queues are rate stable. While this is a randomized scheduling algorithm, one could also design a deterministic algorithm, such as one that alternates between (0, 1, 1) (on odd slots) and (1, 0, 1) (on even slots).
b) Choose (b1(t), b2(t), b3(t)) i.i.d. over slots, equally likely over the three options (1, 1, 0), (1, 0, 1), and (0, 1, 1). Then bi^av = 2/3 = ai^av for all i ∈ {1, 2, 3}, and so by the Rate Stability Theorem all queues are rate stable.
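The randomized policy of part (a) can also be simulated directly (assuming, for illustration, Bernoulli arrivals with the given rates; the solution above only requires the time averages):

```python
import random

rng = random.Random(5)
T = 50000
Q = [0, 0, 0]
rates = [0.5, 0.5, 0.9]
for _ in range(T):
    serve = (0, 1, 1) if rng.random() < 0.5 else (1, 0, 1)   # the part (a) policy
    for i in range(3):
        a_i = 1 if rng.random() < rates[i] else 0            # Bernoulli arrival
        Q[i] = max(Q[i] - serve[i], 0) + a_i
ratios = [q / T for q in Q]
print(ratios)   # each Q_i(T)/T near 0, consistent with rate stability
```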
c) Every slot, independently choose the service vector (0, 1, 1) with probability p1, (1, 0, 1) with probability p2, and (1, 1, 0) with probability p3, so that p1, p2, p3 satisfy:

(0.7, 0.9, 0.4) ≤ (p2 + p3, p1 + p3, p1 + p2)     (2.12)
p1 + p2 + p3 = 1     (2.13)
p1 ≥ 0, p2 ≥ 0, p3 ≥ 0     (2.14)

where the inequality (2.12) is taken entrywise. This is an example of a linear program. Linear programs are typically difficult to solve by hand, but this one can be solved easily by guessing that the constraint in (2.12) can be solved with equality. One can verify the following (unique) solution: p1 = 0.3, p2 = 0.1, p3 = 0.6. Thus, b1^av = p2 + p3 = 0.7, b2^av = p1 + p3 = 0.9, b3^av = p1 + p2 = 0.4, and so all queues are rate stable by the Rate Stability Theorem. It is an interesting exercise to design an alternative deterministic algorithm that uses a periodic schedule to produce the same time averages.
d) Use the same linear program (2.12)-(2.14), but replace the constraint (2.12) with the following:

(0.65, 0.5, 0.75) ≤ (p2 + p3, p1 + p3, p1 + p2)

This can be solved by hand by trial-and-error. One simplifying trick is to replace the above inequality constraint with an equality constraint, after first padding the rate vector entrywise up to a vector (ã1, ã2, ã3) ≥ (0.65, 0.5, 0.75) whose entries sum to exactly 2 (note that (2.13) forces (p2 + p3) + (p1 + p3) + (p1 + p2) = 2). For example, (ã1, ã2, ã3) = (0.75, 0.5, 0.75) yields p1 = 0.25, p2 = 0.5, p3 = 0.25.
e) Fix any i ∈ {1, 2, 3} and suppose all queues are rate stable. For any slot t > 0:

Qi(t)/t − Qi(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} ai(τ) − (1/t) Σ_{τ=0}^{t−1} bi(τ)
                  ≥ (1/t) Σ_{τ=0}^{t−1} ai(τ) − 1

where the first inequality follows by (2.5) and the final inequality holds because bi(τ) ≤ 1 for all τ. The above holds for all t > 0. Taking a limit as t → ∞ and using the fact that queue i is rate stable yields, with probability 1:

0 ≥ ai^av − 1
and so we find that, for each i ∈ {1, 2, 3}, the condition ai^av ≤ 1 is necessary for the existence of an algorithm that makes all queues rate stable. Similarly, we have:

[Q1(t) + Q2(t) + Q3(t)]/t − [Q1(0) + Q2(0) + Q3(0)]/t
    ≥ (1/t) Σ_{τ=0}^{t−1} [a1(τ) + a2(τ) + a3(τ)] − (1/t) Σ_{τ=0}^{t−1} [b1(τ) + b2(τ) + b3(τ)]
    ≥ (1/t) Σ_{τ=0}^{t−1} [a1(τ) + a2(τ) + a3(τ)] − 2

where the final inequality holds because b1(τ) + b2(τ) + b3(τ) ≤ 2 for all τ. Taking limits shows that 0 ≥ a1^av + a2^av + a3^av − 2 is also a necessary condition.
Discussion: Define Λ as the set of all rate vectors (a1av, a2av, a3av) that satisfy the constraints in part (e) of the above example problem. We know from part (e) that (a1av, a2av, a3av) ∈ Λ is a necessary condition for existence of an algorithm that makes all queues rate stable. Further, it can be shown that for any vector (a1av, a2av, a3av) ∈ Λ, there exist probabilities p1, p2, p3 that solve the linear program (2.12)-(2.14).
Showing this is not trivial and is left as an advanced exercise. However, this fact, together with the
Rate Stability Theorem, shows that it is possible to design an algorithm to make all queues rate
stable whenever (a1av, a2av, a3av) ∈ Λ. That is, (a1av, a2av, a3av) ∈ Λ is necessary and sufficient for the existence of an algorithm that makes all queues rate stable. The set Λ is called the capacity region for
the network. Exercises 2.7 and 2.8 provide additional practice questions about scheduling and delay
in this system.
Figure 2.2: (a) The 2-queue, 1-server opportunistic scheduling system with ON/OFF channels: arrivals a1(t), a2(t) feed queues Q1(t), Q2(t) with channel states S1(t), S2(t). (b) The capacity region Λ for the specific channel probabilities given below, with corner points (0, 0), (0.6, 0), (0.6, 0.16), (0.36, 0.4), and (0, 0.4).
If S(t) = (OFF, OFF), then b1(t) = b2(t) = 0. If exactly one channel is ON, then clearly the controller should choose to transmit over that channel. The only decision is which channel to use when S(t) = (ON, ON). Suppose that (a1(t), a2(t)) is i.i.d. over slots with E{a1(t)} = λ1 and E{a2(t)} = λ2. Suppose that S(t) is i.i.d. over slots with Pr[(OFF, OFF)] = p00, Pr[(OFF, ON)] = p01, Pr[(ON, OFF)] = p10, Pr[(ON, ON)] = p11.
a) Define Λ as the set of all vectors (λ1, λ2) that satisfy the constraints 0 ≤ λ1 ≤ p10 + p11, 0 ≤ λ2 ≤ p01 + p11, λ1 + λ2 ≤ p01 + p10 + p11. Show that (λ1, λ2) ∈ Λ is necessary for the existence of a rate stabilizing algorithm.
b) Plot the 2-dimensional region Λ for the special case when p00 = 0.24, p10 = 0.36, p01 = 0.16, p11 = 0.24.
c) For the system of part (b): Use a randomized algorithm that independently transmits over
channel 1 with probability β whenever S (t) = (ON, ON ). Choose β to make both queues rate
stable when (λ1 , λ2 ) = (0.6, 0.16).
d) For the system of part (b): Choose β to make both queues rate stable when (λ1 , λ2 ) =
(0.5, 0.26).
Solution:
a) Let b1 (t), b2 (t) be the decisions made by a particular algorithm that makes both queues
rate stable. From (2.5), we have for queue 1 and for all slots t > 0:
Q1(t)/t − Q1(0)/t ≥ (1/t)∑_{τ=0}^{t−1} a1(τ) − (1/t)∑_{τ=0}^{t−1} b1(τ)
Because b1 (τ ) ≤ 1{S1 (τ )=ON } , where the latter is an indicator function that is 1 if S1 (τ ) = ON, and
0 else, we have:
Q1(t)/t − Q1(0)/t ≥ (1/t)∑_{τ=0}^{t−1} a1(τ) − (1/t)∑_{τ=0}^{t−1} 1{S1(τ)=ON} (2.15)
However, we know that Q1 (t)/t → 0 with probability 1. Further, by the law of large numbers, we
have (with probability 1):
lim_{t→∞} (1/t)∑_{τ=0}^{t−1} a1(τ) = λ1, lim_{t→∞} (1/t)∑_{τ=0}^{t−1} 1{S1(τ)=ON} = p10 + p11

Taking a limit of (2.15) thus yields:

0 ≥ λ1 − (p10 + p11)
and hence λ1 ≤ p10 + p11 is a necessary condition for any rate stabilizing algorithm. A similar
argument shows that λ2 ≤ p01 + p11 is a necessary condition. Finally, note that for all t > 0:
(Q1(t) + Q2(t))/t − (Q1(0) + Q2(0))/t ≥ (1/t)∑_{τ=0}^{t−1} [a1(τ) + a2(τ)] − (1/t)∑_{τ=0}^{t−1} 1{{S1(τ)=ON} ∪ {S2(τ)=ON}}
Taking a limit of the above proves that λ1 + λ2 ≤ p01 + p10 + p11 is necessary.
b) See Fig. 2.2b.
c) If S(t) = (OFF, OFF), then don't transmit. If S(t) = (ON, OFF) or (ON, ON), then transmit over channel 1. If S(t) = (OFF, ON), then transmit over channel 2. Then by the law
of large numbers, we have b1av = p10 + p11 = 0.6, b2av = p01 = 0.16, and so both queues are rate
stable (by the Rate Stability Theorem).
d) Choose β = 0.14/0.24. Then b1av = 0.36 + 0.24β = 0.5, and b2av = 0.16 + 0.24(1 −
β) = 0.26.
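The arithmetic behind parts (c) and (d) can be captured in a small sketch (an illustration, not from the original text): channel 1 is served whenever S = (ON, OFF) and with probability β when S = (ON, ON), while channel 2 gets the remaining opportunities.

```python
# Sketch: long-run service rates of the beta-policy, via the law of large
# numbers. Channel probabilities are those of part (b).
p10, p01, p11 = 0.36, 0.16, 0.24

def service_rates(beta):
    b1av = p10 + p11 * beta        # channel 1: (ON,OFF) always, (ON,ON) w.p. beta
    b2av = p01 + p11 * (1 - beta)  # channel 2: (OFF,ON) always, (ON,ON) w.p. 1-beta
    return b1av, b2av
```

Here β = 1 recovers part (c)'s rates (0.6, 0.16), and β = 0.14/0.24 recovers part (d)'s rates (0.5, 0.26).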
Discussion: Exercise 2.9 treats scheduling and delay issues in this system. It can be shown that the set Λ given in part (a) above is the capacity region, so that (λ1, λ2) ∈ Λ is necessary and sufficient
for the existence of a rate stabilizing policy. See (8) for the derivation of the capacity region for
ON/OFF opportunistic scheduling systems with K queues (with K ≥ 2). See also (8) for optimal
delay scheduling in symmetric systems of this type (where all arrival rates are the same, as are all
ON/OFF probabilities), and (101)(100) for “order-optimal” delay in general (possibly asymmetric)
situations.
It is possible to support any point in Λ using a stationary randomized policy that makes a scheduling decision as a random function of the observed channel state S(t). Such policies are
called S -only policies. The solutions given in parts (c) and (d) above use S -only policies. Further, the
randomized server allocation policies considered in the 3-queue, 2-server example of Section 2.3.1
can be viewed as “degenerate” S -only policies, because, in that case, there is only one “channel state”
(i.e., (ON, ON, ON)). It is known that the capacity region of general single-hop and multi-hop
networks with time varying channels S (t) can be described in terms of S -only policies (15)(22) (see
also Theorem 4.5 of Chapter 4 for a related result for more general systems).
Note that S -only policies do not consider queue backlog information, and thus they may serve
a queue that is empty, which is clearly inefficient. Thus, one might wonder how S -only policies can
stabilize queueing networks whenever traffic rates are inside the capacity region. Intuitively, the
reason is that inefficiency only arises when a queue becomes empty, a rare event when traffic rates are
near the boundary of the capacity region.2 Thus, using queue backlog information cannot “enlarge”
the region of supportable rates. However, Chapter 3 shows that queue backlogs are extremely useful
for designing dynamic algorithms that do not require a-priori knowledge of channel statistics or
a-priori computation of a randomized policy with specific time averages.
2.4 EXERCISES
Exercise 2.1. (Queue Sample Path) Fill in the missing entries of the table in Fig. 2.3 for a queue
Q(t) that satisfies (2.1).
t 0 1 2 3 4 5 6 7 8 9 10
Arrivals a(t) 3 3 0 2 1 0 0 2 0 0
Current Rate b(t) 4 2 1 3 3 2 2 4 0 2 1
Backlog Q(t) 0 3 4 3 2
Transmitted b̃(t) 0 2 1 2 1
Figure 2.3: An example sample path for the queueing system of Exercise 2.1.
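The missing entries can be checked mechanically by iterating (2.1). The helper below is an illustrative sketch (not part of the exercise solution); it reproduces the given backlog values 0, 3, 4, 3, 2 from the listed arrivals and service rates.

```python
# Iterate the queue equation (2.1): Q(t+1) = max[Q(t) - b(t), 0] + a(t),
# together with the transmitted amount b~(t) = min[b(t), Q(t)].
def queue_path(a, b, Q0=0):
    Q, transmitted = [Q0], []
    for at, bt in zip(a, b):
        transmitted.append(min(bt, Q[-1]))      # packets actually sent in slot t
        Q.append(max(Q[-1] - bt, 0) + at)       # backlog at start of slot t+1
    return Q, transmitted

# First five slots of Fig. 2.3:
Q, transmitted = queue_path([3, 3, 0, 2, 1], [4, 2, 1, 3, 3])
```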
Exercise 2.2. (Inequality comparison) Let Q(t) satisfy (2.1) with server process b(t) and arrival
process a(t). Let Q̃(t) be another queueing system with the same server process b(t) but with an
arrival process ã(t) = a(t) + z(t), where z(t) ≥ 0 for all t ∈ {0, 1, 2, . . .}. Assuming that Q(0) =
Q̃(0), prove that Q(t) ≤ Q̃(t) for all t ∈ {0, 1, 2, . . .}.
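As a numerical sanity check (not a substitute for the proof the exercise asks for), one can compare the two systems on random traces; the dominance Q(t) ≤ Q̃(t) should hold slot by slot.

```python
# Sanity check for Exercise 2.2: extra arrivals z(t) >= 0 never decrease
# the backlog at any slot. Traces are arbitrary random examples.
import random

def backlogs(a, b, Q0=0):
    Q = [Q0]
    for at, bt in zip(a, b):
        Q.append(max(Q[-1] - bt, 0) + at)       # dynamics (2.1)
    return Q

rng = random.Random(1)
a = [rng.randint(0, 3) for _ in range(1000)]
b = [rng.randint(0, 3) for _ in range(1000)]
z = [rng.randint(0, 2) for _ in range(1000)]
Q = backlogs(a, b)
Q_tilde = backlogs([ai + zi for ai, zi in zip(a, z)], b)
dominated = all(q <= qt for q, qt in zip(Q, Q_tilde))
```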
Exercise 2.3. (Proving sufficiency for Theorem 2.4a) Let Q(t) satisfy (2.1) with arrival and server processes with well defined time averages aav and bav. Suppose that aav ≤ bav. Fix ε > 0, and define Qε(t) as a queue with Qε(0) = Q(0), and with the same server process b(t) but with an arrival process ã(t) = a(t) + (bav − aav) + ε for all t.
a) Compute the time average of ã(t).
b) Assuming the result of Theorem 2.4b, compute lim_{t→∞} Qε(t)/t.
c) Use the result of part (b) and Exercise 2.2 to prove that Q(t) is rate stable. Hint: I am
thinking of a non-negative number x. My number has the property that x ≤ ε for all ε > 0. What
is my number?
2 For example, in the GI/B/1 queue of Exercise 2.6, it can be shown by Little’s Theorem (129) that the fraction of time the queue
is empty is 1 − λ/μ (assuming λ ≤ μ), which goes to zero when λ → μ.
Exercise 2.4. (Proof of Theorem 2.4b) Let Q(t) be a queue that satisfies (2.1). Assume time
averages of a(t) and b(t) are given by finite constants a av and bav , respectively.
a) Use the following equation to prove that limt→∞ a(t)/t = 0 with probability 1:
(1/(t+1))∑_{τ=0}^{t} a(τ) = (t/(t+1)) · (1/t)∑_{τ=0}^{t−1} a(τ) + a(t)/(t+1)
b) Suppose that b̃(ti ) < b(ti ) for some slot ti (where we recall that b̃(ti )= min[b(ti ), Q(ti )]).
Use (2.1) to compute Q(ti + 1).
c) Use part (b) and (2.5) to show that if b̃(ti ) < b(ti ), then:
a(ti) ≥ Q(0) + ∑_{τ=0}^{ti} [a(τ) − b(τ)]
Conclude that if b̃(ti ) < b(ti ) for an infinite number of slots ti , then a av ≤ bav .
d) Use part (c) to conclude that if a av > bav , there is some slot t ∗ ≥ 0 such that for all t > t ∗ ,
we have:
Q(t) = Q(t∗) + ∑_{τ=t∗}^{t−1} [a(τ) − b(τ)]
Use this to prove the result of Theorem 2.4b.
Exercise 2.5. (Strong stability implies steady state stability) Prove that strong stability implies
steady state stability using the fact that E {|Q(τ )|} ≥ MP r[|Q(τ )| > M].
Exercise 2.6. (Discrete time GI/B/1 queue) Consider a queue Q(t) with dynamics (2.1). Assume that a(t) is i.i.d. over slots with non-negative integer values, with E{a(t)} = λ and E{a(t)²} = E{a²}. Assume that b(t) is independent of the arrivals and is i.i.d. over slots with Pr[b(t) = 1] = μ, Pr[b(t) = 0] = 1 − μ. Thus, Q(t) is always integer valued. Suppose that λ < μ, and that there are finite values E{Q}, Q̄, Qav, E{Q²} such that:

lim_{t→∞} (1/t)∑_{τ=0}^{t−1} E{Q(τ)} = Q̄, lim_{t→∞} (1/t)∑_{τ=0}^{t−1} Q(τ) = Qav with prob. 1

lim_{t→∞} E{Q(t)} = E{Q}, lim_{t→∞} E{Q(t)²} = E{Q²}

Using ergodic Markov chain theory, it can be shown that Q̄ = Qav = E{Q} (see also Exercise 7.9). Here we want to compute E{Q}, using the magic of a quadratic.
a) Take expectations of equation (2.2) to find lim_{t→∞} E{b̃(t)}.
We have used the fact that Q(t) is independent of b(t), even though it is not independent of b̃(t).
This establishes the average backlog for an integer-based GI/B/1 queue (where “GI” means the
arrivals are general and i.i.d. over slots, “B” means the service is i.i.d. Bernoulli, and “1” means
there is a single server). By Little’s Theorem (129), it follows that average delay (in units of slots) is
W = Q/λ. When the arrival process is Bernoulli, these formulas simplify to Q = λ(1 − λ)/(μ − λ)
and W = (1 − λ)/(μ − λ). Using reversible Markov chain theory (130)(66)(131), it can be shown
that the steady state output process of a B/B/1 queue is also i.i.d. Bernoulli with rate λ (regardless
of μ, provided that λ < μ), which makes analysis of tandems of B/B/1 queues very easy.
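The Bernoulli-arrival formula above can be checked by simulation. The sketch below is illustrative only; the rates λ = 0.3, μ = 0.5 are hypothetical values chosen for the check, for which λ(1 − λ)/(μ − λ) = 1.05.

```python
# Simulate the discrete-time B/B/1 queue and compare the empirical average
# backlog against Q = lam*(1 - lam)/(mu - lam).
import random

def bb1_average_backlog(lam, mu, T, seed=0):
    rng = random.Random(seed)
    Q, total = 0, 0
    for _ in range(T):
        total += Q
        a = 1 if rng.random() < lam else 0    # Bernoulli arrival
        b = 1 if rng.random() < mu else 0     # Bernoulli service opportunity
        Q = max(Q - b, 0) + a                 # dynamics (2.1)
    return total / T

lam, mu = 0.3, 0.5
formula = lam * (1 - lam) / (mu - lam)        # = 1.05
empirical = bb1_average_backlog(lam, mu, 10**6)
```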
Exercise 2.7. (Server Scheduling) Consider the 3-queue, 2-server system example of Section 2.3.1
(Fig. 2.1). Assume the arrival vector (a1 (t), a2 (t), a3 (t)) is i.i.d. over slots with E {ai (t)} = λi for
i ∈ {1, 2, 3}. Design a randomized server allocation algorithm to make all queues rate stable when:
a) (λ1 , λ2 , λ3 ) = (0.2, 0.9, 0.6)
b) (λ1 , λ2 , λ3 ) = (3/4, 3/4, 1/2)
c) (λ1 , λ2 , λ3 ) = (0.6, 0.5, 0.9)
d) (λ1 , λ2 , λ3 ) = (0.7, 0.6, 0.5)
e) Give a deterministic algorithm that uses a periodic schedule to support the rates in part (b).
f ) Give a deterministic algorithm that uses a periodic schedule to support the rates in part (c).
Exercise 2.8. (Delay for Server Scheduling) Consider the 3-queue, 2-server system of Fig. 2.1 that
operates according to the randomized schedule of the solution given in part (d) of Section 2.3.1, so
that p1 = 0.3, p2 = 0.5, p3 = 0.2. Suppose a1 (t) is i.i.d. over slots and Bernoulli, with P r[a1 (t) =
0] = 0.35, P r[a1 (t) = 1] = 0.65. Use the formula of Exercise 2.6 to compute the average backlog
Q1 and average delay W 1 in queue 1. (First, you must convince yourself that queue 1 is indeed a
discrete time GI/B/1 queue).
Exercise 2.9. (Delay for Opportunistic Scheduling) Consider the 2-queue wireless downlink with
ON/OFF channels as described in the example of Section 2.3.2 (Fig. 2.2). The channel probabilities
are given as in that example: p00 = 0.24, p10 = 0.36, p01 = 0.16, p11 = 0.24. Suppose the arrival
process a1 (t) is i.i.d. Bernoulli with rate λ1 = 0.4, so that P r[a1 (t) = 1] = 0.4, P r[a1 (t) = 0] =
0.6. Suppose a2 (t) is i.i.d. Bernoulli with rate λ2 = 0.3. Design a randomized algorithm, using
parameter β as the probability that we transmit over channel 1 when S (t) = (ON, ON), that
ensures the average delay satisfies W 1 ≤ 25 slots and W 2 ≤ 25 slots. You should use the delay
formula in Exercise 2.6 (first convincing yourself that each queue is indeed a GI/B/1 queue) along
with an educated guess for β and/or trial and error for β.
Exercise 2.11. (Virtual Queues) Suppose we have a system that operates in discrete time with slots
t ∈ {0, 1, 2, . . .}. A controller makes decisions every slot t about how to operate the system, and
these decisions incur power p(t). The controller wants to ensure the time average power expenditure
is no more than 12.3 power units per slot. Define a virtual queue Z(t) with Z(0) = 0, and with
update equation:
Z(t + 1) = max[Z(t) − 12.3, 0] + p(t) (2.16)
The controller keeps the value of Z(t) as a state variable, and updates Z(t) at the end of each slot
via (2.16) using the power p(t) that was spent on that slot.
a) Use Lemma 2.1 to prove that if Z(t) is rate stable, then:3

lim_{t→∞} (1/t)∑_{τ=0}^{t−1} p(τ) ≤ 12.3 with probability 1
b) Suppose there is a positive constant Zmax such that Z(t) ≤ Zmax for all t ∈ {0, 1, 2, . . .}.
Use (2.3) to show that for any integer T > 0 and any interval of T slots, defined by {t1 , . . . , t1 +
T − 1} (where t1 ≥ 0), we have:
∑_{τ=t1}^{t1+T−1} p(τ) ≤ 12.3T + Zmax
This idea is used in (21) to ensure the total power used in a communication system over any interval
is less than or equal to the desired per-slot average power constraint multiplied by the interval size,
plus a constant allowable “power burst” Zmax . A variation of this technique is used in (137) to bound
the worst-case number of collisions with a primary user in a cognitive radio network.
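The virtual queue mechanics are easy to sketch. The power trace p(t) below is hypothetical (uniform on [0, 20], so its mean of 10 is below the constraint of 12.3); the check verifies the interval bound of part (b) on every window of T = 100 slots.

```python
# Virtual queue update (2.16) for a 12.3 units/slot average power constraint,
# plus a check of the burst bound: sum of p over any T slots <= 12.3*T + Zmax.
import random

rng = random.Random(0)
p = [rng.uniform(0, 20) for _ in range(5000)]   # hypothetical power trace

Z, Z_history = 0.0, [0.0]
for pt in p:
    Z = max(Z - 12.3, 0.0) + pt                 # update (2.16)
    Z_history.append(Z)
Zmax = max(Z_history)                           # empirical bound on Z(t)

T = 100
burst_bound_holds = all(
    sum(p[t1:t1 + T]) <= 12.3 * T + Zmax + 1e-9
    for t1 in range(len(p) - T + 1)
)
```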
3 For simplicity, we have implicitly assumed that the limit lim_{t→∞} (1/t)∑_{τ=0}^{t−1} p(τ) in Exercise 2.11(a) exists. More generally, the result holds when "lim" is replaced with "lim sup."
CHAPTER 3

Dynamic Scheduling Example
Figure 3.1: (a) The 2-queue wireless downlink example with time-varying channels: arrivals A1(t), A2(t) with E{A1(t)} = λ1 and E{A2(t)} = λ2 feed queues Q1(t), Q2(t), where S1(t) ∈ {0, 1} and S2(t) ∈ {0, 1, 2}. (b) The capacity region Λ, with corner points (0.70, 0.33), (0.49, 0.75), and (0.14, 1.10), and marked points X, Y, Z. For λ = (0.3, 0.7) (i.e., point Y illustrated), we have εmax(λ) = 0.12.
Maximize: ε (3.3)
Subject to: λ1 + ε ≤ ∑_{(S1,S2)∈S} Pr[S1, S2] S1 q1(S1, S2) (3.4)
λ2 + ε ≤ ∑_{(S1,S2)∈S} Pr[S1, S2] S2 q2(S1, S2) (3.5)
q1(S1, S2) + q2(S1, S2) ≤ 1 ∀(S1, S2) ∈ S (3.6)
q1(S1, S2) ≥ 0, q2(S1, S2) ≥ 0 ∀(S1, S2) ∈ S (3.7)
There are 8 known parameters that appear as constants in the above linear program: the arrival rates λ1, λ2 and the six channel state probabilities Pr[S1, S2] for (S1, S2) ∈ S. There are 13 unknowns that act as variables to be optimized in the above linear program: the value ε and the twelve quantities q1(S1, S2), q2(S1, S2) for (S1, S2) ∈ S.
• L(Q(t)) ≥ 0 for all backlog vectors Q(t) = (Q1 (t), Q2 (t)), with equality if and only if the
network is empty on slot t.
• L(Q(t)) being “small” implies that both queue backlogs are “small.”
• L(Q(t)) being “large” implies that at least one queue backlog is “large.”
For example, if L(Q(t)) ≤ 32, then Q1(t)² + Q2(t)² ≤ 64, and thus we know that both Q1(t) ≤ 8
and Q2(t) ≤ 8.
If there is a finite constant M such that L(Q(t)) ≤ M for all t, then clearly all queue backlogs
are always bounded by √(2M), and so all queues are trivially strongly stable. While we usually cannot
guarantee that the Lyapunov function is deterministically bounded, it is intuitively clear that design-
ing an algorithm to consistently push the queue backlog towards a region such that L(Q(t)) ≤ M
(for some finite constant M) will help to control congestion and stabilize the queues.
One may wonder why we use a quadratic Lyapunov function, when another function, such as a
linear function, would satisfy properties similar to those stated above. When computing the change
in the Lyapunov function from one slot to the next, we will find that the quadratic has important
dominant cross terms that include an inner product of queue backlogs and transmission rates. This
is important for the same reason that it was important to use a quadratic function in the delay
computation of Exercise 2.6, and readers seeking more intuition on the “magic” of the quadratic
function are encouraged to review that exercise.
To understand how we can consistently push the Lyapunov function towards a low congestion
region, we first use (3.1) to compute a bound on the change in the Lyapunov function from one slot
to the next:
L(Q(t + 1)) − L(Q(t)) = (1/2)∑_{i=1}^{2} [Qi(t + 1)² − Qi(t)²]
= (1/2)∑_{i=1}^{2} [(max[Qi(t) − bi(t), 0] + Ai(t))² − Qi(t)²]
≤ ∑_{i=1}^{2} [Ai(t)² + bi(t)²]/2 + ∑_{i=1}^{2} Qi(t)[Ai(t) − bi(t)] (3.12)
where in the final inequality we have used the fact that for any Q ≥ 0, b ≥ 0, A ≥ 0, we have:

(max[Q − b, 0] + A)² ≤ Q² + A² + b² + 2Q(A − b)
Now define Δ(Q(t)) as the conditional Lyapunov drift for slot t:

Δ(Q(t)) = E{L(Q(t + 1)) − L(Q(t))|Q(t)} (3.13)
where the expectation depends on the control policy, and is with respect to the random channel states
and the (possibly random) control actions made in reaction to these channel states. From (3.12), we
have that Δ(Q(t)) for a general control policy satisfies:

Δ(Q(t)) ≤ E{∑_{i=1}^{2} [Ai(t)² + bi(t)²]/2 | Q(t)} + ∑_{i=1}^{2} Qi(t)λi − E{∑_{i=1}^{2} Qi(t)bi(t)|Q(t)} (3.14)
where we have used the fact that arrivals are i.i.d. over slots and hence independent of current queue
backlogs, so that E {Ai (t)|Q(t)} = E {Ai (t)} = λi . Now define B as a finite constant that bounds
the first term on the right-hand-side of the above drift inequality, so that for all t, all possible Q(t),
and all possible control actions that can be taken, we have:
E{∑_{i=1}^{2} [Ai(t)² + bi(t)²]/2 | Q(t)} ≤ B
For our system, we have that at most one bi(t) value can be non-zero on a given slot t. The probability that the non-zero bi(t) (if any) is equal to 2 is at most 0.3 (because Pr[S2(t) = 2] = 0.3), and if it is not equal to 2, then it is at most 1. Hence:

E{(1/2)∑_{i=1}^{2} bi(t)² | Q(t)} ≤ [2²(0.3) + 1²(0.7)]/2 = 0.95
To emphasize how the right-hand-side of the above inequality depends on the transmission decision
α(t), we use the identity bi (t) = b̂i (α(t), S (t)) to yield:
Δ(Q(t)) ≤ B + ∑_{i=1}^{2} Qi(t)λi − E{∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t))|Q(t)} (3.16)
3.1.3 THE “MIN-DRIFT” OR “MAX-WEIGHT” ALGORITHM
Our dynamic algorithm is designed to observe the current queue backlogs (Q1 (t), Q2 (t)) and the
current channel states (S1 (t), S2 (t)) and to make a transmission decision α(t) to minimize the
right-hand-side of the drift bound (3.16). Note that the transmission decision on slot t only affects
the final term on the right-hand-side. Thus, we seek to design an algorithm that maximizes the
following expression:
E{∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t))|Q(t)}
The above conditional expectation is with respect to the randomly observed channel states S (t) =
(S1 (t), S2 (t)) and the (possibly random) control decision α(t). We now use the concept of oppor-
tunistically maximizing an expectation: The above expression is maximized by the algorithm that
observes the current queues (Q1 (t), Q2 (t)) and channel states (S1 (t), S2 (t)) and chooses α(t) to
maximize:
∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) (3.17)
This is often called the “max-weight” algorithm, as it seeks to maximize a weighted sum of the
transmission rates, where the weights are queue backlogs. As there are only three decisions (transmit
over channel 1, transmit over channel 2, or don’t transmit), it is easy to evaluate the weighted sum
(3.17) for each option:
• ∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) = Q1(t)S1(t) if we choose to transmit over channel 1.
• ∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) = Q2(t)S2(t) if we choose to transmit over channel 2.
• ∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) = 0 if we choose to remain idle.
It follows that the max-weight algorithm chooses to transmit over the channel i with the largest
(positive) value of Qi (t)Si (t), and remains idle if this value is 0 for both channels. This simple
algorithm just makes decisions based on the current queue states and channel states, and it does not
need knowledge of the arrival rates or channel probabilities.
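A simulation sketch of this max-weight rule is below. Only Pr[S2(t) = 2] = 0.3 is stated directly in the text; the values Pr[S1 = 1] = 0.7, Pr[S2 = 0] = 0.2, and Pr[S2 = 1] = 0.5 are our assumptions, chosen to be consistent with the corner points of the capacity region in Fig. 3.1(b). Arrivals are Bernoulli with (λ1, λ2) = (0.3, 0.7).

```python
# Max-weight scheduling on the 2-queue downlink: each slot transmit over the
# channel with the largest positive Q_i(t)*S_i(t), else idle.
import random

def max_weight_sim(T, seed=0):
    rng = random.Random(seed)
    Q1 = Q2 = 0
    backlog_sum = 0
    for _ in range(T):
        backlog_sum += Q1 + Q2
        S1 = 1 if rng.random() < 0.7 else 0                  # assumed Pr[S1=1]
        S2 = rng.choices([0, 1, 2], weights=[0.2, 0.5, 0.3])[0]
        w1, w2 = Q1 * S1, Q2 * S2                            # weights of (3.17)
        if w1 >= w2 and w1 > 0:
            Q1 = max(Q1 - S1, 0)                             # serve channel 1
        elif w2 > 0:
            Q2 = max(Q2 - S2, 0)                             # serve channel 2
        Q1 += 1 if rng.random() < 0.3 else 0                 # Bernoulli arrivals
        Q2 += 1 if rng.random() < 0.7 else 0
    return backlog_sum / T

avg_backlog = max_weight_sim(10**5)
```

Under these assumptions, the empirical average backlog should land well below the analytical bound of 12.083 packets derived later in this section.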
Because this algorithm maximizes the weighted sum (3.17) over all alternative decisions, we
have:
∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) ≥ ∑_{i=1}^{2} Qi(t)b̂i(α∗(t), S(t))
where α∗(t) represents any alternative (possibly randomized) transmission decision that can be made
on slot t. This includes the case when α ∗ (t) is an S -only decision that randomly chooses one of
the three transmit options (transmit 1, transmit 2, or idle) with a distribution that depends on the
observed S (t). Fixing a particular alternative (possibly randomized) decision α ∗ (t) for comparison
and taking a conditional expectation of the above inequality (given Q(t)) yields:
E{∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t))|Q(t)} ≥ E{∑_{i=1}^{2} Qi(t)b̂i(α∗(t), S(t))|Q(t)}
where the decision α(t) on the left-hand-side of the above inequality represents the max-weight
decision made on slot t, and the decision α ∗ (t) represents any other particular decision that could
have been made. Plugging the above directly into (3.16) yields:
Δ(Q(t)) ≤ B + ∑_{i=1}^{2} Qi(t)λi − E{∑_{i=1}^{2} Qi(t)b̂i(α∗(t), S(t))|Q(t)} (3.18)
where the left-hand-side represents the drift under the max-weight decision α(t), and the final term
on the right-hand-side involves any other decision α ∗ (t). It is remarkable that the inequality (3.18)
holds true for all of the (infinite) number of possible randomized alternative decisions that can be
plugged into the final term on the right-hand-side. However, this should not be too surprising, as
we designed the max-weight policy to have exactly this property! Rearranging the terms in (3.18)
yields:
Δ(Q(t)) ≤ B − ∑_{i=1}^{2} Qi(t)[E{bi∗(t)|Q(t)} − λi] (3.19)
where we have used the identity bi∗(t) = b̂i(α∗(t), S(t)) to represent the transmission rate that would be offered over channel i if decision α∗(t) were made.
Now suppose the arrival rates (λ1, λ2) are interior to the capacity region Λ, and consider the particular S-only decision α∗(t) that chooses a transmit option independent of queue backlog to yield (3.10). Because channel states are i.i.d. over slots, the resulting rates (b1∗(t), b2∗(t)) are independent of current queue backlog, and so by (3.10), we have for i ∈ {1, 2}:

E{bi∗(t)|Q(t)} = E{bi∗(t)} ≥ λi + εmax(λ)

Plugging this into (3.19) yields:

Δ(Q(t)) ≤ B − εmax(λ)∑_{i=1}^{2} Qi(t) (3.20)
where we recall that εmax(λ) > 0. The above is a drift inequality concerning the max-weight algorithm on slot t, and it is now in terms of a value εmax(λ) associated with the linear program (3.3)-(3.7). However, we did not need to solve the linear program to obtain this inequality or to implement the algorithm! It was enough to know that the solution to the linear program exists!
3.1.4 ITERATED EXPECTATIONS AND TELESCOPING SUMS
Taking an expectation of (3.20) over the randomness of the Q1(t) and Q2(t) values yields:

E{Δ(Q(t))} ≤ B − εmax(λ)∑_{i=1}^{2} E{Qi(t)} (3.21)
Using the definition of Δ(Q(t)) in (3.13) with the law of iterated expectations yields:

E{L(Q(t + 1))} − E{L(Q(t))} ≤ B − εmax(λ)∑_{i=1}^{2} E{Qi(t)}
The above holds for all t ∈ {0, 1, 2, . . .}. Summing over t ∈ {0, 1, . . . , T − 1} for some integer
T > 0 yields (by telescoping sums):
E{L(Q(T))} − E{L(Q(0))} ≤ BT − εmax(λ)∑_{t=0}^{T−1}∑_{i=1}^{2} E{Qi(t)}
Rearranging terms, dividing by εmax(λ)T, and using the fact that L(Q(T)) ≥ 0 yields:

(1/T)∑_{t=0}^{T−1}∑_{i=1}^{2} E{Qi(t)} ≤ B/εmax(λ) + E{L(Q(0))}/(εmax(λ)T)
Thus, all queues are strongly stable, and the total average backlog (summed over both queues) is less than or equal to B/εmax(λ). Thus, the max-weight algorithm (developed by minimizing a bound on the Lyapunov drift) ensures the queueing network is strongly stable whenever the rate vector λ is interior to the capacity region Λ, with an average queue congestion bound that is inversely proportional to the distance of the rate vector from the capacity region boundary.
As an example, assume λ1 = 0.3 and λ2 = 0.7, illustrated by the point Y of Fig. 3.1(b). Then εmax(λ) = 0.12. Assuming arrivals are Bernoulli so that E{Ai²} = E{Ai} = λi, and using the value B = 1.45 obtained from (3.15), we have:

Q̄1 + Q̄2 ≤ 1.45/0.12 = 12.083 packets
where Q1 + Q2 represents the lim sup time average expected queue backlog in the network. By
Little’s Theorem (129), average delay satisfies:
Q1 + Q 2
W = ≤ 12.083 slots
λ1 + λ 2
A simulation of the algorithm over 10^6 slots yields an empirical average queue backlog of 3.058 packets, and hence in this example, our upper bound overestimates backlog by roughly a factor of 4.
Thus, the actual max-weight algorithm performs much better than the bound would suggest.
There are three reasons for this gap: (i) A simple upper bound was used when computing the
Lyapunov drift in (3.12), (ii) The value B used an upper bound on the second moments of service,
(iii) The drift inequality compares to a queue-unaware S -only algorithm, whereas the actual drift
is much better because our algorithm considers queue backlog. The third reason often dominates in
networks with many queues. For example, in (100) it is shown that average congestion and delay in
an N-queue wireless system with one server and ON/OFF channels is at least proportional to N if
a queue-unaware algorithm is used (a related result is derived for N × N packet switches in (99)).
However, a more sophisticated queue grouping analysis in (101) shows that the max-weight algorithm
on the ON/OFF downlink system gives average backlog and delay that is O(1), independent of the
number of queues. For brevity, we do not include queue grouping concepts in this text. The interested
reader is referred to the above references, see also queue grouping results in (102)(103)(104)(105).
Figure 3.2: Average sum queue backlog (in units of packets) under the max-weight algorithm, as loading is pushed from point X (i.e., ρ = 0) to point Z (i.e., ρ = 1); the plot compares the analytical bound against the simulated backlog. Each simulated data point is an average over 10^6 slots.
That is, we spend 1 unit of power if we transmit over either channel, and no power is spent if we
remain idle. Our goal is now to make transmission decisions to jointly stabilize the system while
also striving to minimize average power expenditure.
For a given rate vector (λ1, λ2) in the capacity region Λ, define p∗(λ1, λ2) as the minimum average power that can be achieved by any S-only algorithm that makes all queues rate stable. The value p∗(λ1, λ2) can be computed by solving the following linear program (compare with (3.3)-(3.7)):

Minimize: p̄ = ∑_{(S1,S2)∈S} Pr[S1, S2] (q1(S1, S2) + q2(S1, S2))
Subject to: λ1 ≤ ∑_{(S1,S2)∈S} Pr[S1, S2] S1 q1(S1, S2)
λ2 ≤ ∑_{(S1,S2)∈S} Pr[S1, S2] S2 q2(S1, S2)
q1(S1, S2) + q2(S1, S2) ≤ 1 ∀(S1, S2) ∈ S
q1(S1, S2) ≥ 0, q2(S1, S2) ≥ 0 ∀(S1, S2) ∈ S

It can be shown that p∗(λ1, λ2) is the minimum time average expected power expenditure that can be achieved by any control policy that stabilizes the system (including policies that are not S-only) (21). Further, p∗(λ1, λ2) is continuous, convex, and entrywise non-decreasing.
Now assume that λ = (λ1, λ2) is interior to Λ, so that (λ1 + ε, λ2 + ε) ∈ Λ for all ε such that 0 ≤ ε ≤ εmax(λ). It follows that whenever 0 ≤ ε ≤ εmax(λ), there exists an S-only algorithm α∗(t) such that:
3.2.1 DRIFT-PLUS-PENALTY
Define the same Lyapunov function L(Q(t)) as in (3.11), and let Δ(Q(t)) represent the conditional Lyapunov drift for slot t. While taking actions to minimize a bound on Δ(Q(t)) every slot t would stabilize the system, the resulting average power expenditure might be unnecessarily large. For example, suppose the rate vector is (λ1, λ2) = (0, 0.4), and recall that Pr[S2(t) = 2] = 0.3. Then the drift-minimizing algorithm of the previous section would transmit over channel 2 whenever the queue is not empty and S2(t) ∈ {1, 2}. In particular, it would sometimes use "inefficient" transmissions when S2(t) = 1, which spend one unit of power but deliver only 1 packet. However, if we transmit only when S2(t) = 2 and when the number of packets in the queue is at least 2, it can be shown that the system is still stable, but power expenditure is reduced to its minimum of λ2/2 = 0.2 units/slot.
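A simulation sketch of that more efficient policy is below. Only Pr[S2(t) = 2] = 0.3 is given in the text; the split of the remaining probability (Pr[S2 = 0] = 0.2, Pr[S2 = 1] = 0.5) is an assumption and does not affect the policy, which ignores S2 = 1 slots.

```python
# Policy for (lam1, lam2) = (0, 0.4): transmit over channel 2 only when
# S2(t) = 2 and at least 2 packets are queued. Average power should
# approach lam2/2 = 0.2 units/slot while the queue stays stable.
import random

def efficient_policy_sim(T, seed=0):
    rng = random.Random(seed)
    Q2, power_sum, backlog_sum = 0, 0, 0
    for _ in range(T):
        backlog_sum += Q2
        S2 = rng.choices([0, 1, 2], weights=[0.2, 0.5, 0.3])[0]
        if S2 == 2 and Q2 >= 2:
            Q2 -= 2                  # serve 2 packets at once
            power_sum += 1           # one unit of power per transmission
        Q2 += 1 if rng.random() < 0.4 else 0   # Bernoulli arrivals, rate 0.4
    return power_sum / T, backlog_sum / T

avg_power, avg_backlog = efficient_policy_sim(10**5)
```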
Instead of taking a control action to minimize a bound on Δ(Q(t)), we minimize a bound on the following drift-plus-penalty expression:

Δ(Q(t)) + V E{p(t)|Q(t)}

where V ≥ 0 is a parameter that weights the emphasis on power minimization. Adding V E{p(t)|Q(t)} to both sides of (3.16) yields:

Δ(Q(t)) + V E{p(t)|Q(t)} ≤ B + V E{p̂(α(t))|Q(t)} + ∑_{i=1}^{2} Qi(t)λi − E{∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t))|Q(t)} (3.25)
where we have used the fact that p(t) = p̂(α(t)). The drift-plus-penalty algorithm then observes
(Q1 (t), Q2 (t)) and (S1 (t), S2 (t)) every slot t and chooses an action α(t) to minimize the right-
hand-side of the above inequality. Again, using the concept of opportunistically minimizing an
expectation, this is accomplished by greedily minimizing:
value = V p̂(α(t)) − ∑_{i=1}^{2} Qi(t)b̂i(α(t), S(t))
We thus compare the following values and choose the action corresponding to the smallest (breaking ties arbitrarily):

value[idle] = 0
value[1] = V − Q1(t)S1(t)
value[2] = V − Q2(t)S2(t)

Because this decision minimizes the right-hand-side of (3.25) over all alternatives, we have:

Δ(Q(t)) + V E{p(t)|Q(t)} ≤ B + V E{p̂(α∗(t))|Q(t)} + ∑_{i=1}^{2} Qi(t)λi − E{∑_{i=1}^{2} Qi(t)b̂i(α∗(t), S(t))|Q(t)} (3.26)
where α∗(t) is any other (possibly randomized) transmission decision that can be made on slot t. Now assume that λ is interior to Λ, and fix any value ε such that 0 ≤ ε ≤ εmax(λ). Plugging the S-only algorithm (3.22)-(3.24) into the right-hand-side of the above inequality and noting that this policy makes decisions independent of queue backlog yields:

Δ(Q(t)) + V E{p(t)|Q(t)} ≤ B + V p∗(λ1 + ε, λ2 + ε) + ∑_{i=1}^{2} Qi(t)λi − ∑_{i=1}^{2} Qi(t)(λi + ε)
= B + V p∗(λ1 + ε, λ2 + ε) − ε∑_{i=1}^{2} Qi(t) (3.27)
Taking expectations of the above inequality and using the law of iterated expectations as before
yields:
E{L(Q(t + 1))} − E{L(Q(t))} + V E{p(t)} ≤ B + V p∗(λ1 + ε, λ2 + ε) − ε∑_{i=1}^{2} E{Qi(t)}
Summing the above over t ∈ {0, 1, . . . , T − 1} for some positive integer T yields:
E{L(Q(T))} − E{L(Q(0))} + V ∑_{t=0}^{T−1} E{p(t)} ≤ BT + V T p∗(λ1 + ε, λ2 + ε) − ε∑_{t=0}^{T−1}∑_{i=1}^{2} E{Qi(t)} (3.28)
Rearranging terms in the above and neglecting non-negative quantities where appropriate yields the following two inequalities:

(1/T)∑_{t=0}^{T−1} E{p(t)} ≤ p∗(λ1 + ε, λ2 + ε) + B/V + E{L(Q(0))}/(V T)

(1/T)∑_{t=0}^{T−1}∑_{i=1}^{2} E{Qi(t)} ≤ [B + V(p∗(λ1 + ε, λ2 + ε) − (1/T)∑_{t=0}^{T−1} E{p(t)})]/ε + E{L(Q(0))}/(εT)
where the first inequality follows by dividing (3.28) by V T and the second follows by dividing (3.28) by εT. Taking limits as T → ∞ shows that:1

p̄ = lim_{T→∞} (1/T)∑_{t=0}^{T−1} E{p(t)} ≤ p∗(λ1 + ε, λ2 + ε) + B/V (3.29)

Q̄1 + Q̄2 = lim_{T→∞} (1/T)∑_{t=0}^{T−1}∑_{i=1}^{2} E{Qi(t)} ≤ B/ε + V[p∗(λ1 + ε, λ2 + ε) − p̄]/ε (3.30)
Taking ε → 0 in (3.29) and using continuity of p∗(·, ·) yields:

p̄ ≤ p∗(λ1, λ2) + B/V (3.31)

Further, since p̄ ≥ p∗(λ1, λ2), we have:

p∗(λ1 + ε, λ2 + ε) − p̄ ≤ p∗(λ1 + ε, λ2 + ε) − p∗(λ1, λ2) ≤ 2ε

where the final inequality holds because it requires at most one unit of energy to support each new packet, and so increasing the total input rate from λ1 + λ2 to λ1 + λ2 + 2ε increases the minimum required average power by at most 2ε. Plugging the above into (3.30) yields:

Q̄1 + Q̄2 ≤ B/ε + 2V
The above holds for all ε that satisfy 0 < ε ≤ εmax(λ), and so plugging in ε = εmax(λ) yields:

Q̄1 + Q̄2 ≤ B/εmax(λ) + 2V (3.32)
The performance bounds (3.31) and (3.32) demonstrate an [O(1/V), O(V)] power-backlog trade-off: We can use an arbitrarily large V to make B/V arbitrarily small, so that (3.31) implies the time average power p̄ is arbitrarily close to the optimum p∗(λ1, λ2). This comes with a tradeoff: The average queue backlog bound in (3.32) is O(V).
For the example λ = (0.3, 0.7), using B = 1.45 and εmax(λ) = 0.12, the bounds (3.31) and (3.32) become:

p̄ ≤ p∗(λ1, λ2) + 1.45/V (3.33)

Q̄1 + Q̄2 ≤ 1.45/0.12 + 2V (3.34)
Figs. 3.3 and 3.4 plot simulations for this system together with the above power and backlog bounds. Each simulated data point represents a simulation over 2 × 10^6 slots using a particular value of V. Values of V in the range 0 to 100 are shown. It is clear from the figures that average power converges to the optimal p∗ = 0.7 as V increases, while average backlog increases linearly in V.
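The drift-plus-penalty decision rule itself is simple to sketch. Only Pr[S2(t) = 2] = 0.3 is stated in the text; Pr[S1 = 1] = 0.7, Pr[S2 = 0] = 0.2, and Pr[S2 = 1] = 0.5 are assumptions consistent with the capacity region of Fig. 3.1(b), and V = 20 is an arbitrary illustrative choice.

```python
# Drift-plus-penalty on the 2-queue downlink: each slot pick the smallest of
# value[idle] = 0, value[1] = V - Q1(t)*S1(t), value[2] = V - Q2(t)*S2(t).
import random

def dpp_sim(V, T, seed=0):
    rng = random.Random(seed)
    Q1 = Q2 = 0
    power_sum = backlog_sum = 0
    for _ in range(T):
        backlog_sum += Q1 + Q2
        S1 = 1 if rng.random() < 0.7 else 0
        S2 = rng.choices([0, 1, 2], weights=[0.2, 0.5, 0.3])[0]
        options = [(0.0, 0), (V - Q1 * S1, 1), (V - Q2 * S2, 2)]
        _, action = min(options)                 # smallest value wins
        if action == 1:
            Q1 = max(Q1 - S1, 0)
            power_sum += 1                       # 1 unit of power per transmission
        elif action == 2:
            Q2 = max(Q2 - S2, 0)
            power_sum += 1
        Q1 += 1 if rng.random() < 0.3 else 0     # Bernoulli arrivals, rate 0.3
        Q2 += 1 if rng.random() < 0.7 else 0     # Bernoulli arrivals, rate 0.7
    return power_sum / T, backlog_sum / T

avg_power, avg_backlog = dpp_sim(20, 10**5)
```

Under these assumptions, average power should sit near the optimal 0.7 units/slot, while the queues hover near the thresholds V and V/2, consistent with the place-holder discussion below in the text.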
Performance can be significantly improved by noting that the drift-plus-penalty algorithm given in Section 3.2.1 never transmits from queue 1 unless Q1(t) ≥ V (else, value[1] would be positive). Hence, Q1(t) ≥ Q1place = max[V − 1, 0] for all slots t ≥ 0, provided that this holds at t = 0. Similarly, the algorithm never transmits from queue 2 unless Q2(t) ≥ V/2, and so Q2(t) ≥ Q2place = max[V/2 − 2, 0] for all slots t ≥ 0, provided this holds at t = 0. It follows that we can
stack the queues with fake packets (called place-holder packets) that never get transmitted, as described
in more detail in Section 4.8 of the next chapter.

Figure 3.3: Average power versus V with (λ1, λ2) = (0.3, 0.7).

Figure 3.4: Average backlog versus V with (λ1, λ2) = (0.3, 0.7).

This place-holder technique yields the same power guarantee (3.33), but it has a significantly improved queue backlog bound given by:
(with place-holders) Q̄1 + Q̄2 ≤ 1.45/0.12 + 2V − max[V − 1, 0] − max[V /2 − 2, 0]
Thus, the average queue bound under the place-holder technique grows like 0.5V , rather than 2V as
suggested in (3.34), a dramatic savings when V is large. Simulations of the place-holder technique
are also shown in Figs. 3.3 and 3.4. The queue backlog improvements due to placeholders are quite
significant (Fig. 3.4), with no noticeable difference in power expenditure (Fig. 3.3). Indeed, the simulated power expenditure curves for the cases with and without place-holders are indistinguishable in Fig. 3.3. A plot of queue values over the first 3000 slots is given in Chapter 4, Fig. 4.2.
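For readers who want to experiment with these guarantees, the backlog bounds with and without place-holders can be compared numerically. This is a Python sketch of the two displayed bounds only (the constants 1.45 and 0.12 are those of this example; the simulation itself is not reproduced):

```python
def backlog_bound(V, placeholders=False):
    """Average backlog bound: (3.34) without place-holders, or the
    improved bound above when place-holder packets are used."""
    bound = 1.45 / 0.12 + 2 * V
    if placeholders:
        bound -= max(V - 1, 0) + max(V / 2 - 2, 0)
    return bound

# For large V, the place-holder bound grows like 0.5*V rather than 2*V:
growth = (backlog_bound(1000, True) - backlog_bound(500, True)) / 500  # → 0.5
```

The constant 1.45/0.12 cancels in the growth computation, which is why the slope comes out exactly.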
3.3 GENERALIZATIONS
The reader can easily see that the analysis in this chapter, which considers an example system of
2 queues, can be repeated for a larger system of K queues. Indeed, in that case the “min drift-plus-penalty” algorithm generalizes to choosing α(t) to maximize Σ_{k=1}^{K} Qk(t)b̂k(α(t), S(t)) − V p̂(α(t)). This holds for systems with more general channel states S(t), more general resource allocation decisions α(t), and for arbitrary rate functions b̂k(α(t), S(t)) and “penalty functions” p̂(α(t)). In particular:
• The vector S (t) might have an infinite number of possible outcomes (rather than just 6
outcomes).
• The decision α(t) might represent one of an infinite number of possible power allocation
options (rather than just one of three options). Alternatively, α(t) might represent one of an
infinite number of more sophisticated physical layer actions that can take place on slot t (such
as modulation, coding, beamforming, etc.).
• The rate function b̂k (α(t), S (t)) can be any function that maps a resource allocation decision
α(t) and a channel state vector S (t) into a transmission rate (and does not need to have the
structure (3.2)).
• The “penalty” function p̂(α(t)) does not have to represent power, and it can be any general
function of α(t).
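For a finite action set, this generalized rule can be implemented by direct enumeration. The sketch below is our own illustration in Python; the action set and the functions b̂k and p̂ are hypothetical placeholders for whatever the physical layer provides:

```python
def min_drift_plus_penalty(Q, S, actions, b_hat, p_hat, V):
    """Choose the action maximizing sum_k Q_k(t)*b_hat(k, alpha, S(t)) - V*p_hat(alpha)."""
    def score(alpha):
        return sum(Q[k] * b_hat(k, alpha, S) for k in range(len(Q))) - V * p_hat(alpha)
    return max(actions, key=score)

# Hypothetical example: action k serves queue k at rate S[k], at power cost k + 1.
Q = [5.0, 4.0]
S = [1.0, 3.0]
b_hat = lambda k, alpha, S: S[k] if alpha == k else 0.0
p_hat = lambda alpha: alpha + 1.0
best = min_drift_plus_penalty(Q, S, actions=[0, 1], b_hat=b_hat, p_hat=p_hat, V=1.0)  # → 1
```

Note how a larger V shifts the decision toward the cheaper action even when its queue-weighted rate is smaller.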
The next chapter presents the general theory. It develops an important concept of virtual
queues to ensure general time average equality and inequality constraints are satisfied. It also considers
variable V algorithms that achieve the exact minimum average penalty subject to mean rate stability
(which typically incurs infinite average backlog). Finally, it shows how to analyze systems with
non-i.i.d. and non-ergodic arrival and channel processes.
CHAPTER 4

OPTIMIZING TIME AVERAGES
L(Θ(t)) ≜ (1/2) Σ_{n=1}^{N} wn Θn(t)²        (4.1)

where {wn}_{n=1}^{N} are a collection of positive weights. We typically use wn = 1 for all n, as in (3.11) of Chapter 3, although different weights are often useful to allow queues to be treated differently. This function L(Θ(t)) is always non-negative, and it is equal to zero if and only if all components of Θ(t) are zero. Define the one-slot conditional Lyapunov drift Δ(Θ(t)) as follows:¹

Δ(Θ(t)) ≜ E{L(Θ(t + 1)) − L(Θ(t)) | Θ(t)}        (4.2)

This drift is the expected change in the Lyapunov function over one slot, given that the current state in slot t is Θ(t).
Theorem 4.1 (Lyapunov Drift) Consider the quadratic Lyapunov function (4.1), and assume E{L(Θ(0))} < ∞. Suppose there are constants B > 0, ε ≥ 0 such that the following drift condition

¹Strictly speaking, better notation would be Δ(Θ(t), t), as the drift may be due to a non-stationary policy. However, we use the simpler notation Δ(Θ(t)) as a formal representation of the right-hand-side of (4.2).
holds for all slots τ ∈ {0, 1, 2, . . .} and all possible Θ(τ):
Δ(Θ(τ)) ≤ B − ε Σ_{n=1}^{N} |Θn(τ)|        (4.3)

Then:
a) If ε ≥ 0, then all queues Θn(t) are mean rate stable.
b) If ε > 0, then all queues are strongly stable and:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|} ≤ B/ε        (4.4)
Proof. We first prove part (b). Taking expectations of (4.3) and using the law of iterated expectations
yields:
E{L(Θ(τ + 1))} − E{L(Θ(τ))} ≤ B − ε Σ_{n=1}^{N} E{|Θn(τ)|}
Summing the above over τ ∈ {0, 1, . . . , t − 1} for some slot t > 0 and using the law of telescoping
sums yields:
E{L(Θ(t))} − E{L(Θ(0))} ≤ Bt − ε Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|}        (4.5)
Now assume that ε > 0. Dividing by εt, rearranging terms, and using the fact that E{L(Θ(t))} ≥ 0 yields:

(1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|} ≤ B/ε + E{L(Θ(0))}/(εt)        (4.6)
The above holds for all slots t > 0. Taking a limit as t → ∞ proves part (b).
To prove part (a), we have from (4.5) (using the fact that ε ≥ 0) that for all slots t > 0:

E{L(Θ(t))} − E{L(Θ(0))} ≤ Bt

and hence, for each queue n:

(1/2) wn E{Θn(t)²} ≤ E{L(Θ(0))} + Bt

Because E{|Θn(t)|}² ≤ E{Θn(t)²} (by Jensen's inequality), it follows that E{|Θn(t)|} ≤ √(2(E{L(Θ(0))} + Bt)/wn). Dividing by t and taking t → ∞ shows that E{|Θn(t)|}/t → 0, which is mean rate stability.
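As a numerical illustration of part (b), consider a toy single-queue example of our own (not from the text): Θ(t + 1) = max[Θ(t) − 1, 0] + a(t) with i.i.d. Bernoulli(λ) arrivals. The drift condition (4.3) then holds with ε = 1 − λ and B = (1 + λ)/2, and a short simulation confirms that the time average backlog stays below B/ε:

```python
import random

def time_average_backlog(lam, slots, seed=0):
    """Simulate Theta(t+1) = max[Theta(t) - 1, 0] + a(t), a(t) ~ Bernoulli(lam),
    and return the time average backlog over the horizon."""
    rng = random.Random(seed)
    theta, total = 0, 0
    for _ in range(slots):
        total += theta
        theta = max(theta - 1, 0) + (1 if rng.random() < lam else 0)
    return total / slots

lam = 0.5
bound = ((1 + lam) / 2) / (1 - lam)       # B/eps = 1.5 here
avg = time_average_backlog(lam, 100_000)  # stays below the bound
```

In this toy model the backlog never exceeds 1, so the bound B/ε = 1.5 is loose but correct, as Theorem 4.1(b) only promises an upper bound.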
Theorem 4.2 (Lyapunov Optimization) Suppose L(Θ(t)) and ymin are defined by (4.1) and (4.8), and that E{L(Θ(0))} < ∞. Suppose there are constants B ≥ 0, V ≥ 0, ε ≥ 0, and y∗ such that for all slots τ ∈ {0, 1, 2, . . .} and all possible values of Θ(τ), we have:

Δ(Θ(τ)) + V E{y(τ)|Θ(τ)} ≤ B + V y∗ − ε Σ_{n=1}^{N} |Θn(τ)|        (4.9)
Then all queues Θn(t) are mean rate stable. Further, if V > 0 and ε > 0, then the time average expected penalty and queue backlog satisfy:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y(τ)} ≤ y∗ + B/V        (4.10)

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|} ≤ [B + V(y∗ − ymin)]/ε        (4.11)
Finally, if V = 0, then (4.11) still holds, and if ε = 0, then (4.10) still holds.
Proof. Fix any slot τ . Because (4.9) holds for this slot, we can take expectations of both sides and
use the law of iterated expectations to yield:
E{L(Θ(τ + 1))} − E{L(Θ(τ))} + V E{y(τ)} ≤ B + V y∗ − ε Σ_{n=1}^{N} E{|Θn(τ)|}
Summing over τ ∈ {0, 1, . . . , t − 1} for some t > 0 and using the law of telescoping sums yields:
E{L(Θ(t))} − E{L(Θ(0))} + V Σ_{τ=0}^{t−1} E{y(τ)} ≤ (B + V y∗)t − ε Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|}        (4.12)
Rearranging terms and neglecting non-negative terms when appropriate, it is easy to show that the
above inequality directly implies the following two inequalities for all t > 0:
(1/t) Σ_{τ=0}^{t−1} E{y(τ)} ≤ y∗ + B/V + E{L(Θ(0))}/(V t)        (4.13)

(1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|} ≤ [B + V(y∗ − ymin)]/ε + E{L(Θ(0))}/(εt)        (4.14)

where (4.13) follows by dividing (4.12) by V t, and (4.14) follows by dividing (4.12) by εt. Taking limits of the above as t → ∞ proves (4.10) and (4.11).
Rearranging (4.12) and using y(τ) ≥ ymin also yields:

E{L(Θ(t))} ≤ E{L(Θ(0))} + [B + V(y∗ − ymin)]t

from which mean rate stability follows by an argument similar to that given in the proof of Theorem 4.1. □
Theorem 4.2 can be understood as follows: If for any parameter V > 0, we can design a control
algorithm to ensure the drift condition (4.9) is satisfied on every slot τ , then the time average expected
penalty satisfies (4.10) and hence is either less than the target value y ∗ , or differs from y ∗ by no
more than a “fudge factor” B/V , which can be made arbitrarily small as V is increased. However, the
time average queue backlog bound increases linearly in the V parameter, as shown by (4.11). This
presents a performance-backlog tradeoff of [O(1/V ), O(V )]. Because Little’s Theorem tells us that
average queue backlog is proportional to average delay (129), we often call this a performance-delay
tradeoff. The proof reveals further details concerning the effect of the initial condition Θ(0) on time average expectations at any slot t (see (4.13) and (4.14)).
This result suggests the following control strategy: Every slot τ , observe the current (τ )
values and take a control action that, subject to the known (τ ), greedily minimizes the drift-plus-
penalty expression on the left-hand-side of the desired drift inequality (4.9):
It follows that if on every slot τ , there exists a particular control action that satisfies the drift require-
ment (4.9), then the drift-plus-penalty minimizing policy must also satisfy this drift requirement.
For intuition, note that taking an action on slot τ to minimize the drift Δ(Θ(τ)) alone would tend to push queues towards a lower congestion state, but it may incur a large penalty y(τ). Thus, we minimize a weighted sum of drift and penalty, where the penalty is scaled by an “importance” weight V, representing how much we emphasize penalty minimization. Using V = 0 corresponds to minimizing the drift Δ(Θ(τ)) alone, which reduces to the Tassiulas-Ephremides technique for network stability in (7)(8). While this does not provide any guarantees on the resulting time average penalty y(t) (as the bound (4.10) becomes infinite for V = 0), it still ensures strong stability by (4.11). The case V > 0 includes a weighted penalty term in the greedy minimization and corresponds to our technique for joint stability and performance optimization, developed for utility optimal flow control in (17)(18) and used for average power optimization in (20)(21) and for problems similar to the type (1.1)-(1.5) and (1.6)-(1.11) in (22).
Lemma 4.3 Let X(t) be a random process defined over t ∈ {0, 1, 2, . . .}, and suppose that the following
hold:
• E{X(t)²} is finite for all t ∈ {0, 1, 2, . . .} and satisfies:

Σ_{t=1}^{∞} E{X(t)²}/t² < ∞
• There is a real-valued constant β such that for all t ∈ {1, 2, 3, . . .} and all possible X(0), . . . , X(t − 1), the conditional expectation satisfies:

E{X(t) | X(t − 1), . . . , X(0)} ≤ β

Then:
lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} X(τ) ≤ β  (w.p.1)
where “(w.p.1)” stands for “with probability 1.”
A proof of this lemma is given in (138) as a simple application of the Kolmogorov law of large
numbers for martingale differences. See (139)(140)(130)(141) for background on martingales and a
statement and proof of the Kolmogorov law of large numbers. The lemma is used in (138) to prove
the probability 1 version of the Lyapunov optimization theorem given below.
Let Θ(t) be a vector of queues and y(t) a penalty process, as before. Rather than defining a drift that conditions on Θ(t), we must condition on the full history H(t), which includes values of Θ(τ) for τ ∈ {0, . . . , t} and values of y(τ) for τ ∈ {0, . . . , t − 1}. Specifically, for integers t ≥ 0 define:

H(t) ≜ {Θ(0), Θ(1), . . . , Θ(t), y(0), y(1), . . . , y(t − 1)}

Define Δ(t, H(t)) by:

Δ(t, H(t)) ≜ E{L(Θ(t + 1)) − L(Θ(t)) | H(t)}
Assume that:
• The penalty process y(t) is deterministically lower bounded by a (possibly negative) constant
ymin , so that:
y(t) ≥ ymin ∀t (w.p.1) (4.16)
• The second moments E{y(t)²} are finite for all t ∈ {0, 1, 2, . . .}, and:

Σ_{t=1}^{∞} E{y(t)²}/t² < ∞        (4.17)
• There is a finite constant D > 0 such that for all n ∈ {1, . . . , N}, all t, and all possible H(t), we have:

E{(Θn(t + 1) − Θn(t))⁴ | H(t)} ≤ D        (4.18)
Theorem 4.4 (Lyapunov Optimization with Probability 1 Convergence) Define L(Θ(t)) by (4.1), assume that Θ(0) is finite with probability 1, and suppose that assumptions (4.16)-(4.18) hold. Suppose there are constants B ≥ 0, V > 0, ε > 0, and y∗ such that for all slots τ ∈ {0, 1, 2, . . .} and all possible H(τ), we have:

Δ(τ, H(τ)) + V E{y(τ)|H(τ)} ≤ B + V y∗ − ε Σ_{n=1}^{N} |Θn(τ)|
Then all queues Θn(t) are rate stable, and:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} y(τ) ≤ y∗ + B/V  (w.p.1)        (4.19)

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} |Θn(τ)| ≤ [B + V(y∗ − ymin)]/ε  (w.p.1)        (4.20)
Further, if these same assumptions hold, and if there is a value y such that the following additional inequality also holds for all τ and all possible Θ(τ):

Then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} y(τ) ≤ y + B/V  (w.p.1)        (4.21)
Proof. Fix Θ(0) as a given finite initial condition. Define the process X(t) for t ∈ {0, 1, 2, . . .} as follows:

X(t) ≜ L(Θ(t + 1)) − L(Θ(t)) + V y(t) − B − V y∗ + ε Σ_{n=1}^{N} |Θn(t)|

The conditions on y(t) and Θ(t) are shown in (138) to ensure that the queues Θn(t) are rate stable, that E{X(t)} is finite for all t, and that for all t > 0 and all possible values of X(t − 1), . . . , X(0):

Σ_{t=1}^{∞} E{X(t)²}/t² < ∞ ,  E{X(t) | X(t − 1), X(t − 2), . . . , X(0)} ≤ 0
Lemma 4.3 thus implies:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} X(τ) ≤ 0  (w.p.1)        (4.22)
Further, from the definition of X(t) and the fact that L(Θ(t)) ≥ 0, we have for all t > 0:

(1/(V t)) Σ_{τ=0}^{t−1} X(τ) ≥ −L(Θ(0))/(V t) + (1/t) Σ_{τ=0}^{t−1} y(τ) − [B/V + y∗]

(1/t) Σ_{τ=0}^{t−1} X(τ) ≥ −L(Θ(0))/t + ε (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} |Θn(τ)| − [B + V(y∗ − ymin)]
Taking limits of the above two inequalities and using (4.22) proves the results (4.19)-(4.20). A similar argument proves (4.21). □
Conditioning on the history H(t) is needed to prove Theorem 4.4 via Lemma 4.3. A policy that greedily minimizes Δ(t, H(t)) + V E{y(t)|H(t)} every slot will also greedily minimize Δ(Θ(t)) + V E{y(t)|Θ(t)}. In this text, we focus primarily on time average expectations of the type (4.10) and (4.11), with the understanding that the same bounds can be shown to hold for time averages (with probability 1) if the additional assumptions (4.16)-(4.18) hold.
Figure 4.1: An illustration of a general K-queue network with attributes yl (t), ej (t).
Consider now a system with queue backlog vector Q(t) = (Q1(t), . . . , QK(t)), as shown in Fig. 4.1. Queue dynamics are given by:

Qk(t + 1) = max[Qk(t) − bk(t), 0] + ak(t)        (4.23)

where a(t) = (a1(t), . . . , aK(t)) and b(t) = (b1(t), . . . , bK(t)) are general functions of a random event ω(t) and a control action α(t):
for some finite constant σ² > 0. Further, for all t and all actions α(t) ∈ A_ω(t), we require the expectation of y0(t) to be bounded by finite constants y0,min, y0,max:

y0,min ≤ E{ŷ0(α(t), ω(t))} ≤ y0,max        (4.30)
ȳl(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{yl(τ)}
where the expectation is over the randomness of the ω(τ) values and the random control actions. Define time average expectations āk(t), b̄k(t), ēj(t) similarly. Define ȳl and ēj as the limiting values of ȳl(t) and ēj(t), assuming temporarily that these limits are well defined. We desire a control policy that solves the following problem:

Minimize: ȳ0        (4.31)
Subject to: 1) ȳl ≤ 0 ∀l ∈ {1, . . . , L}        (4.32)
2) ēj = 0 ∀j ∈ {1, . . . , J}        (4.33)
3) Queues Qk(t) are mean rate stable ∀k ∈ {1, . . . , K}        (4.34)
4) α(t) ∈ A_ω(t) ∀t        (4.35)

The above description of the problem is convenient, although we can state the problem more precisely without assuming limits are well defined as follows:
An example of such a problem is when we have a K-queue wireless network that must be stabilized subject to average power constraints P̄l ≤ Pl^av for each node l ∈ {1, . . . , L}, where P̄l represents the time average power of node l, and Pl^av represents a pre-specified average power constraint. Suppose the goal is to maximize the time average of the total admitted traffic. Then y0(t) is −1 times the admitted traffic on slot t. We also define yl(t) = Pl(t) − Pl^av, being the difference between the power expenditure of node l on slot t and its time average constraint, so that ȳl ≤ 0 corresponds to P̄l ≤ Pl^av. In this example, there are no time average equality constraints, and so J = 0. See also Section 4.6 and Exercises 2.11, 4.7-4.14 for more examples.
Consider now the special class of stationary and randomized policies that we call ω-only
policies, which observe ω(t) for each slot t and independently choose a control action α(t) ∈ Aω(t)
as a pure (possibly randomized) function of the observed ω(t). Let α ∗ (t) represent the decisions
under such an ω-only policy over time t ∈ {0, 1, 2, . . .}. Because ω(t) has the stationary distribution
π(ω) for all t, the expectations of the arrival, service, and attribute values are the same for all t:

E{ŷl(α∗(t), ω(t))} = ȳl ∀l ∈ {0, 1, . . . , L}
E{êj(α∗(t), ω(t))} = ēj ∀j ∈ {1, . . . , J}
E{âk(α∗(t), ω(t))} = āk ∀k ∈ {1, . . . , K}
E{b̂k(α∗(t), ω(t))} = b̄k ∀k ∈ {1, . . . , K}
for some quantities ȳl, ēj, āk, b̄k. In the case when Ω is finite or countably infinite, the expectations above can be understood as weighted sums over all ω values, weighted by the stationary distribution π(ω). Specifically:

E{ŷl(α∗(t), ω(t))} = Σ_{ω∈Ω} π(ω) E{ŷl(α∗(t), ω) | ω(t) = ω}
The above expectations ȳl, ēj, āk, b̄k are finite under any ω-only policy because of the boundedness assumptions (4.25)-(4.30). In addition to assuming ω(t) is a stationary process, we make the following mild “law of large numbers” assumption concerning time averages (not time average expectations): Under any ω-only policy α∗(t) that yields expectations ȳl, ēj, āk, b̄k on every slot t, the infinite horizon time averages of ŷl(α∗(t), ω(t)), êj(α∗(t), ω(t)), âk(α∗(t), ω(t)), b̂k(α∗(t), ω(t)) are equal to ȳl, ēj, āk, b̄k with probability 1. For example:

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} ŷl(α∗(τ), ω(τ)) = ȳl  (w.p.1)
where “(w.p.1)” means “with probability 1.” This is a mild assumption that holds whenever ω(t) is
i.i.d. over slots. This is because, by the law of large numbers, the resulting ŷl (α ∗ (t), ω(t)) process
is i.i.d. over slots with finite mean y l . However, this also holds for a large class of other stationary
processes, including stationary processes defined over finite state irreducible Discrete Time Markov
Chains (as considered in Section 4.9). It does not hold, for example, for degenerate stationary
processes where ω(0) can take different values according to some probability distribution, but is
then held fixed for all slots thereafter so that ω(t) = ω(0) for all t.
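Concretely, an ω-only policy is just a fixed randomized lookup on the observed ω(t). The following sketch is our own illustration (the two-state rule table is hypothetical, not from the text):

```python
import random

def omega_only_policy(omega, rule, rng=random):
    """Pick alpha(t) as a pure randomized function of omega(t) alone:
    no dependence on queue backlogs or on past slots."""
    actions, probs = rule[omega]
    return rng.choices(actions, weights=probs)[0]

# Hypothetical rule: transmit (action 1) w.p. 0.9 in a "good" state, w.p. 0.2 in a "bad" one.
rule = {"good": ([1, 0], [0.9, 0.1]), "bad": ([1, 0], [0.2, 0.8])}
action = omega_only_policy("good", rule)
```

Because the distribution used on each slot depends only on ω(t), the per-slot expectations above are indeed identical for all t whenever ω(t) is stationary.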
Under these assumptions, we say that the problem (4.31)-(4.35) is feasible if there exists a control policy that satisfies the constraints (4.32)-(4.35). Assuming feasibility, define y0^opt as the infimum value of the cost metric (4.31) over all control policies that satisfy the constraints (4.32)-(4.35). This infimum is finite by (4.30). We emphasize that y0^opt considers all possible control policies that choose α(t) ∈ A_ω(t) over slots t, not just ω-only policies. However, in Appendix 4.A, it is shown that y0^opt can be computed in terms of ω-only policies. Specifically, it is shown that the set of all possible limiting time average expectations of the variables [(yl(t)), (ej(t)), (ak(t)), (bk(t))], considering all possible algorithms, is equal to the closure of the set of all one-slot averages [(ȳl), (ēj), (āk), (b̄k)] achievable under ω-only policies. Further, the next theorem shows that if the problem (4.31)-(4.35) is feasible, then the utility y0^opt and the constraints ȳl ≤ 0, ēj = 0, āk ≤ b̄k can be achieved arbitrarily closely by ω-only policies.
Theorem 4.5 (Optimality over ω-only Policies) Suppose the ω(t) process is stationary with distribution
π(ω), and that the system satisfies the boundedness assumptions (4.25)-(4.30) and the law of large numbers
assumption specified above. If the problem (4.31)-(4.35) is feasible, then for any δ > 0 there is an ω-only policy α∗(t) that satisfies α∗(t) ∈ A_ω(t) for all t, and:

E{ŷ0(α∗(t), ω(t))} ≤ y0^opt + δ        (4.36)
E{ŷl(α∗(t), ω(t))} ≤ δ ∀l ∈ {1, . . . , L}        (4.37)
|E{êj(α∗(t), ω(t))}| ≤ δ ∀j ∈ {1, . . . , J}        (4.38)
E{âk(α∗(t), ω(t))} ≤ E{b̂k(α∗(t), ω(t))} + δ ∀k ∈ {1, . . . , K}        (4.39)
The inequalities (4.36)-(4.39) are similar to those seen in Chapter 3, which related the existence of such randomized policies to the existence of linear programs that yield the desired time averages. The stationarity of ω(t) simplifies the proof of Theorem 4.5 but is not crucial to its result.
Similar results are derived in (15)(21)(136) without the stationary assumption but under the addi-
tional assumption that ω(t) can take at most a finite (but arbitrarily large) number of values and has
well defined time averages.
We have stated Theorem 4.5 in terms of arbitrarily small values δ > 0. It may be of interest to note that for most practical systems, there exists an ω-only policy that satisfies all inequalities (4.36)-(4.39) with δ = 0. Appendix 4.A shows that this holds whenever the set of all one-slot expectations achievable under ω-only policies is closed. Thus, one may prefer a more “aesthetically pleasing” version of Theorem 4.5 that assumes this additional mild closure property in order to remove the appearance of “δ” in the theorem statement. We have presented the theorem in the above form because it is sufficient for our purposes. In particular, we do not require the closure property in order to apply the Lyapunov optimization techniques developed next.
The virtual queue Zl(t) is used to enforce the ȳl ≤ 0 constraint. Indeed, recall that if Zl(t) satisfies (4.40), then by our basic sample path properties in Chapter 2, we have for all t > 0:

Zl(t)/t − Zl(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} yl(τ)
Taking expectations of the above and taking t → ∞ shows:

lim sup_{t→∞} E{Zl(t)}/t ≥ lim sup_{t→∞} ȳl(t)

where we recall that ȳl(t) is the time average expectation of yl(τ) over τ ∈ {0, . . . , t − 1}. Thus, if Zl(t) is mean rate stable, the left-hand-side of the above inequality is 0, and so:

lim sup_{t→∞} ȳl(t) ≤ 0

This means our desired time average constraint for yl(t) is satisfied. This turns the problem of satisfying a time average inequality constraint into a pure queue stability problem! This discussion is of course just a repeated derivation of Theorem 2.5 (as well as Exercise 2.11).
The virtual queue Hj(t) is designed to turn the time average equality constraint ēj = 0 into a pure queue stability problem. The Hj(t) queue has a different structure, and can possibly be negative, because it enforces an equality constraint rather than an inequality constraint. It is easy to see by summing (4.41) that for any t > 0:

Hj(t) − Hj(0) = Σ_{τ=0}^{t−1} ej(τ)

Dividing by t, taking expectations, and letting t → ∞ shows that if Hj(t) is mean rate stable, then:

lim_{t→∞} ēj(t) = 0
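In code, both virtual queues are one-line updates. The sketch below assumes the standard forms of (4.40)-(4.41), which are not reproduced in this excerpt: Zl(t + 1) = max[Zl(t) + yl(t), 0] and Hj(t + 1) = Hj(t) + ej(t):

```python
def update_virtual_queues(Z, H, y, e):
    """One-slot updates: Z_l enforces the inequality constraint (it is clipped
    at zero), while H_j integrates e_j(t) and may go negative, enforcing the
    equality constraint when it is mean rate stable."""
    Z_next = [max(z + yl, 0.0) for z, yl in zip(Z, y)]
    H_next = [h + ej for h, ej in zip(H, e)]
    return Z_next, H_next

Z1, H1 = update_virtual_queues([0.0], [0.0], y=[-0.5], e=[0.3])  # → ([0.0], [0.3])
```

Note that Z is clipped at zero while H is not, mirroring the inequality/equality distinction discussed above.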
L(Θ(t)) ≜ (1/2) Σ_{k=1}^{K} Qk(t)² + (1/2) Σ_{l=1}^{L} Zl(t)² + (1/2) Σ_{j=1}^{J} Hj(t)²        (4.43)

If there are no equality constraints, we have J = 0, and we remove the Hj(t) queues. If there are no inequality constraints, then L = 0, and we remove the Zl(t) queues.
Lemma 4.6 Suppose ω(t) is i.i.d. over slots. Under any control algorithm, the drift-plus-penalty expression has the following upper bound for all t, all possible values of Θ(t), and all parameters V ≥ 0:

Δ(Θ(t)) + V E{y0(t)|Θ(t)} ≤ B + V E{y0(t)|Θ(t)} + Σ_{k=1}^{K} Qk(t) E{ak(t) − bk(t) | Θ(t)} + Σ_{l=1}^{L} Zl(t) E{yl(t)|Θ(t)} + Σ_{j=1}^{J} Hj(t) E{ej(t)|Θ(t)}        (4.44)
where B > 0 is a finite constant that satisfies, for all t and all possible Θ(t):

B ≥ (1/2) Σ_{k=1}^{K} E{ak(t)² + bk(t)² | Θ(t)} + (1/2) Σ_{l=1}^{L} E{yl(t)² | Θ(t)} + (1/2) Σ_{j=1}^{J} E{ej(t)² | Θ(t)} − Σ_{k=1}^{K} E{b̃k(t)ak(t) | Θ(t)}        (4.45)

where we recall that b̃k(t) ≜ min[Qk(t), bk(t)]. Such a constant B exists because ω(t) is i.i.d. and the boundedness assumptions in Section 4.2.1 hold.
Proof. Squaring the queue update equation (4.23) and using the fact that max[q − b, 0]² ≤ (q − b)² yields:

Qk(t + 1)² ≤ (Qk(t) − bk(t))² + ak(t)² + 2 max[Qk(t) − bk(t), 0] ak(t)
= (Qk(t) − bk(t))² + ak(t)² + 2(Qk(t) − b̃k(t)) ak(t)        (4.46)

Therefore:

[Qk(t + 1)² − Qk(t)²]/2 ≤ [ak(t)² + bk(t)²]/2 − b̃k(t)ak(t) + Qk(t)[ak(t) − bk(t)]
Similarly,
Then update the virtual queues Zl (t) and Hj (t) according to (4.40) and (4.41), and the actual queues
Qk (t) according to (4.23).
A remarkable property of this algorithm is that it does not need to know the probabilities π(ω).
After observing ω(t), it seeks to minimize a (possibly non-linear, non-convex, and discontinuous)
function of α over all α ∈ Aω(t) . Its complexity depends on the structure of the functions âk (·),
b̂k (·), ŷl (·), êj (·). However, in the case when the set Aω(t) contains a finite (and small) number of
possible control actions, the policy simply evaluates the function over each option and chooses the
best one.
Before presenting the analysis, we note that the problem (4.48)-(4.49) may not have a well
defined minimum when the set Aω(t) is infinite. However, rather than assuming our decisions obtain
the exact minimum every slot (or come close to the infimum), we analyze the performance when our
implementation comes within an additive constant of the infimum in the right-hand-side of (4.44).
Definition 4.7 allows the deviation from the infimum to be in an expected sense, rather than a
deterministic sense, which is useful in some applications. These C-additive approximations are also
useful for implementations with out-of-date queue backlog information, as shown in Exercise 4.10,
and for achieving maximum throughput in interference networks via approximation algorithms, as
shown in Chapter 6.
Theorem 4.8 (Performance of the Min Drift-Plus-Penalty Algorithm) Suppose that ω(t) is i.i.d. over slots with probabilities π(ω), the problem (4.31)-(4.35) is feasible, and that E{L(Θ(0))} < ∞. Fix a value C ≥ 0. If we use a C-additive approximation of the algorithm every slot t, then:
a) The time average expected cost satisfies:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y0(τ)} ≤ y0^opt + (B + C)/V        (4.50)
where y0^opt is the infimum time average cost achievable by any policy that meets the required constraints, and B is defined in (4.45).
b) All queues Qk (t), Zl (t), Hj (t) are mean rate stable, and all required constraints (4.32)-(4.35)
are satisfied.
c) Suppose there are constants ε > 0 and Φ(ε) for which the Slater condition of Assumption A1 holds, stated below in (4.61)-(4.64). Then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{Qk(τ)} ≤ [B + C + V(Φ(ε) − y0^opt)]/ε        (4.51)

where Φ(ε) − y0^opt ≤ y0,max − y0,min, and y0,min, y0,max are defined in (4.30).
We note that the bounds given in (4.50) and (4.51) are not just infinite horizon bounds:
Inequalities (4.58) and (4.59) in the below proof show that these bounds hold for all time t > 0 in
the case when all initial queue backlogs are zero, and that a “fudge factor” that decays like O(1/t)
must be included if initial queue backlogs are non-zero. The above theorem is for the case when
ω(t) is i.i.d. over slots. The same algorithm can be shown to offer similar performance under more
general ergodic ω(t) processes as well as for non-ergodic processes, as discussed in Section 4.9.
Proof. (Theorem 4.8) Because, every slot t, our implementation comes within an additive constant C of minimizing the right-hand-side of the drift expression (4.44) over all α(t) ∈ A_ω(t), we have for each slot t:

Δ(Θ(t)) + V E{y0(t)|Θ(t)} ≤ B + C + V E{y0∗(t)|Θ(t)} + Σ_{l=1}^{L} Zl(t) E{yl∗(t)|Θ(t)} + Σ_{j=1}^{J} Hj(t) E{ej∗(t)|Θ(t)} + Σ_{k=1}^{K} Qk(t) E{ak∗(t) − bk∗(t) | Θ(t)}        (4.52)
where ak∗(t), bk∗(t), yl∗(t), ej∗(t) are the resulting arrival, service, and attribute values under any alternative (possibly randomized) decision α∗(t) ∈ A_ω(t). Specifically, ak∗(t) ≜ âk(α∗(t), ω(t)), bk∗(t) ≜ b̂k(α∗(t), ω(t)), yl∗(t) ≜ ŷl(α∗(t), ω(t)), ej∗(t) ≜ êj(α∗(t), ω(t)).
Now fix δ > 0, and consider the ω-only policy α∗(t) that yields (4.36)-(4.39). Because this is an ω-only policy, and ω(t) is i.i.d. over slots, the resulting values of y0∗(t), ak∗(t), bk∗(t), ej∗(t) are independent of the current queue backlogs Θ(t), and we have from (4.36)-(4.39):

E{y0∗(t)|Θ(t)} = E{y0∗(t)} ≤ y0^opt + δ        (4.53)
E{yl∗(t)|Θ(t)} = E{yl∗(t)} ≤ δ ∀l ∈ {1, . . . , L}        (4.54)
|E{ej∗(t)|Θ(t)}| = |E{ej∗(t)}| ≤ δ ∀j ∈ {1, . . . , J}        (4.55)
E{ak∗(t) − bk∗(t)|Θ(t)} = E{ak∗(t) − bk∗(t)} ≤ δ ∀k ∈ {1, . . . , K}        (4.56)
This is in the exact form for application of the Lyapunov Optimization Theorem (Theorem 4.2).
Hence, all queues are mean rate stable, and so all required time average constraints are satisfied,
which proves part (b). Further, from the above drift expression, we have for any t > 0 (from (4.13)
of Theorem 4.2, or simply from taking iterated expectations and telescoping sums):
(1/t) Σ_{τ=0}^{t−1} E{y0(τ)} ≤ y0^opt + (B + C)/V + E{L(Θ(0))}/(V t)        (4.58)
Δ(Θ(t)) + V E{y0(t)|Θ(t)} ≤ B + C + V Φ(ε) − ε Σ_{k=1}^{K} Qk(t)
Taking iterated expectations, summing the telescoping series, and rearranging terms as usual yields:

(1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{Qk(τ)} ≤ [B + C + V(Φ(ε) − (1/t) Σ_{τ=0}^{t−1} E{y0(τ)})]/ε + E{L(Θ(0))}/(εt)        (4.59)
However, because our algorithm satisfies all of the desired constraints of the optimization problem (4.31)-(4.35), its limiting time average expectation for y0(t) cannot be better than y0^opt:

lim inf_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y0(τ)} ≥ y0^opt        (4.60)
Indeed, this fact is shown in Appendix 4.A (equation (4.96)). Taking a lim sup of (4.59) as t → ∞ and using (4.60) yields:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{Qk(τ)} ≤ [B + C + V(Φ(ε) − y0^opt)]/ε  □
The following is the Assumption A1 needed in part (c) of Theorem 4.8.
Assumption A1 (Slater Condition): There are values ε > 0 and Φ(ε) (where y0,min ≤ Φ(ε) ≤ y0,max) and an ω-only policy α∗(t) that satisfies:

E{ŷ0(α∗(t), ω(t))} = Φ(ε)        (4.61)
E{ŷl(α∗(t), ω(t))} ≤ 0 ∀l ∈ {1, . . . , L}        (4.62)
E{êj(α∗(t), ω(t))} = 0 ∀j ∈ {1, . . . , J}        (4.63)
E{âk(α∗(t), ω(t))} ≤ E{b̂k(α∗(t), ω(t))} − ε ∀k ∈ {1, . . . , K}        (4.64)
Assumption A1 ensures strong stability of the Qk (t) queues. However, often the structure of
a particular problem allows stronger deterministic queue bounds, even without Assumption A1 (see
Exercise 4.9). A variation on the above proof that considers probability 1 convergence is treated in
Exercise 4.6.
4.6 EXAMPLES
Here we provide examples of using the drift-plus-penalty algorithm for the same systems considered
in Sections 2.3.1 and 2.3.2. More examples are given in Exercises 4.7-4.15.
Then update the queues Qk(t) according to (4.23). Note that the problem (4.65)-(4.66) is equivalent to minimizing Σ_{k=1}^{3} Qk(t)[ak(t) − bk(t)] subject to the same constraints; to minimize this, it suffices to minimize only the terms we can control (so we can remove the Σ_{k=1}^{3} Qk(t)ak(t) term that is the same regardless of our control decision). It is easy to see that the problem (4.65)-(4.66) reduces to choosing the two largest queues to serve every slot, breaking ties arbitrarily. This simple policy does not require any knowledge of (λ1, λ2, λ3), yet ensures all queues are mean rate stable whenever possible!
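The reduced policy can be written in a few lines (a Python sketch of the serve-two-largest rule, with ties broken here by lower index):

```python
import heapq

def serve_two_largest(Q):
    """Return the indices of the two largest queues (the two servers' targets)."""
    return heapq.nlargest(2, range(len(Q)), key=lambda k: Q[k])

serve_two_largest([3, 7, 5])   # → [1, 2]
```

The rule needs only the current backlogs, matching the observation that no arrival-rate knowledge is required.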
b) From (4.45), and using the fact that L = J = 0 and b̃k(t)ak(t) ≥ 0, we want to find a value B that satisfies:

B ≥ (1/2) Σ_{k=1}^{3} E{ak(t)² | Θ(t)} + (1/2) Σ_{k=1}^{3} E{bk(t)² | Θ(t)}

Because ak(t) is i.i.d. over slots, it is independent of Θ(t), and so E{ak(t)² | Θ(t)} = E{ak²}. Further, bk(t)² = bk(t) (because bk(t) ∈ {0, 1}). Thus, it suffices to find a value B that satisfies:

B ≥ (1/2) Σ_{k=1}^{3} E{ak²} + (1/2) Σ_{k=1}^{3} E{bk(t) | Θ(t)}

However, since b1(t) + b2(t) + b3(t) ≤ 2 for all t (regardless of Θ(t)), we can choose:

B = (1/2) Σ_{k=1}^{3} E{ak²} + 1
Q̄1 + Q̄2 + Q̄3 ≤ B/ε
c) We now define the penalty y0(t) = ŷ0(b1(t), b2(t), b3(t)), where:

ŷ0(b1(t), b2(t), b3(t)) = 1 if (b1(t), b2(t), b3(t)) ∈ {(1, 1, 0), (1, 0, 1)}, and 2 if (b1(t), b2(t), b3(t)) = (0, 1, 1)

Then the drift-plus-penalty algorithm (with V > 0) now observes (Q1(t), Q2(t), Q3(t)) every slot t and chooses a server allocation to solve:

Minimize: V ŷ0(b1(t), b2(t), b3(t)) − Σ_{k=1}^{3} Qk(t)bk(t)        (4.67)
Subject to: (b1(t), b2(t), b3(t)) ∈ {(1, 1, 0), (1, 0, 1), (0, 1, 1)}        (4.68)
This can be solved easily by comparing the value of (4.67) associated with each option. Thus, every slot t we pick the option with the smallest of the three resulting values, breaking ties arbitrarily. This is again a simple dynamic algorithm that does not require knowledge of the rates (λ1, λ2, λ3). By (4.50), we know that the achieved time average power p̄ (where p̄ ≜ ȳ0) satisfies p̄ ≤ p^opt + B/V, where B is defined in part (b). Because y0,max = 2 and y0,min = 1, by (4.51),
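The per-slot comparison in part (c) can be sketched as follows (in Python; the option set and penalties are exactly those of (4.67)-(4.68)):

```python
def choose_allocation(Q, V):
    """Pick the server allocation minimizing V*y0_hat(b) - sum_k Q_k*b_k
    over the three allowed options (ties broken by listing order)."""
    penalty = {(1, 1, 0): 1, (1, 0, 1): 1, (0, 1, 1): 2}
    def cost(b):
        return V * penalty[b] - sum(q * bk for q, bk in zip(Q, b))
    return min(penalty, key=cost)

choose_allocation(Q=(0, 10, 10), V=1.0)   # → (0, 1, 1): worth its larger penalty here
```

As V grows, the higher-penalty option (0, 1, 1) is chosen only when queues 2 and 3 are sufficiently large, illustrating the power-backlog tradeoff.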
Packets that are not admitted are dropped. We thus have ω(t) = [(S1(t), S2(t)), (A1(t), A2(t))]. The control action is given by α(t) = [(α1(t), α2(t)); (β1(t), β2(t))], where αk(t) is a binary value that is 1 if we choose to admit the packet (if any) arriving to queue k on slot t, and βk(t) is a binary value that is 1 if we choose to serve queue k on slot t, with the constraint β1(t) + β2(t) ≤ 1.
a) Use the drift-plus-penalty method (with V > 0 and C = 0) to stabilize the queues while seeking to maximize the linear utility function of throughput $w_1\overline{a}_1 + w_2\overline{a}_2$, where w1 and w2 are given positive weights and $\overline{a}_k$ represents the time average rate of data admitted to queue k.
b) Assuming the Slater condition of Assumption A1 holds for some value ε > 0, state the resulting utility and average backlog performance.
c) Redo parts (a) and (b) with the additional constraint that $\overline{a}_1 \geq 0.1$ (assuming this constraint is feasible).
Solution:
a) We have K = 2 queues to stabilize. We have penalty function y0(t) = −w1a1(t) − w2a2(t) (so that minimizing the time average of this penalty maximizes $w_1\overline{a}_1 + w_2\overline{a}_2$). There are no other attributes yl(t) or ej(t), so L = J = 0. The arrival and service variables are given by ak(t) = âk(αk(t), Ak(t)) and bk(t) = b̂k(βk(t), Sk(t)) for k ∈ {1, 2}, where:
$$\hat{a}_k(\alpha_k(t), A_k(t)) = \alpha_k(t)A_k(t) \ , \quad \hat{b}_k(\beta_k(t), S_k(t)) = \beta_k(t)1_{\{S_k(t)=ON\}}$$
where 1{Sk (t)=ON } is an indicator function that is 1 if Sk (t) = ON , and 0 else. The drift-plus-penalty
algorithm of (4.48) thus reduces to observing the queue backlogs (Q1 (t), Q2 (t)) and the current
network state ω(t) = [(S1 (t), S2 (t)), (A1 (t), A2 (t))] and making flow control and transmission
actions αk (t) and βk (t) to solve:
$$\text{Min:} \quad -V[w_1\alpha_1(t)A_1(t) + w_2\alpha_2(t)A_2(t)] + \sum_{k=1}^{2} Q_k(t)\left[\alpha_k(t)A_k(t) - \beta_k(t)1_{\{S_k(t)=ON\}}\right]$$
$$\text{Subj. to:} \quad \alpha_k(t) \in \{0,1\} \ \forall k \in \{1,2\} \ , \quad \beta_k(t) \in \{0,1\} \ \forall k \in \{1,2\}, \quad \beta_1(t) + \beta_2(t) \leq 1$$
The flow control and transmission decisions appear in separate terms in the above problem,
and so they can be chosen to minimize their respective terms separately. This reduces to the following
simple algorithm:
• (Flow Control) For each k ∈ {1, 2}, choose αk (t) = 1 (so that we admit Ak (t) to queue k)
whenever V wk ≥ Qk (t), and choose αk (t) = 0 else.
• (Transmission) Choose (β1(t), β2(t)) subject to the constraints to maximize $Q_1(t)\beta_1(t)1_{\{S_1(t)=ON\}} + Q_2(t)\beta_2(t)1_{\{S_2(t)=ON\}}$. This reduces to the "Longest Connected Queue" algorithm of (8). Specifically, we place the server to the queue that is ON and that has the largest value of queue backlog, breaking ties arbitrarily.
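A minimal simulation sketch of this joint rule follows (our own toy instance: Bernoulli arrivals and independent ON/OFF channels are assumed purely to drive the simulation; the algorithm only observes the current Qk(t), Ak(t), Sk(t)):

```python
import random

def dpp_flow_control_lcq(lams=(0.4, 0.4), p_on=(0.6, 0.6), w=(1.0, 2.0),
                         V=20.0, T=20000, seed=1):
    """Joint flow control (admit A_k iff V*w_k >= Q_k) and Longest
    Connected Queue transmission for the two-queue downlink sketch."""
    rng = random.Random(seed)
    Q = [0, 0]
    admitted = [0, 0]
    for _ in range(T):
        A = [1 if rng.random() < lam else 0 for lam in lams]  # arrivals (assumed Bernoulli)
        S = [rng.random() < p for p in p_on]                  # ON/OFF channels (assumed)
        alpha = [1 if V * wk >= q else 0 for wk, q in zip(w, Q)]  # flow control rule
        a = [al * Ak for al, Ak in zip(alpha, A)]
        # transmission: serve the ON queue with the largest backlog
        conn = [k for k in range(2) if S[k] and Q[k] > 0]
        serve = max(conn, key=lambda k: Q[k]) if conn else None
        b = [1 if k == serve else 0 for k in range(2)]
        Q = [max(q - bk, 0) + ak for q, bk, ak in zip(Q, b, a)]
        admitted = [adm + ak for adm, ak in zip(admitted, a)]
    return [adm / T for adm in admitted], Q
```

Note the deterministic backlog bound visible in the code: admissions stop once Qk(t) exceeds Vwk, so Qk(t) never exceeds Vwk + 1.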
b) From (4.45) (with L = J = 0), it suffices to find a value B that satisfies:
$$B \geq \frac{1}{2}\sum_{k=1}^{2}\mathbb{E}\left\{a_k(t)^2 \,|\, \Theta(t)\right\} + \frac{1}{2}\sum_{k=1}^{2}\mathbb{E}\left\{b_k(t)^2 \,|\, \Theta(t)\right\}$$
Because arrivals are i.i.d. Bernoulli, they are independent of queue backlog, and so E{ak(t)²|Θ(t)} = E{ak(t)²} = E{ak(t)} = λk. Further, bk(t)² = bk(t), and b1(t) + b2(t) ≤ 1. Thus we can choose B = (λ1 + λ2 + 1)/2. It follows from (4.50) that:
$$\liminf_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}\left[w_1\mathbb{E}\{a_1(\tau)\} + w_2\mathbb{E}\{a_2(\tau)\}\right] \geq util^{opt} - B/V$$
where $util^{opt}$ is the maximum possible utility value subject to stability. Further, because y0,min = −(w1 + w2) and y0,max = 0, we have from (4.51):
$$\overline{Q}_1 + \overline{Q}_2 \leq (B + V(w_1 + w_2))/\epsilon$$
c) The additional constraint $\overline{a}_1 \geq 0.1$ can be viewed as introducing an additional penalty y1(t) = 0.1 − a1(t). The drift-plus-penalty
algorithm (4.48) reduces to observing the queue backlogs and network state ω(t) every slot t and
making actions to solve
$$\text{Min:} \quad -V[w_1\alpha_1(t)A_1(t) + w_2\alpha_2(t)A_2(t)] + \sum_{k=1}^{2} Q_k(t)\left[\alpha_k(t)A_k(t) - \beta_k(t)1_{\{S_k(t)=ON\}}\right] + Z_1(t)[0.1 - \alpha_1(t)A_1(t)]$$
$$\text{Subj. to:} \quad \alpha_k(t) \in \{0,1\} \ \forall k \in \{1,2\} \ , \quad \beta_k(t) \in \{0,1\} \ \forall k \in \{1,2\}, \quad \beta_1(t) + \beta_2(t) \leq 1$$
Then update virtual queue Z1 (t) according to (4.69) at the end of the slot, and update the queues
Qk (t) according to (4.23). This reduces to:
• (Flow Control) Choose α1 (t) = 1 whenever V w1 + Z1 (t) ≥ Q1 (t), and choose α1 (t) = 0
else. Choose α2 (t) = 1 whenever V w2 ≥ Q2 (t), and choose α2 (t) = 0 else.
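A sketch of the part (c) dynamics is below. Assumptions: the virtual queue update for (4.69) is taken in the standard form Z1(t+1) = max[Z1(t) + 0.1 − a1(t), 0]; the arrival and channel statistics are illustrative; and w1 is set very small so that the constraint $\overline{a}_1 \geq 0.1$ actually binds:

```python
import random

def dpp_with_rate_constraint(lams=(0.4, 0.4), p_on=(0.6, 0.6), w=(0.01, 2.0),
                             V=50.0, T=100000, seed=2):
    """Flow control with a virtual queue Z1 enforcing time-average a1 >= 0.1.
    Queue 1 admits iff V*w1 + Z1(t) >= Q1(t); queue 2 admits iff V*w2 >= Q2(t).
    Transmission is Longest Connected Queue.  Z1 update form is assumed."""
    rng = random.Random(seed)
    Q = [0, 0]
    Z1 = 0.0
    admitted = [0, 0]
    for _ in range(T):
        A = [1 if rng.random() < lam else 0 for lam in lams]
        S = [rng.random() < p for p in p_on]
        alpha1 = 1 if V * w[0] + Z1 >= Q[0] else 0   # Z1 adds admission "pressure"
        alpha2 = 1 if V * w[1] >= Q[1] else 0
        a = [alpha1 * A[0], alpha2 * A[1]]
        conn = [k for k in range(2) if S[k] and Q[k] > 0]
        serve = max(conn, key=lambda k: Q[k]) if conn else None
        b = [1 if k == serve else 0 for k in range(2)]
        Q = [max(q - bk, 0) + ak for q, bk, ak in zip(Q, b, a)]
        Z1 = max(Z1 + 0.1 - a[0], 0.0)               # assumed form of (4.69)
        admitted = [adm + ak for adm, ak in zip(admitted, a)]
    return [adm / T for adm in admitted], Q, Z1
```

Because mean rate stability of Z1 implies $\overline{a}_1 \geq 0.1 - Z_1(T)/T$, the achieved admission rate for queue 1 hovers near 0.1 even though its utility weight is negligible.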
Theorem 4.9 Suppose that ω(t) is i.i.d. over slots with probabilities π(ω), the problem (4.31)-(4.35) is feasible, and E{L(Θ(0))} < ∞. Suppose that every slot t, we implement a C-additive approximation that comes within C ≥ 0 of the infimum of a modified right-hand-side of (4.44), where the V parameter is replaced with V(t), defined:
$$V(t) \triangleq V_0(t+1)^{\beta} \quad \forall t \in \{0, 1, 2, \ldots\} \qquad (4.70)$$
for some constants V0 > 0 and β such that 0 < β < 1. Then all queues are mean rate stable, all required
constraints (4.32)-(4.35) are satisfied, and:
$$\lim_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{y_0(\tau)\} = y_0^{opt}$$
The manner in which the V0 and β parameters affect convergence is described in the proof, specifically in
(4.72) and (4.73).
While this variable-V approach yields the exact optimum $y_0^{opt}$, its disadvantage is that we achieve only mean rate stability and not strong stability, so that there is no finite bound on average queue size and average delay. In fact, it is known that for typical problems (except for those with a trivial structure), average backlog and delay necessarily grow to infinity as we push performance closer and closer to optimal, becoming infinite at the optimal point (50)(51)(52)(53). The very large
queue sizes incurred by this variable V algorithm also make it more difficult to adapt to changes in
system parameters, whereas fixed V algorithms can easily adapt.
Proof. (Theorem 4.9) Repeating the proof of Theorem 4.8 by replacing V with V (t) for a given slot
t, the equation (4.57) becomes:
$$\Delta(\Theta(t)) + V(t)\mathbb{E}\{y_0(t)|\Theta(t)\} \leq B + C + V(t)y_0^{opt}$$
Taking expectations of both sides of the above and using iterated expectations yields:
$$\mathbb{E}\{L(\Theta(t+1))\} - \mathbb{E}\{L(\Theta(t))\} + V(t)\mathbb{E}\{y_0(t)\} \leq B + C + V(t)y_0^{opt} \qquad (4.71)$$
Summing the above over τ ∈ {0, 1, ..., t − 1} and using the fact that E{y0(τ)} ≥ y0,min yields:
$$\mathbb{E}\{L(\Theta(t))\} - \mathbb{E}\{L(\Theta(0))\} \leq (B + C)t + (y_0^{opt} - y_{0,min})\sum_{\tau=0}^{t-1} V(\tau)$$
Using the definition of the Lyapunov function in (4.43) yields the following for all t > 0:
$$\sum_{k=1}^{K}\mathbb{E}\left\{Q_k(t)^2\right\} + \sum_{l=1}^{L}\mathbb{E}\left\{Z_l(t)^2\right\} + \sum_{j=1}^{J}\mathbb{E}\left\{H_j(t)^2\right\} \leq 2(B+C)t + 2\mathbb{E}\{L(\Theta(0))\} + 2(y_0^{opt} - y_{0,min})\sum_{\tau=0}^{t-1} V(\tau)$$
Take any queue Qk(t). Because $\mathbb{E}\{Q_k(t)\}^2 \leq \mathbb{E}\{Q_k(t)^2\}$, we have for all queues Qk(t):
$$\mathbb{E}\{Q_k(t)\} \leq \sqrt{2(B+C)t + 2\mathbb{E}\{L(\Theta(0))\} + 2(y_0^{opt} - y_{0,min})\sum_{\tau=0}^{t-1} V(\tau)}$$
and the same bound holds for E{Zl(t)} and E{|Hj(t)|} for all l ∈ {1, ..., L}, j ∈ {1, ..., J}.
Dividing both sides of the above inequality by t yields the following for all t > 0:
$$\frac{\mathbb{E}\{Q_k(t)\}}{t} \leq \sqrt{\frac{2(B+C)}{t} + \frac{2\mathbb{E}\{L(\Theta(0))\}}{t^2} + 2(y_0^{opt} - y_{0,min})\frac{1}{t^2}\sum_{\tau=0}^{t-1} V(\tau)} \qquad (4.72)$$
and the same bound holds for all E{Zl(t)}/t and E{|Hj(t)|}/t. However, we have:
$$0 \leq \frac{1}{t^2}\sum_{\tau=0}^{t-1} V(\tau) = \frac{V_0}{t^2}\sum_{\tau=0}^{t-1}(1+\tau)^{\beta} \leq \frac{V_0}{t^2}\int_0^t (1+v)^{\beta}\,dv = \frac{V_0}{t^2}\cdot\frac{(1+t)^{1+\beta} - 1}{1+\beta}$$
Because 0 < β < 1, taking a limit of the above as t → ∞ shows that $\frac{1}{t^2}\sum_{\tau=0}^{t-1} V(\tau) \to 0$. Using this and taking a limit of (4.72) shows that all queues are mean rate stable, and hence (by Section 4.4) all required constraints (4.32)-(4.35) are satisfied.
To prove that the time average expectation of y0(t) converges to $y_0^{opt}$, consider again the inequality (4.71), which holds for all t. Dividing both sides of (4.71) by V(t) yields:
$$\frac{\mathbb{E}\{L(\Theta(t+1))\} - \mathbb{E}\{L(\Theta(t))\}}{V(t)} + \mathbb{E}\{y_0(t)\} \leq \frac{B+C}{V(t)} + y_0^{opt}$$
Summing the above over τ ∈ {0, 1, ..., t − 1} and collecting terms yields:
$$\frac{\mathbb{E}\{L(\Theta(t))\}}{V(t-1)} - \frac{\mathbb{E}\{L(\Theta(0))\}}{V(0)} + \sum_{\tau=1}^{t-1}\left[\frac{1}{V(\tau-1)} - \frac{1}{V(\tau)}\right]\mathbb{E}\{L(\Theta(\tau))\} + \sum_{\tau=0}^{t-1}\mathbb{E}\{y_0(\tau)\} \leq t y_0^{opt} + (B+C)\sum_{\tau=0}^{t-1}\frac{1}{V(\tau)}$$
4.8. PLACE-HOLDER BACKLOG 69
Because V(t) is non-decreasing, we have for all τ ≥ 1:
$$\frac{1}{V(\tau-1)} - \frac{1}{V(\tau)} \geq 0$$
Using this in the above inequality and dividing by t yields:
$$\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{y_0(\tau)\} \leq y_0^{opt} + (B+C)\frac{1}{t}\sum_{\tau=0}^{t-1}\frac{1}{V(\tau)} + \frac{\mathbb{E}\{L(\Theta(0))\}}{V(0)t} \qquad (4.73)$$
However:
$$0 \leq \frac{1}{t}\sum_{\tau=0}^{t-1}\frac{1}{V(\tau)} \leq \frac{1}{tV(0)} + \frac{1}{V_0 t}\int_0^{t-1}\frac{dv}{(1+v)^{\beta}} = \frac{1}{tV(0)} + \frac{1}{V_0 t}\cdot\frac{t^{1-\beta} - 1}{1-\beta}$$
Taking a limit as t → ∞ shows that this term vanishes, and so the lim sup of the left-hand-side in (4.73) is less than or equal to $y_0^{opt}$. However, the policy satisfies all constraints (4.32)-(4.35), and so the lim inf must be greater than or equal to $y_0^{opt}$ (by the Appendix 4.A result (4.96)), so the limit exists and is equal to $y_0^{opt}$. □
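The two scalings that drive this proof can be checked numerically. The sketch below (our own illustration) evaluates $(1/t^2)\sum_{\tau=0}^{t-1} V(\tau)$ and $(1/t)\sum_{\tau=0}^{t-1} 1/V(\tau)$ for $V(\tau) = V_0(1+\tau)^{\beta}$, both of which must vanish as t grows whenever 0 < β < 1:

```python
def variable_V_terms(V0=1.0, beta=0.5, t=10**5):
    """Evaluate the two quantities that must vanish in the proof of
    Theorem 4.9: (1/t^2)*sum V(tau) and (1/t)*sum 1/V(tau),
    with V(tau) = V0*(1 + tau)**beta."""
    s1 = sum(V0 * (1 + tau) ** beta for tau in range(t))
    s2 = sum(1.0 / (V0 * (1 + tau) ** beta) for tau in range(t))
    return s1 / t**2, s2 / t
```

Consistent with the integral bounds above, the first term decays roughly like $t^{\beta-1}/(1+\beta)$ and the second like $t^{-\beta}/(1-\beta)$.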
Clearly 0 is a place-holder value for all queues Qk (t) and Zl (t), but the idea is to compute
the largest possible place-holder values. It is often easy to pre-compute positive place-holder values
without knowing anything about the system probabilities. This is done in the Chapter 3 example
for minimizing average power expenditure subject to stability (see Section 3.2.4), and Exercises 4.8
and 4.11 provide further examples. Suppose now we run the algorithm with initial queue backlog $Q_k(0) = Q_k^{place}$ for all k ∈ {1, ..., K}. Then we achieve exactly the same backlog and penalty sample paths under either FIFO or LIFO. However, none of the initial backlog $Q_k^{place}$ would ever exit the system under LIFO! Thus, we can achieve the same performance by replacing this initial backlog $Q_k^{place}$ with fake backlog, called place-holder backlog (142)(143). Whenever a transmission opportunity arises, we transmit only actual data whenever possible, serving the actual data in any order we like (such as FIFO or LIFO). Because queue backlog never dips below $Q_k^{place}$, we never have to serve any fake data. Thus, the actual queue backlog under this implementation is equal to $Q_k^{actual}(t) = Q_k(t) - Q_k^{place}$ for all t, which reduces the actual backlog by an amount exactly equal to $Q_k^{place}$. This does not affect the sample path and hence does not affect the time average penalty.
Specifically, for all k ∈ {1, ..., K} and l ∈ {1, ..., L}, we initialize the actual backlog $Q_k^{actual}(0) = Z_l^{actual}(0) = 0$, but we use place-holder backlogs $Q_k^{place}$, $Z_l^{place}$ so that:
$$Q_k(0) = Q_k^{place} \ , \quad Z_l(0) = Z_l^{place} \quad \forall k \in \{1, \ldots, K\}, l \in \{1, \ldots, L\}$$
We then operate the algorithm using the Qk(t) and Zl(t) values (not the actual values $Q_k^{actual}(t)$ and $Z_l^{actual}(t)$). The above discussion ensures that for all time t, we have:
$$Q_k^{actual}(t) = Q_k(t) - Q_k^{place} \ , \quad Z_l^{actual}(t) = Z_l(t) - Z_l^{place} \quad \forall t \geq 0$$
Because the bounds in Theorem 4.8 are independent of the initial condition, the same penalty and backlog bounds are achieved. However, the actual backlog is reduced by exactly $Q_k^{place}$ and $Z_l^{place}$ at every instant of time. This is a "free" reduction in the queue backlog, with no impact on the limiting time average penalty. This has already been illustrated in the example minimum average power problem of the previous chapter (Section 3.2.4, Figs. 3.3-3.4). Fig. 4.2 below provides further insight: it shows a sample path of Q2(t) for the same example system of Section 3.2.4
(using V = 100 and (λ1, λ2) = (0.3, 0.7)). We use $Q_2^{place} = \max[V/2 - 2, 0] = 48$ as the initial backlog, and the figure illustrates that Q2(t) indeed never drops below 48. The place-holder savings is illustrated in the figure.
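The never-dips-below property is easy to see in a toy instance (our own single-queue power example, not the Section 3.2.4 system): suppose each slot we may transmit one packet at unit power, and drift-plus-penalty transmits only when Q(t) ≥ V. Then backlog can never fall below V − 1, so V − 1 units of it can safely be fake:

```python
import random

def placeholder_demo(lam=0.3, V=50, T=20000, seed=3):
    """Toy place-holder sketch: service occurs only when Q(t) >= V (the
    drift-plus-penalty comparison Q(t)*1 >= V*power with unit power), so
    Q(t) can never fall below Q_place = V - 1.  Initializing Q(0) = Q_place
    with fake data then reduces actual backlog by Q_place at every instant."""
    rng = random.Random(seed)
    Q_place = V - 1
    Q = Q_place                           # controller's queue, seeded with fake backlog
    min_Q = Q
    for _ in range(T):
        b = 1 if Q >= V else 0            # transmit iff worth V units of power
        a = 1 if rng.random() < lam else 0  # Bernoulli arrivals (assumed)
        Q = max(Q - b, 0) + a
        min_Q = min(min_Q, Q)
    return min_Q, Q_place, Q - Q_place    # actual data = Q - Q_place >= 0
```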
We developed this method of place-holder bits in (143) for use in dynamic data compression
problems and in (142) for general constrained cost minimization problems (including multi-hop
wireless networks with unreliable channels). The reader is referred to the examples and simulations
[Figure: plot of backlog Q2(t) versus time over 3000 slots, with a horizontal line marking the place-holder value $Q_2^{place}$ and an annotation showing the resulting backlog savings.]
Figure 4.2: A sample path of Q2 (t) over 3000 slots for the example system of Section 3.2.4.
given in (143)(142). A more aggressive place-holder technique is developed in (37). The idea of (37) can be illustrated easily from Fig. 4.2: While the figure illustrates that Q2(t) never drops below $Q_2^{place}$, the backlog actually increases until it reaches a "plateau" around 100 packets, and then oscillates with some noise about this value. Intuitively, we can almost double the place-holder value in the figure, raising the horizontal line up to a level that is close to the minimum backlog value seen in the plateau. While we cannot guarantee that backlog will never drop below this new line,
the idea is to show that such events occur rarely. Work in (45) shows that scaled queue backlog con-
verges to a Lagrange multiplier of a related static optimization problem, and work in (37) shows that
actual queue backlog oscillates very closely about this Lagrange multiplier. Specifically, it is shown
in (37) that, under mild assumptions, the steady state backlog distribution decays exponentially in
distance from the Lagrange multiplier value. It then develops an algorithm that uses a place-holder
that is a distance of $O(\log^2(V))$ from the Lagrange multiplier, showing that deviations by more than this amount are rare and can be handled separately by dropping a small number of packets. The result fundamentally changes the performance-backlog tradeoff from $[O(1/V), O(V)]$ to $[O(1/V), O(\log^2(V))]$ (within a logarithmic factor of the optimal tradeoff shown in (52)(51)(53)).
A disadvantage of this aggressive approach is that Lagrange multipliers must be known in advance, which is difficult as they may depend on system statistics and may differ for each queue in the system. This is handled elegantly in a Last-In-First-Out (LIFO) implementation of the drift-plus-penalty method, developed in (54). That LIFO can improve delay can be understood by Fig. 4.2: First, a LIFO implementation would achieve all of the savings of the original place-holder value of $Q_2^{place} = 48$ (at the cost of never serving the first 48 packets). Next, a LIFO implementation
would intuitively lead to delays of “most” packets that are on the order of the magnitude of noise
variations in the plateau area. That is, LIFO can achieve the more aggressive place-holder gains
without computing the Lagrange multipliers! This is formally proven in (55). Experiments with the
LIFO drift-plus-penalty method on an actual multi-hop wireless network deployment in (54) show
a dramatic improvement in delay (by more than an order of magnitude) for all but 2% of the packets.
Here we show that the same drift-plus-penalty algorithm provides similar [O(1/V), O(V)] performance guarantees when ω(t) varies according to a more general ergodic (possibly non-i.i.d.)
process. We then show it also provides efficient performance for arbitrary (possibly non-ergodic)
sample paths. The main proof techniques are the same as those we have already developed, with the
exception that we use a multi-slot drift analysis rather than a 1-slot drift analysis.
We consider the same system as in Section 4.2.1, with K queues with dynamics (4.23), and
attributes yl (t) = ŷl (α(t), ω(t)) for l ∈ {1, . . . , L}. For simplicity, we eliminate the attributes ej (t)
associated with equality constraints (so that J = 0). We seek an algorithm for choosing α(t) ∈ Aω(t)
every slot to minimize y 0 subject to mean rate stability of all queues Qk (t) and subject to y l ≤ 0
for all l ∈ {1, . . . , L}. The virtual queues Zl (t) for l ∈ {1, . . . , L} are the same as before, defined in
(4.40). For simplicity of exposition, we assume:
• The exact drift-plus-penalty algorithm of (4.48)-(4.49) is used, rather than a C-additive approximation (so that C = 0).
• The functions âk (·), b̂k (·), ŷl (·) are deterministically bounded, so that:
$$0 \leq \hat{a}_k(\alpha(t), \omega(t)) \leq a_k^{max} \quad \forall k \in \{1, \ldots, K\}, \forall \omega(t), \alpha(t) \in \mathcal{A}_{\omega(t)} \qquad (4.74)$$
$$0 \leq \hat{b}_k(\alpha(t), \omega(t)) \leq b_k^{max} \quad \forall k \in \{1, \ldots, K\}, \forall \omega(t), \alpha(t) \in \mathcal{A}_{\omega(t)} \qquad (4.75)$$
$$y_l^{min} \leq \hat{y}_l(\alpha(t), \omega(t)) \leq y_l^{max} \quad \forall l \in \{0, 1, \ldots, L\}, \forall \omega(t), \alpha(t) \in \mathcal{A}_{\omega(t)} \qquad (4.76)$$
Define $\Theta(t) \triangleq [Q(t), Z(t)]$, and define the Lyapunov function L(Θ(t)) as follows:
$$L(\Theta(t)) \triangleq \frac{1}{2}\sum_{k=1}^{K} Q_k(t)^2 + \frac{1}{2}\sum_{l=1}^{L} Z_l(t)^2 \qquad (4.77)$$
4.9. NON-I.I.D. MODELS AND UNIVERSAL SCHEDULING 73
We have the following preliminary lemma.
Lemma 4.11 (T-slot Drift) Assume (4.74)-(4.76) hold. For any slot t, any queue backlogs Θ(t), and any integer T > 0, the drift-plus-penalty algorithm ensures that:
$$L(\Theta(t+T)) - L(\Theta(t)) + V\sum_{\tau=t}^{t+T-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \leq DT^2 + V\sum_{\tau=t}^{t+T-1}\hat{y}_0(\alpha^*(\tau), \omega(\tau))$$
$$+ \sum_{l=1}^{L} Z_l(t)\sum_{\tau=t}^{t+T-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau)) + \sum_{k=1}^{K} Q_k(t)\sum_{\tau=t}^{t+T-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right]$$
where α*(τ) ∈ A_{ω(τ)} are any alternative decisions, and the constant D is defined:
$$D \triangleq \frac{1}{2}\sum_{k=1}^{K}\left[(a_k^{max})^2 + (b_k^{max})^2\right] + \frac{1}{2}\sum_{l=1}^{L}\max[(y_l^{min})^2, (y_l^{max})^2] \qquad (4.78)$$
Proof. For any slot τ, the one-slot drift satisfies:
$$L(\Theta(\tau+1)) - L(\Theta(\tau)) \leq D + \sum_{k=1}^{K} Q_k(\tau)\left[\hat{a}_k(\alpha(\tau), \omega(\tau)) - \hat{b}_k(\alpha(\tau), \omega(\tau))\right] + \sum_{l=1}^{L} Z_l(\tau)\hat{y}_l(\alpha(\tau), \omega(\tau))$$
where D is defined in (4.78). We then add V ŷ0 (α(τ ), ω(τ )) to both sides. Because the drift-plus-
penalty algorithm is designed to choose α(τ ) to deterministically minimize the right-hand-side of
the resulting inequality when this term is added, it follows that:
4.9.1 MARKOV MODULATED PROCESSES³
Assume the state space S has a state "0" that we designate as a "renewal" state. Assume for simplicity
that the modulating chain is in state 0 at time 0, and let the sequence {T0, T1, T2, ...} represent the recurrence times to state 0. Clearly $\{T_r\}_{r=0}^{\infty}$ is an i.i.d. sequence with E{Tr} = 1/π0 for all r. Define E{T} and E{T²} as the first and second moments of these recurrence times (so that E{T} = 1/π0). Define t0 = 0, and for integers
³This subsection (Subsection 4.9.1) assumes familiarity with DTMC theory and can be skipped without loss of continuity.
r > 0 define tr as the time of the rth revisitation to state 0, so that $t_r = \sum_{j=1}^{r} T_j$. We now define the variable slot drift Δ(Θ(tr)) as follows:
$$\Delta(\Theta(t_r)) \triangleq \mathbb{E}\left\{L(\Theta(t_{r+1})) - L(\Theta(t_r)) \,|\, \Theta(t_r)\right\}$$
This drift represents the expected change in the Lyapunov function from renewal time tr to re-
newal time tr+1 , where the expectation is over the random duration of the renewal period and the
random events on each slot of this period. By plugging t = tr and T = Tr into Lemma 4.11 and taking conditional expectations given Θ(tr), we have the following variable-slot drift-plus-penalty expression:
$$\Delta(\Theta(t_r)) + V\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha(\tau), \omega(\tau))\Big|\Theta(t_r)\right\} \leq D\mathbb{E}\{T_r^2|\Theta(t_r)\} + V\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha^*(\tau), \omega(\tau))\Big|\Theta(t_r)\right\}$$
$$+ \sum_{l=1}^{L} Z_l(t_r)\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))\Big|\Theta(t_r)\right\} + \sum_{k=1}^{K} Q_k(t_r)\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right]\Big|\Theta(t_r)\right\}$$
where α*(τ) are decisions from any other policy. First note that $\mathbb{E}\{T_r^2|\Theta(t_r)\} = \mathbb{E}\{T^2\}$ because the renewal duration is independent of the queue state Θ(tr). Next, note that the conditional expectations in the next three terms on the right-hand-side of the above inequality can be changed into pure expectations (given that tr is a renewal time) under the assumption that the policy α*(τ) is ω-only. Thus:
$$\Delta(\Theta(t_r)) + V\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha(\tau), \omega(\tau))\Big|\Theta(t_r)\right\} \leq D\mathbb{E}\{T^2\} + V\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha^*(\tau), \omega(\tau))\right\} \qquad (4.79)$$
$$+ \sum_{l=1}^{L} Z_l(t_r)\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))\right\} + \sum_{k=1}^{K} Q_k(t_r)\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right]\right\}$$
The expectations in the final terms are expected rewards over a renewal period, and so by basic
renewal theory (130)(66), we have for all l ∈ {0, 1, . . . , L} and all k ∈ {1, . . . , K}:
$$\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))\right\} = \mathbb{E}\{T\}y_l^* \qquad (4.80)$$
$$\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right]\right\} = \mathbb{E}\{T\}(a_k^* - b_k^*) \qquad (4.81)$$
where yl∗ , ak∗ , bk∗ are the infinite horizon time average values achieved for the ŷl (α ∗ (t), ω(t)),
âk (α ∗ (t), ω(t)), and b̂k (α ∗ (t), ω(t)) processes under the ω-only policy α ∗ (t). This basic renewal
theory fact can easily be understood as follows (with the below equalities holding with probability
1):4
$$y_l^* = \lim_{R\to\infty}\frac{1}{t_R}\sum_{\tau=0}^{t_R-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau)) = \lim_{R\to\infty}\frac{\sum_{r=0}^{R-1}\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))}{\sum_{r=0}^{R-1}T_r}$$
$$= \frac{\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))}{\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}T_r} = \frac{\mathbb{E}\left\{\sum_{\tau=0}^{T_0-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))\right\}}{\mathbb{E}\{T\}}$$
where the final equality holds by the strong law of large numbers (noting that both the numerator
and denominator are just a time average of i.i.d. quantities). In particular, the numerator is a sum of
i.i.d. quantities because the policy α ∗ (t) is ω-only, and so the sum penalty over each renewal period
is independent but identically distributed. Plugging (4.80)-(4.81) into (4.79) yields:
$$\Delta(\Theta(t_r)) + V\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha(\tau), \omega(\tau))\Big|\Theta(t_r)\right\} \leq D\mathbb{E}\{T^2\} + V\mathbb{E}\{T\}y_0^*$$
$$+ \sum_{l=1}^{L} Z_l(t_r)\mathbb{E}\{T\}y_l^* + \sum_{k=1}^{K} Q_k(t_r)\mathbb{E}\{T\}(a_k^* - b_k^*)$$
The above holds for any time averages $\{y_l^*, a_k^*, b_k^*\}$ that can be achieved by ω-only policies. However, by Theorem 4.5, we know that if the problem is feasible, then either there is a single ω-only policy that achieves time averages $y_0^* = y_0^{opt}$, $y_l^* \leq 0$ for all l ∈ {1, ..., L}, $(a_k^* - b_k^*) \leq 0$ for all k ∈ {1, ..., K},
4 Because the processes are deterministically bounded and have time averages that converge with probability 1, the Lebesgue
Dominated Convergence Theorem (145) ensures the time average expectations are the same as the pure time averages (see
Exercise 7.9).
or there is an infinite sequence of ω-only policies that approach these averages. Plugging this into
the above yields:
$$\Delta(\Theta(t_r)) + V\mathbb{E}\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha(\tau), \omega(\tau))\Big|\Theta(t_r)\right\} \leq D\mathbb{E}\{T^2\} + V\mathbb{E}\{T\}y_0^{opt}$$
Taking expectations of the above, summing the resulting telescoping series over r ∈ {0, ..., R − 1}, and dividing by VRE{T} yields:
$$\frac{\mathbb{E}\{L(\Theta(t_R))\} - \mathbb{E}\{L(\Theta(0))\}}{VR\mathbb{E}\{T\}} + \frac{1}{R\mathbb{E}\{T\}}\mathbb{E}\left\{\sum_{\tau=0}^{t_R-1}\hat{y}_0(\alpha(\tau), \omega(\tau))\right\} \leq y_0^{opt} + \frac{D\mathbb{E}\{T^2\}}{V\mathbb{E}\{T\}}$$
Because tR/R → E{T} with probability 1 (by the law of large numbers), it can be shown that the middle term has a lim sup that is equal to the lim sup time average expected penalty. Thus, assuming E{L(Θ(0))} < ∞, we have:
$$\overline{y}_0 = \limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{\hat{y}_0(\alpha(\tau), \omega(\tau))\} \leq y_0^{opt} + \frac{D\mathbb{E}\{T^2\}}{V\mathbb{E}\{T\}} = y_0^{opt} + O(1/V) \qquad (4.82)$$
where we note that the constants D, E{T}, and E{T²} do not depend on V. Similarly, it can
be shown that if the problem is feasible then all queues are mean rate stable, and if the slackness
condition of Assumption A1 holds, then sum average queue backlog is O(V ) (144). This leads to
the following theorem.
Theorem 4.12 (Markov Modulated Processes (144)) Assume the ω(t) process is modulated by the DTMC as described above, the boundedness assumptions (4.74)-(4.76) hold, E{L(Θ(0))} < ∞, and that the drift-plus-penalty algorithm is used every slot t. If the problem is feasible, then:
(a) The penalty satisfies (4.82), so that $\overline{y}_0 \leq y_0^{opt} + O(1/V)$.
(b) All queues are mean rate stable, and so $\overline{y}_l \leq 0$ for all l ∈ {1, ..., L}.
(c) If the Slackness Assumption A1 holds, then all queues Qk (t) are strongly stable with average
backlog O(V ).
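The renewal-reward identity behind (4.80)-(4.81) can be sanity-checked by Monte Carlo. The sketch below uses a toy two-state DTMC of our own (state 0 as the renewal state, with assumed transition probabilities and per-state rewards) and compares the long-run time average of a per-slot reward to the ratio of accumulated cycle reward to accumulated cycle length:

```python
import random

def renewal_reward_check(p01=0.3, p10=0.5, reward=(1.0, 4.0), T=200000, seed=4):
    """Compare the long-run time average reward of a two-state DTMC to the
    renewal-reward ratio E{cycle reward}/E{cycle length}, estimated from
    completed cycles between visits to the renewal state 0."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    cycle_rewards, cycle_lengths = [], []
    cur_r, cur_len = 0.0, 0
    for _ in range(T):
        total += reward[state]
        cur_r += reward[state]
        cur_len += 1
        # Markov transition (assumed probabilities)
        if state == 0:
            nxt = 1 if rng.random() < p01 else 0
        else:
            nxt = 0 if rng.random() < p10 else 1
        if nxt == 0:  # a return to the renewal state closes a cycle
            cycle_rewards.append(cur_r)
            cycle_lengths.append(cur_len)
            cur_r, cur_len = 0.0, 0
        state = nxt
    time_avg = total / T
    renewal_ratio = sum(cycle_rewards) / sum(cycle_lengths)
    return time_avg, renewal_ratio
```

For these parameters the stationary distribution is (π0, π1) = (0.625, 0.375), so both estimates should approach 0.625·1 + 0.375·4 = 2.125.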
$$\text{Minimize:} \quad c_r = \frac{1}{T}\sum_{\tau=rT}^{(r+1)T-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \qquad (4.83)$$
$$\text{Subject to:} \quad 1) \ \sum_{\tau=rT}^{(r+1)T-1}\hat{y}_l(\alpha(\tau), \omega(\tau)) \leq 0 \quad \forall l \in \{1, \ldots, L\}$$
$$2) \ \sum_{\tau=rT}^{(r+1)T-1}\left[\hat{a}_k(\alpha(\tau), \omega(\tau)) - \hat{b}_k(\alpha(\tau), \omega(\tau))\right] \leq 0 \quad \forall k \in \{1, \ldots, K\}$$
$$3) \ \alpha(\tau) \in \mathcal{A}_{\omega(\tau)} \quad \forall \tau \in \{rT, \ldots, (r+1)T-1\}$$
The value cr∗ thus represents the optimal empirical average penalty for frame r over all policies
that have full knowledge of the future ω(τ ) values over the frame and that satisfy the constraints.5
We assume throughout that the constraints are feasible for the above problem. Feasibility is often
guaranteed when there is an “idle” action, such as the action of admitting and transmitting no data,
which can be used on all slots to trivially satisfy the constraints in the form 0 ≤ 0.
Frame r consists of slots τ ∈ {rT, ..., (r+1)T − 1}. Let α*(τ) represent the decisions that solve the T-slot lookahead problem (4.83) over this frame to achieve cost $c_r^*$.⁶ It is generally impossible to solve for the α*(τ) decisions, as these would require knowledge of the ω(τ) values up to T slots into the future. However, the α*(τ) values exist, and can still be plugged into Lemma 4.11 to yield the following (using t = rT and T as the frame size):
$$L(\Theta(rT+T)) - L(\Theta(rT)) + V\sum_{\tau=rT}^{rT+T-1}\hat{y}_0(\alpha(\tau), \omega(\tau))$$
$$\leq DT^2 + V\sum_{\tau=rT}^{rT+T-1}\hat{y}_0(\alpha^*(\tau), \omega(\tau)) + \sum_{l=1}^{L} Z_l(rT)\sum_{\tau=rT}^{rT+T-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))$$
$$+ \sum_{k=1}^{K} Q_k(rT)\sum_{\tau=rT}^{rT+T-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right] \leq DT^2 + VTc_r^*$$
where the final inequality follows by noting that the α ∗ (τ ) policy satisfies the constraints of the
T -slot lookahead problem (4.83) and yields cost cr∗ .
⁵Theorem 4.13 holds exactly as stated in the extended case when $c_r^*$ is re-defined by a T-slot lookahead problem that allows actions $[(\tilde{y}_l^*(\tau)), (\tilde{a}_k^*(\tau)), (\tilde{b}_k^*(\tau))]$ every slot τ to be taken within the convex hull of the set of all possible values of $[(\hat{y}_l(\alpha, \omega(\tau))), (\hat{a}_k(\alpha, \omega(\tau))), (\hat{b}_k(\alpha, \omega(\tau)))]$ under $\alpha \in \mathcal{A}_{\omega(\tau)}$, but we skip this extension for simplicity of exposition.
6 For simplicity, we assume the infimum cost is achievable. Else, we can derive the same result by taking a limit over policies that
approach the infimum.
Summing the above over r ∈ {0, . . . , R − 1} (for any integer R > 0) yields:
$$L(\Theta(RT)) - L(\Theta(0)) + V\sum_{\tau=0}^{RT-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \leq DT^2R + VT\sum_{r=0}^{R-1}c_r^* \qquad (4.84)$$
Dividing by VTR, using the fact that L(Θ(RT)) ≥ 0, and rearranging terms yields:
$$\frac{1}{RT}\sum_{\tau=0}^{RT-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \leq \frac{1}{R}\sum_{r=0}^{R-1}c_r^* + \frac{DT}{V} + \frac{L(\Theta(0))}{VTR} \qquad (4.85)$$
where we recall that α(τ ) represents the decisions under the drift-plus-penalty algorithm. The
inequality (4.85) holds for all integers R > 0. When R is large, the final term on the right-hand-
side above goes to zero (this term is exactly zero if L((0)) = 0). Thus, we have that the time
average cost is within O(1/V ) of the time average of the cr∗ values. The above discussion proves part
(a) of the following theorem:
Theorem 4.13 (Universal Scheduling) Assume the ω(t) sample path satisfies the boundedness assumptions (4.74)-(4.76), and that initial queue backlog is finite. Fix any integers R > 0 and T > 0, and assume the T-slot lookahead problem (4.83) is feasible for every frame r ∈ {0, 1, ..., R − 1}. If the drift-plus-penalty algorithm is implemented every slot t, then:
(a) The time average cost over the first RT slots satisfies (4.85). In particular,⁷
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \leq \limsup_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}c_r^* + DT/V$$
where $c_r^*$ is the optimal cost in the T-slot lookahead problem (4.83) for frame r, and D is defined in (4.78).
(b) All actual and virtual queues are rate stable, and so we have:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\hat{y}_l(\alpha(\tau), \omega(\tau)) \leq 0 \quad \forall l \in \{1, \ldots, L\}$$
(c) Suppose there exists an ε > 0 and a sequence of decisions $\tilde{\alpha}(\tau) \in \mathcal{A}_{\omega(\tau)}$ that satisfies the following slackness assumptions for all frames r:
$$\sum_{\tau=rT}^{rT+T-1}\hat{y}_l(\tilde{\alpha}(\tau), \omega(\tau)) \leq 0 \quad \forall l \in \{1, \ldots, L\} \qquad (4.86)$$
$$\frac{1}{T}\sum_{\tau=rT}^{rT+T-1}\left[\hat{a}_k(\tilde{\alpha}(\tau), \omega(\tau)) - \hat{b}_k(\tilde{\alpha}(\tau), \omega(\tau))\right] \leq -\epsilon \quad \forall k \in \{1, \ldots, K\} \qquad (4.87)$$
⁷It is clear that the lim sup over times sampled every T slots is the same as the regular lim sup because the ŷ0(·) values are bounded. Indeed, we have $\sum_{\tau=0}^{\lfloor t/T\rfloor T}\hat{y}_0(\alpha(\tau), \omega(\tau)) + Ty_0^{min} \leq \sum_{\tau=0}^{t}\hat{y}_0(\alpha(\tau), \omega(\tau)) \leq \sum_{\tau=0}^{\lfloor t/T\rfloor T}\hat{y}_0(\alpha(\tau), \omega(\tau)) + Ty_0^{max}$. Dividing both sides by t and taking limits shows these limits are equal.
Then:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}Q_k(\tau) \leq \frac{DT}{\epsilon} + \frac{V(y_0^{max} - y_0^{min})}{\epsilon} + \frac{T-1}{2}\sum_{k=1}^{K}\max[a_k^{max}, b_k^{max}]$$
Proof. Part (a) has already been shown in the above discussion. We provide a summary of parts (b) and (c): The inequality (4.84) plus the boundedness assumptions (4.74)-(4.76) imply that there is a finite constant F > 0 such that L(Θ(RT)) ≤ FR for all R. By an argument similar to part (a) of Theorem 4.1, it can then be shown that limR→∞ Qk(RT)/(RT) = 0 for all k ∈ {1, ..., K} and
limR→∞ Zl (RT )/(RT ) = 0 for all l ∈ {1, . . . , L}. Further, these limits that sample only on slots
RT (as R → ∞) are clearly the same when taken over all t → ∞ because the queues can change
by at most a constant proportional to T in between the sample times. This proves part (b).
Part (c) follows by plugging the policy α̃(τ ) for τ ∈ {rT , . . . , (r + 1)T − 1} into Lemma 4.11
and using (4.86)-(4.87) to yield:
$$L(\Theta(rT+T)) - L(\Theta(rT)) + V\sum_{\tau=rT}^{rT+T-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \leq DT^2 + VTy_0^{max} - \epsilon T\sum_{k=1}^{K}Q_k(rT)$$
and hence:
$$L(\Theta(rT+T)) - L(\Theta(rT)) \leq DT^2 + VT(y_0^{max} - y_0^{min}) - \epsilon T\sum_{k=1}^{K}Q_k(rT)$$
$$\leq DT^2 + VT(y_0^{max} - y_0^{min}) - \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1}Q_k(rT+j) + \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1}j\max[a_k^{max}, b_k^{max}]$$
$$= DT^2 + VT(y_0^{max} - y_0^{min}) - \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1}Q_k(rT+j) + \epsilon\frac{(T-1)T}{2}\sum_{k=1}^{K}\max[a_k^{max}, b_k^{max}]$$
Summing over r ∈ {0, ..., R − 1}, dividing by εRT, and taking a lim sup as R → ∞ yields the result. □
Inequality (4.85) holds for all R and T , and hence it can be viewed as a family of bounds that
apply to the same sample path under the drift-plus-penalty algorithm. Note also that increasing the
value of T changes the frame size and typically improves the cr∗ values (as it allows these values to be
achieved with a larger future lookahead). However, this affects the error term DT /V , requiring V
to also be increased as T increases. Increasing V creates a larger queue backlog. We thus see a similar
[O(1/V ), O(V )] cost-backlog tradeoff for this sample path context. If the slackness assumptions
(4.86)-(4.87) are modified to also include slackness in the yl (·) constraints, a modified argument
can be used to show the worst case queue backlog is bounded for all time by a constant that is O(V )
(see also (146)(39)(38)).
The target value $\frac{1}{R}\sum_{r=0}^{R-1}c_r^*$ that we use for comparison does not represent the optimal cost
that can be achieved over the full horizon RT if the entire future were known. However, when T is
large it still represents a meaningful target that is not trivial to achieve, as it is one that is defined in
terms of an ideal policy with T -slot lookahead. It is remarkable that the drift-plus-penalty algorithm
can closely track such an “ideal” T -slot lookahead algorithm.
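This tracking behavior can be illustrated on a toy sample path (the example system below is our own assumption, not one from the text): a single queue where every arrival must eventually be served, and serving in slot τ costs c(τ). For this toy problem the T-slot lookahead optimum of each frame is explicit, since a frame simply serves its arrivals in its cheapest slots:

```python
import random

def universal_toy(T_frame=10, R=200, V=100.0, seed=5):
    """Toy single-queue illustration of Theorem 4.13(a): arrivals a(t) in
    {0,1} must eventually be served; serving in slot t costs c(t).  The
    drift-plus-penalty rule serves iff Q(t) >= V*c(t).  The per-frame
    lookahead optimum serves the frame's arrivals in its cheapest slots."""
    rng = random.Random(seed)
    Q, dpp_cost = 0, 0.0
    lookahead_costs = []
    for _ in range(R):
        a_frame = [1 if rng.random() < 0.5 else 0 for _ in range(T_frame)]
        c_frame = [rng.choice((1.0, 2.0, 3.0)) for _ in range(T_frame)]
        # lookahead optimum for this frame: cheapest n slots, n = frame arrivals
        n = sum(a_frame)
        lookahead_costs.append(sum(sorted(c_frame)[:n]) / T_frame)
        for a, c in zip(a_frame, c_frame):
            b = 1 if Q >= V * c else 0   # DPP comparison: Q(t)*b vs V*c(t)*b
            dpp_cost += c * b
            Q = max(Q - b, 0) + a
    avg_dpp = dpp_cost / (R * T_frame)
    avg_lookahead = sum(lookahead_costs) / R
    return avg_dpp, avg_lookahead
```

Without seeing the future, the drift-plus-penalty rule builds O(V) backlog and then serves mostly in cheap slots, ending with time average cost close to the ideal lookahead target, consistent with the DT/V gap in (4.85).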
4.10 EXERCISES
Exercise 4.1. Let Q = (Q1, ..., QK) and $L(Q) = \frac{1}{2}\sum_{k=1}^{K}Q_k^2$.
a) If L(Q) ≤ 25, show that $Q_k \leq \sqrt{50}$ for all k ∈ {1, ..., K}.
b) If L(Q) > 25, show that $Q_k > \sqrt{50/K}$ for at least one queue k ∈ {1, ..., K}.
c) Let K = 2. Plot the region of all non-negative vectors (Q1 , Q2 ) such that L(Q) = 2. Also
plot for L(Q) = 2.5. Give an example where L(Q1 (t), Q2 (t)) = 2.5, L(Q1 (t + 1), Q2 (t + 1)) =
2, but where Q1 (t) < Q1 (t + 1).
Exercise 4.3. Let Q(t) be a discrete time vector process with Q(0) = 0, and let f (t) and g(t)
be discrete time real valued processes. Suppose there is a non-negative function L(Q(t)) such that
L(0) = 0, and such that its conditional drift Δ(Q(τ)) satisfies the following every slot τ and for all possible Q(τ):
$$\Delta(Q(\tau)) + \mathbb{E}\{f(\tau)|Q(\tau)\} \leq \mathbb{E}\{g(\tau)|Q(\tau)\}$$
a) Use the law of iterated expectations to prove that:
$$\mathbb{E}\{L(Q(\tau+1))\} - \mathbb{E}\{L(Q(\tau))\} + \mathbb{E}\{f(\tau)\} \leq \mathbb{E}\{g(\tau)\}$$
b) Use telescoping sums together with part (a) to prove that for any t > 0:
$$\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{f(\tau)\} \leq \frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{g(\tau)\}$$
Exercise 4.5. (The Drift-Plus-Penalty Method) Explain, using the game of opportunistically min-
imizing an expectation described in Section 1.8, how choosing α(t) ∈ Aω(t) according to (4.48)-
(4.49) minimizes the right-hand-side of (4.44).
where yl∗ (t), ej∗ (t), ak∗ (t), bk∗ (t) represent decisions under any other (possibly randomized) action
α ∗ (t) that can be made on slot t (so that yl∗ (t) = ŷl (α ∗ (t), ω(t)), etc.).
a) Define h = (h1, ..., hJ) by:
$$h_j = \begin{cases} -1 & \text{if } H_j(t) \geq 0 \\ 1 & \text{if } H_j(t) < 0 \end{cases}$$
Using this h, plug the ω-only policy α ∗ (t) from (4.88)-(4.91) into the right-hand-side of (4.92) to
obtain:
b) Assume that (4.16)-(4.17) hold for y0 (t), and that the fourth moment assumption (4.18)
holds. Use this with part (a) to obtain probability 1 bounds on the lim sup time average queue backlog
via Theorem 4.4.
c) Now consider the ω-only policy that yields (4.53)-(4.56), and plug this into the right-hand-
side of (4.92) to yield a probability 1 bound on the lim sup time average of y0 (t), again by Theorem
4.4.
Exercise 4.7. (Min Average Power (21)) Consider a wireless downlink with arriving data a(t) =
(a1 (t), . . . , aK (t)) every slot t. The data is stored in separate queues Q(t) = (Q1 (t), . . . , QK (t))
for transmission over K different channels. The update equation is (4.23). Service variables bk (t) are
determined by a power allocation vector P (t) = (P1 (t), . . . , PK (t)) according to bk (t) = log(1 +
Sk (t)Pk (t)), where log(·) denotes the natural logarithm, and S (t) = (S1 (t), . . . , SK (t)) is a vector
of channel attenuations. Assume that S (t) is known at the beginning of each slot t, and satisfies
0 ≤ Sk (t) ≤ 1 for all k. Power is allocated subject to P (t) ∈ A, where A is the set of all power
vectors with at most one non-zero element and such that 0 ≤ Pk ≤ Pmax for all k ∈ {1, . . . , K},
where Pmax is a peak power constraint. Assume that the vectors a(t) and S (t) are i.i.d. over slots,
and that 0 ≤ ak (t) ≤ akmax for all t, for some finite constants akmax .
a) Using $\omega(t) \triangleq (a(t), S(t))$, α(t) = P(t), J = 0, L = 0, $y_0(t) = \sum_{k=1}^{K}P_k(t)$, state the drift-plus-penalty algorithm for a fixed V in this context.
b) Assume we use an exact implementation of the algorithm in part (a) (so that C = 0), and
that the problem is feasible. Use Theorem 4.8 to conclude that all queues are mean rate stable, and
compute a value B such that:
lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^K E{Pk(τ)} ≤ Pav^opt + B/V

where Pav^opt is the minimum average power over any stabilizing algorithm.
c) Assume Assumption A1 holds for a given ε > 0. Use Theorem 4.8c to give a bound on the time average sum of queue backlog in all queues.
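As a concrete aid for part (a) of Exercise 4.7, the one-slot decision reduces to: for each channel k, the best single-channel power maximizes Qk(t)log(1 + Sk(t)P) − V·P over 0 ≤ P ≤ Pmax, which has the closed form P = Qk(t)/V − 1/Sk(t) clipped to [0, Pmax]; the channel with the largest resulting (positive) value is served. The sketch below implements this; the closed-form clipping is our own derivation and should be checked against your own answer.

```python
import math

def dpp_power_allocation(Q, S, V, P_max):
    """One slot of the drift-plus-penalty decision sketched for Exercise 4.7(a):
    choose a power vector with at most one non-zero entry that minimizes
    V*sum_k P_k - sum_k Q_k*log(1 + S_k*P_k)."""
    best_k, best_P, best_val = None, 0.0, 0.0
    for k, (q, s) in enumerate(zip(Q, S)):
        if s <= 0:
            continue
        # unconstrained maximizer of q*log(1+s*P) - V*P, clipped to [0, P_max]
        P = min(max(q / V - 1.0 / s, 0.0), P_max)
        val = q * math.log(1.0 + s * P) - V * P
        if val > best_val:  # idling (all-zero power) has value 0
            best_k, best_P, best_val = k, P, val
    P_vec = [0.0] * len(Q)
    if best_k is not None:
        P_vec[best_k] = best_P
    return P_vec
```

Note that for large V, the rule idles unless some backlog Qk(t) exceeds roughly V/Sk(t), which is the usual power/backlog tradeoff of the algorithm.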
Exercise 4.9. (Maximum Throughput Subject to Peak and Average Power Constraints (21)) Con-
sider the same system of Exercise 4.7, with the exception that it is now a wireless uplink, and queue
backlogs now satisfy:
Qk (t + 1) = max[Qk (t) − bk (t), 0] + xk (t)
where xk (t) is a flow control decision for slot t, made subject to the constraint 0 ≤ xk (t) ≤ ak (t) for all
t. The control action is now a joint flow control and power allocation decision α(t) = [x(t), P (t)].
We want the average power expenditure over each link k to be less than or equal to Pkav , where Pkav
is a fixed constant for each k ∈ {1, . . . , K} (satisfying Pkav ≤ Pmax). The new goal is to maximize a weighted sum of admission rates Σ_{k=1}^K θk x̄k subject to queue stability and to all average power constraints, where {θ1, . . . , θK} are a given set of positive weights.
a) Using J = 0, L = K, y0(t) = −Σ_{k=1}^K θk xk(t), and a fixed V, state the drift-plus-penalty algorithm for this problem. Note that the constraints P̄k ≤ Pkav should be enforced by virtual queues Zk(t) of the form (4.40) with a suitable definition of yk(t).
4.10. EXERCISES 85
b) Use Theorem 4.8 to conclude that all queues are mean rate stable (and hence all average
power constraints are met), and compute a value B such that:
lim inf_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^K θk E{xk(τ)} ≥ util^opt − B/V

where util^opt is the optimal weighted sum of admitted rates into the network under any algorithm that stabilizes the queues and satisfies all average power constraints.
c) Show that the algorithm is such that xk(t) = 0 whenever Qk(t) > V θk. Assume that all queues are initially empty, and compute values Qk^max such that Qk(t) ≤ Qk^max for all t ≥ 0 and all k ∈ {1, . . . , K}. This shows that queues are deterministically bounded, even without the Slater condition of Assumption A1.
d) Show that the algorithm is such that Pk(t) = 0 whenever Zk(t) > Qk(t). Conclude that Zk(t) ≤ Zk^max, where Zk^max is defined Zk^max = Qk^max + (Pmax − Pkav).
e) Use part (d) and the sample path input-output inequality (2.3) to conclude that for any
positive integer T , the total power expended by each link k over any T -slot interval is deterministically
less than or equal to T Pkav + Zkmax . That is:
Σ_{τ=t0}^{t0+T−1} Pk(τ) ≤ T Pkav + Zk^max   ∀t0 ∈ {0, 1, 2, . . .}, ∀T ∈ {1, 2, 3, . . .}
f) Suppose link k is a wireless transmitter with a battery that has initial energy Ek. Use part (e) to provide a guarantee on the lifetime of the link.
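One hedged way to approach part (f): since part (e) guarantees that energy spent over any T-slot window is at most T·Pkav + Zk^max, the battery cannot be exhausted before any T satisfying T·Pkav + Zk^max ≤ Ek. A minimal arithmetic sketch (our own reading of the bound, not the book's stated solution):

```python
def guaranteed_lifetime(E_k, P_av, Z_max):
    """Largest integer T with T*P_av + Z_max <= E_k: the number of slots
    link k is guaranteed to survive on battery energy E_k, by the
    deterministic energy bound of part (e).  Returns 0 if Z_max > E_k."""
    if E_k < Z_max:
        return 0
    return int((E_k - Z_max) // P_av)
```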
Exercise 4.10. (Out-of-Date Queue Backlog Information) Consider the K-queue problem with
L = J = 0, and 0 ≤ ak (t) ≤ amax and 0 ≤ bk (t) ≤ bmax for all k and all t, for some finite constants
amax and bmax . The network controller attempts to perform the drift-plus-penalty algorithm (4.48)-
(4.49) every slot. However, it does not have access to the current queue backlogs Qk (t), and only
receives delayed information Qk (t − T ) for some integer T ≥ 0. It thus uses Qk (t − T ) in place
of Qk(t) in (4.48). Let α^ideal(t) be the optimal decision of (4.48)-(4.49) in the ideal case when current queue backlogs Qk(t) are used, and let α^approx(t) be the implemented decision that uses the out-of-date queue backlogs Qk(t − T). Show that α^approx(t) yields a C-additive approximation for some finite constant C. Specifically, compute a value C such that:
V ŷ0(α^approx(t), ω(t)) + Σ_{k=1}^K Qk(t)[âk(α^approx(t), ω(t)) − b̂k(α^approx(t), ω(t))] ≤
V ŷ0(α^ideal(t), ω(t)) + Σ_{k=1}^K Qk(t)[âk(α^ideal(t), ω(t)) − b̂k(α^ideal(t), ω(t))] + C
This shows that we can still optimize the system and provide stability with out-of-date queue backlog
information. Treatment of delayed queue information for Lyapunov drift arguments was perhaps
first used in (147), where random delays without a deterministic bound are also considered.
t               0  1  2  3  4  5  6  7  8
Arrivals a1(t)  3  0  3  0  0  1  0  1  0
         a2(t)  2  0  1  0  1  1  0  0  0
Channels S1(t)  G  G  M  M  G  G  M  M  G
         S2(t)  M  M  B  M  B  M  B  G  B
Max Qibi Q1(t)  0  3  0  3  1  0  1  1  2
Policy   Q2(t)  0  2  2  2  2  3  2  1  0

Figure 4.3: Arrivals, channel conditions, and queue backlogs for a two queue wireless downlink.
Exercise 4.11. (Simulation) Consider a 2-queue system with time varying channels (S1 (t), S2 (t)),
where Si (t) ∈ {G, M, B}, representing “Good,” “Medium,” “Bad” channel conditions for i ∈ {1, 2}.
Only one channel can be served per slot. All packets have fixed length, and 3 packets can be served
when a channel is “Good,” 2 when “Medium,” and 1 when “Bad.” Exactly one unit of power is
expended when we serve any channel (regardless of its condition). A sample path example is given in
Fig. 4.3, which expends 8 units of power over the first 9 slots under the policy that serves the queue
that yields the largest Qi (t)bi (t) value, which is a special case of the drift-plus-penalty algorithm
for K = 2, J = L = 0, V = 0.
a) Given the full future arrival and channel events as shown in the table, and given Q1 (0) =
Q2 (0) = 0, select a different set of channels to serve over slots {0, 1, . . . , 8} that also leaves the
system empty on slot 9, but that minimizes the amount of power required to do so (so that more
than 1 slot will be idle). How much power is used?
b) Assume these arrivals and channels are repeated periodically every 9 slots. Simulate the
system using the drift-plus-penalty policy of choosing the queue i that maximizes Qi (t)bi (t) − V
whenever this quantity is non-negative, and remains idle if this is negative for both i = 1 and i = 2.
Find the empirical average power expenditure and the empirical average queue backlog over 106
slots when V = 0. Repeat for V = 1, V = 5, V = 10, V = 20, V = 50, V = 100, V = 200.
c) Repeat part (b) in the case when arrival vectors (a1 (t), a2 (t)) and channel vectors
(S1 (t), S2 (t)) are independent and i.i.d. over slots with the same empirical distribution as that
achieved over 9 slots in the table, so that P r[(a1 , a2 ) = (3, 2)] = 1/9, P r[(S1 , S2 ) = (G, M)] =
3/9, P r[(S1 , S2 ) = (M, B)] = 2/9, etc. Note: You should find that the resulting minimum power that is
approached as V is increased is the same as part (b), and is strictly less than the empirical power expenditure
of part (a).
d) Show that queue i is only served if Qi(t) ≥ V/3. Conclude that Qi(t) ≥ max[V/3 − 3, 0] = Q^place for all t, provided that this inequality holds for Qi(0). Hence, using Q^place place-holder packets would reduce average backlog by exactly this amount, with no loss of power performance.
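A possible simulation harness for parts (b)-(c) (a sketch, not the book's code). Two choices here are assumptions: ties in the weight Qi(t)bi(t) − V are broken toward queue 2, and a zero weight idles; both were chosen so the V = 0 run reproduces the sample path of Fig. 4.3 exactly.

```python
import random

RATE = {'G': 3, 'M': 2, 'B': 1}   # packets served per slot in each channel state

# 9-slot arrival/channel pattern from Figure 4.3
A1 = [3, 0, 3, 0, 0, 1, 0, 1, 0]
A2 = [2, 0, 1, 0, 1, 1, 0, 0, 0]
S1 = list("GGMMGGMMG")
S2 = list("MMBMBMBGB")

def simulate(V, slots=9000, iid=False, seed=0):
    """Drift-plus-penalty policy of Exercise 4.11(b): serve the queue with the
    largest Q_i(t)*b_i(t) - V when this is positive, else idle.  Returns
    (average power, average total backlog)."""
    rng = random.Random(seed)
    Q = [0, 0]
    power = backlog = 0
    for t in range(slots):
        ia = ic = t % 9
        if iid:  # part (c): arrival and channel vectors drawn independently
            ia, ic = rng.randrange(9), rng.randrange(9)
        a = (A1[ia], A2[ia])
        b = (RATE[S1[ic]], RATE[S2[ic]])
        backlog += Q[0] + Q[1]
        w = [Q[i] * b[i] - V for i in (0, 1)]
        k = 1 if w[1] >= w[0] else 0      # tie-break toward queue 2 (assumed)
        if w[k] > 0:                      # strict: a zero weight idles
            power += 1
            Q[k] = max(Q[k] - b[k], 0)
        Q[0] += a[0]                      # arrivals added after service
        Q[1] += a[1]
    return power / slots, backlog / slots
```

With V = 0 and the periodic inputs, the run spends 8 power units per 9-slot period, matching Fig. 4.3; increasing V should trade larger backlog for lower average power.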
Exercise 4.12. (Wireless Network Coding) Consider a system of 4 wireless users that communicate
to each other through a base station (Fig. 4.4). User 1 desires to send data to user 2 and user 2 desires
to send data to user 1. Likewise, user 3 desires to send data to user 4 and user 4 desires to send data
to user 3.
Figure 4.4: Phase 1: uplink transmission of different packets pi from users 1-4 to the base station. Phase 2: downlink broadcast of an XORed packet (e.g., p3 ⊕ p4) to all users.
Let t ∈ {0, 1, 2, . . .} index a cycle. Each cycle t is divided into 2 phases: In the first phase,
users 1, 2, 3, and 4 all send a new packet (if any) to the base station (this can be accomplished, for
example, using TDMA or FDMA in the first phase). In the second phase, the base station makes
a transmission decision α(t) ∈ {{1, 2}, {3, 4}}. If α(t) = {1, 2}, the head-of-line packets for users 1
and 2 are XORed together, XORing with 0 if only one packet is available, and creating a null packet
if no packets from users 1 or 2 are available. The XORed packet (or null packet) is then broadcast
to all users. We assume all packets are labeled with sequence numbers, and the sequence numbers of
both XORed packets are placed in a packet header. As in (148), users 1 and 2 can decode the new
data if they keep copies of the previous packets they sent. If α(t) = {3, 4}, a similar XOR operation
is done for user 3 and 4 packets.
Assume that downlink channel conditions are time-varying and known at the beginning of
each cycle, with channel state vector S(t) = (S1 (t), S2 (t), S3 (t), S4 (t)), where Si (t) ∈ {ON, OF F }.
Only users with ON channel states can receive the transmission. The queueing dynamics from one
cycle to the next thus satisfy:

Qk(t + 1) = max[Qk(t) − bk(t), 0] + ak(t)

where bk(t) is the number of packets delivered from queue k during cycle t, and ak(t) is the number of packets arriving over the uplink from node k during cycle t (notice that
data destined for node 1 arrives as the process a2 (t), etc.). Suppose that S(t) is i.i.d. over cycles, with
probabilities πs = P r[S(t) = s], where s = (S1 , S2 , S3 , S4 ). Arrivals ak (t) are i.i.d. over cycles with
rate λk = E {ak (t)}, for k ∈ {1, . . . , 4}, and with bounded second moments.
a) Suppose that S(t) = (ON, ON, OF F, ON) and that Qk (t) > 0 for all queues k ∈
{1, 2, 3, 4}. It is tempting to assume that mode α(t) = {1, 2} is the best choice in this case,
although this is not always true. Give an example where it is impossible to stabilize the sys-
tem if the controller always chooses α(t) = {1, 2} whenever S(t) = (ON, ON, OF F, ON) or
S(t) = (ON, ON, ON, OF F ), but where a more intelligent control choice would stabilize the
system.8
b) Define L(Q(t)) = (1/2) Σ_{k=1}^4 Qk(t)². Compute Δ(Q(t)) and show it has the form:

Δ(Q(t)) ≤ B − E{ Σ_{k=1}^4 Qk(t)[bk(t) − λm(k)] | Q(t) }   (4.93)
where m(1) = 2, m(2) = 1, m(3) = 4, m(4) = 3, and where B < ∞. Design a control policy that
observes S(t) and chooses actions α(t) to minimize the right-hand-side of (4.93) over all feasible
control policies.
c) Consider all possible S-only algorithms that choose a transmission mode as a stationary
and random function of the observed S(t) (and independent of queue backlog). Define the S-only
throughput region as the set of all (λ1 , λ2 , λ3 , λ4 ) vectors for which there exists an S-only policy
α ∗ (t) such that:
E{ (b̂1(α∗(t), S(t)), b̂2(α∗(t), S(t)), b̂3(α∗(t), S(t)), b̂4(α∗(t), S(t))) } ≥ (λ2, λ1, λ4, λ3)
Exercise 4.13. (A modified algorithm) Suppose the conditions of Theorem 4.8 hold. However, suppose that every slot t we observe Θ(t), ω(t) and choose an action α(t) ∈ Aω(t) that minimizes the
8 It can also be shown that an algorithm that always chooses α(t) = {1, 2} under states (ON, ON, OF F, ON ) or
(ON, ON, ON, OF F ) and when there are indeed two packets to serve will not necessarily work—we need to take queue
length into account. See (10) for related examples in the context of a 3 × 3 packet switch.
exact drift-plus-penalty expression Δ(Θ(t)) + V E{ŷ0(α(t), ω(t))|Θ(t)}, rather than minimizing the upper bound on the right-hand-side of (4.44).
a) Show that the same performance guarantees of Theorem 4.8 hold.
b) Using (2.2), state this algorithm (for C = 0) in the special case when L = J = 0,
yl (t) = ej (t) = 0, ω(t) = [(a1 (t), . . . , aK (t)), (S1 (t), . . . , SK (t))], âk (α(t), ω(t)) = ak (t), α(t) ∈
{1, . . . , K} (representing a single queue that we serve every slot), and:
b̂k(α(t), ω(t)) = Sk(t) if α(t) = k, and b̂k(α(t), ω(t)) = 0 if α(t) ≠ k
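For intuition, the exact-drift rule can be written in closed form for this special case, assuming no penalty term (y0(t) = 0). Expanding (max[Qk − Sk, 0] + ak)² directly, serving queue k reduces the quadratic Lyapunov function by m·(2(Qk + ak) − m)/2 with m = min(Sk, Qk); this algebra is our own derivation, so verify it against your answer. The exact-drift policy serves the queue maximizing this reduction, whereas the bound-based rule maximizes Qk(t)Sk(t).

```python
def exact_drift_serve(Q, S, a):
    """Sketch of the exact-drift rule of Exercise 4.13(b), assuming no
    penalty term: serving queue k changes sum_k Q_k(t+1)^2 / 2 by
    -m*(2*(Q_k + a_k) - m)/2 with m = min(S_k, Q_k), so we serve the
    queue with the largest such reduction."""
    def reduction(k):
        m = min(S[k], Q[k])
        return m * (2 * (Q[k] + a[k]) - m) / 2
    return max(range(len(Q)), key=reduction)
```

The second test case below is one where the exact rule and the bound-based Qk·Sk rule can disagree when Sk exceeds Qk, since excess service capacity is wasted.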
Figure 4.5: Packets of size A(t) with meta-data β(t) pass through a compressor, producing arrivals a(t) with distortion d(t) into queue Q(t), which is served at rate b(t).
Exercise 4.14. (Distortion-Aware Data Compression (143)) Consider a single queue Q(t) with
dynamics (2.1), where b(t) is an i.i.d. transmission rate process with bounded second moments. As
shown in Fig. 4.5, the arrival process a(t) is generated as the output of a data compression operation.
Specifically, every slot t a new packet of size A(t) bits arrives to the system (where A(t) = 0 if
no packet arrives). This packet has meta-data β(t), where β(t) ∈ B , where B represents a set of
different data types. Assume the pair (A(t), β(t)) is i.i.d. over slots. Every slot t, a network controller
observes (A(t), β(t)) and chooses a data compression option c(t) ∈ {0, 1, . . . , C}, where c(t) indexes a
collection of possible data compression algorithms.The output of the compressor is a compressed packet
of random size a(t) = â(A(t), β(t), c(t)), causing a random distortion d(t) = d̂(A(t), β(t), c(t)).
Note that â(·) and d̂(·) are random functions. Assume the pair (a(t), d(t)) is i.i.d. over all slots with
the same A(t), β(t), c(t). Define functions m(A, β, c) and δ(A, β, c) as follows:
m(A, β, c) = E{ â(A(t), β(t), c(t)) | A(t) = A, β(t) = β, c(t) = c }
δ(A, β, c) = E{ d̂(A(t), β(t), c(t)) | A(t) = A, β(t) = β, c(t) = c }
L(Θ(t)) = (1/2) Σ_{k=1}^K wk Qk(t)² + (1/2) Σ_{l=1}^L Zl(t)² + (1/2) Σ_{j=1}^J Hj(t)²

where {wk}_{k=1}^K are positive weights. How does the drift-plus-penalty algorithm change?
Figure 4.6: A 3-node wireless network. Arrivals X(t) are routed as a1(t) into Q1 or a2(t) into Q2, and arrivals Y(t) enter Q3; node i transmits at rate μi(t), with data served from Q1 forwarded into Q3.
Exercise 4.16. (Multi-Hop with Orthogonal Channels) Consider the 3-node wireless network of
Fig. 4.6.The network operates in discrete time with unit time slots t ∈ {0, 1, 2, . . .}. It has orthogonal
channels, so that node 3 can send and receive at the same time. The network controller makes power
allocation decisions and routing decisions.
• (Power Allocation) Let μi(t) be the transmission rate at node i on slot t, for i ∈ {1, 2, 3}. This transmission rate depends on the channel state Si(t) and the power allocation decision Pi(t) by the following function:

μi(t) = log(1 + Pi(t)Si(t))   ∀i ∈ {1, 2, 3}
• (Routing) There are two arrival processes X(t) and Y (t), taking units of bits. The X(t) process
can be routed to either queue 1 or 2. The Y (t) process goes directly into queue 3. Let a1 (t)
and a2 (t) represent the routing decision variables, where a1 (t) is the amount of bits routed to
queue 1, and a2 (t) is the amount of bits routed to queue 2. The network controller observes
X(t) every slot and makes decisions for (a1 (t), a2 (t)) subject to the following constraints:
a1 (t) ≥ 0 , a2 (t) ≥ 0 , a1 (t) + a2 (t) = X(t) ∀t
It can be shown that the Lyapunov drift Δ(Q(t)) satisfies the following every slot t:

Δ(Q(t)) ≤ B + Q1(t)E{a1(t) − μ1(t)|Q(t)} + Q2(t)E{a2(t) − μ2(t)|Q(t)} + Q3(t)E{μ1(t) + Y(t) − μ3(t)|Q(t)}
where B is a positive constant. We want to design a dynamic algorithm that solves the following
problem:
Minimize: P̄1 + P̄2 + P̄3
Subject to: 1) Qi (t) is mean rate stable ∀i ∈ {1, 2, 3}
2) a1 (t) ≥ 0 , a2 (t) ≥ 0 , a1 (t) + a2 (t) = X(t) ∀t
3) 0 ≤ Pi (t) ≤ 1 ∀i ∈ {1, 2, 3}, ∀t
a) Using a fixed parameter V > 0, state the drift-plus-penalty algorithm for this problem.
The algorithm should have separable power allocation and routing decisions.
b) Suppose that V = 20, Q1 (t) = 50, Q2 (t) = Q3 (t) = 20, S1 (t) = S2 (t) = S3 (t) = 1.
What should the value of P1 (t) be under the drift-plus-penalty algorithm? (give a numeric value)
c) Suppose (X(t), Y (t)) is i.i.d. over slots with E {X(t)} = λX and E {Y (t)} = λY .
Suppose (S1(t), S2(t), S3(t)) is i.i.d. over slots. Suppose there is a stationary and randomized policy that observes (X(t), Y(t), S1(t), S2(t), S3(t)) every slot t, and makes randomized decisions (a1∗(t), a2∗(t), P1∗(t), P2∗(t), P3∗(t)) based only on the observed vector (X(t), Y(t), S1(t), S2(t), S3(t)). State desirable properties for the expectations E{a1∗(t)}, E{a2∗(t)}, E{log(1 + Pi∗(t)Si(t))} for i ∈ {1, 2, 3} that would ensure your algorithm of part (a)
would make all queues mean rate stable with time average expected power expenditure given by:
P̄1 + P̄2 + P̄3 ≤ φ + B/V
where φ is a desired value for the sum time average power. Your properties should be in the form of
desirable inequalities.
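As a sanity check on parts (a)-(b): the power part of the drift-plus-penalty minimization separates per node, minimizing V·Pi − wi·log(1 + Pi·Si) over 0 ≤ Pi ≤ 1, where wi is the link's differential backlog weight (Q1 − Q3 for node 1, Q2 for node 2, Q3 for node 3), assuming the rate function μi(t) = log(1 + Pi(t)Si(t)) suggested by part (c). The clipped closed form below is our own derivation, not text from the exercise.

```python
def dpp_power(w, S, V, P_max=1.0):
    """Separable power choice sketch for Exercise 4.16(a): maximize
    w*log(1 + P*S) - V*P over 0 <= P <= P_max, where w is the
    differential backlog weight of the link."""
    if w <= 0 or S <= 0:
        return 0.0
    return min(max(w / V - 1.0 / S, 0.0), P_max)

# part (b): V = 20, Q1 = 50, Q3 = 20, S1 = 1  ->  weight w = Q1 - Q3 = 30
P1 = dpp_power(50 - 20, 1.0, 20.0)   # unconstrained optimum 30/20 - 1 = 0.5
```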
lim_{i→∞} x(ti) = x
ω-only policies. Equivalently, this can be viewed as the region of all one-slot expectations that can be achieved via randomized decisions when the ω(t) variable takes values according to its stationary distribution. The boundedness assumptions (4.25)-(4.30) ensure that the set Γ is bounded. It is easy to show that Γ is also convex by using an ω-only policy that is a mixture of two other ω-only policies.
Now note that for any slot τ and assuming that ω(τ) has its stationary distribution, the one-slot expectation under any decision α(τ) ∈ Aω(τ) is in the set Γ, even if that decision is from an arbitrary policy that is not an ω-only policy. That is:

E[(ŷl(α(τ), ω(τ))), (êj(α(τ), ω(τ))), (âk(α(τ), ω(τ))), (b̂k(α(τ), ω(τ)))] ∈ Γ

where the expectation is with respect to the random ω(τ) (which has the stationary distribution) and the possibly random α(τ) that is made by the policy in reaction to the observed ω(τ). This expectation is in Γ because any sample path of events that lead to the policy choosing α(τ) on slot τ simply affects the conditional distribution of α(τ) given the observed ω(τ), and hence the expectation can be equally achieved by the ω-only policy that uses the same conditional distribution.9
This observation directly leads to the following simple lemma.
Lemma 4.17 If ω(τ) is in its stationary distribution for all slots τ, then for any policy that chooses α(τ) ∈ Aω(τ) over time (including policies that are not ω-only), we have for any slot t > 0:

(1/t) Σ_{τ=0}^{t−1} E[(ŷl(α(τ), ω(τ))), (êj(α(τ), ω(τ))), (âk(α(τ), ω(τ))), (b̂k(α(τ), ω(τ)))] ∈ Γ   (4.94)
9 We implicitly assume that the decision α(τ ) on slot τ has a well defined conditional distribution.
4.11. APPENDIX 4.A — PROVING THEOREM 4.5 93
Thus, if r∗ is a limit point of the time average on the left-hand-side of (4.94) over a subsequence of times ti that increase to infinity, then r∗ is in the closure of Γ.

Proof. Each term in the time average is itself in Γ, and so the time average is also in Γ because Γ is convex. □

Thus, the finite horizon time average expectation under any policy cannot escape the set Γ, and any infinite horizon time average that converges to a limit point cannot escape the closure of Γ. If the set Γ is closed, then any limit point r∗ is inside Γ and hence (by definition of Γ) can be exactly achieved as the one-slot average under some ω-only policy. If Γ is not closed, then r∗ can be achieved arbitrarily closely (i.e., within a distance δ, for any arbitrarily small δ > 0), by an ω-only policy. This naturally leads to the following characterization of optimality in terms of ω-only policies.
It can be shown that, if non-empty, Γ̃ is closed and bounded. If Γ̃ is non-empty, define y0∗ as the minimum value of y0 for which there is a point [(yl), (ej), (ak), (bk)] ∈ Γ̃. Intuitively, the set Γ̃ is the set of all time averages achievable by ω-only policies that meet the required time average constraints and that have time average expected arrivals less than or equal to time average expected service, and y0∗ is the minimum time average penalty achievable by such ω-only policies. We now show that y0∗ = y0^opt.
Theorem 4.18 Suppose the ω(t) process is stationary with distribution π(ω), and that the system
satisfies the boundedness assumptions (4.25)-(4.30) and the law of large numbers assumption specified in
Section 4.2. Suppose the problem (4.31)-(4.35) is feasible. Let α(t) be any control policy that satisfies the
constraints (4.32)-(4.35), and let r̄(t) represent the t-slot expected time average in the left-hand-side of (4.94) under this policy.
a) Any limit point [(yl), (ej), (ak), (bk)] of {r̄(t)}_{t=1}^∞ is in the set Γ̃. In particular, the set Γ̃ is non-empty.
b) The time average expected penalty under the algorithm α(t) satisfies:
lim inf_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{ŷ0(α(τ), ω(τ))} ≥ y0∗   (4.96)

Thus, no algorithm that satisfies the constraints (4.32)-(4.35) can yield a time average expected penalty smaller than y0∗. Further, y0∗ = y0^opt.
Proof. To prove part (a), note from Lemma 4.17 that r̄(t) is always inside the (bounded) set Γ. Hence, it has a limit point, and any such limit point is in the closure of Γ. Now consider a particular limit point [(yl), (ej), (ak), (bk)], and let {ti}_{i=1}^∞ be the subsequence of non-negative integer time slots that increase to infinity and satisfy:

lim_{i→∞} r̄(ti) = [(yl), (ej), (ak), (bk)]
Because the constraints (4.32) and (4.33) are satisfied, it must be the case that:

yl ≤ 0 ∀l ∈ {1, . . . , L} ,  ej = 0 ∀j ∈ {1, . . . , J}   (4.97)

Further, by the sample-path inequality (2.5), we have for all ti > 0 and all k:
E{Qk(ti)}/ti − E{Qk(0)}/ti ≥ (1/ti) Σ_{τ=0}^{ti−1} E{ âk(α(τ), ω(τ)) − b̂k(α(τ), ω(τ)) }
Because the control policy makes all queues mean rate stable, taking a limit of the above over the
times ti → ∞ yields 0 ≥ ak − bk , and hence we find that:
ak ≤ bk ∀k ∈ {1, . . . , K} (4.98)
The results (4.97) and (4.98) imply that the limit point [(yl), (ej), (ak), (bk)] is in the set Γ̃.
To prove part (b), let {ti}_{i=1}^∞ be a subsequence of non-negative integer time slots that increase to infinity, that yield the lim inf by:

lim_{i→∞} (1/ti) Σ_{τ=0}^{ti−1} E{ŷ0(α(τ), ω(τ))} = lim inf_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{ŷ0(α(τ), ω(τ))}   (4.99)

and that yield well defined time averages [(yl), (ej), (ak), (bk)] for r̄(ti) (such a subsequence can be constructed by first taking a subsequence that achieves the lim inf, and then taking a further convergent subsequence {ti} that ensures the r̄(ti) values converge to a limit point). Then by part (a), we know that [(yl), (ej), (ak), (bk)] ∈ Γ̃, and so its y0 component (being the lim inf value in (4.99)) is greater than or equal to y0∗, because y0∗ is the smallest possible y0 value of all points in Γ̃.
It follows that no control algorithm that satisfies the required constraints has a time average expected penalty less than y0∗. We now show that it is possible to achieve y0∗, and so y0∗ = y0^opt. For simplicity, we consider only the case when Γ is closed. Let [(yl∗), (ej∗), (ak∗), (bk∗)] be the point in Γ̃ that has component y0∗. Because Γ is closed, Γ̃ is a subset of Γ, and so [(yl∗), (ej∗), (ak∗), (bk∗)] ∈ Γ. It follows there is an ω-only algorithm α∗(t) with expectations exactly equal to [(yl∗), (ej∗), (ak∗), (bk∗)] on every slot t. Thus, the time average penalty is y0∗, and the constraints (4.32), (4.33) are satisfied because yl∗ ≤ 0 for all l ∈ {1, . . . , L}, ej∗ = 0 for all j ∈ {1, . . . , J}. Further, our "law-of-large-number" assumption on ω(t) ensures the time averages of âk(α∗(t), ω(t)) and b̂k(α∗(t), ω(t)),
achieved under the ω-only algorithm α∗(t), are equal to ak∗ and bk∗ with probability 1. Because ak∗ ≤ bk∗ and the second moments of ak(t) and bk(t) are bounded by a finite constant σ² for all t, the Rate Stability Theorem (Theorem 2.4) ensures that all queues Qk(t) are mean rate stable. □
We use this result to prove Theorem 4.5.
Proof. (Theorem 4.5) Let [(yl∗), (ej∗), (ak∗), (bk∗)] be the point in Γ̃ that has component y0∗ (where y0∗ = y0^opt by Theorem 4.18). Note by definition that Γ̃ is in the closure of Γ. If Γ is closed, then [(yl∗), (ej∗), (ak∗), (bk∗)] ∈ Γ and so there exists an ω-only policy α∗(t) that achieves the averages [(yl∗), (ej∗), (ak∗), (bk∗)] and thus satisfies (4.36)-(4.39) with δ = 0. If Γ is not closed, then [(yl∗), (ej∗), (ak∗), (bk∗)] is a limit point of Γ and so there is an ω-only policy that gets arbitrarily close to [(yl∗), (ej∗), (ak∗), (bk∗)], yielding (4.36)-(4.39) for any δ > 0. □
The above proof shows that if the assumptions of Theorem 4.5 hold and if the set Γ is closed, then an ω-only policy exists that satisfies the inequalities (4.36)-(4.39) with δ = 0.
CHAPTER 5
Optimizing Functions of Time Averages
where γm,min and γm,max are finite constants (we typically choose γm,min = 0 in cases when attributes
xm(t) are non-negative). This rectangle constraint is useful because it limits the x̄ vector to a bounded region, and it will ensure that the auxiliary variables that we soon define are also bounded. While this x̄ ∈ R constraint may limit optimality, it is clear that φ^opt increases to the maximum utility of the problem without this constraint as the rectangle R is expanded. Further, φ^opt is exactly equal to the maximum utility of the original problem (5.2)-(5.5) whenever the rectangle R is chosen large enough to contain a time average attribute vector x̄ that is optimal for the original problem.
γ̄ = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{γ(τ)} ,  \overline{φ(γ)} = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{φ(γ(τ))}   (5.12)
where we temporarily assume the above limits exist. We have used the fact that the rectangle R is a
closed set to conclude that a limit of vectors in R is also in R.
In summary, whenever the limits γ̄ and \overline{φ(γ)} exist, we can conclude by Jensen's inequality that φ(γ̄) ≥ \overline{φ(γ)}. That is, the utility function evaluated at the time average expectation γ̄ is greater than or equal to the time average expectation of φ(γ(t)).
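A two-line numeric illustration of this Jensen step, with sample values chosen arbitrarily:

```python
import math

# For a concave phi, phi of the time average dominates the time average of phi.
gammas = [0.5, 1.0, 4.0, 2.5]              # sample gamma(tau) values
phi = math.log                             # a concave utility
avg_gamma = sum(gammas) / len(gammas)      # time average of gamma = 2.0
avg_phi = sum(phi(g) for g in gammas) / len(gammas)
assert phi(avg_gamma) >= avg_phi           # Jensen: phi(2.0) >= avg of logs
```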
Assume that ω(t) is i.i.d., and that yl (t), xm (t), ak (t), bk (t) satisfy the boundedness assump-
tions (4.25)-(4.28). It is easy to show the drift-plus-penalty expression satisfies:
Δ(Θ(t)) − V E{φ(γ(t))|Θ(t)} ≤ D − V E{φ(γ(t))|Θ(t)} + Σ_{l=1}^L Zl(t)E{yl(t)|Θ(t)}
  + Σ_{k=1}^K Qk(t)E{ak(t) − bk(t)|Θ(t)} + Σ_{m=1}^M Gm(t)E{γm(t) − xm(t)|Θ(t)}   (5.21)
where D is a finite constant related to the worst-case second moments of yl (t), xm (t), ak (t), bk (t).
A C-additive approximation chooses γ(t) ∈ R and α(t) ∈ Aω(t) such that, given Θ(t), the right-hand-side of (5.21) is within C of its infimum value. A 0-additive approximation thus performs the following:
• (Auxiliary Variables) For each slot t, observe G(t) and choose γ(t) to solve:

Maximize: V φ(γ(t)) − Σ_{m=1}^M Gm(t)γm(t)   (5.22)
Subject to: γm,min ≤ γm(t) ≤ γm,max ∀m ∈ {1, . . . , M}   (5.23)
5.1. SOLVING THE TRANSFORMED PROBLEM 101
• (α(t) Decision) For each slot t, observe Θ(t) and ω(t), and choose α(t) ∈ Aω(t) to minimize:

Σ_{l=1}^L Zl(t)ŷl(α(t), ω(t)) + Σ_{k=1}^K Qk(t)[âk(α(t), ω(t)) − b̂k(α(t), ω(t))] − Σ_{m=1}^M Gm(t)x̂m(α(t), ω(t))
• (Queue Update) Update the virtual queues Zl (t) and Gm (t) according to (5.19) and (5.20),
and the actual queues Qk (t) by (5.1).
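To make the auxiliary-variable step concrete, here is a sketch for the assumed example utility φ(γ) = Σm log(1 + γm) (an illustrative choice, not the only one covered by the theory). The maximization (5.22)-(5.23) then separates over m, and each component admits a closed-form clip, derived by setting the derivative V/(1 + g) − Gm to zero:

```python
def aux_variable_update(G, V, gamma_min, gamma_max):
    """Auxiliary-variable step (5.22)-(5.23) sketched for the separable
    utility phi(gamma) = sum_m log(1 + gamma_m): maximize
    V*log(1+g) - G_m*g on [gamma_min, gamma_max], giving g = V/G_m - 1
    clipped to the interval (take gamma_max when G_m = 0)."""
    out = []
    for Gm in G:
        g = gamma_max if Gm == 0 else V / Gm - 1.0
        out.append(min(max(g, gamma_min), gamma_max))
    return out
```

Note how a large virtual queue Gm(t) pushes γm(t) down toward γm,min, which is exactly the mechanism that keeps γ̄m(t) ≤ x̄m(t) in the long run.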
Define time average expectations x̄(t), γ̄(t), ȳl(t) by:

x̄(t) = (1/t) Σ_{τ=0}^{t−1} E{x(τ)} ,  γ̄(t) = (1/t) Σ_{τ=0}^{t−1} E{γ(τ)} ,  ȳl(t) = (1/t) Σ_{τ=0}^{t−1} E{yl(τ)}   (5.24)
Define φ^max as an upper bound on φ(γ(t)) for all t, and assume it is finite:

φ^max = φ(γ1,max, γ2,max, . . . , γM,max) < ∞   (5.25)
Theorem 5.1 Suppose the boundedness assumptions (4.25)-(4.28), (5.25) hold, the function φ(x) is continuous, concave, and entrywise non-decreasing, the problem (5.2)-(5.5), (5.8) (including the constraint x̄ ∈ R) is feasible, and E{L(Θ(0))} < ∞. If ω(t) is i.i.d. over slots and any C-additive approximation is used every slot, then all actual and virtual queues are mean rate stable and:

lim inf_{t→∞} φ(x̄(t)) ≥ φ^opt − (D + C)/V   (5.26)

lim sup_{t→∞} ȳl(t) ≤ 0 ∀l ∈ {1, . . . , L}   (5.27)

where φ^opt is the maximum utility of the problem (5.2)-(5.5), (5.8) (including the constraint x̄ ∈ R), and x̄(t), ȳl(t) are defined in (5.24).
The following extended result provides average queue bounds and utility bounds for all slots t.
where φ(γ ∗ ) is the maximum objective function value for the transformed problem (5.13)-(5.18).
(b) If all virtual and actual queues are initially empty (so that Θ(0) = 0) and if there are finite constants νm ≥ 0 such that for all γ(t) and all x(t), we have:

|φ(γ(t)) − φ(x(t))| ≤ Σ_{m=1}^M νm |γm(t) − xm(t)|   (5.32)
then:

φ(x̄(t)) ≥ φ^opt − (D + C)/V − Σ_{m=1}^M νm E{Gm(t)}/t   (5.33)

where E{Gm(t)}/t is O(1/√t) for all m ∈ {1, . . . , M}.
The assumption that all queues are initially empty, made in part (b) of the above theorem,
is made only for convenience. The right-hand-side of (5.33) would be modified by subtracting
the additional term E{L(Θ(0))}/(V t) otherwise. We note that the νm constraint (5.32) needed
in part (b) of the above theorem is satisfied for the example utility function in (5.6), but not for
the proportionally fair utility function in (5.7). Further, the algorithm developed in this section
(or C-additive approximations of the algorithm) often result in deterministically bounded queues,
regardless of whether or not the Slater assumptions (5.28)-(5.31) hold (see flow control examples
in Sections 5.2-5.3 and Exercises 5.5-5.7). For example, it can be shown that if (5.32) holds, if γ (t)
is chosen by (5.22)-(5.23), and if xm (t) ≥ γm,min for all t, then Gm (t) ≤ V νm + γm,max for all t
√
(provided this holds at t = 0). In this case, E {Gm (t)} /t is O(1/t), better than the O(1/ t) bound
given in the above theorem. As before, the same algorithm can be shown to perform efficiently when
the ω(t) process is non-i.i.d. (38)(39)(136)(42). This is because the auxiliary variables transform the
problem to a structure that is the same as that covered by the ergodic theory and universal scheduling
theory of Section 4.9.
Proof. (Theorem 5.1) Because the C-additive approximation comes within C of minimizing the
right-hand-side of (5.21), we have:
Δ(Θ(t)) − V E{φ(γ(t))|Θ(t)} ≤ D + C − V φ(γ∗) + Σ_{l=1}^L Zl(t)E{yl∗(t)|Θ(t)}
  + Σ_{k=1}^K Qk(t)E{ak∗(t) − bk∗(t)|Θ(t)} + Σ_{m=1}^M Gm(t)E{γm∗ − xm∗(t)|Θ(t)}   (5.34)
where γ∗ = (γ1∗, . . . , γM∗) is any vector in R, and yl∗(t), ak∗(t), bk∗(t), xm∗(t) are from any alternative (possibly randomized) policy α∗(t) ∈ Aω(t). Now note that feasibility of the problem (5.2)-(5.5), (5.8) implies feasibility of the transformed problem (5.13)-(5.18).1 This together with Theorem 4.5 implies that for any δ > 0, there is an ω-only policy α∗(t) ∈ Aω(t) and a vector γ∗ ∈ R such that:
−φ(γ∗) ≤ −φ^opt + δ
E{ŷl(α∗(t), ω(t))} ≤ δ ∀l ∈ {1, . . . , L}
E{âk(α∗(t), ω(t)) − b̂k(α∗(t), ω(t))} ≤ δ ∀k ∈ {1, . . . , K}
E{γm∗ − x̂m(α∗(t), ω(t))} ≤ δ ∀m ∈ {1, . . . , M}
Assuming that δ = 0 for convenience and plugging the above into (5.34) gives:2
Δ(Θ(t)) − V E{φ(γ(t))|Θ(t)} ≤ D + C − V φ^opt   (5.35)
This is in the exact form for application of the Lyapunov Optimization Theorem (Theorem 4.2)
and hence by that theorem (or, equivalently, by using iterated expectations and telescoping sums in
the above inequality), for all t > 0, we have:
(1/t) Σ_{τ=0}^{t−1} E{φ(γ(τ))} ≥ φ^opt − (D + C)/V − E{L(Θ(0))}/(V t)
By Jensen’s inequality for the concave function φ(γ ), we have for all t > 0:
φ(γ (t)) ≥ φ opt − (D + C)/V − E {L((0))} /(V t) (5.36)
Taking a lim inf of both sides yields:
lim inf_{t→∞} φ(γ̄(t)) ≥ φ^opt − (D + C)/V   (5.37)
On the other hand, rearranging (5.35) yields:
Δ(Θ(t)) ≤ D + C + V (φ^max − φ^opt)
Thus, by the Lyapunov Drift Theorem (Theorem 4.1), we know that all queues Qk(t), Zl(t), Gm(t) are mean rate stable (in fact, we know that E{Qk(t)}/t, E{Gm(t)}/t, and E{Zl(t)}/t are O(1/√t)).
Mean rate stability of Zl (t) and Gm (t) together with Theorem 2.5 implies that (5.27) holds, and
that for all m ∈ {1, . . . , M}:

lim sup_{t→∞} [γ̄m(t) − x̄m(t)] ≤ 0
Using this with the continuity and entrywise non-decreasing properties of φ(x), it can be shown
that:
lim inf_{t→∞} φ(γ̄(t)) ≤ lim inf_{t→∞} φ(x̄(t))

Using this in (5.37) proves (5.26). □
1 To see this, the transformed problem can just use the same α(t) decisions, and it can choose γ(t) = x̄ for all t.
2The same can be derived using δ > 0 and then taking a limit as δ → 0.
104 5. OPTIMIZING FUNCTIONS OF TIME AVERAGES
Proof. (Theorem 5.2) We first prove part (b). We have:

where (5.38) follows by the entrywise non-decreasing property of φ(x) (where the max[·] represents an entrywise max), and (5.39) follows by (5.32). Substituting this into (5.36) and using E{L(Θ(0))} = 0 yields:
φ(x̄(t)) ≥ φ^opt − (D + C)/V − Σ_{m=1}^M νm max[γ̄m(t) − x̄m(t), 0]   (5.40)
By definition of Gm (t) in (5.20) and the sample path queue property (2.5) together with the fact
that Gm (0) = 0, we have for all m ∈ {1, . . . , M} and any t > 0:
Gm(t)/t ≥ (1/t) Σ_{τ=0}^{t−1} γm(τ) − (1/t) Σ_{τ=0}^{t−1} xm(τ)
ω(t)= [(b1 (t), . . . , bL (t)); (A1 (t), . . . , AM (t))] (5.41)
The control action taken every slot is to first choose xm (t), the amount of type m traffic admitted
into the network on slot t, according to:

0 ≤ xm(t) ≤ Am(t)   (5.42)
The constraint (5.42) is just one example of a flow control constraint. We can easily modify this
to the constraint xm (t) ∈ {0, Am (t)}, which either admits all newly arriving data, or drops all of
it. Alternatively, the flow controller could place all non-admitted data into a transport layer storage
reservoir (rather than dropping it), as in (18)(22)(19)(17) (see also Section 5.6). One can model a
network where all sources always have data to send by Am (t) = γm,max for all t, for some finite value
γm,max used to limit the amount of data admitted to the network on any slot.
Next, we must specify a path for the newly arriving data from a collection of paths Pm associated
with path options of session m on slot t (possibly being the set of all possible paths in the network
from the source of session m to its destination). Here, a path is defined in the usual sense, being a
sequence of links starting at the source, ending at the destination, and being such that the end node
of each link is the start node of the next link. Let 1l,m (t) be an indicator variable that is 1 if the
data xm (t) is selected to use a path that contains link l, and is 0 else. The (1l,m (t)) values completely
specify the chosen paths for slot t, and hence the decision variable for slot t is given by:
α(t) ≜ [(x1(t), . . . , xM(t)); (1_{l,m}(t))_{l∈{1,...,L}, m∈{1,...,M}}]
Let x̄ = (x̄1, . . . , x̄M) be a vector of the infinite horizon time average admitted flow rates.
Let φ(x̄) = Σ_{m=1}^{M} φm(x̄m) be a separable utility function, where each φm(x) is a continuous, concave,
non-decreasing function in x. Our goal is to maximize the throughput-utility φ(x) subject to the
constraint that the time average flow over each link l is less than or equal to the time average capacity
of that link. The infinite horizon utility optimization problem of interest is thus:
Maximize: Σ_{m=1}^{M} φm(x̄m)  (5.43)
Subject to: Σ_{m=1}^{M} \overline{1_{l,m} x_m} ≤ b̄l ∀l ∈ {1, . . . , L}  (5.44)
0 ≤ xm(t) ≤ Am(t) , (1_{l,m}(t)) ∈ Pm ∀m ∈ {1, . . . , M}, ∀t  (5.45)
where the time averages are defined:
x̄m ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{xm(τ)}
\overline{1_{l,m} x_m} ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{1_{l,m}(τ) xm(τ)}
b̄l ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{bl(τ)}
We emphasize that while the actual network can queue data at each link l, we are not explicitly
accounting for such queueing dynamics. Rather, we are only ensuring the time average flow rate on
each link l satisfies (5.44).
Define φ opt as the maximum utility associated with the above problem and subject to the
additional constraint that:
0 ≤ x̄m ≤ γm,max ∀m ∈ {1, . . . , M}  (5.46)
for some finite values γm,max. This fits the framework of the utility maximization problem (5.2)-(5.5) with yl(t) ≜ Σ_{m=1}^{M} 1_{l,m}(t)xm(t) − bl(t), K = 0, and with R being all γ vectors that satisfy 0 ≤ γm ≤ γm,max for all m ∈ {1, . . . , M} (we choose γm,min = 0 because attributes xm(t) are non-negative). As there are no actual queues Qk(t) in this model, we use only virtual queues Zl(t) and
Gm (t), defined by update equations:
Zl(t + 1) = max[ Zl(t) + Σ_{m=1}^{M} 1_{l,m}(t)xm(t) − bl(t), 0 ]  (5.47)
Gm(t + 1) = max[ Gm(t) + γm(t) − xm(t), 0 ]  (5.48)
where γm (t) are auxiliary variables for m ∈ {1, . . . , M}. The algorithm given in Section 5.0.5 thus
reduces to:
• (Auxiliary Variables) Every slot t, each session m ∈ {1, . . . , M} observes Gm (t) and chooses
γm (t) as the solution to:
Maximize: V φm (γm (t)) − Gm (t)γm (t) (5.49)
Subject to: 0 ≤ γm (t) ≤ γm,max (5.50)
• (Routing and Flow Control) For each slot t and each session m ∈ {1, . . . , M}, observe the
new arrivals Am (t), the virtual queue backlogs Gm (t), and the link queues Zl (t), and choose
xm (t) and a path to maximize:
Maximize: xm(t)Gm(t) − xm(t) Σ_{l=1}^{L} 1_{l,m}(t)Zl(t)
Subject to: 0 ≤ xm (t) ≤ Am (t)
The path specified by (1l,m (t)) is in Pm
5.2. A FLOW-BASED NETWORK MODEL 107
This reduces to the following: First find a shortest path from the source of session m to the
destination of session m, using link weights Zl (t) as link costs. If the total weight of the
shortest path is less than or equal to Gm (t), choose xm (t) = Am (t) and route this data over
this single shortest path. Else, there is too much congestion in the network, and so we choose
xm (t) = 0 (thereby dropping all data Am (t)).
• (Virtual Queue Updates) Update the virtual queues according to (5.47) and (5.48).
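The per-slot decisions above can be sketched in code. This is a minimal illustration, not the book's implementation: it assumes the example utility φm(γ) = log(1 + γ), for which the auxiliary subproblem (5.49)-(5.50) has the closed-form solution shown, and it represents each candidate path in Pm as a plain list of link indices with the Zl(t) values held in a dict.

```python
def choose_aux(V, G_m, gamma_max):
    # Solve (5.49)-(5.50) for the assumed utility phi(gamma) = log(1 + gamma):
    # setting the derivative V/(1+gamma) - G_m to zero gives gamma = V/G_m - 1,
    # clipped to the interval [0, gamma_max].
    if G_m <= 0:
        return gamma_max
    return min(max(V / G_m - 1.0, 0.0), gamma_max)

def route_and_admit(A_m, G_m, Z, paths_m):
    # paths_m: candidate paths of session m, each a list of link ids.
    # Find the cheapest path under link costs Z[l], then admit all of A_m(t)
    # over that path if its cost does not exceed G_m(t); otherwise drop.
    cost, best = min((sum(Z[l] for l in p), p) for p in paths_m)
    if cost <= G_m:
        return A_m, best
    return 0.0, None

# One slot for a single session with two candidate two-link paths:
Z = {0: 1.0, 1: 4.0, 2: 2.0, 3: 0.5}
gamma = choose_aux(V=10.0, G_m=2.0, gamma_max=5.0)
x, path = route_and_admit(A_m=3.0, G_m=2.0, Z=Z, paths_m=[[0, 1], [2, 3]])
G_next = max(2.0 + gamma - x, 0.0)   # virtual queue update (5.48)
```

In this example the cheapest path [2, 3] has cost 2.5, which exceeds Gm(t) = 2, so all of Am(t) is dropped; Gm then grows via (5.48), making admission easier on later slots.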
The shortest path routing in this algorithm is similar to that given in (149), which treats a
flow-based network stability problem under the assumption that arriving traffic is admissible (so that
flow control is not used). This problem with flow control was introduced in (39) using the universal
scheduling framework of Section 4.9.2, where there are no probabilistic assumptions on the arrivals
or time varying link capacities.
To prove the deterministic bound on Zl(t) in (5.54), note that if a link l satisfies Zl(t) ≤ V ν^max + γ^max, then on the next slot we have Zl(t + 1) ≤ V ν^max + γ^max + MA^max, because the queue can increase by at most MA^max on any slot (see update equation (5.47)). Else, if Zl(t) > V ν^max + γ^max, then any path that uses this link incurs a cost larger than V ν^max + γ^max, and thus would incur a cost larger than Gm(t) for any session m. Thus, by the routing and flow control algorithm, no session will choose a path that uses this link on the current slot, and so Zl(t) cannot increase on the next slot.
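This induction can be checked numerically. The following sketch (with arbitrary illustrative constants) simulates one virtual link queue Zl under the stated admission rule and asserts the deterministic bound on every slot:

```python
import random

random.seed(0)
V, nu_max, gamma_max, M, A_max = 20.0, 1.0, 5.0, 3, 5.0
bound = V * nu_max + gamma_max + M * A_max   # deterministic bound: 40.0
Z = 0.0
for t in range(10000):
    # sessions route over this link only while its cost Z is small enough;
    # otherwise no session selects a path containing it
    if Z <= V * nu_max + gamma_max:
        arrivals = sum(random.uniform(0, A_max) for _ in range(M))
    else:
        arrivals = 0.0
    b = random.uniform(0, 2.0)               # time-varying link capacity
    Z = max(Z + arrivals - b, 0.0)           # virtual queue update (5.47)
    assert Z <= bound                        # the inductive invariant
```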
Using the sample path inequality (2.3) with the deterministic bound on Zl (t) in (5.54), it
follows that over any interval of T slots (for any positive integer T and any initial slot t0 ), the data
injected for use over link l is no more than V ν max + γ max + MAmax beyond the total capacity
offered by the link over that interval:
Σ_{τ=t0}^{t0+T−1} Σ_{m=1}^{M} 1_{l,m}(τ) xm(τ) ≤ Σ_{τ=t0}^{t0+T−1} bl(τ) + V ν^max + γ^max + MA^max  (5.55)
Here we consider a general multi-hop network, treating the actual queueing rather than using the
flow-based model of the previous section. Suppose the network has N nodes and operates in slotted
time. There are M sessions, and we let A(t) = (A1 (t), . . . , AM (t)) represent the vector of data that
exogenously arrives to the transport layer for each session on slot t (measured either in integer units
of packets or real units of bits).
Each session m ∈ {1, . . . , M} has a particular source node and destination node. Data delivery
takes place by transmissions over possibly multi-hop paths. We assume that a transport layer flow
controller observes Am (t) every slot and decides how much of this data to add to the network layer
at its source node and how much to drop (flow control decisions are made to limit queue buffers and
ensure the network is stable). Let (xm(t))_{m=1}^{M} be the collection of flow control decision variables on slot t. These decisions are made subject to the constraints 0 ≤ xm(t) ≤ Am(t) (see also discussion after (5.42) on modifications of this constraint).
All data that is intended for destination node c ∈ {1, . . . , N} is called commodity c data, regardless of its particular session. For each n ∈ {1, . . . , N} and c ∈ {1, . . . , N}, let M_n^{(c)} denote the set of all sessions m ∈ {1, . . . , M} that have source node n and commodity c. All data is queued according to its commodity, and we define Q_n^{(c)}(t) as the amount of commodity c data in node n on slot t. We assume that Q_n^{(n)}(t) = 0 for all t, as data that reaches its destination is removed from the network. Let Q(t) denote the matrix of current queue backlogs for all nodes and commodities.
³Convex constraints can be incorporated using the generalized structure of Section 5.4.
The queue backlogs change from slot to slot as follows:
Q_n^{(c)}(t + 1) = Q_n^{(c)}(t) − Σ_{j=1}^{N} μ̃_{nj}^{(c)}(t) + Σ_{i=1}^{N} μ̃_{in}^{(c)}(t) + Σ_{m∈M_n^{(c)}} xm(t)
where μ̃_{ij}^{(c)}(t) denotes the actual amount of commodity c data transmitted from node i to node j (i.e., over link (i, j)) on slot t. It is useful to define transmission decision variables μ_{ij}^{(c)}(t) as the bit rate offered by link (i, j) to commodity c data, where this full amount is used if there is that much commodity c data available at node i, so that:
μ̃_{ij}^{(c)}(t) ≤ μ_{ij}^{(c)}(t) ∀i, j, c ∈ {1, . . . , N}, ∀t
For simplicity, we assume that if there is not enough data to send at the offered rate, then null data is sent, so that:⁴
Q_n^{(c)}(t + 1) = max[ Q_n^{(c)}(t) − Σ_{j=1}^{N} μ_{nj}^{(c)}(t), 0 ] + Σ_{i=1}^{N} μ_{in}^{(c)}(t) + Σ_{m∈M_n^{(c)}} xm(t)  (5.56)
This satisfies (5.1) if we relate index k (for Qk(t) in (5.1)) to index (n, c) (for Q_n^{(c)}(t) in (5.56)), and if we define:
b_n^{(c)}(t) ≜ Σ_{j=1}^{N} μ_{nj}^{(c)}(t) ,  a_n^{(c)}(t) ≜ Σ_{i=1}^{N} μ_{in}^{(c)}(t) + Σ_{m∈M_n^{(c)}} xm(t)
Constraints (5.58) are due to the common-sense observation that it makes no sense to transmit
data from a node to itself, or to keep transmitting data that has already arrived to its destination.
One can easily incorporate additional constraints that restrict the set of allowable links that certain
commodities are allowed to use, as in (22).
representing the resource allocation, transmission, and flow control decisions. The action space A_ω(t) is defined by the set of all I(t) ∈ I_{S(t)}, all (μ_{ij}^{(c)}(t)) that satisfy (5.57)-(5.59), and all (xm(t)) that satisfy 0 ≤ xm(t) ≤ Am(t) for all m ∈ {1, . . . , M}.
Define x̄ as the time average expectation of the vector x(t). Our objective is to solve the following problem:
Maximize: φ(x̄)  (5.60)
Subject to: α(t) ∈ A_ω(t) ∀t  (5.61)
All queues Q_n^{(c)}(t) are mean rate stable  (5.62)
where φ(x̄) = Σ_{m=1}^{M} φm(x̄m) is a continuous, concave, and entrywise non-decreasing utility function.
• (Flow Control) For each slot t, each session m observes Am(t) and the queue values Gm(t), Q_{nm}^{(cm)}(t) (where nm denotes the source node of session m, and cm represents its destination). Note that these queues are all local to the source node of the session, and hence they can be observed easily. It then chooses xm(t) to solve:
Maximize: Gm(t)xm(t) − Q_{nm}^{(cm)}(t)xm(t)  (5.65)
Subject to: 0 ≤ xm(t) ≤ Am(t)
This reduces to the “bang-bang” flow control decision of choosing xm(t) = Am(t) if Q_{nm}^{(cm)}(t) ≤ Gm(t), and xm(t) = 0 otherwise.
• (Resource Allocation and Transmission) For each slot t, the network controller observes queue backlogs {Q_n^{(c)}(t)} and the topology state S(t) and chooses I(t) ∈ I_{S(t)} and {μ_{ij}^{(c)}(t)} to solve:
Maximize: Σ_{n,c} Q_n^{(c)}(t) [ Σ_{j=1}^{N} μ_{nj}^{(c)}(t) − Σ_{i=1}^{N} μ_{in}^{(c)}(t) ]  (5.66)
Subject to: I(t) ∈ I_{S(t)} and (5.57)-(5.59)
• (Queue Updates) Update the virtual queues Gm(t) according to (5.63) and the actual queues Q_n^{(c)}(t) according to (5.56).
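The bang-bang decision in the flow control step is a one-line rule, sketched below (the function name is ours, not the book's):

```python
def flow_control(A_m, G_m, Q_src):
    # Solve (5.65): maximize (G_m - Q_src) * x over 0 <= x <= A_m.
    # The objective is linear in x, so the optimum is at an endpoint:
    # admit everything when the source backlog is no larger than G_m,
    # and admit nothing otherwise.
    return A_m if Q_src <= G_m else 0.0
```

For example, with Am(t) = 4 and Gm(t) = 10, a source backlog of 7 admits all 4 units, while a backlog of 12 admits nothing.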
The resource allocation and transmission decisions that solve (5.66) are described in Subsection
5.3.4 below. Before covering this, we state the performance of the algorithm under a general C-
additive approximation. Assuming that second moments of arrivals and service variables are finite,
and that ω(t) is i.i.d. over slots, by Theorem 5.1, we have that all virtual and actual queues are mean
rate stable, and:
lim inf_{t→∞} φ(x̄(t)) ≥ φ^opt − (D + C)/V  (5.67)
where D is a constant related to the maximum second moments of arrivals and transmission rates.
The queues Q_n^{(c)}(t) can be shown to be strongly stable with average size O(V) under an additional
Slater-type condition. If the φm (x) functions are bounded with bounded right derivatives, it can be
shown that the queues Gm (t) are deterministically bounded. A slight modification of the algorithm
that results in a C-additive approximation can deterministically bound all actual queues by a constant
of size O(V ) (38)(42)(153), even without the Slater condition. The theory of Section 4.9 can be
used to show that the same algorithm operates efficiently for non-i.i.d. traffic and channel processes,
including processes that arise from arbitrary node mobility (38).
5.3. MULTI-HOP QUEUEING NETWORKS 113
5.3.4 BACKPRESSURE-BASED ROUTING AND RESOURCE ALLOCATION
By switching the sums in (5.66), it is easy to show that the resource allocation and transmission maximization reduces to the following generalized “max-weight” and “backpressure” algorithms (see (7)(22)): Every slot t, choose I(t) ∈ I_{S(t)} to maximize:
Σ_{i=1}^{N} Σ_{j=1}^{N} b_{ij}(I(t), S(t)) W_{ij}(t)  (5.68)
where the link weights are defined:
W_{ij}(t) ≜ max_{c∈{1,...,N}} max[ W_{ij}^{(c)}(t), 0 ]  (5.69)
where W_{ij}^{(c)}(t) are differential backlogs:
W_{ij}^{(c)}(t) ≜ Q_i^{(c)}(t) − Q_j^{(c)}(t)  (5.70)
Each link (i, j) then transmits only commodity c*_{ij}(t), where c*_{ij}(t) is defined as the commodity c ∈ {1, . . . , N} that maximizes the differential backlog W_{ij}^{(c)}(t) (breaking ties arbitrarily).
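A sketch of the weight computation (5.68)-(5.69): for each link (i, j), find the commodity with the largest queue differential and the resulting non-negative weight. The backlog matrix Q is stored here as a plain list of lists, with commodities indexed by destination node:

```python
def backpressure_weights(Q):
    # Q[n][c]: backlog of commodity c data at node n.
    # Returns, for each directed link (i, j), the maximizing commodity
    # c*_ij and the weight W_ij = max_c max[Q_i^c - Q_j^c, 0].
    N = len(Q)
    best = {}
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            c_star = max(range(N), key=lambda c: Q[i][c] - Q[j][c])
            W = max(Q[i][c_star] - Q[j][c_star], 0.0)
            best[(i, j)] = (c_star, W)
    return best

# 3 nodes; commodity index = destination node, so Q[n][n] = 0 always
Q = [[0.0, 5.0, 2.0],
     [1.0, 0.0, 4.0],
     [3.0, 2.0, 0.0]]
w = backpressure_weights(Q)
```

The controller would then choose I(t) to maximize Σ b_ij(I(t), S(t)) W_ij(t) and send commodity c*_ij over each activated link with positive weight.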
This backpressure approach achieves throughput optimality, but, because it explores all pos-
sible routes, may incur large delay. A useful C-additive approximation that experimentally improves
delay is to combine the queue differential with a shortest path estimate for each link. This is pro-
posed in (15) as an enhancement to backpressure routing, and it is shown to perform quite well in
simulations given in (154)(22) ((154) extends to networks with unreliable channels). Related work
that combines shortest paths and backpressure using the drift-plus-penalty method is developed in
(155) to treat maximum hop count constraints. A theory of more aggressive place-holder packets
for delay improvement in backpressure is developed in (37), although the algorithm ideally requires
knowledge of Lagrange multiplier information in advance. A related and very simple Last-In-First-
Out (LIFO) implementation of backpressure that does not need Lagrange multiplier information
is developed in (54), where experiments on wireless sensor networks show delay improvements by
more than an order of magnitude over FIFO implementations (for all but 2% of the packets) while
preserving efficient throughput (note that LIFO does not change the dynamics of (5.1) or (5.56)).
Analysis of the LIFO rule and its connection to place-holders and Lagrange multipliers is in (55).
Minimize: ȳ0 + f(x̄)  (5.71)
Subject to: 1) ȳl + gl(x̄) ≤ 0 ∀l ∈ {1, . . . , L}  (5.72)
2) x̄ ∈ X ∩ R  (5.73)
3) All queues Qk(t) are mean rate stable  (5.74)
4) α(t) ∈ A_ω(t) ∀t  (5.75)
where f(x) and gl(x) are continuous and convex functions of x ∈ R^M, X is a closed and convex subset of R^M, and R is an M-dimensional hyper-rectangle defined as:
R ≜ { γ ∈ R^M | γm,min ≤ γm ≤ γm,max ∀m ∈ {1, . . . , M} }
where γm,min and γm,max are finite constants (this rectangle set R is only added to bound the auxiliary variables that we use, as in the previous sections).
Let γ (t) = (γ1 (t), . . . , γM (t)) be a vector of auxiliary variables that can be chosen within the
set X ∩ R every slot t. We transform the problem (5.71)-(5.75) to:
Minimize: ȳ0 + f̄(γ)  (5.76)
Subject to: 1) ȳl + ḡl(γ) ≤ 0 ∀l ∈ {1, . . . , L}  (5.77)
2) γ̄m = x̄m ∀m ∈ {1, . . . , M}  (5.78)
3) All queues Qk(t) are mean rate stable  (5.79)
4) γ(t) ∈ X ∩ R ∀t  (5.80)
5) α(t) ∈ A_ω(t) ∀t  (5.81)
where we define:
f̄(γ) ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{f(γ(τ))} ,  ḡl(γ) ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{gl(γ(τ))}
It is not difficult to show that this transformed problem is equivalent to the problem (5.71)-(5.75), in that the optimal objective values are the same, and any solution to one can be used to construct a solution to the other (see Exercise 5.9).
We solve the transformed problem (5.76)-(5.81) simply by re-stating the drift-plus-penalty
algorithm for this context. While a variable-V implementation can be developed, we focus here on
the fixed V algorithm as specified in (4.48)-(4.49). For each inequality constraint (5.77), define a virtual queue Zl(t) with update equation:
Zl(t + 1) = max[ Zl(t) + ŷl(α(t), ω(t)) + gl(γ(t)), 0 ]  (5.82)
For each equality constraint (5.78), define a virtual queue Hm(t) with update equation:
Hm(t + 1) = Hm(t) + γm(t) − x̂m(α(t), ω(t))  (5.83)
Define Θ(t) ≜ [Q(t), Z(t), H(t)]. Assume the boundedness assumptions (4.25)-(4.30) hold, and
that ω(t) is i.i.d. over slots. For the Lyapunov function (4.43), we have the following drift bound:
Δ(Θ(t)) + V E{ ŷ0(α(t), ω(t)) + f(γ(t)) | Θ(t) } ≤ D + V E{ ŷ0(α(t), ω(t)) + f(γ(t)) | Θ(t) }
+ Σ_{l=1}^{L} Zl(t) E{ ŷl(α(t), ω(t)) + gl(γ(t)) | Θ(t) }
+ Σ_{m=1}^{M} Hm(t) E{ γm(t) − x̂m(α(t), ω(t)) | Θ(t) }
+ Σ_{k=1}^{K} Qk(t) E{ âk(α(t), ω(t)) − b̂k(α(t), ω(t)) | Θ(t) }  (5.84)
where D is a finite constant related to the worst case second moments of the arrival, service, and
attribute vectors. Now define a C-additive approximation as any algorithm for choosing γ (t) ∈
X ∩ R and α(t) ∈ Aω(t) every slot t that, subject to a given (t), yields a right-hand-side in (5.84)
that is within a distance C from its infimum value.
Theorem 5.3 (Algorithm Performance) Suppose the boundedness assumptions (4.25)-(4.30) hold, the
problem (5.71)-(5.75) is feasible, and E{L(Θ(0))} < ∞. Suppose the functions f(γ) and gl(γ) are upper
and lower bounded by finite constants over γ ∈ X ∩ R. If ω(t) is i.i.d. over slots and any C-additive
approximation is used every slot, then:
lim sup_{t→∞} [ ȳ0(t) + f(x̄(t)) ] ≤ y0^opt + f^opt + (D + C)/V  (5.85)
where y0^opt + f^opt represents the infimum cost metric of the problem (5.71)-(5.75) over all feasible policies.
Further, all actual and virtual queues are mean rate stable, and:
lim sup_{t→∞} [ ȳl(t) + gl(x̄(t)) ] ≤ 0 ∀l ∈ {1, . . . , L}  (5.86)
lim_{t→∞} dist(x̄(t), X ∩ R) = 0  (5.87)
where dist(x̄(t), X ∩ R) represents the distance between the vector x̄(t) and the set X ∩ R, being zero if and only if x̄(t) is in the (closed) set X ∩ R.
Performing such a general non-convex optimization is, in some cases, as hard as combinatorial
bin-packing, and so we do not expect to find a global optimum. Rather, we seek an algorithm that
satisfies the constraints (5.89)-(5.91) and that yields a local optimum of f (x).
We use the drift-plus-penalty framework with the same virtual queues as before:
Zl (t + 1) = max[Zl (t) + ŷl (α(t), ω(t)), 0] (5.92)
The actual queues Qk(t) are assumed to satisfy (5.1). Define Θ(t) ≜ [Q(t), Z(t), x_av(t)], where
xav (t) is defined as an empirical running time average of the attribute vector:
x_av(t) ≜ (1/t) Σ_{τ=0}^{t−1} x(τ)  if t > 0 ,  x_av(0) ≜ x̂(α(−1), ω(−1))
5.5. NON-CONVEX STOCHASTIC OPTIMIZATION 117
where x̂(α(−1), ω(−1)) can be viewed as an initial sample taken at time “t = −1” before the network implementation begins. Define L(Θ(t)) ≜ (1/2)[ Σ_{k=1}^{K} Qk(t)² + Σ_{l=1}^{L} Zl(t)² ]. Assume ω(t) is i.i.d. over slots.
Below we state the performance of the algorithm that observes queue backlogs every slot t and
takes an action α(t) ∈ Aω(t) that comes within C of minimizing the right-hand-side of the drift
expression (5.93).
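The bookkeeping described above can be sketched as follows: maintain the running average x_av(t) and use the partial derivatives of f evaluated there as the per-slot penalty weights (the non-convex cost below is an arbitrary example for illustration, not from the text):

```python
def update_running_average(x_av, x_t, t):
    # x_av(t+1) = (t * x_av(t) + x(t)) / (t + 1): running time average
    # of the attribute vector, updated entrywise
    return [(t * a + x) / (t + 1) for a, x in zip(x_av, x_t)]

def penalty_weights(grad_f, x_av, V):
    # per-attribute penalty weights V * df/dx_m, with the gradient
    # evaluated at the running average (the "primal-dual" ingredient)
    return [V * g for g in grad_f(x_av)]

# example non-convex cost (assumed for illustration): f(x) = -x0 * x1
grad_f = lambda x: [-x[1], -x[0]]
x_av = [2.0, 3.0]
w = penalty_weights(grad_f, x_av, V=10.0)            # [-30.0, -20.0]
x_av = update_running_average(x_av, [4.0, 1.0], t=4)  # [2.4, 2.6]
```

Each slot, the controller would choose α(t) ∈ A_ω(t) to minimize the queue terms plus Σ w_m x̂m(α(t), ω(t)), then fold the realized attributes into x_av.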
Theorem 5.4 (Non-Convex Stochastic Network Optimization (43)) Suppose ω(t) is i.i.d. over slots, the
boundedness assumptions (4.25)-(4.28) hold, the function f (x) is bounded and continuously differentiable
with partial derivatives bounded in magnitude by finite constants νm ≥ 0, and the problem (5.88)-(5.91)
is feasible. For simplicity, assume that (0) = 0. For any V ≥ 0, and for any C-additive approximation
of the above algorithm that is implemented every slot, we have:
(a) All queues Qk (t) and Zl (t) are mean rate stable and:
lim sup y l (t) ≤ 0 ∀l ∈ {1, . . . , L}
t→∞
(b) For all t > 0 and for any alternative vector x∗ that can be achieved as the time average of a
policy that makes all queues mean rate stable and satisfies all required constraints, we have:
(1/t) Σ_{τ=0}^{t−1} Σ_{m=1}^{M} E{ xm(τ) ∂f(x_av(τ))/∂xm } ≤ (1/t) Σ_{τ=0}^{t−1} Σ_{m=1}^{M} x*m E{ ∂f(x_av(τ))/∂xm } + (D + C)/V
where D is a finite constant related to second moments of the ak (t), bk (t), yl (t) processes.
(c) If all time averages converge, so that there is a constant vector x such that x_av(t) → x with probability 1 and x̄(t) → x, then the achieved limit is a near local optimum/critical point, in the sense that for any alternative vector x* that can be achieved as the time average of a policy that makes all queues mean rate stable and satisfies all required constraints, we have:
Σ_{m=1}^{M} (x*m − xm) ∂f(x)/∂xm ≥ −(D + C)/V
(d) Suppose there is an ε > 0 and an ω-only policy α*(t) such that:
E{ ŷl(α*(t), ω(t)) } ≤ 0 ∀l ∈ {1, . . . , L}  (5.94)
E{ âk(α*(t), ω(t)) − b̂k(α*(t), ω(t)) } ≤ −ε ∀k ∈ {1, . . . , K}  (5.95)
Then all queues Qk (t) are strongly stable with average size O(V ).
(e) Suppose we use a variable V(t) algorithm with V(t) ≜ V0 · (1 + t)^d for V0 > 0 and 0 < d < 1, and use any C-additive approximation (where C is constant for all t). Then all virtual and actual queues are mean rate stable (and so all constraints ȳl ≤ 0 are satisfied), and under the convergence assumptions of part (c), the limiting x is a local optimum/critical point, in that:
Σ_{m=1}^{M} (x*m − xm) ∂f(x)/∂xm ≥ 0
where x∗ is any alternative vector as specified in part (c).
The inequality guarantee in part (e) can be understood as follows: Suppose we start at our achieved time average attribute vector x, and we want to shift this in any feasible direction by moving towards another feasible vector x* by an amount ε (for some ε > 0). Then:
f( x + ε(x* − x) ) ≈ f(x) + ε Σ_{m=1}^{M} (x*m − xm) ∂f(x)/∂xm ≥ f(x)
Hence, the new cost achieved by taking a small step in any feasible direction is no less than the cost f(x) that we are already achieving. More precisely, the change in cost Δcost(ε) satisfies:
lim_{ε→0⁺} Δcost(ε)/ε ≥ 0
Proof. (Theorem 5.4) Our proof uses the same drift-plus-penalty technique as described in previous sections. Analogous to Theorem 4.5, it can be shown that for any x* = (x*1, . . . , x*M) that is a limit point of x̄(t) under any policy that makes all queues mean rate stable and satisfies all constraints, and for any δ > 0, there exists an ω-only policy α*(t) such that (43):
E{ ŷl(α*(t), ω(t)) } ≤ δ ∀l ∈ {1, . . . , L}
E{ âk(α*(t), ω(t)) − b̂k(α*(t), ω(t)) } ≤ δ ∀k ∈ {1, . . . , K}
dist( E{ x̂(α*(t), ω(t)) }, x* ) ≤ δ
For simplicity of the proof, assume the above holds with δ = 0. Plugging the above into the right-hand-side of (5.93) with δ = 0 yields:⁶
Δ(Θ(t)) + V Σ_{m=1}^{M} (∂f(x_av(t))/∂xm) E{ x̂m(α(t), ω(t)) | Θ(t) } ≤ D + C + V Σ_{m=1}^{M} x*m ∂f(x_av(t))/∂xm
⁶The same result can be derived by plugging in with δ > 0 and then taking a limit as δ → 0.
Taking expectations of the above drift bound (using the law of iterated expectations), summing the
telescoping series over τ ∈ {0, 1, . . . , t − 1}, and dividing by V t immediately yields the result of
part (b).
On the other hand, this drift expression can also be rearranged as:
Δ(Θ(t)) ≤ D + C + V Σ_{m=1}^{M} νm (x*m − x_{m,min})
where xm,min is a bound on the expectation of xm (t) under any policy, known to exist by the
boundedness assumptions. Hence, the drift is less than or equal to a finite constant, and so by
Theorem 4.2, we know all queues are mean rate stable, proving part (a). The proof of part (d) follows
similarly by plugging in the policy α ∗ (t) of (5.94)-(5.95).
The proof of part (c) follows by taking a limit of the result in part (b), where the limits can be
pushed through by the boundedness assumptions and the continuity assumption on the derivatives
of f(x). The proof of part (e) is similar to that of Theorem 4.9 and is omitted for brevity. □
Using a penalty given by partial derivatives of the function evaluated at the empirical average
attribute vector can be viewed as a “primal-dual” operation that differs from our “pure-dual” approach
for convex problems. Such a primal-dual approach was first used in the context of convex network utility
maximization problems in (32)(33)(34). Specifically, the work (32)(33) used a partial derivative
evaluated at the time average xav (t) to maximize a concave function of throughput in a multi-user
wireless downlink with time varying channels. However, the system in (32)(33) assumed infinite
backlog in all queues (similar to Exercise 5.6), so that there were no queue stability constraints.
This was extended in (34) to consider the primal-dual technique for joint stability and performance
optimization, again for convex problems, but using an exponential weighted average, rather than a
running time average xav (t). There, it was shown that a related “fluid limit” of the system has an
optimal utility, and that this limit is “weakly” approached under appropriately scaled systems. It was
also conjectured in (34) that the actual network will have utility that is close to this fluid limit as a
parameter β related to the exponential weighting is scaled (see Section 4.9 in (34)). However, the
analysis does not specify the size of β needed to achieve a near-optimal utility. Recent work in (36)
considers related primal-dual updates for convex problems, and it shows the long term utility of the
actual network is close to optimal as a parameter is scaled.
For the special case of convex problems, Theorem 5.4 above shows that, if the algorithm is
assumed to converge to well defined time averages, and if we use a running time average xav (t)
rather than an exponential average, the primal-dual algorithm achieves a similar [O(1/V ), O(V )]
performance-congestion tradeoff as the dual algorithm. Unfortunately, it is not clear how long the
system must run to approach convergence. The pure dual algorithm seems to provide stronger
analytical guarantees for convex problems because: (i) It does not need a running time average
xav (t) and hence can be shown to be robust to changes in system parameters (as in Section 4.9
and (42)(38)(17)), (ii) It does not require additional assumptions about convergence, (iii) It provides
results for all t > 0 that show how long we must run the system to be close to the infinite horizon
limit guarantees. However, if one applies the pure dual technique with a non-convex cost function
f (x), one would get a global optimum of the time average f (x), which may not even be a local
optimum of f (x). This is where the primal-dual technique shows its real potential, as it can achieve
a local optimum for non-convex problems.
where Amax is a finite constant. This means that ak (t) is chosen from the Lk (t) + Ak (t) amount of
data available on slot t, and is no more than Amax per slot (which limits the amount we can send
into the network layer). It is assumed that Ak (t) ≤ Amax for all k and all t. Newly arriving data
Ak (t) that is not immediately admitted into the network layer is stored in the transport layer queue
Lk (t). The controller also chooses a channel-aware transmission decision I (t) ∈ IS (t) , where IS (t) is
an abstract set that defines transmission options under channel state S (t). The transmission rates
are given by deterministic functions of I(t) and S(t): bk(t) = b̂k(I(t), S(t)). The controller additionally makes packet drop decisions dk(t), subject to:
0 ≤ dk(t) ≤ Amax
For each k ∈ {1, . . . , K}, let φk (a) be a continuous, concave, and non-decreasing utility func-
tion defined over the interval 0 ≤ a ≤ Amax . Let νk be the maximum right-derivative of φk (a)
(which occurs at a = 0), and assume νk < ∞. Example utility functions that have this form are:
φk (a) = log(1 + νk a)
5.6. WORST CASE DELAY 121
where log(·) denotes the natural logarithm. We desire a solution to the following problem, defined in terms of a parameter ε > 0:
Maximize: Σ_{k=1}^{K} φk(āk) − β Σ_{k=1}^{K} νk d̄k  (5.98)
Subject to: All queues Qk(t) are mean rate stable  (5.99)
b̄k ≥ ε ∀k ∈ {1, . . . , K}  (5.100)
0 ≤ ak(t) ≤ Ak(t) ∀k ∈ {1, . . . , K}, ∀t  (5.101)
I(t) ∈ I_{S(t)} ∀t  (5.102)
where β is a constant that satisfies 1 ≤ β < ∞. This problem does not specify anything about
worst-case delay, but we soon develop an algorithm with worst case delay of O(V ) that comes
within O(1/V ) of optimizing the utility associated with the above problem (5.98)-(5.102). Note
the following:
• The constraint (5.101) is different from the constraint (5.96). Thus, the less stringent constraint
(5.96) is used for the actual algorithm, but performance is measured with respect to the
optimum utility achievable in the problem (5.98)-(5.102). It turns out that optimal utility is
the same with either constraint (5.101) or (5.96), and in particular, it is the same if there are
no transport layer queues, so that Lk (t) = 0 for all t and all data is either admitted or dropped
upon arrival. We include the Lk (t) queues as they are useful in situations where it is preferable
to store data for later transmission than to drop it.
• An optimal solution to (5.98)-(5.102) has d̄k = 0 for all k. That is, the objective (5.98) can equivalently be replaced by the objective of maximizing Σ_{k=1}^{K} φk(āk) and by adding the constraint d̄k = 0 for all k. This is because the penalty for dropping is βνk, which is greater
than or equal to the largest derivative of the utility function φk (a). Thus, it can be shown
that it is always better to restrict data at the transport layer rather than admitting it and later
dropping it. We recommend choosing β such that 1 ≤ β ≤ 2. A larger value of β will trade
packet drops at the network layer for packet non-admissions at the flow controller.
• The constraint (5.100) requires each queue to transmit with a time-average rate of at least ε. This constraint ensures all queues are getting at least a minimum rate of service. If the input rate E{Ak(t)} is less than ε, then this constraint is wasteful. However, we shall not enforce this constraint. Rather, we simply measure utility of our system with respect to the optimal utility of the problem (5.98)-(5.102), which includes this constraint. It is assumed throughout that this constraint is feasible, and so the problem (5.98)-(5.102) is feasible. If one prefers to enforce constraint (5.100), this is easily done with an appropriate virtual queue.
5.6.1 THE ε-PERSISTENT SERVICE QUEUE
To ensure worst-case delay is bounded, we define an ε-persistent service queue, being a virtual queue Zk(t) for each k ∈ {1, . . . , K} with Zk(0) = 0 and with dynamics:
Zk(t + 1) = max[ Zk(t) − bk(t) − dk(t) + ε, 0 ]  if Qk(t) > bk(t) + dk(t)
Zk(t + 1) = 0  if Qk(t) ≤ bk(t) + dk(t)    (5.103)
where ε > 0. We assume throughout that ε ≤ Amax. The condition Qk(t) ≤ bk(t) + dk(t) is satisfied whenever the backlog Qk(t) is cleared (by service and/or drops) on slot t. If this condition is not active, then Zk(t) has a departure process that is the same as Qk(t), but it has an arrival of size ε every slot. The size of the queue Zk(t) can provide a bound on the delay of the head-of-line data in queue Qk(t) in a first-in-first-out (FIFO) system. This is similar to (76) (where explicit delays are kept for each packet) and (159) (which uses a slightly different update). If a scheduling algorithm is used that ensures Zk(t) ≤ Zk,max and Qk(t) ≤ Qk,max for all t, for some finite constants Zk,max and Qk,max, then worst-case delay is also bounded, as shown in the following lemma:
Lemma 5.5 Suppose Qk (t) and Zk (t) evolve according to (5.97) and (5.103), and that an algorithm
is used that ensures Qk (t) ≤ Qk,max and Zk (t) ≤ Zk,max for all slots t ∈ {0, 1, 2, . . .}. Assume service
and drops are done in FIFO order. Then the worst-case delay of all non-dropped data in queue k is Wk,max ,
defined:
Wk,max = (Qk,max + Zk,max )/ (5.104)
Proof. Fix a slot t. We show that all arrivals a(t) are either served or dropped on or before slot t + Wk,max. Suppose this is not true; we reach a contradiction. Note by (5.97) that arrivals a(t) are added to the queue backlog Qk(t + 1) and are first available for service on slot t + 1. It must be that Qk(τ) > bk(τ) + dk(τ) for all τ ∈ {t + 1, . . . , t + Wk,max} (else, the backlog on slot τ would be cleared). Therefore, by (5.103), we have for all slots τ ∈ {t + 1, . . . , t + Wk,max}:
Zk(τ + 1) = max[ Zk(τ) − bk(τ) − dk(τ) + ε, 0 ]
≥ Zk(τ) − bk(τ) − dk(τ) + ε
Summing the above over τ ∈ {t + 1, . . . , t + Wk,max} yields:
Zk(t + Wk,max + 1) − Zk(t + 1) ≥ − Σ_{τ=t+1}^{t+Wk,max} [bk(τ) + dk(τ)] + ε Wk,max
Rearranging terms in the above inequality and using the fact that Zk(t + 1) ≥ 0 and Zk(t + Wk,max + 1) ≤ Zk,max yields:
ε Wk,max ≤ Σ_{τ=t+1}^{t+Wk,max} [bk(τ) + dk(τ)] + Zk,max  (5.105)
On the other hand, the sum of bk(τ) + dk(τ) over the interval τ ∈ {t + 1, . . . , t + Wk,max} must be strictly less than Qk(t + 1) (else, by the FIFO service, all data a(t), which is included at the end of the backlog Qk(t + 1), would have been cleared during this interval). Thus:
Σ_{τ=t+1}^{t+Wk,max} [bk(τ) + dk(τ)] < Qk(t + 1) ≤ Qk,max  (5.106)
Substituting (5.106) into (5.105) yields ε Wk,max < Qk,max + Zk,max, which contradicts the definition of Wk,max in (5.104).
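Lemma 5.5 can also be checked by simulation. The sketch below runs a single FIFO queue with random arrivals and service (dyadic rationals, so floating-point arithmetic is exact), maintains the ε-persistent queue (5.103), and verifies the delay of every fully cleared arrival against the bound computed from the observed maxima of Qk(t) and Zk(t), with one slot of slack for integer rounding:

```python
from collections import deque
import random

random.seed(1)
eps = 0.5                     # epsilon parameter of the virtual queue
Q, Z = 0.0, 0.0
fifo = deque()                # FIFO entries: (arrival_slot, amount)
max_Q = max_Z = worst_delay = 0
for t in range(5000):
    a = random.randint(0, 8) / 4     # admitted arrivals a_k(t)
    srv = random.randint(0, 12) / 4  # b_k(t) + d_k(t): service plus drops
    # epsilon-persistent service queue update (5.103), using Q_k(t)
    Z = 0.0 if Q <= srv else max(Z - srv + eps, 0.0)
    # serve the FIFO queue and record delays of fully cleared arrivals
    rem = min(srv, Q)
    Q = max(Q - srv, 0.0) + a
    while rem > 0 and fifo:
        s, amt = fifo[0]
        take = min(amt, rem)
        rem -= take
        if take == amt:
            fifo.popleft()
            worst_delay = max(worst_delay, t - s)
        else:
            fifo[0] = (s, amt - take)
    if a > 0:
        fifo.append((t, a))
    max_Q, max_Z = max(max_Q, Q), max(max_Z, Z)

# Lemma 5.5 bound, with one extra slot of slack for integer rounding
assert worst_delay <= (max_Q + max_Z) / eps + 1
```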
B ≥ (1/2) Σ_{k=1}^{K} E{ (ε − bk(t) − dk(t))² | Θ(t) } + (1/2) Σ_{k=1}^{K} E{ ak(t)² + (bk(t) + dk(t))² + (γk(t) − ak(t))² | Θ(t) }  (5.116)
Σ_{k=1}^{K} [Qk(t) + Zk(t)] b̂k(I(t), S(t))  (5.120)
• (Packet Drops) For each k ∈ {1, . . . , K}, choose dk(t) by:
dk(t) = Amax  if Qk(t) + Zk(t) > βV νk
dk(t) = 0  if Qk(t) + Zk(t) ≤ βV νk    (5.121)
In some cases, the above algorithm may choose a drop variable dk (t) such that Qk (t) <
bk (t) + dk (t). In this case, all queue updates are kept the same (so the algorithm is unchanged), but
it is useful to first transmit data with offered rate bk (t) on slot t, and then drop only what remains.
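As a concrete illustration, the two per-slot decisions above can be sketched in a few lines of Python (a minimal sketch, not the full algorithm: the action set, the rate function, and all numeric constants are hypothetical placeholders):

```python
# A minimal sketch of the per-slot decisions above. The action set, the
# rate function b_hat, and all constants are hypothetical placeholders.

def drop_decision(Q_k, Z_k, beta, V, nu_k, A_max):
    """Packet-drop rule (5.121): offer A_max drops when the combined
    backlog Q_k + Z_k exceeds the threshold beta*V*nu_k, else offer 0."""
    return A_max if Q_k + Z_k > beta * V * nu_k else 0.0

def transmission_decision(Q, Z, actions, b_hat, S):
    """Choose I(t) from a finite action set to maximize
    sum_k [Q_k(t) + Z_k(t)] * b_hat_k(I, S), as in (5.120)."""
    return max(actions,
               key=lambda I: sum((q + z) * b
                                 for q, z, b in zip(Q, Z, b_hat(I, S))))

def actual_drops(Q_k, b_k, d_k):
    """Refinement from the text: transmit b_k first, then drop only the
    data that still remains in the queue."""
    return min(d_k, max(Q_k - b_k, 0.0))
```

The `actual_drops` helper implements the refinement just described: the queue updates are unchanged, but data is served before any of it is discarded.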
Theorem 5.6 If ε ≤ Amax, then for arbitrary sample paths the above algorithm ensures:

    Qk(t) ≤ Qk,max,  Zk(t) ≤ Zk,max,  Gk(t) ≤ Gk,max  ∀k ∈ {1, . . . , K}, ∀t ∈ {0, 1, 2, . . .}

where Zk,max, Qk,max, Gk,max are defined in (5.122)-(5.124), provided that these inequalities hold for
t = 0. Thus, worst-case delay Wk,max is given by:

    Wk,max = (Zk,max + Qk,max)/ε = O(V)
Proof. That Gk (t) ≤ Gk,max for all t follows by an argument similar to that given in Section 5.2.1,
showing that the auxiliary variable update (5.117)-(5.118) chooses γk (t) = 0 whenever Gk (t) >
V νk .
To show the Qk,max bound, it is clear that the packet drop decision (5.121) yields dk (t) = Amax
whenever Qk (t) > βV νk . Because ak (t) ≤ Amax , the arrivals are less than or equal to the offered
drops whenever Qk (t) > βV νk , and so Qk (t) ≤ βV νk + Amax for all t. However, we also see that
if Qk (t) > Gk,max , then the flow control decision will choose ak (t) = 0, and so Qk (t) also cannot
increase. It follows that Qk(t) ≤ Gk,max + Amax for all t. This proves the Qk,max bound. The Zk,max
bound is proven similarly. The worst-case-delay result then follows immediately from Lemma 5.5. □
126 5. OPTIMIZING FUNCTIONS OF TIME AVERAGES
The above theorem only uses the fact that packet drops dk (t) take place according to the
rule (5.121), flow control decisions ak (t) take place according to the rule (5.119), and auxiliary
variable decisions satisfy γk (t) = 0 whenever Gk (t) > V νk (a property of the solution to (5.117)-
(5.118)). The fact that γk(t) = 0 whenever Gk(t) > V νk can be hard-wired into the auxiliary variable
decisions, even when they are chosen to approximately solve (5.117)-(5.118) otherwise. Further, the
I(t) decisions can be arbitrary and are not necessarily those that maximize (5.120). The next theorem
holds for any C-additive approximation for minimizing the right-hand-side of (5.115) that preserves
the above basic properties. A 0-additive approximation performs the exact algorithm given above.
Theorem 5.7 Suppose ω(t) is i.i.d. over slots and any C-additive approximation for minimizing
the right-hand-side of (5.115) is used such that (5.121), (5.119) hold exactly, and γk(t) = 0 whenever
Gk(t) > V νk. Suppose Qk(0) ≤ Qk,max, Zk(0) ≤ Zk,max, Gk(0) ≤ Gk,max for all k, and ε ≤ Amax.
Then the worst-case queue backlog and delay bounds given in Theorem 5.6 hold, and the achieved utility
satisfies:

    lim inf_{t→∞} [ Σ_{k=1}^{K} φk(āk(t)) − β Σ_{k=1}^{K} νk d̄k(t) ] ≥ φ* − (B + C)/V

where:

    āk(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{ak(τ)},   d̄k(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{dk(τ)}

and where φ* is the optimal utility associated with the problem (5.98)-(5.102).
The theorem relies on the following fact, which can be proven using Theorem 4.5: For all δ > 0,
there exists a vector γ* = (γ1*, . . . , γK*) and an ω-only policy [a*(t), I*(t), d*(t)] that chooses a*(t)
as a random function of A(t), I*(t) as a random function of S(t), and d*(t) = 0 (so that it does
not drop any data) such that:

    Σ_{k=1}^{K} φk(γk*) = φ*    (5.125)
    E{ak*(t)} = γk*  ∀k ∈ {1, . . . , K}    (5.126)
    E{b̂k(I*(t), S(t))} ≥ ε − δ  ∀k ∈ {1, . . . , K}    (5.127)
    E{b̂k(I*(t), S(t))} ≥ E{ak*(t)} − δ  ∀k ∈ {1, . . . , K}    (5.128)
    I*(t) ∈ I_S(t),  0 ≤ γk* ≤ Amax,  0 ≤ ak*(t) ≤ Ak(t)  ∀k ∈ {1, . . . , K}, ∀t    (5.129)
where d*(t), I*(t), a*(t) are any alternative decisions that satisfy I*(t) ∈ I_S(t), 0 ≤ dk*(t) ≤ Amax,
and 0 ≤ ak*(t) ≤ min[Lk(t) + Ak(t), Amax] for all k ∈ {1, . . . , K} and all t. Substituting the ω-only
policy from (5.125)-(5.129) in the right-hand-side of the above inequality and taking δ → 0 yields:

    Δ(Θ(t)) − V Σ_{k=1}^{K} E{[φk(γk(t)) − βνk dk(t)] | Θ(t)} ≤ B + C − V φ*
Using iterated expectations and telescoping sums as usual yields for all t > 0:

    (1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{φk(γk(τ)) − βνk dk(τ)} ≥ φ* − (B + C)/V − E{L(Θ(0))}/(V t)
Using Jensen's inequality for the concave functions φk(γ) yields for all t > 0:

    Σ_{k=1}^{K} [φk(γ̄k(t)) − βνk d̄k(t)] ≥ φ* − (B + C)/V − E{L(Θ(0))}/(V t)    (5.130)
However, because Gk(t) ≤ Gk,max for all t, it is easy to show (via (5.114) and (2.5)) that for all k
and all slots t > 0:

    āk(t) ≥ max[γ̄k(t) − Gk,max/t, 0]
Therefore, since φk(γ) is continuous and non-decreasing, it can be shown that:

    lim inf_{t→∞} Σ_{k=1}^{K} [φk(āk(t)) − βνk d̄k(t)] ≥ lim inf_{t→∞} Σ_{k=1}^{K} [φk(γ̄k(t)) − βνk d̄k(t)]

which, together with (5.130), proves the utility bound. It can also be shown that:

    lim sup_{t→∞} Σ_{k=1}^{K} νk d̄k(t) ≤ (B + C)/(V (β − 1)) + (φ*_{ε=0} − φ*)/(β − 1)
where φ* is the optimal solution to (5.98)-(5.102) for the given ε > 0, and φ*_{ε=0} is the solution to
(5.98)-(5.102) with ε = 0 (which removes constraint (5.100)). Thus, if φ* = φ*_{ε=0}, network layer
drops can be made arbitrarily small by either increasing β or V.⁷
The above analysis allows for an arbitrary operation of the transport layer queues Lk (t).
Indeed, the above theorems only assume that Lk (t) ≥ 0 for all t. Thus, as in (17), these can have
either infinite buffer space, finite buffer space, or 0 buffer space. With 0 buffer space, all data that is
not immediately admitted to the network layer is dropped.
• It maximizes the second-lowest entry over all vectors in Λ that satisfy the above condition.
• It maximizes the third-lowest entry over all vectors in Λ that satisfy the above two conditions,
and so on.
This can be viewed as a sequence of nested optimizations, much different from the utility opti-
mization framework treated in this chapter. For flow-based networks with capacitated links, one can
reach a max-min fair allocation by starting from 0 and gradually increasing all flows equally until a
bottleneck link is found, then increasing all non-bottlenecked flows equally, and so on (see Chapter
6.5.2 in (129)). A token-based scheduling scheme is developed in (160) for achieving max-min
fairness in one-hop wireless networks on graphs with link selections defined by matchings.
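The progressive filling procedure just described for capacitated links can be sketched directly in code (an illustrative implementation only; the flow routes and link capacities one passes in are hypothetical inputs):

```python
def max_min_fair(flows, links):
    """Progressive filling: raise all unfrozen flows equally until some
    link saturates, freeze the flows crossing that bottleneck, repeat.

    flows: dict flow_name -> set of link names the flow traverses
    links: dict link_name -> capacity
    Returns dict flow_name -> max-min fair rate.
    """
    rate = {f: 0.0 for f in flows}
    frozen = set()
    residual = dict(links)
    while len(frozen) < len(flows):
        active = [f for f in flows if f not in frozen]
        # Smallest equal increment that saturates a link carrying an active flow.
        inc = min(
            residual[l] / sum(1 for f in active if l in flows[f])
            for l in residual
            if any(l in flows[f] for f in active)
        )
        for f in active:
            rate[f] += inc
        for l in residual:
            residual[l] -= inc * sum(1 for f in active if l in flows[f])
        # Freeze every active flow that now crosses a saturated link.
        for f in active:
            if any(residual[l] <= 1e-12 for l in flows[f]):
                frozen.add(f)
    return rate
```

For instance, with one flow crossing two links and one single-link flow on each, the long flow and its link-1 partner each get half the bottleneck capacity, while the remaining flow absorbs the slack on link 2.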
One can approximate max-min fairness using a concave utility function in a network with
capacitated links. Indeed, it is shown in (3) that optimizing a sum of concave functions of the form
gα(x) = −1/x^α approaches a max-min fair point as α → ∞. It is likely that such an approach also holds
for more general wireless networks with transmission rate allocation and scheduling. However, such
functions are singular at x = 0 (preventing worst-case backlog bounds as in Exercises 5.6-5.7),
7 If b̄k ≥ ε for all k, then the final term (φ*_{ε=0} − φ*)/(β − 1) can be removed. Alternatively, if virtual queues Hk(t +
1) = max[Hk(t) − μk(t) + ε, 0] are added to enforce these constraints, then lim sup_{t→∞}[ν1 d̄1(t) + . . . + νK d̄K(t)] ≤ (B̃ +
C)/(V (β − 1)), where B̃ adds second moment terms (μk(t) − ε)² to (5.116).
and for large α they have very large values of |g′α(x)/gα(x)| for x > 0, which typically results in
large queue backlog if used in conjunction with the drift-plus-penalty method.
A simpler hard fairness approach seeks only to maximize the minimum throughput (161). This
easily fits into the concave utility based drift-plus-penalty framework using the concave function
g(x) = min[x1, . . . , xM ]. See also Exercise 5.4. A "mixed" approach can also be considered, which seeks
to maximize β min[x̄1, . . . , x̄M ] + Σ_{m=1}^{M} log(1 + x̄m). The constant β is a large weight that ensures maximizing
the minimum throughput has a higher priority than maximizing the logarithmic terms.
5.8 EXERCISES
Exercise 5.1. (Using Logarithmic Utilities) Give a closed form solution to the auxiliary variable
update of (5.49)-(5.50) when:
a) φ(γ) = Σ_{m=1}^{M} log(γm), where log(·) denotes the natural logarithm.
b) φ(γ) = Σ_{m=1}^{M} log(1 + νm γm), where log(·) denotes the natural logarithm.
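As a sketch of the kind of closed form being asked for, suppose (as in Exercise 5.6) that the update (5.49)-(5.50) maximizes V φ(γ) − Σm Gm(t)γm subject to 0 ≤ γm ≤ γm,max; setting each per-coordinate derivative to zero and clipping to the interval gives the candidates below (treat this as a hint under that assumed form, not the worked solution):

```python
def aux_update_log(G, V, gamma_max):
    """Per-coordinate maximizer of V*log(gamma) - G_m*gamma over
    (0, gamma_max]: the stationary point V/G_m, clipped to the interval.
    (Derivative V/gamma - G_m = 0 gives gamma = V/G_m.)"""
    return [gamma_max if g <= 0 else min(V / g, gamma_max) for g in G]

def aux_update_log_shifted(G, V, nu, gamma_max):
    """Per-coordinate maximizer of V*log(1 + nu_m*gamma) - G_m*gamma over
    [0, gamma_max]: stationary point V/G_m - 1/nu_m, clipped to [0, gamma_max]."""
    return [
        gamma_max if g <= 0 else min(max(V / g - 1.0 / n, 0.0), gamma_max)
        for g, n in zip(G, nu)
    ]
```

Both objectives are concave in each γm, so clipping the unconstrained stationary point to the interval yields the constrained maximizer.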
Exercise 5.2. (Transformed Problem with Auxiliary Variables) Let α*(t) be a policy that yields
well defined averages x̄, ȳl, and that satisfies all constraints of problem (5.2)-(5.5),(5.8) (including
the constraint x̄ ∈ R), with utility φ(x̄) = φ^opt. Construct a policy that satisfies all constraints of
problem (5.13)-(5.18) and that yields the same utility value φ(x̄). Hint: Use γ(t) = x̄ for all t.
Exercise 5.3. (Jensen's Inequality) Let φ(γ) be a concave function defined over a convex set R ⊆
R^M. Let γ(τ) be a sequence of random vectors in R for τ ∈ {0, 1, 2, . . .}. Fix an integer t > 0,
and define T as an independent and random time that is uniformly distributed over the integers
{0, 1, . . . , t − 1}. Define the random vector X = γ(T). Use (5.9) to prove (5.10)-(5.11).
Exercise 5.4. (Hard Fairness (161)) Consider a system with M attributes x(t) =
(x1(t), . . . , xM(t)), where xm(t) = x̂m(α(t), ω(t)) for m ∈ {1, . . . , M}. Assume there is a positive
constant θmax such that:

    Maximize: θ̄
    Subject to: 1) x̄m ≥ θ̄  ∀m ∈ {1, . . . , M}
                2) 0 ≤ θ(t) ≤ θmax  ∀t ∈ {0, 1, 2, . . .}
                3) α(t) ∈ A_ω(t)  ∀t ∈ {0, 1, 2, . . .}

    Maximize: min[x̄1, x̄2, . . . , x̄M]
    Subject to: α(t) ∈ A_ω(t)  ∀t ∈ {0, 1, 2, . . .}
Exercise 5.5. (Bounded Virtual Queues) Consider the auxiliary variable optimization for γm (t) in
(5.49)-(5.50), where φm (x) has the property that:
Use this to prove that γm (t) = 0 is the unique optimal solution to (5.49)-(5.50) whenever Gm (t) >
V νm . Conclude from (5.48) that Gm (t) ≤ V νm + γm,max for all t, provided this is true at t = 0.
Exercise 5.6. (1-Hop Wireless System with Infinite Backlog) Consider a wireless system with
M channels. Transmission rates on slot t are given by b(t) = (b1 (t), . . . , bM (t)) with bm (t) =
b̂m (α(t), ω(t)), where ω(t) = (S1 (t), . . . , SM (t)) is an observed channel state vector for slot t (as-
sumed to be i.i.d. over slots), and α(t) is a control action chosen within a set Aω(t) . Assume that
each channel has an infinite backlog of data, so that there is always data to send. The goal is to
choose α(t) every slot to maximize φ(b̄), where φ(·) is a concave and entrywise non-decreasing
utility function.
a) Verify that the algorithm of Section 5.0.5 in this case is:
• (Auxiliary Variables) Choose γ(t) = (γ1(t), . . . , γM(t)) to solve:

    Maximize: V φ(γ(t)) − Σ_{m=1}^{M} Gm(t)γm(t)
    Subject to: 0 ≤ γm(t) ≤ γm,max  ∀m ∈ {1, . . . , M}
• (Virtual Queue Update) Update Gm (t) for all m ∈ {1, . . . , M} according to:
    φ(b̄(t)) ≥ φ^opt − (D + C)/V − (1/t) Σ_{m=1}^{M} νm (V νm + γm,max),  ∀t > 0
Exercise 5.7. (1-Hop Wireless System with Random Arrivals) Consider the same system as Ex-
ercise 5.6, with the exception that we have random arrivals Am (t) and:
where xm(t) is a flow control decision, made subject to 0 ≤ xm(t) ≤ Am(t). We want to maximize
φ(x̄).
a) State the new algorithm for this case.
b) Suppose 0 ≤ Am (t) ≤ Am,max for some finite constant Am,max . Suppose φ(b) has the
structure of Exercise 5.6(b). Using a similar argument, show that all queues Gm (t) and Qk (t) are
deterministically bounded.
Exercise 5.8. (Imperfect Channel Knowledge) Consider the general problem of Theorem 5.3, but
under the assumption that ω(t) provides only a partial understanding of the channel for each queue
Qk (t), so that b̂k (α(t), ω(t)) is a random function of α(t) and ω(t), assumed to be i.i.d. over all slots
with the same α(t) and ω(t), and assumed to have finite second moments regardless of the choice
of α(t). Define:

    βk(α, ω) ≜ E{b̂k(α(t), ω(t)) | α(t) = α, ω(t) = ω}
Assume that the function βk (α, ω) is known. Assume the other functions x̂m (·), ŷl (·), âk (·) are
deterministic as before. State the modified algorithm that minimizes the right-hand-side of (5.84)
in this case. Hint:
    E{bk(t) | Θ(t)} = E{E{bk(t) | Θ(t), α(t), ω(t)} | Θ(t)} = E{βk(α(t), ω(t)) | Θ(t)}
Note: Related problems with randomized service outcomes and Lyapunov drift are consid-
ered in (162)(163)(164)(154)(165)(161), where knowledge of the channel statistics is needed for
computing the βk (α, ω) functions and their generalizations, and a max-weight learning framework
is developed in (166) for the case of unknown statistics.
Exercise 5.10. (Proof of Theorem 5.3) We make use of the following fact, analogous to Theorem
4.5: If problem (5.71)-(5.75) is feasible, then for all δ > 0 there exists an ω-only policy α*(t) ∈ A_ω(t)
such that E{x̂(α*(t), ω(t))} = γ* for some vector γ*, and:

    E{ŷ0(α*(t), ω(t))} + f(γ*) ≤ y0^opt + f^opt + δ
For simplicity, in this proof, we assume the above holds for δ = 0, and that all actual and virtual
queues are initially empty. Further assume that the functions f(γ) and gl(γ) are Lipschitz continuous,
so that there are positive constants νm, βl,m such that for all x(t) and γ(t), we have:

    |f(γ(t)) − f(x(t))| ≤ Σ_{m=1}^{M} νm |γm(t) − xm(t)|
    |gl(γ(t)) − gl(x(t))| ≤ Σ_{m=1}^{M} βl,m |γm(t) − xm(t)|,  ∀l ∈ {1, . . . , L}
a) Plug the above policy α*(t), together with the constant auxiliary vector γ(t) = γ*, into the
right-hand-side of the drift bound (5.84) and add C (because of the C-additive approximation) to
derive a simpler bound on the drift expression. The resulting right-hand-side should be: D + C +
V(y0^opt + f^opt).
b) Use the Lyapunov optimization theorem to prove that for all t > 0:

    (1/t) Σ_{τ=0}^{t−1} E{y0(τ) + f(γ(τ))} ≤ y0^opt + f^opt + (D + C)/V

and hence, by Jensen's inequality (with ȳ0(t) and γ̄(t) defined by (5.24)):

    ȳ0(t) + f(γ̄(t)) ≤ y0^opt + f^opt + (D + C)/V
c) Manipulate the drift bound of part (a) to prove that Δ(Θ(t)) ≤ W for some finite constant
W. Conclude that all virtual and actual queues are mean rate stable, that (4.7) holds for all t > 0,
and so E{|Hm(t)|}/t ≤ √(2W/t).
d) Use (5.83) and (4.42) to prove that for all m ∈ {1, . . . , M}:
Exercise 5.11. (Profit Risk and Non-Convexity) Consider a K-queue system described by (5.1),
with arrival and service functions âk(α(t), ω(t)) and b̂k(α(t), ω(t)). Let p(t) = p̂(α(t), ω(t)) be a
random profit variable that is i.i.d. over all slots with the same α(t) and ω(t), and that has finite
second moment regardless of the policy. Define:

    φ(α, ω) ≜ E{p̂(α(t), ω(t)) | α(t) = α, ω(t) = ω}
    ψ(α, ω) ≜ E{p̂(α(t), ω(t))² | α(t) = α, ω(t) = ω}
and assume the functions φ(α, ω), ψ(α, ω) are known. The goal is to stabilize all queues while
maximizing a linear combination of the profit minus the variance of the profit (where variance
is a proxy for "risk"). Specifically, define the variance as Var(p) ≜ \overline{p^2} − (p̄)^2, where the notation
h̄ represents the time average expectation of a given process h(t), as usual. We want to maximize
θ1 p̄ − θ2 Var(p), where θ1 and θ2 are positive constants.
a) Define attributes p1(t) = p(t), p2(t) = p(t)². Write the problem using p̄1 and p̄2 in the
form of (5.88)-(5.91), and show this is a non-convex stochastic network optimization problem.
b) State the "primal-dual" algorithm that minimizes the right-hand-side of (5.93) in this
context. Hint: Note that:

    E{p1(t) | Θ(t)} = E{E{p1(t) | Θ(t), α(t), ω(t)} | Θ(t)} = E{φ(α(t), ω(t)) | Θ(t)}
Exercise 5.12. (Optimization without Auxiliary Variables (17)(18)) Consider the problem (5.2)-
(5.5). Assume there is a vector γ* = (γ1*, . . . , γM*), called the optimal operating point, such that
φ(γ*) = φ^opt, where φ^opt is the maximum utility for the problem. Assume that there is an ω-only
policy α*(t) such that for all possible values of ω(t), we have:

    x̂m(α*(t), ω(t)) = γm*  ∀m ∈ {1, . . . , M}    (5.134)
    E{âk(α*(t), ω(t))} ≤ E{b̂k(α*(t), ω(t))}  ∀k ∈ {1, . . . , K}    (5.135)
    E{ŷl(α*(t), ω(t))} ≤ 0  ∀l ∈ {1, . . . , L}    (5.136)
The assumptions (5.134)-(5.136) are restrictive, particularly because (5.134) must hold determin-
istically for all ω(t) realizations. However, these assumptions can be shown to hold for the special
case when xm (t) represents the amount of data admitted to a network from a source m when: (i) All
sources are “infinitely backlogged” and hence always have data to send, and (ii) Data can be admitted
as a real number.
The Lyapunov drift can be shown to satisfy the following for some constant B > 0:

    Δ(Θ(t)) − V E{φ(x̂(α(t), ω(t))) | Θ(t)} ≤ B + Σ_{l=1}^{L} Zl(t) E{ŷl(α(t), ω(t)) | Θ(t)}
        + Σ_{k=1}^{K} Qk(t) E{âk(α(t), ω(t)) − b̂k(α(t), ω(t)) | Θ(t)} − V E{φ(x̂(α(t), ω(t))) | Θ(t)}
Suppose every slot we observe Θ(t) and ω(t) and choose an action α(t) that minimizes the right-
hand-side of the above drift inequality.
a) Assume ω(t) is i.i.d. over slots. Plug the alternative policy α*(t) into the right-hand-side
above to get a greatly simplified drift expression.
b) Conclude from part (a) that Δ(Θ(t)) ≤ D + V(φ^max − φ^opt) for all t, for some finite con-
stant D, where φ^max is an upper bound on the instantaneous value of φ(x̂(·)) (assumed to
be finite). Conclude that all actual and virtual queues are mean rate stable, and hence all desired
inequality constraints are satisfied.
c) Use Jensen's inequality and part (a) (with iterated expectations and telescoping sums) to
conclude that for all t > 0, we have:

    φ(x̄(t)) ≥ (1/t) Σ_{τ=0}^{t−1} E{φ(x(τ))} ≥ φ^opt − B/V − E{L(Θ(0))}/(V t)

where x̄(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{x(τ)} and x(τ) ≜ x̂(α(τ), ω(τ)).
Exercise 5.13. (Delay-Limited Transmission (71)) Consider a K-user wireless system with arrival
vector A(t) = (A1 (t), . . . , AK (t)) and channel state vector S (t) = (S1 (t), . . . , SK (t)) for each
slot t ∈ {0, 1, 2, . . .}. There is no queueing, and all data must either be transmitted in 1 slot or
dropped (similar to the delay-limited capacity formulation of (70)). Thus, there are no actual queues
in the system. Define ω(t)= [A(t), S (t)] as the random network event observed every slot. Define
α(t) ∈ Aω(t) as a general control action, which affects how much of the data to transmit and the
amount of power used according to general functions μ̂k (α, ω) and p̂(α, ω):
μ(t) = (μ̂1 (α(t), ω(t)), . . . , μ̂K (α(t), ω(t))) , p(t) = p̂(α(t), ω(t))
where μ(t) = (μ1 (t), . . . , μK (t)) is the transmission vector and p(t) is the power used on slot t.
Assume these are constrained as follows for all slots t:

    0 ≤ μk(t) ≤ Ak(t)  ∀k ∈ {1, . . . , K},   0 ≤ p(t) ≤ pmax

for some finite constant pmax. Assume that Ak(t) ≤ Ak^max for all t, for some finite constants Ak^max
for k ∈ {1, . . . , K}. Let μ̄ be the time average expectation of the transmission vector μ(t), and let
φ(μ̄) be a continuous, concave, and entrywise non-decreasing utility function of μ̄. The goal is to
solve the following problem:

    Maximize: φ(μ̄)
    Subject to: p̄ ≤ Pav

where p̄ is the time average expected power expenditure, and Pav is a pre-specified average power
constraint. This is a special case of the general problem (5.2)-(5.5).
a) Use auxiliary variables γ(t) = (γ1(t), . . . , γK(t)), subject to 0 ≤ γk(t) ≤ Ak^max for all t and all k,
to write the corresponding transformed problem (5.13)-(5.18) for this case.
b) State the drift-plus-penalty algorithm that solves this transformed problem. Hint: Use a
virtual queue Z(t) to enforce the constraint p̄ ≤ Pav, and use virtual queues Gk(t) to enforce the
constraints μ̄k ≥ γ̄k for all k ∈ {1, . . . , K}.
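A sketch of the virtual queue updates suggested by the hint, using the standard max[·, 0] form that appears throughout the chapter (the exact updates are for the reader to derive; treat these as an assumption):

```python
def virtual_queue_updates(Z, G, p_t, P_av, mu_t, gamma_t):
    """One-slot virtual queue updates (assumed standard form):
    Z(t) grows with power over-expenditure to enforce avg power <= P_av;
    each G_k(t) grows with gamma_k - mu_k to enforce avg mu_k >= avg gamma_k."""
    Z_next = max(Z + p_t - P_av, 0.0)
    G_next = [max(g + gk - mk, 0.0) for g, gk, mk in zip(G, gamma_t, mu_t)]
    return Z_next, G_next
```

If both queues are mean rate stable, the usual arguments then imply the two time-average constraints are satisfied.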
Exercise 5.14. (Delay-Limited Transmission with Errors (71)) Consider the same system as Ex-
ercise 5.13, but now assume that transmissions can have errors, so that μk (t) = μ̂k (α(t), ω(t)) is a
random transmission outcome (as in Exercise 5.8), assumed to be i.i.d. over all slots with the same
α(t) and ω(t), with known expectations βk (α(t), ω(t))= E {μk (t)|α(t), ω(t)} for all k ∈ {1, . . . , K}.
Use iterated expectations (as in Exercise 5.8) to redesign the drift-plus-penalty algorithm for this
case. Multi-slot versions of this problem are treated in Section 7.6.1.
CHAPTER 6
Approximate Scheduling
This chapter focuses on the max-weight problem that arises when scheduling for stability or maxi-
mum throughput-utility in a wireless network with interference. Previous chapters showed the key
step is maximizing the expectation of a weighted sum of link transmission rates, or coming within
an additive constant C of the maximum. Specifically, consider a (possibly multi-hop) network with
L links, and let b(t) = (b1(t), . . . , bL(t)) be the vector of transmission rates offered over the links
l ∈ {1, . . . , L} on slot t. The goal is to make (possibly randomized) decisions for b(t) to come within an additive
constant C of maximizing the following expectation:
    Σ_{l=1}^{L} Wl(t) E{bl(t) | W(t)}    (6.1)
where the expectation is with respect to the possibly random decision, and where W (t) =
(W1 (t), . . . , WL (t)) is a vector of weights for slot t. The weights are related to queue backlogs
for single-hop problems and differential backlogs for multi-hop problems. Algorithms that accom-
plish this for a given constant C ≥ 0 every slot are called C-additive approximations. For problems of
network stability, previous chapters showed that C-additive approximations can be used to stabilize
the network whenever arrival rates are inside the network capacity region, with average backlog and
delay bounds that grow linearly with C. For problems of maximum throughput-utility, Chapter 5
showed that C-additive approximations can be used with a simple flow control rule to give utility that
is within (B + C)/V of optimality (where B is a fixed constant and V is any non-negative parame-
ter chosen as desired), with average backlog that grows linearly in both V and C. Thus, C-additive
approximations can be used to push network utility arbitrarily close to optimal, as determined by
the parameter V .
Such max-weight problems can be very complex for wireless networks with interference. This
is because a transmission on one link can affect transmissions on many other links. Thus, transmission
decisions are coupled throughout the network. In this chapter, we first consider a class of interference
networks without time varying channels and develop two C-additive approximation algorithms for
this context. The first is a simple algorithm based on trading off computation complexity and delay.
The second is a more elegant randomized transmission technique that admits a simple distributed
implementation. We then present a multiplicative approximation theorem that holds for general
networks with possibly time-varying channels. It guarantees constant factor throughput results for
algorithms that schedule transmissions within a multiplicative constant of the max-weight solution
every slot.
The amount of computation required to find an optimal vector b^opt(t) depends on the structure of
the set B. If this set is defined by all links that satisfy matching constraints, so that no two active links
share a node, then b^opt(t) can be found in polynomial time (via a centralized algorithm). However,
the problem may be NP-hard for general sets B, so that no polynomial time solution is available.
Let C be a given non-negative constant. A C-additive approximation to the max-weight
problem finds a vector b(t) every slot t that satisfies:

    Σ_{l=1}^{L} Wl(t) E{bl(t) | W(t)} ≥ max_{b∈B} Σ_{l=1}^{L} Wl(t)bl − C
Figure 6.1: An illustration of the frame structure for the algorithm of Section 6.1.1 (each frame
implements the max-weight vector b^opt computed from the weights at the start of the previous frame).
Now assume the maximum change in queue backlog over one slot is deterministically bounded,
as is the maximum change in each link weight. Specifically, assume that no link weight can change by
more than θ on one slot, where θ is some positive constant. It follows that for any two slots t1 < t2:

    |Wl(t2) − Wl(t1)| ≤ θ(t2 − t1)  ∀l ∈ {1, . . . , L}
Under this assumption, we now compute a value C such that the above algorithm is a C-
additive approximation for all slots t ≥ T . Fix any slot t ≥ T . Let r represent the frame containing
this slot. Note that |t − tr−1 | ≤ 2T − 1. We have:
    Σ_{l=1}^{L} Wl(t)bl(t) = Σ_{l=1}^{L} Wl(t) bl^opt(t_{r−1})
        = Σ_{l=1}^{L} Wl(t_{r−1}) bl^opt(t_{r−1}) + Σ_{l=1}^{L} (Wl(t) − Wl(t_{r−1})) bl^opt(t_{r−1})
        ≥ Σ_{l=1}^{L} Wl(t_{r−1}) bl^opt(t_{r−1}) − Σ_{l=1}^{L} θ |t − t_{r−1}| bl^opt(t_{r−1})
        ≥ Σ_{l=1}^{L} Wl(t_{r−1}) bl^opt(t_{r−1}) − Lθ(2T − 1)    (6.2)
Further, because b^opt(t_{r−1}) solves the max-weight problem for weights W(t_{r−1}), we have:

    Σ_{l=1}^{L} Wl(t_{r−1}) bl^opt(t_{r−1}) = max_{b∈B} Σ_{l=1}^{L} Wl(t_{r−1})bl
        ≥ Σ_{l=1}^{L} Wl(t_{r−1}) bl^opt(t)
        = Σ_{l=1}^{L} Wl(t) bl^opt(t) − Σ_{l=1}^{L} [Wl(t) − Wl(t_{r−1})] bl^opt(t)
        ≥ Σ_{l=1}^{L} Wl(t) bl^opt(t) − Lθ(2T − 1)
        = max_{b∈B} Σ_{l=1}^{L} Wl(t)bl − Lθ(2T − 1)    (6.3)

Combining (6.2) and (6.3) shows that the stale decisions b(t) = b^opt(t_{r−1}) form a C-additive
approximation with C = 2Lθ(2T − 1).
Consider now the randomized algorithm that, every slot, selects b(t) ∈ B according to the distribution:

    p*(b) ≜ Pr[b(t) = b] = exp(Σ_{l=1}^{L} Wl(t)bl) / A    (6.4)

where A is a normalizing constant that makes the distribution sum to 1.
The work (172) motivates this algorithm by the modified problem that computes a probability
distribution p(b) over the set B to solve the following:

    Maximize: −Σ_{b∈B} p(b) log(p(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} Wl(t)bl    (6.5)
    Subject to: 0 ≤ p(b) ∀b ∈ B,  Σ_{b∈B} p(b) = 1    (6.6)

where log(·) denotes the natural logarithm. This problem is equivalent to maximizing H(p(·)) +
Σ_{l=1}^{L} Wl(t) E{bl(t) | W(t)}, where H(p(·)) is the entropy (in nats) associated with the probability
distribution p(b), and E{bl(t) | W(t)} is the expected transmission rate over link l given that b(t)
is selected according to the probability distribution p(b). However, note that because the set B
contains at most 2^L link activation sets, and the entropy of any probability distribution that contains
at most k probabilities is at most log(k), we have for any probability distribution p(b):

    0 ≤ −Σ_{b∈B} p(b) log(p(b)) ≤ L log(2)
It follows that if we can find a probability distribution p(b) that solves the problem (6.5)-(6.6), then this
produces a C-additive approximation to the max-weight problem (6.1), with C = L log(2). Hence,
such an algorithm can yield full throughput optimality, and can come arbitrarily close to utility
optimality, with an average backlog and delay expression that is polynomial in the network size.
Remarkably, the next theorem, developed in (172), shows that the probability distribution (6.4) is
the desired distribution, in that it exactly solves the problem (6.5)-(6.6). Thus, the max link-weight-
plus-entropy algorithm is a C-additive approximation for the max-weight problem.
Theorem 6.1 (Jiang-Walrand Theorem (172)) The probability distribution p∗ (b) that solves (6.5)
and (6.6) is given by (6.4).
Proof. The proof follows directly from the analysis techniques used in (172), although we organize
the proof differently below. We first compute the value of the maximization objective under the
particular distribution p*(b) given in (6.4). We have:

    −Σ_{b∈B} p*(b) log(p*(b)) + Σ_{b∈B} p*(b) Σ_{l=1}^{L} Wl(t)bl
        = Σ_{b∈B} p*(b) log(A) − Σ_{b∈B} p*(b) Σ_{l=1}^{L} Wl(t)bl + Σ_{b∈B} p*(b) Σ_{l=1}^{L} Wl(t)bl
        = log(A)
6.1. TIME-INVARIANT INTERFERENCE NETWORKS 143
where we have used the fact that p*(b) is a probability distribution and hence sums to 1. We now
show that the expression in the objective of (6.5) for any other distribution p(b) is no larger than
log(A), so that p ∗ (b) is optimal for this objective. To this end, consider any other distribution p(b).
We have:
    −Σ_{b∈B} p(b) log(p(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} Wl(t)bl
        = −Σ_{b∈B} p(b) log(p*(b) · p(b)/p*(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} Wl(t)bl
        = −Σ_{b∈B} p(b) log(p(b)/p*(b)) − Σ_{b∈B} p(b) log(p*(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} Wl(t)bl
        ≤ −Σ_{b∈B} p(b) log(p*(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} Wl(t)bl    (6.7)
        = −Σ_{b∈B} p(b) log(1/A) − Σ_{b∈B} p(b) Σ_{l=1}^{L} Wl(t)bl + Σ_{b∈B} p(b) Σ_{l=1}^{L} Wl(t)bl
        = log(A)
where in (6.7), we have used the well known Kullback-Leibler divergence result, which states that
the divergence between any two distributions p(b) and p*(b) is non-negative (174):

    dKL(p||p*) ≜ Σ_{b∈B} p(b) log(p(b)/p*(b)) ≥ 0

Thus, the maximum value of the objective function (6.5) is log(A), which is achieved by the distri-
bution p*(b), proving the result. □
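Theorem 6.1 is easy to check numerically on a toy conflict set. The sketch below (the two-link activation set B and the weights are illustrative, not from the text) builds p*(b) from (6.4), confirms its objective value equals log(A) as in the proof, and checks that randomly drawn distributions never do better:

```python
import math, random

# Valid activation vectors for two mutually conflicting links (a
# hypothetical interference set B): at most one link ON at a time.
B = [(0, 0), (1, 0), (0, 1)]
W = [1.3, 0.4]  # illustrative link weights W_l(t)

def objective(p):
    """Entropy (in nats) plus expected weighted rate, the objective of (6.5)."""
    ent = -sum(q * math.log(q) for q in p if q > 0)
    rate = sum(q * sum(w * b for w, b in zip(W, vec)) for q, vec in zip(p, B))
    return ent + rate

# Closed-form distribution (6.4): p*(b) proportional to exp(sum_l W_l b_l).
unnorm = [math.exp(sum(w * b for w, b in zip(W, vec))) for vec in B]
A = sum(unnorm)
p_star = [u / A for u in unnorm]

# The proof shows the optimal objective value is exactly log(A).
assert abs(objective(p_star) - math.log(A)) < 1e-9

# No randomly drawn distribution over B does better (KL divergence >= 0).
random.seed(0)
for _ in range(1000):
    q = [random.random() for _ in B]
    s = sum(q)
    q = [x / s for x in q]
    assert objective(q) <= objective(p_star) + 1e-9
```

Any competing distribution q falls short of log(A) by exactly dKL(q||p*), mirroring the chain of equalities in the proof.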
Assume now the set B of all valid link activation vectors has a connectedness property, so that it
is possible to get from any b1 ∈ B to any other b2 ∈ B by a sequence of adding or removing single
links, where each step of the sequence produces another valid activation vector in B (this holds in
the reasonable case when removing any activated link from an activation vector in B yields another
activation vector in B ). In this case, the distribution (6.4) is particularly interesting because it is the
exact stationary distribution associated with a continuous time ergodic Markov chain with state b(v)
(where v is a continuous time variable that is not related to the discrete time index t for the current
144 6. APPROXIMATE SCHEDULING
slot). Transitions for this Markov chain take place by having each link l such that bl (v) = 1 de-
activate at times according to an independent exponential distribution with rate μ = 1, and having
each link l such that bl (v) = 0 independently activate according to an exponential distribution with
rate λl = exp(Wl (t)), provided that turning this link ON does not violate the link constraints B .
That the resulting steady state is given by (6.4) can be shown by state space truncation arguments as
in (129)(131). This has the form of a simple distributed algorithm where links independently turn
ON or OFF, with Carrier Sense Multiple Access (CSMA) telling us if it is possible to turn a new
link ON (see also (175)(172)(173)(176)(177) for details on this).
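The continuous time dynamic just described can be simulated directly. The sketch below (a minimal two-link conflict graph with illustrative weights and run length, not an example from the text) estimates the fraction of time spent in each activation vector and compares it against the stationary distribution (6.4):

```python
import math, random

# Simulation of the CSMA-like continuous time Markov chain for two
# mutually interfering links (conflict graph, weights, and run length
# are illustrative assumptions).
random.seed(1)
W = [1.0, 0.5]               # weights; link l activates at rate exp(W[l])
conflict = {0: {1}, 1: {0}}  # each link blocks the other
state = [0, 0]
time_in = {}                 # activation vector -> total occupancy time
t, T_end = 0.0, 50000.0

while t < T_end:
    # Competing exponential clocks: ON links de-activate at rate 1; an
    # OFF link activates at rate exp(W[l]) if all its neighbors are OFF.
    rates = []
    for l in range(len(W)):
        if state[l] == 1:
            rates.append((l, 1.0))
        elif all(state[j] == 0 for j in conflict[l]):
            rates.append((l, math.exp(W[l])))
    total = sum(r for _, r in rates)
    dwell = random.expovariate(total)
    time_in[tuple(state)] = time_in.get(tuple(state), 0.0) + dwell
    t += dwell
    x = random.uniform(0.0, total)   # pick the clock that fired
    for l, r in rates:
        x -= r
        if x <= 0:
            state[l] ^= 1
            break

# Empirical occupancy should approach (6.4): p(b) proportional to exp(W.b).
A = 1.0 + math.exp(W[0]) + math.exp(W[1])
for vec, target in {(0, 0): 1.0, (1, 0): math.exp(W[0]),
                    (0, 1): math.exp(W[1])}.items():
    assert abs(time_in.get(vec, 0.0) / t - target / A) < 0.02
```

On this tiny conflict graph the chain mixes quickly; as the text notes next, convergence can be far slower on general networks.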
Unfortunately, we need to run such an algorithm in continuous time for a long enough time
to reach a near steady state, and this all needs to be done within one slot to implement the result. Of
course, we can use a T -slot argument as in Section 6.1.1 to allow more time to reach the steady state,
with the understanding that the queue backlog changes by an amount O(T ) that yields an additional
additive term in our C-additive approximation (see (176) for an argument in this direction using
stochastic approximation theory). However, for general networks, the convergence of the Markov
chain to near-steady-state takes a non-polynomial amount of time (else, we could solve NP-hard
problems with efficient randomized algorithms). This is because the Markov chain can get “trapped”
for long durations of time in certain sub-optimal link activations (this is compensated for in the steady
state distribution by getting “trapped” in a max-weight link activation for an even longer duration
of time). Even computing the normalizing constant A for the distribution in (6.4) is known to be a
"#P-complete" problem (178) (see also factor graph approximations in (179)). However, it is known
that for link activation sets with certain degree-2 properties, such as those formed by networks
on rings, similar Markov chains require only a small (polynomial) time to reach near steady state
(180)(181). This may explain why the simulations in (172) for networks with small degree provide
good performance.
While C-additive approximations can push throughput and throughput-utility arbitrarily close to
optimal, they may have large convergence times and delays as discussed in the previous section.
It is often possible to provide low complexity decisions for b(t) that come within a multiplicative
factor of the max-weight solution. This section shows that such algorithms immediately lead to
constant-factor stability and throughput-utility guarantees. The result holds for general networks,
possibly with time-varying channels, and possibly with non-binary rate vectors.
Let S(t) describe the channel randomness on slot t (i.e., the topology state), and let I(t)
be the transmission action on slot t, chosen within an abstract set I_S(t). The rate vector b(t) =
(b1(t), . . . , bL(t)) is determined by a general function of I(t) and S(t):

    bl(t) = b̂l(I(t), S(t))  ∀l ∈ {1, . . . , L}
where the expectation in the left-hand-side is with respect to the distribution of S(t) and the possibly
randomized decision for I*(t) that is made in reaction to the observed S(t). For simplicity, assume
the set Λ is closed. Recall that for any rate vector (λ1, . . . , λL) in the capacity region Λ, there exists
an S-only policy I*(t) that satisfies:

    E{b̂l(I*(t), S(t))} ≥ λl  ∀l ∈ {1, . . . , L}

We say that a vector (λ1, . . . , λL) is interior to the scaled capacity region βΛ if there is an ε > 0
such that:

    (λ1 + ε, . . . , λL + ε) ∈ βΛ
146 6. APPROXIMATE SCHEDULING
Assume second moments of the arrival and service rate processes are bounded. Define
L(Q(t)) ≜ (1/2) Σ_{l=1}^{L} Ql(t)², and recall that the Lyapunov drift satisfies (see (3.16)):

    Δ(Q(t)) ≤ B + Σ_{l=1}^{L} Ql(t)λl − Σ_{l=1}^{L} Ql(t) E{b̂l(I(t), S(t)) | Q(t)}    (6.9)
Theorem 6.3 Consider the above 1-hop network with ω(t) i.i.d. over slots and with arrival rates
(λ1, . . . , λL). Fix β such that 0 < β ≤ 1. Suppose there is an ε > 0 such that:

    (λ1 + ε, . . . , λL + ε) ∈ βΛ    (6.10)

If a (β, C)-approximation is used for all slots t (where C ≥ 0 is a given constant), and if E{L(Q(0))} <
∞, then the network is mean rate stable and strongly stable, with average queue backlog bound:

    lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{l=1}^{L} E{Ql(τ)} ≤ (B + C)/ε
Proof. Fix slot t. Because our decision I(t) yields a (β, C)-approximation for minimizing the final term in the right-hand-side of (6.9), we have:

Δ(Q(t)) ≤ B + C + Σ_{l=1}^{L} Q_l(t)λ_l − β Σ_{l=1}^{L} Q_l(t)E{b̂_l(I*(t), S(t))|Q(t)}   (6.11)

where I*(t) is any other (possibly randomized) decision in the set I_S(t). Because (6.10) holds, we know that:

(λ_1/β + ε/β, . . . , λ_L/β + ε/β) ∈ Λ

Thus, there exists an S-only policy I*(t) that satisfies:

E{b̂_l(I*(t), S(t))|Q(t)} = E{b̂_l(I*(t), S(t))} ≥ λ_l/β + ε/β   ∀l ∈ {1, . . . , L}

where the first equality above holds because I*(t) is S-only and hence independent of the queue backlogs Q(t). Plugging this policy into the right-hand-side of (6.11) yields:

Δ(Q(t)) ≤ B + C + Σ_{l=1}^{L} Q_l(t)λ_l − β Σ_{l=1}^{L} Q_l(t)(λ_l/β + ε/β)   (6.12)
        = B + C − ε Σ_{l=1}^{L} Q_l(t)   (6.13)

The result then follows by the Lyapunov drift theorem (Theorem 4.1). □
The above theorem can be intuitively interpreted as follows: Any (perhaps approximate) effort
to schedule transmissions to maximize the weighted sum of transmission rates translates into good
network performance. More concretely, simple greedy algorithms with β = 1/2 and C = 0 (i.e.
(1/2, 0)-approximation algorithms) exist for networks with matching constraints (where links can be
simultaneously scheduled if they do not share a common node). Indeed, it can be shown that the
greedy maximal match algorithm that first selects the largest weight link (breaking ties arbitrarily),
then selects the next largest weight link that does not conflict with the previous one, and so on, yields
a (1/2, 0)-approximation, so that it comes within a factor β = 1/2 of the max-weight decision (see, for example, (137)). Distributed random access versions of this algorithm that produce (β, C)-approximations are considered in (154).
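As a concrete sketch (not from the text), the greedy maximal match rule just described can be implemented as follows; the link list and queue-backlog weights are hypothetical:

```python
def greedy_maximal_match(links, weights):
    """Greedy maximal matching: repeatedly pick the largest-weight
    remaining link whose endpoints are not yet matched. The resulting
    weighted match is within a factor 1/2 of the max-weight match."""
    matched_nodes = set()
    match = []
    # Consider links in order of decreasing weight (ties broken arbitrarily).
    for (u, v), w in sorted(zip(links, weights), key=lambda x: -x[1]):
        if w > 0 and u not in matched_nodes and v not in matched_nodes:
            match.append((u, v))
            matched_nodes.update([u, v])
    return match

# Hypothetical 4-node ring network; weights play the role of queue
# backlogs Q_l(t) in the max-weight rule.
links = [(0, 1), (1, 2), (2, 3), (3, 0)]
weights = [5, 7, 6, 1]
print(greedy_maximal_match(links, weights))  # [(1, 2), (3, 0)]
```

Here the greedy match has weight 8, while the max-weight match ((0, 1) and (2, 3)) has weight 11, illustrating the factor-1/2 guarantee.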
Different forms of approximate scheduling, not based on approximating the queue-based
max-weight rule, are treated using maximal matchings for stable switch scheduling in (183)(102),
for stable wireless networks in (184)(104)(103), for utility optimization in (185), and for energy
optimization in (186).
CHAPTER 7
Optimization of Renewal
Systems
Here we extend the drift-plus-penalty framework to allow optimization over renewal systems. In
previous chapters, we considered a slotted structure and assumed that on every slot t a single random event ω(t) is observed, a single action α(t) is taken, and the combination of α(t) and ω(t) generates
a vector of attributes (i.e., either penalties or rewards) for that slot. Here, we change the slot structure
to a renewal frame structure. The frame durations are variable and can depend on the decisions made
over the course of the frame. Rather than specifying a single action to take on each frame r, we must
specify a dynamic policy π [r] for the frame. A policy is a contingency plan for making a sequence
of decisions, where new random events might take place after each decision in the sequence. This
model allows a larger class of problems to be treated, including Markov Decision Problems, described
in more detail in Section 7.6.2.
An example renewal system is a wireless sensor network that is repeatedly used to perform
sensing tasks. Assume that each new task starts immediately when the previous task is completed.
The duration of each task and the network resources used depend on the policy implemented for
that task. Examples of this type are given in Section 7.4 and Exercise 7.1.
Consider a dynamic system over the continuous timeline t ≥ 0 (where t can be a real number).
We decompose the timeline into successive renewal frames. Renewal frames occur one after the other,
and the start of each renewal frame is a time when the system state is “refreshed,” which will be
made precise below. Define t[0] = 0, and let {t[0], t[1], t[2], . . .} be a strictly increasing sequence
that represents renewal events. For each r ∈ {0, 1, 2, . . .}, the interval of time [t[r], t[r + 1]) is the
rth renewal frame. Denote T[r] ≜ t[r + 1] − t[r] as the duration of the rth renewal frame (see Fig. 7.1).
At the start of each renewal frame r ∈ {0, 1, 2, . . .}, the controller chooses a policy π[r] from some abstract policy space P. This policy is implemented over the course of the frame. There may be a sequence of random events during each frame r, and the policy π[r] specifies decisions that are made in reaction to these events. The size of the frame T[r] is random and may depend on the policy. Further, the policy on frame r generates a random vector of penalties y[r] = (y_0[r], y_1[r], . . . , y_L[r]). We formally write the renewal size T[r] and the penalties y_l[r] as random functions of π[r]:

T[r] = T̂(π[r]) ,  y_l[r] = ŷ_l(π[r])  ∀l ∈ {0, 1, . . . , L}
Thus, given π [r], T̂ (π[r]) and ŷl (π [r]) are random variables. We make the following renewal
assumptions:
• For any policy π ∈ P , the conditional distribution of (T [r], y [r]), given π[r] = π, is inde-
pendent of the events and outcomes from past frames, and is identically distributed for each
frame that uses the same policy π .
• The frame sizes T[r] are always strictly positive, and there are finite constants Tmin, Tmax, y_{0,min}, y_{0,max} such that for all policies π ∈ P, we have:

0 < Tmin ≤ E{T̂(π[r])|π[r] = π} ≤ Tmax ,  y_{0,min} ≤ E{ŷ_0(π[r])|π[r] = π} ≤ y_{0,max}   (7.1)

• There are finite constants D² and y²_{l,max} for l ∈ {1, . . . , L} such that for all π ∈ P:

E{T̂(π[r])²|π[r] = π} ≤ D² ,  E{ŷ_l(π[r])²|π[r] = π} ≤ y²_{l,max}  ∀l ∈ {1, . . . , L}   (7.2)

That is, second moments are uniformly bounded, regardless of the policy.
In the special case when the system evolves in discrete time with unit time slots, all frame
sizes T [r] are positive integers, and Tmin = 1.
Assume temporarily that, under the algorithm implemented, the following frame averages converge to constants T̄ and ȳ_l with probability 1:

lim_{R→∞} (1/R)Σ_{r=0}^{R−1} T[r] = T̄ (w.p.1) ,  lim_{R→∞} (1/R)Σ_{r=0}^{R−1} y_l[r] = ȳ_l (w.p.1)   (7.3)
We want to design an algorithm that chooses policies π [r] over each frame r ∈ {0, 1, 2, . . .} to solve
the following problem:
Minimize: ȳ_0/T̄   (7.4)
Subject to: ȳ_l/T̄ ≤ c_l  ∀l ∈ {1, . . . , L}   (7.5)
π[r] ∈ P  ∀r ∈ {0, 1, 2, . . .}   (7.6)
where (c1 , . . . , cL ) are a given collection of real numbers that define time average cost constraints for
each penalty.
The value ȳ_l/T̄ represents the time average penalty associated with the y_l[r] process. To understand this, note that the time average penalty, sampled at renewal times, is given by:

lim_{R→∞} [Σ_{r=0}^{R−1} y_l[r]] / [Σ_{r=0}^{R−1} T[r]] = [lim_{R→∞} (1/R)Σ_{r=0}^{R−1} y_l[r]] / [lim_{R→∞} (1/R)Σ_{r=0}^{R−1} T[r]] = ȳ_l/T̄
Hence, our goal is to minimize the time average associated with the y0 [r] penalty, subject to the
constraint that the time average associated with the yl [r] process is less than or equal to cl , for all
l ∈ {1, . . . , L}.
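The convergence of this ratio of frame averages can be illustrated with a small simulation; the distributions chosen for T[r] and y_l[r] below are hypothetical:

```python
import random

random.seed(0)

# Hypothetical renewal system: each frame independently has a random
# duration T[r] with E{T} = 2 and an accumulated penalty y[r] with
# E{y} = 1, so the time average penalty is ybar/Tbar = 0.5.
R = 200000
total_T = total_y = 0.0
for _ in range(R):
    T = random.choice([1, 3])        # frame length, E{T} = 2
    y = random.uniform(0.0, 2.0)     # frame penalty, E{y} = 1
    total_T += T
    total_y += y

# Ratio of frame averages converges to ybar/Tbar = 0.5.
print(total_y / total_T)
```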
As before, we shall find it easier to work with time average expectations of the form:

T̄[R] ≜ (1/R)Σ_{r=0}^{R−1} E{T[r]} ,  ȳ_l[R] ≜ (1/R)Σ_{r=0}^{R−1} E{y_l[r]}  ∀l ∈ {0, 1, . . . , L}   (7.7)

Under mild boundedness assumptions on T[r] and y_l[r] (for example, when these are deterministically bounded), the Lebesgue dominated convergence theorem ensures that T̄[R] and ȳ_l[R] converge to T̄ and ȳ_l whenever (7.3) holds (see Exercise 7.9).
Lemma 7.1 If there is an i.i.d. algorithm that satisfies the feasibility constraints (7.8), then for any δ > 0 there is an i.i.d. algorithm π*[r] that satisfies:

E{ŷ_0(π*[r])} ≤ E{T̂(π*[r])}(ratio_opt + δ)   (7.9)
E{ŷ_l(π*[r])} ≤ E{T̂(π*[r])}c_l  ∀l ∈ {1, . . . , L}   (7.10)
The value ratio_opt is defined in terms of i.i.d. algorithms. It can be shown that, under mild assumptions, the value ratio_opt is also the infimum of the objective function in the problem (7.4)-(7.6), which does not restrict to i.i.d. algorithms. This is similar in spirit to Theorems 4.18 and 4.5. However, rather than stating these assumptions and proving this result, we simply use ratio_opt as our target, so that we desire to push the time average penalty objective as close as possible to the smallest value that can be achieved over i.i.d. algorithms.
It is often useful to additionally assume that the following “Slater” assumption holds:
Slater Assumption for Renewal Systems: There is a value ε > 0 and an i.i.d. algorithm π*[r] such that:

E{ŷ_l(π*[r])} ≤ E{T̂(π*[r])}(c_l − ε)  ∀l ∈ {1, . . . , L}   (7.11)
To enforce the constraints (7.5), for each l ∈ {1, . . . , L} define a virtual queue Z_l[r] with Z_l[0] = 0 and update equation:

Z_l[r + 1] = max[Z_l[r] + y_l[r] − c_l T[r], 0]   (7.12)

Let Z[r] be the vector of queue values, and define the Lyapunov function L(Z[r]) by:

L(Z[r]) ≜ (1/2)Σ_{l=1}^{L} Z_l[r]²   (7.13)

Define the frame-based Lyapunov drift Δ(Z[r]) ≜ E{L(Z[r + 1]) − L(Z[r])|Z[r]}. Then:

Δ(Z[r]) ≤ B + Σ_{l=1}^{L} Z_l[r]E{ŷ_l(π[r]) − c_l T̂(π[r])|Z[r]}   (7.14)

where B is a finite constant that satisfies the following for all r and all possible Z[r]:

B ≥ (1/2)Σ_{l=1}^{L} E{(y_l[r] − c_l T[r])²|Z[r]}   (7.15)
Such a finite constant B exists by the boundedness assumptions (7.1)-(7.2). The drift-plus-penalty for frame r thus satisfies:

Δ(Z[r]) + V E{y_0[r]|Z[r]} ≤ B + V E{ŷ_0(π[r])|Z[r]} + Σ_{l=1}^{L} Z_l[r]E{ŷ_l(π[r])|Z[r]} − Σ_{l=1}^{L} Z_l[r]c_l E{T̂(π[r])|Z[r]}   (7.16)
This variable-frame drift methodology was developed in (56)(57) for optimizing delay in networks
defined on Markov chains. However, the analysis in (56)(57) used a policy based on minimizing the
right-hand-side of the above inequality, which was only shown to be effective for pure feasibility
problems (where ŷ0 (π [r]) = 0 for all r) or for problems where the frame durations are independent of
the policy (see also Exercise 7.3). Our algorithm below, which can be applied to the general problem,
is inspired by the decision rule in (58), which minimizes the ratio of expected drift-plus-penalty
over expected frame size.
Renewal-Based Drift-Plus-Penalty Algorithm: At the beginning of each frame r ∈ {0, 1, 2, . . .}, observe Z[r] and do the following:
• Choose a policy π[r] ∈ P that minimizes the following ratio:

E{V ŷ_0(π[r]) + Σ_{l=1}^{L} Z_l[r]ŷ_l(π[r])|Z[r]} / E{T̂(π[r])|Z[r]}   (7.17)

• At the end of frame r, update each virtual queue by Z_l[r + 1] = max[Z_l[r] + y_l[r] − c_l T[r], 0].
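The frame-by-frame structure of the algorithm can be sketched as follows. The ratio minimization in (7.17) is abstracted as a caller-supplied oracle `choose_policy` (it could be implemented with the bisection routine of Section 7.3); all function names and interfaces here are assumptions for illustration:

```python
def renewal_drift_plus_penalty(choose_policy, run_frame, c, V, num_frames):
    """Skeleton of the renewal-based drift-plus-penalty algorithm.

    choose_policy(Z, V): oracle returning a policy that (approximately)
        minimizes the ratio of expectations in (7.17).
    run_frame(policy): runs one frame under the policy, returning (T, y)
        where y[0] is the penalty and y[1..L] the constrained penalties.
    c: constraint levels c_1, ..., c_L for the time averages y_l/T.
    """
    L = len(c)
    Z = [0.0] * L  # virtual queues enforcing y_l/T <= c_l
    for r in range(num_frames):
        policy = choose_policy(Z, V)
        T, y = run_frame(policy)
        # Virtual queue update: Z_l[r+1] = max(Z_l[r] + y_l - c_l*T, 0).
        for l in range(L):
            Z[l] = max(Z[l] + y[l + 1] - c[l] * T, 0.0)
    return Z
```

Stable (non-growing) virtual queues indicate the time average constraints are being met.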
Definition 7.2 A policy π[r] is a C-additive approximation of the policy that minimizes (7.17) if:

E{V ŷ_0(π[r]) + Σ_{l=1}^{L} Z_l[r]ŷ_l(π[r])|Z[r]} / E{T̂(π[r])|Z[r]} ≤ C + inf_{π∈P} [ E{V ŷ_0(π) + Σ_{l=1}^{L} Z_l[r]ŷ_l(π)|Z[r]} / E{T̂(π)|Z[r]} ]
where π*[r] is any i.i.d. algorithm that is chosen in P and is independent of the queues Z[r]. Then:
a) All queues Z_l[r] are mean rate stable:

lim_{R→∞} E{Z_l[R]}/R = 0  ∀l ∈ {1, . . . , L}

b) For all l ∈ {1, . . . , L} we have:

lim sup_{R→∞} [ȳ_l[R] − c_l T̄[R]] ≤ 0 ,  lim sup_{R→∞} ȳ_l[R]/T̄[R] ≤ c_l

c) For all R > 0, the time average expected penalty satisfies:

ȳ_0[R] − ratio_opt T̄[R] ≤ (B + CTmax)/V

where B is defined in (7.15).
d) If the Slater assumption (7.11) holds for a constant ε > 0, then all queues Z_l[r] are strongly stable and satisfy the following for all R > 0:

(1/R)Σ_{r=0}^{R−1} Σ_{l=1}^{L} E{Z_l[r]} ≤ VF/(εTmin)   (7.19)
where the constant F is defined below in (7.22). Further, if for all l ∈ {1, . . . , L}, y_l[r] − c_l T[r] is either deterministically lower bounded or deterministically upper bounded, then queues Z_l[r] are rate stable and:

lim sup_{R→∞} [(1/R)Σ_{r=0}^{R−1} y_l[r]] / [(1/R)Σ_{r=0}^{R−1} T[r]] ≤ c_l  ∀l ∈ {1, . . . , L} (w.p.1)
Proof. (Theorem 7.3) Because we use a C-additive approximation every frame r, we know that (7.18) holds. Plugging the i.i.d. algorithm π*[r] from (7.18) into the right-hand-side of the drift-plus-penalty inequality (7.16) yields:

Δ(Z[r]) + V E{y_0[r]|Z[r]} ≤ B + CTmax + V E{ŷ_0(π*[r])}[E{T̂(π[r])|Z[r]}/E{T̂(π*[r])}] + Σ_{l=1}^{L} Z_l[r]E{ŷ_l(π*[r])}[E{T̂(π[r])|Z[r]}/E{T̂(π*[r])}] − Σ_{l=1}^{L} Z_l[r]c_l E{T̂(π[r])|Z[r]}   (7.20)

where π*[r] is any policy in P. Now fix δ > 0, and plug into the right-hand-side of (7.20) the policy π*[r] that satisfies (7.9)-(7.10), which makes decisions independent of Z[r], to yield:

Δ(Z[r]) + V E{y_0[r]|Z[r]} ≤ B + CTmax + E{T̂(π[r])|Z[r]}V(ratio_opt + δ)

Because δ > 0 is arbitrary, the inequality also holds with δ = 0:

Δ(Z[r]) + V E{y_0[r]|Z[r]} ≤ B + CTmax + E{T̂(π[r])|Z[r]}V ratio_opt ≤ B + CTmax + V max[ratio_opt Tmax, ratio_opt Tmin]   (7.21)

where we use max[ratio_opt Tmax, ratio_opt Tmin] because ratio_opt may be negative. This proves that all components Z_l[r] are mean rate stable by Theorem 4.1, proving part (a). The first lim sup statement in part (b) follows immediately from mean rate stability of Z_l[r] (via Theorem 2.5(b)). The second lim sup statement in part (b) follows from the first (see Exercise 7.4).
To prove part (c), we take expectations of (7.21) to find:

E{L(Z[r + 1])} − E{L(Z[r])} + V E{y_0[r]} ≤ B + CTmax + E{T̂(π[r])}V ratio_opt

Summing the above over r ∈ {0, 1, . . . , R − 1}, dividing by V R, using the definitions of ȳ_0[R] and T̄[R] in (7.7), and noting that E{L(Z[R])} ≥ 0 and E{L(Z[0])} = 0 yields:

ȳ_0[R] ≤ (B + CTmax)/V + ratio_opt T̄[R]

This proves part (c).
Part (d) follows from plugging the policy π*[r] from (7.11) into (7.20) to obtain:

Δ(Z[r]) + V E{y_0[r]|Z[r]} ≤ B + CTmax + V y_{0,max}[E{T̂(π[r])|Z[r]}/E{T̂(π*[r])}] − εTmin Σ_{l=1}^{L} Z_l[r]

Thus, from Theorem 4.1, we have that (7.19) holds, so that all queues Z_l[r] are strongly stable. In the special case when the y_l[r] − c_l T[r] are deterministically bounded, we have by the Strong Stability Theorem (Theorem 2.8) that all queues are rate stable. Thus, by Theorem 2.5(a):

lim sup_{R→∞} [(1/R)Σ_{r=0}^{R−1} y_l[r] − c_l(1/R)Σ_{r=0}^{R−1} T[r]] ≤ 0 (w.p.1)
However:

[(1/R)Σ_{r=0}^{R−1} y_l[r]] / [(1/R)Σ_{r=0}^{R−1} T[r]] − c_l ≤ max[ [(1/R)Σ_{r=0}^{R−1} y_l[r]] / [(1/R)Σ_{r=0}^{R−1} T[r]] − c_l , 0 ]
= max[ (1/R)Σ_{r=0}^{R−1} (y_l[r] − c_l T[r]) , 0 ] × 1/[(1/R)Σ_{r=0}^{R−1} T[r]]   (7.23)
Further, because for all r ∈ {1, 2, . . .} we have E{T[r]|T[0], T[1], . . . , T[r − 1]} ≥ Tmin and E{T[r]²|T[0], T[1], . . . , T[r − 1]} ≤ D², from Lemma 4.3 it follows that:

lim inf_{R→∞} (1/R)Σ_{r=0}^{R−1} T[r] ≥ Tmin > 0 (w.p.1)
and so taking a lim sup of (7.23) yields:

lim sup_{R→∞} [ [(1/R)Σ_{r=0}^{R−1} y_l[r]] / [(1/R)Σ_{r=0}^{R−1} T[r]] − c_l ] ≤ 0 × (1/Tmin) = 0 (w.p.1)

□
Minimize: ȳ_0/T̄
Subject to: ȳ_l ≤ 0  ∀l ∈ {1, . . . , L}
π[r] ∈ P  ∀r ∈ {0, 1, 2, . . .}

This changes the constraints from ȳ_l/T̄ ≤ c_l to ȳ_l ≤ 0. However, this is just a special case of the original problem (7.4)-(7.6) with c_l = 0.
Now suppose we seek to minimize ȳ_0, rather than ȳ_0/T̄. The problem is:

Minimize: ȳ_0
Subject to: ȳ_l/T̄ ≤ c_l  ∀l ∈ {1, . . . , L}
π[r] ∈ P  ∀r ∈ {0, 1, 2, . . .}
This problem has a significantly different structure than (7.4)-(7.6), and it is considerably easier to
solve. Indeed, Exercise 7.3 shows that it can be solved by minimizing an expectation every frame,
rather than a ratio of expectations.
Finally, we note that Exercise 7.5 explores an alternative algorithm for the original problem
(7.4)-(7.6). The alternative uses only a minimum of an expectation every frame, rather than a ratio
of expectations.
with equality if and only if policy π achieves the infimum ratio E{a(π)}/E{b(π)} = θ*.
Multiplying both sides by E{b(π)} and noting that E{b(π)} > 0 yields (7.25). That equality holds if and only if E{a(π)}/E{b(π)} = θ* follows immediately. □
where the final equality uses the definition of θ* in (7.24). This proves (7.26).
To prove (7.27), suppose that θ > θ*. Then:

inf_{π∈P} E{a(π) − θb(π)} = inf_{π∈P} [E{a(π) − θ*b(π)} − (θ − θ*)E{b(π)}]
≤ inf_{π∈P} E{a(π) − θ*b(π)} − (θ − θ*)Tmin
= −(θ − θ*)Tmin < 0

where we have used (7.26). This proves (7.27). To prove (7.28), suppose θ < θ*. Then:

inf_{π∈P} E{a(π) − θb(π)} = inf_{π∈P} [E{a(π) − θ*b(π)} + E{(θ* − θ)b(π)}]
≥ inf_{π∈P} E{a(π) − θ*b(π)} + (θ* − θ)Tmin
= (θ* − θ)Tmin > 0
Define θ_bisect^(k) as:

θ_bisect^(k) = (θ_max^(k) + θ_min^(k))/2

We then compute inf_{π∈P} E{a(π) − θ_bisect^(k) b(π)}. If the result is 0, then θ_bisect^(k) = θ*. If the result is positive, then we know θ_bisect^(k) < θ*, and if the result is negative, we know θ_bisect^(k) > θ*. We then appropriately adjust our upper and lower bounds for stage k + 1. The uncertainty interval decreases by a factor of 2 on each stage, and so this algorithm converges exponentially fast to the value θ*. This is useful because each stage involves minimizing an expectation, rather than a ratio of expectations.
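A minimal sketch of this bisection, assuming a finite policy set whose expected values E{a(π)} and E{b(π)} can be evaluated exactly; the two-policy instance at the bottom is hypothetical:

```python
def bisect_ratio(policies, a, b, theta_min, theta_max, iters=50):
    """Bisection search for theta* = inf over policies of E{a}/E{b}.

    a[pi], b[pi]: expected values E{a(pi)}, E{b(pi)}, with b[pi] >= Tmin > 0.
    Uses the sign of inf_pi (a[pi] - theta*b[pi]) to halve the
    uncertainty interval on each stage.
    """
    for _ in range(iters):
        theta = (theta_min + theta_max) / 2.0
        val = min(a[p] - theta * b[p] for p in policies)
        if val > 0:       # theta < theta*: raise the lower bound
            theta_min = theta
        else:             # theta >= theta*: lower the upper bound
            theta_max = theta
    return (theta_min + theta_max) / 2.0

# Hypothetical policies with expected (a, b) values; the ratios are
# 3/2 = 1.5 and 5/4 = 1.25, so theta* = 1.25.
a = {"p1": 3.0, "p2": 5.0}
b = {"p1": 2.0, "p2": 4.0}
print(bisect_ratio(a.keys(), a, b, 0.0, 10.0))  # approximately 1.25
```

Each stage evaluates only the minimum of a linear expression over policies, never a ratio, which is exactly the advantage noted above.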
7.3.2 OPTIMIZATION OVER PURE POLICIES
Let P^pure be any finite or countably infinite set of policies that we call pure policies:

P^pure = {π_1, π_2, π_3, . . .}

Let P be the larger policy space that considers all probabilistic mixtures of pure policies. Specifically, the space P considers policies that make a randomized decision about which policy π_i ∈ P^pure to use, according to some probabilities q_i = Pr[Implement policy π_i] with Σ_{i=1}^{∞} q_i = 1. It turns out that minimizing the ratio E{a(π)}/E{b(π)} over π ∈ P can be achieved by considering only pure policies π ∈ P^pure. To see this, define θ* as the infimum ratio over π ∈ P, and for simplicity, assume that θ* is achieved by some particular policy π* ∈ P, which corresponds to a probability distribution (q_1*, q_2*, . . .) for selecting pure policies (π_1, π_2, . . .). Then:
0 = E{a(π*) − θ*b(π*)} = Σ_{i=1}^{∞} q_i* E{a(π_i) − θ*b(π_i)}
≥ Σ_{i=1}^{∞} q_i* inf_{π∈P^pure} E{a(π) − θ*b(π)}
= inf_{π∈P^pure} E{a(π) − θ*b(π)}

On the other hand, because P is a larger policy space than P^pure, we have:

0 = inf_{π∈P} E{a(π) − θ*b(π)} ≤ inf_{π∈P^pure} E{a(π) − θ*b(π)}

Thus:

inf_{π∈P^pure} E{a(π) − θ*b(π)} = 0
which shows that the infimum ratio θ ∗ can be found over the set of pure policies.
The same result holds more generally: Let P^pure be any (possibly uncountably infinite) set of policies that we call pure policies. Define Ω as the set of all vectors (E{a(π)}, E{b(π)}) that can be achieved by policies π ∈ P^pure. Suppose P is a larger policy space that contains all pure policies and is such that the set of all vectors (E{a(π)}, E{b(π)}) that can be achieved by policies π ∈ P is equal to the convex hull of Ω, denoted Conv(Ω).¹ If θ* is the infimum ratio of E{a(π)}/E{b(π)} over π ∈ P, then:

0 = inf_{π∈P} E{a(π) − θ*b(π)} = inf_{(a,b)∈Conv(Ω)} [a − θ*b]
= inf_{(a,b)∈Ω} [a − θ*b]
= inf_{π∈P^pure} E{a(π) − θ*b(π)}

¹The convex hull of a set Ω ⊆ R^k (for some integer k > 0) is the set of all finite probabilistic mixtures of vectors in Ω. It can be shown that Conv(Ω) is the set of all expectations E{X} that can be achieved by random vectors X that take values in the set Ω according to any probability distribution that leads to a finite expectation.
7.3. MINIMIZING THE DRIFT-PLUS-PENALTY RATIO 161
where we have used the well-known fact that the infimum of a linear function over the convex hull of a set is equal to the infimum over the set itself. Therefore, by Lemma 7.4, it follows that θ* is also the infimum ratio of E{a(π)}/E{b(π)} over the smaller set of pure policies P^pure.
In particular, if a policy achieves the infimum inf_{π∈P} E{a(π) − θ*b(π)} = 0, then Lemma 7.4 shows it must also minimize the ratio E{a(π)}/E{b(π)}.
If θ* is unknown, we can compute an approximation of θ* via the bisection algorithm as follows. At step k, we have θ_bisect[k], and we want to compute:

inf_{π∈P} E{a(π) − θ_bisect[k]b(π)} = E{ inf_{π∈P_{η[r]}} E{a(π) − θ_bisect[k]b(π)|η[r]} }

This can be done by generating a collection of W i.i.d. samples {η_1, η_2, . . . , η_W} (all with the same distribution as η[r]), computing the infimum conditional expectation for each sample, and then using the law of large numbers to approximate the expectation as follows:

E{ inf_{π∈P_{η[r]}} E{a(π) − θ_bisect[k]b(π)|η[r]} } ≈ (1/W)Σ_{w=1}^{W} inf_{π∈P_{η_w}} E{a(π) − θ_bisect[k]b(π)|η[r] = η_w} ≜ val(θ_bisect[k])   (7.29)
For a given frame r, the same samples {η_1, . . . , η_W} should be used for each step of the bisection routine. This ensures the stage-r approximation function val(θ) uses the same samples and is thus non-increasing in θ, which is important for the bisection to work properly (see Exercise 7.2).
However, new samples should be used on each frame. If it is difficult to generate new i.i.d. samples
{η1 , . . . , ηW } on each frame (possibly because the distribution of η[r] is unknown), we can use W
past values of η[r]. There is a subtle issue here because these past values are not independent of the
queue backlogs Zl [r] that are part of the a(π) function. However, using these past values can still
be shown to work via a delayed-queue argument given in the max-weight learning theory of (166).
T[r] = T̂(η[r], π[r]) ,  g[r] = ĝ(η[r], π[r]) ,  y_l[r] = ŷ_l(η[r], π[r])
Let p_av be a positive constant. The goal is to design an algorithm to solve:

Maximize: ḡ/T̄
Subject to: ȳ_l/T̄ ≤ p_av  ∀l ∈ {1, . . . , L}
π[r] ∈ P^pure  ∀r ∈ {0, 1, 2, . . .}
Example Problem:
a) State the renewal-based drift-plus-penalty algorithm for this problem.
b) Assume that the frame size is independent of the policy, so that T̂(η[r], π[r]) = T̂(η[r]). Show that minimization of the ratio of expectations can be done without bisection, by solving a single deterministic problem every frame.
c) Assume the general case when the frame size depends on the policy. Suppose the optimal ratio value θ*[r] is known for frame r. State the deterministic problem to solve every frame, with the structure of minimizing a(π) − θ*[r]b(π) as in Section 7.3.3.
d) Describe the bisection algorithm that obtains an estimate of θ ∗ [r] for part (c). Assume we
have W past values of initial information {η[r], η[r − 1], . . . , η[r − W + 1]}, and that we know
θmin ≤ θ ∗ [r] ≤ θmax for some constants θmin and θmax .
Solution:
a) Create virtual queues Z_l[r] for each l ∈ {1, . . . , L} as follows:

Z_l[r + 1] = max[Z_l[r] + ŷ_l(η[r], π[r]) − T̂(η[r], π[r])p_av, 0]   (7.30)

Every frame r ∈ {0, 1, 2, . . .}, observe η[r] and Z[r] and do the following:
• Choose π[r] ∈ P^pure to minimize:

E{−V ĝ(η[r], π[r]) + Σ_{l=1}^{L} Z_l[r]ŷ_l(η[r], π[r])|Z[r]} / E{T̂(η[r], π[r])|Z[r]}   (7.31)
c) If θ*[r] is known, then we observe η[r] and Z[r] and choose the policy π[r] ∈ P^pure as the one that minimizes:

−V ĝ(η[r], π[r]) + Σ_{l=1}^{L} Z_l[r]ŷ_l(η[r], π[r]) − θ*[r]T̂(η[r], π[r])
d) Fix a particular frame r. Let θ_min^(k) and θ_max^(k) be the bounds on θ*[r] for step k of the bisection, where θ_min^(0) = θmin and θ_max^(0) = θmax. Define θ_bisect^(k) = (θ_min^(k) + θ_max^(k))/2. Define {η_1, . . . , η_W} as the W samples to be used. Define the function val(θ) as follows:

val(θ) = (1/W)Σ_{i=1}^{W} min_{π∈P^pure} [−V ĝ(η_i, π) + Σ_{l=1}^{L} Z_l[r]ŷ_l(η_i, π) − θT̂(η_i, π)]   (7.32)

Note that computing val(θ) involves W separate minimizations. Note also that val(θ) is non-increasing in θ (see Exercise 7.2). Now compute val(θ_bisect^(k)):
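A sketch of the sample-based function val(θ) from (7.32), for a hypothetical single-constraint (L = 1) task model with two pure policies; the reward, cost, and duration functions below are invented for illustration:

```python
import random

random.seed(1)

# Hypothetical task-processing model: eta is the observed initial state,
# and there are two pure policies, "fast" and "slow". These functions
# are illustrative assumptions, not from the text.
def g_hat(eta, pi):   # task reward
    return eta * (2.0 if pi == "fast" else 1.0)

def y_hat(eta, pi):   # energy cost
    return 3.0 if pi == "fast" else 1.0

def T_hat(eta, pi):   # frame duration
    return 1.0 if pi == "fast" else 2.0

def val(theta, samples, Z, V):
    """Sample average of the per-sample minimum in (7.32): for each
    sample eta_i, minimize -V*g_hat + Z*y_hat - theta*T_hat over the
    pure policies, then average over the W samples."""
    total = 0.0
    for eta in samples:
        total += min(-V * g_hat(eta, pi) + Z * y_hat(eta, pi)
                     - theta * T_hat(eta, pi)
                     for pi in ("fast", "slow"))
    return total / len(samples)

samples = [random.uniform(0.5, 1.5) for _ in range(1000)]
# val is non-increasing in theta, the property the bisection relies on:
print(val(0.0, samples, Z=1.0, V=10.0) > val(5.0, samples, Z=1.0, V=10.0))  # True
```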
The frame size T[r] is also a random function of the policy as before: T[r] = T̂(π[r]). We make the same assumptions as before, including that second moments of x̂_m(π[r]) are uniformly bounded regardless of the policy, and that the conditional distribution of (T[r], y[r], x[r]), given π[r] = π, is independent of events on previous frames, and is identically distributed on each frame that uses the same policy π. Let Tmin, Tmax, x_{m,min}, x_{m,max} be finite constants such that for all policies π ∈ P and all m ∈ {1, . . . , M}, we have:

0 < Tmin ≤ E{T̂(π[r])|π[r] = π} ≤ Tmax ,  x_{m,min} ≤ E{x̂_m(π[r])|π[r] = π} ≤ x_{m,max}

Under a particular algorithm for choosing policies π[r] over frames r ∈ {0, 1, 2, . . .}, define T̄[R], ȳ_l[R], x̄_m[R] for R > 0 by:

T̄[R] ≜ (1/R)Σ_{r=0}^{R−1} E{T[r]} ,  ȳ_l[R] ≜ (1/R)Σ_{r=0}^{R−1} E{y_l[r]} ,  x̄_m[R] ≜ (1/R)Σ_{r=0}^{R−1} E{x_m[r]}
Define T̄, ȳ_l, x̄_m as the limiting values of T̄[R], ȳ_l[R], x̄_m[R], assuming temporarily that the limits exist. For each m ∈ {1, . . . , M}, define γ_{m,min} and γ_{m,max} by:

γ_{m,min} = min[x_{m,min}/Tmin, x_{m,min}/Tmax] ,  γ_{m,max} = max[x_{m,max}/Tmin, x_{m,max}/Tmax]

It follows that:

γ_{m,min} ≤ x̄_m[R]/T̄[R] ≤ γ_{m,max} ,  γ_{m,min} ≤ x̄_m/T̄ ≤ γ_{m,max}   (7.33)
Let φ(γ) be a continuous, concave, and entrywise non-decreasing function of the vector γ = (γ_1, . . . , γ_M) over the rectangle R, where:

R ≜ {(γ_1, . . . , γ_M) | γ_{m,min} ≤ γ_m ≤ γ_{m,max}  ∀m ∈ {1, . . . , M}}   (7.34)
To transform this problem to one that has the structure given in Section 7.1.1, we define auxiliary variables γ[r] = (γ_1[r], . . . , γ_M[r]) that are chosen in the rectangle R every frame r. We then define a new penalty y_0[r] as follows:

y_0[r] ≜ −T[r]φ(γ[r])

where:

\overline{Tφ(γ)} ≜ lim_{R→∞} (1/R)Σ_{r=0}^{R−1} E{T[r]φ(γ[r])} = −ȳ_0

\overline{Tγ_m} ≜ lim_{R→∞} (1/R)Σ_{r=0}^{R−1} E{T[r]γ_m[r]}  ∀m ∈ {1, . . . , M}
That the problems (7.35)-(7.37) and (7.38)-(7.42) are equivalent is proven in Exercise 7.7 using the fact:

\overline{Tφ(γ)}/T̄ ≤ φ(\overline{Tγ}/T̄)

where \overline{Tγ} ≜ (\overline{Tγ_1}, . . . , \overline{Tγ_M}). This fact is a variation on Jensen’s inequality and is proven in the following lemma.
Lemma 7.6 Let φ(γ ) be any continuous and concave (not necessarily non-decreasing) function defined
over γ ∈ R, where R is defined in (7.34).
(a) Let (T, γ) be a random vector that takes values in the set {(T, γ) | T > 0, γ ∈ R} according to any joint distribution that satisfies 0 < E{T} < ∞. Then:

E{Tφ(γ)}/E{T} ≤ φ(E{Tγ}/E{T})

(b) Let (T[r], γ[r]) be a sequence of random vectors of the type specified in part (a), for r ∈ {0, 1, 2, . . .}. Then for any integer R > 0:

[(1/R)Σ_{r=0}^{R−1} T[r]φ(γ[r])] / [(1/R)Σ_{r=0}^{R−1} T[r]] ≤ φ( [(1/R)Σ_{r=0}^{R−1} T[r]γ[r]] / [(1/R)Σ_{r=0}^{R−1} T[r]] )   (7.43)

[(1/R)Σ_{r=0}^{R−1} E{T[r]φ(γ[r])}] / [(1/R)Σ_{r=0}^{R−1} E{T[r]}] ≤ φ( [(1/R)Σ_{r=0}^{R−1} E{T[r]γ[r]}] / [(1/R)Σ_{r=0}^{R−1} E{T[r]}] )   (7.44)
Proof. Part (b) follows easily from part (a) (see Exercise 7.6). Here we prove part (a). Let {(T[r], γ[r])}_{r=0}^{∞} be an i.i.d. sequence of random vectors, each with the same distribution as (T, γ). Define t_0 = 0, and for integers R > 0 define t_R = Σ_{r=0}^{R−1} T[r]. Let the interval [t_r, t_{r+1}) represent the rth frame. Define γ̂(t) to take the value γ[r] if t is in the rth frame, so that:

(1/t_R)∫_0^{t_R} φ(γ̂(t))dt = [Σ_{r=0}^{R−1} T[r]φ(γ[r])] / [Σ_{r=0}^{R−1} T[r]]   (7.45)

On the other hand, by Jensen’s inequality for the concave function φ(γ):

(1/t_R)∫_0^{t_R} φ(γ̂(t))dt ≤ φ( (1/t_R)∫_0^{t_R} γ̂(t)dt ) = φ( [Σ_{r=0}^{R−1} T[r]γ[r]] / [Σ_{r=0}^{R−1} T[r]] )   (7.46)
Taking limits of (7.45) as R → ∞ and using the law of large numbers yields:

lim_{R→∞} (1/t_R)∫_0^{t_R} φ(γ̂(t))dt = E{Tφ(γ)}/E{T} (w.p.1)

Taking limits of (7.46) as R → ∞ and using the law of large numbers and continuity of φ(γ) yields:

lim_{R→∞} (1/t_R)∫_0^{t_R} φ(γ̂(t))dt ≤ φ(E{Tγ}/E{T}) (w.p.1)

□
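The inequality of part (a) can be checked numerically; the joint distribution for (T, γ) below is hypothetical, and φ(γ) = √γ serves as a concave test function:

```python
import math
import random

random.seed(2)

phi = math.sqrt  # a concave test function on gamma >= 0

# Hypothetical joint distribution for (T, gamma): T in {1, 4}, with
# gamma made positively correlated with T purely to exercise the
# inequality in a non-trivial way.
pairs = [(random.choice([1, 4]), random.uniform(0.0, 1.0))
         for _ in range(100000)]
pairs = [(T, g * T) for (T, g) in pairs]

ET = sum(T for T, _ in pairs) / len(pairs)
E_T_phi = sum(T * phi(g) for T, g in pairs) / len(pairs)
E_T_gamma = sum(T * g for T, g in pairs) / len(pairs)

# Lemma 7.6(a): E{T*phi(gamma)}/E{T} <= phi(E{T*gamma}/E{T})
print(E_T_phi / ET <= phi(E_T_gamma / ET))  # True
```

The inequality holds even for the empirical distribution, since E{Tφ(γ)}/E{T} is just an average of φ under the T-weighted measure, to which Jensen's inequality applies directly.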
This minimization can be simplified by separating out the terms that use the auxiliary variables. The term involving γ[r] is:

E{T̂(π[r])[−V φ(γ[r]) + Σ_{m=1}^{M} G_m[r]γ_m[r]]|Z[r], G[r]} / E{T̂(π[r])|Z[r], G[r]}

Clearly, the γ[r] variables can be optimized separately to minimize this term, making the frame size in the numerator and denominator cancel. The resulting algorithm is
thus: Observe Z[r] and G[r] at the beginning of each frame r ∈ {0, 1, 2, . . .}, and perform the following:
• (Auxiliary Variables) Choose γ[r] = (γ_1[r], . . . , γ_M[r]) ∈ R to minimize −V φ(γ[r]) + Σ_{m=1}^{M} G_m[r]γ_m[r].
• (Policy Selection) Choose π[r] ∈ P to minimize the remaining ratio of expectations over the frame.
• (Virtual Queue Updates) At the end of frame r, update Z[r] and G[r] by (7.47) and (7.48).
The auxiliary variable update has the same structure as that given in Chapter 5, and it is a deterministic optimization that reduces to M optimizations of single-variable functions if φ(γ) has the form φ(γ) = Σ_{m=1}^{M} φ_m(γ_m). The policy selection stage is a minimization of a ratio of expectations, and it can be solved with the techniques given in Section 7.3.
The goal is to maximize a weighted sum of throughput subject to average power constraints:

Maximize: Σ_{l=1}^{L} w_l D̄_l / T̄
Subject to: ȳ_l/T̄ ≤ p_av  ∀l ∈ {1, . . . , L}
π[r] ∈ P  ∀r ∈ {0, 1, 2, . . .}

where {w_l}_{l=1}^{L} are a given collection of positive weights, p_av is a given constant power constraint, D̄_l and ȳ_l are the average delivered data and energy expenditure by transmitter l on one frame, and P is the policy space that conforms to the above transmission constraints over the frame. This problem fits the standard renewal form given in Section 7.1 with c_l = p_av for all l ∈ {1, . . . , L}, and:

y_0[r] ≜ −Σ_{l=1}^{L} w_l Σ_{τ=rT}^{rT+T−1} 1_l(τ)
We thus form virtual queues Z_l[r] for each l ∈ {1, . . . , L}, with updates:

Z_l[r + 1] = max[ Z_l[r] + Σ_{τ=rT}^{rT+T−1} p_l(τ) − p_av T , 0 ]   (7.49)

The drift-plus-penalty rule for frame r then reduces to the following:

Maximize: E{ Σ_{l=1}^{L} Σ_{τ=rT}^{rT+T−1} [V w_l 1_l(τ) − Z_l[r]p_l(τ)] | Z[r], A[r] }
Subject to: (1) 0 ≤ p_l(τ) ≤ p_max  ∀l, ∀τ ∈ {rT, . . . , rT + T − 1}
(2) p_l(τ) = 0 if Q_l(τ) = 0  ∀l, ∀τ ∈ {rT, . . . , rT + T − 1}
(3) p_l(τ)p_m(τ) = 0  ∀l ≠ m, ∀τ ∈ {rT, . . . , rT + T − 1}
The problem can be solved as a dynamic program (64). Specifically, we can start backwards and define J_T(Q) as the optimal reward in the final stage T (corresponding to slot τ = rT + T − 1), given that Q(rT + T − 1) = Q:

J_T(Q) ≜ max_{l|Q_l>0} max_{p|0≤p≤p_max} [V w_l q_l(p) − Z_l[r]p]

This function J_T(Q) is computed for all integer vectors Q that satisfy 0 ≤ Q ≤ A[r]. Then define J_{T−1}(Q) as the optimal expected sum reward in the last two stages {T − 1, T}, given that Q(rT + T − 2) = Q:

J_{T−1}(Q) ≜ max_{l|Q_l>0} max_{p|0≤p≤p_max} [V w_l q_l(p) − Z_l[r]p + q_l(p)J_T(Q − e_l) + (1 − q_l(p))J_T(Q)]

where e_l is a vector that is zero in all entries j ≠ l, and is 1 in entry l. The function J_{T−1}(Q) is also computed for all Q that satisfy 0 ≤ Q ≤ A[r]. In general, we have for stages k ∈ {1, . . . , T − 1} the following recursive equation:

J_k(Q) ≜ max_{l|Q_l>0} max_{p|0≤p≤p_max} [V w_l q_l(p) − Z_l[r]p + q_l(p)J_{k+1}(Q − e_l) + (1 − q_l(p))J_{k+1}(Q)]

The value J_1(Q) represents the expected total reward over frame r under the optimal policy, given that Q(rT) = Q. The optimal action to take at each stage k corresponds to the transmitter l and the power level p that achieves the maximum in the computation of J_k(Q).
For a modified problem where power allocations are restricted to p_l(τ) ∈ {0, p_max}, it can be shown that the problem has a simple greedy solution: On each slot τ of frame r, consider the set of links l such that Q_l(τ) > 0, and transmit over the link l in this set that has the largest positive V w_l q_l(p_max) − Z_l[r]p_max value, breaking ties arbitrarily and choosing not to transmit over any link if none of these values are positive.
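A sketch of this greedy slot-by-slot rule; the success probabilities q_l(p_max) used in the example call are hypothetical inputs:

```python
def greedy_slot_decision(Q, Z, w, q_max, V, p_max):
    """One-slot greedy rule for on/off power allocation: among links with
    Q_l > 0, transmit on the link maximizing V*w_l*q_l(p_max) - Z_l*p_max,
    provided that value is positive; otherwise remain idle.

    Q: queue backlogs; Z: virtual queues Z_l[r]; w: positive weights;
    q_max[l]: success probability q_l(p_max) for link l.
    """
    best_link, best_val = None, 0.0
    for l in range(len(Q)):
        if Q[l] > 0:
            val = V * w[l] * q_max[l] - Z[l] * p_max
            if val > best_val:
                best_link, best_val = l, val
    return best_link  # None means no transmission this slot

# Hypothetical two-link instance: link 0 scores 10*0.9 - 8 = 1.0,
# link 1 scores 10*0.6 - 2 = 4.0, so link 1 is chosen.
print(greedy_slot_decision(Q=[3, 1], Z=[8.0, 2.0], w=[1.0, 1.0],
                           q_max=[0.9, 0.6], V=10.0, p_max=1.0))  # 1
```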
where 1i (t) is an indicator function that is 1 if a packet is successfully transmitted from queue i on
slot t (and is 0 otherwise), and Ai (t) is the (integer) number of new packet arrivals to queue i. The
maximum packet loss rate due to forced renewals is thus 20δ, which can be made arbitrarily small
with a small choice of δ > 0. We assume the controller knows the value of χ (t) at the beginning
of each slot. We have two choices of a renewal definition: (i) Define a renewal event on slot t
whenever (Q1 (t), Q2 (t)) = (0, 0), (ii) Define a renewal event on slot t whenever χ (t − 1) = 1.
The first definition has shorter renewal frames, but the frame sizes depend on the control actions. This would require minimizing a ratio of expectations every frame.
sizes that are independent of the control actions, and have mean 1/δ. For simplicity, we use the
second definition.
Let g_i(t) be the number of packets dropped from queue i on slot t:

g_i(t) = { A_i(t)1{Q_i(t) = 10}  if χ(t) = 0
         { Q_i(t) + A_i(t) − 1_i(t)  if χ(t) = 1

where 1{Q_i(t) = 10} is an indicator function that is 1 if Q_i(t) = 10, and 0 otherwise.
Assume the processes A1 (t) and A2 (t) are independent of each other. A1 (t) is i.i.d. Bernoulli
with P r[A1 (t) = 1] = λ1 , and A2 (t) is i.i.d. Bernoulli with P r[A2 (t) = 1] = λ2 . Every slot, the
controller chooses a queue for transmission by selecting a power allocation vector (p_1(t), p_2(t)) subject to the constraints:

0 ≤ p_i(t) ≤ p_max  ∀i ∈ {1, 2} ,  p_i(t) = 0 if Q_i(t) = 0 ,  p_1(t)p_2(t) = 0

where p_max is a given maximum power level. Let P(Q) denote the set of all power vectors that satisfy these constraints. Transmission successes are independent of past history given the power level used, with probabilities:

q_i(p) ≜ Pr[1_i(t) = 1 | Q_i(t) > 0, p_i(t) = p]
The delay and power constraints to be enforced are:

lim_{R→∞} (1/R)Σ_{r=0}^{R−1} Σ_{τ=t[r]}^{t[r]+T[r]−1} [Q_i(τ) − 3(A_i(τ) − g_i(τ))] ≤ 0  ∀i ∈ {1, 2}

lim_{R→∞} [(1/R)Σ_{r=0}^{R−1} Σ_{τ=t[r]}^{t[r]+T[r]−1} [p_1(τ) + p_2(τ)]] / [(1/R)Σ_{r=0}^{R−1} T[r]] ≤ p_av
Following the renewal system framework, we define virtual queues Z_1[r], Z_2[r], Z_p[r]:

Z_1[r + 1] = max[ Z_1[r] + Σ_{τ=t[r]}^{t[r]+T[r]−1} [Q_1(τ) − 3(A_1(τ) − g_1(τ))] , 0 ]   (7.50)

Z_2[r + 1] = max[ Z_2[r] + Σ_{τ=t[r]}^{t[r]+T[r]−1} [Q_2(τ) − 3(A_2(τ) − g_2(τ))] , 0 ]   (7.51)

Z_p[r + 1] = max[ Z_p[r] + Σ_{τ=t[r]}^{t[r]+T[r]−1} [p_1(τ) + p_2(τ) − p_av] , 0 ]   (7.52)
Making the queues Z1 [r] and Z2 [r] rate stable ensures the desired delay constraints are satisfied,
and making queue Zp [r] rate stable ensures the power constraint is satisfied. We thus have the
following algorithm, which only minimizes the numerator in the ratio of expectations because the
denominator is independent of the policy:
• At the beginning of each frame r, observe Z [r] = [Z1 [r], Z2 [r], Zp [r]] and make power
allocation decisions to minimize the following expression over the frame:
$$ E\left\{ \sum_{\tau=t[r]}^{t[r]+T[r]-1} f(p(\tau), A(\tau), Q(\tau), Z[r]) \,\Big|\, Z[r] \right\} $$
where:
$$ q(p) = \begin{cases} q_1(p_1) & \text{if } p_1 > 0 \\ q_2(p_2) & \text{if } p_1 = 0 \end{cases}, \qquad e(p) = \begin{cases} (1, 0) & \text{if } p_1 > 0 \\ (0, 1) & \text{if } p_1 = 0 \end{cases} $$
The equation (7.53) must be solved to find $J_Z(Q)$ for all $Q \in \{0, 1, \ldots, 10\} \times \{0, 1, \ldots, 10\}$. Define an operator $\mathcal{T}(\cdot)$ that takes a function $J(Q)$ (for $Q \in \{0, 1, \ldots, 10\} \times \{0, 1, \ldots, 10\}$) and maps it to another such function via the right-hand-side of (7.53). Then (7.53) reduces to:
$$ J_Z(Q) = \mathcal{T}(J_Z)(Q) $$
and hence the desired $J_Z(Q)$ is a fixed point of the $\mathcal{T}(\cdot)$ operator. It can be shown that $\mathcal{T}(\cdot)$ is a contraction with an appropriate definition of distance (67)(57), and so the fixed point is unique and can be obtained by iteration of the $\mathcal{T}(\cdot)$ operator starting with any initial function $J^{(0)}(Q)$ (such as $J^{(0)}(Q) = 0$):
$$ J^{(i+1)}(Q) = \mathcal{T}(J^{(i)})(Q) \quad \text{for } i \in \{0, 1, 2, \ldots\} $$
Then $\lim_{i \to \infty} J^{(i)}(Q)$ solves the fixed point equation and hence is equal to the desired $J_Z(Q)$
function. While this then needs to be recomputed for the next frame (because the queues Z[r]
change), the change in these queues over one frame is bounded, and the resulting $J_Z(Q)$ function
for frame r is already a good approximation for this function on frame r + 1. Thus, the initial value
of the iteration can be the final value found in the previous frame.
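Since (7.53) itself is not reproduced above, the following sketch iterates a generic contraction, a discounted Bellman operator on a toy state space, purely to illustrate fixed-point iteration and the warm-start idea. All names and the toy model are assumptions, not the text's system:

```python
def bellman(J, cost, P, beta=0.9):
    """Apply T(J)(s) = min_a [ cost[s][a] + beta * sum_t P[a][s][t] * J[t] ].

    With beta < 1 this operator is a contraction, so repeated application
    converges to a unique fixed point from any starting function.
    """
    n = len(J)
    return [
        min(
            cost[s][a] + beta * sum(P[a][s][t] * J[t] for t in range(n))
            for a in range(len(P))
        )
        for s in range(n)
    ]

def solve_fixed_point(cost, P, J0=None, beta=0.9, tol=1e-9):
    """Iterate the operator from J0 (warm start) until iterates agree."""
    J = [0.0] * len(cost) if J0 is None else list(J0)
    while True:
        J_next = bellman(J, cost, P, beta)
        if max(abs(a - b) for a, b in zip(J_next, J)) < tol:
            return J_next
        J = J_next
```

Warm-starting `solve_fixed_point` from the previous frame's solution (passing it as `J0`) typically converges in far fewer iterations than starting from zero, which is the point made in the text.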
Iteration of this operator requires knowledge of the A(t) distribution to compute the
desired expectations. In this case of independent Bernoulli inputs, this involves knowing only two
scalars λ1 and λ2 . However, for larger problems when the random events every slot can be a large
vector, the expectations can be accurately approximated by averaging over past samples, as in (7.29).
See (57) for an analysis of the error bounds in this technique.
See also (61)(60)(59) for alternative approximations to the Markov Decision Problem for wire-
less queueing delay. A detailed treatment of stochastic shortest path problems and approximations is
found in (67). Approximate dynamic programming methods that approximate value functions with
simpler functions can be found in (68)(187)(67)(69). Recent work in (62)(63) combines Markov
Decision theory and approximate value functions for treatment of energy and delay optimization in
wireless systems.
7.7 EXERCISES
Exercise 7.1. (Deterministic Task Processing) Suppose N network nodes cooperate to process
a sequence of tasks. A new task is started when the previous task ends, and we label the tasks
r ∈ {0, 1, 2, . . .}. For each new task r, the network controller makes a decision about which single
node n[r] will process the task, and what modality m[r] will be used in the processing. Assume
there are M possible modalities, each with different durations and energy expenditures. The task r
decision is π[r] = (n[r], m[r]), where n[r] ∈ {1, . . . , N} and m[r] ∈ {1, . . . , M}. Define T (n, m)
and β(n, m) as the duration of time and the energy expenditure, respectively, required for node n to
process a task using modality m. Assume that T (n, m) ≥ 0 and β(n, m) ≥ 0 for all n, m. Let en [r]
represent the energy expended by node n ∈ {1, . . . , N} during task r:
$$ e_n[r] = \begin{cases} \beta(n[r], m[r]) & \text{if } n[r] = n \\ 0 & \text{if } n[r] \ne n \end{cases} $$
We want to maximize the task processing rate subject to average power constraints at each node:
Maximize: $1/\overline{T}$
Subject to: 1) $\overline{e}_n / \overline{T} \le p_{n,av} \quad \forall n \in \{1, \ldots, N\}$
2) $n[r] \in \{1, \ldots, N\},\ m[r] \in \{1, \ldots, M\} \quad \forall r \in \{0, 1, 2, \ldots\}$
where pn,av is the average power constraint for node n ∈ {1, . . . , N}. State the renewal-based drift-
plus-penalty algorithm of Section 7.2 for this problem. Note that there is no randomness here, and so
the ratio of expectations to be minimized on each frame becomes a ratio of deterministic functions.
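A hedged sketch of what the frame-$r$ decision of Exercise 7.1 could look like, under one plausible encoding: penalty $y_0[r] = -1$ per frame (so minimizing $\overline{y}_0/\overline{T}$ maximizes the rate $1/\overline{T}$) and constraint functions $y_n[r] = e_n[r] - p_{n,av} T[r]$. The encoding and all names are assumptions, not the exercise's stated answer:

```python
def choose_task_decision(Z, T, beta, p_av, V):
    """Pick (n, m) minimizing the deterministic per-frame ratio
    [-V + sum_k Z[k] * (e_k - p_av[k] * T(n, m))] / T(n, m),
    where e_k = beta[n][m] if k == n and 0 otherwise.

    Z     -- virtual queue values Z_n[r], length N
    T     -- T[n][m] task durations (assumed > 0)
    beta  -- beta[n][m] task energies
    p_av  -- average power limits p_{n,av}
    V     -- drift-plus-penalty parameter
    """
    N, M = len(T), len(T[0])
    best, best_val = None, float('inf')
    for n in range(N):
        for m in range(M):
            # Only the chosen node n expends energy on this task.
            num = -V + Z[n] * beta[n][m] \
                  - sum(Z[k] * p_av[k] * T[n][m] for k in range(N))
            val = num / T[n][m]
            if val < best_val:
                best, best_val = (n, m), val
    return best
```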
Exercise 7.2. (Non-Increasing Property of val(θ)). Consider the val(θ) function in (7.32). Suppose
that θ1 ≤ θ2 .
a) Argue that for all $\eta_i$, $\pi$, $Z_l[r]$, we have:
$$ -V \hat{g}(\eta_i, \pi) + \sum_{l=1}^{L} Z_l[r] \hat{y}_l(\eta_i, \pi) - \theta_1 \hat{T}(\eta_i, \pi) \ge -V \hat{g}(\eta_i, \pi) + \sum_{l=1}^{L} Z_l[r] \hat{y}_l(\eta_i, \pi) - \theta_2 \hat{T}(\eta_i, \pi) $$
Exercise 7.3. (An Alternative Algorithm with Modified Objective) Consider the system of Section
7.1. However, suppose we desire a solution to the following modified problem:
Minimize: $\overline{y}_0$
Subject to: $\overline{y}_l / \overline{T} \le c_l \quad \forall l \in \{1, \ldots, L\}$
$\pi[r] \in \mathcal{P} \quad \forall r \in \{0, 1, 2, \ldots\}$
This differs from (7.4)-(7.6) because we seek to minimize $\overline{y}_0$ rather than $\overline{y}_0 / \overline{T}$. Define the same
virtual queues Z [r] in (7.12). Note that (7.16) still applies. Consider the algorithm that, every frame
r, observes Z [r] and chooses a policy π [r] ∈ P to minimize the right-hand-side of (7.16). It then
updates Z [r] by (7.12) at the end of the frame. Assume there is an i.i.d. algorithm π ∗ [r] that yields:
$$ E\{\hat{y}_0(\pi^*[r])\} = y_0^{opt} \qquad (7.54) $$
$$ E\{\hat{y}_l(\pi^*[r])\} \le E\{\hat{T}(\pi^*[r])\}\, c_l \quad \forall l \in \{1, \ldots, L\} \qquad (7.55) $$
a) Plug the i.i.d. algorithm $\pi^*[r]$ into the right-hand-side of (7.16) to show that $\Delta(Z[r]) \le F$
for some finite constant $F$, and hence all queues are mean rate stable so that:
b) Again plug the i.i.d. algorithm π ∗ [r] into the right-hand-side of (7.16), and use iterated
expectations and telescoping sums to prove:
$$ \limsup_{R \to \infty} \overline{y}_0[R] \le y_0^{opt} + B/V $$
Exercise 7.4. (Manipulating limits) Suppose that $\limsup_{R \to \infty} [\overline{y}_l[R] - c_l \overline{T}[R]] \le 0$, where $0 < T_{min} \le \overline{T}[R] \le T_{max}$ for all $R > 0$.
a) Argue that for all integers R > 0:
$$ \frac{\overline{y}_l[R]}{\overline{T}[R]} - c_l \le \frac{\overline{T}[R]}{T_{min}} \max\left[ 0,\ \frac{\overline{y}_l[R]}{\overline{T}[R]} - c_l \right] = \frac{1}{T_{min}} \max\left[ 0,\ \overline{y}_l[R] - c_l \overline{T}[R] \right] $$
b) Take limits of the inequality in (a) to conclude that:
$$ \limsup_{R \to \infty} \frac{\overline{y}_l[R]}{\overline{T}[R]} \le c_l $$
Exercise 7.5. (An Alternative Algorithm with Time Averaging) Consider the optimization prob-
lem (7.4)-(7.6) for a renewal system with frame sizes T [r] that depend on the policy π [r]. Define
$\theta[0] = 0$. For each stage $r \in \{1, 2, \ldots\}$, define $\theta[r]$ by:
$$ \theta[r] \triangleq \frac{ \frac{1}{r} \sum_{k=0}^{r-1} y_0[k] }{ \frac{1}{r} \sum_{k=0}^{r-1} T[k] } \qquad (7.56) $$
so that θ[r] is the empirical time average of the penalty to be minimized over the first r frames.
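A sketch of maintaining $\theta[r]$ from (7.56) incrementally: running sums of the observed penalties $y_0[k]$ and frame lengths $T[k]$ suffice, since the $1/r$ factors cancel. The class and method names are hypothetical:

```python
class EmpiricalRatio:
    """Track theta[r], the running ratio of penalty sums to frame-length sums."""

    def __init__(self):
        self.sum_y0 = 0.0  # sum of y0[k] over observed frames
        self.sum_T = 0.0   # sum of T[k] over observed frames
        self.r = 0         # number of frames observed so far

    def theta(self):
        # theta[0] = 0 by convention; afterwards the 1/r factors in (7.56) cancel.
        return 0.0 if self.r == 0 else self.sum_y0 / self.sum_T

    def observe_frame(self, y0, T):
        self.sum_y0 += y0
        self.sum_T += T
        self.r += 1
```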
Consider the following modified algorithm, which does not require the multi-step bisection phase,
but makes assumptions about convergence:
• Every frame r, observe θ [r], Z [r], and choose a policy π [r] ∈ P to minimize:
$$ E\left\{ V[\hat{y}_0(\pi[r]) - \theta[r]\hat{T}(\pi[r])] + \sum_{l=1}^{L} Z_l[r][\hat{y}_l(\pi[r]) - c_l \hat{T}(\pi[r])] \,\Big|\, Z[r], \theta[r] \right\} $$
To analyze this algorithm, we assume that there are constants $\theta$, $\overline{T}$, $\overline{y}_0$ such that, with probability 1:
$$ \lim_{R \to \infty} \theta[R] = \theta, \quad \lim_{R \to \infty} \frac{1}{R}\sum_{r=0}^{R-1} T[r] = \overline{T}, \quad \lim_{R \to \infty} \frac{1}{R}\sum_{r=0}^{R-1} y_0[r] = \overline{y}_0 \qquad (7.57) $$
We further assume there is an i.i.d. algorithm π ∗ [r] that satisfies (7.9)-(7.10) with δ = 0.
a) Use (7.14) to complete the right-hand-side of the following inequality:
b) Assume $E\{L(Z[0])\} = 0$. Plug the i.i.d. algorithm $\pi^*[r]$ from (7.9)-(7.10) into the right-hand-side of part (a) to prove that $\Delta(Z[r]) \le F$ for some constant $F$, and so all queues are mean rate stable. Use iterated expectations and the law of telescoping sums to conclude that for any $R > 0$:
$$ E\left\{ \frac{1}{R} \sum_{r=0}^{R-1} [y_0[r] - \theta[r] T[r]] \right\} \le E\{\hat{T}(\pi^*[r])\} \left[ \text{ratio}^{opt} - \frac{1}{R} \sum_{r=0}^{R-1} E\{\theta[r]\} \right] + B/V $$
This can be justified via part (c) together with the Lebesgue Dominated convergence theorem,
provided that mild additional boundedness assumptions on the processes are introduced. Use this
with part (b) to prove:
$$ \theta = \lim_{R \to \infty} \frac{ \frac{1}{R} \sum_{r=0}^{R-1} y_0[r] }{ \frac{1}{R} \sum_{r=0}^{R-1} T[r] } \le \text{ratio}^{opt} + \frac{B}{V\, E\{\hat{T}(\pi^*[r])\}} \quad \text{(w.p.1)} $$
Exercise 7.6. (Variation on Jensen’s Inequality) Assume the result of Lemma 7.6(a).
a) Let {T [0], T [1], . . . , T [R − 1]}, {γ [0], γ [1], . . . , γ [R − 1]} be deterministic sequences.
Prove (7.43) by defining X as a random integer that is uniform over {0, . . . , R − 1} and defining
the random vector (T [X], γ [X]).
b) Prove (7.44) by considering {T [0], T [1], . . . , T [R − 1]}, {γ [0], γ [1], . . . , γ [R − 1]} as
random sequences that are independent of X.
Exercise 7.8. (Utility Optimization with Delay-Limited Scheduling) Modify the example in Section 7.6.1 to treat the problem of maximizing the utility function $\sum_{l=1}^{L} \log(1 + \overline{D}_l/\overline{T})$, rather than maximizing $\sum_{l=1}^{L} w_l \overline{D}_l/\overline{T}$.
Exercise 7.9. (A simple form of Lebesgue Dominated Convergence) Let $\{f[r]\}_{r=0}^{\infty}$ be an infinite
sequence of random variables. Suppose there are finite constants $f_{min}$ and $f_{max}$ such that the random
variables deterministically satisfy fmin ≤ f [r] ≤ fmax for all r ∈ {0, 1, 2, . . .}. Suppose there is a
finite constant f such that:
$$ \lim_{R \to \infty} \frac{1}{R} \sum_{r=0}^{R-1} f[r] = f \quad \text{(w.p.1)} $$
We will show that $\lim_{R \to \infty} \frac{1}{R} \sum_{r=0}^{R-1} E\{f[r]\} = f$.
a) Fix $\epsilon > 0$. Argue that for any integer $R > 0$:
$$ E\left\{ \frac{1}{R} \sum_{r=0}^{R-1} f[r] \right\} \le (f + \epsilon)\, Pr\left[ \frac{1}{R} \sum_{r=0}^{R-1} f[r] \le f + \epsilon \right] + f_{max}\, Pr\left[ \frac{1}{R} \sum_{r=0}^{R-1} f[r] > f + \epsilon \right] $$
b) Use this with part (a) to conclude that for all $\epsilon > 0$:
$$ \lim_{R \to \infty} \frac{1}{R} \sum_{r=0}^{R-1} E\{f[r]\} \le f + \epsilon $$
Conclude that the left-hand-side in the above inequality is less than or equal to $f$.
c) Make a similar argument to show $\lim_{R \to \infty} \frac{1}{R} \sum_{r=0}^{R-1} E\{f[r]\} \ge f$.
CHAPTER 8
Conclusions
This text has presented a theory for optimizing time averages in stochastic networks. The tools
of Lyapunov drift and Lyapunov optimization were developed to solve these problems. Our focus
was on communication and queueing networks, including networks with wireless links and mobile
devices. The theory can be used for networks with a variety of goals and functionalities, such as
networks with:
• Network coding capabilities (see Exercise 4.12 and (188)(189)(190)).
$$ \Delta(\Theta(t)) + V\, E\{\text{penalty}(t)\,|\,\Theta(t)\} \le B + V\, E\{\text{penalty}(t)\,|\,\Theta(t)\} + \sum_{n=1}^{N} \Theta_n(t)\, E\{h_n(t)\,|\,\Theta(t)\} $$
3. Design the policy to minimize the right-hand-side of the above drift-plus-penalty bound.
4. Conclude that, under this algorithm, the drift-plus-penalty is bounded by plugging any other
policy into the right-hand-side:
$$ \Delta(\Theta(t)) + V\, E\{\text{penalty}(t)\,|\,\Theta(t)\} \le B + V\, E\{\text{penalty}^*(t)\,|\,\Theta(t)\} + \sum_{n=1}^{N} \Theta_n(t)\, E\{h_n^*(t)\,|\,\Theta(t)\} $$
5. Plug an ω-only policy α ∗ (t) into the right-hand-side, one that is known to exist (although it
would be hard to compute) that satisfies all constraints and yields a greatly simplified drift-
plus-penalty expression on the right-hand-side.
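The recipe above can be caricatured in a few lines. The action set, the `penalty` function, and the queue-change terms `h(n, a)` below are placeholders standing in for a concrete network model, not the text's notation:

```python
def drift_plus_penalty_action(actions, queues, penalty, h, V):
    """Return the action minimizing V * penalty(a) + sum_n Q_n * h_n(a),
    i.e., the right-hand side of the drift-plus-penalty bound (step 3).

    actions -- iterable of candidate control actions
    queues  -- current queue backlogs Q_n
    penalty -- penalty(a): per-slot penalty incurred by action a
    h       -- h(n, a): expected arrivals minus service for queue n under a
    V       -- tradeoff parameter weighting penalty against backlog
    """
    def rhs(a):
        return V * penalty(a) + sum(q * h(n, a) for n, q in enumerate(queues))
    return min(actions, key=rhs)
```

Sweeping `V` trades off the penalty against queue backlog: a large `V` favors low-penalty actions, while a small `V` favors serving the queues.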
Also important in this theory is the use of virtual queues to transform time average inequality
constraints into queue stability problems, and auxiliary variables for the case of optimizing convex
functions of time averages. The drift-plus-penalty framework was also shown to hold for optimization
of non-convex functions of time averages, and for optimization over renewal systems.
The resulting min-drift (or “max-weight”) algorithms can be very complex for general prob-
lems, particularly for wireless networks with interference. However, we have seen that low complexity
approximations can be used to provide good performance. Further, for interference networks with-
out time-variation, methods that take a longer time to find the max-weight solution (either by a
deterministic or randomized search) were seen to provide full throughput and throughput-utility op-
timality with arbitrarily low per-timeslot computation complexity, provided that we let convergence
time and/or delay increase (possibly non-polynomially) to infinity. Simple distributed Carrier Sense
Multiple Access (CSMA) implementations are often possible (and provably throughput optimal)
for these networks via the Jiang-Walrand theorem, which hints at deeper connections with Lya-
punov optimization, max-weight theory, C-additive approximations, maximum entropy solutions,
randomized algorithms, and Markov chain steady state theory.
Bibliography
[1] F. Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommuni-
cations, vol. 8, no. 1, pp. 33-37, Jan.-Feb. 1997. DOI: 10.1002/ett.4460080106 3, 98
[2] F. P. Kelly, A. Maulloo, and D. Tan. Rate control for communication networks: Shadow prices,
proportional fairness, and stability. Journ. of the Operational Res. Society, vol. 49, no. 3, pp.
237-252, March 1998. DOI: 10.2307/3010473 3, 7, 98, 104
[4] L. Massoulié and J. Roberts. Bandwidth sharing: Objectives and algorithms. IEEE/ACM
Transactions on Networking, vol. 10, no. 3, pp. 320-328, June 2002.
DOI: 10.1109/TNET.2002.1012364 3
[5] A. Tang, J. Wang, and S. Low. Is fair allocation always inefficient? Proc. IEEE INFOCOM,
March 2004. DOI: 10.1109/INFCOM.2004.1354479 3, 98, 128
[6] B. Radunovic and J.Y. Le Boudec. Rate performance objectives of multihop wireless net-
works. IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 334-349, Oct.-Dec. 2004.
DOI: 10.1109/TMC.2004.45 3, 128
[7] L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and
scheduling policies for maximum throughput in multihop radio networks. IEEE Transactions
on Automatic Control, vol. 37, no. 12, pp. 1936-1948, Dec. 1992. DOI: 10.1109/9.182479 6,
49, 113, 138
[8] L. Tassiulas and A. Ephremides. Dynamic server allocation to parallel queues with randomly
varying connectivity. IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466-478,
March 1993. DOI: 10.1109/18.212277 6, 10, 24, 49, 66
[9] P. R. Kumar and S. P. Meyn. Stability of queueing networks and scheduling policies. IEEE
Trans. on Automatic Control, vol. 40, no. 2, pp. 251-260, Feb. 1995. DOI: 10.1109/9.341782 6
[24] S. H. Low. A duality model of TCP and queue management algorithms. IEEE Trans. on
Networking, vol. 11, no. 4, pp. 525-536, August 2003. DOI: 10.1109/TNET.2003.815297 7
[25] L. Xiao, M. Johansson, and S. Boyd. Simultaneous routing and resource allocation for wireless
networks. Proc. of the 39th Annual Allerton Conf. on Comm., Control, Comput., Oct. 2001. 7
[26] L. Xiao, M. Johansson, and S. P. Boyd. Simultaneous routing and resource allocation via dual
decomposition. IEEE Transactions on Communications, vol. 52, no. 7, pp. 1136-1144, July
2004. DOI: 10.1109/TCOMM.2004.831346 7
[27] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Downlink power allocation for multi-class cdma
wireless networks. Proc. IEEE INFOCOM, 2002. DOI: 10.1109/INFCOM.2002.1019399
7
[28] M. Chiang. Balancing transport and physical layer in wireless multihop networks: Jointly op-
timal congestion control and power control. IEEE Journal on Selected Areas in Communications,
vol. 23, no. 1, pp. 104-116, Jan. 2005. DOI: 10.1109/JSAC.2004.837347 7
[30] R. Cruz and A. Santhanam. Optimal routing, link scheduling, and power
control in multi-hop wireless networks. Proc. IEEE INFOCOM, April 2003.
DOI: 10.1109/INFCOM.2003.1208720 7
[31] X. Lin and N. B. Shroff. Joint rate control and scheduling in multihop wireless networks.
Proc. of 43rd IEEE Conf. on Decision and Control, Paradise Island, Bahamas, Dec. 2004. 7, 8,
109
[32] R. Agrawal and V. Subramanian. Optimality of certain channel aware scheduling policies.
Proc. 40th Annual Allerton Conference on Communication , Control, and Computing, Monticello,
IL, Oct. 2002. 7, 119
[34] A. Stolyar. Maximizing queueing network utility subject to stability: Greedy primal-dual algo-
rithm. Queueing Systems, vol. 50, no. 4, pp. 401-457, 2005. DOI: 10.1007/s11134-005-1450-0
7, 119
[35] A. Stolyar. Greedy primal-dual algorithm for dynamic resource allocation in complex net-
works. Queueing Systems, vol. 54, no. 3, pp. 203-220, 2006. DOI: 10.1007/s11134-006-0067-2
7
[36] Q. Li and R. Negi. Scheduling in wireless networks under uncertainties: A greedy primal-dual
approach. Arxiv Technical Report: arXiv:1001.2050v2, June 2010. 8, 119
[37] L. Huang and M. J. Neely. Delay reduction via lagrange multipliers in stochastic network
optimization. Proc. of 7th Intl. Symposium on Modeling and Optimization in Mobile, Ad Hoc,
and Wireless Networks (WiOpt), June 2009. DOI: 10.1109/WIOPT.2009.5291609 8, 10, 69,
71, 113
[38] M. J. Neely. Universal scheduling for networks with arbitrary traffic, channels, and mobility.
Proc. IEEE Conf. on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 8, 77, 81, 102, 112,
119
[39] M. J. Neely. Universal scheduling for networks with arbitrary traffic, channels, and mobility.
ArXiv technical report, arXiv:1001.0960v1, Jan. 2010. 8, 77, 81, 102, 107
[40] M. J. Neely. Stock market trading via stochastic network optimization. Proc. IEEE Conference
on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 8, 77, 179
[41] M. J. Neely. Stock market trading via stochastic network optimization. ArXiv Technical
Report, arXiv:0909.3891v1, Sept. 2009. 8, 77, 179
[42] M. J. Neely and R. Urgaonkar. Cross layer adaptive control for wireless mesh networks. Ad
Hoc Networks (Elsevier), vol. 5, no. 6, pp. 719-743, August 2007.
DOI: 10.1016/j.adhoc.2007.01.004 8, 102, 112, 119, 179
[43] M. J. Neely. Stochastic network optimization with non-convex utilities and costs. Proc. Infor-
mation Theory and Applications Workshop (ITA), Feb. 2010. DOI: 10.1109/ITA.2010.5454100
8, 116, 117, 118
[44] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks using queue-
length-based scheduling and congestion control. Proc. IEEE INFOCOM, March 2005.
DOI: 10.1109/INFCOM.2005.1498459 8
[45] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks using queue-length-
based scheduling and congestion control. IEEE/ACM Transactions on Networking, vol. 15,
no. 6, pp. 1333-1344, Dec. 2007. DOI: 10.1109/TNET.2007.897944 8, 69, 71
[46] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Opportunistic power scheduling for dynamic
multiserver wireless systems. IEEE Transactions on Wireless Communications, vol. 5, no.6, pp.
1506-1515, June 2006. DOI: 10.1109/TWC.2006.1638671 8
[47] V. Tsibonis, L. Georgiadis, and L. Tassiulas. Exploiting wireless channel state information
for throughput maximization. IEEE Transactions on Information Theory, vol. 50, no. 11, pp.
2566-2582, Nov. 2004. DOI: 10.1109/TIT.2004.836687 8
[48] V. Tsibonis, L. Georgiadis, and L. Tassiulas. Exploiting wireless channel state
information for throughput maximization. Proc. IEEE INFOCOM, April 2003.
DOI: 10.1109/TIT.2004.836687 8
[49] X. Liu, E. K. P. Chong, and N. B. Shroff. A framework for opportunistic schedul-
ing in wireless networks. Computer Networks, vol. 41, no. 4, pp. 451-474, March 2003.
DOI: 10.1016/S1389-1286(02)00401-2 8
[50] R. Berry and R. Gallager. Communication over fading channels with delay constraints.
IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1135-1149, May 2002.
DOI: 10.1109/18.995554 8, 9, 67
[51] M. J. Neely. Optimal energy and delay tradeoffs for multi-user wireless downlinks.
IEEE Transactions on Information Theory, vol. 53, no. 9, pp. 3095-3113, Sept. 2007.
DOI: 10.1109/TIT.2007.903141 8, 10, 67, 71
[52] M. J. Neely. Super-fast delay tradeoffs for utility optimal fair scheduling in wireless
networks. IEEE Journal on Selected Areas in Communications, Special Issue on Nonlin-
ear Optimization of Communication Systems, vol. 24, no. 8, pp. 1489-1501, Aug. 2006.
DOI: 10.1109/JSAC.2006.879357 8, 10, 67, 71
[53] M. J. Neely. Intelligent packet dropping for optimal energy-delay tradeoffs in wireless down-
links. IEEE Transactions on Automatic Control, vol. 54, no. 3, pp. 565-579, March 2009.
DOI: 10.1109/TAC.2009.2013652 8, 10, 67, 71
[54] S. Moeller, A. Sridharan, B. Krishnamachari, and O. Gnawali. Routing without routes: The
backpressure collection protocol. Proc. 9th ACM/IEEE Intl. Conf. on Information Processing
in Sensor Networks (IPSN), April 2010. DOI: 10.1145/1791212.1791246 8, 10, 71, 72, 113,
179
[55] L. Huang, S. Moeller, M. J. Neely, and B. Krishnamachari. LIFO-backpressure achieves near
optimal utility-delay tradeoff. Arxiv Technical Report, arXiv:1008.4895v1, August 2010. 8,
10, 72, 113, 179
[56] M. J. Neely. Stochastic optimization for Markov modulated networks with application to
delay constrained wireless scheduling. Proc. IEEE Conf. on Decision and Control (CDC),
Shanghai, China, Dec. 2009. DOI: 10.1109/CDC.2009.5400270 8, 9, 153, 171, 173
[57] M. J. Neely. Stochastic optimization for Markov modulated networks with application to delay
constrained wireless scheduling. ArXiv Technical Report, arXiv:0905.4757v1, May 2009. 8,
9, 153, 158, 171, 173, 174
[58] C.-P. Li and M. J. Neely. Network utility maximization over partially observable markovian
channels. Arxiv Technical Report: arXiv:1008.3421v1, Aug. 2010. 8, 153
[59] F. J. Vázquez Abad and V. Krishnamurthy. Policy gradient stochastic approximation algo-
rithms for adaptive control of constrained time varying Markov decision processes. Proc.
IEEE Conf. on Decision and Control, Dec. 2003. DOI: 10.1109/CDC.2003.1273053 8, 174
[60] D. V. Djonin and V. Krishnamurthy. q-learning algorithms for constrained Markov de-
cision processes with randomized monotone policies: Application to mimo transmission
control. IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 2170-2181, May 2007.
DOI: 10.1109/TSP.2007.893228 8, 9, 174
[62] F. Fu and M. van der Schaar. A systematic framework for dynamically optimizing multi-user
video transmission. IEEE Journal on Selected Areas in Communications, vol. 28, no. 3, pp.
308-320, April 2010. DOI: 10.1109/JSAC.2010.100403 8, 9, 174
[63] F. Fu and M. van der Schaar. Decomposition principles and online learning in cross-layer
optimization for delay-sensitive applications. IEEE Trans. Signal Processing, vol. 58, no. 3,
pp. 1401-1415, March 2010. DOI: 10.1109/TSP.2009.2034938 8, 9, 174
[64] D. P. Bertsekas. Dynamic Programming and Optimal Control, vols. 1 and 2. Athena Scientific,
Belmont, Mass, 1995. 8, 158, 168, 170
[65] E. Altman. Constrained Markov Decision Processes. Boca Raton, FL, Chapman and Hall/CRC
Press, 1999. 8
[66] S. Ross. Introduction to Probability Models. Academic Press, 8th edition, Dec. 2002. 8, 12, 27,
76
[68] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John
Wiley & Sons, 2007. DOI: 10.1002/9780470182963 8, 174
[69] S. Meyn. Control Techniques for Complex Networks. Cambridge University Press, 2008. 8, 174
[70] D. Tse and S. Hanly. Multi-access fading channels: Part ii: Delay-limited capacities.
IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2816-2831, Nov. 1998.
DOI: 10.1109/18.737514 9, 135
[71] R. Urgaonkar and M. J. Neely. Delay-limited cooperative communication with re-
liability constraints in wireless networks. Proc. IEEE INFOCOM, April 2009.
DOI: 10.1109/INFCOM.2009.5062187 9, 135, 168, 179
[72] A. Mekkittikul and N. McKeown. A starvation free algorithm for achieving 100% throughput
in an input-queued switch. Proc. ICCN, pp. 226-231, 1996. 9
[73] A. L. Stolyar and K. Ramanan. Largest weighted delay first scheduling: Large de-
viations and optimality. Annals of Applied Probability, vol. 11, no. 1, pp. 1-48, 2001.
DOI: 10.1214/aoap/998926986 9, 11
[75] S. Shakkottai and A. Stolyar. Scheduling for multiple flows sharing a time-varying channel:
The exponential rule. American Mathematical Society Translations, series 2, vol. 207, 2002. 9
[76] M. J. Neely. Delay-based network utility maximization. Proc. IEEE INFOCOM, March
2010. DOI: 10.1109/INFCOM.2010.5462097 9, 120, 122
[77] A. Fu, E. Modiano, and J. Tsitsiklis. Optimal energy allocation for delay-constrained data
transmission over a time-varying channel. Proc. IEEE INFOCOM, 2003. 9
[78] M. Zafer and E. Modiano. Optimal rate control for delay-constrained data transmission over
a wireless channel. IEEE Transactions on Information Theory, vol. 54, no. 9, pp. 4020-4039,
Sept. 2008. DOI: 10.1109/TIT.2008.928249 9
[79] M. Zafer and E. Modiano. Minimum energy transmission over a wireless channel with
deadline and power constraints. IEEE Transactions on Automatic Control, vol. 54, no. 12, pp.
2841-2852, December 2009. DOI: 10.1109/TAC.2009.2034202 9
[80] M. Goyal, A. Kumar, and V. Sharma. Power constrained and delay optimal policies
for scheduling transmission over a fading channel. Proc. IEEE INFOCOM, April 2003.
DOI: 10.1109/INFCOM.2003.1208683 9
[84] M. Zafer and E. Modiano. A calculus approach to energy-efficient data transmission with
quality-of-service constraints. IEEE/ACM Transactions on Networking, vol. 17, no. 13, pp.
898-911, June 2009. DOI: 10.1109/TNET.2009.2020831 9
[85] W. Chen, M. J. Neely, and U. Mitra. Energy-efficient transmissions with individual packet
delay constraints. IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 2090-2109,
May 2008. DOI: 10.1109/TIT.2008.920344 9
[86] W. Chen, U. Mitra, and M. J. Neely. Energy-efficient scheduling with individual packet delay
constraints over a fading channel. Wireless Networks, vol. 15, no. 5, pp. 601-618, July 2009.
DOI: 10.1007/s11276-007-0093-y 9
[88] B. Hajek. Optimal control of two interacting service stations. IEEE Transactions on Automatic
Control, vol. 29, no. 6, pp. 491-499, June 1984. DOI: 10.1109/TAC.1984.1103577 9
[89] S. Sarkar. Optimum scheduling and memory management in input queued switches with finite
buffer space. Proc. IEEE INFOCOM, April 2003. DOI: 10.1109/INFCOM.2003.1208973
9
[90] A. Tarello, J. Sun, M. Zafer, and E. Modiano. Minimum energy transmission scheduling
subject to deadline constraints. ACM Wireless Networks, vol. 14, no. 5, pp. 633-645, 2008.
DOI: 10.1007/s11276-006-0005-6 9
[91] B. Sadiq, S. Baek, and G. de Veciana. Delay-optimal opportunistic scheduling and
approximations: the log rule. Proc. IEEE INFOCOM, April 2009.
DOI: 10.1109/INFCOM.2009.5062088 9
[92] B. Sadiq and G. de Veciana. Optimality and large deviations of queues under the pseudo-log
rule opportunistic scheduling. 46th Annual Allerton Conference on Communication, Control,
and Computing, Monticello, IL, Sept. 2008. DOI: 10.1109/ALLERTON.2008.4797636 9, 11
[93] A. L. Stolyar. Large deviations of queues sharing a randomly time-varying server. Queueing
Systems Theory and Applications, vol. 59, no. 1, pp. 1-35, 2008.
DOI: 10.1007/s11134-008-9072-y 9, 11
[94] A. Ganti, E. Modiano, and J. N. Tsitsiklis. Optimal transmission scheduling in symmetric
communication models with intermittent connectivity. IEEE Transactions on Information
Theory, vol. 53, no. 3, pp. 998-1008, March 2007. DOI: 10.1109/TIT.2006.890695 10
[95] E. M. Yeh and A. S. Cohen. Delay optimal rate allocation in multiaccess fading communi-
cations. Proc. Allerton Conference on Communication, Control, and Computing, Monticello, IL,
2004. 10
[96] E. M. Yeh. Multiaccess and Fading in Communication Networks. PhD thesis, Massachusetts
Institute of Technology, Laboratory for Information and Decision Systems (LIDS), 2001. 10
[98] A. Ephremides, P. Varaiya, and J. Walrand. A simple dynamic routing problem. IEEE
Transactions on Automatic Control, vol. AC-25, no.4, pp. 690-693, Aug. 1980. 10
[99] M. J. Neely, E. Modiano, and Y.-S. Cheng. Logarithmic delay for n × n packet switches
under the crossbar constraint. IEEE Transactions on Networking, vol. 15, no. 3, pp. 657-668,
June 2007. DOI: 10.1109/TNET.2007.893876 10, 11, 37
[100] M. J. Neely. Order optimal delay for opportunistic scheduling in multi-user wireless uplinks
and downlinks. IEEE/ACM Transactions on Networking, vol. 16, no. 5, pp. 1188-1199, October
2008. DOI: 10.1109/TNET.2007.909682 10, 24, 37
[101] M. J. Neely. Delay analysis for max weight opportunistic scheduling in wireless sys-
tems. IEEE Transactions on Automatic Control, vol. 54, no. 9, pp. 2137-2150, Sept. 2009.
DOI: 10.1109/TAC.2009.2026943 10, 11, 24, 37
[102] S. Deb, D. Shah, and S. Shakkottai. Fast matching algorithms for repetitive optimization: An
application to switch scheduling. Proc. of 40th Annual Conference on Information Sciences and
Systems (CISS), Princeton, NJ, March 2006. DOI: 10.1109/CISS.2006.286659 10, 37, 147
[103] M. J. Neely. Delay analysis for maximal scheduling with flow control in wireless networks
with bursty traffic. IEEE Transactions on Networking, vol. 17, no. 4, pp. 1146-1159, August
2009. DOI: 10.1109/TNET.2008.2008232 10, 11, 37, 147
[104] X. Wu, R. Srikant, and J. R. Perkins. Scheduling efficiency of distributed greedy scheduling
algorithms in wireless networks. IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp.
595-605, June 2007. DOI: 10.1109/TMC.2007.1061 11, 37, 147
[105] J. G. Dai and B. Prabhakar. The throughput of data switches with and without speedup. Proc.
IEEE INFOCOM, 2000. DOI: 10.1109/INFCOM.2000.832229 11, 37
[106] J. M. Harrison and J. A. Van Mieghem. Dynamic control of brownian networks: State space
collapse and equivalent workload formulations. The Annals of Applied Probability, vol. 7(3),
pp. 747-771, Aug. 1997. DOI: 10.1214/aoap/1034801252 11
[107] S. Shakkottai, R. Srikant, and A. Stolyar. Pathwise optimality of the exponential scheduling
rule for wireless channels. Advances in Applied Probability, vol. 36, no. 4, pp. 1021-1045, Dec.
2004. DOI: 10.1239/aap/1103662957 11
[108] A. L. Stolyar. Maxweight scheduling in a generalized switch: State space collapse and
workload minimization in heavy traffic. Annals of Applied Probability, pp. 1-53, 2004.
DOI: 10.1214/aoap/1075828046 11
[109] D. Shah and D. Wischik. Optimal scheduling algorithms for input-queued switches. Proc.
IEEE INFOCOM, 2006. DOI: 10.1109/INFOCOM.2006.238 11
[110] I. Keslassy and N. McKeown. Analysis of scheduling algorithms that provide 100% through-
put in input-queued switches. Proc. 39th Annual Allerton Conf. on Communication, Control,
and Computing, Oct. 2001. 11
[111] T. Ji, E. Athanasopoulou, and R. Srikant. Optimal scheduling policies in small generalized
switches. Proc. IEEE INFOCOM, Rio De Janeiro, Brazil, 2009.
DOI: 10.1109/INFCOM.2009.5062259 11
[112] V. J. Venkataramanan and X. Lin. Structural properties of ldp for queue-length based wireless
scheduling algorithms. Proc. of 45th Annual Allerton Conference on Communication, Control,
and Computing, Monticello, Illinois, September 2007. 11
[113] D. Bertsimas, I. C. Paschalidis, and J. N. Tsitsiklis. Large deviations analysis of the
generalized processor sharing policy. Queueing Systems, vol. 32, pp. 319-349, 1999.
DOI: 10.1023/A:1019151423773 11
[114] D. Bertsimas, I. C. Paschalidis, and J. N. Tsitsiklis. Asymptotic buffer overflow probabilities
in multiclass multiplexers: An optimal control approach. IEEE Transactions on Automatic
Control, vol. 43, no. 3, pp. 315-335, March 1998. DOI: 10.1109/9.661587 11
[115] S. Bodas, S. Shakkottai, L. Ying, and R. Srikant. Scheduling in multi-channel wireless
networks: Rate function optimality in the small-buffer regime. Proc. ACM SIGMET-
RICS/Performance Conference, June 2009. DOI: 10.1145/1555349.1555364 11
[116] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions on Information
Theory, vol. 46, no. 2, pp. 388-404, March 2000. DOI: 10.1109/18.825799 11
[117] M. Grossglauser and D. Tse. Mobility increases the capacity of ad-hoc wireless net-
works. IEEE/ACM Trans. on Networking, vol. 10, no. 4, pp. 477-486, August 2002.
DOI: 10.1109/TNET.2002.801403 11
[118] M. J. Neely and E. Modiano. Capacity and delay tradeoffs for ad-hoc mobile net-
works. IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 1917-1937, June 2005.
DOI: 10.1109/TIT.2005.847717 11
[119] S. Toumpis and A. J. Goldsmith. Large wireless networks under fading, mobility, and delay
constraints. Proc. IEEE INFOCOM, 2004. DOI: 10.1109/INFCOM.2004.1354532 12
[120] X. Lin and N. B. Shroff. Towards achieving the maximum capacity in large mobile wireless
networks. Journal of Communications and Networks, Special Issue on Mobile Ad Hoc Wireless
Networks, vol. 6, no. 4, December 2004. 12
[121] X. Lin and N. B. Shroff. The fundamental capacity-delay tradeoff in large mobile ad hoc
networks. Purdue University Tech. Report, 2004. 12
[123] G. Sharma, R. Mazumdar, and N. Shroff. Delay and capacity trade-offs in mobile ad-hoc
networks: A global perspective. Proc. IEEE INFOCOM, April 2006.
DOI: 10.1109/INFOCOM.2006.144 12
[125] N. Bansal and Z. Liu. Capacity, delay and mobility in wireless ad-hoc networks. Proc. IEEE
INFOCOM, April 2003. DOI: 10.1109/INFCOM.2003.1208990 12
[126] L. Ying, S. Yang, and R. Srikant. Optimal delay-throughput tradeoffs in mobile ad hoc
networks. IEEE Transactions on Information Theory, vol. 54, no. 9, pp. 4119-4143, Sept. 2008.
DOI: 10.1109/TIT.2008.928247 12
[127] Z. Kong, E. M. Yeh, and E. Soljanin. Coding improves the throughput-delay trade-off in
mobile wireless networks. Proceedings of the International Symposium on Information Theory,
Seoul, Korea, June 2009. 12
[128] Z. Kong, E. M. Yeh, and E. Soljanin. Coding improves the throughput-delay trade-
off in mobile wireless networks. IEEE Transactions on Information Theory, to appear.
DOI: 10.1109/ISIT.2009.5205277 12
[129] D. P. Bertsekas and R. Gallager. Data Networks. New Jersey: Prentice-Hall, Inc., 1992. 12,
19, 25, 27, 37, 48, 109, 128, 144, 172
[130] R. Gallager. Discrete Stochastic Processes. Kluwer Academic Publishers, Boston, 1996. 12, 27,
50, 74, 76
[131] F. P. Kelly. Reversibility and Stochastic Networks. Wiley, Chichester, 1979. 12, 27, 144
[132] S. Ross. Stochastic Processes. John Wiley & Sons, Inc., New York, 1996. 12, 74
[133] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and Optimization. Boston:
Athena Scientific, 2003. 12, 67
[134] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. 12,
67
[136] M. J. Neely. Stability and capacity regions for discrete time queueing networks. ArXiv
Technical Report: arXiv:1003.3396v1, March 2010. 18, 19, 56, 102
[137] R. Urgaonkar and M. J. Neely. Opportunistic scheduling with reliability guarantees in cogni-
tive radio networks. IEEE Transactions on Mobile Computing, vol. 8, no. 6, pp. 766-777, June
2009. DOI: 10.1109/TMC.2009.38 28, 145, 147
[138] M. J. Neely. Queue stability and probability 1 convergence via Lyapunov optimization. ArXiv
Technical Report, arXiv:1008.3519, August 2010. 50, 51
[139] O. Kallenberg. Foundations of Modern Probability, 2nd ed., Probability and its Applications.
Springer-Verlag, 2002. 50
[141] Y. V. Borovskikh and V. S. Korolyuk. Martingale Approximation. VSP BV, The Netherlands,
1997. 50
[142] M. J. Neely and R. Urgaonkar. Opportunism, backpressure, and stochastic optimization with
the wireless broadcast advantage. Asilomar Conference on Signals, Systems, and Computers,
Pacific Grove, CA, Oct. 2008. DOI: 10.1109/ACSSC.2008.5074815 70, 71, 179
[143] M. J. Neely and A. Sharma. Dynamic data compression with distortion constraints for wireless
transmission over a fading channel. arXiv:0807.3768v1, July 24, 2008. 70, 71, 84, 89, 179
[144] L. Huang and M. J. Neely. Max-weight achieves the exact [O(1/V ), O(V )] utility-delay
tradeoff under Markov dynamics. ArXiv Technical Report, arXiv:1008.0200, August 2010. 74,
77
[145] P. Billingsley. Probability and Measure, 2nd edition. New York: John Wiley & Sons,
1986. 76, 92
[146] M. J. Neely. Distributed and secure computation of convex programs over a network of
connected processors. DCDIS Conf., Guelph, Ontario, July 2005. 81
[147] L. Tassiulas and A. Ephremides. Throughput properties of a queueing network with dis-
tributed dynamic routing and flow control. Advances in Applied Probability, vol. 28, pp.
285-307, 1996. DOI: 10.2307/1427922 86
[148] Y. Wu, P. A. Chou, and S.-Y. Kung. Information exchange in wireless networks with network
coding and physical-layer broadcast. Conference on Information Sciences and Systems, Johns
Hopkins University, March 2005. 87
[149] E. Leonardi, M. Mellia, M. A. Marsan, and F. Neri. Optimal scheduling and routing for
maximizing network throughput. IEEE/ACM Transactions on Networking, vol. 15, no. 6,
Dec. 2007. DOI: 10.1109/TNET.2007.896486 104, 107
[150] Y. Li, A. Papachristodoulou, and M. Chiang. Stability of congestion control schemes with
delay sensitive traffic. Proc. IEEE ACC, Seattle, WA, June 2008.
DOI: 10.1109/ACC.2008.4586779 104, 108, 109
[151] J. K. MacKie-Mason and H. R. Varian. Pricing congestible network resources. IEEE Journal
on Selected Areas in Communications, vol. 13, no. 7, September 1995. DOI: 10.1109/49.414634
109
[152] M. J. Neely and E. Modiano. Convexity in queues with general inputs. IEEE Transactions on
Information Theory, vol. 51, no. 2, pp. 706-714, Feb. 2005. DOI: 10.1109/TIT.2004.840859
109
[153] M. J. Neely. Optimal pricing in a free market wireless network. Wireless Networks, vol. 15, no.
7, pp. 901-915, October 2009. DOI: 10.1007/s11276-007-0083-0 112, 179
[154] M. J. Neely and R. Urgaonkar. Optimal backpressure routing in wireless networks with
multi-receiver diversity. Ad Hoc Networks (Elsevier), vol. 7, no. 5, pp. 862-881, July 2009.
DOI: 10.1016/j.adhoc.2008.07.009 113, 132, 145, 147, 179
[156] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Non-convex optimization and rate control
for multi-class services in the internet. IEEE/ACM Trans. on Networking, vol. 13, no. 4, pp.
827-840, Aug. 2005. DOI: 10.1109/TNET.2005.852876 116
[157] M. Chiang. Nonconvex optimization of communication systems. Advances in Mechanics and
Mathematics, Special volume on Strang’s 70th Birthday, Springer, vol. 3, 2008. 116
[158] W.-H. Wang, M. Palaniswami, and S. H. Low. Application-oriented flow control: Funda-
mentals, algorithms, and fairness. IEEE/ACM Transactions on Networking, vol. 14, no. 6, Dec.
2006. DOI: 10.1109/TNET.2006.886318 116
[159] M. J. Neely, A. S. Tehrani, and A. G. Dimakis. Efficient algorithms for renewable energy
allocation to delay tolerant consumers. 1st IEEE International Conference on Smart Grid
Communications, 2010. 120, 122, 179
[160] L. Tassiulas and S. Sarkar. Maxmin fair scheduling in wireless ad hoc networks. IEEE
Journal on Selected Areas in Communications, Special Issue on Ad Hoc Networks, vol. 23, no. 1,
pp. 163-173, Jan. 2005. 128
[161] H. Shirani-Mehr, G. Caire, and M. J. Neely. MIMO downlink scheduling with non-perfect
channel state knowledge. IEEE Transactions on Communications, vol. 58, no. 7, pp. 2055-2066,
July 2010. DOI: 10.1109/TCOMM.2010.07.090377 129, 132
[162] M. Kobayashi, G. Caire, and D. Gesbert. Impact of multiple transmit antennas in a queued
SDMA/TDMA downlink. In Proc. of 6th IEEE Workshop on Signal Processing Advances in
Wireless Communications (SPAWC), June 2005. DOI: 10.1109/SPAWC.2005.1506198 132,
179
[163] C. Li and M. J. Neely. Energy-optimal scheduling with dynamic channel acquisition in
wireless downlinks. IEEE Transactions on Mobile Computing, vol. 9, no. 4, pp. 527-539, April
2010. DOI: 10.1109/TMC.2009.140 132
[164] A. Gopalan, C. Caramanis, and S. Shakkottai. On wireless scheduling with partial channel-
state information. Allerton Conf. on Comm., Control, and Computing, Sept. 2007. 132
[165] M. J. Neely. Dynamic data compression for wireless transmission over a fading channel. Proc.
Conference on Information Sciences and Systems (CISS), invited paper, Princeton, March 2008.
DOI: 10.1109/CISS.2008.4558703 132, 179
[166] M. J. Neely. Max weight learning algorithms with application to scheduling in unknown
environments. arXiv:0902.0630v1, Feb. 2009. 132, 162
[167] D. Shah and M. Kopikare. Delay bounds for approximate maximum weight match-
ing algorithms for input queued switches. Proc. IEEE INFOCOM, June 2002.
DOI: 10.1109/INFCOM.2002.1019350 140
[168] M. J. Neely, E. Modiano, and C. E. Rohrs. Tradeoffs in delay guarantees and computation
complexity for n × n packet switches. Proc. of Conf. on Information Sciences and Systems (CISS),
Princeton, March 2002. 140, 141
[169] L. Tassiulas. Linear complexity algorithms for maximum throughput in radio networks and input queued switches. Proc. IEEE INFOCOM, 1998. DOI: 10.1109/INFCOM.1998.665071
140, 141
[171] D. Shah, D. N. C. Tse, and J. N. Tsitsiklis. Hardness of low delay network scheduling. Under
submission. 141
[172] L. Jiang and J. Walrand. A distributed CSMA algorithm for throughput and utility maximization
in wireless networks. Proc. Allerton Conf. on Communication, Control, and Computing, Sept.
2008. DOI: 10.1109/ALLERTON.2008.4797741 141, 142, 144
[173] S. Rajagopalan and D. Shah. Reversible networks, distributed optimization, and network
scheduling: What do they have in common? Proc. Conf. on Information Sciences and Systems
(CISS), 2008. 141, 144
[174] T. M. Cover and J. A. Thomas. Elements of Information Theory. New York: John Wiley &
Sons, Inc., 1991. DOI: 10.1002/0471200611 143
[175] L. Jiang and J. Walrand. Scheduling and congestion control for wireless and processing
networks. Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1-156, 2010.
DOI: 10.2200/S00270ED1V01Y201008CNT006 144, 179
[176] L. Jiang and J. Walrand. Convergence and stability of a distributed CSMA algorithm for maximal
network throughput. Proc. IEEE Conference on Decision and Control (CDC), Shanghai, China,
December 2009. DOI: 10.1109/CDC.2009.5400349 144
[177] J. Ni, B. Tan, and R. Srikant. Q-CSMA: Queue-length-based CSMA/CA algorithms for achieving maximum throughput and low delay in wireless networks. ArXiv Technical Report:
arXiv:0901.2333v4, Dec. 2009. 144
[178] G. Louth, M. Mitzenmacher, and F. Kelly. Computational complexity of loss networks. The-
oretical Computer Science, vol. 125, pp. 45-59, 1994. DOI: 10.1016/0304-3975(94)90216-X
144
[179] J. Ni and S. Tatikonda. A factor graph modelling of product-form loss and queueing net-
works. 43rd Allerton Conference on Communication, Control, and Computing (Monticello, IL),
September 2005. 144
[180] M. Luby and E. Vigoda. Fast convergence of the Glauber dynamics for sampling independent
sets: Part I. International Computer Science Institute, Berkeley, CA, Technical Report TR-99-002,
Jan. 1999.
DOI: 10.1002/(SICI)1098-2418(199910/12)15:3/4%3C229::AID-RSA3%3E3.0.CO;2-X
144
[181] D. Randall and P. Tetali. Analyzing Glauber dynamics by comparison of Markov chains.
Lecture Notes in Computer Science, Proc. of the 3rd Latin American Symposium on Theoretical
Informatics, vol. 1380, pp. 292-304, 1998. DOI: 10.1063/1.533199 144
[182] L. Bui, A. Eryilmaz, R. Srikant, and X. Wu. Joint asynchronous congestion control and
distributed scheduling for multi-hop wireless networks. Proc. IEEE INFOCOM, 2006.
DOI: 10.1109/INFOCOM.2006.210 145
[183] D. Shah. Maximal matching scheduling is good enough. Proc. IEEE Globecom, Dec. 2003.
DOI: 10.1109/GLOCOM.2003.1258788 147
[184] P. Chaporkar, K. Kar, X. Luo, and S. Sarkar. Throughput and fairness guarantees through
maximal scheduling in wireless networks. IEEE Trans. on Information Theory, vol. 54, no. 2,
pp. 572-594, Feb. 2008. DOI: 10.1109/TIT.2007.913537 147
[185] X. Lin and N. B. Shroff. The impact of imperfect scheduling on cross-layer rate control in
wireless networks. Proc. IEEE INFOCOM, 2005. DOI: 10.1109/INFCOM.2005.1498460
147
[186] L. Lin, X. Lin, and N. B. Shroff. Low-complexity and distributed energy minimization in
multi-hop wireless networks. Proc. IEEE INFOCOM, 2007.
DOI: 10.1109/TNET.2009.2032419 147
[187] C. C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic program-
ming for queuing networks. Submitted for publication, 2008. 174
[188] T. Ho, M. Médard, J. Shi, M. Effros, and D. R. Karger. On randomized network coding.
Proc. of 41st Annual Allerton Conf. on Communication, Control, and Computing, Oct. 2003. 179
[189] A. Eryilmaz and D. S. Lun. Control for inter-session network coding. Proc. Information
Theory and Applications Workshop (ITA), Jan./Feb. 2007. 179
[190] X. Yan, M. J. Neely, and Z. Zhang. Multicasting in time varying wireless networks: Cross-
layer dynamic resource allocation. Proc. IEEE International Symposium on Information Theory
(ISIT), June 2007. DOI: 10.1109/ISIT.2007.4557630 179
[193] H. Shirani-Mehr, G. Caire, and M. J. Neely. MIMO downlink scheduling with non-perfect channel state knowledge. IEEE Transactions on Communications, to appear.
DOI: 10.1109/TCOMM.2010.07.090377 179
[194] E. M. Yeh and R. A. Berry. Throughput optimal control of cooperative relay networks. IEEE
Transactions on Information Theory: Special Issue on Models, Theory, and Codes for Relaying
and Cooperation in Communication Networks, vol. 53, no. 10, pp. 3827-3833, October 2007.
DOI: 10.1109/TIT.2007.904978 179
[195] L. Huang and M. J. Neely. The optimality of two prices: Maximizing revenue in a stochastic
communication system. IEEE/ACM Transactions on Networking, vol. 18, no. 2, pp. 406-419,
April 2010. DOI: 10.1109/TNET.2009.2028423 179
[196] L. Jiang and J. Walrand. Stable and utility-maximizing scheduling for stochastic pro-
cessing networks. Allerton Conference on Communication, Control, and Computing, 2009.
DOI: 10.1109/ALLERTON.2009.5394870 179
[197] M. J. Neely and L. Huang. Dynamic product assembly and inventory control for maximum
profit. Proc. IEEE Conf. on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 179
[198] M. J. Neely and L. Huang. Dynamic product assembly and inventory control for maximum
profit. ArXiv Technical Report, arXiv:1004.0479v1, April 2010. 179
[199] A. Warrier, S. Ha, P. Wason, I. Rhee, and J. H. Kim. DiffQ: Differential backlog congestion
control for wireless multi-hop networks. Conference on Sensor, Mesh and Ad Hoc Communi-
cations and Networks (SECON), San Francisco, US, 2008. DOI: 10.1109/SAHCN.2008.78
179
[200] A. Warrier, S. Janakiraman, S. Ha, and I. Rhee. DiffQ: Practical differential backlog congestion control for wireless networks. Proc. IEEE INFOCOM, Rio de Janeiro, Brazil, 2009.
DOI: 10.1109/INFCOM.2009.5061929 179
[201] A. Sridharan, S. Moeller, and B. Krishnamachari. Making distributed rate control using Lyapunov drifts a reality in wireless sensor networks. 6th Intl. Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), April 2008.
DOI: 10.4108/ICST.WIOPT2008.3205 179
[202] U. Akyol, M. Andrews, P. Gupta, J. Hobby, I. Saniee, and A. Stolyar. Joint schedul-
ing and congestion control in mobile ad-hoc networks. Proc. IEEE INFOCOM, 2008.
DOI: 10.1109/INFOCOM.2008.111 179
[203] B. Radunović, C. Gkantsidis, D. Gunawardena, and P. Key. Horizon: Balancing
TCP over multiple paths in wireless mesh network. Proc. ACM MobiCom, 2008.
DOI: 10.1145/1409944.1409973 179
Author’s Biography
MICHAEL J. NEELY
Michael J. Neely received B.S. degrees in both Electrical Engineering and Mathematics from the
University of Maryland, College Park, in 1997. He then received a 3 year Department of Defense
NDSEG Fellowship for graduate study at the Massachusetts Institute of Technology, where he
completed the M.S. degree in 1999 and the Ph.D. in 2003, both in Electrical Engineering. He
joined the faculty of Electrical Engineering at the University of Southern California in 2004, where
he is currently an Associate Professor. His research interests are in the areas of stochastic network
optimization and queueing theory, with applications to wireless networks, mobile ad-hoc networks,
and switching systems. Michael received the NSF CAREER award in 2008 and the Viterbi School of
Engineering Junior Research Award in 2009. He is a member of Tau Beta Pi and Phi Beta Kappa.