
Stochastic Network Optimization

with Application to
Communication and
Queueing Systems
Copyright © 2010 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in
printed reviews, without the prior permission of the publisher.

Stochastic Network Optimization with Application to Communication and Queueing Systems


Michael J. Neely
www.morganclaypool.com

ISBN: 9781608454556 paperback


ISBN: 9781608454563 ebook

DOI 10.2200/S00271ED1V01Y201006CNT007

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON COMMUNICATION NETWORKS

Lecture #7
Series Editor: Jean Walrand, University of California, Berkeley
Series ISSN: 1935-4185 (print), 1935-4193 (electronic)

This material is supported in part by one or more of the following: the DARPA IT-MANET program grant
W911NF-07-0028, the NSF CAREER grant CCF-0747525, and continuing through participation in the Network
Science Collaborative Technology Alliance sponsored by the U.S. Army Research Laboratory.
Synthesis Lectures on
Communication Networks
Editor
Jean Walrand, University of California, Berkeley
Synthesis Lectures on Communication Networks is an ongoing series of 50- to 100-page publications
on topics on the design, implementation, and management of communication networks. Each lecture is
a self-contained presentation of one topic by a leading expert. The topics range from algorithms to
hardware implementations and cover a broad spectrum of issues from security to multiple-access
protocols. The series addresses technologies from sensor networks to reconfigurable optical networks.
The series is designed to:

• Provide the best available presentations of important aspects of communication networks.

• Help engineers and advanced students keep up with recent developments in a rapidly evolving
technology.

• Facilitate the development of courses in this field.

Stochastic Network Optimization with Application to Communication and Queueing Systems
Michael J. Neely
2010

Scheduling and Congestion Control for Wireless and Processing Networks


Libin Jiang and Jean Walrand
2010

Performance Modeling of Communication Networks with Markov Chains


Jeonghoon Mo
2010

Communication Networks: A Concise Introduction


Jean Walrand and Shyam Parekh
2010

Path Problems in Networks


John S. Baras and George Theodorakopoulos
2010

Performance Modeling, Loss Networks, and Statistical Multiplexing


Ravi R. Mazumdar
2009

Network Simulation
Richard M. Fujimoto, Kalyan S. Perumalla, and George F. Riley
2006
Stochastic Network Optimization
with Application to
Communication and
Queueing Systems

Michael J. Neely
University of Southern California

SYNTHESIS LECTURES ON COMMUNICATION NETWORKS #7

Morgan & Claypool Publishers
ABSTRACT
This text presents a modern theory of analysis, control, and optimization for dynamic networks.
Mathematical techniques of Lyapunov drift and Lyapunov optimization are developed and shown
to enable constrained optimization of time averages in general stochastic systems. The focus is on
communication and queueing systems, including wireless networks with time-varying channels,
mobility, and randomly arriving traffic. A simple drift-plus-penalty framework is used to optimize
time averages such as throughput, throughput-utility, power, and distortion. Explicit performance-
delay tradeoffs are provided to illustrate the cost of approaching optimality. This theory is also
applicable to problems in operations research and economics, where energy-efficient and profit-
maximizing decisions must be made without knowing the future.
Topics in the text include the following:

• Queue stability theory

• Backpressure, max-weight, and virtual queue methods

• Primal-dual methods for non-convex stochastic utility maximization

• Universal scheduling theory for arbitrary sample paths

• Approximate and randomized scheduling theory

• Optimization of renewal systems and Markov decision systems

Detailed examples and numerous problem set questions are provided to reinforce the main
concepts.

KEYWORDS
dynamic scheduling, decision theory, wireless networks, Lyapunov optimization, congestion control,
fairness, network utility maximization, multi-hop, mobile networks, routing, backpressure,
max-weight, virtual queues

Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Example Opportunistic Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Example Problem 1: Minimizing Time Average Power Subject to
Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Example Problem 2: Maximizing Throughput Subject to Time
Average Power Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 Example Problem 3: Maximizing Throughput-Utility Subject to Time
Average Power Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 General Stochastic Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Lyapunov Drift and Lyapunov Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Differences from our Earlier Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Alternative Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 On General Markov Decision Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7 On Network Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7.1 Delay and Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.7.2 Optimal O(√V) and O(log(V)) delay tradeoffs . . . . . . . . . . . . . . . . . 9
1.7.3 Delay-optimal Algorithms for Symmetric Networks . . . . . . . . . . . . . . . . . . 10
1.7.4 Order-optimal Delay Scheduling and Queue Grouping . . . . . . . . . . . . . . . 10
1.7.5 Heavy Traffic and Decay Exponents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.7.6 Capacity and Delay Tradeoffs for Mobile Networks . . . . . . . . . . . . . . . . . . 11
1.8 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Introduction to Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1 Rate Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Stronger Forms of Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Randomized Scheduling for Rate Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 A 3-Queue, 2-Server Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3.2 A 2-Queue Opportunistic Scheduling Example . . . . . . . . . . . . . . . . . . . . . . 22
2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Dynamic Scheduling Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29


3.1 Scheduling for Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 The S-only Algorithm and max . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.2 Lyapunov Drift for Stable Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1.3 The “Min-Drift” or “Max-Weight” Algorithm . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.4 Iterated Expectations and Telescoping Sums . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.5 Simulation of the Max-Weight Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 Stability and Average Power Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2.1 Drift-Plus-Penalty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.2 Analysis of the Drift-Plus-Penalty Algorithm . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.3 Optimizing the Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.4 Simulations of the Drift-Plus-Penalty Algorithm . . . . . . . . . . . . . . . . . . . . 42
3.3 Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4 Optimizing Time Averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45


4.1 Lyapunov Drift and Lyapunov Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.1 Lyapunov Drift Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.2 Lyapunov Optimization Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1.3 Probability 1 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 General System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.2.1 Boundedness Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.3 Optimality via ω-only Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4 Virtual Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.5 The Min Drift-Plus-Penalty Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.5.1 Where are we Using the i.i.d. Assumptions? . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6.1 Dynamic Server Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6.2 Opportunistic Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.7 Variable V Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.8 Place-Holder Backlog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.9 Non-i.i.d. Models and Universal Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.9.1 Markov Modulated Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.9.2 Non-Ergodic Models and Arbitrary Sample Paths . . . . . . . . . . . . . . . . . . . . 77
4.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.11 Appendix 4.A — Proving Theorem 4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.11.1 The Region Γ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.11.2 Characterizing Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5 Optimizing Functions of Time Averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97


5.0.3 The Rectangle Constraint R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.0.4 Jensen’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.0.5 Auxiliary Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.1 Solving the Transformed Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2 A Flow-Based Network Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.1 Performance of the Flow-Based Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.2.2 Delayed Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2.3 Limitations of this Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3 Multi-Hop Queueing Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.3.1 Transmission Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.3.2 The Utility Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3.3 Multi-Hop Network Utility Maximization . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3.4 Backpressure-Based Routing and Resource Allocation . . . . . . . . . . . . . . . 113
5.4 General Optimization of Convex Functions of Time Averages . . . . . . . . . . . . . . 114
5.5 Non-Convex Stochastic Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.6 Worst Case Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.6.1 The ε-persistent service queue . . . . . . . . . . . . . . . . . . . . . . . . 122
5.6.2 The Drift-Plus-Penalty for Worst-Case Delay . . . . . . . . . . . . . . . . . . . . . . 123
5.6.3 Algorithm Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.7 Alternative Fairness Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

6 Approximate Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137


6.1 Time-Invariant Interference Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.1.1 Computing over Multiple Slots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.1.2 Randomized Searching for the Max-Weight Solution . . . . . . . . . . . . . . . . 140
6.1.3 The Jiang-Walrand Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.2 Multiplicative Factor Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

7 Optimization of Renewal Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149


7.1 The Renewal System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.1.1 The Optimization Goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.1.2 Optimality over i.i.d. algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.2 Drift-Plus-Penalty for Renewal Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.2.1 Alternate Formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.3 Minimizing the Drift-Plus-Penalty Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.3.1 The Bisection Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.3.2 Optimization over Pure Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.3.3 Caveat — Frames with Initial Information . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.4 Task Processing Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.5 Utility Optimization for Renewal Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
7.5.1 The Utility Optimal Algorithm for Renewal Systems . . . . . . . . . . . . . . . . 167
7.6 Dynamic Programming Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
7.6.1 Delay-Limited Transmission Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
7.6.2 Markov Decision Problem for Minimum Delay Scheduling . . . . . . . . . . . 171
7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

Author’s Biography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199


Preface
This text is written to teach the theory of Lyapunov drift and Lyapunov optimization for stochastic
network optimization. It assumes only that the reader is familiar with basic probability concepts
(such as expectations and the law of large numbers). Familiarity with Markov chains and with stan-
dard (non-stochastic) optimization is useful but not required. A variety of examples and simulation
results are given to illustrate the main concepts. Diverse problem set questions (several with
example solutions) are also given. These questions and examples were developed over several years
for use in the stochastic network optimization course taught by the author. They include topics
of wireless opportunistic scheduling, multi-hop routing, network coding for maximum throughput,
distortion-aware data compression, energy-constrained and delay-constrained queueing, dynamic
decision making for maximum profit, and more.
The Lyapunov theory for optimizing network time averages was described collectively in our
previous text (22). The current text is significantly different from (22). It has been reorganized with
many more examples to help the reader. This is done while still keeping all of the details for a
complete and self-contained exposition of the material. This text also provides many recent topics
not covered in (22), including:

• A more detailed development of queue stability theory (Chapter 2).

• Variable-V algorithms that provide exact optimality of time averages subject to a weaker form
of stability called “mean rate stability” (Section 4.7).

• Place-holder bits for delay improvement (Sections 3.2.4 and 4.8).

• Universal scheduling for non-ergodic sample paths (Section 4.9).

• Worst case delay bounds (Sections 5.6 and 7.6.1).

• Non-convex stochastic optimization (Section 5.5).

• Approximate scheduling and full throughput scheduling in interference networks via the Jiang-
Walrand theorem (Chapter 6).

• Optimization of renewal systems and Markov decision examples (Chapter 7).

• Treatment of problems with equality constraints and abstract set constraints (Section 5.4).
Finally, this text emphasizes the simplicity of the Lyapunov method, showing how all of the
results follow directly from four simple concepts: (i) telescoping sums, (ii) iterated expectations,
(iii) opportunistically minimizing an expectation, and (iv) Jensen’s inequality.

Michael J. Neely
September 2010

CHAPTER 1

Introduction
This text considers the analysis and control of stochastic networks, that is, networks with random
events, time variation, and uncertainty. Our focus is on communication and queueing systems.
Example applications include wireless mesh networks with opportunistic scheduling, cognitive radio
networks, ad-hoc mobile networks, internets with peer-to-peer communication, and sensor networks
with joint compression and transmission. The techniques are also applicable to stochastic systems
that arise in operations research, economics, transportation, and smart-grid energy distribution.
These problems can be formulated as problems that optimize the time averages of certain quantities
subject to time average constraints on other quantities, and they can be solved with a common
mathematical framework that is intimately connected to queueing theory.

1.1 EXAMPLE OPPORTUNISTIC SCHEDULING PROBLEM

Figure 1.1: The 2-user wireless system for the example of Section 1.1. Arrivals ak(t) enter queues Qk(t) and are transmitted to a common receiver at rates bk(t) determined by the power allocation p(t) and channel states Sk(t).

Here we provide a simple wireless example to illustrate how the theory for optimizing time
averages can be used. Consider a 2-user wireless uplink that operates in slotted time t ∈ {0, 1, 2, . . .}.
Every slot new data randomly arrives to each user for transmission to a common receiver. Let
(a1 (t), a2 (t)) be the vector of new arrivals on slot t, in units of bits. The data is stored in queues
Q1 (t) and Q2 (t) to await transmission (see Fig. 1.1). We assume the receiver coordinates network
decisions every slot.
Channel conditions are assumed to be constant for the duration of a slot, but they can change
from slot to slot. Let S (t) = (S1 (t), S2 (t)) denote the channel conditions between users and the
receiver on slot t. The channel conditions represent any information that affects the channel on slot t,
such as fading coefficients and/or noise ratios. We assume the network controller can observe S (t) at
the beginning of each slot t before making a transmission decision. This channel-aware scheduling
is called opportunistic scheduling. Every slot t, the network controller observes the current S (t)
and chooses a power allocation vector p(t) = (p1 (t), p2 (t)) within some set P of possible power
allocations. This decision, together with the current S (t), determines the transmission rate vector
(b1 (t), b2 (t)) for slot t, where bk (t) represents the transmission rate (in bits/slot) from user k ∈ {1, 2}
to the receiver on slot t. Specifically, we have general transmission rate functions b̂k (p(t), S (t)):

b1(t) = b̂1(p(t), S(t)),   b2(t) = b̂2(p(t), S(t))

The precise form of these functions depends on the modulation and coding strategies used for
transmission. The queueing dynamics are then:

Qk (t + 1) = max[Qk (t) − b̂k (p(t), S (t)), 0] + ak (t) ∀k ∈ {1, 2}, ∀t ∈ {0, 1, 2, . . .}

Several types of optimization problems can be considered for this simple system.
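To make the dynamics concrete, the queue update above can be simulated in a few lines. The sketch below is only an illustration: the toy rate function b_hat, the arrival and channel statistics, and the fixed power allocation are all assumptions made for this example, not taken from the text.

```python
import random

# Minimal sketch of the queue update
#   Q_k(t+1) = max[Q_k(t) - b_hat_k(p(t), S(t)), 0] + a_k(t)
# for the 2-user uplink. The rate function and the statistics of a(t)
# and S(t) below are illustrative assumptions only.

def b_hat(p, s):
    """Toy transmission rate: grows with power and channel quality."""
    return p * s

def step(Q, a, p, S):
    """Apply one slot of the queueing dynamics to both users."""
    return [max(Q[k] - b_hat(p[k], S[k]), 0.0) + a[k] for k in range(2)]

rng = random.Random(1)
Q = [0.0, 0.0]
for t in range(1000):
    a = [rng.randint(0, 2), rng.randint(0, 2)]            # random arrivals (bits)
    S = [rng.choice([0.5, 1.0]), rng.choice([0.5, 1.0])]  # channel states
    p = [2.0, 2.0]                                        # a fixed, non-optimized power choice
    Q = step(Q, a, p, S)

print(Q)  # backlogs after 1000 slots; always non-negative
```

Opportunistic scheduling replaces the fixed p above with a channel-aware choice p(t) ∈ P made each slot after observing S(t).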

1.1.1 EXAMPLE PROBLEM 1: MINIMIZING TIME AVERAGE POWER SUBJECT TO STABILITY
Let p̄k be the time average power expenditure of user k under a particular power allocation algorithm
(for k ∈ {1, 2}):

p̄k = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} pk(τ)

The problem of designing an algorithm to minimize time average power expenditure subject to
queue stability can be written mathematically as:

Minimize: p̄1 + p̄2
Subject to: 1) Queues Qk(t) are stable ∀k ∈ {1, 2}
2) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}

where queue stability is defined in the next chapter. It is shown in the next chapter that queue
stability ensures the time average output rate of the queue is equal to the time average input rate.
Our theory will allow the design of a simple algorithm that makes decisions p(t) ∈ P every slot
t, without requiring a priori knowledge of the probabilities associated with the arrival and channel
processes a(t) and S (t). The algorithm meets all desired constraints in the above problem whenever
it is possible to do so. Further, the algorithm is parameterized by a constant V ≥ 0 that can be
chosen as desired to yield time average power within O(1/V ) from the minimum possible time
average power required for queue stability. Choosing a large value of V can thus push average power
arbitrarily close to optimal. However, this comes with a tradeoff in average queue backlog and delay
that is O(V ).
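The O(1/V) power versus O(V) backlog tradeoff can be observed in a toy single-queue simulation. The sketch below is not the algorithm of this text verbatim; it applies the drift-plus-penalty idea (each slot, choose power to minimize V·p(t) − Q(t)·b̂(p(t), S(t))) to an assumed Bernoulli arrival process and two-state channel.

```python
import random

# Toy drift-plus-penalty sketch for one queue with on/off power p in {0, 1}.
# Each slot, power is chosen to minimize V*p - Q*b_hat(p, S), so larger V
# weights average power more heavily relative to backlog.
# The channel model and rate function are assumptions for illustration.

def b_hat(p, s):
    return p * s  # toy rate function

def run(V, slots=20000, seed=0):
    rng = random.Random(seed)
    Q, power_sum, backlog_sum = 0.0, 0.0, 0.0
    for _ in range(slots):
        a = rng.choice([0, 1])          # arrivals, mean 0.5 bits/slot
        S = rng.choice([1.0, 2.0])      # channel state for this slot
        # Greedy drift-plus-penalty choice over p in {0, 1}:
        p = min([0.0, 1.0], key=lambda x: V * x - Q * b_hat(x, S))
        Q = max(Q - b_hat(p, S), 0.0) + a
        power_sum += p
        backlog_sum += Q
    return power_sum / slots, backlog_sum / slots

for V in [1, 10, 100]:
    avg_p, avg_q = run(V)
    print(f"V={V}: avg power={avg_p:.3f}, avg backlog={avg_q:.2f}")
```

In this toy model, increasing V pushes average power down toward its minimum while average backlog grows roughly linearly in V, mirroring the [O(1/V), O(V)] tradeoff stated above.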

1.1.2 EXAMPLE PROBLEM 2: MAXIMIZING THROUGHPUT SUBJECT TO TIME AVERAGE POWER CONSTRAINTS
Consider the same system, but now assume the arrival process a(t) = (a1 (t), a2 (t)) can be controlled
by a flow control mechanism. We thus have two decision vectors: p(t) (the power allocation vector)
and a(t) (the data admission vector). The admission vector a(t) is chosen within some set A every
slot t. Let āk be the time average admission rate (in bits/slot) for user k, which is the same as
the time average throughput of user k if its queue is stable (as shown in the next chapter). We
have the following problem of maximizing a weighted sum of throughput subject to average power
constraints:

Maximize: w1 ā1 + w2 ā2
Subject to: 1) p̄k ≤ pk,av ∀k ∈ {1, 2}
2) Queues Qk(t) are stable ∀k ∈ {1, 2}
3) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}
4) a(t) ∈ A ∀t ∈ {0, 1, 2, . . .}
where w1 , w2 are given positive weights that define the relative importance of user 1 traffic and user
2 traffic, and p1,av , p2,av are given constants that represent desired average power constraints for
each user. Again, our theory leads to an algorithm that meets all desired constraints and comes within
O(1/V ) of the maximum throughput possible under these constraints, with an O(V ) tradeoff in
average backlog and delay.

1.1.3 EXAMPLE PROBLEM 3: MAXIMIZING THROUGHPUT-UTILITY SUBJECT TO TIME AVERAGE POWER CONSTRAINTS
Consider the same system as Example Problem 2, but now assume the objective is to maximize
a concave function of throughput, rather than a linear function of throughput (the definition of
“concave” is given in footnote 1 in the next subsection). Specifically, let g1(a) and g2(a) be continuous,
concave, and non-decreasing functions of a over the range a ≥ 0. Such functions are called utility
functions. The value g1(ā1) represents the utility (or satisfaction) that user 1 gets by achieving a
throughput of ā1. Maximizing g1(ā1) + g2(ā2) can provide a more “fair” throughput vector (ā1, ā2).
Indeed, maximizing a linear function often yields a vector with one component that is very high and
the other component very low (possibly 0). We then have the problem:

Maximize: g1(ā1) + g2(ā2)
Subject to: 1) p̄k ≤ pk,av ∀k ∈ {1, 2}
2) Queues Qk(t) are stable ∀k ∈ {1, 2}
3) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}
4) a(t) ∈ A ∀t ∈ {0, 1, 2, . . .}

Typical utility functions are g1(a) = g2(a) = log(a), or g1(a) = g2(a) = log(1 + a). These
functions are non-decreasing and strictly concave, so that g1(ā1) has a diminishing returns property
with each incremental increase in throughput ā1. This means that if ā1 < ā2, the sum utility
g1(ā1) + g2(ā2) would be improved more by increasing ā1 than by increasing ā2. This creates a
more evenly distributed throughput vector. The log(a) utility functions provide a type of fairness
called proportional fairness (see (1)(2)). Fairness properties of different types of utility functions are
considered in (3)(4)(5)(6).
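The fairness effect of concave utilities can be checked on a toy rate region. The example below is purely illustrative: the capacity constraint r1 + 2·r2 ≤ 1 (user 2 costs twice as much per unit rate) is an invented assumption. A linear objective starves the expensive user, while log utilities split the resource in the proportionally fair way.

```python
import math

# Toy comparison of a linear throughput objective with log (proportional
# fairness) utilities. The constraint r1 + 2*r2 <= 1 is an assumption
# made for this sketch, not a model from the text.

def best(objective):
    """Grid search over r1; the remaining budget goes to user 2."""
    best_r, best_val = None, -math.inf
    for i in range(1, 1000):
        r1 = i / 1000.0
        r2 = (1.0 - r1) / 2.0  # user 2 pays double per unit rate
        val = objective(r1, r2)
        if val > best_val:
            best_r, best_val = (r1, r2), val
    return best_r

linear = best(lambda r1, r2: r1 + r2)
fair = best(lambda r1, r2: math.log(r1) + math.log(r2))

print("linear:", linear)  # near (1, 0): user 2 gets almost nothing
print("log   :", fair)    # near (0.5, 0.25): both users served
```

The log objective yields (ā1, ā2) ≈ (0.5, 0.25), the proportionally fair point, whereas the linear objective drives one component to the boundary, exactly the imbalance described above.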
For any given continuous and concave utility functions, our theory enables the design of
an algorithm that meets all desired constraints and provides throughput-utility within O(1/V ) of
optimality, with a tradeoff in average backlog and delay that is O(V ).
We emphasize that these three problems are just examples. The general theory can treat
many more types of networks. Indeed, the examples and problem set questions provided in this text
include networks with probabilistic channel errors, network coding, data compression, multi-hop
communication, and mobility. The theory is also useful for problems within operations research and
economics.

1.2 GENERAL STOCHASTIC OPTIMIZATION PROBLEMS

The three example problems considered in the previous section all involved optimizing a time
average (or a function of time averages) subject to time average constraints. Here we state the
general problems of this type. Consider a stochastic network that operates in discrete time with
unit time slots t ∈ {0, 1, 2, . . .}. The network is described by a collection of queue backlogs, written
in vector form Q(t) = (Q1 (t), . . . , QK (t)), where K is a non-negative integer. The case K = 0
corresponds to a system without queues. Every slot t, a control action is taken, and this action affects
arrivals and departures of the queues and also creates a collection of real valued attribute vectors x(t),
y (t), e(t):

x(t) = (x1(t), . . . , xM(t))
y(t) = (y0(t), y1(t), . . . , yL(t))
e(t) = (e1(t), . . . , eJ(t))

for some non-negative integers M, L, J (used to distinguish between equality constraints and two
types of inequality constraints). The attributes can be positive or negative, and they represent penalties
or rewards associated with the network on slot t, such as power expenditures, distortions, or packet
drops/admissions. These attributes are given by general functions:

xm(t) = x̂m(α(t), ω(t)) ∀m ∈ {1, . . . , M}
yl(t) = ŷl(α(t), ω(t)) ∀l ∈ {0, 1, . . . , L}
ej(t) = êj(α(t), ω(t)) ∀j ∈ {1, . . . , J}

where ω(t) is a random event observed on slot t (such as new packet arrivals or channel conditions)
and α(t) is the control action taken on slot t (such as packet admissions or transmissions). The action
α(t) is chosen within an abstract set Aω(t) that possibly depends on ω(t). Let x̄m, ȳl, ēj represent
the time averages of xm(t), yl(t), ej(t) under a particular control algorithm. Our first objective is to
design an algorithm that solves the following problem:

Minimize: ȳ0 (1.1)
Subject to: 1) ȳl ≤ 0 for all l ∈ {1, . . . , L} (1.2)
2) ēj = 0 for all j ∈ {1, . . . , J} (1.3)
3) α(t) ∈ Aω(t) ∀t (1.4)
4) Stability of all Network Queues (1.5)

Our second objective, more general than the first, is to optimize convex functions of time
averages.1 Specifically, let f(x), g1(x), . . . , gL(x) be convex functions from RM to R, and let X
be a closed and convex subset of RM. Let x̄ = (x̄1, . . . , x̄M) be the vector of time averages of the
xm(t) attributes under a given control algorithm. We desire a solution to the following problem:

Minimize: ȳ0 + f(x̄) (1.6)
Subject to: 1) ȳl + gl(x̄) ≤ 0 for all l ∈ {1, . . . , L} (1.7)
2) ēj = 0 for all j ∈ {1, . . . , J} (1.8)
3) x̄ ∈ X (1.9)
4) α(t) ∈ Aω(t) ∀t (1.10)
5) Stability of all Network Queues (1.11)
These problems (1.1)-(1.5) and (1.6)-(1.11) can be viewed as stochastic programs, and are
analogues of the classic linear programs and convex programs of static optimization theory. A solution
is an algorithm for choosing control actions over time in reaction to the existing network state, such
that all of the constraints are satisfied and the quantity to be minimized is as small as possible. These
problems have wide applications, and they are of interest even when there is no underlying queueing
network to be stabilized (so that the “Stability” constraints in (1.5) and (1.11) are removed). However,
it turns out that queueing theory plays a central role in this type of stochastic optimization. Indeed,
even if there are no underlying queues in the original problem, we can introduce virtual queues as
a strong method for ensuring that the required time average constraints are satisfied. Inefficient
control actions incur larger backlog in certain queues. These backlogs act as “sufficient statistics” on
which to base the next control decision. This enables algorithms that do not require knowledge of
the probabilities associated with the random network events ω(t).
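To make the virtual queue idea concrete, here is a minimal sketch (our own illustration with made-up numbers; the formal construction appears in later chapters). To enforce a time average constraint ȳ ≤ 0, maintain a virtual queue Z(t) with update Z(t+1) = max[Z(t) + y(t), 0]; if Z(t)/t → 0, the time average of y(t) is at most zero.

```python
# Illustrative virtual queue (made-up numbers): enforce the time average
# constraint ybar <= 0, where y(t) = p(t) - 0.5 and p(t) is power spent.
# Virtual queue update: Z(t+1) = max[Z(t) + y(t), 0].

Z = 0.0
T = 10000
for t in range(T):
    # Queue-aware rule (our own choice): spend power only while the
    # virtual queue backlog is below a threshold.
    p = 1.0 if Z < 1.0 else 0.0
    y = p - 0.5              # per-slot constraint attribute
    Z = max(Z + y, 0.0)      # virtual queue update

print(Z / T)  # near zero: Z(t)/t -> 0 implies the constraint is met
```

Inefficient actions inflate Z(t), and a large Z(t) then suppresses further power expenditure; this is exactly the "sufficient statistic" role that backlogs play in the paragraph above.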

1.3 LYAPUNOV DRIFT AND LYAPUNOV OPTIMIZATION
We solve the problems described above with a simple and elegant theory of Lyapunov drift and
Lyapunov optimization. While this theory is presented in detail in future chapters, we briefly describe
it here. The first step is to look at the constraints of the problem to be solved. For example, for the
¹A set 𝒳 ⊆ ℝ^M is convex if the line segment formed by any two points in 𝒳 is also in 𝒳. A function f(x) defined over a convex
set 𝒳 is a convex function if for any two points x_1, x_2 ∈ 𝒳 and any two probabilities p_1, p_2 ≥ 0 such that p_1 + p_2 = 1, we
have f(p_1 x_1 + p_2 x_2) ≤ p_1 f(x_1) + p_2 f(x_2). A function f(x) is concave if −f(x) is convex. A function f(x) is affine if it
is linear plus a constant, having the form f(x) = c_0 + Σ_{m=1}^{M} c_m x_m.
problem (1.1)-(1.5), the constraints are (1.2)-(1.5). Then construct virtual queues (in a way to be
specified) that help to meet the desired constraints. Next, define a function L(t) as the sum of
squares of backlog in all virtual and actual queues on slot t. This is called a Lyapunov function, and
it is a scalar measure of network congestion. Intuitively, if L(t) is “small,” then all queues are small,
and if L(t) is “large,” then at least one queue is large. Define Δ(t) = L(t + 1) − L(t), being the
difference in the Lyapunov function from one slot to the next.² If control decisions are made every
slot t to greedily minimize Δ(t), then backlogs are consistently pushed towards a lower congestion
state, which intuitively maintains network stability (where “stability” is precisely defined in the next
chapter).
Minimizing Δ(t) every slot is called minimizing the Lyapunov drift. Chapter 3 shows this
method provides queue stability for a particular example network, and Chapter 4 shows it also
stabilizes general networks. However, at this point, the problem is only half solved: The virtual
queues and Lyapunov drift help only to ensure the desired time average constraints are met. The
objective function to be minimized has not yet been incorporated. For example, y_0(t) is the objective
function for the problem (1.1)-(1.5). The objective function is mapped to an appropriate function
penalty(t). Instead of taking actions to greedily minimize Δ(t), actions are taken every slot t to
greedily minimize the following drift-plus-penalty expression:

    Δ(t) + V × penalty(t)

where V is a non-negative control parameter that is chosen as desired. Choosing V = 0 corresponds
to the original algorithm of minimizing the drift alone. Choosing V > 0 includes the weighted
penalty term in the control decision and allows a smooth tradeoff between backlog reduction and
penalty minimization. We show that the time average objective function deviates by at most O(1/V)
from optimality, with a time average queue backlog bound of O(V).
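As a toy illustration of the drift-plus-penalty rule (a sketch with parameters we made up, not an algorithm from the text): for a single queue with Bernoulli arrivals, a binary transmit decision b(t) ∈ {0, 1}, and penalty(t) equal to the power spent, minimizing a standard quadratic-Lyapunov bound on Δ(t) + V × penalty(t) reduces to a threshold rule: transmit if and only if Q(t) > V.

```python
# Toy drift-plus-penalty sketch (parameters are ours, for illustration).
# Single queue, Bernoulli(0.3) arrivals, service b(t) in {0, 1}, and
# penalty(t) = b(t) (power). With L(t) = Q(t)^2 / 2, a standard bound on
# Delta(t) + V*penalty(t) has b-dependent part (V - Q(t)) * b(t), so the
# minimizing action is: transmit iff Q(t) > V.
import random

random.seed(1)
V = 20.0
Q = 0.0
total_power = 0.0
T = 200000
for t in range(T):
    a = 1.0 if random.random() < 0.3 else 0.0
    b = 1.0 if Q > V else 0.0        # drift-plus-penalty decision
    total_power += b
    Q = max(Q - b, 0.0) + a

avg_power = total_power / T
print(avg_power, Q)  # power near the minimum 0.3; backlog on the order of V
```

Increasing V pushes average power closer to the minimum while the backlog hovers near the threshold V, which is the [O(1/V), O(V)] tradeoff in miniature.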
While Lyapunov techniques have a long history in the field of control theory, this form
of Lyapunov drift was perhaps first used to construct stable routing and scheduling policies for
queueing networks in the pioneering works (7)(8) by Tassiulas and Ephremides. These works used the
technique of minimizing Δ(t) every slot, resulting in backpressure routing and max-weight scheduling
algorithms that stabilize the network whenever possible. The algorithms are particularly interesting
because they only require knowledge of the current network state, and they do not require knowledge
of the probabilities associated with future random events. Minimizing Δ(t) has had wide success
for stabilizing many other types of networks, including packet switch networks (9)(10)(11), wireless
systems (7)(8)(12)(13)(14), and ad-hoc mobile networks (15). A related technique was used for
computing multi-commodity network flows in (16).
We introduced the V × penalty(t) term to the drift minimization in (17)(18)(19) to solve
problems of joint network stability and stochastic utility maximization, and we introduced the virtual
queue technique in (20)(21) to solve problems of maximizing throughput in a wireless network
²The notation used in later chapters is slightly different. Simplified notation is used here to give the main ideas.
subject to individual average power constraints at each node. Our previous text (22) unified these
ideas for application to general problems of the type described in Section 1.2.

1.4 DIFFERENCES FROM OUR EARLIER TEXT
The theory of Lyapunov drift and Lyapunov optimization is described collectively in our previous
text (22). The current text is different from (22) in that we emphasize the general optimization
problems first, showing how the problem (1.6)-(1.11) can be solved directly by using the solution
to the simpler problem (1.1)-(1.5). We also provide a variety of examples and problem set questions
to help the reader. These have been developed over several years for use in the stochastic network
optimization course taught by the author. This text also provides many new topics not covered in
(22), including:
• A more detailed development of queue stability theory (Chapter 2).

• Variable-V algorithms that provide exact optimality of time averages subject to a weaker form of stability called “mean rate stability” (Section 4.7).

• Place-holder bits for delay improvement (Sections 3.2.4 and 4.8).

• Universal scheduling for non-ergodic sample paths (Section 4.9).

• Worst case delay bounds (Sections 5.6 and 7.6.1).

• Non-convex stochastic optimization (Section 5.5).

• Approximate scheduling and full throughput scheduling in interference networks via the Jiang-Walrand theorem (Chapter 6).

• Optimization of renewal systems and Markov decision examples (Chapter 7).

• Treatment of problems with equality constraints (1.3) and abstract set constraints (1.9) (Section 5.4).
1.5 ALTERNATIVE APPROACHES
The relationship between network utility maximization, Lagrange multipliers, convex programming,
and duality theory is developed for static wireline networks in (2)(23)(24) and for wireless networks
in (25)(26)(27)(28)(29) where the goal is to converge to a static flow allocation and/or resource
allocation over the network. Scheduling in wireless networks with static channels is considered
from a duality perspective in (30)(31). Primal-dual techniques for maximizing utility in a stochastic
wireless downlink are developed in (32)(33) for systems without queues. The primal-dual technique
is extended in (34)(35) to treat networks with queues and to solve problems similar to (1.6)-(1.11)
in a fluid limit sense. Specifically, the work (34) shows the primal-dual technique leads to a fluid
limit with an optimal utility, and it conjectures that the utility of the actual network is close to this
fluid limit when an exponential averaging parameter is scaled. It makes a statement concerning weak
limits of scaled systems. A related primal-dual algorithm is used in (36) and shown to converge to
utility-optimality as a parameter is scaled.
Our drift-plus-penalty approach can be viewed as a dual-based approach to the stochastic
problem (rather than a primal-dual approach), and it reduces to the well known dual subgradient
algorithm for linear and convex programs when applied to non-stochastic problems (see (37)(22)(17)
for discussions on this). One advantage of the drift-plus-penalty approach is the explicit convergence
analysis and performance bounds, resulting in the [O(1/V), O(V)] performance-delay tradeoff. This
tradeoff is not shown in the alternative approaches described above. The dual approach is also robust
to non-ergodic variations and has “universal scheduling” properties, i.e., properties that hold for systems
with arbitrary sample paths, as shown in Section 4.9 (see also (38)(39)(40)(41)(42)). However,
one advantage of the primal-dual approach is that it provides local optimum guarantees for problems
of minimizing f(x) for non-convex functions f(·) (see Section 5.5 and (43)). Related dual-based
approaches are used for “infinitely backlogged” systems in (31)(44)(45)(46) using static optimization,
fluid limits, and stochastic gradients, respectively. Related algorithms for channel-aware scheduling
in wireless downlinks with different analytical techniques are developed in (47)(48)(49).
We note that the [O(1/V), O(V)] performance-delay tradeoff achieved by the drift-plus-penalty
algorithm on general systems is not necessarily the optimal tradeoff for particular networks.
An optimal [O(1/V), O(√V)] energy-delay tradeoff is shown by Berry and Gallager in (50) for a
single link with known channel statistics, and optimal performance-delay tradeoffs for multi-queue
systems are developed in (51)(52)(53) and shown to be achievable even when channel statistics are
unknown. This latter work builds on the Lyapunov optimization method, but it uses a more aggressive
drift steering technique. A place-holder technique for achieving near-optimal delay tradeoffs is
developed in (37) and related implementations are in (54)(55).

1.6 ON GENERAL MARKOV DECISION PROBLEMS
The penalties x̂_m(α(t), ω(t)), described in Section 1.2, depend only on the network control action
α(t) and the random event ω(t) (where ω(t) is generated by “nature” and is not influenced by
past control actions). In particular, the queue backlogs Q(t) are not included in the penalties. A
more advanced penalty structure would be x̂_m(α(t), ω(t), z(t)), where z(t) is a controlled Markov
chain (possibly related to the queue backlog) with transition probabilities that depend on control
actions. Extensions of Lyapunov optimization for this case are developed in Chapter 7 using a
drift-plus-penalty metric defined over renewal frames (56)(57)(58).
A related 2-timescale approach to learning optimal decisions in Markov decision problems is
developed in (59), and learning approaches to power-aware scheduling in single queues are developed
in (60)(61)(62)(63). Background on dynamic programming and Markov decision problems can be
found in (64)(65)(66), and approximate dynamic programming, neuro-dynamic programming, and
Q-learning theory can be found in (67)(68)(69). All of these approaches may suffer from large
convergence times, high complexity, or inaccurate approximation when applied to large networks.
This is due to the curse of dimensionality for Markov decision problems. This problem does not arise
when using the Lyapunov optimization technique and when penalties have the structure given in
Section 1.2.

1.7 ON NETWORK DELAY
This text develops general [O(1/V ), O(V )] tradeoffs, giving explicit bounds on average queue
backlog and delay that grow linearly with V . We also provide examples of exact delay analysis for
randomized algorithms (Exercises 2.6-2.10), delay-limited transmission (Exercises 5.13-5.14 and
Section 7.6.1), worst case delay (Section 5.6), and average delay constraints (Section 7.6.2). Further
work on delay-limited transmission is found in (70)(71), and Lyapunov drift algorithms that use
delays as weights, rather than queue backlogs, are considered in (72)(73)(74)(75)(76). There are
many additional interesting topics on network delay that we do not cover in this text. We briefly
discuss some of those topics in the following sub-sections, with references given for further reading.

1.7.1 DELAY AND DYNAMIC PROGRAMMING
Dynamic programming and Markov decision frameworks are considered for one-queue energy and
delay optimality problems in (77)(78)(79)(80)(81). One-queue problems with strict deadlines and
a-priori knowledge of future events are treated in (82)(83)(84)(85)(86), and filter theory is used to
establish delay bounds in (87). Control rules for two interacting service stations are given in (88).
Optimal scheduling in a finite buffer 2 × 2 packet switch is treated in (89).
Minimum energy problems with delay deadlines are considered for multi-queue wireless sys-
tems in (90). In the case when channels are static, the work (90) maps the problem to a shortest
path problem. In the case when channels are varying but rate-power functions are linear, (90) shows
the optimal multi-dimensional dynamic program has a very simple threshold structure. Heuristic
approximations are given for more general rate-power curves. Related work in (91) considers delay
optimal scheduling in multi-queue systems and derives structural results of the dynamic programs,
resulting in efficient approximation algorithms. These approximations are shown to have optimal
decay exponents for sum queue backlog in (92), which relies on techniques developed in (93) for
optimal max-queue exponents. A mixed Lyapunov optimization and dynamic programming approach
is given in (56) for networks with a small number of delay-constrained queues and an arbitrarily
large number of other queues that only require stability. Approximate dynamic programs and
q-learning type algorithms, which attempt to learn optimal decision strategies, are considered in
(61)(60)(56)(57)(62)(63).

1.7.2 OPTIMAL O(√V) AND O(log(V)) DELAY TRADEOFFS
The [O(1/V ), O(V )] performance-delay tradeoffs we derive for general networks in this text are
not necessarily the optimal tradeoffs for particular networks. The work (50) considers the optimal
energy-delay tradeoff for a one-queue wireless system with a fading channel. It shows that no
algorithm can do better than an [O(1/V), O(√V)] tradeoff, and it proposes a buffer-partitioning
algorithm that can be shown to come within a logarithmic factor of this tradeoff. This optimal
[O(1/V), O(√V)] tradeoff is extended to multi-queue systems in (51), and an algorithm with
an exponential Lyapunov function and aggressive drift steering is shown to meet this tradeoff to
within a logarithmic factor. The work (51) also shows an improved [O(1/V), O(log(V))] tradeoff
is achievable in certain exceptional cases with piecewise linear structure.
Optimal [O(1/V ), O(log(V ))] energy-delay tradeoffs are shown in (53) in cases when packet
dropping is allowed, and optimal [O(1/V ), O(log(V ))] utility-delay tradeoffs are shown for flow
control problems in (52). Near-optimal [O(1/V), O(log²(V))] tradeoffs are shown for the basic
quadratic Lyapunov drift-plus-penalty method in (37)(55) using place-holders and Last-In-First-Out
(LIFO) scheduling, described in more detail in Section 4.8, and related implementations are in (54).

1.7.3 DELAY-OPTIMAL ALGORITHMS FOR SYMMETRIC NETWORKS
The works (8)(94)(95)(96)(97) treat multi-queue wireless systems with “symmetry,” where arrival
rates and channel probabilities are the same for all queues. They use stochastic coupling theory to
prove delay optimality for particular algorithms. The work (8) proves delay optimality of the longest
connected queue first algorithm for ON/OFF channels with a single server, the work (94)(97) considers
multi-server systems, and the work (95)(96) considers wireless problems under the information
theoretic multi-access capacity region. Related work in (98) proves delay optimality of the join the
shortest queue strategy for routing packets to two queues with identical exponential service.

1.7.4 ORDER-OPTIMAL DELAY SCHEDULING AND QUEUE GROUPING
The work (99) shows that delay is at least linear in N for N × N packet switches that use
queue-unaware scheduling, and it develops a simple queue-aware scheduling algorithm that gives
O(log(N)) delay whenever rates are within the capacity region. Related work in (100) considers
scheduling in N-user wireless systems with ON/OFF channels and shows that delay is at least linear
in N if queue-unaware algorithms are used, but it can be made O(1) with a simple queue-aware
queue grouping algorithm. This O(1) delay, independent of the number of users, is called order
optimal because it differs from optimal only in a constant coefficient that does not depend on N. Order
optimality of the simple longest connected queue first rule (simpler than the algorithm of (100)) is
proven in (101) via a queue grouping analysis.
Order-optimal delay results for 1-hop switch scheduling under maximal scheduling (which provides
stability only when rates are within a constant factor of the capacity boundary) are developed in
(102)(103), again using queue grouping theory. In particular, it is shown that N × N packet switches
can provide O(1) delay (order-optimal) if they are at most half-loaded. The best known delay bound
beyond the half-loaded region is the O(log(N )) delay result of (99), and it is not known if it is possible
to achieve O(1) delay in this region. Time-correlated “bursty” traffic is considered in (103). The
queue grouping results in (101)(103) are inspired by queue-grouped Lyapunov functions developed
in (104)(105) for stability analysis.

1.7.5 HEAVY TRAFFIC AND DECAY EXPONENTS
A line of work addresses asymptotic delay optimality in a “heavy traffic” regime where input rates are
pushed very close to the capacity region boundary. Delay is often easier to understand in this heavy
traffic regime due to a phenomenon of state space collapse (106). Of course, delay grows to infinity
if input rates are pushed toward the capacity boundary, but the goal is to design an algorithm that
minimizes an asymptotic growth coefficient. Heavy traffic analysis is considered in (107) for wireless
scheduling and (108)(109) for packet switches.
The work (108)(109) suggests that delay in packet switches can be improved by changing the
well-known max-weight rule, which seeks to maximize a weighted sum of queue backlogs and service
rates every slot t (Σ_i Q_i(t)μ_i(t)), to an α-max weight rule that seeks to maximize Σ_i Q_i(t)^α μ_i(t),
where 0 < α ≤ 1. Simulations on N × N packet switches in (110) show that delay is improved
when α is positive but small. A discussion of this in the context of heavy traffic theory is given in
(111), along with some counterexamples. It is interesting to note that α-max weight policies with
small but positive α make matching decisions that are similar to the max-size matches used in the
frame-based algorithm of (99), which achieves O(log(N )) delay. This may be a reason why the delay
of α-max weight policies is also small. Large deviation theory is often used to analyze queue backlog
and delay, and this is considered for α-max weight policies in (112), for delay-based scheduling in
(73), and for processor sharing queues in (113)(114). Algorithms that optimize the exponent of
queue backlog are considered in (93) for optimizing the max-queue exponent and in (92) for the
sum-queue exponent. These consider analysis of queue backlog when the queue is very large. An
analysis of backlog distributions that are valid also in the small buffer regime is given in (115) for
the case when the number of network channels is scaled to infinity.

1.7.6 CAPACITY AND DELAY TRADEOFFS FOR MOBILE NETWORKS
Work by Gupta and Kumar in (116) shows that per-node capacity of ad-hoc wireless networks with
N nodes and with random source-destination pairings is roughly Θ(1/√N) (neglecting logarithmic
factors in N for simplicity). Grossglauser and Tse show in (117) that mobility increases per-node
capacity to Θ(1), which does not vanish with N. However, the algorithm in (117) uses a 2-hop relay
algorithm that creates a large delay. The exact capacity and average end-to-end delay are computed in
(118)(17) for a cell-partitioned network with a simplified i.i.d. mobility model. The work (118)(17)
also shows for this simple model that the average delay W of any scheduling and routing protocol,
possibly one that uses redundant packet transfers, must satisfy:

    W/λ ≥ (1 − log(2)) (N − d)/(4d)

where λ is the per-user throughput, C is the number of cells, d = N/C is the node/cell density, and
log(·) denotes the natural logarithm. Thus, if the node/cell density d = Θ(1), then W/λ ≥ Ω(N).
The 2-hop relay algorithm meets this bound with λ = Θ(1) and W = Θ(N), and a relay algorithm
that redundantly transmits packets over multiple paths meets this bound with λ = Θ(1/√N) and
W = Θ(√N). Similar i.i.d. mobility models are considered in (119)(120)(121). The work (119)
shows that improved tradeoffs are possible if the transmission radius of each node can be scaled to
include a large number of users in each transmission (so that the d = Θ(1) assumption is relaxed).
The work (120)(121) quantifies the optimal tradeoff achievable under this type of radius scaling,
and it also shows improved tradeoffs are possible if the model is changed to allow time slot scaling
and network bit-pipelining. Related delay tradeoffs via transmission radius scaling for non-mobile
networks are in (122). Analysis of non-i.i.d. mobility models is more complex and considered in
(123)(124)(122)(125).
(123)(124)(122)(125). Recent network coding approaches are in (126)(127)(128).

1.8 PRELIMINARIES
We assume the reader is comfortable with basic concepts of probability and random processes (such
as expectations, the law of large numbers, etc.) and with basic mathematical analysis. Familiarity
with queueing theory, Markov chains, and convex functions is useful but not required as we present
or derive results in these areas as needed in the text. For additional references on queueing theory
and Markov chains, including discussions of Little’s Theorem and the renewal-reward theorem,
see (129)(66)(130)(131)(132). For additional references on convex analysis, including discussions of
convex hulls, Caratheodory’s theorem, and Jensen’s inequality, see (133)(134)(135).
All of the major results of this text are derived directly from one or more of the following four
key concepts:

• Law of Telescoping Sums: For any function f(t) defined over integer times t ∈ {0, 1, 2, . . .}, we
have for any integer time t > 0:

    Σ_{τ=0}^{t−1} [f(τ + 1) − f(τ)] = f(t) − f(0)

The proof follows by a simple cancellation of terms. This is the main idea behind Lyapunov
drift arguments: Controlling the change in a function at every step allows one to control the
ending value of the function.
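A two-line numerical check of the telescoping identity (our own example function):

```python
# Quick check of the telescoping identity with an arbitrary example function.
def f(t):
    return t * t - 3 * t + 7

t_end = 50
lhs = sum(f(tau + 1) - f(tau) for tau in range(t_end))
rhs = f(t_end) - f(0)
print(lhs, rhs)  # both 2350
```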

• Law of Iterated Expectations: For any random variables X and Y, we have:³

    E{X} = E{E{X|Y}}

where the outer expectation is with respect to the distribution of Y, and the inner expectation
is with respect to the conditional distribution of X given Y.

³Strictly speaking, the law of iterated expectations holds whenever the result of Fubini’s Theorem holds (which allows one to
switch the order of integration in a double integral). This holds whenever any one of the following holds: (i) E{|X|} < ∞, (ii)
E{max[X, 0]} < ∞, (iii) E{min[X, 0]} > −∞.
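A small numerical check (with our own made-up joint distribution) that computing E{X} directly agrees with conditioning on Y first:

```python
# Check E{X} = E{E{X|Y}} on a small made-up discrete joint pmf over (x, y).
pmf = {(1, 0): 0.2, (2, 0): 0.3, (1, 1): 0.1, (4, 1): 0.4}

EX = sum(x * p for (x, y), p in pmf.items())   # direct expectation

E_iter = 0.0
for y0 in {y for (_, y) in pmf}:
    py = sum(p for (x, y), p in pmf.items() if y == y0)           # P(Y = y0)
    EX_given = sum(x * p for (x, y), p in pmf.items() if y == y0) / py
    E_iter += py * EX_given                    # outer expectation over Y

print(EX, E_iter)  # both 2.5 (up to floating point)
```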

• Opportunistically Minimizing an Expectation: Consider a game we play against nature, where
nature generates a random variable ω with some (possibly unknown) probability distribution.
We look at nature’s choice of ω and then choose a control action α within some action set A_ω
that possibly depends on ω. Let c(α, ω) represent a general cost function. Our goal is to design
a (possibly randomized) policy for choosing α ∈ A_ω to minimize the expectation E{c(α, ω)},
where the expectation is taken with respect to the distribution of ω and the distribution of
our action α that possibly depends on ω. Assume for simplicity that for any given outcome ω,
there is at least one action α_ω^min that minimizes the function c(α, ω) over all α ∈ A_ω. Then,
not surprisingly, the policy that minimizes E{c(α, ω)} is the one that observes ω and selects
a minimizing action α_ω^min.

This is easy to prove: If α_ω^* represents any random control action chosen in the set A_ω in
response to the observed ω, we have c(α_ω^min, ω) ≤ c(α_ω^*, ω). This is an inequality relationship
concerning the random variables ω, α_ω^min, α_ω^*. Taking expectations yields E{c(α_ω^min, ω)} ≤
E{c(α_ω^*, ω)}, showing that the expectation under the policy α_ω^min is less than or equal to the
expectation under any other policy. This is useful for designing drift minimizing algorithms.
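The following sketch (with a made-up cost function and action set) compares the opportunistic policy against the best fixed action:

```python
# Made-up cost game: observing omega before acting beats any fixed action.
import random

random.seed(2)
actions = [0, 1]                       # action set A (same for all omega here)
def c(alpha, omega):                   # illustrative cost function
    return (alpha - omega) ** 2

samples = [random.choice([0, 1]) for _ in range(100000)]  # nature's omega

# Opportunistic policy: observe omega, then pick the minimizing action.
opportunistic = sum(min(c(a, w) for a in actions) for w in samples) / len(samples)
# Best policy that must commit to one fixed action in advance.
best_fixed = min(sum(c(a, w) for w in samples) / len(samples) for a in actions)
print(opportunistic, best_fixed)  # 0.0 versus roughly 0.5
```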

• Jensen’s Inequality (not needed until Chapter 5): Let 𝒳 be a convex subset of ℝ^M (possibly being
the full space ℝ^M itself), and let f(x) be a convex function over 𝒳. Let X be any random
vector that takes values in 𝒳, and assume that E{X} is well defined and finite (where the
expectation is taken entrywise). Then:

    E{X} ∈ 𝒳 and f(E{X}) ≤ E{f(X)}    (1.12)
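A quick Monte Carlo check of the second part of (1.12) for the convex function f(x) = x² (our own illustration):

```python
# Monte Carlo check of f(E{X}) <= E{f(X)} for the convex f(x) = x**2.
import random

random.seed(3)
xs = [random.uniform(-1.0, 2.0) for _ in range(100000)]
mean_x = sum(xs) / len(xs)
f_of_mean = mean_x ** 2
mean_of_f = sum(x * x for x in xs) / len(xs)
print(f_of_mean, mean_of_f)  # left value is smaller, as Jensen requires
```

The gap between the two values is the sample variance of X, which is why equality holds only for degenerate (constant) random variables when f is strictly convex.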

This text also uses, in addition to regular limits of functions, the lim sup and lim inf. Using
(or not using) these limits does not impact any of the main ideas in this text, and readers who are
not familiar with these limits can replace all instances of “lim sup” and “lim inf” with regular limits
“lim,” without loss of rigor, under the additional assumption that the regular limit exists. For readers
interested in more details on this, note that a function f(t) may or may not have a well defined
limit as t → ∞ (consider, for example, a cosine function). We define lim sup_{t→∞} f(t) as the largest
possible limiting value of f(t) over any subsequence of times t_k that increase to infinity, and for
which the limit of f(t_k) exists. Likewise, lim inf_{t→∞} f(t) is the smallest possible limiting value. It
can be shown that these limits always exist (possibly being ∞ or −∞). For example, the lim sup and
that we use in this text are:

• If f(t), g(t) are functions that satisfy f(t) ≤ g(t) for all t, then lim sup_{t→∞} f(t) ≤
lim sup_{t→∞} g(t). Likewise, lim inf_{t→∞} f(t) ≤ lim inf_{t→∞} g(t).

• For any function f(t), we have lim inf_{t→∞} f(t) ≤ lim sup_{t→∞} f(t), with equality if and only
if the regular limit exists. Further, whenever the regular limit exists, we have lim inf_{t→∞} f(t) =
lim sup_{t→∞} f(t) = lim_{t→∞} f(t).

• For any function f(t), we have lim sup_{t→∞} f(t) = − lim inf_{t→∞}[−f(t)] and
lim inf_{t→∞} f(t) = − lim sup_{t→∞}[−f(t)].

• If f(t) and g(t) are functions such that lim_{t→∞} g(t) = g*, where g* is a finite constant, then
lim sup_{t→∞}[g(t) + f(t)] = g* + lim sup_{t→∞} f(t).
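The tail-supremum view of lim sup can be checked numerically (a finite-horizon approximation, our own illustration):

```python
# Finite-horizon look at lim sup: for f(t) = cos(t), the tail supremum
# sup{f(t) : t >= T} stays near 1 no matter how large T is, consistent
# with lim sup cos(t) = 1 even though lim cos(t) does not exist.
import math

def tail_sup(T, horizon=100000):
    return max(math.cos(t) for t in range(T, T + horizon))

sups = [tail_sup(T) for T in (0, 1000, 100000)]
print(sups)  # each entry is close to 1
```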
CHAPTER 2

Introduction to Queues
Let Q(t) represent the contents of a single-server discrete time queueing system defined over integer
time slots t ∈ {0, 1, 2, . . .}. Specifically, the initial state Q(0) is assumed to be a non-negative real
valued random variable. Future states are driven by stochastic arrival and server processes a(t) and
b(t) according to the following dynamic equation:

Q(t + 1) = max[Q(t) − b(t), 0] + a(t) for t ∈ {0, 1, 2, . . .} (2.1)

We call Q(t) the backlog on slot t, as it can represent an amount of work that needs to be done. The
stochastic processes {a(t)}_{t=0}^∞ and {b(t)}_{t=0}^∞ are sequences of real valued random variables defined
over slots t ∈ {0, 1, 2, . . .}.
The value of a(t) represents the amount of new work that arrives on slot t, and it is assumed
to be non-negative. The value of b(t) represents the amount of work the server of the queue can
process on slot t. For most physical queueing systems, b(t) is assumed to be non-negative, although
it is sometimes convenient to allow b(t) to take negative values. This is useful for the virtual queues
defined in future sections where b(t) can be interpreted as a (possibly negative) attribute.¹ Because
we assume Q(0) ≥ 0 and a(t) ≥ 0 for all slots t, it is clear from (2.1) that Q(t) ≥ 0 for all slots t.
The units of Q(t), a(t), and b(t) depend on the context of the system. For example, in a
communication system with fixed size data units, these quantities might be integers with units of
packets. Alternatively, they might be real numbers with units of bits, kilobits, or some other unit of
unfinished work relevant to the system.
We can equivalently re-write the dynamics (2.1) without the non-linear max[·, 0] operator as
follows:
Q(t + 1) = Q(t) − b̃(t) + a(t) for t ∈ {0, 1, 2, . . .} (2.2)

where b̃(t) is the actual work processed on slot t (which may be less than the offered amount b(t)
if there is little or no backlog in the system on slot t). Specifically, b̃(t) is mathematically defined:

    b̃(t) ≜ min[b(t), Q(t)]

¹Assuming that the b(t) value in (2.1) is possibly negative also allows treatment of modified queueing models that place new
arrivals inside the max[·, 0] operator. For example, a queue with dynamics Q̂(t + 1) = max[Q̂(t) − β(t) + α(t), 0] is the same
as (2.1) with a(t) = 0 and b(t) = β(t) − α(t) for all t. Leaving a(t) outside the max[·, 0] is crucial for treatment of multi-hop
networks, where a(t) can be a sum of exogenous and endogenous arrivals.
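A short script (our own sanity check) confirming that the max[·, 0] form (2.1) and the linear form (2.2) with b̃(t) = min[b(t), Q(t)] produce identical sample paths:

```python
# Sanity check: dynamics (2.1) with max[.,0] match the linear form (2.2)
# when b_tilde(t) = min[b(t), Q(t)], on a random sample path.
import random

random.seed(4)
Q1 = Q2 = 3.0
for t in range(1000):
    a = random.uniform(0.0, 2.0)      # non-negative arrivals
    b = random.uniform(-1.0, 2.0)     # server process, possibly negative
    b_tilde = min(b, Q2)              # actual work processed this slot
    Q1 = max(Q1 - b, 0.0) + a         # dynamics (2.1)
    Q2 = Q2 - b_tilde + a             # dynamics (2.2)
    assert abs(Q1 - Q2) < 1e-9        # identical at every slot

print(Q1, Q2)
```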
Note by definition that b̃(t) ≤ b(t) for all t. The dynamic equation (2.2) yields a simple but important
property for all sample paths, described in the following lemma.

Lemma 2.1 (Sample Path Property) For any discrete time queueing system described by (2.1), and for
any two slots t_1 and t_2 such that 0 ≤ t_1 < t_2, we have:

    Q(t_2) − Q(t_1) = Σ_{τ=t_1}^{t_2−1} a(τ) − Σ_{τ=t_1}^{t_2−1} b̃(τ)    (2.3)

Therefore, for any t > 0, we have:

    Q(t)/t − Q(0)/t = (1/t) Σ_{τ=0}^{t−1} a(τ) − (1/t) Σ_{τ=0}^{t−1} b̃(τ)    (2.4)

    Q(t)/t − Q(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a(τ) − (1/t) Σ_{τ=0}^{t−1} b(τ)    (2.5)

Proof. By (2.2), we have for any slot τ ≥ 0:

    Q(τ + 1) − Q(τ) = a(τ) − b̃(τ)

Summing the above over τ ∈ {t_1, . . . , t_2 − 1} and using the law of telescoping sums yields:

    Q(t_2) − Q(t_1) = Σ_{τ=t_1}^{t_2−1} a(τ) − Σ_{τ=t_1}^{t_2−1} b̃(τ)

This proves (2.3). Equation (2.4) follows by substituting t_1 = 0, t_2 = t, and dividing by t. Inequality
(2.5) follows because b̃(τ) ≤ b(τ) for all τ. □

An important application of Lemma 2.1 to power-aware systems is treated in Exercise 2.11.

The equality (2.4) is illuminating. It shows that lim_{t→∞} Q(t)/t = 0 if and only if the time average
of the process a(t) − b̃(t) is zero (where the time average of a(t) − b̃(t) is the limit of the right-hand
side of (2.4)). This happens when the time average rate of arrivals a(t) is equal to the time
average rate of actual departures b̃(t). This motivates the definitions of rate stability and mean rate
stability, defined in the next section.
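The identity (2.4) can be verified on a simulated sample path (our own illustration):

```python
# Numerical check of identity (2.4) on a simulated sample path.
import random

random.seed(5)
Q0 = 5.0
Q = Q0
sum_a = 0.0
sum_b_tilde = 0.0
T = 5000
for t in range(T):
    a = random.uniform(0.0, 1.0)
    b = random.uniform(0.0, 1.5)
    b_tilde = min(b, Q)              # actual departures this slot
    sum_a += a
    sum_b_tilde += b_tilde
    Q = Q - b_tilde + a              # dynamics (2.2)

lhs = Q / T - Q0 / T
rhs = sum_a / T - sum_b_tilde / T
print(lhs, rhs)  # equal up to floating point roundoff
```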

2.1 RATE STABILITY
Let Q(t) be a real valued stochastic process that evolves in discrete time over slots t ∈ {0, 1, 2, . . .}
according to some probability law.

Definition 2.2 A discrete time process Q(t) is rate stable if:

    lim_{t→∞} Q(t)/t = 0  with probability 1

Definition 2.3 A discrete time process Q(t) is mean rate stable if:

    lim_{t→∞} E{|Q(t)|}/t = 0

We use an absolute value of Q(t) in the mean rate stability definition, even though our queue
in (2.1) is non-negative, because later it will be useful to define mean rate stability for virtual queues
that can be possibly negative.

Theorem 2.4 (Rate Stability Theorem) Suppose Q(t) evolves according to (2.1), with a(t) ≥ 0 for all
t, and with b(t) real valued (and possibly negative) for all t. Suppose that the time averages of the processes
a(t) and b(t) converge with probability 1 to finite constants a^av and b^av, so that:

    lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} a(τ) = a^av  with probability 1    (2.6)

    lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} b(τ) = b^av  with probability 1    (2.7)

Then:
(a) Q(t) is rate stable if and only if a^av ≤ b^av.
(b) If a^av > b^av, then:

    lim_{t→∞} Q(t)/t = a^av − b^av  with probability 1

(c) Suppose there are finite constants ε > 0 and C > 0 such that E{[a(t) + b^−(t)]^{1+ε}} ≤ C for
all t, where b^−(t) ≜ −min[b(t), 0]. Then Q(t) is mean rate stable if and only if a^av ≤ b^av.
Proof. Here we prove only the necessary condition of part (a). Suppose that Q(t) is rate stable, so
that Q(t)/t → 0 with probability 1. Because (2.5) holds for all slots t > 0, we can take limits in
(2.5) as t → ∞ and use (2.6)-(2.7) to conclude that 0 ≥ a av − bav . Thus, a av ≤ bav is necessary for
rate stability. The proof for sufficiency in part (a) and the proof of part (b) are developed in Exercises
2.3 and 2.4. The proof of part (c) is more complex and is omitted (see (136)). □
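The two cases of Theorem 2.4 can be checked numerically. Below is a minimal sketch, assuming the queue update Q(t+1) = max[Q(t) − b(t), 0] + a(t) of (2.1); the Bernoulli rates, horizon, and seed are arbitrary illustrative choices:

```python
import random

def simulate_queue(arrival, service, T):
    """Run Q(t+1) = max[Q(t) - b(t), 0] + a(t) for T slots; return Q(T)/T."""
    Q = 0.0
    for t in range(T):
        Q = max(Q - service(t), 0.0) + arrival(t)
    return Q / T

random.seed(1)
T = 200_000
# Case aav <= bav: Bernoulli(0.5) arrivals, Bernoulli(0.7) service, so Q(t)/t -> 0.
r1 = simulate_queue(lambda t: random.random() < 0.5,
                    lambda t: random.random() < 0.7, T)
# Case aav > bav: rates 0.9 vs 0.6, so Theorem 2.4(b) predicts Q(t)/t -> 0.3.
r2 = simulate_queue(lambda t: random.random() < 0.9,
                    lambda t: random.random() < 0.6, T)
print(round(r1, 2), round(r2, 2))
```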
The following theorem presents a more general necessary condition for rate stability that does
not require the arrival and server processes to have well defined limits.

Theorem 2.5 (Necessary Condition for Rate Stability) Suppose Q(t) evolves according to (2.1), with
any general processes a(t) and b(t) such that a(t) ≥ 0 for all t. Then:
(a) If Q(t) is rate stable, then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} [a(τ) − b(τ)] ≤ 0   with probability 1   (2.8)

(b) If Q(t) is mean rate stable and if E {Q(0)} < ∞, then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{a(τ) − b(τ)} ≤ 0   (2.9)

Proof. The proof of (a) follows immediately by taking a lim sup of both sides of (2.5) and noting
that Q(t)/t → 0 because Q(t) is rate stable. The proof of (b) follows by first taking an expectation
of (2.5) and then taking limits. □

2.2 STRONGER FORMS OF STABILITY


Rate stability and mean rate stability only describe the long term average rate of arrivals and depar-
tures from the queue, and do not say anything about the fraction of time the queue backlog exceeds
a certain value, or about the time average expected backlog. The stronger stability definitions given
below are thus useful.

Definition 2.6 A discrete time process Q(t) is steady state stable if:

lim_{M→∞} g(M) = 0

where for each M ≥ 0, g(M) is defined:

g(M) ≜ lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Pr[|Q(τ)| > M]   (2.10)

Definition 2.7 A discrete time process Q(t) is strongly stable if:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{|Q(τ)|} < ∞   (2.11)

Under mild boundedness assumptions, strong stability implies all of the other forms of stability,
as specified in Theorem 2.8 below.

Theorem 2.8 (Strong Stability Theorem) Suppose Q(t) evolves according to (2.1) for some general stochastic processes {a(t)}_{t=0}^∞ and {b(t)}_{t=0}^∞, where a(t) ≥ 0 for all t, and b(t) is real valued for all t. Suppose Q(t) is strongly stable. Then:
(a) Q(t) is steady state stable.
(b) If there is a finite constant C such that either a(t) + b−(t) ≤ C with probability 1 for all t (where b−(t) ≜ −min[b(t), 0]), or b(t) − a(t) ≤ C with probability 1 for all t, then Q(t) is rate stable, so that Q(t)/t → 0 with probability 1.
(c) If there is a finite constant C such that either E{a(t) + b−(t)} ≤ C for all t, or E{b(t) − a(t)} ≤ C for all t, then Q(t) is mean rate stable.

Proof. Part (a) is given in Exercise 2.5. Parts (b) and (c) are omitted (see (136)). □

Readers familiar with discrete time Markov chains (DTMCs) may be interested in the fol-
lowing connection: For processes Q(t) defined over an ergodic DTMC with a finite or countably
infinite state space and with the property that, for each real value M, the event {|Q(t)| ≤ M} corre-
sponds to only a finite number of states, steady state stability implies the existence of a steady state
distribution, and strong stability implies finite average backlog and (by Little’s theorem (129)) finite
average delay.

2.3 RANDOMIZED SCHEDULING FOR RATE STABILITY


The Rate Stability Theorem (Theorem 2.4) suggests the following simple method for stabilizing a
multi-queue network: Make scheduling decisions so that the time average service and arrival rates
are well defined and satisfy aiav ≤ biav for each queue i. This method typically requires perfect
knowledge of the arrival and channel probabilities so that the desired time averages can be achieved.
Some representative examples are provided below. A better method that does not require a-priori
statistical knowledge is developed in Chapters 3 and 4.

Figure 2.1: A 3-queue, 2-server system. Every slot the network controller decides which 2 queues receive
servers. A single queue cannot receive 2 servers on the same slot.

2.3.1 A 3-QUEUE, 2-SERVER EXAMPLE


Example Problem: Consider the 3-queue, 2-server system of Fig. 2.1. All packets have fixed length,
and a queue that is allocated a server on a given slot can serve exactly one packet on that slot. Every
slot we choose which 2 queues to serve. The service is given for i ∈ {1, 2, 3} by:

bi(t) = 1 if a server is connected to queue i on slot t, and bi(t) = 0 otherwise.

Assume the arrival processes have well defined time average rates (a1av , a2av , a3av ), in units of pack-
ets/slot. Design a server allocation algorithm to make all queues rate stable when arrival rates are
given as follows:
a) (a1av , a2av , a3av ) = (0.5, 0.5, 0.9)
b) (a1av , a2av , a3av ) = (2/3, 2/3, 2/3)
c) (a1av , a2av , a3av ) = (0.7, 0.9, 0.4)
d) (a1av , a2av , a3av ) = (0.65, 0.5, 0.75)
e) Use (2.5) to prove that the constraints 0 ≤ aiav ≤ 1 for all i ∈ {1, 2, 3}, and a1av + a2av + a3av ≤ 2, are necessary for the existence of a rate stabilizing algorithm.

Solution:
a) Choose the service vector (b1(t), b2(t), b3(t)) to be independent and identically distributed (i.i.d.) every slot, choosing (0, 1, 1) with probability 1/2 and (1, 0, 1) with probability 1/2. Then {b1(t)}_{t=0}^∞ is i.i.d. over slots with b1av = 0.5 by the law of large numbers. Likewise, b2av = 0.5 and b3av = 1. Then clearly aiav ≤ biav for all i ∈ {1, 2, 3}, and so the Rate Stability Theorem ensures all queues are rate stable. While this is a randomized scheduling algorithm, one could also design a deterministic algorithm, such as one that alternates between (0, 1, 1) (on odd slots) and (1, 0, 1) (on even slots).
b) Choose (b1 (t), b2 (t), b3 (t)) i.i.d. over slots, equally likely over the three options (1, 1, 0),
(1, 0, 1), and (0, 1, 1). Then biav = 2/3 = aiav for all i ∈ {1, 2, 3}, and so by the Rate Stability
Theorem all queues are rate stable.
c) Every slot, independently choose the service vector (0, 1, 1) with probability p1 , (1, 0, 1)
with probability p2 , and (1, 1, 0) with probability p3 , so that p1 , p2 , p3 satisfy:

p1 (0, 1, 1) + p2 (1, 0, 1) + p3 (1, 1, 0) ≥ (0.7, 0.9, 0.4) (2.12)


p1 + p2 + p3 = 1 (2.13)
pi ≥ 0 ∀i ∈ {1, 2, 3} (2.14)

where the inequality (2.12) is taken entrywise. This is an example of a linear program. Linear programs
are typically difficult to solve by hand, but this one can be solved easily by guessing that the constraint
in (2.12) can be solved with equality. One can verify the following (unique) solution: p1 = 0.3,
p2 = 0.1, p3 = 0.6. Thus, b1av = p2 + p3 = 0.7, b2av = p1 + p3 = 0.9, b3av = p1 + p2 = 0.4, and
so all queues are rate stable by the Rate Stability Theorem. It is an interesting exercise to design an
alternative deterministic algorithm that uses a periodic schedule to produce the same time averages.
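The guess that (2.12) can be met with equality reduces the problem to a 3 × 3 linear system, which can be checked mechanically. A small sketch, assuming NumPy is available (the matrix columns are the three service vectors):

```python
import numpy as np

# Columns are the service vectors (0,1,1), (1,0,1), (1,1,0); solving
# A @ p = rates enforces constraint (2.12) with equality.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
rates = np.array([0.7, 0.9, 0.4])    # target (a1av, a2av, a3av) of part (c)
p = np.linalg.solve(A, rates)
print(p)                             # close to [0.3, 0.1, 0.6]
assert abs(p.sum() - 1.0) < 1e-9 and (p >= 0).all()   # valid probabilities
```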
d) Use the same linear program (2.12)-(2.14), but replace the constraint (2.12) with the
following:

p1 (0, 1, 1) + p2 (1, 0, 1) + p3 (1, 1, 0) ≥ (0.65, 0.5, 0.75)

This can be solved by hand by trial-and-error. One simplifying trick is to replace the above inequality
constraint with the following equality constraint:

p1 (0, 1, 1) + p2 (1, 0, 1) + p3 (1, 1, 0) = (0.7, 0.5, 0.8)

Then we can use p1 = 0.3, p2 = 0.5, p3 = 0.2.


e) Consider any algorithm that makes all queues rate stable, and let bi (t) be the queue-i
decision made by the algorithm on slot t. For each queue i, we have for all t > 0:

Qi(t)/t − Qi(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} ai(τ) − (1/t) Σ_{τ=0}^{t−1} bi(τ)
                  ≥ (1/t) Σ_{τ=0}^{t−1} ai(τ) − 1

where the first inequality follows by (2.5) and the final inequality holds because bi (τ ) ≤ 1 for all τ .
The above holds for all t > 0. Taking a limit as t → ∞ and using the fact that queue i is rate stable
yields, with probability 1:

0 ≥ aiav − 1
and so we find that, for each i ∈ {1, 2, 3}, the condition aiav ≤ 1 is necessary for the existence of an
algorithm that makes all queues rate stable. Similarly, we have:
[Q1(t) + Q2(t) + Q3(t)]/t − [Q1(0) + Q2(0) + Q3(0)]/t
   ≥ (1/t) Σ_{τ=0}^{t−1} [a1(τ) + a2(τ) + a3(τ)] − (1/t) Σ_{τ=0}^{t−1} [b1(τ) + b2(τ) + b3(τ)]
   ≥ (1/t) Σ_{τ=0}^{t−1} [a1(τ) + a2(τ) + a3(τ)] − 2

where the final inequality holds because b1 (τ ) + b2 (τ ) + b3 (τ ) ≤ 2 for all τ . Taking limits shows
that 0 ≥ a1av + a2av + a3av − 2 is also a necessary condition.
Discussion: Define Λ as the set of all rate vectors (a1av, a2av, a3av) that satisfy the constraints in part (e) of the above example problem. We know from part (e) that (a1av, a2av, a3av) ∈ Λ is a necessary condition for existence of an algorithm that makes all queues rate stable. Further, it can be shown that for any vector (a1av, a2av, a3av) ∈ Λ, there exist probabilities p1, p2, p3 that solve the following linear program:

p1(0, 1, 1) + p2(1, 0, 1) + p3(1, 1, 0) ≥ (a1av, a2av, a3av)
p1 + p2 + p3 = 1
pi ≥ 0 ∀i ∈ {1, 2, 3}

Showing this is not trivial and is left as an advanced exercise. However, this fact, together with the Rate Stability Theorem, shows that it is possible to design an algorithm to make all queues rate stable whenever (a1av, a2av, a3av) ∈ Λ. That is, (a1av, a2av, a3av) ∈ Λ is necessary and sufficient for the existence of an algorithm that makes all queues rate stable. The set Λ is called the capacity region for the network. Exercises 2.7 and 2.8 provide additional practice questions about scheduling and delay in this system.

2.3.2 A 2-QUEUE OPPORTUNISTIC SCHEDULING EXAMPLE


Example Problem: Consider a 2-queue wireless downlink that operates in discrete time (Fig. 2.2a). All data consists of fixed length packets. The arrival process (a1(t), a2(t)) represents the (integer) number of packets that arrive to each queue on slot t. There are two wireless channels, and packets in queue i must be transmitted over channel i, for i ∈ {1, 2}. At the beginning of each slot, the network controller observes the channel state vector S(t) = (S1(t), S2(t)), where Si(t) ∈ {ON, OFF}, so that there are four possible channel state vectors. The controller can transmit at most one packet per slot, and it can only transmit a packet over a channel that is ON. Thus, for each channel i ∈ {1, 2}, we have:

bi(t) = 1 if Si(t) = ON and channel i is chosen for transmission on slot t, and bi(t) = 0 otherwise.

[Figure: (a) the 2-queue downlink with arrivals a1(t), a2(t), queues Q1(t), Q2(t), and channel states S1(t), S2(t); (b) the capacity region, with corner points (0, 0), (0.6, 0), (0.6, 0.16), (0.36, 0.4), and (0, 0.4).]

Figure 2.2: (a) The 2-queue, 1-server opportunistic scheduling system with ON/OFF channels. (b) The
capacity region for the specific channel probabilities given below.

If S(t) = (OFF, OFF), then b1(t) = b2(t) = 0. If exactly one channel is ON, then clearly the controller should choose to transmit over that channel. The only decision is which channel to use when S(t) = (ON, ON). Suppose that (a1(t), a2(t)) is i.i.d. over slots with E{a1(t)} = λ1 and E{a2(t)} = λ2. Suppose that S(t) is i.i.d. over slots with Pr[(OFF, OFF)] ≜ p00, Pr[(OFF, ON)] = p01, Pr[(ON, OFF)] = p10, Pr[(ON, ON)] = p11.
a) Define Λ as the set of all vectors (λ1, λ2) that satisfy the constraints 0 ≤ λ1 ≤ p10 + p11, 0 ≤ λ2 ≤ p01 + p11, λ1 + λ2 ≤ p01 + p10 + p11. Show that (λ1, λ2) ∈ Λ is necessary for the existence of a rate stabilizing algorithm.
b) Plot the 2-dimensional region Λ for the special case when p00 = 0.24, p10 = 0.36, p01 = 0.16, p11 = 0.24.
c) For the system of part (b): Use a randomized algorithm that independently transmits over
channel 1 with probability β whenever S (t) = (ON, ON ). Choose β to make both queues rate
stable when (λ1 , λ2 ) = (0.6, 0.16).
d) For the system of part (b): Choose β to make both queues rate stable when (λ1 , λ2 ) =
(0.5, 0.26).
Solution:
a) Let b1 (t), b2 (t) be the decisions made by a particular algorithm that makes both queues
rate stable. From (2.5), we have for queue 1 and for all slots t > 0:

Q1(t)/t − Q1(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a1(τ) − (1/t) Σ_{τ=0}^{t−1} b1(τ)

Because b1 (τ ) ≤ 1{S1 (τ )=ON } , where the latter is an indicator function that is 1 if S1 (τ ) = ON, and
0 else, we have:
Q1(t)/t − Q1(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a1(τ) − (1/t) Σ_{τ=0}^{t−1} 1{S1(τ)=ON}   (2.15)
However, we know that Q1 (t)/t → 0 with probability 1. Further, by the law of large numbers, we
have (with probability 1):

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} a1(τ) = λ1 ,   lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} 1{S1(τ)=ON} = p10 + p11

Thus, taking a limit as t → ∞ in (2.15) yields:

0 ≥ λ1 − (p10 + p11 )

and hence λ1 ≤ p10 + p11 is a necessary condition for any rate stabilizing algorithm. A similar
argument shows that λ2 ≤ p01 + p11 is a necessary condition. Finally, note that for all t > 0:

[Q1(t) + Q2(t)]/t − [Q1(0) + Q2(0)]/t ≥ (1/t) Σ_{τ=0}^{t−1} [a1(τ) + a2(τ)] − (1/t) Σ_{τ=0}^{t−1} 1{{S1(τ)=ON} ∪ {S2(τ)=ON}}

Taking a limit of the above proves that λ1 + λ2 ≤ p01 + p10 + p11 is necessary.
b) See Fig. 2.2b.
c) If S(t) = (OFF, OFF), then don't transmit. If S(t) = (ON, OFF) or (ON, ON), then transmit over channel 1. If S(t) = (OFF, ON), then transmit over channel 2. Then by the law of large numbers, we have b1av = p10 + p11 = 0.6, b2av = p01 = 0.16, and so both queues are rate stable (by the Rate Stability Theorem).
d) Choose β = 0.14/0.24. Then b1av = 0.36 + 0.24β = 0.5, and b2av = 0.16 + 0.24(1 − β) = 0.26.
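The time averages claimed in part (d) can be sanity-checked by simulation. A minimal sketch, using the channel probabilities of part (b); the slot count and seed are arbitrary choices:

```python
import random

random.seed(0)
p00, p10, p01, p11 = 0.24, 0.36, 0.16, 0.24
beta = 0.14 / 0.24        # probability of serving channel 1 when both are ON

T = 500_000
served = [0, 0]
for _ in range(T):
    u = random.random()                  # draw the channel state vector
    if u < p00:
        state = (0, 0)                   # (OFF, OFF)
    elif u < p00 + p10:
        state = (1, 0)                   # (ON, OFF)
    elif u < p00 + p10 + p01:
        state = (0, 1)                   # (OFF, ON)
    else:
        state = (1, 1)                   # (ON, ON)
    if state == (1, 0):
        served[0] += 1
    elif state == (0, 1):
        served[1] += 1
    elif state == (1, 1):
        if random.random() < beta:       # randomized tie-break
            served[0] += 1
        else:
            served[1] += 1

b1av, b2av = served[0] / T, served[1] / T
print(round(b1av, 2), round(b2av, 2))    # near (0.5, 0.26)
```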
Discussion: Exercise 2.9 treats scheduling and delay issues in this system. It can be shown that the set Λ given in part (a) above is the capacity region, so that (λ1, λ2) ∈ Λ is necessary and sufficient for the existence of a rate stabilizing policy. See (8) for the derivation of the capacity region for ON/OFF opportunistic scheduling systems with K queues (with K ≥ 2). See also (8) for optimal delay scheduling in symmetric systems of this type (where all arrival rates are the same, as are all ON/OFF probabilities), and (101)(100) for “order-optimal” delay in general (possibly asymmetric) situations.
It is possible to support any point in Λ using a stationary randomized policy that makes a scheduling decision as a random function of the observed channel state S(t). Such policies are called S-only policies. The solutions given in parts (c) and (d) above use S-only policies. Further, the randomized server allocation policies considered in the 3-queue, 2-server example of Section 2.3.1 can be viewed as “degenerate” S-only policies because, in that case, there is only one “channel state” (i.e., (ON, ON, ON)). It is known that the capacity region of general single-hop and multi-hop networks with time varying channels S(t) can be described in terms of S-only policies (15)(22) (see also Theorem 4.5 of Chapter 4 for a related result for more general systems).
Note that S -only policies do not consider queue backlog information, and thus they may serve
a queue that is empty, which is clearly inefficient. Thus, one might wonder how S -only policies can
stabilize queueing networks whenever traffic rates are inside the capacity region. Intuitively, the
reason is that inefficiency only arises when a queue becomes empty, a rare event when traffic rates are
near the boundary of the capacity region.2 Thus, using queue backlog information cannot “enlarge”
the region of supportable rates. However, Chapter 3 shows that queue backlogs are extremely useful
for designing dynamic algorithms that do not require a-priori knowledge of channel statistics or
a-priori computation of a randomized policy with specific time averages.

2.4 EXERCISES

Exercise 2.1. (Queue Sample Path) Fill in the missing entries of the table in Fig. 2.3 for a queue
Q(t) that satisfies (2.1).

t 0 1 2 3 4 5 6 7 8 9 10
Arrivals a(t) 3 3 0 2 1 0 0 2 0 0
Current Rate b(t) 4 2 1 3 3 2 2 4 0 2 1
Backlog Q(t) 0 3 4 3 2
Transmitted b̃(t) 0 2 1 2 1

Figure 2.3: An example sample path for the queueing system of Exercise 2.1.

Exercise 2.2. (Inequality comparison) Let Q(t) satisfy (2.1) with server process b(t) and arrival
process a(t). Let Q̃(t) be another queueing system with the same server process b(t) but with an
arrival process ã(t) = a(t) + z(t), where z(t) ≥ 0 for all t ∈ {0, 1, 2, . . .}. Assuming that Q(0) =
Q̃(0), prove that Q(t) ≤ Q̃(t) for all t ∈ {0, 1, 2, . . .}.
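The claim of Exercise 2.2 can be observed empirically before proving it (a numerical check only, not a proof; the arrival, extra-arrival, and service distributions below are arbitrary choices):

```python
import random

def step(Q, a, b):
    """One slot of the dynamics (2.1): Q(t+1) = max[Q(t) - b(t), 0] + a(t)."""
    return max(Q - b, 0) + a

rng = random.Random(7)
Q, Q_tilde = 0, 0                    # Q(0) = Q~(0)
dominated = True
for _ in range(10_000):
    a = rng.randint(0, 3)
    z = rng.randint(0, 2)            # extra arrivals z(t) >= 0
    b = rng.randint(0, 3)            # shared server process
    Q = step(Q, a, b)
    Q_tilde = step(Q_tilde, a + z, b)
    dominated = dominated and (Q <= Q_tilde)
print(dominated)                     # True
```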

Exercise 2.3. (Proving sufficiency for Theorem 2.4a) Let Q(t) satisfy (2.1) with arrival and server processes with well defined time averages aav and bav. Suppose that aav ≤ bav. Fix ε > 0, and define Qε(t) as a queue with Qε(0) = Q(0), and with the same server process b(t) but with an arrival process ã(t) = a(t) + (bav − aav) + ε for all t.
a) Compute the time average of ã(t).
b) Assuming the result of Theorem 2.4b, compute lim_{t→∞} Qε(t)/t.
c) Use the result of part (b) and Exercise 2.2 to prove that Q(t) is rate stable. Hint: I am thinking of a non-negative number x. My number has the property that x ≤ ε for all ε > 0. What is my number?
2 For example, in the GI/B/1 queue of Exercise 2.6, it can be shown by Little’s Theorem (129) that the fraction of time the queue
is empty is 1 − λ/μ (assuming λ ≤ μ), which goes to zero when λ → μ.
Exercise 2.4. (Proof of Theorem 2.4b) Let Q(t) be a queue that satisfies (2.1). Assume time
averages of a(t) and b(t) are given by finite constants a av and bav , respectively.
a) Use the following equation to prove that lim_{t→∞} a(t)/t = 0 with probability 1:

(1/(t + 1)) Σ_{τ=0}^{t} a(τ) = [t/(t + 1)] · (1/t) Σ_{τ=0}^{t−1} a(τ) + a(t)/(t + 1)


b) Suppose that b̃(ti) < b(ti) for some slot ti (where we recall that b̃(ti) ≜ min[b(ti), Q(ti)]). Use (2.1) to compute Q(ti + 1).
c) Use part (b) and (2.5) to show that if b̃(ti ) < b(ti ), then:


a(ti) ≥ Q(0) + Σ_{τ=0}^{ti} [a(τ) − b(τ)]

Conclude that if b̃(ti ) < b(ti ) for an infinite number of slots ti , then a av ≤ bav .
d) Use part (c) to conclude that if a av > bav , there is some slot t ∗ ≥ 0 such that for all t > t ∗ ,
we have:

Q(t) = Q(t∗) + Σ_{τ=t∗}^{t−1} [a(τ) − b(τ)]
Use this to prove the result of Theorem 2.4b.

Exercise 2.5. (Strong stability implies steady state stability) Prove that strong stability implies steady state stability using the fact that E{|Q(τ)|} ≥ M Pr[|Q(τ)| > M].

Exercise 2.6. (Discrete time GI/B/1 queue) Consider a queue Q(t) with dynamics (2.1). Assume that a(t) is i.i.d. over slots with non-negative integer values, with E{a(t)} = λ and E{a(t)²} = E{a²}. Assume that b(t) is independent of the arrivals and is i.i.d. over slots with Pr[b(t) = 1] = μ, Pr[b(t) = 0] = 1 − μ. Thus, Q(t) is always integer valued. Suppose that λ < μ, and that there are finite values E{Q}, Q̄, Qav, E{Q²} such that:

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{Q(τ)} = Q̄ ,   lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} Q(τ) = Qav   with prob. 1
lim_{t→∞} E{Q(t)} = E{Q} ,   lim_{t→∞} E{Q(t)²} = E{Q²}

Using ergodic Markov chain theory, it can be shown that Q̄ = Qav = E{Q} (see also Exercise 7.9). Here we want to compute E{Q}, using the magic of a quadratic.
a) Take expectations of equation (2.2) to find lim_{t→∞} E{b̃(t)}.

b) Explain why b̃(t)² = b̃(t) and Q(t)b̃(t) = Q(t)b(t).
c) Square equation (2.2) and use part (b) to prove:

Q(t + 1)² = Q(t)² + b̃(t) + a(t)² − 2Q(t)(b(t) − a(t)) − 2b̃(t)a(t)

d) Take expectations in (c) and let t → ∞ to conclude that:

E{Q} = (E{a²} + λ − 2λ²) / (2(μ − λ))

We have used the fact that Q(t) is independent of b(t), even though it is not independent of b̃(t). This establishes the average backlog for an integer-based GI/B/1 queue (where “GI” means the arrivals are general and i.i.d. over slots, “B” means the service is i.i.d. Bernoulli, and “1” means there is a single server). By Little's Theorem (129), it follows that average delay (in units of slots) is W̄ = Q̄/λ. When the arrival process is Bernoulli, these formulas simplify to Q̄ = λ(1 − λ)/(μ − λ) and W̄ = (1 − λ)/(μ − λ). Using reversible Markov chain theory (130)(66)(131), it can be shown that the steady state output process of a B/B/1 queue is also i.i.d. Bernoulli with rate λ (regardless of μ, provided that λ < μ), which makes analysis of tandems of B/B/1 queues very easy.
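The closed form can be checked against a direct simulation of the B/B/1 special case (the rates λ = 0.5, μ = 0.7 and the horizon below are arbitrary choices):

```python
import random

def bb1_avg_backlog(lam, mu, T, seed=0):
    """Simulate a B/B/1 queue and return the empirical time average backlog."""
    rng = random.Random(seed)
    Q, total = 0, 0
    for _ in range(T):
        total += Q
        a = 1 if rng.random() < lam else 0
        b = 1 if rng.random() < mu else 0
        Q = max(Q - b, 0) + a        # dynamics (2.1)
    return total / T

lam, mu = 0.5, 0.7
sim = bb1_avg_backlog(lam, mu, 1_000_000)
exact = lam * (1 - lam) / (mu - lam)     # Bernoulli-arrival formula above
print(round(sim, 2), round(exact, 2))
```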

Exercise 2.7. (Server Scheduling) Consider the 3-queue, 2-server system example of Section 2.3.1
(Fig. 2.1). Assume the arrival vector (a1 (t), a2 (t), a3 (t)) is i.i.d. over slots with E {ai (t)} = λi for
i ∈ {1, 2, 3}. Design a randomized server allocation algorithm to make all queues rate stable when:
a) (λ1 , λ2 , λ3 ) = (0.2, 0.9, 0.6)
b) (λ1 , λ2 , λ3 ) = (3/4, 3/4, 1/2)
c) (λ1 , λ2 , λ3 ) = (0.6, 0.5, 0.9)
d) (λ1 , λ2 , λ3 ) = (0.7, 0.6, 0.5)
e) Give a deterministic algorithm that uses a periodic schedule to support the rates in part (b).
f ) Give a deterministic algorithm that uses a periodic schedule to support the rates in part (c).

Exercise 2.8. (Delay for Server Scheduling) Consider the 3-queue, 2-server system of Fig. 2.1 that
operates according to the randomized schedule of the solution given in part (d) of Section 2.3.1, so
that p1 = 0.3, p2 = 0.5, p3 = 0.2. Suppose a1 (t) is i.i.d. over slots and Bernoulli, with P r[a1 (t) =
0] = 0.35, P r[a1 (t) = 1] = 0.65. Use the formula of Exercise 2.6 to compute the average backlog
Q1 and average delay W 1 in queue 1. (First, you must convince yourself that queue 1 is indeed a
discrete time GI/B/1 queue).

Exercise 2.9. (Delay for Opportunistic Scheduling) Consider the 2-queue wireless downlink with
ON/OFF channels as described in the example of Section 2.3.2 (Fig. 2.2). The channel probabilities
are given as in that example: p00 = 0.24, p10 = 0.36, p01 = 0.16, p11 = 0.24. Suppose the arrival
process a1 (t) is i.i.d. Bernoulli with rate λ1 = 0.4, so that P r[a1 (t) = 1] = 0.4, P r[a1 (t) = 0] =
0.6. Suppose a2 (t) is i.i.d. Bernoulli with rate λ2 = 0.3. Design a randomized algorithm, using
parameter β as the probability that we transmit over channel 1 when S (t) = (ON, ON), that
ensures the average delay satisfies W 1 ≤ 25 slots and W 2 ≤ 25 slots. You should use the delay
formula in Exercise 2.6 (first convincing yourself that each queue is indeed a GI/B/1 queue) along
with an educated guess for β and/or trial and error for β.
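One way to carry out the suggested trial and error for Exercise 2.9 is to scan β on a grid and apply the Bernoulli-arrival delay formula of Exercise 2.6 to each queue (a sketch; the grid resolution is an arbitrary choice):

```python
def avg_delay(lam, mu):
    """W = (1 - lam)/(mu - lam): GI/B/1 delay for Bernoulli arrivals (Ex. 2.6)."""
    return (1.0 - lam) / (mu - lam)

p01, p10, p11 = 0.16, 0.36, 0.24
lam1, lam2 = 0.4, 0.3

feasible = []
for k in range(101):
    beta = k / 100.0
    mu1 = p10 + p11 * beta               # channel 1 service rate
    mu2 = p01 + p11 * (1.0 - beta)       # channel 2 service rate
    if lam1 < mu1 and lam2 < mu2:        # the formula needs lam < mu
        if avg_delay(lam1, mu1) <= 25 and avg_delay(lam2, mu2) <= 25:
            feasible.append(beta)
print(feasible)
```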

Exercise 2.10. (Simulation of a B/B/1 queue) Write a computer program to simulate a Bernoulli/Bernoulli/1 (B/B/1) queue. Specifically, we have Q(0) = 0, {a(t)}_{t=0}^∞ is i.i.d. over slots with Pr[a(t) = 1] = λ, Pr[a(t) = 0] = 1 − λ, and {b(t)}_{t=0}^∞ is independent of the arrival process and is i.i.d. over slots with Pr[b(t) = 1] = μ, Pr[b(t) = 0] = 1 − μ. Assume that μ = 0.7, run the experiment over 10⁶ slots, and give the empirical time average Qav and the value of Q(t)/t for t = 10⁶, for λ values of 0.4, 0.5, 0.6, 0.7, 0.8. Compare these to the exact value (given in Exercise 2.6) for t → ∞.

Exercise 2.11. (Virtual Queues) Suppose we have a system that operates in discrete time with slots
t ∈ {0, 1, 2, . . .}. A controller makes decisions every slot t about how to operate the system, and
these decisions incur power p(t). The controller wants to ensure the time average power expenditure
is no more than 12.3 power units per slot. Define a virtual queue Z(t) with Z(0) = 0, and with
update equation:
Z(t + 1) = max[Z(t) − 12.3, 0] + p(t) (2.16)
The controller keeps the value of Z(t) as a state variable, and updates Z(t) at the end of each slot
via (2.16) using the power p(t) that was spent on that slot.
a) Use Lemma 2.1 to prove that if Z(t) is rate stable, then:³

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} p(τ) ≤ 12.3   with probability 1

b) Suppose there is a positive constant Zmax such that Z(t) ≤ Zmax for all t ∈ {0, 1, 2, . . .}. Use (2.3) to show that for any integer T > 0 and any interval of T slots, defined by {t1, . . . , t1 + T − 1} (where t1 ≥ 0), we have:

Σ_{τ=t1}^{t1+T−1} p(τ) ≤ 12.3T + Zmax
This idea is used in (21) to ensure the total power used in a communication system over any interval
is less than or equal to the desired per-slot average power constraint multiplied by the interval size,
plus a constant allowable “power burst” Zmax . A variation of this technique is used in (137) to bound
the worst-case number of collisions with a primary user in a cognitive radio network.

³For simplicity, we have implicitly assumed the limit lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} p(τ) in Exercise 2.11(a) exists. More generally, the result holds when “lim” is replaced with “lim sup.”
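The virtual queue update (2.16) takes only a few lines to implement. A sketch with a hypothetical power process whose mean (12.0) is below the 12.3 budget (the uniform distribution is an arbitrary stand-in):

```python
import random

def virtual_queue_trace(powers, budget=12.3):
    """Track Z(t+1) = max[Z(t) - budget, 0] + p(t), starting from Z(0) = 0."""
    Z, trace = 0.0, []
    for p in powers:
        Z = max(Z - budget, 0.0) + p
        trace.append(Z)
    return trace

rng = random.Random(42)
T = 100_000
powers = [rng.uniform(4.0, 20.0) for _ in range(T)]     # mean 12.0 < 12.3
trace = virtual_queue_trace(powers)

# Z(T)/T near 0 is consistent with rate stability, which by part (a)
# enforces the time average power constraint.
print(round(trace[-1] / T, 4), round(sum(powers) / T, 2))
```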

CHAPTER 3

Dynamic Scheduling Example


The dynamic scheduling algorithms developed in this text use powerful techniques of Lyapunov
drift and Lyapunov optimization. To build intuition, this chapter introduces the main concepts for
a simple 2-user wireless downlink example, similar to the example given in Section 2.3.2 of the
previous chapter. First, the problem is formulated in terms of known arrival rates and channel state
probabilities. However, rather than using a randomized scheduling algorithm that bases decisions
only on the current channel states (as considered in the previous chapter), we use an alternative
approach based on minimizing the drift of a Lyapunov function. The advantage is that the drift-
minimizing approach uses both current channel states and current queue backlogs to stabilize the
system, and it does not require a-priori knowledge of traffic rates or channel probabilities. This
Lyapunov drift technique is extended at the end of the chapter to allow for joint stability and
average power minimization.

[Figure: (a) the 2-queue downlink with arrivals A1(t), A2(t) (E{Ai(t)} = λi), queues Q1(t), Q2(t), and channel states S1(t) ∈ {0, 1}, S2(t) ∈ {0, 1, 2}; (b) the capacity region, with corner points (0.14, 1.10), (0.49, 0.75), (0.70, 0.33) and points X, Y, Z marked.]

Figure 3.1: (a) The 2-queue wireless downlink example with time-varying channels. (b) The capacity region Λ. For λ = (0.3, 0.7) (i.e., point Y illustrated), we have εmax(λ) = 0.12.

3.1 SCHEDULING FOR STABILITY


Consider a slotted system with two queues, as shown in Fig. 3.1(a). The arrival vector (A1(t), A2(t)) is i.i.d. over slots, where A1(t) and A2(t) take integer units of packets. The arrival rates are given by λ1 = E{A1(t)} and λ2 = E{A2(t)}. The second moments E{A1²} = E{A1(t)²} and E{A2²} = E{A2(t)²} are assumed to be finite. The wireless channels are time varying, and every
slot t we have a channel vector S (t) = (S1 (t), S2 (t)), where Si (t) is a non-negative integer that
represents the number of packets that can be transmitted over channel i on slot t (for i ∈ {1, 2}),
provided that the scheduler decides to transmit over that channel. The channel state processes S1 (t)
and S2 (t) are independent of each other and are i.i.d. over slots, with:
• P r[S1 (t) = 0] = 0.3 , P r[S1 (t) = 1] = 0.7
• P r[S2 (t) = 0] = 0.2 , P r[S2 (t) = 1] = 0.5 , P r[S2 (t) = 2] = 0.3
Every slot t the network controller observes the current channel state vector S (t) and chooses a
single channel over which to transmit. Let α(t) be the transmission decision on slot t, taking three
possible values:
α(t) ∈ {“Transmit over channel 1”, “Transmit over channel 2”, “Idle”}
where α(t) = “Idle” means that no transmission takes place on slot t. The queueing dynamics are
given by:
Qi (t + 1) = max[Qi (t) − bi (t), 0] + Ai (t) ∀i ∈ {1, 2}, ∀t ∈ {0, 1, 2, . . .} (3.1)
where bi(t) represents the amount of service offered to channel i on slot t (for i ∈ {1, 2}), defined by a function b̂i(α(t), S(t)):

bi(t) = b̂i(α(t), S(t)) ≜ Si(t) if α(t) = “Transmit over channel i”; 0 otherwise   (3.2)

3.1.1 THE S-ONLY ALGORITHM AND εmax

Let S represent the set of the 6 possible outcomes for channel state vector S(t) in the above system:

S ≜ {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)}
Consider first the class of S -only scheduling algorithms that make independent, stationary, and ran-
domized transmission decisions every slot t based only on the observed S (t) (and hence independent
of queue backlog). A particular S -only algorithm for this system is characterized by probabilities
q1 (S1 , S2 ) and q2 (S1 , S2 ) for all (S1 , S2 ) ∈ S , where qi (S1 , S2 ) is the probability of transmitting
over channel i if S (t) = (S1 , S2 ). These probabilities must satisfy q1 (S1 , S2 ) + q2 (S1 , S2 ) ≤ 1 for
all (S1 , S2 ) ∈ S , where we use inequality to allow the possibility of transmitting over neither channel
(useful for the power minimization problem considered later). Let α∗(t) represent the transmission decisions under a particular S-only policy, and define b1∗(t) ≜ b̂1(α∗(t), S(t)) and b2∗(t) ≜ b̂2(α∗(t), S(t)) as the resulting transmission rates offered by this policy on slot t. We thus have for every slot t:

E{b1∗(t)} = Σ_{(S1,S2)∈S} Pr[S1, S2] S1 q1(S1, S2)
E{b2∗(t)} = Σ_{(S1,S2)∈S} Pr[S1, S2] S2 q2(S1, S2)
where we have used Pr[S1, S2] as short-hand notation for Pr[(S1(t), S2(t)) = (S1, S2)].
Note that the above expectations are over the random channel state vector S(t) and the random transmission decision in reaction to this vector. Under this S-only algorithm, b1∗(t) is i.i.d. over slots with mean E{b1∗(t)}, and thus the time average of b1∗(t) is equal to E{b1∗(t)} with probability 1 (by the law of large numbers). It follows by the Rate Stability Theorem (Theorem 2.4) that queue 1 is rate stable if and only if λ1 ≤ E{b1∗(t)}. Likewise, queue 2 is rate stable if and only if λ2 ≤ E{b2∗(t)}. However, for finite delay, it is useful to design the transmission rates to be strictly larger than the arrival rates (see Exercises 2.6, 2.8, 2.9, 2.10). The following linear program seeks to design an S-only policy that maximizes the value of ε for which λ1 + ε ≤ E{b1∗(t)} and λ2 + ε ≤ E{b2∗(t)}:

Maximize:  (3.3)

Subject to: λ1 +  ≤ (S1 ,S2 )∈S P r[S1 , S2 ]S1 q1 (S1 , S2 ) (3.4)

λ2 +  ≤ (S1 ,S2 )∈S P r[S1 , S2 ]S2 q2 (S1 , S2 ) (3.5)
q1 (S1 , S2 ) + q2 (S1 , S2 ) ≤ 1 ∀(S1 , S2 ) ∈ S (3.6)
q1 (S1 , S2 ) ≥ 0, q2 (S1 , S2 ) ≥ 0 ∀(S1 , S2 ) ∈ S (3.7)

There are 8 known parameters that appear as constants in the above linear program:

λ1, λ2, Pr[S1, S2] ∀(S1, S2) ∈ S   (3.8)

There are 13 unknowns that act as variables to be optimized in the above linear program:

ε, q1(S1, S2), q2(S1, S2) ∀(S1, S2) ∈ S   (3.9)



Define λ ≜ (λ1, λ2), and define εmax(λ) as the maximum value of ε in the above problem. It can be shown that the network capacity region Λ is the set of all non-negative rate vectors λ for which εmax(λ) ≥ 0. The value of εmax represents a measure of the distance between the rate vector λ and the capacity region boundary. If the rate vector λ is interior to the capacity region Λ, then εmax(λ) > 0. In this simple example, it is possible to compute the capacity region explicitly, and that is shown in Fig. 3.1(b). The figure also illustrates an example arrival rate vector (λ1, λ2) = (0.3, 0.7) (shown as point Y in the figure), for which we have εmax(0.3, 0.7) = 0.12.
It follows that for any rate vector λ = (λ1, λ2) that is interior to the capacity region Λ, we have εmax(λ) > 0, and there exists an S-only algorithm that yields transmission variables (b1∗(t), b2∗(t)) that satisfy:

E{b1∗(t)} ≥ λ1 + εmax(λ) ,   E{b2∗(t)} ≥ λ2 + εmax(λ)   (3.10)
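The linear program (3.3)-(3.7) can be solved numerically for the example channel statistics of this section (a sketch assuming SciPy is available; the variable layout mirrors the 13 unknowns in (3.9)):

```python
import numpy as np
from scipy.optimize import linprog

# Channel probabilities from Section 3.1 (the two channels are independent).
P1 = {0: 0.3, 1: 0.7}
P2 = {0: 0.2, 1: 0.5, 2: 0.3}
states = [(s1, s2) for s1 in P1 for s2 in P2]        # the 6 states in S
prob = [P1[s1] * P2[s2] for (s1, s2) in states]

lam1, lam2 = 0.3, 0.7                                # point Y in Fig. 3.1(b)
m = len(states)
n = 1 + 2 * m            # x = [eps, q1(.) x 6, q2(.) x 6]: the 13 unknowns

c = np.zeros(n)
c[0] = -1.0                                          # linprog minimizes; maximize eps

# (3.4): eps - sum_s Pr[s] S1 q1(s) <= -lam1
r1 = np.zeros(n); r1[0] = 1.0
r1[1:1 + m] = [-prob[i] * states[i][0] for i in range(m)]
# (3.5): eps - sum_s Pr[s] S2 q2(s) <= -lam2
r2 = np.zeros(n); r2[0] = 1.0
r2[1 + m:] = [-prob[i] * states[i][1] for i in range(m)]
A_ub, b_ub = [r1, r2], [-lam1, -lam2]
for i in range(m):                                   # (3.6): q1(s) + q2(s) <= 1
    r = np.zeros(n)
    r[1 + i] = 1.0
    r[1 + m + i] = 1.0
    A_ub.append(r)
    b_ub.append(1.0)

bounds = [(None, None)] + [(0.0, 1.0)] * (2 * m)     # (3.7)
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
eps_max = res.x[0]
print(round(eps_max, 2))                             # 0.12, as in Fig. 3.1(b)
```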

3.1.2 LYAPUNOV DRIFT FOR STABLE SCHEDULING


Rather than trying to solve the linear program of the preceding sub-section (which would require
a-priori knowledge of the arrival rates and channel probabilities specified in (3.8)), here we pursue
queue stability via an algorithm that makes decisions based on both the current channel states and
the current queue backlogs. Thus, the algorithm we present is not an S -only algorithm. Remarkably,
the proof that it provides strong stability whenever the arrival rate vector is interior to the capacity
region will use the existence of the S -only algorithm that satisfies (3.10), without ever needing to
solve for the 13 variables in (3.9) that define this S -only algorithm.
Let Q(t) = (Q1(t), Q2(t)) be the vector of current queue backlogs, and define a Lyapunov function L(Q(t)) as follows:

L(Q(t)) ≜ (1/2)[Q1(t)² + Q2(t)²]   (3.11)
This represents a scalar measure of queue congestion in the network, and has the following properties:

• L(Q(t)) ≥ 0 for all backlog vectors Q(t) = (Q1 (t), Q2 (t)), with equality if and only if the
network is empty on slot t.

• L(Q(t)) being “small” implies that both queue backlogs are “small.”

• L(Q(t)) being “large” implies that at least one queue backlog is “large.”

For example, if L(Q(t)) ≤ 32, then Q1(t)² + Q2(t)² ≤ 64, and thus we know that both Q1(t) ≤ 8
and Q2(t) ≤ 8.
If there is a finite constant M such that L(Q(t)) ≤ M for all t, then clearly all queue backlogs
are always bounded by √(2M), and so all queues are trivially strongly stable. While we usually cannot
guarantee that the Lyapunov function is deterministically bounded, it is intuitively clear that design-
ing an algorithm to consistently push the queue backlog towards a region such that L(Q(t)) ≤ M
(for some finite constant M) will help to control congestion and stabilize the queues.
One may wonder why we use a quadratic Lyapunov function, when another function, such as a
linear function, would satisfy properties similar to those stated above. When computing the change
in the Lyapunov function from one slot to the next, we will find that the quadratic has important
dominant cross terms that include an inner product of queue backlogs and transmission rates. This
is important for the same reason that it was important to use a quadratic function in the delay
computation of Exercise 2.6, and readers seeking more intuition on the “magic” of the quadratic
function are encouraged to review that exercise.
To understand how we can consistently push the Lyapunov function towards a low congestion
region, we first use (3.1) to compute a bound on the change in the Lyapunov function from one slot
to the next:

L(Q(t + 1)) − L(Q(t)) = (1/2) Σ_{i=1}^{2} [Qi(t + 1)² − Qi(t)²]
                      = (1/2) Σ_{i=1}^{2} [(max[Qi(t) − bi(t), 0] + Ai(t))² − Qi(t)²]
                      ≤ Σ_{i=1}^{2} [Ai(t)² + bi(t)²]/2 + Σ_{i=1}^{2} Qi(t)[Ai(t) − bi(t)]        (3.12)
where in the final inequality we have used the fact that for any Q ≥ 0, b ≥ 0, A ≥ 0, we have:

(max[Q − b, 0] + A)² ≤ Q² + A² + b² + 2Q(A − b)
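This inequality can be verified by checking the two cases of the max; a short derivation (ours, added for completeness):

```latex
\begin{align*}
\text{Case } Q \ge b:\quad (\max[Q-b,0]+A)^2 &= (Q-b+A)^2 \\
  &= Q^2 + A^2 + b^2 + 2Q(A-b) - 2Ab \\
  &\le Q^2 + A^2 + b^2 + 2Q(A-b), \quad \text{since } Ab \ge 0.\\[4pt]
\text{Case } Q < b:\quad (\max[Q-b,0]+A)^2 &= A^2,\ \text{ and} \\
Q^2 + A^2 + b^2 + 2Q(A-b) - A^2 &= (Q-b)^2 + 2QA \;\ge\; 0.
\end{align*}
```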
Now define Δ(Q(t)) as the conditional Lyapunov drift for slot t:

Δ(Q(t)) ≜ E{L(Q(t + 1)) − L(Q(t)) | Q(t)}        (3.13)

where the expectation depends on the control policy, and is with respect to the random channel states
and the (possibly random) control actions made in reaction to these channel states. From (3.12), we
have that Δ(Q(t)) for a general control policy satisfies:

Δ(Q(t)) ≤ E{ Σ_{i=1}^{2} [Ai(t)² + bi(t)²]/2 | Q(t) } + Σ_{i=1}^{2} Qi(t)λi − E{ Σ_{i=1}^{2} Qi(t)bi(t) | Q(t) }        (3.14)
where we have used the fact that arrivals are i.i.d. over slots and hence independent of current queue
backlogs, so that E {Ai (t)|Q(t)} = E {Ai (t)} = λi . Now define B as a finite constant that bounds
the first term on the right-hand-side of the above drift inequality, so that for all t, all possible Q(t),
and all possible control actions that can be taken, we have:

E{ Σ_{i=1}^{2} [Ai(t)² + bi(t)²]/2 | Q(t) } ≤ B
For our system, we have that at most one bi(t) value can be non-zero on a given slot t. The probability
that the non-zero bi(t) (if any) is equal to 2 is at most 0.3 (because Pr[S2(t) = 2] = 0.3), and if it
is not equal to 2, then it is at most 1. Hence:
E{ (1/2) Σ_{i=1}^{2} bi(t)² | Q(t) } ≤ [2²(0.3) + 1²(0.7)]/2 = 0.95
and thus we can define B as:

B ≜ 0.95 + (1/2) Σ_{i=1}^{2} E{Ai²}        (3.15)
Using this in (3.14) yields:

Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Qi(t)λi − E{ Σ_{i=1}^{2} Qi(t)bi(t) | Q(t) }

To emphasize how the right-hand-side of the above inequality depends on the transmission decision
α(t), we use the identity bi(t) = b̂i(α(t), S(t)) to yield:

Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Qi(t)λi − E{ Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) | Q(t) }        (3.16)
3.1.3 THE “MIN-DRIFT” OR “MAX-WEIGHT” ALGORITHM
Our dynamic algorithm is designed to observe the current queue backlogs (Q1(t), Q2(t)) and the
current channel states (S1(t), S2(t)) and to make a transmission decision α(t) to minimize the
right-hand-side of the drift bound (3.16). Note that the transmission decision on slot t only affects
the final term on the right-hand-side. Thus, we seek to design an algorithm that maximizes the
following expression:

E{ Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) | Q(t) }

The above conditional expectation is with respect to the randomly observed channel states S(t) =
(S1(t), S2(t)) and the (possibly random) control decision α(t). We now use the concept of oppor-
tunistically maximizing an expectation: The above expression is maximized by the algorithm that
observes the current queues (Q1(t), Q2(t)) and channel states (S1(t), S2(t)) and chooses α(t) to
maximize:

Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t))        (3.17)

This is often called the “max-weight” algorithm, as it seeks to maximize a weighted sum of the
transmission rates, where the weights are queue backlogs. As there are only three decisions (transmit
over channel 1, transmit over channel 2, or don’t transmit), it is easy to evaluate the weighted sum
(3.17) for each option:
• Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) = Q1(t)S1(t) if we choose to transmit over channel 1.

• Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) = Q2(t)S2(t) if we choose to transmit over channel 2.

• Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) = 0 if we choose to remain idle.

It follows that the max-weight algorithm chooses to transmit over the channel i with the largest
(positive) value of Qi (t)Si (t), and remains idle if this value is 0 for both channels. This simple
algorithm just makes decisions based on the current queue states and channel states, and it does not
need knowledge of the arrival rates or channel probabilities.
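As a concrete sketch, the rule just described can be written in a few lines of code (a minimal illustration; the function name and interface are ours, not from the text):

```python
def max_weight_decision(Q, S):
    """Max-weight rule for the two-queue downlink: transmit over the channel i
    with the largest (positive) weight Q_i(t)*S_i(t); remain idle if both
    weights are zero.  Q and S are length-2 sequences of current queue
    backlogs and channel states.  Returns 1, 2, or None (idle)."""
    weights = [Q[i] * S[i] for i in range(2)]
    best = 0 if weights[0] >= weights[1] else 1   # ties broken toward channel 1
    return best + 1 if weights[best] > 0 else None
```

For example, with Q = (3, 2) and S = (1, 2) the weights are 3 and 4, so channel 2 is served.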
Because this algorithm maximizes the weighted sum (3.17) over all alternative decisions, we
have:

Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) ≥ Σ_{i=1}^{2} Qi(t)b̂i(α∗(t), S(t))

where α∗(t) represents any alternative (possibly randomized) transmission decision that can be made
on slot t. This includes the case when α ∗ (t) is an S -only decision that randomly chooses one of
the three transmit options (transmit 1, transmit 2, or idle) with a distribution that depends on the
observed S (t). Fixing a particular alternative (possibly randomized) decision α ∗ (t) for comparison
and taking a conditional expectation of the above inequality (given Q(t)) yields:

E{ Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) | Q(t) } ≥ E{ Σ_{i=1}^{2} Qi(t)b̂i(α∗(t), S(t)) | Q(t) }

where the decision α(t) on the left-hand-side of the above inequality represents the max-weight
decision made on slot t, and the decision α ∗ (t) represents any other particular decision that could
have been made. Plugging the above directly into (3.16) yields:

Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Qi(t)λi − E{ Σ_{i=1}^{2} Qi(t)b̂i(α∗(t), S(t)) | Q(t) }        (3.18)

where the left-hand-side represents the drift under the max-weight decision α(t), and the final term
on the right-hand-side involves any other decision α ∗ (t). It is remarkable that the inequality (3.18)
holds true for all of the (infinite) number of possible randomized alternative decisions that can be
plugged into the final term on the right-hand-side. However, this should not be too surprising, as
we designed the max-weight policy to have exactly this property! Rearranging the terms in (3.18)
yields:

Δ(Q(t)) ≤ B − Σ_{i=1}^{2} Qi(t)[E{bi∗(t) | Q(t)} − λi]        (3.19)


where we have used the identity bi∗(t) ≜ b̂i(α∗(t), S(t)) to represent the transmission rate that would
be offered over channel i if decision α∗(t) were made.
Now suppose the arrival rates (λ1, λ2) are interior to the capacity region Λ, and consider the
particular S-only decision α∗(t) that chooses a transmit option independent of queue backlog to yield
(3.10). Because channel states are i.i.d. over slots, the resulting rates (b1∗(t), b2∗(t)) are independent
of current queue backlog, and so by (3.10), we have for i ∈ {1, 2}:

E{bi∗(t) | Q(t)} = E{bi∗(t)} ≥ λi + εmax(λ)

Plugging this directly into (3.19) yields:

Δ(Q(t)) ≤ B − εmax(λ) Σ_{i=1}^{2} Qi(t)        (3.20)

where we recall that εmax(λ) > 0. The above is a drift inequality concerning the max-weight al-
gorithm on slot t, and it is now in terms of a value εmax(λ) associated with the linear program
(3.3)-(3.7). However, we did not need to solve the linear program to obtain this inequality or to
implement the algorithm! It was enough to know that the solution to the linear program exists!
3.1.4 ITERATED EXPECTATIONS AND TELESCOPING SUMS
Taking an expectation of (3.20) over the randomness of the Q1(t) and Q2(t) values yields:

E{Δ(Q(t))} ≤ B − εmax(λ) Σ_{i=1}^{2} E{Qi(t)}        (3.21)

Using the definition of Δ(Q(t)) in (3.13) with the law of iterated expectations yields:

E{Δ(Q(t))} = E{E{L(Q(t + 1)) − L(Q(t)) | Q(t)}} = E{L(Q(t + 1))} − E{L(Q(t))}

Substituting this identity into (3.21) yields:

E{L(Q(t + 1))} − E{L(Q(t))} ≤ B − εmax(λ) Σ_{i=1}^{2} E{Qi(t)}

The above holds for all t ∈ {0, 1, 2, . . .}. Summing over t ∈ {0, 1, . . . , T − 1} for some integer
T > 0 yields (by telescoping sums):

E{L(Q(T))} − E{L(Q(0))} ≤ BT − εmax(λ) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Qi(t)}

Rearranging terms, dividing by εmax(λ)T, and using the fact that L(Q(T)) ≥ 0 yields:

(1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Qi(t)} ≤ B/εmax(λ) + E{L(Q(0))}/(εmax(λ)T)

Assuming that E{L(Q(0))} < ∞ and taking a lim sup yields:

lim sup_{T→∞} (1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Qi(t)} ≤ B/εmax(λ)

Thus, all queues are strongly stable, and the total average backlog (summed over both queues) is less
than or equal to B/εmax(λ). Thus, the max-weight algorithm (developed by minimizing a bound
on the Lyapunov drift) ensures the queueing network is strongly stable whenever the rate vector
λ is interior to the capacity region Λ, with an average queue congestion bound that is inversely
proportional to the distance the rate vector is away from the capacity region boundary.
As an example, assume λ1 = 0.3 and λ2 = 0.7, illustrated by the point Y of Fig. 3.1(b). Then
εmax = 0.12. Assuming arrivals are Bernoulli so that E{Ai²} = E{Ai} = λi and using the value of
B = 1.45 obtained from (3.15), we have:

Q̄1 + Q̄2 ≤ 1.45/0.12 = 12.083 packets
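These constants are easy to check numerically (a sanity check of the arithmetic, not part of the original development):

```python
# Check B = 1.45 from (3.15) and the backlog bound B/eps_max at point Y,
# where arrivals are Bernoulli so that E[A_i^2] = E[A_i] = lambda_i.
lam1, lam2 = 0.3, 0.7
B = 0.95 + 0.5 * (lam1 + lam2)   # (3.15) with E[A_i^2] = lambda_i
eps_max = 0.12                   # eps_max(0.3, 0.7), from Fig. 3.1(b)
bound = B / eps_max              # total average backlog bound, in packets
print(B, round(bound, 3))
```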

where Q1 + Q2 represents the lim sup time average expected queue backlog in the network. By
Little’s Theorem (129), average delay satisfies:

Q1 + Q 2
W = ≤ 12.083 slots
λ1 + λ 2

A simulation of the algorithm over 10⁶ slots yields an empirical average queue backlog of
Q̄1 + Q̄2 = 3.058 packets, and hence in this example, our upper bound overestimates backlog by
roughly a factor of 4.
Thus, the actual max-weight algorithm performs much better than the bound would suggest.
There are three reasons for this gap: (i) A simple upper bound was used when computing the
Lyapunov drift in (3.12), (ii) The value B used an upper bound on the second moments of service,
(iii) The drift inequality compares to a queue-unaware S -only algorithm, whereas the actual drift
is much better because our algorithm considers queue backlog. The third reason often dominates in
networks with many queues. For example, in (100) it is shown that average congestion and delay in
an N-queue wireless system with one server and ON/OFF channels is at least proportional to N if
a queue-unaware algorithm is used (a related result is derived for N × N packet switches in (99)).
However, a more sophisticated queue grouping analysis in (101) shows that the max-weight algorithm
on the ON/OFF downlink system gives average backlog and delay that is O(1), independent of the
number of queues. For brevity, we do not include queue grouping concepts in this text. The interested
reader is referred to the above references, see also queue grouping results in (102)(103)(104)(105).

3.1.5 SIMULATION OF THE MAX-WEIGHT ALGORITHM


Fig. 3.2 shows simulation results over 10⁶ slots when the rate vector (λ1, λ2) is pushed up the line
segment from X to Z in the figure, again assuming independent Bernoulli arrivals. The point Z
is (λ1 , λ2 ) = (0.372, 0.868). In the figure, the x-axis is a normalization factor ρ that specifies the
distance along the segment (so that ρ = 0 is the point X, ρ = 1 is the point Z, and ρ = 0.806 is
the point Y ). It can be seen that the network is strongly stable for all rates with ρ < 1, and it has
average backlog that increases to infinity at the vertical asymptote defined by the capacity region
boundary (i.e., at ρ = 1).
Also plotted in Fig. 3.2 is the upper-bound B/εmax(λ) (where we have computed εmax(λ)
for each input rate vector λ simulated). This bound shows the same qualitative behavior, but it is
roughly a factor of 4 larger than the empirically observed backlog.
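Such a simulation is straightforward to reproduce in outline. The excerpt does not restate the full joint channel distribution, so the sketch below assumes independent channels with Pr[S1 = 1] = 0.5 and Pr[S2 = 0] = 0.3, Pr[S2 = 1] = 0.4, Pr[S2 = 2] = 0.3 (only Pr[S2(t) = 2] = 0.3 is given in the text; the rest are illustrative assumptions, so the resulting numbers will differ from the figures):

```python
import random

def simulate_max_weight(lam=(0.3, 0.7), T=100_000, seed=0):
    """Monte-Carlo sketch of the max-weight downlink of Section 3.1, under an
    ASSUMED channel distribution (see the lead-in).  Bernoulli arrivals of
    rates lam; returns the empirical time average of Q1(t) + Q2(t)."""
    rng = random.Random(seed)
    Q = [0, 0]
    total = 0.0
    for _ in range(T):
        S = [1 if rng.random() < 0.5 else 0,
             rng.choices([0, 1, 2], weights=[0.3, 0.4, 0.3])[0]]
        total += sum(Q)
        # Max-weight decision: serve the queue with the largest Q_i * S_i.
        w = [Q[i] * S[i] for i in range(2)]
        i_star = 0 if w[0] >= w[1] else 1
        b = [0, 0]
        if w[i_star] > 0:
            b[i_star] = S[i_star]
        A = [1 if rng.random() < lam[i] else 0 for i in range(2)]  # arrivals
        for i in range(2):
            Q[i] = max(Q[i] - b[i], 0) + A[i]   # queue update (3.1)
    return total / T
```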

3.2 STABILITY AND AVERAGE POWER MINIMIZATION


Now consider the same system, but define p(t) as the power expenditure incurred by the transmission
decision α(t) on slot t. To emphasize that power is a function of α(t), we write p(t) = p̂(α(t)) and
[Figure 3.2 appears here: a plot of average queue backlog E[Q1 + Q2] versus ρ, showing the simulation curve and the bound B/εmax(λ), with the points X, Y, and Z marked.]

Figure 3.2: Average sum queue backlog (in units of packets) under the max-weight algorithm, as loading
is pushed from point X (i.e., ρ = 0) to point Z (i.e., ρ = 1). Each simulated data point is an average over
10⁶ slots.

assume the following simple power function:

p̂(α(t)) = 1 if α(t) ∈ {“Transmit over channel 1,” “Transmit over channel 2”}
p̂(α(t)) = 0 if α(t) = “Idle”

That is, we spend 1 unit of power if we transmit over either channel, and no power is spent if we
remain idle. Our goal is now to make transmission decisions to jointly stabilize the system while
also striving to minimize average power expenditure.
For a given rate vector (λ1, λ2) in the capacity region Λ, define Φ(λ1, λ2) as the minimum
average power that can be achieved by any S-only algorithm that makes all queues rate stable. The
value Φ(λ1, λ2) can be computed by solving the following linear program (compare with (3.3)-(3.7)):

Minimize:    Φ = Σ_{(S1,S2)∈S} Pr[S1, S2](q1(S1, S2) + q2(S1, S2))
Subject to:  λ1 ≤ Σ_{(S1,S2)∈S} Pr[S1, S2] S1 q1(S1, S2)
             λ2 ≤ Σ_{(S1,S2)∈S} Pr[S1, S2] S2 q2(S1, S2)
             q1(S1, S2) + q2(S1, S2) ≤ 1    ∀(S1, S2) ∈ S
             q1(S1, S2) ≥ 0 , q2(S1, S2) ≥ 0    ∀(S1, S2) ∈ S

Thus, for each λ ∈ Λ, there is an S-only algorithm α∗(t) such that:

E{b̂1(α∗(t), S(t))} ≥ λ1 ,   E{b̂2(α∗(t), S(t))} ≥ λ2 ,   E{p̂(α∗(t))} = Φ(λ1, λ2)

It can be shown that Φ(λ1, λ2) is the minimum time average expected power expenditure that can
be achieved by any control policy that stabilizes the system (including policies that are not S-only)
(21). Further, Φ(λ1, λ2) is continuous, convex, and entrywise non-decreasing.
Now assume that λ = (λ1, λ2) is interior to Λ, so that (λ1 + ε, λ2 + ε) ∈ Λ for all ε such
that 0 ≤ ε ≤ εmax(λ). It follows that whenever 0 ≤ ε ≤ εmax(λ), there exists an S-only algorithm
α∗(t) such that:

E{b̂1(α∗(t), S(t))} ≥ λ1 + ε        (3.22)
E{b̂2(α∗(t), S(t))} ≥ λ2 + ε        (3.23)
E{p̂(α∗(t))} = Φ(λ1 + ε, λ2 + ε)        (3.24)

3.2.1 DRIFT-PLUS-PENALTY
Define the same Lyapunov function L(Q(t)) as in (3.11), and let Δ(Q(t)) represent the conditional
Lyapunov drift for slot t. While taking actions to minimize a bound on Δ(Q(t)) every slot t would
stabilize the system, the resulting average power expenditure might be unnecessarily large. For ex-
ample, suppose the rate vector is (λ1 , λ2 ) = (0, 0.4), and recall that P r[S2 (t) = 2] = 0.3. Then the
drift-minimizing algorithm of the previous section would transmit over channel 2 whenever the
queue is not empty and S2 (t) ∈ {1, 2}. In particular, it would sometimes use “inefficient” transmis-
sions when S2 (t) = 1, which spend one unit of power but only deliver 1 packet. However, if we
only transmit when S2 (t) = 2 and when the number of packets in the queue is at least 2, it can be
shown that the system is still stable, but power expenditure is reduced to its minimum of λ2 /2 = 0.2
units/slot.
Instead of taking a control action to minimize a bound on Δ(Q(t)), we minimize a bound
on the following drift-plus-penalty expression:

Δ(Q(t)) + V E{p(t) | Q(t)}

where V ≥ 0 is a parameter that represents an “importance weight” on how much we emphasize
power minimization. Such a control decision can be motivated as follows: We want to make Δ(Q(t))
small to push queue backlog towards a lower congestion state, but we also want to make E{p(t)|Q(t)}
small so that we do not incur a large power expenditure. We thus decide according to the above
weighted sum. We now show that this intuitive algorithm leads to a provable power-backlog tradeoff:
Average power can be pushed arbitrarily close to Φ(λ1, λ2) by using a large value of V, at the expense
of incurring an average queue backlog that is O(V).
We have already computed a bound on Δ(Q(t)) in (3.16), and so adding V E{p(t)|Q(t)} to
both sides of (3.16) yields a bound on the drift-plus-penalty:

Δ(Q(t)) + V E{p(t)|Q(t)} ≤ B + V E{p̂(α(t))|Q(t)} + Σ_{i=1}^{2} Qi(t)λi
                                − E{ Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t)) | Q(t) }        (3.25)
where we have used the fact that p(t) = p̂(α(t)). The drift-plus-penalty algorithm then observes
(Q1 (t), Q2 (t)) and (S1 (t), S2 (t)) every slot t and chooses an action α(t) to minimize the right-
hand-side of the above inequality. Again, using the concept of opportunistically minimizing an
expectation, this is accomplished by greedily minimizing:


value = V p̂(α(t)) − Σ_{i=1}^{2} Qi(t)b̂i(α(t), S(t))

We thus compare the following values and choose the action corresponding to the smallest (breaking
ties arbitrarily):

• value[1] = V − Q1 (t)S1 (t) if α(t) = “Transmit over channel 1.”

• value[2] = V − Q2 (t)S2 (t) if α(t) = “Transmit over channel 2.”

• value[Idle] = 0 if α(t) = “Idle.”
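The comparison above can be sketched directly (the function name and interface are ours, not from the text):

```python
def dpp_decision(Q, S, V):
    """Drift-plus-penalty rule for the two-queue downlink with unit transmit
    power: compare value[1] = V - Q1*S1, value[2] = V - Q2*S2, and
    value[Idle] = 0, and pick the action with the smallest value.
    Returns 1, 2, or None (idle); ties are broken toward channel 1, then 2."""
    values = {1: V - Q[0] * S[0], 2: V - Q[1] * S[1], None: 0}
    return min(values, key=values.get)
```

Note that with V = 0 this reduces to the max-weight rule of Section 3.1.3 (up to tie-breaking).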

3.2.2 ANALYSIS OF THE DRIFT-PLUS-PENALTY ALGORITHM


Because our decisions α(t) minimize the right-hand-side of the drift-plus-penalty inequality (3.25)
on every slot t (given the observed Q(t)), we have:

Δ(Q(t)) + V E{p(t)|Q(t)} ≤ B + V E{p̂(α∗(t))|Q(t)} + Σ_{i=1}^{2} Qi(t)λi
                                − E{ Σ_{i=1}^{2} Qi(t)b̂i(α∗(t), S(t)) | Q(t) }        (3.26)

where α∗(t) is any other (possibly randomized) transmission decision that can be made on slot t.
Now assume that λ is interior to Λ, and fix any value ε such that 0 ≤ ε ≤ εmax(λ). Plugging the
S-only algorithm (3.22)-(3.24) into the right-hand-side of the above inequality and noting that
this policy makes decisions independent of queue backlog yields:


Δ(Q(t)) + V E{p(t)|Q(t)} ≤ B + V Φ(λ1 + ε, λ2 + ε) + Σ_{i=1}^{2} Qi(t)λi − Σ_{i=1}^{2} Qi(t)(λi + ε)
                          = B + V Φ(λ1 + ε, λ2 + ε) − ε Σ_{i=1}^{2} Qi(t)        (3.27)
Taking expectations of the above inequality and using the law of iterated expectations as before
yields:

E{L(Q(t + 1))} − E{L(Q(t))} + V E{p(t)} ≤ B + V Φ(λ1 + ε, λ2 + ε) − ε Σ_{i=1}^{2} E{Qi(t)}

Summing the above over t ∈ {0, 1, . . . , T − 1} for some positive integer T yields:

E{L(Q(T))} − E{L(Q(0))} + V Σ_{t=0}^{T−1} E{p(t)} ≤ BT + V T Φ(λ1 + ε, λ2 + ε)
                                − ε Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Qi(t)}        (3.28)

Rearranging terms in the above and neglecting non-negative quantities where appropriate yields the
following two inequalities:

(1/T) Σ_{t=0}^{T−1} E{p(t)} ≤ Φ(λ1 + ε, λ2 + ε) + B/V + E{L(Q(0))}/(V T)

(1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Qi(t)} ≤ [B + V(Φ(λ1 + ε, λ2 + ε) − (1/T) Σ_{t=0}^{T−1} E{p(t)})]/ε + E{L(Q(0))}/(εT)

where the first inequality follows by dividing (3.28) by V T and the second follows by dividing (3.28)
by εT. Taking limits as T → ∞ shows that:¹

p̄ ≜ lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E{p(t)} ≤ Φ(λ1 + ε, λ2 + ε) + B/V        (3.29)

Q̄1 + Q̄2 ≜ lim_{T→∞} (1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Qi(t)} ≤ B/ε + V[Φ(λ1 + ε, λ2 + ε) − p̄]/ε        (3.30)

3.2.3 OPTIMIZING THE BOUNDS


The bounds (3.29) and (3.30) hold for any ε that satisfies 0 ≤ ε ≤ εmax(λ), and hence they can be
optimized separately. Plugging ε = εmax(λ) into (3.30) shows that both queues are strongly stable. Using
ε = 0 in (3.29) thus yields:

Φ(λ1, λ2) ≤ p̄ ≤ Φ(λ1, λ2) + B/V        (3.31)
1 In this simple example, the system evolves according to a countably infinite state space Discrete Time Markov Chain (DTMC),
and it can be shown that the limits in (3.29) and (3.30) are well defined.
where the first inequality follows because our algorithm stabilizes the network and thus cannot
yield time average expected power lower than Φ(λ1, λ2), the infimum time average expected power
required for stability of any algorithm.
Because p̄ ≥ Φ(λ1, λ2), it can be shown that:

Φ(λ1 + ε, λ2 + ε) − p̄ ≤ Φ(λ1 + ε, λ2 + ε) − Φ(λ1, λ2) ≤ 2ε

where the final inequality holds because it requires at most one unit of energy to support each new
packet, and so increasing the total input rate from λ1 + λ2 to λ1 + λ2 + 2ε increases the minimum
required average power by at most 2ε.
required average power by at most 2. Plugging the above into (3.30) yields:
Q̄1 + Q̄2 ≤ B/ε + 2V
The above holds for all ε that satisfy 0 ≤ ε ≤ εmax(λ), and so plugging ε = εmax(λ) yields:

Q̄1 + Q̄2 ≤ B/εmax(λ) + 2V        (3.32)
The performance bounds (3.31) and (3.32) demonstrate an [O(1/V), O(V)] power-backlog trade-
off: We can use an arbitrarily large V to make B/V arbitrarily small, so that (3.31) implies the time
average power p̄ is arbitrarily close to the optimum Φ(λ1, λ2). This comes with a tradeoff: The
average queue backlog bound in (3.32) is O(V).

3.2.4 SIMULATIONS OF THE DRIFT-PLUS-PENALTY ALGORITHM


Consider the previous example of Bernoulli arrivals with λ1 = 0.3, λ2 = 0.7, εmax(λ) = 0.12, B =
1.45, which corresponds to point Y in Fig. 3.1(b). Then the bounds (3.31)-(3.32) become:

p̄ ≤ Φ(λ1, λ2) + 1.45/V        (3.33)

Q̄1 + Q̄2 ≤ 1.45/0.12 + 2V        (3.34)
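Plugging a few values of V into these bounds makes the [O(1/V), O(V)] tradeoff concrete (using the optimal power value 0.7 reported for this example):

```python
# Evaluate the power bound (3.33) and backlog bound (3.34) for several V,
# with B = 1.45, eps_max = 0.12, and optimal power 0.7 at (0.3, 0.7).
B, eps_max, p_star = 1.45, 0.12, 0.7
for V in (10, 20, 50, 100):
    power_bound = p_star + B / V          # O(1/V) gap above the optimum
    backlog_bound = B / eps_max + 2 * V   # O(V) average backlog
    print(V, round(power_bound, 4), round(backlog_bound, 3))
```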
Figs. 3.3 and 3.4 plot simulations for this system together with the above power and backlog bounds.
Each simulated data point represents a simulation over 2 × 10⁶ slots using a particular value of V.
Values of V in the range 0 to 100 are shown. It is clear from the figures that average power converges
to the optimal p∗ = 0.7 as V increases, while average backlog increases linearly in V.
Performance can be significantly improved by noting that the drift-plus-penalty algorithm
given in Section 3.2.1 never transmits from queue 1 unless Q1(t) ≥ V (else, value[1] would be
positive). Hence, Q1(t) ≥ Q1^place ≜ max[V − 1, 0] for all slots t ≥ 0, provided that this holds at
t = 0. Similarly, the algorithm never transmits from queue 2 unless Q2(t) ≥ V/2, and so Q2(t) ≥
Q2^place ≜ max[V/2 − 2, 0] for all slots t ≥ 0, provided this holds at t = 0. It follows that we can
stack the queues with fake packets (called place-holder packets) that never get transmitted, as described
[Figures 3.3 and 3.4 appear here: plots of average power and of average backlog E[Q1 + Q2] versus V, each showing the analytical bound and the simulation curves with and without place-holders; the power plot also marks the optimal value p∗.]

Figure 3.3: Average power versus V with (λ1, λ2) = (0.3, 0.7).
Figure 3.4: Average backlog versus V with (λ1, λ2) = (0.3, 0.7).

in more detail in Section 4.8 of the next chapter. This place-holder technique yields the same power
guarantee (3.33), but it has a significantly improved queue backlog bound given by:
(with place-holders)    Q̄1 + Q̄2 ≤ 1.45/0.12 + 2V − max[V − 1, 0] − max[V/2 − 2, 0]
Thus, the average queue bound under the place-holder technique grows like 0.5V , rather than 2V as
suggested in (3.34), a dramatic savings when V is large. Simulations of the place-holder technique
are also shown in Figs. 3.3 and 3.4. The queue backlog improvements due to placeholders are quite
significant (Fig. 3.4), with no noticeable difference in power expenditure (Fig. 3.3). Indeed, the sim-
ulated power expenditure curves for the cases with and without place-holders are indistinguishable
in Fig. 3.3. A plot of queue values over the first 3000 slots is given in Chapter 4, Fig. 4.2.
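Comparing the two backlog bounds numerically shows the gain (a direct evaluation of the formulas above):

```python
# Backlog bounds with and without place-holders (B = 1.45, eps_max = 0.12).
def backlog_bounds(V):
    base = 1.45 / 0.12 + 2 * V                          # bound (3.34)
    with_ph = base - max(V - 1, 0) - max(V / 2 - 2, 0)  # place-holder bound
    return base, with_ph

for V in (10, 50, 100):
    base, with_ph = backlog_bounds(V)
    print(V, round(base, 2), round(with_ph, 2))
```

For large V the second bound equals 1.45/0.12 + 0.5V + 3, confirming the 0.5V growth noted above.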

3.3 GENERALIZATIONS
The reader can easily see that the analysis in this chapter, which considers an example system of
2 queues, can be repeated for a larger system of K queues. Indeed, in that case the “min drift-
plus-penalty” algorithm generalizes to choosing α(t) to maximize Σ_{k=1}^{K} Qk(t)b̂k(α(t), S(t)) −
V p̂(α(t)). This holds for systems with more general channel states S(t), more general resource
allocation decisions α(t), and for arbitrary rate functions b̂k(α(t), S(t)) and “penalty functions”
p̂(α(t)). In particular:
• The vector S (t) might have an infinite number of possible outcomes (rather than just 6
outcomes).
• The decision α(t) might represent one of an infinite number of possible power allocation
options (rather than just one of three options). Alternatively, α(t) might represent one of an
infinite number of more sophisticated physical layer actions that can take place on slot t (such
as modulation, coding, beamforming, etc.).

• The rate function b̂k (α(t), S (t)) can be any function that maps a resource allocation decision
α(t) and a channel state vector S (t) into a transmission rate (and does not need to have the
structure (3.2)).

• The “penalty” function p̂(α(t)) does not have to represent power, and it can be any general
function of α(t).
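Under these generalizations, one slot of the algorithm can be sketched as follows (a hypothetical interface for a finite action set; `b_hat` and `p_hat` stand in for the functions b̂k and p̂, which the text leaves arbitrary):

```python
def dpp_action(Q, S, actions, b_hat, p_hat, V):
    """Generalized min drift-plus-penalty step: over a finite set of actions,
    choose alpha maximizing sum_k Q[k]*b_hat(k, alpha, S) - V*p_hat(alpha).
    Q is the list of K queue backlogs and S the observed channel state."""
    def score(alpha):
        return sum(Q[k] * b_hat(k, alpha, S) for k in range(len(Q))) - V * p_hat(alpha)
    return max(actions, key=score)
```

The two-queue example of this chapter is recovered with actions {idle, 1, 2}, b_hat(k, α, S) equal to S[k] when α serves queue k (else 0), and unit power for any transmission.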

The next chapter presents the general theory. It develops an important concept of virtual
queues to ensure general time average equality and inequality constraints are satisfied. It also considers
variable V algorithms that achieve the exact minimum average penalty subject to mean rate stability
(which typically incurs infinite average backlog). Finally, it shows how to analyze systems with
non-i.i.d. and non-ergodic arrival and channel processes.

CHAPTER 4

Optimizing Time Averages


This chapter considers the problem (1.1)-(1.5), which seeks to minimize the time average of a
network attribute subject to additional time average constraints. We first develop the main results
of Lyapunov drift and Lyapunov optimization theory.

4.1 LYAPUNOV DRIFT AND LYAPUNOV OPTIMIZATION


Consider a system of N queues, and let Θ(t) = (Θ1(t), . . . , ΘN(t)) be the queue backlog vector. The
reason we use notation Θ(t) to represent a queue vector, instead of Q(t), is that in later sections we
define Θ(t) ≜ [Q(t), Z(t), H(t)], where Q(t) is a vector of actual queues in the network and Z(t),
H(t) are suitably chosen virtual queues. Assume the Θ(t) vector evolves over slots t ∈ {0, 1, 2, . . .}
according to some probability law. The components Θn(t) are real numbers and can possibly be
negative. Allowing Θn(t) to take negative values is often useful for the virtual queues that are
defined later.
As a scalar measure of the “size” of the vector Θ(t), define a quadratic Lyapunov function
L(Θ(t)) as follows:

L(Θ(t)) ≜ (1/2) Σ_{n=1}^{N} wn Θn(t)²        (4.1)

where {wn}, n ∈ {1, . . . , N}, are a collection of positive weights. We typically use wn = 1 for all n, as in (3.11)
of Chapter 3, although different weights are often useful to allow queues to be treated differently.
This function L(Θ(t)) is always non-negative, and it is equal to zero if and only if all components
of Θ(t) are zero. Define the one-slot conditional Lyapunov drift Δ(Θ(t)) as follows:¹

Δ(Θ(t)) ≜ E{L(Θ(t + 1)) − L(Θ(t)) | Θ(t)}        (4.2)

This drift is the expected change in the Lyapunov function over one slot, given that the current state
in slot t is Θ(t).

4.1.1 LYAPUNOV DRIFT THEOREM

Theorem 4.1 (Lyapunov Drift) Consider the quadratic Lyapunov function (4.1), and assume
E{L(Θ(0))} < ∞. Suppose there are constants B > 0, ε ≥ 0 such that the following drift condition
¹ Strictly speaking, better notation would be Δ(Θ(t), t), as the drift may be due to a non-stationary policy. However, we use the
simpler notation Δ(Θ(t)) as a formal representation of the right-hand-side of (4.2).
holds for all slots τ ∈ {0, 1, 2, . . .} and all possible Θ(τ):

Δ(Θ(τ)) ≤ B − ε Σ_{n=1}^{N} |Θn(τ)|        (4.3)

Then:
a) If ε ≥ 0, then all queues Θn(t) are mean rate stable.
b) If ε > 0, then all queues are strongly stable and:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|} ≤ B/ε        (4.4)

Proof. We first prove part (b). Taking expectations of (4.3) and using the law of iterated expectations
yields:

E{L(Θ(τ + 1))} − E{L(Θ(τ))} ≤ B − ε Σ_{n=1}^{N} E{|Θn(τ)|}

Summing the above over τ ∈ {0, 1, . . . , t − 1} for some slot t > 0 and using the law of telescoping
sums yields:

E{L(Θ(t))} − E{L(Θ(0))} ≤ Bt − ε Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|}        (4.5)
Now assume that ε > 0. Dividing by εt, rearranging terms, and using the fact that E{L(Θ(t))} ≥ 0
yields:

(1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|} ≤ B/ε + E{L(Θ(0))}/(εt)        (4.6)
The above holds for all slots t > 0. Taking a limit as t → ∞ proves part (b).
To prove part (a), we have from (4.5) that for all slots t > 0:

E{L(Θ(t))} − E{L(Θ(0))} ≤ Bt

Using the definition of L(Θ(t)) yields:

(1/2) Σ_{n=1}^{N} wn E{Θn(t)²} ≤ E{L(Θ(0))} + Bt

Therefore, for all n ∈ {1, . . . , N}, we have:

E{Θn(t)²} ≤ 2E{L(Θ(0))}/wn + 2Bt/wn

However, because the variance of |Θn(t)| cannot be negative, we have E{Θn(t)²} ≥ E{|Θn(t)|}².
Thus, for all slots t > 0, we have:

E{|Θn(t)|} ≤ √( 2E{L(Θ(0))}/wn + 2Bt/wn )        (4.7)

Dividing by t and taking a limit as t → ∞ proves that:

lim_{t→∞} E{|Θn(t)|}/t ≤ lim_{t→∞} √( 2E{L(Θ(0))}/(t²wn) + 2B/(twn) ) = 0

Thus, all queues Θn(t) are mean rate stable, proving part (a).        □
The above theorem shows that if the drift condition (4.3) holds with ε ≥ 0, so that Δ(Θ(t)) ≤
B, then all queues are mean rate stable. Further, if ε > 0, then all queues are strongly stable with time
average expected queue backlog bounded by B/ε. We note that the proof reveals further detailed
information concerning expected queue backlog for all slots t > 0, showing how the effect of the
initial condition Θ(0) decays over time (see (4.6) and (4.7)).

4.1.2 LYAPUNOV OPTIMIZATION THEOREM


Suppose that, in addition to the queues Θ(t) that we want to stabilize, we have an associated
stochastic “penalty” process y(t) whose time average we want to make less than (or close to) some
target value y∗. The process y(t) can represent penalties incurred by control actions on slot t, such
as power expenditures, packet drops, etc. Assume the expected penalty is lower bounded by a finite
(possibly negative) value ymin, so that for all t and all possible control actions, we have:

E{y(t)} ≥ ymin        (4.8)

Theorem 4.2 (Lyapunov Optimization) Suppose L(Θ(t)) and ymin are defined by (4.1) and (4.8),
and that E{L(Θ(0))} < ∞. Suppose there are constants B ≥ 0, V ≥ 0, ε ≥ 0, and y∗ such that for all
slots τ ∈ {0, 1, 2, . . .} and all possible values of Θ(τ), we have:

Δ(Θ(τ)) + V E{y(τ)|Θ(τ)} ≤ B + V y∗ − ε Σ_{n=1}^{N} |Θn(τ)|        (4.9)

Then all queues Θn(t) are mean rate stable. Further, if V > 0 and ε > 0, then the time average expected
penalty and queue backlog satisfy:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y(τ)} ≤ y∗ + B/V        (4.10)

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|} ≤ [B + V(y∗ − ymin)]/ε        (4.11)
Finally, if V = 0, then (4.11) still holds, and if ε = 0, then (4.10) still holds.

Proof. Fix any slot τ. Because (4.9) holds for this slot, we can take expectations of both sides and
use the law of iterated expectations to yield:

E{L(Θ(τ + 1))} − E{L(Θ(τ))} + V E{y(τ)} ≤ B + V y∗ − ε Σ_{n=1}^{N} E{|Θn(τ)|}

Summing over τ ∈ {0, 1, . . . , t − 1} for some t > 0 and using the law of telescoping sums yields:

E{L(Θ(t))} − E{L(Θ(0))} + V Σ_{τ=0}^{t−1} E{y(τ)} ≤ (B + V y∗)t − ε Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|}        (4.12)

Rearranging terms and neglecting non-negative terms when appropriate, it is easy to show that the
above inequality directly implies the following two inequalities for all t > 0:

(1/t) Σ_{τ=0}^{t−1} E{y(τ)} ≤ y∗ + B/V + E{L(Θ(0))}/(V t)        (4.13)

(1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θn(τ)|} ≤ [B + V(y∗ − ymin)]/ε + E{L(Θ(0))}/(εt)        (4.14)

where (4.13) follows by dividing (4.12) by V t, and (4.14) follows by dividing (4.12) by εt. Taking
limits of the above as t → ∞ proves (4.10) and (4.11).
Rearranging (4.12) also yields:

E{L(Θ(t))} ≤ E{L(Θ(0))} + (B + V(y* − y_min))t

from which mean rate stability follows by an argument similar to that given in the proof of Theorem 4.1. □
Theorem 4.2 can be understood as follows: If, for any parameter V > 0, we can design a control algorithm that ensures the drift condition (4.9) is satisfied on every slot τ, then the time average expected penalty satisfies (4.10), and hence is either less than the target value y*, or differs from y* by no more than a "fudge factor" B/V, which can be made arbitrarily small as V is increased. However, the time average queue backlog bound increases linearly in the V parameter, as shown by (4.11). This presents a performance-backlog tradeoff of [O(1/V), O(V)]. Because Little's Theorem tells us that average queue backlog is proportional to average delay (129), we often call this a performance-delay tradeoff. The proof reveals further details concerning the effect of the initial condition Θ(0) on time average expectations at any slot t (see (4.13) and (4.14)).
4.1. LYAPUNOV DRIFT AND LYAPUNOV OPTIMIZATION 49
This result suggests the following control strategy: Every slot τ, observe the current Θ(τ) values and take a control action that, subject to the known Θ(τ), greedily minimizes the drift-plus-penalty expression on the left-hand-side of the desired drift inequality (4.9):

Δ(Θ(τ)) + V E{y(τ)|Θ(τ)}    (4.15)
It follows that if, on every slot τ, there exists a particular control action that satisfies the drift requirement (4.9), then the drift-plus-penalty minimizing policy must also satisfy this drift requirement. For intuition, note that taking an action on slot τ to minimize the drift Δ(Θ(τ)) alone would tend to push queues toward a lower congestion state, but it may incur a large penalty y(τ). Thus, we minimize a weighted sum of drift and penalty, where the penalty is scaled by an "importance" weight V, representing how much we emphasize penalty minimization. Using V = 0 corresponds to minimizing the drift Δ(Θ(τ)) alone, which reduces to the Tassiulas-Ephremides technique for network stability in (7)(8). While this does not provide any guarantees on the resulting time average penalty y(t) (as the bound (4.10) becomes infinite for V = 0), it still ensures strong stability by (4.11). The case V > 0 includes a weighted penalty term in the greedy minimization, and corresponds to our technique for joint stability and performance optimization, developed for utility optimal flow control in (17)(18) and used for average power optimization in (20)(21) and for problems similar to the type (1.1)-(1.5) and (1.6)-(1.11) in (22).
4.1.3 PROBABILITY 1 CONVERGENCE


Here we present a version of the Lyapunov optimization theorem that treats probability 1 conver-
gence of sample path time averages, rather than time average expectations. We have the following
preliminary lemma, related to the Kolmogorov law of large numbers:

Lemma 4.3 Let X(t) be a random process defined over t ∈ {0, 1, 2, ...}, and suppose that the following hold:

• E{X(t)²} is finite for all t ∈ {0, 1, 2, ...} and satisfies:

Σ_{t=1}^{∞} E{X(t)²}/t² < ∞

• There is a real-valued constant β such that for all t ∈ {1, 2, 3, ...} and all possible X(0), ..., X(t−1), the conditional expectation satisfies:

E{X(t)|X(t−1), X(t−2), ..., X(0)} ≤ β

Then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} X(τ) ≤ β    (w.p.1)
where “(w.p.1)” stands for “with probability 1.”
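As a minimal numerical illustration of the lemma (the distribution below is a hypothetical choice satisfying both hypotheses, not from the text): if X(t) is i.i.d. uniform on [β − 1, β + 1], its conditional mean given the past is exactly β and its second moments are bounded, so the running time average should settle at (and hence not exceed, in the limit) β:

```python
import random

def running_average(beta=0.5, T=200000, seed=1):
    """Simulate a process X(t) whose conditional mean given the past equals beta
    (uniform noise centered at beta), and return its time average over T slots.
    Lemma 4.3 predicts the limiting average is at most beta (w.p.1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(T):
        # X(t) ~ Uniform[beta - 1, beta + 1]: conditional mean is beta,
        # second moments are bounded, so both hypotheses of the lemma hold.
        total += rng.uniform(beta - 1.0, beta + 1.0)
    return total / T
```

With T = 200000 slots, the sample average lands very close to β = 0.5, consistent with the lemma.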

A proof of this lemma is given in (138) as a simple application of the Kolmogorov law of large numbers for martingale differences. See (139)(140)(130)(141) for background on martingales and a statement and proof of the Kolmogorov law of large numbers. The lemma is used in (138) to prove the probability 1 version of the Lyapunov optimization theorem given below.

Let Θ(t) be a vector of queues and y(t) a penalty process, as before. Rather than defining a drift that conditions on Θ(t), we must condition on the full history H(t), which includes values of Θ(τ) for τ ∈ {0, ..., t} and values of y(τ) for τ ∈ {0, ..., t−1}. Specifically, for integers t ≥ 0 define:

H(t) = {Θ(0), Θ(1), ..., Θ(t), y(0), y(1), ..., y(t−1)}

Define Δ(t, H(t)) by:

Δ(t, H(t)) = E{L(Θ(t+1)) − L(Θ(t)) | H(t)}
Assume that:

• The penalty process y(t) is deterministically lower bounded by a (possibly negative) constant y_min, so that:

y(t) ≥ y_min  ∀t  (w.p.1)    (4.16)

• The second moments E{y(t)²} are finite for all t ∈ {0, 1, 2, ...}, and:

Σ_{t=1}^{∞} E{y(t)²}/t² < ∞    (4.17)

• There is a finite constant D > 0 such that for all n ∈ {1, ..., N}, all t, and all possible H(t), we have:

E{(Θ_n(t+1) − Θ_n(t))⁴ | H(t)} ≤ D    (4.18)

so that conditional fourth moments of queue changes are uniformly bounded.
Theorem 4.4 (Lyapunov Optimization with Probability 1 Convergence) Define L(Θ(t)) by (4.1), assume that Θ(0) is finite with probability 1, and suppose that assumptions (4.16)-(4.18) hold. Suppose there are constants B ≥ 0, V > 0, ε > 0, and y* such that for all slots τ ∈ {0, 1, 2, ...} and all possible H(τ), we have:

Δ(τ, H(τ)) + V E{y(τ)|H(τ)} ≤ B + V y* − ε Σ_{n=1}^{N} |Θ_n(τ)|

Then all queues Θ_n(t) are rate stable, and:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} y(τ) ≤ y* + B/V    (w.p.1)    (4.19)

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} |Θ_n(τ)| ≤ [B + V(y* − y_min)]/ε    (w.p.1)    (4.20)
Further, if these same assumptions hold, and if there is a value y such that the following additional inequality also holds for all τ and all possible H(τ):

Δ(τ, H(τ)) + V E{y(τ)|H(τ)} ≤ B + V y

then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} y(τ) ≤ y + B/V    (w.p.1)    (4.21)
Proof. Fix Θ(0) as a given finite initial condition. Define the process X(t) for t ∈ {0, 1, 2, ...} as follows:

X(t) = L(Θ(t+1)) − L(Θ(t)) + V y(t) − B − V y* + ε Σ_{n=1}^{N} |Θ_n(t)|

The conditions on y(t) and Θ(t) are shown in (138) to ensure that the queues Θ_n(t) are rate stable, that E{X(t)²} is finite for all t, and that for all t > 0 and all possible values of X(t−1), ..., X(0):

Σ_{t=1}^{∞} E{X(t)²}/t² < ∞ ,  E{X(t)|X(t−1), X(t−2), ..., X(0)} ≤ 0
Thus, we can apply Lemma 4.3 to X(t) to yield:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} X(τ) ≤ 0    (w.p.1)    (4.22)
However, by definition of X(t), we have for all t > 0:

(1/t) Σ_{τ=0}^{t−1} X(τ) = [L(Θ(t)) − L(Θ(0))]/t + (1/t) Σ_{τ=0}^{t−1} [V y(τ) + ε Σ_{n=1}^{N} |Θ_n(τ)|] − B − V y*
Rearranging terms in the above identity and neglecting non-negative terms where appropriate directly leads to the following two inequalities that hold for all t > 0:

(1/(V t)) Σ_{τ=0}^{t−1} X(τ) ≥ −L(Θ(0))/(V t) + (1/t) Σ_{τ=0}^{t−1} y(τ) − [B/V + y*]

(1/(ε t)) Σ_{τ=0}^{t−1} X(τ) ≥ −L(Θ(0))/(ε t) + (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} |Θ_n(τ)| − [B + V(y* − y_min)]/ε

Taking limits of the above two inequalities and using (4.22) proves the results (4.19)-(4.20). A similar argument proves (4.21). □
Conditioning on the history H(t) is needed to prove Theorem 4.4 via Lemma 4.3. A policy that greedily minimizes Δ(t, H(t)) + V E{y(t)|H(t)} every slot will also greedily minimize Δ(Θ(t)) + V E{y(t)|Θ(t)}. In this text, we focus primarily on time average expectations of the type (4.10) and (4.11), with the understanding that the same bounds can be shown to hold for time averages (with probability 1) if the additional assumptions (4.16)-(4.18) hold.

4.2 GENERAL SYSTEM MODEL

[Figure: K queues Q_1(t), ..., Q_K(t) with arrivals a_k(t) and services b_k(t); attributes y_l(t) for l ∈ {1, ..., L} and e_j(t) for j ∈ {1, ..., J}; random state ω(t); control action α(t).]

Figure 4.1: An illustration of a general K-queue network with attributes y_l(t), e_j(t).

Consider now a system with queue backlog vector Q(t) = (Q_1(t), ..., Q_K(t)), as shown in Fig. 4.1. Queue dynamics are given by:

Q_k(t+1) = max[Q_k(t) − b_k(t), 0] + a_k(t)    (4.23)

where a(t) = (a_1(t), ..., a_K(t)) and b(t) = (b_1(t), ..., b_K(t)) are general functions of a random event ω(t) and a control action α(t):

a_k(t) = â_k(α(t), ω(t)) ,  b_k(t) = b̂_k(α(t), ω(t))
Every slot t, the network controller observes ω(t) and chooses an action α(t) ∈ A_ω(t). The set A_ω(t) is the action space associated with event ω(t). In addition to affecting these arrival and service variables, α(t) and ω(t) also determine the attribute vectors x(t), y(t), e(t) according to general functions x̂_m(α, ω), ŷ_l(α, ω), ê_j(α, ω), as described in Section 1.2.

We assume that ω(t) is a stationary process with a stationary probability distribution π(ω). Assume that ω(t) takes values in some sample space Ω. If Ω is a finite or countably infinite set, then for each ω ∈ Ω, π(ω) represents a probability mass function associated with the stationary distribution, and:

Pr[ω(t) = ω] = π(ω)  ∀t ∈ {0, 1, 2, ...}    (4.24)

If Ω is uncountably infinite, then we assume ω(t) is a random vector, and that π(ω) represents a probability density associated with the stationary distribution. The simplest model, which we mainly consider in this text, is the case when ω(t) is i.i.d. over slots t with stationary probabilities π(ω).
4.2.1 BOUNDEDNESS ASSUMPTIONS

The arrival function â_k(α, ω) is assumed to be non-negative for all ω ∈ Ω and all α ∈ A_ω. The service function b̂_k(·) and the attribute functions x̂_m(·), ŷ_l(·), ê_j(·) can possibly take negative values. All of these functions are general (possibly non-convex and discontinuous). However, we assume that these functions, together with the stationary probabilities π(ω), satisfy the following boundedness properties: For all t and all (possibly randomized) control decisions α(t) ∈ A_ω(t), we have:

E{â_k(α(t), ω(t))²} ≤ σ²  ∀k ∈ {1, ..., K}    (4.25)
E{b̂_k(α(t), ω(t))²} ≤ σ²  ∀k ∈ {1, ..., K}    (4.26)
E{x̂_m(α(t), ω(t))²} ≤ σ²  ∀m ∈ {1, ..., M}    (4.27)
E{ŷ_l(α(t), ω(t))²} ≤ σ²  ∀l ∈ {1, ..., L}    (4.28)
E{ê_j(α(t), ω(t))²} ≤ σ²  ∀j ∈ {1, ..., J}    (4.29)

for some finite constant σ² > 0. Further, for all t and all actions α(t) ∈ A_ω(t), we require the expectation of y_0(t) to be bounded by finite constants y_0,min and y_0,max:

y_0,min ≤ E{ŷ_0(α(t), ω(t))} ≤ y_0,max    (4.30)
4.3 OPTIMALITY VIA ω-ONLY POLICIES

For each l ∈ {0, 1, ..., L}, define ȳ_l(t) as the time average expectation of y_l(t) over the first t slots under a particular control strategy:

ȳ_l(t) = (1/t) Σ_{τ=0}^{t−1} E{y_l(τ)}
where the expectation is over the randomness of the ω(τ) values and the random control actions. Define time average expectations ā_k(t), b̄_k(t), ē_j(t) similarly. Define ȳ_l and ē_j as the limiting values of ȳ_l(t) and ē_j(t), assuming temporarily that these limits are well defined. We desire a control policy that solves the following problem:

Minimize: ȳ_0
Subject to: 1) ȳ_l ≤ 0  ∀l ∈ {1, ..., L}
            2) ē_j = 0  ∀j ∈ {1, ..., J}
            3) Queues Q_k(t) are mean rate stable  ∀k ∈ {1, ..., K}
            4) α(t) ∈ A_ω(t)  ∀t
The above description of the problem is convenient, although we can state the problem more precisely, without assuming limits are well defined, as follows:

Minimize: lim sup_{t→∞} ȳ_0(t)    (4.31)
Subject to: 1) lim sup_{t→∞} ȳ_l(t) ≤ 0  ∀l ∈ {1, ..., L}    (4.32)
            2) lim_{t→∞} ē_j(t) = 0  ∀j ∈ {1, ..., J}    (4.33)
            3) Queues Q_k(t) are mean rate stable  ∀k ∈ {1, ..., K}    (4.34)
            4) α(t) ∈ A_ω(t)  ∀t    (4.35)
An example of such a problem is when we have a K-queue wireless network that must be stabilized subject to average power constraints P̄_l ≤ P_l^av for each node l ∈ {1, ..., L}, where P̄_l represents the time average power of node l, and P_l^av represents a pre-specified average power constraint. Suppose the goal is to maximize the time average of the total admitted traffic. Then y_0(t) is −1 times the admitted traffic on slot t. We also define y_l(t) = P_l(t) − P_l^av, the difference between the power expenditure of node l on slot t and its time average constraint, so that ȳ_l ≤ 0 corresponds to P̄_l ≤ P_l^av. In this example, there are no time average equality constraints, and so J = 0. See also Section 4.6 and Exercises 2.11 and 4.7-4.14 for more examples.
Consider now the special class of stationary and randomized policies that we call ω-only policies, which observe ω(t) for each slot t and independently choose a control action α(t) ∈ A_ω(t) as a pure (possibly randomized) function of the observed ω(t). Let α*(t) represent the decisions under such an ω-only policy over time t ∈ {0, 1, 2, ...}. Because ω(t) has the stationary distribution π(ω) for all t, the expectations of the arrival, service, and attribute values are the same for all t:

E{ŷ_l(α*(t), ω(t))} = ȳ_l  ∀l ∈ {0, 1, ..., L}
E{ê_j(α*(t), ω(t))} = ē_j  ∀j ∈ {1, ..., J}
E{â_k(α*(t), ω(t))} = ā_k  ∀k ∈ {1, ..., K}
E{b̂_k(α*(t), ω(t))} = b̄_k  ∀k ∈ {1, ..., K}

for some quantities ȳ_l, ē_j, ā_k, b̄_k. In the case when Ω is finite or countably infinite, the expectations above can be understood as weighted sums over all ω values, weighted by the stationary distribution π(ω). Specifically:

E{ŷ_l(α*(t), ω(t))} = Σ_{ω∈Ω} π(ω) E{ŷ_l(α*(t), ω)|ω(t) = ω}
The above expectations ȳ_l, ē_j, ā_k, b̄_k are finite under any ω-only policy because of the boundedness assumptions (4.25)-(4.30). In addition to assuming ω(t) is a stationary process, we make the following mild "law of large numbers" assumption concerning time averages (not time average expectations): Under any ω-only policy α*(t) that yields expectations ȳ_l, ē_j, ā_k, b̄_k on every slot t, the infinite horizon time averages of ŷ_l(α*(t), ω(t)), ê_j(α*(t), ω(t)), â_k(α*(t), ω(t)), b̂_k(α*(t), ω(t)) are equal to ȳ_l, ē_j, ā_k, b̄_k with probability 1. For example:

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} ŷ_l(α*(τ), ω(τ)) = ȳ_l    (w.p.1)
where "(w.p.1)" means "with probability 1." This is a mild assumption that holds whenever ω(t) is i.i.d. over slots because, by the law of large numbers, the resulting ŷ_l(α*(t), ω(t)) process is i.i.d. over slots with finite mean ȳ_l. However, it also holds for a large class of other stationary processes, including stationary processes defined over finite state irreducible Discrete Time Markov Chains (as considered in Section 4.9). It does not hold, for example, for degenerate stationary processes where ω(0) can take different values according to some probability distribution but is then held fixed for all slots thereafter, so that ω(t) = ω(0) for all t.
Under these assumptions, we say that the problem (4.31)-(4.35) is feasible if there exists a control policy that satisfies the constraints (4.32)-(4.35). Assuming feasibility, define y_0^opt as the infimum value of the cost metric (4.31) over all control policies that satisfy the constraints (4.32)-(4.35). This infimum is finite by (4.30). We emphasize that y_0^opt considers all possible control policies that choose α(t) ∈ A_ω(t) over slots t, not just ω-only policies. However, in Appendix 4.A, it is shown that y_0^opt can be computed in terms of ω-only policies. Specifically, it is shown that the set of all possible limiting time average expectations of the variables [(y_l(t)), (e_j(t)), (a_k(t)), (b_k(t))], considering all possible algorithms, is equal to the closure of the set of all one-slot averages [(ȳ_l), (ē_j), (ā_k), (b̄_k)] achievable under ω-only policies. Further, the next theorem shows that if the problem (4.31)-(4.35) is feasible, then the utility y_0^opt and the constraints ȳ_l ≤ 0, ē_j = 0, ā_k ≤ b̄_k can be achieved arbitrarily closely by ω-only policies.
Theorem 4.5 (Optimality over ω-only Policies) Suppose the ω(t) process is stationary with distribution π(ω), and that the system satisfies the boundedness assumptions (4.25)-(4.30) and the law of large numbers assumption specified above. If the problem (4.31)-(4.35) is feasible, then for any δ > 0 there is an ω-only policy α*(t) that satisfies α*(t) ∈ A_ω(t) for all t, and:

E{ŷ_0(α*(t), ω(t))} ≤ y_0^opt + δ    (4.36)
E{ŷ_l(α*(t), ω(t))} ≤ δ  ∀l ∈ {1, ..., L}    (4.37)
|E{ê_j(α*(t), ω(t))}| ≤ δ  ∀j ∈ {1, ..., J}    (4.38)
E{â_k(α*(t), ω(t))} ≤ E{b̂_k(α*(t), ω(t))} + δ  ∀k ∈ {1, ..., K}    (4.39)

Proof. See Appendix 4.A. □
The inequalities (4.36)-(4.39) are similar to those seen in Chapter 3, which related the existence of such randomized policies to the existence of linear programs that yield the desired time averages. The stationarity of ω(t) simplifies the proof of Theorem 4.5 but is not crucial to its result. Similar results are derived in (15)(21)(136) without the stationarity assumption but under the additional assumption that ω(t) can take at most a finite (but arbitrarily large) number of values and has well defined time averages.

We have stated Theorem 4.5 in terms of arbitrarily small values δ > 0. It may be of interest to note that for most practical systems, there exists an ω-only policy that satisfies all inequalities (4.36)-(4.39) with δ = 0. Appendix 4.A shows that this holds whenever the set of all one-slot expectations achievable under ω-only policies is closed. Thus, one may prefer a more "aesthetically pleasing" version of Theorem 4.5 that assumes this additional mild closure property in order to remove the appearance of "δ" in the theorem statement. We have presented the theorem in the above form because it is sufficient for our purposes. In particular, we do not require the closure property in order to apply the Lyapunov optimization techniques developed next.
4.4 VIRTUAL QUEUES

To solve the problem (4.31)-(4.35), we first transform all inequality and equality constraints into queue stability problems. Specifically, define virtual queues Z_l(t) and H_j(t) for each l ∈ {1, ..., L} and j ∈ {1, ..., J}, with update equations:

Z_l(t+1) = max[Z_l(t) + y_l(t), 0]    (4.40)

H_j(t+1) = H_j(t) + e_j(t)    (4.41)

The virtual queue Z_l(t) is used to enforce the ȳ_l ≤ 0 constraint. Indeed, recall that if Z_l(t) satisfies (4.40), then by our basic sample path properties in Chapter 2, we have for all t > 0:

Z_l(t)/t − Z_l(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} y_l(τ)
Taking expectations of the above and taking t → ∞ shows:

lim sup_{t→∞} E{Z_l(t)}/t ≥ lim sup_{t→∞} ȳ_l(t)

where we recall that ȳ_l(t) is the time average expectation of y_l(τ) over τ ∈ {0, ..., t−1}. Thus, if Z_l(t) is mean rate stable, the left-hand-side of the above inequality is 0, and so:

lim sup_{t→∞} ȳ_l(t) ≤ 0

This means our desired time average constraint for y_l(t) is satisfied. This turns the problem of satisfying a time average inequality constraint into a pure queue stability problem! This discussion is, of course, just a repeated derivation of Theorem 2.5 (as well as Exercise 2.11).
The virtual queue H_j(t) is designed to turn the time average equality constraint ē_j = 0 into a pure queue stability problem. The H_j(t) queue has a different structure, and can possibly be negative, because it enforces an equality constraint rather than an inequality constraint. It is easy to see by summing (4.41) that for any t > 0:

H_j(t) − H_j(0) = Σ_{τ=0}^{t−1} e_j(τ)

Taking expectations and dividing by t yields:

[E{H_j(t)} − E{H_j(0)}]/t = ē_j(t)    (4.42)

Therefore, if H_j(t) is mean rate stable, then:²

lim_{t→∞} ē_j(t) = 0
so that the desired equality constraint for ej (t) is satisfied.
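As a minimal numerical sketch of the two updates (the penalty sequences below are arbitrary illustrative numbers, not from the text), the updates (4.40)-(4.41) and the sample-path relations they enforce can be checked directly:

```python
def run_virtual_queues(y_seq, e_seq, Z0=0.0, H0=0.0):
    """Run the virtual queue updates (4.40)-(4.41) on given penalty sequences.
    Returns the final (Z, H) together with the running sums of y and e, so the
    sample-path relations Z(t) - Z(0) >= sum of y(tau) and
    H(t) - H(0) = sum of e(tau) can be verified."""
    Z, H = Z0, H0
    for y, e in zip(y_seq, e_seq):
        Z = max(Z + y, 0.0)  # (4.40): enforces the inequality constraint
        H = H + e            # (4.41): enforces the equality constraint
    return Z, H, sum(y_seq), sum(e_seq)
```

Keeping Z(t)/t small therefore forces the running average of y(τ) down toward zero or below, while keeping |H(t)|/t small forces the running average of e(τ) toward zero.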


It follows that if we can design a control algorithm that chooses α(t) ∈ A_ω(t) for all t, makes all actual queues Q_k(t) and virtual queues Z_l(t), H_j(t) mean rate stable, and yields a time average expectation of y_0(t) that is equal to our target y_0^opt, then we have solved the problem (4.31)-(4.35). This transforms the original problem into a problem of minimizing the time average of a cost function subject to queue stability. We assume throughout that initial conditions satisfy Z_l(0) ≥ 0 for all l ∈ {1, ..., L}, H_j(0) ∈ R for all j ∈ {1, ..., J}, and that E{Z_l(0)²} < ∞ and E{H_j(0)²} < ∞ for all l and j.

²Note by Jensen's inequality that 0 ≤ |E{H(t)}| ≤ E{|H(t)|}, and so if E{|H(t)|}/t → 0, then E{H(t)}/t → 0.

4.5 THE MIN DRIFT-PLUS-PENALTY ALGORITHM

Let Θ(t) = [Q(t), Z(t), H(t)] be a concatenated vector of all actual and virtual queues, with update equations (4.23), (4.40), (4.41). Define the Lyapunov function:

L(Θ(t)) = (1/2) Σ_{k=1}^{K} Q_k(t)² + (1/2) Σ_{l=1}^{L} Z_l(t)² + (1/2) Σ_{j=1}^{J} H_j(t)²    (4.43)

If there are no equality constraints, we have J = 0 and we remove the H_j(t) queues. If there are no inequality constraints, then L = 0 and we remove the Z_l(t) queues.
Lemma 4.6 Suppose ω(t) is i.i.d. over slots. Under any control algorithm, the drift-plus-penalty expression has the following upper bound for all t, all possible values of Θ(t), and all parameters V ≥ 0:

Δ(Θ(t)) + V E{y_0(t)|Θ(t)} ≤ B + V E{y_0(t)|Θ(t)} + Σ_{k=1}^{K} Q_k(t) E{a_k(t) − b_k(t) | Θ(t)}
    + Σ_{l=1}^{L} Z_l(t) E{y_l(t)|Θ(t)} + Σ_{j=1}^{J} H_j(t) E{e_j(t)|Θ(t)}    (4.44)

where B is a positive constant that satisfies the following for all t:

B ≥ (1/2) Σ_{k=1}^{K} E{a_k(t)² + b_k(t)² | Θ(t)} + (1/2) Σ_{l=1}^{L} E{y_l(t)²|Θ(t)}
    + (1/2) Σ_{j=1}^{J} E{e_j(t)²|Θ(t)} − Σ_{k=1}^{K} E{b̃_k(t)a_k(t)|Θ(t)}    (4.45)

where we recall that b̃_k(t) = min[Q_k(t), b_k(t)]. Such a constant B exists because ω(t) is i.i.d. and the boundedness assumptions in Section 4.2.1 hold.
Proof. Squaring the queue update equation (4.23) and using the fact that max[q − b, 0]² ≤ (q − b)² yields:

Q_k(t+1)² ≤ (Q_k(t) − b_k(t))² + a_k(t)² + 2 max[Q_k(t) − b_k(t), 0] a_k(t)
          = (Q_k(t) − b_k(t))² + a_k(t)² + 2(Q_k(t) − b̃_k(t)) a_k(t)    (4.46)

Therefore:

[Q_k(t+1)² − Q_k(t)²]/2 ≤ [a_k(t)² + b_k(t)²]/2 − b̃_k(t)a_k(t) + Q_k(t)[a_k(t) − b_k(t)]

Similarly,

[Z_l(t+1)² − Z_l(t)²]/2 ≤ y_l(t)²/2 + Z_l(t)y_l(t)    (4.47)

[H_j(t+1)² − H_j(t)²]/2 = e_j(t)²/2 + H_j(t)e_j(t)

Taking conditional expectations of the above three equations and summing over k ∈ {1, ..., K}, l ∈ {1, ..., L}, j ∈ {1, ..., J} gives a bound on Δ(Θ(t)). Adding V E{y_0(t)|Θ(t)} to both sides proves the result. □
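The first step of this proof can be spot-checked numerically. The sketch below (the numeric values used in the check are arbitrary) evaluates both sides of the bound leading to (4.46), using the identity max[q − b, 0] = q − min[q, b]:

```python
def square_bound(q, b, a):
    """Evaluate both sides of the bound used in (4.46): with
    q_next = max[q - b, 0] + a and btilde = min[q, b], we should have
    q_next^2 <= (q - b)^2 + a^2 + 2*(q - btilde)*a."""
    q_next = max(q - b, 0.0) + a        # queue update (4.23)
    btilde = min(q, b)                  # actual service delivered
    rhs = (q - b) ** 2 + a ** 2 + 2.0 * (q - btilde) * a
    return q_next ** 2, rhs
```

Equality holds whenever q ≥ b, since then max[q − b, 0]² = (q − b)²; the inequality is strict only when the max is active.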
Rather than directly minimize the expression Δ(Θ(t)) + V E{y_0(t)|Θ(t)} every slot t, our strategy actually seeks to minimize the bound given in the right-hand-side of (4.44). This is done via the framework of opportunistically minimizing a (conditional) expectation, as described in Section 1.8 (see also Exercise 4.5), and the resulting algorithm is given below.

Min Drift-Plus-Penalty Algorithm for solving (4.31)-(4.35): Every slot t, observe the current queue states Θ(t) and the random event ω(t), and make a control decision α(t) ∈ A_ω(t) as follows:

Minimize: V ŷ_0(α(t), ω(t)) + Σ_{k=1}^{K} Q_k(t)[â_k(α(t), ω(t)) − b̂_k(α(t), ω(t))]
          + Σ_{l=1}^{L} Z_l(t) ŷ_l(α(t), ω(t)) + Σ_{j=1}^{J} H_j(t) ê_j(α(t), ω(t))    (4.48)
Subject to: α(t) ∈ A_ω(t)    (4.49)

Then update the virtual queues Z_l(t) and H_j(t) according to (4.40) and (4.41), and the actual queues Q_k(t) according to (4.23).
A remarkable property of this algorithm is that it does not need to know the probabilities π(ω). After observing ω(t), it seeks to minimize a (possibly non-linear, non-convex, and discontinuous) function of α over all α ∈ A_ω(t). Its complexity depends on the structure of the functions â_k(·), b̂_k(·), ŷ_l(·), ê_j(·). However, in the case when the set A_ω(t) contains a finite (and small) number of possible control actions, the policy simply evaluates the function for each option and chooses the best one.
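When A_ω(t) is finite, this evaluate-and-compare step can be sketched as follows. All names here are hypothetical illustrations, not from the text; the *_hat arguments play the roles of the functions ŷ_0, â_k, b̂_k, ŷ_l, ê_j:

```python
def min_dpp_action(omega, actions, Q, Z, H, V,
                   y0_hat, a_hat, b_hat, yl_hat, ej_hat):
    """Evaluate the drift-plus-penalty expression (4.48) for each action in a
    finite action set and return the minimizing action. Q, Z, H are the current
    actual and virtual queue backlogs; V is the penalty weight."""
    def expression(alpha):
        val = V * y0_hat(alpha, omega)
        val += sum(Qk * (ak - bk)
                   for Qk, ak, bk in zip(Q, a_hat(alpha, omega), b_hat(alpha, omega)))
        val += sum(Zl * yl for Zl, yl in zip(Z, yl_hat(alpha, omega)))
        val += sum(Hj * ej for Hj, ej in zip(H, ej_hat(alpha, omega)))
        return val
    return min(actions, key=expression)
```

For instance, in a toy one-queue system where the action 'serve' spends one unit of power (ŷ_0 = 1) and serves one packet, the rule serves when Q(t) is large and idles when Q(t) = 0, exactly the backlog-versus-penalty tradeoff described above.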
Before presenting the analysis, we note that the problem (4.48)-(4.49) may not have a well defined minimum when the set A_ω(t) is infinite. However, rather than assuming our decisions obtain the exact minimum every slot (or come close to the infimum), we analyze the performance when our implementation comes within an additive constant of the infimum in the right-hand-side of (4.44).

Definition 4.7 For a given constant C ≥ 0, a C-additive approximation of the drift-plus-penalty algorithm is one that, every slot t and given the current Θ(t), chooses a (possibly randomized) action α(t) ∈ A_ω(t) that yields a conditional expected value on the right-hand-side of the drift expression (4.44) (given Θ(t)) that is within a constant C of the infimum over all possible control actions.

Definition 4.7 allows the deviation from the infimum to be in an expected sense, rather than a deterministic sense, which is useful in some applications. These C-additive approximations are also useful for implementations with out-of-date queue backlog information, as shown in Exercise 4.10, and for achieving maximum throughput in interference networks via approximation algorithms, as shown in Chapter 6.
Theorem 4.8 (Performance of Min Drift-Plus-Penalty Algorithm) Suppose that ω(t) is i.i.d. over slots with probabilities π(ω), the problem (4.31)-(4.35) is feasible, and that E{L(Θ(0))} < ∞. Fix a value C ≥ 0. If we use a C-additive approximation of the algorithm every slot t, then:

a) Time average expected cost satisfies:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y_0(τ)} ≤ y_0^opt + (B + C)/V    (4.50)

where y_0^opt is the infimum time average cost achievable by any policy that meets the required constraints, and B is defined in (4.45).

b) All queues Q_k(t), Z_l(t), H_j(t) are mean rate stable, and all required constraints (4.32)-(4.35) are satisfied.

c) Suppose there are constants ε > 0 and Φ(ε) for which the Slater condition of Assumption A1 holds, stated below in (4.61)-(4.64). Then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{Q_k(τ)} ≤ [B + C + V(Φ(ε) − y_0^opt)]/ε    (4.51)

where [Φ(ε) − y_0^opt] ≤ y_0,max − y_0,min, and y_0,min, y_0,max are defined in (4.30).
We note that the bounds given in (4.50) and (4.51) are not just infinite horizon bounds:
Inequalities (4.58) and (4.59) in the below proof show that these bounds hold for all time t > 0 in
the case when all initial queue backlogs are zero, and that a “fudge factor” that decays like O(1/t)
must be included if initial queue backlogs are non-zero. The above theorem is for the case when
ω(t) is i.i.d. over slots. The same algorithm can be shown to offer similar performance under more
general ergodic ω(t) processes as well as for non-ergodic processes, as discussed in Section 4.9.
Proof. (Theorem 4.8) Because, every slot t, our implementation comes within an additive constant C of minimizing the right-hand-side of the drift expression (4.44) over all α(t) ∈ A_ω(t), we have for each slot t:

Δ(Θ(t)) + V E{y_0(t)|Θ(t)} ≤ B + C + V E{y_0*(t)|Θ(t)}
    + Σ_{l=1}^{L} Z_l(t) E{y_l*(t)|Θ(t)} + Σ_{j=1}^{J} H_j(t) E{e_j*(t)|Θ(t)}
    + Σ_{k=1}^{K} Q_k(t) E{a_k*(t) − b_k*(t) | Θ(t)}    (4.52)
where a_k*(t), b_k*(t), y_l*(t), e_j*(t) are the resulting arrival, service, and attribute values under any alternative (possibly randomized) decision α*(t) ∈ A_ω(t). Specifically, a_k*(t) = â_k(α*(t), ω(t)), b_k*(t) = b̂_k(α*(t), ω(t)), y_l*(t) = ŷ_l(α*(t), ω(t)), e_j*(t) = ê_j(α*(t), ω(t)).

Now fix δ > 0, and consider the ω-only policy α*(t) that yields (4.36)-(4.39). Because this is an ω-only policy, and ω(t) is i.i.d. over slots, the resulting values of y_0*(t), a_k*(t), b_k*(t), e_j*(t) are independent of the current queue backlogs Θ(t), and we have from (4.36)-(4.39):

E{y_0*(t)|Θ(t)} = E{y_0*(t)} ≤ y_0^opt + δ    (4.53)
E{y_l*(t)|Θ(t)} = E{y_l*(t)} ≤ δ  ∀l ∈ {1, ..., L}    (4.54)
|E{e_j*(t)|Θ(t)}| = |E{e_j*(t)}| ≤ δ  ∀j ∈ {1, ..., J}    (4.55)
E{a_k*(t) − b_k*(t)|Θ(t)} = E{a_k*(t) − b_k*(t)} ≤ δ  ∀k ∈ {1, ..., K}    (4.56)
Plugging these into the right-hand-side of (4.52) and taking δ → 0 yields:

Δ(Θ(t)) + V E{y_0(t)|Θ(t)} ≤ B + C + V y_0^opt    (4.57)

This is in the exact form for application of the Lyapunov Optimization Theorem (Theorem 4.2). Hence, all queues are mean rate stable, and so all required time average constraints are satisfied, which proves part (b). Further, from the above drift expression, we have for any t > 0 (from (4.13) of Theorem 4.2, or simply from taking iterated expectations and telescoping sums):

(1/t) Σ_{τ=0}^{t−1} E{y_0(τ)} ≤ y_0^opt + (B + C)/V + E{L(Θ(0))}/(V t)    (4.58)

which proves part (a) by taking a lim sup as t → ∞.
To prove part (c), assume Assumption A1 holds (stated below). Plugging the ω-only policy that yields (4.61)-(4.64) into the right-hand-side of the drift bound (4.52) yields:

Δ(Θ(t)) + V E{y_0(t)|Θ(t)} ≤ B + C + V Φ(ε) − ε Σ_{k=1}^{K} Q_k(t)

Taking iterated expectations, summing the telescoping series, and rearranging terms as usual yields:

(1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{Q_k(τ)} ≤ [B + C + V(Φ(ε) − (1/t) Σ_{τ=0}^{t−1} E{y_0(τ)})]/ε + E{L(Θ(0))}/(ε t)    (4.59)
However, because our algorithm satisfies all of the desired constraints of the optimization problem (4.31)-(4.35), its limiting time average expectation for y_0(t) cannot be better than y_0^opt:

lim inf_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y_0(τ)} ≥ y_0^opt    (4.60)

Indeed, this fact is shown in Appendix 4.A (equation (4.96)). Taking a lim sup of (4.59) as t → ∞ and using (4.60) yields:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{Q_k(τ)} ≤ [B + C + V(Φ(ε) − y_0^opt)]/ε    □
The following is the Assumption A1 needed in part (c) of Theorem 4.8.

Assumption A1 (Slater Condition): There are values ε > 0 and Φ(ε) (where y_0,min ≤ Φ(ε) ≤ y_0,max) and an ω-only policy α*(t) that satisfies:

E{ŷ_0(α*(t), ω(t))} = Φ(ε)    (4.61)
E{ŷ_l(α*(t), ω(t))} ≤ 0  ∀l ∈ {1, ..., L}    (4.62)
E{ê_j(α*(t), ω(t))} = 0  ∀j ∈ {1, ..., J}    (4.63)
E{â_k(α*(t), ω(t))} ≤ E{b̂_k(α*(t), ω(t))} − ε  ∀k ∈ {1, ..., K}    (4.64)

Assumption A1 ensures strong stability of the Q_k(t) queues. However, often the structure of a particular problem allows stronger deterministic queue bounds, even without Assumption A1 (see Exercise 4.9). A variation on the above proof that considers probability 1 convergence is treated in Exercise 4.6.
4.5.1 WHERE ARE WE USING THE I.I.D. ASSUMPTIONS?

In (4.53)-(4.56) of the above proof, we used equalities of the form E{y_l*(t)|Θ(t)} = E{y_l*(t)}, which hold for any ω-only policy α*(t) when ω(t) is i.i.d. over slots. Because past values of ω(τ) for τ < t have influenced the current queue states Θ(t), this influence might skew the conditional distribution of ω(t) (given Θ(t)) unless ω(t) is independent of the past. However, while the i.i.d. assumption is crucial for the above proof, it is not crucial for efficient performance of the algorithm, as shown in Section 4.9.
4.6 EXAMPLES
Here we provide examples of using the drift-plus-penalty algorithm for the same systems considered
in Sections 2.3.1 and 2.3.2. More examples are given in Exercises 4.7-4.15.

4.6.1 DYNAMIC SERVER SCHEDULING

Example Problem: Consider the 3-queue, 2-server system described in Section 2.3.1 (see Fig. 2.1). Define ω(t) = (a_1(t), a_2(t), a_3(t)) as the random arrivals on slot t, and assume ω(t) is i.i.d. over slots with E{a_i(t)} = λ_i and E{a_i(t)²} = E{a_i²} for i ∈ {1, 2, 3}.

a) Suppose (λ_1, λ_2, λ_3) ∈ Λ, where we recall that Λ is defined by the constraints 0 ≤ λ_i ≤ 1 for all i ∈ {1, 2, 3}, and λ_1 + λ_2 + λ_3 ≤ 2. State the drift-plus-penalty algorithm (with V = 0 and C = 0) for stabilizing all three queues.
b) Suppose the Slater condition (Assumption A1) holds for a value ε > 0. Using the drift-plus-penalty algorithm with V = 0, C = 0, derive a value B such that the time average queue backlog satisfies Q̄_1 + Q̄_2 + Q̄_3 ≤ B/ε, where Q̄_1 + Q̄_2 + Q̄_3 is the lim sup time average expected backlog in the system.

c) Suppose we must choose b(t) ∈ {(1, 1, 0), (1, 0, 1), (0, 1, 1)} every slot t. Suppose that choosing b(t) = (1, 1, 0) or b(t) = (1, 0, 1) consumes one unit of power per slot, but using the vector b(t) = (0, 1, 1) uses two units of power per slot. State the drift-plus-penalty algorithm (with V > 0 and C = 0) that seeks to minimize time average power subject to queue stability. Conclude that p̄ ≤ p^opt + B/V, where p̄ is the lim sup time average expected power expenditure of the algorithm, and p^opt is the minimum possible time average power expenditure required for queue stability. Assuming the Slater condition of part (b), conclude that Q̄_1 + Q̄_2 + Q̄_3 ≤ (B + V)/ε.
Solution:
a) We have $K = 3$ with queues $Q_1(t), Q_2(t), Q_3(t)$. There is no penalty to minimize, so
$y_0(t) = 0$ (and so we also choose $V = 0$). There are no additional $y_l(t)$ or $e_j(t)$ attributes, and so $L = J = 0$. The control action $\alpha(t)$ determines the server allocations, so that $\alpha(t) = (b_1(t), b_2(t), b_3(t))$,
and the set of possible action vectors is $A = \{(1,1,0), (1,0,1), (0,1,1)\}$ (so that we choose
which two queues to serve on each slot). The control action does not affect the arrivals, and so
$\hat{a}_k(\alpha(t), \omega(t)) = a_k(t)$. The algorithm (4.48)-(4.49) with $V = 0$ reduces to observing the queue
backlogs every slot $t$ and choosing $(b_1(t), b_2(t), b_3(t))$ as follows:

Minimize: − 3k=1 Qk (t)bk (t) (4.65)
Subject to: (b1 (t), b2 (t), b3 (t)) ∈ {(1, 1, 0), (1, 0, 1), (0, 1, 1)} (4.66)

Then update the queues Qk (t) according to (4.23). Note that the problem (4.65)-(4.66) is equivalent

to minimizing 3k=1 Qk (t)[ak (t) − bk (t)] subject to the same constraints, but to minimize this, it

suffices to minimize only the terms we can control (so we can remove the 3k=1 Qk (t)ak (t) term
that is the same regardless of our control decision). It is easy to see that the problem (4.65)-(4.66)
reduces to choosing the two largest queues to serve every slot, breaking ties arbitrarily. This simple
policy does not require any knowledge of (λ1 , λ2 , λ3 ), yet ensures all queues are mean rate stable
whenever possible!
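As a sanity check, this max-weight rule is easy to simulate. The following is a minimal Python sketch (not from the text); the Bernoulli arrival rates $(0.5, 0.6, 0.7)$, horizon, and seed are illustrative assumptions. The rates sum to $1.8 < 2$ and so lie inside the region, and the time average backlog stays small:

```python
import random

def serve_two_largest(Q):
    """Serve the two queues with the largest backlog (ties broken
    arbitrarily); returns a 0/1 service vector b(t) with two ones."""
    order = sorted(range(3), key=lambda k: Q[k], reverse=True)
    b = [0, 0, 0]
    b[order[0]] = b[order[1]] = 1
    return b

def simulate(lambdas, T=20000, seed=0):
    """Run the V = 0 drift-plus-penalty (max-weight) policy with
    Bernoulli arrivals of rates lambdas; return the time average
    total backlog."""
    rng = random.Random(seed)
    Q = [0, 0, 0]
    total = 0
    for _ in range(T):
        b = serve_two_largest(Q)
        a = [1 if rng.random() < lam else 0 for lam in lambdas]
        for k in range(3):
            Q[k] = max(Q[k] - b[k], 0) + a[k]  # queue update (4.23)
        total += sum(Q)
    return total / T

# Rates interior to the region (sum = 1.8 < 2): backlog stays bounded.
avg_backlog = simulate((0.5, 0.6, 0.7))
```

Note the policy uses only the current backlogs, never the rates, matching the discussion above.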
b) From (4.45) and using the fact that $L = J = 0$ and $\tilde{b}_k(t) a_k(t) \ge 0$, we want to find a value
$B$ that satisfies:

$$B \ge \frac{1}{2}\sum_{k=1}^{3} E\left\{a_k(t)^2 \mid \Theta(t)\right\} + \frac{1}{2}\sum_{k=1}^{3} E\left\{b_k(t)^2 \mid \Theta(t)\right\}$$

Because $a_k(t)$ is i.i.d. over slots, it is independent of $\Theta(t)$ and so $E\{a_k(t)^2 \mid \Theta(t)\} = E\{a_k^2\}$. Further,
$b_k(t)^2 = b_k(t)$ (because $b_k(t) \in \{0, 1\}$). Thus, it suffices to find a value $B$ that satisfies:

$$B \ge \frac{1}{2}\sum_{k=1}^{3} E\left\{a_k^2\right\} + \frac{1}{2}\sum_{k=1}^{3} E\left\{b_k(t) \mid \Theta(t)\right\}$$
However, since $b_1(t) + b_2(t) + b_3(t) \le 2$ for all $t$ (regardless of $\Theta(t)$), we can choose:

$$B = \frac{1}{2}\sum_{k=1}^{3} E\left\{a_k^2\right\} + 1$$

Because Assumption A1 is satisfied and $V = C = 0$, we have from (4.51) that:

$$\overline{Q}_1 + \overline{Q}_2 + \overline{Q}_3 \le B/\epsilon$$

c) We now define the penalty $y_0(t) = \hat{y}_0(b_1(t), b_2(t), b_3(t))$, where:

$$\hat{y}_0(b_1(t), b_2(t), b_3(t)) = \begin{cases} 1 & \text{if } (b_1(t), b_2(t), b_3(t)) \in \{(1,1,0), (1,0,1)\} \\ 2 & \text{if } (b_1(t), b_2(t), b_3(t)) = (0,1,1) \end{cases}$$

Then the drift-plus-penalty algorithm (with $V > 0$) now observes $(Q_1(t), Q_2(t), Q_3(t))$ every slot
$t$ and chooses a server allocation to solve:

$$\text{Minimize:} \quad V\hat{y}_0(b_1(t), b_2(t), b_3(t)) - \sum_{k=1}^{3} Q_k(t) b_k(t) \quad (4.67)$$
$$\text{Subject to:} \quad (b_1(t), b_2(t), b_3(t)) \in \{(1,1,0), (1,0,1), (0,1,1)\} \quad (4.68)$$

This can be solved easily by comparing the value of (4.67) associated with each option:

• Option (1, 1, 0): value = V − Q1 (t) − Q2 (t).

• Option (1, 0, 1): value = V − Q1 (t) − Q3 (t).

• Option (0, 1, 1): value = 2V − Q2 (t) − Q3 (t).

Thus, every slot t we pick the option with the smallest of the above three values, breaking ties
arbitrarily. This is again a simple dynamic algorithm that does not require knowledge of the rates

$(\lambda_1, \lambda_2, \lambda_3)$. By (4.50), we know that the achieved time average power $\overline{p}$ (where $\overline{p} \triangleq \overline{y}_0$) satisfies
$\overline{p} \le p^{opt} + B/V$, where $B$ is defined in part (b). Because $y_{0,max} = 2$ and $y_{0,min} = 1$, by (4.51),
we know the resulting average backlog satisfies $\overline{Q}_1 + \overline{Q}_2 + \overline{Q}_3 \le (B + (2-1)V)/\epsilon$, where $\epsilon$ is
defined in (b). This illustrates the $[O(1/V), O(V)]$ tradeoff between average power and average
backlog.
The above problem assumes we must allocate exactly two servers on every slot. The problem
can of course be modified if we allow the option of serving only 1 queue, or 0 queues, at some reduced
power expenditure.
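The tradeoff can be observed numerically by evaluating (4.67) over the three options each slot. Below is a hedged Python sketch (the arrival rates $(0.3, 0.3, 0.3)$, horizon, and seeds are illustrative choices, not from the text) comparing a small and a large value of $V$:

```python
import random

OPTIONS = [  # (power, service vector), as in part (c)
    (1, (1, 1, 0)),
    (1, (1, 0, 1)),
    (2, (0, 1, 1)),
]

def dpp_choice(Q, V):
    """Return the (power, service) option minimizing (4.67):
    V * power - sum_k Q_k(t) * b_k(t)."""
    return min(OPTIONS,
               key=lambda opt: V * opt[0] - sum(Q[k] * opt[1][k] for k in range(3)))

def simulate(V, lambdas=(0.3, 0.3, 0.3), T=30000, seed=1):
    rng = random.Random(seed)
    Q = [0, 0, 0]
    power_sum = backlog_sum = 0
    for _ in range(T):
        p, b = dpp_choice(Q, V)
        a = [1 if rng.random() < lam else 0 for lam in lambdas]
        for k in range(3):
            Q[k] = max(Q[k] - b[k], 0) + a[k]  # queue update (4.23)
        power_sum += p
        backlog_sum += sum(Q)
    return power_sum / T, backlog_sum / T

p_small, q_small = simulate(V=1)
p_large, q_large = simulate(V=100)
# With large V the two-unit option (0,1,1) is essentially never chosen,
# so average power approaches the minimum of 1 unit per slot.
```

At these light loads the power-1 options alone can stabilize all queues, so a large $V$ drives average power to its minimum value of 1.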

4.6.2 OPPORTUNISTIC SCHEDULING


Example Problem: Consider the 2-queue wireless system with ON/OFF channels described in
Section 2.3.2 (see Fig. 2.2). Suppose channel vectors (S1 (t), S2 (t)) are i.i.d. over slots with
Si (t) ∈ {ON, OF F }, as before. However, suppose that new arrivals are not immediately sent into the
queue, but are only admitted via a flow control decision. Specifically, suppose that (A1 (t), A2 (t)) repre-
sents the random vector of new packet arrivals on slot t, where A1 (t) is i.i.d. over slots and Bernoulli
with P r[A1 (t) = 1] = λ1 , and A2 (t) is i.i.d. over slots and Bernoulli with P r[A2 (t) = 1] = λ2 .
Every slot a flow controller observes (A1 (t), A2 (t)) and makes an admission decision a1 (t), a2 (t),
subject to the constraints:

a1 (t) ∈ {0, A1 (t)}, a2 (t) ∈ {0, A2 (t)}

Packets that are not admitted are dropped. We thus have ω(t) = [(S1 (t), S2 (t)), (A1 (t), A2 (t))].
The control action is given by α(t) = [(α1 (t), α2 (t)); (β1 (t), β2 (t))] where αk (t) is a binary value
that is 1 if we choose to admit the packet (if any) arriving to queue $k$ on slot $t$, and $\beta_k(t)$ is a binary
value that is 1 if we choose to serve queue $k$ on slot $t$, with the constraint $\beta_1(t) + \beta_2(t) \le 1$.
a) Use the drift-plus-penalty method (with $V > 0$ and $C = 0$) to stabilize the queues while
seeking to maximize the linear utility function of throughput $w_1\overline{a}_1 + w_2\overline{a}_2$, where $w_1$ and $w_2$ are
given positive weights and $\overline{a}_k$ represents the time average rate of data admitted to queue $k$.
b) Assuming the Slater condition of Assumption A1 holds for some value $\epsilon > 0$, state the
resulting utility and average backlog performance.
c) Redo parts (a) and (b) with the additional constraint that $\overline{a}_1 \ge 0.1$ (assuming this constraint
is feasible).
Solution:
a) We have $K = 2$ queues to stabilize. We have penalty function $y_0(t) = -w_1 a_1(t) - w_2 a_2(t)$
(so that minimizing the time average of this penalty maximizes $w_1\overline{a}_1 + w_2\overline{a}_2$). There are no
other attributes $y_l(t)$ or $e_j(t)$, so $L = J = 0$. The arrival and service variables are given by
$a_k(t) = \hat{a}_k(\alpha_k(t), A_k(t))$ and $b_k(t) = \hat{b}_k(\beta_k(t), S_k(t))$ for $k \in \{1, 2\}$, where:

$$\hat{a}_k(\alpha_k(t), A_k(t)) = \alpha_k(t) A_k(t) \quad , \quad \hat{b}_k(\beta_k(t), S_k(t)) = \beta_k(t) 1\{S_k(t) = ON\}$$

where 1{Sk (t)=ON } is an indicator function that is 1 if Sk (t) = ON , and 0 else. The drift-plus-penalty
algorithm of (4.48) thus reduces to observing the queue backlogs (Q1 (t), Q2 (t)) and the current
network state ω(t) = [(S1 (t), S2 (t)), (A1 (t), A2 (t))] and making flow control and transmission
actions αk (t) and βk (t) to solve:


$$\text{Min:} \quad -V[w_1\alpha_1(t)A_1(t) + w_2\alpha_2(t)A_2(t)] + \sum_{k=1}^{2} Q_k(t)\left[\alpha_k(t)A_k(t) - \beta_k(t)1\{S_k(t) = ON\}\right]$$
$$\text{Subj. to:} \quad \alpha_k(t) \in \{0, 1\}\ \forall k \in \{1, 2\} \ , \ \beta_k(t) \in \{0, 1\}\ \forall k \in \{1, 2\}, \ \beta_1(t) + \beta_2(t) \le 1$$

The flow control and transmission decisions appear in separate terms in the above problem,
and so they can be chosen to minimize their respective terms separately. This reduces to the following
simple algorithm:
• (Flow Control) For each k ∈ {1, 2}, choose αk (t) = 1 (so that we admit Ak (t) to queue k)
whenever V wk ≥ Qk (t), and choose αk (t) = 0 else.
• (Transmission) Choose (β1 (t), β2 (t)) subject to the constraints to maximize
Q1 (t)β1 (t)1{S1 (t)=ON } + Q2 (t)β2 (t)1{S2 (t)=ON } . This reduces to the “Longest Con-
nected Queue” algorithm of (8). Specifically, we place the server to the queue that is ON and
that has the largest value of queue backlog, breaking ties arbitrarily.
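The two bullets above combine into a very short per-slot routine. The following Python sketch implements it; the arrival rates, channel ON probability, weights, and horizon are illustrative assumptions (not from the text), chosen so that the system is overloaded and flow control must throttle:

```python
import random

def dpp_step(Q, A, S, V, w):
    """One slot of the policy above: threshold flow control (admit the
    queue-k arrival iff V*w_k >= Q_k) plus Longest Connected Queue."""
    admit = [A[k] if V * w[k] >= Q[k] else 0 for k in range(2)]
    connected = [k for k in range(2) if S[k] == 'ON' and Q[k] > 0]
    serve = [0, 0]
    if connected:
        serve[max(connected, key=lambda k: Q[k])] = 1
    return admit, serve

def simulate(V=50, lam=(0.7, 0.7), p_on=0.8, w=(1.0, 2.0), T=30000, seed=2):
    rng = random.Random(seed)
    Q = [0, 0]
    admitted = [0, 0]
    for _ in range(T):
        A = [1 if rng.random() < lam[k] else 0 for k in range(2)]
        S = ['ON' if rng.random() < p_on else 'OFF' for _ in range(2)]
        a, b = dpp_step(Q, A, S, V, w)
        for k in range(2):
            Q[k] = max(Q[k] - b[k], 0) + a[k]
            admitted[k] += a[k]
    return [x / T for x in admitted], Q

rates, final_Q = simulate()
# Flow control deterministically caps Q_k at V*w_k + 1, and the larger
# weight w_2 steers more of the contested throughput to queue 2.
```

Because admission occurs only when $Q_k(t) \le V w_k$, backlogs are deterministically bounded by $V w_k + 1$, illustrating the $O(V)$ backlog guarantee.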

b) We compute $B$ from (4.45). Because $L = J = 0$, we choose $B$ to satisfy:

$$B \ge \frac{1}{2}\sum_{k=1}^{2} E\left\{a_k(t)^2 \mid \Theta(t)\right\} + \frac{1}{2}\sum_{k=1}^{2} E\left\{b_k(t)^2 \mid \Theta(t)\right\}$$

Because arrivals are i.i.d. Bernoulli, they are independent of queue backlog and so $E\{a_k(t)^2 \mid \Theta(t)\} = E\{a_k(t)^2\} = E\{a_k(t)\} = \lambda_k$. Further, $b_k(t)^2 = b_k(t)$, and $b_1(t) + b_2(t) \le 1$. Thus we can choose
$B = (\lambda_1 + \lambda_2 + 1)/2$. It follows from (4.50) that:

$$w_1\overline{a}_1 + w_2\overline{a}_2 \ge utility^{opt} - B/V$$

where $utility^{opt}$ is the maximum possible utility value subject to stability. Further, because $y_{0,min} = -(w_1 + w_2)$ and $y_{0,max} = 0$, we have from (4.51):

$$\overline{Q}_1 + \overline{Q}_2 \le (B + V(w_1 + w_2))/\epsilon$$

c) The constraint $\overline{a}_1 \ge 0.1$ is equivalent to $0.1 - \overline{a}_1 \le 0$. To enforce this constraint, we simply
introduce a virtual queue $Z_1(t)$ as follows:

$$Z_1(t+1) = \max[Z_1(t) + 0.1 - a_1(t), 0] \quad (4.69)$$

This can be viewed as introducing an additional penalty $y_1(t) = 0.1 - a_1(t)$. The drift-plus-penalty
algorithm (4.48) reduces to observing the queue backlogs and network state $\omega(t)$ every slot $t$ and
making actions to solve:

$$\text{Min:} \quad -V[w_1\alpha_1(t)A_1(t) + w_2\alpha_2(t)A_2(t)] + \sum_{k=1}^{2} Q_k(t)\left[\alpha_k(t)A_k(t) - \beta_k(t)1\{S_k(t) = ON\}\right] + Z_1(t)[0.1 - \alpha_1(t)A_1(t)]$$
$$\text{Subj. to:} \quad \alpha_k(t) \in \{0, 1\}\ \forall k \in \{1, 2\} \ , \ \beta_k(t) \in \{0, 1\}\ \forall k \in \{1, 2\}, \ \beta_1(t) + \beta_2(t) \le 1$$

Then update virtual queue Z1 (t) according to (4.69) at the end of the slot, and update the queues
Qk (t) according to (4.23). This reduces to:

• (Flow Control) Choose α1 (t) = 1 whenever V w1 + Z1 (t) ≥ Q1 (t), and choose α1 (t) = 0
else. Choose α2 (t) = 1 whenever V w2 ≥ Q2 (t), and choose α2 (t) = 0 else.

• (Transmission) Choose (β1 (t), β2 (t)) the same as in part (a).
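The only change from part (a) is the inflated admission threshold for queue 1 and the virtual queue update (4.69). The Python sketch below illustrates this under assumed parameters (the rates, weights, and seed are illustrative, not from the text); the virtual queue self-corrects so the time average admitted rate of queue 1 meets its constraint:

```python
import random

def dpp_step(Q, Z1, A, S, V, w):
    """Part (c): queue 1's admission threshold is inflated by the
    virtual queue Z1 that enforces the time average a_1 >= 0.1."""
    admit = [A[0] if V * w[0] + Z1 >= Q[0] else 0,
             A[1] if V * w[1] >= Q[1] else 0]
    connected = [k for k in range(2) if S[k] == 'ON' and Q[k] > 0]
    serve = [0, 0]
    if connected:
        serve[max(connected, key=lambda k: Q[k])] = 1
    return admit, serve

def simulate(V=50, lam=(0.3, 0.9), p_on=0.8, w=(1.0, 10.0), T=60000, seed=3):
    rng = random.Random(seed)
    Q, Z1 = [0, 0], 0.0
    a1_sum = 0.0
    for _ in range(T):
        A = [1 if rng.random() < lam[k] else 0 for k in range(2)]
        S = ['ON' if rng.random() < p_on else 'OFF' for _ in range(2)]
        a, b = dpp_step(Q, Z1, A, S, V, w)
        for k in range(2):
            Q[k] = max(Q[k] - b[k], 0) + a[k]
        Z1 = max(Z1 + 0.1 - a[0], 0.0)  # virtual queue update (4.69)
        a1_sum += a[0]
    return a1_sum / T, Z1

a1_bar, Z1_final = simulate()
```

If the constraint were being violated, $Z_1(t)$ would grow, raising queue 1's admission threshold until enough of its traffic is admitted.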



4.7 VARIABLE V ALGORITHMS


The $[O(1/V), O(V)]$ performance-delay tradeoff suggests that if we use a variable parameter $V(t)$
that gradually increases with time, then we can maintain mean rate stability while driving the time
average penalty to its exact optimum value $y_0^{opt}$. This is shown below, and is analogous to diminishing
stepsize methods for static convex optimization problems (133)(134).

Theorem 4.9 Suppose that $\omega(t)$ is i.i.d. over slots with probabilities $\pi(\omega)$, the problem (4.31)-(4.35)
is feasible, and $E\{L(\Theta(0))\} < \infty$. Suppose that every slot $t$, we implement a C-additive approximation
that comes within $C \ge 0$ of the infimum of a modified right-hand-side of (4.44), where the $V$ parameter
is replaced with $V(t)$, defined:

$$V(t) \triangleq V_0 (t+1)^{\beta} \quad \forall t \in \{0, 1, 2, \ldots\} \quad (4.70)$$

for some constants $V_0 > 0$ and $\beta$ such that $0 < \beta < 1$. Then all queues are mean rate stable, all required
constraints (4.32)-(4.35) are satisfied, and:

$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} E\{y_0(\tau)\} = y_0^{opt}$$

The manner in which the $V_0$ and $\beta$ parameters affect convergence is described in the proof, specifically in
(4.72) and (4.73).
While this variable-$V$ approach yields the exact optimum $y_0^{opt}$, its disadvantage is that we
achieve only mean rate stability and not strong stability, so that there is no finite bound on average
queue size and average delay. In fact, it is known that for typical problems (except for those with
a trivial structure), average backlog and delay necessarily grow to infinity as we push performance
closer and closer to optimal, becoming infinite at the optimal point (50)(51)(52)(53). The very large
queue sizes incurred by this variable-$V$ algorithm also make it more difficult to adapt to changes in
system parameters, whereas fixed-$V$ algorithms can easily adapt.
Proof. (Theorem 4.9) Repeating the proof of Theorem 4.8 by replacing $V$ with $V(t)$ for a given slot
$t$, the equation (4.57) becomes:

$$\Delta(\Theta(t)) + V(t)E\{y_0(t) \mid \Theta(t)\} \le B + C + V(t)y_0^{opt}$$

Taking expectations of both sides of the above and using iterated expectations yields:

$$E\{L(\Theta(t+1))\} - E\{L(\Theta(t))\} + V(t)E\{y_0(t)\} \le B + C + V(t)y_0^{opt} \quad (4.71)$$

Noting that $E\{y_0(t)\} \ge y_{0,min}$ yields:

$$E\{L(\Theta(t+1))\} - E\{L(\Theta(t))\} \le B + C + V(t)\left(y_0^{opt} - y_{0,min}\right)$$
The above holds for all $t \ge 0$. Summing over $\tau \in \{0, \ldots, t-1\}$ yields:

$$E\{L(\Theta(t))\} - E\{L(\Theta(0))\} \le (B+C)t + \left(y_0^{opt} - y_{0,min}\right)\sum_{\tau=0}^{t-1} V(\tau)$$

Using the definition of the Lyapunov function in (4.43) yields the following for all $t > 0$:

$$\sum_{k=1}^{K} E\left\{Q_k(t)^2\right\} + \sum_{l=1}^{L} E\left\{Z_l(t)^2\right\} + \sum_{j=1}^{J} E\left\{H_j(t)^2\right\} \le 2(B+C)t + 2E\{L(\Theta(0))\} + 2\left(y_0^{opt} - y_{0,min}\right)\sum_{\tau=0}^{t-1} V(\tau)$$

Take any queue $Q_k(t)$. Because $E\{Q_k(t)\}^2 \le E\{Q_k(t)^2\}$, we have for all queues $Q_k(t)$:

$$E\{Q_k(t)\} \le \sqrt{2(B+C)t + 2E\{L(\Theta(0))\} + 2\left(y_0^{opt} - y_{0,min}\right)\sum_{\tau=0}^{t-1} V(\tau)}$$

and the same bound holds for $E\{Z_l(t)\}$ and $E\{|H_j(t)|\}$ for all $l \in \{1, \ldots, L\}$, $j \in \{1, \ldots, J\}$.
Dividing both sides of the above inequality by $t$ yields the following for all $t > 0$:

$$\frac{E\{Q_k(t)\}}{t} \le \sqrt{\frac{2(B+C)}{t} + \frac{2E\{L(\Theta(0))\}}{t^2} + \frac{2\left(y_0^{opt} - y_{0,min}\right)}{t^2}\sum_{\tau=0}^{t-1} V(\tau)} \quad (4.72)$$

and the same bound holds for all $E\{Z_l(t)\}/t$ and $E\{|H_j(t)|\}/t$. However, we have:

$$0 \le \frac{1}{t^2}\sum_{\tau=0}^{t-1} V(\tau) = \frac{V_0}{t^2}\sum_{\tau=0}^{t-1}(1+\tau)^{\beta} \le \frac{V_0}{t^2}\int_{0}^{t}(1+v)^{\beta}\,dv = \frac{V_0}{t^2}\cdot\frac{(1+t)^{1+\beta} - 1}{1+\beta}$$

Because $0 < \beta < 1$, taking a limit of the above as $t \to \infty$ shows that $\frac{1}{t^2}\sum_{\tau=0}^{t-1} V(\tau) \to 0$. Using
this and taking a limit of (4.72) shows that all queues are mean rate stable, and hence (by Section
4.4) all required constraints (4.32)-(4.35) are satisfied.
To prove that the time average expectation of $y_0(t)$ converges to $y_0^{opt}$, consider again the
inequality (4.71), which holds for all $t$. Dividing both sides of (4.71) by $V(t)$ yields:

$$\frac{E\{L(\Theta(t+1))\} - E\{L(\Theta(t))\}}{V(t)} + E\{y_0(t)\} \le \frac{B+C}{V(t)} + y_0^{opt}$$

Summing the above over $\tau \in \{0, 1, \ldots, t-1\}$ and collecting terms yields:

$$\frac{E\{L(\Theta(t))\}}{V(t-1)} - \frac{E\{L(\Theta(0))\}}{V(0)} + \sum_{\tau=1}^{t-1} E\{L(\Theta(\tau))\}\left[\frac{1}{V(\tau-1)} - \frac{1}{V(\tau)}\right] + \sum_{\tau=0}^{t-1} E\{y_0(\tau)\} \le t y_0^{opt} + (B+C)\sum_{\tau=0}^{t-1}\frac{1}{V(\tau)}$$
Because $V(t)$ is non-decreasing, we have for all $\tau \ge 1$:

$$\frac{1}{V(\tau-1)} - \frac{1}{V(\tau)} \ge 0$$

Using this in the above inequality and dividing by $t$ yields:

$$\frac{1}{t}\sum_{\tau=0}^{t-1} E\{y_0(\tau)\} \le y_0^{opt} + \frac{(B+C)}{t}\sum_{\tau=0}^{t-1}\frac{1}{V(\tau)} + \frac{E\{L(\Theta(0))\}}{V(0)t} \quad (4.73)$$

However:

$$0 \le \frac{1}{t}\sum_{\tau=0}^{t-1}\frac{1}{V(\tau)} \le \frac{1}{tV(0)} + \frac{1}{V_0 t}\int_{0}^{t-1}\frac{dv}{(1+v)^{\beta}} = \frac{1}{tV(0)} + \frac{1}{V_0 t}\cdot\frac{t^{1-\beta} - 1}{1-\beta}$$

Taking a limit as t → ∞ shows that this term vanishes, and so the lim sup of the left-hand-side in
opt
(4.73) is less than or equal to y0 . However, the policy satisfies all constraints (4.32)-(4.35) and so
opt
the lim inf must be greater than or equal to y0 (by the Appendix 4.A result (4.96)), so the limit
opt
exists and is equal to y0 . 2
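The two vanishing terms used in the proof are easy to check numerically; the sketch below (illustrative constants $V_0 = 10$, $\beta = 0.5$) evaluates both for two horizons and confirms they shrink:

```python
# Check the two vanishing terms from the proof of Theorem 4.9:
#   (1/t**2) * sum_{tau<t} V(tau)   -> 0   (mean rate stability)
#   (1/t)    * sum_{tau<t} 1/V(tau) -> 0   (penalty convergence)
# for V(t) = V0*(t+1)**beta with 0 < beta < 1.
V0, beta = 10.0, 0.5

def terms(t):
    s1 = sum(V0 * (tau + 1) ** beta for tau in range(t)) / t ** 2
    s2 = sum(1.0 / (V0 * (tau + 1) ** beta) for tau in range(t)) / t
    return s1, s2

a1, b1 = terms(1000)
a2, b2 = terms(100000)
# The first term decays like O(t**(beta - 1)) and the second like
# O(t**(-beta)), matching the integral bounds in the proof.
```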

4.8 PLACE-HOLDER BACKLOG


Here we present a simple delay improvement for the fixed-V drift-plus-penalty algorithm. The
queue backlogs under this algorithm can be viewed as a stochastic version of a Lagrange multiplier
for classical static convex optimization problems (see (45)(37) for more intuition on this), and they
need to be large to appropriately inform the stochastic optimizer about good decisions to take.
However, for many such problems, we can trick the stochastic optimizer by making it think actual
queue backlog is larger than it really is. This allows the same performance with reduced queue
backlog. To develop the technique, we make the following three preliminary observations:
• The infinite horizon time average expected penalty and backlog bounds of Theorem 4.8 are
insensitive to the initial condition (0).
• All sample paths of backlog and penalty are the same under any service order for the Qk (t)
queues, provided that queueing dynamics satisfy (4.23). In particular, the results are the same
if service is First-In-First-Out (FIFO) or Last-In-First-Out (LIFO).
• It is often the case that, under the drift-plus-penalty algorithm (or a particular C-additive
approximation of it), some queues are never served until they have at least a certain minimum
amount of backlog.
The third observation motivates the following definition.
Definition 4.10 (Place-Holder Values) A non-negative value $Q_k^{place}$ is a place-holder value for
network queue $Q_k(t)$ with respect to a given algorithm if, for all possible sample paths, we have
$Q_k(t) \ge Q_k^{place}$ for all slots $t \ge 0$ whenever $Q_k(0) \ge Q_k^{place}$. Likewise, a non-negative value $Z_l^{place}$
is a place-holder value for queue $Z_l(t)$ if, for all possible sample paths, we have $Z_l(t) \ge Z_l^{place}$ for
all $t \ge 0$ whenever $Z_l(0) \ge Z_l^{place}$.

Clearly 0 is a place-holder value for all queues $Q_k(t)$ and $Z_l(t)$, but the idea is to compute
the largest possible place-holder values. It is often easy to pre-compute positive place-holder values
without knowing anything about the system probabilities. This is done in the Chapter 3 example
for minimizing average power expenditure subject to stability (see Section 3.2.4), and Exercises 4.8
and 4.11 provide further examples. Suppose now we run the algorithm with initial queue backlog
$Q_k(0) = Q_k^{place}$ for all $k \in \{1, \ldots, K\}$. Then we achieve exactly the same backlog and penalty
sample paths under either FIFO or LIFO. However, none of the initial backlog $Q_k^{place}$ would ever
exit the system under LIFO! Thus, we can achieve the same performance by replacing this initial
backlog $Q_k^{place}$ with fake backlog, called place-holder backlog (142)(143). Whenever a transmission
opportunity arises, we transmit only actual data whenever possible, serving the actual data in any
order we like (such as FIFO or LIFO). Because queue backlog never dips below $Q_k^{place}$, we never
have to serve any fake data. Thus, the actual queue backlog under this implementation is equal to
$Q_k^{actual}(t) = Q_k(t) - Q_k^{place}$ for all $t$, which reduces the actual backlog by an amount exactly equal
to $Q_k^{place}$. This does not affect the sample path and hence does not affect the time average penalty.

Specifically, for all $k \in \{1, \ldots, K\}$ and $l \in \{1, \ldots, L\}$, we initialize the actual backlog
$Q_k^{actual}(0) = Z_l^{actual}(0) = 0$, but we use place-holder backlogs $Q_k^{place}$, $Z_l^{place}$ so that:

$$Q_k(0) = Q_k^{place} \ , \ Z_l(0) = Z_l^{place} \quad \forall k \in \{1, \ldots, K\}, l \in \{1, \ldots, L\}$$

We then operate the algorithm using the $Q_k(t)$ and $Z_l(t)$ values (not the actual values $Q_k^{actual}(t)$
and $Z_l^{actual}(t)$). The above discussion ensures that for all time $t$, we have:

$$Q_k^{actual}(t) = Q_k(t) - Q_k^{place} \ , \ Z_l^{actual}(t) = Z_l(t) - Z_l^{place} \quad \forall t \ge 0$$

Because the bounds in Theorem 4.8 are independent of the initial condition, the same penalty and
backlog bounds are achieved. However, the actual backlog is reduced by exactly $Q_k^{place}$ and $Z_l^{place}$
at every instant of time. This is a "free" reduction in the queue backlog, with no impact on the
limiting time average penalty. This has already been illustrated in the example minimum average
power problem of the previous chapter (Section 3.2.4, Figs. 3.3-3.4). Fig. 4.2 provides
further insight: it shows a sample path of $Q_2(t)$ for the same example system of Section 3.2.4
(using $V = 100$ and $(\lambda_1, \lambda_2) = (0.3, 0.7)$). We use $Q_2^{place} = 48$ as the initial
backlog, and the figure illustrates that $Q_2(t)$ indeed never drops below 48. The place-holder savings
is illustrated in the figure.
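A toy illustration of a place-holder value (a deliberately simplified single-queue policy, not the exact system of Section 3.2.4): if a policy serves only when $Q(t) \ge V$, then $V - 1$ is a valid place-holder value, since backlog starting at $V - 1$ can never dip below it. All parameters below are illustrative assumptions:

```python
import random

def run(V=50, Q0=49, T=5000, lam=0.3, seed=4):
    """Toy single-queue policy that serves one packet (at unit power)
    only when Q(t) >= V; returns the backlog trajectory."""
    rng = random.Random(seed)
    Q, traj = Q0, []
    for _ in range(T):
        b = 1 if Q >= V else 0
        a = 1 if rng.random() < lam else 0
        Q = max(Q - b, 0) + a
        traj.append(Q)
    return traj

V = 50
place = V - 1        # the policy never serves below Q = V, so V - 1 is a
                     # valid place-holder value for this toy policy
traj = run(V=V, Q0=place)
min_backlog = min(traj)              # never dips below 'place'
actual_final = traj[-1] - place      # actual stored data = Q(t) - Q_place
```

Treating the initial `place` units as fake backlog leaves the sample path unchanged while reducing the actual backlog by `place` at every slot.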
We developed this method of place-holder bits in (143) for use in dynamic data compression
problems and in (142) for general constrained cost minimization problems (including multi-hop
wireless networks with unreliable channels). The reader is referred to the examples and simulations
given in (143)(142).

[Figure 4.2 plots backlog $Q_2(t)$ (packets) versus time $t$, with the place-holder value $Q_2^{place}$ drawn as a horizontal line and the backlog savings indicated below it.]

Figure 4.2: A sample path of $Q_2(t)$ over 3000 slots for the example system of Section 3.2.4.

A more aggressive place-holder technique is developed in (37). The idea of
(37) can be illustrated easily from Fig. 4.2: While the figure illustrates that $Q_2(t)$ never drops below $Q_2^{place}$, the backlog actually increases until it reaches a "plateau" around 100 packets, and then
oscillates with some noise about this value. Intuitively, we can almost double the place-holder value
in the figure, raising the horizontal line up to a level that is close to the minimum backlog value
seen in the plateau. While we cannot guarantee that backlog will never drop below this new line,
the idea is to show that such events occur rarely. Work in (45) shows that scaled queue backlog converges to a Lagrange multiplier of a related static optimization problem, and work in (37) shows that
actual queue backlog oscillates very closely about this Lagrange multiplier. Specifically, it is shown
in (37) that, under mild assumptions, the steady state backlog distribution decays exponentially in
distance from the Lagrange multiplier value. It then develops an algorithm that uses a place-holder
that is a distance of $O(\log^2(V))$ from the Lagrange multiplier, showing that deviations by more
than this amount are rare and can be handled separately by dropping a small amount of packets. The result fundamentally changes the performance-backlog tradeoff from $[O(1/V), O(V)]$ to
$[O(1/V), O(\log^2(V))]$ (within a logarithmic factor of the optimal tradeoff shown in (52)(51)(53)).
A disadvantage of this aggressive approach is that Lagrange multipliers must be known in
advance, which is difficult as they may depend on system statistics and they may be different for each
queue in the system. This is handled elegantly in a Last-In-First-Out (LIFO) implementation of the
drift-plus-penalty method, developed in (54). That LIFO can improve delay can be understood by
Fig. 4.2: First, a LIFO implementation would achieve all of the savings of the original place-holder
value of $Q_2^{place} = 48$ (at the cost of never serving the first 48 packets). Next, a LIFO implementation
would intuitively lead to delays of “most” packets that are on the order of the magnitude of noise
variations in the plateau area. That is, LIFO can achieve the more aggressive place-holder gains
without computing the Lagrange multipliers! This is formally proven in (55). Experiments with the
LIFO drift-plus-penalty method on an actual multi-hop wireless network deployment in (54) show
a dramatic improvement in delay (by more than an order of magnitude) for all but 2% of the packets.

4.9 NON-I.I.D. MODELS AND UNIVERSAL SCHEDULING

Here we show that the same drift-plus-penalty algorithm provides similar [O(1/V ), O(V )] per-
formance guarantees when ω(t) varies according to a more general ergodic (possibly non-i.i.d.)
process. We then show it also provides efficient performance for arbitrary (possibly non-ergodic)
sample paths. The main proof techniques are the same as those we have already developed, with the
exception that we use a multi-slot drift analysis rather than a 1-slot drift analysis.
We consider the same system as in Section 4.2.1, with K queues with dynamics (4.23), and
attributes yl (t) = ŷl (α(t), ω(t)) for l ∈ {1, . . . , L}. For simplicity, we eliminate the attributes ej (t)
associated with equality constraints (so that $J = 0$). We seek an algorithm for choosing $\alpha(t) \in A_{\omega(t)}$
every slot to minimize $\overline{y}_0$ subject to mean rate stability of all queues $Q_k(t)$ and subject to $\overline{y}_l \le 0$
for all $l \in \{1, \ldots, L\}$. The virtual queues $Z_l(t)$ for $l \in \{1, \ldots, L\}$ are the same as before, defined in
(4.40). For simplicity of exposition, we assume:

• The exact drift-plus-penalty algorithm of (4.48)-(4.49) is used, rather than a C-additive ap-
proximation (so that C = 0).

• The functions $\hat{a}_k(\cdot)$, $\hat{b}_k(\cdot)$, $\hat{y}_l(\cdot)$ are deterministically bounded, so that:

$$0 \le \hat{a}_k(\alpha(t), \omega(t)) \le a_k^{max} \quad \forall k \in \{1, \ldots, K\}, \forall \omega(t), \alpha(t) \in A_{\omega(t)} \quad (4.74)$$
$$0 \le \hat{b}_k(\alpha(t), \omega(t)) \le b_k^{max} \quad \forall k \in \{1, \ldots, K\}, \forall \omega(t), \alpha(t) \in A_{\omega(t)} \quad (4.75)$$
$$y_l^{min} \le \hat{y}_l(\alpha(t), \omega(t)) \le y_l^{max} \quad \forall l \in \{0, 1, \ldots, L\}, \forall \omega(t), \alpha(t) \in A_{\omega(t)} \quad (4.76)$$


Define $\Theta(t) \triangleq [Q(t), Z(t)]$, and define the Lyapunov function $L(\Theta(t))$ as follows:

$$L(\Theta(t)) \triangleq \frac{1}{2}\sum_{k=1}^{K} Q_k(t)^2 + \frac{1}{2}\sum_{l=1}^{L} Z_l(t)^2 \quad (4.77)$$
We have the following preliminary lemma.

Lemma 4.11 (T-slot Drift) Assume (4.74)-(4.76) hold. For any slot $t$, any queue backlogs $\Theta(t)$, and
any integer $T > 0$, the drift-plus-penalty algorithm ensures that:

$$L(\Theta(t+T)) - L(\Theta(t)) + V\sum_{\tau=t}^{t+T-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \le DT^2 + V\sum_{\tau=t}^{t+T-1}\hat{y}_0(\alpha^*(\tau), \omega(\tau))$$
$$\qquad + \sum_{l=1}^{L} Z_l(t)\sum_{\tau=t}^{t+T-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau)) + \sum_{k=1}^{K} Q_k(t)\sum_{\tau=t}^{t+T-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right]$$

where $L(\Theta(t))$ is defined in (4.77), $\alpha^*(\tau)$ for $\tau \in \{t, \ldots, t+T-1\}$ is any sequence of alternative
decisions that satisfy $\alpha^*(\tau) \in A_{\omega(\tau)}$, and the constant $D$ is defined:

$$D \triangleq \frac{1}{2}\sum_{k=1}^{K}\left[(a_k^{max})^2 + (b_k^{max})^2\right] + \frac{1}{2}\sum_{l=1}^{L}\max\left[(y_l^{min})^2, (y_l^{max})^2\right] \quad (4.78)$$

Proof. From (4.46)-(4.47), we have for any slot $\tau$:

$$L(\Theta(\tau+1)) - L(\Theta(\tau)) \le D + \sum_{k=1}^{K} Q_k(\tau)\left[\hat{a}_k(\alpha(\tau), \omega(\tau)) - \hat{b}_k(\alpha(\tau), \omega(\tau))\right] + \sum_{l=1}^{L} Z_l(\tau)\hat{y}_l(\alpha(\tau), \omega(\tau))$$

where $D$ is defined in (4.78). We then add $V\hat{y}_0(\alpha(\tau), \omega(\tau))$ to both sides. Because the drift-plus-penalty algorithm is designed to choose $\alpha(\tau)$ to deterministically minimize the right-hand-side of
the resulting inequality when this term is added, it follows that:

$$L(\Theta(\tau+1)) - L(\Theta(\tau)) + V\hat{y}_0(\alpha(\tau), \omega(\tau)) \le D + V\hat{y}_0(\alpha^*(\tau), \omega(\tau))$$
$$\qquad + \sum_{k=1}^{K} Q_k(\tau)\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right] + \sum_{l=1}^{L} Z_l(\tau)\hat{y}_l(\alpha^*(\tau), \omega(\tau))$$
where $\alpha^*(\tau)$ is any other decision that satisfies $\alpha^*(\tau) \in A_{\omega(\tau)}$. However, we now note that for all
$\tau \in \{t, \ldots, t+T-1\}$:

$$|Q_k(\tau) - Q_k(t)| \le (\tau - t)\max[a_k^{max}, b_k^{max}]$$
$$|Z_l(\tau) - Z_l(t)| \le (\tau - t)\max[|y_l^{max}|, |y_l^{min}|]$$

Plugging these in, it can be shown that:

$$L(\Theta(\tau+1)) - L(\Theta(\tau)) + V\hat{y}_0(\alpha(\tau), \omega(\tau)) \le D + 2D(\tau - t) + V\hat{y}_0(\alpha^*(\tau), \omega(\tau))$$
$$\qquad + \sum_{l=1}^{L} Z_l(t)\hat{y}_l(\alpha^*(\tau), \omega(\tau)) + \sum_{k=1}^{K} Q_k(t)\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right]$$

Summing the above over $\tau \in \{t, \ldots, t+T-1\}$ and using the fact that $\sum_{\tau=t}^{t+T-1}(\tau - t) = (T-1)T/2$ yields the result. $\Box$
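Spelling out that final summation shows where the $DT^2$ constant in the lemma comes from:

```latex
\sum_{\tau=t}^{t+T-1}\Bigl[D + 2D(\tau - t)\Bigr]
  \;=\; DT + 2D\cdot\frac{(T-1)T}{2}
  \;=\; DT + D\,T(T-1)
  \;=\; D\,T^{2}.
```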

4.9.1 MARKOV MODULATED PROCESSES

Here we present a method developed in (144) for proving that the $[O(1/V), O(V)]$ behavior of
the drift-plus-penalty algorithm is preserved in ergodic (but non-i.i.d.) contexts. Let $\eta(t)$ be an
irreducible (possibly not aperiodic) Discrete Time Markov Chain (DTMC) with a finite state space
$S$.³ Let $\pi_i$ represent the stationary distribution over states $i \in S$. Such a distribution always exists
(and is unique) for irreducible finite state Markov chains. It is well known that all $\pi_i$ probabilities
are positive, and the time average fraction of time being in state $i$ is $\pi_i$ with probability 1. Further,
$1/\pi_i$ represents the (finite) mean recurrence time to state $i$, which is the average number of slots
required to get back to state $i$, given that we start in state $i$. Finally, it is known that second moments
of recurrence times are also finite (see (132)(130) for more details on DTMCs).
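The recurrence-time facts above are easy to verify by simulation. The two-state chain below is an illustrative example (not from the text) whose stationary distribution works out to $\pi = (2/3, 1/3)$, so the empirical mean recurrence time to state 0 should approach $1/\pi_0 = 1.5$:

```python
import random

def simulate_chain(P, T, seed=5):
    """Simulate a finite DTMC with transition matrix P, starting in
    state 0; return the visited state sequence."""
    rng = random.Random(seed)
    x, states = 0, []
    for _ in range(T):
        states.append(x)
        r, cum = rng.random(), 0.0
        for j, p in enumerate(P[x]):
            cum += p
            if r < cum:
                x = j
                break
    return states

P = [[0.8, 0.2], [0.4, 0.6]]          # stationary pi = (2/3, 1/3)
states = simulate_chain(P, T=200000)

# Empirical mean recurrence time to state 0 should be near 1/pi_0 = 1.5.
visits = [t for t, s in enumerate(states) if s == 0]
recurrences = [b - a for a, b in zip(visits, visits[1:])]
mean_T = sum(recurrences) / len(recurrences)
```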
The random network event process $\omega(t)$ is modulated by the DTMC $\eta(t)$ as follows: Whenever $\eta(t) = i$, the value of $\omega(t)$ is chosen independently with some distribution $p_i(\omega)$. Then the
stationary distribution of $\omega(t)$ is given by:

$$Pr[\omega(t) = \omega] = \sum_{i \in S} \pi_i p_i(\omega)$$

Assume the state space $S$ has a state "0" that we designate as a "renewal" state. Assume for simplicity
that $\eta(0) = 0$, and let the sequence $\{T_0, T_1, T_2, \ldots\}$ represent the recurrence times to state 0. Clearly
$\{T_r\}_{r=0}^{\infty}$ is an i.i.d. sequence with $E\{T_r\} = 1/\pi_0$ for all $r$. Define $E\{T\}$ and $E\{T^2\}$ as the first and
second moments of these recurrence times (so that $E\{T\} = 1/\pi_0$). Define $t_0 \triangleq 0$, and for integers
3This subsection (Subsection 4.9.1) assumes familiarity with DTMC theory and can be skipped without loss of continuity.
$r > 0$ define $t_r$ as the time of the $r$th revisitation to state 0, so that $t_r = \sum_{j=1}^{r} T_j$. We now define
the variable-slot drift $\Delta(\Theta(t_r))$ as follows:

$$\Delta(\Theta(t_r)) \triangleq E\{L(\Theta(t_{r+1})) - L(\Theta(t_r)) \mid \Theta(t_r)\}$$

This drift represents the expected change in the Lyapunov function from renewal time $t_r$ to renewal time $t_{r+1}$, where the expectation is over the random duration of the renewal period and the
random events on each slot of this period. By plugging $t = t_r$ and $T = T_r$ into Lemma 4.11 and
taking conditional expectations given $\Theta(t_r)$, we have the following variable-slot drift-plus-penalty
expression:

$$\Delta(\Theta(t_r)) + V E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \,\Big|\, \Theta(t_r)\right\} \le D E\left\{T_r^2 \mid \Theta(t_r)\right\} + V E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha^*(\tau), \omega(\tau)) \,\Big|\, \Theta(t_r)\right\}$$
$$\qquad + \sum_{l=1}^{L} Z_l(t_r) E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau)) \,\Big|\, \Theta(t_r)\right\} + \sum_{k=1}^{K} Q_k(t_r) E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right] \,\Big|\, \Theta(t_r)\right\}$$

   
where $\alpha^*(\tau)$ are decisions from any other policy. First note that $E\{T_r^2 \mid \Theta(t_r)\} = E\{T^2\}$ because the
renewal duration is independent of the queue state $\Theta(t_r)$. Next, note that the conditional expectations
in the next three terms on the right-hand-side of the above inequality can be changed into pure
expectations (given that $t_r$ is a renewal time) under the assumption that the policy $\alpha^*(\tau)$ is $\omega$-only.
Thus:

$$\Delta(\Theta(t_r)) + V E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \,\Big|\, \Theta(t_r)\right\} \le D E\left\{T^2\right\} \quad (4.79)$$
$$\qquad + V E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha^*(\tau), \omega(\tau))\right\} + \sum_{l=1}^{L} Z_l(t_r) E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))\right\}$$
$$\qquad + \sum_{k=1}^{K} Q_k(t_r) E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right]\right\}$$
The expectations in the final terms are expected rewards over a renewal period, and so by basic
renewal theory (130)(66), we have for all $l \in \{0, 1, \ldots, L\}$ and all $k \in \{1, \ldots, K\}$:

$$E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))\right\} = E\{T\}\, y_l^* \quad (4.80)$$
$$E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\left[\hat{a}_k(\alpha^*(\tau), \omega(\tau)) - \hat{b}_k(\alpha^*(\tau), \omega(\tau))\right]\right\} = E\{T\}\,(a_k^* - b_k^*) \quad (4.81)$$

where $y_l^*$, $a_k^*$, $b_k^*$ are the infinite horizon time average values achieved for the $\hat{y}_l(\alpha^*(t), \omega(t))$,
$\hat{a}_k(\alpha^*(t), \omega(t))$, and $\hat{b}_k(\alpha^*(t), \omega(t))$ processes under the $\omega$-only policy $\alpha^*(t)$. This basic renewal
theory fact can easily be understood as follows (with the below equalities holding with probability
1):⁴

$$y_l^* = \lim_{R\to\infty}\frac{1}{t_R}\sum_{\tau=0}^{t_R-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau)) = \lim_{R\to\infty}\frac{\sum_{r=0}^{R-1}\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))}{\sum_{r=0}^{R-1} T_r}$$
$$= \frac{\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))}{\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} T_r} = \frac{E\left\{\sum_{\tau=0}^{T_0-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau))\right\}}{E\{T\}}$$

where the final equality holds by the strong law of large numbers (noting that both the numerator
and denominator are just a time average of i.i.d. quantities). In particular, the numerator is a sum of
i.i.d. quantities because the policy $\alpha^*(t)$ is $\omega$-only, and so the sum penalty over each renewal period
is independent but identically distributed. Plugging (4.80)-(4.81) into (4.79) yields:
$$\Delta(\Theta(t_r)) + V E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \,\Big|\, \Theta(t_r)\right\} \le D E\left\{T^2\right\} + V E\{T\}\, y_0^*$$
$$\qquad + \sum_{l=1}^{L} Z_l(t_r) E\{T\}\, y_l^* + \sum_{k=1}^{K} Q_k(t_r) E\{T\}\,(a_k^* - b_k^*)$$

The above holds for any time averages $\{y_l^*, a_k^*, b_k^*\}$ that can be achieved by $\omega$-only policies. However,
by Theorem 4.5, we know that if the problem is feasible, then either there is a single $\omega$-only policy that
achieves time averages $y_0^* = y_0^{opt}$, $y_l^* \le 0$ for all $l \in \{1, \ldots, L\}$, $(a_k^* - b_k^*) \le 0$ for all $k \in \{1, \ldots, K\}$,

4 Because the processes are deterministically bounded and have time averages that converge with probability 1, the Lebesgue
Dominated Convergence Theorem (145) ensures the time average expectations are the same as the pure time averages (see
Exercise 7.9).
or there is an infinite sequence of $\omega$-only policies that approach these averages. Plugging this into
the above yields:

$$\Delta(\Theta(t_r)) + V E\left\{\sum_{\tau=t_r}^{t_r+T_r-1}\hat{y}_0(\alpha(\tau), \omega(\tau)) \,\Big|\, \Theta(t_r)\right\} \le D E\left\{T^2\right\} + V E\{T\}\, y_0^{opt}$$

Taking expectations of the above, summing the resulting telescoping series over $r \in \{0, \ldots, R-1\}$,
and dividing by $V R E\{T\}$ yields:

$$\frac{E\{L(\Theta(t_R))\} - E\{L(\Theta(0))\}}{V R E\{T\}} + \frac{1}{R E\{T\}} E\left\{\sum_{\tau=0}^{t_R-1}\hat{y}_0(\alpha(\tau), \omega(\tau))\right\} \le y_0^{opt} + \frac{D E\left\{T^2\right\}}{V E\{T\}}$$

Because $t_R/R \to E\{T\}$ with probability 1 (by the law of large numbers), it can be shown that the
middle term has a lim sup that is equal to the lim sup time average expected penalty. Thus, assuming
$E\{L(\Theta(0))\} < \infty$, we have:

$$\overline{y}_0 = \limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1} E\left\{\hat{y}_0(\alpha(\tau), \omega(\tau))\right\} \le y_0^{opt} + \frac{D E\left\{T^2\right\}}{V E\{T\}} = y_0^{opt} + O(1/V) \quad (4.82)$$

where we note that the constants $D$, $E\{T\}$, and $E\{T^2\}$ do not depend on $V$. Similarly, it can
be shown that if the problem is feasible then all queues are mean rate stable, and if the slackness
condition of Assumption A1 holds, then the sum average queue backlog is $O(V)$ (144). This leads to
the following theorem.

Theorem 4.12 (Markov Modulated Processes (144)) Assume the $\omega(t)$ process is modulated by the
DTMC $\eta(t)$ as described above, the boundedness assumptions (4.74)-(4.76) hold, $E\{L(\Theta(0))\} < \infty$,
and that the drift-plus-penalty algorithm is used every slot $t$. If the problem is feasible, then:
(a) The penalty satisfies (4.82), so that $\overline{y}_0 \le y_0^{opt} + O(1/V)$.
(b) All queues are mean rate stable, and so $\overline{y}_l \le 0$ for all $l \in \{1, \ldots, L\}$.
(c) If the Slackness Assumption A1 holds, then all queues $Q_k(t)$ are strongly stable with average
backlog $O(V)$.

4.9.2 NON-ERGODIC MODELS AND ARBITRARY SAMPLE PATHS


Now assume that the ω(t) process follows an arbitrary sample path, possibly one with non-ergodic
behavior. However, continue to assume that the deterministic bounds (4.74)-(4.76) hold, so that
Lemma 4.11 applies. We present a technique developed in (41)(40) for stock market trading and
modified in (39)(38) for use in wireless networks with arbitrary traffic, channels, and mobility. Because ω(t) follows an arbitrary sample path, the usual "equilibrium" notions of optimality are not relevant, and so we use a different metric for evaluating the drift-plus-penalty algorithm, called the T-slot lookahead metric. Specifically, let T and R be positive integers, and consider the first RT slots {0, 1, . . . , RT − 1} divided into R frames of size T. For the rth frame (for r ∈ {0, . . . , R − 1}), we define cr∗ as the optimal cost associated with the following static optimization problem, called the T-slot lookahead problem. This problem has decision variables α(τ) for τ ∈ {rT, . . . , (r + 1)T − 1}, and treats the ω(τ) values in this interval as known quantities:

$$\begin{aligned}
\text{Minimize:}\quad & c_r = \frac{1}{T}\sum_{\tau=rT}^{(r+1)T-1} \hat{y}_0(\alpha(\tau),\omega(\tau)) & (4.83)\\
\text{Subject to:}\quad & 1)\ \sum_{\tau=rT}^{(r+1)T-1} \hat{y}_l(\alpha(\tau),\omega(\tau)) \leq 0 \quad \forall l \in \{1,\ldots,L\}\\
& 2)\ \sum_{\tau=rT}^{(r+1)T-1} \left[\hat{a}_k(\alpha(\tau),\omega(\tau)) - \hat{b}_k(\alpha(\tau),\omega(\tau))\right] \leq 0 \quad \forall k \in \{1,\ldots,K\}\\
& 3)\ \alpha(\tau) \in \mathcal{A}_{\omega(\tau)} \quad \forall \tau \in \{rT,\ldots,(r+1)T-1\}
\end{aligned}$$
The value cr∗ thus represents the optimal empirical average penalty for frame r over all policies
that have full knowledge of the future ω(τ ) values over the frame and that satisfy the constraints.5
We assume throughout that the constraints are feasible for the above problem. Feasibility is often
guaranteed when there is an “idle” action, such as the action of admitting and transmitting no data,
which can be used on all slots to trivially satisfy the constraints in the form 0 ≤ 0.
Frame r consists of slots τ ∈ {rT , . . . , (r + 1)T − 1}. Let α ∗ (τ ) represent the decisions that
solve the T-slot lookahead problem (4.83) over this frame to achieve cost cr∗.6 It is generally impossible to solve for the α∗(τ) decisions, as these would require knowledge of the ω(τ) values up to T slots into the future. However, the α∗(τ) values exist, and can still be plugged into Lemma 4.11
to yield the following (using t = rT and T as the frame size):
$$\begin{aligned}
& L(\Theta(rT+T)) - L(\Theta(rT)) + V\sum_{\tau=rT}^{rT+T-1} \hat{y}_0(\alpha(\tau),\omega(\tau))\\
&\quad \leq DT^2 + V\sum_{\tau=rT}^{rT+T-1} \hat{y}_0(\alpha^*(\tau),\omega(\tau)) + \sum_{l=1}^{L} Z_l(rT)\sum_{\tau=rT}^{rT+T-1} \hat{y}_l(\alpha^*(\tau),\omega(\tau))\\
&\qquad + \sum_{k=1}^{K} Q_k(rT)\sum_{\tau=rT}^{rT+T-1} \left[\hat{a}_k(\alpha^*(\tau),\omega(\tau)) - \hat{b}_k(\alpha^*(\tau),\omega(\tau))\right]\\
&\quad \leq DT^2 + VTc_r^*
\end{aligned}$$
where the final inequality follows by noting that the α ∗ (τ ) policy satisfies the constraints of the
T -slot lookahead problem (4.83) and yields cost cr∗ .
5 Theorem 4.13 holds exactly as stated in the extended case when cr∗ is re-defined by a T-slot lookahead problem that allows actions [(ỹl∗(τ)), (ãk∗(τ)), (b̃k∗(τ))] every slot τ to be taken within the convex hull of the set of all possible values of [(ŷl(α, ω(τ))), (âk(α, ω(τ))), (b̂k(α, ω(τ)))] under α ∈ Aω(τ), but we skip this extension for simplicity of exposition.
6 For simplicity, we assume the infimum cost is achievable. Else, we can derive the same result by taking a limit over policies that approach the infimum.
Summing the above over r ∈ {0, . . . , R − 1} (for any integer R > 0) yields:


$$L(\Theta(RT)) - L(\Theta(0)) + V\sum_{\tau=0}^{RT-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \leq DT^2 R + VT\sum_{r=0}^{R-1} c_r^* \qquad (4.84)$$

Dividing by VTR, using the fact that L(Θ(RT)) ≥ 0, and rearranging terms yields:

$$\frac{1}{RT}\sum_{\tau=0}^{RT-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \leq \frac{1}{R}\sum_{r=0}^{R-1} c_r^* + \frac{DT}{V} + \frac{L(\Theta(0))}{VTR} \qquad (4.85)$$

where we recall that α(τ) represents the decisions under the drift-plus-penalty algorithm. The inequality (4.85) holds for all integers R > 0. When R is large, the final term on the right-hand side above goes to zero (this term is exactly zero if L(Θ(0)) = 0). Thus, the time average cost is within O(1/V) of the time average of the cr∗ values. The above discussion proves part (a) of the following theorem:

Theorem 4.13 (Universal Scheduling) Assume the ω(t) sample path satisfies the boundedness assump-
tions (4.74)-(4.76), and that initial queue backlog is finite. Fix any integers R > 0 and T > 0, and
assume the T -slot lookahead problem (4.83) is feasible for every frame r ∈ {0, 1, . . . , R − 1}. If the
drift-plus-penalty algorithm is implemented every slot t, then:
(a) The time average cost over the first RT slots satisfies (4.85). In particular,7

$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \leq \limsup_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} c_r^* + DT/V$$

where cr∗ is the optimal cost in the T -slot lookahead problem (4.83) for frame r, and D is defined in (4.78).
(b) All actual and virtual queues are rate stable, and so we have:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\hat{y}_l(\alpha(\tau),\omega(\tau)) \leq 0 \quad \forall l \in \{1,\ldots,L\}$$

(c) Suppose there exists an ε > 0 and a sequence of decisions α̃(τ) ∈ Aω(τ) that satisfies the following slackness assumptions for all frames r:
$$\sum_{\tau=rT}^{rT+T-1}\hat{y}_l(\tilde{\alpha}(\tau),\omega(\tau)) \leq 0 \quad \forall l \in \{1,\ldots,L\} \qquad (4.86)$$
$$\frac{1}{T}\sum_{\tau=rT}^{rT+T-1}\left[\hat{a}_k(\tilde{\alpha}(\tau),\omega(\tau)) - \hat{b}_k(\tilde{\alpha}(\tau),\omega(\tau))\right] \leq -\epsilon \quad \forall k \in \{1,\ldots,K\} \qquad (4.87)$$
7 It is clear that the lim sup over times sampled every T slots is the same as the regular lim sup because the ŷ0(·) values are bounded. Indeed, we have:
$$\sum_{\tau=0}^{\lfloor t/T\rfloor T}\hat{y}_0(\alpha(\tau),\omega(\tau)) + Ty_0^{min} \leq \sum_{\tau=0}^{t}\hat{y}_0(\alpha(\tau),\omega(\tau)) \leq \sum_{\tau=0}^{\lfloor t/T\rfloor T}\hat{y}_0(\alpha(\tau),\omega(\tau)) + Ty_0^{max}$$
Dividing both sides by t and taking limits shows these limits are equal.
Then:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K} Q_k(\tau) \leq \frac{DT}{\epsilon} + \frac{V(y_0^{max}-y_0^{min})}{\epsilon} + \frac{T-1}{2}\sum_{k=1}^{K}\max[a_k^{max}, b_k^{max}]$$

Proof. Part (a) has already been shown in the above discussion. We provide a summary of parts (b) and (c): The inequality (4.84) plus the boundedness assumptions (4.74)-(4.76) imply that there is a finite constant F > 0 such that L(Θ(RT)) ≤ FR for all R. By an argument similar to part (a) of Theorem 4.1, it can then be shown that limR→∞ Qk(RT)/(RT) = 0 for all k ∈ {1, . . . , K} and limR→∞ Zl(RT)/(RT) = 0 for all l ∈ {1, . . . , L}. Further, these limits, which sample only on slots RT (as R → ∞), are the same when taken over all t → ∞ because the queues can change by at most a constant proportional to T between the sample times. This proves part (b).
Part (c) follows by plugging the policy α̃(τ) for τ ∈ {rT, . . . , (r + 1)T − 1} into Lemma 4.11 and using (4.86)-(4.87) to yield:
$$L(\Theta(rT+T)) - L(\Theta(rT)) + V\sum_{\tau=rT}^{rT+T-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \leq DT^2 + VTy_0^{max} - \epsilon T\sum_{k=1}^{K} Q_k(rT)$$
and hence:
$$\begin{aligned}
L(\Theta(rT+T)) - L(\Theta(rT)) &\leq DT^2 + VT(y_0^{max}-y_0^{min}) - \epsilon T\sum_{k=1}^{K} Q_k(rT)\\
&\leq DT^2 + VT(y_0^{max}-y_0^{min}) - \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1} Q_k(rT+j) + \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1} j\max[a_k^{max},b_k^{max}]\\
&= DT^2 + VT(y_0^{max}-y_0^{min}) - \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1} Q_k(rT+j) + \frac{\epsilon(T-1)T}{2}\sum_{k=1}^{K}\max[a_k^{max},b_k^{max}]
\end{aligned}$$
where the second inequality uses the fact that queue k can change by at most $\max[a_k^{max}, b_k^{max}]$ per slot, so that $Q_k(rT) \geq Q_k(rT+j) - j\max[a_k^{max},b_k^{max}]$ for each j ∈ {0, . . . , T − 1}.

Summing the above over r ∈ {0, . . . , R − 1} yields:
$$L(\Theta(RT)) - L(\Theta(0)) + \epsilon\sum_{\tau=0}^{RT-1}\sum_{k=1}^{K} Q_k(\tau) \leq RDT^2 + RVT(y_0^{max}-y_0^{min}) + \frac{\epsilon R(T-1)T}{2}\sum_{k=1}^{K}\max[a_k^{max},b_k^{max}]$$
Using L(Θ(RT)) ≥ 0, dividing by εRT, and taking a lim sup as R → ∞ yields:
$$\limsup_{R\to\infty}\frac{1}{RT}\sum_{\tau=0}^{RT-1}\sum_{k=1}^{K} Q_k(\tau) \leq \frac{DT}{\epsilon} + \frac{V(y_0^{max}-y_0^{min})}{\epsilon} + \frac{T-1}{2}\sum_{k=1}^{K}\max[a_k^{max},b_k^{max}] \qquad \Box$$
Inequality (4.85) holds for all R and T , and hence it can be viewed as a family of bounds that
apply to the same sample path under the drift-plus-penalty algorithm. Note also that increasing the
value of T changes the frame size and typically improves the cr∗ values (as it allows these values to be
achieved with a larger future lookahead). However, this affects the error term DT /V , requiring V
to also be increased as T increases. Increasing V creates a larger queue backlog. We thus see a similar
[O(1/V ), O(V )] cost-backlog tradeoff for this sample path context. If the slackness assumptions
(4.86)-(4.87) are modified to also include slackness in the yl (·) constraints, a modified argument
can be used to show the worst case queue backlog is bounded for all time by a constant that is O(V )
(see also (146)(39)(38)).
The target value $\frac{1}{R}\sum_{r=0}^{R-1} c_r^*$ that we use for comparison does not represent the optimal cost
that can be achieved over the full horizon RT if the entire future were known. However, when T is
large it still represents a meaningful target that is not trivial to achieve, as it is one that is defined in
terms of an ideal policy with T -slot lookahead. It is remarkable that the drift-plus-penalty algorithm
can closely track such an “ideal” T -slot lookahead algorithm.
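To make this concrete, here is a minimal simulation sketch (a hypothetical single-queue power-minimization example, not taken from the text): the drift-plus-penalty rule chooses a power level each slot by minimizing V·P − Q(t)·log(1 + S(t)P), with no knowledge of the arbitrary, non-i.i.d. arrival and channel sample path.

```python
import math

def dpp_decision(Q, S, V, P_max=1.0, grid=101):
    """Drift-plus-penalty: choose power P in [0, P_max] minimizing V*P - Q*log(1+S*P)."""
    best_P, best_val = 0.0, 0.0
    for i in range(grid):
        P = P_max * i / (grid - 1)
        val = V * P - Q * math.log(1.0 + S * P)
        if val < best_val:
            best_P, best_val = P, val
    return best_P

def simulate(arrivals, channels, V):
    """Run drift-plus-penalty on an arbitrary sample path; return (avg power, avg backlog)."""
    Q = total_power = total_backlog = 0.0
    for a, S in zip(arrivals, channels):
        P = dpp_decision(Q, S, V)
        total_power += P
        total_backlog += Q
        Q = max(Q - math.log(1.0 + S * P), 0.0) + a   # queue update of type (2.1)
    return total_power / len(arrivals), total_backlog / len(arrivals)

# A non-ergodic sample path: the arrival rate drops halfway through, and the
# channel quality alternates in 500-slot blocks (no equilibrium distribution).
n = 5000
arrivals = [0.3 if t < n // 2 else 0.1 for t in range(n)]
channels = [1.0 if (t // 500) % 2 == 0 else 0.5 for t in range(n)]

power_small_V, backlog_small_V = simulate(arrivals, channels, V=5.0)
power_big_V, backlog_big_V = simulate(arrivals, channels, V=50.0)
```

Running this should exhibit the [O(1/V), O(V)] tradeoff discussed above: the larger V trades lower average power for a larger average backlog, even though the sample path has no stationary structure.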

4.10 EXERCISES

Exercise 4.1. Let Q = (Q1, . . . , QK) and $L(Q) = \frac{1}{2}\sum_{k=1}^{K} Q_k^2$.

a) If L(Q) ≤ 25, show that $Q_k \leq \sqrt{50}$ for all k ∈ {1, . . . , K}.

b) If L(Q) > 25, show that $Q_k > \sqrt{50/K}$ for at least one queue k ∈ {1, . . . , K}.
c) Let K = 2. Plot the region of all non-negative vectors (Q1 , Q2 ) such that L(Q) = 2. Also
plot for L(Q) = 2.5. Give an example where L(Q1 (t), Q2 (t)) = 2.5, L(Q1 (t + 1), Q2 (t + 1)) =
2, but where Q1 (t) < Q1 (t + 1).

Exercise 4.2. For any constants Q ≥ 0, b ≥ 0, a ≥ 0, show that:

$$(\max[Q-b,0]+a)^2 \leq Q^2 + b^2 + a^2 + 2Q(a-b)$$
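A brute-force sanity check of this inequality over random non-negative values (a quick numerical sketch, not a substitute for the proof):

```python
import random

random.seed(0)
for _ in range(10000):
    Q = random.uniform(0.0, 100.0)
    a = random.uniform(0.0, 10.0)
    b = random.uniform(0.0, 10.0)
    lhs = (max(Q - b, 0.0) + a) ** 2
    rhs = Q * Q + b * b + a * a + 2.0 * Q * (a - b)
    assert lhs <= rhs + 1e-9, (Q, a, b)
```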

Exercise 4.3. Let Q(t) be a discrete time vector process with Q(0) = 0, and let f(t) and g(t) be discrete time real valued processes. Suppose there is a non-negative function L(Q(t)) such that L(0) = 0, and such that its conditional drift Δ(Q(τ)) satisfies the following every slot τ and for all possible Q(τ):
$$\Delta(Q(\tau)) + \mathbb{E}\{f(\tau)|Q(\tau)\} \leq \mathbb{E}\{g(\tau)|Q(\tau)\}$$
a) Use the law of iterated expectations to prove that:
$$\mathbb{E}\{L(Q(\tau+1))\} - \mathbb{E}\{L(Q(\tau))\} + \mathbb{E}\{f(\tau)\} \leq \mathbb{E}\{g(\tau)\}$$
b) Use telescoping sums together with part (a) to prove that for any t > 0:
$$\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{f(\tau)\} \leq \frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{g(\tau)\}$$

Exercise 4.4. (Opportunistically Minimizing an Expectation) Consider the game described in Section 1.8. Suppose that ω is a Gaussian random variable with mean m and variance σ². Define c(α, ω) = ω² + ω(3 − 2α) + α².
a) Compute the optimal choice of α (as a function of the observed ω) to minimize E{c(α, ω)}. Compute E{c(α, ω)} under your optimal policy.
b) Suppose that ω is exponentially distributed with mean 1/λ. Does the optimal policy change? Does E{c(α, ω)} change?
c) Let ω = (ω1, . . . , ωK), α = (α1, . . . , αK), Θ = (Θ1, . . . , ΘK) be non-negative vectors. Define $c(\alpha,\omega,\Theta) = \sum_{k=1}^{K}\left[V\alpha_k - \Theta_k\log(1+\alpha_k\omega_k)\right]$, where log(·) denotes the natural logarithm and V ≥ 0. We choose α subject to 0 ≤ αk ≤ 1 for all k, and αk αj = 0 for k ≠ j. Design a policy that observes ω and chooses α to minimize E{c(α, ω, Θ)|Θ}. Hint: First compute the solution assuming that αk > 0.
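As a numerical illustration of the "opportunistically minimizing an expectation" game behind this exercise, the following sketch (illustrative parameters; grid search stands in for the exact minimizer) compares choosing α after observing ω against the best ω-blind fixed α:

```python
import random

random.seed(1)

def c(alpha, omega):
    # Cost from part (a): c(alpha, omega) = omega^2 + omega*(3 - 2*alpha) + alpha^2.
    return omega**2 + omega * (3 - 2 * alpha) + alpha**2

m, sigma = 2.0, 1.5
samples = [random.gauss(m, sigma) for _ in range(2000)]
grid = [i / 10.0 - 5.0 for i in range(101)]   # candidate alpha values in [-5, 5]

# Opportunistic: observe omega first, then pick the best grid alpha for that omega.
opportunistic = sum(min(c(a, w) for a in grid) for w in samples) / len(samples)

# Omega-blind: a single fixed alpha chosen to minimize the empirical average cost.
fixed = min(sum(c(a, w) for w in samples) / len(samples) for a in grid)
```

By construction the average of per-sample minima never exceeds the minimum of averages, so the opportunistic policy can only do better than any fixed α.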

Exercise 4.5. (The Drift-Plus-Penalty Method) Explain, using the game of opportunistically min-
imizing an expectation described in Section 1.8, how choosing α(t) ∈ Aω(t) according to (4.48)-
(4.49) minimizes the right-hand-side of (4.44).

Exercise 4.6. (Probability 1 Convergence) Consider the fixed-V drift-plus-penalty algorithm (4.48)-(4.49), but assume the following modified Slater condition holds:
Assumption A2: There is an ε > 0 such that for any J-dimensional vector h = (h1, . . . , hJ) that consists only of values 1 and −1, there is an ω-only policy α∗(t) (which depends on h) that satisfies:
$$\mathbb{E}\{\hat{y}_0(\alpha^*(t),\omega(t))\} \leq y_{0,max} \qquad (4.88)$$
$$\mathbb{E}\{\hat{y}_l(\alpha^*(t),\omega(t))\} \leq -\epsilon \quad \forall l \in \{1,\ldots,L\} \qquad (4.89)$$
$$\mathbb{E}\{\hat{e}_j(\alpha^*(t),\omega(t))\} = \epsilon h_j \quad \forall j \in \{1,\ldots,J\} \qquad (4.90)$$
$$\mathbb{E}\{\hat{a}_k(\alpha^*(t),\omega(t))\} \leq \mathbb{E}\{\hat{b}_k(\alpha^*(t),\omega(t))\} - \epsilon \quad \forall k \in \{1,\ldots,K\} \qquad (4.91)$$
Using H(t) and Δ(t, H(t)) as defined in Section 4.1.3, it can be shown that for all t and all possible H(t), we have (compare with (4.52)):
$$\begin{aligned}
\Delta(t,H(t)) + V\mathbb{E}\{y_0(t)|H(t)\} &\leq B + C + V\mathbb{E}\{y_0^*(t)|H(t)\}\\
&\quad + \sum_{l=1}^{L} Z_l(t)\mathbb{E}\{y_l^*(t)|H(t)\} + \sum_{j=1}^{J} H_j(t)\mathbb{E}\{e_j^*(t)|H(t)\}\\
&\quad + \sum_{k=1}^{K} Q_k(t)\mathbb{E}\{a_k^*(t)-b_k^*(t)\,|\,H(t)\} \qquad (4.92)
\end{aligned}$$

where yl∗ (t), ej∗ (t), ak∗ (t), bk∗ (t) represent decisions under any other (possibly randomized) action
α ∗ (t) that can be made on slot t (so that yl∗ (t) = ŷl (α ∗ (t), ω(t)), etc.).
a) Define h = (h1, . . . , hJ) by:
$$h_j = \begin{cases} -1 & \text{if } H_j(t) \geq 0\\ 1 & \text{if } H_j(t) < 0 \end{cases}$$
Using this h, plug the ω-only policy α∗(t) from (4.88)-(4.91) into the right-hand-side of (4.92) to obtain:
$$\Delta(t,H(t)) + V\mathbb{E}\{y_0(t)|H(t)\} \leq B + C + Vy_{0,max} - \epsilon\left[\sum_{l=1}^{L} Z_l(t) + \sum_{k=1}^{K} Q_k(t) + \sum_{j=1}^{J} |H_j(t)|\right]$$

b) Assume that (4.16)-(4.17) hold for y0 (t), and that the fourth moment assumption (4.18)
holds. Use this with part (a) to obtain probability 1 bounds on the lim sup time average queue backlog
via Theorem 4.4.
c) Now consider the ω-only policy that yields (4.53)-(4.56), and plug this into the right-hand-
side of (4.92) to yield a probability 1 bound on the lim sup time average of y0 (t), again by Theorem
4.4.

Exercise 4.7. (Min Average Power (21)) Consider a wireless downlink with arriving data a(t) = (a1(t), . . . , aK(t)) every slot t. The data is stored in separate queues Q(t) = (Q1(t), . . . , QK(t)) for transmission over K different channels. The update equation is (4.23). Service variables bk(t) are determined by a power allocation vector P(t) = (P1(t), . . . , PK(t)) according to bk(t) = log(1 + Sk(t)Pk(t)), where log(·) denotes the natural logarithm, and S(t) = (S1(t), . . . , SK(t)) is a vector of channel attenuations. Assume that S(t) is known at the beginning of each slot t and satisfies 0 ≤ Sk(t) ≤ 1 for all k. Power is allocated subject to P(t) ∈ A, where A is the set of all power vectors with at most one non-zero element and such that 0 ≤ Pk ≤ Pmax for all k ∈ {1, . . . , K}, where Pmax is a peak power constraint. Assume that the vectors a(t) and S(t) are i.i.d. over slots, and that 0 ≤ ak(t) ≤ ak^max for all t, for some finite constants ak^max.
 
a) Using ω(t) = (a(t), S(t)), α(t) = P(t), J = 0, L = 0, $y_0(t) = \sum_{k=1}^{K} P_k(t)$, state the drift-plus-penalty algorithm for a fixed V in this context.
b) Assume we use an exact implementation of the algorithm in part (a) (so that C = 0), and that the problem is feasible. Use Theorem 4.8 to conclude that all queues are mean rate stable, and compute a value B such that:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}\mathbb{E}\{P_k(\tau)\} \leq P_{av}^{opt} + B/V$$
where $P_{av}^{opt}$ is the minimum average power over any stabilizing algorithm.
c) Assume Assumption A1 holds for a given ε > 0. Use Theorem 4.8c to give a bound on the time average sum of queue backlog in all queues.
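For reference, one way to implement the per-slot decision of part (a) is sketched below. It uses the closed-form maximizer of Q_k·log(1 + S_k·P) − V·P (obtained by setting the derivative to zero and clipping to [0, P_max]); all names and numbers are illustrative:

```python
import math

def min_power_decision(Q, S, V, P_max):
    """One slot of the drift-plus-penalty rule for this exercise: at most one
    channel transmits; choose (k, P) maximizing Q[k]*log(1 + S[k]*P) - V*P."""
    best_k, best_P, best_val = None, 0.0, 0.0
    for k in range(len(Q)):
        if S[k] <= 0.0:
            continue
        # Unconstrained maximizer is Q[k]/V - 1/S[k]; clip it to [0, P_max].
        P = min(max(Q[k] / V - 1.0 / S[k], 0.0), P_max)
        val = Q[k] * math.log(1.0 + S[k] * P) - V * P
        if val > best_val:
            best_k, best_P, best_val = k, P, val
    return best_k, best_P          # best_k is None when staying idle is optimal

k, P = min_power_decision(Q=[30.0, 5.0], S=[1.0, 0.8], V=10.0, P_max=2.0)
```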

Exercise 4.8. (Place-Holder Backlog)
a) Show that for any values V, p, s, q such that V > 0, p ≥ 0, q ≥ 0, 0 ≤ s ≤ 1, if q < V, then Vp − q log(1 + sp) > 0 whenever p > 0 (where log(·) denotes the natural logarithm). Conclude that the algorithm from Exercise 4.7 chooses Pk(t) = 0 whenever Qk(t) < V.
b) Use part (a) to conclude that Qk(t) ≥ max[V − log(1 + Pmax), 0] for all t greater than or equal to the time t∗ for which this inequality first holds. By how much can place-holder bits reduce average backlog from the bound given in part (c) of Exercise 4.7? This exercise computes a simple place-holder value $Q^{place}$ that is not the largest possible. A more detailed analysis in (143) computes a larger place-holder value.
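Part (a)'s inequality can be spot-checked numerically over the stated ranges (a random-sampling sketch, not a proof; the comment records the one-line argument):

```python
import math
import random

random.seed(2)
for _ in range(10000):
    V = random.uniform(0.1, 50.0)
    q = random.uniform(0.0, 0.999 * V)     # q < V
    s = random.uniform(0.0, 1.0)           # 0 <= s <= 1
    p = random.uniform(1e-6, 100.0)        # p > 0
    # Since log(1+s*p) <= s*p <= p, we get q*log(1+s*p) <= q*p < V*p.
    assert V * p - q * math.log(1.0 + s * p) > 0.0
```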

Exercise 4.9. (Maximum Throughput Subject to Peak and Average Power Constraints (21)) Con-
sider the same system of Exercise 4.7, with the exception that it is now a wireless uplink, and queue
backlogs now satisfy:
Qk (t + 1) = max[Qk (t) − bk (t), 0] + xk (t)
where xk (t) is a flow control decision for slot t, made subject to the constraint 0 ≤ xk (t) ≤ ak (t) for all
t. The control action is now a joint flow control and power allocation decision α(t) = [x(t), P (t)].
We want the average power expenditure over each link k to be less than or equal to Pkav , where Pkav
is a fixed constant for each k ∈ {1, . . . , K} (satisfying Pkav ≤ Pmax ). The new goal is to maximize

a weighted sum of admission rates $\sum_{k=1}^{K}\theta_k \overline{x}_k$ subject to queue stability and to all average power constraints, where {θ1, . . . , θK} are a given set of positive weights.

a) Using J = 0, L = K, $y_0(t) = -\sum_{k=1}^{K}\theta_k x_k(t)$, and a fixed V, state the drift-plus-penalty algorithm for this problem. Note that the constraints $\overline{P}_k \leq P_k^{av}$ should be enforced by virtual queues Zk(t) of the form (4.40) with a suitable definition of yk(t).
b) Use Theorem 4.8 to conclude that all queues are mean rate stable (and hence all average power constraints are met), and compute a value B such that:
$$\liminf_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}\theta_k\mathbb{E}\{x_k(\tau)\} \geq util^{opt} - B/V$$
where util^opt is the optimal weighted sum of admitted rates into the network under any algorithm that stabilizes the queues and satisfies all average power constraints.
c) Show that the algorithm is such that xk(t) = 0 whenever Qk(t) > V θk. Assume that all queues are initially empty, and compute values $Q_k^{max}$ such that $Q_k(t) \leq Q_k^{max}$ for all t ≥ 0 and all k ∈ {1, . . . , K}. This shows that queues are deterministically bounded, even without the Slater condition of Assumption A1.
d) Show that the algorithm is such that Pk(t) = 0 whenever Zk(t) > Qk(t). Conclude that $Z_k(t) \leq Z_k^{max}$, where $Z_k^{max}$ is defined $Z_k^{max} = Q_k^{max} + (P_{max} - P_k^{av})$.
e) Use part (d) and the sample path input-output inequality (2.3) to conclude that for any positive integer T, the total power expended by each link k over any T-slot interval is deterministically less than or equal to $TP_k^{av} + Z_k^{max}$. That is:
$$\sum_{\tau=t_0}^{t_0+T-1} P_k(\tau) \leq TP_k^{av} + Z_k^{max} \qquad \forall t_0 \in \{0,1,2,\ldots\},\ \forall T \in \{1,2,3,\ldots\}$$

f ) Suppose link k is a wireless transmitter with a battery that has initial energy Ek . Use part
(e) to provide a guarantee on the lifetime of the link.

Exercise 4.10. (Out-of-Date Queue Backlog Information) Consider the K-queue problem with
L = J = 0, and 0 ≤ ak (t) ≤ amax and 0 ≤ bk (t) ≤ bmax for all k and all t, for some finite constants
amax and bmax . The network controller attempts to perform the drift-plus-penalty algorithm (4.48)-
(4.49) every slot. However, it does not have access to the current queue backlogs Qk (t), and only
receives delayed information Qk (t − T ) for some integer T ≥ 0. It thus uses Qk (t − T ) in place
of Qk (t) in (4.48). Let α ideal (t) be the optimal decision of (4.48)-(4.49) in the ideal case when
current queue backlogs Qk (t) are used, and let α approx (t) be the implemented decision that uses the
out-of-date queue backlogs Qk (t − T ). Show that α approx (t) yields a C-additive approximation for
some finite constant C. Specifically, compute a value C such that:


$$V\hat{y}_0(\alpha^{approx}(t),\omega(t)) + \sum_{k=1}^{K} Q_k(t)\left[\hat{a}_k(\alpha^{approx}(t),\omega(t)) - \hat{b}_k(\alpha^{approx}(t),\omega(t))\right] \leq V\hat{y}_0(\alpha^{ideal}(t),\omega(t)) + \sum_{k=1}^{K} Q_k(t)\left[\hat{a}_k(\alpha^{ideal}(t),\omega(t)) - \hat{b}_k(\alpha^{ideal}(t),\omega(t))\right] + C$$
This shows that we can still optimize the system and provide stability with out-of-date queue backlog
information. Treatment of delayed queue information for Lyapunov drift arguments was perhaps
first used in (147), where random delays without a deterministic bound are also considered.
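One way to see where such a constant C can come from: each queue changes by at most max[amax, bmax] per slot, so |Qk(t) − Qk(t − T)| ≤ T·max[amax, bmax], and swapping the weights Qk(t − T) for Qk(t) perturbs the drift-plus-penalty objective of any action by at most K·T·max[amax, bmax]². Applying this to both α^approx and α^ideal suggests C = 2KT·max[amax, bmax]² suffices. The sketch below checks this candidate bound on random finite-action instances (all structure hypothetical):

```python
import random

random.seed(3)
K, T = 4, 3
M = 2.0                      # max[a_max, b_max]
C = 2 * K * T * M * M        # candidate additive constant

def objective(weights, y0, a, b, V=10.0):
    """Drift-plus-penalty objective for one action, with given queue weights."""
    return V * y0 + sum(w * (ai - bi) for w, ai, bi in zip(weights, a, b))

for _ in range(2000):
    # Random finite action set: (penalty y0, arrival vector, service vector).
    actions = [(random.uniform(0.0, 5.0),
                [random.uniform(0.0, M) for _ in range(K)],
                [random.uniform(0.0, M) for _ in range(K)]) for _ in range(6)]
    Q_old = [random.uniform(0.0, 50.0) for _ in range(K)]
    # Q(t) can differ from Q(t-T) by at most T*M per queue.
    Q_now = [max(q + random.uniform(-T * M, T * M), 0.0) for q in Q_old]
    ideal = min(actions, key=lambda act: objective(Q_now, *act))
    approx = min(actions, key=lambda act: objective(Q_old, *act))
    assert objective(Q_now, *approx) <= objective(Q_now, *ideal) + C + 1e-9
```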

t                0  1  2  3  4  5  6  7  8
Arrivals  a1(t)  3  0  3  0  0  1  0  1  0
          a2(t)  2  0  1  0  1  1  0  0  0
Channels  S1(t)  G  G  M  M  G  G  M  M  G
          S2(t)  M  M  B  M  B  M  B  G  B
Max Qibi  Q1(t)  0  3  0  3  1  0  1  1  2
Policy    Q2(t)  0  2  2  2  2  3  2  1  0

Figure 4.3: Arrivals, channel conditions, and queue backlogs for a two queue wireless downlink.

Exercise 4.11. (Simulation) Consider a 2-queue system with time varying channels (S1 (t), S2 (t)),
where Si (t) ∈ {G, M, B}, representing “Good,” “Medium,” “Bad” channel conditions for i ∈ {1, 2}.
Only one channel can be served per slot. All packets have fixed length, and 3 packets can be served
when a channel is “Good,” 2 when “Medium,” and 1 when “Bad.” Exactly one unit of power is
expended when we serve any channel (regardless of its condition). A sample path example is given in
Fig. 4.3, which expends 8 units of power over the first 9 slots under the policy that serves the queue
that yields the largest Qi (t)bi (t) value, which is a special case of the drift-plus-penalty algorithm
for K = 2, J = L = 0, V = 0.
a) Given the full future arrival and channel events as shown in the table, and given Q1 (0) =
Q2 (0) = 0, select a different set of channels to serve over slots {0, 1, . . . , 8} that also leaves the
system empty on slot 9, but that minimizes the amount of power required to do so (so that more
than 1 slot will be idle). How much power is used?
b) Assume these arrivals and channels are repeated periodically every 9 slots. Simulate the
system using the drift-plus-penalty policy of choosing the queue i that maximizes Qi (t)bi (t) − V
whenever this quantity is non-negative, and remains idle if this is negative for both i = 1 and i = 2.
Find the empirical average power expenditure and the empirical average queue backlog over 106
slots when V = 0. Repeat for V = 1, V = 5, V = 10, V = 20, V = 50, V = 100, V = 200.
c) Repeat part (b) in the case when arrival vectors (a1 (t), a2 (t)) and channel vectors
(S1 (t), S2 (t)) are independent and i.i.d. over slots with the same empirical distribution as that
achieved over 9 slots in the table, so that P r[(a1 , a2 ) = (3, 2)] = 1/9, P r[(S1 , S2 ) = (G, M)] =
3/9, P r[(S1 , S2 ) = (M, B)] = 2/9, etc. Note: You should find that the resulting minimum power that is
approached as V is increased is the same as part (b), and is strictly less than the empirical power expenditure
of part (a).
d) Show that queue i is only served if Qi(t) ≥ V/3. Conclude that $Q_i(t) \geq \max[V/3 - 3, 0] \triangleq Q^{place}$ for all t, provided that this inequality holds for Qi(0). Hence, using $Q^{place}$ place-holder packets would reduce average backlog by exactly this amount, with no loss of power performance.
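A simulation skeleton for parts (b)-(d) is sketched below. One assumption is made explicit: ties in Qi·bi are broken toward the larger backlog, which reproduces the Fig. 4.3 sample path for V = 0.

```python
rate = {'G': 3, 'M': 2, 'B': 1}           # packets served per slot by channel state
a1 = [3, 0, 3, 0, 0, 1, 0, 1, 0]
a2 = [2, 0, 1, 0, 1, 1, 0, 0, 0]
S1 = ['G', 'G', 'M', 'M', 'G', 'G', 'M', 'M', 'G']
S2 = ['M', 'M', 'B', 'M', 'B', 'M', 'B', 'G', 'B']

def simulate(V, slots):
    """Drift-plus-penalty: serve the queue maximizing Q_i*b_i when that value is
    at least V (and positive); one power unit per served slot."""
    Q = [0, 0]
    power = backlog = 0
    for t in range(slots):
        i = t % 9                          # arrivals/channels repeat every 9 slots
        b = [rate[S1[i]], rate[S2[i]]]
        backlog += sum(Q)
        w = [Q[j] * b[j] for j in range(2)]
        k = max(range(2), key=lambda j: (w[j], Q[j]))   # ties -> larger backlog
        if w[k] > 0 and w[k] - V >= 0:
            power += 1
            Q[k] = max(Q[k] - b[k], 0)
        Q[0] += a1[i]
        Q[1] += a2[i]
    return power, backlog / slots

power9, _ = simulate(V=0, slots=9)         # 8 power units, matching Fig. 4.3
```

Sweeping V over the values listed in part (b) with a long horizon (e.g. 10^6 slots) then traces out the empirical power/backlog tradeoff.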

Exercise 4.12. (Wireless Network Coding) Consider a system of 4 wireless users that communicate
to each other through a base station (Fig. 4.4). User 1 desires to send data to user 2 and user 2 desires
to send data to user 1. Likewise, user 3 desires to send data to user 4 and user 4 desires to send data
to user 3.

Figure 4.4: An illustration of the 2 phases forming a cycle. Phase 1 (uplink transmission): users send different packets pi to the base station. Phase 2 (downlink broadcast): the base station broadcasts a single XORed packet (e.g., p3 ⊕ p4) to all users.

Let t ∈ {0, 1, 2, . . .} index a cycle. Each cycle t is divided into 2 phases: In the first phase,
users 1, 2, 3, and 4 all send a new packet (if any) to the base station (this can be accomplished, for
example, using TDMA or FDMA in the first phase). In the second phase, the base station makes
a transmission decision α(t) ∈ {{1, 2}, {3, 4}}. If α(t) = {1, 2}, the head-of-line packets for users 1
and 2 are XORed together, XORing with 0 if only one packet is available, and creating a null packet
if no packets from users 1 or 2 are available. The XORed packet (or null packet) is then broadcast
to all users. We assume all packets are labeled with sequence numbers, and the sequence numbers of
both XORed packets are placed in a packet header. As in (148), users 1 and 2 can decode the new
data if they keep copies of the previous packets they sent. If α(t) = {3, 4}, a similar XOR operation
is done for user 3 and 4 packets.
Assume that downlink channel conditions are time-varying and known at the beginning of
each cycle, with channel state vector S(t) = (S1 (t), S2 (t), S3 (t), S4 (t)), where Si (t) ∈ {ON, OF F }.
Only users with ON channel states can receive the transmission. The queueing dynamics from one
cycle to the next thus satisfy:

$$Q_1(t+1) = \max[Q_1(t)-b_1(t),0] + a_2(t), \qquad Q_2(t+1) = \max[Q_2(t)-b_2(t),0] + a_1(t)$$
$$Q_3(t+1) = \max[Q_3(t)-b_3(t),0] + a_4(t), \qquad Q_4(t+1) = \max[Q_4(t)-b_4(t),0] + a_3(t)$$
where Qk (t) is the integer number of packets waiting in the base station for transmission to desti-
nation k, bk (t) ∈ {0, 1} is the number of packets transmitted over the downlink to node k during
cycle t, satisfying:
$$b_k(t) = \hat{b}_k(\alpha(t),S(t)) = \begin{cases} 1 & \text{if } S_k(t) = ON \text{ and } k \in \alpha(t)\\ 0 & \text{otherwise}\end{cases}$$

and ak (t) is the number of packets arriving over the uplink from node k during cycle t (notice that
data destined for node 1 arrives as the process a2 (t), etc.). Suppose that S(t) is i.i.d. over cycles, with
probabilities πs = P r[S(t) = s], where s = (S1 , S2 , S3 , S4 ). Arrivals ak (t) are i.i.d. over cycles with
rate λk = E {ak (t)}, for k ∈ {1, . . . , 4}, and with bounded second moments.
a) Suppose that S(t) = (ON, ON, OF F, ON) and that Qk (t) > 0 for all queues k ∈
{1, 2, 3, 4}. It is tempting to assume that mode α(t) = {1, 2} is the best choice in this case,
although this is not always true. Give an example where it is impossible to stabilize the sys-
tem if the controller always chooses α(t) = {1, 2} whenever S(t) = (ON, ON, OF F, ON) or
S(t) = (ON, ON, ON, OF F ), but where a more intelligent control choice would stabilize the
system.8

b) Define L(Q(t)) = 21 4k=1 Qk (t)2 . Compute (Q(t)) and show it has the form:
4 

(Q(t)) ≤ B − E Q
k=1 k (t)[b k (t) − λ ]
m(k)  Q(t) (4.93)

where m(1) = 2, m(2) = 1, m(3) = 4, m(4) = 3, and where B < ∞. Design a control policy that
observes S(t) and chooses actions α(t) to minimize the right-hand-side of (4.93) over all feasible
control policies.
c) Consider all possible S-only algorithms that choose a transmission mode as a stationary and random function of the observed S(t) (and independent of queue backlog). Define the S-only throughput region Λ as the set of all (λ1, λ2, λ3, λ4) vectors for which there exists an S-only policy α∗(t) such that:
$$\mathbb{E}\left\{\left(\hat{b}_1(\alpha^*(t),S(t)),\ \hat{b}_2(\alpha^*(t),S(t)),\ \hat{b}_3(\alpha^*(t),S(t)),\ \hat{b}_4(\alpha^*(t),S(t))\right)\right\} \geq (\lambda_2,\lambda_1,\lambda_4,\lambda_3)$$
Suppose that (λ1, λ2, λ3, λ4) is interior to Λ, so that (λ1 + ε, λ2 + ε, λ3 + ε, λ4 + ε) ∈ Λ for some value ε > 0. Conclude that the drift-minimizing policy of part (b) makes all queues strongly stable, and provide an upper bound on time average expected backlog.
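The drift-minimizing choice of part (b) reduces to a max-weight comparison between the two XOR pairs; a minimal sketch (queues indexed 0-3 for users 1-4):

```python
def network_coding_decision(Q, S):
    """Minimize the right-hand-side of (4.93): pick the XOR pair whose total
    backlog over ON downlink channels is largest."""
    w12 = Q[0] * (S[0] == 'ON') + Q[1] * (S[1] == 'ON')
    w34 = Q[2] * (S[2] == 'ON') + Q[3] * (S[3] == 'ON')
    return (1, 2) if w12 >= w34 else (3, 4)

# With S = (ON, ON, OFF, ON), serving {3, 4} wins once Q4 is large enough,
# illustrating the point of part (a) that {1, 2} is not always the best choice:
choice = network_coding_decision(Q=[1, 1, 5, 9], S=['ON', 'ON', 'OFF', 'ON'])
```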

8 It can also be shown that an algorithm that always chooses α(t) = {1, 2} under states (ON, ON, OFF, ON) or (ON, ON, ON, OFF), and when there are indeed two packets to serve, will not necessarily work; we need to take queue length into account. See (10) for related examples in the context of a 3 × 3 packet switch.

Exercise 4.13. (A modified algorithm) Suppose the conditions of Theorem 4.8 hold. However, suppose that every slot t we observe Θ(t), ω(t) and choose an action α(t) ∈ Aω(t) that minimizes the exact drift-plus-penalty expression Δ(Θ(t)) + V E{ŷ0(α(t), ω(t))|Θ(t)}, rather than minimizing the upper bound on the right-hand-side of (4.44).
a) Show that the same performance guarantees of Theorem 4.8 hold.
b) Using (2.2), state this algorithm (for C = 0) in the special case when L = J = 0, yl(t) = ej(t) = 0, ω(t) = [(a1(t), . . . , aK(t)), (S1(t), . . . , SK(t))], âk(α(t), ω(t)) = ak(t), α(t) ∈ {1, . . . , K} (representing a single queue that we serve every slot), and:
$$\hat{b}_k(\alpha(t),\omega(t)) = \begin{cases} S_k(t) & \text{if } \alpha(t) = k\\ 0 & \text{if } \alpha(t) \neq k\end{cases}$$

Figure 4.5: A dynamic data compression system for Exercise 4.14. Arriving packets (A(t), β(t)) enter a compressor whose output a(t), produced with distortion d(t), feeds queue Q(t), which is served at rate b(t).

Exercise 4.14. (Distortion-Aware Data Compression (143)) Consider a single queue Q(t) with
dynamics (2.1), where b(t) is an i.i.d. transmission rate process with bounded second moments. As
shown in Fig. 4.5, the arrival process a(t) is generated as the output of a data compression operation.
Specifically, every slot t a new packet of size A(t) bits arrives to the system (where A(t) = 0 if
no packet arrives). This packet has meta-data β(t), where β(t) ∈ B , where B represents a set of
different data types. Assume the pair (A(t), β(t)) is i.i.d. over slots. Every slot t, a network controller
observes (A(t), β(t)) and chooses a data compression option c(t) ∈ {0, 1, . . . , C}, where c(t) indexes a
collection of possible data compression algorithms.The output of the compressor is a compressed packet
of random size a(t) = â(A(t), β(t), c(t)), causing a random distortion d(t) = d̂(A(t), β(t), c(t)).
Note that â(·) and d̂(·) are random functions. Assume the pair (a(t), d(t)) is i.i.d. over all slots with
the same A(t), β(t), c(t). Define functions m(A, β, c) and δ(A, β, c) as follows:
  
$$m(A,\beta,c) = \mathbb{E}\left\{\hat{a}(A(t),\beta(t),c(t)) \mid A(t)=A,\ \beta(t)=\beta,\ c(t)=c\right\}$$
$$\delta(A,\beta,c) = \mathbb{E}\left\{\hat{d}(A(t),\beta(t),c(t)) \mid A(t)=A,\ \beta(t)=\beta,\ c(t)=c\right\}$$

Assume that c(t) = 0 corresponds to no compression, so that m(A, β, 0) = A, δ(A, β, 0) = 0 for all (A, β). Further, assume that c(t) = C corresponds to throwing the packet away, so that m(A, β, C) = 0 for all (A, β). Further assume there is a finite constant σ² such that for all (A, β, c), we have:
$$\mathbb{E}\left\{\hat{a}(A(t),\beta(t),c(t))^2 \mid A(t)=A,\ \beta(t)=\beta,\ c(t)=c\right\} \leq \sigma^2$$
$$\mathbb{E}\left\{\hat{d}(A(t),\beta(t),c(t))^2 \mid A(t)=A,\ \beta(t)=\beta,\ c(t)=c\right\} \leq \sigma^2$$
Assume the functions m(A, β, c) and δ(A, β, c) are known. We want to design an algorithm that minimizes the time average expected distortion $\overline{d}$ subject to queue stability. It is clear that this problem is feasible, as we can always choose c(t) = C (although this would maximize distortion). Use the drift-plus-penalty framework (with fixed V) to design such an algorithm. Hint: Use iterated expectations to claim that:
$$\mathbb{E}\{\hat{a}(A(t),\beta(t),c(t)) \mid Q(t)\} = \mathbb{E}\left\{\mathbb{E}\{\hat{a}(A(t),\beta(t),c(t)) \mid Q(t),A(t),\beta(t),c(t)\} \mid Q(t)\right\} = \mathbb{E}\{m(A(t),\beta(t),c(t)) \mid Q(t)\}$$
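Following the hint, the drift-plus-penalty choice reduces to a per-packet index rule: observe (A(t), β(t)), then pick c minimizing V·δ(A, β, c) + Q(t)·m(A, β, c). A sketch with made-up m and δ tables (all numbers hypothetical):

```python
def compression_decision(Q, V, m, delta):
    """Drift-plus-penalty choice: c minimizing V*delta[c] + Q*m[c] for the observed packet."""
    return min(range(len(m)), key=lambda c: V * delta[c] + Q * m[c])

# Hypothetical tables for one (A, beta) pair with A = 100 bits:
m_table = [100.0, 60.0, 30.0, 0.0]    # expected compressed size; c=0: none, c=3: drop
d_table = [0.0, 1.0, 3.0, 10.0]       # expected distortion per option

c_empty = compression_decision(Q=0.0, V=50.0, m=m_table, delta=d_table)
c_busy = compression_decision(Q=10.0, V=50.0, m=m_table, delta=d_table)
```

When the queue is empty the rule sends the packet uncompressed (no distortion); as backlog builds, it shifts toward heavier compression to keep the queue stable.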

Exercise 4.15. (Weighted Lyapunov Functions) Recompute the drift-plus-penalty bound in Lemma 4.6 under the following modified Lyapunov function:
$$L(\Theta(t)) = \frac{1}{2}\sum_{k=1}^{K} w_k Q_k(t)^2 + \frac{1}{2}\sum_{l=1}^{L} Z_l(t)^2 + \frac{1}{2}\sum_{j=1}^{J} H_j(t)^2$$
where $\{w_k\}_{k=1}^{K}$ are positive weights. How does the drift-plus-penalty algorithm change?

Figure 4.6: The 3-node multi-hop network for Exercise 4.16. Stream X(t) is routed (via a1(t), a2(t)) into queues Q1(t) and Q2(t); queue 1 transmits at rate μ1(t) into queue Q3(t), which also receives stream Y(t) and transmits at rate μ3(t); queue 2 transmits at rate μ2(t) out of the network.

Exercise 4.16. (Multi-Hop with Orthogonal Channels) Consider the 3-node wireless network of
Fig. 4.6.The network operates in discrete time with unit time slots t ∈ {0, 1, 2, . . .}. It has orthogonal
channels, so that node 3 can send and receive at the same time. The network controller makes power
allocation decisions and routing decisions.
• (Power Allocation) Let μi (t) be the transmission rate at node i on slot t, for i ∈ {1, 2, 3}. This
transmission rate depends on the channel state Si (t) and the power allocation decision Pi (t)
by the following function:

μi (t) = log(1 + Pi (t)Si (t)) ∀i ∈ {1, 2, 3}, ∀t


where log(·) denotes the natural logarithm. Every time slot t, the network controller
observes the channels (S1 (t), S2 (t), S3 (t)) and determines the power allocation decisions
(P1 (t), P2 (t), P3 (t)), made subject to the following constraints:
0 ≤ Pi (t) ≤ 1 ∀i ∈ {1, 2, 3}, ∀t

• (Routing) There are two arrival processes X(t) and Y (t), taking units of bits. The X(t) process
can be routed to either queue 1 or 2. The Y (t) process goes directly into queue 3. Let a1 (t)
and a2 (t) represent the routing decision variables, where a1 (t) is the amount of bits routed to
queue 1, and a2 (t) is the amount of bits routed to queue 2. The network controller observes
X(t) every slot and makes decisions for (a1 (t), a2 (t)) subject to the following constraints:
a1 (t) ≥ 0 , a2 (t) ≥ 0 , a1 (t) + a2 (t) = X(t) ∀t

It can be shown that the Lyapunov drift Δ(Q(t)) satisfies the following every slot t:
$$\Delta(Q(t)) \leq B + Q_1(t)\mathbb{E}\{a_1(t)-\mu_1(t)|Q(t)\} + Q_2(t)\mathbb{E}\{a_2(t)-\mu_2(t)|Q(t)\} + Q_3(t)\mathbb{E}\{\mu_1(t)+Y(t)-\mu_3(t)|Q(t)\}$$
where B is a positive constant. We want to design a dynamic algorithm that solves the following
problem:
$$\begin{aligned}
\text{Minimize:}\quad & \overline{P}_1 + \overline{P}_2 + \overline{P}_3\\
\text{Subject to:}\quad & 1)\ Q_i(t) \text{ is mean rate stable } \forall i \in \{1,2,3\}\\
& 2)\ a_1(t) \geq 0,\ a_2(t) \geq 0,\ a_1(t) + a_2(t) = X(t)\ \forall t\\
& 3)\ 0 \leq P_i(t) \leq 1\ \forall i \in \{1,2,3\},\ \forall t
\end{aligned}$$
a) Using a fixed parameter V > 0, state the drift-plus-penalty algorithm for this problem.
The algorithm should have separable power allocation and routing decisions.
b) Suppose that V = 20, Q1 (t) = 50, Q2 (t) = Q3 (t) = 20, S1 (t) = S2 (t) = S3 (t) = 1.
What should the value of P1 (t) be under the drift-plus-penalty algorithm? (give a numeric value)
c) Suppose (X(t), Y (t)) is i.i.d. over slots with E {X(t)} = λX and E {Y (t)} = λY .
Suppose (S1 (t), S2 (t), S3 (t)) is i.i.d. over slots. Suppose there is a stationary and ran-
domized policy that observes (X(t), Y (t), S1 (t), S2 (t), S3 (t)) every slot t, and makes ran-
domized decisions (a1∗(t), a2∗(t), P1∗(t), P2∗(t), P3∗(t)) based only on the observed vector
(X(t), Y(t), S1(t), S2(t), S3(t)). State desirable properties for the expectations E{a1∗(t)},
E{a2∗(t)}, E{log(1 + Pi∗(t)Si(t))} for i ∈ {1, 2, 3} that would ensure your algorithm of part (a)
would make all queues mean rate stable with time average expected power expenditure given by:

P̄1 + P̄2 + P̄3 ≤ φ + B/V

where φ is a desired value for the sum time average power. Your properties should be in the form of
desirable inequalities.
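As an independent sanity check of the separable structure in parts (a)-(b), the sketch below (our own illustration in Python, not a provided solution) minimizes the standard drift-plus-penalty expression V·Pi(t) + (Qdown − Qown)·log(1 + Pi(t)Si(t)) over 0 ≤ Pi(t) ≤ 1, where Qown is the queue served at rate μi(t) and Qdown is the queue (if any) that this service flows into. The function name and closed-form solution are ours.

```python
def power_decision(V, Q_own, Q_down, S, P_max=1.0):
    """Choose P in [0, P_max] minimizing V*P + (Q_down - Q_own)*log(1 + P*S).

    Q_own is the backlog of the queue this transmission serves; Q_down is the
    backlog of the queue (if any) the served bits flow into (0 if they exit).
    """
    w = Q_own - Q_down       # net weight multiplying the rate log(1 + P*S)
    if w <= 0 or S == 0:
        return 0.0           # no incentive to spend any power
    # Stationary point of the convex objective: V - w*S/(1 + P*S) = 0
    P = w / V - 1.0 / S
    return min(max(P, 0.0), P_max)

# Numbers from part (b): V = 20, Q1 = 50, Q3 = 20, S1 = 1;
# mu_1 serves queue 1 and its output arrives to queue 3.
P1 = power_decision(V=20, Q_own=50, Q_down=20, S=1)
print(P1)  # -> 0.5
```

The objective is convex in P, so the clipped stationary point is the exact minimizer.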
4.11 APPENDIX 4.A — PROVING THEOREM 4.5

This appendix characterizes the set of all possible time average expectations for the variables
[(yl(t)), (ej(t)), (ak(t)), (bk(t))] defined in Section 4.2. It concludes with a proof of Theorem 4.5,
which shows that optimality for the problem (4.31)-(4.35) can be defined over the class of ω-only
policies. The proof involves set theoretic concepts of convex sets, closed sets, limit points, and
convergent subsequences. In particular, we use the well known fact that if {x(t)}_{t=0}^∞ is an infinite
sequence of vectors contained in some bounded set X ⊆ R^k (for some finite integer k > 0), then there
must exist a convergent subsequence {x(ti)}_{i=1}^∞ that converges to a point x in the closure of X
(see, for example, A14 of (145)). Specifically, there is a vector x in the closure of X and an infinite
sequence of increasing positive integers {t1, t2, t3, . . .} such that:

lim_{i→∞} x(ti) = x
4.11.1 THE REGION Γ

Let Γ represent the region of all [(y_l)_{l=0}^L, (e_j)_{j=1}^J, (a_k)_{k=1}^K, (b_k)_{k=1}^K] values that can be achieved by
ω-only policies. Equivalently, this can be viewed as the region of all one-slot expectations that can
be achieved via randomized decisions when the ω(t) variable takes values according to its stationary
distribution. The boundedness assumptions (4.25)-(4.30) ensure that the set Γ is bounded. It is easy
to show that Γ is also convex by using an ω-only policy that is a mixture of two other ω-only policies.
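The mixture argument can be made concrete with a toy example (our own, not from the text): if two ω-only policies achieve one-slot expectation vectors r1 and r2, then a policy that independently flips a coin with bias p each slot and runs the corresponding policy is itself an ω-only policy, and it achieves p·r1 + (1 − p)·r2.

```python
# Toy setup: omega takes two values with stationary probabilities pi, and each
# (omega, action) pair yields a known payoff vector. All numbers are arbitrary.
pi = {0: 0.3, 1: 0.7}
payoff = {(0, 'a'): (1.0, 0.0), (0, 'b'): (0.0, 2.0),
          (1, 'a'): (2.0, 1.0), (1, 'b'): (1.0, 3.0)}

def one_slot_expectation(policy):
    """Expected payoff vector of an omega-only policy {omega: action}."""
    return tuple(sum(pi[w] * payoff[(w, policy[w])][i] for w in pi)
                 for i in range(2))

r1 = one_slot_expectation({0: 'a', 1: 'a'})   # (1.7, 0.7)
r2 = one_slot_expectation({0: 'b', 1: 'b'})   # (0.7, 2.7)

# Coin-flip mixture: with probability p use policy 1, else policy 2, each slot.
p = 0.25
mix = tuple(p * a + (1 - p) * b for a, b in zip(r1, r2))
# mix lies on the segment between r1 and r2 and is itself achievable by an
# omega-only policy, so the achievable region is convex.
```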
Now note that for any slot τ and assuming that ω(τ) has its stationary distribution, the one-
slot expectation under any decision α(τ) ∈ A_ω(τ) is in the set Γ, even if that decision is from an
arbitrary policy that is not an ω-only policy. That is:

E[(ŷl(α(τ), ω(τ))), (êj(α(τ), ω(τ))), (âk(α(τ), ω(τ))), (b̂k(α(τ), ω(τ)))] ∈ Γ

where the expectation is with respect to the random ω(τ) (which has the stationary distribution)
and the possibly random α(τ) that is made by the policy in reaction to the observed ω(τ). This
expectation is in Γ because any sample path of events that lead to the policy choosing α(τ) on
slot τ simply affects the conditional distribution of α(τ) given the observed ω(τ), and hence the
expectation can be equally achieved by the ω-only policy that uses the same conditional distribution.9
This observation directly leads to the following simple lemma.

Lemma 4.17 If ω(τ) is in its stationary distribution for all slots τ, then for any policy that chooses
α(τ) ∈ A_ω(τ) over time (including policies that are not ω-only), we have for any slot t > 0:

(1/t) Σ_{τ=0}^{t−1} E[(ŷl(α(τ), ω(τ))), (êj(α(τ), ω(τ))), (âk(α(τ), ω(τ))), (b̂k(α(τ), ω(τ)))] ∈ Γ    (4.94)
9 We implicitly assume that the decision α(τ ) on slot τ has a well defined conditional distribution.
Thus, if r∗ is a limit point of the time average on the left-hand-side of (4.94) over a subsequence of
times ti that increase to infinity, then r∗ is in the closure of Γ.
Proof. Each term in the time average is itself in Γ, and so the time average is also in Γ because Γ is
convex. 2
Thus, the finite horizon time average expectation under any policy cannot escape the set Γ,
and any infinite horizon time average that converges to a limit point cannot escape the closure of
Γ. If the set Γ is closed, then any limit point r∗ is inside Γ and hence (by definition of Γ) can
be exactly achieved as the one-slot average under some ω-only policy. If Γ is not closed, then r∗
can be achieved arbitrarily closely (i.e., within a distance δ, for any arbitrarily small δ > 0), by an
ω-only policy. This naturally leads to the following characterization of optimality in terms of ω-only
policies.
4.11.2 CHARACTERIZING OPTIMALITY

Define Γ̃ as the set of all points [(yl), (ej), (ak), (bk)] in the closure of Γ that satisfy:

yl ≤ 0 ∀l ∈ {1, . . . , L} ,  ej = 0 ∀j ∈ {1, . . . , J } ,  ak ≤ bk ∀k ∈ {1, . . . , K}    (4.95)

It can be shown that, if non-empty, Γ̃ is closed and bounded. If Γ̃ is non-empty, define y0∗ as the
minimum value of y0 for which there is a point [(yl), (ej), (ak), (bk)] ∈ Γ̃. Intuitively, the set Γ̃
is the set of all time averages achievable by ω-only policies that meet the required time average
constraints and that have time average expected arrivals less than or equal to time average expected
service, and y0∗ is the minimum time average penalty achievable by such ω-only policies. We now
show that y0∗ = y0^opt.
Theorem 4.18 Suppose the ω(t) process is stationary with distribution π(ω), and that the system
satisfies the boundedness assumptions (4.25)-(4.30) and the law of large numbers assumption specified in
Section 4.2. Suppose the problem (4.31)-(4.35) is feasible. Let α(t) be any control policy that satisfies the
constraints (4.32)-(4.35), and let r(t) represent the t-slot expected time average in the left-hand-side of
(4.94) under this policy.
a) Any limit point [(yl), (ej), (ak), (bk)] of {r(t)}_{t=1}^∞ is in the set Γ̃. In particular, the set Γ̃ is
non-empty.
b) The time average expected penalty under the algorithm α(t) satisfies:

lim inf_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{ŷ0(α(τ), ω(τ))} ≥ y0∗    (4.96)

Thus, no algorithm that satisfies the constraints (4.32)-(4.35) can yield a time average expected penalty
smaller than y0∗. Further, y0∗ = y0^opt.
Proof. To prove part (a), note from Lemma 4.17 that r(t) is always inside the (bounded) set Γ.
Hence, it has a limit point, and any such limit point is in the closure of Γ. Now consider a particular
limit point [(yl), (ej), (ak), (bk)], and let {ti}_{i=1}^∞ be the subsequence of non-negative integer time
slots that increase to infinity and satisfy:

lim_{i→∞} r(ti) = [(yl), (ej), (ak), (bk)]

Because the constraints (4.32) and (4.33) are satisfied, it must be the case that:

yl ≤ 0 ∀l ∈ {1, . . . , L} ,  ej = 0 ∀j ∈ {1, . . . , J }    (4.97)

Further, by the sample-path inequality (2.5), we have for all ti > 0 and all k:

E{Qk(ti)}/ti − E{Qk(0)}/ti ≥ (1/ti) Σ_{τ=0}^{ti−1} E{âk(α(τ), ω(τ)) − b̂k(α(τ), ω(τ))}

Because the control policy makes all queues mean rate stable, taking a limit of the above over the
times ti → ∞ yields 0 ≥ ak − bk, and hence we find that:

ak ≤ bk ∀k ∈ {1, . . . , K}    (4.98)
The results (4.97) and (4.98) imply that the limit point [(yl), (ej), (ak), (bk)] is in the set Γ̃.
To prove part (b), let {ti}_{i=1}^∞ be a subsequence of non-negative integer time slots that increase
to infinity, that yield the lim inf by:

lim_{i→∞} (1/ti) Σ_{τ=0}^{ti−1} E{ŷ0(α(τ), ω(τ))} = lim inf_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{ŷ0(α(τ), ω(τ))}    (4.99)

and that yield well defined time averages [(yl), (ej), (ak), (bk)] for r(ti) (such a subsequence can be
constructed by first taking a subsequence {ti} that achieves the lim inf, and then taking a convergent
subsequence {ti} of {ti} that ensures the r(ti) values converge to a limit point). Then by part (a), we
know that [(yl), (ej), (ak), (bk)] ∈ Γ̃, and so its y0 component (being the lim inf value in (4.99)) is
greater than or equal to y0∗, because y0∗ is the smallest possible y0 value of all points in Γ̃.
It follows that no control algorithm that satisfies the required constraints has a time average
expected penalty less than y0∗. We now show that it is possible to achieve y0∗, and so y0∗ = y0^opt. For
simplicity, we consider only the case when Γ is closed. Let [(yl∗), (ej∗), (ak∗), (bk∗)] be the point in Γ̃
that has component y0∗. Because Γ is closed, Γ̃ is a subset of Γ, and so [(yl∗), (ej∗), (ak∗), (bk∗)] ∈ Γ. It
follows there is an ω-only algorithm α∗(t) with expectations exactly equal to [(yl∗), (ej∗), (ak∗), (bk∗)]
on every slot t. Thus, the time average penalty is y0∗, and the constraints (4.32), (4.33) are satisfied
because yl∗ ≤ 0 for all l ∈ {1, . . . , L}, ej∗ = 0 for all j ∈ {1, . . . , J }. Further, our “law-of-large-
numbers” assumption on ω(t) ensures the time averages of âk(α∗(t), ω(t)) and b̂k(α∗(t), ω(t)),
achieved under the ω-only algorithm α∗(t), are equal to ak∗ and bk∗ with probability 1. Because
ak∗ ≤ bk∗ and the second moments of ak(t) and bk(t) are bounded by a finite constant σ² for all t,
the Rate Stability Theorem (Theorem 2.4) ensures that all queues Qk(t) are mean rate stable. 2
We use this result to prove Theorem 4.5.
Proof. (Theorem 4.5) Let [(yl∗), (ej∗), (ak∗), (bk∗)] be the point in Γ̃ that has component y0∗ (where
y0∗ = y0^opt by Theorem 4.18). Note by definition that Γ̃ is in the closure of Γ. If Γ is closed,
then [(yl∗), (ej∗), (ak∗), (bk∗)] ∈ Γ and so there exists an ω-only policy α∗(t) that achieves the av-
erages [(yl∗), (ej∗), (ak∗), (bk∗)] and thus satisfies (4.36)-(4.39) with δ = 0. If Γ is not closed, then
[(yl∗), (ej∗), (ak∗), (bk∗)] is a limit point of Γ and so there is an ω-only policy that gets arbitrarily close
to [(yl∗), (ej∗), (ak∗), (bk∗)], yielding (4.36)-(4.39) for any δ > 0. 2
The above proof shows that if the assumptions of Theorem 4.5 hold and if the set Γ is closed,
then an ω-only policy exists that satisfies the inequalities (4.36)-(4.39) with δ = 0.
CHAPTER 5

Optimizing Functions of Time Averages
Here we use the drift-plus-penalty technique to develop methods for optimizing convex functions of
time averages, and for finding local optimums for non-convex functions of time averages. To begin,
consider a discrete time queueing system Q(t) = (Q1 (t), . . . , QK (t)) with the standard update
equation:
Qk (t + 1) = max[Qk (t) − bk (t), 0] + ak (t) (5.1)
Let x(t) = (x1 (t), . . . , xM (t)), y (t) = (y1 (t), . . . , yL (t)) be attribute vectors. As before, the ar-
rival, service, and attribute variables are determined by general functions ak (t) = âk (α(t), ω(t)),
bk (t) = b̂k (α(t), ω(t)), xm (t) = x̂m (α(t), ω(t)) and yl (t) = ŷl (α(t), ω(t)). Consider now the fol-
lowing problem:
Maximize: φ(x) (5.2)
Subject to: 1) y l ≤ 0 ∀l ∈ {1, . . . , L} (5.3)
2) All queues Qk (t) are mean rate stable (5.4)
3) α(t) ∈ Aω(t) ∀t (5.5)
where φ(x) is a concave, continuous, and entrywise non-decreasing utility function defined over an
appropriate region of RM (such as the non-negative orthant when xm (t) attributes are non-negative,
or all RM otherwise). A more general problem, without the entrywise non-decreasing assumption,
is considered in Section 5.4.
Problems with the structure (5.2)-(5.5) arise, for example, when maximizing network
throughput-utility, where x represents a vector of achieved throughput and φ(x) is a concave
function that measures network fairness. An example utility function that is useful when attributes
xm(t) are non-negative is:

φ(x) = Σ_{m=1}^M log(1 + νm xm)    (5.6)
where νm are positive constants. This is useful because each component function log(1 + νm xm )
has a diminishing returns property as xm is increased, has maximum derivative νm , and is 0 when
xm = 0. Another common example is:

φ(x) = Σ_{m=1}^M log(xm)    (5.7)
This corresponds to the proportional fairness objective (1)(2)(5). The function φ(x) does not need to
be differentiable. An example non-differentiable function that is concave, continuous, and entrywise
non-decreasing is φ(x) = min[x1 , x2 , . . . , xM ].
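The fairness properties of these example utilities can be checked numerically. The sketch below (our own illustration, not from the text) evaluates (5.6), (5.7), and the min-utility on a balanced and an unbalanced throughput vector with the same sum rate; concavity favors the balanced vector in all three cases.

```python
import math

def util_log1p(x, nu):     # equation (5.6): sum of log(1 + nu_m * x_m)
    return sum(math.log(1 + n * v) for n, v in zip(nu, x))

def util_prop_fair(x):     # equation (5.7): proportional fairness
    return sum(math.log(v) for v in x)

def util_min(x):           # non-differentiable min-throughput utility
    return min(x)

balanced = [1.0, 1.0]
skewed = [1.9, 0.1]        # same total throughput, but unfair
nu = [1.0, 1.0]

# Concavity rewards the balanced allocation under all three utilities:
assert util_log1p(balanced, nu) > util_log1p(skewed, nu)
assert util_prop_fair(balanced) > util_prop_fair(skewed)
assert util_min(balanced) > util_min(skewed)
```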
The problem (5.2)-(5.5) is different from all of the problems seen in Chapter 4 because it
involves a function of a time average. It does not conform to the structure required for the drift-plus-
penalty framework of Chapter 4 unless the function φ(x) is linear, because a linear function of a
time average is equal to the time average of the linear function. In the case when φ(x) is concave but
nonlinear, maximizing the time average of φ(x(t)) is typically not the same as maximizing φ(x) (see
Exercise 5.12 for a special case when it is the same). Below we transform the problem by adding a
rectangle constraint and auxiliary variables in such a way that the transformed problem involves only
time averages (not functions of time averages), so that the drift-plus-penalty framework of Chapter
4 can be applied. The key step in analyzing the transformed problem is Jensen’s inequality.
5.0.3 THE RECTANGLE CONSTRAINT R

Define φ^opt as the maximum utility associated with the above problem, augmented with the following
rectangle constraint:

x̄ ∈ R    (5.8)

where R is defined:

R ≜ {(x1, . . . , xM) ∈ R^M | γm,min ≤ xm ≤ γm,max ∀m ∈ {1, . . . , M}}

where γm,min and γm,max are finite constants (we typically choose γm,min = 0 in cases when attributes
xm(t) are non-negative). This rectangle constraint is useful because it limits the x̄ vector to a bounded
region, and it will ensure that the auxiliary variables that we soon define are also bounded. While
this x̄ ∈ R constraint may limit optimality, it is clear that φ^opt increases to the maximum utility of
the problem without this constraint as the rectangle R is expanded. Further, φ^opt is exactly equal to
the maximum utility of the original problem (5.2)-(5.5) whenever the rectangle R is chosen large
enough to contain a time average attribute vector x̄ that is optimal for the original problem.

5.0.4 JENSEN’S INEQUALITY
Assume the concave utility function φ(x) is defined over the rectangle region x ∈ R. Let X =
(X1 , . . . , XM ) be a random vector that takes values in R. Jensen’s inequality for concave functions
states that:
E {X } ∈ R , and E {φ(X )} ≤ φ(E {X }) (5.9)
Indeed, even though we stated Jensen’s inequality in Section 1.8 in terms of convex functions
f (x) with a reversed inequality E {f (X )} ≥ f (E {X }), this immediately implies (5.9) by defining
f (X ) = −φ(X ).
Now let γ (τ ) = (γ1 (τ ), . . . , γM (τ )) be an infinite sequence of random vectors that take values
in the set R for τ ∈ {0, 1, 2, . . .}. It is easy to show that Jensen’s inequality for concave functions
directly implies the following for all t > 0 (see Exercise 5.3):

(1/t) Σ_{τ=0}^{t−1} γ(τ) ∈ R  and  (1/t) Σ_{τ=0}^{t−1} φ(γ(τ)) ≤ φ( (1/t) Σ_{τ=0}^{t−1} γ(τ) )    (5.10)

(1/t) Σ_{τ=0}^{t−1} E{γ(τ)} ∈ R  and  (1/t) Σ_{τ=0}^{t−1} E{φ(γ(τ))} ≤ φ( (1/t) Σ_{τ=0}^{t−1} E{γ(τ)} )    (5.11)

Taking limits of (5.11) as t → ∞ yields:

γ̄ ∈ R  and  φ̄(γ) ≤ φ(γ̄)

where γ̄ and φ̄(γ) are defined as the following limits:

γ̄ ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{γ(τ)} ,  φ̄(γ) ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{φ(γ(τ))}    (5.12)

where we temporarily assume the above limits exist. We have used the fact that the rectangle R is a
closed set to conclude that a limit of vectors in R is also in R.
In summary, whenever the limits γ̄ and φ̄(γ) exist, we can conclude by Jensen’s inequality
that φ(γ̄) ≥ φ̄(γ). That is, the utility function evaluated at the time average expectation γ̄ is greater
than or equal to the time average expectation of φ(γ(t)).
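A quick numerical illustration of this use of Jensen’s inequality (our own, with an arbitrary concave φ and random values drawn from R = [0, 1]):

```python
import math
import random

random.seed(0)

def phi(g):                    # a concave, non-decreasing utility on [0, 1]
    return math.log(1 + g)

T = 1000
gammas = [random.random() for _ in range(T)]   # gamma(tau) in R = [0, 1]

avg_of_phi = sum(phi(g) for g in gammas) / T   # time average of phi(gamma(t))
phi_of_avg = phi(sum(gammas) / T)              # phi of the time average

# Jensen for concave phi: phi evaluated at the average dominates
# the average of phi.
assert phi_of_avg >= avg_of_phi
```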
5.0.5 AUXILIARY VARIABLES

Let γ(t) = (γ1(t), . . . , γM(t)) be a vector of auxiliary variables chosen within the set R every slot.
We consider the following modified problem:

Maximize: φ̄(γ)    (5.13)
Subject to: 1) ȳ_l ≤ 0 ∀l ∈ {1, . . . , L}    (5.14)
            2) γ̄_m ≤ x̄_m ∀m ∈ {1, . . . , M}    (5.15)
            3) All queues Qk(t) are mean rate stable    (5.16)
            4) γ(t) ∈ R ∀t    (5.17)
            5) α(t) ∈ A_ω(t) ∀t    (5.18)

where φ̄(γ) and γ̄ = (γ̄1, . . . , γ̄M) are defined in (5.12). This transformed problem involves only
time averages, rather than functions of time averages, and hence can be solved with the drift-plus-
penalty framework of Chapter 4. Indeed, we can define y0(t) ≜ −φ(γ(t)), and define a new control
action α′(t) = (α(t), γ(t)) subject to α′(t) ∈ [A_ω(t), R].
This transformed problem (5.13)-(5.18) relates to the original problem as follows: Suppose
we have an algorithm that makes decisions α∗(t) and γ∗(t) over time t ∈ {0, 1, 2, . . .} to solve the
transformed problem. That is, assume the solution meets all constraints (5.14)-(5.18) and yields a
maximum value for the objective (5.13). For simplicity, assume all limiting time average expectations
x̄∗, ȳ_l∗, γ̄∗, φ̄(γ∗) exist, where φ̄(γ∗) is the maximum objective value. Then:
• The decisions α∗(t) produce time averages that satisfy all desired constraints of the original
problem (5.2)-(5.5) (so that ȳ_l∗ ≤ 0 for all l and all queues Qk(t) are mean rate stable), and
the resulting time average attribute vector x̄∗ satisfies φ(x̄∗) ≥ φ̄(γ∗). This is because:

φ(x̄∗) ≥ φ(γ̄∗) ≥ φ̄(γ∗)

where the first inequality is due to (5.15) and the entrywise non-decreasing property of φ(x),
and the second inequality is Jensen’s inequality.
• φ̄(γ∗) ≥ φ^opt. That is, the maximum utility of the transformed problem (5.13)-(5.18) is greater
than or equal to φ^opt. This is shown in Exercise 5.2.
The above two observations imply that φ(x̄∗) ≥ φ^opt. Thus, designing a policy to solve the
transformed problem ensures all desired constraints of the original problem (5.2)-(5.5) are satisfied while
producing a utility that is at least as good as φ^opt.
5.1 SOLVING THE TRANSFORMED PROBLEM

Following the drift-plus-penalty method (using a fixed V), we enforce the constraints ȳ_l ≤ 0 and
γ̄_m ≤ x̄_m in the transformed problem (5.13)-(5.18) with virtual queues Zl(t) and Gm(t):

Zl(t + 1) = max[Zl(t) + yl(t), 0] , ∀l ∈ {1, . . . , L}    (5.19)
Gm(t + 1) = max[Gm(t) + γm(t) − xm(t), 0] , ∀m ∈ {1, . . . , M}    (5.20)

Define Θ(t) ≜ [Q(t), Z(t), G(t)], and define the Lyapunov function:

L(Θ(t)) ≜ (1/2)[ Σ_{k=1}^K Qk(t)² + Σ_{l=1}^L Zl(t)² + Σ_{m=1}^M Gm(t)² ]

Assume that ω(t) is i.i.d., and that yl(t), xm(t), ak(t), bk(t) satisfy the boundedness assump-
tions (4.25)-(4.28). It is easy to show the drift-plus-penalty expression satisfies:

Δ(Θ(t)) − V E{φ(γ(t)) | Θ(t)} ≤ D − V E{φ(γ(t)) | Θ(t)} + Σ_{l=1}^L Zl(t)E{yl(t) | Θ(t)}
    + Σ_{k=1}^K Qk(t)E{ak(t) − bk(t) | Θ(t)} + Σ_{m=1}^M Gm(t)E{γm(t) − xm(t) | Θ(t)}    (5.21)

where D is a finite constant related to the worst-case second moments of yl(t), xm(t), ak(t), bk(t).
A C-additive approximation chooses γ(t) ∈ R and α(t) ∈ A_ω(t) such that, given Θ(t), the right-
hand-side of (5.21) is within C of its infimum value. A 0-additive approximation thus performs the
following:
• (Auxiliary Variables) For each slot t, observe G(t) and choose γ(t) to solve:

    Maximize: V φ(γ(t)) − Σ_{m=1}^M Gm(t)γm(t)    (5.22)
    Subject to: γm,min ≤ γm(t) ≤ γm,max ∀m ∈ {1, . . . , M}    (5.23)
• (α(t) Decision) For each slot t, observe Θ(t) and ω(t), and choose α(t) ∈ A_ω(t) to minimize:

    Σ_{l=1}^L Zl(t)ŷl(α(t), ω(t)) + Σ_{k=1}^K Qk(t)[âk(α(t), ω(t)) − b̂k(α(t), ω(t))]
        − Σ_{m=1}^M Gm(t)x̂m(α(t), ω(t))

• (Queue Update) Update the virtual queues Zl(t) and Gm(t) according to (5.19) and (5.20),
and the actual queues Qk(t) by (5.1).
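For the separable utility φ(γ) = Σm log(1 + νm γm), the auxiliary variable step (5.22)-(5.23) decomposes over m and has a closed-form solution: maximizing V log(1 + νm γm) − Gm(t)γm over [0, γm,max] gives γm(t) = min[max(V/Gm(t) − 1/νm, 0), γm,max], with γm(t) = γm,max when Gm(t) = 0. One slot of the resulting algorithm can be sketched as follows (our own Python illustration; the α(t) decision is stubbed out since it is problem-specific, and all numbers are arbitrary):

```python
def aux_variable(V, G, nu, gamma_max):
    """Closed-form solution of (5.22)-(5.23) for phi_m(g) = log(1 + nu*g)."""
    if G <= 0:
        return gamma_max            # objective is increasing when G = 0
    return min(max(V / G - 1.0 / nu, 0.0), gamma_max)

def queue_updates(Z, y, G, gamma, x):
    """Virtual queue updates (5.19)-(5.20) for one slot."""
    Z_next = [max(Zl + yl, 0.0) for Zl, yl in zip(Z, y)]
    G_next = [max(Gm + gm - xm, 0.0) for Gm, gm, xm in zip(G, gamma, x)]
    return Z_next, G_next

# One illustrative slot:
V, nu, gamma_max = 10.0, 1.0, 5.0
G = [4.0, 0.0]
gamma = [aux_variable(V, Gm, nu, gamma_max) for Gm in G]   # [1.5, 5.0]
# ... the alpha(t) decision would go here (problem-specific) ...
x = [2.0, 1.0]                      # attributes produced by alpha(t)
Z, y = [1.0], [-0.5]                # one constraint queue and its penalty
Z, G = queue_updates(Z, y, G, gamma, x)
```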
Define time average expectations x̄(t), γ̄(t), ȳ_l(t) by:

x̄(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{x(τ)} ,  γ̄(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{γ(τ)} ,  ȳ_l(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{yl(τ)}    (5.24)

Define φ^max as an upper bound on φ(γ(t)) for all t, and assume it is finite:

φ^max ≜ φ(γ1,max, γ2,max, . . . , γM,max) < ∞    (5.25)
Theorem 5.1 Suppose the boundedness assumptions (4.25)-(4.28), (5.25) hold, the function φ(x) is
continuous, concave, and entrywise non-decreasing, the problem (5.2)-(5.5), (5.8) (including the constraint
x̄ ∈ R) is feasible, and E{L(Θ(0))} < ∞. If ω(t) is i.i.d. over slots and any C-additive approximation
is used every slot, then all actual and virtual queues are mean rate stable and:

lim inf_{t→∞} φ(x̄(t)) ≥ φ^opt − (D + C)/V    (5.26)
lim sup_{t→∞} ȳ_l(t) ≤ 0 , ∀l ∈ {1, . . . , L}    (5.27)

where φ^opt is the maximum utility of the problem (5.2)-(5.5), (5.8) (including the constraint x̄ ∈ R),
and x̄(t), ȳ_l(t) are defined in (5.24).

The following extended result provides average queue bounds and utility bounds for all slots t.
Theorem 5.2 Suppose the assumptions of Theorem 5.1 hold.
(a) If there is an ε > 0, an ω-only policy α∗(t), and a finite constant φ such that the following
Slater-type conditions hold:

E{ŷl(α∗(t), ω(t))} ≤ 0 ∀l ∈ {1, . . . , L}    (5.28)
E{âk(α∗(t), ω(t)) − b̂k(α∗(t), ω(t))} ≤ −ε ∀k ∈ {1, . . . , K}    (5.29)
γm,min ≤ E{x̂m(α∗(t), ω(t))} ≤ γm,max ∀m ∈ {1, . . . , M}    (5.30)
φ(E{x̂(α∗(t), ω(t))}) = φ    (5.31)

then all queues Qk(t) are strongly stable and for all t > 0, we have:

(1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^K E{Qk(τ)} ≤ [D + C + V(φ̄(γ∗) − φ)]/ε + E{L(Θ(0))}/(εt)

where φ̄(γ∗) is the maximum objective function value for the transformed problem (5.13)-(5.18).
(b) If all virtual and actual queues are initially empty (so that Θ(0) = 0) and if there are finite
constants νm ≥ 0 such that for all γ(t) and all x(t), we have:

|φ(γ(t)) − φ(x(t))| ≤ Σ_{m=1}^M νm |γm(t) − xm(t)|    (5.32)

then for all t > 0, we have:

φ(x̄(t)) ≥ φ^opt − (D + C)/V − Σ_{m=1}^M νm E{Gm(t)}/t    (5.33)

where E{Gm(t)}/t is O(1/√t) for all m ∈ {1, . . . , M}.
The assumption that all queues are initially empty, made in part (b) of the above theorem,
is made only for convenience. The right-hand-side of (5.33) would otherwise be modified by
subtracting the additional term E{L(Θ(0))}/(V t). We note that the νm constraint (5.32) needed
in part (b) of the above theorem is satisfied for the example utility function in (5.6), but not for
the proportionally fair utility function in (5.7). Further, the algorithm developed in this section
(or C-additive approximations of the algorithm) often results in deterministically bounded queues,
regardless of whether or not the Slater assumptions (5.28)-(5.31) hold (see flow control examples
in Sections 5.2-5.3 and Exercises 5.5-5.7). For example, it can be shown that if (5.32) holds, if γ(t)
is chosen by (5.22)-(5.23), and if xm(t) ≥ γm,min for all t, then Gm(t) ≤ V νm + γm,max for all t
(provided this holds at t = 0). In this case, E{Gm(t)}/t is O(1/t), better than the O(1/√t) bound
given in the above theorem. As before, the same algorithm can be shown to perform efficiently when
the ω(t) process is non-i.i.d. (38)(39)(136)(42). This is because the auxiliary variables transform the
problem to a structure that is the same as that covered by the ergodic theory and universal scheduling
theory of Section 4.9.
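The deterministic bound Gm(t) ≤ V νm + γm,max can be verified by simulation for a single attribute with φm(γ) = νm log(1 + γ) (a sketch under our own arbitrary arrival process; the closed-form auxiliary decision follows from (5.22)-(5.23), and the update is (5.20)):

```python
import math
import random

random.seed(1)

V, nu, gamma_max = 20.0, 2.0, 4.0

def aux(G):
    """Maximize V*nu*log(1 + gamma) - G*gamma over [0, gamma_max] (closed form)."""
    if G <= 0:
        return gamma_max
    return min(max(V * nu / G - 1.0, 0.0), gamma_max)

G = 0.0
peak = 0.0
for t in range(10000):
    gamma = aux(G)                  # auxiliary decision; gamma = 0 once G >= V*nu
    x = random.uniform(0.0, 1.0)    # attribute from alpha(t), satisfies x >= 0
    G = max(G + gamma - x, 0.0)     # virtual queue update (5.20)
    peak = max(peak, G)

assert peak <= V * nu + gamma_max   # deterministic bound: 44.0
```

The induction behind the bound: whenever G exceeds V νm the auxiliary decision is 0, so G can never climb more than γm,max above V νm.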
Proof. (Theorem 5.1) Because the C-additive approximation comes within C of minimizing the
right-hand-side of (5.21), we have:

Δ(Θ(t)) − V E{φ(γ(t)) | Θ(t)} ≤ D + C − V φ(γ∗) + Σ_{l=1}^L Zl(t)E{yl∗(t) | Θ(t)}
    + Σ_{k=1}^K Qk(t)E{ak∗(t) − bk∗(t) | Θ(t)} + Σ_{m=1}^M Gm(t)E{γm∗ − xm∗(t) | Θ(t)}    (5.34)
where γ∗ = (γ1∗, . . . , γM∗) is any vector in R, and yl∗(t), ak∗(t), bk∗(t), xm∗(t) are from any alternative
(possibly randomized) policy α∗(t) ∈ A_ω(t). Now note that feasibility of the problem (5.2)-(5.5),
(5.8) implies feasibility of the transformed problem (5.13)-(5.18).1 This together with Theorem 4.5
implies that for any δ > 0, there is an ω-only policy α∗(t) ∈ A_ω(t) and a vector γ∗ ∈ R such that:

−φ(γ∗) ≤ −φ^opt + δ
E{ŷl(α∗(t), ω(t))} ≤ δ ∀l ∈ {1, . . . , L}
E{âk(α∗(t), ω(t)) − b̂k(α∗(t), ω(t))} ≤ δ ∀k ∈ {1, . . . , K}
E{γm∗ − x̂m(α∗(t), ω(t))} ≤ δ ∀m ∈ {1, . . . , M}

Assuming that δ = 0 for convenience and plugging the above into (5.34) gives:2

Δ(Θ(t)) − V E{φ(γ(t)) | Θ(t)} ≤ D + C − V φ^opt    (5.35)

This is in the exact form for application of the Lyapunov Optimization Theorem (Theorem 4.2),
and hence by that theorem (or, equivalently, by using iterated expectations and telescoping sums in
the above inequality), for all t > 0, we have:

(1/t) Σ_{τ=0}^{t−1} E{φ(γ(τ))} ≥ φ^opt − (D + C)/V − E{L(Θ(0))}/(V t)

By Jensen’s inequality for the concave function φ(γ), we have for all t > 0:

φ(γ̄(t)) ≥ φ^opt − (D + C)/V − E{L(Θ(0))}/(V t)    (5.36)

Taking a lim inf of both sides yields:

lim inf_{t→∞} φ(γ̄(t)) ≥ φ^opt − (D + C)/V    (5.37)

On the other hand, rearranging (5.35) yields:

Δ(Θ(t)) ≤ D + C + V(φ^max − φ^opt)

Thus, by the Lyapunov Drift Theorem (Theorem 4.1), we know that all queues Qk(t), Zl(t), Gm(t)
are mean rate stable (in fact, we know that E{Qk(t)}/t, E{Gm(t)}/t, and E{Zl(t)}/t are O(1/√t)).
Mean rate stability of Zl(t) and Gm(t), together with Theorem 2.5, implies that (5.27) holds, and
that for all m ∈ {1, . . . , M}:

lim sup_{t→∞} [γ̄_m(t) − x̄_m(t)] ≤ 0

Using this with the continuity and entrywise non-decreasing properties of φ(x), it can be shown
that:

lim inf_{t→∞} φ(γ̄(t)) ≤ lim inf_{t→∞} φ(x̄(t))

Using this in (5.37) proves (5.26). 2
1To see this, the transformed problem can just use the same α(t) decisions, and it can choose γ (t) = x for all t.
2The same can be derived using δ > 0 and then taking a limit as δ → 0.
Proof. (Theorem 5.2) We first prove part (b). We have:

φ(γ̄(t)) = φ(x̄(t) + [γ̄(t) − x̄(t)])
        ≤ φ(x̄(t) + max[γ̄(t) − x̄(t), 0])    (5.38)
        ≤ φ(x̄(t)) + Σ_{m=1}^M νm max[γ̄_m(t) − x̄_m(t), 0]    (5.39)

where (5.38) follows by the entrywise non-decreasing property of φ(x) (where the max[·] rep-
resents an entrywise max), and (5.39) follows by (5.32). Substituting this into (5.36) and using
E{L(Θ(0))} = 0 yields:

φ(x̄(t)) ≥ φ^opt − (D + C)/V − Σ_{m=1}^M νm max[γ̄_m(t) − x̄_m(t), 0]    (5.40)
By definition of Gm(t) in (5.20) and the sample path queue property (2.5), together with the fact
that Gm(0) = 0, we have for all m ∈ {1, . . . , M} and any t > 0:

Gm(t)/t ≥ (1/t) Σ_{τ=0}^{t−1} γm(τ) − (1/t) Σ_{τ=0}^{t−1} xm(τ)

Taking expectations above yields for all t > 0:

E{Gm(t)}/t ≥ γ̄_m(t) − x̄_m(t)  ⇒  E{Gm(t)}/t ≥ max[γ̄_m(t) − x̄_m(t), 0]

Using this in (5.40) proves part (b) of the theorem.
To prove part (a), we plug the ω-only policy α∗(t) from (5.28)-(5.31) (using γ(t) =
E{x̂(α∗(t), ω(t))}) into (5.34). This directly leads to a version of part (a) of the theorem with
φ̄(γ∗) replaced with φ^max. A more detailed analysis shows this can be replaced with φ̄(γ∗) because
all constraints of the transformed problem are satisfied, and so the lim sup time average objective can
be no bigger than φ̄(γ∗) (recall (4.96) of Theorem 4.18). 2
5.2 A FLOW-BASED NETWORK MODEL
Here we apply the stochastic utility maximization framework to a simple flow based network model,
where we neglect the actual network queueing and develop a flow control policy that simply ensures
the flow rate over each link is no more than the link capacity (similar to the flow based models for
internet and wireless systems in (2)(23)(29)(149)(150)). Section 5.3 treats a more extensive network
model that explicitly accounts for all queues.
Suppose there are N nodes and L links, where each link l ∈ {1, . . . , L} has a possibly time-
varying link capacity bl (t), for slotted time t ∈ {0, 1, 2, . . .}. Suppose there are M sessions, and let
Am (t) represent the new arrivals to session m on slot t. Each session m ∈ {1, . . . , M} has a particular
source node and a particular destination node. The random network event ω(t) is thus:

ω(t) ≜ [(b1(t), . . . , bL(t)); (A1(t), . . . , AM(t))]    (5.41)
The control action taken every slot is to first choose xm(t), the amount of type m traffic admitted
into the network on slot t, according to:

0 ≤ xm(t) ≤ Am(t) ∀m ∈ {1, . . . , M}, ∀t    (5.42)
The constraint (5.42) is just one example of a flow control constraint. We can easily modify this
to the constraint xm (t) ∈ {0, Am (t)}, which either admits all newly arriving data, or drops all of
it. Alternatively, the flow controller could place all non-admitted data into a transport layer storage
reservoir (rather than dropping it), as in (18)(22)(19)(17) (see also Section 5.6). One can model a
network where all sources always have data to send by Am (t) = γm,max for all t, for some finite value
γm,max used to limit the amount of data admitted to the network on any slot.
Next, we must specify a path for the newly arriving data from a collection of paths Pm associated
with path options of session m on slot t (possibly being the set of all possible paths in the network
from the source of session m to its destination). Here, a path is defined in the usual sense, being a
sequence of links starting at the source, ending at the destination, and being such that the end node
of each link is the start node of the next link. Let 1l,m (t) be an indicator variable that is 1 if the
data xm (t) is selected to use a path that contains link l, and is 0 else. The (1l,m (t)) values completely
specify the chosen paths for slot t, and hence the decision variable for slot t is given by:

α(t) ≜ [(x1(t), . . . , xM(t)); (1l,m(t)) | l ∈ {1, . . . , L}, m ∈ {1, . . . , M}]
Let x̄ = (x̄1, . . . , x̄M) be a vector of the infinite horizon time average admitted flow rates.
Let φ(x̄) = Σ_{m=1}^M φm(x̄m) be a separable utility function, where each φm(x) is a continuous, concave,
non-decreasing function in x. Our goal is to maximize the throughput-utility φ(x̄) subject to the
constraint that the time average flow over each link l is less than or equal to the time average capacity
of that link. The infinite horizon utility optimization problem of interest is thus:

Maximize: Σ_{m=1}^M φm(x̄m)    (5.43)
Subject to: Σ_{m=1}^M \overline{1_{l,m} x_m} ≤ b̄l ∀l ∈ {1, . . . , L}    (5.44)
0 ≤ xm(t) ≤ Am(t) , (1l,m(t)) ∈ Pm ∀m ∈ {1, . . . , M}, ∀t    (5.45)
where the time averages are defined:

x̄m ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{xm(τ)}
\overline{1_{l,m} x_m} ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{1l,m(τ)xm(τ)}
b̄l ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{bl(τ)}
We emphasize that while the actual network can queue data at each link l, we are not explicitly
accounting for such queueing dynamics. Rather, we are only ensuring the time average flow rate on
each link l satisfies (5.44).
Define φ^opt as the maximum utility associated with the above problem and subject to the
additional constraint that:

0 ≤ x̄m ≤ γm,max ∀m ∈ {1, . . . , M}    (5.46)

for some finite values γm,max. This fits the framework of the utility maximization problem (5.2)-
(5.5) with yl(t) ≜ Σ_{m=1}^M 1l,m(t)xm(t) − bl(t), K = 0, and with R being all γ vectors that satisfy
0 ≤ γm ≤ γm,max for all m ∈ {1, . . . , M} (we choose γm,min = 0 because attributes xm(t) are non-
negative). As there are no actual queues Qk(t) in this model, we use only virtual queues Zl(t) and
Gm(t), defined by update equations:

Zl(t + 1) = max[ Zl(t) + Σ_{m=1}^M 1l,m(t)xm(t) − bl(t), 0 ]    (5.47)
Gm(t + 1) = max[Gm(t) + γm(t) − xm(t), 0]    (5.48)
where γm (t) are auxiliary variables for m ∈ {1, . . . , M}. The algorithm given in Section 5.0.5 thus
reduces to:
• (Auxiliary Variables) Every slot t, each session m ∈ {1, . . . , M} observes Gm(t) and chooses
γm(t) as the solution to:

    Maximize: V φm(γm(t)) − Gm(t)γm(t)    (5.49)
    Subject to: 0 ≤ γm(t) ≤ γm,max    (5.50)
• (Routing and Flow Control) For each slot t and each session m ∈ {1, . . . , M}, observe the
new arrivals Am(t), the virtual queue backlogs Gm(t), and the link queues Zl(t), and choose
xm(t) and a path to maximize:

    Maximize: xm(t)Gm(t) − xm(t) Σ_{l=1}^L 1l,m(t)Zl(t)
    Subject to: 0 ≤ xm(t) ≤ Am(t)
                The path specified by (1l,m(t)) is in Pm
This reduces to the following: First find a shortest path from the source of session m to the
destination of session m, using link weights Zl (t) as link costs. If the total weight of the
shortest path is less than or equal to Gm (t), choose xm (t) = Am (t) and route this data over
this single shortest path. Else, there is too much congestion in the network, and so we choose
xm (t) = 0 (thereby dropping all data Am (t)).
• (Virtual Queue Updates) Update the virtual queues according to (5.47) and (5.48).
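The routing and flow control step can be sketched with a standard Dijkstra search over the link backlogs Zl(t) used as costs (a minimal illustration in Python; the graph representation and helper names are our own assumptions, not the book's):

```python
import heapq

def shortest_path_weight(graph, Z, src, dst):
    """Dijkstra over links weighted by virtual queue backlogs Z[link_id].

    graph: dict node -> list of (neighbor, link_id).
    Returns (total weight, list of link_ids) of a min-weight path, or None.
    """
    heap = [(0.0, src, [])]
    seen = set()
    while heap:
        d, node, path = heapq.heappop(heap)
        if node == dst:
            return d, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, link in graph.get(node, []):
            if nbr not in seen:
                heapq.heappush(heap, (d + Z[link], nbr, path + [link]))
    return None

def admit(graph, Z, G_m, A_m, src, dst):
    """Admit x_m(t) = A_m(t) on the shortest path iff its weight <= G_m(t)."""
    result = shortest_path_weight(graph, Z, src, dst)
    if result is None:
        return 0.0, []
    weight, path = result
    return (A_m, path) if weight <= G_m else (0.0, [])

# Toy network: links 0: a->b, 1: b->c, 2: a->c (all identifiers arbitrary).
graph = {'a': [('b', 0), ('c', 2)], 'b': [('c', 1)]}
Z = {0: 1.0, 1: 1.0, 2: 5.0}
x, path = admit(graph, Z, G_m=3.0, A_m=2.0, src='a', dst='c')
# path a->b->c has weight 2.0 <= G_m, so all arriving data is admitted
```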
The shortest path routing in this algorithm is similar to that given in (149), which treats a
flow-based network stability problem under the assumption that arriving traffic is admissible (so that
flow control is not used). This problem with flow control was introduced in (39) using the universal
scheduling framework of Section 4.9.2, where there are no probabilistic assumptions on the arrivals
or time varying link capacities.

5.2.1 PERFORMANCE OF THE FLOW-BASED ALGORITHM
To apply Theorems 5.1 and 5.2, assume ω(t) = [(b1 (t), . . . , bL (t)); (A1 (t), . . . , AM (t))] is i.i.d.
over slots, and that the bl (t) and Am (t) processes have bounded second moments. Note that the
problem (5.43)-(5.46) is trivially feasible because it is always possible to satisfy the constraints by
admitting no new arrivals on any slot. Suppose we use any C-additive approximation (where a 0-
additive approximation is an exact implementation of the above algorithm). It follows from Theorem
5.1 that all virtual queues are mean rate stable, and so the time average constraints (5.44) are satisfied,
and the achieved utility satisfies:
lim inf_{t→∞} φ(x(t)) ≥ φ opt − (D + C)/V   (5.51)
where D is a finite constant related to the maximum second moments of Am (t) and bl (t). Thus,
utility can be pushed arbitrarily close to optimal by increasing V .
We now show that, under some mild additional assumptions, the flow control structure of
this algorithm yields tight deterministic bounds of size O(V ) on the virtual queues. Suppose that
Am (t) ≤ Am,max for all t, for some finite constant Am,max . Further, to satisfy the constraints (5.32)
needed for Theorem 5.2, assume the utility functions φm (x) have finite right derivatives at x = 0,
given by constants νm ≥ 0, so that for any non-negative x and y we have:
|φm (x) − φm (y)| ≤ νm |x − y| (5.52)
It can be shown that if Gm (t) > V νm , then the solution to (5.49)-(5.50) is γm (t) = 0 (see
Exercise 5.5). Because γm (t) acts as the arrival to virtual queue Gm (t) defined in (5.48), it follows
that Gm (t) cannot increase on the next slot. Therefore, for all m ∈ {1, . . . , M}:
0 ≤ Gm (t) ≤ V νm + γm,max ∀t ∈ {0, 1, 2, . . .} (5.53)
provided that this is true for Gm (0) (which is indeed the case if Gm (0) = 0). This allows one to
deterministically bound the queue sizes Zl (t) for all l ∈ {1, . . . , L}:
0 ≤ Zl (t) ≤ V ν max + γ max + MAmax ∀t (5.54)
provided this holds at time 0, and where ν max, γ max, Amax are defined as the maximum of all νm,
γm,max, Am,max values:
ν max = max_{m∈{1,...,M}} νm ,   γ max = max_{m∈{1,...,M}} γm,max ,   Amax = max_{m∈{1,...,M}} Am,max
To prove this fact, note that if a link l satisfies Zl (t) ≤ V ν max + γ max , then on the next slot, we
have Zl (t + 1) ≤ V ν max + γ max + MAmax because the queue can increase by at most MAmax on
any slot (see update equation (5.47)). Else, if Zl (t) > V ν max + γ max , then any path that uses this
link incurs a cost larger than V ν max + γ max , and thus would incur a cost larger than Gm (t) for any
session m. Thus, by the routing and flow control algorithm, no session will choose a path that uses
this link on the current slot, and so Zl (t) cannot increase on the next slot.
Using the sample path inequality (2.3) with the deterministic bound on Zl (t) in (5.54), it
follows that over any interval of T slots (for any positive integer T and any initial slot t0 ), the data
injected for use over link l is no more than V ν max + γ max + MAmax beyond the total capacity
offered by the link over that interval:
Σ_{τ=t0}^{t0+T−1} Σ_{m=1}^{M} 1l,m(τ)xm(τ) ≤ Σ_{τ=t0}^{t0+T−1} bl(τ) + V ν max + γ max + MAmax   (5.55)

5.2.2 DELAYED FEEDBACK
We note that it may be difficult to use the exact queue values Zl (t) when solving for the shortest
path, as these values change every slot. Hence, a practical implementation may use out-of-date values
Zl (t − τl,t ) for some time delay τl,t that may depend on l and t. Further, the virtual queue updates
for Zl (t) in (5.47) are most easily done at each link l, in which case, the actual admitted data xm (t)
for that link may not be known until some time delay, arriving as a process xm (t − τl,m,t ). However,
as the virtual queue size cannot change by more than a fixed amount every slot, the queue value
used differs from the ideal queue value by no more than an additive constant that is proportional to
the maximum time delay. In this case, provided that the maximum time delay is bounded, we are
simply using a C-additive approximation and the utility and queue bounds are adjusted accordingly
(see Exercise 4.10 and also Section 6.1.1). A more extensive treatment of delayed feedback for the
case of networks without dynamic arrivals or channels is found in (150), which uses a differential
equation method.

5.2.3 LIMITATIONS OF THIS MODEL
While (5.55) is a very strong deterministic bound that says no link is given more data than it can
handle, it does not directly imply anything about the actual network queues (other than the links
are not overloaded). The (unproven) understanding is that, because the links are not overloaded, the
actual network queues will be stable and all data can arrive to its destination with (hopefully small)
delay.
One might approximate average congestion or delay on a link as a convex function of the time
average flow rate over the link, as in (151)(129)(150).3 However, we emphasize that this is only an
approximation and does not represent the actual network delay, or even a bound on delay. Indeed,
while it is known that average queue congestion and delay is convex if a general stream of traffic
is probabilistically split (152), this is not necessarily true (or relevant) for dynamically controlled
networks, particularly when the control depends on the queue backlogs and delays themselves. Most
problems involving optimization of actual network delay are difficult and unsolved. Such prob-
lems involve not only optimization of rate based utility functions, but engineering of the Lagrange
multipliers (which are related to queue backlogs) associated with those utility functions.
Finally, observe that the update equation for Zl (t) in (5.47) can be interpreted as a queueing
model where all admitted data on slot t is placed immediately on all links l of its path. Similar models
are used in (23)(29)(150)(31). However, this is clearly an approximation because data in an actual
network will traverse its path one link at a time. It is assumed that the actual network stamps all
data with its intended path, so that there is no dynamic re-routing mid-path. Section 5.3 treats an
actual multi-hop queueing network and allows such dynamic routing.

5.3 MULTI-HOP QUEUEING NETWORKS

Here we consider a general multi-hop network, treating the actual queueing rather than using the
flow-based model of the previous section. Suppose the network has N nodes and operates in slotted
time. There are M sessions, and we let A(t) = (A1 (t), . . . , AM (t)) represent the vector of data that
exogenously arrives to the transport layer for each session on slot t (measured either in integer units
of packets or real units of bits).
Each session m ∈ {1, . . . , M} has a particular source node and destination node. Data delivery
takes place by transmissions over possibly multi-hop paths. We assume that a transport layer flow
controller observes Am (t) every slot and decides how much of this data to add to the network layer
at its source node and how much to drop (flow control decisions are made to limit queue buffers and
ensure the network is stable). Let (xm(t))_{m=1}^{M} be the collection of flow control decision variables on
slot t. These decisions are made subject to the constraints 0 ≤ xm(t) ≤ Am(t) (see also discussion
after (5.42) on modifications of this constraint).
All data that is intended for destination node c ∈ {1, . . . , N} is called commodity c data,
regardless of its particular session. For each n ∈ {1, . . . , N} and c ∈ {1, . . . , N}, let Mn^(c) denote
the set of all sessions m ∈ {1, . . . , M} that have source node n and commodity c. All data is queued
according to its commodity, and we define Qn^(c)(t) as the amount of commodity c data in node n on
slot t. We assume that Qn^(n)(t) = 0 for all t, as data that reaches its destination is removed from the
network. Let Q(t) denote the matrix of current queue backlogs for all nodes and commodities.

3 Convex constraints can be incorporated using the generalized structure of Section 5.4.
The queue backlogs change from slot to slot as follows:
Qn^(c)(t + 1) = Qn^(c)(t) − Σ_{j=1}^{N} μ̃nj^(c)(t) + Σ_{i=1}^{N} μ̃in^(c)(t) + Σ_{m∈Mn^(c)} xm(t)

where μ̃ij^(c)(t) denotes the actual amount of commodity c data transmitted from node i to node j
(i.e., over link (i, j)) on slot t. It is useful to define transmission decision variables μij^(c)(t) as the bit
rate offered by link (i, j) to commodity c data, where this full amount is used if there is that much
commodity c data available at node i, so that:
μ̃ij^(c)(t) ≤ μij^(c)(t) ∀i, j, c ∈ {1, . . . , N}, ∀t

For simplicity, we assume that if there is not enough data to send at the offered rate, then null data
is sent, so that:4
Qn^(c)(t + 1) = max[ Qn^(c)(t) − Σ_{j=1}^{N} μnj^(c)(t), 0 ] + Σ_{i=1}^{N} μin^(c)(t) + Σ_{m∈Mn^(c)} xm(t)   (5.56)
This satisfies (5.1) if we relate index k (for Qk(t) in (5.1)) to index (n, c) (for Qn^(c)(t) in (5.56)), and
if we define:
bn^(c)(t) = Σ_{j=1}^{N} μnj^(c)(t) ,   an^(c)(t) = Σ_{i=1}^{N} μin^(c)(t) + Σ_{m∈Mn^(c)} xm(t)
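The update (5.56) can be written out directly. The sketch below is a hypothetical implementation with the backlog matrix, transmission decisions, and sessions stored as plain Python lists (all names are illustrative):

```python
def queue_update(Q, mu, x, sessions):
    """One application of update (5.56).
    Q[n][c]: backlog of commodity c at node n.
    mu[i][j][c]: offered rate for commodity c on link (i, j).
    x[m]: admitted data of session m; sessions[m] = (source node, commodity)."""
    N = len(Q)
    newQ = [[0.0] * N for _ in range(N)]
    for n in range(N):
        for c in range(N):
            if n == c:
                continue  # Q_n^(n)(t) = 0: delivered data leaves the network
            out = sum(mu[n][j][c] for j in range(N))   # total offered outgoing rate
            inc = sum(mu[i][n][c] for i in range(N))   # total incoming transmissions
            exo = sum(x[m] for m, (src, com) in enumerate(sessions)
                      if src == n and com == c)        # exogenous admitted data
            newQ[n][c] = max(Q[n][c] - out, 0.0) + inc + exo
    return newQ
```

Note the "null data" convention of (5.56): the incoming term adds the full offered rates μin^(c)(t) even if upstream nodes lacked that much data, which is what makes the update an equality rather than an inequality.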

5.3.1 TRANSMISSION VARIABLES
Let S(t) represent the topology state of the network on slot t, observed on each slot t as in (22). The
value of S(t) is an abstract and possibly multi-dimensional quantity that describes the current link
conditions between all nodes on the current slot. The collection of all transmission rates that
can be offered over each link (i, j) of the network is given by a general transmission rate function
b(I(t), S(t)):5
b(I(t), S(t)) = (bij(I(t), S(t)))_{i,j∈{1,...,N}, i≠j}
where I (t) is a general network-wide resource allocation decision (such as link scheduling, bandwidth
selection, modulation, etc.) and takes values in some abstract set IS(t) that possibly depends on the
current S(t).
4 All results hold exactly as stated if this null data is not sent, so that “=” in (5.56) is modified to “≤” (22).
5 It is worth noting now that for networks with orthogonal channels, our “max-weight” transmission algorithm (to be defined in
the next subsection) decouples to allow nodes to make transmission decisions based only on those components of the current
topology state S(t) that relate to their own local channels. Of course, for wireless interference networks, all channels are coupled,
although distributed approximations of max-weight transmission exist in this case (see Chapter 6).
Every slot the network controller observes the current S(t) and makes a resource allocation
decision I(t) ∈ IS(t). The controller then chooses the μij^(c)(t) variables subject to the following
constraints:
μij^(c)(t) ≥ 0 ∀i, j, c ∈ {1, . . . , N}   (5.57)
μii^(c)(t) = μij^(i)(t) = 0 ∀i, j, c ∈ {1, . . . , N}   (5.58)
Σ_{c=1}^{N} μij^(c)(t) ≤ bij(I(t), S(t)) ∀i, j ∈ {1, . . . , N}   (5.59)

Constraints (5.58) are due to the common-sense observation that it makes no sense to transmit
data from a node to itself, or to keep transmitting data that has already arrived to its destination.
One can easily incorporate additional constraints that restrict the set of allowable links that certain
commodities are allowed to use, as in (22).

5.3.2 THE UTILITY OPTIMIZATION PROBLEM

This problem fits our general framework by defining the random event ω(t)= [A(t); S(t)]. The
control action α(t) is defined by:
 (c)
α(t)= [I (t); (μij (t))|i,j,c∈{1,...,N } ; (xm (t))|M
m=1 ]

representing the resource allocation, transmission, and flow control decisions. The action space Aω(t)
is defined by the set of all I(t) ∈ IS(t), all (μij^(c)(t)) that satisfy (5.57)-(5.59), and all (xm(t)) that
satisfy 0 ≤ xm(t) ≤ Am(t) for all m ∈ {1, . . . , M}.
Define x as the time average expectation of the vector x(t). Our objective is to solve the
following problem:
Maximize: φ(x)   (5.60)
Subject to: α(t) ∈ Aω(t) ∀t   (5.61)
All queues Qn^(c)(t) are mean rate stable   (5.62)
where φ(x) = Σ_{m=1}^{M} φm(xm) is a continuous, concave, and entrywise non-decreasing utility function.

5.3.3 MULTI-HOP NETWORK UTILITY MAXIMIZATION
The rectangle R is defined by all (γ1 , . . . , γM ) vectors such that 0 ≤ γm ≤ γm,max . Define φ opt as
the maximum utility for the problem (5.60)-(5.62) augmented with the additional constraint x ∈ R.
Because we have not specified any additional constraints, there are no Zl (t) queues. However, we
have auxiliary variables γm (t) and virtual queues Gm (t) for m ∈ {1, . . . , M}, with update:
Gm (t + 1) = max[Gm (t) + γm (t) − xm (t), 0] (5.63)
The algorithm of Section 5.0.5 is thus:
• (Auxiliary Variables) For each slot t, each session m ∈ {1, . . . , M} observes the current virtual
queue Gm(t) and chooses auxiliary variable γm(t) to solve:
Maximize: V φm(γm(t)) − Gm(t)γm(t)   (5.64)
Subject to: 0 ≤ γm(t) ≤ γm,max

• (Flow Control) For each slot t, each session m observes Am(t) and the queue values Gm(t),
Qnm^(cm)(t) (where nm denotes the source node of session m, and cm represents its destination).
Note that these queues are all local to the source node of the session, and hence they can be
observed easily. It then chooses xm(t) to solve:
Maximize: Gm(t)xm(t) − Qnm^(cm)(t)xm(t)   (5.65)
Subject to: 0 ≤ xm(t) ≤ Am(t)
This reduces to the "bang-bang" flow control decision of choosing xm(t) = Am(t) if Qnm^(cm)(t) ≤ Gm(t), and xm(t) = 0 otherwise.

• (Resource Allocation and Transmission) For each slot t, the network controller observes queue
backlogs {Qn^(c)(t)} and the topology state S(t) and chooses I(t) ∈ IS(t) and {μij^(c)(t)} to solve:
Maximize: Σ_{n,c} Qn^(c)(t) [ Σ_{j=1}^{N} μnj^(c)(t) − Σ_{i=1}^{N} μin^(c)(t) ]   (5.66)
Subject to: I(t) ∈ IS(t) and (5.57)-(5.59)

• (Queue Updates) Update the virtual queues Gm(t) according to (5.63) and the actual queues
Qn^(c)(t) according to (5.56).
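One slot of the auxiliary-variable and flow control steps might look as follows; the grid search stands in for an exact solver of (5.64) (useful when φm has no closed-form maximizer), and all names are illustrative:

```python
def flow_control(V, G, Q_src, A, gamma_max, phi, grid=101):
    """Per-session auxiliary-variable and 'bang-bang' flow control decisions.
    G[m]: virtual queue G_m(t); Q_src[m]: source backlog Q_{n_m}^{(c_m)}(t);
    A[m]: new arrivals; phi: common utility function (illustrative)."""
    M = len(G)
    gamma, x = [0.0] * M, [0.0] * M
    for m in range(M):
        # Auxiliary variable (5.64): grid-search maximize V*phi(g) - G[m]*g
        # over g in [0, gamma_max].
        best, best_g = -float("inf"), 0.0
        for i in range(grid):
            g = gamma_max * i / (grid - 1)
            val = V * phi(g) - G[m] * g
            if val > best:
                best, best_g = val, g
        gamma[m] = best_g
        # Flow control (5.65): admit everything iff Q_src[m] <= G[m].
        x[m] = A[m] if Q_src[m] <= G[m] else 0.0
        # Virtual queue update (5.63).
        G[m] = max(G[m] + gamma[m] - x[m], 0.0)
    return gamma, x
```

Both decisions use only quantities local to the session's source node, matching the observation in the flow control step above.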
The resource allocation and transmission decisions that solve (5.66) are described in Subsection
5.3.4 below. Before covering this, we state the performance of the algorithm under a general C-
additive approximation. Assuming that second moments of arrivals and service variables are finite,
and that ω(t) is i.i.d. over slots, by Theorem 5.1, we have that all virtual and actual queues are mean
rate stable, and:

lim inf_{t→∞} φ(x(t)) ≥ φ opt − (D + C)/V   (5.67)
where D is a constant related to the maximum second moments of arrivals and transmission rates.
The queues Qn^(c)(t) can be shown to be strongly stable with average size O(V ) under an additional
Slater-type condition. If the φm (x) functions are bounded with bounded right derivatives, it can be
shown that the queues Gm (t) are deterministically bounded. A slight modification of the algorithm
that results in a C-additive approximation can deterministically bound all actual queues by a constant
of size O(V ) (38)(42)(153), even without the Slater condition. The theory of Section 4.9 can be
used to show that the same algorithm operates efficiently for non-i.i.d. traffic and channel processes,
including processes that arise from arbitrary node mobility (38).
5.3.4 BACKPRESSURE-BASED ROUTING AND RESOURCE ALLOCATION
By switching the sums in (5.66), it is easy to show that the resource allocation and transmission
maximization reduces to the following generalized “max-weight” and “backpressure” algorithms
(see (7)(22)): Every slot t, choose I(t) ∈ IS(t) to maximize:
Σ_{i=1}^{N} Σ_{j=1}^{N} bij(I(t), S(t)) Wij(t)   (5.68)
where the weights Wij(t) are defined by:
Wij(t) = max_{c∈{1,...,N}} max[ Wij^(c)(t), 0 ]   (5.69)
where the Wij^(c)(t) are differential backlogs:
Wij^(c)(t) = Qi^(c)(t) − Qj^(c)(t)
The transmission decision variables are then given by:
μij^(c)(t) = { bij(I(t), S(t)) if c = cij∗(t) and Wij^(c)(t) ≥ 0 ; 0 otherwise }   (5.70)
where cij∗(t) is defined as the commodity c ∈ {1, . . . , N} that maximizes the differential backlog
Wij^(c)(t) (breaking ties arbitrarily).
This backpressure approach achieves throughput optimality, but, because it explores all pos-
sible routes, may incur large delay. A useful C-additive approximation that experimentally improves
delay is to combine the queue differential with a shortest path estimate for each link. This is pro-
posed in (15) as an enhancement to backpressure routing, and it is shown to perform quite well in
simulations given in (154)(22) ((154) extends to networks with unreliable channels). Related work
that combines shortest paths and backpressure using the drift-plus-penalty method is developed in
(155) to treat maximum hop count constraints. A theory of more aggressive place-holder packets
for delay improvement in backpressure is developed in (37), although the algorithm ideally requires
knowledge of Lagrange multiplier information in advance. A related and very simple Last-In-First-
Out (LIFO) implementation of backpressure that does not need Lagrange multiplier information
is developed in (54), where experiments on wireless sensor networks show delay improvements by
more than an order of magnitude over FIFO implementations (for all but 2% of the packets) while
preserving efficient throughput (note that LIFO does not change the dynamics of (5.1) or (5.56)).
Analysis of the LIFO rule and its connection to place-holders and Lagrange multipliers is in (55).

5.4 GENERAL OPTIMIZATION OF CONVEX FUNCTIONS OF TIME AVERAGES
Here we provide a recipe for the following more general problem of optimizing convex functions of
time averages:

Minimize: y0 + f(x)   (5.71)
Subject to: 1) yl + gl(x) ≤ 0 ∀l ∈ {1, . . . , L}   (5.72)
2) x ∈ X ∩ R   (5.73)
3) All queues Qk(t) are mean rate stable   (5.74)
4) α(t) ∈ Aω(t) ∀t   (5.75)

where f (x) and gl (x) are continuous and convex functions of x ∈ RM , X is a closed and convex
subset of RM , and R is an M-dimensional hyper-rectangle defined as:

R = {(x1 , . . . , xM ) ∈ RM |γm,min ≤ xm ≤ γm,max ∀m ∈ {1, . . . , M}}

where γm,min and γm,max are finite constants (this rectangle set R is only added to bound the auxiliary
variables that we use, as in the previous sections).
Let γ (t) = (γ1 (t), . . . , γM (t)) be a vector of auxiliary variables that can be chosen within the
set X ∩ R every slot t. We transform the problem (5.71)-(5.75) to:

Minimize: y0 + f(γ)   (5.76)
Subject to: 1) yl + gl(γ) ≤ 0 ∀l ∈ {1, . . . , L}   (5.77)
2) γm = xm ∀m ∈ {1, . . . , M}   (5.78)
3) All queues Qk(t) are mean rate stable   (5.79)
4) γ(t) ∈ X ∩ R ∀t   (5.80)
5) α(t) ∈ Aω(t) ∀t   (5.81)

where we define:
f(γ) = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{f(γ(τ))} ,   gl(γ) = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{gl(γ(τ))}

It is not difficult to show that this transformed problem is equivalent to the problem (5.71)-(5.75),
in that the maximum utility values are the same, and any solution to one can be used to construct a
solution to the other (see Exercise 5.9).
We solve the transformed problem (5.76)-(5.81) simply by re-stating the drift-plus-penalty
algorithm for this context. While a variable-V implementation can be developed, we focus here on
the fixed V algorithm as specified in (4.48)-(4.49). For each inequality constraint (5.77), define a
virtual queue Zl (t) with update equation:

Zl (t + 1) = max[Zl (t) + ŷl (α(t), ω(t)) + gl (γ (t)), 0] ∀l ∈ {1, . . . , L} (5.82)


For each equality constraint (5.78), define a virtual queue Hm (t) with update equation:

Hm (t + 1) = Hm (t) + γm (t) − x̂m (α(t), ω(t)) ∀m ∈ {1, . . . , M} (5.83)
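A sketch of the two virtual queue updates, emphasizing the asymmetry: Zl(t) enforces the one-sided inequality constraint (5.77) and is truncated at zero per (5.82), while Hm(t) enforces the equality constraint (5.78) and is allowed to go negative per (5.83). All names are illustrative.

```python
def update_virtual_queues(Z, H, y, g_vals, gamma, x):
    """One slot of (5.82)-(5.83), updating queues in place.
    y[l] = y_l(t), g_vals[l] = g_l(gamma(t)): inputs to inequality queue Z_l.
    gamma[m], x[m]: auxiliary variable and attribute driving equality queue H_m."""
    for l in range(len(Z)):
        # (5.82): truncated at zero (inequality constraint y_l + g_l(x) <= 0).
        Z[l] = max(Z[l] + y[l] + g_vals[l], 0.0)
    for m in range(len(H)):
        # (5.83): NOT truncated (equality constraint gamma_m = x_m can be
        # violated in either direction, so H_m may be negative).
        H[m] = H[m] + gamma[m] - x[m]
```

Mean rate stability of Hm(t) then forces the time averages of γm(t) and xm(t) to coincide, which is what couples the auxiliary variables to the true attributes.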

Define Θ(t) = [Q(t), Z(t), H(t)]. Assume the boundedness assumptions (4.25)-(4.30) hold, and
that ω(t) is i.i.d. over slots. For the Lyapunov function (4.43), we have the following drift bound:
Δ(Θ(t)) + V E{y0(t) + f(γ(t)) | Θ(t)} ≤ D + V E{y0(t) + f(γ(t)) | Θ(t)}
+ Σ_{l=1}^{L} Zl(t) E{yl(t) + gl(γ(t)) | Θ(t)}
+ Σ_{k=1}^{K} Qk(t) E{ak(t) − bk(t) | Θ(t)}
+ Σ_{m=1}^{M} Hm(t) E{γm(t) − xm(t) | Θ(t)}   (5.84)

where D is a finite constant related to the worst case second moments of the arrival, service, and
attribute vectors. Now define a C-additive approximation as any algorithm for choosing γ(t) ∈
X ∩ R and α(t) ∈ Aω(t) every slot t that, given the current Θ(t), yields a right-hand-side of (5.84)
that is within a distance C from its infimum value.

Theorem 5.3 (Algorithm Performance) Suppose the boundedness assumptions (4.25)-(4.30) hold, the
problem (5.71)-(5.75) is feasible, and E{L(Θ(0))} < ∞. Suppose the functions f(γ) and gl(γ) are upper
and lower bounded by finite constants over γ ∈ X ∩ R. If ω(t) is i.i.d. over slots and any C-additive
approximation is used every slot, then:
lim sup_{t→∞} [ y0(t) + f(x(t)) ] ≤ y0 opt + f opt + (D + C)/V   (5.85)
where y0 opt + f opt represents the infimum cost metric of the problem (5.71)-(5.75) over all feasible policies.
Further, all actual and virtual queues are mean rate stable, and:
lim sup_{t→∞} [ yl(t) + gl(x(t)) ] ≤ 0 ∀l ∈ {1, . . . , L}   (5.86)
lim_{t→∞} dist(x(t), X ∩ R) = 0   (5.87)

where dist(x(t), X ∩ R) represents the distance between the vector x(t) and the set X ∩ R, being zero
if and only if x(t) is in the (closed) set X ∩ R.

Proof. See Exercise 5.10. □


As before, an O(V ) backlog bound can also be derived under a Slater assumption.

5.5 NON-CONVEX STOCHASTIC OPTIMIZATION
Consider now the problem:
Minimize: f (x) (5.88)
Subject to: y l ≤ 0 ∀l ∈ {1, . . . , L} (5.89)
α(t) ∈ Aω(t) (5.90)
All queues Qk (t) are mean rate stable (5.91)
where f (x) is a possibly non-convex function that is assumed to be continuously differentiable with
upper and lower bounds fmin and fmax , and with partial derivatives ∂f (x)/∂xm having bounded
magnitudes νm ≥ 0. Applications of such problems include throughput-utility maximization with
f (x) given by −1 times a sum of non-concave “sigmoidal” functions that give low utility until
throughput exceeds a certain threshold (see Fig. 5.1). Such problems are treated in a non-stochastic
(static) network optimization setting in (156)(157). A related utility-proportional fairness objective
is studied for static networks in (158), which treats a convex optimization problem that has a fairness
interpretation with respect to a non-concave utility function.The stochastic problem we present here
is developed in (43). An application to risk management in network economics is given in Exercise
5.11.

Figure 5.1: An example non-concave utility function of a time average attribute (plot of Utility(x) versus attribute x, such as throughput).
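A concrete instance of such a sigmoidal utility is the logistic function below (the threshold and steepness parameters are hypothetical illustration values); a midpoint test shows it violates concavity below its threshold:

```python
import math

def sigmoid_utility(x, threshold=4.0, steepness=2.0):
    # Logistic "sigmoidal" utility: near zero until x approaches the
    # threshold, then rising steeply (threshold/steepness are illustrative).
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))

# Concavity requires phi((a+b)/2) >= (phi(a)+phi(b))/2 for all a, b.
# Below the threshold the function is convex, so the midpoint test fails:
a, b = 0.0, 4.0
mid = sigmoid_utility((a + b) / 2)
avg = (sigmoid_utility(a) + sigmoid_utility(b)) / 2
assert mid < avg  # convex region before the threshold: phi is non-concave
```

This is the shape sketched in Figure 5.1: low utility until throughput exceeds a threshold, then saturation.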

Performing such a general non-convex optimization is, in some cases, as hard as combinatorial
bin-packing, and so we do not expect to find a global optimum. Rather, we seek an algorithm that
satisfies the constraints (5.89)-(5.91) and that yields a local optimum of f (x).
We use the drift-plus-penalty framework with the same virtual queues as before:
Zl (t + 1) = max[Zl (t) + ŷl (α(t), ω(t)), 0] (5.92)

The actual queues Qk(t) are assumed to satisfy (5.1). Define Θ(t) = [Q(t), Z(t), xav(t)], where
xav(t) is defined as an empirical running time average of the attribute vector:
xav,m(t) = (1/t) Σ_{τ=0}^{t−1} xm(τ) if t > 0 ,   xav,m(0) = x̂m(α(−1), ω(−1))
where x̂m(α(−1), ω(−1)) can be viewed as an initial sample taken at time "t = −1" before the
network implementation begins. Define L(Θ(t)) = (1/2)[ Σ_{k=1}^{K} Qk(t)^2 + Σ_{l=1}^{L} Zl(t)^2 ]. Assume ω(t)
is i.i.d. over slots. We thus have:


Δ(Θ(t)) + V E{Penalty(t) | Θ(t)} ≤ D + V E{Penalty(t) | Θ(t)}
+ Σ_{k=1}^{K} Qk(t) E{ âk(α(t), ω(t)) − b̂k(α(t), ω(t)) | Θ(t) }
+ Σ_{l=1}^{L} Zl(t) E{ ŷl(α(t), ω(t)) | Θ(t) }   (5.93)

The penalty we use is:
Penalty(t) = Σ_{m=1}^{M} (∂f(xav(t))/∂xm) x̂m(α(t), ω(t))

Below we state the performance of the algorithm that observes queue backlogs every slot t and
takes an action α(t) ∈ Aω(t) that comes within C of minimizing the right-hand-side of the drift
expression (5.93).
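One slot of this primal-dual rule can be sketched over a hypothetical finite action set standing in for A_ω(t): each candidate action is scored by the queue- and penalty-dependent terms of the right-hand-side of (5.93), with the penalty weighted by the gradient of f evaluated at the running average; the queues and the running average are then updated. All data structures and names are illustrative.

```python
def primal_dual_slot(t, V, Q, Z, xav, actions, grad_f):
    """One slot of the primal-dual algorithm. Each action is a dict with keys
    'x', 'a', 'b', 'y' (attribute, arrival, service, constraint vectors) --
    a hypothetical finite menu standing in for the set A_omega(t)."""
    g = grad_f(xav)  # gradient of f at the running average x_av(t)

    def rhs(act):
        # Queue- and penalty-weighted terms of the RHS of (5.93).
        pen = V * sum(gm * xm for gm, xm in zip(g, act["x"]))
        qterm = sum(Qk * (ak - bk) for Qk, ak, bk in zip(Q, act["a"], act["b"]))
        zterm = sum(Zl * yl for Zl, yl in zip(Z, act["y"]))
        return pen + qterm + zterm

    act = min(actions, key=rhs)  # greedy minimization over the action menu
    # Queue updates (5.1) and (5.92).
    for k in range(len(Q)):
        Q[k] = max(Q[k] - act["b"][k], 0.0) + act["a"][k]
    for l in range(len(Z)):
        Z[l] = max(Z[l] + act["y"][l], 0.0)
    # Running average update: x_av(t+1) = (t*x_av(t) + x(t)) / (t+1).
    for m in range(len(xav)):
        xav[m] = (t * xav[m] + act["x"][m]) / (t + 1)
    return act
```

The only difference from the pure-dual algorithm of the earlier sections is the gradient weighting of the penalty, which is what allows a local optimum guarantee for non-convex f.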

Theorem 5.4 (Non-Convex Stochastic Network Optimization (43)) Suppose ω(t) is i.i.d. over slots, the
boundedness assumptions (4.25)-(4.28) hold, the function f (x) is bounded and continuously differentiable
with partial derivatives bounded in magnitude by finite constants νm ≥ 0, and the problem (5.88)-(5.91)
is feasible. For simplicity, assume that Θ(0) = 0. For any V ≥ 0, and for any C-additive approximation
of the above algorithm that is implemented every slot, we have:
(a) All queues Qk (t) and Zl (t) are mean rate stable and:
lim sup_{t→∞} yl(t) ≤ 0 ∀l ∈ {1, . . . , L}

(b) For all t > 0 and for any alternative vector x∗ that can be achieved as the time average of a
policy that makes all queues mean rate stable and satisfies all required constraints, we have:
(1/t) Σ_{τ=0}^{t−1} Σ_{m=1}^{M} E{ xm(τ) ∂f(xav(τ))/∂xm } ≤ (1/t) Σ_{τ=0}^{t−1} Σ_{m=1}^{M} x∗m E{ ∂f(xav(τ))/∂xm } + (D + C)/V

where D is a finite constant related to second moments of the ak (t), bk (t), yl (t) processes.
c) If all time averages converge, so that there is a constant vector x such that xav(t) → x with
probability 1 and x(t) → x, then the achieved limit is a near local optimum/critical point, in the sense
that for any alternative vector x∗ that can be achieved as the time average of a policy that makes all queues
mean rate stable and satisfies all required constraints, we have:
Σ_{m=1}^{M} (x∗m − xm) ∂f(x)/∂xm ≥ −(D + C)/V
d) Suppose there is an ε > 0 and an ω-only policy α∗(t) such that:
E{ ŷl(α∗(t), ω(t)) } ≤ 0 ∀l ∈ {1, . . . , L}   (5.94)
E{ âk(α∗(t), ω(t)) − b̂k(α∗(t), ω(t)) } ≤ −ε ∀k ∈ {1, . . . , K}   (5.95)
Then all queues Qk (t) are strongly stable with average size O(V ).

e) Suppose we use a variable-V algorithm with V(t) = V0 · (1 + t)^d for V0 > 0 and 0 < d < 1,
and use any C-additive approximation (where C is constant for all t). Then all virtual and actual queues
are mean rate stable (and so all constraints yl ≤ 0 are satisfied), and under the convergence assumptions of
part (c), the limiting x is a local optimum/critical point, in that:
Σ_{m=1}^{M} (x∗m − xm) ∂f(x)/∂xm ≥ 0
where x∗ is any alternative vector as specified in part (c).

The inequality guarantee in part (e) can be understood as follows: Suppose we start at our
achieved time average attribute vector x, and we want to shift this in any feasible direction by moving
towards another feasible vector x∗ by an amount ε (for some ε > 0). Then:
f( x + ε(x∗ − x) ) ≈ f(x) + ε Σ_{m=1}^{M} (x∗m − xm) ∂f(x)/∂xm ≥ f(x)
Hence, the new cost achieved by taking a small step in any feasible direction is no less than the cost
f(x) that we are already achieving. More precisely, the change in cost Δcost(ε) satisfies:
lim_{ε→0} Δcost(ε)/ε ≥ 0
Proof. (Theorem 5.4) Our proof uses the same drift-plus-penalty technique as described in previous
sections. Analogous to Theorem 4.5, it can be shown that for any x∗ = (x∗1, . . . , x∗M) that is a limit
point of x(t) under any policy that makes all queues mean rate stable and satisfies all constraints,
and for any δ > 0, there exists an ω-only policy α∗(t) such that (43):
E{ ŷl(α∗(t), ω(t)) } ≤ δ ∀l ∈ {1, . . . , L}
E{ âk(α∗(t), ω(t)) − b̂k(α∗(t), ω(t)) } ≤ δ ∀k ∈ {1, . . . , K}
dist( E{ x̂(α∗(t), ω(t)) }, x∗ ) ≤ δ
For simplicity of the proof, assume the above holds with δ = 0. Plugging the above into the right-hand-side of (5.93) with δ = 0 yields:6
Δ(Θ(t)) + V Σ_{m=1}^{M} E{ x̂m(α(t), ω(t)) ∂f(xav(t))/∂xm | Θ(t) } ≤ D + C + V Σ_{m=1}^{M} x∗m ∂f(xav(t))/∂xm

6 The same result can be derived by plugging in with δ > 0 and then taking a limit as δ → 0.
Taking expectations of the above drift bound (using the law of iterated expectations), summing the
telescoping series over τ ∈ {0, 1, . . . , t − 1}, and dividing by V t immediately yields the result of
part (b).
On the other hand, this drift expression can also be rearranged as:
Δ(Θ(t)) ≤ D + C + V Σ_{m=1}^{M} νm (x∗m − xm,min)

where xm,min is a bound on the expectation of xm (t) under any policy, known to exist by the
boundedness assumptions. Hence, the drift is less than or equal to a finite constant, and so by
Theorem 4.2, we know all queues are mean rate stable, proving part (a). The proof of part (d) follows
similarly by plugging in the policy α ∗ (t) of (5.94)-(5.95).
The proof of part (c) follows by taking a limit of the result in part (b), where the limits can be
pushed through by the boundedness assumptions and the continuity assumption on the derivatives
of f (x). The proof of part (e) is similar to that of Theorem 4.9 and is omitted for brevity. □
Using a penalty given by partial derivatives of the function evaluated at the empirical average
attribute vector can be viewed as a “primal-dual” operation that differs from our “pure-dual” approach
for convex problems. Such a primal-dual approach was first used in context of convex network utility
maximization problems in (32)(33)(34). Specifically, the work (32)(33) used a partial derivative
evaluated at the time average xav (t) to maximize a concave function of throughput in a multi-user
wireless downlink with time varying channels. However, the system in (32)(33) assumed infinite
backlog in all queues (similar to Exercise 5.6), so that there were no queue stability constraints.
This was extended in (34) to consider the primal-dual technique for joint stability and performance
optimization, again for convex problems, but using an exponential weighted average, rather than a
running time average xav (t). There, it was shown that a related “fluid limit” of the system has an
optimal utility, and that this limit is “weakly” approached under appropriately scaled systems. It was
also conjectured in (34) that the actual network will have utility that is close to this fluid limit as a
parameter β related to the exponential weighting is scaled (see Section 4.9 in (34)). However, the
analysis does not specify the size of β needed to achieve a near-optimal utility. Recent work in (36)
considers related primal-dual updates for convex problems, and it shows the long term utility of the
actual network is close to optimal as a parameter is scaled.
For the special case of convex problems, Theorem 5.4 above shows that, if the algorithm is
assumed to converge to well defined time averages, and if we use a running time average xav (t)
rather than an exponential average, the primal-dual algorithm achieves a similar [O(1/V ), O(V )]
performance-congestion tradeoff as the dual algorithm. Unfortunately, it is not clear how long the
system must run to approach convergence. The pure dual algorithm seems to provide stronger
analytical guarantees for convex problems because: (i) It does not need a running time average
xav (t) and hence can be shown to be robust to changes in system parameters (as in Section 4.9
and (42)(38)(17)), (ii) It does not require additional assumptions about convergence, (iii) It provides
results for all t > 0 that show how long we must run the system to be close to the infinite horizon
limit guarantees. However, if one applies the pure dual technique with a non-convex cost function
f (x), one would get a global optimum of the time average f (x), which may not even be a local
optimum of f (x). This is where the primal-dual technique shows its real potential, as it can achieve
a local optimum for non-convex problems.

5.6 WORST CASE DELAY
Here we extend the utility optimization framework to enable O(V ) tradeoffs in worst case
delay. Related problems are treated in (76)(159). Consider a 1-hop network with K queues
Q(t) = (Q1 (t), . . . , QK (t)). In addition to these queues, we keep transport layer queues L(t) =
(L1 (t), . . . , LK (t)), where Lk (t) stores incoming data before it is admitted to the network layer
queue Qk (t) (as in (17)). Let ω(t) = [A(t), S (t)], where A(t) = (A1 (t), . . . , AK (t)) is a vector of
new arrivals to the transport layer, and S (t) = (S1 (t), . . . , SK (t)) is a vector of channel conditions
that affect transmission. Assume that ω(t) is i.i.d. over slots.
Every slot t, choose admission variables a(t) = (a1 (t), . . . , aK (t)) subject to the constraints:

0 ≤ ak (t) ≤ min[Lk (t) + Ak (t), Amax ] (5.96)

where Amax is a finite constant. This means that ak (t) is chosen from the Lk (t) + Ak (t) amount of
data available on slot t, and is no more than Amax per slot (which limits the amount we can send
into the network layer). It is assumed that Ak (t) ≤ Amax for all k and all t. Newly arriving data
Ak (t) that is not immediately admitted into the network layer is stored in the transport layer queue
Lk (t). The controller also chooses a channel-aware transmission decision I (t) ∈ IS (t) , where IS (t) is
an abstract set that defines transmission options under channel state S (t). The transmission rates
are given by deterministic functions of I (t) and S (t):

bk (t) = b̂k (I (t), S (t))

Second moments of bk (t) are assumed to be uniformly bounded.


In addition, define packet drop decisions d(t) = (d1(t), . . . , dK(t)). These allow packets already
admitted to the network layer queues Qk(t) to be dropped if their delay is too large. Drop decisions
dk (t) are chosen subject to the constraints:

0 ≤ dk (t) ≤ Amax

The resulting queue update equation is thus:

Qk (t + 1) = max[Qk (t) − bk (t) − dk (t), 0] + ak (t) ∀k ∈ {1, . . . , K} (5.97)

For each k ∈ {1, . . . , K}, let φk (a) be a continuous, concave, and non-decreasing utility func-
tion defined over the interval 0 ≤ a ≤ Amax . Let νk be the maximum right-derivative of φk (a)
(which occurs at a = 0), and assume νk < ∞. Example utility functions that have this form are:

φk (a) = log(1 + νk a)
where log(·) denotes the natural logarithm. We desire a solution to the following problem, defined
in terms of a parameter ε > 0:

Maximize:    Σ_{k=1}^{K} φk(āk) − Σ_{k=1}^{K} βνk d̄k                 (5.98)
Subject to:  All queues Qk(t) are mean rate stable                     (5.99)
             b̄k ≥ ε   ∀k ∈ {1, . . . , K}                             (5.100)
             0 ≤ ak(t) ≤ Ak(t)   ∀k ∈ {1, . . . , K}, ∀t               (5.101)
             I(t) ∈ I_S(t)   ∀t                                        (5.102)

where β is a constant that satisfies 1 ≤ β < ∞. This problem does not specify anything about
worst-case delay, but we soon develop an algorithm with worst case delay of O(V ) that comes
within O(1/V ) of optimizing the utility associated with the above problem (5.98)-(5.102). Note
the following:

• The constraint (5.101) is different from the constraint (5.96).Thus, the less stringent constraint
(5.96) is used for the actual algorithm, but performance is measured with respect to the
optimum utility achievable in the problem (5.98)-(5.102). It turns out that optimal utility is
the same with either constraint (5.101) or (5.96), and in particular, it is the same if there are
no transport layer queues, so that Lk (t) = 0 for all t and all data is either admitted or dropped
upon arrival. We include the Lk (t) queues as they are useful in situations where it is preferable
to store data for later transmission than to drop it.

• An optimal solution to (5.98)-(5.102) has d̄k = 0 for all k. That is, the objective (5.98) can
equivalently be replaced by the objective of maximizing Σ_{k=1}^{K} φk(āk), together with the added
constraint d̄k = 0 for all k. This is because the penalty for dropping is βνk, which is greater
than or equal to the largest derivative of the utility function φk(a). Thus, it can be shown
that it is always better to restrict data at the transport layer rather than admitting it and later
dropping it. We recommend choosing β such that 1 ≤ β ≤ 2. A larger value of β will trade
packet drops at the network layer for packet non-admissions at the flow controller.

• The constraint (5.100) requires each queue to transmit with a time-average rate of at least ε.
This constraint ensures all queues are getting at least a minimum rate ε of service. If the input
rate E{Ak(t)} is less than ε, then this constraint is wasteful. However, we shall not enforce
this constraint. Rather, we simply measure utility of our system with respect to the optimal
utility of the problem (5.98)-(5.102), which includes this constraint. It is assumed throughout
that this constraint is feasible, and so the problem (5.98)-(5.102) is feasible. If one prefers to
enforce constraint (5.100), this is easily done with an appropriate virtual queue.
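For concreteness, such a virtual queue could take the standard form Hk(t + 1) = max[Hk(t) − bk(t) + ε, 0], whose mean rate stability implies b̄k ≥ ε. The following minimal Python sketch (our own naming, not part of the text) illustrates the one-slot update:

```python
def update_service_queue(H, b, eps):
    """One-slot update of a virtual queue enforcing a time average service
    constraint: H(t+1) = max[H(t) - b(t) + eps, 0].
    If H(t)/t -> 0, then the time average of b(t) is at least eps."""
    return max(H - b + eps, 0.0)

# Toy trace: service alternates 0, 1, so its time average 0.5 exceeds
# eps = 0.3 and the virtual queue stays bounded (here it returns to 0).
H = 0.0
for t in range(1000):
    H = update_service_queue(H, t % 2, 0.3)
```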
5.6.1 THE ε-PERSISTENT SERVICE QUEUE
To ensure worst-case delay is bounded, we define an ε-persistent service queue, being a virtual queue
Zk(t) for each k ∈ {1, . . . , K} with Zk(0) = 0 and with dynamics:

Zk(t + 1) = { max[Zk(t) − bk(t) − dk(t) + ε, 0]   if Qk(t) > bk(t) + dk(t)
            { 0                                    if Qk(t) ≤ bk(t) + dk(t)      (5.103)

where ε > 0. We assume throughout that ε ≤ Amax. The condition Qk(t) ≤ bk(t) + dk(t) is satisfied
whenever the backlog Qk(t) is cleared (by service and/or drops) on slot t. When this condition is
not active, Zk(t) has a departure process that is the same as Qk(t), but it has an arrival of size ε
every slot. The size of the queue Zk(t) can provide a bound on the delay of the head-of-line data in
queue Qk(t) in a first-in-first-out (FIFO) system. This is similar to (76) (where explicit delays are
kept for each packet) and (159) (which uses a slightly different update). If a scheduling algorithm
is used that ensures Zk(t) ≤ Zk,max and Qk(t) ≤ Qk,max for all t, for some finite constants Zk,max
and Qk,max, then worst-case delay is also bounded, as shown in the following lemma:
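In code, one slot of the coupled updates (5.97) and (5.103) for a single queue k can be sketched as follows (a minimal illustration; variable names are ours):

```python
def step_queues(Q, Z, a, b, d, eps):
    """One-slot update of the actual queue (5.97) and the eps-persistent
    service queue (5.103) for a single queue index k.
    Q, Z: current backlogs; a: admitted arrivals; b: offered service;
    d: offered drops; eps: persistence parameter (0 < eps <= Amax)."""
    # Z has the same departures as Q but an arrival of size eps every slot,
    # and is reset whenever the backlog of Q is cleared on this slot.
    Z_next = max(Z - b - d + eps, 0.0) if Q > b + d else 0.0
    Q_next = max(Q - b - d, 0.0) + a
    return Q_next, Z_next
```

Note that Z_next is computed from the pre-update backlog Q, matching the conditioning in (5.103).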

Lemma 5.5 Suppose Qk(t) and Zk(t) evolve according to (5.97) and (5.103), and that an algorithm
is used that ensures Qk(t) ≤ Qk,max and Zk(t) ≤ Zk,max for all slots t ∈ {0, 1, 2, . . .}. Assume service
and drops are done in FIFO order. Then the worst-case delay of all non-dropped data in queue k is Wk,max,
defined:

Wk,max ≜ (Qk,max + Zk,max)/ε      (5.104)

Proof. Fix a slot t. We show that all arrivals ak(t) are either served or dropped on or before slot
t + Wk,max. Suppose this is not true; we reach a contradiction. Note by (5.97) that arrivals ak(t) are
added to the queue backlog Qk(t + 1) and are first available for service on slot t + 1. It must be
that Qk(τ) > bk(τ) + dk(τ) for all τ ∈ {t + 1, . . . , t + Wk,max} (else, the backlog on slot τ would
be cleared). Therefore, by (5.103), we have for all slots τ ∈ {t + 1, . . . , t + Wk,max}:

Zk(τ + 1) = max[Zk(τ) − bk(τ) − dk(τ) + ε, 0]

In particular, for all slots τ ∈ {t + 1, . . . , t + Wk,max}:

Zk(τ + 1) ≥ Zk(τ) − bk(τ) − dk(τ) + ε

Summing the above over τ ∈ {t + 1, . . . , t + Wk,max} yields:

Zk(t + Wk,max + 1) − Zk(t + 1) ≥ − Σ_{τ=t+1}^{t+Wk,max} [bk(τ) + dk(τ)] + Wk,max ε
Rearranging terms in the above inequality and using the facts that Zk(t + 1) ≥ 0 and
Zk(t + Wk,max + 1) ≤ Zk,max yields:

Wk,max ε ≤ Σ_{τ=t+1}^{t+Wk,max} [bk(τ) + dk(τ)] + Zk,max      (5.105)

On the other hand, the sum of bk(τ) + dk(τ) over the interval τ ∈ {t + 1, . . . , t + Wk,max} must
be strictly less than Qk(t + 1) (else, by the FIFO service, all data ak(t), which is included at the end
of the backlog Qk(t + 1), would have been cleared during this interval). Thus:

Σ_{τ=t+1}^{t+Wk,max} [bk(τ) + dk(τ)] < Qk(t + 1) ≤ Qk,max      (5.106)

Combining (5.106) and (5.105) yields:

Wk,max ε < Qk,max + Zk,max

which implies:

Wk,max < (Qk,max + Zk,max)/ε
This contradicts (5.104), proving the result. □
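Lemma 5.5 can be sanity checked numerically. The sketch below (our own toy drop policy, not the algorithm of the next subsection) runs a single FIFO queue of integer data units, tracks the delay of every departed unit, and compares against (Qk,max + Zk,max)/ε computed from the observed maxima:

```python
import random
from collections import deque

def simulate_delay_bound(T=4000, Amax=3, eps=1, thresh=25, seed=0):
    """Single FIFO queue of integer data units under a hypothetical
    threshold drop rule; returns (delays, max_Q, max_Z)."""
    rng = random.Random(seed)
    q = deque()                 # arrival slot of each queued data unit
    Z, delays, max_Q, max_Z = 0, [], 0, 0
    for t in range(T):
        Q = len(q)
        b = rng.randint(0, 2)                    # offered service
        d = Amax if Q + Z > thresh else 0        # drop rule keeps Q, Z bounded
        # eps-persistent queue (5.103), driven by the pre-departure backlog:
        Z = max(Z - b - d + eps, 0) if Q > b + d else 0
        for _ in range(min(Q, b + d)):           # FIFO departures, as in (5.97)
            delays.append(t - q.popleft())
        for _ in range(rng.randint(0, Amax)):    # new arrivals a(t)
            q.append(t)
        max_Q, max_Z = max(max_Q, len(q)), max(max_Z, Z)
    return delays, max_Q, max_Z

delays, max_Q, max_Z = simulate_delay_bound()
W_bound = (max_Q + max_Z) / 1    # Lemma 5.5 with eps = 1
```

Every observed delay should fall below W_bound, exactly as the lemma predicts.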

5.6.2 THE DRIFT-PLUS-PENALTY FOR WORST-CASE DELAY
As usual, we transform the problem (5.98)-(5.102) using auxiliary variables γ(t) =
(γ1(t), . . . , γK(t)):

Maximize:    Σ_{k=1}^{K} φk(γ̄k) − Σ_{k=1}^{K} βνk d̄k                 (5.107)
Subject to:  āk ≥ γ̄k   ∀k ∈ {1, . . . , K}                            (5.108)
             All queues Qk(t) are mean rate stable                      (5.109)
             b̄k ≥ ε   ∀k ∈ {1, . . . , K}                              (5.110)
             0 ≤ γk(t) ≤ Amax   ∀k ∈ {1, . . . , K}, ∀t                 (5.111)
             0 ≤ ak(t) ≤ Ak(t)   ∀k ∈ {1, . . . , K}, ∀t                (5.112)
             I(t) ∈ I_S(t)   ∀t                                         (5.113)

To enforce the constraints (5.108), define virtual queues Gk(t) by:

Gk(t + 1) = max[Gk(t) − ak(t) + γk(t), 0]      (5.114)

Now define Θ(t) ≜ [Q(t), Z(t), G(t)] as the combined queue vector, and define the Lyapunov
function L(Θ(t)) by:

L(Θ(t)) ≜ (1/2) Σ_{k=1}^{K} [Qk(t)² + Zk(t)² + Gk(t)²]
Using the fact that Zk(t + 1) ≤ max[Zk(t) − bk(t) − dk(t) + ε, 0], it can be shown (as usual) that
the Lyapunov drift satisfies:

Δ(Θ(t)) − V E{ Σ_{k=1}^{K} [φk(γk(t)) − βνk dk(t)] | Θ(t) } ≤ B
    − V E{ Σ_{k=1}^{K} [φk(γk(t)) − βνk dk(t)] | Θ(t) }
    + Σ_{k=1}^{K} Zk(t) E{ ε − b̂k(I(t), S(t)) − dk(t) | Θ(t) }
    + Σ_{k=1}^{K} Qk(t) E{ ak(t) − b̂k(I(t), S(t)) − dk(t) | Θ(t) }
    + Σ_{k=1}^{K} Gk(t) E{ γk(t) − ak(t) | Θ(t) }                      (5.115)

where B is a constant that satisfies:

B ≥ (1/2) Σ_{k=1}^{K} E{ (ε − bk(t) − dk(t))² | Θ(t) }
  + (1/2) Σ_{k=1}^{K} E{ ak(t)² + (bk(t) + dk(t))² + (γk(t) − ak(t))² | Θ(t) }      (5.116)

Such a constant B exists by the boundedness assumptions on the processes.


The algorithm that minimizes the right-hand-side of (5.115) thus observes Z(t), Q(t), G(t),
S(t) every slot t and does the following:
• (Auxiliary Variables) For each k ∈ {1, . . . , K}, choose γk(t) to solve:

    Maximize:    V φk(γk(t)) − Gk(t)γk(t)      (5.117)
    Subject to:  0 ≤ γk(t) ≤ Amax              (5.118)

• (Flow Control) For each k ∈ {1, . . . , K}, choose ak(t) by:

    ak(t) = { min[Lk(t) + Ak(t), Amax]   if Qk(t) ≤ Gk(t)
            { 0                          if Qk(t) > Gk(t)      (5.119)

• (Transmission) Choose I(t) ∈ I_S(t) to maximize:

    Σ_{k=1}^{K} [Qk(t) + Zk(t)] b̂k(I(t), S(t))      (5.120)

• (Packet Drops) For each k ∈ {1, . . . , K}, choose dk(t) by:

    dk(t) = { Amax   if Qk(t) + Zk(t) > βV νk
            { 0      if Qk(t) + Zk(t) ≤ βV νk      (5.121)

• (Queue Update) Update Qk(t), Zk(t), Gk(t) by (5.97), (5.103), (5.114).

In some cases, the above algorithm may choose a drop variable dk (t) such that Qk (t) <
bk (t) + dk (t). In this case, all queue updates are kept the same (so the algorithm is unchanged), but
it is useful to first transmit data with offered rate bk (t) on slot t, and then drop only what remains.
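The per-slot decisions above can be sketched in code for the concrete utility φk(a) = log(1 + νk a), for which the auxiliary-variable problem (5.117)-(5.118) has a simple closed form; the transmission option set below is a hypothetical finite list of rate vectors (all naming is ours):

```python
def aux_var(V, G, nu, Amax):
    """Solve (5.117)-(5.118) for phi(g) = log(1 + nu*g): the stationary
    point of V*log(1+nu*g) - G*g is g = V/G - 1/nu, clipped to [0, Amax]."""
    if G <= 0:
        return Amax                      # objective is non-decreasing in g
    return min(max(V / G - 1.0 / nu, 0.0), Amax)

def flow_control(L, A, Amax, Q, G):
    """Threshold rule (5.119): admit everything allowed iff Q(t) <= G(t)."""
    return min(L + A, Amax) if Q <= G else 0.0

def transmit(options, Q, Z):
    """Max-weight rule (5.120): pick the rate vector maximizing
    sum_k (Q_k + Z_k) * b_k over the offered options."""
    return max(options, key=lambda b: sum((Q[k] + Z[k]) * b[k]
                                          for k in range(len(b))))

def drop(Q, Z, beta, V, nu, Amax):
    """Drop rule (5.121)."""
    return Amax if Q + Z > beta * V * nu else 0.0
```

Observe that aux_var returns 0 whenever G > V·νk, which is the property used in the proof of Theorem 5.6 below.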

5.6.3 ALGORITHM PERFORMANCE
Define Zk,max, Qk,max, and Gk,max as follows:

Zk,max ≜ βV νk + ε                                  (5.122)
Qk,max ≜ min[βV νk + Amax, V νk + 2Amax]            (5.123)
Gk,max ≜ V νk + Amax                                (5.124)

Theorem 5.6 If ε ≤ Amax, then for arbitrary sample paths the above algorithm ensures:

Zk(t) ≤ Zk,max,   Qk(t) ≤ Qk,max,   Gk(t) ≤ Gk,max   ∀t

where Zk,max, Qk,max, Gk,max are defined in (5.122)-(5.124), provided that these inequalities hold for
t = 0. Thus, worst-case delay Wk,max is given by:

Wk,max = (Zk,max + Qk,max)/ε = O(V)

Proof. That Gk (t) ≤ Gk,max for all t follows by an argument similar to that given in Section 5.2.1,
showing that the auxiliary variable update (5.117)-(5.118) chooses γk (t) = 0 whenever Gk (t) >
V νk .
To show the Qk,max bound, it is clear that the packet drop decision (5.121) yields dk (t) = Amax
whenever Qk (t) > βV νk . Because ak (t) ≤ Amax , the arrivals are less than or equal to the offered
drops whenever Qk (t) > βV νk , and so Qk (t) ≤ βV νk + Amax for all t. However, we also see that
if Qk (t) > Gk,max , then the flow control decision will choose ak (t) = 0, and so Qk (t) also cannot
increase. It follows that Qk(t) ≤ Gk,max + Amax for all t. This proves the Qk,max bound. The Zk,max
bound is proven similarly. The worst-case-delay result then follows immediately from Lemma 5.5. □
The above theorem only uses the fact that packet drops dk(t) take place according to the
rule (5.121), flow control decisions ak(t) take place according to the rule (5.119), and auxiliary
variable decisions satisfy γk(t) = 0 whenever Gk(t) > V νk (a property of the solution to (5.117)-
(5.118)). The fact that γk(t) = 0 whenever Gk(t) > V νk can be hard-wired into the auxiliary variable
decisions, even when they are chosen to approximately solve (5.117)-(5.118) otherwise. Further, the
I(t) decisions can be arbitrary and are not necessarily those that maximize (5.120). The next theorem
holds for any C-additive approximation for minimizing the right-hand-side of (5.115) that preserves
the above basic properties. A 0-additive approximation performs the exact algorithm given above.

Theorem 5.7 Suppose ω(t) is i.i.d. over slots and any C-additive approximation for minimizing
the right-hand-side of (5.115) is used, such that (5.119) and (5.121) hold exactly and γk(t) = 0 whenever
Gk(t) > V νk. Suppose Qk(0) ≤ Qk,max, Zk(0) ≤ Zk,max, Gk(0) ≤ Gk,max for all k, and ε ≤ Amax.
Then the worst-case queue backlog and delay bounds given in Theorem 5.6 hold, and the achieved utility
satisfies:

lim inf_{t→∞} [ Σ_{k=1}^{K} φk(āk(t)) − Σ_{k=1}^{K} βνk d̄k(t) ] ≥ φ* − (B + C)/V

where B is defined in (5.116), and āk(t), d̄k(t) are defined:

āk(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{ak(τ)},    d̄k(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{dk(τ)}

and where φ* is the optimal utility associated with the problem (5.98)-(5.102).

The theorem relies on the following fact, which can be proven using Theorem 4.5: For all δ > 0,
there exists a vector γ* = (γ1*, . . . , γK*) and an ω-only policy [a*(t), I*(t), d*(t)] that chooses a*(t)
as a random function of A(t), I*(t) as a random function of S(t), and d*(t) = 0 (so that it does
not drop any data), such that:

Σ_{k=1}^{K} φk(γk*) = φ*                                                        (5.125)
E{ak*(t)} = γk*   ∀k ∈ {1, . . . , K}                                           (5.126)
E{b̂k(I*(t), S(t))} ≥ ε − δ   ∀k ∈ {1, . . . , K}                               (5.127)
E{b̂k(I*(t), S(t))} ≥ E{ak*(t)} − δ   ∀k ∈ {1, . . . , K}                       (5.128)
I*(t) ∈ I_S(t),  0 ≤ γk* ≤ Amax,  0 ≤ ak*(t) ≤ Ak(t)   ∀k ∈ {1, . . . , K}, ∀t  (5.129)

where φ* is the optimal utility associated with the problem (5.98)-(5.102).


Proof. (Theorem 5.7) The C-additive approximation ensures by (5.115):

Δ(Θ(t)) − V E{ Σ_{k=1}^{K} [φk(γk(t)) − βνk dk(t)] | Θ(t) } ≤ B + C
    − V E{ Σ_{k=1}^{K} [φk(γk*) − βνk dk*(t)] | Θ(t) }
    + Σ_{k=1}^{K} Zk(t) E{ ε − b̂k(I*(t), S(t)) − dk*(t) | Θ(t) }
    + Σ_{k=1}^{K} Qk(t) E{ ak*(t) − b̂k(I*(t), S(t)) − dk*(t) | Θ(t) }
    + Σ_{k=1}^{K} Gk(t) E{ γk* − ak*(t) | Θ(t) }

where d*(t), I*(t), a*(t) are any alternative decisions that satisfy I*(t) ∈ I_S(t), 0 ≤ dk*(t) ≤ Amax,
and 0 ≤ ak*(t) ≤ min[Lk(t) + Ak(t), Amax] for all k ∈ {1, . . . , K} and all t. Substituting the ω-only
policy from (5.125)-(5.129) in the right-hand-side of the above inequality and taking δ → 0 yields:

Δ(Θ(t)) − V E{ Σ_{k=1}^{K} [φk(γk(t)) − βνk dk(t)] | Θ(t) } ≤ B + C − V φ*

Using iterated expectations and telescoping sums as usual yields, for all t > 0:

(1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{ φk(γk(τ)) − βνk dk(τ) } ≥ φ* − (B + C)/V − E{L(Θ(0))}/(V t)

Using Jensen's inequality for the concave functions φk(γ) yields, for all t > 0:

Σ_{k=1}^{K} [φk(γ̄k(t)) − βνk d̄k(t)] ≥ φ* − (B + C)/V − E{L(Θ(0))}/(V t)      (5.130)

However, because Gk(t) ≤ Gk,max for all t, it is easy to show (via (5.114) and (2.5)) that for all k
and all slots t > 0:

āk(t) ≥ max[γ̄k(t) − Gk,max/t, 0]

Therefore, since φk(γ) is continuous and non-decreasing, it can be shown:

lim inf_{t→∞} Σ_{k=1}^{K} [φk(āk(t)) − βνk d̄k(t)] ≥ lim inf_{t→∞} Σ_{k=1}^{K} [φk(γ̄k(t)) − βνk d̄k(t)]

Using this in (5.130) proves the result. □


Because the network layer packet drops dk(t) are inefficient, it can be shown that:

lim sup_{t→∞} Σ_{k=1}^{K} νk d̄k(t) ≤ (B + C)/(V(β − 1)) + (φ*_{ε=0} − φ*)/(β − 1)

where φ* is the optimal solution to (5.98)-(5.102) for the given ε > 0, and φ*_{ε=0} is the solution to
(5.98)-(5.102) with ε = 0 (which removes constraint (5.100)). Thus, if φ* = φ*_{ε=0}, network layer
drops can be made arbitrarily small by either increasing β or V.⁷
The above analysis allows for an arbitrary operation of the transport layer queues Lk(t).
Indeed, the above theorems only assume that Lk(t) ≥ 0 for all t. Thus, as in (17), these can have
either infinite buffer space, finite buffer space, or 0 buffer space. With 0 buffer space, all data that is
not immediately admitted to the network layer is dropped.

5.7 ALTERNATIVE FAIRNESS METRICS
One type of fairness used in the literature is the so-called max-min fairness (see, for example,
(129)(3)(5)(6)). Let (x̄1, . . . , x̄M) represent average throughputs achieved by users {1, . . . , M} under
some stabilizing control algorithm, and let Λ denote the set of all possible (x̄1, . . . , x̄M) vectors.
A vector (x̄1, . . . , x̄M) ∈ Λ is max-min fair if:

• It maximizes the lowest entry of (x̄1, . . . , x̄M) over all possible vectors in Λ.

• It maximizes the second lowest entry over all vectors in Λ that satisfy the above condition.

• It maximizes the third lowest entry over all vectors in Λ that satisfy the above two conditions,
and so on.

This can be viewed as a sequence of nested optimizations, much different from the utility opti-
mization framework treated in this chapter. For flow-based networks with capacitated links, one can
reach a max-min fair allocation by starting from 0 and gradually increasing all flows equally until a
bottleneck link is found, then increasing all non-bottlenecked flows equally, and so on (see Chapter
6.5.2 in (129)). A token-based scheduling scheme is developed in (160) for achieving max-min
fairness in one-hop wireless networks on graphs with link selections defined by matchings.
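The progressive filling procedure for flow-based networks with capacitated links can be sketched as follows (the link/flow data structures are our own hypothetical illustration):

```python
def max_min_fair(caps, flows_on_link):
    """Progressive filling: raise all unfrozen flow rates equally until some
    link saturates, freeze the flows crossing that bottleneck, and repeat.
    caps: {link: capacity}; flows_on_link: {link: set of flow ids}.
    Assumes every flow crosses at least one link."""
    flows = set().union(*flows_on_link.values())
    rate = {f: 0.0 for f in flows}
    frozen = set()
    while len(frozen) < len(flows):
        best = None
        for link, cap in caps.items():
            active = [f for f in flows_on_link[link] if f not in frozen]
            if not active:
                continue
            headroom = cap - sum(rate[f] for f in flows_on_link[link])
            inc = headroom / len(active)      # equal increase this link allows
            if best is None or inc < best[0]:
                best = (inc, active)
        inc, active = best
        for f in flows - frozen:              # raise all unfrozen flows equally
            rate[f] += inc
        frozen.update(active)                 # flows on the bottleneck freeze
    return rate
```

For example, two flows sharing a unit-capacity link, with flow 2 also crossing a link of capacity 0.4, yields the max-min fair rates (0.6, 0.4).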
One can approximate max-min fairness using a concave utility function in a network with
capacitated links. Indeed, it is shown in (3) that optimizing a sum of concave functions of the form
gα(x) = −1/x^α approaches a max-min fair point as α → ∞. It is likely that such an approach also holds
for more general wireless networks with transmission rate allocation and scheduling. However, such
functions are singular at x = 0 (preventing worst-case backlog bounds as in Exercises 5.6-5.7),
⁷ If b̄k ≥ ε for all k, then the final term (φ*_{ε=0} − φ*)/(β − 1) can be removed. Alternatively, if virtual queues Hk(t +
1) = max[Hk(t) − bk(t) + ε, 0] are added to enforce these constraints, then lim sup_{t→∞} [ν1 d̄1(t) + . . . + νK d̄K(t)] ≤ (B̃ +
C)/(V(β − 1)), where B̃ adds second moment terms (bk(t) − ε)² to (5.116).
and for large α they have very large values of |g′α(x)/gα(x)| for x > 0, which typically results in
large queue backlog if used in conjunction with the drift-plus-penalty method.
A simpler hard fairness approach seeks only to maximize the minimum throughput (161). This
easily fits into the concave utility based drift-plus-penalty framework using the concave function
g(x) = min[x1, . . . , xM]:

Maximize:    min[x̄1, x̄2, . . . , x̄M]                              (5.131)
Subject to:  1) All queues are mean rate stable                     (5.132)
             2) α(t) ∈ A_ω(t)   ∀t ∈ {0, 1, 2, . . .}               (5.133)

See also Exercise 5.4. A “mixed” approach can also be considered, which seeks to maximize
β min[x̄1, . . . , x̄M] + Σ_{m=1}^{M} log(1 + x̄m). The constant β is a large weight that ensures maximizing
the minimum throughput has a higher priority than maximizing the logarithmic terms.
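With g(x) = min[x̄1, . . . , x̄M], the auxiliary-variable subproblem of the drift-plus-penalty method (maximize V·min_m γm − Σm Gm(t)γm over 0 ≤ γm ≤ θmax) reduces to a simple threshold: an optimum exists in which all γm share a common value θ, so the objective becomes (V − Σm Gm(t))·θ. A minimal sketch (our naming; ties broken toward the lowest values, as discussed in Exercise 5.4):

```python
def min_utility_aux(V, G, theta_max):
    """Maximize V*min(gamma) - sum(G[m]*gamma[m]) over 0 <= gamma[m] <= theta_max.
    Raising any entry above the minimum only adds cost when G[m] >= 0, so an
    optimum has all entries equal to a common theta; the objective is then
    (V - sum(G))*theta, maximized at theta_max or 0."""
    theta = theta_max if V > sum(G) else 0.0
    return [theta] * len(G)
```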

5.8 EXERCISES

Exercise 5.1. (Using Logarithmic Utilities) Give a closed form solution to the auxiliary variable
update of (5.49)-(5.50) when:

a) φ(γ) = Σ_{m=1}^{M} log(γm), where log(·) denotes the natural logarithm.
b) φ(γ) = Σ_{m=1}^{M} log(1 + νm γm), where log(·) denotes the natural logarithm.
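Candidate closed forms can be checked numerically against a brute-force search; each per-coordinate subproblem maximizes V φm(γ) − Gm γ on [0, γm,max], and the candidates below come from setting the derivative to zero and clipping (a self-check sketch, not a substitute for the derivation):

```python
import math

def aux_log(V, G, gmax):
    """Candidate maximizer of V*log(g) - G*g on (0, gmax]."""
    return gmax if G <= 0 else min(V / G, gmax)

def aux_log1p(V, G, nu, gmax):
    """Candidate maximizer of V*log(1 + nu*g) - G*g on [0, gmax]."""
    return gmax if G <= 0 else min(max(V / G - 1.0 / nu, 0.0), gmax)

def argmax_grid(f, lo, hi, n=20000):
    """Brute-force argmax of f on [lo, hi], for checking the candidates."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return max(xs, key=f)
```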

Exercise 5.2. (Transformed Problem with Auxiliary Variables) Let α′(t) be a policy that yields
well defined averages x̄′, ȳ′l, and that satisfies all constraints of problem (5.2)-(5.5),(5.8) (including
the constraint x̄′ ∈ R), with utility φ(x̄′) = φ^opt. Construct a policy that satisfies all constraints of
problem (5.13)-(5.18) and that yields the same utility value φ(x̄′). Hint: Use γ(t) = x̄′ for all t.

Exercise 5.3. ( Jensen’s Inequality) Let φ(γ ) be a concave function defined over a convex set R ⊆
RM . Let γ (τ ) be a sequence of random vectors in R for τ ∈ {0, 1, 2, . . .}. Fix an integer t > 0,
and define T as an independent and random time that is uniformly distributed over the integers
{0, 1, . . . , t − 1}. Define the random vector X = γ (T ). Use (5.9) to prove (5.10)-(5.11).

Exercise 5.4. (Hard Fairness (161)) Consider a system with M attributes x(t) =
(x1 (t), . . . , xM (t)), where xm (t) = x̂m (α(t), ω(t)) for m ∈ {1, . . . , M}. Assume there is a positive
constant θmax such that:

0 ≤ x̂m (α, ω) ≤ θmax ∀m ∈ {1, . . . , M}, ∀ω, ∀α ∈ Aω


130 5. OPTIMIZING FUNCTIONS OF TIME AVERAGES
a) State the drift-plus-penalty algorithm for solving the following problem, with θ(t) as a
new variable:

Maximize:    θ̄
Subject to:  1) x̄m ≥ θ̄   ∀m ∈ {1, . . . , M}
             2) 0 ≤ θ(t) ≤ θmax   ∀t ∈ {0, 1, 2, . . .}
             3) α(t) ∈ A_ω(t)   ∀t ∈ {0, 1, 2, . . .}

b) State the utility-based drift-plus-penalty algorithm for solving the problem:

Maximize:    min[x̄1, x̄2, . . . , x̄M]
Subject to:  α(t) ∈ A_ω(t)   ∀t ∈ {0, 1, 2, . . .}

which is solved with auxiliary variables γm(t) with 0 ≤ γm(t) ≤ θmax.


c) The problems in (a) and (b) both seek to maximize the minimum throughput. Show that
if both algorithms “break ties” when choosing auxiliary variables by choosing the lowest possible
values, then they are exactly the same algorithm. Show they are slightly different if ties are broken
to choose the largest possible auxiliary variables, particularly in cases when some virtual queues are
zero.

Exercise 5.5. (Bounded Virtual Queues) Consider the auxiliary variable optimization for γm (t) in
(5.49)-(5.50), where φm (x) has the property that:

φm (x) ≤ φm (0) + νm x whenever 0 ≤ x ≤ γm,max

for a constant νm > 0. Show that if 0 ≤ γm (t) ≤ γm,max , we have:

V φm (γm (t)) − Gm (t)γm (t) ≤ V φm (0) + (V νm − Gm (t))γm (t)

Use this to prove that γm (t) = 0 is the unique optimal solution to (5.49)-(5.50) whenever Gm (t) >
V νm . Conclude from (5.48) that Gm (t) ≤ V νm + γm,max for all t, provided this is true at t = 0.

Exercise 5.6. (1-Hop Wireless System with Infinite Backlog) Consider a wireless system with
M channels. Transmission rates on slot t are given by b(t) = (b1 (t), . . . , bM (t)) with bm (t) =
b̂m (α(t), ω(t)), where ω(t) = (S1 (t), . . . , SM (t)) is an observed channel state vector for slot t (as-
sumed to be i.i.d. over slots), and α(t) is a control action chosen within a set Aω(t) . Assume that
each channel has an infinite backlog of data, so that there is always data to send. The goal is to
choose α(t) every slot to maximize φ(b), where φ(b) is a concave and entrywise non-decreasing
utility function.
a) Verify that the algorithm of Section 5.0.5 in this case is:
• (Auxiliary Variables) Choose γ(t) = (γ1(t), . . . , γM(t)) to solve:

    Maximize:    V φ(γ(t)) − Σ_{m=1}^{M} Gm(t)γm(t)
    Subject to:  0 ≤ γm(t) ≤ γm,max   ∀m ∈ {1, . . . , M}

• (Transmission) Observe ω(t) and choose α(t) ∈ A_ω(t) to maximize:

    Σ_{m=1}^{M} Gm(t) b̂m(α(t), ω(t))

• (Virtual Queue Update) Update Gm(t) for all m ∈ {1, . . . , M} according to:

    Gm(t + 1) = max[Gm(t) + γm(t) − b̂m(α(t), ω(t)), 0]



b) Suppose that φ(b̄) = Σ_{m=1}^{M} φm(b̄m), where the functions φm(bm) are continuous, concave,
non-decreasing, with maximum right-derivative νm < ∞, so that φm(γ) ≤ φm(0) + νm γ for all
γ ≥ 0. Prove that the auxiliary variable decisions above yield γm(t) = 0 if Gm(t) > V νm (see also
Exercise 5.5). Conclude that 0 ≤ Gm(t) ≤ V νm + γm,max for all t, provided that this holds at t = 0.
c) Use (5.33) to conclude that if the conditions of part (b) hold, if all virtual queues are initially
empty, and if any C-additive approximation is used, then:

φ(b̄(t)) ≥ φ^opt − (D + C)/V − (1/t) Σ_{m=1}^{M} νm(V νm + γm,max),   ∀t > 0
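The deterministic virtual queue bound of part (b) can also be observed in simulation. The sketch below (our toy instance: two ON/OFF channels, φm(b) = log(1 + νm b), at most one channel served per slot) runs the algorithm of part (a) and records the largest value attained by each Gm:

```python
import random

def run_ex56(T=5000, V=20.0, nus=(1.0, 2.0), gmax=1.0, seed=1):
    """Exercise 5.6(a) algorithm on a hypothetical 2-channel ON/OFF system.
    Returns the peak value attained by each virtual queue G_m."""
    rng = random.Random(seed)
    G = [0.0, 0.0]
    G_peak = [0.0, 0.0]
    for _ in range(T):
        S = [1.0 if rng.random() < 0.7 else 0.0,
             1.0 if rng.random() < 0.5 else 0.0]
        # Auxiliary variables: maximize V*log(1+nu*g) - G*g on [0, gmax];
        # note gamma = 0 whenever G_m > V*nu_m, as part (b) requires.
        gamma = [min(max(V / G[m] - 1.0 / nus[m], 0.0), gmax) if G[m] > 0
                 else gmax for m in range(2)]
        # Transmission: serve the channel with the largest G_m * S_m.
        m_star = max(range(2), key=lambda m: G[m] * S[m])
        b = [S[m] if m == m_star else 0.0 for m in range(2)]
        for m in range(2):
            G[m] = max(G[m] + gamma[m] - b[m], 0.0)
            G_peak[m] = max(G_peak[m], G[m])
    return G_peak

peaks = run_ex56()
```

By the argument of part (b), the peaks should never exceed V νm + γm,max, regardless of the channel realizations.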

Exercise 5.7. (1-Hop Wireless System with Random Arrivals) Consider the same system as Exercise 5.6,
with the exception that we have random arrivals Am(t) and:

Qm(t + 1) = max[Qm(t) − b̂m(α(t), ω(t)), 0] + xm(t)

where xm(t) is a flow control decision, made subject to 0 ≤ xm(t) ≤ Am(t). We want to maximize
φ(x̄).
a) State the new algorithm for this case.
b) Suppose 0 ≤ Am(t) ≤ Am,max for some finite constants Am,max. Suppose φ has the
structure of Exercise 5.6(b). Using a similar argument, show that all queues Gm(t) and Qm(t) are
deterministically bounded.

Exercise 5.8. (Imperfect Channel Knowledge) Consider the general problem of Theorem 5.3, but
under the assumption that ω(t) provides only a partial understanding of the channel for each queue
Qk(t), so that b̂k(α(t), ω(t)) is a random function of α(t) and ω(t), assumed to be i.i.d. over all slots
with the same α(t) and ω(t), and assumed to have finite second moments regardless of the choice
of α(t). Define:

βk(α, ω) ≜ E{ b̂k(α(t), ω(t)) | α(t) = α, ω(t) = ω }

Assume that the function βk(α, ω) is known. Assume the other functions x̂m(·), ŷl(·), âk(·) are
deterministic as before. State the modified algorithm that minimizes the right-hand-side of (5.84)
in this case. Hint:

E{bk(t)|Θ(t)} = E{ E{bk(t)|Θ(t), α(t), ω(t)} | Θ(t) } = E{ βk(α(t), ω(t)) | Θ(t) }

Note: Related problems with randomized service outcomes and Lyapunov drift are consid-
ered in (162)(163)(164)(154)(165)(161), where knowledge of the channel statistics is needed for
computing the βk (α, ω) functions and their generalizations, and a max-weight learning framework
is developed in (166) for the case of unknown statistics.

Exercise 5.9. (Equivalence of the Transformed Problem Using Auxiliary Variables)

a) Suppose that α*(t) is a policy that satisfies all constraints of the problem (5.71)-(5.75),
yielding time averages x̄* and ȳ*l and a cost value of ȳ*0 + f(x̄*). Show that this policy also satisfies
all constraints of the problem (5.76)-(5.81), and yields the same cost value, if we define the auxiliary
variable decisions to be γ(t) = x̄* for all t.
b) Suppose that α′(t), γ′(t) is a policy that satisfies all constraints of problem (5.76)-(5.81),
yielding time averages x̄′, ȳ′l and a cost value in (5.76) given by some value v. Show that this same
policy also satisfies all constraints of problem (5.71)-(5.75), with a cost ȳ′0 + f(x̄′) ≤ v.

Exercise 5.10. (Proof of Theorem 5.3) We make use of the following fact, analogous to Theorem
4.5: If problem (5.71)-(5.75) is feasible, then for all δ > 0 there exists an ω-only policy α*(t) ∈ A_ω(t)
such that E{x̂(α*(t), ω(t))} = γ* for some vector γ*, and:

E{ŷ0(α*(t), ω(t))} + f(γ*) ≤ y0^opt + f^opt + δ
E{ŷl(α*(t), ω(t))} + gl(γ*) ≤ δ,   ∀l ∈ {1, . . . , L}
E{âk(α*(t), ω(t)) − b̂k(α*(t), ω(t))} ≤ δ,   ∀k ∈ {1, . . . , K}
dist(γ*, X ∩ R) ≤ δ

For simplicity, in this proof, we assume the above holds for δ = 0, and that all actual and virtual
queues are initially empty. Further assume that the functions f(γ) and gl(γ) are Lipschitz continuous,
so that there are positive constants νm, βl,m such that for all x(t) and γ(t), we have:

|f(γ(t)) − f(x(t))| ≤ Σ_{m=1}^{M} νm |γm(t) − xm(t)|
|gl(γ(t)) − gl(x(t))| ≤ Σ_{m=1}^{M} βl,m |γm(t) − xm(t)|,   ∀l ∈ {1, . . . , L}
5.8. EXERCISES 133
a) Plug the above policy α*(t), together with the constant auxiliary vector γ(t) = γ*, into the
right-hand-side of the drift bound (5.84) and add C (because of the C-additive approximation) to
derive a simpler bound on the drift expression. The resulting right-hand-side should be: D + C +
V(y0^opt + f^opt).
b) Use the Lyapunov optimization theorem to prove that for all t > 0:

(1/t) Σ_{τ=0}^{t−1} E{ y0(τ) + f(γ(τ)) } ≤ y0^opt + f^opt + (D + C)/V

and hence, by Jensen's inequality (with ȳ0(t) and γ̄(t) defined by (5.24)):

ȳ0(t) + f(γ̄(t)) ≤ y0^opt + f^opt + (D + C)/V

c) Manipulate the drift bound of part (a) to prove that Δ(Θ(t)) ≤ W for some finite constant
W. Conclude that all virtual and actual queues are mean rate stable, and that (4.7) holds for all t > 0,
and so E{|Hm(t)|}/t ≤ √(2W/t).
d) Use (5.83) and (4.42) to prove that for all m ∈ {1, . . . , M}:

0 ≤ lim_{t→∞} |x̄m(t) − γ̄m(t)| = lim_{t→∞} |E{Hm(t)}|/t ≤ lim_{t→∞} E{|Hm(t)|}/t = 0

Argue that γ̄(t) ∈ X ∩ R for all t, and hence (5.87) holds.
e) Use part (b) and the Lipschitz assumptions to prove (5.85).
f) Use (5.82), Theorem 2.5, and the Lipschitz conditions to prove (5.86).

Exercise 5.11. (Profit Risk and Non-Convexity) Consider a K-queue system described by (5.1),
with arrival and service functions âk(α(t), ω(t)) and b̂k(α(t), ω(t)). Let p(t) = p̂(α(t), ω(t)) be a
random profit variable that is i.i.d. over all slots with the same α(t) and ω(t), and that has a finite
second moment regardless of the policy. Define:

φ(α, ω) ≜ E{ p̂(α(t), ω(t)) | α(t) = α, ω(t) = ω }
ψ(α, ω) ≜ E{ p̂(α(t), ω(t))² | α(t) = α, ω(t) = ω }

and assume the functions φ(α, ω), ψ(α, ω) are known. The goal is to stabilize all queues while
maximizing a linear combination of the profit minus the variance of the profit (where variance
is a proxy for “risk”). Specifically, define the variance Var(p) as the time average expectation of
p(t)² minus the square of the time average expectation of p(t), where (as usual) the overbar notation
h̄ represents the time average expectation of a given process h(t). We want to maximize
θ1 p̄ − θ2 Var(p), where θ1 and θ2 are positive constants.
a) Define attributes p1(t) = p(t), p2(t) = p(t)². Write the problem using p̄1 and p̄2 in the
form of (5.88)-(5.91), and show this is a non-convex stochastic network optimization problem.
b) State the “primal-dual” algorithm that minimizes the right-hand-side of (5.93) in this
context. Hint: Note that:

E{p1(t)|Θ(t)} = E{ E{p1(t)|Θ(t), α(t), ω(t)} | Θ(t) } = E{ φ(α(t), ω(t)) | Θ(t) }

Exercise 5.12. (Optimization without Auxiliary Variables (17)(18)) Consider the problem (5.2)-
(5.5). Assume there is a vector γ′ = (γ′1, . . . , γ′M), called the optimal operating point, such that
φ(γ′) = φ′, where φ′ is the maximum utility for the problem. Assume that there is an ω-only
policy α′(t) such that for all possible values of ω(t), we have:

x̂m(α′(t), ω(t)) = γ′m   ∀m ∈ {1, . . . , M}                                (5.134)
E{âk(α′(t), ω(t))} ≤ E{b̂k(α′(t), ω(t))}   ∀k ∈ {1, . . . , K}              (5.135)
E{ŷl(α′(t), ω(t))} ≤ 0   ∀l ∈ {1, . . . , L}                               (5.136)

The assumptions (5.134)-(5.136) are restrictive, particularly because (5.134) must hold determin-
istically for all ω(t) realizations. However, these assumptions can be shown to hold for the special
case when xm(t) represents the amount of data admitted to a network from a source m when: (i) All
sources are “infinitely backlogged” and hence always have data to send, and (ii) Data can be admitted
as a real number.
The Lyapunov drift can be shown to satisfy the following for some constant B > 0:

Δ(Θ(t)) − V E{ φ(x̂(α(t), ω(t))) | Θ(t) } ≤ B
    − V E{ φ(x̂(α(t), ω(t))) | Θ(t) }
    + Σ_{l=1}^{L} Zl(t) E{ ŷl(α(t), ω(t)) | Θ(t) }
    + Σ_{k=1}^{K} Qk(t) E{ âk(α(t), ω(t)) − b̂k(α(t), ω(t)) | Θ(t) }

Suppose every slot we observe Θ(t) and ω(t) and choose an action α(t) that minimizes the right-
hand-side of the above drift inequality.
a) Assume ω(t) is i.i.d. over slots. Plug the alternative policy α′(t) into the right-hand-side
above to get a greatly simplified drift expression.
b) Conclude from part (a) that Δ(Θ(t)) ≤ D + V(φ^max − φ′) for all t, for some finite constant
D, where φ^max is an upper bound on the instantaneous value of φ(x̂(·)) (assumed to
be finite). Conclude that all actual and virtual queues are mean rate stable, and hence all desired
inequality constraints are satisfied.
c) Use Jensen's inequality and part (a) (with iterated expectations and telescoping sums) to
conclude that for all t > 0, we have:

φ(x̄(t)) ≥ (1/t) Σ_{τ=0}^{t−1} E{φ(x(τ))} ≥ φ′ − B/V − E{L(Θ(0))}/(V t)
where x̄(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{x(τ)} and x(τ) ≜ x̂(α(τ), ω(τ)).

Exercise 5.13. (Delay-Limited Transmission (71)) Consider a K-user wireless system with arrival
vector A(t) = (A1 (t), . . . , AK (t)) and channel state vector S (t) = (S1 (t), . . . , SK (t)) for each
slot t ∈ {0, 1, 2, . . .}. There is no queueing, and all data must either be transmitted in 1 slot or
dropped (similar to the delay-limited capacity formulation of (70)). Thus, there are no actual queues

in the system. Define ω(t)= [A(t), S (t)] as the random network event observed every slot. Define
α(t) ∈ Aω(t) as a general control action, which affects how much of the data to transmit and the
amount of power used according to general functions μ̂k (α, ω) and p̂(α, ω):

μ(t) = (μ̂1 (α(t), ω(t)), . . . , μ̂K (α(t), ω(t))) , p(t) = p̂(α(t), ω(t))

where μ(t) = (μ1 (t), . . . , μK (t)) is the transmission vector and p(t) is the power used on slot t.
Assume these are constrained as follows for all slots t:

0 ≤ μk (t) ≤ Ak (t) ∀k ∈ {1, . . . , K} , 0 ≤ p(t) ≤ pmax

for some finite constant pmax. Assume that A_k(t) ≤ A_k^max for all t, for some finite constants A_k^max for k ∈ {1, . . . , K}. Let μ̄ be the time average expectation of the transmission vector μ(t), and let φ(μ̄) be a continuous, concave, and entrywise non-decreasing utility function of μ̄. The goal is to solve the following problem:

Maximize:    φ(μ̄)
Subject to:  p̄ ≤ P_av

where p̄ is the time average expected power expenditure, and P_av is a pre-specified average power constraint. This is a special case of the general problem (5.2)-(5.5).
a) Use auxiliary variables γ(t) = (γ_1(t), . . . , γ_K(t)) subject to 0 ≤ γ_k(t) ≤ A_k^max for all t, k
to write the corresponding transformed problem (5.13)-(5.18) for this case.
b) State the drift-plus-penalty algorithm that solves this transformed problem. Hint: Use a virtual queue Z(t) to enforce the constraint p̄ ≤ P_av, and use virtual queues G_k(t) to enforce the constraints μ̄_k ≥ γ̄_k for all k ∈ {1, . . . , K}.

Exercise 5.14. (Delay-Limited Transmission with Errors (71)) Consider the same system as Ex-
ercise 5.13, but now assume that transmissions can have errors, so that μk (t) = μ̂k (α(t), ω(t)) is a
random transmission outcome (as in Exercise 5.8), assumed to be i.i.d. over all slots with the same

α(t) and ω(t), with known expectations βk (α(t), ω(t))= E {μk (t)|α(t), ω(t)} for all k ∈ {1, . . . , K}.
Use iterated expectations (as in Exercise 5.8) to redesign the drift-plus-penalty algorithm for this
case. Multi-slot versions of this problem are treated in Section 7.6.1.

CHAPTER 6

Approximate Scheduling
This chapter focuses on the max-weight problem that arises when scheduling for stability or maxi-
mum throughput-utility in a wireless network with interference. Previous chapters showed the key
step is maximizing the expectation of a weighted sum of link transmission rates, or coming within
an additive constant C of the maximum. Specifically, consider a (possibly multi-hop) network with
L links, and let b(t) = (b1 (t), . . . , bL (t)) be the transmission rate offered over link l ∈ {1, . . . , L}
on slot t. The goal is to make (possibly randomized) decisions for b(t) to come within an additive
constant C of maximizing the following expectation:

Σ_{l=1}^{L} W_l(t) E{b_l(t) | W(t)}   (6.1)

where the expectation is with respect to the possibly random decision, and where W (t) =
(W1 (t), . . . , WL (t)) is a vector of weights for slot t. The weights are related to queue backlogs
for single-hop problems and differential backlogs for multi-hop problems. Algorithms that accom-
plish this for a given constant C ≥ 0 every slot are called C-additive approximations. For problems of
network stability, previous chapters showed that C-additive approximations can be used to stabilize
the network whenever arrival rates are inside the network capacity region, with average backlog and
delay bounds that grow linearly with C. For problems of maximum throughput-utility, Chapter 5
showed that C-additive approximations can be used with a simple flow control rule to give utility that
is within (B + C)/V of optimality (where B is a fixed constant and V is any non-negative parame-
ter chosen as desired), with average backlog that grows linearly in both V and C. Thus, C-additive
approximations can be used to push network utility arbitrarily close to optimal, as determined by
the parameter V .
Such max-weight problems can be very complex for wireless networks with interference. This is because a transmission on one link can affect transmissions on many other links. Thus, transmission
decisions are coupled throughout the network. In this chapter, we first consider a class of interference
networks without time varying channels and develop two C-additive approximation algorithms for
this context. The first is a simple algorithm based on trading off computation complexity and delay.
The second is a more elegant randomized transmission technique that admits a simple distributed
implementation. We then present a multiplicative approximation theorem that holds for general
networks with possibly time-varying channels. It guarantees constant factor throughput results for
algorithms that schedule transmissions within a multiplicative constant of the max-weight solution
every slot.

6.1 TIME-INVARIANT INTERFERENCE NETWORKS


Suppose the network is time invariant, in that the channel conditions do not change and the trans-
mission rate options are the same for all slots t ∈ {0, 1, 2, . . .}. Assume that all transmissions are
in units of packets, and each link can transmit at most one packet per slot. The transmission rate
vector b(t) = (b1 (t), . . . , bL (t)) is a binary vector with bl (t) = 1 if link l transmits a packet on
slot t, and bl (t) = 0 otherwise. We say that a binary vector b(t) is feasible if the set of links that
correspond to “1” entries can be simultaneously activated for successful transmission. Define B as
the collection of all feasible binary vectors, called the link activation set (7). The set B depends on
the interference properties of the network. Every slot t, the network controller observes the current
link weights W (t) = (W1 (t), . . . , WL (t)) and chooses a (possibly random) b(t) ∈ B , with the goal
of maximizing the max-weight value (6.1). It is easy to show that the maximum is achieved by a
deterministic choice bopt (t), where:
b^opt(t) ≜ arg max_{b∈B} Σ_{l=1}^{L} W_l(t) b_l

The amount of computation required to find an optimal vector bopt (t) depends on the structure of
the set B . If this set is defined by all links that satisfy matching constraints, so that no two active links
share a node, then bopt (t) can be found in polynomial time (via a centralized algorithm). However,
the problem may be NP-hard for general sets B , so that no polynomial time solution is available.
Let C be a given non-negative constant. A C-additive approximation to the max-weight
problem finds a vector b(t) every slot t that satisfies:
Σ_{l=1}^{L} W_l(t) E{b_l(t) | W(t)} ≥ max_{b∈B} Σ_{l=1}^{L} W_l(t) b_l − C
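As a small illustration, the max-weight selection over an activation set can be computed by exhaustive search. The sketch below is not from the text: the 3-link activation set (where links 1 and 2 interfere and cannot both be ON) and the weights are hypothetical.

```python
def max_weight(W, B):
    """Return the feasible activation vector maximizing sum_l W_l * b_l,
    together with its weighted value."""
    best = max(B, key=lambda b: sum(w * x for w, x in zip(W, b)))
    return best, sum(w * x for w, x in zip(W, best))

# Hypothetical 3-link network: links 1 and 2 interfere, so they never
# appear ON together in the activation set B.
B = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1)]
W = [5.0, 3.0, 4.0]
b_opt, val = max_weight(W, B)   # b_opt = (1, 0, 1) with value 9.0
```

Any (possibly randomized) rule whose weighted sum is guaranteed to come within C of `val` on every slot is then a C-additive approximation for this instance.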

6.1.1 COMPUTING OVER MULTIPLE SLOTS


We first consider the following simple technique for obtaining a C-additive approximation with
arbitrarily low per-time slot computation complexity. Fix a positive integer T > 0, and divide the

timeline into successive intervals of T -slot frames. Define tr = rT as the start of frame r, for r ∈
{0, 1, 2, . . .}. At the beginning of each frame r ∈ {0, 1, 2, . . .}, the network controller observes the
weights W (tr ) and begins a computation to find bopt (tr ). We assume the computation is completed
within the T slot frame, possibly by exhaustively searching through all options in the set B . The
network controller then allocates the constant rate vector b^opt(t_r) for all slots of frame r + 1, while also
computing bopt (tr+1 ) during that frame. Thus, every frame r ∈ {1, 2, 3, . . .} the algorithm allocates
the constant rate vector that was computed on the previous frame. Meanwhile, it also computes the
optimal solution to the max-weight problem for the current frame (see Fig. 6.1). Thus, for any frame
r ∈ {1, 2, 3, . . .}, we have:
b(t) = b^opt(t_{r−1})  ∀t ∈ {t_r, . . . , t_r + T − 1}

[Figure: a timeline with frame boundaries t_0, t_1, t_2, t_3; during each frame r the controller computes b^opt(t_r) while implementing the vector b^opt(t_{r−1}) computed in the previous frame.]

Figure 6.1: An illustration of the frame structure for the algorithm of Section 6.1.1.

Now assume the maximum change in queue backlog over one slot is deterministically bounded,
as is the maximum change in each link weight. Specifically, assume that no link weight can change by
an amount more than θ, where θ is some positive constant. It follows that for any two slots t1 < t2 :

|Wl (t1 ) − Wl (t2 )| ≤ θ (t2 − t1 )

Under this assumption, we now compute a value C such that the above algorithm is a C-
additive approximation for all slots t ≥ T . Fix any slot t ≥ T . Let r represent the frame containing
Σ_{l=1}^{L} W_l(t) b_l(t) = Σ_{l=1}^{L} W_l(t) b_l^opt(t_{r−1})
  = Σ_{l=1}^{L} W_l(t_{r−1}) b_l^opt(t_{r−1}) + Σ_{l=1}^{L} (W_l(t) − W_l(t_{r−1})) b_l^opt(t_{r−1})
  ≥ Σ_{l=1}^{L} W_l(t_{r−1}) b_l^opt(t_{r−1}) − θ|t − t_{r−1}| Σ_{l=1}^{L} b_l^opt(t_{r−1})
  ≥ Σ_{l=1}^{L} W_l(t_{r−1}) b_l^opt(t_{r−1}) − Lθ(2T − 1)   (6.2)

Further, because b^opt(t_{r−1}) solves the max-weight problem for the weights W(t_{r−1}), we have:

Σ_{l=1}^{L} W_l(t_{r−1}) b_l^opt(t_{r−1}) = max_{b∈B} Σ_{l=1}^{L} W_l(t_{r−1}) b_l
  ≥ Σ_{l=1}^{L} W_l(t_{r−1}) b_l^opt(t)
  = Σ_{l=1}^{L} W_l(t) b_l^opt(t) − Σ_{l=1}^{L} [W_l(t) − W_l(t_{r−1})] b_l^opt(t)
  ≥ Σ_{l=1}^{L} W_l(t) b_l^opt(t) − Lθ(2T − 1)
  = max_{b∈B} Σ_{l=1}^{L} W_l(t) b_l − Lθ(2T − 1)   (6.3)

Combining (6.2) and (6.3) yields:

Σ_{l=1}^{L} W_l(t) b_l(t) ≥ max_{b∈B} Σ_{l=1}^{L} W_l(t) b_l − 2Lθ(2T − 1)

Taking conditional expectations gives:

Σ_{l=1}^{L} W_l(t) E{b_l(t) | W(t)} ≥ max_{b∈B} Σ_{l=1}^{L} W_l(t) b_l − 2Lθ(2T − 1)

It follows that this algorithm yields a C-additive approximation for C = 2Lθ (2T − 1). The constant
C is linear in the number of links L and in the frame size T .
Now let complexity represent the number of operations required to compute the max-weight
solution (assuming for simplicity that this number is independent of the size of the weights Wl (t)).
Because this complexity is amortized over T slots, the algorithm yields a per-slot computation
complexity of complexity/T . This can be made as small as desired by increasing T , with a tradeoff
of increasing the value of C linearly in T . This shows that maximum throughput can be achieved
with arbitrarily low per-time slot complexity, with a tradeoff in average queue backlog and average
delay.
This technique was used in (167)(168) to reduce the per-slot complexity of scheduling in
N × N packet switches. The max-weight problem for N × N packet switches is a max-weight
matching problem that can be computed in time that is polynomial in N. The work (168) uses this
to provide a smooth complexity-delay tradeoff for switches, showing average delay of O(N^{4−α}) is
possible with per-slot complexity O(N^α), for any α such that 0 ≤ α ≤ 3.
Unfortunately, the max-weight problem for networks with general activation sets B may be
NP-hard, so that the only available computation algorithms have complexity that is exponential in
the network size L. This means the frame size T must be chosen to be at least exponential in L to
achieve polynomial per-slot complexity, which in turn incurs delay that is exponential in L.
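A minimal sketch of this frame-based scheme follows; the two-link activation set, the slowly varying weights, and the frame length are hypothetical, and the exhaustive `max_weight` search stands in for whatever expensive solver has its cost amortized over the T slots of each frame.

```python
def frame_based_schedule(weights, B, T):
    """For each frame r >= 1, allocate the max-weight vector computed from
    the weights observed at the start of frame r-1; slots of frame 0 are
    left idle here, since nothing has been computed yet.

    weights[t] is the weight vector W(t); returns the list of b(t)."""
    def max_weight(W):
        return max(B, key=lambda b: sum(w * x for w, x in zip(W, b)))

    num_links = len(weights[0])
    schedule = [(0,) * num_links for _ in weights]
    r = 1
    while (r + 1) * T <= len(weights):
        b_prev = max_weight(weights[(r - 1) * T])  # found during frame r-1
        for t in range(r * T, (r + 1) * T):
            schedule[t] = b_prev
        r += 1
    return schedule

# Hypothetical 2-link network with mutually exclusive links and slowly
# varying weights, so the one-frame-old decision stays near max-weight:
B = [(0, 0), (1, 0), (0, 1)]
weights = [[3.0 + 0.01 * t, 1.0] for t in range(12)]
schedule = frame_based_schedule(weights, B, T=3)
```

Because the weights change by at most a bounded amount per slot, the stale vector implemented in each frame is a C-additive approximation with C growing linearly in T, as shown above.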

6.1.2 RANDOMIZED SEARCHING FOR THE MAX-WEIGHT SOLUTION


The first low-complexity algorithm for full-throughput scheduling in time-invariant interference
networks was perhaps (169), where new link activations are tried randomly and compared in the max-
weight metric against the previously tried activation. This is analyzed with a different Markov chain
argument in (169). However, intuitively this works for the same reason as the frame-based scheme
presented in the previous subsection: The randomized selection can be viewed as a (randomized)
computation algorithm that solves the max-weight problem over a (variable-length) frame. The optimal solution is found after a random number of slots T, where T is geometric with success probability equal to the number of optimal vectors in B divided by the size of the set B. While
the implementation of the algorithm is more elegant than the deterministic computation method
described in the previous subsection, its resulting delay bounds can be worse. For example, in an N × N packet switch, the randomized method yields complexity that is O(N) and an average delay
bound of O(N!). However, the deterministic method of (168) can achieve complexity that is O(N)
with an average delay bound of O(N^3). This is achieved by using α = 1 in the smooth complexity-
delay tradeoff curve described in the previous subsection. A variation on the randomized algorithm
of (169) for more complex networks is developed in (170).
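The pick-and-compare idea can be sketched as follows. To keep the example self-contained, the weights are held fixed (in the algorithm of (169) they are queue-based and change by a bounded amount each slot), and the activation set is the hypothetical one used here for illustration.

```python
import random

def pick_and_compare(W, B, num_slots, seed=0):
    """Each slot, draw a uniformly random candidate activation from B and
    keep it if it improves the max-weight metric; otherwise reuse the
    previously kept vector."""
    rng = random.Random(seed)
    weight = lambda b: sum(w * x for w, x in zip(W, b))
    b = rng.choice(B)
    history = []
    for _ in range(num_slots):
        candidate = rng.choice(B)
        if weight(candidate) > weight(b):
            b = candidate
        history.append(b)
    return history

# Hypothetical 3-link activation set (links 1 and 2 interfere) and weights:
B = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1)]
W = [5.0, 3.0, 4.0]
history = pick_and_compare(W, B, 200)
```

With fixed weights, the kept vector reaches the max-weight activation after a geometrically distributed number of slots whose success probability per slot is (number of optimal vectors)/|B|.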
All known methods for achieving throughput-utility within ε of optimality for networks with general interference constraints (and for arbitrary ε > 0) have either non-polynomial per-slot
complexity, or non-polynomial delays and/or convergence times. This is not surprising: Suppose the
problem of maximizing the number of activated links is NP-hard. If we can design an algorithm that,
after a polynomial time T , has produced a throughput that is within 1/2 from the maximum sum
throughput with high probability, then this algorithm (with high probability) must have selected a
vector b(t) that is a max-size vector during some slot t ∈ {0, . . . , T } (else, the throughput would
be at least 1 away from optimal). Thus, this could be used as a randomized algorithm for finding
a max-size vector in polynomial time. Related NP-hardness results are developed in (171) for pure
stability problems with low delay, even when arrival rates are very low.

6.1.3 THE JIANG-WALRAND THEOREM


Here we present a randomized algorithm that produces a C-additive approximation by allocating
a link vector b(t) according to the steady state solution of a particular reversible Markov chain. The
Markov chain can easily be simulated, and it has a simple relation to distributed scheduling in a
carrier sense multiple access (CSMA) system. Further, if the vector is chosen according to the desired
distribution every slot t, the value of C that this algorithm produces is linear in the network size,
and hence this yields maximum throughput with polynomial delay. We first present the result, and
then discuss the complexity associated with generating a vector with the desired distribution, related
to the convergence time required for the Markov chain to approach steady state.
The following randomized algorithm for choosing b(t) ∈ B was developed in (172) for wire-
less systems with general interference constraints, and in (173) for scheduling in optical networks:
Max Link Weight Plus Entropy Algorithm: Every slot t, observe the current link weights
W (t) = (W1 (t), . . . , WL (t)) and choose b(t) by randomly selecting a binary vector b =
(b1 , . . . , bL ) ∈ B with probability distribution:
p∗(b) ≜ Pr[b(t) = b] = ( Π_{l=1}^{L} exp(W_l(t) b_l) ) / A   (6.4)
where A is a normalizing constant that makes the distribution sum to 1.
The work (172) motivates this algorithm by the modified problem that computes a probability
distribution p(b) over the set B to solve the following:
Maximize:    −Σ_{b∈B} p(b) log(p(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} W_l(t) b_l   (6.5)
Subject to:  p(b) ≥ 0 ∀b ∈ B,   Σ_{b∈B} p(b) = 1   (6.6)

where log(·) denotes the natural logarithm. This problem is equivalent to maximizing H(p(·)) + Σ_{l=1}^{L} W_l(t) E{b_l(t) | W(t)}, where H(p(·)) is the entropy (in nats) associated with the probability
distribution p(b), and E {bl (t)|W (t)} is the expected transmission rate over link l given that b(t)
is selected according to the probability distribution p(b). However, note that because the set B
contains at most 2L link activation sets, and the entropy of any probability distribution that contains
at most k probabilities is at most log(k), we have for any probability distribution p(b):
0 ≤ −Σ_{b∈B} p(b) log(p(b)) ≤ L log(2)

It follows that if we can find a probability distribution p(b) to solve the problem (6.5)-(6.6), then this
produces a C-additive approximation to the max-weight problem (6.1), with C = L log(2). It follows
that such an algorithm can yield full throughput optimality, and can come arbitrarily close to utility
optimality, with an average backlog and delay expression that is polynomial in the network size.
Remarkably, the next theorem, developed in (172), shows that the probability distribution (6.4) is
the desired distribution, in that it exactly solves the problem (6.5)-(6.6). Thus, the max link-weight-
plus-entropy algorithm is a C-additive approximation for the max-weight problem.

Theorem 6.1 (Jiang-Walrand Theorem (172)) The probability distribution p∗ (b) that solves (6.5)
and (6.6) is given by (6.4).

Proof. The proof follows directly from the analysis techniques used in (172), although we organize
the proof differently below. We first compute the value of the maximization objective under the
particular distribution p ∗ (b) given in (6.4). We have:
−Σ_{b∈B} p∗(b) log(p∗(b)) + Σ_{b∈B} p∗(b) Σ_{l=1}^{L} W_l(t) b_l
  = Σ_{b∈B} p∗(b) log(A) − Σ_{b∈B} p∗(b) Σ_{l=1}^{L} W_l(t) b_l + Σ_{b∈B} p∗(b) Σ_{l=1}^{L} W_l(t) b_l
  = log(A)
where we have used the fact that p∗ (b)
is a probability distribution and hence sums to 1. We now
show that the expression in the objective of (6.5) for any other distribution p(b) is no larger than
log(A), so that p ∗ (b) is optimal for this objective. To this end, consider any other distribution p(b).
We have:
−Σ_{b∈B} p(b) log(p(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} W_l(t) b_l
  = −Σ_{b∈B} p(b) log( p∗(b) · [p(b)/p∗(b)] ) + Σ_{b∈B} p(b) Σ_{l=1}^{L} W_l(t) b_l
  = −Σ_{b∈B} p(b) log( p(b)/p∗(b) ) − Σ_{b∈B} p(b) log(p∗(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} W_l(t) b_l
  ≤ −Σ_{b∈B} p(b) log(p∗(b)) + Σ_{b∈B} p(b) Σ_{l=1}^{L} W_l(t) b_l   (6.7)
  = −Σ_{b∈B} p(b) log(1/A) − Σ_{b∈B} p(b) Σ_{l=1}^{L} W_l(t) b_l + Σ_{b∈B} p(b) Σ_{l=1}^{L} W_l(t) b_l
  = log(A)

where in (6.7), we have used the well known Kullback-Leibler divergence result, which states that
the divergence between any two distributions p∗ (b) and p(b) is non-negative (174):
d_KL(p‖p∗) ≜ Σ_{b∈B} p(b) log( p(b)/p∗(b) ) ≥ 0

Thus, the maximum value of the objective function (6.5) is log(A), which is achieved by the distri-
bution p ∗ (b), proving the result. 2
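Theorem 6.1 can also be checked numerically on a toy instance. The sketch below uses a hypothetical two-link activation set (the links interfere, so at most one can be ON); it evaluates the objective (6.5) under p∗ and under many random distributions.

```python
import math
import random

def objective(p, B, W):
    """Entropy (in nats) plus expected weighted sum rate, as in (6.5)."""
    ent = -sum(p[b] * math.log(p[b]) for b in B if p[b] > 0)
    rate = sum(p[b] * sum(w * x for w, x in zip(W, b)) for b in B)
    return ent + rate

# Hypothetical 2-link activation set: the links interfere, at most one ON.
B = [(0, 0), (1, 0), (0, 1)]
W = [1.0, 2.0]
A = sum(math.exp(sum(w * x for w, x in zip(W, b))) for b in B)
p_star = {b: math.exp(sum(w * x for w, x in zip(W, b))) / A for b in B}

val_star = objective(p_star, B, W)     # equals log(A), as the proof shows
rng = random.Random(0)
best_random = 0.0
for _ in range(1000):
    raw = [rng.random() for _ in B]
    p = {b: x / sum(raw) for b, x in zip(B, raw)}
    best_random = max(best_random, objective(p, B, W))
```

Here `val_star` matches log(A) to machine precision and dominates every random distribution tried, as the theorem guarantees.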

Assume now the set B of all valid link activation vectors has a connectedness property, so that it
is possible to get from any b1 ∈ B to any other b2 ∈ B by a sequence of adding or removing single
links, where each step of the sequence produces another valid activation vector in B (this holds in
the reasonable case when removing any activated link from an activation vector in B yields another
activation vector in B ). In this case, the distribution (6.4) is particularly interesting because it is the
exact stationary distribution associated with a continuous time ergodic Markov chain with state b(v)
(where v is a continuous time variable that is not related to the discrete time index t for the current
slot). Transitions for this Markov chain take place by having each link l such that bl (v) = 1 de-
activate at times according to an independent exponential distribution with rate μ = 1, and having
each link l such that bl (v) = 0 independently activate according to an exponential distribution with
rate λl = exp(Wl (t)), provided that turning this link ON does not violate the link constraints B .
That the resulting steady state is given by (6.4) can be shown by state space truncation arguments as
in (129)(131). This has the form of a simple distributed algorithm where links independently turn
ON or OFF, with Carrier Sense Multiple Access (CSMA) telling us if it is possible to turn a new
link ON (see also (175)(172)(173)(176)(177) for details on this).
Unfortunately, we need to run such an algorithm in continuous time for a long enough time
to reach a near steady state, and this all needs to be done within one slot to implement the result. Of
course, we can use a T -slot argument as in Section 6.1.1 to allow more time to reach the steady state,
with the understanding that the queue backlog changes by an amount O(T ) that yields an additional
additive term in our C-additive approximation (see (176) for an argument in this direction using
stochastic approximation theory). However, for general networks, the convergence of the Markov
chain to near-steady-state takes a non-polynomial amount of time (else, we could solve NP-hard
problems with efficient randomized algorithms). This is because the Markov chain can get “trapped”
for long durations of time in certain sub-optimal link activations (this is compensated for in the steady
state distribution by getting “trapped” in a max-weight link activation for an even longer duration
of time). Even computing the normalizing constant A for the distribution in (6.4) is known to be a
“#P-complete” problem (178) (see also factor graph approximations in (179)). However, it is known
that for link activation sets with certain degree-2 properties, such as those formed by networks
on rings, similar Markov chains require only a small (polynomial) time to reach near steady state
(180)(181). This may explain why the simulations in (172) for networks with small degree provide
good performance.
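A discrete-time analogue of this chain (Glauber dynamics) is easy to simulate. The sketch below is an illustration, not the continuous-time CSMA chain itself, and the two-link conflict graph is hypothetical: each step picks a link uniformly at random, turns it OFF if a conflicting link is ON, and otherwise sets it ON with probability exp(W_l)/(1 + exp(W_l)), which yields stationary probabilities proportional to exp(Σ_l W_l b_l) over feasible vectors.

```python
import math
import random
from collections import Counter

def glauber_sample(W, conflicts, steps, seed=0):
    """Discrete-time Glauber dynamics over feasible link activations.
    conflicts[l] is the set of links that interfere with link l."""
    rng = random.Random(seed)
    b = [0] * len(W)
    counts = Counter()
    for _ in range(steps):
        l = rng.randrange(len(W))
        if any(b[j] for j in conflicts[l]):
            b[l] = 0                      # a conflicting link is ON
        else:
            lam = math.exp(W[l])
            b[l] = 1 if rng.random() < lam / (1 + lam) else 0
        counts[tuple(b)] += 1
    return counts

# Hypothetical 2-link network in which the links conflict:
W = [1.0, 2.0]
conflicts = {0: {1}, 1: {0}}
counts = glauber_sample(W, conflicts, steps=200000)
# The empirical ratio Pr[(0,1)]/Pr[(1,0)] should approach exp(W[1] - W[0]).
```

On this tiny instance the chain mixes quickly; the convergence-time caveats discussed above apply to large networks, where such dynamics can remain trapped in sub-optimal activations for long stretches.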

6.2 MULTIPLICATIVE FACTOR APPROXIMATIONS

While C-additive approximations can push throughput and throughput-utility arbitrarily close to
optimal, they may have large convergence times and delays as discussed in the previous section.
It is often possible to provide low complexity decisions for b(t) that come within a multiplicative
factor of the max-weight solution. This section shows that such algorithms immediately lead to
constant-factor stability and throughput-utility guarantees. The result holds for general networks,
possibly with time-varying channels, and possibly with non-binary rate vectors.
Let S(t) describe the channel randomness on slot t (i.e., the topology state), and let I(t) be the transmission action on slot t, chosen within an abstract set I_{S(t)}. The rate vector b(t) = (b_1(t), . . . , b_L(t)) is determined by a general function of I(t) and S(t):

b_l(t) = b̂_l(I(t), S(t))  ∀l ∈ {1, . . . , L}   (6.8)


Definition 6.2 Let β, C be constants such that 0 < β ≤ 1 and C ≥ 0. A (β, C)-approximation is an algorithm that makes (possibly randomized) decisions I(t) ∈ I_{S(t)} every slot t to satisfy:

Σ_{l=1}^{L} W_l(t) E{b̂_l(I(t), S(t)) | W(t)} ≥ β sup_{I∈I_{S(t)}} Σ_{l=1}^{L} W_l(t) b̂_l(I, S(t)) − C

Under this definition, a (1, C)-approximation is the same as a C-additive approximation. It is known that (β, C)-approximations can provide stability in single or multi-hop networks whenever the arrival rates are interior to βΛ, a β-scaled version of the capacity region Λ (17)(22)(19)(182). For example, if β = 1/2, then stability is only guaranteed when arrival rates are at most half the distance to the capacity region boundary (so that the region where we can provide stability guarantees shrinks by 50%). Related constant-factor guarantees are available for joint scheduling and flow control to maximize throughput-utility, where the β-scaling goes inside the utility function (see (22)(19) for a precise scaled-utility statement, (137) for applications to cognitive radio, and (154) for applications to channels with errors). Here, we prove this result only for the special case of achieving stability in a 1-hop network. This provides all of the necessary insight with the least amount of notation, and the reader is referred to the above references for proofs of the more general versions.
Consider a 1-hop network with L queues with dynamics:
Ql (t + 1) = max[Ql (t) − bl (t), 0] + al (t) ∀l ∈ {1, . . . , L}
where the service variables bl (t) are determined by I (t) and S(t) by (6.8), and a(t) =

(a1 (t), . . . , aL (t)) is the random vector of new data arrivals on slot t. Define ω(t)= [S(t), a(t)],
and assume that ω(t) is i.i.d. over slots with some probability distribution. Define λl = E {al (t)} as
the arrival rate to queue l.
Define an S-only policy as a policy that independently chooses I(t) ∈ I_{S(t)} based only on a (possibly randomized) function of the observed S(t). Define Γ as the set of all vectors (b_1, . . . , b_L) that can be achieved as 1-slot expectations under S-only policies. That is, (b_1, . . . , b_L) ∈ Γ if and only if there is an S-only policy I∗(t) that satisfies I∗(t) ∈ I_{S(t)} and:

E{b̂_l(I∗(t), S(t))} = b_l  ∀l ∈ {1, . . . , L}

where the expectation in the left-hand-side is with respect to the distribution of S(t) and the possibly randomized decision for I∗(t) that is made in reaction to the observed S(t). For simplicity, assume the set Γ is closed. Recall that for any rate vector (λ_1, . . . , λ_L) in the capacity region Λ, there exists an S-only policy I∗(t) that satisfies:

E{b̂_l(I∗(t), S(t))} ≥ λ_l  ∀l ∈ {1, . . . , L}

We say that a vector (λ_1, . . . , λ_L) is interior to the scaled capacity region βΛ if there is an ε > 0 such that:

(λ_1 + ε, . . . , λ_L + ε) ∈ βΛ
Assume second moments of the arrival and service rate processes are bounded. Define L(Q(t)) ≜ (1/2) Σ_{l=1}^{L} Q_l(t)², and recall that the Lyapunov drift satisfies (see (3.16)):

Δ(Q(t)) ≤ B + Σ_{l=1}^{L} Q_l(t) λ_l − Σ_{l=1}^{L} Q_l(t) E{b̂_l(I(t), S(t)) | Q(t)}   (6.9)

where B is a positive constant that depends on the maximum second moments.

Theorem 6.3 Consider the above 1-hop network with ω(t) i.i.d. over slots and with arrival rates (λ_1, . . . , λ_L). Fix β such that 0 < β ≤ 1. Suppose there is an ε > 0 such that:

(λ_1 + ε, . . . , λ_L + ε) ∈ βΛ   (6.10)

If a (β, C)-approximation is used for all slots t (where C ≥ 0 is a given constant), and if E{L(Q(0))} < ∞, then the network is mean rate stable and strongly stable, with average queue backlog bound:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{l=1}^{L} E{Q_l(τ)} ≤ (B + C)/ε

where B is the constant from (6.9).

Proof. Fix slot t. Because our decision I(t) yields a (β, C)-approximation for minimizing the final term in the right-hand-side of (6.9), we have:

Δ(Q(t)) ≤ B + C + Σ_{l=1}^{L} Q_l(t) λ_l − β Σ_{l=1}^{L} Q_l(t) E{b̂_l(I∗(t), S(t)) | Q(t)}   (6.11)

where I∗(t) is any other (possibly randomized) decision in the set I_{S(t)}. Because (6.10) holds, we know that:

(λ_1/β + ε/β, . . . , λ_L/β + ε/β) ∈ Λ

Thus, there exists an S-only policy I∗(t) that satisfies:

E{b̂_l(I∗(t), S(t)) | Q(t)} = E{b̂_l(I∗(t), S(t))} ≥ λ_l/β + ε/β  ∀l ∈ {1, . . . , L}

where the first equality above holds because I ∗ (t) is S-only and hence independent of the queue
backlogs Q(t). Plugging this policy into the right-hand-side of (6.11) yields:
Δ(Q(t)) ≤ B + C + Σ_{l=1}^{L} Q_l(t) λ_l − β Σ_{l=1}^{L} Q_l(t) (λ_l/β + ε/β)   (6.12)
  = B + C − ε Σ_{l=1}^{L} Q_l(t)   (6.13)

The result then follows by the Lyapunov drift theorem (Theorem 4.1). 2
The above theorem can be intuitively interpreted as follows: Any (perhaps approximate) effort
to schedule transmissions to maximize the weighted sum of transmission rates translates into good
network performance. More concretely, simple greedy algorithms with β = 1/2 and C = 0 (i.e.
(1/2, 0)-approximation algorithms) exist for networks with matching constraints (where links can be
simultaneously scheduled if they do not share a common node). Indeed, it can be shown that the
greedy maximal match algorithm that first selects the largest weight link (breaking ties arbitrarily),
then selects the next largest weight link that does not conflict with the previous one, and so on, yields
a (1/2, 0)-approximation, so that it comes within a factor β = 1/2 of the max-weight decision (see,
for example, (137)). Distributed random access versions of this that produce (β, C) approximations
are considered in (154).
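The greedy maximal-match rule described above can be compared against brute force on small random graphs. A sketch (the weighted instances are hypothetical, and the optimum is found by exhaustive search over subsets of links):

```python
import itertools
import random

def greedy_matching(edges):
    """Repeatedly select the heaviest remaining link that shares no node
    with a previously selected link; return the total selected weight.
    edges: list of (weight, u, v)."""
    total, used = 0.0, set()
    for w, u, v in sorted(edges, reverse=True):
        if u not in used and v not in used:
            total += w
            used.update((u, v))
    return total

def optimal_matching(edges):
    """Brute-force max-weight matching over all subsets of links."""
    best = 0.0
    for k in range(1, len(edges) + 1):
        for subset in itertools.combinations(edges, k):
            nodes = [n for _, u, v in subset for n in (u, v)]
            if len(nodes) == len(set(nodes)):        # no shared nodes
                best = max(best, sum(w for w, _, _ in subset))
    return best

rng = random.Random(1)
worst_ratio = 1.0
for _ in range(20):
    edges = [(rng.random(), u, v)
             for u, v in itertools.combinations(range(6), 2)
             if rng.random() < 0.4]
    opt = optimal_matching(edges)
    if opt > 0:
        worst_ratio = min(worst_ratio, greedy_matching(edges) / opt)
```

On every random instance the greedy weight stays within a factor 1/2 of the optimum, consistent with the (1/2, 0)-approximation property.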
Different forms of approximate scheduling, not based on approximating the queue-based
max-weight rule, are treated using maximal matchings for stable switch scheduling in (183)(102),
for stable wireless networks in (184)(104)(103), for utility optimization in (185), and for energy
optimization in (186).

CHAPTER 7

Optimization of Renewal Systems
Here we extend the drift-plus-penalty framework to allow optimization over renewal systems. In
previous chapters, we considered a slotted structure and assumed that every slot t a single random
event ω(t) is observed, a single action α(t) is taken, and the combination of α(t) and ω(t) generates
a vector of attributes (i.e., either penalties or rewards) for that slot. Here, we change the slot structure
to a renewal frame structure. The frame durations are variable and can depend on the decisions made
over the course of the frame. Rather than specifying a single action to take on each frame r, we must
specify a dynamic policy π [r] for the frame. A policy is a contingency plan for making a sequence
of decisions, where new random events might take place after each decision in the sequence. This
model allows a larger class of problems to be treated, including Markov Decision Problems, described
in more detail in Section 7.6.2.
An example renewal system is a wireless sensor network that is repeatedly used to perform
sensing tasks. Assume that each new task starts immediately when the previous task is completed.
The duration of each task and the network resources used depend on the policy implemented for
that task. Examples of this type are given in Section 7.4 and Exercise 7.1.

7.1 THE RENEWAL SYSTEM MODEL

[Figure: a timeline partitioned into successive frames of durations T[0], T[1], T[2], T[3], delimited by the renewal times t[0] = 0, t[1], t[2], t[3], t[4].]

Figure 7.1: An illustration of a sequence of renewal frames.

Consider a dynamic system over the continuous timeline t ≥ 0 (where t can be a real number).
We decompose the timeline into successive renewal frames. Renewal frames occur one after the other,
and the start of each renewal frame is a time when the system state is “refreshed,” which will be
made precise below. Define t[0] = 0, and let {t[0], t[1], t[2], . . .} be a strictly increasing sequence
that represents renewal events. For each r ∈ {0, 1, 2, . . .}, the interval of time [t[r], t[r + 1]) is the

rth renewal frame. Denote T [r]= t[r + 1] − t[r] as the duration of the rth renewal frame (see Fig.
7.1).
At the start of each renewal frame r ∈ {0, 1, 2, . . .}, the controller chooses a policy π [r] from
some abstract policy space P. This policy is implemented over the course of the frame. There may be a sequence of random events during each frame r, and the policy π[r] specifies decisions that are made
in reaction to these events. The size of the frame T [r] is random and may depend on the policy. Fur-
ther, the policy on frame r generates a random vector of penalties y [r] = (y0 [r], y1 [r], . . . , yL [r]).
We formally write the renewal size T [r] and the penalties yl [r] as random functions of π[r]:

T [r] = T̂ (π [r]) , yl [r] = ŷl (π [r]) ∀l ∈ {0, 1, . . . , L}

Thus, given π [r], T̂ (π[r]) and ŷl (π [r]) are random variables. We make the following renewal
assumptions:

• For any policy π ∈ P , the conditional distribution of (T [r], y [r]), given π[r] = π, is inde-
pendent of the events and outcomes from past frames, and is identically distributed for each
frame that uses the same policy π .

• The frame sizes T [r] are always strictly positive, and there are finite constants Tmin , Tmax ,
y0,min , y0,max such that for all policies π ∈ P , we have:
0 < T_min ≤ E{T̂(π[r]) | π[r] = π} ≤ T_max ,   y_{0,min} ≤ E{ŷ_0(π[r]) | π[r] = π} ≤ y_{0,max}

• There are finite constants D² and y²_{l,max} for l ∈ {1, . . . , L} such that for all π ∈ P:

E{T̂(π[r])² | π[r] = π} ≤ D²   (7.1)
E{ŷ_l(π[r])² | π[r] = π} ≤ y²_{l,max}   ∀l ∈ {1, . . . , L}   (7.2)

That is, second moments are uniformly bounded, regardless of the policy.

In the special case when the system evolves in discrete time with unit time slots, all frame
sizes T [r] are positive integers, and Tmin = 1.

7.1.1 THE OPTIMIZATION GOAL


Suppose we have an algorithm that chooses π[r] ∈ P at the beginning of each frame r ∈ {0, 1, 2, . . .}. Assume temporarily that this algorithm yields well defined frame averages T̄ and ȳ_l with probability 1, so that:

lim_{R→∞} (1/R) Σ_{r=0}^{R−1} T[r] = T̄ (w.p.1) ,   lim_{R→∞} (1/R) Σ_{r=0}^{R−1} y_l[r] = ȳ_l (w.p.1)   (7.3)
We want to design an algorithm that chooses policies π [r] over each frame r ∈ {0, 1, 2, . . .} to solve
the following problem:

Minimize: y 0 /T (7.4)
Subject to: y l /T ≤ cl ∀l ∈ {1, . . . , L} (7.5)
π [r] ∈ P ∀r ∈ {0, 1, 2, . . .} (7.6)

where (c1 , . . . , cL ) are a given collection of real numbers that define time average cost constraints for
each penalty.
The value y l /T represents the time average penalty associated with the yl [r] process. To
understand this, note that the time average penalty, sampled at renewal times, is given by:
lim_{R→∞} [ Σ_{r=0}^{R−1} yl[r] ] / [ Σ_{r=0}^{R−1} T[r] ] = [ lim_{R→∞} (1/R) Σ_{r=0}^{R−1} yl[r] ] / [ lim_{R→∞} (1/R) Σ_{r=0}^{R−1} T[r] ] = y l / T

Hence, our goal is to minimize the time average associated with the y0 [r] penalty, subject to the
constraint that the time average associated with the yl [r] process is less than or equal to cl , for all
l ∈ {1, . . . , L}.
As before, we shall find it easier to work with time average expectations of the form:

T[R] ≜ (1/R) Σ_{r=0}^{R−1} E{T[r]} ,    y l [R] ≜ (1/R) Σ_{r=0}^{R−1} E{yl[r]}    ∀l ∈ {0, 1, . . . , L}    (7.7)

Under mild boundedness assumptions on T [r] and yl [r] (for example, when these are determinis-
tically bounded), the Lebesgue dominated convergence theorem ensures that the limiting values of
T [R] and y l [R] also converge to T and y l whenever (7.3) holds (see Exercise 7.9).

7.1.2 OPTIMALITY OVER I.I.D. ALGORITHMS


Define an i.i.d. algorithm as one that, at the beginning of each new frame r ∈ {0, 1, 2, . . .}, chooses
a policy π[r] by independently and probabilistically selecting π ∈ P according to some distribution
that is the same for all frames r. Let π∗[r] represent such an i.i.d. algorithm. Then the values {T̂(π∗[r])}_{r=0}^{∞} are independent and identically distributed (i.i.d.) over frames, as are {ŷl(π∗[r])}_{r=0}^{∞}. Thus, by the law of large numbers, these have well defined averages T∗ and y l ∗ with probability 1, where the averages are equal to the expectations over one frame. We say that the problem (7.4)-(7.6)
is feasible if there is an i.i.d. algorithm π ∗ [r] that satisfies:
 
E{ŷl(π∗[r])} / E{T̂(π∗[r])} ≤ cl    ∀l ∈ {1, . . . , L}    (7.8)
Assuming feasibility, we define ratioopt as the infimum value of the following quantity over
all i.i.d. algorithms that meet the constraints (7.8):
 
E{ŷ0(π∗[r])} / E{T̂(π∗[r])}

The following lemma is an immediate consequence of these definitions:

Lemma 7.1 If there is an i.i.d. algorithm that satisfies the feasibility constraints (7.8), then for any
δ > 0 there is an i.i.d. algorithm π ∗ [r] that satisfies:
 
E{ŷ0(π∗[r])} ≤ E{T̂(π∗[r])} (ratioopt + δ)    (7.9)
E{ŷl(π∗[r])} ≤ E{T̂(π∗[r])} cl    ∀l ∈ {1, . . . , L}    (7.10)

The value ratioopt is defined in terms of i.i.d. algorithms. It can be shown that, under mild
assumptions, the value ratioopt is also the infimum of the objective function in the problem (7.4)-
(7.6), which does not restrict to i.i.d. algorithms. This is similar in spirit to Theorems 4.18 and 4.5.
However, rather than stating these assumptions and proving this result, we simply use ratioopt as
our target, so that we desire to push the time average penalty objective as close as possible to the
smallest value that can be achieved over i.i.d. algorithms.
It is often useful to additionally assume that the following “Slater” assumption holds:
Slater Assumption for Renewal Systems: There is a value ε > 0 and an i.i.d. algorithm π∗[r] such that:

E{ŷl(π∗[r])} ≤ E{T̂(π∗[r])} (cl − ε)    ∀l ∈ {1, . . . , L}    (7.11)

7.2 DRIFT-PLUS-PENALTY FOR RENEWAL SYSTEMS


For each l ∈ {1, . . . , L}, define virtual queues Zl [r] with Zl [0] = 0, and with dynamics as follows:

Zl [r + 1] = max[Zl [r] + yl [r] − cl T [r], 0] ∀l ∈ {1, . . . , L} (7.12)

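For concreteness, the update (7.12) is a single clipped recursion per frame. The following minimal Python sketch (function and variable names are illustrative, not from the text) applies it to a vector of virtual queues:

```python
def update_virtual_queues(Z, y, T, c):
    """One step of the virtual queue recursion (7.12):
    Z_l[r+1] = max(Z_l[r] + y_l[r] - c_l * T[r], 0)."""
    return [max(Zl + yl - cl * T, 0.0) for Zl, yl, cl in zip(Z, y, c)]

# Example: two constraints, one frame of size T = 2.
Z = [0.0, 1.0]
Z = update_virtual_queues(Z, y=[3.0, 0.5], T=2.0, c=[1.0, 1.0])
# Z is now [1.0, 0.0]
```

Each queue accumulates the excess of its penalty over the allowed budget cl·T[r] for the frame, clipped at zero.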
Let Z [r] be the vector of queue values, and define the Lyapunov function L(Z [r]) by:

L(Z[r]) ≜ (1/2) Σ_{l=1}^{L} Zl[r]²    (7.13)

Define the conditional Lyapunov drift Δ(Z[r]) as:

Δ(Z[r]) ≜ E{L(Z[r + 1]) − L(Z[r]) | Z[r]}
Using the same techniques as in previous chapters, it is easy to show that:


Δ(Z[r]) ≤ B + Σ_{l=1}^{L} Zl[r] E{ŷl(π[r]) − cl T̂(π[r]) | Z[r]}    (7.14)

where B is a finite constant that satisfies the following for all r and all possible Z [r]:

B ≥ (1/2) Σ_{l=1}^{L} E{(yl[r] − cl T[r])² | Z[r]}    (7.15)

Such a finite constant B exists by the boundedness assumptions (7.1)-(7.2). The drift-plus-penalty
for frame r thus satisfies:
  
Δ(Z[r]) + V E{y0[r] | Z[r]} ≤ B + V E{ŷ0(π[r]) | Z[r]} + Σ_{l=1}^{L} Zl[r] E{ŷl(π[r]) | Z[r]}
− Σ_{l=1}^{L} Zl[r] cl E{T̂(π[r]) | Z[r]}    (7.16)

This variable-frame drift methodology was developed in (56)(57) for optimizing delay in networks
defined on Markov chains. However, the analysis in (56)(57) used a policy based on minimizing the
right-hand-side of the above inequality, which was only shown to be effective for pure feasibility
problems (where ŷ0 (π [r]) = 0 for all r) or for problems where the frame durations are independent of
the policy (see also Exercise 7.3). Our algorithm below, which can be applied to the general problem,
is inspired by the decision rule in (58), which minimizes the ratio of expected drift-plus-penalty
over expected frame size.
Renewal-Based Drift-Plus-Penalty Algorithm: At the beginning of each frame r ∈ {0, 1, 2, . . .},
observe Z [r] and do the following:
• Choose a policy π[r] ∈ P that minimizes the following ratio:
E{V ŷ0(π[r]) + Σ_{l=1}^{L} Zl[r] ŷl(π[r]) | Z[r]} / E{T̂(π[r]) | Z[r]}    (7.17)

• Update the virtual queues Zl [r] by (7.12).


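When P is a finite set of policies whose per-frame expectations are known (or estimated), the ratio rule (7.17) is a simple argmin over policies. The sketch below assumes each policy is summarized by a hypothetical dict of expected values; none of these names come from the text:

```python
def choose_policy(policies, Z, V):
    """Pick the policy minimizing the drift-plus-penalty ratio of (7.17):
    (V*E[y0] + sum_l Z_l*E[y_l]) / E[T].
    Each policy is a dict with keys 'ET', 'Ey0', 'Ey' (list of E[y_l])."""
    def ratio(p):
        num = V * p['Ey0'] + sum(Zl * Eyl for Zl, Eyl in zip(Z, p['Ey']))
        return num / p['ET']
    return min(policies, key=ratio)

# Two hypothetical policies:
p1 = {'ET': 1.0, 'Ey0': 1.0, 'Ey': [2.0]}
p2 = {'ET': 2.0, 'Ey0': 3.0, 'Ey': [1.0]}
best = choose_policy([p1, p2], Z=[10.0], V=1.0)
# ratios: p1 -> (1 + 10*2)/1 = 21, p2 -> (3 + 10*1)/2 = 6.5, so p2 wins
```

Note that a large virtual queue Zl steers the choice toward policies with small E[yl], which is how the constraints are enforced over time.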
As before, we define a C-additive approximation to the ratio-minimizing decision as follows.

Definition 7.2 A policy π [r] is a C-additive approximation of the policy that minimizes (7.17) if:
E{V ŷ0(π[r]) + Σ_{l=1}^{L} Zl[r] ŷl(π[r]) | Z[r]} / E{T̂(π[r]) | Z[r]} ≤ C + inf_{π′∈P} [ E{V ŷ0(π′) + Σ_{l=1}^{L} Zl[r] ŷl(π′) | Z[r]} / E{T̂(π′) | Z[r]} ]

In particular, if policy π[r] is a C-additive approximation, then:

E{V ŷ0(π[r]) + Σ_{l=1}^{L} Zl[r] ŷl(π[r]) | Z[r]} ≤ C Tmax
+ E{T̂(π[r]) | Z[r]} · [ E{V ŷ0(π∗[r]) + Σ_{l=1}^{L} Zl[r] ŷl(π∗[r])} / E{T̂(π∗[r])} ]    (7.18)

where π∗[r] is any i.i.d. algorithm that is chosen in P and is independent of queues Z[r]. In the above inequality, we have used the fact that:

E{T̂(π[r]) | Z[r]} ≤ Tmax

Theorem 7.3 (Renewal-Based Drift-Plus-Penalty Performance) Assume there is an i.i.d. algorithm


π ∗ [r] that satisfies the feasibility constraints (7.8). Suppose we implement the above renewal-based drift-
plus-penalty algorithm using a C-additive approximation for all frames r, with initial condition Zl [0] = 0
for all l ∈ {1, . . . , L}. Then:
a) All queues Zl[r] are mean rate stable, in that:

lim_{R→∞} E{Zl[R]} / R = 0    ∀l ∈ {1, . . . , L}

b) For all l ∈ {1, . . . , L}, we have:

lim sup_{R→∞} (y l [R] − cl T[R]) ≤ 0    and so    lim sup_{R→∞} y l [R] / T[R] ≤ cl

where y l [R] and T [R] are defined in (7.7).


c) The penalty process y0[r] satisfies the following for all R > 0:

y 0 [R] − ratioopt T[R] ≤ (B + C Tmax) / V
where B is defined in (7.15).
d) If the Slater assumption (7.11) holds for a constant ε > 0, then all queues Zl[r] are strongly
stable and satisfy the following for all R > 0:

(1/R) Σ_{r=0}^{R−1} Σ_{l=1}^{L} E{Zl[r]} ≤ V F / (ε Tmin)    (7.19)
where the constant F is defined below in (7.22). Further, if for all l ∈ {1, . . . , L}, yl [r] − cl T [r] is either
deterministically lower bounded or deterministically upper bounded, then queues Zl [r] are rate stable and:
lim sup_{R→∞} [ (1/R) Σ_{r=0}^{R−1} yl[r] ] / [ (1/R) Σ_{r=0}^{R−1} T[r] ] ≤ cl    ∀l ∈ {1, . . . , L}    (w.p.1)

Proof. (Theorem 7.3) Because we use a C-additive approximation every frame r, we know that
(7.18) holds. Plugging the i.i.d. algorithm π ∗ [r] from (7.18) into the right-hand-side of the drift-
plus-penalty inequality (7.16) yields:
 
Δ(Z[r]) + V E{y0[r] | Z[r]} ≤ B + C Tmax + [ E{T̂(π[r]) | Z[r]} / E{T̂(π∗[r])} ] V E{ŷ0(π∗[r])}
+ [ E{T̂(π[r]) | Z[r]} / E{T̂(π∗[r])} ] Σ_{l=1}^{L} Zl[r] E{ŷl(π∗[r])} − Σ_{l=1}^{L} Zl[r] cl E{T̂(π[r]) | Z[r]}    (7.20)

where π ∗ [r] is any policy in P . Now fix δ > 0, and plug into the right-hand-side of (7.20) the policy
π ∗ [r] that satisfies (7.9)-(7.10), which makes decisions independent of Z [r], to yield:
 
Δ(Z[r]) + V E{y0[r] | Z[r]} ≤ B + C Tmax + E{T̂(π[r]) | Z[r]} V (ratioopt + δ)

The above holds for all δ > 0. Taking a limit as δ → 0 yields:


 
Δ(Z[r]) + V E{y0[r] | Z[r]} ≤ B + C Tmax + E{T̂(π[r]) | Z[r]} V ratioopt    (7.21)

To prove part (a), we can rearrange (7.21) to yield:

Δ(Z[r]) ≤ B + C Tmax + V max[ratioopt Tmax , ratioopt Tmin ] − V y0,min

where we use max[ratioopt Tmax , ratioopt Tmin ] because ratioopt may be negative. This proves
that all components Zl [r] are mean rate stable by Theorem 4.1, proving part (a). The first lim sup
statement in part (b) follows immediately from mean rate stability of Zl [r] (via Theorem 2.5(b)).
The second lim sup statement in part (b) follows from the first (see Exercise 7.4).
To prove part (c), we take expectations of (7.21) to find:
 
E{L(Z[r + 1])} − E{L(Z[r])} + V E{y0[r]} ≤ B + C Tmax + E{T̂(π[r])} V ratioopt

Summing over r ∈ {0, . . . , R − 1} and dividing by RV yields:


[ E{L(Z[R])} − E{L(Z[0])} ] / (RV) + (1/R) Σ_{r=0}^{R−1} E{y0[r]} ≤ (B + C Tmax)/V + ratioopt (1/R) Σ_{r=0}^{R−1} E{T[r]}

Using the definitions of y 0 [R] and T [R] in (7.7) and noting that E {L(Z [R])} ≥ 0 and
E {L(Z [0])} = 0 yields:

y 0 [R] ≤ (B + C Tmax)/V + ratioopt T[R]
This proves part (c).
Part (d) follows from plugging the policy π ∗ [r] from (7.11) into (7.20) to obtain:
 
Δ(Z[r]) + V E{y0[r] | Z[r]} ≤ B + C Tmax + V [ E{T̂(π[r]) | Z[r]} / E{T̂(π∗[r])} ] y0,max − ε Tmin Σ_{l=1}^{L} Zl[r]

This can be written in the form:



Δ(Z[r]) ≤ V F − ε Tmin Σ_{l=1}^{L} Zl[r]

where the constant F is defined:


 
F ≜ (B + C Tmax)/V + max[ (Tmax/Tmin) y0,max , (Tmin/Tmax) y0,max ] − y0,min    (7.22)

Thus, from Theorem 4.1, we have that (7.19) holds, so that all queues Zl [r] are strongly stable.
In the special case when the yl [r] − cl T [r] are deterministically bounded, we have by the Strong
Stability Theorem (Theorem 2.8) that all queues are rate stable. Thus, by Theorem 2.5(a):
 
lim sup_{R→∞} [ (1/R) Σ_{r=0}^{R−1} yl[r] − cl (1/R) Σ_{r=0}^{R−1} T[r] ] ≤ 0    (w.p.1)

However:
[ (1/R) Σ_{r=0}^{R−1} yl[r] ] / [ (1/R) Σ_{r=0}^{R−1} T[r] ] − cl ≤ max[ (1/R) Σ_{r=0}^{R−1} yl[r] / ( (1/R) Σ_{r=0}^{R−1} T[r] ) − cl , 0 ]
= max[ (1/R) Σ_{r=0}^{R−1} (yl[r] − cl T[r]) , 0 ] · [ 1 / ( (1/R) Σ_{r=0}^{R−1} T[r] ) ]    (7.23)

Further, because for all r ∈ {1, 2, . . .} we have E{T[r] | T[0], T[1], . . . , T[r − 1]} ≥ Tmin and E{T[r]² | T[0], T[1], . . . , T[r − 1]} ≤ D², from Lemma 4.3 it follows that:

lim inf_{R→∞} (1/R) Σ_{r=0}^{R−1} T[r] ≥ Tmin > 0    (w.p.1)
and so taking a lim sup of (7.23) yields:
lim sup_{R→∞} [ (1/R) Σ_{r=0}^{R−1} yl[r] / ( (1/R) Σ_{r=0}^{R−1} T[r] ) − cl ] ≤ 0 × (1/Tmin) = 0    (w.p.1)

This proves part (d). 2


The above theorem shows that time average penalty can be pushed to within O(1/V ) of
optimal (for arbitrarily large V ). The tradeoff is that the virtual queues are O(V ) in size, which
affects the time required for the penalties to be close to their required time averages cl .

7.2.1 ALTERNATE FORMULATIONS


In some cases, we care more about y l itself, rather than y l /T . Consider the following variation of
problem (7.4)-(7.6):

Minimize: y 0 /T
Subject to: y l ≤ 0 ∀l ∈ {1, . . . , L}
π [r] ∈ P ∀r ∈ {0, 1, 2, . . .}

This changes the constraints from y l /T ≤ cl to y l ≤ 0. However, this is just a special case of the
original problem (7.4)-(7.6) with cl = 0.
Now suppose we seek to minimize y 0 , rather than y 0 /T . The problem is:

Minimize: y 0
Subject to: y l /T ≤ cl ∀l ∈ {1, . . . , L}
π [r] ∈ P ∀r ∈ {0, 1, 2, . . .}

This problem has a significantly different structure than (7.4)-(7.6), and it is considerably easier to
solve. Indeed, Exercise 7.3 shows that it can be solved by minimizing an expectation every frame,
rather than a ratio of expectations.
Finally, we note that Exercise 7.5 explores an alternative algorithm for the original problem
(7.4)-(7.6). The alternative uses only a minimum of an expectation every frame, rather than a ratio
of expectations.

7.3 MINIMIZING THE DRIFT-PLUS-PENALTY RATIO


We re-write the drift-plus-penalty ratio (7.17) in the following simplified form:
E{a(π)} / E{b(π)}
where a(π ) represents the numerator and b(π ) the denominator, both expressed as a function of the
policy π ∈ P . We note that Tmax ≥ E {b(π )} ≥ Tmin > 0 for all π ∈ P . Define θ ∗ as the infimum
of the above ratio:

θ∗ ≜ inf_{π∈P} E{a(π)} / E{b(π)}    (7.24)
We want to understand how to find θ ∗ .
In the special case when E {b(π )} does not depend on the policy π (which holds when the
expected renewal interval size is the same for all policies), the minimization is achieved by choosing
π ∈ P to minimize E{a(π)}. This is important because the minimization of an expectation is typically much simpler than a minimization of the ratio of expectations, and it can often be accomplished
through dynamic programming algorithms (64)(67)(57) and their special cases of stochastic shortest path
algorithms.
To treat the case when E {b(π )} may depend on the policy, we use the following simple but
useful lemmas.

Lemma 7.4 For any policy π ∈ P , we have:


 
E{a(π) − θ∗ b(π)} ≥ 0    (7.25)

with equality if and only if policy π achieves the infimum ratio E {a(π)} /E {b(π )} = θ ∗ .

Proof. By definition of θ ∗ , we have for any policy π ∈ P :


 
E{a(π)} / E{b(π)} ≥ inf_{π∈P} [ E{a(π)} / E{b(π)} ] = θ∗

Multiplying both sides by E {b(π )} and noting that E {b(π )} > 0 yields (7.25). That equality holds
if and only if E {a(π )} /E {b(π )} = θ ∗ follows immediately. 2

Lemma 7.5 We have:


 
inf_{π∈P} E{a(π) − θ∗ b(π)} = 0    (7.26)

Further, for any value θ ∈ R, we have:

inf_{π∈P} E{a(π) − θ b(π)} < 0  if θ > θ∗    (7.27)

inf_{π∈P} E{a(π) − θ b(π)} > 0  if θ < θ∗    (7.28)
Proof. To prove (7.26), note from Lemma 7.4 that we have for any policy π :
   
0 ≤ E{a(π) − θ∗ b(π)} = E{b(π)} [ E{a(π)}/E{b(π)} − θ∗ ] ≤ Tmax [ E{a(π)}/E{b(π)} − θ∗ ]

Taking infimums over π ∈ P of the above yields:


 
0 ≤ inf_{π∈P} E{a(π) − θ∗ b(π)} ≤ Tmax inf_{π∈P} [ E{a(π)}/E{b(π)} − θ∗ ] = 0

where the final equality uses the definition of θ ∗ in (7.24). This proves (7.26).
To prove (7.27), suppose that θ > θ ∗ . Then:
   
inf_{π∈P} E{a(π) − θ b(π)} = inf_{π∈P} [ E{a(π) − θ∗ b(π)} − (θ − θ∗) E{b(π)} ]
≤ inf_{π∈P} E{a(π) − θ∗ b(π)} − (θ − θ∗) Tmin
= −(θ − θ∗) Tmin < 0

where we have used (7.26). This proves (7.27). To prove (7.28), suppose θ < θ ∗ . Then:
   
inf_{π∈P} E{a(π) − θ b(π)} = inf_{π∈P} [ E{a(π) − θ∗ b(π)} + (θ∗ − θ) E{b(π)} ]
≥ inf_{π∈P} E{a(π) − θ∗ b(π)} + (θ∗ − θ) Tmin
= (θ∗ − θ) Tmin > 0

7.3.1 THE BISECTION ALGORITHM


Lemmas 7.4 and 7.5 show that we can approach the optimal ratio θ ∗ with a simple iterative bisection
algorithm that computes infimums of expectations at each step. Specifically, suppose that on stage k
of the iteration, we have finite bounds θmin^(k) and θmax^(k) such that we know:

θmin^(k) < θ∗ < θmax^(k)

Define θbisect^(k) as:

θbisect^(k) ≜ (θmax^(k) + θmin^(k))/2

We then compute inf_{π∈P} E{a(π) − θbisect^(k) b(π)}. If the result is 0, then θbisect^(k) = θ∗. If the result is positive, then we know θbisect^(k) < θ∗, and if the result is negative, we know θbisect^(k) > θ∗. We then appropriately adjust our upper and lower bounds for stage k + 1. The uncertainty interval decreases
by a factor of 2 on each stage, and so this algorithm converges exponentially fast to the value θ ∗ . This
is useful because each stage involves minimizing an expectation, rather than a ratio of expectations.
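Assuming a subroutine psi(θ) that returns inf_{π∈P} E{a(π) − θ b(π)} (supplied by the caller), the stages above can be sketched as follows; this is an illustrative implementation, not taken from the text:

```python
def bisect_ratio(psi, theta_min, theta_max, tol=1e-9):
    """Bisection for theta*, where psi(theta) = inf_pi E[a - theta*b].
    By Lemma 7.5, psi is sign-definite around theta*: psi(theta) > 0 for
    theta < theta*, psi(theta) < 0 for theta > theta*, psi(theta*) = 0."""
    assert psi(theta_min) >= 0 >= psi(theta_max)
    while theta_max - theta_min > tol:
        mid = (theta_min + theta_max) / 2.0
        if psi(mid) > 0:
            theta_min = mid   # theta* lies above mid, by (7.28)
        else:
            theta_max = mid   # theta* lies at or below mid, by (7.27)
        # the uncertainty interval halves on each stage
    return (theta_min + theta_max) / 2.0

# Example with two policies summarized by (E[a], E[b]) pairs:
pairs = [(1.0, 1.0), (2.0, 4.0)]
psi = lambda th: min(a - th * b for a, b in pairs)
theta_star = bisect_ratio(psi, 0.0, 2.0)
# theta* = min(1/1, 2/4) = 0.5
```

Each stage evaluates psi once, i.e., one minimization of an expectation rather than of a ratio of expectations.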
7.3.2 OPTIMIZATION OVER PURE POLICIES
Let P pure be any finite or countably infinite set of policies that we call pure policies:

P pure = {π1 , π2 , π3 , . . .}

Let P be the larger policy space that considers all probabilistic mixtures of pure policies. Specifically,
the space P considers policies that make a randomized decision about which policy πi ∈ P pure to use, according to some probabilities qi ≜ Pr[Implement policy πi] with Σ_{i=1}^{∞} qi = 1. It turns
out that minimizing the ratio E {a(π)} /E {b(π )} over π ∈ P can be achieved by considering only
pure policies π ∈ P pure . To see this, define θ ∗ as the infimum ratio over π ∈ P , and for simplicity,
assume that θ ∗ is achieved by some particular policy π ∗ ∈ P , which corresponds to a probability
distribution (q1∗ , q2∗ , . . .) for selecting pure policies (π1 , π2 , . . .). Then:


   
0 = E{a(π∗) − θ∗ b(π∗)} = Σ_{i=1}^{∞} qi∗ E{a(πi) − θ∗ b(πi)}
≥ Σ_{i=1}^{∞} qi∗ inf_{π∈P pure} E{a(π) − θ∗ b(π)}
= inf_{π∈P pure} E{a(π) − θ∗ b(π)}

On the other hand, because P is a larger policy space than P pure , we have:
   
0 = inf_{π∈P} E{a(π) − θ∗ b(π)} ≤ inf_{π∈P pure} E{a(π) − θ∗ b(π)}

Thus:

inf_{π∈P pure} E{a(π) − θ∗ b(π)} = 0
which shows that the infimum ratio θ ∗ can be found over the set of pure policies.
The same result holds more generally: Let P pure be any (possibly uncountably infinite) set of
policies that we call pure policies. Define Ω as the set of all vectors (E{a(π)}, E{b(π)}) that can be achieved by policies π ∈ P pure. Suppose P is a larger policy space that contains all pure policies and is such that the set of all vectors (E{a(π)}, E{b(π)}) that can be achieved by policies π ∈ P is equal to the convex hull of Ω, denoted Conv(Ω).¹ If θ∗ is the infimum ratio of E{a(π)}/E{b(π)}
over π ∈ P , then:
 
0 = inf_{π∈P} E{a(π) − θ∗ b(π)} = inf_{(a,b)∈Conv(Ω)} [a − θ∗ b]
= inf_{(a,b)∈Ω} [a − θ∗ b]
= inf_{π∈P pure} E{a(π) − θ∗ b(π)}
¹The convex hull of a set Ω ⊆ R^k (for some integer k > 0) is the set of all finite probabilistic mixtures of vectors in Ω. It can be shown that Conv(Ω) is the set of all expectations E{X} that can be achieved by random vectors X that take values in the set Ω according to any probability distribution that leads to a finite expectation.
where we have used the well known fact that the infimum of a linear function over the convex hull
of a set is equal to the infimum over the set itself. Therefore, by Lemma 7.4, it follows that θ ∗ is also
the infimum ratio of E {a(π )} /E {b(π )} over the smaller set of pure policies P pure .
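This linearity argument is easy to check numerically: under a mixture of pure policies, E{a − θ∗b} is the corresponding mixture of the pure-policy values, so no randomization can push the ratio below the best pure ratio. A small illustrative check with made-up policy values:

```python
import random

# Hypothetical pure policies, summarized by (E[a], E[b]) pairs with b > 0.
pure = [(1.0, 1.0), (2.0, 1.0), (20.0, 10.0), (0.4, 0.1)]
theta_star = min(a / b for a, b in pure)   # best ratio over pure policies

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in pure]
    s = sum(w)
    q = [x / s for x in w]                 # random mixture probabilities
    a_mix = sum(qi * a for qi, (a, b) in zip(q, pure))
    b_mix = sum(qi * b for qi, (a, b) in zip(q, pure))
    # a mixture's ratio can never beat the best pure ratio
    assert a_mix / b_mix >= theta_star - 1e-12
```

The assertion holds because a_i ≥ θ∗ b_i for every pure policy, so the mixed numerator dominates θ∗ times the mixed denominator.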

7.3.3 CAVEAT — FRAMES WITH INITIAL INFORMATION


Suppose at the beginning of each frame r, we observe a vector η[r] of initial information that
influences the penalties and frame size. Assume {η[r]}_{r=0}^{∞} is i.i.d. over frames. Each policy π ∈ P first observes η[r] and then chooses a sub-policy π′ ∈ P_{η[r]} that possibly depends on the observed η[r]. One might (incorrectly) implement the policy that first observes η[r] and then chooses the π′ ∈ P_{η[r]} that minimizes the ratio of conditional expectations E{a(π′) | η[r]} / E{b(π′) | η[r]}. This would work if the denominator does not depend on the policy, but it may be incorrect in general. Minimizing
work if the denominator does not depend on the policy, but it may be incorrect in general. Minimizing
the ratio of expectations is not always achieved by the policy that minimizes the ratio of conditional
expectations given the observed initial information. For example, suppose there are two possible
initial vectors η1 and η2 , both equally likely. Suppose there are two possible policies for each vector:
• Under η1 : π11 gives [a = 1, b = 1], π12 gives [a = 2, b = 1].
• Under η2 : π21 gives [a = 20, b = 10], π22 gives [a = .4, b = .1].
It can be shown that any achievable (E {a(π)} , E {b(π )}) vector can be achieved by a probabilistic
mixture of the following four pure policies:
• Pure policy π1 : Choose π11 if η[r] = η1 , π21 if η[r] = η2 .
• Pure policy π2 : Choose π11 if η[r] = η1 , π22 if η[r] = η2 .
• Pure policy π3 : Choose π12 if η[r] = η1 , π21 if η[r] = η2 .
• Pure policy π4 : Choose π12 if η[r] = η1 , π22 if η[r] = η2 .
Clearly π11 minimizes the conditional ratio a/b given η1 , and π21 minimizes the conditional ratio
a/b given η2 . The policy π1 that chooses π11 whenever η1 is observed, and π21 whenever η2 is
observed, yields:
E{a(π1)} / E{b(π1)} = [(1/2)1 + (1/2)20] / [(1/2)1 + (1/2)10] = 10.5/5.5 ≈ 1.909
On the other hand, the policy that minimizes the ratio E {a(π)} /E {b(π )} is the policy π2 , which
chooses π11 whenever η1 is observed, and chooses π22 whenever η2 is observed:
E{a(π2)} / E{b(π2)} = [(1/2)1 + (1/2)0.4] / [(1/2)1 + (1/2)0.1] = 0.7/0.55 ≈ 1.273
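The two ratios in this example can be verified directly; the sketch below simply recomputes them from the outcome table above:

```python
# Outcomes (a, b) under each initial vector; eta1 and eta2 are equally likely.
eta1 = {'pi11': (1.0, 1.0), 'pi12': (2.0, 1.0)}
eta2 = {'pi21': (20.0, 10.0), 'pi22': (0.4, 0.1)}

def ratio(choice1, choice2):
    """Overall E[a]/E[b] when choice1 is used under eta1, choice2 under eta2."""
    a = 0.5 * eta1[choice1][0] + 0.5 * eta2[choice2][0]
    b = 0.5 * eta1[choice1][1] + 0.5 * eta2[choice2][1]
    return a / b

r1 = ratio('pi11', 'pi21')   # greedy on conditional ratios: 10.5/5.5
r2 = ratio('pi11', 'pi22')   # true minimizer pi_2: 0.7/0.55
assert r2 < r1               # the conditional-ratio-greedy policy is suboptimal
```

Even though pi21 has the better conditional ratio under eta2 (2 versus 4), choosing pi22 shrinks the overall denominator far less than it shrinks the numerator, which lowers the unconditional ratio.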
A correct minimization of the ratio can be obtained as follows: If we happen to know the
optimal ratio θ ∗ , we can use the fact that:

   
0 = inf_{π∈P} E{a(π) − θ∗ b(π)} = E{ inf_{π′∈P_{η[r]}} E{a(π′) − θ∗ b(π′) | η[r]} }
and so using the policy π∗ that first observes η[r] and then chooses π′ ∈ P_{η[r]} to minimize the conditional expectation E{a(π′) − θ∗ b(π′) | η[r]} yields E{a(π∗) − θ∗ b(π∗)} = 0, which by Lemma 7.4 shows it must also minimize the ratio E{a(π)}/E{b(π)}.
If θ ∗ is unknown, we can compute an approximation of θ ∗ via the bisection algorithm as
follows. At step k, we have θbisect [k], and we want to compute:

  
inf_{π∈P} E{a(π) − θbisect[k] b(π)} = E{ inf_{π′∈P_{η[r]}} E{a(π′) − θbisect[k] b(π′) | η[r]} }

This can be done by generating a collection of W i.i.d. samples {η1 , η2 , . . . , ηW } (all with the same
distribution as η[r]), computing the infimum conditional expectation for each sample, and then
using the law of large numbers to approximate the expectation as follows:

  
E inf E a(π ) − θbisect [k]b(π )|η[r] ≈
π ∈Pη [r]

1 
W
 
inf E a(π ) − θbisect [k]b(π )|η[r] = ηw = val(θbisect [k]) (7.29)
W π ∈P η
w=1 w

For a given frame r, the same samples {η1 , . . . , ηW } should be used for each step of the
bisection routine. This ensures the stage-r approximation function val(θ) uses the same samples and is thus non-increasing in θ, which is important for the bisection to work properly (see Exercise 7.2).
However, new samples should be used on each frame. If it is difficult to generate new i.i.d. samples
{η1 , . . . , ηW } on each frame (possibly because the distribution of η[r] is unknown), we can use W
past values of η[r]. There is a subtle issue here because these past values are not independent of the
queue backlogs Zl [r] that are part of the a(π) function. However, using these past values can still
be shown to work via a delayed-queue argument given in the max-weight learning theory of (166).

7.4 TASK PROCESSING EXAMPLE


Consider a network of L wireless nodes that collaboratively process tasks and report the results to a
receiver. There is an infinite sequence of tasks {Task[0], Task[1], Task[2], . . .} that are performed back-to-back, and the starting time of task r ∈ {0, 1, 2, . . .} is considered to be the start of renewal frame r. At the beginning of each task r ∈ {0, 1, 2, . . .}, the network observes a vector η[r] of task information. We assume {η[r]}_{r=0}^{∞} is i.i.d. over tasks with an unknown distribution. Every task
must be processed using one of K pure policies P pure = {π1 , π2 , . . . , πK }. The frame size T [r], task
processing utility g[r], and energy expenditures yl [r] for each node l ∈ {1, . . . , L} are deterministic
functions of η[r] and π [r]:

T [r] = T̂ (η[r], π[r]) , g[r] = ĝ(η[r], π [r]) , yl [r] = ŷl (η[r], π[r])
Let pav be a positive constant. The goal is to design an algorithm to solve:
Maximize: g/T
Subject to: y l /T ≤ pav ∀l ∈ {1, . . . , L}
π [r] ∈ P pure ∀r ∈ {0, 1, 2, . . .}
Example Problem:
a) State the renewal-based drift-plus-penalty algorithm for this problem.
b) Assume that the frame size is independent of the policy, so that T̂ (η[r], π [r]) = T̂ (η[r]).
Show that minimization of the ratio of expectations can be done without bisection, by solving a
single deterministic problem every slot.
c) Assume the general case when the frame size depends on the policy. Suppose the optimal
ratio value θ ∗ [r] is known for frame r. State the deterministic problem to solve every slot, with the
structure of minimizing a(π ) − θ ∗ [r]b(π ) as in Section 7.3.3.
d) Describe the bisection algorithm that obtains an estimate of θ ∗ [r] for part (c). Assume we
have W past values of initial information {η[r], η[r − 1], . . . , η[r − W + 1]}, and that we know
θmin ≤ θ ∗ [r] ≤ θmax for some constants θmin and θmax .

Solution:
a) Create virtual queues Zl [r] for each l ∈ {1, . . . , L} as follows:
Zl [r + 1] = max[Zl [r] + ŷl (η[r], π[r]) − T̂ (η[r], π[r])pav , 0] (7.30)
Every frame r ∈ {0, 1, 2, . . .}, observe η[r] and Z [r] and do the following:
• Choose π[r] ∈ P pure to minimize:
E{−V ĝ(η[r], π[r]) + Σ_{l=1}^{L} Zl[r] ŷl(η[r], π[r]) | Z[r]} / E{T̂(η[r], π[r]) | Z[r]}    (7.31)

• Update queues Zl [r] according to (7.30).


b) If E{T̂(η[r], π[r]) | Z[r]} does not depend on the policy, it suffices to minimize the numerator in (7.31). This is done by observing η[r] and Z[r] and choosing the policy π[r] ∈ P pure as the one that minimizes:

−V ĝ(η[r], π[r]) + Σ_{l=1}^{L} Zl[r] ŷl(η[r], π[r])

c) If θ ∗ [r] is known, then we observe η[r] and Z [r] and choose the policy π [r] ∈ P pure as
the one that minimizes:

−V ĝ(η[r], π[r]) + Σ_{l=1}^{L} Zl[r] ŷl(η[r], π[r]) − θ∗[r] T̂(η[r], π[r])

d) Fix a particular frame r. Let θmin^(k) and θmax^(k) be the bounds on θ∗[r] for step k of the bisection, where θmin^(0) = θmin and θmax^(0) = θmax. Define θbisect^(k) ≜ (θmin^(k) + θmax^(k))/2. Define {η1, . . . , ηW} as the W samples to be used. Define the function val(θ) as follows:

val(θ) = (1/W) Σ_{i=1}^{W} min_{π∈P pure} [ −V ĝ(ηi, π) + Σ_{l=1}^{L} Zl[r] ŷl(ηi, π) − θ T̂(ηi, π) ]    (7.32)

Note that computing val(θ) involves W separate minimizations. Note also that val(θ) is non-increasing in θ (see Exercise 7.2). Now compute val(θbisect^(k)):

• If val(θbisect^(k)) = 0, we are done and we declare θbisect^(k) as our estimate of θ∗[r].

• If val(θbisect^(k)) > 0, then define θmin^(k+1) = θbisect^(k), θmax^(k+1) = θmax^(k).

• If val(θbisect^(k)) < 0, then define θmin^(k+1) = θmin^(k), θmax^(k+1) = θbisect^(k).
Then proceed with the iterations until the error bounds are sufficiently small. Note that this algorithm requires val(θmin^(0)) ≥ 0 ≥ val(θmax^(0)), which should be checked before the iterations begin. If this is violated, we simply increase θmax^(0) and/or decrease θmin^(0).
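Under the stated assumptions, (7.32) and the bisection bullets above translate directly into code. In the sketch below, ghat, yhat, and That are hypothetical stand-ins for the deterministic functions ĝ, ŷl, and T̂, and the toy instance at the end is made up for illustration:

```python
def make_val(etas, Z, V, policies, ghat, yhat, That):
    """Build val(theta) as in (7.32): average over the W samples of the
    per-sample minimum of -V*g + sum_l Z_l*y_l - theta*T over pure policies."""
    def val(theta):
        total = 0.0
        for eta in etas:
            total += min(
                -V * ghat(eta, p)
                + sum(Zl * yl for Zl, yl in zip(Z, yhat(eta, p)))
                - theta * That(eta, p)
                for p in policies)
        return total / len(etas)
    return val

def bisect(val, lo, hi, iters=50):
    """Bisection on the non-increasing function val (val(theta*) = 0)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if val(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Toy instance: one sample, two policies, no penalty weights (Z = 0).
val = make_val(etas=[None], Z=[0.0], V=1.0, policies=['p1', 'p2'],
               ghat=lambda e, p: {'p1': 1.0, 'p2': 2.0}[p],
               yhat=lambda e, p: [0.0],
               That=lambda e, p: {'p1': 1.0, 'p2': 4.0}[p])
theta = bisect(val, -5.0, 5.0)
# here theta* = min(-g/T over policies) = min(-1/1, -2/4) = -1
```

Note that the same sample list etas is reused for every θ queried inside bisect, matching the requirement that val(θ) be built from the same samples at every bisection step.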

7.5 UTILITY OPTIMIZATION FOR RENEWAL SYSTEMS


Now consider a renewal system that generates both a penalty vector y [r] = (y1 [r], . . . , yL [r])
and an attribute vector x[r] = (x1 [r], . . . , xM [r]). These are random functions of the policy π [r]
implemented on frame r:

xm [r] = x̂m (π [r]) , yl [r] = ŷl (π [r]) ∀m ∈ {1, . . . , M}, l ∈ {1, . . . , L}

The frame size T [r] is also a random function of the policy as before: T [r] = T̂ (π[r]). We make
the same assumptions as before, including that second moments of x̂m (π [r]) are uniformly bounded
regardless of the policy, and that the conditional distribution of (T [r], y [r], x[r]), given π[r] = π ,
is independent of events on previous frames, and is identically distributed on each frame that uses
the same policy π . Let Tmin , Tmax , xm,min , xm,max be finite constants such that for all policies π ∈ P
and all m ∈ {1, . . . , M}, we have:
 
0 < Tmin ≤ E{T̂(π[r]) | π[r] = π} ≤ Tmax ,    xm,min ≤ E{x̂m(π[r]) | π[r] = π} ≤ xm,max

Under a particular algorithm for choosing policies π[r] over frames r ∈ {0, 1, 2, . . .}, define T [R],
y l [R], x m [R] for R > 0 by:

T[R] ≜ (1/R) Σ_{r=0}^{R−1} E{T[r]} ,    y l [R] ≜ (1/R) Σ_{r=0}^{R−1} E{yl[r]} ,    x m [R] ≜ (1/R) Σ_{r=0}^{R−1} E{xm[r]}

Define T , y l , x m as the limiting values of T [R], y l [R], x m [R], assuming temporarily that the limit
exists. For each m ∈ {1, . . . , M}, define γm,min and γm,max by:
   
γm,min ≜ min[ xm,min/Tmin , xm,min/Tmax ] ,    γm,max ≜ max[ xm,max/Tmin , xm,max/Tmax ]

It is clear that for all m ∈ {1, . . . , M} and all R > 0, we have:

γm,min ≤ x m [R] / T[R] ≤ γm,max ,    γm,min ≤ x m / T ≤ γm,max    (7.33)
Let φ(γ ) be a continuous, concave, and entrywise non-decreasing function of vector γ =
(γ1 , . . . , γM ) over the rectangle γ ∈ R, where:

R = {(γ1 , . . . , γM )|γm,min ≤ γm ≤ γm,max ∀m ∈ {1, . . . , M}} (7.34)

Consider the following problem:

Maximize: φ(x/T ) (7.35)


Subject to: y l /T ≤ cl ∀l ∈ {1, . . . , L} (7.36)
π [r] ∈ P ∀r ∈ {0, 1, 2, . . .} (7.37)

To transform this problem to one that has the structure given in Section 7.1.1, we define
auxiliary variables γ [r] = (γ1 [r], . . . , γM [r]) that are chosen in the rectangle R every frame r. We
then define a new penalty y0 [r] as follows:

y0[r] ≜ −T[r] φ(γ[r])

Now consider the following transformed (and equivalent) problem:

Maximize: T φ(γ )/T (7.38)


Subject to: x m ≥ T γm ∀m ∈ {1, . . . , M} (7.39)
y l /T ≤ cl ∀l ∈ {1, . . . , L} (7.40)
γ [r] ∈ R ∀r ∈ {0, 1, 2, . . .} (7.41)
π [r] ∈ P ∀r ∈ {0, 1, 2, . . .} (7.42)

where:

T φ(γ) ≜ lim_{R→∞} (1/R) Σ_{r=0}^{R−1} E{T[r] φ(γ[r])} = −y 0

T γm ≜ lim_{R→∞} (1/R) Σ_{r=0}^{R−1} E{T[r] γm[r]}    ∀m ∈ {1, . . . , M}
That the problems (7.35)-(7.37) and (7.38)-(7.42) are equivalent is proven in Exercise 7.7
using the fact:
T φ(γ )/T ≤ φ(T γ /T )
This fact is a variation on Jensen’s inequality and is proven in the following lemma.

Lemma 7.6 Let φ(γ ) be any continuous and concave (not necessarily non-decreasing) function defined
over γ ∈ R, where R is defined in (7.34).
(a) Let (T , γ ) be a random vector that takes values in the set {(T , γ )|T > 0, γ ∈ R} according to
any joint distribution that satisfies 0 < E {T } < ∞. Then:
 
E{T φ(γ)} / E{T} ≤ φ( E{T γ} / E{T} )

(b) Let (T [r], γ [r]) be a sequence of random vectors of the type specified in part (a), for r ∈
{0, 1, 2, . . .}. Then for any integer R > 0:
[ (1/R) Σ_{r=0}^{R−1} T[r] φ(γ[r]) ] / [ (1/R) Σ_{r=0}^{R−1} T[r] ] ≤ φ( [ (1/R) Σ_{r=0}^{R−1} T[r] γ[r] ] / [ (1/R) Σ_{r=0}^{R−1} T[r] ] )    (7.43)

[ (1/R) Σ_{r=0}^{R−1} E{T[r] φ(γ[r])} ] / [ (1/R) Σ_{r=0}^{R−1} E{T[r]} ] ≤ φ( [ (1/R) Σ_{r=0}^{R−1} E{T[r] γ[r]} ] / [ (1/R) Σ_{r=0}^{R−1} E{T[r]} ] )    (7.44)

and thus T φ(γ )/T ≤ φ(T γ /T ).

Proof. Part (b) follows easily from part (a) (see Exercise 7.6). Here we prove part (a). Let
{(T[r], γ[r])}_{r=0}^{∞} be an i.i.d. sequence of random vectors, each with the same distribution as (T, γ). Define t0 = 0, and for integers R > 0 define tR ≜ Σ_{r=0}^{R−1} T[r]. Let interval [tr, tr+1) represent the rth frame. Define γ̂(t) to take the value γ[r] if t is in the rth frame, so that:

γ̂(t) = γ[r]  if t ∈ [tr, tr+1)

We thus have for any integer R > 0:


(1/tR) ∫_0^{tR} φ(γ̂(t)) dt = [ Σ_{r=0}^{R−1} T[r] φ(γ[r]) ] / [ Σ_{r=0}^{R−1} T[r] ] = [ (1/R) Σ_{r=0}^{R−1} T[r] φ(γ[r]) ] / [ (1/R) Σ_{r=0}^{R−1} T[r] ]    (7.45)

On the other hand, by Jensen’s inequality for the concave function φ(γ):

(1/tR) ∫_0^{tR} φ(γ̂(t)) dt ≤ φ( (1/tR) ∫_0^{tR} γ̂(t) dt ) = φ( [ (1/R) Σ_{r=0}^{R−1} T[r] γ[r] ] / [ (1/R) Σ_{r=0}^{R−1} T[r] ] )    (7.46)
Taking limits of (7.45) as R → ∞ and using the law of large numbers yields:

lim_{R→∞} (1/tR) ∫_0^{tR} φ(γ̂(t)) dt = E{T φ(γ)} / E{T}    (w.p.1)

Taking limits of (7.46) as R → ∞ and using the law of large numbers and continuity of φ(γ) yields:

lim_{R→∞} (1/tR) ∫_0^{tR} φ(γ̂(t)) dt ≤ φ( E{T γ} / E{T} )    (w.p.1)
2
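Lemma 7.6(a) can be sanity-checked by simulation with a concrete concave φ; the sketch below uses φ(γ) = √γ, an illustrative choice not taken from the text. The empirical version of the inequality is itself an instance of Jensen's inequality with weights T[r]/Σ T[r], so it holds on the samples as well:

```python
import math
import random

random.seed(1)
# Random (T, gamma) pairs with T > 0 and gamma in a bounded interval.
samples = [(random.uniform(0.5, 2.0), random.uniform(0.0, 4.0))
           for _ in range(100000)]

phi = math.sqrt   # concave test function

ET = sum(T for T, g in samples) / len(samples)
# left side: E[T * phi(gamma)] / E[T]
lhs = sum(T * phi(g) for T, g in samples) / len(samples) / ET
# right side: phi(E[T * gamma] / E[T])
rhs = phi(sum(T * g for T, g in samples) / len(samples) / ET)
assert lhs <= rhs + 1e-9
```

The gap between the two sides shrinks as the distribution of γ concentrates, and closes entirely when γ is deterministic.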

7.5.1 THE UTILITY OPTIMAL ALGORITHM FOR RENEWAL SYSTEMS


To solve (7.38)-(7.42), we enforce the constraints x m ≥ T γm and y l /T ≤ cl with virtual queues
Zl [r] and Gm [r] for l ∈ {1, . . . , L} and m ∈ {1, . . . , M}:

Zl [r + 1] = max[Zl [r] + yl [r] − T [r]cl , 0] (7.47)


Gm [r + 1] = max[Gm [r] + T [r]γm [r] − xm [r], 0] (7.48)

Note that the constraint x m ≥ T γm is equivalent to T γm − x m ≤ 0, which is the same as p m /T ≤ 0 for pm[r] ≜ T[r]γm[r] − xm[r]. Hence, the transformed problem fits the general renewal
framework (7.4)-(7.6). Using y0 [r] = −T [r]φ(γ [r]), the algorithm then observes Z [r], G[r] at the
beginning of each frame r ∈ {0, 1, 2, . . . , } and chooses a policy π [r] ∈ P and auxiliary variables
γ [r] ∈ R to minimize:

[ −V E{T̂(π[r]) φ(γ[r]) | Z[r], G[r]} ] / [ E{T̂(π[r]) | Z[r], G[r]} ]
+ [ E{ Σ_{l=1}^{L} Zl[r] ŷl(π[r]) + Σ_{m=1}^{M} Gm[r] [T̂(π[r]) γm[r] − x̂m(π[r])] | Z[r], G[r]} ] / [ E{T̂(π[r]) | Z[r], G[r]} ]

This minimization can be simplified by separating out the terms that use auxiliary variables. The
expression to minimize is thus:
M
E T̂ (π [r])[−V φ(γ [r]) + m=1 Gm [r]γm [r]]|Z [r], G[r]

E T̂ (π[r])|Z [r], G[r]


L M
E l=1 Zl [r]ŷl (π [r]) − m=1 Gm [r]x̂(π [r])|Z [r], G[r]
+
E T̂ (π [r])|Z [r], G[r]

Clearly, the γ [r] variables can be optimized separately to minimize the first term, making the
frame size in the numerator and denominator of the first term cancel. The resulting algorithm is
168 7. OPTIMIZATION OF RENEWAL SYSTEMS
thus: Observe Z [r] and G[r] at the beginning of each frame r ∈ {0, 1, 2, . . .}, and perform the
following:

• (Auxiliary Variables) Choose $\gamma[r]$ to solve:
\[
\begin{array}{ll}
\mbox{Maximize:} & V\phi(\gamma[r]) - \sum_{m=1}^{M} G_m[r]\gamma_m[r] \\
\mbox{Subject to:} & \gamma_{m,min} \le \gamma_m[r] \le \gamma_{m,max} \quad \forall m \in \{1, \ldots, M\}
\end{array}
\]
• (Policy Selection) Choose $\pi[r] \in \mathcal{P}$ to minimize the following:
\[
\frac{E\left\{\sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) - \sum_{m=1}^{M} G_m[r]\hat{x}_m(\pi[r]) \,\Big|\, Z[r], G[r]\right\}}{E\{\hat{T}(\pi[r]) \mid Z[r], G[r]\}}
\]
• (Virtual Queue Updates) At the end of frame r, update Z [r] and G[r] by (7.47) and (7.48).

The auxiliary variable update has the same structure as that given in Chapter 5, and it is
a deterministic optimization that reduces to $M$ optimizations of single variable functions if $\phi(\gamma)$
has the form $\phi(\gamma) = \sum_{m=1}^{M}\phi_m(\gamma_m)$. The policy selection stage is a minimization of a ratio of
expectations, and it can be solved with the techniques given in Section 7.3.
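When $\phi$ is separable, the auxiliary-variable stage decomposes into $M$ independent one-dimensional problems. A minimal sketch, assuming the illustrative utility $\phi_m(\gamma_m) = \log(1+\gamma_m)$ and solving each component by grid search over $[\gamma_{m,min}, \gamma_{m,max}]$ (any one-dimensional maximizer could be substituted):

```python
import math

def auxiliary_update(G, V, gamma_min=0.0, gamma_max=10.0, grid=1000):
    """Per-frame auxiliary update for separable phi(gamma) = sum_m phi_m(gamma_m).
    For each m, maximize V*phi_m(g) - G_m*g over [gamma_min, gamma_max] by grid
    search. phi_m(g) = log(1+g) is an assumed example utility."""
    def phi_m(g):
        return math.log(1.0 + g)
    gamma = []
    for G_m in G:
        best_g, best_val = gamma_min, -float('inf')
        for i in range(grid + 1):
            g = gamma_min + (gamma_max - gamma_min) * i / grid
            val = V * phi_m(g) - G_m * g
            if val > best_val:
                best_g, best_val = g, val
        gamma.append(best_g)
    return gamma
```

For this particular $\phi_m$, the maximizer has the closed form $\gamma_m = \min[\max[V/G_m - 1, \gamma_{m,min}], \gamma_{m,max}]$, which the grid search approximates; larger virtual queues $G_m$ push $\gamma_m$ down, as expected.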

7.6 DYNAMIC PROGRAMMING EXAMPLES


This section presents more complex renewal system examples that involve the theory of dynamic
programming. Readers unfamiliar with dynamic programming can skip this section, and are referred
to (64) for a coverage of that theory. Readers familiar with dynamic programming can peruse these
examples.

7.6.1 DELAY-LIMITED TRANSMISSION EXAMPLE


Here we present an example similar to the delay-limited transmission system developed for coopera-
tive communication in (71), although we remove the cooperative component for simplicity. Consider
a system with L wireless transmitters that deliver data to a common receiver. Time is slotted with
unit size, and all frames are fixed to T slots, where T is a positive integer. At the beginning of each
frame r ∈ {0, 1, 2, . . .}, new packets arrive for transmission. These packets must be delivered within
the T slot frame τ ∈ {rT , . . . , (r + 1)T − 1}, or they are dropped at the end of the frame. Let
A[r] = (A1 [r], . . . , AL [r]) be the vector of new packet arrivals, treated as initial information about
frame r. Assume that A[r] is i.i.d. over frames. On each slot τ of the T -slot frame, at most one
transmitter l ∈ {1, . . . , L} is allowed to transmit, and it can transmit at most a single packet. Let
Ql (τ ) represent the (integer) queue size for transmitter l on slot τ . Then for frame r ∈ {0, 1, . . .},
we have:
\[
Q_l(rT) = A_l[r]
\]
\[
Q_l(rT + v) = A_l[r] - \sum_{\tau=rT}^{rT+v-1} 1_l(\tau) \ , \quad \forall v \in \{1, \ldots, T-1\}
\]

where $1_l(\tau)$ is an indicator function that is 1 if transmitter $l$ successfully delivers a packet on slot $\tau$,
and is 0 otherwise.
The success of each packet transmission depends on the power that was used. Let $p(\tau) =
(p_1(\tau), \ldots, p_L(\tau))$ represent the power allocation vector on each slot $\tau$ in the $T$-slot frame. This
vector is chosen every slot $\tau$ subject to the constraints:
\[
\begin{array}{ll}
0 \le p_l(\tau) \le p_{max} & \forall l \in \{1, \ldots, L\}, \ \forall\tau \\
p_l(\tau) = 0 \mbox{ if } Q_l(\tau) = 0 & \forall l \in \{1, \ldots, L\}, \ \forall\tau \\
p_l(\tau)p_m(\tau) = 0 & \forall l \neq m, \ \forall\tau
\end{array}
\]
The third constraint above ensures at most one transmitter can send on any given slot. Transmission
successes are conditionally independent of past history given the transmission power used, with
success probability for each $l \in \{1, \ldots, L\}$ given by:
\[
q_l(p) \triangleq Pr[\mbox{transmitter } l \mbox{ is successful on slot } \tau \mid p_l(\tau) = p, Q_l(\tau) > 0]
\]
We assume that $q_l(0) = 0$ for all $l \in \{1, \ldots, L\}$. Define $D_l[r]$ and $y_l[r]$ as the total packets delivered
and total energy expended by transmitter $l$ on frame $r$:
\[
D_l[r] \triangleq \sum_{\tau=rT}^{rT+T-1} 1_l(\tau) \quad \forall l \in \{1, \ldots, L\}
\]
\[
y_l[r] \triangleq \sum_{\tau=rT}^{rT+T-1} p_l(\tau) \quad \forall l \in \{1, \ldots, L\}
\]

The goal is to maximize a weighted sum of throughput subject to average power constraints:
\[
\begin{array}{ll}
\mbox{Maximize:} & \sum_{l=1}^{L} w_l\overline{D}_l/\overline{T} \\
\mbox{Subject to:} & \overline{y}_l/\overline{T} \le p_{av} \quad \forall l \in \{1, \ldots, L\} \\
& \pi[r] \in \mathcal{P} \quad \forall r \in \{0, 1, 2, \ldots\}
\end{array}
\]
where $\{w_l\}_{l=1}^{L}$ are a given collection of positive weights, $p_{av}$ is a given constant power constraint, $\overline{D}_l$
and $\overline{y}_l$ are the average delivered data and energy expenditure by transmitter $l$ on one frame, and $\mathcal{P}$ is
the policy space that conforms to the above transmission constraints over the frame. This problem
fits the standard renewal form given in Section 7.1 with $c_l = p_{av}$ for all $l \in \{1, \ldots, L\}$, and:
\[
y_0[r] \triangleq -\sum_{l=1}^{L} w_l\sum_{\tau=rT}^{rT+T-1} 1_l(\tau)
\]
We thus form virtual queues $Z_l[r]$ for each $l \in \{1, \ldots, L\}$, with updates:
\[
Z_l[r+1] = \max\left[Z_l[r] + \sum_{\tau=rT}^{rT+T-1} p_l(\tau) - p_{av}T, \ 0\right] \tag{7.49}
\]
Then perform the following:

• For every frame $r$, observe $A[r]$ and make actions over the course of the frame to solve:
\[
\begin{array}{ll}
\mbox{Maximize:} & E\left\{\sum_{l=1}^{L}\sum_{\tau=rT}^{rT+T-1}[V w_l 1_l(\tau) - Z_l[r]p_l(\tau)] \,\Big|\, Z[r], A[r]\right\} \\
\mbox{Subject to:} & (1) \ 0 \le p_l(\tau) \le p_{max} \quad \forall l, \ \forall\tau \in \{rT, \ldots, rT+T-1\} \\
& (2) \ p_l(\tau) = 0 \mbox{ if } Q_l(\tau) = 0 \quad \forall l, \ \forall\tau \in \{rT, \ldots, rT+T-1\} \\
& (3) \ p_l(\tau)p_m(\tau) = 0 \quad \forall l \neq m, \ \forall\tau \in \{rT, \ldots, rT+T-1\}
\end{array}
\]

• Update Zl [r] according to (7.49).


The above uses the fact that the desired ratio (7.17) in this case has a constant denominator T ,
and hence it suffices to minimize the numerator, which can be achieved by minimizing the conditional
expectation given both the Z [r] and A[r] values. Using iterated expectations, the expression to be
maximized can be re-written:
 +T −1
L rT 
E {V wl ql (pl (τ )) − Zl [r]pl (τ )|Z [r], A[r]}
l=1 τ =rT

The problem can be solved as a dynamic program (64). Specifically, we can start backwards and
define $J_T(Q)$ as the optimal reward in the final stage $T$ (corresponding to slot $\tau = rT + T - 1$)
given that $Q(rT + T - 1) = Q$:
\[
J_T(Q) \triangleq \max_{l: Q_l > 0}\left[\max_{\{p \,:\, 0 \le p \le p_{max}\}}[V w_l q_l(p) - Z_l[r]p]\right]
\]
This function $J_T(Q)$ is computed for all integer vectors $Q$ that satisfy $0 \le Q \le A[r]$. Then define
$J_{T-1}(Q)$ as the optimal expected sum reward in the last two stages $\{T-1, T\}$, given that $Q(rT +
T - 2) = Q$:
\[
J_{T-1}(Q) \triangleq \max_{l: Q_l > 0}\left[\max_{\{p \,:\, 0 \le p \le p_{max}\}}[V w_l q_l(p) - Z_l[r]p + q_l(p)J_T(Q - e_l) + (1 - q_l(p))J_T(Q)]\right]
\]
where $e_l$ is a vector that is zero in all entries $j \neq l$, and is 1 in entry $l$. The function $J_{T-1}(Q)$ is also
computed for all $Q$ that satisfy $0 \le Q \le A[r]$. In general, we have for stages $k \in \{1, \ldots, T-1\}$
the following recursive equation:
\[
J_k(Q) \triangleq \max_{l: Q_l > 0}\left[\max_{\{p \,:\, 0 \le p \le p_{max}\}}[V w_l q_l(p) - Z_l[r]p + q_l(p)J_{k+1}(Q - e_l) + (1 - q_l(p))J_{k+1}(Q)]\right]
\]
The value J1 (Q) represents the expected total reward over frame r under the optimal policy, given
that Q(rT ) = Q. The optimal action to take at each stage k corresponds to the transmitter l and
the power level p that achieves the maximum in the computation of Jk (Q).
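The recursion can be implemented by direct backward induction over all backlog vectors $0 \le Q \le A[r]$. A sketch, with an assumed discrete power grid `p_levels` standing in for the interval $[0, p_{max}]$ (idling is covered by the `best` initialization, consistent with $q_l(0) = 0$), and with an illustrative success function `q[l]` supplied by the caller:

```python
from itertools import product

def backward_dp(A, T, V, w, Z, q, p_levels):
    """Backward induction for one delay-limited frame. A: arrival vector A[r];
    q[l] maps a power level to a success probability; p_levels is an assumed
    discrete power grid. Returns J with J[k][Q] for stages k = 1..T+1 and all
    integer backlogs 0 <= Q <= A."""
    L = len(A)
    states = list(product(*[range(a + 1) for a in A]))   # all Q with 0 <= Q <= A
    J = {T + 1: {Q: 0.0 for Q in states}}                # no reward after the frame ends
    for k in range(T, 0, -1):
        J[k] = {}
        for Q in states:
            best = J[k + 1][Q]                           # idle this slot (p = 0, q_l(0) = 0)
            for l in range(L):
                if Q[l] == 0:
                    continue                             # cannot serve an empty queue
                Qd = tuple(Q[j] - (1 if j == l else 0) for j in range(L))
                for p in p_levels:
                    val = (V * w[l] * q[l](p) - Z[l] * p
                           + q[l](p) * J[k + 1][Qd]
                           + (1 - q[l](p)) * J[k + 1][Q])
                    best = max(best, val)
            J[k][Q] = best
    return J
```

`J[1][Q(rT)]` then gives the optimal expected frame reward, and the maximizing $(l, p)$ at each stage yields the policy for that slot.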
For a modified problem where power allocations are restricted to pl (τ ) ∈ {0, pmax }, it can
be shown the problem has a simple greedy solution: On each slot τ of frame r, consider the set
of links l such that Ql (τ ) > 0, and transmit over the link l in this set that has the largest positive
V wl ql (pmax ) − Zl [r]pmax value, breaking ties arbitrarily and choosing not to transmit over any
link if none of these values are positive.
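This greedy rule for the restricted case $p_l(\tau) \in \{0, p_{max}\}$ reduces to one comparison per slot. A sketch (`q_pmax[l]` denotes $q_l(p_{max})$; the function returns `None` when staying idle is best):

```python
def greedy_slot_policy(Q, V, w, Z, q_pmax, p_max):
    """On one slot, transmit over the backlogged link l with the largest positive
    V*w_l*q_l(p_max) - Z_l[r]*p_max, breaking ties by lowest index; return None
    (do not transmit) if no backlogged link has a positive value."""
    best_link, best_val = None, 0.0
    for l in range(len(Q)):
        if Q[l] > 0:
            val = V * w[l] * q_pmax[l] - Z[l] * p_max
            if val > best_val:
                best_link, best_val = l, val
    return best_link
```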

7.6.2 MARKOV DECISION PROBLEM FOR MINIMUM DELAY SCHEDULING
Here we consider a Markov decision problem involving queueing delay, from (56)(57). Consider a
2-queue wireless downlink in slotted time t ∈ {0, 1, 2, . . .}. Packets arrive randomly every slot, and
the controller can transmit a packet from at most one queue per slot. Let Qi (t) be the (integer)
number of packets in queue i on slot t, for i ∈ {1, 2}. We assume the queues have a finite buffer of 10
packets, so that packets arriving when the Qi (t) = 10 are dropped. To enforce a renewal structure,
let χ(t) be an independent process of i.i.d. Bernoulli variables with P r[χ (t) = 1] = δ, for some
renewal probability δ > 0. The contents of both queues are emptied whenever χ (t) = 1, so that
queueing dynamics are given by:
\[
Q_i(t+1) = \begin{cases} \min[Q_i(t) + A_i(t), 10] - 1_i(t) & \mbox{if } \chi(t) = 0 \\ 0 & \mbox{if } \chi(t) = 1 \end{cases}
\]

where 1i (t) is an indicator function that is 1 if a packet is successfully transmitted from queue i on
slot t (and is 0 otherwise), and Ai (t) is the (integer) number of new packet arrivals to queue i. The
maximum packet loss rate due to forced renewals is thus 20δ, which can be made arbitrarily small
with a small choice of δ > 0. We assume the controller knows the value of χ (t) at the beginning
of each slot. We have two choices of a renewal definition: (i) Define a renewal event on slot t
whenever (Q1 (t), Q2 (t)) = (0, 0), (ii) Define a renewal event on slot t whenever χ (t − 1) = 1.
The first definition has shorter renewal frames, but the frames sizes depend on the control actions.
This would require minimizing a ratio of expectations every slot. The second definition has frame
sizes that are independent of the control actions, and have mean 1/δ. For simplicity, we use the
second definition.
Let $g_i(t)$ be the number of packets dropped from queue $i$ on slot $t$:
\[
g_i(t) = \begin{cases} A_i(t)1\{Q_i(t) = 10\} & \mbox{if } \chi(t) = 0 \\ Q_i(t) + A_i(t) - 1_i(t) & \mbox{if } \chi(t) = 1 \end{cases}
\]

where 1{Qi (t) = 10} is an indicator function that is 1 if Qi (t) = 10, and 0 otherwise.
Assume the processes A1 (t) and A2 (t) are independent of each other. A1 (t) is i.i.d. Bernoulli
with $Pr[A_1(t) = 1] = \lambda_1$, and $A_2(t)$ is i.i.d. Bernoulli with $Pr[A_2(t) = 1] = \lambda_2$. Every slot, the
controller chooses a queue for transmission by selecting a power allocation vector $(p_1(t), p_2(t))$
subject to the constraints:
\[
0 \le p_i(t) \le p_{max} \ , \quad p_1(t)p_2(t) = 0 \quad \forall i \in \{1, 2\}, \ \forall t
\]
\[
p_i(t) = 0 \ \mbox{ if } Q_i(t) = 0 \quad \forall i \in \{1, 2\}, \ \forall t
\]

where $p_{max}$ is a given maximum power level. Let $\mathcal{P}(Q)$ denote the set of all power vectors that
satisfy these constraints. Transmission successes are independent of past history given the power
level used, with probabilities:
\[
q_i(p) \triangleq Pr[1_i(t) = 1 \mid Q_i(t) > 0, p_i(t) = p]
\]
Assume that $q_1(0) = q_2(0) = 0$.


The goal is to minimize the time average rate of packet drops $\overline{g}_1 + \overline{g}_2$ subject to an average
power constraint $p_{av}$ and an average delay constraint of 3 slots for all non-dropped packets in each
queue: $\overline{W}_1 \le 3$, $\overline{W}_2 \le 3$. Specifically, define $\tilde{\lambda}_i \triangleq \lambda_i - \overline{g}_i$ as the throughput of queue $i$. By Little's
Theorem (129), we have $\overline{Q}_i = \tilde{\lambda}_i\overline{W}_i$, and so the delay constraints can be transformed to $\overline{Q}_i \le 3\tilde{\lambda}_i$,
which is equivalent to $\overline{Q}_i - 3(\lambda_i - \overline{g}_i) \le 0$.
Let $t[r]$ be the slot that starts renewal frame $r \in \{0, 1, 2, \ldots\}$ (where $t[0] = 0$), and let $T[r]$
represent the number of slots in the $r$th renewal frame. Thus, we have constraints:
\[
\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}\sum_{\tau=t[r]}^{t[r]+T[r]-1}[Q_1(\tau) - 3(A_1(\tau) - g_1(\tau))] \le 0
\]
\[
\lim_{R\to\infty}\frac{\frac{1}{R}\sum_{r=0}^{R-1}\sum_{\tau=t[r]}^{t[r]+T[r]-1}[p_1(\tau) + p_2(\tau)]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} \le p_{av}
\]

Following the renewal system framework, we define virtual queues $Z_1[r]$, $Z_2[r]$, $Z_p[r]$:
\[
Z_1[r+1] = \max\left[Z_1[r] + \sum_{\tau=t[r]}^{t[r]+T[r]-1}[Q_1(\tau) - 3(A_1(\tau) - g_1(\tau))], \ 0\right] \tag{7.50}
\]
\[
Z_2[r+1] = \max\left[Z_2[r] + \sum_{\tau=t[r]}^{t[r]+T[r]-1}[Q_2(\tau) - 3(A_2(\tau) - g_2(\tau))], \ 0\right] \tag{7.51}
\]
\[
Z_p[r+1] = \max\left[Z_p[r] + \sum_{\tau=t[r]}^{t[r]+T[r]-1}[p_1(\tau) + p_2(\tau) - p_{av}], \ 0\right] \tag{7.52}
\]
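The end-of-frame updates (7.50)-(7.52) simply accumulate per-slot terms over the frame. A sketch, where `frame` is an assumed list of per-slot observation tuples $(Q_1, Q_2, A_1, A_2, g_1, g_2, p_1, p_2)$ recorded during the renewal frame:

```python
def update_virtual_queues(Z1, Z2, Zp, frame, p_av):
    """End-of-frame virtual queue updates (7.50)-(7.52). Each element of `frame`
    is a per-slot tuple (Q1, Q2, A1, A2, g1, g2, p1, p2). Returns the updated
    (Z1, Z2, Zp)."""
    s1 = sum(Q1 - 3 * (A1 - g1) for (Q1, Q2, A1, A2, g1, g2, p1, p2) in frame)
    s2 = sum(Q2 - 3 * (A2 - g2) for (Q1, Q2, A1, A2, g1, g2, p1, p2) in frame)
    sp = sum(p1 + p2 - p_av for (Q1, Q2, A1, A2, g1, g2, p1, p2) in frame)
    return max(Z1 + s1, 0.0), max(Z2 + s2, 0.0), max(Zp + sp, 0.0)
```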

Making the queues Z1 [r] and Z2 [r] rate stable ensures the desired delay constraints are satisfied,
and making queue Zp [r] rate stable ensures the power constraint is satisfied. We thus have the
following algorithm, which only minimizes the numerator in the ratio of expectations because the
denominator is independent of the policy:
• At the beginning of each frame $r$, observe $Z[r] = (Z_1[r], Z_2[r], Z_p[r])$ and make power
allocation decisions to minimize the following expression over the frame:
\[
E\left\{\sum_{\tau=t[r]}^{t[r]+T[r]-1} f(p(\tau), A(\tau), Q(\tau), Z[r]) \,\Big|\, Z[r]\right\}
\]
where $f(p(\tau), A(\tau), Q(\tau), Z[r])$ is defined:
\[
f(p(\tau), A(\tau), Q(\tau), Z[r]) \triangleq V(g_1(\tau) + g_2(\tau)) + \sum_{i=1}^{2} Z_i[r][Q_i(\tau) + 3g_i(\tau)] + Z_p[r][p_1(\tau) + p_2(\tau)]
\]
• Update the virtual queues $Z[r]$ by (7.50)-(7.52).


The minimization in the above algorithm can be solved by dynamic programming. Specifically,
given queues $Z[r] = Z$ that start the frame, define $J_Z(Q)$ as the optimal cost until the end of a
renewal frame, given the initial queue backlog is $Q$. Then $J_Z(0)$ is the value of the expression to be
minimized. We have (56)(57):
\[
J_Z(Q) = \delta E_A\left\{\inf_{p \in \mathcal{P}(Q)} f(p, A, Q, Z) \,\Big|\, \chi = 1, Q, Z\right\}
+ (1-\delta)E_A\left\{\inf_{p \in \mathcal{P}(Q)}\left[f(p, A, Q, Z) + h(p, A, Q, Z)\right] \,\Big|\, \chi = 0, Q, Z\right\} \tag{7.53}
\]
where $h(p, A, Q, Z)$ is defined:
\[
h(p, A, Q, Z) \triangleq J_Z(\min[Q + A, 10])(1 - q(p)) + J_Z(\min[Q + A, 10] - e(p))q(p)
\]
where:
\[
q(p) = \begin{cases} q_1(p_1) & \mbox{if } p_1 > 0 \\ q_2(p_2) & \mbox{if } p_1 = 0 \end{cases} \ , \qquad e(p) = \begin{cases} (1, 0) & \mbox{if } p_1 > 0 \\ (0, 1) & \mbox{if } p_1 = 0 \end{cases}
\]
The equation (7.53) must be solved to find $J_Z(Q)$ for all $Q \in \{0, 1, \ldots, 10\} \times \{0, 1, \ldots, 10\}$.
Define $\Theta(J)$ as an operator that takes a function $J(Q)$ (for $Q \in \{0, 1, \ldots, 10\} \times
\{0, 1, \ldots, 10\}$) and maps it to another such function via the right-hand-side of (7.53). Then (7.53)
reduces to:
\[
J_Z(Q) = \Theta(J_Z(Q))
\]
and hence the desired $J_Z(Q)$ is a fixed point of the $\Theta(\cdot)$ operator. It can be shown that $\Theta(\cdot)$ is a
contraction with an appropriate definition of distance (67)(57), and so the fixed point is unique and
can be obtained by iteration of the $\Theta(\cdot)$ operator starting with any initial function $J^{(0)}(Q)$ (such as
$J^{(0)}(Q) = 0$):
\[
J^{(0)}(Q) = 0 \ , \quad J^{(i+1)}(Q) = \Theta(J^{(i)}(Q)) \quad \forall i \in \{0, 1, 2, \ldots\}
\]


Then $\lim_{i\to\infty} J^{(i)}(Q)$ solves the fixed point equation and hence is equal to the desired $J_Z(Q)$
function. While this then needs to be recomputed for the next frame (because the queues $Z[r]$
change), the change in these queues over one frame is bounded, and the resulting $J_Z(Q)$ function
for frame $r$ is already a good approximation for this function on frame $r+1$. Thus, the initial value
of the iteration can be the final value found in the previous frame.
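The successive-approximation loop, including the warm start from the previous frame, can be sketched generically (the one-state operator in the usage line is purely illustrative, with contraction factor 0.9 playing the role of $1 - \delta$):

```python
def fixed_point_iteration(theta, J0, tol=1e-9, max_iter=100000):
    """Iterate a contraction operator theta on a value function (dict: state -> value)
    until successive iterates differ by less than tol in the sup-norm.
    To warm-start, pass the previous frame's solution as J0."""
    J = dict(J0)
    for _ in range(max_iter):
        J_next = theta(J)
        gap = max(abs(J_next[s] - J[s]) for s in J)
        J = J_next
        if gap < tol:
            break
    return J

# Illustrative one-state operator: J <- 1 + 0.9*J has the unique fixed point 10.
J = fixed_point_iteration(lambda J: {0: 1.0 + 0.9 * J[0]}, {0: 0.0})
```

Because the operator is a contraction, warm-starting with a nearby value function (as suggested above for successive frames) shortens the loop without affecting the limit.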
Iteration of the $\Theta(J)$ operator requires knowledge of the $A(t)$ distribution to compute the
desired expectations. In this case of independent Bernoulli inputs, this involves knowing only two
scalars λ1 and λ2 . However, for larger problems when the random events every slot can be a large
vector, the expectations can be accurately approximated by averaging over past samples, as in (7.29).
See (57) for an analysis of the error bounds in this technique.
See also (61)(60)(59) for alternative approximations to the Markov Decision Problem for wire-
less queueing delay. A detailed treatment of stochastic shortest path problems and approximations is
found in (67). Approximate dynamic programming methods that approximate value functions with
simpler functions can be found in (68)(187)(67)(69). Recent work in (62)(63) combines Markov
Decision theory and approximate value functions for treatment of energy and delay optimization in
wireless systems.

7.7 EXERCISES

Exercise 7.1. (Deterministic Task Processing) Suppose N network nodes cooperate to process
a sequence of tasks. A new task is started when the previous task ends, and we label the tasks
r ∈ {0, 1, 2, . . .}. For each new task r, the network controller makes a decision about which single
node n[r] will process the task, and what modality m[r] will be used in the processing. Assume
there are M possible modalities, each with different durations and energy expenditures. The task r
decision is π[r] = (n[r], m[r]), where n[r] ∈ {1, . . . , N} and m[r] ∈ {1, . . . , M}. Define T (n, m)
and β(n, m) as the duration of time and the energy expenditure, respectively, required for node n to
process a task using modality m. Assume that T (n, m) ≥ 0 and β(n, m) ≥ 0 for all n, m. Let en [r]
represent the energy expended by node n ∈ {1, . . . , N} during task r:

\[
e_n[r] = \begin{cases} \beta(n[r], m[r]) & \mbox{if } n[r] = n \\ 0 & \mbox{if } n[r] \neq n \end{cases}
\]
We want to maximize the task processing rate subject to average power constraints at each node:
\[
\begin{array}{ll}
\mbox{Maximize:} & 1/\overline{T} \\
\mbox{Subject to:} & 1) \ \overline{e}_n/\overline{T} \le p_{n,av} \ , \quad \forall n \in \{1, \ldots, N\} \\
& 2) \ n[r] \in \{1, \ldots, N\}, \ m[r] \in \{1, \ldots, M\} \ , \quad \forall r \in \{0, 1, 2, \ldots\}
\end{array}
\]

where pn,av is the average power constraint for node n ∈ {1, . . . , N}. State the renewal-based drift-
plus-penalty algorithm of Section 7.2 for this problem. Note that there is no randomness here, and so
the ratio of expectations to be minimized on each frame becomes a ratio of deterministic functions.
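A sketch of one plausible per-frame rule for this exercise (this encoding is an assumption for illustration, not the text's stated solution): take $y_0[r] = -1$ per completed task, so that minimizing $\overline{y}_0/\overline{T}$ maximizes the task rate $1/\overline{T}$, and enumerate the deterministic ratio over all $N \cdot M$ choices:

```python
def choose_task_assignment(Z, V, T, beta, p_av):
    """Pick (n, m) minimizing the deterministic per-frame ratio
    [-V + Z[n]*beta(n,m) - (sum_k Z[k]*p_av[k])*T(n,m)] / T(n,m),
    following the renewal drift-plus-penalty form of Section 7.2.
    Z, p_av: dicts node -> virtual queue / power constraint;
    T, beta: dicts (n, m) -> duration / energy (hypothetical inputs)."""
    best, best_ratio = None, float('inf')
    for (n, m), Tnm in T.items():
        numer = -V + Z[n] * beta[(n, m)] - sum(Z[k] * p_av[k] for k in Z) * Tnm
        ratio = numer / Tnm
        if ratio < best_ratio:
            best, best_ratio = (n, m), ratio
    return best
```

With all virtual queues at zero, the rule simply picks the fastest (node, modality) pair; as a node's queue grows, energy-hungry choices at that node are avoided.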
Exercise 7.2. (Non-Increasing Property of val(θ)). Consider the val(θ) function in (7.32). Suppose
that θ1 ≤ θ2 .
a) Argue that for all $\eta_i$, $\pi$, $Z_l[r]$, we have:
\[
-V\hat{g}(\eta_i, \pi) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\eta_i, \pi) - \theta_1\hat{T}(\eta_i, \pi) \ge -V\hat{g}(\eta_i, \pi) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\eta_i, \pi) - \theta_2\hat{T}(\eta_i, \pi)
\]

b) Prove that val(θ1 ) ≥ val(θ2 ).

Exercise 7.3. (An Alternative Algorithm with Modified Objective) Consider the system of Section
7.1. However, suppose we desire a solution to the following modified problem:

\[
\begin{array}{ll}
\mbox{Minimize:} & \overline{y}_0 \\
\mbox{Subject to:} & \overline{y}_l/\overline{T} \le c_l \quad \forall l \in \{1, \ldots, L\} \\
& \pi[r] \in \mathcal{P} \quad \forall r \in \{0, 1, 2, \ldots\}
\end{array}
\]
This differs from (7.4)-(7.6) because we seek to minimize $\overline{y}_0$ rather than $\overline{y}_0/\overline{T}$. Define the same
virtual queues $Z[r]$ in (7.12). Note that (7.16) still applies. Consider the algorithm that, every frame
$r$, observes $Z[r]$ and chooses a policy $\pi[r] \in \mathcal{P}$ to minimize the right-hand-side of (7.16). It then
updates $Z[r]$ by (7.12) at the end of the frame. Assume there is an i.i.d. algorithm $\pi^*[r]$ that yields:
\[
E\{\hat{y}_0(\pi^*[r])\} = y_0^{opt} \tag{7.54}
\]
\[
E\{\hat{y}_l(\pi^*[r])\} \le E\{\hat{T}(\pi^*[r])\}c_l \quad \forall l \in \{1, \ldots, L\} \tag{7.55}
\]
a) Plug the i.i.d. algorithm $\pi^*[r]$ into the right-hand-side of (7.16) to show that $\Delta(Z[r]) \le F$
for some finite constant $F$, and hence all queues are mean rate stable so that:
\[
\limsup_{R\to\infty}\,[\overline{y}_l[R] - c_l\overline{T}[R]] \le 0
\]
b) Again plug the i.i.d. algorithm $\pi^*[r]$ into the right-hand-side of (7.16), and use iterated
expectations and telescoping sums to prove:
\[
\limsup_{R\to\infty}\,\overline{y}_0[R] \le y_0^{opt} + B/V
\]

Exercise 7.4. (Manipulating limits) Suppose that $\limsup_{R\to\infty}[\overline{y}_l[R] - c_l\overline{T}[R]] \le 0$, where $0 <
T_{min} \le \overline{T}[R] \le T_{max}$ for all $R > 0$.
a) Argue that for all integers $R > 0$:
\[
\frac{\overline{y}_l[R]}{\overline{T}[R]} - c_l \le \max\left[0, \frac{\overline{y}_l[R]}{\overline{T}[R]} - c_l\right]\frac{\overline{T}[R]}{T_{min}} = \frac{1}{T_{min}}\max\left[0, \overline{y}_l[R] - c_l\overline{T}[R]\right]
\]
b) Take limits of the inequality in (a) to conclude that:
\[
\limsup_{R\to\infty}\frac{\overline{y}_l[R]}{\overline{T}[R]} \le c_l
\]

Exercise 7.5. (An Alternative Algorithm with Time Averaging) Consider the optimization problem (7.4)-(7.6) for a renewal system with frame sizes $T[r]$ that depend on the policy $\pi[r]$. Define
$\theta[0] = 0$. For each stage $r \in \{1, 2, \ldots\}$ define $\theta[r]$ by:
\[
\theta[r] \triangleq \frac{\frac{1}{r}\sum_{k=0}^{r-1} y_0[k]}{\frac{1}{r}\sum_{k=0}^{r-1} T[k]} \tag{7.56}
\]
so that $\theta[r]$ is the empirical time average of the penalty to be minimized over the first $r$ frames.
Consider the following modified algorithm, which does not require the multi-step bisection phase,
but makes assumptions about convergence:

• Every frame $r$, observe $\theta[r]$, $Z[r]$, and choose a policy $\pi[r] \in \mathcal{P}$ to minimize:
\[
E\left\{V[\hat{y}_0(\pi[r]) - \theta[r]\hat{T}(\pi[r])] + \sum_{l=1}^{L} Z_l[r][\hat{y}_l(\pi[r]) - c_l\hat{T}(\pi[r])] \,\Big|\, Z[r], \theta[r]\right\}
\]
• Update $\theta[r]$ by (7.56) and update $Z[r]$ by (7.12).

To analyze this algorithm, we assume that there are constants $\theta$, $\overline{T}$, $\overline{y}_0$ such that, with probability 1:
\[
\lim_{R\to\infty}\theta[R] = \theta \ , \quad \lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} T[r] = \overline{T} \ , \quad \lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} y_0[r] = \overline{y}_0 \tag{7.57}
\]
We further assume there is an i.i.d. algorithm $\pi^*[r]$ that satisfies (7.9)-(7.10) with $\delta = 0$.
a) Use (7.14) to complete the right-hand-side of the following inequality:
\[
\Delta(Z[r]) + V E\{y_0[r] - \theta[r]T[r] \mid Z[r]\} \le B + \cdots
\]
b) Assume $E\{L(Z[0])\} = 0$. Plug the i.i.d. algorithm $\pi^*[r]$ from (7.9)-(7.10) into the right-hand-side of part (a) to prove that $\Delta(Z[r]) \le F$ for some constant $F$, and so all queues are mean
rate stable. Use iterated expectations and the law of telescoping sums to conclude that for any $R > 0$:
\[
E\left\{\frac{1}{R}\sum_{r=0}^{R-1}[y_0[r] - \theta[r]T[r]]\right\} \le E\{\hat{T}(\pi^*[r])\}\left[ratio^{opt} - \frac{1}{R}\sum_{r=0}^{R-1}E\{\theta[r]\}\right] + B/V
\]
c) Argue from (7.56) and (7.57) that, with probability 1:
\[
\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}[y_0[r] - \theta[r]T[r]] = 0 \ , \quad \lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}\theta[r] = \theta
\]
d) Assume that:
\[
\lim_{R\to\infty}E\left\{\frac{1}{R}\sum_{r=0}^{R-1}[y_0[r] - \theta[r]T[r]]\right\} = 0 \ , \quad \lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}E\{\theta[r]\} = \theta
\]
This can be justified via part (c) together with the Lebesgue Dominated convergence theorem,
provided that mild additional boundedness assumptions on the processes are introduced. Use this
with part (b) to prove:
\[
\theta = \lim_{R\to\infty}\frac{\frac{1}{R}\sum_{r=0}^{R-1} y_0[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} \le ratio^{opt} + \frac{B}{V E\{\hat{T}(\pi^*[r])\}} \quad (w.p.1)
\]

Exercise 7.6. (Variation on Jensen’s Inequality) Assume the result of Lemma 7.6(a).
a) Let {T [0], T [1], . . . , T [R − 1]}, {γ [0], γ [1], . . . , γ [R − 1]} be deterministic sequences.
Prove (7.43) by defining X as a random integer that is uniform over {0, . . . , R − 1} and defining
the random vector (T [X], γ [X]).
b) Prove (7.44) by considering {T [0], T [1], . . . , T [R − 1]}, {γ [0], γ [1], . . . , γ [R − 1]} as
random sequences that are independent of X.

Exercise 7.7. (Equivalence of the Transformed Problem)
a) Suppose that $\pi'[r]$, $\gamma'[r]$ solve (7.38)-(7.42), yielding $\overline{\gamma}'_m$, $\overline{T}'$, $\overline{y}'_l$, $\overline{T'\phi(\gamma')}$, $\overline{T'\gamma'}$. Use
the fact that $\phi(\overline{T'\gamma'}/\overline{T}') \ge \overline{T'\phi(\gamma')}/\overline{T}'$ to show that the same policy $\pi'[r]$ satisfies the feasibility
constraints (7.36)-(7.37) and yields $\phi(\overline{x}'/\overline{T}') \ge \overline{T'\phi(\gamma')}/\overline{T}'$.
b) Suppose that $\pi^*[r]$ is an algorithm that solves (7.35)-(7.37), yielding $\overline{x}^*$, $\overline{T}^*$, and $\overline{y}^*_l$. Show
that the optimal value of (7.38) is greater than or equal to $\phi(\overline{x}^*/\overline{T}^*)$. Hint: Use the same policy
$\pi^*[r]$, and use the constant $\gamma[r] = \overline{x}^*/\overline{T}^*$ for all $r \in \{0, 1, 2, \ldots\}$, noting from (7.33) that this is
in $\mathcal{R}$.

Exercise 7.8. (Utility Optimization with Delay-Limited Scheduling) Modify the example in Section 7.6.1 to treat the problem of maximizing the utility function $\sum_{l=1}^{L}\log(1 + \overline{D}_l/\overline{T})$, rather than
maximizing $\sum_{l=1}^{L} w_l\overline{D}_l/\overline{T}$.

Exercise 7.9. (A simple form of Lebesgue Dominated Convergence) Let $\{f[r]\}_{r=0}^{\infty}$ be an infinite
sequence of random variables. Suppose there are finite constants $f_{min}$ and $f_{max}$ such that the random
variables deterministically satisfy $f_{min} \le f[r] \le f_{max}$ for all $r \in \{0, 1, 2, \ldots\}$. Suppose there is a
finite constant $\overline{f}$ such that:
\[
\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} f[r] = \overline{f} \quad (w.p.1)
\]
We will show that $\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} E\{f[r]\} = \overline{f}$.
a) Fix $\epsilon > 0$. Argue that for any integer $R > 0$:
\[
E\left\{\frac{1}{R}\sum_{r=0}^{R-1} f[r]\right\} \le (\overline{f} + \epsilon)Pr\left[\frac{1}{R}\sum_{r=0}^{R-1} f[r] \le \overline{f} + \epsilon\right] + f_{max}Pr\left[\frac{1}{R}\sum_{r=0}^{R-1} f[r] > \overline{f} + \epsilon\right]
\]
b) Argue that for any $\epsilon > 0$:
\[
\lim_{R\to\infty} Pr\left[\frac{1}{R}\sum_{r=0}^{R-1} f[r] > \overline{f} + \epsilon\right] = 0
\]
Use this with part (a) to conclude that for all $\epsilon > 0$:
\[
\limsup_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} E\{f[r]\} \le \overline{f} + \epsilon
\]
Conclude that the left-hand-side in the above inequality is less than or equal to $\overline{f}$.
c) Make a similar argument to show $\liminf_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} E\{f[r]\} \ge \overline{f}$.

CHAPTER 8

Conclusions
This text has presented a theory for optimizing time averages in stochastic networks. The tools
of Lyapunov drift and Lyapunov optimization were developed to solve these problems. Our focus
was on communication and queueing networks, including networks with wireless links and mobile
devices. The theory can be used for networks with a variety of goals and functionalities, such as
networks with:
• Network coding capabilities (see Exercise 4.12 and (188)(189)(190)).

• Dynamic data compression (see Exercise 4.14 and (191)(165)(143)).

• Multi-input, multi-output (MIMO) antenna capabilities (162)(192)(193).

• Multi-receiver diversity (154).

• Cooperative combining (194)(71).

• Hop count minimization (155).

• Economic considerations (195)(153).


Lyapunov optimization theory also has applications to a wide array of other problems, including
(but not limited to):
• Stock market trading (40)(41).

• Product assembly plants (196)(175)(197)(198).

• Energy allocation for smart grids (159).


This text has included several representative simulation results for 1-hop networks (see Chap-
ter 3). Further simulation and experimentation results for Lyapunov based algorithms in single-hop
and multi-hop networks can be found in (54)(55)(199)(200)(201)(202)(203)(154)(142)(42).
We have highlighted the simplicity of Lyapunov drift and Lyapunov optimization, emphasizing
that it uses only the following techniques (see Chapters 1 and 3): (i) telescoping sums, (ii) iterated
expectations, (iii) opportunistically minimizing an expectation, (iv) Jensen's inequality. Further, the
drift-plus-penalty algorithm of Lyapunov optimization theory is analyzed with the following simple
framework:
1. Define a Lyapunov function as the sum of squares of queue backlog.
2. Compute a bound on the drift-plus-penalty by squaring the queueing equation. The bound
typically has the form:
\[
\Delta(\Theta(t)) + V E\{\mbox{penalty}(t) \mid \Theta(t)\} \le B + V E\{\mbox{penalty}(t) \mid \Theta(t)\} + \sum_{n=1}^{N}\Theta_n(t)E\{h_n(t) \mid \Theta(t)\}
\]
where $B$ is a constant that bounds second moments of the processes, $\Theta(t) =
(\Theta_1(t), \ldots, \Theta_N(t))$ is a general vector of (possibly virtual) queues, and $h_n(t)$ is the arrival-minus-departure value for queue $\Theta_n(t)$ on slot $t$.

3. Design the policy to minimize the right-hand-side of the above drift-plus-penalty bound.

4. Conclude that, under this algorithm, the drift-plus-penalty is bounded by plugging any other
policy into the right-hand-side:
\[
\Delta(\Theta(t)) + V E\{\mbox{penalty}(t) \mid \Theta(t)\} \le B + V E\{\mbox{penalty}^*(t) \mid \Theta(t)\} + \sum_{n=1}^{N}\Theta_n(t)E\{h_n^*(t) \mid \Theta(t)\}
\]

5. Plug an ω-only policy α ∗ (t) into the right-hand-side, one that is known to exist (although it
would be hard to compute) that satisfies all constraints and yields a greatly simplified drift-
plus-penalty expression on the right-hand-side.
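The five steps can be condensed into a per-slot rule. A toy sketch (the `(penalty, h)` option encoding is hypothetical, standing in for whatever action set a concrete model allows):

```python
def drift_plus_penalty_step(Theta, V, options):
    """One slot of a generic drift-plus-penalty rule. Each option is a tuple
    (penalty, h) where h[n] is the arrival-minus-departure for virtual queue n.
    Pick the option minimizing V*penalty + sum_n Theta_n*h_n (step 3 of the
    framework), then apply the max[.,0] queue updates."""
    best = min(options, key=lambda opt: V * opt[0] + sum(
        Theta[n] * opt[1][n] for n in range(len(Theta))))
    penalty, h = best
    Theta_next = [max(Theta[n] + h[n], 0.0) for n in range(len(Theta))]
    return penalty, Theta_next
```

Large queue backlogs tilt the decision toward options that drain the queues, while the $V$ weight tilts it toward low penalty, which is exactly the performance/backlog trade-off discussed throughout the text.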

Also important in this theory is the use of virtual queues to transform time average inequality
constraints into queue stability problems, and auxiliary variables for the case of optimizing convex
functions of time averages. The drift-plus-penalty framework was also shown to hold for optimization
of non-convex functions of time averages, and for optimization over renewal systems.
The resulting min-drift (or “max-weight”) algorithms can be very complex for general prob-
lems, particularly for wireless networks with interference. However, we have seen that low complexity
approximations can be used to provide good performance. Further, for interference networks with-
out time-variation, methods that take a longer time to find the max-weight solution (either by a
deterministic or randomized search) were seen to provide full throughput and throughput-utility op-
timality with arbitrarily low per-timeslot computation complexity, provided that we let convergence
time and/or delay increase (possibly non-polynomially) to infinity. Simple distributed Carrier Sense
Multiple Access (CSMA) implementations are often possible (and provably throughput optimal)
for these networks via the Jiang-Walrand theorem, which hints at deeper connections with Lya-
punov optimization, max-weight theory, C-additive approximations, maximum entropy solutions,
randomized algorithms, and Markov chain steady state theory.

Bibliography
[1] F. Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommuni-
cations, vol. 8, no. 1 pp. 33-37, Jan.-Feb. 1997. DOI: 10.1002/ett.4460080106 3, 98

[2] F.P. Kelly, A.Maulloo, and D. Tan. Rate control for communication networks: Shadow prices,
proportional fairness, and stability. Journ. of the Operational Res. Society, vol. 49, no. 3, pp.
237-252, March 1998. DOI: 10.2307/3010473 3, 7, 98, 104

[3] J. Mo and J. Walrand. Fair end-to-end window-based congestion control. IEEE/ACM
Transactions on Networking, vol. 8, no. 5, Oct. 2000. DOI: 10.1109/90.879343 3, 128

[4] L. Massoulié and J. Roberts. Bandwidth sharing: Objectives and algorithms. IEEE/ACM
Transactions on Networking, vol. 10, no. 3, pp. 320-328, June 2002.
DOI: 10.1109/TNET.2002.1012364 3

[5] A. Tang, J. Wang, and S. Low. Is fair allocation always inefficient. Proc. IEEE INFOCOM,
March 2004. DOI: 10.1109/INFCOM.2004.1354479 3, 98, 128

[6] B. Radunovic and J.Y. Le Boudec. Rate performance objectives of multihop wireless net-
works. IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 334-349, Oct.-Dec. 2004.
DOI: 10.1109/TMC.2004.45 3, 128

[7] L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and
scheduling policies for maximum throughput in multihop radio networks. IEEE Transactions
on Automatic Control, vol. 37, no. 12, pp. 1936-1948, Dec. 1992. DOI: 10.1109/9.182479 6,
49, 113, 138

[8] L. Tassiulas and A. Ephremides. Dynamic server allocation to parallel queues with randomly
varying connectivity. IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466-478,
March 1993. DOI: 10.1109/18.212277 6, 10, 24, 49, 66

[9] P. R. Kumar and S. P. Meyn. Stability of queueing networks and scheduling policies. IEEE
Trans. on Automatic Control, vol. 40, no. 2, pp. 251-260, Feb. 1995. DOI: 10.1109/9.341782 6

[10] N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand. Achieving 100% throughput
in an input-queued switch. IEEE Transactions on Communications, vol. 47, no. 8, August 1999.
6, 88
[11] E. Leonardi, M. Mellia, F. Neri, and M. Ajmone Marsan. Bounds on average delays and queue
size averages and variances in input-queued cell-based switches. Proc. IEEE INFOCOM,
2001. DOI: 10.1109/INFCOM.2001.916303 6
[12] L. Tassiulas. Scheduling and performance limits of networks with constantly changing topol-
ogy. IEEE Transactions on Information Theory, vol. 43, no. 3, pp. 1067-1073, May 1997.
DOI: 10.1109/18.568722 6
[13] N. Kahale and P. E. Wright. Dynamic global packet routing in wireless networks. Proc. IEEE
INFOCOM, 1997. DOI: 10.1109/INFCOM.1997.631182 6
[14] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, P. Whiting, and R. Vijaykumar. Providing
quality of service over a shared wireless link. IEEE Communications Magazine, vol. 39, no.2,
pp.150-154, Feb. 2001. DOI: 10.1109/35.900644 6
[15] M. J. Neely, E. Modiano, and C. E Rohrs. Dynamic power allocation and routing for time
varying wireless networks. IEEE Journal on Selected Areas in Communications, vol. 23, no. 1,
pp. 89-103, January 2005. DOI: 10.1109/JSAC.2004.837349 6, 24, 56, 113
[16] B. Awerbuch and T. Leighton. A simple local-control approximation algorithm for mul-
ticommodity flow. Proc. 34th IEEE Conf. on Foundations of Computer Science, Oct. 1993.
DOI: 10.1109/SFCS.1993.366841 6
[17] M. J. Neely. Dynamic Power Allocation and Routing for Satellite and Wireless Networks with
Time Varying Channels. PhD thesis, Massachusetts Institute of Technology, LIDS, 2003. 6,
8, 11, 49, 105, 119, 120, 128, 134, 145
[18] M. J. Neely, E. Modiano, and C. Li. Fairness and optimal stochastic control for heterogeneous
networks. Proc. IEEE INFOCOM, March 2005. DOI: 10.1109/INFCOM.2005.1498453 6,
49, 105, 134
[19] M. J. Neely, E. Modiano, and C. Li. Fairness and optimal stochastic control for heterogeneous
networks. IEEE/ACM Transactions on Networking, vol. 16, no. 2, pp. 396-409, April 2008.
DOI: 10.1109/TNET.2007.900405 6, 105, 145
[20] M. J. Neely. Energy optimal control for time varying wireless networks. Proc. IEEE INFO-
COM, March 2005. DOI: 10.1109/INFCOM.2005.1497924 6, 49
[21] M. J. Neely. Energy optimal control for time varying wireless networks. IEEE Transactions on
Information Theory, vol. 52, no. 7, pp. 2915-2934, July 2006. DOI: 10.1109/TIT.2006.876219
6, 28, 38, 49, 56, 83, 84
[22] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource allocation and cross-layer control
in wireless networks. Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-149, 2006.
DOI: 10.1561/1300000001 xi, 7, 8, 24, 49, 105, 110, 111, 113, 145
[23] S. H. Low and D. E. Lapsley. Optimization flow control, i: Basic algorithm and con-
vergence. IEEE/ACM Transactions on Networking, vol. 7 no. 6, pp. 861-875, Dec. 1999.
DOI: 10.1109/90.811451 7, 104, 109

[24] S. H. Low. A duality model of TCP and queue management algorithms. IEEE Trans. on
Networking, vol. 11, no. 4, pp. 525-536, August 2003. DOI: 10.1109/TNET.2003.815297 7

[25] L. Xiao, M. Johansson, and S. Boyd. Simultaneous routing and resource allocation for wireless
networks. Proc. of the 39th Annual Allerton Conf. on Comm., Control, Comput., Oct. 2001. 7

[26] L. Xiao, M. Johansson, and S. P. Boyd. Simultaneous routing and resource allocation via dual
decomposition. IEEE Transactions on Communications, vol. 52, no. 7, pp. 1136-1144, July
2004. DOI: 10.1109/TCOMM.2004.831346 7

[27] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Downlink power allocation for multi-class CDMA
wireless networks. Proc. IEEE INFOCOM, 2002. DOI: 10.1109/INFCOM.2002.1019399
7

[28] M. Chiang. Balancing transport and physical layer in wireless multihop networks: Jointly op-
timal congestion control and power control. IEEE Journal on Selected Areas in Communications,
vol. 23, no. 1, pp. 104-116, Jan. 2005. DOI: 10.1109/JSAC.2004.837347 7

[29] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle. Layering as optimization decom-
position: A mathematical theory of network architectures. Proceedings of the IEEE, vol. 95,
no. 1, Jan. 2007. DOI: 10.1109/JPROC.2006.887322 7, 104, 109

[30] R. Cruz and A. Santhanam. Optimal routing, link scheduling, and power
control in multi-hop wireless networks. Proc. IEEE INFOCOM, April 2003.
DOI: 10.1109/INFCOM.2003.1208720 7

[31] X. Lin and N. B. Shroff. Joint rate control and scheduling in multihop wireless networks.
Proc. of 43rd IEEE Conf. on Decision and Control, Paradise Island, Bahamas, Dec. 2004. 7, 8,
109

[32] R. Agrawal and V. Subramanian. Optimality of certain channel aware scheduling policies.
Proc. 40th Annual Allerton Conference on Communication, Control, and Computing, Monticello,
IL, Oct. 2002. 7, 119

[33] H. Kushner and P. Whiting. Asymptotic properties of proportional-fair sharing algorithms.
Proc. of 40th Annual Allerton Conf. on Communication, Control, and Computing, 2002. 7, 119

[34] A. Stolyar. Maximizing queueing network utility subject to stability: Greedy primal-dual algo-
rithm. Queueing Systems, vol. 50, no. 4, pp. 401-457, 2005. DOI: 10.1007/s11134-005-1450-0
7, 119
[35] A. Stolyar. Greedy primal-dual algorithm for dynamic resource allocation in complex net-
works. Queueing Systems, vol. 54, no. 3, pp. 203-220, 2006. DOI: 10.1007/s11134-006-0067-2
7

[36] Q. Li and R. Negi. Scheduling in wireless networks under uncertainties: A greedy primal-dual
approach. ArXiv Technical Report: arXiv:1001.2050v2, June 2010. 8, 119

[37] L. Huang and M. J. Neely. Delay reduction via lagrange multipliers in stochastic network
optimization. Proc. of 7th Intl. Symposium on Modeling and Optimization in Mobile, Ad Hoc,
and Wireless Networks (WiOpt), June 2009. DOI: 10.1109/WIOPT.2009.5291609 8, 10, 69,
71, 113

[38] M. J. Neely. Universal scheduling for networks with arbitrary traffic, channels, and mobility.
Proc. IEEE Conf. on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 8, 77, 81, 102, 112,
119

[39] M. J. Neely. Universal scheduling for networks with arbitrary traffic, channels, and mobility.
ArXiv technical report, arXiv:1001.0960v1, Jan. 2010. 8, 77, 81, 102, 107

[40] M. J. Neely. Stock market trading via stochastic network optimization. Proc. IEEE Conference
on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 8, 77, 179

[41] M. J. Neely. Stock market trading via stochastic network optimization. ArXiv Technical
Report, arXiv:0909.3891v1, Sept. 2009. 8, 77, 179

[42] M. J. Neely and R. Urgaonkar. Cross layer adaptive control for wireless mesh networks. Ad
Hoc Networks (Elsevier), vol. 5, no. 6, pp. 719-743, August 2007.
DOI: 10.1016/j.adhoc.2007.01.004 8, 102, 112, 119, 179

[43] M. J. Neely. Stochastic network optimization with non-convex utilities and costs. Proc. Infor-
mation Theory and Applications Workshop (ITA), Feb. 2010. DOI: 10.1109/ITA.2010.5454100
8, 116, 117, 118

[44] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks using queue-
length-based scheduling and congestion control. Proc. IEEE INFOCOM, March 2005.
DOI: 10.1109/INFCOM.2005.1498459 8

[45] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks using queue-length-
based scheduling and congestion control. IEEE/ACM Transactions on Networking, vol. 15,
no. 6, pp. 1333-1344, Dec. 2007. DOI: 10.1109/TNET.2007.897944 8, 69, 71

[46] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Opportunistic power scheduling for dynamic
multiserver wireless systems. IEEE Transactions on Wireless Communications, vol. 5, no.6, pp.
1506-1515, June 2006. DOI: 10.1109/TWC.2006.1638671 8
[47] V. Tsibonis, L. Georgiadis, and L. Tassiulas. Exploiting wireless channel state information
for throughput maximization. IEEE Transactions on Information Theory, vol. 50, no. 11, pp.
2566-2582, Nov. 2004. DOI: 10.1109/TIT.2004.836687 8
[48] V. Tsibonis, L. Georgiadis, and L. Tassiulas. Exploiting wireless channel state
information for throughput maximization. Proc. IEEE INFOCOM, April 2003.
DOI: 10.1109/TIT.2004.836687 8
[49] X. Liu, E. K. P. Chong, and N. B. Shroff. A framework for opportunistic schedul-
ing in wireless networks. Computer Networks, vol. 41, no. 4, pp. 451-474, March 2003.
DOI: 10.1016/S1389-1286(02)00401-2 8
[50] R. Berry and R. Gallager. Communication over fading channels with delay constraints.
IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1135-1149, May 2002.
DOI: 10.1109/18.995554 8, 9, 67
[51] M. J. Neely. Optimal energy and delay tradeoffs for multi-user wireless downlinks.
IEEE Transactions on Information Theory, vol. 53, no. 9, pp. 3095-3113, Sept. 2007.
DOI: 10.1109/TIT.2007.903141 8, 10, 67, 71
[52] M. J. Neely. Super-fast delay tradeoffs for utility optimal fair scheduling in wireless
networks. IEEE Journal on Selected Areas in Communications, Special Issue on Nonlin-
ear Optimization of Communication Systems, vol. 24, no. 8, pp. 1489-1501, Aug. 2006.
DOI: 10.1109/JSAC.2006.879357 8, 10, 67, 71
[53] M. J. Neely. Intelligent packet dropping for optimal energy-delay tradeoffs in wireless down-
links. IEEE Transactions on Automatic Control, vol. 54, no. 3, pp. 565-579, March 2009.
DOI: 10.1109/TAC.2009.2013652 8, 10, 67, 71
[54] S. Moeller, A. Sridharan, B. Krishnamachari, and O. Gnawali. Routing without routes: The
backpressure collection protocol. Proc. 9th ACM/IEEE Intl. Conf. on Information Processing
in Sensor Networks (IPSN), April 2010. DOI: 10.1145/1791212.1791246 8, 10, 71, 72, 113,
179
[55] L. Huang, S. Moeller, M. J. Neely, and B. Krishnamachari. LIFO-backpressure achieves near
optimal utility-delay tradeoff. Arxiv Technical Report, arXiv:1008.4895v1, August 2010. 8,
10, 72, 113, 179
[56] M. J. Neely. Stochastic optimization for Markov modulated networks with application to
delay constrained wireless scheduling. Proc. IEEE Conf. on Decision and Control (CDC),
Shanghai, China, Dec. 2009. DOI: 10.1109/CDC.2009.5400270 8, 9, 153, 171, 173
[57] M. J. Neely. Stochastic optimization for Markov modulated networks with application to delay
constrained wireless scheduling. ArXiv Technical Report, arXiv:0905.4757v1, May 2009. 8,
9, 153, 158, 171, 173, 174
[58] C.-P. Li and M. J. Neely. Network utility maximization over partially observable Markovian
channels. ArXiv Technical Report: arXiv:1008.3421v1, Aug. 2010. 8, 153

[59] F. J. Vázquez Abad and V. Krishnamurthy. Policy gradient stochastic approximation algo-
rithms for adaptive control of constrained time varying Markov decision processes. Proc.
IEEE Conf. on Decision and Control, Dec. 2003. DOI: 10.1109/CDC.2003.1273053 8, 174

[60] D. V. Djonin and V. Krishnamurthy. Q-learning algorithms for constrained Markov de-
cision processes with randomized monotone policies: Application to MIMO transmission
control. IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 2170-2181, May 2007.
DOI: 10.1109/TSP.2007.893228 8, 9, 174

[61] N. Salodkar, A. Bhorkar, A. Karandikar, and V. S. Borkar. An on-line learning algo-
rithm for energy efficient delay constrained scheduling over a fading channel. IEEE
Journal on Selected Areas in Communications, vol. 26, no. 4, pp. 732-742, May 2008.
DOI: 10.1109/JSAC.2008.080514 8, 9, 174

[62] F. Fu and M. van der Schaar. A systematic framework for dynamically optimizing multi-user
video transmission. IEEE Journal on Selected Areas in Communications, vol. 28, no. 3, pp.
308-320, April 2010. DOI: 10.1109/JSAC.2010.100403 8, 9, 174

[63] F. Fu and M. van der Schaar. Decomposition principles and online learning in cross-layer
optimization for delay-sensitive applications. IEEE Trans. Signal Processing, vol. 58, no. 3,
pp. 1401-1415, March 2010. DOI: 10.1109/TSP.2009.2034938 8, 9, 174

[64] D. P. Bertsekas. Dynamic Programming and Optimal Control, vols. 1 and 2. Athena Scientific,
Belmont, Mass, 1995. 8, 158, 168, 170

[65] E. Altman. Constrained Markov Decision Processes. Boca Raton, FL, Chapman and Hall/CRC
Press, 1999. 8

[66] S. Ross. Introduction to Probability Models. Academic Press, 8th edition, Dec. 2002. 8, 12, 27,
76

[67] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont,
Mass, 1996. 8, 158, 173, 174

[68] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John
Wiley & Sons, 2007. DOI: 10.1002/9780470182963 8, 174

[69] S. Meyn. Control Techniques for Complex Networks. Cambridge University Press, 2008. 8, 174

[70] D. Tse and S. Hanly. Multi-access fading channels: Part ii: Delay-limited capacities.
IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2816-2831, Nov. 1998.
DOI: 10.1109/18.737514 9, 135
[71] R. Urgaonkar and M. J. Neely. Delay-limited cooperative communication with re-
liability constraints in wireless networks. Proc. IEEE INFOCOM, April 2009.
DOI: 10.1109/INFCOM.2009.5062187 9, 135, 168, 179

[72] A. Mekkittikul and N. McKeown. A starvation free algorithm for achieving 100% throughput
in an input-queued switch. Proc. ICCN, pp. 226-231, 1996. 9

[73] A. L. Stolyar and K. Ramanan. Largest weighted delay first scheduling: Large de-
viations and optimality. Annals of Applied Probability, vol. 11, no. 1, pp. 1-48, 2001.
DOI: 10.1214/aoap/998926986 9, 11

[74] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijaykumar, and P. Whiting.
Scheduling in a queueing system with asynchronously varying service rates. Probabil-
ity in the Engineering and Informational Sciences, vol. 18, no. 2, pp. 191-217, April 2004.
DOI: 10.1017/S0269964804182041 9

[75] S. Shakkottai and A. Stolyar. Scheduling for multiple flows sharing a time-varying channel:
The exponential rule. American Mathematical Society Translations, series 2, vol. 207, 2002. 9

[76] M. J. Neely. Delay-based network utility maximization. Proc. IEEE INFOCOM, March
2010. DOI: 10.1109/INFCOM.2010.5462097 9, 120, 122

[77] A. Fu, E. Modiano, and J. Tsitsiklis. Optimal energy allocation for delay-constrained data
transmission over a time-varying channel. Proc. IEEE INFOCOM, 2003. 9

[78] M. Zafer and E. Modiano. Optimal rate control for delay-constrained data transmission over
a wireless channel. IEEE Transactions on Information Theory, vol. 54, no. 9, pp. 4020-4039,
Sept. 2008. DOI: 10.1109/TIT.2008.928249 9

[79] M. Zafer and E. Modiano. Minimum energy transmission over a wireless channel with
deadline and power constraints. IEEE Transactions on Automatic Control, vol. 54, no. 12, pp.
2841-2852, December 2009. DOI: 10.1109/TAC.2009.2034202 9

[80] M. Goyal, A. Kumar, and V. Sharma. Power constrained and delay optimal policies
for scheduling transmission over a fading channel. Proc. IEEE INFOCOM, April 2003.
DOI: 10.1109/INFCOM.2003.1208683 9

[81] A. Wierman, L. L. H. Andrew, and A. Tang. Power-aware speed scaling in pro-
cessor sharing systems. Proc. IEEE INFOCOM, Rio de Janeiro, Brazil, April 2009.
DOI: 10.1109/INFCOM.2009.5062123 9

[82] E. Uysal-Biyikoglu, B. Prabhakar, and A. El Gamal. Energy-efficient packet transmission
over a wireless link. IEEE/ACM Trans. Networking, vol. 10, no. 4, pp. 487-499, Aug. 2002.
DOI: 10.1109/TNET.2002.801419 9
[83] M. Zafer and E. Modiano. A calculus approach to minimum energy transmission
policies with quality of service guarantees. Proc. IEEE INFOCOM, March 2005.
DOI: 10.1109/INFCOM.2005.1497922 9

[84] M. Zafer and E. Modiano. A calculus approach to energy-efficient data transmission with
quality-of-service constraints. IEEE/ACM Transactions on Networking, vol. 17, no. 13, pp.
898-911, June 2009. DOI: 10.1109/TNET.2009.2020831 9

[85] W. Chen, M. J. Neely, and U. Mitra. Energy-efficient transmissions with individual packet
delay constraints. IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 2090-2109,
May 2008. DOI: 10.1109/TIT.2008.920344 9

[86] W. Chen, U. Mitra, and M. J. Neely. Energy-efficient scheduling with individual packet delay
constraints over a fading channel. Wireless Networks, vol. 15, no. 5, pp. 601-618, July 2009.
DOI: 10.1007/s11276-007-0093-y 9

[87] M. A. Khojastepour and A. Sabharwal. Delay-constrained scheduling: Power
efficiency, filter design, and bounds. Proc. IEEE INFOCOM, March 2004.
DOI: 10.1109/INFCOM.2004.1354603 9

[88] B. Hajek. Optimal control of two interacting service stations. IEEE Transactions on Automatic
Control, vol. 29, no. 6, pp. 491-499, June 1984. DOI: 10.1109/TAC.1984.1103577 9

[89] S. Sarkar. Optimum scheduling and memory management in input queued switches with finite
buffer space. Proc. IEEE INFOCOM, April 2003. DOI: 10.1109/INFCOM.2003.1208973
9

[90] A. Tarello, J. Sun, M. Zafer, and E. Modiano. Minimum energy transmission scheduling
subject to deadline constraints. ACM Wireless Networks, vol. 14, no. 5, pp. 633-645, 2008.
DOI: 10.1007/s11276-006-0005-6 9

[91] B. Sadiq, S. Baek, and G. de Veciana. Delay-optimal opportunistic scheduling and
approximations: The log rule. Proc. IEEE INFOCOM, April 2009.
DOI: 10.1109/INFCOM.2009.5062088 9

[92] B. Sadiq and G. de Veciana. Optimality and large deviations of queues under the pseudo-log
rule opportunistic scheduling. 46th Annual Allerton Conference on Communication, Control,
and Computing, Monticello, IL, Sept. 2008. DOI: 10.1109/ALLERTON.2008.4797636 9, 11

[93] A. L. Stolyar. Large deviations of queues sharing a randomly time-varying server. Queueing
Systems Theory and Applications, vol. 59, no. 1, pp. 1-35, 2008.
DOI: 10.1007/s11134-008-9072-y 9, 11
[94] A. Ganti, E. Modiano, and J. N. Tsitsiklis. Optimal transmission scheduling in symmetric
communication models with intermittent connectivity. IEEE Transactions on Information
Theory, vol. 53, no. 3, pp. 998-1008, March 2007. DOI: 10.1109/TIT.2006.890695 10

[95] E. M. Yeh and A. S. Cohen. Delay optimal rate allocation in multiaccess fading communi-
cations. Proc. Allerton Conference on Communication, Control, and Computing, Monticello, IL,
2004. 10

[96] E. M. Yeh. Multiaccess and Fading in Communication Networks. PhD thesis, Massachusetts
Institute of Technology, Laboratory for Information and Decision Systems (LIDS), 2001. 10

[97] S. Kittipiyakul and T. Javidi. Delay-optimal server allocation in multi-queue multi-server
systems with time-varying connectivities. IEEE Transactions on Information Theory, vol. 55,
no. 5, pp. 2319-2333, May 2009. DOI: 10.1109/TIT.2009.2016051 10

[98] A. Ephremides, P. Varaiya, and J. Walrand. A simple dynamic routing problem. IEEE
Transactions on Automatic Control, vol. AC-25, no. 4, pp. 690-693, Aug. 1980. 10

[99] M. J. Neely, E. Modiano, and Y.-S. Cheng. Logarithmic delay for n × n packet switches
under the crossbar constraint. IEEE Transactions on Networking, vol. 15, no. 3, pp. 657-668,
June 2007. DOI: 10.1109/TNET.2007.893876 10, 11, 37

[100] M. J. Neely. Order optimal delay for opportunistic scheduling in multi-user wireless uplinks
and downlinks. IEEE/ACM Transactions on Networking, vol. 16, no. 5, pp. 1188-1199, October
2008. DOI: 10.1109/TNET.2007.909682 10, 24, 37

[101] M. J. Neely. Delay analysis for max weight opportunistic scheduling in wireless sys-
tems. IEEE Transactions on Automatic Control, vol. 54, no. 9, pp. 2137-2150, Sept. 2009.
DOI: 10.1109/TAC.2009.2026943 10, 11, 24, 37

[102] S. Deb, D. Shah, and S. Shakkottai. Fast matching algorithms for repetitive optimization: An
application to switch scheduling. Proc. of 40th Annual Conference on Information Sciences and
Systems (CISS), Princeton, NJ, March 2006. DOI: 10.1109/CISS.2006.286659 10, 37, 147

[103] M. J. Neely. Delay analysis for maximal scheduling with flow control in wireless networks
with bursty traffic. IEEE Transactions on Networking, vol. 17, no. 4, pp. 1146-1159, August
2009. DOI: 10.1109/TNET.2008.2008232 10, 11, 37, 147

[104] X. Wu, R. Srikant, and J. R. Perkins. Scheduling efficiency of distributed greedy scheduling
algorithms in wireless networks. IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp.
595-605, June 2007. DOI: 10.1109/TMC.2007.1061 11, 37, 147

[105] J. G. Dai and B. Prabhakar. The throughput of data switches with and without speedup. Proc.
IEEE INFOCOM, 2000. DOI: 10.1109/INFCOM.2000.832229 11, 37
[106] J. M. Harrison and J. A. Van Mieghem. Dynamic control of Brownian networks: State space
collapse and equivalent workload formulations. The Annals of Applied Probability, vol. 7, no. 3,
pp. 747-771, Aug. 1997. DOI: 10.1214/aoap/1034801252 11
[107] S. Shakkottai, R. Srikant, and A. Stolyar. Pathwise optimality of the exponential scheduling
rule for wireless channels. Advances in Applied Probability, vol. 36, no. 4, pp. 1021-1045, Dec.
2004. DOI: 10.1239/aap/1103662957 11
[108] A. L. Stolyar. Maxweight scheduling in a generalized switch: State space collapse and
workload minimization in heavy traffic. Annals of Applied Probability, pp. 1-53, 2004.
DOI: 10.1214/aoap/1075828046 11
[109] D. Shah and D. Wischik. Optimal scheduling algorithms for input-queued switches. Proc.
IEEE INFOCOM, 2006. DOI: 10.1109/INFOCOM.2006.238 11
[110] I. Keslassy and N. McKeown. Analysis of scheduling algorithms that provide 100% through-
put in input-queued switches. Proc. 39th Annual Allerton Conf. on Communication, Control,
and Computing, Oct. 2001. 11
[111] T. Ji, E. Athanasopoulou, and R. Srikant. Optimal scheduling policies in small generalized
switches. Proc. IEEE INFOCOM, Rio De Janeiro, Brazil, 2009.
DOI: 10.1109/INFCOM.2009.5062259 11
[112] V. J. Venkataramanan and X. Lin. Structural properties of ldp for queue-length based wireless
scheduling algorithms. Proc. of 45th Annual Allerton Conference on Communication, Control,
and Computing, Monticello, Illinois, September 2007. 11
[113] D. Bertsimas, I. C. Paschalidis, and J. N. Tsitsiklis. Large deviations analysis of the
generalized processor sharing policy. Queueing Systems, vol. 32, pp. 319-349, 1999.
DOI: 10.1023/A:1019151423773 11
[114] D. Bertsimas, I. C. Paschalidis, and J. N. Tsitsiklis. Asymptotic buffer overflow probabilities
in multiclass multiplexers: An optimal control approach. IEEE Transactions on Automatic
Control, vol. 43, no. 3, pp. 315-335, March 1998. DOI: 10.1109/9.661587 11
[115] S. Bodas, S. Shakkottai, L. Ying, and R. Srikant. Scheduling in multi-channel wireless
networks: Rate function optimality in the small-buffer regime. Proc. ACM SIGMET-
RICS/Performance Conference, June 2009. DOI: 10.1145/1555349.1555364 11
[116] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions on Information
Theory, vol. 46, no. 2, pp. 388-404, March 2000. DOI: 10.1109/18.825799 11
[117] M. Grossglauser and D. Tse. Mobility increases the capacity of ad-hoc wireless net-
works. IEEE/ACM Trans. on Networking, vol. 10, no. 4, pp. 477-486, August 2002.
DOI: 10.1109/TNET.2002.801403 11
[118] M. J. Neely and E. Modiano. Capacity and delay tradeoffs for ad-hoc mobile net-
works. IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 1917-1937, June 2005.
DOI: 10.1109/TIT.2005.847717 11

[119] S. Toumpis and A. J. Goldsmith. Large wireless networks under fading, mobility, and delay
constraints. Proc. IEEE INFOCOM, 2004. DOI: 10.1109/INFCOM.2004.1354532 12

[120] X. Lin and N. B. Shroff. Towards achieving the maximum capacity in large mobile wireless
networks. Journal of Communications and Networks, Special Issue on Mobile Ad Hoc Wireless
Networks, vol. 6, no. 4, December 2004. 12

[121] X. Lin and N. B. Shroff. The fundamental capacity-delay tradeoff in large mobile ad hoc
networks. Purdue University Tech. Report, 2004. 12

[122] A. El Gamal, J. Mammen, B. Prabhakar, and D. Shah. Optimal throughput-delay scaling in
wireless networks – Part I: The fluid model. IEEE Transactions on Information Theory, vol.
52, no. 6, pp. 2568-2592, June 2006. DOI: 10.1109/TIT.2006.874379 12

[123] G. Sharma, R. Mazumdar, and N. Shroff. Delay and capacity trade-offs in mobile ad-hoc
networks: A global perspective. Proc. IEEE INFOCOM, April 2006.
DOI: 10.1109/INFOCOM.2006.144 12

[124] X. Lin, G. Sharma, R. R. Mazumdar, and N. B. Shroff. Degenerate delay-capacity trade-offs
in ad hoc networks with Brownian mobility. IEEE Transactions on Information Theory, vol.
52, no. 6, pp. 2777-2784, June 2006. DOI: 10.1109/TIT.2006.874544 12

[125] N. Bansal and Z. Liu. Capacity, delay and mobility in wireless ad-hoc networks. Proc. IEEE
INFOCOM, April 2003. DOI: 10.1109/INFCOM.2003.1208990 12

[126] L. Ying, S. Yang, and R. Srikant. Optimal delay-throughput tradeoffs in mobile ad hoc
networks. IEEE Transactions on Information Theory, vol. 54, no. 9, pp. 4119-4143, Sept. 2008.
DOI: 10.1109/TIT.2008.928247 12

[127] Z. Kong, E. M. Yeh, and E. Soljanin. Coding improves the throughput-delay trade-off in
mobile wireless networks. Proceedings of the International Symposium on Information Theory,
Seoul, Korea, June 2009. 12

[128] Z. Kong, E. M. Yeh, and E. Soljanin. Coding improves the throughput-delay trade-
off in mobile wireless networks. IEEE Transactions on Information Theory, to appear.
DOI: 10.1109/ISIT.2009.5205277 12

[129] D. P. Bertsekas and R. Gallager. Data Networks. New Jersey: Prentice-Hall, Inc., 1992. 12,
19, 25, 27, 37, 48, 109, 128, 144, 172
[130] R. Gallager. Discrete Stochastic Processes. Kluwer Academic Publishers, Boston, 1996. 12, 27,
50, 74, 76

[131] F. P. Kelly. Reversibility and Stochastic Networks. Wiley, Chichester, 1979. 12, 27, 144

[132] S. Ross. Stochastic Processes. John Wiley & Sons, Inc., New York, 1996. 12, 74

[133] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and Optimization. Boston:
Athena Scientific, 2003. 12, 67

[134] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. 12,
67

[135] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1996. 12

[136] M. J. Neely. Stability and capacity regions for discrete time queueing networks. ArXiv
Technical Report: arXiv:1003.3396v1, March 2010. 18, 19, 56, 102

[137] R. Urgaonkar and M. J. Neely. Opportunistic scheduling with reliability guarantees in cogni-
tive radio networks. IEEE Transactions on Mobile Computing, vol. 8, no. 6, pp. 766-777, June
2009. DOI: 10.1109/TMC.2009.38 28, 145, 147

[138] M. J. Neely. Queue stability and probability 1 convergence via lyapunov optimization. Arxiv
Technical Report, arXiv:1008.3519, August 2010. 50, 51

[139] O. Kallenberg. Foundations of Modern Probability, 2nd ed., Probability and its Applications.
Springer-Verlag, 2002. 50

[140] D. Williams. Probability with Martingales. Cambridge Mathematical Textbooks, Cambridge
University Press, 1991. 50

[141] Y. V. Borovskikh and V. S. Korolyuk. Martingale Approximation. VSP BV, The Netherlands,
1997. 50

[142] M. J. Neely and R. Urgaonkar. Opportunism, backpressure, and stochastic optimization with
the wireless broadcast advantage. Asilomar Conference on Signals, Systems, and Computers,
Pacific Grove, CA, Oct. 2008. DOI: 10.1109/ACSSC.2008.5074815 70, 71, 179

[143] M. J. Neely and A. Sharma. Dynamic data compression with distortion constraints for wireless
transmission over a fading channel. arXiv:0807.3768v1, July 24, 2008. 70, 71, 84, 89, 179

[144] L. Huang and M. J. Neely. Max-weight achieves the exact [O(1/V ), O(V )] utility-delay
tradeoff under Markov dynamics. Arxiv Technical Report, arXiv:1008.0200, August 2010. 74,
77
[145] P. Billingsley. Probability and Measure, 2nd edition. New York: John Wiley & Sons,
1986. 76, 92

[146] M. J. Neely. Distributed and secure computation of convex programs over a network of
connected processors. DCDIS Conf., Guelph, Ontario, July 2005. 81

[147] L. Tassiulas and A. Ephremides. Throughput properties of a queueing network with dis-
tributed dynamic routing and flow control. Advances in Applied Probability, vol. 28, pp.
285-307, 1996. DOI: 10.2307/1427922 86

[148] Y. Wu, P. A. Chou, and S.-Y. Kung. Information exchange in wireless networks with network
coding and physical-layer broadcast. Conference on Information Sciences and Systems, Johns
Hopkins University, March 2005. 87

[149] E. Leonardi, M. Mellia, M. A. Marsan, and F. Neri. Optimal scheduling and routing for
maximizing network throughput. IEEE/ACM Transactions on Networking, vol. 15, no. 6,
Dec. 2007. DOI: 10.1109/TNET.2007.896486 104, 107

[150] Y. Li, A. Papachristodoulou, and M. Chiang. Stability of congestion control schemes with
delay sensitive traffic. Proc. IEEE ACC, Seattle, WA, June 2008.
DOI: 10.1109/ACC.2008.4586779 104, 108, 109

[151] J. K. MacKie-Mason and H. R. Varian. Pricing congestible network resources. IEEE Journal
on Selected Areas in Communications, vol. 13, no. 7, September 1995. DOI: 10.1109/49.414634
109

[152] M. J. Neely and E. Modiano. Convexity in queues with general inputs. IEEE Transactions on
Information Theory, vol. 51, no. 2, pp. 706-714, Feb. 2005. DOI: 10.1109/TIT.2004.840859
109

[153] M. J. Neely. Optimal pricing in a free market wireless network. Wireless Networks, vol. 15, no.
7, pp. 901-915, October 2009. DOI: 10.1007/s11276-007-0083-0 112, 179

[154] M. J. Neely and R. Urgaonkar. Optimal backpressure routing in wireless networks with
multi-receiver diversity. Ad Hoc Networks (Elsevier), vol. 7, no. 5, pp. 862-881, July 2009.
DOI: 10.1016/j.adhoc.2008.07.009 113, 132, 145, 147, 179

[155] L. Ying, S. Shakkottai, and A. Reddy. On combining shortest-path and back-
pressure routing over multihop wireless networks. Proc. IEEE INFOCOM, 2009.
DOI: 10.1109/INFCOM.2009.5062086 113, 179

[156] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Non-convex optimization and rate control
for multi-class services in the internet. IEEE/ACM Transactions on Networking, vol. 13, no. 4,
pp. 827-840, Aug. 2005. DOI: 10.1109/TNET.2005.852876 116
[157] M. Chiang. Nonconvex optimization of communication systems. Advances in Mechanics and
Mathematics, Special volume on Strang’s 70th Birthday, Springer, vol. 3, 2008. 116
[158] W.-H. Wang, M. Palaniswami, and S. H. Low. Application-oriented flow control: Funda-
mentals, algorithms, and fairness. IEEE/ACM Transactions on Networking, vol. 14, no. 6, Dec.
2006. DOI: 10.1109/TNET.2006.886318 116
[159] M. J. Neely, A. S. Tehrani, and A. G. Dimakis. Efficient algorithms for renewable energy
allocation to delay tolerant consumers. 1st IEEE International Conference on Smart Grid
Communications, 2010. 120, 122, 179
[160] L. Tassiulas and S. Sarkar. Maxmin fair scheduling in wireless ad hoc networks. IEEE
Journal on Selected Areas in Communications, Special Issue on Ad Hoc Networks, vol. 23, no. 1,
pp. 163-173, Jan. 2005. 128
[161] H. Shirani-Mehr, G. Caire, and M. J. Neely. MIMO downlink scheduling with non-perfect
channel state knowledge. IEEE Transactions on Communications, vol. 58, no. 7, pp. 2055-2066,
July 2010. DOI: 10.1109/TCOMM.2010.07.090377 129, 132
[162] M. Kobayashi, G. Caire, and D. Gesbert. Impact of multiple transmit antennas in a queued
SDMA/TDMA downlink. In Proc. of 6th IEEE Workshop on Signal Processing Advances in
Wireless Communications (SPAWC), June 2005. DOI: 10.1109/SPAWC.2005.1506198 132,
179
[163] C. Li and M. J. Neely. Energy-optimal scheduling with dynamic channel acquisition in
wireless downlinks. IEEE Transactions on Mobile Computing, vol. 9, no. 4, pp. 527-539, April
2010. DOI: 10.1109/TMC.2009.140 132
[164] A. Gopalan, C. Caramanis, and S. Shakkottai. On wireless scheduling with partial channel-
state information. Allerton Conf. on Comm., Control, and Computing, Sept. 2007. 132
[165] M. J. Neely. Dynamic data compression for wireless transmission over a fading channel. Proc.
Conference on Information Sciences and Systems (CISS), invited paper, Princeton, March 2008.
DOI: 10.1109/CISS.2008.4558703 132, 179
[166] M. J. Neely. Max weight learning algorithms with application to scheduling in unknown
environments. arXiv:0902.0630v1, Feb. 2009. 132, 162
[167] D. Shah and M. Kopikare. Delay bounds for approximate maximum weight match-
ing algorithms for input queued switches. Proc. IEEE INFOCOM, June 2002.
DOI: 10.1109/INFCOM.2002.1019350 140
[168] M. J. Neely, E. Modiano, and C. E. Rohrs. Tradeoffs in delay guarantees and computation
complexity for n × n packet switches. Proc. of Conf. on Information Sciences and Systems (CISS),
Princeton, March 2002. 140, 141
[169] L. Tassiulas. Linear complexity algorithms for maximum throughput in radio networks and
input queued switches. Proc. IEEE INFOCOM, 1998. DOI: 10.1109/INFCOM.1998.665071
140, 141

[170] E. Modiano, D. Shah, and G. Zussman. Maximizing throughput in wireless net-
works via gossiping. Proc. ACM SIGMETRICS / IFIP Performance’06, June 2006.
DOI: 10.1145/1140103.1140283 141

[171] D. Shah, D. N. C. Tse, and J. N. Tsitsiklis. Hardness of low delay network scheduling. Under
submission. 141

[172] L. Jiang and J. Walrand. A distributed CSMA algorithm for throughput and utility maximization
in wireless networks. Proc. Allerton Conf. on Communication, Control, and Computing, Sept.
2008. DOI: 10.1109/ALLERTON.2008.4797741 141, 142, 144

[173] S. Rajagopalan and D. Shah. Reversible networks, distributed optimization, and network
scheduling: What do they have in common? Proc. Conf. on Information Sciences and Systems
(CISS), 2008. 141, 144

[174] T. M. Cover and J. A. Thomas. Elements of Information Theory. New York: John Wiley &
Sons, Inc., 1991. DOI: 10.1002/0471200611 143

[175] L. Jiang and J. Walrand. Scheduling and congestion control for wireless and processing
networks. Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1-156, 2010.
DOI: 10.2200/S00270ED1V01Y201008CNT006 144, 179

[176] L. Jiang and J. Walrand. Convergence and stability of a distributed CSMA algorithm for maximal
network throughput. Proc. IEEE Conference on Decision and Control (CDC), Shanghai, China,
December 2009. DOI: 10.1109/CDC.2009.5400349 144

[177] J. Ni, B. Tan, and R. Srikant. Q-CSMA: Queue-length-based CSMA/CA algorithms for achiev-
ing maximum throughput and low delay in wireless networks. ArXiv Technical Report:
arXiv:0901.2333v4, Dec. 2009. 144

[178] G. Louth, M. Mitzenmacher, and F. Kelly. Computational complexity of loss networks. The-
oretical Computer Science, vol. 125, pp. 45-59, 1994. DOI: 10.1016/0304-3975(94)90216-X
144

[179] J. Ni and S. Tatikonda. A factor graph modelling of product-form loss and queueing net-
works. 43rd Allerton Conference on Communication, Control, and Computing (Monticello, IL),
September 2005. 144

[180] M. Luby and E. Vigoda. Fast convergence of the Glauber dynamics for sampling independent
sets: Part I. International Computer Science Institute, Berkeley, CA, Technical Report TR-99-002,
Jan. 1999.
DOI: 10.1002/(SICI)1098-2418(199910/12)15:3/4%3C229::AID-RSA3%3E3.0.CO;2-X
144

[181] D. Randall and P. Tetali. Analyzing Glauber dynamics by comparison of Markov chains.
Lecture Notes in Computer Science, Proc. of the 3rd Latin American Symposium on Theoretical
Informatics, vol. 1380, pp. 292-304, 1998. DOI: 10.1063/1.533199 144

[182] L. Bui, A. Eryilmaz, R. Srikant, and X. Wu. Joint asynchronous congestion control and
distributed scheduling for multi-hop wireless networks. Proc. IEEE INFOCOM, 2006.
DOI: 10.1109/INFOCOM.2006.210 145

[183] D. Shah. Maximal matching scheduling is good enough. Proc. IEEE Globecom, Dec. 2003.
DOI: 10.1109/GLOCOM.2003.1258788 147

[184] P. Chaporkar, K. Kar, X. Luo, and S. Sarkar. Throughput and fairness guarantees through
maximal scheduling in wireless networks. IEEE Trans. on Information Theory, vol. 54, no. 2,
pp. 572-594, Feb. 2008. DOI: 10.1109/TIT.2007.913537 147

[185] X. Lin and N. B. Shroff. The impact of imperfect scheduling on cross-layer rate control in
wireless networks. Proc. IEEE INFOCOM, 2005. DOI: 10.1109/INFCOM.2005.1498460
147

[186] L. Lin, X. Lin, and N. B. Shroff. Low-complexity and distributed energy minimization in
multi-hop wireless networks. Proc. IEEE INFOCOM, 2007.
DOI: 10.1109/TNET.2009.2032419 147

[187] C. C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic
programming for queuing networks. Submitted for publication, 2008. 174

[188] T. Ho, M. Médard, J. Shi, M. Effros, and D. R. Karger. On randomized network coding.
Proc. of 41st Annual Allerton Conf. on Communication, Control, and Computing, Oct. 2003. 179

[189] A. Eryilmaz and D. S. Lun. Control for inter-session network coding. Proc. Information
Theory and Applications Workshop (ITA), Jan./Feb. 2007. 179

[190] X. Yan, M. J. Neely, and Z. Zhang. Multicasting in time varying wireless networks: Cross-
layer dynamic resource allocation. Proc. IEEE International Symposium on Information Theory
(ISIT), June 2007. DOI: 10.1109/ISIT.2007.4557630 179

[191] A. Sharma, L. Golubchik, R. Govindan, and M. J. Neely. Dynamic data compression in
multi-hop wireless networks. Proc. SIGMETRICS, 2009. DOI: 10.1145/1555349.1555367
179
[192] C. Swannack, E. Uysal-Biyikoglu, and G. Wornell. Low complexity multiuser scheduling
for maximizing throughput in the MIMO broadcast channel. Proc. of 42nd Allerton Conf. on
Communication, Control, and Computing, September 2004. 179

[193] H. Shirani-Mehr, G. Caire, and M. J. Neely. MIMO downlink scheduling with non-perfect
channel state knowledge. IEEE Transactions on Communications, to appear.
DOI: 10.1109/TCOMM.2010.07.090377 179

[194] E. M. Yeh and R. A. Berry. Throughput optimal control of cooperative relay networks. IEEE
Transactions on Information Theory: Special Issue on Models, Theory, and Codes for Relaying
and Cooperation in Communication Networks, vol. 53, no. 10, pp. 3827-3833, October 2007.
DOI: 10.1109/TIT.2007.904978 179

[195] L. Huang and M. J. Neely. The optimality of two prices: Maximizing revenue in a stochastic
communication system. IEEE/ACM Transactions on Networking, vol. 18, no. 2, pp. 406-419,
April 2010. DOI: 10.1109/TNET.2009.2028423 179

[196] L. Jiang and J. Walrand. Stable and utility-maximizing scheduling for stochastic
processing networks. Allerton Conference on Communication, Control, and Computing, 2009.
DOI: 10.1109/ALLERTON.2009.5394870 179

[197] M. J. Neely and L. Huang. Dynamic product assembly and inventory control for maximum
profit. Proc. IEEE Conf. on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 179

[198] M. J. Neely and L. Huang. Dynamic product assembly and inventory control for maximum
profit. ArXiv Technical Report, arXiv:1004.0479v1, April 2010. 179

[199] A. Warrier, S. Ha, P. Wason, I. Rhee, and J. H. Kim. DiffQ: Differential backlog congestion
control for wireless multi-hop networks. Conference on Sensor, Mesh and Ad Hoc
Communications and Networks (SECON), San Francisco, US, 2008. DOI: 10.1109/SAHCN.2008.78
179

[200] A. Warrier, S. Janakiraman, S. Ha, and I. Rhee. DiffQ: Practical differential backlog
congestion control for wireless networks. Proc. IEEE INFOCOM, Rio de Janeiro, Brazil, 2009.
DOI: 10.1109/INFCOM.2009.5061929 179

[201] A. Sridharan, S. Moeller, and B. Krishnamachari. Making distributed rate control using
Lyapunov drifts a reality in wireless sensor networks. 6th Intl. Symposium on Modeling
and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), April 2008.
DOI: 10.4108/ICST.WIOPT2008.3205 179

[202] U. Akyol, M. Andrews, P. Gupta, J. Hobby, I. Saniee, and A. Stolyar. Joint scheduling
and congestion control in mobile ad-hoc networks. Proc. IEEE INFOCOM, 2008.
DOI: 10.1109/INFOCOM.2008.111 179
[203] B. Radunović, C. Gkantsidis, D. Gunawardena, and P. Key. Horizon: Balancing
TCP over multiple paths in wireless mesh network. Proc. ACM Mobicom, 2008.
DOI: 10.1145/1409944.1409973 179

Author’s Biography

MICHAEL J. NEELY
Michael J. Neely received B.S. degrees in both Electrical Engineering and Mathematics from the
University of Maryland, College Park, in 1997. He then received a 3-year Department of Defense
NDSEG Fellowship for graduate study at the Massachusetts Institute of Technology, where he
completed the M.S. degree in 1999 and the Ph.D. in 2003, both in Electrical Engineering. He
joined the faculty of Electrical Engineering at the University of Southern California in 2004, where
he is currently an Associate Professor. His research interests are in the areas of stochastic network
optimization and queueing theory, with applications to wireless networks, mobile ad-hoc networks,
and switching systems. Michael received the NSF CAREER Award in 2008 and the Viterbi School of
Engineering Junior Research Award in 2009. He is a member of Tau Beta Pi and Phi Beta Kappa.
