
Minimizing End-to-End Delay in High-Speed Networks with a Simple Coordinated Schedule


Matthew Andrews
Lisa Zhang
Bell Laboratories
600-700 Mountain Avenue
Murray Hill, NJ 07974
{andrews,ylz}@research.bell-labs.com
Abstract. We study the problem of providing end-to-end delay guarantees in connection-oriented networks. In this environment multiple-hop sessions coexist and interfere with one another. Parekh and Gallager showed that the Weighted Fair Queueing (WFQ) scheduling discipline provides a worst-case delay guarantee comparable to $\frac{1}{\rho_i} \times K_i$ for a session with rate $\rho_i$ and $K_i$ hops. Such delays can occur since a session-i packet can wait for time $\frac{1}{\rho_i}$ at every hop.
We describe a work-conserving scheme that guarantees an additive delay bound of approximately $\frac{1}{\rho_i} + K_i$. This bound is smaller than the multiplicative bound $\frac{1}{\rho_i} \times K_i$ of WFQ, especially when the hop count $K_i$ is large. We call our scheme COORDINATED-EARLIEST-DEADLINE-FIRST (CEDF) since it uses an earliest-deadline-first approach in which simple coordination is applied to the deadlines for consecutive hops of a session. The key to the bound is that once a packet has passed through its first server, it can pass through all its subsequent servers quickly.
We conduct simulations to compare the delays actually produced by the two scheduling disciplines. In many cases, these actual delays are comparable to their analytical worst-case bounds, implying that CEDF outperforms WFQ.

I. INTRODUCTION
The provision of end-to-end delay guarantees in high-speed
networks remains one of the most important and widely studied Quality-of-Service (QoS) issues. Many real time audio and
video applications rely on the ability of the network to provide small delays. One key mechanism for achieving this aim
is scheduling at the outputs of the switches. In this paper, we
attempt to minimize end-to-end delay using a novel scheduling
scheme.
Before we introduce our scheme we first recall the delay
bounds for the much studied Weighted Fair Queueing (WFQ)
scheduling discipline, also known as Packet-by-Packet Generalized Processor-Sharing (PGPS). In their seminal papers [1], [2],
Parekh and Gallager showed that WFQ achieves the following
session-i delay bound for Rate Proportional Processor Sharing
(RPPS).1

(1)

For session i, $L_i$ is the maximum packet size, $K_i$ is the number of servers and $r_m$ is the service rate of the mth server. The maximum packet size over all sessions is $L_{\max}$. Session i is leaky-bucket constrained with burst size $\sigma_i$ and rate $\rho_i$. Throughout this paper, we assume that all service is non-cut-through and non-preemptive.

1 We briefly review the definitions of WFQ and RPPS in Section II.

To understand the delay guarantee of (1) better, we compare the delay bound when session i has a single hop ($K_i = 1$) with the bound when session i has multiple hops ($K_i > 1$). We observe the following. When the burst size $\sigma_i$ is large, the multiple-hop delay bound is much less than $K_i$ times the single-hop delay bound. However, when $\sigma_i$ is small, the multiple-hop delay can be approximately $K_i$ times the single-hop delay. To see this, let us assume a uniform packet size for all sessions ($L_i = 1$) and a uniform service rate for all servers ($r_m = 1$). The delay bound of (1) now becomes,
$$\frac{\sigma_i + K_i - 1}{\rho_i} + K_i.$$
Hence, for a small burst size, e.g. $\sigma_i = 1$, the multiple-hop delay is essentially $\frac{1}{\rho_i} \times K_i$, and the single-hop delay is essentially $\frac{1}{\rho_i}$. Moreover, it is possible to construct an example in which this bound is achieved since a packet can wait for time $\frac{1}{\rho_i}$ at every hop. This illustrates our earlier observation.
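The single-hop versus multiple-hop comparison above can be checked numerically. The following is a minimal sketch that evaluates only the simplified bound $(\sigma_i + K_i - 1)/\rho_i + K_i$ (uniform packet size and service rate); the function names and the sample values are illustrative assumptions, not part of the paper.

```python
def wfq_bound_uniform(sigma, rho, hops):
    """Simplified WFQ bound (sigma + K - 1) / rho + K, assuming L_i = 1 and r_m = 1."""
    return (sigma + hops - 1) / rho + hops


def hop_ratio(sigma, rho, hops):
    """Ratio of the K-hop bound to the single-hop bound for the same session."""
    return wfq_bound_uniform(sigma, rho, hops) / wfq_bound_uniform(sigma, rho, 1)


if __name__ == "__main__":
    # Small burst (sigma = 1) versus large burst (sigma = 100), rate 0.1, 10 hops.
    for sigma in (1, 100):
        print(sigma, round(hop_ratio(sigma, rho=0.1, hops=10), 3))
```

With $\sigma_i = 1$ the ratio equals $K_i$ exactly, since $(K_i/\rho_i + K_i)/(1/\rho_i + 1) = K_i$; with a large burst the ratio stays close to 1.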
In this paper, we demonstrate with both analysis and simulation that even for small burst sizes, a bound of $\frac{1}{\rho_i} \times K_i$ is not necessary, i.e. the K-hop delay does not have to be K times the 1-hop delay. Indeed, in the case of uniform packet sizes, uniform service rates and small burst sizes, [3] showed that each session i can achieve a delay bound,2
$$O\left(\frac{1}{\rho_i} + K_i\right)$$
using a centralized scheme. The same paper also proposed a simple distributed protocol with a slightly weaker bound,
$$O\left(\frac{1}{\rho_i} + K_i \log\frac{n}{\rho_{\min}}\right).$$
Here, n is the number of servers in the network and $\rho_{\min}$ is the minimum session rate.
Our Results In Section III we generalize the above simple protocol to accommodate arbitrary packet sizes and arbitrary server rates. We derive the following exact delay bound which allows us to provide a direct comparison with (1).

(2)

The parameter $\epsilon$ is the server utilization factor defined later. The logarithmic term, although small, is somewhat involved. We give the full definition later. In Section IV we provide simulation results to compare the actual performance of our protocol and WFQ.

2 The bound $O\left(\frac{1}{\rho_i} + K_i\right)$ is best possible up to a constant factor. To see this, under non-cut-through service all sessions must suffer delay $K_i$. Moreover, examples can be constructed in which some sessions must suffer delay $\frac{1}{\rho_i}$.

0-7803-5420-6/99/$10.00 (c) 1999 IEEE

Fig. 1. A plot of the multiplicative delay bound $\frac{1}{\rho_i} \times K_i$. Each curve represents a different value of $\rho_i$. The delays are plotted against $K_i$.

Fig. 2. A plot of the additive delay bound $\frac{1}{\rho_i} + K_i$.
The basic ideas of our protocol are an earliest-deadline-first
approach coupled with randomization and coordination. We assign a deadline for every server through which a packet passes.
By introducing some randomness, the deadlines can be sufficiently spread out so that all the packets can meet all their
deadlines. By introducing simple coordination among the deadlines, we can ensure that once a packet has passed through
its first server, it can pass through all its subsequent servers
quickly. We refer to our protocol as COORDINATED-EARLIEST-DEADLINE-FIRST (CEDF).

The traffic lights in Manhattan provide an intuitive analogue to CEDF. Since the lights are coordinated, when one traffic light turns green, many lights further down the street turn green also. This means that once a car waits through one red light it can then drive through many green lights quickly. In this way, delay does not have to accumulate at every light.
From now on, we refer to a delay bound of the form $\frac{1}{\rho_i} \times K_i$ as a multiplicative bound and a bound of the form $\frac{1}{\rho_i} + K_i$ as an additive bound. In Figures 1 and 2, we plot these bounds for different values of $K_i$ and $\rho_i$. The curves for the multiplicative bound have different slopes for different $\rho_i$, whereas the curves for the additive bound all have the same slope. We can see that in general it is desirable to have an additive bound. We note that the bound (2) of CEDF is close to an additive bound. (It does not contain a term $K_i/\rho_i$.) Apart from the bound in reference [3] we know of no previous end-to-end delay bound that is close to an additive bound.
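The remark about slopes can be made concrete with a few lines of arithmetic. The sketch below treats the two idealized forms $\frac{1}{\rho_i} \times K_i$ and $\frac{1}{\rho_i} + K_i$ as functions of the hop count; the function names and the hop range 5 to 40 are assumptions chosen for illustration only.

```python
def multiplicative_bound(rho, hops):
    """Idealized multiplicative form (1 / rho) * K."""
    return hops / rho


def additive_bound(rho, hops):
    """Idealized additive form 1 / rho + K."""
    return 1.0 / rho + hops


def slope(bound, rho, k_lo=5, k_hi=40):
    """Growth of a bound per extra hop between k_lo and k_hi hops."""
    return (bound(rho, k_hi) - bound(rho, k_lo)) / (k_hi - k_lo)


if __name__ == "__main__":
    for rho in (0.05, 0.2, 0.5):
        print(rho, slope(multiplicative_bound, rho), slope(additive_bound, rho))
```

Under these forms the additive slope is 1 for every rate, while the multiplicative slope is $1/\rho_i$, so slower sessions pay more per hop.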
In our simulations, we observe that the actual delays under WFQ and CEDF are often comparable to their analytical bounds. In many scenarios, the former exhibits the behavior of a multiplicative bound, and the latter exhibits the behavior of an additive bound. For these scenarios, CEDF produces significantly lower delays. In other scenarios where there is less contention between sessions, both protocols exhibit the behavior of an additive bound.
CEDF has other desirable properties. First, we do not need
traffic reshaping between hops. Second, we only need to do
per-session processing at the points where the sessions enter the
network. That is, we do no per-session processing within the
network.

Previous Work The Earliest-Deadline-First (EDF) scheduling discipline when applied to a single server has received much attention. For example, Ferrari and Verma [4] and Verma, Zhang and Ferrari [5] showed that it can provide delay bounds and delay-jitter bounds. Georgiadis, Guérin and Parekh [6] and Liebeherr, Wrege and Ferrari [7] proved that EDF is delay-optimal in the sense that if a set of delay bounds is achievable then it can be achieved by EDF. Necessary and sufficient conditions for a set of delay bounds to be achievable were given. Liebeherr et al. also presented schemes with low implementation complexity that approximate EDF [7], [8]. For networks, Georgiadis, Guérin, Peris and Sivarajan [9] showed that EDF can be sub-optimal. Nevertheless they proved that if the traffic is correctly reshaped after each node then EDF can outperform Weighted Fair Queueing. However, the best explicit bound on end-to-end delay given in [9] is the same as Equation (1). General techniques for calculating end-to-end delay bounds were obtained by Goyal, Lam and Vin [10] and Goyal and Vin [11].
A number of papers have simulated end-to-end delay performance. Simulation results for EDF are presented in [4], [5]. Clark, Shenker and Zhang [12] used simulation to compare WFQ with variants of FIFO. Yates, Kurose, Towsley and Hluchyj [13] examined end-to-end delay distributions for WFQ, FIFO and Golestani's Stop-and-Go Fair Queueing [14], [15]. They found that the analytic delay bounds can be too pessimistic. Grossglauser and Keshav [16] showed that FIFO can outperform the Weighted Round Robin (WRR) and Round Robin (RR) disciplines for CBR traffic.


Our protocol CEDF is motivated by techniques of Leighton, Maggs and Rao [17] and Leighton, Maggs and Richa [18] for static packet scheduling. In this static setting, all packets are present in the network initially. Similar techniques were used by Rabani and Tardos [19] and Ostrovsky and Rabani [20]. For an overview of different scheduling disciplines, see [21], [22].
The rest of the paper is divided into sections as follows. We define our model and briefly review WFQ and RPPS in Section II. Our protocol CEDF is described and analyzed in Section III. The simulation results are presented in Section IV. We give our conclusions in Section V. The Appendix provides the details of the proofs.
II. MODEL AND DEFINITIONS
We consider a packet-based connection-oriented network. We equate each link in the network with the server that schedules the sessions on the link. Each session is specified by a fixed path through the network. Let $K_i$ be the number of servers along the path of session i, and let $m_1^{(i)}, m_2^{(i)}, \ldots, m_{K_i}^{(i)}$ be these servers. When it causes no confusion, we drop the superscript $(i)$. We define $L_i$ to be the maximum size of a session-i packet in bits. Let $L_{\max} = \max_i L_i$ and $L_{\min} = \min_i L_i$.
We use the $(\sigma, \rho)$ traffic model introduced by Cruz [23], [24] in which the traffic entering the network is leaky-bucket constrained. The session-i traffic is characterized by a burst size $\sigma_i$ and a session rate $\rho_i$. If $A_i(t_1, t_2)$ denotes the amount of session-i traffic entering the network during the time interval $(t_1, t_2]$, then,
$$A_i(t_1, t_2) \le \sigma_i + \rho_i (t_2 - t_1).$$
Let $r_m$ be the service rate of server m, i.e. m can service at most $r_m (t_2 - t_1)$ bits during the interval $(t_1, t_2]$. Let $I(m)$ be the set of sessions served by server m. We require the following stability condition,
$$\sum_{i \in I(m)} \rho_i \le (1 - \epsilon)\, r_m.$$
The parameter $\epsilon$ is a server utilization factor. It is crucial in allowing us to use coordination to achieve low delay bounds.
We adopt the non-cut-through and non-preemptive convention for scheduling. First, no packet is eligible for service until its last bit has arrived. Second, once a server begins serving a packet, it must continue until the whole packet has been serviced.
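The two constraints of the model can be checked mechanically. The sketch below assumes the standard $(\sigma, \rho)$ inequality $A_i(t_1, t_2) \le \sigma_i + \rho_i (t_2 - t_1)$ and a stability condition of the form $\sum_{i \in I(m)} \rho_i \le (1 - \epsilon) r_m$; the function names and the representation of a trace as a list of (time, bits) pairs are illustrative assumptions.

```python
def is_leaky_bucket_conformant(arrivals, sigma, rho):
    """Check a trace of (time, bits) pairs, sorted by time, against (sigma, rho).

    For discrete arrivals the tightest intervals start just before one arrival
    and end at a later one, so it suffices to test every pair j <= k.
    """
    times = [t for t, _ in arrivals]
    sizes = [b for _, b in arrivals]
    for j in range(len(arrivals)):
        total = 0.0
        for k in range(j, len(arrivals)):
            total += sizes[k]
            # Small tolerance guards against floating-point rounding.
            if total > sigma + rho * (times[k] - times[j]) + 1e-12:
                return False
    return True


def is_stable(session_rates, service_rate, epsilon):
    """Check that the rates sharing one server leave a fraction epsilon spare."""
    return sum(session_rates) <= (1.0 - epsilon) * service_rate
```

For example, one unit-size packet every 10 time units conforms to $\sigma = 1$, $\rho = 0.1$, whereas two unit-size packets 5 time units apart do not.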

Review of Weighted Fair Queueing Since we refer frequently to Weighted Fair Queueing, we now provide a brief definition. For details see [25], [1], [2]. WFQ attempts to emulate the Generalized Processor Sharing (GPS) scheme, in which all backlogged sessions receive service simultaneously. In particular, if session i is backlogged at server m then under GPS it receives service at rate,
$$\frac{\phi_i^m}{\sum_{j \in B_m} \phi_j^m}\, r_m,$$
where $B_m$ is the set of backlogged sessions at server m and the $\phi_i^m$ are a set of allocated weights.
WFQ is a non-preemptive scheme that emulates GPS on a packet-by-packet basis. In particular, if a server needs to select a packet for transmission at time t then it selects the first packet that would complete service under GPS if no additional packets were to arrive after time t.
In this paper we restrict our attention to a special case of WFQ known as Rate Proportional Processor Sharing (RPPS) in which $\phi_i^m = \rho_i$ for all sessions i and servers m. The end-to-end delay bound for RPPS derived in [2] is stated in Equation (1).
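The GPS service share described above is a one-line computation. The sketch below assumes the usual GPS definition, in which each backlogged session receives the server rate in proportion to its weight; the function name and the dictionary representation of the weights are illustrative assumptions. Under RPPS the weights would simply be the session rates $\rho_i$.

```python
def gps_rates(weights, backlogged, service_rate):
    """Instantaneous GPS service rate of each backlogged session at one server.

    weights: mapping from session id to its allocated weight.
    backlogged: iterable of session ids that currently have queued traffic.
    """
    backlogged = list(backlogged)
    total = sum(weights[j] for j in backlogged)
    return {i: weights[i] / total * service_rate for i in backlogged}


if __name__ == "__main__":
    # RPPS-style weights equal to the session rates; session "c" is idle.
    print(gps_rates({"a": 0.1, "b": 0.3, "c": 0.4}, ["a", "b"], service_rate=1.0))
```

Note that the idle session's share is redistributed among the backlogged ones, which is what makes GPS work conserving.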
III. ANALYTICAL BOUND

A. Overview
The basic idea of COORDINATED-EDF is very simple. For each packet p, we assign deadlines $D_1, D_2, \ldots, D_K$ for every server, $m_1, m_2, \ldots, m_K$, through which p passes. The deadlines at a server m are defined using a parameter $G_m$, where $G_m$ is essentially $\frac{L_{\max}}{r_m} \log(\cdot)$. (We define the logarithmic term in $G_m$ later.) In particular, $D_1$ is $rand + G_{m_1}$ time after p's injection, where rand is a random number chosen from an appropriate range. Each subsequent deadline $D_{k+1}$ is $D_k + G_{m_{k+1}}$. CEDF gives priority to the packet with the earliest deadline if more than one packet is waiting for a server. Ties are broken arbitrarily.
Note that randomness is only added to the first deadline of each packet. This randomness has the important effect of spreading out the deadlines. If rand is chosen from a large enough range, i.e. proportional to $L_i/\rho_i$ for session i, then deadlines from different sessions do not cluster together. In this way, packets do not compete for the same server simultaneously, and hence all packets are able to meet all their deadlines.
The $G_m$'s provide coordination among the deadlines. We point out that the values of the $G_m$'s are usually small, especially in high-speed networks where the server rates $r_m$ are large. This means that once a packet passes through its first server, it passes through all its subsequent servers quickly. As an analogy to our strategy, consider the traffic lights on an avenue in Manhattan. If a car is stopped at a red light then once that light turns green, many of the subsequent lights turn green also. In other words, the coordination of the lights means that once the car has passed through one light, it can quickly travel through many lights in succession.
We emphasize that the $G_m$'s are independent of the session rates. Under CEDF, session-i packets do not accumulate a delay of $\frac{1}{\rho_i}$ for each server that they pass through. Hence, CEDF does not have a multiplicative term of the form $\frac{1}{\rho_i} \times K_i$ in its delay bound. This provides a significant contrast with the delay bound of WFQ. We discuss in more detail the advantages of CEDF in Section III-B.
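The deadline chain in this overview can be sketched in a few lines. The code below is only an illustration of the rule "random offset on the first deadline, fixed per-server increment afterwards"; the function name, the uniform distribution on [0, rand_range] and the sample increments are assumptions, and the actual protocol draws the first deadline from periodic tokens as described in Section III-B.

```python
import random


def cedf_deadlines(injection_time, rand_range, increments, rng=random):
    """Deadlines D_1..D_K for one packet along a K-server path.

    increments[k] plays the role of G_m for the (k+1)-th server on the path.
    Randomness enters only through the first deadline.
    """
    deadline = injection_time + rng.uniform(0.0, rand_range) + increments[0]
    deadlines = [deadline]
    for g in increments[1:]:
        deadline += g  # each later deadline adds that server's increment
        deadlines.append(deadline)
    return deadlines


if __name__ == "__main__":
    print(cedf_deadlines(0.0, rand_range=10.0, increments=[0.5, 0.5, 0.5]))
```

The spacing between consecutive deadlines depends only on the servers, not on the session rate, which is the property the text emphasizes.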

B. Protocol and Analysis

Parameters We define parameters $T_i$ and $M$ for generating random numbers. Roughly speaking, $M$ serves as the period of the deadlines. Once the deadlines are defined in an interval of length $M$, all deadlines are defined. The parameter $T_i$ is the size of the intervals from which the random numbers for session i are chosen. When $T_i$ is about $\frac{L_i}{\rho_i}$, the amount of randomness is sufficient to spread out the deadlines. We choose to write $T_i$ in the following (slightly complicated) form, because it ensures that $M$ is an integral multiple of all the $T_i$'s. For reasons that will become clear later, we also define $S_i$ such that $S_i/T_i$ is slightly greater than the session rate $\rho_i$. Let,

We define $G_m$ for each server m, which determines how the deadline for a packet is incremented when it advances from one server to the next. Let,

nillrme

()

Grn=c@#og,

mm

where $L_{\max} = \max_i L_i$ and $L_{\min} = \min_i L_i$. The parameter $\alpha = O(\log(\cdot))$ depends on $p_{suc}$, where $p_{suc}$ is the success probability of the protocol. (We discuss this success probability in the Remarks section.) Note that $\alpha$ is independent of $L_i$, $\sigma_i$, $\rho_i$, $K_i$ and $r_m$.

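Tokens We use tokens to define deadlines. For session i, let $\tau_1, \tau_2, \ldots, \tau_{M/T_i}$ be numbers chosen uniformly at random from each of the intervals $[0, T_i), [T_i, 2T_i), \ldots, [M - T_i, M)$. Session-i tokens appear periodically with period $M$ at the following times.
$$\begin{array}{llcl}
\tau_1 & \tau_2 & \ldots & \tau_{M/T_i} \\
\tau_1 + M & \tau_2 + M & \ldots & \tau_{M/T_i} + M \\
\tau_1 + 2M & \tau_2 + 2M & \ldots & \tau_{M/T_i} + 2M \\
\tau_1 + 3M & \tau_2 + 3M & \ldots & \tau_{M/T_i} + 3M \\
\vdots & \vdots & & \vdots
\end{array}$$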
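The periodic token placement can be sketched as follows. This is a minimal illustration, assuming that $M$ is an integral multiple of $T_i$ as the text requires; the function names and the sample values are hypothetical.

```python
import random


def token_offsets(period, interval, rng):
    """One uniform draw from each of [0, T), [T, 2T), ..., [M - T, M).

    Assumes period (M) is an integral multiple of interval (T_i). Note that
    random.uniform may return the upper endpoint in rare rounding cases.
    """
    count = int(round(period / interval))
    return [rng.uniform(k * interval, (k + 1) * interval) for k in range(count)]


def token_times(offsets, period, num_periods):
    """Token appearance times: the same offsets repeated with period M."""
    return [tau + j * period for j in range(num_periods) for tau in offsets]


if __name__ == "__main__":
    offsets = token_offsets(period=12.0, interval=3.0, rng=random.Random(0))
    print(token_times(offsets, period=12.0, num_periods=2))
```

Only the offsets within one period are random; every later period reuses them, which is why a finite number of random choices determines all token times.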

Deadlines Let $m_1, m_2, \ldots, m_{K_i}$ be the servers on the path of session i. For each session-i packet, we define a sequence of deadlines $D_1, D_2, \ldots, D_{K_i}$ for traversing the servers.
When a packet of size $l$ bits obtains a token, it consumes $l$ bits from that token. At most $S_i$ bits can be consumed from each session-i token. Suppose a session-i packet p is injected at time $t_{inj}$ and has $l_p$ bits. Suppose also that the session-i packet injected immediately before p obtains its token at time $\tau_{prev}$. Packet p obtains the first session-i token after $\max\{t_{inj}, \tau_{prev}\}$ that has at least $l_p$ bits unconsumed. Let $\tau$ be the time that the token appears. The deadlines are defined as follows.
$$D_1 = \tau + G_{m_1}, \qquad D_j = D_{j-1} + G_{m_j}.$$
Now that all deadlines are defined, each server gives priority to the packet that has the earliest deadline.
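The token bookkeeping and the resulting deadlines can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: "after $\max\{t_{inj}, \tau_{prev}\}$" is read here as "at or after", so that several packets may share one token up to its capacity $S_i$, and the function names, the finite token horizon and the sample numbers are all hypothetical.

```python
def assign_token(token_times, remaining, size, earliest):
    """Pick the first token at or after `earliest` with at least `size` bits left.

    token_times: sorted appearance times of the session's tokens.
    remaining: unconsumed bits of each token (initially S_i), updated in place.
    Returns the index of the chosen token.
    """
    for idx, t in enumerate(token_times):
        if t >= earliest and remaining[idx] >= size:
            remaining[idx] -= size  # the packet consumes its bits from the token
            return idx
    raise ValueError("no token with enough unconsumed bits in the given horizon")


def chain_deadlines(token_time, increments):
    """D_1 = tau + G_{m_1} and D_j = D_{j-1} + G_{m_j} for the servers on the path."""
    deadlines = []
    current = token_time
    for g in increments:
        current += g
        deadlines.append(current)
    return deadlines


if __name__ == "__main__":
    times, left = [1.0, 4.0, 7.0], [10, 10, 10]
    first = assign_token(times, left, size=6, earliest=0.0)
    second = assign_token(times, left, size=6, earliest=max(0.5, times[first]))
    print(first, second, chain_deadlines(times[second], [0.5, 0.25]))
```

In this run the second packet cannot fit into the first token (only 4 bits remain), so it moves to the next one and its deadlines start from that token's time.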

Remarks
1. The only coordination required comes from the above iterative definition of the deadlines. This coordination can be achieved simply by stamping each packet with its current deadline.3 Each server can then update the deadlines of its pending packets autonomously, i.e. we do not require explicit communication among servers.
2. We do not place tokens at times $T_i, 2T_i, 3T_i$ etc., but rather we introduce some randomness. This randomness is essential for spacing out the deadlines so that not many deadlines contend for the same server simultaneously. Once the tokens are chosen, the deadlines are chosen deterministically.
3. We emphasize that our protocol is work conserving and requires no traffic shaping. As long as some packets are waiting for a server, the packet with the earliest deadline gets serviced. In particular, a packet can be serviced before it obtains a token. The concept of a packet obtaining/consuming a token is merely a method of counting for the purpose of assigning deadlines.
4. The only per-session processing is the determination of which token a packet obtains. This can be done at the point on the edge of the network where the session enters. Once the token has been obtained, the deadlines for the packet are independent of its session parameters. This means that we need no per-session state within the network.
5. We say that the protocol is successful if all the packets meet all their deadlines. The success of the protocol is equivalent to the successful placing of a finite number of tokens due to the periodicity of the token placement. Hence, we can use a Chernoff-bound argument to analyze the success probability.
6. To prove the desired end-to-end delay bound, we prove two statements in the Appendix. First, with high probability the protocol is successful. (See Lemmas 2 and 3.) Second, $\tau$ is at most $t_{inj} + \frac{\sigma_i}{\rho_i} + \frac{4L_i}{\epsilon \rho_i}$ for each session-i packet, where $t_{inj}$ is the injection time of that packet. (See Lemma 4.) Therefore,
Theorem 1: With high probability, the end-to-end delay guarantee for session i is
$$\frac{\sigma_i + 4L_i/\epsilon}{\rho_i} + \sum_{k=1}^{K_i} G_{m_k}.$$
We emphasize that when the protocol is successful, every packet meets all of its deadlines, i.e. the bound in Theorem 1 is a worst-case delay bound.
7. The factor $1/\epsilon$ in the term $4L_i/\epsilon$ is needed in the proof of Lemma 4. However, we conjecture that in many situations it will be possible to obtain a delay bound in which the term $4L_i/\epsilon$ is replaced by $4L_i$.
8. We now compare the bound of WFQ with the bound of CEDF when $r_m$ is large, e.g. in high-speed networks. Here, the terms containing $1/r_m$ are negligible. The bound for WFQ becomes,
$$\frac{\sigma_i + 2(K_i - 1)L_i}{\rho_i}$$
and the bound for CEDF becomes,
$$\frac{\sigma_i + 4L_i/\epsilon}{\rho_i}.$$
We note that the bound for CEDF does not contain $K_i$.

3 This can be done using techniques similar to the protocols of [26].
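The high-speed comparison in Remark 8 can be illustrated numerically. The sketch below assumes the WFQ form $(\sigma_i + 2(K_i - 1)L_i)/\rho_i$ given in that remark and, for CEDF, the first term of Theorem 1, $(\sigma_i + 4L_i/\epsilon)/\rho_i$, on the reading that all terms containing $1/r_m$ vanish; the function names and sample values are illustrative.

```python
def wfq_high_speed(sigma, packet_size, rho, hops):
    """WFQ bound with the 1 / r_m terms dropped: (sigma + 2 (K - 1) L) / rho."""
    return (sigma + 2 * (hops - 1) * packet_size) / rho


def cedf_high_speed(sigma, packet_size, rho, epsilon):
    """Assumed CEDF bound with the 1 / r_m terms dropped: (sigma + 4 L / eps) / rho."""
    return (sigma + 4 * packet_size / epsilon) / rho


if __name__ == "__main__":
    for hops in (5, 11, 40):
        print(hops,
              wfq_high_speed(1, 1, 0.1, hops),
              cedf_high_speed(1, 1, 0.1, epsilon=0.2))
```

Under these two forms the CEDF expression is the smaller one exactly when $K_i > 1 + 2/\epsilon$, i.e. for paths longer than 11 hops when $\epsilon = 0.2$; for short paths the constant $4/\epsilon$ dominates.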
IV. SIMULATION RESULTS
Our experiments simulate a simple situation with uniform packet sizes and uniform server rates. Since CEDF involves many parameters, we simulate a simplified version, SIMPLE-CEDF, which nevertheless contains the essence of CEDF. Under S-CEDF, the deadline for the first server is chosen randomly (without reference to periodic tokens). Every subsequent deadline is the deadline for the previous server incremented by one packet service time. (See Figure 11.) As we shall see, the performance of S-CEDF corresponds to the analytical bounds of Section III.
We compare the performance of WFQ and SIMPLE-CEDF (S-CEDF) using the mean end-to-end delay and the 98%-percentile end-to-end delay. We use the following simulation parameters. The link speed is set to 1Mb/sec and all packets have a size of 1000 bits. The packet service time on each link is therefore 1ms. The end-to-end delay consists of the packet service time and the queueing time, i.e. the time that the packet spends waiting in a buffer. Buffers have a large size and no packet is dropped in any experiment.

packet size | link speed | packet service time | buffer size
1000b | 1Mb/sec | 1ms | ∞

A. Single Long Session
We begin with a very simple configuration as illustrated in Figure 12. The network consists of a line of N links. A long session of N hops travels through the network sharing each hop with a short session of 1 hop. These short sessions provide the cross-traffic for the long session. The length N of the long session varies from 5 to 40. The link utilization is set to 0.8 (i.e. $\epsilon = 0.2$). The rate of the long session $\rho_\ell$ varies in the range from 0.03 to 0.7. The rate of each short session $\rho_s$ is set to $0.8 - \rho_\ell$. Experiments of a similar setup were conducted in other simulation studies, e.g. [16], [27], [5].

Fig. 3. Mean delay of the long session due to WFQ.
Fig. 4. Mean delay of the long session due to S-CEDF.
Fig. 5. 98%-percentile delay of the long session due to WFQ.
Fig. 6. 98%-percentile delay of the long session due to S-CEDF.

Fig. 12. Session 0 is the long session with 5 hops. Sessions 1 through 5 are the 1-hop sessions.

p: A session-i packet
$t_{inj}$: Injection time of p
$D_k$: Deadline of p at its kth hop

1 $D_1$ := randomly chosen from $[t_{inj}, t_{inj} + \frac{1}{\rho_i}]$
2 $D_k$ := $D_{k-1}$ + one packet service time
3 Each link gives priority to the packet with the earliest deadline.

Fig. 11. S-CEDF, the SIMPLE-CEDF protocol.
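The S-CEDF rule of Figure 11 is simple enough to sketch directly. The code below is a minimal illustration, not the simulator used for the experiments: it assumes time is measured in packet service times (so the first deadline is drawn from a window of length $1/\rho_i$), and the names `scedf_deadlines` and `EdfLink` are hypothetical.

```python
import heapq
import random


def scedf_deadlines(t_inj, rho, hops, service_time, rng):
    """S-CEDF deadlines: D_1 uniform in [t_inj, t_inj + 1/rho], then +1 service time per hop."""
    d1 = rng.uniform(t_inj, t_inj + 1.0 / rho)
    return [d1 + k * service_time for k in range(hops)]


class EdfLink:
    """Output queue of one link that always serves the earliest deadline first."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal deadlines are served in arrival order

    def enqueue(self, deadline, packet):
        heapq.heappush(self._heap, (deadline, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        _, _, packet = heapq.heappop(self._heap)
        return packet


if __name__ == "__main__":
    rng = random.Random(3)
    print(scedf_deadlines(0.0, rho=0.1, hops=5, service_time=1.0, rng=rng))
    link = EdfLink()
    link.enqueue(5.0, "long-session packet")
    link.enqueue(2.0, "short-session packet")
    print(link.dequeue())
```

A binary heap keeps both operations logarithmic in the queue length; the arrival-order tie-break is one arbitrary choice, since the protocol allows ties to be broken arbitrarily.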
We first use a deterministic injection model that conforms to the $(\sigma, \rho)$ traffic model with $\sigma = 1$ for each session. Figures 3, 4, 5 and 6 illustrate the end-to-end delay experienced by the long session. We note the striking resemblance between the curves for these actual delays and the curves for the analytical delay bounds. (Recall Figures 1 and 2.) These plots demonstrate that for small values of $\rho_\ell$, S-CEDF has a significant advantage over WFQ in terms of the end-to-end delay of the long session. The two disciplines present similar behavior for larger values of $\rho_\ell$.

Fig. 7. Probabilistic on-off source. Mean delay due to WFQ.
Fig. 8. Probabilistic on-off source. Mean delay due to S-CEDF.
Fig. 9. Probabilistic on-off source. 98%-percentile delay due to WFQ.
Fig. 10. Probabilistic on-off source. 98%-percentile delay due to S-CEDF.

We take a closer look at the behavior of the long session for small $\rho_\ell$. Under WFQ, packets from the long session are frequently delayed by packets from the 1-hop sessions, since $\rho_s$ is much larger than $\rho_\ell$. Furthermore, a packet from the long session suffers from a similar amount of queueing delay at each link. This behavior of WFQ supports the analytical bound of the multiplicative form $\frac{1}{\rho} \times K$.
Under S-CEDF, the long session behaves differently. When traversing the first few links, a packet from the long session is likely to queue in the buffers. This is because the initial deadline is chosen from the range $[t_{inj}, t_{inj} + \frac{1}{\rho_i}]$. When $\rho_\ell$ is smaller than $\rho_s$, the long session is likely to have later deadlines than the interfering 1-hop sessions at the beginning of its path, and hence its packets are delayed. However, as the packet from the long session moves further along its path, its deadline becomes earlier in comparison to the deadlines of the 1-hop sessions, and hence it suffers less delay. This behavior of S-CEDF supports the analytical bound of the additive form $\frac{1}{\rho} + K$.
Despite the fact that the long sessions with small $\rho_\ell$ have much smaller end-to-end delay under S-CEDF than under WFQ, the 1-hop sessions do not suffer a great deal under S-CEDF. The following table summarizes the mean delay of the 1-hop sessions.

$\rho_\ell$ | 0.03 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7
$\rho_s$ | 0.77 | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 | 0.1
WFQ | 1.0 | 1.0 | 1.0 | 1.0 | 1.4 | 1.5 | 1.8 | 2.2
S-CEDF | 1.06 | 1.16 | 1.26 | 1.3 | 1.42 | 1.53 | 1.79 | 2.15

Variations of the above experiments are conducted. We first vary the configuration of the network and the sessions. For example, instead of having one 1-hop session at each link, we use multiple 1-hop sessions at each link where the total rates of these 1-hop sessions add up to $0.8 - \rho_\ell$. As another example, we experiment with a ring of 40 nodes and 40 links. Multiple long sessions wrap around the ring interfering with one another in addition to the 1-hop sessions on each link. These experiments yield similar results to those shown in Figures 3-6. (We omit the plots here.)
We also vary the injection patterns at the source for the single long session configuration shown in Figure 12. Experiments with a larger burst size, e.g. $\sigma = 10$, yield plots similar to Figures 3-6. A probabilistic on-off source with exponentially distributed on and off times yields the plots in Figures 7-10.
We have results for similar experiments using the FIFO discipline. In this setting, the delays produced by FIFO are close to the delays of WFQ, i.e. the delays can be approximated by a multiplicative formula. (We omit the plots here.)

Fig. 13. Multiple long sessions. Mean delay due to WFQ.
Fig. 14. Multiple long sessions. Mean delay due to S-CEDF.
Fig. 15. Multiple long sessions. 98%-percentile delay due to WFQ.

B. Multiple Long Sessions


We now consider a more complicated configuration. We use
a ring of 40 nodes, where neighboring nodes are connected by
8 links. Sessions with hops 1,5, 10, 15,20,25,
30,35 and 40
coexist and intefiere with one another in this network. The paths
and rates of these sessions are chosen as follows. We first choose
a set of 40-hop paths. Each path begins with a random node and
then follows the ring. Each hop of the path between two neighboring nodes can follow any of the 8 links between these nodes.
The choice is made randomly subject to the constraint that the
number of paths going through each link is the same. We now
cut some of these 40-hop paths into shorter paths. Some 40-hop
paths are divided into a 5-hop path and a 35-hop path, others
are divided into a 10-hop path and 30-hop path, etc. After this
process, the network has paths with lengths 5, 10, 15,...,40. We
also have some 1-hop paths. All sessions have the same rate. By
varying the number of the original 40-hop paths, we achieve the
desired session rates. Figures 13-16 summarize the performance
of WFQ and S-CEDF. As we can see, the curves for WFQ have
the multiplicative characteristic, although it is less pronounced
than in Figures 3 and 5. The curves for S-CEDF have the additive characteristic. We also observe that long sessions perfomn
better under S-CEDF than under WFQ, whereas
short sessions
perform marginally better under WFQ.
We finally note that the analytical bound for WFQ is a worstcase bound, and therefore can be overly conservative. In our
experiments, we have encountered situations in which WFQ behaves in a similar manner to S-CEDF, i.e. the additive form of

404550

.?Neiwmk=

Fig. 16. Multiple long sessions. 98%-percentile delay due to S-CEDF.

1 + K is more apparent. In one such experiment, we consider a


fine of41 nodes and 801inks, where neighboringnodes
are connected by double links. All sessions have 40 hops, starting from
the node on the left end and finishing at the node on the right
end. Each hop along the path of a session can follow either the
upper or the lower link. The choice is made randomly, subject
to the constraint that each link has an equal number of sessions
passing through it. All sessions have the same injection rate.
We vary the number of sessions in order to achieve the desired
session rate. Figures 17 and 19 illustrate the end-to-end delays
due to WFQ, averaged over all the 40-hop sessions. These delays show little multiplicative behavior. This is because in this
network there is little contention among packets. S-CEDF produces similar end-to-end delays (Figures 18 and 20).
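The path construction for the double-link network can be sketched as follows. This is a minimal illustration, not the authors' simulator. It assumes an even number of sessions, so that the two parallel links of every hop can carry exactly the same number of sessions, and the names `assign_double_link_paths` and `link_loads` are hypothetical.

```python
import random


def assign_double_link_paths(num_sessions: int, num_hops: int = 40, seed: int = 0):
    """Return paths[s][h] in {0, 1}: 0 = upper link, 1 = lower link at hop h."""
    if num_sessions % 2 != 0:
        raise ValueError("num_sessions must be even so both links carry equal load")
    rng = random.Random(seed)
    paths = [[0] * num_hops for _ in range(num_sessions)]
    for hop in range(num_hops):
        # Half of the sessions use the upper link and half the lower link;
        # which sessions go where is drawn uniformly at random for this hop.
        choices = [0] * (num_sessions // 2) + [1] * (num_sessions // 2)
        rng.shuffle(choices)
        for session, link in enumerate(choices):
            paths[session][hop] = link
    return paths


def link_loads(paths):
    """Count the sessions on each (hop, link) pair."""
    num_hops = len(paths[0])
    loads = [[0, 0] for _ in range(num_hops)]
    for path in paths:
        for hop, link in enumerate(path):
            loads[hop][link] += 1
    return loads


if __name__ == "__main__":
    loads = link_loads(assign_double_link_paths(num_sessions=10))
    print(len(loads) * 2, "links, sessions per link:", loads[0])
```

With 40 hops and two links per hop this yields the 80 links of the experiment, and varying `num_sessions` changes the per-link load in the same way that varying the number of sessions changes the session rate.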
V. CONCLUSION
We have described a work-conserving COORDINATED-EARLIEST-DEADLINE-FIRST scheduling discipline with an additive end-to-end delay bound, [illegible; the expression ends in a sum over hops $k = 1, \dots, K_i$]. CEDF uses randomization and simple coordination to ensure that once a packet passes through its first server it can pass through all its subsequent servers quickly. Under CEDF, a session-$i$ packet does not accumulate a delay on the order of $1/\rho_i$ at every one of its $K_i$ hops, and therefore its delay bound is smaller than that of the Weighted Fair Queueing discipline. We have also presented simulation results to show that the performance of CEDF and WFQ can be comparable to the analytical bounds.

The major open problem is to reduce the delay bound still further. The ultimate goal is a simple protocol with a delay bound [illegible].

Fig. 17. Double-link network. Mean delay due to WFQ.

Fig. 18. Double-link network. Mean delay due to S-CEDF.

0-7803-5420-6/99/$10.00 (c) 1999 IEEE

ACKNOWLEDGMENTS

We thank Antonio Fernández, Mor Harchol-Balter and Tom Leighton for their help in earlier stages of this work. Antonio Fernández also provided many detailed comments on a preliminary draft of this paper. We thank Jörg Liebeherr for his insight on implementation issues.

Fig. 19. Double-link network. 98%-percentile delay due to WFQ.

Fig. 20. Double-link network. 98%-percentile delay due to S-CEDF.

D. Verma, H. Zhang, and D. Fermri. Guaranteeing delay jitter bounds in


packet switching networks. In Proceedings of Tricomm 91, Chapel Hill,
NC, April 1991.
[6] L, Georgiadis, R. Gu6rin, and A. Parekh. Optimal multiplexing on a single
link: delay and buffer requirements. IEEE Transactions on Information
Theory, 43(5):1518 -1535, September 1997.
[7] J. Llebeherr, D. Wrege, and D. Fermri. Exact admission control for networks with a bounded delay service. IEEE/ACM Transactions on Networking, 4(6):885 -901, December 1996.
[8] D. Wrege and J. Liebeherx A near-optimat packet scheduler for QoS networks. In Proceedings of IEEE INFOCOM 97, 1997.
[9] L. Georgiadis, R. Gu6rin, V. Peris, and K. Sivarajao. Efficient network
QoS provisioning based on per node tmffrc shaping. In Proceedings of
IEEE INFOCOM 96, pages 102 110, 1996.
[10] P. Goyal, S. Lam, and H. Vin. Determining end-to-end delay bounds in
heterogeneous networks. In Proceedings of the Fzfth International Work[5]

shop on Network and Operating System Support for Digital Audio and
Wdeo, pages 287 298, Jlurham, NH, April 1995.

[11] P. Goyal and H. Vln. Generahzed guaranteed rate scheduling algorithms:


A framework. Technical Report TR-95-30, University of Texas, Austin,
September 1995.
[12] D. Clark, S. Shenker, and L. Zhaog. Supporting real-time applications in
an integrated services packet network: Architecture and mechanism. In
Proceedings of ACM SIGCOMM 92, pages 14 26, August 1992.
[13] D. Yates, J. Kurose, D. Towsley, and M. Hluchyj. On per-session endto-end delay dktributions and the catl admission problem for real time
applications with QOS requirements. In Proceedings of ACM SIGCOMM
93, pages 2-12, 1993.
[14] S. J. Golestani. A framing strategy for congestion management. IEEE
Journal on Setected Areas in Communications,
9(7):1064
1077, September 1991.
[15] S. J. Golestani. Congestion-free communication in high-speed packet networks. IEEE Transactions on Communications, 39(12):1802 1812, December 1992.
[16] M. Grossglauser and S. Keshav. On CBR service. In Proceedings of IEEE
INFOCOM 96, pages 129-136, 1996.
[17] F. T. Leighton, B. M. Maggs, and S. B. Rae. Packet routing and job-shop

0-7803-5420-6/99/$10.00 (c) 1999 IEEE

scheduling in O(congestion + dilation) steps. Cornbirsatorica, 14(2):167

Let

186, 1993.

[18] F. T. Leighton, B. M. Maggs, and A. W. Richa. Fast algorithms for finding


O(congestion + dilation) packet routing schedules. Tecbrdcatreport CMUCS-96-152, Carnegie Mellon University, 1996.
[19] Y. Rabani and E. Tardos. Distributed packet switching in arbitrary networks. In Proceedings of the 28th Annual ACM Symposium on Theo~ of
Computing, Philadelphia, PA, May 1996.
[20] R. Ostrovsky and Y.Rabani. Local control packet switching afgorithm. In

[21]
[22]
[23]

[24]
[25]

[26]

[27]

APPENDIX

PROOF OF THEOREM

Consider a server $m$ and a time interval $I$. Let $P$ be the set of packets that have a deadline for server $m$ in interval $I$. If the total size of the packets in $P$ is $x$, then we say that $I$ services $x$ bits at server $m$.

Lemma 2: Consider any server $m$ and any time interval $I = [t - G_m, t]$, where $t$ is a potential deadline for some session at server $m$. With high probability, any such interval $I$ services fewer than $G_m r_m$ bits at server $m$.

Proof: Let $X_i$ be the number of session-$i$ bits that $I$ services at server $m$. The expectation of $X_i$, $E[X_i]$, is at most $(S_i/T_i)\,G_m$. This is because one session-$i$ token is placed at random in each of the intervals $[0, T_i)$, $[T_i, 2T_i)$, etc., and the deadlines for each session are a fixed amount of time after the tokens. In addition, each token is consumed by at most $S_i$ bits. Let $N$ be the set of sessions whose paths pass through $m$. By linearity of expectation,

$$E\Big[\sum_{i \in N} X_i\Big] \le (1 - \epsilon/2)\, r_m G_m.$$

A Chernoff-type argument shows that $\Pr\big[\sum_{i \in N} X_i \ge r_m G_m\big]$ is small. (We omit details here, since the calculation is standard.)

Since the token placement is periodic with period $M$, we only need to consider a fixed time period of length $M$. For each server $m$, only $M/T_i$ intervals $I = [t - G_m, t]$ can have $t$ as a deadline for a session-$i$ packet in that time period. There are $n$ servers in the network. Hence, the total number of such intervals $I$ is at most $n \sum_i M/T_i$. By a union bound argument, the probability that some server $m$ services at least $G_m r_m$ bits during some interval $I$ is less than $1 - P_{suc}$ for the chosen value of $G_m$. We can choose $P_{suc}$, the success probability of the protocol, to be close to 1. ∎

Lemma 3: If the assumption in Lemma 2 holds, then every packet meets all its deadlines.

Proof: For the purpose of contradiction, let $D$ be the first deadline that is missed. This implies that all deadlines earlier than $D$ are met. Let $p$ be the packet that misses deadline $D$ for server $m$. Suppose that packet $p$ has length $l_p$. Since packet $p$ meets its previous deadlines, it must be waiting at server $m$ at time $D - G_m$. Hence, server $m$ is servicing other packets from time $D - G_m$ to $D - l_p/r_m$. Let $p'$ be such a packet; then $p'$ must have a deadline $D' < D$ by the definition of EDF. Moreover, $D' \ge D - G_m$ since $D$ is the first deadline missed. Hence, the total size of packets that have deadlines in $[D - G_m, D]$ is at least $r_m G_m$. This contradicts the assumption of Lemma 2. ∎

Lemma 2 and Lemma 3 imply that each session-$i$ packet $p$ reaches its destination by time $\tau + \sum_{k=1}^{K_i} G_{m_k}$. To complete our analysis, we upper bound $\tau$ as follows.

Lemma 4: For each session-$i$ packet $p$ injected at $t_{inj}$, we have $\tau < t_{inj} +$ [illegible] $+$ [illegible].

Proof: Let $t_0$ be the last time before $t_{inj}$ that no session-$i$ packet is waiting to obtain a token. During $(t_0, \tau)$ every session-$i$ token must consume packets injected during $(t_0, t_{inj}]$ only, and each token must consume more than $S_i - L_i$ bits. Otherwise, either $(t_0, t_{inj})$ contains a time when no session-$i$ packet is waiting or $p$ would obtain a token before $\tau$. The total number of bits injected during $(t_0, t_{inj}]$ is at most

$$\sigma_i + (t_{inj} - t_0)\,\rho_i.$$

The total number of session-$i$ tokens during $(t_0, \tau]$ is at least [illegible]. Therefore, the total number of session-$i$ bits consumed during $[t_0, \tau]$ is at least [illegible] $\cdot\,(S_i - L_i)$. Hence, since the bits consumed cannot exceed the bits injected, the stated bound on $\tau$ follows.
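The expectation step in the proof of Lemma 2 rests on the fact that, with one token placed uniformly at random in every interval $[kT_i, (k+1)T_i)$, a window of length $G_m$ contains $G_m/T_i$ tokens on average. The sketch below checks this numerically. It assumes uniform, independent placement in each period; the function names `count_tokens` and `mean_tokens` and the parameter values are illustrative only.

```python
import math
import random


def count_tokens(period: float, start: float, window: float, rng: random.Random) -> int:
    """Count tokens falling in [start, start + window) for one random placement."""
    first = math.floor(start / period)
    last = math.floor((start + window) / period)
    count = 0
    for k in range(first, last + 1):
        # One token, uniformly at random inside the k-th period.
        token = (k + rng.random()) * period
        if start <= token < start + window:
            count += 1
    return count


def mean_tokens(period: float, start: float, window: float,
                trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of tokens in the window."""
    rng = random.Random(seed)
    total = sum(count_tokens(period, start, window, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    period, window = 10.0, 25.0
    print("estimate:", mean_tokens(period, 3.7, window), "expected:", window / period)
```

Multiplying the estimate by the maximum number of bits per token, $S_i$, gives the corresponding upper estimate on the expected number of session-$i$ bits that the window services.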
