Sizing Router Buffers
a dissertation
submitted to the department of computer science
and the committee on graduate studies
of stanford university
in partial fulfillment of the requirements
for the degree of
doctor of philosophy
Guido Appenzeller
March 2005
© Copyright by Guido Appenzeller 2004
All Rights Reserved
I certify that I have read this dissertation and that, in
my opinion, it is fully adequate in scope and quality as a
dissertation for the degree of Doctor of Philosophy.
Nick McKeown
(Principal Adviser)
Balaji Prabhakar
Dawson Engler
Abstract
All Internet routers contain buffers to hold packets during times of congestion. Today,
the size of the buffers is determined by the dynamics of TCP’s congestion control
algorithms. To keep link utilization high, a widely used rule-of-thumb states that
each link needs a buffer of size B = RTT × C, where RTT is the average round-trip time of a flow passing across the link, and C is the data rate of the link. For
example, a 10Gb/s router linecard needs approximately 250ms × 10Gb/s = 2.5Gbits
of buffers; and the amount of buffering grows linearly with the line-rate. Such large
buffers are challenging for router manufacturers, who must use large, slow, off-chip
DRAMs. And queueing delays can be long, have high variance, and may destabilize
the congestion control algorithms.
In this thesis we argue that the rule-of-thumb (B = RTT × C) is now outdated and incorrect for backbone routers. This is because of the large number of flows (TCP connections) multiplexed together on a single backbone link. Using theory, simulation and experiments on a network of real routers, we show that a link with n long-lived flows requires no more than B = (RTT × C)/√n, for long-lived or short-lived TCP flows.
The consequences on router design are enormous: A 2.5Gb/s link carrying 10,000
flows could reduce its buffers by 99% with negligible difference in throughput; and a
10Gb/s link carrying 50,000 flows requires only 10Mbits of buffering, which can easily
be implemented using fast, on-chip SRAM.
Acknowledgements
Writing a thesis is not possible without the advice, support and feedback from re-
searchers, collaborators, colleagues and friends.
First and foremost I would like to acknowledge Nick McKeown. He not only had
an incredible instinct for suggesting a thesis topic that is both relevant to practitioners
and academically deep, he also mentored me through the process with critical feedback
and advice. It is hard to imagine a better advisor than Nick. I am very grateful to
him.
Isaac Keslassy was an invaluable collaborator and great fun to work with. He
kept asking the right questions at the right time, and provided major value with his
superior mathematical background.
Matthew Holliman helped me with the early experiments and deserves credit for
being the first to suggest looking for a CLT-based model. Many thanks go to Yashar Ganjali, Sundar Iyer, Shang-Tse Chuang, Martin Casado, Rui Zhang, Nandita Dukkipati, Greg Watson, Ichiro Okajima and everyone else in the HPNG for discussions
of my results while they were being developed.
The experiments on the GSR would not have been possible without the help of
Joel Sommers and Paul Barford. Their expertise in setting up experiments with
physical routers was a huge help.
Balaji Prabhakar and members of his research group gave valuable feedback on
early results. I would also like to thank the other members of my orals committee, Dawson Engler, Nick Bambos and David Cheriton, for their hard questions; they helped refine the written dissertation.
Early talks with Sally Floyd and Frank Kelly helped identify the issues and the
existing knowledge in buffer research. Their feedback on the final results was a major
step towards this work’s conclusions. I would also like to thank the reviewers of the
SIGCOMM article and specifically Craig Partridge for their valuable feedback.
Sunya Wang, Wayne Sung and Lea Roberts from the Stanford Networking team
were extremely helpful in implementing the experiments on the Stanford Network and
lending me equipment to get an understanding of routers. Many thanks go to them.
The team and investors at Voltage made it possible for me to complete my Ph.D. in parallel; I am very grateful for their patience.
Last but not least my thanks to Isabelle, who supported me in good and bad
times, and was a calming influence when the Ph.D. stress was threatening to get the
upper hand.
Contents
Abstract

Acknowledgements

1 Introduction
  1.1 Motivation
    1.1.1 Buffer Size and Router Design
    1.1.2 Buffer Size and Latency
  1.2 Previous Work
  1.3 Organization of the Thesis

3 Long Flows
  3.1 Synchronized Long Flows
    3.1.1 When are Flows Synchronized?
  3.2 Desynchronized Long Flows

  6.1.2 The Cisco Catalyst 12000 Router
  6.1.3 Results and Discussion
  6.2 Stanford Network Experiment
    6.2.1 Introduction and Setup
    6.2.2 The Cisco 7200 VXR
    6.2.3 Experiment and Results

7 Conclusion

A
  A.1 Summary of TCP Behavior
  A.2 Behavior of a Single Long TCP Flow
  A.3 Queue Distribution using Effective Bandwidth
  A.4 Configuration of ns2
  A.5 Cisco IOS Configuration Snippets for the VXR

Bibliography
List of Figures
3.1 Time Evolution of two TCP flows sharing a Bottleneck Link in ns2
3.2 Probability Distribution of the Sum of the Congestion Windows for Desynchronized Flows sharing a Bottleneck Link from an ns2 simulation
3.3 Plot of ΣWi(t) of all TCP flows, and of the queue Q offset by 10500 packets
3.4 Buffer Requirements vs. Number of Flows
5.1 Utilization (top) and Goodput (bottom) vs. Number of Flows for different buffer sizes
5.2 Amount of Buffering Required for different Levels of Utilization (Top) and Example of the 2Tp × C/√n rule for a low bandwidth link (Bottom)
5.3 Average Queue Length for Short Flows at three Different Bandwidths
5.4 Effect of the Access Link Bandwidth on the Queue Length
5.5 Average Flow Completion Time for short flows as a function of the Load of the Bottleneck Link and the amount of Buffering
5.6 Minimum Required Buffer for a Mix of 20% Short Flows and 80% Long Flows
5.7 Average Flow Completion Time for Large (2Tp × C) and Small (2Tp × C/√n) Buffers
5.8 Utilization (top) and Average Flow Completion Times (bottom) for different Ratios of Long and Short Flows
5.9 Number of Flows (Top), Utilization, Queue Length and Drops (Bottom) for Pareto Distributed Flow lengths with large Buffers of 2Tp × C
5.10 Number of Flows (Top), Utilization, Queue Length and Drops (Bottom) for Pareto Distributed Flow lengths with small Buffers of 2Tp × C/√n
6.1 Comparison of our model, ns2 simulation and experimental results for buffer requirements of a Cisco GSR 12410 OC3 line card
6.2 Short Flow Queue Distribution of 62 packet flows measured on a Cisco GSR compared to model prediction
6.3 Short Flow Queue Distribution of 30 packet flows measured on a Cisco GSR compared to model prediction
6.4 Short Flow Queue Distribution of 14 packet flows measured on a Cisco GSR compared to model prediction
6.5 "Average" sending rate of a router as measured by IOS. Actual sending pattern was an on/off source. The reported byte-based rate and the packet-based rate differ substantially.
6.6 CDF of the class-based-shaping queue length reported by IOS on a VXR router. The maximum queue length was configured to be 40 packets, however the router reports queue lengths above this value.
6.7 Packet length statistics for the router in the experiment. Packet lengths are in bytes.
6.8 Long term netflow statistics for the router used in the experiment
6.9 Utilization data from the router measured during the experiment. The buffer includes an extra 45 packets due to the minimum size of the token buffer that in this configuration acts like an additional buffer.
Chapter 1
Introduction
1.1 Motivation
Internet routers are packet switches and employ buffers to hold packets during times of
congestion. Arguably, router buffers are the single biggest contributor to uncertainty
in the Internet. Buffers cause queueing delay and delay-variance. When they overflow,
they cause packet loss, and when they underflow, they can degrade throughput. Given
the significance of their role, we might reasonably expect the dynamics and sizing of
router buffers to be well understood, based on a well-grounded theory, and supported
by extensive simulation and experimentation. This is not so.
Router buffers are sized today based on a rule-of-thumb commonly attributed to a 1994 paper by Villamizar and Song [43] (while commonly attributed to this paper, the rule was already known to the inventors of TCP [25]). Using experimental measurements of up to eight TCP flows on a 40 Mb/s link, they concluded that a router needs an amount of buffering equal to the round-trip time of a typical flow that passes through the router, multiplied by the capacity of the router's network interfaces. This is the well-known B = RTT × C rule. We will show later that for a small number of long-lived TCP flows, the rule-of-thumb makes sense.
Network operators follow the rule-of-thumb and require router manufacturers to
provide 250ms (or more) of buffering [38]. The rule is also found in architectural
guidelines [9]. Requiring such large buffers complicates router design and is a big
impediment to building routers with larger capacity. For example, a 10Gb/s router
line card needs approximately 250ms × 10Gb/s = 2.5Gbits of buffers; and the amount
of buffering grows linearly with the line-rate.
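To make these magnitudes concrete, the short Python sketch below evaluates both sizing rules for the example link. It is an illustration added to this text rather than part of the thesis experiments; the 50,000-flow count is simply the example used in the abstract.

# Sketch: compare the rule-of-thumb buffer with the B = RTT*C/sqrt(n) rule.
# Link parameters are the illustrative examples used in the text.
import math

def rule_of_thumb(rtt_s, capacity_bps):
    """B = RTT x C, in bits."""
    return rtt_s * capacity_bps

def reduced_buffer(rtt_s, capacity_bps, n_flows):
    """B = RTT x C / sqrt(n), in bits."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)

if __name__ == "__main__":
    rtt = 0.250          # 250 ms average round-trip time
    capacity = 10e9      # 10 Gb/s line card
    n = 50_000           # long-lived flows on the link

    b_old = rule_of_thumb(rtt, capacity)
    b_new = reduced_buffer(rtt, capacity, n)
    print(f"rule of thumb: {b_old / 1e9:.2f} Gbit")    # 2.50 Gbit
    print(f"RTT*C/sqrt(n): {b_new / 1e6:.1f} Mbit")    # roughly 11 Mbit
    print(f"reduction:     {100 * (1 - b_new / b_old):.1f} %")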
Given the effect of buffer size on network performance and router design, it is
worth asking if the rule-of-thumb still holds. Today, backbone links commonly op-
erate at 2.5Gb/s or 10Gb/s and carry well over 10,000 flows [18]. Quantitatively,
current backbone links are very different from 1994 and we might expect them to
be qualitatively different too. It seems fair to say that it is currently not well un-
derstood how much buffering is actually needed or how buffer size affects network
performance [15].
It is worth asking why we care to size router buffers accurately. With declining
memory prices, why not just over-buffer routers and not run the risk of losing link
utilization? We believe over-buffering is a bad idea for two reasons. First, it has
architectural implications on high-speed routers, leading to designs that are more
complicated, consume more power, and occupy more board space. Second, over-
buffering increases end-to-end delay in the presence of congestion. This is especially
the case with TCP, as a single TCP flow in the absence of other constraints will
completely fill the buffer of a bottleneck link. In such a case, large buffers conflict with
the low-latency needs of real time applications (e.g. video games, and device control).
In some cases, large delays can make congestion control algorithms unstable [28] and
applications unusable.
card would require more than 300 chips and consume 2.5kW, making the board too
large, too expensive and too hot. If instead we try to build the line card using DRAM,
we would need ten devices consuming only 40W. This is because DRAM devices are
available up to 1Gbit and consume only 4mW/Mbit. However, DRAM has a random
access time of about 50ns, which is hard to use when a minimum length (40byte)
packet can arrive and depart every 8ns. Worse still, DRAM access times fall by only
7% per year [21]; thus, the problem is going to get worse as line-rates increase in the
future.
In practice, router line cards use multiple DRAM chips in parallel to obtain the
aggregate data-rate (or memory bandwidth) they need. Packets are either scattered across memories in an ad-hoc statistical manner, or managed with an SRAM cache and a refresh algorithm [24]. Either way, a large packet buffer has a number of disadvantages: it
uses a very wide DRAM bus (hundreds or thousands of signals) with a large number of
fast data pins (network processors and packet processor ASICs frequently have more
than 2,000 pins making the chips large and expensive). Such wide buses consume
large amounts of board space and the fast data pins on modern DRAMs consume too
much power.
In summary, it is extremely difficult to build packet buffers at 40Gb/s and beyond.
Given how slowly memory speeds improve, this problem is going to get worse over
time.
Substantial benefits could be gained from using significantly smaller router buffers,
particularly if it was possible to use SRAM. For example, it would be feasible to build
a packet buffer using about 512Mbits of off-chip SRAM (16 devices). If buffers of 5%
of the delay-bandwidth product were sufficient, we could use SRAM to build buffers
for a 40 Gb/s line card.
The real opportunity, however, lies in placing the memory directly on the chip
that processes the packets (a network processor or an ASIC). In this case, very wide
and fast access to a single memory is possible. Commercial packet processor ASICs
have been built with 256Mbits of “embedded” DRAM. If memories of 2% of the delay-bandwidth product were acceptable, then a single-chip packet processor would need
no external memories. We will present evidence later that buffers this small would be sufficient.

1.1.2 Buffer Size and Latency

With a buffer of size B = RTT × C, a packet arriving at a full buffer experiences a queueing delay of

TQ = (RTT × C)/C = RTT.
An additional queueing delay of one RTT means that in the case of congestion, a
router that is sized using the rule-of-thumb will double the latency of any flow going
through it. If there are several points of congestion on a flow’s path, each congested
router will incur an additional RTT worth of queueing delay.
Additional latency affects real-time applications such as online gaming, IP tele-
phony, video conferencing, remote desktop or terminal based applications. IP tele-
phony typically requires a round-trip latency of less than 400ms. For competitive
online gaming, latency differences of 50ms can be decisive. This means that a con-
gested router that is buffered using the rule-of-thumb will be unusable for these appli-
cations. This is the case even if the loss rate of the router is still very small. A single,
congestion-aware TCP flow that attempts to “just fill” the pipe will be sufficient to
make online gaming or IP telephony on the router impossible.
ISPs' routers are typically substantially over-buffered. For example, we measured up to five seconds of queueing delay at one router, making web surfing extremely cumbersome and any interactive applications all but impossible. The packet loss rate, however, was still well below 1%. With smaller buffers, this router could have remained usable in this case of congestion. An ISP using such an overbuffered router has no choice but to overprovision its network, as congestion would immediately lead to unacceptable queueing delays.
more expensive to buffer at high line-rates. Instead, the access devices that serve slow, last-mile access links of less than 1Mb/s should continue to include about 7 packets per flow worth of buffering for each link. With line speeds increasing and the MTU size staying constant, we expect this issue to become less relevant in the future.
Avrachenkov et al. [6] present a fixed-point model for utilization (for long flows) and
flow completion times (for short flows). They model short flows using an M/M/1/K
model that accounts for flows but not for bursts. In their long flow model, they use
an analytical model of TCP that is affected by the buffer through the RTT. As the
model requires fixed-point iteration to calculate values for specific settings and only
one simulation result is given, we cannot compare their results directly with ours.
Chapter 2

A Single TCP Flow Through a Router

We start by modeling how a single TCP flow interacts with a router. Doing so will
not only show where the rule-of-thumb for buffer sizing comes from, it will also give
us the necessary tools to analyze the multi-flow case in Chapter 3. In Sections 2.1
and 2.2 we examine the case of a router that has the right amount of buffering vs.
a router that has too much or too little buffering. This will confirm that the rule-
of-thumb does hold for a single flow through a router. We then formally prove the
rule-of-thumb using two different methods. In Section 2.3 we will do it based on rates
and in Section 2.4 by accounting for outstanding packets.
Consider the topology in Figure 2.1 with a single sender and one bottleneck link.
The sender is sending an unlimited amount of data using TCP. We define:

• C: the capacity (data rate) of the bottleneck link
• C′: the capacity of the sender's access link
• Tp: the one-way propagation delay (so 2Tp is the two-way propagation delay)
• TQ: the queueing delay added by the router's buffer
• RTT: the round-trip time experienced by the flow
• Q: the length of the router's queue
• B: the size of the router's buffer
• W: the congestion window of the TCP sender
• R: the sending rate of the TCP flow
• U: the utilization of the bottleneck link
The TCP sending rate is controlled by the congestion window W (for a brief
summary of how TCP’s congestion control algorithm works see Appendix A.1).
For this experiment, we assume that there is no congestion on the reverse path
and that the capacity of the access link is higher than the capacity of the bottleneck
link C 0 > C. We also assume that the window size and the sending rate of the TCP
flow are not limited.
For simplicity, we will express data (Q, B, W ) in packets and rates (U , R) in
packets per second. This is a simplification as TCP effectively counts bytes and
packets might have different lengths. Buffers of real routers may be organized in units of packets or of smaller blocks (see Section 6.2). In practice, however, a flow sending at the maximum rate will behave close to this simplified model, as it primarily generates packets of MTU size.
The RTT that a flow experiences is the two-way propagation delay, plus the queueing delay TQ from the router queue:

RTT = 2Tp + TQ = 2Tp + Q/C.
The sending rate of the TCP flow is well known to be the window size divided by
the round trip time [42]:
R = W/RTT = W/(2Tp + TQ) = W/(2Tp + Q/C).    (2.1)
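As an illustration of how Equation 2.1 interacts with TCP's additive-increase, multiplicative-decrease behavior, the following Python sketch steps a single flow through a bottleneck in increments of one RTT. It is a deliberately crude caricature of the dynamics shown in Figure 2.2 (no slow-start, no timeouts, arbitrary parameter values) and not the ns2 setup used in this thesis.

# Minimal sketch of one AIMD flow filling a bottleneck link with buffer B.
# Time advances in steps of one RTT; all quantities are in packets.
def simulate(two_tp=0.1, capacity=1000.0, buffer_pkts=100, steps=200):
    """two_tp: two-way propagation delay [s], capacity: [pkts/s]."""
    w = 1.0                                       # congestion window [pkts]
    history = []
    for _ in range(steps):
        queue = max(0.0, w - two_tp * capacity)   # packets not in flight sit in the buffer
        rtt = two_tp + queue / capacity           # RTT = 2Tp + Q/C
        rate = min(w / rtt, capacity)             # Equation 2.1, capped at the link speed
        if queue > buffer_pkts:                   # overflow -> loss -> halve the window
            w = w / 2.0
        else:
            w += 1.0                              # additive increase: one packet per RTT
        history.append((w, queue, rtt, rate))
    return history

if __name__ == "__main__":
    for w, q, rtt, r in simulate()[-5:]:
        print(f"W={w:6.1f}  Q={q:6.1f}  RTT={rtt:5.3f}s  R={r:7.1f} pkts/s")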
Figure 2.2: A single TCP flow through a single router with buffers equal to the delay-bandwidth product (142 packets). The figure's panels show the congestion window [pkts], queue length [pkts], RTT, sending rate [pkts/s] and utilization [pkts/s] over 100 seconds of simulation.
utilized. This means that the router is sending packets at a rate C ≈ 1000 packets
per second. These packets arrive at the receiver which in turn sends ACK packets at
the same rate of 1000 packets/s. The sender receives these ACK packets and will, for
each packet received, send out a new data packet at a rate R that is equal to C, i.e.
the sending rate is clocked by the bottleneck link. (There are two exceptions to this,
when packets are dropped and when the window size is increased. If a packet was
dropped and a sequence number is missing, the sender may not send new data. We’ll
treat this case below.) Once per RTT, the sender has received enough ACK packets
to increase its window size by one. This will cause it to send exactly one extra packet per RTT. For this work we will treat this extra packet separately, and not consider it as part of the sending rate. To summarize, for a sufficiently buffered router, a TCP
sender sends (not counting the extra packet from a window increase) at a constant
rate R that is equal to C:
R=C (2.2)
That this is the case in practice can easily be confirmed by looking at Figure 2.2, and it is consistent with our rate equation (2.1). The RTT follows a sawtooth pattern that matches that of the window size, and at any given time we have W ∝ RTT. The reason for this can be seen by looking at the queue length Q in Figure 2.2. The router receives packets from the sender at a rate R = C and drains its queue at the same rate C. If W is increased by one, this causes an extra packet to be sent, which increases Q by one. Since RTT = 2Tp + TQ = 2Tp + Q/C, the increase of W by one also increases the RTT, and the ratio W/RTT remains constant.
[Figure: congestion window [pkts], queue length [pkts] and sending rate [pkts/s] between t = 35.2 s and t = 36 s.]
missing sequence number) takes some time to travel back to the sender. For a complete
description of how TCP scales down, see [42], but the effect is that at t = 35.44, the sender halves its window size and stops sending for exactly one RTT. During this RTT, the bottleneck link is now serviced from packets stored in the router. For a
router with buffers of 2Tp × C, the first new packet from the sender will arrive at the
router just as it sends the last packet from the buffer, and the link will never go idle.
[Figures: congestion window [pkts], queue length [pkts], RTT, sending rate [pkts/s] and utilization [pkts/s] over 100 seconds for two further single-flow simulations.]
This is the familiar rule-of-thumb. While not widely known, similar arguments
were made elsewhere [12, 35], and our result can be easily verified using ns2 [1]
simulation and a closed-form analytical model [5].
The rule-of-thumb essentially states that we need enough buffers to absorb the
fluctuation in TCP’s window size.
One interesting observation is that the amount of buffering directly depends on the factor TCP uses for multiplicative decrease. For TCP Reno, this factor is 1/2. Generally, if TCP scales down its window as W → W(1 − 1/n), the amount of buffering that is required is

B = (1/(n − 1)) × 2Tp × C.
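The generalized rule follows from the same two window relations used for the rule-of-thumb. A brief sketch of the algebra (assuming, as above, that the buffer just absorbs the window decrease) is:

\[
W_{max} = 2T_p C + B, \qquad
W_{min} = \Bigl(1 - \tfrac{1}{n}\Bigr) W_{max} = 2T_p C
\;\;\Rightarrow\;\;
B = \frac{2T_p C}{1 - \tfrac{1}{n}} - 2T_p C = \frac{2T_p C}{n-1}.
\]

For n = 2 (TCP Reno) this reduces to the familiar B = 2Tp × C.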
Other types of TCP use smaller factors and require smaller buffers. For example
High-Speed TCP [16] uses an adaptive mechanism that will require much smaller
buffers for large window sizes.
TCP flavors that use latency (and not drops) to detect congestion [8, 26] have very
different buffer requirements and are based on a fixed amount of buffering needed per
flow, independent of the delay-bandwidth-product.
A packet that is dropped is only counted as outstanding as long as the sender hasn't detected the drop yet. Once the drop is detected, the packet is no longer counted as outstanding.
Assuming we have full utilization of the bottleneck link, we know that packets
leave the sender as well as the router at a constant rate of C. The number of packets
in transit with an empty buffer is 2Tp × C. The number of packets in the buffer is
the length of the queue Q(t).
The number of outstanding packets is commonly equated with the window size.
This is not entirely correct as when the window-size is halved, the window size differs
substantially from the actual number of outstanding packets [42]. However, both W
and the number of outstanding packets share the increase and multiplicative decrease
behavior. As the common intuition on W captures all of the essential aspects of the number of outstanding packets, we will use W for the number of outstanding packets for the remainder of this work. (For ns2 simulations, we measured the actual number of outstanding packets as the difference between the highest sequence number sent and the highest sequence number acknowledged; this, again, will be referred to as W.)
In summary, we can write the equation for the number of outstanding packets as

W(t) = 2Tp × C + Q(t) + ∆drop.    (2.4)

This equation, without the loss term, holds well in practice over a wide range of TCP settings, for multiple flows (in this case we substitute W(t) by the sum of the windows of all flows, ΣWi(t)), for mixes of long and short flows, and for other network topologies.
Figure 2.6 shows a link that receives bursty TCP traffic from many flows and is congested over short periods of time. For this graph, we plot the utilization multiplied by the two-way propagation delay, 2Tp × U, to allow a uniform scale on the y-axis. When t < 0.4, utilization is below 100% and 2Tp × U ≈ ΣWi(t). This makes sense as we have one RTT (with empty buffers, RTT = 2Tp) worth of traffic at rate U in transit at any given time and the buffer is still empty. For periods of full link utilization (e.g. from 3 to 4 seconds), we have ΣWi(t) = 2Tp × C + Q. The second observation is that ΣWi(t) < 2Tp × C is a necessary and sufficient condition for the utilization dropping below 100%.
Using Equation 2.4, we can now also easily derive the rule-of-thumb. At the time
before and after the window size is reduced, no packets are dropped and the network
is “filled” with packets (i.e. no link is idle). The capacity of the network is 2Tp × C
packets. We therefore need a minimum window size (after decrease) of
Wmin = 2Tp × C
Figure 2.6: Comparison of the sum of window sizes ΣWi(t) and 2Tp × C + Q(t). The graph plots the aggregate window, Q + 2Tp × C, and the link utilization (relative to 100% utilization), in packets, over six seconds.
and for the maximum window size, the extra packets have to be in the router buffer:

Wmax = 2Tp × C + B.

We know the minimum window size is half the maximum window size, Wmax = 2Wmin. Substituting, we again find the rule-of-thumb:

B = 2Tp × C = RTT × C.
2.5 Summary
To summarize this Chapter, the role of a router buffer is to absorb the fluctuation
in TCP window size and the fluctuation in the number of packets on the network.
Because TCP in congestion avoidance mode varies its window size between Wmax and Wmax/2, the number of packets on the network varies by a factor of two and, consequently, the buffer needs to hold a number of packets equal to the capacity of the network. For one or a small number of flows, this capacity is one delay-bandwidth product, as correctly observed by Villamizar and Song [43].
Chapter 3
Long Flows
In a backbone router, many flows share the bottleneck link simultaneously. For ex-
ample, a 2.5Gb/s (OC48c) link typically carries more than 10,000 flows at a time [18].
This should not be surprising: a typical user today is connected via a 56kb/s modem, and a fully utilized 2.5Gb/s link can simultaneously carry more than 40,000 such flows. When the link is not fully utilized, the buffers are barely used, and the link isn't a bottleneck. Therefore, we should size the buffers to accommodate a large number of flows.
So, how should we change our model to reflect the buffers required for a bottleneck
link with many flows? We will consider two situations. First, we will consider the
case when all the flows are synchronized with each other, and their sawtooths march
in lockstep perfectly in-phase. Then, we will consider flows that are not synchronized
with each other, or are at least, not so synchronized as to be marching in lockstep.
When they are sufficiently desynchronized — and we will argue that this is the case
in practice — the amount of buffering drops sharply.
Figure 3.1: Time Evolution of two TCP flows sharing a Bottleneck Link in ns2. The panels show the two flows' congestion windows [pkts], the queue length [pkts] and the two flows' sending rates [pkts/s] over 100 seconds.
studied tendency of flows sharing a bottleneck to become synchronized over time [35,
4].
A set of precisely synchronized flows has the same buffer requirements as a single
flow. Their aggregate behavior is still a sawtooth; as before, the height of the sawtooth
is dictated by the maximum window size needed to fill the round-trip path, which
is independent of the number of flows. Specifically, assume that there are n flows, each with a congestion window Wi(t) at time t and end-to-end propagation delay Tp,i, where i ∈ {1, ..., n}. The window size is the maximum allowable number of outstanding packets, so from Equation 2.4, we have
Σ(i=1..n) Wi(t) = 2Tp × C + Q(t)    (3.1)
where Q(t) is the buffer occupancy at time t, and Tp is the average propagation
delay. As before, we can solve for the buffer size by considering two cases: just before
and just after packets are dropped. First, because they move in lock-step, the flows
all have their largest window size, Wmax at the same time; this is when the buffer is
full, so
Wmax = Σ(i=1..n) W^i_max = 2Tp × C + B.    (3.2)
Similarly, their window size is smallest just after they all drop simultaneously [35].
If the buffer is sized so that it just goes empty as the senders start transmitting after
the pause, then
Wmin = Σ(i=1..n) W^i_min = Wmax/2 = 2Tp × C.    (3.3)
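Combining Equations 3.2 and 3.3 makes the buffer requirement for synchronized flows explicit. A one-line sketch of that step, using only the two relations above, is:

\[
B = W_{max} - 2T_p C = W_{max} - \frac{W_{max}}{2} = \frac{W_{max}}{2} = 2T_p C ,
\]

i.e. a set of perfectly synchronized flows requires one full delay-bandwidth product of buffering, just as a single flow does.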
authors find synchronization in ns2 for up to 1000 flows as long as the RTT variation is below 10%. Likewise, we found in our simulations and experiments that while in-phase synchronization is common for fewer than 100 concurrent flows, it is very rare above 500 concurrent flows.²
On real networks, synchronization is much less frequent. Anecdotal evidence of
synchronization [3] is largely limited to a small number of very heavy flows. In
a laboratory experiment on a throttled shared memory router, we found that two
typical TCP flows were not synchronized, while an identical ns2 setup showed perfect
synchronization. On large routers in the core of the network, there is no evidence
of synchronization at all [18, 23]. Most analytical models of TCP traffic operate at a flow level and do not capture packet-level effects. They are, therefore, of no help in predicting the conditions under which synchronization occurs. The only model we
know of [13] that predicts synchronization does not give clear bounds that we could
use to guess when it occurs.
Today, we don’t understand fully what causes flows to synchronize and to what
extent synchronization exists in real networks. It seems clear that the ns2 simulator
with long-lived flows does not correctly predict synchronization on real networks. This
is not surprising as ns2 in itself introduces no randomness into the traffic, while real
networks have a number of such sources (e.g. shared memory routers, shared medium
collisions, link-level errors, end host service times, etc.).
It is safe to say, though, that flows are not synchronized in a backbone router carrying thousands of flows with varying RTTs. Small variations in RTT or processing time are sufficient to prevent synchronization [33]; and the absence of synchronization has been demonstrated in real networks [18, 23]. Although we don't precisely understand when and why synchronization of TCP flows takes place, we observed that for aggregates of more than 500 flows with varying RTTs, the amount of in-phase
synchronization decreases even in ns2. Under such circumstances, we can treat flows
as being not synchronized at all.
² Some out-of-phase synchronization (where flows are synchronized but scale down their window
at different times during a cycle) was visible in some ns2 simulations with up to 1000 flows. How-
ever, the buffer requirements are very similar for out-of-phase synchronization as they are for no
synchronization at all.
3.2 Desynchronized Long Flows

If the flows are desynchronized, we model each flow's congestion window Wi(t) as an independent random variable with

E[Wi] = µW,    var[Wi] = σ²W.

Now, the central limit theorem gives us the distribution of the sum of the window sizes as

ΣWi(t) → n µW + √n σW N(0, 1).
Figure 3.2 shows that indeed, the aggregate window size does converge to a Gaussian process. The graph shows the probability distribution of the sum of the congestion windows of all flows, W = ΣWi, with different propagation times and start times as explained in Chapter 5.
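The 1/√n shrinkage of the aggregate window's fluctuations can be illustrated with a few lines of Python. The sketch below draws each window independently from a uniform distribution between 2/3 and 4/3 of its mean (an assumption made for this illustration only) and reports how the standard deviation of the sum scales with n; the 11,000-packet aggregate mean mirrors the distribution shown in Figure 3.2.

# Sketch: the aggregate of n independent sawtooth-like windows fluctuates
# less, relative to its mean, as n grows, roughly like 1/sqrt(n).
import random
import statistics

def sample_aggregate_window(n_flows, total_mean=11000.0, samples=2000):
    """Samples of sum(Wi), each Wi uniform in [2/3, 4/3] of its per-flow mean."""
    per_flow_mean = total_mean / n_flows
    sums = []
    for _ in range(samples):
        s = sum(random.uniform(2/3 * per_flow_mean, 4/3 * per_flow_mean)
                for _ in range(n_flows))
        sums.append(s)
    return sums

if __name__ == "__main__":
    random.seed(1)
    for n in (1, 10, 100, 1000):
        agg = sample_aggregate_window(n)
        std = statistics.pstdev(agg)
        print(f"n={n:5d}  mean={statistics.mean(agg):8.1f}  "
              f"std={std:7.1f}  std*sqrt(n)={std * n ** 0.5:8.1f}")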
Figure 3.2: Probability Distribution of the Sum of the Congestion Windows for Desynchronized Flows sharing a Bottleneck Link from an ns2 simulation. The measured PDF of the aggregate window closely follows a normal distribution N(11000, 400); with a buffer of 1000 packets, the area to the left of the buffer range corresponds to Q = 0 (link underutilized) and the area to the right to Q > B (packets dropped).
With this model of the distribution of W , we can now find the necessary amount
of buffering. From the window size process, we know from Equation 2.4 that the
queue occupancy at time t is
Q(t) = Σ(i=1..n) Wi(t) − (2Tp × C) − ∆drop    (3.4)
This equation assumes full utilization and an unlimited buffer. What happens
with a limited buffer can be seen in Figure 3.3. In the graph, we plotted the queue
length against the right side of Equation 3.4. As we have many desynchronized flows,
the sum of the TCP windows fluctuates rapidly at time scales below or around one
RT T . The two dotted lines are for Q = 0 and Q = B, respectively. We can see that
Figure 3.3: Plot of ΣWi(t) of all TCP flows, and of the queue Q offset by 10500 packets, between 50 and 64 seconds; the two dotted lines bound the buffer B.
the above equation holds very well as long as the number of outstanding packets plus
2Tp × C is within the buffer. If the queue drops below the lower line, our utilization is
below 100%. If we are above the upper line, the buffer overflows and we drop packets.
The goal of picking the buffer is to make sure both of these events are rare and most
of the time 0 < Q < B.
Equation 3.6 tells us that if W has a Gaussian distribution, Q has a Gaussian distribution shifted by a constant (of course, the Gaussian distribution is restricted to the allowable range of Q). This is very useful because we can now pick a buffer
size and know immediately the probability that the buffer will underflow and lose
throughput.
Because it is Gaussian, we can determine the queue occupancy process if we know
its mean and variance. The mean is simply the sum of the mean of its constituents.
To find the variance, we’ll assume for convenience that all sawtooths have the same
average value (assuming different values would not fundamentally change the result).
For a given network link, the overall bandwidth, and thus the sum of the congestion windows ΣW, will be the same for any number of flows n. If we denote the mean and variance of this sum for a single flow by

E[ΣW] = µn=1,    var[ΣW] = σ²n=1,

then if n flows share the link, each flow's window has mean µn=1/n and standard deviation σn=1/n, and the central limit theorem gives

ΣWi(t) → n (µn=1/n) + √n (σn=1/n) N(0, 1)    (3.5)
        = µn=1 + (1/√n) σn=1 N(0, 1).    (3.6)
Each flow's sawtooth oscillates approximately uniformly between 2/3 and 4/3 of its average window Wi, so its standard deviation is

σWi = (1/√12) (4/3 Wi − 2/3 Wi) = (1/(3√3)) Wi,

where

Wi = W/n = (2Tp × C + Q)/n ≤ (2Tp × C + B)/n.

For a large number of flows, the standard deviation of the sum of the windows, W, is given by

σW ≤ √n σWi,
Now that we know the distribution of the queue occupancy, we can approximate
the link utilization for a given buffer size. Whenever the queue size is below a thresh-
old, b, there is a risk (but not guaranteed) that the queue will go empty, and we
will lose link utilization. If we know the probability that Q < b, then we have an
upper bound on the lost utilization. Because Q has a normal distribution, we can use
the error-function3 to evaluate this probability. Therefore, we get the following lower
bound for the utilization
√
3 3 B
√
U til ≥ erf . (3.7)
2 2 2T p√
×C+B
n
This means that we can achieve full utilization with buffers that are the delay-
bandwidth product divided by the square-root of the number of flows, or a small
multiple thereof. As the number of flows through a router increases, the amount of
required buffer decreases.
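Equation 3.7 is straightforward to evaluate numerically. The Python sketch below computes the utilization bound for buffers expressed as a fraction of the delay-bandwidth product; the delay-bandwidth product and flow count are illustrative values chosen only to show the shape of the curve.

# Sketch: evaluate the utilization lower bound of Equation 3.7.
import math

def utilization_lower_bound(buffer_pkts, two_tp_c_pkts, n_flows):
    """Util >= erf( (3*sqrt(3)/(2*sqrt(2))) * B / ((2Tp*C + B)/sqrt(n)) )."""
    factor = 3 * math.sqrt(3) / (2 * math.sqrt(2))
    return math.erf(factor * buffer_pkts /
                    ((two_tp_c_pkts + buffer_pkts) / math.sqrt(n_flows)))

if __name__ == "__main__":
    two_tp_c = 10000      # delay-bandwidth product [pkts], illustrative
    n = 10000             # long-lived flows
    for fraction in (0.005, 0.01, 0.02, 0.05, 1.0):
        b = fraction * two_tp_c
        util = utilization_lower_bound(b, two_tp_c, n)
        print(f"B = {fraction:6.1%} of 2Tp*C  ->  utilization >= {util:.4f}")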
Whether this model holds can be easily verified using ns2 and the result is shown
in Figure 3.4. Between one and 300 flows share a bottleneck link, and we iteratively
found the minimum amount of buffering that allows us to achieve at least 95% link
utilization. For fewer than 50 flows, our model clearly does not hold; we still have synchronization effects. However, as we reach 50 flows, the flows desynchronize and
³ A more precise result could be obtained by using Chernoff bounds instead. However, to achieve
our goal of determining a buffer size that gives us low packet loss and underutilization, this simple
method is sufficient.
[Figure 3.4: Buffer Requirements vs. Number of Flows: the minimum buffer [pkts] found by ns2 simulation compared to 2Tp × C/√n, for up to 300 TCP flows.]
we can see that our model predicts the actual required minimum buffer very well.
We will verify our model more extensively with simulations in Chapter 5 and with
experiments on real networks in Chapter 6.
This result has practical implications for building routers. A typical congested
core router currently has 10,000 to 100,000 flows passing through it at any given
time. While the vast majority of flows are short (e.g. flows with fewer than 100
packets), the flow length distribution is heavy-tailed and the majority of packets at
any given time belong to long flows4 . As a result, such a router would achieve close
to full utilization with buffer sizes that are only √ 1 = 1% of the delay-bandwidth
10000
product.
⁴ Live production networks have mixes of short flows and long flows. We will show that this model
also holds for mixes of flows in Chapter 5 and present results on estimating n from live traffic in
Chapter 6.
Chapter 4

Short Flow Model
eight, sixteen, etc.). If the access links have lower bandwidth than the bottleneck
link, the bursts are spread out and a single burst causes no queueing. We assume the
worst case where access links have infinite speed; bursts arrive intact at the bottleneck
router.
flows, the buffer will usually empty several times during one RTT and is effectively
“memoryless” at this time scale.
For instance, let's assume we have arrivals of flows of a fixed length l. Because of the doubling of the burst lengths in each iteration of slow-start, each flow will arrive in n bursts of size

Xi ∈ {2, 4, ..., 2^(n−1), R},

where R is the remainder, R = l mod (2^n − 1). Therefore, the bursts arrive as a Poisson process, and their lengths are i.i.d. random variables, equally distributed among {2, 4, ..., 2^(n−1), R}.
The router buffer can now be modeled as a simple M/G/1 queue with a FIFO
service discipline. In our case, a “job” is a burst of packets, and the job size is the
number of packets in a burst. The average number of jobs in an M/G/1 queue is
known to be (e.g. [44])

E[N] = ρ/(2(1 − ρ)) · E[X²]/E[X].
Here, ρ is the load on the link (the ratio of the amount of incoming traffic to the
link capacity C), and E[X] and E[X²] are the first two moments of the burst size.
This model will overestimate the queue length because bursts are processed packet-
by-packet while in an M/G/1 queue, the job is only de-queued when the whole job
has been processed. If the queue is busy, it will overestimate the queue length by half
the average job size, and so,
E[Q] = ρ/(2(1 − ρ)) · E[X²]/E[X] − ρ · E[X]/2.
It is interesting to note that the average queue length is independent of the number
of flows and the bandwidth of the link. It only depends on the load of the link and
the length of the flows.
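As a worked example of the two formulas above, the following Python sketch builds the slow-start burst-size distribution for a given flow length and evaluates E[Q] from its first two moments. The helper functions and parameter values are illustrative assumptions, not code from the thesis experiments.

# Sketch: average queue length caused by short flows, per the M/G/1 model.
# A flow of length l arrives as bursts 2, 4, 8, ... plus a remainder R.
def burst_sizes(flow_len_pkts):
    """Decompose a flow into slow-start bursts that double every round-trip."""
    bursts, burst, remaining = [], 2, flow_len_pkts
    while remaining >= burst:
        bursts.append(burst)
        remaining -= burst
        burst *= 2
    if remaining > 0:
        bursts.append(remaining)       # the remainder R
    return bursts

def avg_queue_length(flow_len_pkts, load):
    """E[Q] = rho/(2(1-rho)) * E[X^2]/E[X] - rho*E[X]/2, bursts equally likely."""
    x = burst_sizes(flow_len_pkts)
    e_x = sum(x) / len(x)
    e_x2 = sum(v * v for v in x) / len(x)
    return load / (2 * (1 - load)) * e_x2 / e_x - load * e_x / 2

if __name__ == "__main__":
    for l in (2, 14, 30, 62):
        print(f"flow length {l:3d} pkts, load 0.8 -> E[Q] ~ {avg_queue_length(l, 0.8):5.1f} pkts")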
A similar model is described in [20]. The key difference is that the authors model bursts as batch arrivals in an M^[k]/M/1 model (as opposed to our model, which captures bursts by varying the job length in an M/G/1 model). It accommodates both slow-
start and congestion avoidance mode; however, it lacks a closed form solution. In the
Figure 4.2: The average queue length E[Q] as a function of the flow length for ρ = 0.8, measured on 40 Mbit/s, 80 Mbit/s and 200 Mbit/s links and compared to the M/G/1 model.
end, the authors obtain queue distributions that are very similar to ours.
Figure 4.3: Queue length distribution (probability of Q > x) for load 0.85 and flows of length 2 and 62, from ns2 simulations and the M/G/1 model.
The tail of the queue length distribution can be approximated (see Appendix A.3) by

P(Q ≥ b) = e^(−b · 2(1 − ρ) E[Xi] / (ρ E[Xi²])).
Figure 4.3 shows an experimental verification of this result. For traffic of only short
flows of length two and length 62 respectively, we measured the queue occupancy using
an ns2 simulation and compared it to the model predictions. The model gives us a
good upper bound for the queue length distribution. It overestimates the queue length
as the model assumes infinitely fast access links. The access links in ns2 spread out
the packets slightly, which leads to shorter queues. The overestimation is small for
the bursts of length two, but much larger for bursts of length 62. In Chapter 5, we
will evaluate this effect in more detail.
Our goal is to drop very few packets (if a short flow drops a packet, the retrans-
mission significantly increases the flow’s duration). In other words, we want to choose
a buffer size B such that P (Q ≥ B) is small.
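Inverting the tail formula gives a direct way to pick B for a target overflow probability. The sketch below does exactly that; the burst-size moments correspond to the 62-packet flows discussed above, and the target probabilities are arbitrary illustrative choices.

# Sketch: smallest buffer B such that P(Q >= B) <= eps, using
# P(Q >= b) = exp(-b * 2(1-rho)*E[X] / (rho * E[X^2])).
import math

def buffer_for_drop_target(load, e_x, e_x2, eps):
    """Solve exp(-B * 2(1-load)*e_x / (load*e_x2)) = eps for B [pkts]."""
    return -math.log(eps) * load * e_x2 / (2 * (1 - load) * e_x)

if __name__ == "__main__":
    # Moments of the burst-size distribution of 62-packet flows
    # (bursts 2, 4, 8, 16, 32, each equally likely).
    e_x = (2 + 4 + 8 + 16 + 32) / 5
    e_x2 = (4 + 16 + 64 + 256 + 1024) / 5
    for eps in (1e-2, 1e-3, 1e-4):
        b = buffer_for_drop_target(0.85, e_x, e_x2, eps)
        print(f"P(Q >= B) <= {eps:g}  ->  B ~ {b:6.1f} pkts")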
A key observation is that, for short flows, the size of the buffer does not depend
on the line-rate, the propagation delay of the flows, or the number of flows; it only
depends on the load of the link, and length of the flows. Therefore, a backbone router
serving highly aggregated traffic needs the same amount of buffering to absorb short-
lived flows as a router serving only a few clients. Furthermore, because our analysis
doesn’t depend on the dynamics of slow-start (only on the burst-size distribution), it
can be easily extended to short unresponsive UDP flows.
In practice, buffers can be made even smaller. For our model and simulation, we
assumed access links that are faster than the bottleneck link. There is evidence [18, 10]
that highly aggregated traffic from slow access links in some cases can lead to bursts
being smoothed out completely. In this case, individual packet arrivals are close to
Poisson, resulting in even smaller buffers. The buffer size can be easily computed
with an M/D/1 model by setting Xi = 1.
In summary, short-lived flows require only small buffers. When there is a mix of
short- and long-lived flows, we will see from simulations and experiments in Chapter 5
that the short-lived flows contribute very little to the buffering requirements, and the
buffer size will usually be determined by the number of long-lived flows.¹
¹ For a distribution of flows, we define short flows and long flows as flows that are in slow-start
and congestion avoidance mode respectively. This means that flows may transition from short to
long during their existence.
More generally, our short flow model applies to traffic from any application that:

• Does not adapt its overall sending pattern to packet loss or changes in latency (other than simply retransmitting lost packets)

• Sends packets in bursts, where the time between bursts is large enough that the router buffer will usually empty at least once during this time. For a high-speed router at moderate loads, this is always the case if the inter-burst interval is on the order of the RTT.
If these two conditions hold, the arguments and methodology we used for short
flows will equally hold and give us a good model of the queue distribution. Below, we
discuss several examples of such flows.
Chapter 5

Experimental Verification with ns2

This chapter is dedicated to verifying the models we built in the last two chapters
using simulation with the ns2 [1] network simulator. We also give an overview of our
experimental setup and methodology to facilitate further research in the area.
Some of the experimental scripts used for this work are available on the author’s
home page and use, modification and re-distribution for research purposes is encour-
aged.
We initially look at long flows and short flows separately. This allows us to verify
quantitatively our models and get a better understanding on where they hold. We
then combine long and short flows to see how they interact. Real traffic typically
consists of flows with a continuous and typically heavy-tailed length distribution.
We, therefore, study Pareto distributed flow lengths to see if the models we built for
the separate classes still hold for this case.
Finally, we show that our model, at least in some cases, also holds for two-way
congestion.
The ns2 simulator is the de facto standard for analyzing new protocols and models in the academic network research community. It is known to give a very accurate picture of how protocols will perform on real networks; however, it is also known to deviate in some areas, such as predicting synchronization in the network.
For the ns2 simulations in this thesis we used, in most cases, a standard dumbbell topology. A number of senders send data over network links (herein referred to as "access links") to a router. Each sender has a separate access link with an individual latency and bandwidth. In most experiments the bandwidths were identical; however, the latencies vary by about 20% of the total end-to-end latency unless otherwise noted.
The router is connected to a receiving host via the bottleneck link. All receivers are
on the receiving host. All links are bi-directional full-duplex links.
Focusing on this topology might seem restrictive at first. However, in practice,
the model captures most characteristics of a complex network, or can at least be used
to build a “worst case” topology that is similar to a complex topology, but will create
more burstiness.
In a complex network, we can calculate the propagation delay for any sender-
destination pair. By adjusting the variable latency of the access links, we can tune
our simplified network to the same end-to-end propagation delay. In the absence of
any (even short term) congestion, we can therefore create a simplified network that
from the sender/receiver point of view is identical to the complex network.
In practice, a complex network would encounter congestion in two places: in the access networks leading to and from the router, and at the bottleneck link. The focus
of this entire work is to investigate buffer requirements for networks with a single
point of congestion. There still might be short-term congestion in the access network
(e.g. two packets arriving at an access router at the same time), however this will
mainly have the effect of reducing burstiness of the arriving traffic, which has little
effect on long-lived flows and reduces queue size for short lived flows. The main effect
of our simplified topology is to see an increased amount of burstiness and concurrent
packet arrivals at our bottleneck router. In other words, we simplify the network in
a way that, at least in most cases, should increase buffer requirements.
Figure 5.1: Utilization (top) and Goodput (bottom) vs. Number of Flows for different buffer sizes (50 packets / 20 ms, 100 packets / 40 ms, and ≥ 500 packets / 200 ms).
The two-way propagation delay was 130 ms, and the buffer sizes correspond to 20 ms, 40 ms and 200 ms, respectively.
With buffers above 2Tp × C, we can achieve full utilization, even with a single
flow. However, if we reduce the buffers, a small number of flows is no longer able to
achieve full utilization. As we increase the number of flows, and with it the amount
of statistical multiplexing, utilization increases until it eventually reaches 100%.
The bottom graph in Figure 5.1 shows goodput achieved in the same experiment.
The results are essentially the same. With small buffers, we need a minimum number
of flows to achieve full utilization. With large buffers, even a single flow can saturate
the link. Goodput decreases as the number of flows increases and it is slightly higher
for larger buffers. This is not surprising as more flows lead to smaller TCP windows
and thus a higher drop rate [31].
Figure 5.2: Amount of Buffering Required for different Levels of Utilization (top) and Example of the 2Tp × C/√n rule for a low bandwidth link (bottom). The top graph shows the minimum required buffer [pkts] for 98.0%, 99.0%, 99.5% and 99.9% utilization vs. the number of long-lived flows; the bottom graph shows the minimum buffer for 99.5% utilization. Both are compared to the RTT×BW/√n and 2·RTT×BW/√n curves.
Figure 5.3: Average Queue Length for Short Flows at three Different Bandwidths (5 Mb/s, 10 Mb/s and 25 Mb/s), compared to the M/G/1 model, as a function of flow length [pkts].
about 80% in the link. Access links had a speed of 10 Gb/s to create maximum
burstiness. We varied the length of the flows from 1 to 100 packets and, for each length, measured the average length of the router's queue over several minutes. This experiment was repeated for bandwidths of 5 Mb/s, 10 Mb/s and 25 Mb/s. In the
graph, we also show the average queue length as predicted by the M/G/1 model from
Chapter 4.
We can see from the graph that the model matches the actual measured average
queue length very well. More importantly, however, the results from the measure-
ments at three different bandwidths are almost identical. We measured queue occu-
pancies at line rates of up to 100 Mb/s and found no substantial difference in queue
length.
Figure 5.4: Effect of the Access Link Bandwidth on the Queue Length (required buffer [pkts]).
The bottleneck link has a bandwidth of 5 Mb/s. We repeated the experiment three times, with access link speeds of 500 Kb/s, 5 Mb/s and 50 Mb/s, respectively. As we would expect, faster
access links require more buffering as they preserve the burstiness of TCP the best.
However, the difference between an access link that is faster than the bottleneck link
and an access link that has the same speed as the bottleneck link is fairly small.
Increasing the access link speed further does not change the results substantially. We
can, therefore, treat access links that are at least as fast as the bottleneck link as
having infinite speed. However, with a 500 Kb/s access link that is ten times slower
than the bottleneck link, we require much less buffering. This is primarily of interest
for the core of the network. Access links to a 10 Gb/s link are at 1 Gb/s or below,
which would allow us to reduce the buffer for such links by a factor of two.
Figure 5.5: Average Flow Completion Time [ms] for short flows as a function of the Load of the Bottleneck Link and the amount of Buffering (10, 20, 40 and 80 packets, and 2Tp × C = 644 packets).
some point the rate of retransmits plus new flows will surpass the bottleneck capacity
and the system will become unstable. This will happen earlier for small buffer sizes.
For a buffer of only 10 packets, load cannot increase beyond 75% before a substantial
number of flows no longer complete. Picking too small a buffer carries two penalties.
First, it will increase flow completion times and second, it lowers the critical load at
which the system becomes unstable.
In practice, this is of limited relevance. Even with buffers of only 40 packets, we
can reach up to 90% load with little increase in AFCT. For a high-speed router, 40
packets of buffering is small and probably far below what is needed for long flows.
And running a router that serves non-congestion-aware traffic at 90% would be a bad
idea as a small error in estimating the load would make the system unstable.
Figure 5.6: Minimum Required Buffer [pkts] for a Mix of 20% Short Flows and 80% Long Flows, vs. the number of long-lived flows.
a capacity of 50 Mb/s. Therefore, 20% of the link's bandwidth was, at any given time, taken up by short flows. We created a number of infinite-length TCP flows that sent data over the bottleneck link and took up the remaining 80% of the bottleneck link's capacity.
The first thing we observed is that in such a mix of long and short flows, the short
flows will always claim their share of the bandwidth with fairly minimal losses. The
reason for this is the much more aggressive multiplicative increase of the short flows
vs. the additive increase of the long flows in congestion avoidance mode.
More importantly, we see from Figure 5.6 that a buffer size of 2Tp × C/√n is also sufficient for mixes of flows. In fact, in a mix of flows, we can calculate the buffer size for both types of traffic separately and add them to obtain an upper bound on the total buffer. As the buffering needed for short flows is much smaller than the buffering needed for long flows, this effectively means we can calculate what fraction of the overall bandwidth comes from long flows and then use this reduced bandwidth to calculate the amount of buffering required. In this experiment, the delay-bandwidth product was 4230 packets. As 80% of the traffic is from long flows, we use a reduced delay-bandwidth product of C′ = 0.8 × C to calculate our minimal required buffer. As we can see in the graph, the minimum required buffer we found experimentally fits very well with the amount of buffering predicted by our model.
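The recipe of computing each component separately and adding them can be written down in a few lines. The Python sketch below combines the long-flow term, based on the long flows' 80% share of the bandwidth, with a short-flow term taken from the tail bound of Chapter 4; the short-flow moments, the load value and the 1e-3 overflow target are illustrative assumptions rather than values from this experiment.

# Sketch: upper bound on the buffer for a mix of long and short flows:
# the long flows' share of the delay-bandwidth product divided by sqrt(n),
# plus a small term covering the short flows' bursts.
import math

def mixed_buffer(delay_bw_pkts, long_share, n_long, e_x, e_x2, load, eps=1e-3):
    long_term = long_share * delay_bw_pkts / math.sqrt(n_long)
    short_term = -math.log(eps) * load * e_x2 / (2 * (1 - load) * e_x)
    return long_term + short_term

if __name__ == "__main__":
    # 4230-packet delay-bandwidth product, 80% long-flow traffic (Figure 5.6);
    # short-flow burst moments for 14-packet flows (bursts 2, 4, 8).
    e_x, e_x2 = (2 + 4 + 8) / 3, (4 + 16 + 64) / 3
    for n in (100, 200, 400):
        b = mixed_buffer(4230, 0.8, n, e_x, e_x2, load=0.2)
        print(f"{n:4d} long flows -> buffer bound ~ {b:6.0f} pkts")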
It is also interesting to see that our new rule already holds at about 150 flows.
We think that this is due to short flows causing long flows to desynchronize. In other
words, mixes of long and short flows require less buffer than traffic consisting only of
long flows.
Figure 5.7: Average Flow Completion Time for Large (2Tp × C) and Small (2Tp × C/√n) Buffers, measured as the average completion time [ms] of a 14-packet TCP flow.
of 2Tp × C and once with buffers of 2Tp × C/√n. The results show that with smaller buffers, the flows complete much faster. This might be surprising at first; after all, smaller buffers will cause an increase in packet drops and thus retransmits and time-outs. However, smaller buffers also decrease the RTT of the flows. A router with large buffers that is congested will have full or nearly full buffers all the time. This causes the RTT of flows to be:

RTTmax = 2Tp + Q/C ≈ 2Tp + (2Tp × C)/C = 4Tp
With buffers of 2Tp × C/√n, the RTT will only be:

RTTmin = 2Tp + Q/C ≈ 2Tp + (2Tp × C/√n)/C = 2Tp (1 + 1/√n)

For large n, the term 1/√n becomes very small, and the RTT for small buffers is close to only half the RTT for large buffers.
One retransmit will usually add one additional RT T to the flow completion time
of a flow. The 14 packet flow in the experiment has a length of 4 × RT T without
losses. With small buffers and a halved RTT, the flow can incur up to four retransmits
and still finish as quickly as a flow with large buffers, twice the RT T and no packet
losses. For four retransmits in a 14 packet flow, we would need a loss rate of more
than 20%! Smaller buffers can, in the worst case, quadruple the loss rate of TCP compared to full buffers of one delay-bandwidth product [31]. We can therefore argue
that if the loss rate with full buffers is below 5%, smaller buffers will always lower the
AFCT.
The above calculation is an oversimplification as it ignores time-outs and the
details of the TCP retransmit algorithm. However, it seems fair to conclude that for
typical loss rates in the internet, smaller buffers will always reduce the completion
time of short TCP flows.
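The same argument in a few lines of arithmetic (our own simplification which, as noted above, ignores time-outs and the details of the retransmit algorithm):

rtt_small = 1.0                    # RTT with small buffers (arbitrary time unit)
rtt_large = 2.0 * rtt_small        # large buffers roughly double the RTT
base_rtts = 4                      # a 14-packet flow needs about 4 RTTs without loss

t_large = base_rtts * rtt_large    # completion time with large buffers and no loss
for retransmits in range(5):
    t_small = (base_rtts + retransmits) * rtt_small
    verdict = "no slower" if t_small <= t_large else "slower"
    print(f"{retransmits} retransmits with small buffers: {t_small:.0f} vs. {t_large:.0f} -> {verdict}")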
We repeated the experiment with traffic mixes of 10% short flows, 20% short flows, and 30% short flows. For comparison, we also ran the experiment with large buffers of $2T_p \times C$ and 20% short flows.
The top graph in Figure 5.8 shows the minimum buffer required to achieve 95%
utilization for traffic consisting of 10%, 20% and 30% short flows. We can see that
our model holds well in all cases. What is surprising though is that the difference
in required buffering becomes virtually zero as soon as we are above 110 flows. The
explanation for this can be found using our short flow model. If we consider the short
flow traffic alone, without the long flows, we can determine the minimum amount of
buffering needed for the short flows. It is plotted in the graph as the horizontal line
at about y = 100. At around n = 110, the amount of buffering needed for 100%
utilization drops below this threshold. What this means is that at this point, more
and more short flows experience drops and are forced out of slow-start into congestion
avoidance mode. As we increase n, soon all flows are in congestion avoidance mode and
are effectively “long” flows. As we can see, the point where the required buffer drops below the short-flow threshold coincides very well with the point where the differences between the flow-length mixes disappear.
With the amount of buffering being almost equal, the AFCT shown in the lower
graph of Figure 5.8 only differs by a few percent. However, in any of the three settings,
it remains much smaller than the AFCT with full buffers of 2Tp × C.
Overall, we can conclude that the amount of required buffering and the change in AFCT are not very sensitive to small changes in the ratio of long vs. short flows.
Buffer requirements in mixes of flows are almost entirely driven by the number of
long flows in congestion avoidance mode.
Figure 5.8: Utilization (top) and Average Flow Completion Times (bottom) for different Ratios of Long and Short Flows
Figure 5.9: Number of Flows (Top), Utilization, Queue Length and Drops (Bottom) for Pareto Distributed Flow Lengths with Large Buffers of $2T_p \times C$
CHAPTER 5. EXPERIMENTAL VERIFICATION WITH N S2 57
Figure 5.9 shows what happens for Pareto distributed flow lengths with a large buffer of size $2T_p \times C$. We want to get an idea of the worst-case congestion, so we chose a load of just above 100%. A load above 100% might seem like a hopeless choice as it cannot possibly be served by the router. However, in practice it simply means that a few heavy flows will not terminate for the duration of the experiment.
The top graph shows the number of flows in congestion avoidance mode. It fluctu-
ates between 60 and 160 flows with an average of about 110 flows. The second graph
shows the utilization, it was 100% for the entire duration of the experiment. The
queue length shown in the third graph fluctuates between 800 and the maximum of
1600 packets. It never drops below 800 packets. Clearly, this buffer is too large and
could be reduced by at least 800 packets. The bottom graph shows the loss rate for
each 50ms interval. Drops occur only in some intervals, however, for some intervals,
the drop rate goes as high as 50%. On average, the loss rate is around 1%. This is a very high loss rate for TCP; however, TCP still works well at this rate.
To pick a reduced buffer size, we need to find the number of concurrent long flows
n. Looking at the top graph of Figure 5.9, we pick n = 100 and size the buffer to 1/10th of its original size, or 165 packets. Note that this is not an intuitive choice. The
buffer fluctuation that we can see in the queue length graph of Figure 5.9 is about
800 packets.
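The arithmetic behind this choice (our own restatement; the full buffer of about 1650 packets is implied by the one-tenth figure above):

import math

full_buffer_pkts = 1650                  # approximately 2*Tp*C for this simulation
n = 100                                  # concurrent long flows, read off the top graph of Figure 5.9
print(full_buffer_pkts / math.sqrt(n))   # -> 165.0 packets, versus ~800 packets of queue fluctuation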
It does turn out to be the correct choice though. The top graph of Figure 5.10
shows the number of concurrent flows with the smaller buffers. We can see that the
number of flows stays above 100 all the time, justifying our choice of n = 100. The
second graph in Figure 5.10 shows the utilization. We can see that the utilization is
close to 100% almost all the time. Averaging shows that the utilization is above 99%.
The queue length shown in the third graph oscillates between zero and the buffer size
of 165. This tells us that the buffer is fully used and gives us an indication that we
are probably close to the minimum buffer. The drop rate shown in the lowest graph
is quite different from the large buffer case. Overall, the drop rate is now higher
(but still a low single digit percentage), however more intervals have drops and the
maximum drop rate in a single interval is lower.
Figure 5.10: Number of Flows (Top), Utilization, Queue Length and Drops (Bottom) for Pareto Distributed Flow Lengths with Small Buffers of $2T_p \times C/\sqrt{n}$

The most important result of this experiment is that we can achieve a utilization of 99% with buffers of size $2T_p \times C/\sqrt{n}$. The main change from the long flow case is that
now we have to measure or estimate the number of long-lived flows. We have also
seen that smaller buffers increase the overall loss rate and decrease the probability of
very large loss events.
One might be surprised that the number of long flows is this large for Pareto dis-
tributed traffic. It has been argued [19] that in a network with low latency, fast access
links and no limit on the TCP window size, there would be very few concurrent flows.
In such a network, a single very heavy flow could hog all the bandwidth for a short
period of time and then terminate. But this is unlikely in practice, unless an operator
allows a single user to saturate the network. And so long as backbone networks are
orders of magnitude faster than access networks, few users will be able to saturate
the backbone anyway. Even in our simulation where we have unlimited window sizes
and very fast access links, TCP is not capable of utilizing a link quickly due to its
additive increase behavior above a certain window size. Traffic transported by high-
speed routers on commercial networks today [18, 2] has 10s of 1000s of concurrent
flows and we believe this is unlikely to change in the future.
Chapter 6
Experimental Results on Physical Routers

The previous chapter examined the experimental verification of our models using ns2 simulation. This chapter verifies our model with experiments performed on actual, physical routers, with TCP traffic generated by real TCP stacks.
While ns2 gives us valuable insights into how buffers and network traffic interact,
it also fails to capture a number of characteristics of real networks:
• Packet Timings. In ns2, all packet timings are deterministic. In a real net-
work, there are many sources of randomization and burstiness such as context
switching or busy intervals on end hosts or shared memory routers, routing
messages, retransmits of packets by the link layer, etc.
• Other Protocols. We assume only TCP flows with some length distribution
in our simulation. Real networks will have a variety of traffic patterns and
protocols.
The first set of experiments tries to evaluate the effect of the first three of the above
issues. The experiments were done on a Cisco GSR [36] router in a laboratory setting
at the University of Wisconsin-Madison. This setup allows us to create specific traffic
patterns and directly compare the predictions of our model and ns2 with a physical
router.
The other two experiments were done on live production networks. The key goal
here is to verify the effect of different applications, protocols, packet types and usage
patterns. The Stanford experiment measures performance of a congested router. The
Internet2 experiment verifies our results for an uncongested router.
The 1 Gb/s links of the network that aggregates the traffic are several times faster than the bottleneck link. Congestion will therefore only happen on the GSR interface serving the OC3 line. Reverse traffic consists only of ACK packets and will not cause congestion anywhere.
TCP traffic was generated using the Harpoon traffic generator [41]. Harpoon
allows us to specify flow arrival patterns as well as flow lengths. The flows are real TCP
flows that react to congestion and will terminate once all data has been transmitted.
For the delay generation, we originally used a PC running the Dummynet [34] network simulator. However, after running a first set of experiments, we found that Dummynet would sometimes queue traffic for several milliseconds, and then send it as a burst. The result was traffic with very different characteristics than the original network traffic. For example, the mean queue length was several times longer than without Dummynet. We replaced Dummynet with a hardware delay generator and the burstiness in the traffic disappeared.
The goal of the experiment is to verify quantitatively both the long flow as well
as the short flow model. For this, we need to measure:
• The capacity C. The line rate of an OC3 is 155 Mb/s, however after PPP and
SONET encoding, we measured an effective throughput of 149.26 Mb/s. This
is very close to the theoretical value.
• The number of flows, the flow arrival pattern and the overall load are directly controlled by the Harpoon traffic generators.
• Utilization of the bottleneck link is measured on the GSR itself via two mecha-
nisms: Netflow and byte counters on the interface that are polled via SNMP.
• The queue length Q can be polled on the GSR via the IOS command prompt.
• Packet arrives on the physical interface. The line card inspects the packet and
determines the packet’s destination line card.
• The packet is queued in the input queue (called “ToFab” or ToFabric buffer on
the GSR) until the switching fabric becomes available.
• The switching fabric switches the packet to the output line card.
• On the output line card, the packet is queued in an Output Queue (called
FromFab buffer on the GSR).
• Once the output interface becomes available, the packet is de-queued and sent
out on the network.
The switching fabric of the GSR is designed to switch traffic at rates of up to 10 Gb/s
per line card. For our experiment, traffic through the router is limited to 155 Mb/s
and we would expect that the switching fabric is never the bottleneck. We verified
this experimentally and found the ToFab buffers to be empty all of the time.
The GSR keeps different buffer pools for different packet lengths. The relative
size of these pools is centrally determined for the whole router and depends on the
MTU sizes of the connected interfaces. This “carving” process takes place whenever
a line card is added or removed or MTU sizes are changed. As we can not control the
carving process, it is difficult to limit the buffer size directly.
On the “Engine 0” line card that we are using, it is possible to set directly the
maximum queue lengths of the FrFab queue 1 .
During our measurements, we initially were puzzled by the fact that we seemed to
overestimate the queue length by a constant offset of about 43 packets or 64 kBytes.
After contacting the router manufacturer, we found out that the GSR 4xOC3 Engine
0 line card has a FIFO buffer of 128 kByte on the interface management ASIC, of
which in our setup the first 64 KByte are used. With an MTU of 1500 Bytes, this
creates an additional 43 packets of buffering. Packets in this buffer are not visible to
IOS and therefore explain the under reporting.
With the exception of the utilization, measuring the data we need on the GSR is
fairly simple. The GSR allows us to poll directly the queue length of the ToFab and
FrFab queue on the line card. Measuring utilization, however, is difficult. The GSR has packet and byte counters which, together with a time stamp, can be converted into line
rates. The problem is that the counters are on the line cards while the clock providing
the time stamp is on the central controller. The counter on the line cards is sent to the
central controller only once every 10 seconds. We tried polling the counters exactly
once every 10 seconds, however, the timing jitter of the SNMP commands introduces
a statistical error of several percent for each measurement.
We address this problem by a two-pronged approach. First, we measure utilization
over several minutes, which reduces the statistical error to about 0.1%. Second, we
use netflow records [11] to get a second estimate for the utilization. If Netflow differs
too much from the SNMP byte counter measurements, we discard the experiment.
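To illustrate the effect of the timing jitter (a rough sketch of our own; the ±0.2 s jitter and the example counter values are assumptions, not measurements), consider converting two byte-counter samples into a rate:

def rate_bps(bytes_start, bytes_end, t_start, t_end):
    # Convert two byte-counter readings plus timestamps into a bit rate.
    return (bytes_end - bytes_start) * 8 / (t_end - t_start)

print(rate_bps(0, 186_575_000, 0.0, 10.0) / 1e6)   # -> 149.26 Mb/s for these example values

timing_jitter = 0.2                                # assumed uncertainty in the sampling time [s]
for window in (10, 300):                           # a 10 s window vs. a 5 minute window
    print(f"{window:4d} s window: jitter alone causes up to ~{timing_jitter / window * 100:.2f}% error")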
Figure 6.1 shows the results of the long flow measurements. The router memory was
adjusted by limiting the length of the interface queue on the outgoing interface. The
buffer size is given as a multiple of $RTT \times C/\sqrt{n}$, the number of packets, and the size of the RAM device that would be needed.
1 Engine 0 line cards accept the IOS “TX-Queue limit” command; reportedly, current Engine 1 and Engine 2 line cards do not support this feature. On these more modern cards, it might be possible to achieve the same effect with class-based shaping. See Section 6.2 for a detailed description.
Figure 6.1: Comparison of our model, ns2 simulation and experimental results for
buffer requirements of a Cisco GSR 12410 OC3 line card.
We subtracted the size of the internal FIFO on the line card (see Section 6.1.3). “Model” is the lower bound on the utilization predicted by the model; “Sim.” and “Exp.” are the utilizations measured in an ns2 simulation and on the physical router, respectively. For 100 and 200 flows, there is, as we
expect, some synchronization. Above that the model predicts the utilization correctly
within the measurement accuracy of about ±0.1%. ns2 sometimes predicts a lower
utilization than we found in practice. We attribute this to more synchronization
between flows in the simulations than in the real network.
The key result here is that model, simulation and experiment all agree that a router buffer should have a size equal to approximately $RTT \times C/\sqrt{n}$, as opposed to $RTT \times C$ (which in this case would be 1291 packets). Moreover, our model predicts the transition point where utilization drops from 100% to values below 100% fairly accurately.
Figure 6.2: Short Flow Queue Distribution of 62 packet flows measured on a Cisco GSR compared to model prediction
Short Flows
In Section 4.1, we used an M/G/1 model to predict the buffer size we would need for short-lived, bursty TCP flows. To verify our model, we generated short-lived flows and measured the probability distribution of the queue length of the GSR. We repeated the experiment for flow lengths of 2, 14, 30 and 62 packets. The load in each experiment was around 85%. For each case, we measured the queue length using two different methods. First, we measured the number of packets in the output queue. Second, we measured the overall amount of buffering that was occupied. In practice, both methods agree very well. All measured queue lengths have to be adjusted by 43
packets to account for the FIFO buffer in the interface management chip.
Figure 6.2 shows the results for 62-packet flows compared to our M/G/1 model and the M/G/1/PS model. The M/G/1 model matches the experimental data remarkably well. As expected, the M/G/1/PS model overestimates the queue length, but it can be used as an upper bound.
Figure 6.3: Short Flow Queue Distribution of 30 packet flows measured on a Cisco GSR compared to model prediction

Figure 6.4: Short Flow Queue Distribution of 14 packet flows measured on a Cisco GSR compared to model prediction
Internet <--100 Mb--> Cisco 7200 VXR <--100 Mb--> Packeteer <--100 Mb--> Cisco 5000 ----100 Mb (several links to the internal network)
The 100 Mb/s link from and to the Internet is saturated for a good part of the day. Traffic comes from a mix of different applications including web, FTP, games, peer-to-peer, streaming and others. The number of flows that are active varies with the time of day from hundreds to at least thousands of concurrent flows. Traffic is mainly TCP, with some UDP. This traffic mix is far from the idealized assumptions in the rest of this thesis and fairly similar to network traffic on large commercial network backbones.
Between the VXR and the Cisco 5000 switch is a Packeteer packet shaper. This shaper can be used to throttle flows in both directions. During the experiment, the bandwidth through the router was below the trigger bandwidth of the Packeteer, so the shaper was essentially inactive.
The goal of our experiment is to observe a congested router serving a large number
of flows. The above setup is not directly usable for this purpose as the Cisco 7200
never experiences congestion, since its input and output capacities are the same. In order to
generate congestion at the router, we throttle the link from the Cisco 7200 router to
the Cisco 5000 Ethernet switch to a rate substantially below 100 Mb/s. How this is
done is explained in detail below.
The goal of the experiment is to verify whether the $2T \times C/\sqrt{n}$ hypothesis holds. In
order to do this, we measure before the start of the experiment:
• The number of flows n. We can estimate this via the Netflow mechanism of the
Cisco router.
• The round-trip propagation delay 2T. We cannot measure or estimate the RTT
distribution in a meaningful way and instead assume a typical average for back-
bone networks.
All of the above are assumed to be constant for the time of the experiment. During
the experiment, we change the amount of buffering on the router, and measure the
following parameters:
• The queue length Q. This is mainly to verify that the router was congested and
operating with full buffers.
In our idealized router model, a packet is only queued once in a router. As we will
see, the reality on the VXR is far more complex. As we have seen in the experiments
for the GSR, it is important to understand where queueing can occur accurately for
all buffering. The goal of this Section is to understand where in the VXR packets are
queued and how these buffers are sized.
To understand fully the VXR, we ran a number of tests on a VXR in a laboratory
setting before conducting the actual experiment on the VXR in the Stanford network.
Before we explain the buffer architecture of the VXR, we need to discuss its packet
switching architecture. Incoming packets on the VXR are copied into shared memory
by the line card via DMA. Once the packet is copied, it triggers an interrupt that
eventually causes the packet to be switched to the right outgoing line card. The VXR
uses a number of different switching methods, depending on the packet and router
state:
• Fast Switching. Everything is done in software, but a route cache is used and the packet is switched in the interrupt handler that was triggered by the received packet.
• Netflow switching. Netflow stores information for each flow it sees and uses this
cached information to make very fast switching decisions. Netflow was switched
off on the router used for the experiment.
The type of switching affects the path of the packet through the router and the
OS, as well as the maximum rate the router is able to achieve. In practice, the router in the experiment forwarded the vast majority of packets using CEF (Cisco Express Forwarding). In this case, the usual path of a packet is:
1. The packet is received by the network interface and transferred (via PCI bus DMA) into the Rx Ring. The Rx Ring is a circular list of pointers to buffers. If the Rx Ring is full, the packet is dropped.
3. An interrupt is triggered and the buffers are removed from the Rx Ring. If the packet can be switched with anything but processor switching, the MAC header is rewritten and the packet is moved to an output queue. If no virtual output queueing is used, the output queue is that of the outgoing interface (and the next step is skipped). In our case, we use class-based VOQs and the packet is moved to the output queue of its class. If either type of output queue is full, the packet is dropped.
4. If virtual output queues (e.g. different queues for different classes of traffic) are
used and a packet arrives at the head of the queue, it is switched to the output
queue of the interface.
5. Once there is space available in the Tx ring, the packet is removed from the
output queue to the Tx Ring.
6. The Tx Ring gets DMA’d to the interface card and the packet is transmitted.
Generally, packets never change their location in memory; all queueing is done by putting pointers to the packet data into different queues. Process switching is the
exception to the rule as packets have to be reassembled from fragments (see the buffer
pool description below).
A further complication is that the 7200 uses particles instead of packets. For standard
IP packets, a particle is approximately 745 bytes or half the most widely used Ethernet
MTU size of 1500. Buffer pools contain either particles or full buffers (1500 bytes),
but never both.
The 7200 keeps several different types of buffer pools. The most important ones
are:
Private Particle Pools. These are used to replenish the Rx rings after packets
have arrived. After a packet has been sent from the Tx Ring, the buffer is returned to
the private pool of the interface. That the buffers are associated with the incoming
interface is a major difference compared to the idealized output queued model or
central fabric routers such as the GSR. If we want to prevent a shortage in buffers, we
have to increase the private pool of the incoming interface, not the outgoing interface.
Normal Particle Pools. These are global particle pools not associated with an
interface. They are used as a back-up if private pools run out.
Public Pools. These pools are used to assemble packets temporarily that have
to be switched in software. There are separate pools (small pool, middle pool, big
pool) corresponding to typical packet lengths (40,...,1500,... bytes). Public Pools are
global and are not associated with a particular interface.
All buffer pools are fairly small by default. If a pool runs low, additional particles or buffers are added by IOS from the overall memory pool. If the CPU load of the router becomes too high, the software process that performs the buffer resizing can be starved and buffers might be too small, which eventually can cause additional packet drops. We tested the behavior of IOS both in a lab setting and on the real router, and at the line speeds that we are interested in, IOS was always fast enough to re-size buffers as needed.
For our experiment, we want to limit the total amount of buffering available to
packets through the router. Due to the VXR’s architecture, we cannot do this by simply limiting the overall amount of memory: the dynamic allocation of memory between queues, the queues’ minimum sizes and the different paths through the queues make this impractical. Instead, our strategy will be to limit queue length
directly, which for some queues we can do via IOS configuration commands.
Rate Limiting
In the experiment, we want to limit the rate of the VXR for two reasons. First, we
need to throttle the bandwidth of the bottleneck link and create congestion in case
there is not enough traffic. Second, by throttling the bandwidth substantially below
the interface bandwidth, we can keep some of the internal queues in the router empty
all of the time, which simplifies our analysis.
In a real router with limited buffers, the sending rate is limited by the outgoing
hardware interface. Throttling the hardware interface directly is not possible for
the VXR. However, we can rate limit the sending rate in software using Cisco IOS.
Depending on how we do this, we will cause queueing in one or several of the queues
described above. Understanding the details of how queueing works, and which queues the different rate-limiting methods use, is crucial to interpreting the results of the experiment.
In Cisco IOS, there are two approaches for reducing the sending rate of an interface:
Policing and Shaping [37]. Policing (e.g. CAR, class based policing) drops packets
that exceed a maximum rate selectively (usually based on a token bucket rule), but it
never queues packets. This is very different from what a router with limited memory
would do. Measuring router performance using policing will therefore give us little insight into the behavior of a router with limited buffers.
Traffic Shaping (e.g. GTS, class-based shaping) first queues packets, and then
drops packets based on the length of the queue. This is identical to what a router
with limited memory would do.
The simplest shaping mechanism supported by IOS is General Traffic Shaping
(GTS). It works on all IOS versions and can do simple rate limiting on most types of
interfaces. The problem is that according to Cisco documentation, it is incompatible
with Netflow [40]. In our laboratory setting, it did not limit sending rates when
Netflow was turned on.
The second mechanism is Class-Based Shaping [39]. Class-based shaping allows us to divide traffic into classes of packets. Each class can be queued separately and
priorities between classes can be defined. Class based shaping uses its own output
queue in addition to the standard output queue described above. Queue length can
be adjusted with an IOS command. In our case, we configure a class that contains
any type of packet and limit this class to our target rate. Configuration snippets can
be found in Appendix A.5.
In addition to the queue, class-based shaping uses a token bucket to limit the
sending rate. For our experiment, we are only interested in the buffering effect due to
the router’s queue and would like to set the token bucket size to zero (or one packet).
In practice however, IOS does not allow us to set a token bucket below 10 ms. This effectively creates additional buffering of 10 ms multiplied by our throttled interface rate.
The queues that class-based shaping uses come before the interface’s output
queue and the TX Ring. If the output interface is the bottleneck, the effective queue
length is the sum of all three queue lengths. In our case, however, the rate defined by
the class-based shaping ( << 100 Mb/s) is below the rate of the output interface (100
Mb/s). We can thus expect the output queue and the TX Ring to be empty at any
given time. We verified this in an experiment with a VXR in a laboratory setting.
We throttled a 1 Gb/s link down to 100 Mb/s and queried the TX Ring occupancy
and output queue length. Except for packets due to the minimum token bucket size,
both were empty or contained a single packet all the time.
Ideally, we would have liked to measure network data using specialized measurement
equipment in the network. In practice, doing so is difficult in a production environ-
ment. However, it turns out that we can capture all data that we need via the console
of the router. Below we outline how this is done for each data type and what level of
accuracy we can expect.
Link Capacity C. In the experiment, we set the link capacity using class-based
shaping. In theory, this allows us to specify precisely a number of bits/s. We measured
the accuracy of the shaped rate in a laboratory setting and the rate limiting in practice
seems to be very accurate with an error far below one percent.
Number and type of flows. IOS has a feature called Netflow [11] that tracks
all flows passing through a router and collects statistics about these flows. What
we need for our experiment is the number of concurrent long-lived flows. Netflow is
not able to provide this information directly, however we are able to get an estimate
indirectly.
First, we measure the number of flows that are active over a 10 second interval.
We do this by resetting the Netflow counter, and then dumping Netflow statistics.
This count will be higher than the number of long-lived flows, as it includes short
flows that were only active for a small fraction of the time, however it provides an
upper bound. Netflow also gives us statistics about the type of flows (e.g. protocol,
port numbers), the average number of packets per flow, and the average duration per
flow. Using this additional information, we can estimate what percentage of flows
were long lived. While this method is not very precise, we can use it to get a lower
bound on the number of flows, which allows us to pick a sufficiently large buffer for
the router.
Utilization. IOS automatically collects for each interface the rate in bits/s and
packets/s. We tested the accuracy of this rate and found it to be unsuitable for our
experiment. The reason for this is shown in Figure 6.5, which shows the time evolution
of the measured rate in packets and bytes. At time zero, we saturate an interface
of a previously idle router that is throttled to 1 Mbit/s and after about 23 minutes,
we stop the traffic. First, we can see that the router uses a geometrically weighted
average. This means we would have to run our experiment over long periods of time
to achieve a stable state. Additionally, the rate calculated from the packet rate and
the byte rate differ substantially and take different amounts of time to converge. We also found in a separate experiment that the convergence depends on the line rate.
A second measurement mechanism on the router is the set of packet and byte counters on the interfaces. These counters have the advantage that they are exact; however, we
need to obtain the accurate time that corresponds to a counter value. On the VXR,
the interface counter and the clock are both accessible directly by IOS running on
the controller. In a lab setting, this method proved to be very accurate, with errors well below one percent.
Figure 6.5: “Average” sending rate of a router as measured by IOS. The actual sending pattern was an on/off source. The reported byte-based rate and the packet-based rate differ substantially.
Figure 6.6: CDF of the class-based-shaping queue length reported by IOS on a VXR router. The maximum queue length was configured to be 40 packets; however, the router reports queue lengths above this value.
To test the queue length reporting, we throttled a network interface to 1 Mb/s and sent a TCP stream over a 1 Gb/s access link. The
result was congestion on the 1Mb/s link with almost all outstanding packets in the
router’s buffer. In this experiment, the flow quickly reached its maximum window
size of 42 packets (65,536 Bytes), which were all queued in the router. The reported
queue length again frequently exceeded 40 packets and one data point exceeded 80
packets. We then reduced the queue length first to 43 packets and then to 40 packets.
At 43 packets, the flow was unaffected, however at 40 packets, it would lose packets
as the queue would overflow. This experiment suggests that the queueing mechanism
works correctly, and it is only the reporting of the queue length that is incorrect.
In practice, we can use queue length measurements on the VXR only as qualitative
data points. They give us a good idea of the typical queue lengths, however exact dis-
tribution as well as minimum and maximum values do not match other experimental
evidence.
Packet Length 1-32 64 96 128 160 192 224 256 288 320
Probability .001 .522 .034 .021 .012 .007 .009 .005 .004 .004
Packet length 352 384 416 448 480 512 544 576 1024 1536
Probability .004 .006 .004 .003 .003 .003 .005 .011 .039 .293
Figure 6.7: Packet length statistics for the router in the experiment. Packet lengths
are in bytes.
Packet Length
We need the average packet length to convert delay bandwidth product into a mean-
ingful number of packets. Statistics on the packet length are collected by Netflow.
The data generated by Netflow is shown in Figure 6.7. Half of the packets are between 32 and 64 bytes; these are TCP ACKs as well as packets from games, real-time applications and port scans. 29% of the packets are of maximum length; these are typically from long-lived TCP flows. From this data we calculate an average packet size of 557.1 bytes.
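The average can be reproduced from the table (a small sketch of our own; the upper edge of each length bin is used as that bin's packet size):

lengths = [32, 64, 96, 128, 160, 192, 224, 256, 288, 320,
           352, 384, 416, 448, 480, 512, 544, 576, 1024, 1536]   # bin edges from Figure 6.7 [bytes]
probs   = [.001, .522, .034, .021, .012, .007, .009, .005, .004, .004,
           .004, .006, .004, .003, .003, .003, .005, .011, .039, .293]

avg_pkt_bytes = sum(l * p for l, p in zip(lengths, probs))
print(f"average packet size ~ {avg_pkt_bytes:.1f} bytes")        # -> ~557.1 bytes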
We cannot easily measure the RTT of flows through the router; instead we use the common 250 ms assumption. With the 20 Mb/s throttled bandwidth and the average packet size from above, this gives us a delay-bandwidth product of $2T \times C = 250\,\mathrm{ms} \times 20\,\mathrm{Mb/s} = 625\,\mathrm{kBytes} \approx 1121$ packets.
Figure 6.8: Long term netflow statistics for the router used in the experiment
Before the actual experiment, we measured that during a 10 second interval, about
3400 flows are active. This includes long-lived flows, as well as short-lived flows,
real-time applications and port scans. To estimate what percentage of these flows
are long-lived flows, we use Netflow statistics gathered by the router. The output of
Netflow on the router used for the experiment is shown in Figure 6.8. The top three
categories generate more than 90% of the traffic.
This adds up to a total bandwidth of 106 Mb/s. This makes sense as this is
the bi-directional average for a 100 Mb/s link that is saturated during the day and
underutilized at night.
If we care about which flows are competing for bandwidth, we can essentially ignore UDP. UDP flows are real-time applications such as games, streaming and port scanning tools. These traffic sources usually do not respond to congestion. Additionally, they comprise only 4.3% of the overall traffic.
The TCP-WWW flows are HTTP flows generated by web browsers or web services.
They have an average length of only 16 packets. Therefore, the average HTTP flow will never leave slow-start. We now try to estimate how many TCP flows we have in a 10-second interval. From the share of the bytes that is web traffic and the average web packet size, the number of packets per second generated by web flows works out to roughly 1000 packets/s.
To verify the rate, we compare this to the rate reported by Netflow. Netflow reports 5400 packets/s for a traffic rate of 106 Mb/s. As we have only a rate of 20 Mb/s, we would expect a rate of

$5400\,\mathrm{pkts/s} \times \frac{20\,\mathrm{Mb/s}}{106\,\mathrm{Mb/s}} = 1018\,\mathrm{pkts/s} \qquad (6.3)$
Both methods of calculating the packet rate agree fairly well. Assuming an average flow length of 10-20 packets (Netflow reports 16), this means 50 to 100 flows/s at 20 Mb/s. This again is consistent with the 323 flows/s (at 106 Mb/s) that Netflow suggests. Netflow reports that UDP has twice as many flows/s, thus 100 to 200 per second.
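The same scaling argument, written out (our own arithmetic using the values quoted above):

pkts_per_s_at_106 = 5400                             # packet rate reported by Netflow at 106 Mb/s
pkts_per_s_at_20 = pkts_per_s_at_106 * 20 / 106      # ~1019 pkts/s expected at our 20 Mb/s
for flow_len in (10, 16, 20):                        # assumed packets per short flow
    print(f"{flow_len:2d} pkts/flow -> ~{pkts_per_s_at_20 / flow_len:.0f} flows/s")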
It seems that short-lived web and UDP flows make up the majority of the flows we are seeing. Based on these statistics, we estimate the minimum and maximum number of long-lived flows during the 10-second interval to be about 400 and 1900, respectively.
Given the delay-bandwidth product of about 1100 packets, this gives us an average
window size of only 2.5 packets, which is very low. Many of the flows will be throttled
or constrained by the source. However, those that try to send at full speed will likely
see frequent time-outs.
Buffer Size
Using the minimum and maximum number of flows, as well as the delay-bandwidth
product, we can calculate the minimum and maximum buffer:
$B_{min} = \frac{2T \times C}{\sqrt{n_{max}}} = \frac{1121\,\mathrm{pkts}}{\sqrt{1900}} = 26\,\mathrm{pkts} \qquad (6.6)$

$B_{max} = \frac{2T \times C}{\sqrt{n_{min}}} = \frac{1121\,\mathrm{pkts}}{\sqrt{400}} = 56\,\mathrm{pkts} \qquad (6.7)$
For the experiment, we will assume a minimum buffer size of 56 packets. We can
set the queue length of the class-based shaper in packets. However, the shaper uses
a token bucket mechanism with a minimum bucket size of 10 ms. At our rate of 20 Mb/s, this translates to $20\,\mathrm{Mb/s} \times 10\,\mathrm{ms} = 25\,\mathrm{kBytes} \approx 45$ packets at the average packet size of 557 bytes.
The token bucket effectively increases the queue length by 45 packets. Thus, we should
expect (and during the experiment observed) that we still have good link utilization
with a maximum queue length set to zero.
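The buffer arithmetic of this section can be summarized in a few lines (our own sketch; the 250 ms RTT and the 557.1-byte average packet size are the assumptions stated above):

import math

C = 20e6                                  # throttled link rate [bit/s]
rtt = 0.250                               # assumed round-trip time [s]
avg_pkt_bits = 557.1 * 8                  # average packet size [bits]

dbp_pkts = C * rtt / avg_pkt_bits                 # delay-bandwidth product in packets
b_min = dbp_pkts / math.sqrt(1900)                # ~26 packets (n_max = 1900)
b_max = dbp_pkts / math.sqrt(400)                 # ~56 packets (n_min = 400)
token_bucket_pkts = C * 0.010 / avg_pkt_bits      # the 10 ms bucket adds ~45 packets
print(f"2T*C = {dbp_pkts:.0f} pkts, B_min = {b_min:.0f}, B_max = {b_max:.0f}, "
      f"token bucket = {token_bucket_pkts:.0f} pkts")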
Figure 6.9: Utilization data from the router measured during the experiment. The buffer includes an extra 45 packets due to the minimum size of the token bucket, which in this configuration acts like an additional buffer.
We measured the utilization using byte counters and timestamps for large buffers as
well as three settings of small buffers. In each case, we measured the rate over several
minutes. The results are shown in Figure 6.9.
The most important result of this experiment is that we can achieve very close to full utilization with buffers that are on the order of $2T_p \times C/\sqrt{n}$. The delay-bandwidth product
for our experiment is about 1100 packets and the buffer available on this router is in the 1000s of
packets, yet we can achieve 98.5% utilization with only the equivalent of 85 packets. This is the case not with idealized traffic, but with a complex mix of long flows, short flows, UDP and TCP, and a variety of non-congestion-aware applications. It also holds in
the worst case of a router experiencing heavy congestion.
Additionally, we can see that our model predicts the approximate amount of buffer-
ing needed. The utilization drop predicted by the model is much steeper than the
drop-off we can observe in the experiment. However, the model predicts the point
where utilization drops from full to below full utilization accurately within a small
factor.
Chapter 7
Conclusion
We believe that the buffers in backbone routers are much larger than they need to
be — possibly by two orders of magnitude. We have demonstrated that theory, ns2
simulation and experimental results agree that much smaller buffers are sufficient for
full utilization and good quality of service.
The results we present in this thesis assume only a single point of congestion on a flow’s path. We do not believe our results would change much if a percentage of the flows experienced congestion on multiple links; however, we have not investigated this.
A single point of congestion means there is no reverse path congestion, which would
likely have an effect on TCP-buffer interactions [46]. With these assumptions, our
simplified network topology is fairly general. In an arbitrary network, flows may pass
through other routers before and after the bottleneck link. However, as we assume
only a single point of congestion, no packet loss and little traffic shaping will occur
on previous links in the network.
We focus on TCP as it is the main traffic type on the Internet today; however, Chapter 4 shows that at least some other traffic types can be modeled with the same model we use for short flows, and that in mixes of flows, long TCP flows will dominate. As the majority of traffic in the Internet today is TCP, our results should cover a fairly broad range of scenarios, and the experiment supports this. However, traffic with a large share of flows that use different congestion-aware protocols would likely require further study.
We did run some simulations using active queue management techniques (e.g.
RED [17]) and this had an effect on flow synchronization for a small number of flows.
Aggregates of a large number (> 500) of flows with varying RTTs are not synchronized
and RED tends to have little or no effect on buffer requirements. However, early drop
can slightly increase the required buffer since it uses buffers less efficiently.
Congestion can also be caused by denial of service (DOS) [22] attacks that attempt
to flood hosts or routers with large amounts of network traffic. Understanding how to
make routers robust against DOS attacks is beyond the scope of this thesis, however,
we did not find any direct benefit of larger buffers for resistance to DOS attacks.
If our results are right, they have consequences for the design of backbone routers.
While we have evidence that buffers can be made smaller, we haven’t tested the
hypothesis in a real operational network. It is a little difficult to persuade the operator
of a functioning, profitable network to take the risk and remove 99% of their buffers.
That has to be the next step, and we see the results presented in this thesis as a first
step toward persuading an operator to try it.
If an operator verifies our results, or at least demonstrates that much smaller
buffers work fine, it still remains to persuade the manufacturers of routers to build
routers with fewer buffers. In the short-term, this is difficult too. In a competitive
market-place, it is not obvious that a router vendor would feel comfortable building
a router with 1% of the buffers of its competitors. For historical reasons, the network
operator is likely to buy the router with larger buffers, even if they are unnecessary.
Eventually, if routers continue to be built using the current rule-of-thumb, it will
become very difficult to build line cards from commercial memory chips. And so, in
the end, necessity may force buffers to be smaller. At least, if our results are true,
we know the routers will continue to work just fine, and the network utilization is
unlikely to be affected.
Appendix A
In both states, if several packets are not acknowledged in time, the sender can also
trigger a timeout. It then goes back to the slow-start mode and the initial window
size. Note that while in congestion avoidance, the window size typically exhibits
a sawtooth pattern. The window size increases linearly until the first loss. It then
sharply halves the window size, and pauses to receive more ACKs (because the window
size has halved, the number of allowed outstanding packets is halved, and so the sender
must wait for them to be acknowledged before continuing). The sender then starts
increasing the window size again.
In congestion avoidance mode, the window size increases by 1 packet every RTT (Appendix A.1), and therefore

$\dot{W}(t) = \frac{1}{RTT} = \frac{1}{2T_p + Q(t)/C}.$

We finally get a simple model for the increase of the window size in the sawtooth,

$W(t) = \sqrt{2Ct + (2T_p C)^2}. \qquad (A.2)$

The period T of the sawtooth then follows as

$T = \frac{B^2}{2C} + 2T_p B.$

When the buffer size is equal to the bandwidth-delay product ($B = 2T_p C$), we get

$T = 1.5\,\frac{B^2}{C}.$

For instance, in Figure 2.2, the modelled sawtooth period is $T = 1.5 \times \frac{142^2}{1000} = 30.2$, which fits the observed value.
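A quick numerical check of the period formula with the Figure 2.2 values (our own sketch):

B, C = 142, 1000            # buffer [packets] and capacity [packets/s] from Figure 2.2
two_tp = B / C              # in this example B = 2*Tp*C, so 2*Tp = B/C
T = B**2 / (2 * C) + two_tp * B
print(T)                    # -> 30.246, i.e. about 30.2 as observed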
We assume that the bursts arrive as a Poisson process N (t) of rate ν, and that their
burst size follows a distribution function F . The queue is serviced with a capacity C.
Therefore, we use an M/G/1 model for the job size in the router queue. Of course,
the traffic arrival rate λ is the product of the burst arrival rate by the average burst
size:
λ = νE[X].
The effective bandwidth theory describes the characteristics of a traffic source, and
in many cases can be a powerful tool to derive properties that are otherwise hard to
compute. For more information we refer the reader to [27], from which most of the
results below are adapted. Consider a cumulative traffic arrival function A(t), where A(t) is the total amount of traffic that has arrived by time t. The effective bandwidth of the source is defined as

$\alpha(s, t) = \frac{1}{st} \log E\left[e^{sA(t)}\right].$
In our case, A(t) has i.i.d. increments. It is a special case of Lévy process, and its
effective bandwidth has a simpler form (section 2.4 of [27]):
$\alpha(s) = \frac{1}{s} \int (e^{sx} - 1)\,\nu\, dF(x) = \frac{\lambda}{s E[X]}\, E\left[e^{sX} - 1\right]. \qquad (A.4)$
We can now use Cramér’s estimate to model the tail of the distribution of the queue
length Q (section 3.2 of [27]). This estimate assumes an infinite buffer size. For
Cramér’s estimate there has to exist a constant κ such that the effective bandwidth
is the link capacity:
α(κ) = C (A.5)
Also, the derivative $\alpha'(s)$ has to exist at $\kappa$. Both conditions are satisfied in our case, as $\lambda < C$ and $\alpha(s)$ is differentiable for any $s > 0$. Cramér's estimate is given by

$P(Q \geq b) \approx \frac{C - \alpha(0)}{\kappa\, \alpha'(\kappa)}\, e^{-\kappa b} \qquad (A.6)$
To evaluate $\alpha(s)$, we expand $e^{sX}$ as a Taylor series:

$e^{sX} = 1 + sX + \frac{s^2}{2} X^2 + \frac{s^3}{6} X^3 + O(X^4) \qquad (A.7)$
We present solutions for the 2nd and 3rd order approximation. In our experience the
2nd order approximation is sufficient to estimate the required buffer.
Second Order Approximation. We substitute Equation (A.7) into Equation (A.4) and obtain for the 2nd order approximation:

$\alpha(s) = \frac{\lambda}{s E[X]}\, E\left[1 + sX + \frac{s^2}{2}X^2 - 1\right] = \lambda + \frac{\lambda s}{2}\,\frac{E[X^2]}{E[X]}$

$\alpha'(s) = \frac{\lambda}{2}\,\frac{E[X^2]}{E[X]}$

The load $\rho$ is defined as the ratio of the arrival rate $\lambda$ to the capacity $C$: $\rho = \lambda/C$. Setting $\alpha(\kappa) = C$ and solving for $\kappa$ yields

$\kappa = \frac{2(1-\rho)}{\rho}\,\frac{E[X]}{E[X^2]}$

and the queue length distribution becomes

$P(Q \geq b) \approx \frac{\lambda/\rho - \lambda}{\frac{2(1-\rho)}{\rho}\frac{E[X]}{E[X^2]} \cdot \frac{\lambda}{2}\frac{E[X^2]}{E[X]}}\, e^{-b\kappa} = e^{-b\kappa}$
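As a rough numerical illustration of the second-order formula (our own sketch, not the analysis code behind the figures; it assumes that a 62-packet flow arrives as slow-start bursts of 2, 4, 8, 16 and 32 packets, each burst size equally likely, at a load of 85%):

import math

bursts = [2, 4, 8, 16, 32]                         # assumed burst sizes within one short flow
ex = sum(bursts) / len(bursts)                     # E[X]   = 12.4 packets
ex2 = sum(b * b for b in bursts) / len(bursts)     # E[X^2] = 272.8
rho = 0.85                                         # load used in the experiments

kappa = 2 * (1 - rho) / rho * ex / ex2             # second-order kappa
for b in (100, 200, 400):
    print(f"P(Q >= {b} pkts) ~ {math.exp(-kappa * b):.2g}")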
Third Order Approximation. For the 3rd order approximation we analogously obtain:

$\alpha(s) = \frac{\lambda}{s E[X]}\, E\left[1 + sX + \frac{s^2}{2}X^2 + \frac{s^3}{6}X^3 - 1\right] = \lambda + \frac{\lambda s}{2}\,\frac{E[X^2]}{E[X]} + \frac{\lambda s^2}{6}\,\frac{E[X^3]}{E[X]}$

$\alpha'(s) = \frac{\lambda}{2}\,\frac{E[X^2]}{E[X]} + \frac{\lambda s}{3}\,\frac{E[X^3]}{E[X]}$

Again we now need to solve $\alpha(\kappa) = C$ where $\rho = \lambda/C$. We obtain the quadratic equation

$\kappa^2 + 3\,\frac{E[X^2]}{E[X^3]}\,\kappa - \frac{6(1-\rho)}{\rho}\,\frac{E[X]}{E[X^3]} = 0$

and for $\kappa$

$\kappa = -\frac{3}{2}\,\frac{E[X^2]}{E[X^3]} + \sqrt{\left(\frac{3}{2}\,\frac{E[X^2]}{E[X^3]}\right)^2 + \frac{6(1-\rho)}{\rho}\,\frac{E[X]}{E[X^3]}}$

For the queue size distribution we find

$P(Q \geq b) \approx \frac{\lambda/\rho - \lambda}{\frac{\lambda\kappa}{2}\frac{E[X^2]}{E[X]} + \frac{\lambda\kappa^2}{3}\frac{E[X^3]}{E[X]}}\, e^{-b\kappa} = \frac{1-\rho}{\rho}\,\frac{E[X]}{\frac{\kappa}{2}E[X^2] + \frac{\kappa^2}{3}E[X^3]}\, e^{-b\kappa}$
#
# TcpSim
# (c) Guido Appenzeller, 2003
#
add-packet-header Pushback NV
add-packet-header IP TCP Flags ;# hdrs reqd for cbr traffic
match any
[3] Guy Almes. [e2e] mailing list. Posting to the end-to-end mailing list, April, 2004.
[4] Youngmi Joo, Anna Gilbert, and Nick McKeown. Congestion control and periodic
behavior. In LANMAN Workshop, March 2001.
[5] Guido Appenzeller, Isaac Keslassy, and Nick McKeown. Sizing router buffers.
Technical Report TR04-HPNG-06-08-00, Stanford University, June 2004. Ex-
tended version of the paper published at SIGCOMM 2004.
[6] K.E. Avrachenkov, U. Ayesta, E. Altman, P. Nain, and C. Barakat. The effect
of router buffer size on the TCP performance. In Proceedings of the LONIIS
Workshop on Telecommunication Networks and Teletraffic Theory, pages 116–
121, St. Petersburg, Russia, January 2002.
[7] Vijay Bollapragada, Curtis Murphy, and Russ White. Inside Cisco IOS Software
Architecture. Cisco Press, 2000.
[8] L. Brakmo, S. O’Malley, and L. Peterson. TCP Vegas: New techniques for conges-
tion detection and avoidance. In Proceedings of ACM SIGCOMM, pages 24–35,
August 1994.
[9] R. Bush and D. Meyer. RFC 3439: Some internet architectural guidelines and
philosophy, December 2003.
[10] J. Cao, W. Cleveland, D. Lin, and D. Sun. Internet traffic tends toward Poisson
and independent as the load increases. Technical report, Bell Labs, 2001.
[11] Cisco Systems, Inc. Netflow services solution guide, July 2001. http://www.
cisco.com/.
[12] Constantine Dovrolis. [e2e] Queue size of routers. Posting to the end-to-end
mailing list, January 17, 2003.
[13] Bohacek et al. A hybrid system framework for TCP congestion control. Technical
report, University of California at Santa Cruz, June 2002.
[14] Anja Feldmann, Anna C. Gilbert, and Walter Willinger. Data networks as cas-
cades: Investigating the multifractal nature of internet WAN traffic. In SIG-
COMM, pages 42–55, 1998.
[15] Dennis Ferguson. [e2e] Queue size of routers. Posting to the end-to-end mailing
list, January 21, 2003.
[16] S. Floyd. RFC 3649: Highspeed TCP for large congestion windows, December
2003.
[17] Sally Floyd and Van Jacobson. Random early detection gateways for congestion
avoidance. IEEE/ACM Transactions on Networking, 1(4):397–413, 1993.
[19] S. Ben Fredj, T. Bonald, A. Proutière, G. Régnié, and J.W. Roberts. Statis-
tical bandwidth sharing: a study of congestion at flow level. In Proceedings of
SIGCOMM 2001, San Diego, USA, August 2001.
[20] Michele Garetto and Don Towsley. Modeling, simulation and measurements of
queueing delay under long-tail internet traffic. In Proceedings of SIGMETRICS
2003, San Diego, USA, June 2003.
[21] John Hennessy and David Patterson. Computer Architecture. Morgan Kaufmann
Publishers Inc., 1996.
[22] Alefiya Hussain, John Heidemann, and Christos Papadopoulos. A framework for
classifying denial of service attacks. In Proceedings of ACM SIGCOMM, August
2003.
[23] Gianluca Iannaccone, Martin May, and Christophe Diot. Aggregate traffic perfor-
mance with active queue management and drop from tail. SIGCOMM Comput.
Commun. Rev., 31(3):4–13, 2001.
[24] Sundar Iyer, R. R. Kompella, and Nick McKeown. Analysis of a memory ar-
chitecture for fast packet buffers. In Proceedings of IEEE High Performance
Switching and Routing, Dallas, Texas, May 2001.
[25] Van Jacobson. [e2e] re: Latest TCP measurements thoughts. Posting to the
end-to-end mailing list, March 7, 1988.
[26] C. Jin, D. X. Wei, and S. H. Low. FAST TCP: motivation, architecture, algorithms,
performance. In Proceedings of IEEE Infocom, March 2004.
[29] Microsoft. TCP/IP and NBT configuration parameters for windows xp. Microsoft
Knowledge Base Article - 314053, November 4, 2003.
[30] R. Morris. TCP behavior with many flows. In Proceedings of the IEEE Interna-
tional Conference on Network Protocols, Atlanta, Georgia, October 1997.
[31] Robert Morris. Scalable TCP congestion control. In Proceedings of IEEE INFO-
COM 2000, Tel Aviv, Israel, March 2000.
[32] Vern Paxson and Sally Floyd. Wide area traffic: the failure of Poisson modeling.
IEEE/ACM Transactions on Networking, 3(3):226–244, 1995.
[33] Lili Qiu, Yin Zhang, and Srinivasan Keshav. Understanding the performance of
many TCP flows. Comput. Networks, 37(3-4):277–306, 2001.
[34] Luigi Rizzo. Dummynet: a simple approach to the evaluation of network proto-
cols. ACM Computer Communication Review, 27(1):31–41, 1997.
[36] Cisco Support Web Site. Cisco 12000 series routers. http://www.cisco.com/
en/US/products/hw/routers/ps167/.
[37] Cisco Support Web Site. Cisco ios quality of service solutions configura-
tion guide. http://www.cisco.com/univercd/cc/td/doc/product/software/
ios122/122cgcr/fqos_c/fqcprt4/index.htm.
[39] Cisco Support Web Site. Configuring class-based shaping - cisco ios quality of ser-
vice solutions configuration guide. http://www.cisco.com/univercd/cc/td/
doc/product/software/ios122/122cgcr/fqos_c/fqcprt4/qcfcbshp.htm.
[40] Cisco Support Web Site. Configuring generic traffic shaping - cisco ios soft-
ware. http://www.cisco.com/en/US/products/sw/iosswrel/ps1828/prod_
configuration_guides_list.html.
[41] Joel Sommers and Paul Barford. Self-configuring network traffic generation.
In Proceedings of the ACM SIGCOMM Internet Measurement Conference,
Taormina, Italy, October 2004.
[42] W. Richard Stevens. TCP/IP Illustrated, Volume 1 - The Protocols. Addison Wesley,
1994.
[43] Curtis Villamizar and Cheng Song. High performance TCP in ANSNET. ACM
Computer Communications Review, 24(5):45–60, 1994.
[44] Ronald W. Wolff. Stochastic Modelling and the Theory of Queues, chapter 8.
Prentice Hall, October 1989.
[45] Lixia Zhang and David D. Clark. Oscillating behaviour of network traffic: A case
study simulation. Internetworking: Research and Experience, 1:101–112, 1990.
[46] Lixia Zhang, Scott Shenker, and David D. Clark. Observations on the dynamics
of a congestion control algorithm: The effects of two-way traffic. In Proceedings
of ACM SIGCOMM, pages 133–147, September 1991.