between them in the circle. For example, if 24 in Fig. 5-24 wants to join, it asks any node to
look up successor (24), which is 27. Then it asks 27 for its predecessor (20). After it tells both
of those about its existence, 20 uses 24 as its successor and 27 uses 24 as its predecessor. In
addition, node 27 hands over those keys in the range 21-24, which now belong to 24. At this
point, 24 is fully inserted.
However, many finger tables are now wrong. To correct them, every node runs a background
process that periodically recomputes each finger by calling successor. When one of these
queries hits a new node, the corresponding finger table entry is updated.
When a node leaves gracefully, it hands its keys over to its successor and informs its
predecessor of its departure so the predecessor can link to the departing node's successor.
When a node crashes, a problem arises because its predecessor no longer has a valid
successor. To alleviate this problem, each node keeps track not only of its direct successor but
also of its s direct successors, allowing it to skip over up to s - 1 consecutive failed nodes and
reconnect the circle.
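The successor-list repair just described fits in a few lines of code. The sketch below is purely illustrative (the function name, the choice s = 3, and the extra node 30 are assumptions, not from the text): each node remembers its next s successors and, when its immediate successor fails, simply walks the list to the first live entry.

```python
# Illustrative sketch of Chord-style successor lists (names and s = 3 are
# assumptions). Each node remembers its next s successors so it can skip
# over up to s - 1 consecutive failed nodes and reconnect the circle.

def first_live_successor(successor_list, alive):
    """Return the first successor in the list that is still reachable.

    successor_list: node ids in ring order (closest successor first)
    alive: set of node ids currently reachable
    """
    for node in successor_list:
        if node in alive:
            return node
    return None  # more than s - 1 consecutive failures: the ring is broken here

# Node 20's successor list with s = 3 (ids 24 and 27 are from Fig. 5-24;
# 30 is an invented third successor for illustration)
successors_of_20 = [24, 27, 30]

# If 24 crashes, 20 reconnects the circle through 27.
print(first_live_successor(successors_of_20, alive={27, 30}))   # 27

# If 24 and 27 both crash, 20 can still skip to 30.
print(first_live_successor(successors_of_20, alive={30}))       # 30
```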
Chord has been used to construct a distributed file system (Dabek et al., 2001b) and other
applications, and research is ongoing. A different peer-to-peer system, Pastry, and its
applications are described in (Rowstron and Druschel, 2001a; Rowstron and Druschel,
2001b). A third peer-to-peer system, Freenet, is discussed in (Clarke et al., 2002). A fourth
system of this type is described in (Ratnasamy et al., 2001).
5.3 Congestion Control Algorithms
When too many packets are present in (a part of) the subnet, performance degrades. This
situation is called congestion. Figure 5-25 depicts the symptom. When the number of packets
the hosts send into the subnet is within its carrying capacity, they are all delivered
(except for a few that are afflicted with transmission errors), and the number delivered is
proportional to the number sent. However, as traffic increases too far, the routers are no
longer able to cope and they begin losing packets. This tends to make matters worse. At very
high traffic, performance collapses completely and almost no packets are delivered.
Figure 5-25. When too much traffic is offered, congestion sets in and
performance degrades sharply.
[Graph: packets delivered versus packets sent. Delivery tracks the ideal line up to the
carrying capacity of the subnet, bends away in the desirable case, and collapses in the
congested case.]
Congestion can be brought on by several factors. If, all of a sudden, streams of packets begin
arriving on several input lines and all need the same output line, a queue will build up, and
the queued packets will be dutifully forwarded to the next router, increasing the load all the
way to the destination.
If the routers’ CPUs are slow at performing the
bookkeeping tasks required of them (queueing buffers, updating tables, etc.), queues can build
up, even though there is excess line capacity. Similarly, low-bandwidth lines can also cause
congestion. Upgrading the lines but not the processors, or vice versa, often helps a little but
frequently just shifts the bottleneck; more generally, upgrading part, but not all, of the
system often just moves the bottleneck somewhere else. The real problem is frequently a
mismatch between parts of the system, and it will persist until all the components are in
balance.
It is worth explicitly pointing out the difference between congestion control and flow control, as
the relationship is subtle. Congestion control has to do with making sure the subnet is able to
carry the offered traffic; it is a global issue, involving the behavior of all the hosts and
routers. Flow control, in contrast, relates to the point-to-point traffic between a given
sender and a given receiver; its job is to make sure that a fast sender cannot continually
transmit data faster than the receiver can absorb it.
To see the difference between these two concepts, consider a fiber optic network with a
capacity of 1000 gigabits/sec on which a supercomputer is trying to transfer a file to a
personal computer at 1 Gbps. Although there is no congestion (the network itself is not in
trouble), flow control is needed to force the supercomputer to stop frequently to give the
personal computer a chance to breathe.
At the other extreme, consider a store-and-forward network with 1-Mbps lines and 1000 large
computers, half of which are trying to transfer files at 100 kbps to the other half. Here the
problem is not that of fast senders overpowering slow receivers, but that the total offered
traffic exceeds what the network can handle.
The reason congestion control and flow control are often confused is that some congestion
control algorithms operate by sending messages back to the various sources telling them to
slow down when the network gets into trouble. Thus, a host can get a "slow down" message
either because the receiver cannot handle the load or because the network cannot handle it.
We will come back to this point later.
We will start our study of congestion control by looking at a general model for dealing with it.
Then we will look at broad approaches to preventing it in the first place. After that, we will look
at various dynamic algorithms for coping with it once it has set in.
5.3.1 General Principles of Congestion Control
Many problems in complex systems, such as computer networks, can be viewed from a control
theory point of view. This approach leads to dividing all solutions into two groups: open loop
and closed loop. Open loop solutions attempt to solve the problem by good design, in essence,
to make sure it does not occur in the first place. Once the system is up and running, midcourse
corrections are not made.
Tools for doing open-loop control include deciding when to accept new traffic, deciding when to
discard packets and which ones, and making scheduling decisions at various points in the
network. All of these have in common the fact that they make decisions without regard to the
current state of the network.
In contrast, closed loop solutions are based on the concept of a feedback loop. This approach
has three parts when applied to congestion control:
1. Monitor the system to detect when and where congestion occurs.
2. Pass this information to places where action can be taken.
3. Adjust system operation to correct the problem.
A variety of metrics can be used to monitor the subnet for congestion. Chief among these are
the percentage of all packets discarded for lack of buffer space, the average queue lengths,
the number of packets that time out and are retransmitted, the average packet delay, and the
standard deviation of packet delay. In all cases, rising numbers indicate growing congestion.
The second step in the feedback loop is to transfer the information about the congestion from
the point where it is detected to the point where something can be done about it. The obvious
way is for the router detecting the congestion to send a packet to the traffic source or sources,
announcing the problem. Of course, these extra packets increase the load at precisely the
moment that more load is not needed, namely, when the subnet is congested.
However, other possibilities also exist. For example, a bit or field can be reserved in every
packet for routers to fill in whenever congestion gets above some threshold level. When a
router detects this congested state, it fills in the field in all outgoing packets, to warn the
neighbors.
Still another approach is to have hosts or routers periodically send probe packets out to
explicitly ask about congestion. This information can then be used to route traffic around
problem areas. Some radio stations have helicopters flying around their cities to report on road
congestion to make it possible for their mobile listeners to route their packets (cars) around
hot spots.
In all feedback schemes, the hope is that knowledge of congestion will cause the hosts to take
appropriate action to reduce the congestion. For a scheme to work correctly, the time scale
must be adjusted carefully. If every time two packets arrive in a row a router yells STOP, and
every time a router is idle for 20 μsec it yells GO, the system will oscillate wildly and never
converge. On the other hand, if it waits 30 minutes to make sure before saying anything, the
congestion control mechanism will react too sluggishly to be of any real use. To work well,
some kind of averaging is needed, but getting the time constant right is a nontrivial matter.
Many congestion control algorithms are known. To provide a way to organize them in a
sensible way, Yang and Reddy (1995) have developed a taxonomy for congestion control
algorithms. They begin by dividing all algorithms into open loop or closed loop, as described
above. They further divide the open loop algorithms into ones that act at the source versus
ones that act at the destination. The closed loop algorithms are also divided into two
subcategories: explicit feedback versus implicit feedback. In explicit feedback algorithms,
packets are sent back from the point of congestion to warn the source. In implicit algorithms,
the source deduces the existence of congestion by making local observations, such as the time
needed for acknowledgements to come back.
The presence of congestion means that the load is (temporarily) greater than the resources (in
part of the system) can handle. Two solutions come to mind: increase the resources or
decrease the load. For example, the subnet may start using dial-up telephone lines to
temporarily increase the bandwidth between certain points. On satellite systems, increasing
transmission power often gives higher bandwidth. Splitting traffic over multiple routes instead
of always using the best one may also effectively increase the bandwidth. Finally, spare
routers that are normally used only as backups (to make the system fault tolerant) can be put
on-line to give more capacity when serious congestion appears.
However, sometimes it is not possible to increase the capacity, or it has already been
increased to the limit. The only way then to beat back the congestion is to decrease the load.
Several ways exist to reduce the load, including denying service to some users, degrading
service to some or all users, and having users schedule their demands in a more predictable
way.
Some of these methods, which we will study shortly, can best be applied to virtual circuits. For
subnets that use virtual circuits internally, these methods can be used at the network layer.
For datagram subnets, they can nevertheless sometimes be used on transport layer
connections. In this chapter, we will focus on their use in the network layer. In the next one,
we will see what can be done at the transport layer to manage congestion.
5.3.2 Congestion Prevention Policies
Let us begin our study of methods to control congestion by looking at open loop systems.
These systems are designed to minimize congestion in the first place, rather than letting it
happen and reacting after the fact. They try to achieve their goal by using appropriate policies
at various levels. In Fig. 5-26 we see different data link, network, and transport policies that
can affect congestion (Jain, 1990).
Figure 5-26. Policies that affect congestion.
Layer      Policies
Transport  • Retransmission policy
           • Out-of-order caching policy
           • Acknowledgement policy
           • Flow control policy
           • Timeout determination
Network    • Virtual circuits versus datagram inside the subnet
           • Packet queueing and service policy
           • Packet discard policy
           • Routing algorithm
           • Packet lifetime management
Data link  • Retransmission policy
           • Out-of-order caching policy
           • Acknowledgement policy
           • Flow control policy
Let us start at the data link layer and work our way upward. The retransmission policy is
concerned with how fast a sender times out and what it transmits upon timeout. A jumpy
sender that times out quickly and retransmits all outstanding packets using go back n will put
a heavier load on the system than will a leisurely sender that uses selective repeat. Closely
related to this is the buffering policy. If receivers routinely discard all out-of-order packets,
these packets will have to be transmitted again later, creating extra load. With respect to
congestion control, selective repeat is clearly better than go back n.
Acknowledgement policy also affects congestion. If each packet is acknowledged immediately,
the acknowledgement packets generate extra traffic. However, if acknowledgements are saved
up to piggyback onto reverse traffic, extra timeouts and retransmissions may result. A tight
flow control scheme (e.g., a small window) reduces the data rate and thus helps fight
congestion.
At the network layer, the choice between using virtual circuits and using datagrams affects
congestion, since many congestion control algorithms work only with virtual-circuit subnets.
Packet queueing and service policy relates to whether routers have one queue per input line,
one queue per output line, or both. It also relates to the order in which packets are processed
(e.g., round robin or priority based). Discard policy is the rule telling which packet is dropped
when there is no space. A good policy can help alleviate congestion and a bad one can make it
worse.
A good routing algorithm can help avoid congestion by spreading the traffic over all the lines,
whereas a bad one can send too much traffic over already congested lines. Finally, packet
lifetime management deals with how long a packet may live before being discarded. If it is too
long, lost packets may clog up the works for a long time, but if it is too short, packets may
sometimes time out before reaching their destination, thus inducing retransmissions.
In the transport layer, the same issues occur as in the data link layer, but in addition,
determining the timeout interval is harder because the transit time across the network is less
predictable than the transit time over a wire between two routers. If the timeout interval is too
short, extra packets will be sent unnecessarily. If it is too long, congestion will be reduced but
the response time will suffer whenever a packet is lost.
5.3.3 Congestion Control in Virtual-Circuit Subnets
The congestion control methods described above are basically open loop: they try to prevent
congestion from occurring in the first place, rather than dealing with it after the fact. In this
section we will describe some approaches to dynamically controlling congestion in virtual-
circuit subnets. In the next two, we will look at techniques that can be used in any subnet.
One technique that is widely used to keep congestion that has already started from getting
worse is admission control. The idea is simple: once congestion has been signaled, no more
virtual circuits are set up until the problem has gone away. Thus, attempts to set up new
transport layer connections fail. Letting more people in just makes matters worse. While this
approach is crude, it is simple and easy to carry out. In the telephone system, when a switch
gets overloaded, it also practices admission control by not giving dial tones.
An alternative approach is to allow new virtual circuits but carefully route all new virtual
circuits around problem areas. For example, consider the subnet of Fig. 5-27(a), in which two
routers are congested, as indicated.
Figure 5-27. (a) A congested subnet. (b) A redrawn subnet that
eliminates the congestion. A virtual circuit from A to B is also shown.
Suppose that a host attached to router A wants to set up a connection to a host attached to
router B. Normally, this connection would pass through one of the congested routers. To avoid
this situation, we can redraw the subnet as shown in Fig. 5-27(b), omitting the congested
routers and all of their lines. The dashed line shows a possible route for the virtual circuit that
avoids the congested routers.
Another strategy relating to virtual circuits is to negotiate an agreement between the host and
subnet when a virtual circuit is set up. This agreement normally specifies the volume and
shape of the traffic, quality of service required, and other parameters. To keep its part of the
agreement, the subnet will typically reserve resources along the path when the circuit is set
up. These resources can include table and buffer space in the routers and bandwidth on the
lines. In this way, congestion is unlikely to occur on the new virtual circuits because all the
necessary resources are guaranteed to be available.
This kind of reservation can be done all the time as standard operating procedure or only when
the subnet is congested. A disadvantage of doing it all the time is that it tends to waste
resources. If six virtual circuits that might use 1 Mbps all pass through the same physical 6-
Mbps line, the line has to be marked as full, even though it may rarely happen that all six
virtual circuits are transmitting full blast at the same time. Consequently, the price of the
congestion control is unused (i.e., wasted) bandwidth in the normal case.
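The cost of always-on reservation can be made concrete with the six-circuit example above. The sketch below is a back-of-the-envelope calculation; the 30% activity probability is an illustrative assumption, not a figure from the text.

```python
# Sketch of the reservation trade-off: six virtual circuits that *might*
# each use 1 Mbps sharing a 6-Mbps line. With hard reservation the line is
# marked full, yet expected actual usage can be far lower.

LINE_CAPACITY_MBPS = 6.0
circuits = 6
peak_per_circuit_mbps = 1.0   # amount reserved per circuit
p_active = 0.3                # assumed probability a circuit is sending
                              # full blast (illustrative, not from the text)

reserved = circuits * peak_per_circuit_mbps
expected_use = circuits * peak_per_circuit_mbps * p_active

print(f"reserved: {reserved} Mbps (line marked full)")
print(f"expected use: {expected_use} Mbps")
print(f"bandwidth idle in the normal case: {reserved - expected_use} Mbps")
```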
5.3.4 Congestion Control in Datagram Subnets
Let us now turn to some approaches that can be used in datagram subnets (and also in virtual-
circuit subnets). Each router can easily monitor the utilization of its output lines and other
resources. For example, it can associate with each line a real variable, u, whose value,
between 0.0 and 1.0, reflects the recent utilization of that line. To maintain a good estimate of
u, a sample of the instantaneous line utilization, f (either 0 or 1), can be made periodically and
u updated according to

u_new = a * u_old + (1 - a) f

where the constant a determines how fast the router forgets recent history.
Whenever u moves above the threshold, the output line enters a "warning" state. Each newly
arriving packet is checked to see if its output line is in warning state. If it is, some action is
taken. The action taken can be one of several alternatives, which we will now discuss.
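As a rough sketch, the utilization estimate and warning state might be maintained as follows. The smoothing constant a = 0.9 and the 0.8 threshold are assumed values; the text leaves both unspecified.

```python
# Sketch of the exponentially weighted utilization estimate from the text:
# u_new = a * u_old + (1 - a) * f, where f is the instantaneous sample
# (0 or 1). The threshold and a are illustrative assumptions.

WARNING_THRESHOLD = 0.8  # assumed value

class OutputLine:
    def __init__(self, a=0.9):
        self.a = a    # how fast the router forgets recent history
        self.u = 0.0  # smoothed utilization estimate, 0.0 .. 1.0

    def sample(self, f):
        """Fold one instantaneous sample f (0 or 1) into the estimate."""
        self.u = self.a * self.u + (1 - self.a) * f
        return self.u

    def in_warning_state(self):
        return self.u > WARNING_THRESHOLD

line = OutputLine(a=0.9)
for _ in range(50):      # line busy on every sample: u climbs toward 1.0
    line.sample(1)
print(line.u, line.in_warning_state())
```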
The Warning Bit
The old DECNET architecture signaled the warning state by setting a special bit in the packet's
header. So does frame relay. When the packet arrived at its destination, the transport entity
copied the bit into the next acknowledgement sent back to the source. The source then cut
back on traffic.
As long as the router was in the warning state, it continued to set the warning bit, which
meant that the source continued to get acknowledgements with it set. The source monitored
the fraction of acknowledgements with the bit set and adjusted its transmission rate
accordingly. As long as the warning bits continued to flow in, the source continued to decrease
its transmission rate. When they slowed to a trickle, it increased its transmission rate. Note
that since every router along the path could set the warning bit, traffic increased only when no
router was in trouble.
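A source's reaction to the warning bit can be sketched as below. The cut, increase, and trigger constants are illustrative assumptions; the text says only that the rate falls while warning bits keep flowing in and rises when they slow to a trickle.

```python
# Sketch of a source reacting to the warning bit (DECNET/frame-relay style,
# as described in the text). All three constants are assumed values.

def adjust_rate(rate, acks_with_bit, total_acks,
                cut=0.875, bump=1.0625, trigger=0.5):
    """Cut the rate while a large fraction of ACKs carry the warning bit;
    raise it gently once they slow to a trickle."""
    fraction = acks_with_bit / total_acks
    if fraction >= trigger:      # warning bits still flowing in
        return rate * cut        # multiplicative decrease
    return rate * bump           # gentle increase

rate = 100.0  # arbitrary units
rate = adjust_rate(rate, acks_with_bit=8, total_acks=10)  # congested: cut
rate = adjust_rate(rate, acks_with_bit=1, total_acks=10)  # trickle: raise
print(rate)
```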
Choke Packets
The previous congestion control algorithm is fairly subtle. It uses a roundabout means to tell
the source to slow down. Why not just tell it directly? In this approach, the router sends a
choke packet back to the source host, giving it the destination found in the packet. The
original packet is tagged (a header bit is turned on) so that it will not generate any more choke
packets farther along the path and is then forwarded in the usual way.
Hosts can reduce traffic by adjusting their policy parameters, for example, their window size.
Typically, the first choke packet causes the data rate to be reduced to 0.50 of its previous rate,
the next one causes a reduction to 0.25, and so on. Increases are done in smaller increments
to prevent congestion from reoccurring quickly.
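The halving-on-choke, creep-back-up behavior can be sketched in a few lines. The additive increase step is an assumption; the text says only that increases come in smaller increments.

```python
# Sketch of the choke-packet response described in the text: each choke
# packet halves the allowed rate (0.50, 0.25, ... of the original), while
# increases come in smaller, additive steps (step size assumed).

def on_choke_packet(rate):
    return rate / 2              # first choke: 0.50x, second: 0.25x, ...

def on_quiet_period(rate, step=1.0):
    return rate + step           # cautious additive increase (assumed)

rate = 16.0                      # arbitrary units
rate = on_choke_packet(rate)     # -> 8.0  (0.50 of previous)
rate = on_choke_packet(rate)     # -> 4.0  (0.25 of original)
rate = on_quiet_period(rate)     # -> 5.0, creeping back up
print(rate)
```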
Several variations on this congestion control algorithm have been proposed. For one, the
routers can maintain several thresholds. Depending on which threshold has been crossed, the
choke packet can contain a mild warning, a stern warning, or an ultimatum.
Another variation is to use queue lengths or buffer utilization instead of line utilization as the
trigger signal. The same exponential weighting can be used with this metric as with u, of
course.
Hop-by-Hop Choke Packets
At high speeds or over long distances, sending a choke packet to the source hosts does not
work well because the reaction is so slow. Consider, for example, a host in San Francisco
(router A in Fig. 5-28) that is sending traffic to a host in New York (router D in Fig. 5-28) at
155 Mbps. If the New York host begins to run out of buffers, it will take about 30 msec for a
choke packet to get back to San Francisco to tell it to slow down. The choke packet
propagation is shown as the second, third, and fourth steps in Fig. 5-28(a). In those 30 msec,
another 4.6 megabits will have been sent. Even if the host in San Francisco completely shuts
down immediately, the 4.6 megabits in the pipe will continue to pour in and have to be dealt
with. Only in the seventh diagram in Fig. 5-28(a) will the New York router notice a slower flow.
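The 4.6-megabit figure is simply the sending rate multiplied by the feedback delay, as this quick check shows:

```python
# Bandwidth times round-trip feedback delay gives the data still "in the
# pipe" when the choke packet finally arrives.

rate_bps = 155e6          # 155 Mbps, San Francisco -> New York
feedback_delay_s = 0.030  # ~30 msec for the choke packet to come back

in_flight_bits = rate_bps * feedback_delay_s
print(in_flight_bits / 1e6, "megabits")  # ~4.65, the text's "about 4.6"
```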
Figure 5-28. (a) A choke packet that affects only the source. (b) A
choke packet that affects each hop it passes through.