CHAPTER 19
Reliable Data Transport Protocols
Packets in a best-effort network lead a rough life. They can be lost for any number of reasons, including queue overflows at switches because of congestion, repeated collisions over shared media, routing failures, and uncorrectable bit errors. In addition, packets can arrive out-of-order at the destination because different packets sent in sequence take different paths or because some switch en route reorders packets for some reason. They usually
experience variable delays, especially whenever they encounter a queue. In some cases,
the underlying network may even duplicate packets.
Many applications, such as Web page downloads, file transfers, and interactive terminal sessions, would like a reliable, in-order stream of data, receiving exactly one copy of each byte in the same order in which it was sent. A reliable transport protocol does the job of hiding the vagaries of a best-effort network—packet losses, reordered packets, and duplicate packets—from the application, and provides it the abstraction of a reliable packet
stream. We will develop protocols that also provide in-order delivery.
A large number of protocols have been developed that various applications use, and
there are several ways to provide a reliable, in-order abstraction. This chapter will not
discuss them all, but will instead discuss two protocols in some detail. The first protocol,
called stop-and-wait, will solve the problem in perhaps the simplest possible way that
works, but do so somewhat inefficiently. The second protocol will augment the first one
with a sliding window to significantly improve performance.
All reliable transport protocols use the same powerful ideas: redundancy to cope with
packet losses and receiver buffering to cope with reordering, and most use adaptive timers. The
tricky part is figuring out exactly how to apply redundancy in the form of packet retransmissions, in working out exactly when retransmissions should be done, and in achieving good performance. This chapter will study these issues, and discuss ways in which a reliable transport protocol can achieve high throughput.
The problem is as follows: the best-effort network between the sender and the receiver can drop packets arbitrarily, reorder them arbitrarily, delay them arbitrarily, and possibly
even duplicate packets. The receiver wants the packets in exactly the same order in which
the sender sent them, and wants exactly one copy of each packet.1 Our goal is to devise
mechanisms at the sending and receiving nodes to achieve what the receiver wants. These
mechanisms involve rules between the sender and receiver, which constitute the protocol. In addition to correctness, we will be interested in calculating the throughput of our
protocols, and in coming up with ways to maximize it.
All mechanisms to recover from losses, whether they are caused by packet drops or
corrupted bits, employ redundancy. We have already studied error-correcting codes such as
linear block codes and convolutional codes to mitigate the effect of bit errors. In principle,
one could apply similar coding techniques over packets (rather than over bits) to recover
from packet losses (as opposed to bit corruption). We are, however, interested not just in a scheme to reduce the effective packet loss rate, but in one that eliminates the effects of losses altogether and recovers all lost packets. We are also able to rely on feedback from the receiver that can help the sender determine what to send at any point in time, in order to achieve that goal. Therefore, we will focus on carefully using retransmissions to recover from packet losses; one may combine retransmissions and error-correcting codes to produce a protocol that can further improve throughput under certain conditions. In general, experience
has shown that if packet losses are not persistent and occur in bursts, and if latencies are
not excessively long (i.e., not multiple seconds long), retransmissions by themselves are
enough to recover from losses and achieve good throughput. Most practical reliable data transport protocols running over Internet paths today use only retransmissions on packets (individual links usually use error-correction methods, such as the ones we studied earlier, and may also augment them with a limited number of retransmissions to reduce the link-level packet loss rate).
We will develop the key ideas for two kinds of reliable data transport protocols: stop-and-wait and sliding window with a fixed window size. We will use the word “sender”
to refer to the sending side of the transport protocol and the word “receiver” to refer to
the receiving side. We will use “sender application” and “receiver application” to refer to
the processes (applications) that would like to send and receive data in a reliable, in-order
manner.
Figure 19-1: The stop-and-wait protocol. Each picture has a sender timeline and a receiver timeline. Time
starts at the top of each vertical line and increases moving downward. The picture on the left shows what
happens when there are no losses; the middle shows what happens on a data packet loss; and the right
shows how duplicate packets may arrive at the receiver because of an ACK loss.
The receiver, upon receiving the data packet with identifier k, will send an acknowledgment
(ACK) to the sender; the header of this ACK contains k, so the receiver communicates “I
got data packet k” to the sender. Both data packets and ACKs may get lost in the network.
In the stop-and-wait protocol, the sender sends the next data packet on the stream if,
and only if, it receives an ACK for k. If it does not get an ACK within some period of time,
called the timeout, the sender retransmits data packet k.
The receiver’s job is to deliver each data packet it receives to the receiver application.
Figure 19-1 shows the basic operation of the protocol when no packets are lost (left), when a data packet is lost (middle), and when an ACK is lost, producing a duplicate at the receiver (right).
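To make the sender's side concrete, here is a minimal sketch of the stop-and-wait sender loop in Python. The network hooks (send_packet and recv_ack) and the exact packet format are illustrative assumptions supplied by the caller; they are not part of the protocol specification above.

    def stop_and_wait_sender(payloads, send_packet, recv_ack, timeout):
        """Sketch of the stop-and-wait sender.  `send_packet(seqnum, payload)`
        transmits a data packet; `recv_ack(deadline)` waits at most `deadline`
        seconds and returns the sequence number carried in an ACK, or None on
        timeout.  Both are caller-supplied (hypothetical) hooks."""
        k = 1
        for payload in payloads:
            while True:
                send_packet(k, payload)      # (re)transmit data packet k
                ack = recv_ack(timeout)      # wait up to `timeout` seconds for an ACK
                if ack == k:                 # ACK for k arrived: move to the next packet
                    break
                # ack is None (timed out) or is an old ACK: loop and retransmit k
            k += 1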
Several properties of this protocol bear some discussion. One is why this protocol may deliver duplicate data packets to the receiver application, and how the receiver can prevent that from occurring.
Preventing duplicates: The solution to the problem of duplicate data packets arriving
at the receiver is for the receiver to keep track of the last in-sequence data packet it has
delivered to the application. At the receiver, let us maintain the sequence number of the
last in-sequence data packet in the variable rcv seqnum. If a data packet with sequence
number less than or equal to rcv seqnum arrives, then the receiver sends an ACK for the
packet and discards it. Note that the only way a data packet with sequence number smaller
than rcv seqnum can arrive is if there were reordering in the network and the receiver
gets an old data packet; for such packets, the receiver can safely not send an ACK because
it knows that the sender knows about the receipt of the packet and has sent subsequent
packets. This method prevents duplicate packets from being delivered to the receiving
application.
If a data packet with sequence number rcv_seqnum + 1 arrives, then the receiver sends an ACK to the sender, delivers the data packet to the application, and increments rcv_seqnum. Note that a data packet with sequence number greater than rcv_seqnum + 1 should never arrive in this stop-and-wait protocol, because that would imply that the sender got an ACK for rcv_seqnum + 1, but such an ACK would have been sent only if the receiver got the corresponding data packet. So, if such a data packet were to arrive, then there must be a bug in the implementation of either the sender or the receiver in this stop-and-wait protocol.
With this modification, the stop-and-wait protocol guarantees exactly-once delivery to
the application.3
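The receiver-side rule just described can be written compactly as well. The sketch below uses the same kind of illustrative assumptions: the state dictionary and the send_ack and deliver hooks are hypothetical names, not part of the protocol itself.

    def stop_and_wait_receiver_step(seqnum, payload, state, send_ack, deliver):
        """Process one arriving data packet.  `state` holds rcv_seqnum;
        `send_ack(seqnum)` and `deliver(payload)` are caller-supplied hooks."""
        if seqnum <= state["rcv_seqnum"]:
            send_ack(seqnum)              # duplicate (or old reordered) packet:
            return                        # re-ACK it, but never deliver it again
        if seqnum == state["rcv_seqnum"] + 1:
            send_ack(seqnum)
            deliver(payload)              # exactly-once, in-order delivery
            state["rcv_seqnum"] += 1
        # seqnum > rcv_seqnum + 1 would indicate a bug in stop-and-wait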
Courtesy of the Cooperative Association for Internet Data Analysis. Used with permission.
A good solution to the problem of picking the timeout value uses two tools we have
seen earlier in the course: probability distributions (in our case, of the RTT estimates) and a
simple filter design.
Suppose we are interested in estimating a good timeout post facto: i.e., suppose we run
the protocol and collect a sequence of RTT samples, how would one use these values to
pick a good timeout? We can take all the RTT samples and plot them as a probability
distribution, and then see how any given timeout value will have performed in terms of
the probability of a spurious retransmission. If the timeout value is T, then this probability may be estimated as the area under the curve to the right of T in the picture on the left of Figure 19-3, which shows the histogram of RTT samples. Equivalently, if we look at the cumulative distribution function of the RTT samples (the picture on the right of Figure 19-3), the probability of a spurious retransmission is one minus the value of the y-axis corresponding to a value of T on the x-axis.
Real-world distributions of RTT are not actually Gaussian, but an interesting property
of all distributions is that if you pick a threshold that is a sufficient number of standard
deviations greater than the mean, the tail probability of a sample exceeding that threshold
can be made arbitrarily small. (For the mathematically inclined, a useful result for arbitrary distributions is Chebyshev's inequality, which you might have seen in other courses already (or soon will): P(|X − μ| ≥ kσ) ≤ 1/k², where μ is the mean and σ the standard deviation of the distribution. For Gaussians, the tail probability falls off much faster than 1/k²; for instance, when k = 2, the Gaussian tail probability is only about 0.05, and when k = 3, the tail probability is about 0.003.)
The protocol designer can use past RTT samples to determine an RTT cut-off so that only a small fraction f of the samples are larger. The choice of f depends on what spurious retransmission rate one is willing to tolerate, and depending on the protocol, the cost of such an action might be small or large. Empirically, Internet transport protocols tend to
Figure 19-3: RTT variations on a wide-area cellular wireless network (Verizon Wireless’s 3G CDMA Rev
A service) across both idle periods and when data transfers are in progress, showing extremely high RTT
values and high variability. The x-axis in both pictures is the RTT in milliseconds. The picture on the left
shows the histogram (each bin plots the total probability of the RTT value falling within that bin), while
the picture on the right is the cumulative distribution function (CDF). These delays suggest a poor network
design with excessively long queues that do nothing more than cause delays to be very large. Of course,
it means that the timeout method must adapt to these variations to the extent possible. (Data collected in
November 2009 in Cambridge, MA and Belmont, MA.)
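As a concrete, if simplified, illustration of the post facto approach described above, the sketch below picks the timeout as the empirical (1 − f) quantile of a trace of RTT samples, so that roughly a fraction f of the observed samples exceed it. The sample values are made up for the example.

    def timeout_from_samples(rtt_samples, f=0.01):
        """Return a timeout that roughly a fraction f of the RTT samples in the
        trace exceed (the empirical (1 - f) quantile)."""
        ordered = sorted(rtt_samples)
        index = min(len(ordered) - 1, int((1.0 - f) * len(ordered)))
        return ordered[index]

    # Toy trace of RTT samples in seconds; with f = 0.1, about 10% of the
    # samples may exceed the chosen timeout.
    samples = [0.08, 0.09, 0.10, 0.11, 0.12, 0.35, 0.10, 0.09, 0.11, 0.10]
    print(timeout_from_samples(samples, f=0.1))   # prints 0.35 for this trace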
To set the timeout adaptively, the sender must answer two questions:
1. How should the sender obtain samples of the RTT?
2. How should the sender estimate the mean and deviation and pick a suitable timeout?
Obtaining RTT estimates. If the sender keeps track of when it sent each data packet, then
it can obtain a sample of the RTT when it gets an ACK for the packet. The RTT sample is
simply the difference in time between when the ACK arrived and when the data packet
was sent. An elegant way to keep track of this information in a protocol is for the sender
to include the current time in the header of each data packet that it sends in a “timestamp”
field. The receiver then simply echoes this time in its ACK. When the sender gets an ACK,
it just has to consult the clock for the current time, and subtract the echoed timestamp to
obtain an RTT sample.
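A small sketch of this timestamp-echo bookkeeping, with hypothetical header fields chosen only for illustration:

    import time

    def make_data_header(seqnum):
        # Sender: stamp each data packet with the current time.
        return {"seqnum": seqnum, "timestamp": time.time()}

    def make_ack_header(data_header):
        # Receiver: echo the sender's timestamp back in the ACK.
        return {"ackseq": data_header["seqnum"],
                "echoed_timestamp": data_header["timestamp"]}

    def rtt_sample_from_ack(ack_header):
        # Sender: the RTT sample is the current time minus the echoed timestamp.
        return time.time() - ack_header["echoed_timestamp"]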
Calculating the timeout. As explained above, our plan is to pick a timeout that uses
both the average and deviation of the RTT sample distribution. The sender must take two
factors into account while estimating these values:
1. It must not get swayed by infrequent samples that are either too large or too small.
That is, it must employ some sort of “smoothing”.
2. It must weigh more recent estimates higher than old ones, because network conditions could have changed over multiple RTTs.
Thus, what we want is a way to track changing conditions, while at the same time not
being swayed by sudden changes that don’t persist.
Let’s look at the first requirement. Given a sequence of RTT samples, r0 , r1 , r2 , . . . , rn ,
we want a sequence of smoothed outputs, s0 , s1 , s2 , . . . , sn that avoids being swayed by
sudden changes that don’t persist. This problem sounds like a filtering problem, which we
have studied earlier. The difference, of course, is that we aren’t applying it to frequency
division multiplexing, but the underlying problem is what a low-pass filter (LPF) does.
A simple LPF that provides what we need has the following form: sn = α rn + (1 − α) sn−1, where 0 < α < 1.
Figure 19-4: Frequency response of the exponential weighted moving average low-pass filter. As α decreases, the low-pass filtering becomes more pronounced. The graph shows the response for α = 0.9, 0.5, 0.1, going from top to bottom.
Figure 19-5: Reaction of the exponential weighted moving average filter to a non-persistent spike in the
RTT (the spike is double the other samples). The smaller α (0.1, shown on the left) doesn’t get swayed by
it, whereas the bigger value (0.5, right) does. The output of the filter is shown in green, the input in blue.
Expanding this recursion gives

sn = α rn + α(1 − α) rn−1 + α(1 − α)^2 rn−2 + · · · + α(1 − α)^(n−1) r1 + (1 − α)^n r0,    (19.3)

where each successive older sample's weight is a factor of (1 − α) “less important” than the previous one's.
With this approach, one can compute the smoothed RTT estimate, srtt, quite easily
using the pseudocode shown below, which runs each time an ACK arrives with an RTT
estimate, r.
srtt ← αr + (1 − α)srtt
Figure 19-6: Reaction of the exponential weighted moving average filter to a persistent change (doubling)
in the RTT. The smaller α (0.1, shown on the left) takes much longer to track the change, whereas the bigger
value (0.5, right) responds much quicker. The output of the filter is shown in green, the input in blue.
What about the deviation? Ideally, we want the sample standard deviation, but it turns
out to be a bit easier to compute the mean linear deviation instead.4 The following elegant
method performs this task:
dev ← β · |r − srtt| + (1 − β) · dev
Here, 0 < β < 1, and we apply an EWMA to estimate the linear deviation as well. TCP
uses β = 0.25; again, values between 0.1 and 0.25 have been found to work well.
Finally, the timeout is calculated very easily as follows:

timeout ← srtt + 4 · dev
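Putting the three update rules together, here is a small sketch of an adaptive timeout estimator. The constants are illustrative assumptions: α = 0.125 and β = 0.25 are consistent with the values mentioned in the text, and seeding the deviation from the first sample is a common convention rather than something specified here.

    class RttEstimator:
        """EWMA estimators for the smoothed RTT (srtt) and the mean linear
        deviation (dev), with timeout = srtt + 4 * dev."""

        def __init__(self, first_sample, alpha=0.125, beta=0.25):
            self.alpha = alpha
            self.beta = beta
            self.srtt = first_sample        # seed the filters from the first RTT sample
            self.dev = first_sample / 2     # assumed initialization, not from the text

        def update(self, r):
            """Run once per ACK that carries a fresh RTT sample r (in seconds)."""
            self.dev = self.beta * abs(r - self.srtt) + (1 - self.beta) * self.dev
            self.srtt = self.alpha * r + (1 - self.alpha) * self.srtt
            return self.timeout()

        def timeout(self):
            return self.srtt + 4 * self.dev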
Exponential back-off of the timeout. When a timeout occurs and the sender retransmits
a data packet, it might be lost again (or its ACK might be lost). In that case, it is possible (in
networks where congestion is the main reason for packet loss) that the network is heavily
congested. Rather than using the same timeout value and retransmitting, it would be
prudent to take a leaf from the exponential back-off idea we studied earlier with contention
MAC protocols and double the timeout value. Eventually, when the retransmitted data
packet is acknowledged, the sender can revert to the timeout value calculated from the
mean RTT and its linear deviation. Most reliable transport protocols use an adaptive timer
with such an exponential back-off mechanism.
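A sketch of how such a back-off might wrap the adaptive estimator above (the class and method names are illustrative):

    class RetransmitTimer:
        """Exponential back-off around an adaptive timeout: double the timeout
        on each consecutive timeout, and revert once an ACK arrives."""

        def __init__(self, rtt_estimator):
            self.est = rtt_estimator     # e.g., the RttEstimator sketched above
            self.backoff = 1

        def current_timeout(self):
            return self.backoff * self.est.timeout()

        def on_timeout(self):
            self.backoff *= 2            # back off: the network may be congested

        def on_ack(self, rtt_sample=None):
            self.backoff = 1             # revert to the EWMA-derived timeout
            if rtt_sample is not None:
                self.est.update(rtt_sample)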
4
The mean linear deviation is always at least as big as the sample standard deviation, so picking a timeout
equal to the mean plus k times the linear deviation has a tail probability no larger than picking a timeout equal
to the mean plus k times the sample standard deviation.
This is because once the sender times out, the expected time to send a data packet and get an ACK is exactly T, the number we want to calculate. Solving Equation (19.4), we find that

T = RTT + (ℓ / (1 − ℓ)) · RTO,

where ℓ denotes the packet loss rate.5 The expected throughput of the protocol is then equal to 1/T packets per second.6
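A quick way to get a feel for this formula is to plug in numbers. The values below are made up for illustration and are not from the text.

    def stop_and_wait_throughput(rtt, rto, loss_prob):
        """Expected throughput in packets/s: T = RTT + (l/(1-l)) * RTO, then 1/T."""
        t = rtt + (loss_prob / (1.0 - loss_prob)) * rto
        return 1.0 / t

    # Example: RTT = 0.1 s, RTO = 0.2 s, loss probability 10%.
    print(stop_and_wait_throughput(rtt=0.1, rto=0.2, loss_prob=0.1))
    # about 8.2 packets per second, versus 10 packets per second with no losses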
The good thing about the stop-and-wait protocol is that it is very simple, and should be
used under two circumstances: first, when throughput isn’t a concern and one wants good
reliability, and second, when the network path has a small RTT such that sending one data
packet every RTT is enough to saturate the bandwidth of the link or path between sender
and receiver.
On the other hand, a typical Internet path between Boston and San Francisco might
have an RTT of about 100 milliseconds. If the network path has a bit rate of 1 megabit/s,
and we use a data packet size of 10,000 bits, then the maximum throughput of stop-and-wait would be only 10% of the possible rate. And in the face of packet loss, it would be much lower than that.
The next section describes a protocol, based on a sliding window of packets, that provides considerably higher throughput.
5
In general, we will treat the loss rate as a probability of loss, so it is a unit-less quantity between 0 and 1;
it is not a “rate” like the throughput. A better term might be the “loss probability” or a “loss ratio” but “loss
rate” has become standard terminology in networking.
6
The careful reader or purist may note that we have only calculated T, the expected time between the transmission of a data packet and the receipt of an ACK for it. We have then assumed that the expected value of the reciprocal of X, which is a random variable whose expected value is T, is equal to 1/T. In general, however, 1/E[X] is not equal to E[1/X]. But the formula for the expected throughput we have written does in fact hold. Intuitively, to see why, define Yn = X1 + X2 + · · · + Xn. As n → ∞, one can show using the Chebyshev inequality that the probability that |Yn − nT| > δn goes to 0 for any positive δ. That is, when viewed over a long period of time, the random variable X looks like a constant—which is the only distribution for which the expected value of the reciprocal is equal to the reciprocal of the expectation.
Whenever the sender transmits a new data packet, it sets the sequence number to max_seq + 1, where max_seq is the highest sequence number sent so far. Of course, we should remember to update max_seq as well, and increment outstanding by 1.
Whenever the sender gets an ACK, it should remove the acknowledged data packet from unacked_pkts (assuming it hasn't already been removed), decrement outstanding, and call the procedure to calculate the timeout (which will use the timestamp echoed in the current ACK to update the EWMA filters and update the timeout value).
We would like outstanding to keep track of the number of unacknowledged data packets between sender and receiver. We have described the method to do this task as follows: increment it by 1 on each new data packet transmission, and decrement it by 1 on each ACK that was not previously seen by the sender, corresponding to a packet the sender had previously sent that is being acknowledged (as far as the sender is concerned) for the first time. The question now is whether outstanding should be adjusted when a retransmission is done. A little thought will show that it should not be. The reason is that it is precisely
on a timeout of a data packet that the sender believes that the packet was actually lost, and
in the sender’s view, the packet has left the network. But the retransmission immediately
adds a data packet to the network, so the effect is that the number of outstanding packets
is exactly the same. Hence, no change is required in the code.
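The bookkeeping rules above can be summarized in a short sketch. The transmission hook send_packet and the layout of the stored state are assumptions made for illustration; the essential points are that outstanding is not changed on a retransmission and that the recorded send time is refreshed.

    import time

    class SlidingWindowSender:
        """Sketch of the sender-side bookkeeping for the sliding window protocol.
        `send_packet(seqnum, payload, timestamp)` is a caller-supplied
        (hypothetical) hook; `window` is the fixed window size W."""

        def __init__(self, window, send_packet):
            self.W = window
            self.send_packet = send_packet
            self.max_seq = 0          # highest sequence number sent so far
            self.outstanding = 0      # number of unacknowledged packets
            self.unacked_pkts = {}    # seqnum -> (payload, time of last send)

        def can_send_new(self):
            return self.outstanding < self.W

        def send_new(self, payload):
            assert self.can_send_new()
            self.max_seq += 1
            self.outstanding += 1
            self.unacked_pkts[self.max_seq] = (payload, time.time())
            self.send_packet(self.max_seq, payload, time.time())

        def on_ack(self, seqnum):
            if seqnum in self.unacked_pkts:     # first ACK for this packet
                del self.unacked_pkts[seqnum]
                self.outstanding -= 1           # (also update srtt and the timeout here)

        def on_timeout(self, seqnum):
            if seqnum in self.unacked_pkts:
                payload, _ = self.unacked_pkts[seqnum]
                self.unacked_pkts[seqnum] = (payload, time.time())  # refresh the send time
                self.send_packet(seqnum, payload, time.time())
                # note: `outstanding` is deliberately left unchanged here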
Implementing a sliding window protocol is sometimes error-prone even when one completely understands the protocol in one's mind. Three kinds of errors are common. First, the timeouts are set too low because of an error in the EWMA estimators, and data packets end up being retransmitted too early, leading to spurious retransmissions. In addition to
keeping track of the sender’s smoothed round-trip time (srtt), RTT deviation, and timeout
estimates,7 it is a good idea to maintain a counter for the number of retransmissions done
for each data packet. If the network has a certain total loss rate between sender and receiver and back (i.e., the bi-directional loss rate), pl, the number of retransmissions should be on the order of 1/(1 − pl) − 1, assuming that each packet is lost independently and with the same probability. (It is a useful exercise to work out why this formula holds.) If your implementation shows a much larger number than this prediction, it is very likely that there's a bug in it.
Second, the number of outstanding data packets might be larger than the configured
window, which is an error. If that occurs, and especially if a bug causes the number of
outstanding packets to grow unbounded, delays will increase and it is also possible that
packet loss rates caused by congestion will increase. It is useful to place an assertion or two
that checks that the outstanding number of data packets does not exceed the configured
window.
Third, when retransmitting a data packet, the sender must take care to update the recorded time at which the packet was last sent. Otherwise, that packet will end up getting retransmitted repeatedly, a pretty serious bug that will cause the throughput to diminish.
The receiver should add each data packet it receives to its buffer, rcvbuf, keeping the buffered packets in increasing sequence order. Then, it should check whether one or more contiguous data packets starting from rcv_seqnum + 1 are in rcvbuf. If they are, it delivers them to the application, removes them from rcvbuf, and remembers to update rcv_seqnum.
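A sketch of the receiver side, using a dictionary as the buffer (the hooks and the state layout are, again, illustrative assumptions):

    def sliding_window_receiver_step(seqnum, payload, state, send_ack, deliver):
        """Process one arriving data packet.  `state` holds rcv_seqnum and rcvbuf
        (a dict mapping seqnum -> payload); `send_ack` and `deliver` are
        caller-supplied hooks."""
        send_ack(seqnum)                                # ACK every data packet received
        if seqnum > state["rcv_seqnum"]:
            state["rcvbuf"].setdefault(seqnum, payload) # buffer it if not already present
        # Deliver any contiguous run starting at rcv_seqnum + 1, in order.
        while state["rcv_seqnum"] + 1 in state["rcvbuf"]:
            state["rcv_seqnum"] += 1
            deliver(state["rcvbuf"].pop(state["rcv_seqnum"]))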
• 19.5.3 Throughput
What is the throughput of the sliding window protocol we just developed? Clearly, we send W data packets per RTT when there are no data packet or ACK losses, so the throughput in the absence of losses is W/RTT packets per second. So the question one should ask is, what should we set W to in order to maximize throughput, at least when there are no data packet or ACK losses? After answering this question, we will provide a simple formula for the throughput of the protocol in the absence of losses, and then finally consider packet losses.
Setting W
One can address the question of how to choose W using Little’s law. Think of the entire
bi-directional path between the sender and receiver as a single queue (in reality it’s more
complicated than a single queue, but the abstraction of a single queue still holds). W is the
number of (unacknowledged) packets in the system and RTT is the mean delay between the transmission of a data packet and the receipt of its ACK at the sender (upon which the sender transmits a new data packet). We would like to maximize the processing rate of this system. Note that this rate cannot exceed the bit rate of the slowest, or bottleneck, link between the sender and receiver (i.e., the rate of the bottleneck link). If that rate is B packets per second, then by Little's law, setting W = B × RTT will ensure that the protocol comes close to achieving a throughput equal to the available bit rate.
But what should the RTT be in the above formula? After all, the definition of an “RTT sample” is the time that elapses between the transmission of a data packet and the receipt
of an ACK for it. As such, it depends on other data using the path. Moreover, if one looks
at the formula B = W/ RTT, it suggests that one can simply increase the window size W
to any value and B may correspondingly just increase. Clearly, that can’t be right!
Consider the simple case when there is only one connection active over a network path.
Observe that the RTT experienced by a packet P sent on the connection may be broken
into two parts: one part that does not depend on any queueing delay (i.e., the sum of the
propagation, transmission, and processing delays of the packet and its ACK), and one part
that depends on how many other packets were ahead of P in the bottleneck queue. (Here
we are assuming that ACKs experience no queueing, for simplicity.) Denote the RTT in the
absence of queuing as RTTmin , the minimum possible round-trip time that the connection
can experience.
Now, suppose the RTT of the connection is equal to RTTmin . That is, there is no queue
building up at the bottleneck link. Then, the throughput of the connection is W/RTT
= W/RTTmin . We would like this throughput to be the bottleneck link rate, B. Setting
W/RTTmin = B, we find that W should be equal to B · RTTmin .
This quantity—B · RTTmin —is an important concept for sliding window protocols (all
sliding window protocols, not just the one we have studied). It is called the bandwidth-
delay product of the connection and is a property of the bi-directional network path between sender and receiver. When the window size is strictly smaller than the bandwidth-delay product, the throughput will be strictly smaller than the bottleneck rate, B, and the
queueing delay will be non-existent. In this phase, the connection’s throughput linearly
increases as we increase the window size, W , assuming no other traffic intervenes. The
smallest window size for which the throughput will be equal to B is the bandwidth-delay
product.
This discussion shows that for our sliding window protocol, setting W = B × RTTmin
achieves the maximum possible throughput, B, in the absence of any data packet or ACK
losses. When packet losses occur, the window size will need to be larger to get maximum throughput (utilization), because we need a sufficient number of unacknowledged data packets to keep B × RTTmin packets in flight even when losses occur. A smaller window size will achieve sub-optimal throughput, linear in the window size, and inversely proportional to RTTmin.
But once W exceeds B × RTTmin, the RTT experienced by the connection includes queueing as well, and the RTT will no longer be a constant independent of W! That is, increasing W will cause RTT to also increase, but the rate, B, will no longer increase. What is the throughput in this case?
We can answer this question by applying Little’s law twice. Once at the bottleneck
link’s queue, and once on the entire network path. We will show the intuitive result that if
W > B × RTTmin , then the throughput is B packets per second.
First, let the average number of packets at the queue of the bottleneck link be Q. By
Little’s law applied to this queue, we know that Q = B · τ , where B is the rate at which
the queue drains (i.e., the bottleneck link rate), and τ is the average delay in the queue, so
τ = Q/B.
We also know that
RTT = RTTmin + τ = RTTmin + Q/B. (19.5)
Now, consider the window size, W , which is the number of unacknowledged packets.
We know that all these packets, by conservation of packets, must either be in the bottleneck
queue, or in the non-queueing part of the system. That is,
W = Q + B · RTTmin . (19.6)
Finally, from Little's law applied to the entire bi-directional network path,

Throughput = W / RTT                                  (19.7)
           = (B · RTTmin + Q) / (RTTmin + Q/B)        (19.8)
           = B                                        (19.9)
Thus, we can conclude that, in the absence of any data packet or ACK losses, the connection's throughput is as shown schematically in Figure 19-8.
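The shape of Figure 19-8 follows directly from this analysis: throughput grows linearly with W up to the bandwidth-delay product and then stays at B. A tiny numerical sketch, with made-up values of B and RTTmin:

    def sliding_window_throughput(W, B, rtt_min):
        """No-loss throughput in packets/s: linear in W below the bandwidth-delay
        product, then capped at the bottleneck rate B."""
        return min(W / rtt_min, B)

    # Illustrative values: B = 100 packets/s and RTTmin = 0.1 s, so the
    # bandwidth-delay product is 10 packets.
    for W in (2, 5, 10, 20, 40):
        print(W, sliding_window_throughput(W, B=100, rtt_min=0.1))
    # 2 -> 20, 5 -> 50, 10 -> 100, 20 -> 100, 40 -> 100 packets per second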
Figure 19-8: Throughput of the sliding window protocol as a function of the window size in a network with no other traffic. The bottleneck link rate is B packets per second and the RTT without any queueing is RTTmin. The product of these two quantities is the bandwidth-delay product.

Assuming that one sets the window size properly, i.e., to be large enough so that W ≥ B × RTTmin always, even in the presence of data or ACK losses, what is the maximum throughput of our sliding window protocol if the network has a certain probability of packet loss?
Consider a simple model in which the network path loses any packet—data or ACK—such that the probability of either a data packet being lost or its ACK being lost is equal to ℓ, and the packet loss random process is independent and identically distributed (the same model as in our analysis of stop-and-wait). Then, the utilization achieved by our sliding window reliable transport protocol is at most 1 − ℓ. Moreover, for a large-enough window size, W, our sliding window protocol comes close to achieving it.

The reason for the upper bound on utilization is that in this protocol, a data packet is acknowledged only when the sender gets an ACK explicitly for that packet. Now consider the number of transmissions that any given data packet must incur before its ACK is received by the sender. With probability 1 − ℓ we need one transmission, with probability ℓ(1 − ℓ) we need two transmissions, and so on, giving an expected number of transmissions of 1/(1 − ℓ). If we make this number of transmissions, one data packet is successfully sent and acknowledged. Hence, the utilization of the protocol can be at most 1/(1/(1 − ℓ)) = 1 − ℓ. In fact, it turns out that 1 − ℓ is the capacity (i.e., the upper bound on throughput) for any channel (network path) with packet loss rate ℓ.

If the sender picks a window size sufficiently larger than the bandwidth-minimum-RTT product, so that at least bandwidth-minimum-RTT packets are in transit (unacknowledged) even in the face of data and ACK losses, then the protocol's utilization will be close to the maximum value of 1 − ℓ.
Given that our sliding window protocol always sends a data packet every time the sender
gets an ACK, one might reasonably ask whether setting a good timeout value, which under
even the best of conditions involves a hard trade-off, is essential. The answer turns out to
be subtle: it’s true that the timeout can be quite large, because data packets will continue to
flow as long as some ACKs are arriving. However, as data packets (or ACKs) get lost, the
effective window size keeps falling, and eventually the protocol will stall until the sender
retransmits. So one can’t ignore the task of picking a timeout altogether, but one can pick
a more conservative (longer) timeout than in the stop-and-wait protocol. However, the
longer the timeout, the bigger the stalls experienced by the receiver application—even
though the receiver’s transport protocol would have received the data packets, they can’t
be delivered to the application because it wants the data to be delivered in order. Therefore,
a good timeout is still quite useful, and the principles discussed in setting it are widely
useful.
Secondly, we note that the longer the timeout, the bigger the receiver’s buffer has to be
when there are losses; in fact, in the worst case, there is no bound on how big the receiver’s
buffer can get. To see why, think about what happens if we were unlucky and a data packet
with a particular sequence number kept getting lost, but everything else got through.
The two factors mentioned above affect the throughput of the transport protocol, but
the biggest consequence of a long timeout is the effect on the latency perceived by applications (and users). The reason is that data packets are delivered in-order by the protocol
to the application, which means that a missing packet with sequence number k will cause
the application to stall, even though data packets with sequence numbers larger than k
have arrived and are in the transport protocol’s receiver buffer. Hence, an excessively long
timeout hurts interactivity and degrades the user’s experience.
• 19.6 Summary
This chapter described the key concepts in the design of a reliable data transport protocol. The big idea is to use redundancy in the form of careful retransmissions, for which
we developed the idea of using sequence numbers to uniquely identify data packets and
acknowledgments for the receiver to signal the successful reception of a data packet to
the sender. We discussed how the sender can set a good timeout, balancing between the
ability to track a persistent change of the round-trip times against the ability to ignore non-
persistent glitches. The method to calculate the timeout involved estimating a smoothed
mean and linear deviation using an exponential weighted moving average, which is a single real-zero low-pass filter. The timeout itself is set at the mean + 4 times the deviation to
ensure that the tail probability of a spurious retransmission is small. We used these ideas
in developing the simple stop-and-wait protocol.
We then developed the idea of a sliding window to improve performance, and showed
how to modify the sender and receiver to use this concept. Both the sender and receiver are
now more complicated than in the stop-and-wait protocol, but when there are no losses,
one can set the window size to the bandwidth-delay product and achieve high throughput
in this protocol. We also studied how increasing the window size increases the throughput linearly up to a point, after which only the (queueing) delay increases, and not the throughput of the connection.
• Acknowledgments
Thanks to Karl Berggren, Katrina LaCurts, Alexandre Megretski, Anirudh Sivaraman, Sari
Canelake and Patricia Saylor for suggesting various improvements to this chapter.
• Problems and Questions

2. The 802.11 (WiFi) link-layer uses a stop-and-wait protocol to improve link reliability. The protocol works as follows:
(a) The sender transmits data packet k + 1 to the receiver as soon as it receives an
ACK for the data packet k.
(b) After the receiver gets the entire data packet, it computes a checksum (CRC).
The processing time to compute the CRC is Tp and you may assume that it does
not depend on the packet size.
(c) If the CRC is correct, the receiver sends a link-layer ACK to the sender. The
ACK has negligible size and reaches the sender instantaneously.
The sender and receiver are near each other, so you can ignore the propagation delay.
The bit rate is R = 54 Megabits/s, the smallest data packet size is 540 bits, and the
largest data packet size is 5,400 bits.
What is the maximum processing time Tp that ensures that the protocol will achieve
a throughput of at least 50% of the bit rate of the link in the absence of data packet
and ACK losses, for any data packet size?
3. Alyssa P. Hacker sets up a wireless network in her home to enable her computer (“client”) to communicate with an Access Point (AP). The client and AP communicate with each other using a stop-and-wait protocol.
The data packet size is 10000 bits. The total round-trip time (RTT) between the AP
and client is equal to 0.2 milliseconds (that includes the time to process the packet,
transmit an ACK, and process the ACK at the sender) plus the transmission time of
the 10000 bit packet over the link.
Alyssa can configure two possible transmission bit rates for her link, with the following properties:
Bit rate          Packet loss probability
10 Megabits/s     1/11
20 Megabits/s     1/4
Alyssa's goal is to select the bit rate that provides the higher throughput for a stream of packets that need to be delivered reliably between the AP and client using stop-and-wait. For both bit rates, the retransmission timeout (RTO) is 2.4 milliseconds.
(a) Calculate the round-trip time (RTT) for each bit rate.
(b) For each bit rate, calculate the expected time, in milliseconds, to successfully
deliver a packet and get an ACK for it. Show your work.
(c) Using the above calculations, which bit rate would you choose to achieve
Alyssa’s goal?
4. Suppose the sender in a reliable transport protocol uses an EWMA filter to estimate
the smoothed round trip time, srtt, every time it gets an ACK with an RTT sample r.
5. TCP computes an average round-trip time (RTT) for the connection using an EWMA
estimator, as in the previous problem. Suppose that at time 0, the initial estimate,
srtt, is equal to the true value, r0 . Suppose that immediately after this time, the RTT
for the connection increases to a value R and remains at that value for the remainder
of the connection. You may assume that R >> r0 .
Suppose that the TCP retransmission timeout value at step n, RT O(n), is set to β · srtt.
Calculate the number of RTT samples before we can be sure that there will be no
spurious retransmissions. Old TCP implementations used to have β = 2 and α =
1/8. How many samples does this correspond to before spurious retransmissions
are avoided, for this problem? (As explained in Section 19.3, TCP now uses the mean
linear deviation as its RTO formula. Originally, TCP didn’t incorporate the linear
deviation in its RTO formula.)
6. Consider a sliding window protocol between a sender and a receiver. The receiver
should deliver data packets reliably and in order to its application.
The sender correctly maintains the following state variables:
unacked_pkts – the buffer of unacknowledged data packets
first_unacked – the lowest unacked sequence number (undefined if all data packets have been acked)
last_unacked – the highest unacked sequence number (undefined if all data packets have been acked)
If the receiver gets a data packet that is strictly larger than the next one in sequence,
it adds the packet to a buffer if not already present. We want to ensure that the size
of this buffer of data packets awaiting delivery never exceeds a value W ≥ 0. Write
down the check(s) that the sender should perform before sending a new data packet
in terms of the variables mentioned above that ensure the desired property.
7. Alyssa P. Hacker measures that the network path between two computers has a
round-trip time (RTT) of 100 milliseconds. The queueing delay is negligible. The
rate of the bottleneck link between them is 1 Mbyte/s. Alyssa implements the reliable sliding window protocol studied in 6.02 and runs it between these two computers. The data packet size is fixed at 1000 bytes (you can ignore the size of the
acknowledgments). There is no other traffic.
(a) Alyssa sets the window size to 10 data packets. What is the resulting maximum
utilization of the bottleneck link? Explain your answer.
(b) Alyssa’s implementation of a sliding window protocol uses an 8-bit field for
the sequence number in each data packet. Assuming that the RTT remains the
same, what is the smallest value of the bottleneck link bandwidth (in Mbytes/s)
that will cause the protocol to stop working correctly when packet losses occur?
Assume that the definition of a window in her protocol is the difference between
the last transmitted sequence number and the last in-sequence ACK.
(c) Suppose the window size is 10 data packets and that the value of the sender’s
retransmission timeout is 1 second. A data packet gets lost before it reaches the
receiver. The protocol continues and no other data packets or acks are lost. The
receiver wants to deliver data to the application in order.
What is the maximum size, in packets, that the buffer at the receiver can grow
to in the sliding window protocol? Answer this question for the two different
definitions of a “window” below.
i. When the window is the maximum difference between the last transmitted data packet and the last in-sequence ACK received at the sender:
ii. When the window is the maximum number of unacknowledged data packets at the sender:
8. The sender in a sliding window protocol observes the following sequence of cumulative ACKs arriving from the receiver: 1 2 3 4 4 4 4 4 4 4
(a) Now, suppose that the sender times out and retransmits the first unacknowledged data packet. When the receiver gets that retransmitted data packet, what can you say about the ACK, a, that it sends?
i. a = 5.
ii. a ≥ 5.
iii. 5 ≤ a ≤ 11.
iv. a = 11.
v. a ≤ 11.
(b) Assuming no ACKs were lost, what is the minimum window size that can produce the sequence of ACKs shown above?
(c) Is it possible for the given sequence of cumulative ACKs to have arrived at the
sender even when no data packets were lost en route to the receiver when they
were sent?
(d) A little bit into the data transfer, the sender observes the following sequence of
cumulative ACKs sent from the receiver:
21 22 23 25 28
The window size is 8 packets. What data packet(s) should the sender transmit
upon receiving each of the above ACKs, if it wants to maximize the number of
unacknowledged data packets?
On getting ACK 21 → send:
On getting ACK 22 → send:
On getting ACK 23 → send:
On getting ACK 25 → send:
On getting ACK 28 → send:
9. Give one example of a situation where the cumulative ACK protocol described in the previous problem gets higher throughput than the sliding window protocol described in this chapter.
10. A sender S and receiver R communicate reliably over a series of links using a sliding
window protocol with some window size, W packets. The path between S and R
has one bottleneck link (i.e., one link whose rate bounds the throughput that can be
achieved), whose data rate is C packets/second. When the window size is W , the
queue at the bottleneck link is always full, with Q data packets in it. The round trip
time (RTT) of the connection between S and R during this data transfer with window size W is T seconds, including the queueing delay. There are no data packet or ACK
losses in this case, and there are no other connections sharing this path.
(a) Write an expression for W in terms of the other parameters specified above.
(b) We would like to reduce the window size from W and still achieve high utilization. What is the minimum window size, Wmin, which will achieve 100% utilization of the bottleneck link? Express your answer as a function of C, T,
and Q.
(c) Now suppose the sender starts with a window size set to Wmin . If all these data
packets get acknowledged and no packet losses occur in the window, the sender
increases the window size by 1. The sender keeps increasing the window size
in this fashion until it reaches a window size that causes a data packet loss to
occur. What is the smallest window size at which the sender observes a data
packet loss caused by the bottleneck queue overflowing? Assume that no ACKs
are lost.
11. Ben Bitdiddle decides to use the sliding window transport protocol described in
these notes on the network shown in Figure 19-9. The receiver sends end-to-end
ACKs to the sender. The switch in the middle simply forwards packets in best-effort
fashion.

Figure 19-9: The network topology for this problem: Sender → Switch → Receiver, with a queue at the switch.
(a) The sender’s window size is 10 packets. At what approximate rate (in packets
per second) will the protocol deliver a multi-gigabyte file from the sender to the
receiver? Assume that there is no other traffic in the network and packets can
only be lost because the queues overflow.
i. Between 900 and 1000.
ii. Between 450 and 500.
iii. Between 225 and 250.
iv. Depends on the timeout value used.
(b) You would like to double the throughput of this sliding window transport protocol running on the network shown in Figure 19-9. To do so, you can apply one of the following techniques alone:
i. Double the window size.
ii. Halve the propagation time of the links.
iii. Double the rate of the link between the Switch and Receiver.
For each of the following sender window sizes, list which of the above techniques, if any, can approximately double the throughput. If no technique does the job, say “None”. There might be more than one answer for each window size, in which case you should list them all. Each technique works in isolation.
1. W = 10:
2. W = 50:
3. W = 30:
12. Eager B. Eaver starts MyFace, a next-generation social networking web site in which
the only pictures allowed are users’ faces. MyFace has a simple request-response
interface. The client sends a request (for a face), the server sends a response (the
face). Both request and response fit in one packet (the faces in the responses are small
pictures!). When the client gets a response, it immediately sends the next request.
The size of the largest packet is S = 1000 bytes.
Eager's server is in Cambridge. Clients come from all over the world. Eager's measurements show that one can model the typical client as having a 100 millisecond round-trip time (RTT) to the server (i.e., the network component of the request-response delay, not counting the additional processing time taken by the server, is 100 milliseconds).
If the client does not get a response from the server in a time τ , it resends the request.
It keeps doing that until it gets a response.
(a) Is the protocol described above “at least once”, “at most once”, or “exactly
once”?
(b) Eager needs to provision the link bandwidth for MyFace. He anticipates that at any given time, the largest number of clients making a request is 2000. What minimum outgoing link bandwidth from MyFace will ensure that the link connecting MyFace to the Internet will not experience congestion?
(c) Suppose the probability of the client receiving a response from the server for
any given request is p. What is the expected time for a client’s request to obtain
a response from the server? Your answer will depend on p, RTT, and τ .
13. Lem E. Tweetit is designing a new protocol for Tweeter, a Twitter rip-off. All tweets
in Tweeter are 1000 bytes in length. Each tweet sent by a client and received by the
Tweeter server is immediately acknowledged by the server; if the client does not receive an ACK within a timeout, it re-sends the tweet and repeats this process until it gets an ACK.
Sir Tweetsalot uses a device whose data transmission rate is 100 Kbytes/s, which you
can assume is the bottleneck rate between his client and the server. The round-trip
propagation time between his client and the server is 10 milliseconds. Assume that
there is no queueing on any link between client and server and that the processing
time along the path is 0. You may also assume that the ACKs are very small in size, so they consume negligible bandwidth and transmission time (of course, they still need to propagate from server to client). Do not ignore the transmission time of a tweet.
(a) What is the smallest value of the timeout, in milliseconds, that will avoid spurious retransmissions?
(b) Suppose that the timeout is set to 90 milliseconds. Unfortunately, the probability
that a given client transmission gets an ACK is only 75%. What is the utilization
of the network?
14. A sender A and a receiver B communicate using the stop-and-wait protocol studied
in this chapter. There are n links on the path between A and B, each with a data rate
of R bits per second. The size of a data packet is S bits and the size of an ACK is K
bits. Each link has a physical distance of D meters and the speed of signal propagation over each link is c meters per second. The total processing time experienced by a data packet and its ACK is Tp seconds. ACKs traverse the same links as data packets,
except in the opposite direction on each link (the propagation time and data rate are
the same in both directions of a link). There is no queueing delay in this network.
Each link has a packet loss probability of p, with packets being lost independently.
What are the following four quantities in terms of the parameters given?
(a) Transmission time for a data packet on one link between A and B:
.
(b) Propagation time for a data packet across n links between A and B:
.
(c) Round-trip time (RTT) between A and B:
.
(The RTT is defined as the elapsed time between the start of transmission of a
data packet and the completion of receipt of the ACK sent in response to the
data packet’s reception by the receiver.)
(d) Probability that a data packet sent by A will reach B:
.
15. Ben Bitdiddle gets rid of the timestamps from the packet header in this chapter’s
stop-and-wait transport protocol running over a best-effort network. The network
may lose or reorder packets, but it never duplicates a packet. In the protocol, the
receiver sends an ACK for each data packet it receives, echoing the sequence number
of the packet that was just received.
The sender uses the following method to estimate the round-trip time (RTT) of the
connection:
1. When the sender transmits a packet with sequence number k, it stores the time on its machine at which the packet was sent, tk. If the transmission is a retransmission of sequence number k, then tk is updated.
2. When the sender gets an ACK for packet k, if it has not already gotten an ACK
for k so far, it observes the current time on its machine, ak , and measures the
RTT sample as ak − tk .
If the ACK received by the sender at time ak was sent by the receiver in response
to a data packet sent at time tk , then the RTT sample ak − tk is said to be correct.
Otherwise, it is incorrect.
State True or False for the following statements, with an explanation for your choice.
(a) If the sender never retransmits a data packet during a data transfer, then all the
RTT samples produced by Ben’s method are correct.
(b) If data and ACK packets are never reordered in the network, then all the RTT
samples produced by Ben’s method are correct.
(c) If the sender makes no spurious retransmissions during a data transfer (i.e., it
only retransmits a data packet if all previous transmissions of data packets with
the same sequence number did in fact get dropped before reaching the receiver),
then all the RTT samples produced by Ben’s method are correct.
16. Opt E. Miser implements this chapter’s stop-and-wait reliable transport protocol
with one modification: being stingy, he replaces the sequence number field with a
1-bit field, deciding to reuse sequence numbers across data packets. The first data
packet has sequence number 1, the second has number 0, the third has number 1, the
fourth has number 0, and so on. Whenever the receiver gets a packet with sequence
number s(= 0 or 1), it sends an ACK to the sender echoing s. The receiver delivers a
data packet to the application if, and only if, its sequence number is different from the
last one delivered, and upon delivery, updates the last sequence number delivered.
He runs this protocol over a best-effort network that can lose packets (with probability < 1) or reorder them, and whose delays are variable. Explain whether the modified protocol always provides reliable, in-order delivery of a stream of packets.
17. Consider a reliable transport connection using this chapter's sliding window protocol on a network path whose RTT in the absence of queueing is RTTmin = 0.1 seconds.
The connection’s bottleneck link has a rate of C = 100 packets per second, and the
queue in front of the bottleneck link has space for Q = 20 packets.
Assume that the sender uses a sliding window protocol with fixed window size.
There is no other traffic on the path.
(a) If the window size is 8 packets, then what is the throughput of the connection?
(b) If the window size is 16 packets, then what is the throughput of the connection?
(c) What is the smallest window size for which the connection’s RTT exceeds
RTTmin ?
(d) What is the largest value of the sender window size for which no packets are
lost due to a queue overflow?
18. Annette Werker correctly implements the fixed-size sliding window protocol described in this chapter. She instruments the sender to store the time at which each
DATA packet is sent and the time at which each ACK is received. A snippet of the
DATA and ACK traces from an experiment is shown in the picture below. Each + is
a DATA packet transmission, with the x-axis showing the transmission time and the
y-axis showing the sequence number. Each × is an ACK reception, with the x-axis
showing the ACK reception time and the y-axis showing the ACK sequence number.
All DATA packets have the same size.
[Plot: DATA transmissions (+) and ACK receptions (×). The x-axis is time in milliseconds (roughly 1160 to 1400 ms); the y-axis is the DATA or ACK sequence number (roughly 840 to 1080).]
Answer the following questions, providing a brief explanation for each one.
(a) Estimate any one sample round-trip time (RTT) of the connection.
(b) Estimate the sender’s retransmission timeout (RTO) for this trace.
(c) On the picture, circle DATA packet retransmissions for four different sequence numbers.
(d) Some DATA packets in this trace may have incurred more than one retransmission. On the picture, draw a square around one such retransmission.
(e) What is your best estimate of the sender’s window size?
(f) What is your best estimate of the throughput in packets per second of the connection?
(g) Considering only sequence numbers > 880, what is your best estimate of the
packet loss rate experienced by DATA packets?
19. Consider the same setup as the previous problem. Suppose the window size for the
connection is equal to twice the bandwidth-delay product of the network path.
For each change to the parameters of the network path or the sender given below,
explain if the connection’s throughput (not utilization) will increase, decrease, or
remain the same. In each statement, nothing other than what is specified in that
statement changes.
20. Annette Werker conducts tests between a server and a client using the sliding window protocol described in this chapter. There is no other traffic on the path and no packet loss. Annette finds that:
• With a window size W1 = 50 packets, the throughput is 200 packets per second.
• With a window size W2 = 100 packets, the throughput is 250 packets per second.
Annette finds that even this small amount of information allows her to calculate
several things, assuming there is only one bottleneck link. Calculate the following:
(a) The minimum round-trip time between the client and server.
(b) The average queueing delay at the bottleneck when the window size is 100
packets.
(c) The average queue size when the window size is 100 packets.
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.