Computer Networking Chap3

This document provides an overview of transport layer concepts and protocols. It discusses: 1) The relationship between the transport and network layers, and how transport layer protocols like TCP and UDP build upon the delivery services of the network layer to provide end-to-end communication between application processes. 2) Key concepts like multiplexing, demultiplexing, connectionless transport with UDP, reliable data transfer principles, connection-oriented transport with TCP, and congestion control principles. 3) Details on UDP including its segment structure, checksum calculation, and controversies regarding its lack of reliability and congestion control capabilities. It also discusses how some applications build their own reliability on top of UDP.

Uploaded by

feelif
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PDF, TXT or read online on Scribd
100% found this document useful (6 votes)
10K views

Computer Networking Chap3

This document provides an overview of transport layer concepts and protocols. It discusses: 1) The relationship between the transport and network layers, and how transport layer protocols like TCP and UDP build upon the delivery services of the network layer to provide end-to-end communication between application processes. 2) Key concepts like multiplexing, demultiplexing, connectionless transport with UDP, reliable data transfer principles, connection-oriented transport with TCP, and congestion control principles. 3) Details on UDP including its segment structure, checksum calculation, and controversies regarding its lack of reliability and congestion control capabilities. It also discusses how some applications build their own reliability on top of UDP.

Uploaded by

feelif
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 102

Chap. 3 Transport Layer
‰ Goal : study principles of providing comm services to app processes
and implementation issues in the Internet protocols, TCP and UDP
‰ Contents
z Relationship bw transport and net layers
{ extending net layer’s delivery service to a delivery service bw
two app-layer processes, covering UDP
z Principles of reliable data transfer and TCP
z Principles of congestion control and TCP’s congestion control
3-1
Chap.3 Transport Layer
‰ Introduction and Transport-Layer Services
z Relationship Between Transport and Network Layers

z Overview of the Transport Layer in the Internet

‰ Multiplexing and Demultiplexing


‰ Connectionless Transport: UDP
‰ Principle of Reliable Data Transfer
‰ Connection-Oriented Transport: TCP
‰ Principles of Congestion Control
‰ TCP Congestion Control

3-2
Overview of Transport-layer
‰ provide logical comm bw app
processes running on diff hosts
‰ transport protocols run in end
systems
z sending side: converts msgs from
app process into transport-layer
pkts (segments in Internet term),
passes them to net layer
{ (possibly) breaks app msgs into smaller chunks, and adds headers


z receiving side: processes
segments from net layer, making
them available to app
‰ more than one transport protocol
available to apps
z Internet: TCP and UDP
3-3
Relationship bw Transport and Network layers
‰ transport layer provides logical comm bw processes, whereas net layer
provides logical comm bw hosts
‰ Household analogy
z kids in one household (A) write letters to kids in another household (B)

{ Ann in A and Bill in B collect/distribute mail from/to other kids

z analogies

{ letters in envelopes ~ app messages

{ kids ~ processes

{ houses ~ hosts

{ Ann and Bill ~ transport protocol

Š not involved in delivering mail bw mail centers


Š Susan and Harvey, substituting for Ann and Bill, may provide a diff service
Š services (e.g., delay and bw guarantees) clearly constrained by the
service the postal service provides
Š certain service (e.g., reliable, secure) can be offered even when
postal service doesn’t offer the corresponding service
{ postal service ~ net layer protocol
3-4
Overview of Transport-layer in the Internet
‰ IP (Internet Protocol) provides best-effort delivery service
z makes a “best effort” to deliver segments, but gives no guarantees : no
guarantee of in-order delivery or of the integrity of data in segments
⇒ an unreliable service
‰ User Datagram Protocol (UDP) : provides an unreliable
connectionless service, no-frills extension of IP service
z transport-layer multiplexing and demultiplexing : extend IP’s
host-to-host delivery to process-to-process delivery
z integrity checking by including error-detection fields in segment
header
‰ Transmission Control Protocol (TCP) : provides a reliable
connection-oriented service with several additional services to app
z reliable data transfer : correct and in-order delivery by using
{ flow control and error control (seq #, ack, timers)

z connection setup
z congestion control

3-5
Chap.3 Transport Layer
‰ Introduction and Transport-Layer Services
‰ Multiplexing and Demultiplexing
‰ Connectionless Transport: UDP
‰ Principle of Reliable Data Transfer
‰ Connection-Oriented Transport: TCP
‰ Principles of Congestion Control
‰ TCP Congestion Control

3-6
Multiplexing and Demultiplexing
‰ a process can have one or more sockets; each socket having a unique id
‰ multiplexing at sending host : Ann’s job in household analogy
z gathering data chunks at sources from diff sockets

z encapsulating each chunk with header info to create segments

z passing segments to net layer

‰ demultiplexing at receiving host : Bill’s job in household analogy


z delivering data in a seg to the correct socket

3-7
How Demultiplexing Works
‰ host receives IP datagrams
z each datagram has src and dst IP addrs

{ each datagram carries a transport-layer seg

z each seg has src and dst port #s

{ well-known port #s : reserved for well-known app protocols,
ranging 0 ~ 1023 : HTTP(80), FTP(21), SMTP(25), DNS(53)
{ other #s : can be used for user apps

‰ IP addrs and port #s used to direct seg to appropriate socket

3-8
Connectionless Multiplexing and Demultiplexing
‰ creating UDP socket
DatagramSocket mySocket1 = new DatagramSocket();
{ transport layer automatically assigns to the socket a port # in the
range 1024~65535 that is not currently used by any other UDP port

DatagramSocket mySocket2 = new DatagramSocket(19157);
{ app assigns the specific port # 19157 to the UDP socket
z typically, the port # on the client side is assigned automatically,
whereas the server side assigns a specific port #
‰ When a host receives UDP seg, it
checks dst port # in the seg and
directs the seg to the socket with
that port #
z UDP socket identified by 2-tuple :
(dst IP addr, dst port #)
{ IP datagrams with diff src IP addrs and/or src port #s are
directed to the same socket
z src port # is used as dst port # in the return seg
3-9
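A minimal sketch of this behaviour, in the style of the DatagramSocket examples above: a UDP server bound to port 19157 replies to whatever address and port each datagram came from. The echo behaviour itself is only illustrative, not part of the text.

import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class UdpEchoServer {
    public static void main(String[] args) throws Exception {
        DatagramSocket serverSocket = new DatagramSocket(19157); // server binds a specific port
        byte[] buf = new byte[1024];
        while (true) {
            DatagramPacket request = new DatagramPacket(buf, buf.length);
            serverSocket.receive(request);     // demuxed to this socket by dst port 19157
            // the src IP addr and src port # of the request become the
            // dst IP addr and dst port # of the return segment
            DatagramPacket reply = new DatagramPacket(request.getData(), request.getLength(),
                                                      request.getAddress(), request.getPort());
            serverSocket.send(reply);
        }
    }
}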
Connection-Oriented Mux/Demux (1)
‰ TCP socket identified by 4-tuple
(src IP addr, src port #, dst IP addr, dst port #)
‰ demultiplexing at receiving host
z 4-tuple used to direct seg to appropriate socket

z two TCP segs with diff src IP addrs or src port #s are directed
to two diff sockets (except a TCP seg carrying a conn-
establishment request)
‰ server host may support many simultaneous TCP sockets
z each socket identified by its own 4-tuple

3-10
Connection-Oriented Mux/Demux (2)

3-11
Connection-Oriented Mux/Demux : Threaded Server
‰ today’s high-performance Web servers typically use only one process,
creating a new thread (with a new conn socket) for each new client conn
z connection sockets may be attached to the same process

3-12
Chap.3 Transport Layer
‰ Introduction and Transport-Layer Services
‰ Multiplexing and Demultiplexing
‰ Connectionless Transport: UDP
z UDP Segment Structure

z UDP Checksum

‰ Principle of Reliable Data Transfer


‰ Connection-Oriented Transport: TCP
‰ Principles of Congestion Control
‰ TCP Congestion Control

3-13
User Datagram Protocol (UDP) [RFC 768]
‰ no-frills, bare bones transport protocol : adds nothing to IP but,
z multiplexing/demultiplexing : src and dst port #s

z (light) error checking

‰ features of UDP
z unreliable best-effort service : no guarantee on correct delivery

{ UDP segments may be lost and delivered out of order to app

z connectionless : no handshaking bw UDP sender and receiver

‰ Q: Isn’t TCP always preferable to UDP? A: No


z simple, but suitable to certain apps such as real-time apps

{ delay-sensitive, but tolerant of some data loss

z no conn establishment ⇒ no additional notable delay

z simple ⇒ no conn state : no send/receive buffers,
congestion-control parameters, or seq and ack # parameters
z small pkt header overhead : 8 bytes compared to 20 bytes in TCP

3-14
Popular Internet Apps and Their Protocols

3-15
Controversy on UDP
‰ UDP lacks congestion control and reliable data transfer
‰ when many users start streaming high-bit-rate video, packets
overflow router buffers, resulting in
z high loss rates for UDP packets

z decreased TCP sending rates

⇒ adaptive congestion control, imposed on all sources including UDP
sources, is required, in particular for streaming multimedia apps
‰ build reliability directly into the app (e.g., add acks/rexmissions)
z many of today’s proprietary streaming apps run over UDP, but
build acks and rexmissions into the app in order to reduce pkt loss
z nontrivial, but can avoid xmission-rate constraint imposed by
TCP’s congestion control mechanism

3-16
UDP Segment Structure
‰ Source port #, dst port # : used for multiplexing/demultiplexing
‰ Length : length of UDP seg including header, in bytes
‰ Checksum : to detect errors (i.e., bits altered) on an end-end basis
z error sources : noise in the links, or corruption while stored in a router

{ some link-layer protocol may not provide error checking

3-17
UDP Checksum Calculation (1) : Sender
‰ sum all 16-bit words in the segment, two words at a time, with any
overflow wrapped around
‰ take 1’s complement of the sum; the result is the checksum value
(ex) three 16-bit words 0110011001100000
0101010101010101
1000111100001100

z sum of first two words 0110011001100000


0101010101010101
1011101110110101
z adding third word 1011101110110101
1000111100001100
10100101011000001
1 wrapped around
0100101011000010
checksum value : 1011010100111101 1’s complement
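A small sketch of this calculation on the same three 16-bit words, showing both the sender's checksum and the receiver's check; the carry wrap-around matches the worked example above, and the hard-coded words are only for illustration.

public class UdpChecksum {
    // one's-complement sum of 16-bit words, wrapping overflow back into the low bits
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            if ((sum & 0x10000) != 0)      // carry out of bit 15
                sum = (sum & 0xFFFF) + 1;  // wrap it around
        }
        return sum & 0xFFFF;
    }

    public static void main(String[] args) {
        int[] words = {0b0110011001100000, 0b0101010101010101, 0b1000111100001100};
        int checksum = ~onesComplementSum(words) & 0xFFFF;    // sender: 1's complement of the sum
        System.out.println(Integer.toBinaryString(checksum)); // 1011010100111101
        // receiver: sum over the data words plus checksum is all 1s if no error is detected
        int check = onesComplementSum(new int[]{words[0], words[1], words[2], checksum});
        System.out.println(check == 0xFFFF ? "no error detected" : "error detected");
    }
}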

3-18
UDP Checksum Calculation (2) : Receiver
‰ add all 16-bit words including checksum, and decide
z no error detected, if the result is 1111111111111111

z error detected, otherwise

{ nonetheless the decision is not perfect : an error may actually
have occurred even when no error is detected


‰ UDP is not responsible for recovering from error
z reaction to detecting errors depends on implementations

{ simply discard damaged seg, or

{ pass damaged seg to app with warning

3-19
Chap.3 Transport Layer
‰ Introduction and Transport-Layer Services
‰ Multiplexing and Demultiplexing
‰ Connectionless Transport: UDP
‰ Principle of Reliable Data Transfer
z Building a Reliable Data Transfer Protocol

z Pipelined Reliable Data Transfer Protocol

z Go-Back-N (GBN)

z Selective Repeat (SR)

‰ Connection-Oriented Transport: TCP


‰ Principles of Congestion Control
‰ TCP Congestion Control

3-20
Reliable Data Transfer : Service Model and Implementation
‰ reliable data transfer : no corruption, no loss, and in-order delivery
z of central importance to networking : not only at transport layer,
but also at link layer and app layer

rdt_send() : called from app to pass data to be delivered
deliver_data() : called by rdt to deliver data to app
udt_send() : called by rdt to send pkt over unreliable channel
rdt_rcv() : called from channel upon pkt arrival

3-21
Reliable Data Transfer: Implementation Consideration
‰ characteristics of unreliable channel determines the complexity of
reliable data transfer protocol
‰ We will
z incrementally develop sender and receiver sides of rdt protocol,
considering increasingly complex model of underlying channel
z consider only unidirectional data transfer for simplicity purpose

{ but, control packet is sent back and forth

z use finite state machines (FSM) to specify sender, receiver

(FSM notation) dashed arrow : initial state; transition labels show the
event causing the state transition and the actions taken on the transition;
next state is uniquely determined by the event and the current state;
Λ : no event or no action

3-22
rdt1.0 : Perfectly Reliable Channel
‰ Assumptions of underlying channel
z perfectly reliable : no bit errors, no loss of packets

‰ separate FSMs for sender and receiver


z sender sends data into underlying channel

z receiver reads data from underlying channel

3-23
rdt2.0 : Channel with Errors
‰ New assumptions of underlying channel
z may be corrupted when transmitted, propagated, or buffered

z no loss and in-order delivery

‰ Automatic Repeat reQuest (ARQ) protocols


z error detection : extra bits placed in checksum field

z receiver feedback : ACK/NAK pkt explicitly sent back to sender

{ ACK (positive acknowledgement) : when pkt received OK

{ NAK (negative acknowledgement) : when pkt received in error

z rexmission : sender rexmits pkt on receipt of NAK

3-24
rdt2.0 : not Corrupted

3-25
rdt2.0 : Corrupted

3-26
rdt2.0 : Fatal Flaw
Q: How to recover from errors in ACK or NAK pkts?
z minimally, need to add checksum bits to ACK/NAK pkts

z possible solutions
{ sender/receiver repeatedly re-request whenever an ACK or NAK is
garbled : the re-requests can themselves be garbled, so it is hard to find a way out
{ add enough checksum bits for correction : not applicable for
lost pkt
{ simply resend the pkt when receiving a garbled ACK or NAK ⇒
incurs possible duplicate at receiver
Š receiver doesn’t know whether it is a new pkt or a rexmission
(i.e., a duplicate pkt)
‰ handling duplicates : add a new field (seq # field) to the packet
z sender puts a seq # into this field, and receiver discards
duplicate pkt
z a 1-bit seq # suffices for a stop-and-wait protocol

‰ rdt2.0 is a stop-and-wait protocol : sender sends one pkt, then waits
for the receiver’s response
3-27
Description of sol 1 of Fatal Flaw of rdt2.0

(message-dictation analogy) A dictates something to B; B replies “ok” or
“please repeat”, but the reply arrives corrupted; A asks “What did you say?”;
B has no idea whether “What did you say?” is part of the dictation or a
request for repetition of its last reply
3-28
rdt2.1 : Employing Seq # - Sender

3-29
rdt2.1 : Employing Seq # - Receiver

3-30
rdt2.1 : Discussion
‰ sender
z seq # added to pkt

z two seq #’s (0,1) will suffice

z must check if received ACK/NAK corrupted

z twice as many states

{ state must remember whether current pkt has seq # of 0 or 1

‰ receiver
z must check if received pkt is duplicate

{ state indicates whether 0 or 1 is expected pkt seq #

z receiver cannot know if its last ACK/NAK received OK at sender

3-31
rdt2.2 : NAK-free
‰ accomplish the same effect as a NAK, by sending an ACK for the
last correctly received pkt
z receiver must explicitly include seq # of pkt being ACKed

‰ a sender that receives two ACKs for the same pkt (i.e., duplicate ACKs)
knows that the receiver didn’t correctly receive the pkt following the pkt
being acked twice, and thus rexmits that pkt

3-32
rdt2.2 : NAK-free (Sender)

3-33
rdt2.2 : NAK-free (Receiver)

3-34
rdt3.0 : Channel with Errors and Loss
‰ new assumptions of underlying channels :
z can lose pkts (data or ACKs)

Q : how to detect pkt loss and what to do when pkt loss occurs
z checksum, seq #, ACKs, rexmissions are of help, but not enough

‰ approaches
z sender waits proper amount of time (at least round-trip delay +
processing time at receiver) to convince itself of pkt loss
z rexmits the pkt if ACK not received within this time

z if a pkt (or its ACK) is merely overly delayed, sender may rexmit the
pkt even though it has not been lost
{ but, seq # handles the possibility of duplicate pkts

‰ implementation
z countdown timer, set appropriately, starts each time a pkt is sent

z rexmit pkt when the timer expires

3-35
rdt3.0 : Channel with Errors & Loss (Sender)

3-36
rdt3.0 : Channel with Errors & Loss – Operation (1)

3-37
rdt3.0 : Channel with Errors & Loss – Operation (2)

3-38
Performance of rdt3.0 (Stop-and-Wait Protocol)

‰ assumption : ignore xmission time of ACK pkt (which is extremely small)
and processing time of pkt at the sender and receiver
‰ sender utilization Usender : fraction of time sender is busy sending into ch
ex) 1 Gbps link, 30 ms RTT, 1 KB packet
ttrans = L/R = (8,000 bits/packet)/(10^9 bits/sec) = 0.008 ms
Usender = ttrans/(RTT + ttrans) = 0.008/(30 + 0.008) ≈ 0.00027
very poor!
z net protocol limits the capabilities provided by underlying net HW
3-39
Pipelining
‰ sends multiple pkts without waiting for acks
z range of seq #s is increased

z buffering at sender and/or receiver required

{ sender : pkts that have been xmitted but not yet acked

{ receiver : pkts correctly received

sender is assumed to send 3 pkts before being acked
Usender = 3⋅ttrans/(RTT + ttrans) = 0.024/30.008 ≈ 0.0008 : essentially tripled
‰ two generic forms of pipelined protocols: go-Back-N, selective repeat
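A tiny sketch that reproduces the two utilization numbers above (stop-and-wait, and a window of 3 pipelined pkts) from the same link parameters; nothing beyond the slide's example is assumed.

public class SenderUtilization {
    public static void main(String[] args) {
        double R = 1e9;        // link rate: 1 Gbps
        double rttMs = 30.0;   // round-trip time in ms
        double L = 8000;       // packet size: 1 KB = 8,000 bits
        double ttransMs = L / R * 1000;                         // transmission time = L/R = 0.008 ms
        double uStopWait  = ttransMs / (rttMs + ttransMs);      // stop-and-wait
        double uPipelined = 3 * ttransMs / (rttMs + ttransMs);  // 3 pkts sent per RTT
        System.out.printf("stop-and-wait U = %.5f, pipelined (N=3) U = %.5f%n",
                          uStopWait, uPipelined);               // roughly 0.00027 and 0.00080
    }
}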
3-40
Go-Back-N (GBN) Protocol
‰ sender’s view of seq #s in GBN

z window size N : # of pkts allowed to send without waiting for ACK


{ GBN often referred to as sliding window protocol

z pkt’s seq # : carried in a k-bit field in pkt header


{ range of seq # : [0, 2^k − 1], with modulo-2^k arithmetic

‰ events at GBN sender


z invocation from above : before sending, check if window isn’t full

z receipt of an ACK : cumulative ack - an ack with seq # n indicates that all
pkts with a seq # up to and including n have been correctly received
z timeout : resend all pkts previously xmitted but not yet acked

‰ drawback of GBN : when window size and bw-delay product are large,
a single pkt error can cause a large # of unnecessary rexmissions
3-41
Go-Back-N (GBN) Protocol : Sender

‰ a single timer : for the oldest xmitted but not yet acked pkt
‰ upon receipt of an ACK, if there are
z no outstanding unacked pkts : the timer is stopped
z still xmitted but not yet acked pkts : the timer is restarted
3-42
Go-Back-N (GBN) Protocol : Receiver
‰ when pkt with seq # n is received correctly and in-order, receiver
sends an ACK for pkt n and delivers data portion to upper layer
‰ receiver discards out-of-order pkts and resends an ACK for the
most recently received in-order pkt
z simple receiver buffering : needn’t buffer any out-of-order pkts
z only info needed : seq # of next in-order pkt, expectedseqnum
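A minimal sketch of this receiver rule; deliverData and sendAck are hypothetical placeholders, not a real API, and corruption checking is assumed to happen before onPacket is called.

public class GbnReceiver {
    private int expectedSeqNum = 0;   // seq # of next in-order pkt; the only state GBN needs

    // called for every correctly received (non-corrupt) packet
    void onPacket(int seqNum, byte[] data) {
        if (seqNum == expectedSeqNum) {
            deliverData(data);            // in-order: hand data portion to the upper layer
            sendAck(expectedSeqNum);      // ACK pkt n
            expectedSeqNum++;
        } else if (expectedSeqNum > 0) {
            // out-of-order: discard, and re-ACK the most recently received in-order pkt
            sendAck(expectedSeqNum - 1);
        }
    }

    void deliverData(byte[] data) { /* placeholder: pass data up */ }
    void sendAck(int n)           { /* placeholder: send ACK(n) back to the sender */ }
}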

3-43
Go-Back-N (GBN) Protocol : Operation

window size = 4

3-44
Selective Repeat (SR) Protocol
‰ sender rexmits only pkts for which ACK not received ⇒ avoid unnecessary
rexmission
‰ receiver individually acks correctly received pkts regardless of their order
z out-of-order pkts are buffered until missing pkts are received

3-45
SR Protocol : Sender/Receiver Events and Actions
‰ sender
z data from above : if next available seq # is in window, send pkt

z timeout(n) : resend pkt n, restart timer


{ each pkt has its own (logical) timer

z ACK(n) in [sendbase,sendbase+N]
{ mark pkt n as received

{ if n is equal to send_base, the window base is moved forward to the next
unacked pkt, and unxmitted pkts that now fall within the window are xmitted


‰ receiver
z pkt n in [rcvbase, rcvbase+N-1] correctly received : send ACK(n)
{ if not previously received, it is buffered

{ if n is equal to rcv_base, this pkt and previously buffered in-order pkts

are delivered to upper layer, and receive window moved forward by the
# of pkts delivered to upper layer
z pkt n in [rcvbase-N,rcvbase-1] correctly received
{ an ACK generated even though previously acked

{ if it is not acked, sender’s window may never move forward; for example,
the ack for the send_base pkt in Figure 3.23


z otherwise : ignore 3-46
SR Operation

3-47
Max. Window Size
‰ GBN protocol
z window size N ≤ 2^k − 1 (k : # of bits in seq # field), not 2^k, why?
ex) k=2 ⇒ seq #s : 0, 1, 2, 3; max N = 3
‰ SR protocol
z scenarios
(a) : all acks are lost
Š receiver incorrectly accepts the rexmitted duplicate as new
(b) : all acks received correctly, but pkt 3
is lost
{ receiver can’t distinguish xmission of pkt
0 in (b) from rexmission of pkt 0 in (a)
z further consideration on scenario (a)
{ A rexmits pkt 0; B receives and buffers it
{ B sends a (piggybacked) ack for pkt 2, which was already acked but
whose ack was lost
{ A advances its window to {3, 0, 1}, and sends pkt 3
{ B receives pkt 3, and delivers the buffered old pkt 0 (no
good!) and pkt 3 to the upper layer
z wayout : avoid overlapping of the SR sender and receiver windows
{ N ≤ 2^(k−1), k : # of bits in seq # field
3-48
rdt : Comment on Packet Reordering
‰ since seq #s are reused, old copies of a pkt with a seq/ack # of x
can appear, even though neither sender’s nor receiver’s window
contains x
z use of max pkt lifetime : constrain how long a pkt can live in the net

{ ~ 3 minutes in TCP for high-speed nets

3-49
Summary of rdt Mechanisms

3-50
Chap.3 Transport Layer
‰ Introduction and Transport-Layer Services
‰ Multiplexing and Demultiplexing
‰ Connectionless Transport: UDP
‰ Principle of Reliable Data Transfer
‰ Connection-Oriented Transport: TCP
z TCP Connection

z TCP Segment Structure

z Round-Trip Time Estimation and Timeout

z Reliable Data Transfer

z Flow Control

z TCP Connection Management

‰ Principles of Congestion Control


‰ TCP Congestion Control
3-51
TCP Connection
‰ two processes establish a connection via a 3-way handshake before sending
data, and initialize TCP variables
z full duplex : bi-directional flow bw processes in the same conn
z point-to-point : bw one sender and one receiver
{ multicasting is not possible with TCP

‰ a stream of data passes through a socket into send buffer


z TCP grabs chunks of data from send buffer
z max seg size (MSS) : max amount of app-layer data in seg
{ set based on Path MTU of link-layer

{ typically, 1,460 bytes, 536 bytes, or 512 bytes

z each side of conn has send buffer and receive buffer

3-52
TCP Segment Structure

(figure annotations)
- seq # and ack # : for reliable data xfer; count in bytes, not pkts
- header length : 4-bit field, counting in 32-bit words
- receive window : for flow control, # of bytes receiver is willing to receive
- checksum : for error detection
- options : typically empty; time-stamping, MSS and window-scaling-factor negotiation, etc.

• ACK : indicates value in ack field is valid
• SYN, RST, FIN : used for connection setup and teardown
• PSH : receiver should pass data to upper layer immediately
• URG : indicates there is urgent data in the seg, marked by sending-side upper layer
- urgent data pointer indicates the last byte of urgent data
- generally, PSH and URG are not used
3-53
Seq Numbers and Ack Numbers
‰ seq # : byte-stream # of the 1st byte in the seg; numbering is over the
stream of xmitted bytes, not over the series of xmitted segs
z TCP implicitly numbers each byte in the data stream

z initial seq # is chosen randomly rather than set 0, why?


‰ ack # : seq # of next byte expected from other side
z cumulative ACK

Q : how to handle out-of-order segs at receiver? discard, or buffer while
waiting for missing bytes to fill in the gaps?
z TCP leaves the decision up to the implementation, but the latter is
chosen in practice
3-54
Telnet : Case Study of Seq and Ack Numbers
‰ each ch typed by A is echoed back by B and displayed on A’s screen

ACK piggybacked on B-to-A data seg

explicit ACK with no data

3-55
Estimating Round-Trip Time (RTT)
‰ clearly, TCP timeout value > RTT
Q : How much larger? How to estimate RTT? Each seg exploited in
estimating RTT? …
‰ estimating RTT
z SampleRTT : time measured from seg xmission until ACK receipt

{ measured not for every seg xmitted, but for one of xmitted segs

approximately once every RTT


{ rexmitted segs are not considered in measurements

{ fluctuates from seg to seg : atypical ⇒ needs some sort of avg

‰ Exponential Weighted Moving Average (EWMA) of RTT


z avg several recent measurements, not just current SampleRTT

EstimatedRTT = (1 - α)⋅EstimatedRTT + α⋅SampleRTT


{ recommended value of α : 0.125

z more weight on recent samples than on old samples


z weight of a given sampleRTT decays exponentially fast as updates

proceed
3-56
RTT Samples and RTT Estimates

variations in the Sample RTT are smoothed out in Estimated RTT

3-57
Retransmission Timeout Interval
‰ DevRTT, variation of RTT : an estimate of how much SampleRTT
deviates from EstimatedRTT
DevRTT = (1-β)⋅DevRTT + β⋅|SampleRTT−EstimatedRTT|
z large (or small) when there is a lot of (or little) fluctuation

z recommended value of β : 0.25

‰ TCP’s timeout interval


z should be larger, or unnecessarily rexmit!

z but, if too much larger, TCP wouldn’t quickly rexmit, leading to


large data transfer delay
z thus, timeout interval should be EstimatedRTT plus some safety
margin that varies as a function of fluctuation in SampleRTT
TimeoutInterval = EstimatedRTT + 4⋅DevRTT
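A minimal sketch of these update rules, with the recommended α = 0.125 and β = 0.25 from the slides; the initial EstimatedRTT value is an illustrative assumption, not from the text.

public class RttEstimator {
    private static final double ALPHA = 0.125;  // recommended weight for EstimatedRTT
    private static final double BETA  = 0.25;   // recommended weight for DevRTT

    private double estimatedRtt = 100.0;  // ms; illustrative initial value
    private double devRtt = 0.0;          // ms

    // called once per SampleRTT measurement (never for a rexmitted segment)
    void onSample(double sampleRtt) {
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;          // EWMA
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
    }

    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;  // EstimatedRTT plus a safety margin
    }
}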

3-58
TCP Reliable Data Transfer
‰ reliable data transfer service on top of IP’s unreliable service
z seq # : to identify lost and duplicate segs

z cumulative ack : positive ACK (i.e., NAK-free)

z timer

{ a single rexmission timer is recommended [RFC 2988], even if

there are multiple xmitted but not yet acked segs


{ rexmissions triggered by

Š when timed out


Š 3 duplicate acks at sender : fast rexmit in certain versions

‰ We’ll discuss TCP rdt in two incremental steps


z highly simplified description : only timeouts considered

z more subtle description : duplicate acks as well as timeouts


considered
in both cases, error and flow control are not taken into account

3-59
Simplified TCP Sender

seq # is byte-stream # of the first data byte in seg

TimeoutInterval = EstimatedRTT + 4⋅DevRTT

some not-yet-acked segs are acked


move window forward

3-60
TCP Retransmission Scenarios

(figure : three scenarios - rexmission due to a lost ACK; seg 100 not rexmitted
after a premature timeout; cumulative ack avoids rexmission of the first seg)

3-61
TCP Modifications : Doubling Timeout Interval
‰ at each timeout, TCP rexmits and sets the next timeout interval to
twice the previous value
⇒ timeout intervals grow exponentially after each rexmission
‰ but, for the other events (i.e., data received from app and ACK
received) timeout interval is derived from most recent values of
EstimatedRTT and DevRTT

3-62
TCP ACK Gen Recommendation [RFC 1122, 2581]

‰ timeout period can be relatively long ⇒ may increase e-t-e delay


‰ when sending a large # of segs back to back (such as a large file), if
one seg is lost, there will likely be many back-to-back duplicate ACKs for it

3-63
TCP Modifications : TCP Fast Retransmit
‰ TCP Fast Retransmit : rexmits a (missing) seg before its timer
expiration, if TCP sender receives 3 duplicate ACKs

// event : ACK received, with ACK field value of y
if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acked segs)
        start timer
}
else { // a duplicate ACK for an already-ACKed segment
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) // TCP fast retransmit
        resend seg with seq # y
}

3-64
Is TCP Go-Back-N or Selective Repeat?
‰ similarity of TCP with Go-Back-N
z TCP : cumulative ack for the last correctly received, in-order seg
z correctly received but out-of-order segs are not individually acked
⇒ TCP sender need only maintain SendBase and NextSeqNum
‰ differences bw TCP and Go-Back-N : many TCP implementations
z buffer correctly received but out-of-order segs rather than discard them

z also, suppose a seq of segs 1, 2, …, N are received correctly in order,
ACK(n), n < N, gets lost, and the remaining acks arrive at sender before
their respective timeouts
{ TCP rexmits at most one seg, i.e., seg n, instead of pkts, n, n+1, …, N

{ TCP wouldn’t even rexmit seg n if ACK(n+1) arrived before timeout for

seg n
‰ a modification to TCP in [RFC 2018] : selective acknowledgement
z TCP receiver acks out-of-order segs selectively rather than cumulatively

z when combined with selective rexmission - skipping segs selectively


acked by receiver – TCP looks a lot like generic SR protocol
‰ Thus, TCP can be categorized as a hybrid of GBN and SR protocols
3-65
Flow Control : Goal
‰ receiving app may not read data from rcv buffer as quickly as the
sender sends it
z it may be busy with some other task

z it may be relatively slow at reading data, so the sender can overflow
the receiver’s buffer by sending too much data too quickly
‰ flow control : a speed-matching service, matching sending rate
against reading rate of receiving app
z goal : eliminate possibility of sender overflowing receiver buffer

(note) to make the discussion simple, TCP receiver is assumed to


discard out-of-order segs

3-66
Flow Control : How It Works?
RcvBuffer : size of buffer space allocated to a conn
RcvWindow : amount of free buffer space in rcv’s buffer
initial value of RcvWindow = RcvBuffer

LastByteRcvd, LastByteRead : variables at receiver


LastByteSent, LastByteAcked : variables at sender

‰ at receiver
z not to overflow : LastByteRcvd – LastByteRead ≤ RcvBuffer
LastByteRcvd – LastByteRead : # of bytes received not yet read
z RcvWindow advertising : RcvWindow placed in receive window field in
every seg sent to sender
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
‰ at sender : limits unacked # of bytes to RcvWindow
z LastByteSent – LastByteAcked ≤ RcvWindow

LastByteSent – LastByteAcked : # of bytes sent but not yet acked
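A minimal sketch of this bookkeeping; the variable names follow the slide, the buffer size is an illustrative assumption, and maySend is a hypothetical helper rather than real TCP code.

public class FlowControl {
    // receiver side
    long rcvBuffer = 64 * 1024;     // illustrative buffer size in bytes
    long lastByteRcvd = 0, lastByteRead = 0;

    // advertised in the receive-window field of every segment sent back to the sender
    long rcvWindow() {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    // sender side: may send only while the amount of unacked data stays within RcvWindow
    long lastByteSent = 0, lastByteAcked = 0;

    boolean maySend(long bytes, long advertisedRcvWindow) {
        return (lastByteSent - lastByteAcked) + bytes <= advertisedRcvWindow;
    }
}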

3-67
Flow Control : Avoiding Sender Blocking
‰ suppose A is sending to B, B’s rcv buffer becomes full so that
RcvWindow = 0, and after advertising RcvWindow = 0 to A, B has
nothing to send to A
z note that TCP at B sends a seg only if it has data or ack to send

{ there is no way for B to inform A of some space having opened

up in B’s rcv buffer ⇒ A is blocked, and can’t xmit any more!


z wayout : A continues to send segs with one data byte when
RcvWindow = 0; these segs will be acked
{ eventually, the buffer will begin to empty and ack will contain

a nonzero RcvWindow value

3-68
TCP Connection Management : Establishment
‰ 3-way handshake
1. client sends SYN seg to server
{ contains no app data
{ randomly selects client initial seq #
2. server replies with SYNACK seg
{ server allocates buffers and variables to the connection
{ contains no app data
{ randomly selects server initial seq #
3. client replies with ACK seg
{ client allocates buffers and variables to the connection
{ may contain data
(figure : SYN segment, SYNACK segment, ACK segment exchanged bw client and server)
3-69
TCP Connection Management : Termination

‰ either the client or the server can end the TCP connection
‰ duration of TIME_WAIT period :
implementation dependent
z typically, 30 secs, 1 min, 2 mins

‰ RST seg : seg with RST flag set to 1


z sent when receiving a TCP seg
whose dst port # or src IP addr
doesn’t match any ongoing conn

3-70
TCP State Transition : Client
Socket clientSocket = new Socket("hostname","port#");

3-71
TCP State Transition : Server
ServerSocket welcomeSocket = new ServerSocket(port#)

Socket connectionSocket = welcomeSocket.accept(); 3-72


Chap.3 Transport Layer
‰ Introduction and Transport-Layer Services
‰ Multiplexing and Demultiplexing
‰ Connectionless Transport: UDP
‰ Principle of Reliable Data Transfer
‰ Connection-Oriented Transport: TCP
‰ Principles of Congestion Control
z The Causes and the Costs of Congestion

z Approaches to Congestion Control

z Network-Assisted Congestion-Control Example for ATM ABR

‰ TCP Congestion Control

3-73
Preliminary of Congestion Control
‰ pkt loss (at least, perceived by sender) results from overflowing of
router buffers as the net becomes congested
z rexmission treats a symptom, but not the cause, of net
congestion
‰ cause of net congestion : too many sources attempting to send data
at too high a rate
z basic idea of wayout : throttle senders in face of net congestion

z what’s different from flow control?

‰ ranked high in the top-10 list of networking problems

3-74
Causes and Costs of Congestion : Scenario 1
‰ assumptions
z no error control, flow control, and congestion control
z hosts A and B each send data at an avg rate of λin bytes/sec
z share a router with outgoing link capacity of R and infinite buffer space
z ignore additional header info (transport-layer and lower-layer)

cost of congested net : avg delay


grows unboundedly large as arrival
rate nears link capacity
3-75
Causes and Costs of Congestion : Scenario 2 (1)
‰ assumptions
z one router with finite buffer space

z each host with same λin, retransmitting dropped packets

3-76
Causes and Costs of Congestion : Scenario 2 (2)

(figure : throughput under cases a, b, and c)


‰ case a (unrealistic) : host A can somehow determine if router
buffer is free, and send a pkt when buffer is free
z no loss, thus no rexmission ⇒ λ’in= λin

‰ case b : a pkt is known for certain to be dropped


z R/3 : original data, R/6 : rexmitted data

z cost of congested net : sender must rexmit dropped pkt

‰ case c : premature timeout for each pkt ⇒ rexmit each pkt twice
z cost of congested net : unneeded rexmissions waste link bw
3-77
Causes and Costs of Congestion : Scenario 3
‰ assumptions
z 4 routers, each with finite buffer space and link capacity of R
z each of 4 hosts has same λin, rexmits over 2-hop paths

• consider A→C conn


• a pkt dropped at R2 (due to high λin from B) wastes the work done by R1

cost of congested net : a pkt dropped at some point wastes the
xmission capacity used up to that point
3-78
Two Broad Approaches to Congestion Control
‰ end-end congestion control
z no explicit support (by feedback) from net layer

z congestion inferred by end-system based on observed net


behavior, e.g., pkt loss and delay
z approach taken by TCP

{ congestion is inferred by TCP seg loss indicated by timeout or

triple duplicate acks


‰ network-assisted congestion control
z routers provide explicit feedback to end systems regarding
congestion state in the net
z single bit indication

{ SNA, DECnet, TCP/IP ECN [RFC 2481], ATM ABR congestion control
z explicit rate : the rate router can support on its outgoing link

3-79
Two Types of Feedback of Congestion Info
‰ direct feedback : from a router to the sender by using choke pkt
‰ feedback via receiver
z router mark/update a field in a pkt flowing forward to indicate
congestion
z upon receipt of the pkt, receiver notifies sender of congestion

3-80
ATM ABR Congestion Control
‰ Asynchronous Transfer Mode (ATM)
z a virtual-circuit switching architecture

z info delivered in fixed size cell of 53 bytes

z each switch on src-to-dst path maintains per-VC state

‰ Available Bit Rate (ABR) : an elastic service


z if net underloaded, use as much of the available bandwidth as possible

z if net congested, sender rate is throttled to predetermined min


guaranteed rate
‰ Resource Management (RM) cells
z interspersed with data cells, conveying congestion-related info

{ rate of RM cell interspersion : tunable parameter

Š default value : one every 32 data cells


z provides both feedback-via-receiver and direct feedback

{ sent by src flowing thru switches to dst, and back to src

{ switch possibly generate RM cell itself, and send directly to src

3-81
Mechanisms of Congestion Indication in ATM ABR

‰ Explicit Forward Congestion Indication (EFCI) bit


z EFCI bit in a data cell is set to 1 at congested switch
z if a data cell preceding RM cell has EFCI set, dst sets CI bit of RM cell,
and sends it back to src
‰ CI (Congestion Indication) and NI (No Increase) bits
z set by congested switch, NI/CI bit for mild/severe congestion
z dst sends the RM cell back to src with CI and NI bits intact

‰ Explicit Rate (ER) : two-byte field in RM cell


z congested switch may lower ER value in a passing RM cell
z when returned back to src, it contains the max supportable rate on the path
3-82
Chap.3 Transport Layer
‰ Introduction and Transport-Layer Services
‰ Multiplexing and Demultiplexing
‰ Connectionless Transport : UDP
‰ Principle of Reliable Data Transfer
‰ Connection-Oriented Transport : TCP
‰ Principles of Congestion Control
‰ TCP Congestion Control
z Fairness

z TCP Delay Modeling

3-83
Preliminary of TCP Congestion Control (1)
‰ basic idea of TCP congestion control : limit sending rate based on
the network congestion perceived by sender
z increase/reduce sending rate when sender perceives little/heavy
congestion along the path bw itself and dst
‰ to keep the description concrete, sending a large file is assumed
‰ How does sender limit sending rate?
LastByteSent - LastByteAcked ≤ min{CongWin, RcvWindow} (1)
z CongWin : a variable limiting sending rate due to perceived congestion
z henceforth, RcvWindow constraint ignored in order to focus on
congestion control
z (1) limits the amount of unacked data, thus the sending rate
{ consider a conn for which loss and xmission delay are negligible;
then, sending rate ≈ CongWin / RTT
3-84
Preliminary of TCP Congestion Control (2)
‰ How does sender perceive congestion on path bw itself and dst?
z a timeout or the receipt of three duplicate ACKs

‰ TCP is self-clocking : acks are used to trigger the increase of its cong
window size, and thus of the sending rate
z consider an optimistic, congestion-free case, in which acks are taken as
an indication that segs are successfully delivered to dst
z if acks arrive at a slow/high rate, cong window is increased more
slowly/quickly
‰ How to regulate sending rate as a function of perceived congestion?
z TCP congestion control algorithms, consisting of 3 components

{ additive-increase, multiplicative-decrease (AIMD)

Š AIMD is a big-picture description; details are more complicated


{ slow start

{ reaction to timeout events

3-85
Additive-Increase, Mulitplicative-Decrease
‰ multiplicative decrease : cut CongWin in half down to 1 MSS when
detecting a loss
‰ additive increase: increase CongWin by 1 MSS every RTT until a loss
detected (i.e., when perceiving e-t-e path is congestion-free)
z commonly, accomplished by increasing CongWin by MSS⋅(MSS/CongWin)
bytes for each receipt of new ack
ex) MSS=1,460 bytes, CongWin=14,600 bytes ⇒ 10 segs sent within RTT
Š an ACK for a seg increases CongWin by 1/10⋅MSS, thus after ack for all 10
segs (thus, for one RTT) CongWin is increased by MSS
z congestion avoidance : linear increase phase of TCP cong control
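A minimal sketch of this window arithmetic: roughly one MSS of additive increase per RTT, halving on a loss detected via triple duplicate ACKs (TCP Reno behaviour). Slow start, timeouts, and the RcvWindow constraint are omitted, and the starting window is an illustrative assumption.

public class AimdWindow {
    static final double MSS = 1460;     // bytes
    double congWin = 10 * MSS;          // illustrative starting window, already past slow start

    // additive increase : one new (non-duplicate) ACK arrives
    void onNewAck() {
        congWin += MSS * (MSS / congWin);   // adds up to ~1 MSS per RTT
    }

    // multiplicative decrease : loss detected via 3 duplicate ACKs
    void onTripleDupAck() {
        congWin = Math.max(congWin / 2, MSS);   // halve, but not below 1 MSS
    }
}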

saw-toothed pattern of CongWin

3-86
TCP Slow Start
‰ When a TCP conn begins, CongWin is typically
initialized to 1 MSS ⇒ initial rate ≈ MSS/RTT
ex) MSS = 500 bytes, RTT = 200 msec ⇒ initial
sending rate : only about 20 kbps
z linear increase at init. phase results in a
waste of bw, considering available bw may be
>> MSS/RTT
z desirable to quickly ramp up to some
respectable rate
‰ slow start (SS) : during initial phase, increase
sending rate exponentially fast by doubling
CongWin every RTT until a loss occurs
z achieved by increasing CongWin by 1 MSS for each ack received

3-87
Reaction to Congestion
Q: When does CongWin switch from exponential increase to linear increase?
A: when CongWin reaches Threshold
z Threshold : a variable set to a half of CongWin just before a loss
{ initially set large, typically 65 Kbytes, so that it has no initial effect

{ maintained until the next loss

‰ TCP Tahoe, early version of TCP


z CongWin is cut to 1 MSS both for a timeout and for 3 duplicate acks
{ Jacobson’s algorithm [Jacobson 1988]

‰ TCP Reno [RFC2581, Stevens ’94] : reaction to loss depends on loss type
z for 3 duplicate acks receipt : CongWin is cut in half, then grows linearly
z for a timeout event : CongWin is set to 1 MSS (SS phase), then grows
exponentially to a Threshold, then grows linearly (CA phase)
z idea : 3 dup acks indicate the net is still capable of delivering some segs
{ TCP Reno cancels SS phase after a triple duplicate ack : fast recovery

‰ many variations of TCP Reno [RFC 3782, RFC 2018]


z TCP Vegas [Brakmo 1995]
{ idea : early warning - detect congestion in routers before pkt loss occurs

{ when this imminent pkt loss, predicted by observing RTT, is detected,

CongWin is lowered linearly; the longer the RTT, the greater the congestion
3-88
TCP Congestion Control Algorithms

• initial value of Threshold = 8 MSS


• triple duplicate acks just after 8th round

3-89
TCP Reno Congestion Control Algorithm
‰ [RFC 2581, Stevens 1994]

3-90
Steady-State Behavior of a TCP Connection
‰ Consider a highly simplified macroscopic model for steady-state
behavior of TCP
z SS phases ignored since they are typically very short

z Letting W be the window size when a loss event occurs, RTT and
W are assumed to be approximately constant during a conn
Q : What’s avg throughput of a long-lived TCP conn as a function of
window size and RTT?
A : avg throughput of a TCP connection = 0.75⋅W / RTT        (2)
z a pkt is dropped when the rate increases to W/RTT

z then the rate is cut in half and linearly increases by MSS/RTT


every RTT until it again reaches W/RTT
z this process repeats over and over again

3-91
TCP Futures
‰ TCP congestion control has evolved over the years and continue to evolve
z [RFC 2581] : a summary as of the late 1990s

z [Floyd 2001] : some recent developments

z traditional scheme is not necessarily good for today’s HTTP-dominated
Internet or for future Internet services with high bandwidth-delay products
ex) Consider a high-speed TCP conn with 1,500-byte segments, 100 ms RTT, and
suppose we want to achieve 10 Gbps throughput through this conn
z to meet this, from (2) the required window size is
W = (RTT/0.75)⋅tput = (0.1 sec/0.75)⋅(10^10 bits/sec)⋅(1 seg/(1,500 × 8 bits)) = 10^7/90 ≈ 111,111 segs
z this is a lot of segs, so there is a high possibility of errors, leading us
to derive a relationship bw throughput and loss rate L [prob. P39]
avg throughput of a TCP conn = 1.22⋅MSS / (RTT⋅√L)
⇒ L = 2⋅10^-10, i.e., one loss for every 5⋅10^9 segs : unattainably low!
⇒ new versions of TCP required for high-speed environments [RFC 3649, Jin 2004]
3-92
TCP Fairness (1)
‰ suppose K TCP conns pass through a bottleneck link bw of R, with each conn
sending a large file
⇒ avg xmission rate of each conn is approximately R/K
‰ TCP congestion control is fair : each conn gets an equal share of
bottleneck link’s bw among competing TCP conns
‰ consider a link of R shared by two TCP conn, with idealized assumptions
z same MSS and RTT, sending a large amount of data, operating in CA
mode (AIMD) at all times, i.e., ignore SS phase

3-93
TCP Fairness (2)

‰ bw realized by the two conns fluctuates along the equal bw share line,
regardless of their initial rates
‰ in practice, RTT value differs from conn to conn
z conns with a smaller RTT grab the available bw more quickly (i.e., open
their cong window faster), thus get higher throughput than those conns
with larger RTTs
(figure : two conns alternating loss events and CA phases, converging toward
the ideal operating point on the equal bw share line)

3-94
Some other Fairness Issues
‰ Fairness and UDP
z multimedia apps, e.g., Internet phone and video conferencing do
not want their rate throttled even if net is congested
z thus they run over UDP rather than TCP, pumping audio/video at a
const rate and occasionally losing pkts rather than reducing the rate
when congested ⇒ UDP sources may crowd out TCP traffic
z research issue : TCP-friendly cong control

{ goal : make UDP traffic behave fairly, thus preventing the Internet
from being flooded
‰ Fairness and parallel TCP connections
z a session can open multiple parallel TCP conn’s bw C/S, thus gets
a large portion of bw in a congested link
{ a Web browser to xfer multiple objects in a page

ex) a link of rate R supporting 9 ongoing C/S apps


{ a new app, asking for 1 TCP conn, gets an equal share of R/10

{ a new app, asking for 11 TCP conns, gets an unfair rate of R/2

3-95
TCP Delay Modeling
‰ We’d compute the time for TCP to send an object for some simple models
z latency : defined as the time from when a client initiate a TCP conn until
the time at which it receives the requested object
‰ assumptions : made in order not to obscure the central issues
z simple one-link net of rate R bps
z amount of data sender can xmit is limited solely by cong window
z pkts are neither lost nor corrupted, thus no rexmission
z all protocol header overheads : ignored
z object consists of an integer # of MSSs
{ O : object size [bits], S : seg size [bits] (e.g., 536 bytes = 4,288 bits)

z xmission time for segs including control info : ignored


z initial threshold of TCP cong control scheme is so large as not to be
attained by cong window
‰ without cong window constraint : the latency is 2⋅RTT+O/R
z clearly, the SS procedure and dynamic cong window increase this minimal latency

3-96
Static Congestion Window (1)
‰ W : a positive integer, denoting a
fixed-size static congestion window
z upon receipt of rqst, server
immediately sends W segs back to
back to client, then one seg for
each ack from client
(figure : W = 4)

‰ 1st case : WS/R > RTT+S/R


z ack for 1st seg in 1st window
received before sending 1st
window’s worth of segs
z server xmit segs continuously until
entire object is xmitted
z thus, the latency is
2⋅RTT+O/R

3-97
Static Congestion Window (2)
‰ 2nd case : WS/R < RTT+S/R
z ack for 1st seg in 1st window received
after sending 1st window’s worth of segs
‰ latency = setup time + time for xmitting
object + sum of times in idle state
z let K : # of windows covering object
K = ⌈O/(WS)⌉, i.e., O/(WS) rounded up if it is not an integer
(figure : W = 2)
z # of times being in idle state = K-1

z duration of each idle period : S/R + RTT − WS/R
z thus, the latency is
2⋅RTT+O/R+(K-1)[S/R+RTT-WS/R]+
where [x]+ = max(x,0)
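A small sketch evaluating this second-case latency formula; O, S, R, RTT, and W follow the model above, and the numbers in main are illustrative, not taken from the text.

public class StaticWindowLatency {
    // latency = 2*RTT + O/R + (K-1) * [S/R + RTT - W*S/R]^+
    static double latency(double O, double S, double R, double rtt, int W) {
        double K = Math.ceil(O / (W * S));                   // # of windows covering the object
        double idle = Math.max(S / R + rtt - W * S / R, 0);  // [x]+ = max(x, 0)
        return 2 * rtt + O / R + (K - 1) * idle;
    }

    public static void main(String[] args) {
        // illustrative numbers: 100 KB object, 536-byte segs, 1 Mbps link, 100 ms RTT, W = 4
        System.out.println(latency(800e3, 536 * 8, 1e6, 0.1, 4) + " sec");
    }
}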

3-98
Dynamic Congestion Window (1)
‰ cong window grows according to slow start, i.e., doubled every RTT
(figure example : O/S = 15, K = 4, Q = 2, P = min{Q, K−1} = 2)
z O/S : # of segs in the object
z # of segs in kth window : 2^(k−1)
z K : # of windows covering object
K = min{ k : 2^0 + 2^1 + … + 2^(k−1) ≥ O/S }
  = min{ k : 2^k − 1 ≥ O/S }
  = min{ k : k ≥ log2(O/S + 1) }
  = ⌈ log2(O/S + 1) ⌉
z xmission time of kth window = (S/R)⋅2^(k−1)
z duration in idle state after kth window = [S/R + RTT − 2^(k−1)⋅(S/R)]+
3-99
Dynamic Congestion Window (2)
‰ latency = setup time + time for xmitting object + Σ times in idle state

latency = 2⋅RTT + O/R + Σ_{k=1..K−1} [S/R + RTT − 2^(k−1)⋅S/R]+        (3)

z Q : # of times server would be idle if the object were of infinite size

Q = max{ k : S/R + RTT − 2^(k−1)⋅S/R ≥ 0 }
  = max{ k : 2^(k−1) ≤ 1 + RTT/(S/R) }
  = max{ k : k ≤ log2(1 + RTT/(S/R)) + 1 }
  = ⌊ log2(1 + RTT/(S/R)) ⌋ + 1

‰ actual # of times server is idle is P = min{Q, K−1}, then (3) becomes

latency = 2⋅RTT + O/R + Σ_{k=1..P} [S/R + RTT − 2^(k−1)⋅S/R]
        = 2⋅RTT + O/R + P⋅[S/R + RTT] − (2^P − 1)⋅S/R                   (4)
3-100
Dynamic Congestion Window (3)
‰ comparing TCP latency of (4) with the minimal latency

latency / minimal latency
= 1 + { P⋅[(S/R)/RTT + 1] − (2^P − 1)⋅(S/R)/RTT } / [ 2 + (O/R)/RTT ]
= 1 + { P + ((S/R)/RTT)⋅[P + 1 − 2^P] } / [ 2 + (O/R)/RTT ]
≤ 1 + P / [ 2 + (O/R)/RTT ]

latency contributed by slow start


z slow start significantly increases latency when object size is
relatively small (implicitly, high xmission rate) and RTT is
relatively large
{ this is often the case with the Web

‰ See the examples in the text

3-101
HTTP Modeling
Assume Web page consists of
z 1 base HTML page (of size O bits)

z M images (each of size O bits)

‰ non-persistent HTTP
z M+1 TCP conns in series

z response time = 2⋅(M+1)RTT + (M+1)O/R + sum of idle times

‰ persistent HTTP
z 2 RTT to request and receive base HTML file

z 1 RTT to request and receive M images

z response time = 3⋅RTT + (M+1)O/R + sum of idle times

‰ non-persistent HTTP with X parallel conns


z suppose M/X is integer

z 1 TCP conn for base file

z M/X sets of parallel conns for images

z response time = 2⋅(M/X + 1)RTT + (M+1)O/R + sum of idle times
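A small sketch comparing the three response-time expressions above, with the sum-of-idle-times terms omitted for simplicity; the page parameters (object size, link rate, RTT, M, X) are illustrative assumptions.

public class HttpResponseTime {
    public static void main(String[] args) {
        double rtt = 0.1;        // 100 ms RTT
        double O = 100e3 * 8;    // each object 100 KB, in bits
        double R = 1.5e6;        // 1.5 Mbps link
        int M = 10, X = 5;       // 10 images, 5 parallel connections

        double nonPersistent = 2 * (M + 1) * rtt + (M + 1) * O / R;
        double persistent    = 3 * rtt + (M + 1) * O / R;
        double parallel      = 2 * ((double) M / X + 1) * rtt + (M + 1) * O / R;
        System.out.printf("non-persistent %.2f s, persistent %.2f s, parallel %.2f s%n",
                          nonPersistent, persistent, parallel);
        // idle-time terms are omitted in all three expressions
    }
}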

3-102
