Transport Layer

Our goals:
❒ understand principles behind transport layer services:
  ❍ multiplexing and demultiplexing
  ❍ reliable data transfer
  ❍ flow control
  ❍ congestion control
❒ learn about transport layer protocols in the Internet:
  ❍ UDP: connectionless transport
  ❍ TCP: connection-oriented transport
  ❍ TCP congestion control

Notes:
1. Multiplexing is to support multiple flows
2. Network can damage pkt, lose pkt, duplicate pkt
3. One of my favorite layers!!
3-1
Chapter 3 outline
❒ 3.1 Transport-layer services
❒ 3.2 Multiplexing and demultiplexing
❒ 3.3 Connectionless transport: UDP
❒ 3.4 Principles of reliable data transfer
❒ 3.5 Connection-oriented transport: TCP
  ❍ segment structure
  ❍ reliable data transfer
  ❍ flow control
  ❍ connection management
❒ 3.6 Principles of congestion control
❒ 3.7 TCP congestion control
3-2
Transport services and protocols
❒ provide logical communication between app processes running on different hosts
❒ transport protocols run in end systems
  ❍ send side: breaks app messages into segments, passes to network layer
  ❍ rcv side: reassembles segments into messages, passes to app layer
  (distributed control)
[Figure: protocol stacks along the path; routers implement only the network, data link, and physical layers, while the end systems also implement the transport layer, providing logical end-end transport.]

Internet transport-layer protocols
❒ reliable, in-order delivery: TCP
  ❍ congestion control
  ❍ flow control
  ❍ connection setup
❒ unreliable, unordered delivery: UDP
  ❍ no-frills extension of "best-effort" IP
❒ services not available:
  ❍ delay guarantees
  ❍ bandwidth guarantees
  (Research issues)
3-5
Multiplexing/demultiplexing
Demultiplexing at rcv host: delivering received segments to the correct socket
Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
[Figure: three hosts (host 1 with process P1; host 2 with P3 and P1, e.g. FTP and telnet; host 3 with P2 and P4); □ = socket, ○ = process]
3-7
How demultiplexing works
❒ host receives IP datagrams
  ❍ each datagram has source IP address, destination IP address
❒ host uses IP addresses & port numbers to direct the segment to the appropriate socket
[Figure: segment format, 32 bits wide, with source port # and dest port # in the first word, followed by other header fields and application data.]
3-8
Connectionless demultiplexing
❒ Create sockets with port numbers:
   DatagramSocket mySocket1 = new DatagramSocket(9111);
   DatagramSocket mySocket2 = new DatagramSocket(9222);
❒ UDP socket identified by two-tuple: (dest IP address, dest port number)
❒ When host receives UDP segment:
  ❍ checks destination port number in segment
  ❍ directs UDP segment to socket with that port number
❒ IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket (this is how a system can serve multiple requests!!)
3-9
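A compact Java sketch of connectionless demultiplexing, in the spirit of the DatagramSocket snippets above; port 6428 is the one used on the next slide, and the loop body is only illustrative:

   import java.net.DatagramPacket;
   import java.net.DatagramSocket;

   // One UDP socket serves many clients: demultiplexing uses only the
   // destination port, so requests from different source IPs / source ports
   // all arrive on this same socket.
   public class UdpDemuxServer {
       public static void main(String[] args) throws Exception {
           DatagramSocket serverSocket = new DatagramSocket(6428);
           byte[] buf = new byte[1024];
           while (true) {
               DatagramPacket pkt = new DatagramPacket(buf, buf.length);
               serverSocket.receive(pkt);   // blocks until a datagram arrives
               // the source address/port identify which client sent this request
               System.out.println("from " + pkt.getAddress() + ":" + pkt.getPort());
           }
       }
   }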
Connectionless demux (cont)
   DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: datagrams from different client processes (P1, P3) on different hosts all arrive at the server's socket; demultiplexing is based on the destination IP and port #.]
3-11
Connection-oriented demux (cont)
[Figure: a Web server with processes P1, P3, P4 handling several TCP connections; the two segments shown both carry SP: 80 but different destination ports (DP: 9157 and DP: 5775), so TCP demultiplexes them to different sockets using the full 4-tuple.]
3-12
UDP: User Datagram Protocol [RFC 768]
3-15
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
❒ treat segment contents as sequence of 16-bit integers
❒ checksum: addition (1's complement sum) of segment contents
❒ sender puts the checksum value into the UDP checksum field

Receiver:
❒ compute checksum of received segment
❒ check if computed checksum equals checksum field value:
  ❍ NO - error detected
  ❍ YES - no error detected (but maybe errors nonetheless?)
3-16
Internet Checksum Example
❒ Note
❍ When adding numbers, a carryout from the
most significant bit needs to be added to the
result
❒ Example: add two 16-bit integers
             1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
             1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum          1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum     0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
3-17
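A small Java sketch of the same computation (the wraparound-and-complement arithmetic shown in this example); the two test words are the ones added above:

   // Internet checksum over 16-bit words: 1's complement sum, then complement.
   public class InternetChecksum {
       static int checksum(int[] words) {            // each entry is a 16-bit word
           int sum = 0;
           for (int w : words) {
               sum += w & 0xFFFF;
               if ((sum & 0x10000) != 0)             // carry out of bit 15:
                   sum = (sum & 0xFFFF) + 1;         // wrap it around
           }
           return (~sum) & 0xFFFF;                   // complement of the sum
       }
       public static void main(String[] args) {
           int[] words = { 0b1110011001100110, 0b1101010101010101 };
           System.out.println(Integer.toBinaryString(checksum(words)));
           // prints 100010001000011, i.e. 0100010001000011 as in the slide
       }
   }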
Principles of Reliable data transfer
❒ important in app., transport, link layers
❒ top-10 list of important networking topics!
[Figure: the service abstraction - a reliable channel between sending and receiving processes - versus the implementation: an rdt send side and receive side running over an unreliable channel. This picture sets the scenario.]
3-22
Rdt2.0: channel with bit errors
❒ underlying channel may flip bits in packet
❍ recall: UDP checksum to detect bit errors
❒ the question: how to recover from errors:
❍ acknowledgements (ACKs): receiver explicitly tells
sender that pkt received OK
❍ negative acknowledgements (NAKs): receiver explicitly
tells sender that pkt had errors
❍ sender retransmits pkt on receipt of NAK
❍ human scenarios using ACKs, NAKs? (ACK: "I love u" / "I love u 2"; NAK: "I love u" / "I don't love u")
❒ new mechanisms in rdt2.0 (beyond rdt1.0):
  ❍ error detection
  ❍ receiver feedback: control msgs (ACK, NAK) rcvr -> sender
3-23
rdt2.0: FSM specification

sender FSM (states: "Wait for call from above", "Wait for ACK or NAK"):
  rdt_send(data):
     sndpkt = make_pkt(data, checksum); udt_send(sndpkt)        [-> Wait for ACK or NAK]
  rdt_rcv(rcvpkt) && isNAK(rcvpkt):
     udt_send(sndpkt)                                           [stay in Wait for ACK or NAK]
  rdt_rcv(rcvpkt) && isACK(rcvpkt):
     Λ                                                          [-> Wait for call from above]

receiver FSM (state: "Wait for call from below"):
  rdt_rcv(rcvpkt) && corrupt(rcvpkt):
     udt_send(NAK)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
     extract(rcvpkt,data); deliver_data(data); udt_send(ACK)

Note: a buffer is needed to store data from the application layer, or the call must block, while the sender waits for the ACK/NAK.
3-24
rdt2.0: operation with no errors
[Same FSMs as above, with the error-free path highlighted: sender sends the pkt; receiver finds it not corrupt, extracts and delivers the data, and returns an ACK; sender moves on to the next call from above.]
3-25
rdt2.0: error scenario
[Same FSMs, with the error path highlighted: receiver detects a corrupt pkt and returns a NAK; sender retransmits on receipt of the NAK. GOT IT?]
3-26
rdt2.0 has a fatal flaw!

What happens if ACK/NAK corrupted?
❒ sender doesn't know what happened at receiver!
❒ can't just retransmit: possible duplicate

What to do?
❒ sender ACKs/NAKs receiver's ACK/NAK? What if sender ACK/NAK lost?
❒ retransmit, but this might cause retransmission of correctly received pkt!

Handling duplicates:
❒ sender adds sequence number to each pkt
❒ sender retransmits current pkt if ACK/NAK garbled
❒ receiver discards (doesn't deliver up) duplicate pkt

stop and wait protocol: sender sends one packet, then waits for receiver response
3-27
rdt2.1: sender, handles garbled ACK/NAKs

sender FSM (four states):
  Wait for call 0 from above:
     rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt)      [-> Wait for ACK or NAK 0]
  Wait for ACK or NAK 0:
     rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt)
     rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ                   [-> Wait for call 1 from above]
  Wait for call 1 from above:
     rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt)      [-> Wait for ACK or NAK 1]
  Wait for ACK or NAK 1:
     rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt)
     rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ                   [-> Wait for call 0 from above]
3-29
rdt2.1: discussion

Sender:
❒ seq # added to pkt
❒ two seq. #'s (0,1) will suffice. Why?
❒ must check if received ACK/NAK corrupted
❒ twice as many states
  ❍ state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:
❒ must check if received packet is duplicate
  ❍ state indicates whether 0 or 1 is expected pkt seq #
❒ note: receiver can not know if its last ACK/NAK was received OK at sender
3-30
rdt2.2: a NAK-free protocol
3-31
rdt2.2: sender, receiver fragments

sender FSM fragment:
  Wait for call 0 from above:
     rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt)      [-> Wait for ACK 0]
  Wait for ACK 0:
     rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ): udt_send(sndpkt)
     rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): Λ

receiver FSM fragment (state "Wait for 0 from below" shown):
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || has_seq1(rcvpkt) ):
     udt_send(sndpkt)                           [resend ACK for last pkt received OK]
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):
     extract(rcvpkt,data); deliver_data(data)
     sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
3-32
rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
  ❍ checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Q: how to deal with loss?
  ❍ sender waits until certain data or ACK lost, then retransmits
  ❍ yuck: drawbacks?

Approach: sender waits a "reasonable" amount of time for ACK
❒ retransmits if no ACK received in this time
❒ if pkt (or ACK) just delayed (not lost):
  ❍ retransmission will be duplicate, but use of seq. #'s already handles this
  ❍ receiver must specify seq # of pkt being ACKed
❒ requires countdown timer

What is the "right value" for the timer? It depends on the flow and network conditions!
3-33
rdt3.0 sender

sender FSM (four states):
  Wait for call 0 from above:
     rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer   [-> Wait for ACK0]
     rdt_rcv(rcvpkt): Λ
  Wait for ACK0:
     timeout: udt_send(sndpkt); start_timer
     rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ): Λ
     rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer                  [-> Wait for call 1 from above]
  Wait for call 1 from above:
     rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer   [-> Wait for ACK1]
     rdt_rcv(rcvpkt): Λ
  Wait for ACK1:
     timeout: udt_send(sndpkt); start_timer
     rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ): Λ
     rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer                  [-> Wait for call 0 from above]
3-34
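A minimal, assumption-laden Java sketch of an rdt3.0-style stop-and-wait sender over UDP: alternating one-byte sequence number, countdown timer via setSoTimeout, retransmit on timeout. The receiver address/port and the ACK format (one byte echoing the seq #) are hypothetical choices, and the checksum is left out; none of these details are prescribed by the slides.

   import java.net.*;

   public class StopAndWaitSender {
       public static void main(String[] args) throws Exception {
           DatagramSocket sock = new DatagramSocket();
           InetAddress rcvr = InetAddress.getByName("localhost");
           int port = 9999;                          // hypothetical receiver port
           sock.setSoTimeout(1000);                  // countdown timer: 1 second
           byte seq = 0;
           for (String msg : new String[] { "hello", "world" }) {
               byte[] data = msg.getBytes();
               byte[] pkt = new byte[data.length + 1];
               pkt[0] = seq;                         // prepend sequence number
               System.arraycopy(data, 0, pkt, 1, data.length);
               sock.send(new DatagramPacket(pkt, pkt.length, rcvr, port));
               boolean acked = false;
               while (!acked) {
                   try {
                       byte[] ackBuf = new byte[1];
                       sock.receive(new DatagramPacket(ackBuf, 1));
                       acked = (ackBuf[0] == seq);   // old/duplicate ACKs are ignored
                   } catch (SocketTimeoutException e) {
                       // timeout: retransmit the same packet and keep waiting
                       sock.send(new DatagramPacket(pkt, pkt.length, rcvr, port));
                   }
               }
               seq ^= 1;                             // alternate 0 / 1
           }
           sock.close();
       }
   }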
rdt3.0 in action
[Timeline figures: no-loss and lost-packet scenarios; the sender's countdown timer ("tick, tick, …") triggers the retransmission.]
3-35
rdt3.0 in action
Is it
necessary
to send
Ack1
again?
3-36
Performance of rdt3.0

   U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027

   (example values: L/R = 8 microsec = 0.008 ms per packet, RTT = 30 ms)
3-37
rdt3.0: stop-and-wait operation
[Timeline: the sender transmits the first packet bit at t = 0 and the last packet bit at t = L/R; the ACK arrives back after RTT + L/R.]

   U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 ≈ 0.00027
3-38
Pipelined protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
  ❍ range of sequence numbers must be increased
  ❍ buffering at sender and/or receiver

Pipelining increases utilization, here by a factor of 3:

   U_sender = 3·(L/R) / (RTT + L/R) = 0.024 / 30.008 ≈ 0.0008
3-40
DON’T FALL ASLEEP !!!!!
3-45
Selective repeat: sender, receiver windows
3-46
Selective repeat

sender:
data from above:
❒ if next available seq # in window, send pkt
timeout(n):
❒ resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N]:
❒ mark pkt n as received
❒ if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver:
pkt n in [rcvbase, rcvbase+N-1]:
❒ send ACK(n)
❒ out-of-order: buffer
❒ in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
❒ ACK(n) (Q: why do we need this? The ACK got lost; the sender may time out and resend pkt n, and we must ACK it so the sender can slide its window.)
otherwise:
❒ ignore
3-47
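A Java sketch of the sender-side bookkeeping implied by these rules; the window size, data structures, and the stubbed send path are illustrative assumptions, not part of the slides.

   import java.util.HashSet;
   import java.util.Set;

   // Selective-repeat sender bookkeeping: per-packet acked flags, window base
   // advances over the smallest unACKed sequence numbers.
   public class SrSender {
       final int N = 4;                        // window size (example)
       int sendBase = 0;                       // oldest unACKed seq #
       int nextSeqNum = 0;
       final Set<Integer> acked = new HashSet<>();

       boolean canSend() { return nextSeqNum < sendBase + N; }

       void onSend() { nextSeqNum++; }         // caller transmits pkt nextSeqNum first

       void onAck(int n) {
           if (n < sendBase || n >= sendBase + N) return;   // outside window: ignore
           acked.add(n);                       // mark pkt n as received
           while (acked.contains(sendBase)) {  // smallest unACKed pkt now ACKed:
               acked.remove(sendBase);
               sendBase++;                     // advance window base
           }
       }

       public static void main(String[] args) {
           SrSender s = new SrSender();
           s.onSend(); s.onSend(); s.onSend(); // pkts 0, 1, 2 sent
           s.onAck(1);                         // out-of-order ACK: base stays at 0
           s.onAck(0);                         // base now advances past 0 and 1
           System.out.println(s.sendBase);     // prints 2
       }
   }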
Selective repeat in action (N=4)
3-48
Selective repeat: dilemma

Practical issue: in real life, we use k bits to implement the seq #.

Example:
❒ seq #'s: 0, 1, 2, 3
❒ window size (N) = 3
❒ receiver sees no difference in the two scenarios!
❒ incorrectly passes duplicate data as new in (a)

Q: what is the relationship between seq # size and window size?
A: N <= 2^k / 2
3-49
Why bother studying reliable data transfer?
❒ We know it is provided by TCP, so why bother to study it?
❒ Sometimes we may need to implement "some form" of reliable transfer without heavy-duty TCP.
❒ A good example is multimedia streaming. Even though the application is loss tolerant, if too many packets get lost the visual quality suffers, so we may want to implement some form of reliable transfer.
❒ At the very least, appreciate the "good services" provided by some Internet gurus.
3-50
TCP: Overview RFCs: 793, 1122, 1323,
2018, 2581 (The 800 lbs gorilla in the transport stack! PAY ATTENTION!!)
3-52
TCP segment structure
[Segment format, 32 bits wide:]
❒ source port #, dest port #
❒ sequence number, acknowledgement number: counting by bytes of data (not segments!)
❒ head len, unused bits, flags:
  ❍ URG: urgent data (generally not used)
  ❍ ACK: ACK # valid
  ❍ PSH: push data now (generally not used)
  ❍ RST, SYN, FIN: connection estab (setup, teardown commands)
❒ Receive window: # bytes rcvr willing to accept
❒ checksum: Internet checksum (as in UDP); Urg data pointer
❒ Options (variable length): due to this field we have a variable-length header
❒ application data (variable length)
3-53
TCP seq. #'s and ACKs

Seq. #'s:
  ❍ byte stream "number" of first byte in segment's data
  ❍ (initial seq. #'s are negotiated during the 3-way handshake)
ACKs:
  ❍ seq # of next byte expected from other side
  ❍ cumulative ACK
Q: how receiver handles out-of-order segments
  ❍ A: TCP spec doesn't say - up to implementor

[Simple telnet scenario: user at Host A types 'C'; A sends Seq=42, ACK=79, data='C'. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data='C'. Host A then ACKs receipt of the echoed 'C': Seq=43, ACK=80.]
3-54
TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
❒ longer than RTT
  ❍ but RTT varies
❒ too short: premature timeout
  ❍ unnecessary retransmissions
❒ too long: slow reaction to segment loss

Q: how to estimate RTT?
❒ SampleRTT: measured time from segment transmission until ACK receipt
  ❍ ignore retransmissions
❒ SampleRTT will vary, want estimated RTT "smoother"
  ❍ average several recent measurements, not just the current SampleRTT
3-55
TCP Round Trip Time and Timeout

   EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT

   (e.g., ERTT(0) = 0; ERTT(1) = (1 - α)·ERTT(0) + α·SRTT(1) = α·SRTT(1))
3-56
Example RTT estimation:
[Plot: RTT (milliseconds, roughly 100-350 ms) vs. time (seconds) for gaia.cs.umass.edu to fantasia.eurecom.fr.]
3-57
TCP Round Trip Time and Timeout

Setting the timeout (Jacobson/Karels):
❒ EstimatedRTT plus "safety margin"
  ❍ large variation in EstimatedRTT -> larger safety margin
❒ first estimate of how much SampleRTT deviates from EstimatedRTT:

   DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
   (typically, β = 0.25)
3-58
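Putting the two estimators together as a Java sketch. The TimeoutInterval = EstimatedRTT + 4·DevRTT rule is the standard Jacobson/Karels setting and is assumed here (that line is cut off in the slide text), and α = 0.125 is the usual choice:

   // RTT estimators from the slides: EWMA of SampleRTT plus a deviation term.
   public class RttEstimator {
       double alpha = 0.125, beta = 0.25;     // typical values
       double estimatedRtt = 0, devRtt = 0;   // ERTT(0) = 0, as on slide 3-56

       void onSample(double sampleRtt) {      // one SampleRTT per non-retransmitted segment
           estimatedRtt = (1 - alpha) * estimatedRtt + alpha * sampleRtt;
           devRtt = (1 - beta) * devRtt + beta * Math.abs(sampleRtt - estimatedRtt);
       }

       double timeoutInterval() {             // EstimatedRTT plus a "safety margin"
           return estimatedRtt + 4 * devRtt;
       }

       public static void main(String[] args) {
           RttEstimator est = new RttEstimator();
           for (double s : new double[] { 100, 120, 110, 300 }) est.onSample(s);
           System.out.println(est.timeoutInterval());
       }
   }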
TCP reliable data transfer
❒ TCP creates rdt service on top of IP's unreliable service
❒ Pipelined segments (for performance)
❒ Cumulative acks
❒ TCP uses single retransmission timer
❒ Retransmissions are triggered by:
  ❍ timeout events
  ❍ duplicate acks (for performance reasons)
❒ Initially consider simplified TCP sender:
  ❍ ignore duplicate acks
  ❍ ignore flow control, congestion control
3-60
TCP sender events:

data rcvd from app:
❒ Create segment with seq #
❒ seq # is byte-stream number of first data byte in segment
❒ start timer if not already running (think of timer as for oldest unacked segment)
❒ expiration interval: TimeOutInterval

timeout:
❒ retransmit segment that caused timeout
❒ restart timer

Ack rcvd:
❒ If it acknowledges previously unacked segments
  ❍ update what is known to be acked
  ❍ start timer if there are outstanding segments
3-61
TCP sender (simplified)

   NextSeqNum = InitialSeqNum
   SendBase = InitialSeqNum

   loop (forever) {
      switch(event)

      event: data received from application above
         create TCP segment with sequence number NextSeqNum
         if (timer currently not running)
            start timer
         pass segment to IP
         NextSeqNum = NextSeqNum + length(data)

      event: timer timeout
         retransmit not-yet-acknowledged segment with
            smallest sequence number
         start timer

      event: ACK received, with ACK field value of y
         if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
               start timer
         }
   }

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
TCP: retransmission scenarios
[Timeline figures:
 (a) lost ACK scenario: Host A sends Seq=92, 8 bytes data; the ACK=100 is lost (X); A times out and retransmits Seq=92, 8 bytes data; B ACKs 100 again; SendBase = 100.
 (b) premature timeout: A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; the timer for Seq=92 expires before ACK=100 arrives, so A retransmits Seq=92; ACK=100 and then ACK=120 arrive, and SendBase advances to 100, then 120.]
3-63
TCP retransmission scenarios (more)
[Timeline figure, cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; the ACK=100 is lost (X), but the cumulative ACK=120 arrives before the timeout and covers both segments, so no retransmission is needed.]
3-64
TCP ACK generation [RFC 1122, RFC 2581]
3-65
Fast Retransmit
❒ Time-out period often relatively long:
  ❍ long delay before resending lost packet
❒ Detect lost segments via duplicate ACKs.
  ❍ Sender often sends many segments back-to-back
  ❍ If segment is lost, there will likely be many duplicate ACKs.
❒ If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  ❍ fast retransmit: resend the segment before the timer expires
3-66
Fast retransmit algorithm:
3-68
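The algorithm itself is not reproduced above, so here is a hedged Java sketch of the duplicate-ACK counting that the previous slide describes (three ACKs for the same data trigger an immediate retransmission); variable names follow the earlier pseudocode, and the send path is stubbed:

   public class FastRetransmit {
       int sendBase = 0;
       int dupAckCount = 0;

       void onAck(int y) {
           if (y > sendBase) {                // new data acked
               sendBase = y;
               dupAckCount = 0;
           } else {                           // another ACK for sendBase: a duplicate
               dupAckCount++;
               if (dupAckCount == 3) {        // fast retransmit, before the timer expires
                   resendSegmentStartingAt(sendBase);
                   dupAckCount = 0;
               }
           }
       }

       void resendSegmentStartingAt(int seq) {
           System.out.println("fast retransmit from " + seq);   // stub for udt_send
       }

       public static void main(String[] args) {
           FastRetransmit fr = new FastRetransmit();
           fr.onAck(100);                                  // new data acked
           fr.onAck(100); fr.onAck(100); fr.onAck(100);    // 3 dup ACKs -> retransmit
       }
   }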
TCP Flow Control
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

❒ receive side of TCP connection has a receive buffer
❒ app process may be slow at reading from buffer
❒ speed-matching service: matching the send rate to the receiving app's drain rate
3-69
TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)
❒ spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
❒ Rcvr advertises spare room by including value of RcvWindow in segments
❒ Sender limits unACKed data to RcvWindow
  ❍ guarantees receive buffer doesn't overflow

This goes to show that the design process of the header is important!!
3-70
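A one-method Java sketch of the advertised-window arithmetic from this slide; the sender then keeps its unACKed data below this value:

   public class ReceiveWindow {
       final long rcvBuffer;                  // total receive buffer size in bytes
       long lastByteRcvd = 0, lastByteRead = 0;

       ReceiveWindow(long rcvBuffer) { this.rcvBuffer = rcvBuffer; }

       long rcvWindow() {                     // spare room advertised to the sender
           return rcvBuffer - (lastByteRcvd - lastByteRead);
       }

       public static void main(String[] args) {
           ReceiveWindow rw = new ReceiveWindow(64 * 1024);
           rw.lastByteRcvd = 50_000;
           rw.lastByteRead = 10_000;
           System.out.println(rw.rcvWindow());   // 65536 - 40000 = 25536 bytes
       }
   }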
TCP Connection Management
(IMPORTANT: SYN FLOODING!!!!)

Recall: TCP sender, receiver establish "connection" before exchanging data segments
❒ initialize TCP variables:
  ❍ seq. #s
  ❍ buffers, flow control info (e.g. RcvWindow)
❒ client: connection initiator
   Socket clientSocket = new Socket("hostname","port number");

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  ❍ specifies initial seq #
  ❍ no data
Step 2: server host receives SYN, replies with SYN-ACK segment
  ❍ server allocates buffers
  ❍ specifies server initial seq. #
Step 3: client receives SYN-ACK, replies with an ACK segment (which may contain data)
3-72
TCP three-way handshake
[Message sequence between client and server:]
1. Connection request: SYN=1, seq=client_isn
2. Connection granted: SYN=1, seq=server_isn, ack=client_isn+1
3. ACK: SYN=0, seq=client_isn+1, ack=server_isn+1
3-73
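In Java, the handshake is hidden inside the socket calls; a minimal sketch (port 6789 is an arbitrary example):

   import java.net.ServerSocket;
   import java.net.Socket;

   // new Socket(...) sends the SYN and blocks until the SYN-ACK / ACK exchange
   // completes; accept() hands the server its end of the established connection.
   public class HandshakeDemo {
       public static void main(String[] args) throws Exception {
           ServerSocket welcomeSocket = new ServerSocket(6789);
           Socket clientSocket = new Socket("localhost", 6789);   // SYN, SYN-ACK, ACK
           Socket connectionSocket = welcomeSocket.accept();      // server side of the connection
           clientSocket.close();                                  // FIN/ACK exchange on close
           connectionSocket.close();
           welcomeSocket.close();
       }
   }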
TCP Connection Management (cont.)

Closing a connection:
client closes socket: clientSocket.close();
Step 1: client sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
  (FIN? The sender may still have some data in the pipeline!)
[Timeline: client sends FIN; server replies with ACK and later its own FIN; client enters timed wait and the connection is eventually closed.]
3-74
TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  ❍ enters "timed wait", then the connection is closed
Step 4: server receives ACK; its connection is closed

Note: with small modification, can handle simultaneous FINs.
3-75
TCP Connection Management (cont)
[State diagrams: TCP client lifecycle and TCP server lifecycle.]
3-76
Principles of Congestion Control
TCP provides one of the MANY WAYS to perform CC.
Congestion:
❒ informally: “too many sources sending too much
data too fast for network to handle”
❒ different from flow control!
❒ manifestations:
❍ lost packets (buffer overflow at routers)
❍ long delays (queueing in router buffers)
❒ another top-10 problem!
3-78
Causes/costs of congestion: scenario 1
❒ two senders, two receivers (homogeneous)
❒ one router, infinite buffers
❒ no retransmission
[Figure: Host A and Host B each send λin (original data) from their applications into a shared router with unlimited output-link buffers of capacity C; λout is delivered to the receiving applications.]
3-79
Causes/costs of congestion: scenario 2
3-80
Causes/costs of congestion: scenario 2
❒ always: λin = λout (goodput)
❒ "perfect" retransmission only when loss: λ'in > λout
❒ retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout

"costs" of congestion:
❒ more work (retrans) for given "goodput"
❒ unneeded retransmissions: link carries multiple copies of pkt
3-81
Causes/costs of congestion: scenario 3
❒ four senders
❒ multihop paths
❒ timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Host A sends λin (original data) and λ'in (original data plus retransmitted data) over a multihop path shared with Host B; λout is the delivered goodput.]
3-82
Causes/costs of congestion: scenario 3
[Figure: λout at the receivers vs. λ'in for the four-sender, multihop scenario.]
3-83
Approaches towards congestion control
3-84
Case study: ATM ABR congestion control
3-85
Case study: ATM ABR congestion control
3-87
TCP Congestion Control
❒ end-end control (no network assistance!!!)
❒ sender limits transmission:
   LastByteSent - LastByteAcked ≤ CongWin
❒ Roughly,
   rate = CongWin / RTT   Bytes/sec
❒ CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
❒ loss event = timeout or 3 duplicate acks
❒ TCP sender reduces rate (CongWin) after loss event

three mechanisms:
  ❍ AIMD
  ❍ slow start
  ❍ conservative after timeout events

Note: CC must be efficient to make use of available BW !!!!
3-88
TCP AIMD
[Sawtooth figure: congestion window vs. time, showing additive increase until loss and multiplicative decrease after loss; y-axis labeled in Kbytes (8 Kbytes, ...). Human analogy: HK government or CUHK !!]
3-90
TCP Slow Start (more)
❒ When connection begins, increase rate exponentially until first loss event:
  ❍ double CongWin every RTT
  ❍ done by incrementing CongWin for every ACK received
❒ Summary: initial rate is slow but ramps up exponentially fast

[Timeline: Host A sends one segment, then two segments, then four segments, doubling each RTT as Host B's ACKs arrive.]
3-91
Refinement
❒ After 3 dup ACKs:
  ❍ CongWin is cut in half
  ❍ window then grows linearly
  ❍ This is known as the "fast-recovery" phase. Implemented in the newer TCP Reno.
❒ But after timeout event:
  ❍ CongWin instead set to 1 MSS;
  ❍ window then grows exponentially to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs is "more alarming"
3-92
Refinement (more)
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
❒ Variable Threshold
❒ At loss event, Threshold is set to 1/2 of CongWin just before loss event

[Plot: congestion window (segments, 0-14) vs. transmission round (1-15) for TCP Tahoe and TCP Reno.]
3-93
Summary: TCP Congestion Control
❒ When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
❒ When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
❒ When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
❒ When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
3-95
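A Java sketch of these CongWin rules (TCP Reno flavor, per the Refinement slides); the MSS value, initial threshold, and per-ACK arithmetic are illustrative assumptions:

   public class CongestionWindow {
       static final double MSS = 1460;        // bytes (example)
       double congWin = MSS;                  // start at 1 MSS (slow start)
       double threshold = 64 * 1024;          // initial ssthresh (example)

       void onNewAck() {
           if (congWin < threshold)
               congWin += MSS;                        // slow start: doubles per RTT
           else
               congWin += MSS * MSS / congWin;        // congestion avoidance: ~1 MSS per RTT
       }

       void onTripleDupAck() {                        // fast retransmit / fast recovery
           threshold = congWin / 2;
           congWin = threshold;                       // cut in half, then grow linearly
       }

       void onTimeout() {                             // "more alarming"
           threshold = congWin / 2;
           congWin = MSS;                             // back to 1 MSS, slow start again
       }

       public static void main(String[] args) {
           CongestionWindow cw = new CongestionWindow();
           for (int i = 0; i < 10; i++) cw.onNewAck();   // slow start: +1 MSS per ACK
           cw.onTripleDupAck();                          // halve
           cw.onTimeout();                               // back to 1 MSS
           System.out.println(cw.congWin / MSS);         // prints 1.0
       }
   }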
TCP throughput
❒ What's the average throughput of TCP as a function of window size and RTT?
  ❍ Ignore slow start
❒ Let W be the window size when loss occurs.
❒ When window is W, throughput is W/RTT
❒ Just after loss, window drops to W/2, throughput to W/2RTT.
❒ Average throughput: 0.75 W/RTT
3-96
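Where the 0.75 W/RTT figure comes from: a one-line sketch, assuming throughput ramps linearly from W/(2·RTT) back up to W/RTT between losses, so the average is the midpoint:

   \text{avg throughput} \approx \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right) = \frac{0.75\,W}{RTT}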
TCP Futures
❒ Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
❒ Requires window size W = 83,333 in-flight segments
❒ Throughput in terms of loss rate:

   throughput = (1.22 · MSS) / (RTT · √L)

❒ ➜ L = 2·10^-10   Wow
❒ New versions of TCP for high-speed needed!
3-97
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have average rate of R/K

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
3-98
Why is TCP fair?
Two competing, homogeneous sessions (e.g., similar propagation delay, etc.):
❒ Additive increase gives slope of 1 as throughput increases
❒ multiplicative decrease decreases throughput proportionally

[Plot: Connection 1 throughput vs. Connection 2 throughput (both up to R), converging toward the equal bandwidth share line.]
3-99
Fairness (more)

Fairness and UDP
❒ Multimedia apps often do not use TCP
  ❍ do not want rate throttled by congestion control
❒ Instead use UDP:
  ❍ pump audio/video at constant rate, tolerate packet loss
❒ Research area: TCP friendly

Fairness and parallel TCP connections
❒ nothing prevents app from opening parallel cnctions between 2 hosts.
❒ Web browsers do this
❒ Example: link of rate R supporting 9 connections;
  ❍ new app asks for 1 TCP, gets rate R/10
  ❍ new app asks for 11 TCPs, gets around R/2!
3-100
Delay modeling

Q: How long does it take to receive an object from a Web server after sending a request?

Delay is influenced by:
❒ TCP connection establishment
❒ data transmission delay
❒ slow start
❒ Sender's congestion window

Notation, assumptions:
❒ Assume one link between client and server of rate R
❒ S: MSS (bits)
❒ O: object size (bits)
❒ no retransmissions (no loss, no corruption)
❒ Protocol's overhead is negligible.

Window size:
❒ First assume: fixed congestion window, W segments
❒ Then dynamic window, modeling slow start
3-101
Fixed congestion window (1)
(the object request is piggybacked; in this case the congestion window imposes no constraint)

Let W denote a 'fixed' congestion window size (a positive integer).

First case:
WS/R > RTT + S/R: the ACK for the first segment in the window returns before a window's worth of data has been sent, so the sender transmits continuously:

   delay = 2RTT + O/R
3-102
Fixed (or static) congestion window (2)

Second case:
❒ WS/R < RTT + S/R: the sender must wait for an ACK after sending each window's worth of data:

   delay = 2RTT + O/R + (K-1)·[S/R + RTT - WS/R], where K = O/(WS) windows cover the object
3-103
TCP Delay Modeling: Slow Start (1)
Now suppose the window grows according to slow start.

   P = min{Q, K - 1}

where K is the number of windows that cover the object, Q is the number of times the server would idle if the object were of infinite size, and P is the number of times the server actually idles.
3-104
TCP Delay Modeling: Slow Start (2)

Delay components:
• 2 RTT for connection establishment and request
• O/R to transmit object
• time server idles due to slow start

Server idles: P = min{K-1, Q} times

Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1, Q} = 2, so the server idles P = 2 times

[Timeline: initiate TCP connection; request object; first window = S/R; RTT; second window = 2S/R; third window = 4S/R; fourth window = 8S/R.]
3-105
TCP Delay Modeling (3)

   S/R + RTT = time from when server starts to send a segment until server receives its acknowledgement

   2^(k-1)·S/R = time to transmit the kth window, where k = 1, 2, …, K

   S/R + RTT - 2^(k-1)·S/R = idle time after the kth window

   delay = O/R + 2RTT + Σ_{p=1}^{P} idleTime_p
         = O/R + 2RTT + Σ_{k=1}^{P} [S/R + RTT - 2^(k-1)·S/R]
         = O/R + 2RTT + P·[RTT + S/R] - (2^P - 1)·S/R

[Timeline: initiate TCP connection; request object; first window = S/R; second window = 2S/R; third window = 4S/R; fourth window = 8S/R; complete transmission; object delivered; time at client, time at server.]
3-106
TCP Delay Modeling (4)

Recall K = number of windows that cover the object.
How do we calculate K?

   K = min{k : 2^0·S + 2^1·S + … + 2^(k-1)·S ≥ O}
     = min{k : 2^0 + 2^1 + … + 2^(k-1) ≥ O/S}
     = min{k : 2^k - 1 ≥ O/S}
     = min{k : k ≥ log2(O/S + 1)}
     = ⌈log2(O/S + 1)⌉

Calculation of Q, the number of idles for an infinite-size object, is similar (see HW).
3-107
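A quick worked check against the slow-start example above (O/S = 15 segments):

   K = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil = \left\lceil \log_2 16 \right\rceil = 4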
Experiment A
S= 536 bytes, RTT=100 msec, O=100 kbytes (relatively large)
1. Slow start adds appreciable delay only when R is high. If R is low, the transmission time dominates: the ACK returns before the window's worth of data has been sent, the server rarely idles, and TCP quickly ramps up to its maximum rate.
3-108
Experiment B
S= 536 bytes, RTT=100 msec, O = 5 kbytes (relatively small)
1. Slow start adds appreciable delay when R is high and for a relatively
small object.
3-109
Experiment C
S= 536 bytes, RTT=1 sec, O = 5 kbytes (relatively small)
3-110
HTTP Modeling
❒ Assume Web page consists of:
❍ 1 base HTML page (of size O bits)
❍ M images (each of size O bits)
❒ Non-persistent HTTP:
❍ M+1 TCP connections in series
❍ Response time = (M+1)O/R + (M+1)2RTT + sum of idle times
❒ Persistent HTTP:
❍ 2 RTT to request and receive base HTML file
❍ 1 RTT to request and receive M images
❍ Response time = (M+1)O/R + 3RTT + sum of idle times
❒ Non-persistent HTTP with X parallel connections
❍ Suppose M/X integer.
❍ 1 TCP connection for base file
❍ M/X sets of parallel connections for images.
❍ Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle
times
3-111
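A small Java sketch that plugs example numbers into these three formulas, ignoring the idle-time terms; the object size, link rate, and RTT are taken from the next slide's setup:

   public class HttpResponseTime {
       public static void main(String[] args) {
           double O = 5 * 8000;      // 5 Kbytes per object, in bits
           double R = 100e3;         // 100 Kbps link
           double RTT = 0.1;         // 100 ms
           int M = 10, X = 5;        // 10 images, 5 parallel connections

           double nonPersistent = (M + 1) * O / R + (M + 1) * 2 * RTT;
           double persistent    = (M + 1) * O / R + 3 * RTT;
           double parallel      = (M + 1) * O / R + ((double) M / X + 1) * 2 * RTT;

           System.out.printf("non-persistent: %.2f s%n", nonPersistent);  // 6.60 s
           System.out.printf("persistent:     %.2f s%n", persistent);     // 4.70 s
           System.out.printf("parallel:       %.2f s%n", parallel);       // 5.00 s
       }
   }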
HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5

[Bar chart: response time (0-20 s) at link rates of 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps, comparing non-persistent, persistent, and parallel non-persistent HTTP.]