Chapter3 2021

Chapter 3 covers the transport layer of the Internet protocol stack, focusing on services such as multiplexing, reliable data transfer, flow control, and congestion control. It discusses transport layer protocols, specifically UDP for connectionless transport and TCP for connection-oriented reliable transport, along with their functionalities and differences. The chapter also outlines the principles of reliable data transfer and the mechanisms for demultiplexing and error detection in transport protocols.

Uploaded by

notus.cameron

Chapter 3 Transport Layer

Layering in Internet protocol stack
Applications (Application layer)
… built on ...
Reliable (or unreliable) transport (Transport layer)
… built on ...
Best-effort global packet delivery (Network layer)
… built on ...
Best-effort local packet delivery (Link layer)
… built on ...
Physical transfer of bits (Physical layer)
Modified from Scott Shenker (UC Berkeley): The Future of Networking, and the
Chapter 3: Our Goals
• understand principles behind transport layer services:
  • multiplexing, de-multiplexing
  • reliable data transfer
  • flow control
  • congestion control
• Learn about transport layer protocols:
  • UDP: connectionless transport
  • TCP: connection-oriented reliable transport
  • TCP congestion control
Chapter 3: Outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  • segment structure
  • reliable data transfer
  • flow control
  • connection management
3.6 Principles of congestion control
3.7 TCP congestion control
Upper and Lower Layers
• Application, Transport: Process-to-Process
• Network, Link, Physical: Host-to-Host
Transport services and protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
  • send side: breaks app messages into segments, passes to network layer
  • rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
  • Internet: TCP and UDP
[Figure: logical end-end transport between the application / transport / network / data link / physical stacks of two end systems]
Transport vs. network layer
• network layer: logical communication between hosts
• transport layer: logical communication between processes
  • relies on, enhances, network layer services
household analogy:
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
• hosts = houses
• processes = kids
• app messages = letters in envelopes
• transport protocol = Ann and Bill who demux to in-house siblings
• network-layer protocol = postal service
Internet transport-layer protocols
• reliable, in-order delivery (TCP)
  • congestion control
  • flow control
  • connection setup
• unreliable, unordered delivery: UDP
  • no-frills extension of “best-effort” IP
• services not available:
  • delay guarantees
  • bandwidth guarantees
[Figure: logical end-end transport between two end systems across routers that implement only the network, data link and physical layers]
Multiplexing/demultiplexing
multiplexing at sender: handle data from multiple sockets, add transport header (later used for demultiplexing)
demultiplexing at receiver: use header info to deliver received segments to correct socket
[Figure: processes P1 to P4 attached through sockets to the transport layer, above the network, link and physical layers of each host]
How demultiplexing works
• host receives IP datagrams
  • each datagram has source IP address, destination IP address
  • each datagram carries one transport-layer segment
  • each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket
TCP/UDP segment format (32 bits wide): source port #, dest port #; other header fields; application data (payload)
Connectionless demultiplexing
• recall: create and bind a socket:
sockfd = socket(AF_INET, SOCK_DGRAM, 0); // UDP socket; protocol 0 selects the default for SOCK_DGRAM
serv_addr.sin_port = htons(portno); // local port # is specified
bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

Note: both destination IP address and destination port # are used

• when host receives UDP segment:
  • checks destination port # in segment
  • directs UDP segment to socket with that port #
• IP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at dest
Connectionless demux: example
server (P1):
DatagramSocket serverSocket = new DatagramSocket(6428);
client (P3):
DatagramSocket mySocket2 = new DatagramSocket(9157);
client (P4):
DatagramSocket mySocket1 = new DatagramSocket(5775);

segments exchanged:
• P3 to P1: source port: 9157, dest port: 6428
• P1 to P3: source port: 6428, dest port: 9157
• P4 to P1: source port: 5775, dest port: 6428
• P1 to P4: source port: 6428, dest port: 5775
Connection-oriented demux
• TCP socket identified by 4-tuple:
  • source IP address
  • source port number
  • dest IP address
  • dest port number
• demux: receiver uses all four values to direct segment to appropriate socket
• server host may support many simultaneous TCP sockets:
  • each socket identified by its own 4-tuple
• web servers have different sockets for each connecting client
  • non-persistent HTTP will have different socket for each request
Connection-oriented demux: example
• host: IP address A; host: IP address C (client processes P1, P2, P3)
• server: IP address B (processes P4, P5, P6)
segments:
• B to A: source IP,port: B,80; dest IP,port: A,9157
• A to B: source IP,port: A,9157; dest IP,port: B,80
• C to B: source IP,port: C,5775; dest IP,port: B,80
• C to B: source IP,port: C,5776; dest IP,port: B,80

three segments, all destined to IP address: B, dest port: 80 are demultiplexed to different sockets
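The difference between the two demultiplexing rules can be sketched with two lookup tables. This is a toy model, not a real network stack: the function names, the socket labels, and the assignment of P4, P5, P6 to particular clients are assumptions made for illustration. Only the addresses and ports come from the example.

```python
from typing import Dict, Tuple

UdpKey = Tuple[str, int]             # (dest IP, dest port)
TcpKey = Tuple[str, int, str, int]   # (source IP, source port, dest IP, dest port)


def udp_demux(table: Dict[UdpKey, str], src_ip: str, src_port: int,
              dst_ip: str, dst_port: int) -> str:
    """UDP picks the socket from the destination fields only."""
    return table[(dst_ip, dst_port)]


def tcp_demux(table: Dict[TcpKey, str], src_ip: str, src_port: int,
              dst_ip: str, dst_port: int) -> str:
    """TCP picks the socket from all four values of the 4-tuple."""
    return table[(src_ip, src_port, dst_ip, dst_port)]


# Hypothetical socket labels; which process serves which client is assumed.
udp_table = {("B", 80): "udp_socket"}
tcp_table = {
    ("A", 9157, "B", 80): "P4",
    ("C", 5775, "B", 80): "P5",
    ("C", 5776, "B", 80): "P6",
}

# The three segments from the example, all destined to B, port 80.
segments = [("A", 9157, "B", 80), ("C", 5775, "B", 80), ("C", 5776, "B", 80)]
udp_sockets = {udp_demux(udp_table, *s) for s in segments}
tcp_sockets = {tcp_demux(tcp_table, *s) for s in segments}
print(len(udp_sockets), "UDP socket;", len(tcp_sockets), "TCP sockets")
```

With UDP all three segments reach one socket on the server, while TCP needs one socket per connection.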
Question: What if UDP
How many sockets on the server side?
• host: IP address A; server: IP address B; host: IP address C (processes P1, P2, P3)
segments:
• B to A: source IP,port: B,80; dest IP,port: A,9157
• A to B: source IP,port: A,9157; dest IP,port: B,80
• C to B: source IP,port: C,5775; dest IP,port: B,80
• C to B: source IP,port: C,5776; dest IP,port: B,80
Question: why is the dest IP not checked?
• UDP example: only the dest port # is checked
• TCP example: the dest port # plus the source IP and source port # are checked
• Correctness of the destination IP address is ensured by the network layer: a datagram with a different destination IP is not delivered to this node
UDP: User Datagram Protocol [RFC 768]
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
  • lost
  • delivered out-of-order to app
• connectionless:
  • no handshaking between UDP sender, receiver
  • each UDP segment handled independently of others
• UDP usage:
  • streaming multimedia apps (loss tolerant, rate sensitive)
  • DNS
  • SNMP
• reliable transfer over UDP?
  • add reliability at application layer
  • application-specific error recovery!
UDP: segment header
UDP segment format (32 bits wide): source port #, dest port #; length, checksum; application data (payload)
• length: length, in bytes, of UDP segment, including header
why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small header size
• no congestion control: UDP can blast away as fast as desired
UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
sender:
• treat segment contents, including header fields, as sequence of 16-bit integers
• checksum: addition (one’s complement sum) of segment contents
• sender puts checksum value into UDP checksum field
receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  • NO - error detected
  • YES - no error detected. But maybe errors nonetheless? More later ….
Internet checksum: example
example: add two 16-bit integers
                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result
After-class practice: UDP checksum
 1st: 0110
 2nd: 0101
 3rd: 1000
 Calculate UDP checksum of 1st + 2nd + 3rd
 sum = 10011, -> 0011 + 1 (carryout) = 0100
 checksum = 1s complement = 1011
 Check: receiving 1011?
 Check: receiving 1001?
 Errors if receiving 1011??
 See the notes for this slide for answers
25
Principles of reliable data transfer
• important in application, transport, link layers
• Reliable transport of packets
  • A single sender and a single receiver
• Packet delivery imperfect
  • With bit errors, dropping packets, out-of-order delivery, duplicate copies, long delay, ….
[Figure: logical end-end reliable transport between sender (packets in queue/buffer) and receiver (packets received); packet delivery misbehaviors: errors, loss]

Principles of reliable data transfer
• important in application, transport, link layers
  • top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
send side:
• rdt_send(): called from above, (e.g., by app.). Passed data to deliver to receiver upper layer
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
receive side:
• rdt_rcv(): called when packet arrives on rcv-side of channel
• deliver_data(): called by rdt to deliver data to upper layer
Reliable data transfer: getting started
we’ll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  • but control info will flow on both directions!
• use finite state machines (FSM) to specify sender, receiver
FSM notation: each arrow from state 1 to state 2 is labeled with the event causing state transition and, below it, the actions taken on state transition
state: when in this “state” next state uniquely determined by next event
rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
  • no bit errors
  • no loss of packets
• separate FSMs for sender, receiver:
  • sender sends data into underlying channel
  • receiver reads data from underlying channel
sender (state: Wait for call from above)
• rdt_send(data): packet = make_pkt(data); udt_send(packet)
receiver (state: Wait for call from below)
• rdt_rcv(packet): extract (packet,data); deliver_data(data)
“Stop and Wait” Scenario
 Simple setting: one packet at a time (stop and wait)
 One sender, one receiver
 the sender has infinite number of packets to transfer to the
receiver
 the sender starts one-packet transmission at a time, and
will not proceed with the next new packet transmission
until the current packet has been successfully received &
acknowledged by the receiver.

sender receiver

One packet in transit


packets received
packets in the buffer

3-34
“Stop and Wait” Scenario
• We progressively consider more complex cases
  • Bit errors
  • Packet loss
  • Duplicate copies of packets
  • Long delay (thus also out of order)
  • ….
• Designs: rdt2.0 (initial) to rdt3.0 (stop & wait)
[Figure: sender (packets in the buffer) and receiver (packets received); packet delivery misbehaviors: errors, loss]

rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
  • How to detect bit errors?
• question: how to recover from errors?
  • acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  • negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  • sender retransmits pkt on receipt of NAK
• How do humans recover from “errors” during conversation?
• new mechanisms in rdt2.0 (beyond rdt1.0):
  • error detection
  • receiver feedback: control msgs (ACK,NAK) rcvr->sender
rdt2.0: channel with bit errors
 How to detect bit errors in packet?
 Internet checksum algorithm
 How to recover from errors?
 acknowledgements (ACKs): receiver explicitly tells
sender that pkt received OK
 negative acknowledgements (NAKs): receiver explicitly
tells sender that pkt had errors
 sender retransmits packet upon receiving NAK

 new mechanisms in rdt2.0 (beyond rdt1.0):


1) Error detection at receiver
2) Feedback from receiver: control messages (ACK,NAK)
from receiver to sender
3) Retransmission at the sender upon NAK feedback
3-37
rdt2.0 in action
(a) no error
1. sender: send pkt0
2. receiver: rcv pkt0, send ack
3. sender: rcv ack, send pkt1
4. receiver: rcv pkt1, send ack
5. sender: rcv ack, send pkt2
6. receiver: rcv pkt2, send ack
(b) packet with bit errors
1. sender: send pkt0
2. receiver: rcv pkt0, send ack
3. sender: rcv ack, send pkt1 (errors in transit)
4. receiver: rcv garbled pkt1, drop pkt1, send NACK
5. sender: rcv nack, resend pkt1
6. receiver: rcv pkt1, send ack
7. sender: rcv ack1, send pkt2
8. receiver: rcv pkt2, send ack
rdt2.0: FSM specification
sender (states: Wait for call from above, Wait for ACK or NAK)
• Wait for call from above, rdt_send(data):
  sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK
• Wait for ACK or NAK, rdt_rcv(rcvpkt) && isNAK(rcvpkt):
  udt_send(sndpkt); stay in Wait for ACK or NAK
• Wait for ACK or NAK, rdt_rcv(rcvpkt) && isACK(rcvpkt):
  Λ (no action); go to Wait for call from above
receiver (state: Wait for call from below)
• rdt_rcv(rcvpkt) && corrupt(rcvpkt):
  udt_send(NAK)
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
  extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
rdt2.0: operation with no errors
1. sender, rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK
2. receiver, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
3. sender, rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ; go to Wait for call from above

rdt2.0: error scenario
1. sender, rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK
2. receiver, rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
3. sender, rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay in Wait for ACK or NAK
4. receiver, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
5. sender, rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ; go to Wait for call from above
rdt2.0 has a fatal flaw!
what happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver!
• can’t just retransmit: possible duplicate
handling duplicates:
• sender retransmits current pkt if ACK/NAK corrupted
• sender adds sequence number to each pkt
• receiver discards (doesn’t deliver up) duplicate pkt
stop and wait: sender sends one packet, then waits for receiver response
rdt2.0’s flaw: garbled ACK/NACK
(a) Corrupted ack
1. sender: send pkt0
2. receiver: rcv pkt0, send ack
3. sender: rcv ack, send pkt1
4. receiver: rcv pkt1, send ack (ack corrupted in transit)
5. sender: send pkt2 or resend pkt1? how to know?
(b) Corrupted NACK
1. sender: send pkt0
2. receiver: rcv pkt0, send ack
3. sender: rcv ack, send pkt1 (errors in transit)
4. receiver: rcv garbled pkt1, send NACK (nack corrupted in transit)
5. sender: send pkt2 or resend pkt1? how to know?

Simply retransmitting upon corrupted ACK/NACK is not sufficient
Sender cannot tell whether it is corrupted ACK or NACK!
rdt2.1: need seq #!
(a) Corrupted ack
1. sender: send pkt0
2. receiver: rcv pkt0, send ack
3. sender: rcv ack, send pkt1
4. receiver: rcv pkt1, send ack (ack corrupted in transit)
5. sender: rcv garbled, resend pkt1
6. receiver: rcv dup pkt1, drop dup pkt1, send ack
7. sender: rcv ack, send pkt2
8. receiver: rcv pkt2, send ack
(b) Corrupted NACK
1. sender: send pkt0
2. receiver: rcv pkt0, send ack
3. sender: rcv ack, send pkt1 (errors in transit)
4. receiver: rcv garbled pkt1, drop pkt 1, send NACK (nack corrupted in transit)
5. sender: rcv garbled, resend pkt1
6. receiver: rcv pkt1, send ack
7. sender: rcv ack, send pkt2
8. receiver: rcv pkt2, send ack
rdt2.1: sender, handles garbled ACK/NAKs
• Wait for call 0 from above, rdt_send(data):
  sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK 0
• Wait for ACK or NAK 0, rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ):
  udt_send(sndpkt); stay
• Wait for ACK or NAK 0, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt):
  Λ; go to Wait for call 1 from above
• Wait for call 1 from above, rdt_send(data):
  sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to Wait for ACK or NAK 1
• Wait for ACK or NAK 1, rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ):
  udt_send(sndpkt); stay
• Wait for ACK or NAK 1, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt):
  Λ; go to Wait for call 0 from above
rdt2.1: receiver, handles garbled ACK/NAKs
• Wait for 0 from below, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt):
  extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to Wait for 1 from below
• Wait for 0 from below, rdt_rcv(rcvpkt) && corrupt(rcvpkt):
  sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
• Wait for 0 from below, rdt_rcv(rcvpkt) && not corrupt(rcvpkt) && has_seq1(rcvpkt):
  sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay
• Wait for 1 from below, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):
  extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to Wait for 0 from below
• Wait for 1 from below, rdt_rcv(rcvpkt) && corrupt(rcvpkt):
  sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay
• Wait for 1 from below, rdt_rcv(rcvpkt) && not corrupt(rcvpkt) && has_seq0(rcvpkt):
  sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay
Summary: reliable data transfer
Version | Channel              | Mechanism
rdt1.0  | Reliable channel     | nothing
rdt2.0  | bit errors (no loss) | (1) error detection via checksum; (2) receiver feedback (ACK/NAK); (3) retransmission upon NAK
rdt2.1  | Same as 2.0          | handling fatal flaw with rdt 2.0: (4) need seq # for each packet
Rdt2.1 discussion
 Rdt2.1 mechanisms
1. Error detection (checksum)
2. Feedback (ACK and NAK)
3. Retransmission
4. Seq number (fresh or duplicate packets)
 Q1: How many bits are needed for seq#?
 two seq. #’s (0,1) will suffice. Why?
 Under various scenarios to send 3 packets:
(1) all ACK, no error, (2) ACK 0 (1st time)
corrupted, (3) NAK 0 (1st time) corrupted, (4)
ACK 0 and ACK 1, both corrupted for the first
time
 Q2: Do we still need NAK? If not, how? 3-50
Rdt2.1 discussion: how many bits for a seq number?
sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  • state must “remember” whether “expected” pkt should have seq # of 0 or 1
receiver:
• must check if received packet is duplicate
  • state indicates whether 0 or 1 is expected pkt seq #
• Note: receiver can not know if its last ACK/NAK is received OK at sender or not
rdt2.1: 1-bit seq # is enough!
(a) no error
1. sender: send pkt0
2. receiver: rcv pkt0, send ack
3. sender: rcv ack, send pkt1
4. receiver: rcv pkt1, send ack
5. sender: rcv ack1, send pkt0 (new pkt!)
6. receiver: rcv pkt0, send ack
(b) packet with bit errors
1. sender: send pkt0
2. receiver: rcv pkt0, send ack
3. sender: rcv ack, send pkt1 (errors in transit)
4. receiver: rcv garbled pkt1, drop pkt1, send NACK
5. sender: rcv NACK, resend pkt1
6. receiver: rcv pkt1, send ack
7. sender: rcv ack, send pkt0 (new pkt!)
8. receiver: rcv pkt0, send ack
rdt2.2: a NAK-free protocol
 same functionality as rdt2.1, using ACKs
only

 instead of NAK, receiver sends ACK for last


pkt received OK
 receiver must explicitly include seq # of pkt
being ACKed

 duplicate ACK at sender results in same


action as NAK: retransmit current pkt
3-53
rdt2.2: NAK-free
(a) Corrupted ack
1. sender: send pkt0
2. receiver: rcv pkt0, send ack0
3. sender: rcv ack0, send pkt1
4. receiver: rcv pkt1, send ack1 (ack1 corrupted in transit)
5. sender: rcv garbled, resend pkt1
6. receiver: rcv pkt1 (dup), drop dup pkt1, send ack1
7. sender: rcv ack1, send pkt0
8. receiver: rcv pkt0, send ack0
(b) dup ack for garbled pkt
1. sender: send pkt0
2. receiver: rcv pkt0, send ack0
3. sender: rcv ack0, send pkt1 (errors in transit)
4. receiver: rcv garbled pkt1, drop pkt 1, send ack0
5. sender: rcv dup ack0, resend pkt1
6. receiver: rcv pkt1, send ack1
7. sender: rcv ack1, send pkt0
8. receiver: rcv pkt0, send ack0
rdt2.2: sender, receiver fragments
sender FSM fragment
• Wait for call 0 from above, rdt_send(data):
  sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to Wait for ACK 0
• Wait for ACK 0, rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ):
  udt_send(sndpkt); stay
• Wait for ACK 0, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0):
  Λ; leave Wait for ACK 0
receiver FSM fragment
• Wait for 0 from below, rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)):
  udt_send(sndpkt); stay
• transition into Wait for 0 from below, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):
  extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
Summary: reliable data transfer
Version | Channel              | Mechanism
rdt1.0  | Reliable channel     | nothing
rdt2.0  | bit errors (no loss) | (1) error detection via checksum; (2) receiver feedback (ACK/NAK); (3) retransmission upon NAK
rdt2.1  | Same as 2.0          | (4) seq# (1 bit, 0/1) for each pkt (fatal flaw)
rdt2.2  | Same as 2.0          | A variant to rdt2.1 (no NAK); Duplicate ACK = NAK
rdt3.0: channels with errors and loss
new assumption: underlying channel can also lose packets (data, ACKs)
• checksum, seq. #, ACKs, retransmissions will be of help … but not enough
approach: sender waits “reasonable” amount of time for ACK (timer)
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
  • retransmission will be duplicate, but seq. #’s already handles this
  • receiver must specify seq # of pkt being ACKed
• requires countdown timer
rdt3.0 sender
• Wait for call 0 from above, rdt_send(data):
  sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to Wait for ACK0
• Wait for call 0 from above, rdt_rcv(rcvpkt):
  Λ
• Wait for ACK0, rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ):
  Λ
• Wait for ACK0, timeout:
  udt_send(sndpkt); start_timer
• Wait for ACK0, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0):
  stop_timer; go to Wait for call 1 from above
• Wait for call 1 from above, rdt_send(data):
  sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to Wait for ACK1
• Wait for call 1 from above, rdt_rcv(rcvpkt):
  Λ
• Wait for ACK1, rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ):
  Λ
• Wait for ACK1, timeout:
  udt_send(sndpkt); start_timer
• Wait for ACK1, rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1):
  stop_timer; go to Wait for call 0 from above
rdt3.0 in action
(a) no loss
1. sender: send pkt0
2. receiver: rcv pkt0, send ack0
3. sender: rcv ack0, send pkt1
4. receiver: rcv pkt1, send ack1
5. sender: rcv ack1, send pkt0
6. receiver: rcv pkt0, send ack0
(b) packet loss
1. sender: send pkt0
2. receiver: rcv pkt0, send ack0
3. sender: rcv ack0, send pkt1 (X loss)
4. sender: timeout, resend pkt1
5. receiver: rcv pkt1, send ack1
6. sender: rcv ack1, send pkt0
7. receiver: rcv pkt0, send ack0
rdt3.0 in action
(c) ACK loss
1. sender: send pkt0
2. receiver: rcv pkt0, send ack0
3. sender: rcv ack0, send pkt1
4. receiver: rcv pkt1, send ack1 (X loss)
5. sender: timeout, resend pkt1
6. receiver: rcv pkt1 (detect duplicate), send ack1
7. sender: rcv ack1, send pkt0
8. receiver: rcv pkt0, send ack0
(d) premature timeout/ delayed ACK
1. sender: send pkt0
2. receiver: rcv pkt0, send ack0
3. sender: rcv ack0, send pkt1
4. receiver: rcv pkt1, send ack1 (ack1 delayed)
5. sender: timeout, resend pkt1
6. receiver: rcv pkt1 (detect duplicate), send ack1
7. sender: rcv ack1, send pkt0
8. receiver: rcv pkt0, send ack0
9. sender: rcv ack1, send pkt0
10. receiver: rcv pkt0 (detect duplicate), send ack0
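The packet-loss and ACK-loss scenarios can be replayed with a small simulation. This is a minimal sketch under stated assumptions: the function `run_stop_and_wait` and its `drop_data` / `drop_ack` parameters are invented for illustration, the channel only loses packets (no bit errors, no premature timeouts), and a timeout is modeled as an immediate retry rather than a real timer.

```python
def run_stop_and_wait(messages, drop_data=(), drop_ack=()):
    """Alternating-bit sender and receiver over a channel with scripted losses.

    drop_data / drop_ack hold the 0-based indices of the data transmissions
    and ACK transmissions that the channel loses.
    Returns (delivered messages, event log).
    """
    delivered, log = [], []
    expected = 0              # receiver state: seq # it is waiting for
    data_sent = acks_sent = 0
    for i, msg in enumerate(messages):
        seq = i % 2           # 1-bit sequence number
        while True:
            log.append(f"send pkt{seq}")          # (re)transmit, start timer
            lost = data_sent in drop_data
            data_sent += 1
            if lost:
                log.append("timeout")
                continue
            if seq == expected:                   # new packet: deliver once
                delivered.append(msg)
                expected ^= 1
            # duplicate packets are not delivered again, but are still ACKed
            ack_lost = acks_sent in drop_ack
            acks_sent += 1
            if ack_lost:
                log.append("timeout")
                continue
            log.append(f"rcv ack{seq}")           # ACK carries the seq # ACKed
            break
    return delivered, log


# Lose the first copy of pkt1, then lose the ACK for its retransmission.
delivered, log = run_stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={1})
print(delivered)
print(log)
```

Every message is delivered exactly once even though pkt1 crosses the channel three times, which is what the sequence number and the timer together are meant to guarantee.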
Summary: reliable data transfer
Version | Channel              | Mechanism
rdt1.0  | Reliable channel     | nothing
rdt2.0  | bit errors (no loss) | (1) error detection via checksum; (2) receiver feedback (ACK/NAK); (3) retransmission upon NAK
rdt2.1  | Same as 2.0          | (4) seq# (1 bit) for each pkt
rdt2.2  | Same as 2.0          | A variant to rdt2.1 (no NAK); Unexpected ACK = NAK; ACK0 = ACK for pkt0, NAK for pkt1
Rdt3.0  | Bit errors + loss    | (5) retransmission upon timeout; No NAK, only ACK
Performance of rdt3.0
• rdt3.0 is correct, but performance stinks
• e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
$D_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^{9}\ \text{bits/sec}} = 8\ \text{microsecs}$
• U sender: utilization, the fraction of time sender busy sending (times in msec):
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
• if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput over 1 Gbps link
• network protocol limits use of physical resources!
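The utilization figures can be reproduced numerically. This is a small sketch: the function name and the `packets_in_flight` parameter are assumptions for illustration, and the result is capped at 1.0 because a sender cannot be busy more than all of the time.

```python
def sender_utilization(packet_bits, link_bps, rtt_s, packets_in_flight=1):
    """Fraction of time the sender is busy: (n * L/R) / (RTT + L/R)."""
    d_trans = packet_bits / link_bps          # L / R, in seconds
    return min(1.0, packets_in_flight * d_trans / (rtt_s + d_trans))


# 1 Gbps link, RTT = 30 msec, 8000-bit packets (values from the example).
u_stop_and_wait = sender_utilization(8000, 1e9, 0.030)
u_three_in_flight = sender_utilization(8000, 1e9, 0.030, packets_in_flight=3)

# One 1000-byte packet per (RTT + L/R) seconds.
throughput_bytes_per_s = (8000 / 8) / (0.030 + 8000 / 1e9)

print(f"{u_stop_and_wait:.5f} {u_three_in_flight:.5f} "
      f"{throughput_bytes_per_s / 1000:.1f} kB/s")
```

Allowing three packets in flight triples the utilization, yet the link stays almost entirely idle, which motivates much larger windows.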
rdt3.0: stop-and-wait operation
sender:
• first packet bit transmitted, t = 0
• last packet bit transmitted, t = L / R
• ACK arrives, send next packet, t = RTT + L / R
receiver:
• first packet bit arrives
• last packet bit arrives, send ACK
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$
Pipelined protocols
pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
• range of sequence numbers must be increased
• buffering at sender and/or receiver
• two generic forms of pipelined protocols: go-Back-N, selective repeat
Pipelining: increased utilization
sender:
• first packet bit transmitted, t = 0
• last bit transmitted, t = L / R
• ACK arrives, send next packet, t = RTT + L / R
receiver:
• first packet bit arrives
• last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK
• last bit of 3rd packet arrives, send ACK
3-packet pipelining increases utilization by a factor of 3!
$U_{sender} = \frac{3L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$
Pipelined protocols: overview
Go-back-N:
• sender can have up to N unacked packets in pipeline
• receiver only sends cumulative ack
  • doesn’t ack packet if there’s a gap
• sender has timer for oldest unacked packet
  • when timer expires, retransmit all unacked packets
Selective Repeat:
• sender can have up to N unack’ed packets in pipeline
• rcvr sends individual ack for each packet
• sender maintains timer for each unacked packet
  • when timer expires, retransmit only that unacked packet
Go-Back-N: sender
 k-bit seq # in pkt header
 “window” of up to N, consecutive unack’ed pkts allowed

 ACK(n): ACKs all pkts up to, including seq # n - “cumulative


ACK”
 may receive duplicate ACKs (see receiver)
 timer for oldest in-flight pkt
 timeout(n): retransmit packet n and all higher seq # pkts in
window
3-67
GBN: Sender
 sender can have up to N unacked packets in
pipeline
 if data from the above AND nextseqnum <
base+N,
 Send(packet)
 Nextseqnum++
 Start timer (for one oldest unacked packet)
 If Timeout, retransmit all unacked packets
 [base, nextseqnum-1]
 If receiving ACK, update base
 base = getacknum(rcvpkt)+1
 If base==nextseqnum, stop timer
 If receiving corrupted ACK, do nothing
68
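The sender rules listed above can be sketched as a small class. This is not a complete implementation: the class name, the `send` callback, and the boolean `timer_running` flag are hypothetical, sequence numbers grow without wrapping around, and ACKs below `base` are explicitly ignored as duplicates (an added guard, not part of the listed rules).

```python
class GBNSender:
    """Go-Back-N sender window logic with a single timer flag."""

    def __init__(self, window_size, send):
        self.N = window_size
        self.send = send              # hypothetical callback: send(seq, data)
        self.base = 1                 # oldest unacked seq #
        self.nextseqnum = 1           # next seq # to use
        self.buffer = {}              # seq # -> data of unacked packets
        self.timer_running = False

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False              # window full: refuse data
        self.buffer[self.nextseqnum] = data
        self.send(self.nextseqnum, data)
        if self.base == self.nextseqnum:
            self.timer_running = True  # timer covers the oldest unacked packet
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum):
        if acknum < self.base or acknum >= self.nextseqnum:
            return                    # duplicate or out-of-range ACK: ignore
        for seq in range(self.base, acknum + 1):
            del self.buffer[seq]      # cumulative ACK frees all up to acknum
        self.base = acknum + 1
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self):
        self.timer_running = True
        for seq in range(self.base, self.nextseqnum):
            self.send(seq, self.buffer[seq])  # resend [base, nextseqnum-1]


sent = []
sender = GBNSender(4, lambda seq, data: sent.append(seq))
accepted = [sender.rdt_send(f"data{i}") for i in range(5)]
sender.on_ack(2)      # cumulative: packets 1 and 2 are acknowledged
sender.on_timeout()   # resend everything still unacked
print(accepted, sender.base, sent)
```

A corrupted ACK would simply never reach `on_ack`, matching the "do nothing" rule.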
GBN: sender extended FSM
initially (Λ): base=1; nextseqnum=1
state: Wait
• rdt_send(data):
  if (nextseqnum < base+N) {
    sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
    udt_send(sndpkt[nextseqnum])
    if (base == nextseqnum)
      start_timer
    nextseqnum++
  }
  else
    refuse_data(data)
• timeout:
  start_timer
  udt_send(sndpkt[base])
  udt_send(sndpkt[base+1])
  …
  udt_send(sndpkt[nextseqnum-1])
• rdt_rcv(rcvpkt) && corrupt(rcvpkt):
  Λ
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
  base = getacknum(rcvpkt)+1
  if (base == nextseqnum)
    stop_timer
  else
    start_timer
GBN: receiver extended FSM
initially (Λ): expectedseqnum=1; sndpkt = make_pkt(expectedseqnum,ACK,chksum)
state: Wait
• default:
  udt_send(sndpkt)
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum):
  extract(rcvpkt,data)
  deliver_data(data)
  sndpkt = make_pkt(expectedseqnum,ACK,chksum)
  udt_send(sndpkt)
  expectedseqnum++

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
• may generate duplicate ACKs
• need only remember expectedseqnum
• out-of-order pkt:
  • discard (don’t buffer): no receiver buffering!
GBN in action
sender window (N=4), seq #s 0 1 2 3 4 5 6 7 8
1. sender: send pkt0, send pkt1, send pkt2 (X loss), send pkt3, (wait)
2. receiver: receive pkt0, send ack0
3. receiver: receive pkt1, send ack1
4. receiver: receive pkt3, discard, (re)send ack1
5. sender: rcv ack0, send pkt4
6. sender: rcv ack1, send pkt5
7. sender: ignore duplicate ACK
8. receiver: receive pkt4, discard, (re)send ack1
9. receiver: receive pkt5, discard, (re)send ack1
10. sender: pkt 2 timeout; send pkt2, send pkt3, send pkt4, send pkt5
11. receiver: rcv pkt2, deliver, send ack2
12. receiver: rcv pkt3, deliver, send ack3
13. receiver: rcv pkt4, deliver, send ack4
14. receiver: rcv pkt5, deliver, send ack5
Selective Repeat
 receiver individually acknowledges all
correctly received pkts
 buffers pkts, as needed, for eventual in-order
delivery to upper layer
 sender only resends pkts for which ACK not
received
 sender timer for each unACKed pkt
 sender window
 N consecutive seq #’s
 limits seq #s of sent, unACKed pkts

3-72
Selective repeat: sender, receiver windows

3-73
Selective repeat
sender
data from above:
 if next available seq # in window, send pkt
timeout(n):
 resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N-1]:
 mark pkt n as received
 if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]
 send ACK(n)
 out-of-order: buffer
 in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
 ACK(n)
otherwise:
 ignore
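The receiver rules above can be sketched as follows. This is an illustrative sketch under stated assumptions: the name `sr_receiver` is hypothetical, sequence numbers are plain integers without wraparound, and every arriving packet is uncorrupted.

```python
def sr_receiver(arrivals, window=4, rcv_base=0):
    """Replay Selective Repeat reception (hypothetical helper).

    Returns (acks, deliveries); deliveries[i] lists the packets handed
    to the upper layer when arrivals[i] is processed.
    """
    buffered = set()
    acks, deliveries = [], []
    for seq in arrivals:
        batch = []
        if rcv_base <= seq < rcv_base + window:
            acks.append(seq)              # ACK each packet individually
            buffered.add(seq)
            while rcv_base in buffered:   # deliver any in-order run
                buffered.remove(rcv_base)
                batch.append(rcv_base)
                rcv_base += 1             # advance the receive window
        elif rcv_base - window <= seq < rcv_base:
            acks.append(seq)              # already delivered: re-ACK only
        # otherwise: ignore
        deliveries.append(batch)
    return acks, deliveries


acks, deliveries = sr_receiver([0, 1, 3, 4, 5, 2])
print(acks)
print(deliveries)
```

Compared with GBN, packets 3, 4 and 5 are buffered rather than discarded, so only packet 2 needs to be resent.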

3-74
Selective repeat in action
sender window (N=4)

sender:
  send pkt0
  send pkt1
  send pkt2 (X loss)
  send pkt3
  (wait)
  rcv ack0, send pkt4
  rcv ack1, send pkt5
  record ack3 arrived
  pkt 2 timeout
  send pkt2
  record ack4 arrived
  record ack5 arrived

receiver:
  receive pkt0, send ack0
  receive pkt1, send ack1
  receive pkt3, buffer, send ack3
  receive pkt4, buffer, send ack4
  receive pkt5, buffer, send ack5
  rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2

Q: what happens when ack2 arrives?

3-75
After-class Practice: GBN vs SR
 How many unique seq# may appear in GBN and SR, respectively?
 N=2
 GBN: sender [4,5], what is the expected number at the receiver? [4, 5, 6]
  • No error
  • ACK 4 is lost
  • ACK 4 and ACK 5 are lost
 Given the expected number 6, how to infer the sender window?
  GBN: given the expected number x, the sender window will be [x-2, x-1], [x-1, x], [x, x+1]
 How about SR (expected window)? [4,5], [5,6], [6,7]
 What if we have N+1 sequence numbers for SR?
3-76
Selective repeat: dilemma (N+1)
example:
 seq #’s: 0, 1, 2, 3
 window size=3
 receiver sees no difference in two scenarios!
 duplicate data accepted as new in (b)

Q: what relationship between seq # size and window size to avoid problem in (b)?
A: seq # size ≥ 2N

[Figure: sender window (after receipt) and receiver window (after receipt) over seq #s 0 1 2 3 0 1 2.
 (a) no problem: pkt0, pkt1, pkt2 are received; pkt3 is lost (X); the next pkt0 arrives and the receiver will accept packet with seq number 0.
 (b) oops!: pkt0, pkt1, pkt2 are received but the ACKs are lost (X); timeout, retransmit pkt0; the receiver will accept packet with seq number 0.
 receiver can’t see sender side. receiver behavior identical in both cases! something’s (very) wrong!]
3-77
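The 2N answer can be checked by brute force. The sketch below models only the worst case from scenario (b): the receiver has accepted a full window of N packets, every ACK is lost, and the sender retransmits the old window. The function names are hypothetical.

```python
def ambiguous(window, seq_space):
    """True if a retransmitted old packet can be mistaken for a new one.

    Worst case: the receiver got packets 0..N-1 and moved its window to
    N..2N-1, while the sender (all ACKs lost) still resends 0..N-1.
    """
    old = {s % seq_space for s in range(window)}
    new = {s % seq_space for s in range(window, 2 * window)}
    return bool(old & new)


def min_safe_seq_space(window):
    """Smallest sequence-number space with no overlap in the worst case."""
    k = 1
    while ambiguous(window, k):
        k += 1
    return k


print(ambiguous(3, 4))        # the slide's example: window 3, seq #s 0..3
print(min_safe_seq_space(3))
```

For every window size tried, the smallest safe sequence-number space comes out as twice the window size.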
Summary: reliable data transfer
Version | Channel | Mechanism
rdt1.0 | No error/loss | nothing
rdt2.0 | bit errors (no loss) | (1) error detection via checksum; (2) receiver feedback (ACK/NAK); (3) retransmission upon NAK
rdt2.1 | Same as 2.0 | (4) seq# (1 bit) for each pkt
rdt2.2 | Same as 2.0 | (no NAK): Unexpected ACK = NAK
rdt3.0 | errors + loss | (5) Retransmission upon timeout; ACK-only. Performance issue: low utilization
Go-back-N | Same as 3.0 | N sliding window (pipeline); discard out-of-order pkts (recovery)
Selective Repeat | Same as 3.0 | N sliding window, selective recovery
3-78
Chapter 3: Outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
    segment structure
    reliable data transfer
    flow control
    connection management
3.6 Principles of congestion control
3.7 TCP congestion control
3-79
TCP: Overview RFCs: 793,1122,1323, 2018, 2581

 point-to-point:
    one sender, one receiver
 reliable, in-order byte stream:
    no “message boundaries”
 pipelined:
    TCP congestion and flow control set window size
 full duplex data:
    bi-directional data flow in same connection
    MSS: maximum segment size
 connection-oriented:
    handshaking (exchange of control msgs) inits sender, receiver state before data exchange
3-80
TCP segment structure
[Figure: TCP segment, 32 bits wide]
 source port #, dest port #
 sequence number: counting by bytes of data (not segments!)
 acknowledgement number
 head len, not used, flag bits U A P R S F, receive window (# bytes rcvr willing to accept)
 checksum (Internet checksum, as in UDP), Urg data pointer
 options (variable length)
 application data (variable length)
flag bits:
 URG: urgent data (generally not used)
 ACK: ACK # valid
 PSH: push data now (generally not used)
 RST, SYN, FIN: connection estab (setup, teardown commands)

3-81
TCP seq. numbers, ACKs
sequence number:
 byte stream “number” of first byte in segment’s data
Acknowledgement #.:
 seq # of next byte expected from other side
 cumulative ACK
Q: how receiver handles out-of-order segments
 A: TCP spec doesn’t say, - up to implementation

[Figure: outgoing segment from sender and incoming segment to sender, each with header fields source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed (“in-flight”); usable but not yet sent; not usable.]
3-82
TCP seq. numbers, ACKs
Host A Host B

User
types
‘C’
Seq=42, ACK=79, data = ‘C’
host ACKs
receipt of
‘C’, echoes
Seq=79, ACK=43, data = ‘C’ back ‘C’
host ACKs
receipt
of echoed
‘C’ Seq=43, ACK=80

simple telnet scenario

What if sending “ABC”?
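The Seq/ACK arithmetic of the telnet scenario can be written down as a small sketch. It assumes the whole string travels in a single segment and is echoed back in a single segment; that is an assumption of this example, not something the slide states, and `telnet_echo` is a hypothetical name.

```python
def telnet_echo(client_seq, server_seq, data):
    """Return the three segments of one echo exchange as (Seq, ACK, data).

    Assumes the data fits in one segment in each direction.
    """
    n = len(data)  # TCP numbers bytes, so each side advances by n
    first = (client_seq, server_seq, data)        # Host A -> Host B
    echo = (server_seq, client_seq + n, data)     # Host B ACKs and echoes
    last = (client_seq + n, server_seq + n, "")   # Host A ACKs the echo
    return [first, echo, last]


for segment in telnet_echo(42, 79, "C"):
    print(segment)
for segment in telnet_echo(42, 79, "ABC"):
    print(segment)
```

With one byte the numbers match the figure (42/79, 79/43, 43/80); with three bytes in one segment each number advances by 3 instead of 1.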


3-83
TCP round trip time, timeout
Q: how to set TCP timeout value?
 longer than RTT
    but RTT varies
 too short: premature timeout, unnecessary retransmissions
 too long: slow reaction to segment loss

Q: how to estimate RTT?
 SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
     • Karn’s algorithm: TCP ignores RTTs of retransmitted segments
     • Why? Avoid ACK ambiguity.
        – ACK for which transmitted segment? Original segment or retransmitted segment?
 SampleRTT will vary, want estimated RTT “smoother”
3-84
TCP round trip time, timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
 exponential weighted moving average
 influence of past sample decreases exponentially fast
 typical value: α = 0.125
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT, RTT (milliseconds, 100 to 350) vs. time (seconds)]
3-85
TCP round trip time,
timeout
 timeout interval: EstimatedRTT plus
“safety margin”
 large variation in EstimatedRTT -> larger safety
margin
 estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
(typically, β = 0.25)

TimeoutInterval = EstimatedRTT + 4*DevRTT

estimated RTT “safety margin”
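The EstimatedRTT, DevRTT and TimeoutInterval formulas can be combined into one small estimator. This is a sketch under assumptions the slides do not spell out: the first sample initialises EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated before EstimatedRTT (the order used in RFC 6298). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """EWMA-based RTT estimator (illustrative sketch)."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        """Feed one SampleRTT (not from a retransmission); return the timeout."""
        if self.estimated_rtt is None:
            # Assumed initialisation for the first measurement.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the EstimatedRTT from before this sample.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
print(est.update(100))   # milliseconds
print(est.update(200))
```

A single large sample moves EstimatedRTT only by one eighth of the difference, but it widens the safety margin noticeably.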

3-86
TCP reliable data transfer
 TCP creates rdt service on top of IP’s unreliable service
    pipelined segments
    cumulative acks
    single retransmission timer
 retransmissions triggered by:
    timeout events
    duplicate acks

let’s initially consider simplified TCP sender:
 ignore duplicate acks
 ignore flow control, congestion control
3-88
TCP sender events:
data rcvd from app:
 create segment with seq #
 seq # is byte-stream number of first data byte in segment
 start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval:
timeout:
 retransmit segment that caused timeout
 restart timer
ack rcvd:
 if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
3-89
TCP: retransmission scenarios
lost ACK scenario (Host A, Host B):
  SendBase=92
  A -> B: Seq=92, 8 bytes of data
  B -> A: ACK=100 (X lost)
  timeout
  A -> B: Seq=92, 8 bytes of data
  B -> A: ACK=100
  SendBase=100

premature timeout (Host A, Host B):
  SendBase=92
  A -> B: Seq=92, 8 bytes of data
  A -> B: Seq=100, 20 bytes of data
  B -> A: ACK=100
  B -> A: ACK=120
  timeout
  A -> B: Seq=92, 8 bytes of data
  SendBase=100
  SendBase=120
  B -> A: ACK=120
  SendBase=120


3-91
TCP: retransmission scenarios
Host A Host B

Seq=92, 8 bytes of data

Seq=100, 20 bytes of data


timeout

ACK=100
X
ACK=120

Seq=120, 15 bytes of data

cumulative ACK
3-92
TCP ACK generation [RFC 1122, RFC
2581]

event at receiver | TCP receiver action
arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
arrival of in-order segment with expected seq #. One other segment has ACK pending | immediately send single cumulative ACK, ACKing both in-order segments
arrival of out-of-order segment with higher-than-expected seq. #. Gap detected | immediately send duplicate ACK, indicating seq. # of next expected byte
arrival of segment that partially or completely fills gap | immediately send ACK, provided that segment starts at lower end of gap
3-93
TCP fast retransmit
 time-out period often relatively long:
    long delay before resending lost packet
 detect lost segments via duplicate ACKs.
    sender often sends many segments back-to-back
    if segment is lost, there will likely be many duplicate ACKs.

TCP fast retransmit
if sender receives 3 dup ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
 likely that unacked segment lost, so don’t wait for timeout
 Why wait for 3 dup ACKs? Why not
3-94
TCP fast retransmit
Host A Host B

Seq=92, 8 bytes of data


Seq=100, 20 bytes of data
X

ACK=100
timeout

ACK=100
ACK=100
ACK=100
Seq=100, 20 bytes of data

fast retransmit after sender


receipt of triple duplicate ACK
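The triple-duplicate-ACK rule in the figure can be expressed as a counter over the incoming ACK stream. A minimal sketch, with the hypothetical name `fast_retransmits`; it only decides when to resend and ignores timers and window updates.

```python
def fast_retransmits(acks, threshold=3):
    """Return the seq #s resent by fast retransmit for a stream of ACK numbers.

    The ACK number names the next byte expected, which is also the
    seq # of the oldest unacked segment, so that is what gets resent.
    """
    last_ack, dup_count, resent = None, 0, []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1                # one more duplicate of the same ACK
            if dup_count == threshold:    # third duplicate: resend now
                resent.append(ack)
        else:
            last_ack, dup_count = ack, 0  # new ACK resets the counter
    return resent


# The figure: one original ACK=100 followed by three duplicates.
print(fast_retransmits([100, 100, 100, 100]))
```

Note that three duplicates means four ACKs carrying the same number in total.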
3-95
Chapter 3: Outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
    segment structure
    reliable data transfer
    flow control
    connection management
3.6 Principles of congestion control
3.7 TCP congestion control: lectures (not textbook)
3-96
TCP flow control
application may remove data from TCP socket buffers … slower than TCP receiver is delivering (sender is sending)

flow control
receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast

[Figure: receiver protocol stack: application process; TCP socket receiver buffers (OS); TCP code; IP code; data arriving from sender]

3-97
TCP flow control
 receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    RcvBuffer size set via socket options (typical default is 4096 bytes)
    many operating systems autoadjust RcvBuffer
 sender limits amount of unacked (“in-flight”) data to receiver’s rwnd

[Figure: receiver-side buffering: RcvBuffer holds buffered data (TCP segment payloads, on their way to application process) plus rwnd, the free buffer space]
3-98
Connection Management
before exchanging data, sender/receiver “handshake”:
 agree to establish connection (each knowing the other
willing to establish connection)
 agree on connection parameters

[Figure: client and server, each with an application above the network, and each holding the following]
connection state: ESTAB
connection variables:
  seq # client-to-server, server-to-client
  rcvBuffer size at server, client

Socket clientSocket = new Socket("hostname","port number");
Socket connectionSocket = welcomeSocket.accept();
3-100
Agreeing to establish a connection
2-way handshake:
[Figure: “Let’s talk” / “OK”, after which both sides are ESTAB; in protocol terms: choose x, req_conn(x), acc_conn(x), after which both sides are ESTAB]

Q: will 2-way handshake always work in network?
 variable delays
 retransmitted messages (e.g. req_conn(x)) due to message loss
 message reordering
 can’t “see” other side

3-101
Agreeing to establish a connection
2-way handshake failure scenarios:

scenario 1:
  client: choose x, send req_conn(x)
  server: ESTAB, send acc_conn(x)
  client: retransmit req_conn(x); ESTAB
  connection x completes: client terminates, server forgets x
  server: retransmitted req_conn(x) arrives, ESTAB
  half open connection! (no client!)

scenario 2:
  client: choose x, send req_conn(x)
  server: ESTAB, send acc_conn(x)
  client: retransmit req_conn(x); ESTAB; send data(x+1)
  server: accept data(x+1)
  client: retransmit data(x+1)
  connection x completes: client terminates, server forgets x
  server: retransmitted req_conn(x) arrives, ESTAB
  server: retransmitted data(x+1) arrives, accept data(x+1)
3-102
TCP 3-way handshake
client state server state
LISTEN LISTEN
choose init seq num, x
send TCP SYN msg
SYNSENT SYNbit=1, Seq=x
choose init seq num, y
send TCP SYNACK
msg, acking SYN SYN RCVD
SYNbit=1, Seq=y
ACKbit=1; ACKnum=x+1
received SYNACK(x)
ESTAB indicates server is live;
send ACK for SYNACK;
this segment may contain ACKbit=1, ACKnum=y+1
client-to-server data
received ACK(y)
indicates client is live
ESTAB
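The field values exchanged in the 3-way handshake can be sketched as three small records. The dictionary layout and the name `three_way_handshake` are illustrative only; the modulo reflects that TCP sequence numbers are 32-bit values.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence and ACK numbers are 32-bit fields


def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK segments for initial seq nums x and y."""
    syn = {"SYNbit": 1, "ACKbit": 0, "Seq": x}
    # Server picks y and acknowledges the client's SYN with x+1.
    synack = {"SYNbit": 1, "ACKbit": 1, "Seq": y,
              "ACKnum": (syn["Seq"] + 1) % SEQ_SPACE}
    # Client acknowledges the SYNACK with y+1; this segment may carry data.
    ack = {"SYNbit": 0, "ACKbit": 1,
           "ACKnum": (synack["Seq"] + 1) % SEQ_SPACE}
    return syn, synack, ack


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

Each side proves it is live by echoing the other side's initial sequence number plus one.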

3-103
How to set SYN, ACK bit?
[Figure: TCP segment, 32 bits wide: source port #, dest port #; sequence number; acknowledgement number; head len, not used, flag bits U A P R S F, receive window; checksum, Urg data pointer; options (variable length); application data (variable length)]
ACK: ACK # valid
RST, SYN, FIN: connection estab (setup, teardown commands)

3-104
TCP: closing a connection
 client, server each close their side of
connection
 send TCP segment with FIN bit = 1
 respond to received FIN with ACK
 on receiving FIN, ACK can be combined with
own FIN
 simultaneous FIN exchanges can be
handled

3-106
TCP: closing a connection
client state / server state:
  client: ESTAB; server: ESTAB
  client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
  server: enters CLOSE_WAIT (can still send data); sends ACKbit=1; ACKnum=x+1
  client: enters FIN_WAIT_2 (wait for server close)
  server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
  client: enters TIMED_WAIT; sends ACKbit=1; ACKnum=y+1; timed wait for 2*max segment lifetime
  server: CLOSED
  client: CLOSED

3-107
Principles of Congestion Control

Congestion:
 informally: “too many sources sending too much
data too fast for network to handle”
 different from flow control!

 manifestations:
 lost packets (buffer overflow at routers)
 long delays (queuing in router buffers)

 a top-10 problem!
3-109
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
 no explicit feedback from network
 congestion inferred from end-system observed loss, delay
 approach taken by TCP
Network-assisted congestion control:
 routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate sender should send at
TCP Congestion Control
 Idea
 Assumes best-effort network
 Each source determines network capacity for
itself
 Implicit feedback via ACKs or timeout events
 ACKs pace transmission (self-clocking)

 Challenge
 Determining initial available capacity
 Adjusting to changes in capacity in a timely
manner
Recall in selective repeat protocol
sender window (N=4)

sender:
  send pkt0
  send pkt1
  send pkt2 (X loss)
  send pkt3
  (wait)
  rcv ack0, send pkt4
  rcv ack1, send pkt5
  record ack3 arrived
  pkt 2 timeout
  send pkt2
  record ack4 arrived
  record ack5 arrived

receiver:
  receive pkt0, send ack0
  receive pkt1, send ack1
  receive pkt3, buffer, send ack3
  receive pkt4, buffer, send ack4
  receive pkt5, buffer, send ack5
  rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2

Congestion control: adjust the sender window size!


3-112
TCP Congestion Control
 Basic idea
 Add notion of congestion window
• Effective window (for selective repeat
reliable transfer) is the smaller of
– Advertised window (flow control) rwnd
– Congestion window (congestion control) cwnd
 Changes in congestion window size
• Slow increases to absorb new bandwidth
• Quick decreases to eliminate congestion
TCP Congestion Control
 sender limits transmission:
    LastByteSent - LastByteAcked ≤ cwnd
 cwnd is dynamic, function of perceived network congestion

[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed (“in-flight”), spanning cwnd; last byte sent]

How does sender perceive congestion?
 loss event = timeout or 3 duplicate ACKs
 TCP sender reduces rate (cwnd) after loss event
three mechanisms:
 AIMD: how to grow cwnd
 slow start: startup
 conservative after loss events (timeout, duplicate ACKs)
AIMD Rule: additive increase,
multiplicative decrease
 Approach: increase transmission rate (window size),
probing for usable bandwidth, until loss occurs
 additive increase: increase cwnd by 1 MSS every
RTT until loss detected
 multiplicative decrease: cut cwnd by 50% after
congestion

[Figure: congestion window size (8, 16, 24 K bytes) vs. time, dropping at each loss. Saw tooth behavior: probing for bandwidth]
Why AIMD? TCP Fairness
Two competing sessions:
 Additive increase gives slope of 1, as throughput increases
 multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line. The trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”.]
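The fairness argument can be checked numerically. The sketch below is a deliberately idealised model, not real TCP: two flows share a link of capacity R, both see the same RTT, both add one unit per round, and both halve in the same round whenever their combined rate exceeds R. The name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Idealised two-flow AIMD with synchronised losses."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: multiplicative decrease halves the gap between flows.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: additive increase keeps the gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=1000)
print(x1, x2)
```

Additive increase never changes the difference between the two rates and every loss halves it, so the rates drift toward the equal-share line regardless of the starting point.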
TCP Congestion Control (RFC 5681)

How to implement TCP Congestion Control?

Multiple algorithms work together:


 slow start: how to jump-start
 congestion avoidance: additive increase
 fast retransmit/fast recovery: recover from
single packet loss: multiplicative decrease
 retransmission upon timeout: conservative
loss/failure handling
Trace for TCP Congestion
Window

Transport Layer 3-118


TCP Slow Start
 When connection begins, cwnd ≤ 2 MSS, typically, set cwnd = 1MSS
    Example: MSS = 500 bytes & RTT = 200 msec
    initial rate = 20 kbps
 available bandwidth may be >> MSS/RTT
    desirable to quickly ramp up to
• When connection begins, increase rate exponentially fast until cwnd reaches a threshold value: slow-start-threshold ssthresh
    cwnd < ssthresh
TCP Slow Start (more)
 When connection begins, increase rate exponentially when cwnd<ssthresh
    Goal: double cwnd every RTT
    Action: cwnd += 1 MSS for every ACK received
 Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart, over time]
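Why adding 1 MSS per ACK doubles cwnd every RTT can be seen by counting ACKs. A minimal sketch, assuming cwnd is measured in whole segments, every segment sent in a round is ACKed within that round, and no loss occurs; `slow_start_rounds` is a hypothetical name.

```python
def slow_start_rounds(cwnd=1, ssthresh=16, rounds=5):
    """Return cwnd (in MSS) at the start and after each RTT of slow start."""
    history = [cwnd]
    for _ in range(rounds):
        acks_this_rtt = cwnd          # one ACK per segment sent this RTT
        for _ack in range(acks_this_rtt):
            if cwnd < ssthresh:       # slow start applies below ssthresh
                cwnd += 1             # +1 MSS for every ACK received
        history.append(cwnd)
    return history


print(slow_start_rounds())
```

A window of cwnd segments produces cwnd ACKs, and each ACK adds one segment, so the window doubles per round until it reaches ssthresh.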
TCP: switching from slow start to CA
Q: when should the
exponential
increase switch to
linear?
A: when cwnd gets
to 1/2 of its value
before timeout.

Implementation:
 variable ssthresh
 on loss event, ssthresh
is set to 1/2 of cwnd just
before loss event

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Transport Layer 3-121
Congestion Avoidance
 Goal: increase cwnd by 1 MSS per RTT until
congestion (loss) is detected

 Conditions: when cwnd > ssthresh and no loss


occurs

 Actions: cwnd += (MSS/cwnd)*MSS (bytes)


upon every incoming non-duplicate ACK
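The per-ACK rule can be checked to give roughly 1 MSS of growth per RTT. This sketch assumes cwnd is kept in bytes, one ACK arrives per full-sized segment in the window, and no loss occurs; `congestion_avoidance_rtt` is a hypothetical name.

```python
def congestion_avoidance_rtt(cwnd, mss):
    """Apply cwnd += (MSS/cwnd)*MSS once per ACK for one RTT; return new cwnd."""
    acks = int(cwnd // mss)            # one ACK per segment in the window
    for _ in range(acks):
        cwnd += (mss / cwnd) * mss     # small increment on every ACK
    return cwnd


before = 10 * 1000                     # 10 segments of 1000 bytes
after = congestion_avoidance_rtt(before, mss=1000)
print(after - before)                  # growth over one RTT, in bytes
```

The growth comes out slightly below one MSS because cwnd grows during the round, which makes each later increment a little smaller.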
TCP Congestion Control
Algorithms | condition | Design | action
Slow Start | cwnd <= ssthresh | cwnd doubles per RTT | cwnd += 1 MSS per ACK
Congestion Avoidance | cwnd > ssthresh | cwnd++ per RTT (additive increase) | cwnd += 1/cwnd * MSS per ACK
When loss occurs
 Detecting losses and reacting to them:

 through duplicate ACKs


• fast retransmit / fast recovery
– Goal: multiplicative decrease cwnd upon loss

 through retransmission timeout


– Goal: reset everything
Fast Retransmit/Fast Recovery
 fast retransmit: to detect and repair loss, based on incoming duplicate ACKs
    use 3 duplicate ACKs to infer packet loss
    set ssthresh = max(cwnd/2, 2MSS)
    cwnd = ssthresh + 3MSS
    retransmit the lost packet
 fast recovery: governs the transmission of new data until a non-duplicate ACK arrives
    increase cwnd by 1 MSS upon every additional duplicate ACK

Philosophy:
 3 dup ACKs to infer losses and differentiate from transient out-of-order delivery
 What about only 1 or 2 dup ACKs?
    Do nothing; this allows for transient out-of-order delivery
 receiving each duplicate ACK indicates one more packet left the network and arrived at the receiver
Putting them together
 Initially, fastretx = false;
 If upon 3rd duplicate ACK
 ssthresh = max (cwnd/2, 2*MSS)
 cwnd = ssthresh + 3*MSS
• why add 3 packets here?
 retransmit the lost TCP packet
 Set fastretx = true;
 If fastretx == true; upon each additional duplicate ACK
 cwnd += 1 MSS
 transmit a new packet if allowed
• by the updated cwnd and rwnd
 If fastretx == true; upon a new (i.e., non-duplicate) ACK
 cwnd = ssthresh
 Fastretx = false; // After fast retx/fast recovery, cwnd decreases by half
Retransmission Timeout
when retransmission timer expires
 ssthresh = max ( cwnd/2, 2*MSS)
• cwnd should be flight size to be more accurate
• see RFC 2581

 cwnd = 1 MSS

 retransmit the lost TCP packet

 why resetting?
 heavy loss detected
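The fast retransmit/fast recovery rules and the timeout rule above can be combined into one small state holder. This is a sketch of the slide rules only, with hypothetical names; window growth on new ACKs outside fast recovery (slow start, congestion avoidance) is deliberately left out, and cwnd is used where the slide notes that flight size would be more accurate.

```python
class LossReaction:
    """cwnd/ssthresh reactions to duplicate ACKs and timeouts (sketch)."""

    def __init__(self, cwnd, ssthresh, mss=1):
        self.cwnd, self.ssthresh, self.mss = cwnd, ssthresh, mss
        self.dup_acks = 0
        self.fastretx = False

    def on_dup_ack(self):
        """Return True when the lost packet should be retransmitted."""
        self.dup_acks += 1
        if not self.fastretx and self.dup_acks == 3:
            self.ssthresh = max(self.cwnd / 2, 2 * self.mss)
            self.cwnd = self.ssthresh + 3 * self.mss
            self.fastretx = True
            return True
        if self.fastretx:
            self.cwnd += self.mss      # each extra dup ACK: one packet left the network
        return False                   # 1st and 2nd dup ACK: do nothing

    def on_new_ack(self):
        self.dup_acks = 0
        if self.fastretx:
            self.cwnd = self.ssthresh  # finish the halving
            self.fastretx = False

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2, 2 * self.mss)
        self.cwnd = 1 * self.mss       # reset everything
        self.dup_acks = 0
        self.fastretx = False


s = LossReaction(cwnd=8, ssthresh=16)
print([s.on_dup_ack() for _ in range(3)], s.cwnd, s.ssthresh)
s.on_dup_ack()
print(s.cwnd)
s.on_new_ack()
print(s.cwnd)
```

Starting from cwnd = 8 MSS, the window is temporarily inflated during recovery and ends at half its original value once the new ACK arrives.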
TCP Congestion Window
Trace

Transport Layer 3-128


TCP Congestion Control
Algorithms | condition | Design | action
Slow Start | cwnd <= ssthresh | cwnd doubles per RTT | cwnd += 1 MSS per ACK
Congestion Avoidance | cwnd > ssthresh | cwnd++ per RTT (additive increase) | cwnd += 1/cwnd * MSS per ACK
fast retransmit | 3 duplicate ACK | reduce the cwnd by half (multiplicative decrease) | ssthresh = max(cwnd/2, 2); cwnd = ssthresh + 3 MSS; retx the lost packet
fast recovery | receiving a new ACK after fast retx | finish the 1/2 reduction of cwnd in fast retx/fast recovery | cwnd = ssthresh; tx if allowed by cwnd
("transition phase") | upon a dup ACK after fast retx, before fast recovery | | cwnd += 1 MSS; Note: it is different from slow start.
RTO timeout | time out | Reset everything | ssthresh = max(cwnd/2, 2); cwnd = 1; retx the lost packet
Practice
 The receiver acknowledges every segment, and the
sender always has data to transmit.
 Initially ssthresh at the sender is set to 4. Assume cwnd
and ssthresh are measured in segments.

 Assumptions for simplification


When ssthresh == cwnd, use slow start algorithm
In congestion avoidance, let us set cwnd = cwnd + 1/[cwnd]
All data delivery is done in segments, so we can send only an integer number of segments (for example, with cwnd = 2.5 MSS, we can send 2 segments)
All out-of-order segments will be buffered at the receiver side
Illustrative Example
Example Setting
 Use all following TCP congestion control algorithms:
 Slow start
 Congestion avoidance (CA)
 Fast retransmit/fast recovery
 Retransmission timeout (say, RTO=500ms)
 When cwnd=ssthresh, use slow start algorithm (instead of CA)
 Assume rwnd is always large enough, then the window size by selective
repeat (SR) is =cwnd
 Assume 1 acknowledgement per packet, and we use TCP cumulative
ACK (i.e., ACK # = (largest sequence # received in order at the receiver +
1) )
 Assume each packet size is 1 unit (1B) for simple calculation
 TCP sender has infinite packets to send, 1, 2, 3, 4, 5,….
 Assume packet #5 is lost once

We will see how TCP congestion control algorithms work together with SR


CC algorithm | SR window after algo runs | packets sent
slow start: cwnd = 1, ssh = 4 | 1 | pkt 1
slow start (upon ack2): cwnd = 1+1 = 2, ssh = 4 | 2 3 | pkt 2, pkt 3
slow start (upon ack3): cwnd = 2+1 = 3, ssh = 4 | 3 4 5 | pkt 4, pkt 5 (X lost)
slow start (upon ack4): cwnd = 3+1 = 4, ssh = 4 | 4 5 6 7 | pkt 6, pkt 7
slow start (upon ack5): cwnd = 4+1 = 5, ssh = 4 | 5 6 7 8 9 | pkt 8, pkt 9
do nothing upon ack5 (1st dup) | 5 6 7 8 9 |
do nothing upon ack5 (2nd dup) | 5 6 7 8 9 |
fast retransmit (upon 3rd dup ack5) | 5 6 7 8 9 | pkt 5 (retransmitted)
fast recovery w/ additional dup ACK (upon 4th dup): ssh = 2, cwnd = 5+1 = 6 | 5 6 7 8 9 10 | pkt 10
(receiver then returns ack10 and ack11)

133
CC algorithm | SR window after algo runs | packets sent
fast recovery w/ additional dup ACK (upon 4th dup): ssh = 2, cwnd = 5+1 = 6 | 5 6 7 8 9 10 | pkt 10
fast recovery w/ a new ACK (upon ack10): ssh = 2, cwnd = ssh = 2; fast retx/fast recovery is over | 10 11 | pkt 11
slow start also upon ack10: ssh = 2, cwnd = 2+1 = 3 | 10 11 12 | pkt 12
congestion avoidance upon ack11: ssh = 2 | 11 12 13 | pkt 13
congestion avoidance upon ack12: ssh = 2 | 12 13 14 | pkt 14
congestion avoidance upon ack13: ssh = 2, cwnd = 3 + 3/3 = 4 | 13 14 15 16 | pkt 15, pkt 16

134
Practice:
TCP Congestion Window Trace
Putting Things Together in TCP
 use selective repeat to do reliable data
transfer for a window of packets win at any
time

 update win = min (cwnd, rwnd)


 cwnd is updated by TCP congestion control
 rwnd is updated by TCP flow control

 Example: cwnd = 20; rwnd = 10


 Then win=10
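The window rule can be written as a one-line computation plus the amount the sender may still transmit. A minimal sketch: `usable_window` is a hypothetical helper, and `in_flight` stands for LastByteSent - LastByteAcked from the earlier slide.

```python
def effective_window(cwnd, rwnd):
    """win = min(cwnd, rwnd): the tighter of congestion and flow control."""
    return min(cwnd, rwnd)


def usable_window(cwnd, rwnd, in_flight):
    """How much more may be sent now, given the data already in flight."""
    return max(0, effective_window(cwnd, rwnd) - in_flight)


print(effective_window(cwnd=20, rwnd=10))        # the slide's example
print(usable_window(cwnd=20, rwnd=10, in_flight=4))
```

Whichever of the two limits is smaller governs the sender, so a slow receiver and a congested network are handled by the same window mechanism.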
TCP throughput
 avg. TCP thruput as function of window size, RTT?
 ignore slow start, assume always data to send
 W: window size (measured in bytes) where loss occurs
 avg. window size (# in-flight bytes) is ¾ W
 avg. thruput is 3/4W per RTT
avg TCP thruput = (3/4) * W/RTT bytes/sec

[Figure: window size over time, a sawtooth between W/2 and W]
TCP Futures: TCP over “long, fat pipes”
 example: 1500 byte segments, 100ms RTT, want
10 Gbps throughput
 requires W = 83,333 in-flight segments
 throughput in terms of segment loss probability, L
[Mathis 1997]:
TCP throughput = (1.22 * MSS) / (RTT * sqrt(L))

➜ to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10 – a very small loss rate!
 new versions of TCP for high-speed
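The two numbers on this slide can be reproduced from the formula. This is a sketch with hypothetical function names; it assumes MSS is given in bytes and throughput in bits per second, so MSS is multiplied by 8.

```python
import math


def segments_in_flight(rate_bps, rtt_s, mss_bytes):
    """Window (in segments) needed to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def mathis_throughput_bps(mss_bytes, rtt_s, loss):
    """TCP throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss))


def required_loss(rate_bps, rtt_s, mss_bytes):
    """Solve the formula above for L at a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = segments_in_flight(10e9, 0.100, 1500)
loss = required_loss(10e9, 0.100, 1500)
print(round(w))
print(loss)
```

Because throughput falls with the square root of L, each tenfold increase in target rate demands a hundredfold lower loss rate.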
