Chapter3 2021
Layering in Internet protocol stack
Applications … built on … Application
Chapter 3: Outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 principles of congestion control
3.7 TCP congestion control
Upper and Lower Layers
Application / Transport: process-to-process
Network: host-to-host
Link
Physical
Transport services and protocols
provide logical communication between app processes running on different hosts
[Figure: logical end-end transport between two hosts, each with application, transport, network, data link, and physical layers]
Transport services and protocols
send side: breaks app messages into segments, passes to network layer
rcv side: reassembles segments into messages, passes to app layer
more than one transport protocol available to apps
  Internet: TCP and UDP
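The send-side and receive-side steps above can be sketched in C. This is a minimal illustration under stated assumptions, not how a real stack does it: `MAX_PAYLOAD` (a tiny 4 bytes so the split is visible), `segment_message`, and `reassemble_message` are hypothetical names, and segments are assumed to arrive intact and in order, which is exactly the guarantee the rest of this chapter has to earn.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical maximum payload per segment, tiny so the split is easy to see. */
#define MAX_PAYLOAD 4

/* Send side: split msg into segments of at most MAX_PAYLOAD bytes.
   Copies each payload into segs[i], records its length in lens[i],
   and returns the number of segments produced (at most max_segs). */
size_t segment_message(const char *msg, size_t msg_len,
                       char segs[][MAX_PAYLOAD], size_t lens[], size_t max_segs) {
    size_t n = 0;
    for (size_t off = 0; off < msg_len && n < max_segs; off += MAX_PAYLOAD, n++) {
        size_t chunk = msg_len - off;
        if (chunk > MAX_PAYLOAD)
            chunk = MAX_PAYLOAD;
        memcpy(segs[n], msg + off, chunk);
        lens[n] = chunk;
    }
    return n;
}

/* Receive side: concatenate n segments (assumed intact and in order) into out.
   Returns the reassembled message length. */
size_t reassemble_message(char segs[][MAX_PAYLOAD], const size_t lens[], size_t n, char *out) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(out + total, segs[i], lens[i]);
        total += lens[i];
    }
    return total;
}
```

For example, the 11-byte message "hello world" becomes three segments of 4, 4, and 3 bytes, and reassembling them restores the original 11 bytes.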
Transport vs. network layer
network layer: logical communication between hosts
transport layer: logical communication between processes
  relies on, enhances, network layer services
household analogy:
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
  hosts = houses
  processes = kids
  app messages = letters in envelopes
  transport protocol = Ann and Bill who demux to in-house siblings
  network-layer protocol = postal service
Internet transport-layer protocols
reliable, in-order delivery (TCP)
  congestion control
  flow control
  connection setup
unreliable, unordered delivery: UDP
[Figure: logical end-end transport between the application, transport, network, data link, and physical layers of two hosts, across intermediate network, data link, and physical layers]
How demultiplexing works
host receives IP datagrams
  each datagram has source IP address, destination IP address
  each datagram carries one transport-layer segment
  each segment has source, destination port number
host uses IP addresses & port numbers to direct segment to appropriate socket
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #, other header fields, application data (payload)]
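A minimal C sketch of the last step, assuming a hypothetical table of bound sockets (`struct bound_socket` and `demux_by_dest_port` are illustrative names, not a real kernel API). It reads the dest port # from bytes 2 and 3 of the header, where the segment format places it, in network byte order, and looks it up. This is the port-only matching used for UDP; TCP also takes the source into account.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical socket table entry: which local port a socket is bound to. */
struct bound_socket {
    uint16_t local_port;
    int socket_id;          /* stand-in for a real socket handle */
};

/* Read a 16-bit field stored in network (big-endian) byte order. */
static uint16_t read_be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Demultiplex: the dest port # is bytes 2-3 of the TCP/UDP header.
   Returns the matching socket_id, or -1 if no socket is bound to that port. */
int demux_by_dest_port(const uint8_t *segment, const struct bound_socket *table, size_t n) {
    uint16_t dest_port = read_be16(segment + 2);
    for (size_t i = 0; i < n; i++)
        if (table[i].local_port == dest_port)
            return table[i].socket_id;
    return -1;
}
```

With sockets bound to ports 6428, 9157, and 5775, a segment whose dest port is 6428 is handed to the socket bound to 6428, whatever its source port is.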
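Connectionless demultiplexing
recall: create and bind a socket:
sockfd = socket(AF_INET, SOCK_DGRAM, 0); // UDP socket; protocol 0 selects the default (UDP)
serv_addr.sin_family = AF_INET;
serv_addr.sin_addr.s_addr = htonl(INADDR_ANY); // accept datagrams on any local interface
serv_addr.sin_port = htons(portno); // local port # is specified
bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));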
Connectionless demux: example
[Figure: three hosts, each with application, transport, network, link, and physical layers; application processes P1, P3, and P4 create UDP sockets:
  DatagramSocket serverSocket = new DatagramSocket(6428);
  DatagramSocket mySocket2 = new DatagramSocket(9157);
  DatagramSocket mySocket1 = new DatagramSocket(5775);]
Question: why no checking of dest IP?
  the UDP example checks only the port #
  the TCP example checks the port # + source IP
  correctness of the destination IP address is ensured at the network layer: a datagram whose destination IP is not this host is never delivered to this node
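The difference can be made concrete with two match functions. This is a sketch under assumptions: `struct seg_id`, `udp_matches`, and `tcp_matches` are hypothetical, the fields are assumed already extracted from the IP and TCP/UDP headers, and the host is assumed to have a single IP address. Following the answer above, neither function compares the destination IP, since the network layer only delivers datagrams addressed to this host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Identifiers of an arriving segment, assumed already extracted from the IP and TCP/UDP headers. */
struct seg_id {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* UDP: a socket is identified by its local (dest) port only, so datagrams
   from any source IP/port to that port share the same socket. */
bool udp_matches(uint16_t bound_port, const struct seg_id *s) {
    return s->dst_port == bound_port;
}

/* TCP: a connection socket also depends on who sent the segment (source IP and port).
   dst_ip is not compared: the network layer only delivered datagrams addressed to
   this host (single-address host assumed). */
bool tcp_matches(const struct seg_id *conn, const struct seg_id *s) {
    return conn->src_ip == s->src_ip && conn->src_port == s->src_port &&
           conn->dst_port == s->dst_port;
}
```

Two clients that both use source port 5775 toward server port 80 land in one UDP socket, but in two different TCP sockets, because their source IPs differ.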
UDP: User Datagram Protocol [RFC 768]
“no frills,” “bare bones” Internet transport protocol
why is there a UDP?
  no connection establishment (which can add delay)
  simple: no connection state at sender, receiver
  small header size
  no congestion control: UDP can blast away as fast as desired
[Figure: UDP segment format, 32 bits wide: source port #, dest port #, length, checksum, application data (payload)]
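The small header is easy to see in code. The sketch below assumes the RFC 768 layout of four 16-bit fields (source port, dest port, length, checksum), 8 bytes in all, stored in network byte order; `struct udp_header` and `parse_udp_header` are illustrative names rather than a system header's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* UDP header: four 16-bit fields, 8 bytes in total (RFC 768 layout). */
struct udp_header {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;    /* length of the whole segment, header included, in bytes */
    uint16_t checksum;
};

/* Read a 16-bit field stored in network (big-endian) byte order. */
static uint16_t be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse the header at the start of buf. Returns 0 on success, or -1 if the
   buffer is shorter than a header or the length field does not fit the buffer. */
int parse_udp_header(const uint8_t *buf, size_t buf_len, struct udp_header *h) {
    if (buf_len < 8)
        return -1;
    h->src_port = be16(buf);
    h->dst_port = be16(buf + 2);
    h->length   = be16(buf + 4);
    h->checksum = be16(buf + 6);
    if (h->length < 8 || (size_t)h->length > buf_len)
        return -1;
    return 0;
}
```

A 12-byte segment carrying 4 bytes of data has a length field of 12; everything except the 8 header bytes is payload.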
UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
sender:
  treat segment contents, including header fields, as sequence of 16-bit integers
  checksum: one’s complement of the (one’s complement) sum of segment contents
  sender puts checksum value into UDP checksum field
receiver:
  compute checksum of received segment
  check if computed checksum equals checksum field value:
    NO - error detected
    YES - no error detected. But maybe errors nonetheless? More later ….
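Internet checksum: example
example: add two 16-bit integers
             1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
             1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum          1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum     0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
(wraparound: the carry out of the most significant bit is added back into the result; the checksum is the bit-wise complement of the sum)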
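The same computation in C, as a sketch: `ones_complement_sum`, `internet_checksum`, and `checksum_ok` are hypothetical helper names, the input is assumed to be already split into host-order 16-bit words, and the pseudo-header that the real UDP checksum also covers (RFC 768) is left out. The two words of the example are 0xE666 (1110 0110 0110 0110) and 0xD555 (1101 0101 0101 0101).

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's complement sum with end-around carry ("wraparound"). */
uint16_t ones_complement_sum(const uint16_t *words, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += words[i];
        sum = (sum & 0xFFFF) + (sum >> 16);   /* fold the carry back in */
    }
    return (uint16_t)sum;
}

/* Sender: the checksum is the bit-wise complement of the one's complement sum. */
uint16_t internet_checksum(const uint16_t *words, size_t n) {
    return (uint16_t)~ones_complement_sum(words, n);
}

/* Receiver: recompute over the received words and compare with the checksum field. */
int checksum_ok(const uint16_t *words, size_t n, uint16_t checksum_field) {
    return internet_checksum(words, n) == checksum_field;
}
```

For those two words the raw sum 0x1BBBB folds to 0xBBBC (1011 1011 1011 1100), and the checksum is 0x4443 (0100 0100 0100 0011). Flipping any single bit of the data changes the sum, so the receiver's comparison fails.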
[Figure: send side and receive side connected by an unreliable channel; packets in the queue/buffer may be received with errors or lost (X)]
Reliable data transfer: getting started
we’ll:
  incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  consider only unidirectional data transfer
    but control info will flow in both directions!
  use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation: an arrow from state 1 to state 2, labeled with the event causing the state transition above the actions taken on the state transition. state: when in this “state”, the next state is uniquely determined by the next event]
rdt1.0: reliable transfer over a reliable channel
underlying channel perfectly reliable
  no bit errors
  no loss of packets
separate FSMs for sender, receiver:
  sender sends data into underlying channel
  receiver reads data from underlying channel
“Stop and Wait” Scenario
Simple setting: one packet at a time (stop and wait)
  One sender, one receiver
  the sender has an infinite number of packets to transfer to the receiver
  the sender transmits one packet at a time, and will not proceed with the next new packet transmission until the current packet has been successfully received & acknowledged by the receiver.
“Stop and Wait” Scenario
We progressively consider more complex cases:
  Bit errors
  Packet loss
  Duplicate copies of packets
  Long delay (thus also out of order)
  ….
Designs: rdt2.0 (initial), rdt3.0 (stop & wait)
[Figure: sender and receiver; packets in the buffer may be received with errors or lost (X)]
rdt2.0: channel with bit errors
How to detect bit errors in a packet?
  Internet checksum algorithm
How to recover from errors?
  acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  sender retransmits packet upon receiving NAK
rdt2.0: operation with no errors
sender:
  [Wait for call from above]
    rdt_send(data):
      sndpkt = make_pkt(data, checksum)
      udt_send(sndpkt)
      → Wait for ACK or NAK
  [Wait for ACK or NAK]
    rdt_rcv(rcvpkt) && isNAK(rcvpkt):
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt):
      Λ (no action)
      → Wait for call from above
receiver:
  [Wait for call from below]
    rdt_rcv(rcvpkt) && corrupt(rcvpkt):
      udt_send(NAK)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
      extract(rcvpkt,data)
      deliver_data(data)
      udt_send(ACK)
rdt2.0: error scenario
Same sender and receiver FSMs as in the no-error case; this time the packet arrives corrupted:
  receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt) → udt_send(NAK)
  sender, in Wait for ACK or NAK: rdt_rcv(rcvpkt) && isNAK(rcvpkt) → udt_send(sndpkt) (retransmit)
  the retransmitted packet then follows the no-error path: the receiver extracts and delivers the data and sends ACK
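A compact simulation of both rdt2.0 scenarios, as a sketch only. The packet layout (one data byte plus an 8-bit checksum equal to its complement), the `flips` array that decides which transmissions the channel corrupts, and all function names are hypothetical. Like rdt2.0 itself, it assumes the ACK/NAK always arrives intact.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet: one data byte plus an 8-bit checksum (the complement of the data). */
struct pkt { uint8_t data; uint8_t checksum; };
enum feedback { ACK, NAK };

struct pkt make_pkt(uint8_t data) {
    struct pkt p = { data, (uint8_t)~data };
    return p;
}

/* Any single flipped bit in data or checksum makes this test fail. */
bool corrupt(const struct pkt *p) {
    return (uint8_t)~p->data != p->checksum;
}

/* rdt2.0 receiver ("Wait for call from below"): NAK a corrupted packet,
   otherwise extract and deliver the data and reply ACK. */
enum feedback rdt20_receive(const struct pkt *rcvpkt, uint8_t *delivered) {
    if (corrupt(rcvpkt))
        return NAK;
    *delivered = rcvpkt->data;
    return ACK;
}

/* rdt2.0 sender: send, then retransmit sndpkt on every NAK until an ACK arrives.
   flips[i] says whether the (hypothetical) channel flips a bit on transmission i.
   Returns the number of transmissions, or -1 if max_tx was reached.
   Like rdt2.0 itself, it assumes ACK/NAK replies are never garbled. */
int rdt20_send(uint8_t data, const bool *flips, int max_tx, uint8_t *delivered) {
    struct pkt sndpkt = make_pkt(data);
    for (int tx = 0; tx < max_tx; tx++) {
        struct pkt on_wire = sndpkt;         /* udt_send(sndpkt) */
        if (flips[tx])
            on_wire.data ^= 0x01;            /* bit error in the channel */
        if (rdt20_receive(&on_wire, delivered) == ACK)
            return tx + 1;                   /* back to "Wait for call from above" */
        /* NAK: stay in "Wait for ACK or NAK" and retransmit */
    }
    return -1;
}
```

With `flips = {true, true, false}` the first two copies are NAKed and the third is delivered, so `rdt20_send` reports 3 transmissions.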
rdt2.0 has a fatal flaw!
what happens if ACK/NAK corrupted?
  sender doesn’t know what happened at receiver!
  can’t just retransmit: possible duplicate
handling duplicates:
  sender retransmits current pkt if ACK/NAK corrupted
  sender adds sequence number to each pkt
  receiver discards (doesn’t deliver up) duplicate pkt
stop and wait: sender sends one packet, then waits for receiver response
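rdt2.0’s flaw: garbled ACK/NACK
(a)
  sender: send pkt0 → receiver: rcv pkt0, send ack
  sender: rcv ack, send pkt1 → receiver: rcv pkt1, send ack
  sender: send pkt2 → receiver: how to know? (pkt2)
(b)
  sender: send pkt0 → receiver: rcv pkt0, send ack
  sender: rcv ack, send pkt1 → errors → receiver: rcv garbled pkt1, send NACK
  sender: resend pkt1 → receiver: how to know? (pkt1)
without sequence numbers, the receiver cannot tell a new packet from a retransmitted one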
rdt2.1: receiver, handles garbled ACK/NAKs
  [Wait for 0 from below]
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt):
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
      → Wait for 1 from below
    rdt_rcv(rcvpkt) && corrupt(rcvpkt):
      sndpkt = make_pkt(NAK, chksum)
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
  [Wait for 1 from below]
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
      → Wait for 0 from below
    rdt_rcv(rcvpkt) && corrupt(rcvpkt):
      sndpkt = make_pkt(NAK, chksum)
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt):
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
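The receiver FSM above maps onto a small C function. In this sketch the checksum test is assumed to have been done already (the hypothetical `corrupt` flag in `struct rcvpkt`), and `struct rdt21_receiver`, `rdt21_on_receive`, and the `REPLY_ACK`/`REPLY_NAK` values are illustrative names. The single `expected_seq` field plays the role of the two states “Wait for 0 from below” and “Wait for 1 from below”.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical received-packet view: sequence bit, result of the checksum test, payload. */
struct rcvpkt { int seq; bool corrupt; uint8_t data; };
enum reply { REPLY_ACK, REPLY_NAK };

/* Receiver state: which sequence number (0 or 1) is expected next. */
struct rdt21_receiver { int expected_seq; };

/* Handle one arriving packet; *delivered is set to true only when new data is passed up. */
enum reply rdt21_on_receive(struct rdt21_receiver *r, const struct rcvpkt *p,
                            uint8_t *out, bool *delivered) {
    *delivered = false;
    if (p->corrupt)
        return REPLY_NAK;                 /* garbled: ask for a retransmission */
    if (p->seq != r->expected_seq)
        return REPLY_ACK;                 /* duplicate: ACK again, but don't deliver */
    *out = p->data;                       /* extract + deliver_data */
    *delivered = true;
    r->expected_seq ^= 1;                 /* now wait for the other seq # from below */
    return REPLY_ACK;
}
```

Receiving pkt0 twice delivers it once and ACKs both copies; a corrupted packet gets a NAK and leaves the expected seq # unchanged.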
Summary: reliable data transfer
Version  Channel               Mechanism
rdt1.0   reliable channel      nothing
rdt2.0   bit errors (no loss)  (1) error detection via checksum
                               (2) receiver feedback (ACK/NAK)
                               (3) retransmission upon NAK
rdt2.1   same as 2.0           handling the fatal flaw of rdt2.0:
                               (4) need seq # for each packet
rdt2.1: sender, handles garbled ACK/NAKs
  [Wait for call 0 from above]
    rdt_send(data):
      sndpkt = make_pkt(0, data, checksum)
      udt_send(sndpkt)
      → Wait for ACK or NAK 0
  [Wait for ACK or NAK 0]
    rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)):
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt):
      Λ (no action)
      → Wait for call 1 from above
  [Wait for call 1 from above]
    rdt_send(data):
      sndpkt = make_pkt(1, data, checksum)
      udt_send(sndpkt)
      → Wait for ACK or NAK 1
  [Wait for ACK or NAK 1]
    rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)):
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt):
      Λ (no action)
      → Wait for call 0 from above
Rdt2.1 discussion
Rdt2.1 mechanisms:
  1. Error detection (checksum)
  2. Feedback (ACK and NAK)
  3. Retransmission
  4. Seq number (fresh or duplicate packets)
Q1: How many bits are needed for the seq #?
  two seq. #’s (0,1) will suffice. Why?
  Work through these scenarios for sending 3 packets: (1) all ACKed, no errors; (2) ACK 0 corrupted (1st time); (3) NAK 0 corrupted (1st time); (4) ACK 0 and ACK 1 both corrupted the first time
Q2: Do we still need NAK? If not, how?
Rdt2.1 discussion: how many bits for a seq number?
sender:
  seq # added to pkt
  two seq. #’s (0,1) will suffice. Why?
  must check if received ACK/NAK corrupted
  twice as many states
    state must “remember” whether “expected” pkt should have seq # of 0 or 1
receiver:
  must check if received packet is duplicate
    state indicates whether 0 or 1 is expected pkt seq #
  note: receiver cannot know if its last ACK/NAK was received OK at sender
rdt2.1: 1-bit seq # is enough!
(a) no error
  sender: send pkt0 → receiver: rcv pkt0, send ack
  sender: rcv ack, send pkt1 → receiver: rcv pkt1, send ack
  sender: rcv ack1, send pkt0 → receiver: rcv pkt0 (new pkt!), send ack
(b) garbled packet
  sender: send pkt0 → receiver: rcv pkt0, send ack
  sender: rcv ack, send pkt1 → receiver: rcv garbled pkt1, drop pkt1, send NACK
  sender: rcv NACK, resend pkt1 → receiver: rcv pkt1, send ack
  sender: rcv ack, send pkt0 → receiver: rcv pkt0 (new pkt!), send ack
rdt3.0 sender
  [Wait for call 0 from above]
    rdt_send(data):
      sndpkt = make_pkt(0, data, checksum)
      udt_send(sndpkt)
      start_timer
      → Wait for ACK0
    rdt_rcv(rcvpkt):
      Λ
  [Wait for ACK0]
    rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)):
      Λ
    timeout:
      udt_send(sndpkt)
      start_timer
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0):
      stop_timer
      → Wait for call 1 from above
  [Wait for call 1 from above]
    rdt_send(data):
      sndpkt = make_pkt(1, data, checksum)
      udt_send(sndpkt)
      start_timer
      → Wait for ACK1
    rdt_rcv(rcvpkt):
      Λ
  [Wait for ACK1]
    rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)):
      Λ
    timeout:
      udt_send(sndpkt)
      start_timer
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1):
      stop_timer
      → Wait for call 0 from above
(Λ: no action)
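An event-driven sketch of this sender in C. Assumptions: the packet contents and the real timer are abstracted away (`timer_running` only records whether start_timer or stop_timer was called last), `transmissions` counts udt_send calls for illustration, and `struct rdt30_sender`, `s30_rdt_send`, `s30_on_ack`, and `s30_on_timeout` are hypothetical names.

```c
#include <stdbool.h>

/* States of the rdt3.0 sender FSM. */
enum s30_state { WAIT_CALL_0, WAIT_ACK_0, WAIT_CALL_1, WAIT_ACK_1 };

struct rdt30_sender {
    enum s30_state state;
    bool timer_running;
    int sent_seq;        /* seq # of the packet held in sndpkt */
    int transmissions;   /* number of udt_send calls, for illustration */
};

/* rdt_send(data) from above: only accepted in a "Wait for call" state. */
bool s30_rdt_send(struct rdt30_sender *s) {
    if (s->state != WAIT_CALL_0 && s->state != WAIT_CALL_1)
        return false;                          /* still waiting for an ACK: refuse */
    s->sent_seq = (s->state == WAIT_CALL_0) ? 0 : 1;
    s->transmissions++;                        /* udt_send(make_pkt(seq, data, checksum)) */
    s->timer_running = true;                   /* start_timer */
    s->state = (s->sent_seq == 0) ? WAIT_ACK_0 : WAIT_ACK_1;
    return true;
}

/* rdt_rcv(rcvpkt): act only on an uncorrupted ACK for the outstanding seq #. */
void s30_on_ack(struct rdt30_sender *s, int ack_seq, bool corrupt) {
    bool waiting = (s->state == WAIT_ACK_0 || s->state == WAIT_ACK_1);
    if (!waiting || corrupt || ack_seq != s->sent_seq)
        return;                                /* Λ: ignore, keep waiting */
    s->timer_running = false;                  /* stop_timer */
    s->state = (s->sent_seq == 0) ? WAIT_CALL_1 : WAIT_CALL_0;
}

/* timeout: retransmit the outstanding packet and restart the timer. */
void s30_on_timeout(struct rdt30_sender *s) {
    if (s->state == WAIT_ACK_0 || s->state == WAIT_ACK_1) {
        s->transmissions++;                    /* udt_send(sndpkt) */
        s->timer_running = true;               /* start_timer */
    }
}
```

Feeding it the lost-packet scenario (pkt0 ACKed, pkt1 lost, a stray ACK0 ignored, timeout, resend, ACK1) ends with three transmissions and the sender back in Wait for call 0 from above.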
rdt3.0 in action
(a) no loss
  sender: send pkt0 → receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1 → receiver: rcv pkt1, send ack1
  sender: rcv ack1, send pkt0 → receiver: rcv pkt0, send ack0
(b) packet loss
  sender: send pkt0 → receiver: rcv pkt0, send ack0
  sender: rcv ack0, send pkt1 → X (loss)
  sender: timeout, resend pkt1 → receiver: rcv pkt1, send ack1
  sender: rcv ack1, send pkt0 → receiver: rcv pkt0, send ack0
Summary: reliable data transfer
Version  Channel               Mechanism
rdt1.0   reliable channel      nothing
rdt2.0   bit errors (no loss)  (1) error detection via checksum
                               (2) receiver feedback (ACK/NAK)
                               (3) retransmission upon NAK
rdt2.1   same as 2.0           (4) seq # (1 bit) for each pkt
rdt2.2   same as 2.0           a variant of rdt2.1 (no NAK):
                               unexpected ACK = NAK
                               ACK0 = ACK for pkt0, NAK for pkt1
Stop-and-wait sender utilization:
$U_{\text{sender}} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} \approx 0.00027$
Pipelined protocols
pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver
$U_{\text{sender}} = \frac{3L/R}{RTT + L/R} = \frac{0.024}{30.008} \approx 0.0008$ (three times the stop-and-wait utilization)
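Both utilization figures come from one formula, sketched here in C. The function name `sender_utilization` is hypothetical; `trans_time` is L/R and `rtt` is RTT, in the same (arbitrary) time unit, and the `n` packets are assumed to be sent back-to-back before the sender waits for the first ACK.

```c
/* Sender utilization when n packets, each taking trans_time = L/R to transmit,
   are sent back-to-back and the sender then waits for the first ACK:
   U = n * (L/R) / (RTT + L/R). Both times must use the same unit. */
double sender_utilization(int n, double trans_time, double rtt) {
    return n * trans_time / (rtt + trans_time);
}
```

`sender_utilization(1, 0.008, 30.0)` is about 0.000267 and `sender_utilization(3, 0.008, 30.0)` about 0.0008: while n·L/R stays well below RTT + L/R, utilization grows linearly with the number of packets in flight.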
Pipelined protocols: overview
Go-back-N:
  sender can have up to N unacked packets in pipeline
  receiver only sends cumulative ack
    doesn’t ack packet if there’s a gap
  sender has timer for oldest unacked packet
    when timer expires, retransmit all unacked packets
Selective Repeat:
  sender can have up to N unack’ed packets in pipeline
  rcvr sends individual ack for each packet
  sender maintains timer for each unacked packet
    when timer expires, retransmit only that unacked packet
Go-Back-N: sender
k-bit seq # in pkt header
“window” of up to N, consecutive unack’ed pkts allowed
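A sketch of the Go-Back-N sender’s bookkeeping in C. Assumptions: `GBN_N` = 4 is an illustrative window size, sequence numbers are kept as unbounded integers for readability (a real k-bit header would carry them modulo 2^k), the packet buffers and the real timer are omitted, and `struct gbn_sender`, `gbn_send`, `gbn_on_ack`, and `gbn_on_timeout` are hypothetical names.

```c
#include <stdbool.h>

#define GBN_N 4            /* window size N (illustrative value) */

/* Go-Back-N sender state, with unbounded integer seq #s for clarity. */
struct gbn_sender {
    int base;              /* oldest unACKed seq # */
    int nextseqnum;        /* next seq # to use */
    bool timer_running;    /* single timer, for the oldest unACKed packet */
};

/* Data from above: send only if the window is not full. */
bool gbn_send(struct gbn_sender *s) {
    if (s->nextseqnum >= s->base + GBN_N)
        return false;                        /* window full: refuse data */
    if (s->base == s->nextseqnum)
        s->timer_running = true;             /* first unACKed pkt: start timer */
    s->nextseqnum++;                         /* udt_send(sndpkt[nextseqnum]) */
    return true;
}

/* Cumulative ACK(n): every pkt up to and including n is ACKed. */
void gbn_on_ack(struct gbn_sender *s, int n) {
    if (n < s->base || n >= s->nextseqnum)
        return;                              /* old or bogus ACK: ignore */
    s->base = n + 1;
    s->timer_running = (s->base != s->nextseqnum);  /* keep timing only if pkts remain */
}

/* Timeout: go back N, i.e., resend every pkt in [base, nextseqnum).
   Returns how many packets are retransmitted. */
int gbn_on_timeout(struct gbn_sender *s) {
    s->timer_running = true;                 /* restart timer */
    return s->nextseqnum - s->base;
}
```

After four sends the window is full; a cumulative ACK(1) slides `base` to 2 and admits two more packets, and a timeout then retransmits all four packets 2 through 5.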
Selective Repeat
receiver individually acknowledges all correctly received pkts
  buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
  sender timer for each unACKed pkt
sender window
  N consecutive seq #’s
  limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase, sendbase+N-1]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  pkt n in [rcvbase, rcvbase+N-1]:
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N, rcvbase-1]:
    ACK(n)
  otherwise:
    ignore
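The receiver half of this can be sketched in C. Assumptions: `SR_N` = 4 is an illustrative window size, sequence numbers are unbounded integers below a hypothetical simulation limit `SR_MAXSEQ` (a real receiver reuses them modulo the seq # space), “delivery” just records the seq # in an array, and `struct sr_receiver` and `sr_on_packet` are hypothetical names. The return value says whether an ACK(n) is sent.

```c
#include <stdbool.h>

#define SR_N 4          /* window size N (illustrative) */
#define SR_MAXSEQ 64    /* hypothetical limit of this simulation's seq # space */

struct sr_receiver {
    int rcvbase;                 /* smallest seq # not yet delivered */
    bool buffered[SR_MAXSEQ];    /* correctly received pkts waiting for delivery */
    int delivered[SR_MAXSEQ];    /* seq #s handed to the upper layer, in order */
    int ndelivered;
};

/* Handle arriving pkt n. Returns true if ACK(n) is sent, false if the pkt is ignored. */
bool sr_on_packet(struct sr_receiver *r, int n) {
    if (n >= r->rcvbase && n <= r->rcvbase + SR_N - 1 && n < SR_MAXSEQ) {
        r->buffered[n] = true;                      /* buffer (a duplicate is harmless) */
        while (r->rcvbase < SR_MAXSEQ && r->buffered[r->rcvbase]) {
            r->buffered[r->rcvbase] = false;        /* deliver the in-order run ... */
            r->delivered[r->ndelivered++] = r->rcvbase;
            r->rcvbase++;                           /* ... and advance the window */
        }
        return true;                                /* send ACK(n) */
    }
    if (n >= r->rcvbase - SR_N && n <= r->rcvbase - 1)
        return true;                                /* already delivered: ACK(n) again */
    return false;                                   /* otherwise: ignore */
}
```

Fed the arrival order of the trace in “Selective repeat in action” (0, 1, 3, 4, 5, then the retransmitted 2), it buffers pkts 3 to 5 and delivers 2, 3, 4, 5 together when pkt2 arrives.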
Selective repeat in action
sender window (N=4) over seq #s 0 1 2 3 4 5 6 7 8
  sender: send pkt0, send pkt1, send pkt2 (lost: X), send pkt3
  receiver: receive pkt0, send ack0; receive pkt1, send ack1
  sender: (wait)
  receiver: receive pkt3, buffer, send ack3
  sender: rcv ack0, send pkt4; rcv ack1, send pkt5
  receiver: receive pkt4, buffer, send ack4; receive pkt5, buffer, send ack5
  sender: record ack3 arrived
  sender: pkt2 timeout, send pkt2
  sender: record ack4 arrived; record ack5 arrived
  receiver: rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2
After-class Practice: GBN vs SR
How many unique seq #s may appear in GBN and SR, respectively?
N = 2
GBN: sender window is [4,5]; what is the expected number at the receiver? [4, 5, 6]
  • No error
  • ACK 4 is lost
  • ACK 4 and ACK 5 are lost
GBN: given the expected number x, the sender window will be [x-2, x-1], [x-1, x], or [x, x+1]
  Given the expected number 6, how to infer the sender window?
How about SR (expected window)? [4,5], [5,6], [6,7]
What if we have N+1 sequence numbers for SR?
[Figure: Selective repeat: sender window (after receipt) and receiver window (after receipt)]
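TCP seq. numbers, ACKs
sequence numbers:
  byte stream “number” of first byte in segment’s data
acknowledgements:
  seq # of next byte expected from other side
Q: how receiver handles out-of-order segments
  A: up to implementation
[Figure: outgoing segment from sender and incoming segment to sender (source port #, dest port #, sequence number, acknowledgement number, …); sender sequence number space with window size N, including sent but not-yet-ACKed (“in-flight”) bytes]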
TCP seq. numbers, ACKs
Host A → Host B: user types ‘C’: Seq=42, ACK=79, data = ‘C’
Host B → Host A: host ACKs receipt of ‘C’, echoes back ‘C’: Seq=79, ACK=43, data = ‘C’
Host A → Host B: host ACKs receipt of echoed ‘C’: Seq=43, ACK=80
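The ACK numbers in this exchange follow one rule: ACK = seq # of the received segment + its data length, i.e., the next byte expected. A sketch in C, where `struct tcp_seg` (only seq, ack, and data length) and `make_reply` are hypothetical names; unsigned 32-bit arithmetic wraps modulo 2^32 the way TCP sequence numbers do.

```c
#include <stdint.h>

/* Hypothetical minimal segment: sequence #, ACK #, and payload length in bytes. */
struct tcp_seg { uint32_t seq, ack, len; };

/* Reply to a received segment: ACK = rcv->seq + rcv->len (the next byte expected);
   the reply's own seq # continues this host's byte stream. uint32_t arithmetic
   wraps modulo 2^32, as TCP sequence numbers do. */
struct tcp_seg make_reply(const struct tcp_seg *rcv, uint32_t my_next_seq, uint32_t data_len) {
    struct tcp_seg r = { my_next_seq, rcv->seq + rcv->len, data_len };
    return r;
}
```

Replaying the ‘C’ exchange: `make_reply` on A’s segment (Seq=42, 1 byte) with B’s next seq # 79 yields Seq=79, ACK=43; on that reply with A’s next seq # 43 and no data it yields Seq=43, ACK=80.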
[Figure: SampleRTT and EstimatedRTT, RTT (milliseconds, 100 to 350) versus time (seconds)]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus “safety margin”
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$\mathrm{DevRTT} = (1-\beta)\,\mathrm{DevRTT} + \beta\,\lvert \mathrm{SampleRTT} - \mathrm{EstimatedRTT} \rvert$
(typically, $\beta = 0.25$)
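The estimator can be sketched in C. Several pieces are assumptions taken from common practice (RFC 6298 and standard textbook treatments) rather than from this slide: the EstimatedRTT update EstimatedRTT = (1-α)·EstimatedRTT + α·SampleRTT with α = 0.125, a safety margin of 4·DevRTT, and initializing EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2 on the first sample. RFC 6298’s clock-granularity term and 1-second minimum are left out, and `struct rtt_estimator` and `rtt_update` are hypothetical names.

```c
#include <stdbool.h>

#define ALPHA 0.125   /* weight of a new SampleRTT in EstimatedRTT (assumed) */
#define BETA  0.25    /* weight of a new deviation in DevRTT ("typically 0.25") */

struct rtt_estimator {
    bool have_sample;
    double estimated_rtt;   /* EstimatedRTT */
    double dev_rtt;         /* DevRTT */
};

/* Feed one SampleRTT and return the timeout interval EstimatedRTT + 4*DevRTT. */
double rtt_update(struct rtt_estimator *e, double sample_rtt) {
    if (!e->have_sample) {
        /* first measurement: initialize as RFC 6298 does */
        e->estimated_rtt = sample_rtt;
        e->dev_rtt = sample_rtt / 2.0;
        e->have_sample = true;
    } else {
        double diff = sample_rtt - e->estimated_rtt;
        if (diff < 0)
            diff = -diff;
        /* DevRTT uses the EstimatedRTT from before this sample (RFC 6298 order) */
        e->dev_rtt = (1.0 - BETA) * e->dev_rtt + BETA * diff;
        e->estimated_rtt = (1.0 - ALPHA) * e->estimated_rtt + ALPHA * sample_rtt;
    }
    return e->estimated_rtt + 4.0 * e->dev_rtt;
}
```

Samples of 100 ms and then 200 ms give timeouts of 300 ms and 362.5 ms (EstimatedRTT 112.5, DevRTT 62.5 after the second sample): one slow sample moves the average only a little but widens the safety margin noticeably.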
TCP reliable data transfer
TCP creates rdt service on top of IP’s unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks
let’s initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: the timeout interval (EstimatedRTT plus safety margin)
timeout:
  retransmit segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments
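TCP: retransmission scenarios
lost ACK scenario:
  Host A sends Seq=92, 8 bytes of data
  Host B’s ACK=100 is lost (X)
  Host A times out and resends Seq=92, 8 bytes of data; Host B replies ACK=100
cumulative ACK scenario:
  Host A (SendBase=92) sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data
  ACK=100 is lost (X), but ACK=120 arrives before the timeout
  Host A sets SendBase=120; nothing needs to be retransmitted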
TCP ACK generation [RFC 1122, RFC 2581]
TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs.
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit:
  if sender receives 3 duplicate ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
    likely that unacked segment lost, so don’t wait for timeout
Why wait for 3 dup ACKs? Why not …
TCP fast retransmit
[Figure: Host A receives ACK=100 followed by three duplicate ACK=100s and resends Seq=100, 20 bytes of data before its timeout expires]
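Duplicate-ACK counting, the heart of fast retransmit, as a C sketch. `struct fast_retx`, `on_ack_fast_retx`, and `DUP_ACK_THRESHOLD` are hypothetical names; the sketch assumes cumulative ACK numbers without wraparound and leaves out what the congestion controller does at the same moment.

```c
#include <stdbool.h>
#include <stdint.h>

#define DUP_ACK_THRESHOLD 3   /* "triple duplicate ACKs" */

struct fast_retx {
    uint32_t send_base;       /* smallest unACKed seq # */
    int dup_acks;             /* duplicate ACKs seen for send_base */
};

/* Process cumulative ACK y; returns true when the sender should fast-retransmit
   the segment starting at send_base, without waiting for the timeout. */
bool on_ack_fast_retx(struct fast_retx *s, uint32_t y) {
    if (y > s->send_base) {          /* new data ACKed: reset the count */
        s->send_base = y;
        s->dup_acks = 0;
        return false;
    }
    if (y == s->send_base && ++s->dup_acks == DUP_ACK_THRESHOLD)
        return true;                 /* 3rd duplicate: resend segment send_base */
    return false;
}
```

After ACK=100 advances `send_base`, the next three ACK=100s are duplicates; the third returns true, which is the point where the figure resends Seq=100, and a fourth does not trigger a second retransmission.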
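flow control: receiver controls sender, so sender won’t overflow receiver’s buffer
[Figure: receiver protocol stack: segments from sender pass up through the IP and TCP code into the receive buffer, from which the application process reads]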
TCP flow control
receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
[Figure: receiver-side buffering: RcvBuffer holds buffered TCP data until it is read by the application process]
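A C sketch of how rwnd could be computed and obeyed. The bookkeeping (rwnd = RcvBuffer - (LastByteRcvd - LastByteRead) at the receiver, and keeping LastByteSent - LastByteAcked <= rwnd at the sender) follows a common textbook formulation and is an assumption here, as are the function names `advertised_rwnd` and `sender_may_send`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Receiver side: free space to advertise. The variable names follow a common textbook
   convention (LastByteRcvd, LastByteRead); they are not TCP header fields. */
uint32_t advertised_rwnd(uint32_t rcv_buffer, uint32_t last_byte_rcvd, uint32_t last_byte_read) {
    uint32_t buffered = last_byte_rcvd - last_byte_read;   /* data not yet read by the app */
    return rcv_buffer - buffered;
}

/* Sender side: may send len more bytes only if unACKed data then stays within rwnd. */
bool sender_may_send(uint32_t last_byte_sent, uint32_t last_byte_acked, uint32_t len, uint32_t rwnd) {
    return (last_byte_sent - last_byte_acked) + len <= rwnd;
}
```

With a 4096-byte RcvBuffer holding 2000 unread bytes, the receiver advertises rwnd = 2096; a sender with 1000 bytes in flight may then add at most 1096 more.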
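Agreeing to establish a connection
2-way handshake failure scenarios:
(1) retransmitted connection request:
  client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x)
  client retransmits req_conn(x), then enters ESTAB when acc_conn(x) arrives
  connection x completes; client terminates; server forgets x
  the retransmitted req_conn(x) arrives; server enters ESTAB again: half open connection! (no client!)
(2) retransmitted data:
  client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x)
  client sends data(x+1) and retransmits data(x+1); server accepts data(x+1)
  connection x completes; client terminates; server forgets x
  the retransmitted req_conn(x) and data(x+1) arrive; server enters ESTAB and accepts data(x+1) a second time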
TCP 3-way handshake
client state / server state: both start in LISTEN
client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x) → SYNSENT
server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y; ACKbit=1; ACKnum=x+1) → SYN RCVD
client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data → ESTAB
server: received ACK(y) indicates client is live → ESTAB
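The number bookkeeping of the handshake, as a C sketch. `struct hs_seg` (only the SYN/ACK flags and the two 32-bit numbers), `make_syn`, `make_synack`, `make_ack`, and `handshake_complete` are hypothetical names; real TCP picks x and y unpredictably and carries many more fields. Setting the final ACK’s seq # to x+1 reflects that the SYN consumes one sequence number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the header fields used during connection setup. */
struct hs_seg { bool syn, ack; uint32_t seq, acknum; };

/* Client: SYN carrying its initial sequence number x. */
struct hs_seg make_syn(uint32_t x) {
    struct hs_seg s = { true, false, x, 0 };
    return s;
}

/* Server: SYNACK with its own initial seq # y, acknowledging the SYN (ACKnum = x+1). */
struct hs_seg make_synack(const struct hs_seg *syn, uint32_t y) {
    struct hs_seg s = { true, true, y, syn->seq + 1 };
    return s;
}

/* Client: ACK for the SYNACK (ACKnum = y+1); its own seq # is x+1 because the SYN used x. */
struct hs_seg make_ack(const struct hs_seg *synack) {
    struct hs_seg s = { false, true, synack->acknum, synack->seq + 1 };
    return s;
}

/* Server: the handshake completes when a plain ACK acknowledges y. */
bool handshake_complete(const struct hs_seg *synack, const struct hs_seg *ack) {
    return ack->ack && !ack->syn && ack->acknum == synack->seq + 1;
}
```

With x = 1000 and y = 5000 the SYNACK carries ACKnum=1001 and the final ACK carries ACKnum=5001.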
How to set SYN, ACK bit?
[Figure: TCP segment structure, 32 bits wide]
TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state / server state: both start in ESTAB
client: clientSocket.close(); send FINbit=1, seq=x → FIN_WAIT_1 (can no longer send but can receive data)
server: send ACKbit=1; ACKnum=x+1 → CLOSE_WAIT (can still send data)
client: → FIN_WAIT_2 (wait for server close)
server: send FINbit=1, seq=y → LAST_ACK (can no longer send data)
client: send ACKbit=1; ACKnum=y+1 → TIME_WAIT (timed wait for 2*max segment lifetime) → CLOSED
server: → CLOSED
Principles of Congestion Control
Congestion:
  informally: “too many sources sending too much data too fast for network to handle”
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queuing in router buffers)
  a top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
end-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate sender should send at
TCP Congestion Control
Idea
Assumes best-effort network
Each source determines network capacity for
itself
Implicit feedback via ACKs or timeout events
ACKs pace transmission (self-clocking)
Challenge
Determining initial available capacity
Adjusting to changes in capacity in a timely
manner
Recall in selective repeat protocol
(the same “Selective repeat in action” trace as earlier in this chapter: sender window N=4, pkt2 lost, pkts 3 to 5 buffered at the receiver, and pkt2 resent alone on its own timeout)
[Figure: congestion window size (8, 16, 24 Kbytes) versus time; saw tooth behavior: probing for bandwidth, with the window cut at each loss]
Why AIMD? TCP Fairness
Two competing sessions:
  additive increase gives slope of 1, as throughput increases
  multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput versus Connection 2 throughput, each axis up to R, with the equal bandwidth share line]
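A tiny simulation shows why the two sessions converge toward the equal-share line. It is a sketch under simplifying assumptions: both connections add 1 unit per round, both see every loss, a loss happens exactly when the combined rate exceeds the capacity `r`, and `aimd_gap` is a hypothetical name. Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it.

```c
/* Two AIMD flows sharing a bottleneck of capacity r (arbitrary units). Each round both
   add 1 (additive increase); if their sum exceeds r, both halve (multiplicative
   decrease), assuming both see the loss. Returns |x1 - x2| after the given rounds. */
double aimd_gap(double x1, double x2, double r, int rounds) {
    for (int i = 0; i < rounds; i++) {
        x1 += 1.0;
        x2 += 1.0;
        if (x1 + x2 > r) {
            x1 /= 2.0;
            x2 /= 2.0;
        }
    }
    return x1 > x2 ? x1 - x2 : x2 - x1;
}
```

Starting from rates 10 and 70 with r = 100, the gap is still 30 after 20 rounds (one decrease so far) but has shrunk below 1 after 200 rounds.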
TCP Congestion Control (RFC 5681)
Slow Start (cwnd <= ssthresh): cwnd doubles per RTT
  Goal: double cwnd every RTT
  Action: cwnd += 1 MSS for every ACK received
  [Figure: Host A sends one segment, then two segments, then four segments, one RTT apart]
Implementation:
  variable ssthresh
  on loss event, ssthresh is set to 1/2 of cwnd just before loss event
cwnd = 1 MSS on heavy loss: why resetting? heavy loss detected (timeout)
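A C sketch of the window rules used in the congestion window trace, counted in whole MSS. Assumptions, not RFC text: slow start applies while cwnd <= ssthresh (as on this slide, so the ACK that ends fast recovery also gets the slow-start increment); congestion avoidance adds 1 MSS once a full window of ACKs has arrived, similar to the byte-counting method RFC 5681 recommends if each ACK covers one MSS; ssthresh is half of cwnd at the loss with a floor of 2 MSS (RFC 5681 itself uses half the flight size); fast recovery sets cwnd = ssthresh + 3 and inflates it by 1 per extra duplicate ACK, Reno style. `struct cc` and the `cc_*` functions are hypothetical names.

```c
#include <stdbool.h>

enum cc_phase { SLOW_START, CONG_AVOID, FAST_RECOVERY };

struct cc {
    int cwnd;          /* congestion window, in MSS */
    int ssthresh;      /* slow-start threshold, in MSS */
    enum cc_phase phase;
    int dup_acks;      /* consecutive duplicate ACKs */
    int ca_acked;      /* ACKs counted toward the next +1 MSS in congestion avoidance */
};

/* ssthresh after a loss: half of cwnd, but at least 2 MSS. */
static int half_window(int cwnd) {
    return cwnd / 2 > 2 ? cwnd / 2 : 2;
}

/* ACK for new data. */
void cc_new_ack(struct cc *c) {
    if (c->phase == FAST_RECOVERY)
        c->cwnd = c->ssthresh;            /* fast recovery is over: deflate the window */
    c->dup_acks = 0;
    if (c->cwnd <= c->ssthresh) {         /* slow start: +1 MSS per ACK, doubles per RTT */
        c->phase = SLOW_START;
        c->cwnd += 1;
        c->ca_acked = 0;
    } else {                              /* congestion avoidance: +1 MSS per window of ACKs */
        c->phase = CONG_AVOID;
        if (++c->ca_acked >= c->cwnd) {
            c->cwnd += 1;
            c->ca_acked = 0;
        }
    }
}

/* Duplicate ACK. Returns true when the missing segment should be fast-retransmitted. */
bool cc_dup_ack(struct cc *c) {
    if (c->phase == FAST_RECOVERY) {
        c->cwnd += 1;                     /* each further dup ACK: inflate by 1 MSS */
        return false;
    }
    if (++c->dup_acks == 3) {             /* triple duplicate ACK */
        c->ssthresh = half_window(c->cwnd);
        c->cwnd = c->ssthresh + 3;
        c->phase = FAST_RECOVERY;
        return true;
    }
    return false;
}

/* Timeout: heavy loss detected, restart slow start from 1 MSS. */
void cc_timeout(struct cc *c) {
    c->ssthresh = half_window(c->cwnd);
    c->cwnd = 1;
    c->phase = SLOW_START;
    c->dup_acks = 0;
    c->ca_acked = 0;
}
```

Replaying the trace: with cwnd = 5, three duplicate ACKs give ssthresh = 2 and cwnd = 5, a fourth gives 6, the new ACK deflates the window to 2 and slow start brings it to 3, and three more ACKs in congestion avoidance bring it to 4.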
TCP Congestion Window Trace
(pkts 5 to 9 are outstanding and the receiver keeps answering with duplicate ack5; columns: CC algorithm, SR window after the algorithm runs)
  upon the 4th dup ack5: fast recovery w/ additional dup ACK: ssh = 2, cwnd = 5 + 1 = 6; send pkt 10. Window: 5 6 7 8 9 10
  upon ack10 (new ACK): fast recovery w/ a new ACK: ssh = 2, cwnd = ssh = 2; fast retx/fast recovery is over. Window: 10 11 (pkt 11 sent)
  also upon ack10: slow start: ssh = 2, cwnd = 2 + 1 = 3; send new packet 12. Window: 10 11 12
  upon ack11: congestion avoidance, ssh = 2; send pkt 13. Window: 11 12 13
  upon ack12: congestion avoidance, ssh = 2; send pkt 14. Window: 12 13 14
  upon ack13: congestion avoidance, ssh = 2, cwnd = 3 + 3/3 = 4; send packets 15, 16. Window: 13 14 15 16
Practice: TCP Congestion Window Trace
Putting Things Together in TCP
use selective repeat to do reliable data transfer for a window of packets (win) at any time
[Figure: window size over time, with W/2 marked]
TCP Futures: TCP over “long, fat pipes”
example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
  requires W = 83,333 in-flight segments
throughput in terms of segment loss probability, L [Mathis 1997]:
$\text{TCP throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
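Both numbers on this slide can be checked with a few lines of C. `segments_in_flight` computes throughput × RTT / segment size, and `mathis_max_loss` solves the Mathis formula for L; the names are hypothetical, MSS is taken in bits and RTT in seconds so that throughput comes out in bits per second, and the formula is the approximation cited on the slide, not an exact model.

```c
/* Window needed to keep the pipe full: throughput * RTT / segment size (in bits). */
double segments_in_flight(double throughput_bps, double rtt_s, double seg_bytes) {
    return throughput_bps * rtt_s / (seg_bytes * 8.0);
}

/* Mathis et al.: throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS in bits and RTT in
   seconds. Solving for L gives the largest loss probability that still allows the target. */
double mathis_max_loss(double mss_bits, double rtt_s, double throughput_bps) {
    double x = 1.22 * mss_bits / (rtt_s * throughput_bps);   /* this is sqrt(L) */
    return x * x;
}
```

For the slide’s example, `segments_in_flight(10e9, 0.1, 1500.0)` is about 83,333, and `mathis_max_loss(12000.0, 0.1, 10e9)` is about 2.1e-10: sustaining 10 Gbps would tolerate losing only about one segment in 4.7 billion.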