Transport Layer

Chapter 3 focuses on the transport layer of the Internet, detailing its services and protocols, particularly UDP and TCP. It covers key concepts such as multiplexing, demultiplexing, reliable data transfer, flow control, and congestion control. The chapter outlines the differences between connectionless and connection-oriented transport, emphasizing the importance of reliable data transfer mechanisms.

Chapter 3: Transport Layer

Our goals:
❒ understand principles behind transport layer services:
❍ multiplexing and demultiplexing
❍ reliable data transfer
❍ flow control
❍ congestion control
❒ learn about transport layer protocols in the Internet:
❍ UDP: connectionless transport
❍ TCP: connection-oriented transport
❍ TCP congestion control

Notes:
1. Multiplexing is to support multiple flows
2. Network can damage pkt, lose pkt, duplicate pkt
3. One of my favorite layers!!

3-1
Chapter 3 outline
❒ 3.1 Transport-layer services
❒ 3.2 Multiplexing and demultiplexing
❒ 3.3 Connectionless transport: UDP
❒ 3.4 Principles of reliable data transfer
❒ 3.5 Connection-oriented transport: TCP
❍ segment structure
❍ reliable data transfer
❍ flow control
❍ connection management
❒ 3.6 Principles of congestion control
❒ 3.7 TCP congestion control

3-2
Transport services and protocols
❒ provide logical communication between app processes running on different hosts
❒ transport protocols run in end systems
❍ send side: breaks app messages into segments, passes to network layer
❍ rcv side: reassembles segments into messages, passes to app layer
❒ more than one transport protocol available to apps
❍ Internet: TCP and UDP

[figure: end hosts running application/transport/network/data link/physical layers, intermediate routers running only the lower three; a logical end-end transport channel connects the two hosts]

3-3
Transport vs. network layer
❒ network layer: logical communication between hosts
❒ transport layer: logical communication between processes
❍ relies on, enhances, network layer services

Household analogy: 12 kids sending letters to 12 kids
❒ processes = kids
❒ app messages = letters in envelopes
❒ hosts = houses
❒ transport protocol = Ann and Bill
❒ network-layer protocol = postal service

Another analogy:
1. Post office -> network layer
2. My wife -> transport layer

3-4
Internet transport-layer protocols
❒ reliable, in-order delivery (TCP)
❍ congestion control (distributed control)
❍ flow control
❍ connection setup
❒ unreliable, unordered delivery: UDP
❍ no-frills extension of "best-effort" IP
❒ services not available:
❍ delay guarantees
❍ bandwidth guarantees
(research issues)

[figure: same layered-hosts picture, with the logical end-end transport channel between the two end systems]

3-5
Chapter 3 outline

3-6
Multiplexing/demultiplexing
Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
Demultiplexing at rcv host: delivering received segments to correct socket

[figure: host 1 runs FTP and telnet clients (processes P3, P1); hosts 2 and 3 run server processes (P1, P2, P4); each host has application/transport/network/link/physical layers; the legend distinguishes sockets from processes]

3-7
How demultiplexing works
❒ host receives IP datagrams
❍ each datagram has source IP address, destination IP address
❍ each datagram carries one transport-layer segment
❍ each segment has source, destination port number (recall: well-known port numbers for specific applications)
❒ host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (32 bits wide):
| source port # | dest port # |
| other header fields |
| application data (message) |

3-8
Connectionless demultiplexing
❒ Create sockets with port numbers:
DatagramSocket mySocket1 = new DatagramSocket(9911);
DatagramSocket mySocket2 = new DatagramSocket(9922);
❒ UDP socket identified by two-tuple: (dest IP address, dest port number)
❒ When host receives UDP segment:
❍ checks destination port number in segment
❍ directs UDP segment to socket with that port number
❒ IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket (this is how a system can serve multiple requests!!)

3-9
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);

Demultiplexing is based on destination IP and port #.

[figure: client (IP A, process P1) sends SP 9157 / DP 6428; client (IP C, process P3) sends SP 5775 / DP 6428; both segments arrive at the same socket on server B; replies go out with SP 6428 and DP 9157 or DP 5775]

SP provides "return address"
Source IP and port # can be spoofed !!!!

3-10
Connection-oriented demux
❒ TCP socket identified by 4-tuple:
❍ source IP address
❍ source port number
❍ dest IP address
❍ dest port number
❒ recv host uses all four values to direct segment to appropriate socket
❒ Server host may support many simultaneous TCP sockets:
❍ each socket identified by its own 4-tuple
❒ Web servers have different sockets for each connecting client
❍ non-persistent HTTP will have different socket for each request

3-11
Connection-oriented demux (cont)

Demultiplexing key: (S-IP, SP#, D-IP, DP#)

[figure: clients A (SP 9157) and C (SP 5775) both connect to server B on DP 80; the server demultiplexes them to different sockets/processes (P3, P4) because the 4-tuples differ]

3-12
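Not from the slides: a minimal Python sketch of the two demultiplexing rules above. UDP keys on the two-tuple (dest IP, dest port), TCP on the full 4-tuple, so two TCP connections from different clients to the same server port land in different sockets. The dictionaries of header fields are illustrative only.

```python
# Demultiplexing sketch: UDP keys on (dest IP, dest port);
# TCP keys on (source IP, source port, dest IP, dest port).

def udp_demux_key(seg):
    return (seg["dst_ip"], seg["dst_port"])

def tcp_demux_key(seg):
    return (seg["src_ip"], seg["src_port"], seg["dst_ip"], seg["dst_port"])

# Two clients (A and C) send to server B, port 6428 (UDP) / 80 (TCP),
# using the port numbers from the figures.
udp_a = {"src_ip": "A", "src_port": 9157, "dst_ip": "B", "dst_port": 6428}
udp_c = {"src_ip": "C", "src_port": 5775, "dst_ip": "B", "dst_port": 6428}
# UDP: both datagrams map to the SAME socket at B.
assert udp_demux_key(udp_a) == udp_demux_key(udp_c)

tcp_a = {"src_ip": "A", "src_port": 9157, "dst_ip": "B", "dst_port": 80}
tcp_c = {"src_ip": "C", "src_port": 5775, "dst_ip": "B", "dst_port": 80}
# TCP: same dest port, but DIFFERENT sockets (distinct 4-tuples).
assert tcp_demux_key(tcp_a) != tcp_demux_key(tcp_c)
```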
Chapter 3 outline

3-13
UDP: User Datagram Protocol [RFC 768]
❒ "no frills," "bare bones" Internet transport protocol
❒ "best effort" service, UDP segments may be:
❍ lost
❍ delivered out of order to app
❒ connectionless:
❍ no handshaking between UDP sender, receiver
❍ each UDP segment handled independently of others

Why is there a UDP?
❒ no connection establishment (which can add delay)
❒ simple: no connection state at sender, receiver
❒ small segment header
❒ no congestion control: UDP can blast away as fast as desired

3-14
UDP: more
❒ often used for streaming multimedia apps
❍ loss tolerant
❍ rate sensitive
❒ other UDP uses
❍ DNS
❍ SNMP
❒ reliable transfer over UDP: add reliability at application layer
❍ application-specific error recovery! (e.g., FTP based on UDP but with recovery)
When the network is stressed, you PRAY!

UDP segment format (32 bits wide):
| source port # | dest port # |
| length | checksum |   (length = bytes of UDP segment, including header)
| application data (message) |

3-15
UDP checksum
Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
❒ treat segment contents as sequence of 16-bit integers
❒ checksum: addition (1's complement sum) of segment contents
❒ sender puts checksum value into UDP checksum field

Receiver:
❒ compute checksum of received segment
❒ check if computed checksum equals checksum field value:
❍ NO - error detected
❍ YES - no error detected. But maybe errors nonetheless? More later ….

e.g., sums can collide: 1+2+3 = 6, but so is 0+3+3 = 6

3-16
Internet Checksum Example
❒ Note
❍ When adding numbers, a carryout from the most significant bit needs to be added to the result
❒ Example: add two 16-bit integers

               1110011001100110
             + 1101010101010101
             ------------------
wraparound   1 1011101110111011
             +                1
             ------------------
sum            1011101110111100
checksum       0100010001000011

3-17
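The checksum arithmetic on this slide can be sketched in Python (not from the slides; a minimal helper):

```python
def checksum16(words):
    """1's-complement sum of 16-bit words, carryout wrapped around,
    then complemented: the Internet checksum from the slide."""
    s = 0
    for w in words:
        s += w
        s = (s & 0xFFFF) + (s >> 16)   # add carryout back into the sum
    return ~s & 0xFFFF

# The slide's example: 1110011001100110 + 1101010101010101
a, b = 0b1110011001100110, 0b1101010101010101
assert checksum16([a, b]) == 0b0100010001000011
# Receiver check: words plus checksum sum to all 1s, so the
# complemented result is 0 when no bits flipped.
assert checksum16([a, b, checksum16([a, b])]) == 0
```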
Chapter 3 outline

3-18
Principles of Reliable data transfer
❒ important in app., transport, link layers
❒ top-10 list of important networking topics!
❒ characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)!!!!!!!!

[figure: the rdt abstraction, a reliable channel offered to the application, implemented over an unreliable channel; this picture sets the scenario]

3-19
Reliable data transfer: getting started

rdt_send(): called from above (e.g., by app.); passes data to deliver to the receiver's upper layer
deliver_data(): called by rdt to deliver data to the upper layer
udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
rdt_rcv(): called when packet arrives on rcv-side of channel

[figure: send side and receive side of rdt sitting between these four interface calls, above the unreliable channel]

** Let us now look at the guts of these modules. Any questions? (DON'T FALL ASLEEP!!!)

3-20
Reliable data transfer: getting started

We'll:
❒ incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
❒ consider only unidirectional data transfer
❍ but control info will flow in both directions!
❒ use finite state machines (FSM) to specify sender, receiver

state: when in this "state", next state uniquely determined by next event

          event causing state transition
state 1  --------------------------------->  state 2
          actions taken on state transition

Event: timer fires, message received, etc.
Action: execute a program, send a message, etc.

3-21
Rdt1.0: reliable transfer over a reliable channel
❒ underlying channel perfectly reliable
❍ no bit errors
❍ no loss of packets
(In reality, this is an unrealistic assumption, but..)
❒ separate FSMs for sender, receiver:
❍ sender sends data into underlying channel
❍ receiver reads data from underlying channel

sender (state "Wait for call from above"):
  rdt_send(data): packet = make_pkt(data); udt_send(packet)

receiver (state "Wait for call from below"):
  rdt_rcv(packet): extract(packet,data); deliver_data(data)

3-22
Rdt2.0: channel with bit errors
❒ underlying channel may flip bits in packet
❍ recall: UDP checksum to detect bit errors
❒ the question: how to recover from errors:
❍ acknowledgements (ACKs): receiver explicitly tells
sender that pkt received OK
❍ negative acknowledgements (NAKs): receiver explicitly
tells sender that pkt had errors
❍ sender retransmits pkt on receipt of NAK
Ack: I love u, I love u 2.
❍ human scenarios using ACKs, NAKs? Nak: I love u, I don’t love u
❒ new mechanisms in rdt2.0 (beyond rdt1.0):
❍ error detection
❍ receiver feedback: control msgs (ACK,NAK) rcvr-
>sender

3-23
rdt2.0: FSM specification

sender:
state "Wait for call from above":
  rdt_send(data):
    sndpkt = make_pkt(data, checksum)
    udt_send(sndpkt)
state "Wait for ACK or NAK":
  rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
  rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (return to "Wait for call from above")

receiver:
state "Wait for call from below":
  rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
    extract(rcvpkt,data)
    deliver_data(data)
    udt_send(ACK)

Note: a buffer is needed to store data from the application layer, or the call must block, while the sender waits for ACK/NAK.

3-24
rdt2.0: operation with no errors
[same FSM as slide 3-24, with the no-error path highlighted: sender sends sndpkt; receiver finds it notcorrupt, extracts and delivers the data, and sends ACK; sender receives the ACK and returns to waiting for the next call from above]

3-25
rdt2.0: error scenario
[same FSM, with the error path highlighted: receiver finds the pkt corrupt and sends NAK; sender, on rdt_rcv(rcvpkt) && isNAK(rcvpkt), retransmits sndpkt. GOT IT?]

3-26
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
❒ sender doesn't know what happened at receiver!
❒ can't just retransmit: possible duplicate

What to do?
❒ sender ACKs/NAKs receiver's ACK/NAK? What if sender's ACK/NAK lost?
❒ retransmit, but this might cause retransmission of correctly received pkt!

Handling duplicates:
❒ sender adds sequence number to each pkt
❒ sender retransmits current pkt if ACK/NAK garbled
❒ receiver discards (doesn't deliver up) duplicate pkt

stop and wait protocol: sender sends one packet, then waits for receiver response

3-27
rdt2.1: sender, handles garbled ACK/NAKs

state "Wait for call 0 from above":
  rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt)
state "Wait for ACK or NAK 0":
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ
state "Wait for call 1 from above":
  rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt)
state "Wait for ACK or NAK 1":
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ

THE FSM GETS MESSY!!!

3-28
rdt2.1: receiver, handles garbled ACK/NAKs

state "Wait for 0 from below":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt):
    extract(rcvpkt,data); deliver_data(data)
    sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
  rdt_rcv(rcvpkt) && corrupt(rcvpkt):
    sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):   (duplicate)
    sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)

state "Wait for 1 from below": symmetric, with seq 0 and seq 1 swapped

3-29
rdt2.1: discussion
Sender:
❒ seq # added to pkt
❒ two seq. #'s (0,1) will suffice. Why?
❒ must check if received ACK/NAK corrupted
❒ twice as many states
❍ state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:
❒ must check if received packet is duplicate
❍ state indicates whether 0 or 1 is expected pkt seq #
❒ note: receiver can not know if its last ACK/NAK was received OK at sender

3-30
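The rdt2.1 receiver behavior above can be sketched in Python (not from the slides; a minimal alternating-bit model, with corruption modeled as a flag rather than a checksum):

```python
# rdt2.1-style receiver sketch: deliver new packets, re-ACK (but do not
# re-deliver) duplicates caused by a garbled ACK at the sender.

class AltBitReceiver:
    def __init__(self):
        self.expected = 0          # seq # we are waiting for (0 or 1)
        self.delivered = []        # what got passed up to the app

    def rdt_rcv(self, seq, data, corrupt=False):
        if corrupt:
            return "NAK"
        if seq == self.expected:   # new packet: deliver and flip state
            self.delivered.append(data)
            self.expected ^= 1
            return "ACK"
        return "ACK"               # duplicate: re-ACK, don't deliver

r = AltBitReceiver()
assert r.rdt_rcv(0, "a") == "ACK"
assert r.rdt_rcv(0, "a") == "ACK"      # retransmission (ACK was garbled)
assert r.rdt_rcv(1, "b") == "ACK"
assert r.delivered == ["a", "b"]       # duplicate delivered only once
```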
rdt2.2: a NAK-free protocol
❒ same functionality as rdt2.1, using ACKs only
❒ instead of NAK, receiver sends ACK for last pkt received OK
❍ receiver must explicitly include seq # of pkt being ACKed
❒ duplicate ACK at sender results in same action as NAK: retransmit current pkt
❒ This is important because TCP uses this approach (no NAK).

3-31
rdt2.2: sender, receiver fragments

sender FSM fragment:
state "Wait for call 0 from above":
  rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt)
state "Wait for ACK 0":
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): Λ

receiver FSM fragment:
state "Wait for 0 from below":
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)): udt_send(sndpkt)   (re-send last ACK)
  (entering from "Wait for 1 from below")
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt):
    extract(rcvpkt,data); deliver_data(data)
    sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)

3-32
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
❍ checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Q: how to deal with loss?
❍ sender waits until certain data or ACK lost, then retransmits
❍ yuck: drawbacks?

Approach: sender waits "reasonable" amount of time for ACK
❒ retransmits if no ACK received in this time
❒ if pkt (or ACK) just delayed (not lost):
❍ retransmission will be duplicate, but use of seq. #'s already handles this
❍ receiver must specify seq # of pkt being ACKed
❒ requires countdown timer

What is the "right value" for the timer? It depends on the flow and network conditions!

3-33
rdt3.0 sender

state "Wait for call 0 from above":
  rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer
  rdt_rcv(rcvpkt): Λ
state "Wait for ACK0":
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): Λ
  timeout: udt_send(sndpkt); start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer
state "Wait for call 1 from above":
  rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer
  rdt_rcv(rcvpkt): Λ
state "Wait for ACK1":
  rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)): Λ
  timeout: udt_send(sndpkt); start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer

3-34
rdt3.0 in action
[figures: sender/receiver timelines for normal operation, lost packet, lost ACK, and premature timeout; the countdown timer ticks while the sender waits. Q: when a retransmitted pkt arrives, is it necessary to send ACK1 again?]

3-35 / 3-36
Performance of rdt3.0
❒ rdt3.0 works, but performance stinks
❒ example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:

T_transmit = L (packet length in bits) / R (transmission rate, bps)
           = 8000 b/pkt / 10^9 b/sec = 8 microsec

U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 = 0.00027

❍ U_sender: utilization, the fraction of time the sender is busy sending
❍ 1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
❍ network protocol limits use of physical resources!

3-37
rdt3.0: stop-and-wait operation
[figure: first packet bit transmitted at t = 0; last packet bit transmitted at t = L/R; first bit reaches the receiver after the propagation delay; when the last bit arrives the receiver sends an ACK; the ACK arrives and the sender sends the next packet at t = RTT + L/R]

U_sender = (L/R) / (RTT + L/R) = 0.008 / 30.008 = 0.00027

3-38
Pipelined protocols
Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
❍ range of sequence numbers must be increased
❍ buffering at sender and/or receiver
❒ Two generic forms of pipelined protocols: go-Back-N, selective repeat

3-39
Pipelining: increased utilization
[figure: sender transmits three packets back-to-back starting at t = 0; the ACK for the first arrives at t = RTT + L/R, and the ACKs for the 2nd and 3rd packets follow immediately]

U_sender = (3 * L/R) / (RTT + L/R) = 0.024 / 30.008 = 0.0008

Increase utilization by a factor of 3!

3-40
DON’T FALL ASLEEP !!!!!

Go-Back-N (sliding window protocol)

Sender: (For now, treat seq # as unlimited)
❒ k-bit seq # in pkt header
❒ "window" of up to N consecutive unack'ed pkts allowed
❒ ACK(n): ACKs all pkts up to, including seq # n: "cumulative ACK"
❍ sender may receive duplicate ACKs (see receiver)
❒ timer for each in-flight pkt
❒ timeout(n): retransmit pkt n and all higher seq # pkts in window

Q: what happens when a receiver is totally disconnected? MAX RETRY

3-41
GBN: sender extended FSM

initially: base = 1; nextseqnum = 1

rdt_send(data):
  if (nextseqnum < base+N) {
    sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
    udt_send(sndpkt[nextseqnum])
    if (base == nextseqnum)
      start_timer
    nextseqnum++
  }
  else
    refuse_data(data)        (buffer data, or block the higher app.)

timeout:
  start_timer
  udt_send(sndpkt[base])
  udt_send(sndpkt[base+1])
  …
  udt_send(sndpkt[nextseqnum-1])

rdt_rcv(rcvpkt) && corrupt(rcvpkt): Λ

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
  base = getacknum(rcvpkt)+1
  if (base == nextseqnum)
    stop_timer               (no pkt in pipe)
  else
    start_timer              (reset timer)

3-42
GBN: receiver extended FSM

initially: expectedseqnum = 1; sndpkt = make_pkt(expectedseqnum, ACK, chksum)

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt, expectedseqnum):
  extract(rcvpkt,data)
  deliver_data(data)
  sndpkt = make_pkt(expectedseqnum, ACK, chksum)
  udt_send(sndpkt)
  expectedseqnum++

default: udt_send(sndpkt)    (any other event: re-send the last ACK)

If an in-order pkt is received, deliver to app and ACK! Else, just drop it!

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
❍ may generate duplicate ACKs
❍ need only remember expectedseqnum
❒ out-of-order pkt:
❍ discard (don't buffer) -> no receiver buffering!
❍ Re-ACK pkt with highest in-order seq #

We don't even need a timer at the receiver! Simply rely on the retx by the sender!

3-43
GBN in action
[figure: sender/receiver timeline with window size N=4]

What determines the size of the window?
1. RTT
2. Buffer at the receiver (flow control)
3. Network congestion

Q: GBN has poor performance. How?
Sender sends pkts 1,2,3,4,5,6,7,8,9,…; pkt 1 gets lost; receiver gets pkts 2,3,4,5,… but will discard them!

3-44
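The poor-performance scenario just described can be sketched in Python (not from the slides; a deterministic toy model with one lost packet and a single go-back retransmission round):

```python
# Go-Back-N sketch: the receiver discards out-of-order pkts and re-ACKs;
# on timeout the sender goes back and resends everything from base.

def gbn_run(num_pkts, lose_first_copy_of):
    expected, delivered = 1, []
    sent_copies = {}                       # seq -> how many times sent

    def deliver(seq):
        nonlocal expected
        sent_copies[seq] = sent_copies.get(seq, 0) + 1
        if seq == lose_first_copy_of and sent_copies[seq] == 1:
            return                          # channel loses this copy
        if seq == expected:                 # in order: deliver to app
            delivered.append(seq)
            expected += 1
        # out of order: discard, re-ACK highest in-order seq (implicit)

    for seq in range(1, num_pkts + 1):          # first flight
        deliver(seq)
    for seq in range(expected, num_pkts + 1):   # timeout: go back N
        deliver(seq)
    return delivered

# pkt 1 lost: pkts 2..5 arrive but are discarded, then 1..5 are resent.
assert gbn_run(5, lose_first_copy_of=1) == [1, 2, 3, 4, 5]
```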
Selective Repeat (improvement of the GBN protocol)
❒ receiver individually acknowledges all correctly received pkts
❍ buffers pkts, as needed, for eventual in-order delivery to upper layer
❍ E.g., sender sends pkts 1,2,3,4,…,10; receiver got 2,4,6,8,10; sender resends 1,3,5,7,9.
❒ sender only resends pkts for which ACK not received
❍ sender timer for EACH unACKed pkt
❒ sender window
❍ N consecutive seq #'s
❍ again limits seq #s of sent, unACKed pkts

3-45
Selective repeat: sender, receiver windows
[figure: sender window (sent & ACKed, sent not-yet-ACKed, usable, not usable) and receiver window (out-of-order buffered & ACKed, expected not-yet-received)]

Q: why can a pkt be ACKed at the receiver but still unACKed at the sender? The ACK is lost or still on its way.

3-46
Selective repeat

sender:
data from above:
❒ if next available seq # in window, send pkt
timeout(n):
❒ resend pkt n, restart its timer
ACK(n) in [sendbase, sendbase+N]:
❒ mark pkt n as received
❒ if n smallest unACKed pkt, advance window base to next unACKed seq # (slide the window)

receiver:
pkt n in [rcvbase, rcvbase+N-1]:
❒ send ACK(n)
❒ out-of-order: buffer
❒ in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
❒ ACK(n)   Q: why do we need this? The ACK got lost; the sender may time out and resend the pkt, so we need to ACK it.
otherwise:
❒ ignore

3-47
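The receiver-side buffering rule above can be sketched in Python (not from the slides; a minimal model that ignores ACK loss and window bounds):

```python
# Selective-repeat receiver sketch: ACK each pkt individually, buffer
# out-of-order pkts, deliver a run of in-order pkts once the gap fills.

class SRReceiver:
    def __init__(self):
        self.rcvbase = 1
        self.buffer = {}       # out-of-order pkts, keyed by seq #
        self.delivered = []

    def receive(self, seq, data):
        self.buffer[seq] = data                 # buffer + ACK(seq)
        while self.rcvbase in self.buffer:      # deliver in-order run
            self.delivered.append(self.buffer.pop(self.rcvbase))
            self.rcvbase += 1
        return ("ACK", seq)

r = SRReceiver()
r.receive(2, "b")                    # out of order: buffered, ACKed
r.receive(3, "c")
assert r.delivered == []             # gap at 1, nothing delivered yet
r.receive(1, "a")                    # gap fills: deliver 1, 2, 3
assert r.delivered == ["a", "b", "c"]
assert r.rcvbase == 4
```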
Selective repeat in action (N=4)
[figure: sender/receiver timeline; the out-of-order pkt that GBN would drop is buffered here and delivered later]

3-48
Selective repeat: dilemma
In real life, we use k bits to implement seq #s. Practical issue:
Example:
❒ seq #'s: 0, 1, 2, 3
❒ window size (N) = 3
❒ receiver sees no difference in the two scenarios!
❒ incorrectly passes duplicate data as new in (a)

[figure: in scenario (a) pkt 0 is retransmitted after its ACKs are lost; from the receiver's view this looks identical to scenario (b), where a genuinely new pkt 0 arrives]

Q: what relationship between seq # size and window size?
N <= 2^k / 2

3-49
Why bother studying reliable data transfer?
❒ We know it is provided by TCP, so why bother to study it?
❒ Sometimes, we may need to implement "some form" of reliable transfer without heavy-duty TCP.
❒ A good example is multimedia streaming. Even though the application is loss tolerant, if too many packets get lost, the visual quality suffers. So we may want to implement some form of reliable transfer.
❒ At the very least, appreciate the "good services" provided by some Internet gurus.

3-50
Chapter 3 outline

3-51
TCP: Overview  RFCs: 793, 1122, 1323, 2018, 2581
(The 800 lb gorilla in the transport stack! PAY ATTENTION!!)

❒ point-to-point:
❍ one sender, one receiver (not multicast)
❒ reliable, in-order byte stream:
❍ no "message boundaries"
❍ in the app layer, we need delimiters
❒ pipelined:
❍ TCP congestion and flow control set window size
❒ send & receive buffers
❒ full duplex data:
❍ bi-directional data flow in same connection
❍ MSS: maximum segment size
❒ connection-oriented:
❍ handshaking (exchange of control msgs) init's sender, receiver state (e.g., buffer size) before data exchange
❒ flow controlled:
❍ sender will not overwhelm receiver

[figure: application writes data into the socket; TCP send buffer emits segments toward the TCP receive buffer; application reads data from the socket]

3-52
TCP segment structure (32 bits wide):

| source port # | dest port # |
| sequence number |
| acknowledgement number |
| head len | not used | U A P R S F | receive window |
| checksum | urg data pointer |
| options (variable length) |
| application data (variable length) |

notes:
- sequence and ACK numbers count by "bytes" of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- receive window: # bytes rcvr willing to accept
- Internet checksum (as in UDP)
- due to the options field we have a variable-length header

3-53
TCP seq. #'s and ACKs (negotiated during 3-way handshake)

Seq. #'s:
❍ byte stream "number" of first byte in segment's data
ACKs:
❍ seq # of next byte expected from other side
❍ cumulative ACK

Q: how does the receiver handle out-of-order segments?
❍ A: TCP spec doesn't say; up to implementor

[figure: simple telnet scenario. User at host A types 'C': A sends Seq=42, ACK=79, data = 'C'. Host B ACKs receipt of 'C' and echoes it back: Seq=79, ACK=43, data = 'C'. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.]

3-54
TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
❒ longer than RTT
❍ but RTT varies
❒ too short: premature timeout
❍ unnecessary retransmissions
❒ too long: slow reaction to segment loss, poor performance

Q: how to estimate RTT?
❒ SampleRTT: measured time from segment transmission until ACK receipt
❍ ignore retransmissions
❒ SampleRTT will vary, want estimated RTT "smoother"
❍ average several recent measurements, not just current SampleRTT

[figure: timelines contrasting a too-short timeout (retransmission fires before the ACK arrives) with a too-long timeout (slow recovery after loss)]

3-55
TCP Round Trip Time and Timeout

EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT

❒ exponential weighted moving average
❒ influence of past sample decreases exponentially fast
❒ typical value: α = 0.125

Unrolling the recursion (with ERTT(0) = 0):
ERTT(1) = (1-α)·ERTT(0) + α·SRTT(1) = α·SRTT(1)
ERTT(2) = (1-α)·α·SRTT(1) + α·SRTT(2)
ERTT(3) = (1-α)²·α·SRTT(1) + (1-α)·α·SRTT(2) + α·SRTT(3)

3-56
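The EWMA above is a one-liner in a loop (not from the slides; the sample values are made up for illustration):

```python
# EWMA RTT estimate from the slide: each new SampleRTT is folded in
# with weight alpha; older samples decay geometrically.

def estimated_rtt(samples, alpha=0.125, ertt0=0.0):
    ertt = ertt0
    for srtt in samples:
        ertt = (1 - alpha) * ertt + alpha * srtt
    return ertt

# Matches the unrolled recursion on the slide for three samples:
a, s = 0.125, [100.0, 120.0, 110.0]
unrolled = (1 - a)**2 * a * s[0] + (1 - a) * a * s[1] + a * s[2]
assert abs(estimated_rtt(s, a) - unrolled) < 1e-9
```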
Example RTT estimation:
[figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr over about 106 seconds; the jagged SampleRTT trace varies between roughly 100 and 350 ms, while the EstimatedRTT trace is much smoother]

3-57
TCP Round Trip Time and Timeout

Setting the timeout (Jacobson/Karels):
❒ EstimatedRTT plus "safety margin"
❍ large variation in EstimatedRTT -> larger safety margin
❒ first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1-β)*DevRTT + β*|SampleRTT - EstimatedRTT|
(typically, β = 0.25)

Then set timeout interval:

TimeoutInterval = EstimatedRTT + 4*DevRTT

3-58
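Putting both formulas together (not from the slides; a minimal sketch that updates EstimatedRTT before DevRTT, one common ordering, with made-up numbers):

```python
# Timeout setting from the slide: track mean RTT and its deviation,
# then add a 4*DevRTT safety margin.

def update_timeout(ertt, devrtt, sample, alpha=0.125, beta=0.25):
    ertt = (1 - alpha) * ertt + alpha * sample
    devrtt = (1 - beta) * devrtt + beta * abs(sample - ertt)
    return ertt, devrtt, ertt + 4 * devrtt

ertt, devrtt = 100.0, 10.0
ertt, devrtt, timeout = update_timeout(ertt, devrtt, 140.0)
assert round(ertt, 3) == 105.0      # 0.875*100 + 0.125*140
assert round(devrtt, 3) == 16.25    # 0.75*10 + 0.25*|140-105|
assert round(timeout, 3) == 170.0   # 105 + 4*16.25
```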
Chapter 3 outline

3-59
TCP reliable data transfer
❒ TCP creates rdt service on top of IP's unreliable service
❒ pipelined segments (for performance)
❒ cumulative acks
❒ TCP uses single retransmission timer
❒ retransmissions are triggered by:
❍ timeout events
❍ duplicate acks (for performance reasons)
❒ initially consider simplified TCP sender:
❍ ignore duplicate acks
❍ ignore flow control, congestion control

3-60
TCP sender events:
data rcvd from app:
❒ create segment with seq #
❒ seq # is byte-stream number of first data byte in segment
❒ start timer if not already running (think of timer as for oldest unacked segment)
❒ expiration interval: TimeOutInterval

timeout:
❒ retransmit segment that caused timeout
❒ restart timer

Ack rcvd:
❒ if it acknowledges previously unacked segments
❍ update what is known to be acked
❍ start timer if there are outstanding segments

3-61
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is acked

3-62


TCP: retransmission scenarios
[figure, lost ACK scenario: A sends Seq=92, 8 bytes of data; B's ACK=100 is lost; A times out and resends Seq=92, 8 bytes; B re-ACKs 100; SendBase = 100]
[figure, premature timeout: A sends Seq=92, 8 bytes, then Seq=100, 20 bytes; the ACKs (100, 120) are delayed; A's timer for Seq=92 expires and A resends it; B replies with cumulative ACK=120; SendBase moves from 100 to 120]

3-63
TCP retransmission scenarios (more)
[figure, cumulative ACK scenario: A sends Seq=92, 8 bytes and Seq=100, 20 bytes; ACK=100 is lost, but ACK=120 arrives before the timeout, so nothing is resent; SendBase = 120. Room for improvement!]

3-64
TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver -> TCP Receiver action:
❒ Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK. (Reduces feedback/ACK traffic by half!)
❒ Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments. (ACK the "largest in-order byte" seq #)
❒ Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
❒ Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap

3-65
Fast Retransmit
❒ Time-out period often relatively long:
❍ long delay before resending lost packet
❒ Detect lost segments via duplicate ACKs.
❍ Sender often sends many segments back-to-back
❍ If segment is lost, there will likely be many duplicate ACKs.
❒ If sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost:
❍ fast retransmit: resend segment before timer expires

3-66
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {                                  /* duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y     /* fast retransmit */
    }
  }

Q: why resend the pkt with seq # y?
A: That is what the receiver expects!

3-67
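The dup-ACK counting above can be sketched in Python (not from the slides; a minimal model that treats every ACK not advancing SendBase as a duplicate):

```python
# Fast-retransmit sketch: on the 3rd duplicate ACK for the same value y,
# resend the segment starting at y without waiting for the timer.

class FastRetransmitSender:
    def __init__(self, sendbase):
        self.sendbase = sendbase
        self.dup_acks = 0
        self.retransmitted = []

    def on_ack(self, y):
        if y > self.sendbase:          # new data acked: advance base
            self.sendbase = y
            self.dup_acks = 0
        else:                          # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit seg y

s = FastRetransmitSender(sendbase=100)
for _ in range(3):
    s.on_ack(100)                      # three dup ACKs for 100
assert s.retransmitted == [100]
s.on_ack(120)                          # cumulative ACK advances base
assert s.sendbase == 120 and s.dup_acks == 0
```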
Chapter 3 outline

3-68
TCP Flow Control
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

❒ receive side of TCP connection has a receive buffer
❒ app process may be slow at reading from buffer
❒ speed-matching service: matching the send rate to the receiving app's drain rate

3-69
TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

❒ spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
❒ Rcvr advertises spare room by including value of RcvWindow in segments
❒ Sender limits unACKed data to RcvWindow
  ❍ guarantees receive buffer doesn't overflow

This goes to show that the design process of the header is important!!

3-70
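The spare-room formula above is simple enough to check numerically. A minimal sketch, using the slide's variable names (the byte counts are made-up examples):

```python
# RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    # spare room = buffer size minus bytes buffered but not yet read by the app
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

# 4096-byte buffer; bytes up to 10000 received, app has read up to 9000:
# 1000 bytes sit unread in the buffer, so 3096 bytes are advertised.
print(rcv_window(4096, 10000, 9000))   # → 3096
```

As the app drains the buffer, `LastByteRead` catches up and the advertised window grows back toward `RcvBuffer`.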
TCP Connection Management                IMPORTANT: SYN FLOODING!!!!

Recall: TCP sender, receiver establish "connection" before
exchanging data segments
❒ initialize TCP variables:
  ❍ seq. #s
  ❍ buffers, flow control info (e.g. RcvWindow)
❒ client: connection initiator
  Socket clientSocket = new Socket("hostname","port number");
❒ server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  ❍ specifies initial seq #
  ❍ no data
Step 2: server host receives SYN, replies with SYN-ACK segment
  ❍ server allocates buffers
  ❍ specifies server initial seq. #
Step 3: client receives SYN-ACK, replies with ACK segment, which
  may contain data

3-72
TCP three-way handshake

Connection request:  SYN=1, seq=client_isn
Connection granted:  SYN=1, seq=server_isn, Ack=client_isn+1
ACK:                 Syn=0, seq=client_isn+1, Ack=server_isn+1

3-73
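The sequence/ack arithmetic of the three segments can be walked through concretely. In this sketch `client_isn` and `server_isn` are arbitrary example initial sequence numbers (real stacks choose them pseudo-randomly):

```python
# Illustrative walk-through of the three handshake segments above.

client_isn, server_isn = 42, 7000

syn     = {"SYN": 1, "seq": client_isn}                                 # step 1
syn_ack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}          # step 2
ack     = {"SYN": 0, "seq": syn_ack["ack"], "ack": syn_ack["seq"] + 1}  # step 3

# each side acknowledges the other's initial seq # plus one
print(syn_ack["ack"], ack["seq"], ack["ack"])   # → 43 43 7001
```

Note the SYN consumes one sequence number even though it carries no data, which is why both ack fields are `isn + 1`.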
TCP Connection Management (cont.)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection,
sends FIN.

  client                server
  close:   FIN  →
                ← ACK
                ← FIN   close:
  ACK  →
  timed wait
  closed

Q: why don't we combine ACK and FIN?
A: Sender may have some data in the pipeline!

3-74
TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
  ❍ Enters "timed wait" - will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

  client                server
  closing: FIN  →
                ← ACK
                ← FIN   closing:
  ACK  →
  timed wait             closed
  closed

3-75
TCP Connection Management (cont)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams]

3-76
Principles of Congestion Control
TCP provides one of the MANY WAYS to perform CC.

Congestion:
❒ informally: “too many sources sending too much
data too fast for network to handle”
❒ different from flow control!
❒ manifestations:
❍ lost packets (buffer overflow at routers)
❍ long delays (queueing in router buffers)
❒ another top-10 problem!

3-78
Causes/costs of congestion: scenario 1

❒ two senders, two receivers (homogeneous)
❒ one router, infinite buffers, capacity C
❒ no retransmission

λin: original data (from application); λout: to application
(Hosts A and B share an unlimited output link buffer)

❒ large delays when congested
❒ maximum achievable throughput

3-79
Causes/costs of congestion: scenario 2

❒ one router, finite buffers
❒ sender retransmission of lost packet (due to transport layer)

λin: original data;  λ'in: original data, plus retransmitted data;  λout
(Hosts A and B share finite output link buffers; pkts get dropped)

3-80
Causes/costs of congestion: scenario 2

❒ always: λin = λout (goodput)
❒ "perfect" retransmission only when loss: λ'in > λout
❒ retransmission of delayed (not lost) packet makes λ'in larger
  (than perfect case) for same λout

"costs" of congestion:
❒ more work (retrans) for given "goodput"
❒ unneeded retransmissions: link carries multiple copies of pkt

3-81
Causes/costs of congestion: scenario 3

❒ four senders
❒ multihop paths
❒ timeout/retransmit

Q: what happens as λin and λ'in increase?

λin: original data;  λ'in: original data, plus retransmitted data
(finite shared output link buffers between Hosts A and B)

3-82
Causes/costs of congestion: scenario 3

[Figure: λout vs. λ'in for Hosts A and B — throughput collapses as
offered load grows past capacity]

System collapses (e.g., students)

Another "cost" of congestion:
❒ when a packet is dropped, any upstream transmission capacity
  used for that packet was wasted!

3-83
Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
❒ no explicit feedback from network
❒ congestion inferred from end-system observed loss, delay
❒ approach taken by TCP

Network-assisted congestion control:
❒ routers provide feedback to end systems
  ❍ single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  ❍ explicit rate sender should send at

3-84
Case study: ATM ABR congestion control

ABR: available bit rate:
❒ "elastic service"
❒ if sender's path "underloaded":
  ❍ sender should use available bandwidth
❒ if sender's path congested:
  ❍ sender throttled to minimum guaranteed rate

RM (resource management) cells:
❒ sent by sender, interspersed with data cells
❒ bits in RM cell set by switches ("network-assisted")
  ❍ NI bit: no increase in rate (mild congestion)
  ❍ CI bit: congestion indication
❒ RM cells returned to sender by receiver, with bits intact

3-85
Case study: ATM ABR congestion control

❒ two-byte ER (explicit rate) field in RM cell
  ❍ congested switch may lower ER value in cell
  ❍ sender's send rate is thus the minimum supportable rate on path
❒ EFCI (explicit forward congestion indication) bit in cells:
  set to 1 in congested switch
  ❍ if data cell preceding RM cell has EFCI set, receiver sets CI
    bit in returned RM cell

3-86
TCP Congestion Control

❒ end-end control (no network assistance!!!)
❒ sender limits transmission:
  LastByteSent - LastByteAcked ≤ CongWin
❒ Roughly,
  rate = CongWin/RTT Bytes/sec
❒ CongWin is dynamic, function of perceived network congestion

How does sender perceive congestion?
❒ loss event = timeout or 3 duplicate acks
❒ TCP sender reduces rate (CongWin) after loss event

three mechanisms:
❍ AIMD
❍ slow start
❍ conservative after timeout events

Note: CC must be efficient to make use of available BW !!!!

3-88
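The rough rate formula is worth plugging numbers into. A one-liner sketch (the window and RTT values are made-up examples, not from the slides):

```python
# rate ≈ CongWin / RTT, expressed here in bits per second

def tcp_rate_bps(congwin_bytes, rtt_sec):
    return congwin_bytes * 8 / rtt_sec

# 16 KB window, 100 ms RTT -> about 1.28 Mbps
print(round(tcp_rate_bps(16_000, 0.100)))   # → 1280000
```

This makes the key point visible: with RTT fixed, the only knob the sender has is `CongWin`, so congestion control is window control.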
TCP AIMD    (Human analogy: HK government or CUHK !!)

additive increase: increase CongWin by 1 MSS every RTT in the
absence of loss events: probing

multiplicative decrease: cut CongWin in half after loss event

[Figure: long-lived TCP connection — congestion window sawtooth
oscillating between 8, 16 and 24 Kbytes over time]

Role of ACK:
1. An indication of loss
2. Ack is self clocking: if delay is large, the rate at which the
   congestion window grows will be reduced.

3-89
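The sawtooth in the figure falls straight out of the two rules. A minimal sketch, with an invented loss pattern just to show the shape (units are MSS, one step per RTT):

```python
# AIMD: +1 MSS per loss-free RTT, halve the window on a loss event.

MSS = 1

def aimd(rounds, loss_rounds, congwin=8):
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            congwin = max(MSS, congwin // 2)   # multiplicative decrease
        else:
            congwin += MSS                     # additive increase (probing)
        trace.append(congwin)
    return trace

print(aimd(8, {4}))   # → [9, 10, 11, 12, 6, 7, 8, 9]
```

The window climbs linearly, is cut in half at the loss in round 4, then resumes probing — the classic sawtooth.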
TCP Slow Start

❒ When connection begins, CongWin = 1 MSS
  ❍ Example: MSS = 500 bytes & RTT = 200 msec
  ❍ initial rate = 20 kbps
❒ available bandwidth may be >> MSS/RTT
  ❍ desirable to quickly ramp up to respectable rate
❒ When connection begins, increase rate exponentially fast until
  first loss event

3-90
TCP Slow Start (more)

❒ When connection begins, increase rate exponentially until first
  loss event:
  ❍ double CongWin every RTT
  ❍ done by incrementing CongWin for every ACK received
❒ Summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four
segments to Host B, one RTT apart]

3-91
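Why does "+1 MSS per ACK" double the window every RTT? Each window of `CongWin` segments earns `CongWin` ACKs in the next RTT, each adding 1 MSS. A small sketch (units are MSS; a lossless toy, not a real stack):

```python
# Slow start: one ACK comes back per segment sent, and each ACK
# grows CongWin by 1 MSS -> the window doubles each round trip.

MSS = 1
congwin = 1 * MSS
windows_per_rtt = []
for rtt in range(4):
    windows_per_rtt.append(congwin)
    acks = congwin              # one ACK per segment in this window
    congwin += acks * MSS       # +1 MSS per ACK
print(windows_per_rtt)          # → [1, 2, 4, 8]
```

This matches the figure's 1, 2, 4 segment bursts, and continues doubling until the first loss event.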
Refinement

❒ After 3 dup ACKs:
  ❍ CongWin is cut in half
  ❍ window then grows linearly
  ❍ This is known as the "fast-recovery" phase. Implemented in new
    TCP-Reno.
❒ But after timeout event:
  ❍ CongWin instead set to 1 MSS;
  ❍ window then grows exponentially to a threshold, then grows
    linearly

Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs is "more alarming"

3-92
Refinement (more)

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
❒ Variable Threshold
❒ At loss event, Threshold is set to 1/2 of CongWin just before
  loss event

[Figure: congestion window size (segments) vs. transmission round
for TCP Tahoe and TCP Reno, with the threshold marked]

3-93
Summary: TCP Congestion Control
❒ When CongWin is below Threshold, sender in
slow-start phase, window grows exponentially.

❒ When CongWin is above Threshold, sender is


in congestion-avoidance phase, window grows
linearly.

❒ When a triple duplicate ACK occurs, Threshold


set to CongWin/2 and CongWin set to
Threshold.

❒ When timeout occurs, Threshold set to


CongWin/2 and CongWin is set to 1 MSS.

READ THE BOOK on TCP-Vegas: instead of reacting to a loss,
anticipate & prepare for a loss!
3-94
TCP sender congestion control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set
  state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by
  1 MSS every RTT

Event: Loss event detected by triple duplicate ACK
State: SS or CA
Action: Threshold = CongWin/2; CongWin = Threshold; set state to
  "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease.
  CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to
  "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed

3-95
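The table above is really a small state machine, and it can be sketched directly. This is a hedged toy (class and method names are invented; units are MSS and the initial `Threshold` of 8 is an arbitrary choice), not a faithful TCP implementation:

```python
# Toy state machine for the sender table: SS doubles per RTT,
# CA adds ~1 MSS per RTT, loss events shrink Threshold/CongWin.

class TcpSender:
    def __init__(self):
        self.state, self.congwin, self.threshold = "SS", 1.0, 8.0

    def new_ack(self):                       # ACK for previously unacked data
        if self.state == "SS":
            self.congwin += 1.0              # per-ACK growth -> doubling per RTT
            if self.congwin > self.threshold:
                self.state = "CA"
        else:                                # CA: +MSS*(MSS/CongWin) per ACK
            self.congwin += 1.0 / self.congwin

    def triple_dup_ack(self):                # fast recovery
        self.threshold = self.congwin / 2
        self.congwin = max(1.0, self.threshold)
        self.state = "CA"

    def timeout(self):                       # back to slow start
        self.threshold = self.congwin / 2
        self.congwin, self.state = 1.0, "SS"

s = TcpSender()
for _ in range(10):
    s.new_ack()
print(s.state, round(s.congwin, 2))          # → CA 9.22
s.timeout()
print(s.state, s.congwin, s.threshold)       # back in slow start, CongWin = 1 MSS
```

After crossing the threshold the growth visibly flattens from exponential to roughly linear, and a timeout resets the window to 1 MSS.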
TCP throughput

❒ What's the average throughput of TCP as a function of window
  size and RTT?
  ❍ Ignore slow start
❒ Let W be the window size when loss occurs.
❒ When window is W, throughput is W/RTT
❒ Just after loss, window drops to W/2, throughput to W/2RTT.
❒ Average throughput: 0.75 W/RTT
3-96
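The 0.75 factor is just the mean of the linear sawtooth between W/2 and W, which a quick numeric average confirms (W and RTT here are arbitrary illustrative values):

```python
# Average the window over one sawtooth cycle: it ramps linearly
# from W/2 (just after a loss) back up to W (at the next loss).

W, RTT, steps = 100.0, 1.0, 1001
avg_window = sum(W/2 + (W/2) * i / (steps - 1) for i in range(steps)) / steps
print(avg_window / RTT)   # ≈ 75.0, i.e. 0.75 * W / RTT
```

The mean of a linear ramp is the midpoint of its endpoints, (W/2 + W)/2 = 3W/4, hence throughput ≈ 0.75·W/RTT.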
TCP Futures

❒ Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
❒ Requires window size W = 83,333 in-flight segments
❒ Throughput in terms of loss rate:

    Throughput = (1.22 · MSS) / (RTT · √L)

❒ ➜ L = 2·10⁻¹⁰   Wow
❒ New versions of TCP for high-speed needed!

3-97
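Both numbers on this slide follow from plugging the example into the throughput formula. A quick check:

```python
# Throughput = 1.22 * MSS / (RTT * sqrt(L)); solve for W and L.

MSS = 1500 * 8          # segment size in bits
RTT = 0.100             # seconds
target = 10e9           # desired throughput: 10 Gbps

W = target * RTT / MSS                    # in-flight segments needed
L = (1.22 * MSS / (RTT * target)) ** 2    # loss rate the formula implies

print(round(W))         # → 83333 segments
print(L)                # ≈ 2e-10
```

A loss rate of one segment in five billion is far below what real networks deliver, hence the call for new high-speed TCP variants.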
TCP Fairness

Fairness goal: if K TCP sessions share same bottleneck link of
bandwidth R, each should have average rate of R/K

[Figure: TCP connections 1 and 2 sharing a bottleneck router of
capacity R]

3-98
Why is TCP fair?

Two competing, homogeneous sessions (e.g., similar propagation
delay, etc.):
❒ Additive increase gives slope of 1, as throughput increases
❒ multiplicative decrease decreases throughput proportionally

[Figure: connection 1 throughput vs. connection 2 throughput,
alternating "loss: decrease window by factor of 2" and "congestion
avoidance: additive increase", converging toward the equal
bandwidth share line at R]

3-99
Fairness (more)

Fairness and UDP
❒ Multimedia apps often do not use TCP
  ❍ do not want rate throttled by congestion control
❒ Instead use UDP:
  ❍ pump audio/video at constant rate, tolerate packet loss
❒ Research area: TCP friendly

Fairness and parallel TCP connections
❒ nothing prevents app from opening parallel connections between
  2 hosts.
❒ Web browsers do this
❒ Example: link of rate R supporting 9 connections;
  ❍ new app asks for 1 TCP, gets rate R/10
  ❍ new app asks for 11 TCPs, gets around R/2 !

3-100
Delay modeling

Q: How long does it take to receive an object from a Web server
after sending a request?

Delay is influenced by:
❒ TCP connection establishment
❒ data transmission delay
❒ slow start
❒ Sender's congestion window

Notation, assumptions:
❒ Assume one link between client and server of rate R
❒ S: MSS (bits)
❒ O: object size (bits)
❒ no retransmissions (no loss, no corruption)
❒ Protocol's overhead is negligible.

Window size:
❒ First assume: fixed congestion window, W segments
❒ Then dynamic window, modeling slow start

3-101
Fixed congestion window (1)

(assume no congestion window constraint; object request is
piggybacked)

Let W denote a 'fixed' congestion window size (a positive integer).

First case:
WS/R > RTT + S/R: ACK for first segment in window returns before
window's worth of data sent

  delay = 2RTT + O/R

This is the "lower bound" of latency!

3-102
Fixed (or static) congestion window (2)

Second case:
❒ WS/R < RTT + S/R: wait for ACK after sending window's worth of
  data

  delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

where K is the # of windows of data that cover the object.
If O is the "size" of the object, we have K = O/WS (rounded up to
an integer).

3-103
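Both static-window cases are easy to evaluate numerically. A sketch of the two formulas above (all times in seconds, sizes in bits; the inputs are illustrative, not from the slides):

```python
import math

def static_window_delay(O, S, R, RTT, W):
    # first case: ACKs return fast enough to keep the pipe full
    if W * S / R > RTT + S / R:
        return 2 * RTT + O / R
    # second case: sender stalls after each window until the ACK arrives
    K = math.ceil(O / (W * S))            # windows needed to cover the object
    return 2 * RTT + O / R + (K - 1) * (S / R + RTT - W * S / R)

# 100 kbit object, 10 kbit segments, 1 Mbps link, 100 ms RTT, W = 4:
# WS/R = 0.04 < RTT + S/R = 0.11, so K = 3 windows and 2 stalls occur.
print(round(static_window_delay(100e3, 10e3, 1e6, 0.1, 4), 3))   # → 0.44
```

With a large enough W the same call drops back to the lower bound 2RTT + O/R, as in the first case.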
TCP Delay Modeling: Slow Start (1)

Now suppose window grows according to slow start.

Will show that the delay for one object is:

  Latency = 2RTT + O/R + P[RTT + S/R] - (2^P - 1)·S/R

where P is the number of times TCP idles at server:

  P = min{Q, K-1}

- where Q is the number of times the server would idle if the
  object were of infinite size.
- and K is the number of windows that cover the object.

3-104
TCP Delay Modeling: Slow Start (2)

Delay components:
• 2 RTT for connection establishment and request
• O/R to transmit object
• time server idles due to slow start

Server idles: P = min{K-1, Q} times

Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1, Q} = 2

Server idles P = 2 times

[Figure: client initiates TCP connection, requests object; server
sends first window = S/R, second window = 2S/R, third window =
4S/R, fourth window = 8S/R, one RTT apart; object delivered when
transmission completes]

3-105
TCP Delay Modeling (3)

  S/R + RTT = time from when server starts to send a segment
              until server receives acknowledgement

  2^(k-1)·S/R = time to transmit the kth window, where k = 1,2,…,K

  [S/R + RTT - 2^(k-1)·S/R]+ = idle time after the kth window

  delay = O/R + 2RTT + Σ_{p=1..P} idleTime_p
        = O/R + 2RTT + Σ_{k=1..P} [S/R + RTT - 2^(k-1)·S/R]
        = O/R + 2RTT + P[RTT + S/R] - (2^P - 1)·S/R

[Figure: same timing diagram as before — first window = S/R,
second = 2S/R, third = 4S/R, fourth = 8S/R, one RTT apart]

3-106
TCP Delay Modeling (4)

Recall K = number of windows that cover the object.
How do we calculate K?

  K = min{k : 2^0·S + 2^1·S + … + 2^(k-1)·S ≥ O}
    = min{k : 2^0 + 2^1 + … + 2^(k-1) ≥ O/S}
    = min{k : 2^k - 1 ≥ O/S}
    = min{k : k ≥ log2(O/S + 1)}
    = ⌈log2(O/S + 1)⌉

Calculation of Q, number of idles for infinite-size object, is
similar (see HW).

3-107
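The closed form for K can be checked against the earlier example (O/S = 15 segments gives K = 4 windows, since 1 + 2 + 4 + 8 = 15):

```python
import math

# K = ceil(log2(O/S + 1)): number of slow-start windows covering the object
def num_windows(O, S):
    return math.ceil(math.log2(O / S + 1))

print(num_windows(15, 1))   # → 4  (windows of 1, 2, 4, 8 segments)
```

For O/S = 5 the same formula gives K = 3: two windows carry 1 + 2 = 3 segments, so a third is needed.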
Experiment A

S = 536 bytes, RTT = 100 msec, O = 100 kbytes (relatively large)

  R          O/R        P   Min latency    Latency with
                            O/R + 2RTT     slow start
  28 kbps    28.6 sec   1   28.8 sec       28.9 sec
  100 kbps   8.0 sec    2   8.2 sec        8.4 sec
  1 Mbps     800 msec   5   1.0 sec        1.5 sec
  10 Mbps    80 msec    7   0.28 sec       0.98 sec

1. Slow start adds appreciable delay only when R is high. If R is
   low, ACK comes back quickly and TCP quickly ramps up to its
   maximum rate.

3-108
Experiment B

S = 536 bytes, RTT = 100 msec, O = 5 kbytes (relatively small)

  R          O/R        P   Min latency    Latency with
                            O/R + 2RTT     slow start
  28 kbps    1.43 sec   1   1.63 sec       1.73 sec
  100 kbps   0.4 sec    2   0.6 sec        0.76 sec
  1 Mbps     40 msec    3   0.24 sec       0.52 sec
  10 Mbps    4 msec     3   0.20 sec       0.50 sec

1. Slow start adds appreciable delay when R is high and for a
   relatively small object.

3-109
Experiment C

S = 536 bytes, RTT = 1 sec, O = 5 kbytes (relatively small)

  R          O/R        P   Min latency    Latency with
                            O/R + 2RTT     slow start
  28 kbps    1.43 sec   3   3.4 sec        5.8 sec
  100 kbps   0.4 sec    3   2.4 sec        5.2 sec
  1 Mbps     40 msec    3   2.0 sec        5.0 sec
  10 Mbps    4 msec     3   2.0 sec        5.0 sec

1. Slow start can significantly increase the latency when the
   object size is relatively small and the RTT is relatively
   large.

3-110
HTTP Modeling
❒ Assume Web page consists of:
❍ 1 base HTML page (of size O bits)
❍ M images (each of size O bits)
❒ Non-persistent HTTP:
❍ M+1 TCP connections in series
❍ Response time = (M+1)O/R + (M+1)2RTT + sum of idle times
❒ Persistent HTTP:
❍ 2 RTT to request and receive base HTML file
❍ 1 RTT to request and receive M images
❍ Response time = (M+1)O/R + 3RTT + sum of idle times
❒ Non-persistent HTTP with X parallel connections
❍ Suppose M/X integer.
❍ 1 TCP connection for base file
❍ M/X sets of parallel connections for images.
❍ Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle
times
3-111
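The three response-time formulas above can be compared directly. A hedged sketch that ignores the "sum of idle times" terms and reuses the next slide's setup (RTT = 100 ms, O = 5 Kbytes, M = 10, X = 5; the 1 Mbps link rate is an illustrative choice):

```python
# Response-time formulas from the slide, idle times ignored.

RTT, O, M, X = 0.100, 5 * 8_000, 10, 5   # O in bits (5 Kbytes)

def non_persistent(R):  return (M + 1) * O / R + (M + 1) * 2 * RTT
def persistent(R):      return (M + 1) * O / R + 3 * RTT
def parallel(R):        return (M + 1) * O / R + (M / X + 1) * 2 * RTT

R = 1e6   # 1 Mbps link
print(round(non_persistent(R), 2),
      round(persistent(R), 2),
      round(parallel(R), 2))   # → 2.64 0.74 1.04
```

All three share the same (M+1)·O/R transmission term; they differ only in how many round trips they pay, which is why persistent and parallel connections win when RTT dominates.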
HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5

[Figure: bar chart of response time (0-20 sec) for non-persistent,
persistent, and parallel non-persistent HTTP at 28 kbps, 100 kbps,
1 Mbps and 10 Mbps]

For low bandwidth, connection & response time dominated by
transmission time.
Persistent connections only give minor improvement over parallel
connections.

3-112
HTTP Response time (in seconds)
RTT = 1 sec, O = 5 Kbytes, M = 10 and X = 5

[Figure: bar chart of response time (0-70 sec) for non-persistent,
persistent, and parallel non-persistent HTTP at 28 kbps, 100 kbps,
1 Mbps and 10 Mbps]

For larger RTT, response time dominated by TCP establishment &
slow start delays. Persistent connections now give important
improvement: particularly in high delay·bandwidth networks.

3-113
Chapter 3: Summary

❒ principles behind transport layer services:
  ❍ multiplexing, demultiplexing
  ❍ reliable data transfer
  ❍ flow control
  ❍ congestion control
❒ instantiation and implementation in the Internet
  ❍ UDP
  ❍ TCP

Next:
❒ leaving the network "edge" (application, transport layers)
❒ into the network "core"

3-114