Computer Networking Chap3
3 Transport Layer
Goal : study the principles of providing comm services to app processes,
and the implementation issues in the Internet protocols TCP and UDP
Contents
z Relationship bw transport and net layers
3-1
Chap.3 Transport Layer
Introduction and Transport-Layer Services
z Relationship Between Transport and Network Layers
3-2
Overview of Transport-layer
provide logical comm bw app
processes running on diff hosts
transport protocols run in end
systems
z sending side: converts msgs from
app process into transport-layer
pkts (segments in Internet term),
passes them to net layer
{ (possibly) break app msgs into
z analogies
{ kids ~ processes
{ houses ~ hosts
z connection setup
z congestion control
3-5
Chap.3 Transport Layer
Introduction and Transport-Layer Services
Multiplexing and Demultiplexing
Connectionless Transport: UDP
Principle of Reliable Data Transfer
Connection-Oriented Transport: TCP
Principles of Congestion Control
TCP Congestion Control
3-6
Multiplexing and Demultiplexing
a process can have one or more sockets; each socket having a unique id
multiplexing at sending host : Ann’s job in household analogy
z gathering data chunks at sources from diff sockets
3-7
How Demultiplexing Works
host receives IP datagrams
z each datagram has src and dst IP addrs
3-8
Connectionless Multiplexing and Demultiplexing
creating UDP socket
DatagramSocket mySocket1 = new DatagramSocket();
{ transport layer automatically assigns a port # to the socket, in the
z TCP segs with diff src IP addrs or src port #s are directed
to two diff sockets (except a TCP seg carrying a
conn-establishment request)
server host may support many simultaneous TCP sockets
z each socket identified by its own 4-tuple
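The difference between the two demultiplexing rules can be sketched as two lookup tables. This is a minimal illustration, not how an operating system actually stores sockets: the class name, method names, and string keys are all hypothetical. It assumes a UDP socket is selected by (dst IP, dst port) only, while a TCP connection socket is selected by the full 4-tuple.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy demultiplexer (hypothetical names): maps header fields to a socket label. */
class DemuxSketch {
    // UDP: keyed by (dst IP, dst port) only.
    private final Map<String, String> udpSockets = new HashMap<>();
    // TCP: keyed by the 4-tuple (src IP, src port, dst IP, dst port).
    private final Map<String, String> tcpSockets = new HashMap<>();

    void bindUdp(String dstIp, int dstPort, String socket) {
        udpSockets.put(dstIp + ":" + dstPort, socket);
    }

    void addTcpConnection(String srcIp, int srcPort, String dstIp, int dstPort, String socket) {
        tcpSockets.put(srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort, socket);
    }

    /** The source fields are ignored when choosing a UDP socket. */
    String demuxUdp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return udpSockets.get(dstIp + ":" + dstPort);
    }

    /** All four fields are used when choosing a TCP connection socket. */
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort);
    }

    public static void main(String[] args) {
        DemuxSketch host = new DemuxSketch();
        host.bindUdp("10.0.0.1", 6428, "udpSock");
        host.addTcpConnection("10.0.0.2", 9157, "10.0.0.1", 80, "connA");
        host.addTcpConnection("10.0.0.3", 9157, "10.0.0.1", 80, "connB");
        // Two different senders reach the same UDP socket ...
        System.out.println(host.demuxUdp("10.0.0.2", 9157, "10.0.0.1", 6428));
        System.out.println(host.demuxUdp("10.0.0.3", 5775, "10.0.0.1", 6428));
        // ... but different TCP connection sockets on the same server port.
        System.out.println(host.demuxTcp("10.0.0.2", 9157, "10.0.0.1", 80));
        System.out.println(host.demuxTcp("10.0.0.3", 9157, "10.0.0.1", 80));
    }
}
```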
3-10
Connection-Oriented Mux/Demux (2)
3-11
Connection-Oriented Mux/Demux : Threaded Server
Today’s high-performance Web servers often use only one process,
creating a new thread with a new conn socket for each new client conn
z many connection sockets may be attached to the same process
3-12
Connectionless Transport: UDP
z UDP Segment Structure
z UDP Checksum
3-13
User Datagram Protocol (UDP) [RFC 768]
no-frills, bare bones transport protocol : adds nothing to IP but,
z multiplexing/demultiplexing : src and dst port #s
features of UDP
z unreliable best-effort service : no guarantee on correct delivery
3-14
Popular Internet Apps and Their Protocols
3-15
Controversy on UDP
UDP lacks congestion control and reliable data transfer
when many users start streaming high-bit-rate video, packets
overflow at routers, resulting in
z high loss rates for UDP packets
3-16
UDP Segment Structure
Source port #, dst port # : used for multiplexing/demultiplexing
Length : length of UDP seg including header, in bytes
Checksum : to detect errors (i.e., bits altered) on an end-end basis
z error source : noise in the links, or corruption while stored in a router
3-17
UDP Checksum Calculation (1) : Sender
sum all 16-bit words in the segment, adding two words at a time,
with any overflow wrapped around
take 1’s complement of the sum; the result is the checksum value
(ex) three 16-bit words 0110011001100000
0101010101010101
1000111100001100
3-18
UDP Checksum Calculation (2) : Receiver
add all 16-bit words including checksum, and decide
z no error detected, if the result is 1111111111111111
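The sender and receiver calculations can be checked on the three example words. The sketch below is illustrative only: the class and method names are hypothetical, and it works on an array of 16-bit words rather than on a real UDP segment with its pseudo-header.

```java
/** 16-bit one's-complement checksum over an array of words (hypothetical helper). */
class UdpChecksumSketch {
    /** Adds two 16-bit values, wrapping any overflow back into the low bit. */
    static int add(int a, int b) {
        int s = a + b;
        return s > 0xFFFF ? (s & 0xFFFF) + 1 : s;
    }

    /** One's-complement sum of all words. */
    static int sum(int[] words) {
        int s = 0;
        for (int w : words) {
            s = add(s, w & 0xFFFF);
        }
        return s;
    }

    /** Sender side: checksum is the 1's complement of the sum. */
    static int checksum(int[] words) {
        return ~sum(words) & 0xFFFF;
    }

    /** Receiver side: sum of all words plus checksum must be sixteen 1 bits. */
    static boolean noErrorDetected(int[] words, int checksum) {
        return add(sum(words), checksum) == 0xFFFF;
    }

    public static void main(String[] args) {
        int[] words = {
            0b0110011001100000,
            0b0101010101010101,
            0b1000111100001100
        };
        int c = checksum(words);
        System.out.println(Integer.toBinaryString(c)); // 1011010100111101
        System.out.println(noErrorDetected(words, c)); // true
        words[0] ^= 1; // flip one bit "in transit"
        System.out.println(noErrorDetected(words, c)); // false
    }
}
```

For the three words the wrapped sum is 0100101011000010, so the checksum is 1011010100111101.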
3-19
Principle of Reliable Data Transfer
z Building a Reliable Data Transfer Protocol
z Go-Back-N (GBN)
3-20
Reliable Data Transfer : Service Model and Implementation
reliable data transfer : no corruption, no loss, and in-order delivery
z of central importance to networking : not only at transport layer,
but also at link layer and app layer
udt_send() : called by rdt to send pkt over unreliable channel
rdt_rcv() : called from channel upon pkt arrival
3-21
Reliable Data Transfer: Implementation Consideration
characteristics of the unreliable channel determine the complexity of
the reliable data transfer protocol
We will
z incrementally develop sender and receiver sides of rdt protocol,
considering increasingly complex models of the underlying channel
z consider only unidirectional data transfer, for simplicity
3-22
rdt1.0 : Perfectly Reliable Channel
Assumptions of underlying channel
z perfectly reliable : no bit errors, no loss of packets
3-23
rdt2.0 : Channel with Errors
New assumptions of underlying channel
z bits in a pkt may be corrupted when transmitted, propagated, or buffered
3-24
rdt2.0 : not Corrupted
3-25
rdt2.0 : Corrupted
3-26
rdt2.0 : Fatal Flaw
Q: How to recover from errors in ACK or NAK pkts?
z minimally, need to add checksum bits to ACK/NAK pkts
z possible solutions
{ repeated requests from sender/receiver to repeat a garbled ACK
or NAK : hard to find a way out, since the request may be garbled too
{ add enough checksum bits for correction : not applicable for
lost pkt
{ simply resend the pkt when receiving a garbled ACK or NAK ⇒
incurs possible duplicate at receiver
receiver doesn’t know whether it is a new pkt or a rexmission
(i.e., a duplicate pkt)
handling duplicates : add a new field (seq # field) to the packet
z sender puts a seq # into this field, and receiver discards
duplicate pkt
z 1-bit seq # suffices for stop-and-wait protocol
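How a 1-bit seq # lets the receiver discard duplicates can be sketched as follows. The class and method names are hypothetical, and the sketch assumes the pkt already passed the checksum test (corruption handling is left out).

```java
/** Receiver side of a 1-bit (alternating) seq # scheme; names are illustrative. */
class AlternatingBitReceiver {
    private int expectedSeq = 0;
    private final StringBuilder delivered = new StringBuilder();

    /**
     * Handles an uncorrupted pkt.
     * Returns true if the data was new and delivered to the upper layer,
     * false if it was a duplicate (a rexmission) and was discarded.
     * In both cases the receiver would ACK the pkt.
     */
    boolean receive(int seq, String data) {
        if (seq != expectedSeq) {
            return false; // duplicate of the previous pkt
        }
        delivered.append(data);
        expectedSeq = 1 - expectedSeq; // toggle between 0 and 1
        return true;
    }

    String delivered() {
        return delivered.toString();
    }

    public static void main(String[] args) {
        AlternatingBitReceiver rcv = new AlternatingBitReceiver();
        rcv.receive(0, "a");
        rcv.receive(0, "a"); // sender resent pkt 0 after a garbled ACK
        rcv.receive(1, "b");
        System.out.println(rcv.delivered()); // ab
    }
}
```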
analogy (figure) : dictation
z A dictates something to B; B replies “ok” or “please repeat”
z A didn’t understand B’s reply and asks “What did you say?”, but the question is corrupted
z B has no idea whether it is part of the dictation or a request for repetition of the last reply
3-28
rdt2.1 : Employing Seq # - Sender
3-29
rdt2.1 : Employing Seq # - Receiver
3-30
rdt2.1 : Discussion
sender
z seq # added to pkt
receiver
z must check if received pkt is duplicate
3-31
rdt2.2 : NAK-free
accomplish the same effect as a NAK, by sending an ACK for the
last correctly received pkt
z receiver must explicitly include seq # of pkt being ACKed
sender that receives two ACKs for the same pkt (i.e., duplicate ACKs)
knows that receiver didn’t correctly receive the pkt following the pkt
being acked twice, thus rexmits that following pkt
3-32
rdt2.2 : NAK-free (Sender)
3-33
rdt2.2 : NAK-free (Receiver)
3-34
rdt3.0 : Channel with Errors and Loss
new assumptions of underlying channels :
z can lose pkts (data or ACKs)
Q : how to detect pkt loss and what to do when pkt loss occurs
z checksum, seq #, ACKs, rexmissions are of help, but not enough
approaches
z sender waits proper amount of time (at least round-trip delay +
processing time at receiver) to convince itself of pkt loss
z rexmits the pkt if ACK not received within this time
z if a pkt (or its ACK) just overly delayed, sender may rexmit the
pkt even though it has not been lost
{ but, seq # handles the possibility of duplicate pkts
implementation
z countdown timer set appropriately starts each time pkt is sent
3-35
rdt3.0 : Channel with Errors & Loss (Sender)
3-36
rdt3.0 : Channel with Errors & Loss – Operation (1)
3-37
rdt3.0 : Channel with Errors & Loss – Operation (2)
3-38
Performance of rdt3.0 (Stop-and-Wait Protocol)
drawback of GBN : when window size and bw-delay product are large,
a single pkt error causes a large # of unnecessary rexmissions
3-41
Go-Back-N (GBN) Protocol : Sender
3-43
Go-Back-N (GBN) Protocol : Operation
window size = 4
3-44
Selective Repeat (SR) Protocol
sender rexmits only pkts for which ACK not received ⇒ avoid unnecessary
rexmission
receiver individually acks correctly received pkts regardless of their order
z out-of-order pkts are buffered until missing pkts are received
3-45
SR Protocol : Sender/Receiver Events and Actions
sender
z data from above : if next available seq # is in window, send pkt
z ACK(n) in [sendbase,sendbase+N-1]
{ mark pkt n as received
are delivered to upper layer, and receive window moved forward by the
# of pkts delivered to upper layer
z pkt n in [rcvbase-N,rcvbase-1] correctly received
{ an ACK generated even though previously acked
{ if not acked, sender’s window may never move forward; for example, ack
3-47
Max. Window Size
stop-and-wait protocol
z window size $N \le 2^k - 1$ (k: # of bits in seq field), not $2^k$, why?
ex) k=2 ⇒ seq #s : 0, 1, 2, 3; max N = 3
SR protocol
z scenarios
(a) : all acks are lost
incorrectly sends duplicate as new
(b) : all acks received correctly, but pkt 3
is lost
{ receiver can’t distinguish xmission of pkt
0 in (b) from rexmission of pkt 0 in (a)
z further consideration on scenario (a)
{ A rexmits pkt 0; B receives and buffers it
{ B sends piggybacked ack for pkt 2 that is
already acked but lost
{ A advances window to 3 0 1, and sends pkt 3
{ B receives pkt 3, and delivers pkt 0 (no
good!) in buffer and pkt 3 to upper layer
z way out : avoid overlapping of SR windows
{ $N \le 2^{k-1}$, k: # of bits in seq field
3-48
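The two window bounds can be checked numerically. The sketch below uses hypothetical names and models only scenario (a): the sender's window still covers seq #s 0..N-1 (all acks lost) while the receiver's window has already advanced by N. If the two windows share a seq # modulo 2^k, the receiver cannot tell a rexmission from new data.

```java
/** Window-size bounds for a k-bit seq # field (illustrative helper names). */
class WindowSizeSketch {
    /** Largest window satisfying N <= 2^k - 1. */
    static int maxWindowTwoToKMinusOne(int k) {
        return (1 << k) - 1;
    }

    /** Largest SR window satisfying N <= 2^(k-1). */
    static int maxSrWindow(int k) {
        return 1 << (k - 1);
    }

    /**
     * True if the receiver window [n, 2n-1] (mod 2^k) shares a seq #
     * with the sender's not-yet-advanced window [0, n-1].
     */
    static boolean srWindowsOverlap(int k, int n) {
        int seqSpace = 1 << k;
        for (int i = n; i < 2 * n; i++) {
            if (i % seqSpace < n) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int k = 2; // seq #s 0, 1, 2, 3
        System.out.println(maxWindowTwoToKMinusOne(k)); // 3
        System.out.println(maxSrWindow(k));             // 2
        System.out.println(srWindowsOverlap(k, 3));     // true: pkt 0 is ambiguous
        System.out.println(srWindowsOverlap(k, 2));     // false
    }
}
```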
rdt : Comment on Packet Reordering
since seq #s are reused, old copies of a pkt with a seq/ack # of x
can appear, even though neither sender’s nor receiver’s window
contains x
z use of max pkt lifetime : bound how long a pkt can live in the net
3-49
Summary of rdt Mechanisms
3-50
Connection-Oriented Transport: TCP
z TCP Connection
z Flow Control
3-52
TCP Segment Structure
3-55
Estimating Round-Trip Time (RTT)
clearly, TCP timeout value > RTT
Q : How much larger? How to estimate RTT? Each seg exploited in
estimating RTT? …
estimating RTT
z SampleRTT : time measured from seg xmission until ACK receipt
{ measured not for every seg xmitted, but for one of xmitted segs
proceed
3-56
RTT Samples and RTT Estimates
3-57
Retransmission Timeout Interval
DevRTT, variation of RTT : an estimate of how much SampleRTT
deviates from EstimatedRTT
DevRTT = (1-β)⋅DevRTT + β⋅|SampleRTT−EstimatedRTT|
z large (or small) when there is a lot of (or little) fluctuation
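A small estimator shows how the DevRTT update fits together with the RTT estimate and the timeout. Several pieces here are assumptions, since they do not survive in the text above: the EstimatedRTT update rule (an exponentially weighted moving average), the weights α = 0.125 and β = 0.25, the timeout rule EstimatedRTT + 4⋅DevRTT, and the choice to update DevRTT before EstimatedRTT. These follow common TCP practice (RFC 6298), and the class name is hypothetical.

```java
/** EWMA-based RTT estimator; constants and update order are assumptions (see lead-in). */
class RttEstimatorSketch {
    static final double ALPHA = 0.125; // weight of the new sample in EstimatedRTT (assumed)
    static final double BETA = 0.25;   // weight of the new deviation in DevRTT (assumed)

    double estimatedRtt;
    double devRtt;

    RttEstimatorSketch(double initialEstimate, double initialDeviation) {
        this.estimatedRtt = initialEstimate;
        this.devRtt = initialDeviation;
    }

    /** Folds one SampleRTT measurement into both estimates. */
    void addSample(double sampleRtt) {
        // DevRTT = (1-β)·DevRTT + β·|SampleRTT − EstimatedRTT|
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        // EstimatedRTT = (1-α)·EstimatedRTT + α·SampleRTT
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    /** Timeout = estimate plus a safety margin of four deviations (assumed rule). */
    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimatorSketch est = new RttEstimatorSketch(100.0, 10.0); // msec
        est.addSample(120.0);
        System.out.println(est.estimatedRtt);      // 102.5
        System.out.println(est.devRtt);            // 12.5
        System.out.println(est.timeoutInterval()); // 152.5
    }
}
```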
3-58
TCP Reliable Data Transfer
reliable data transfer service on top of IP’s unreliable service
z seq # : to identify lost and duplicate segs
z timer
3-59
Simplified TCP Sender
3-60
TCP Retransmission Scenarios
z scenario 1 : rexmission due to a lost ack
z scenario 2 : segment 100 not rexmitted
z scenario 3 : cumulative ack avoids rexmission of first seg
TCP Modifications : Doubling Timeout Interval
at each timeout, TCP rexmits and sets the next timeout interval to
twice the previous value
⇒ timeout intervals grow exponentially after each rexmission
but, for the other events (i.e., data received from app and ACK
received) timeout interval is derived from most recent values of
EstimatedRTT and DevRTT
3-62
TCP ACK Gen Recommendation [RFC 1122, 2581]
3-63
TCP Modifications : TCP Fast Retransmit
TCP Fast Retransmit : rexmits a (missing) seg before its timer
expiration, if TCP sender receives 3 duplicate ACKs
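The duplicate-ACK rule can be sketched as a counter on the sender. The class name is hypothetical, and the sketch only decides when to trigger the rexmission; it does not model timers or the send buffer.

```java
/** Counts duplicate ACKs and signals a fast retransmit on the third one. */
class FastRetransmitSketch {
    private long lastAck = -1;
    private int dupCount = 0;

    /**
     * Processes one ACK number.
     * Returns true exactly when this ACK is the 3rd duplicate, i.e. when the
     * sender should rexmit the seg starting at ackNum without waiting for its timer.
     */
    boolean onAck(long ackNum) {
        if (ackNum == lastAck) {
            dupCount++;
            return dupCount == 3;
        }
        lastAck = ackNum; // new ACK: remember it and reset the counter
        dupCount = 0;
        return false;
    }

    public static void main(String[] args) {
        FastRetransmitSketch sender = new FastRetransmitSketch();
        long[] acks = {100, 100, 100, 100, 200};
        for (long a : acks) {
            System.out.println("ACK " + a + " -> fast retransmit: " + sender.onAck(a));
        }
    }
}
```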
3-64
Is TCP Go-Back-N or Selective Repeat?
similarity of TCP with Go-Back-N
z TCP : cumulative ack for the last correctly received, in-order seg
{ TCP wouldn’t even rexmit seg n if ACK(n+1) arrived before timeout for
seg n
a modification to TCP in [RFC 2018] : selective acknowledgement
z TCP receiver acks out-of-order segs selectively rather than cumulatively
3-66
Flow Control : How It Works?
RcvBuffer : size of buffer space allocated to a conn
RcvWindow : amount of free buffer space in receiver’s buffer
initial value of RcvWindow = RcvBuffer
at receiver
z not to overflow : LastByteRcvd – LastByteRead ≤ RcvBuffer
LastByteRcvd – LastByteRead : # of bytes received but not yet read
z RcvWindow advertising : RcvWindow placed in receive window field in
every seg sent to sender
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
at sender : limits unacked # of bytes to RcvWindow
z LastByteSent – LastByteAcked ≤ RcvWindow
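The two inequalities translate directly into code. The sketch below uses hypothetical method names and made-up byte counts purely to show the arithmetic.

```java
/** Receiver-window arithmetic for TCP flow control (illustrative names and numbers). */
class FlowControlSketch {
    /** RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead] */
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /** Sender may send `bytes` more only if unacked data stays within RcvWindow. */
    static boolean senderMaySend(long lastByteSent, long lastByteAcked,
                                 long bytes, long rcvWindow) {
        return (lastByteSent + bytes) - lastByteAcked <= rcvWindow;
    }

    public static void main(String[] args) {
        // Receiver: 4096-byte buffer, 3000 bytes received, 1000 already read by the app.
        long window = rcvWindow(4096, 3000, 1000);
        System.out.println(window); // 2096
        // Sender: 1000 bytes already in flight (sent 5000, acked 4000).
        System.out.println(senderMaySend(5000, 4000, 1000, window)); // true  (2000 <= 2096)
        System.out.println(senderMaySend(5000, 4000, 1100, window)); // false (2100 >  2096)
    }
}
```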
3-67
Flow Control : Avoiding Sender Blocking
suppose A is sending to B, B’s rcv buffer becomes full so that
RcvWindow = 0, and after advertising RcvWindow = 0 to A, B has
nothing to send to A
z note that TCP at B sends a seg only if it has data or ack to send
3-68
TCP Connection Management : Establishment
3-way handshake
1. client sends SYN seg to server
{ contains no app data
{ specifies initial seq #
2. server replies with SYNACK seg
{ server allocates buffers and
variables to the connection
{ contains no app data
3-69
TCP Connection Management : Termination
3-70
TCP State Transition : Client
Socket clientSocket = new Socket("hostname", portNumber);
3-71
TCP State Transition : Server
ServerSocket welcomeSocket = new ServerSocket(portNumber);
3-73
Preliminary of Congestion Control
pkt loss (at least, perceived by sender) results from overflowing of
router buffers as the net becomes congested
z rexmission treats a symptom, but not the cause, of net
congestion
cause of net congestion : too many sources attempting to send data
at too high a rate
z basic idea of way out : throttle senders in face of net congestion
3-74
Causes and Costs of Congestion : Scenario 1
assumptions
z no error control, flow control, and congestion control
z hosts A and B each send data at an avg rate of λ_in bytes/sec
z share a router with outgoing link capacity of R and infinite buffer space
z ignore additional header info (transport-layer and lower-layer)
3-76
Causes and Costs of Congestion : Scenario 2 (2)
case c : premature timeout for each pkt ⇒ rexmit each pkt twice
z cost of congested net : unneeded rexmissions waste link bw
3-77
Causes and Costs of Congestion : Scenario 3
assumptions
z 4 routers, each with finite buffer space and link capacity of R
z each of 4 hosts has same λin, rexmits over 2-hop paths
control
z explicit rate : the rate router can support on its outgoing link
3-79
Two Types of Feedback of Congestion Info
direct feedback : from a router to the sender by using choke pkt
feedback via receiver
z router mark/update a field in a pkt flowing forward to indicate
congestion
z upon receipt of the pkt, receiver notifies sender of congestion
3-80
ATM ABR Congestion Control
Asynchronous Transfer Mode (ATM)
z a virtual-circuit switching architecture
3-81
Mechanisms of Congestion Indication in ATM ABR
3-83
Preliminary of TCP Congestion Control (1)
basic idea of TCP congestion control : limit sending rate based on
the network congestion perceived by sender
z increase sending rate when sender perceives little congestion along
the path bw itself and dst; reduce it when sender perceives congestion
to keep the description concrete, sending a large file is assumed
How does sender limit sending rate?
LastByteSent - LastByteAcked ≤ min{CongWin, RcvWindow} (1)
z CongWin : a variable limiting sending rate due to perceived
congestion
z henceforth, RcvWindow constraint ignored in order to focus on
congestion control
z (1) limits the amount of unacked data, thus the sending rate
{ consider conn for which loss and xmission delay are negligible
then, sending rate ≈ CongWin / RTT
3-84
Preliminary of TCP Congestion Control (2)
How does sender perceive congestion on path bw itself and dst?
z a timeout or the receipt of three duplicate ACKs
3-85
Additive-Increase, Multiplicative-Decrease
multiplicative decrease : cut CongWin in half when detecting a loss,
but never below 1 MSS
additive increase: increase CongWin by 1 MSS every RTT until a loss
detected (i.e., when perceiving e-t-e path is congestion-free)
z commonly, accomplished by increasing CongWin by MSS⋅(MSS/CongWin)
bytes for each receipt of new ack
ex) MSS=1,460 bytes, CongWin=14,600 bytes ⇒ 10 segs sent within RTT
an ACK for a seg increases CongWin by 1/10⋅MSS, thus after acks for all 10
segs (i.e., after one RTT) CongWin has increased by about 1 MSS
z congestion avoidance : linear increase phase of TCP cong control
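The per-ACK increment and the halving rule can be tried on the example numbers. The method names are hypothetical. One caveat the sketch makes visible: if CongWin is updated after every ACK, the ten increments add up to slightly less than one full MSS, because each increment is computed from an already larger CongWin.

```java
/** AIMD window updates in bytes (illustrative helper names). */
class AimdSketch {
    /** Additive increase: add MSS·(MSS/CongWin) bytes for each new ACK. */
    static double onNewAck(double congWin, double mss) {
        return congWin + mss * (mss / congWin);
    }

    /** Multiplicative decrease: halve CongWin, but never go below 1 MSS. */
    static double onLoss(double congWin, double mss) {
        return Math.max(congWin / 2.0, mss);
    }

    public static void main(String[] args) {
        double mss = 1460;
        double congWin = 14600; // 10 segs in flight
        // First ACK adds about MSS/10 = 146 bytes.
        System.out.println(onNewAck(congWin, mss) - congWin);
        // ACKs for all 10 segs, updating CongWin after each one.
        double w = congWin;
        for (int ack = 0; ack < 10; ack++) {
            w = onNewAck(w, mss);
        }
        System.out.println(w - congWin); // a little under 1 MSS
        System.out.println(onLoss(w, mss));
    }
}
```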
3-86
TCP Slow Start
When a TCP conn begins, CongWin is typically
initialized to 1 MSS ⇒ initial rate ≈ MSS/RTT
ex) MSS = 500 bytes, RTT = 200 msec ⇒ initial
sending rate : only about 20 kbps
z linear increase at init. phase results in a
waste of bw, considering available bw may be
>> MSS/RTT
z desirable to quickly ramp up to some
respectable rate
slow start (SS) : during initial phase, increase
sending rate exponentially fast by doubling
CongWin every RTT until a loss occurs
z achieved by increasing CongWin by 1 MSS
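The initial rate and the doubling can be reproduced with a few lines. The names are hypothetical, and the sketch assumes the usual slow-start rule that the 1-MSS increase is applied once per received ACK, so a window of w segs yields w ACKs and doubles within one RTT.

```java
/** Slow-start arithmetic with CongWin counted in MSS units (illustrative names). */
class SlowStartSketch {
    /** Initial sending rate in bits/sec when CongWin = 1 MSS. */
    static double initialRateBps(int mssBytes, double rttSec) {
        return mssBytes * 8 / rttSec;
    }

    /** CongWin (in MSS) after one RTT: each of the w ACKs adds 1 MSS (assumed per-ACK rule). */
    static int afterOneRtt(int congWinMss) {
        int w = congWinMss;
        for (int ack = 0; ack < congWinMss; ack++) {
            w += 1;
        }
        return w;
    }

    public static void main(String[] args) {
        // MSS = 500 bytes, RTT = 200 msec: about 20 kbps.
        System.out.println(initialRateBps(500, 0.2));
        int w = 1;
        for (int rtt = 1; rtt <= 4; rtt++) {
            w = afterOneRtt(w);
            System.out.println("after RTT " + rtt + ": CongWin = " + w + " MSS");
        }
    }
}
```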
3-87
Reaction to Congestion
Q: When does CongWin switch from exponential increase to linear increase?
A: when CongWin reaches Threshold
z Threshold : a variable set to a half of CongWin just before a loss
{ initially set large, typically 65 Kbytes, so that it has no initial effect
TCP Reno [RFC2581, Stevens ’94] : reaction to loss depends on loss type
z for 3 duplicate acks receipt : CongWin is cut in half, then grows linearly
z for a timeout event : CongWin is set to 1 MSS (SS phase), then grows
exponentially to a Threshold, then grows linearly (CA phase)
z idea : 3 dup acks still indicate that the net is capable of delivering some pkts
{ TCP Reno cancels SS phase after a triple duplicate ack : fast recovery
z TCP Vegas idea : CongWin is lowered linearly when loss is predicted to be imminent; the longer the RTT, the greater the congestion
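The Reno reactions to the two loss types can be sketched as a small state holder. This is a simplification under stated assumptions: the class and method names are hypothetical, windows are in bytes, and the temporary window inflation that real fast recovery performs is left out.

```java
/** Simplified TCP Reno window logic (no fast-recovery inflation; names are illustrative). */
class RenoSketch {
    final double mss;
    double congWin;
    double threshold;

    RenoSketch(double mss, double initialThreshold) {
        this.mss = mss;
        this.congWin = mss;              // start in slow start with 1 MSS
        this.threshold = initialThreshold;
    }

    /** New ACK: exponential growth below Threshold, linear growth at or above it. */
    void onNewAck() {
        if (congWin < threshold) {
            congWin += mss;                   // slow start (SS)
        } else {
            congWin += mss * (mss / congWin); // congestion avoidance (CA)
        }
    }

    /** 3 duplicate ACKs: halve the window and continue linearly (skips SS). */
    void onTripleDupAck() {
        threshold = congWin / 2;
        congWin = threshold;
    }

    /** Timeout: remember half the window as Threshold and restart from 1 MSS. */
    void onTimeout() {
        threshold = congWin / 2;
        congWin = mss;
    }

    public static void main(String[] args) {
        RenoSketch tcp = new RenoSketch(1000, 65000);
        tcp.congWin = 16000;
        tcp.onTripleDupAck();
        System.out.println(tcp.congWin + " / " + tcp.threshold); // 8000.0 / 8000.0
        tcp.onTimeout();
        System.out.println(tcp.congWin + " / " + tcp.threshold); // 1000.0 / 4000.0
    }
}
```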
3-88
TCP Congestion Control Algorithms
3-89
TCP Reno Congestion Control Algorithm
[RFC 2581, Stevens 1994]
3-90
Steady-State Behavior of a TCP Connection
Consider a highly simplified macroscopic model for steady-state
behavior of TCP
z SS phases ignored since they are typically very short
z Letting W be the window size when a loss event occurs, RTT and
W are assumed to be approximately constant during a conn
Q : What’s avg throughput of a long-lived TCP conn as a function of
window size and RTT?
A : avg throughput of a TCP connection = (0.75 ⋅ W) / RTT (2)
z a pkt is dropped when the rate increases to W/RTT
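The factor 0.75 in (2) comes from averaging the sawtooth: under the model's assumptions the rate climbs linearly from W/(2⋅RTT) just after a loss to W/RTT just before the next one. The sketch below checks this numerically; the names and the sample values of W and RTT are hypothetical.

```java
/** Numerical check of the 0.75·W/RTT steady-state throughput (illustrative names). */
class SteadyStateSketch {
    /** Closed form from (2). */
    static double avgThroughput(double w, double rtt) {
        return 0.75 * w / rtt;
    }

    /** Average rate over one sawtooth, sampled at evenly spaced points from W/2 to W. */
    static double sawtoothAverage(double w, double rtt, int steps) {
        double total = 0;
        for (int i = 0; i <= steps; i++) {
            double window = w / 2 + (w / 2) * i / steps;
            total += window / rtt;
        }
        return total / (steps + 1);
    }

    public static void main(String[] args) {
        double w = 20;    // segs per window at loss time (made-up value)
        double rtt = 0.1; // seconds (made-up value)
        System.out.println(avgThroughput(w, rtt));
        System.out.println(sawtoothAverage(w, rtt, 10));
    }
}
```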
3-91
TCP Futures
TCP congestion control has evolved over the years and continues to evolve
z [RFC 2581] : a summary as of the late 1990s
3-93
TCP Fairness (2)
3-94
Some other Fairness Issues
Fairness and UDP
z multimedia apps, e.g., Internet phone and video conferencing do
not want their rate throttled even if net is congested
z thus they run over UDP rather than TCP, pumping audio/video at
const rate, and occasionally losing pkts rather than reducing rate
when congested ⇒ UDP sources may crowd out TCP traffic
z research issue : TCP-friendly cong control
{ goal : let UDP traffic behave fairly, thus prevent the Internet
from flooding
Fairness and parallel TCP connections
z a session can open multiple parallel TCP conn’s bw C/S, thus gets
a large portion of bw in a congested link
{ a Web browser to xfer multiple objects in a page
{ e.g., on a link of rate R already carrying 9 conns, a new app asking for 11 TCP conns gets an unfair rate of more than R/2 (11 of 20 equal shares)
3-95
TCP Delay Modeling
We compute the time for TCP to send an object under some simple models
z latency : defined as the time from when a client initiates a TCP conn until
the time at which it receives the requested object
assumptions : made in order not to obscure the central issues
z simple one-link net of rate R bps
z amount of data sender can xmit is limited solely by cong window
z pkts are neither lost nor corrupted, thus no rexmission
z all protocol header overheads : ignored
z object consists of an integer # of MSS
{ O: object size [bits], S : seg size [bits] (e.g., 536 bytes = 4,288 bits)
3-96
Static Congestion Window (1)
W : a positive integer, denoting a
fixed-size static congestion window
z upon receipt of rqst, server
immediately sends W segs back-to-back
to client, then one seg for
each ack from client (W = 4 in the figure)
3-97
Static Congestion Window (2)
2nd case : WS/R < RTT+S/R
z ack for 1st seg in 1st window received
after sending 1st window’s worth of segs
latency = setup time + time for xmitting
object + sum of times in idle state
z let K : # of windows covering object
K = ⌈O/(WS)⌉, i.e., O/(WS) rounded up if it is not an integer (W = 2 in the figure)
z # of times being in idle state = K-1
3-98
Dynamic Congestion Window (1)
cong window grows according to slow start,
i.e., doubled every RTT
z O/S : # of segs in the object
z # of segs in kth window : $2^{k-1}$
z K : # of windows covering object
\[
\begin{aligned}
K &= \min\left\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
\]
z xmission time of kth window = $(S/R)\,2^{k-1}$
z duration in idle state of kth window
= $\left[S/R + RTT - 2^{k-1}(S/R)\right]^+$
ex) (figure) O/S=15 ⇒ K=4, Q=2, P=min{Q,K-1}=2
3-99
Dynamic Congestion Window (2)
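latency = setup time + time for xmitting object + Σ times in idle state
\[
\text{latency} = 2 \cdot RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+ \qquad (3)
\]
\[
\begin{aligned}
Q &= \max\left\{k : \frac{S}{R} + RTT - 2^{k-1}\frac{S}{R} \ge 0\right\}
   = \max\left\{k : 2^{k-1} \le 1 + \frac{RTT}{S/R}\right\} \\
  &= \max\left\{k : k \le \log_2\left(1 + \frac{RTT}{S/R}\right) + 1\right\}
   = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1
\end{aligned}
\]
actual # of times server is idle is P=min{Q, K-1}, then (3) becomes
\[
\begin{aligned}
\text{latency} &= 2 \cdot RTT + \frac{O}{R} + \sum_{k=1}^{P} \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right] \\
 &= 2 \cdot RTT + \frac{O}{R} + P\left[\frac{S}{R} + RTT\right] - \frac{S}{R}\sum_{k=1}^{P} 2^{k-1} \\
 &= 2 \cdot RTT + \frac{O}{R} + P\left[\frac{S}{R} + RTT\right] - \left(2^P - 1\right)\frac{S}{R} \qquad (4)
\end{aligned}
\]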
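Formula (4) can be cross-checked against a direct sum of the per-window idle times. In the sketch below K and Q are found by searching their defining inequalities rather than by evaluating logarithms, which avoids rounding trouble at exact powers of two. All names are hypothetical, and the sample values (S = 1,000 bits, R = 8,000 bps, RTT = 0.25 sec, O = 15 segs) are made up so that O/S = 15 as in the slide's figure.

```java
/** Latency of an object transfer under slow start, per formula (4) (illustrative names). */
class SlowStartLatencySketch {
    /** K = min{k : 2^k - 1 >= O/S}: # of windows covering the object. */
    static int k(double o, double s) {
        int k = 1;
        while (Math.pow(2, k) - 1 < o / s) {
            k++;
        }
        return k;
    }

    /** Q = max{k : 2^(k-1) <= 1 + RTT/(S/R)}. */
    static int q(double s, double r, double rtt) {
        int q = 1; // k = 1 always satisfies the inequality
        while (Math.pow(2, q) <= 1 + rtt / (s / r)) {
            q++;
        }
        return q;
    }

    /** Closed form (4) with P = min{Q, K-1}. */
    static double latency(double o, double s, double r, double rtt) {
        int p = Math.min(q(s, r, rtt), k(o, s) - 1);
        return 2 * rtt + o / r + p * (s / r + rtt) - (Math.pow(2, p) - 1) * (s / r);
    }

    /** Direct evaluation of (3): sum the non-negative idle time after each of the first K-1 windows. */
    static double latencyBySum(double o, double s, double r, double rtt) {
        double idle = 0;
        for (int i = 1; i <= k(o, s) - 1; i++) {
            idle += Math.max(0, s / r + rtt - Math.pow(2, i - 1) * (s / r));
        }
        return 2 * rtt + o / r + idle;
    }

    public static void main(String[] args) {
        double s = 1000, r = 8000, rtt = 0.25, o = 15 * s;
        System.out.println("K = " + k(o, s) + ", Q = " + q(s, r, rtt));
        System.out.println(latency(o, s, r, rtt));
        System.out.println(latencyBySum(o, s, r, rtt));
    }
}
```

With these values K = 4, Q = 2 and P = 2, matching the figure labels, and both methods give a latency of 2.75 sec.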
3-100
Dynamic Congestion Window (3)
comparing TCP latency of (4) with minimal latency
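\[
\begin{aligned}
\frac{\text{latency}}{\text{minimal latency}}
 &= 1 + \frac{P\left[\dfrac{S/R}{RTT} + 1\right] - \left(2^P - 1\right)\dfrac{S/R}{RTT}}{2 + \dfrac{O/R}{RTT}} \\
 &= 1 + \frac{P + \dfrac{S/R}{RTT}\left[P + 1 - 2^P\right]}{2 + \dfrac{O/R}{RTT}}
 \le 1 + \frac{P}{2 + \dfrac{O/R}{RTT}}
\end{aligned}
\]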
3-101
HTTP Modeling
Assume Web page consists of
z 1 base HTML page (of size O bits)
non-persistent HTTP
z M+1 TCP conns in series
persistent HTTP
z 2 RTT to request and receive base HTML file
3-102