
Mod 4 Trans F24


04834480

Computer Networks (Honor Track)

Module 4
End-to-End Data Transfer

Chenren Xu(许辰人)
Fall 2024

 Includes material from lectures by David Wetherall (UW) and Jim Kurose (UMASS);
 Jointly prepared with Ruihan Li. 1
Where we are in the Course
• Starting the Transport Layer!
- Builds on the network layer to deliver data across networks for applications with the desired reliability or quality
• Recall
- Transport layer provides end-to-end connectivity across the network
- Segments carry application data across the network
- Segments are carried within packets within frames
[Figure: protocol stack (Application, Transport, Network, Link, Physical); app data passes through TCP and IP on each host and through IP at the router, over 802.11 and 802.3 links; frame layout: 802.11 | IP | TCP]
Topics
• Service Models
- Socket API and ports
- Datagrams, Streams
• User Datagram Protocol (UDP)
- Example: Remote Procedure Call (RPC)
- Example: Real-time Transport Protocol (RTP)
• Transmission Control Protocol (TCP)
- Connections
- Sliding Window
- Flow control
- Retransmission timers
- Congestion control
• Quick UDP Internet Connection (QUIC)
3
Socket API
• Simple abstraction to use the network
- The “network” API (really transport service) used to write all Internet apps
- Part of all major OSes and languages; originally Berkeley (Unix) ~1983
• Supports both Internet transport services (Streams and Datagrams)
• Sockets let apps attach to the local network at different ports
• Same API used for Streams and Datagrams

TCP (Streams) | UDP (Datagrams)
Connections | Datagrams
Bytes are delivered reliably, and in order | Messages may be lost, reordered, duplicated
Arbitrary length content | Limited message size
Flow control matches sender to receiver | Can send regardless of receiver state
Congestion control matches sender to network | Can send regardless of network state

Primitive | Meaning
SOCKET | Create a new communication endpoint
BIND | Associate a local address (port) with a socket
LISTEN | Announce willingness to accept connections (only needed for Streams)
ACCEPT | Passively establish an incoming connection (only needed for Streams)
CONNECT | Actively attempt to establish a connection (only needed for Streams)
SEND | Send some data over the socket (To/From forms for Datagrams)
RECEIVE | Receive some data over the socket (To/From forms for Datagrams)
CLOSE | Release the socket
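The stream primitives map closely onto the Berkeley socket calls. The following is a minimal Python sketch, not a production server: it assumes a single process plays both client and server on the loopback interface, lets the OS choose the port (port 0), and uses a payload small enough to fit in the socket buffers. The function names are hypothetical.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes; a stream may return them in several pieces."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed early")
        data += chunk
    return data


def stream_echo_once(payload):
    """One request/reply exchange over TCP on 127.0.0.1 (illustrative only)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    server.bind(("127.0.0.1", 0))                                # BIND (OS picks a free port)
    server.listen(1)                                             # LISTEN
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
    client.connect(("127.0.0.1", port))                          # CONNECT
    conn, _peer = server.accept()                                # ACCEPT
    try:
        client.sendall(payload)                                  # SEND
        request = recv_exact(conn, len(payload))                 # RECEIVE
        conn.sendall(request.upper())                            # SEND (reply)
        return recv_exact(client, len(payload))                  # RECEIVE
    finally:
        for s in (client, conn, server):
            s.close()                                            # CLOSE
```

Calling `stream_echo_once(b"hello")` returns the upper-cased bytes sent back by the server side.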

4
Ports
• Application process is identified by the tuple IP address, protocol, and port
- Ports are 16-bit integers representing local “mailboxes” that a process leases
• Servers often bind to “well-known” ports
- <1024, require administrative privileges
- But raises security issues
• Clients often assigned “ephemeral” ports
- Chosen by OS, used temporarily
• Some well-known ports

Port | Protocol | Use
20, 21 | FTP | File transfer
22 | SSH | Remote login, replacement for Telnet
25 | SMTP | Email
80 | HTTP | World Wide Web
110 | POP-3 | Remote email access
143 | IMAP | Remote email access
443 | HTTPS | Secure Web (HTTP over SSL/TLS)
554 | RTSP | Media player control
631 | IPP | Printer sharing

5
User Datagram Protocol (UDP)
• A shim layer on packets
• Used by apps that don’t want reliability or byte streams, or want to build those by themselves
- VoIP, RTP (unreliable)
- DNS, RPC (message-oriented)
- DHCP (bootstrapping)
• Datagram Sockets (* = call blocks)
- Client: 1: socket, 4: sendto (request), 5: recvfrom*, 7: close
- Server: 1: socket, 2: bind, 3: recvfrom*, 6: sendto (reply), 7: close
• UDP Buffering
- Port Mux/Demux: packets arriving from the network (IP) are placed in per-port message queues, one per application
• UDP Header
- Uses ports to identify sending and receiving application processes
- Datagram length up to 64K
- Checksum (16 bits) for error detection
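The numbered datagram-socket steps can be traced in a short Python sketch. It assumes both endpoints live in one process on the loopback interface, that the server simply reverses the request bytes (an arbitrary choice for illustration), and that a 2-second timeout is acceptable; `datagram_exchange` is a hypothetical name.

```python
import socket


def datagram_exchange(request):
    """Follow the client/server call sequence for one UDP request and reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # 1: socket (server)
    server.bind(("127.0.0.1", 0))                               # 2: bind (OS picks the port)
    server.settimeout(2.0)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # 1: socket (client)
    client.settimeout(2.0)
    try:
        client.sendto(request, server.getsockname())            # 4: sendto (request)
        data, client_addr = server.recvfrom(65535)              # 3: recvfrom (blocks)
        server.sendto(data[::-1], client_addr)                  # 6: sendto (reply)
        reply, _addr = client.recvfrom(65535)                   # 5: recvfrom (blocks)
        return reply
    finally:
        client.close()                                          # 7: close
        server.close()                                          # 7: close
```

No connection is set up: each `sendto` names its destination, and each `recvfrom` returns one whole message together with the sender's address.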


Example: Remote Procedure Call (RPC)
• Call a remote procedure and get its result
- Looks a lot like “sending a request and getting the reply”
• Client/server stubs are built to hide network details
- Programming like calling local functions
• Why based on UDP?
- Message-oriented (request/reply)
- Duplication can still be detected by identifiers
- Reliability can still be guaranteed by retries
- Avoid unnecessary connection setup overhead

8
Example: Real-time Transport Protocol (RTP)
• Real-time video/audio streaming
- Commonplace in many applications, so RTP was created and standardized in RFC 3550
• A transport protocol that happens to be implemented in the
application layer
• Retransmitting lost packets may not be necessary
- So based on UDP
• Some design details:
- The payload type field tells the encoding algorithm
- The sequence number field can be used to detect
loss or out-of-order packets
- The timestamp field tells when the first sample in
the packet was made

9
Transmission Control Protocol (TCP)
• How TCP works!
- The transport protocol used for most content on the Internet
• TCP Features
- A reliable bytestream service
- Based on connections
- Sliding window for reliability
▪ With adaptive timeout
- Flow control for slow receivers
- Congestion control to allocate network bandwidth
11
Reliable Bytestream
• Message boundaries not preserved from send() to recv()
- But reliable and ordered (receive bytes in same order as sent)
- Example: the sender transmits four segments, each with 512 bytes of data and carried in an IP packet; the receiver may deliver all 2048 bytes of data to the app in a single recv() call
• Bidirectional data transfer
- Control information (e.g., ACK) piggybacks on data segments in reverse direction
- Segments from A to B carry data A → B plus ACK B → A; segments from B to A carry data B → A plus ACK A → B
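The loss of message boundaries can be observed with any stream socket. This is a minimal sketch that assumes a local `socket.socketpair()` is an acceptable stand-in for a TCP connection (it offers the same bytestream semantics without using the network); `send_four_receive_all` is a hypothetical name.

```python
import socket


def send_four_receive_all():
    """Send four 512-byte writes, then read until end of stream."""
    a, b = socket.socketpair()              # two connected stream sockets
    try:
        for i in range(4):
            a.sendall(bytes([i]) * 512)     # four separate send calls
        a.shutdown(socket.SHUT_WR)          # signal end of stream to the reader
        chunks = []
        while True:
            data = b.recv(4096)             # may return bytes from several sends
            if not data:
                break
            chunks.append(data)
        return chunks
    finally:
        a.close()
        b.close()
```

The number of chunks returned is up to the OS; only the total length and the byte order are guaranteed, which is why applications that need messages must add their own framing.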
12
TCP Header
• Ports identify apps (socket API)
- 16-bit identifiers
• SEQ/ACK used for sliding window
- Selective Repeat, with byte positions
• SYN/FIN/RST flags for connections
- Flag indicates segment is a SYN etc.
• Window size for flow control
- Relative to ACK, and in bytes

13
Connection Establishment
• How to set up connections
- We’ll see how TCP does it
• Both sender and receiver must be ready before we start the transfer of data
- Need to agree on a set of parameters
▪ E.g., the Maximum Segment Size (MSS)

• This is signaling
- It sets up state at the endpoints
- Like “dialing” for a telephone call

15
Three-Way Handshake
• Used in TCP
• Opens connection for data in both directions
• Each side probes the other with a fresh Initial Sequence Number
- Sends on a SYNchronize segment
- Echo on an ACKnowledge segment
- SYNs are retransmitted if lost
• Sequence and ACK numbers carried on further (data) segments
[Figure: time-sequence diagram between the active party (client) and the passive party (server), showing segments 1, 2, and 3]

16
TCP Connection State Machine (Connection Establishment)
• Captures the states (rectangles) and transitions (arrows)
- A/B means event A (active or passive) triggers the transition with action B
• Both parties run instances of this state machine
- Active party (client): CLOSED → SYN_SENT → ESTABLISHED
- Passive party (server): CLOSED → LISTEN → SYN_RCVD → ESTABLISHED
• Finite state machines are a useful tool to specify and check the handling of all cases that may occur
• TCP allows for simultaneous open
- i.e., both sides open at once instead of the client-server pattern
Connection Release
• How to release connections
- We’ll see how TCP does it
• Orderly release by both parties when done
- Delivers all pending data and “hangs up”
- Cleans up state in sender and receiver
• Key problem is to provide reliability while releasing
- TCP uses a “symmetric” close in which both sides shut down independently
• TCP Connection Release
- Active party sends FIN(x), passive party ACKs
- Passive party sends FIN(y), active party ACKs
- FINs are retransmitted if lost
- Each FIN/ACK closes one direction of data transfer
18
TCP Connection State Machine (Connection Release)
• Both parties run instances of this state machine
- Active party: ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → (timeout) → CLOSED
- Passive party: ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED
• TIME_WAIT State
- We wait a long time (two times the maximum segment lifetime of 60 seconds) after sending all segments and before completing the close (from TIME_WAIT to CLOSED), but why?
▪ In case some packet sent earlier by the peer arrives later than the FIN from the peer
▪ Last ACK (y+1) might have been lost, in which case FIN will be resent
TCP Connection State Machine Complete

20
Sliding Window
• Principles of the algorithm
- Pipelining and reliability
- Building on Stop-and-Wait, i.e., ARQ with one message at a time
• Limitations of Stop-and-Wait
- It allows only a single message to be outstanding from the sender:
▪ Fine for LAN (only one frame fits)
▪ Not efficient for network paths with BD >> 1 packet (BD = bandwidth-delay product)
- Example: R = 1 Mbps, D = 50 ms, RTT = 2D = 100 ms
▪ Assume pkt is 1250 Byte = 10 Kb, 10 Kb / 100 ms = 100 Kbps = 0.1 Mbps = only 10% channel utilization
➢ Send the next packet only if the ACK of the previous one is received
▪ What if R = 10 Mbps?
• Generalization of Stop-and-Wait
- Allows W packets to be outstanding – can send W packets per RTT ( = 2D)
▪ Need W = 2BD to fill network path, ideally …
[Figure: Stop-and-Wait time-sequence diagram: sender sends Frame 0, receiver returns ACK 0, then Frame 1 and ACK 1, with the timeout interval marked at the sender]
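The arithmetic of the example can be checked with a few lines of Python. This sketch follows the slide's simplification of ignoring the packet's own transmission time; the function names and the choice of units (bits per second, milliseconds) are assumptions made for illustration.

```python
def stop_and_wait_utilization(rate_bps, rtt_ms, pkt_bits):
    """Fraction of the link used when one packet is sent per RTT."""
    throughput_bps = pkt_bits * 1000 / rtt_ms   # one packet per round trip
    return min(1.0, throughput_bps / rate_bps)


def window_to_fill_path(rate_bps, rtt_ms, pkt_bits):
    """Packets per RTT needed to keep the path busy (W = 2BD, rounded up)."""
    scaled_bits = rate_bps * rtt_ms             # bits per RTT, scaled by 1000
    return -(-scaled_bits // (1000 * pkt_bits)) # integer ceiling division


# R = 1 Mbps, RTT = 100 ms, 1250-byte (10,000-bit) packets
u_1m = stop_and_wait_utilization(1_000_000, 100, 10_000)
w_1m = window_to_fill_path(1_000_000, 100, 10_000)
# "What if R = 10 Mbps?"
u_10m = stop_and_wait_utilization(10_000_000, 100, 10_000)
w_10m = window_to_fill_path(10_000_000, 100, 10_000)
```

With these inputs the utilization is 10% at 1 Mbps and 1% at 10 Mbps, and the window needed to fill the path grows from 10 to 100 packets: the faster the link, the more Stop-and-Wait wastes.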
Sliding Window – Sender
• Sender buffers up to W segments until they are acknowledged
- LFS = last frame sent, LAR = last ack received
- Sends a new segment while LFS – LAR < W, so that LFS – LAR ≤ W always holds
[Figure: sliding window with W = 5 over the sequence number space; segments up to LAR are acked, LAR+1 to LFS are unacked, the rest of the window is available, and segments beyond the window are unavailable]
• Transport accepts another segment of data from the application ...
- Transport sends it (LFS – LAR becomes 5, so the window is full)
• Next higher ACK arrives from peer…
- Window advances, buffer is freed
- LFS – LAR → 4 (can send one more)
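The sender's bookkeeping can be sketched as a small class. This is an illustrative model only: it assumes segment-numbered sequence numbers starting at 0, cumulative ACKs, and no timers or retransmission; the class and method names are hypothetical.

```python
class SlidingWindowSender:
    """Tracks LAR, LFS, and the buffer of unacknowledged segments."""

    def __init__(self, window):
        self.window = window
        self.lar = -1        # last ack received (nothing acked yet)
        self.lfs = -1        # last frame sent (nothing sent yet)
        self.buffer = {}     # seq -> data kept for possible retransmission

    def can_send(self):
        return self.lfs - self.lar < self.window

    def send(self, data):
        if not self.can_send():
            raise RuntimeError("window full: wait for an ACK")
        self.lfs += 1
        self.buffer[self.lfs] = data
        return self.lfs

    def on_ack(self, ack):
        """Cumulative ACK: everything up to and including `ack` has arrived."""
        while self.lar < ack and self.lar < self.lfs:
            self.lar += 1
            self.buffer.pop(self.lar, None)   # free the buffer slot


sender = SlidingWindowSender(window=5)
for i in range(5):
    sender.send(f"segment {i}")
full = not sender.can_send()   # LFS - LAR == 5, window is full
sender.on_ack(0)               # window advances by one
```

After the ACK for segment 0, `LFS – LAR` drops to 4 and one more segment may be sent.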

23
Sliding Window Protocol Optimizations – Receiver Coordination
• Go-Back-N
- Receiver keeps only a single packet buffer for the next segment
▪ State variable, LAS (LAST ACK SENT)
- On receive:
▪ If seq. number is LAS+1, accept and pass it to app, update LAS, send ACK
▪ Otherwise discard (as out of order) and resend a (duplicate) ACK of LAS
- Retransmission: sender uses a single timer to detect losses
▪ On timeout or receiving a duplicate ACK of LAR, resends buffered packets starting at LAR+1
• Selective Repeat
- Receiver passes data to app in order, and buffers out-of-order segments to reduce retransmissions
- TCP uses a selective repeat design
- Buffers W segments, keeps state variable LAS
- On receive:
▪ Buffer segments [LAS+1, LAS+W]
▪ Pass up to app in-order segments from LAS+1, and update LAS
▪ Send ACK for LAS regardless
- Retransmission: sender uses a timer per unacked segment to detect losses
▪ On timeout for segment, resend it
▪ Hope to resend fewer segments
• Selective Repeat spends more timer resources in exchange for better bandwidth efficiency
- Animation: http://www.ccs-labs.org/teaching/rn/animations/gbn_sr/
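The two receiver rules can be compared side by side in Python. This is a sketch under simplifying assumptions: sequence numbers count segments, do not wrap, and LAS starts at -1; the names `go_back_n_receive` and `SelectiveRepeatReceiver` are hypothetical.

```python
def go_back_n_receive(las, seq):
    """Go-Back-N: return (new_las, delivered_segments, ack_to_send)."""
    if seq == las + 1:
        return seq, [seq], seq      # in order: accept, deliver, ACK it
    return las, [], las             # out of order: discard, duplicate ACK


class SelectiveRepeatReceiver:
    """Selective Repeat: buffer up to W out-of-order segments."""

    def __init__(self, window, las=-1):
        self.window = window
        self.las = las
        self.buffer = set()

    def receive(self, seq):
        """Return (delivered_segments, ack_to_send)."""
        if self.las + 1 <= seq <= self.las + self.window:
            self.buffer.add(seq)                 # inside [LAS+1, LAS+W]
        delivered = []
        while self.las + 1 in self.buffer:       # pass up in-order run
            self.las += 1
            self.buffer.remove(self.las)
            delivered.append(self.las)
        return delivered, self.las               # ACK LAS regardless


arrivals = [0, 2, 3, 1]            # segment 1 is delayed behind 2 and 3

gbn_las, gbn_delivered = -1, []
for seq in arrivals:
    gbn_las, got, _ack = go_back_n_receive(gbn_las, seq)
    gbn_delivered += got

sr = SelectiveRepeatReceiver(window=4)
sr_delivered = []
for seq in arrivals:
    got, _ack = sr.receive(seq)
    sr_delivered += got
```

For this arrival order Go-Back-N delivers only segments 0 and 1 and needs 2 and 3 to be sent again, while Selective Repeat delivers all four once segment 1 arrives.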
Sequence Numbers
• Need more than 0/1 for Stop-and-Wait …
- But how many?
• For Selective Repeat, need W numbers for packets, plus W for ACKs of earlier packets
- 2W seq. numbers
- Fewer for Go-Back-N (W+1)
• Typically implement seq. number with an N-bit counter that wraps around at 2^N – 1
- E.g., N = 8: …, 253, 254, 255, 0, 1, 2, 3, …
- TCP uses 32-bit
[Figure: Go-Back-N scenario; sequence number vs. time, showing transmissions and retransmissions (at sender), ACKs (at receiver), a loss, the timeout, and the delay (= RTT/2)]
25
TCP Sliding Window
• Receiver
- Cumulative ACK tells next expected byte sequence number (“LAS+1”)
- Optionally, selective ACKs (SACK) give hints for receiver buffer state
▪ List up to 3 ranges of received bytes
▪ Example: ACK up to 100 and 200-299
• Sender
- Uses an adaptive retransmission timeout to resend data from LAS+1
- Uses heuristics to infer loss quickly and resend to avoid timeouts
▪ “Three duplicate ACKs” treated as loss
▪ Example: after ACK 100; ACK 100, 200-299; ACK 100, 200-399; ACK 100, 200-499, the sender decides 100-199 is lost

26
A Simple TCP Example
Pkt 1: A sends a SYN to B with seq# 0 and ack# 0
Pkt 2: B starts its seq# of 0 and acknowledges A’s seq# by adding a 1
Pkt 3: A adds a 1 to B’s initial sequence number (ISN) and sends an ack to B to acknowledge its ISN
Pkt 4: A sends a 100 byte long GET request to B
Pkt 5: B responds to the request from A. Since B has not sent data yet, its seq# is still 1. It sends the packet with ack = 101 to acknowledge receipt of the 100 bytes from A
Pkt 6: A sends another request of 50 bytes. Its seq# starts at 101. The next expected seq# should be 151. It acknowledges the 200 bytes sent by B by sending an ack# of 201
Pkt 7: B responds to the request with a 1000 byte packet. The starting seq is 201, so the next expected seq# is 1201. It receives 50 bytes from A, so it acks with 151.

27
Problem of Sliding Window
• Sliding window uses pipelining to keep the network busy
- What if the receiver is overloaded? (e.g., “big iron” streaming video to a mobile device)
• Sliding window – Receiver
- Consider receiver with W buffers
▪ LAS = last ack sent, app pulls in-order data from buffer with recv() call
- Suppose the next two segments arrive but app does not call recv()
▪ LAS rises, but we can’t slide window!
- If further segments arrive (even in order) we can fill the buffer
▪ Must drop segments until app recvs!
- App recv() takes two segments
▪ Window slides
[Figure: receiver window with W = 5 over the sequence number space at each step; segments are marked finished, acked, acceptable, or too high relative to LAS]
• Need a mechanism to inform the sender regarding receive buffer
Flow Control
• Solution: Adding flow control to the sliding window algorithm
- To slow the over-enthusiastic sender
• Avoid loss at receiver by telling sender the available buffer space
- WIN = #Acceptable, not W (from LAS)
[Figure: receiver window with W = 5 and LAS marked over the sequence number space; the receiver advertises WIN = 3, the number of acceptable slots]
• Sender uses the lower of the sliding window and flow control window (WIN) as the effective window size
• TCP-style example
- SEQ/ACK sliding window
- Flow control with WIN
- SEQ + length < ACK + WIN
- 4 KB buffer at receiver
- Circular buffer of bytes
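The rule "use the lower of the two windows" can be written out directly. The sketch below is a simplified model that counts in whatever unit the caller uses (segments or bytes); the function names and the sample numbers other than W = 5, WIN = 3, and the 4 KB buffer are assumptions for illustration.

```python
def advertised_window(buffer_size, buffered_unread):
    """Receiver side: free space left for new data (WIN)."""
    return buffer_size - buffered_unread


def usable_window(sliding_window, advertised_win, in_flight):
    """Sender side: how much more may be sent right now."""
    effective = min(sliding_window, advertised_win)
    return max(0, effective - in_flight)


# W = 5 slots, two segments acked but not yet read by the app
win_slots = advertised_window(5, 2)            # WIN = 3
can_send_slots = usable_window(5, win_slots, in_flight=1)

# 4 KB receive buffer, half of it still unread
win_bytes = advertised_window(4096, 2048)
stalled = usable_window(8192, 0, in_flight=0)  # WIN = 0 stops the sender
```

When the application stops reading, WIN shrinks toward 0 and the sender stalls instead of overrunning the receive buffer.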

30
Retransmission Timeouts
• How to set the timeout for sending a retransmission
- Adapting to the network path
• Retransmissions
- With sliding window, the strategy for detecting loss is the timeout
▪ Set timer when a segment is sent
▪ Cancel timer when ack is received
▪ If timer fires, retransmit data as lost
• Timeout Problem
- Timeout should be “just right”
▪ Too long wastes network capacity
▪ Too short leads to spurious resends
- Easy to set on a LAN (Link)
▪ Short, fixed, predictable RTT
- Hard on the Internet (Transport)
▪ Wide range, variable RTT
- Even harder in (highly) mobile access
[Figure: round trip time (ms, 0 to 1000) over 200 seconds on a BCN → SEA → BCN path; the floor is the propagation (+transmission) delay ≈ 2D, and the variation above it is due to queuing at routers, changes in network paths, etc., so the timeout needs to adapt to the network conditions]
Adaptive Timeout
• Keep smoothed estimates of the mean SRTT and variance Svar of RTT
- Update estimates with a moving average
▪ SRTT_{N+1} = 0.9 ✕ SRTT_N + 0.1 ✕ RTT_{N+1}
▪ Svar_{N+1} = 0.9 ✕ Svar_N + 0.1 ✕ |RTT_{N+1} – SRTT_{N+1}|
• Set timeout to a multiple of estimates
- To estimate the upper RTT in practice
- TCP Timeout_N = SRTT_N + 4 ✕ Svar_N
• Simple to compute, does a good job of tracking actual RTT
- Little “headroom” to lower
- Yet very few early timeouts
• Important for good performance and robustness
[Figure: two plots of RTT (ms) over 200 seconds; the first shows the SRTT and Svar estimates against the RTT samples, the second shows the timeout (SRTT + 4 ✕ Svar) with one early timeout marked]
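The moving-average updates translate into a few lines of Python. This sketch uses the 0.9/0.1 weights and the factor of 4 given above; the initial values (SRTT set to the first sample, Svar to half of it) are an assumption borrowed from common practice, since the slide does not state them, and `RttEstimator` is a hypothetical name.

```python
class RttEstimator:
    """Smoothed RTT mean and deviation, with timeout = SRTT + 4 * Svar."""

    def __init__(self, first_rtt, gain=0.1):
        self.gain = gain
        self.srtt = first_rtt          # assumed initialization
        self.svar = first_rtt / 2      # assumed initialization

    def update(self, rtt):
        g = self.gain
        self.srtt = (1 - g) * self.srtt + g * rtt
        # Deviation is measured against the freshly updated SRTT
        self.svar = (1 - g) * self.svar + g * abs(rtt - self.srtt)
        return self.timeout()

    def timeout(self):
        return self.srtt + 4 * self.svar


est = RttEstimator(first_rtt=100.0)      # milliseconds
spike_timeout = est.update(200.0)        # one slow sample raises the timeout

steady = RttEstimator(first_rtt=100.0)
for _ in range(200):
    steady_timeout = steady.update(100.0)
```

A single 200 ms sample after a 100 ms start gives SRTT = 110 and Svar = 54, so the timeout jumps to 326 ms; with a perfectly steady RTT the deviation term decays and the timeout settles just above the RTT itself.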

33
Congestion Overview
• More fun in the Transport Layer!
- The mystery of congestion control
- Depends on the Network layer too
• Understanding congestion, a “traffic jam” in the network
- Later we will learn how to control it
• Topics
- Nature of congestion
- Fairness of Bandwidth Allocation
- AIMD Control Law
- TCP Congestion Control history
- ACK Clocking
- TCP Slow-start
- TCP Fast Retransmit/Recovery
- Congestion Avoidance (ECN)
35
Nature of Congestion
• Routers/switches have internal buffering for contention
• Simplified view of per port output queues
- Typically FIFO (First In First Out), discard when full
• Queues help by absorbing bursts when input > output rate
• But if input > output rate persistently, queue will overflow
- http://www.ccs-labs.org/teaching/rn/animations/queue/
• Congestion is a function of the traffic patterns – can occur even if every link has the same capacity
• Effects of Congestion
- What happens to performance as we increase the load?
- As offered load rises, congestion occurs as queues begin to fill:
▪ Delay and loss rise sharply with more load
▪ Throughput falls below load (due to loss)
▪ Goodput may fall below throughput (due to spurious retransmissions)
• Want to operate network just before the onset of congestion
[Figure: router with input buffers, switching fabric, and output buffers, simplified to a router with a (FIFO) queue of queued packets at each output]


Bandwidth Allocation
• Important task for network is to allocate its capacity to senders
- Good allocation is efficient and fair
• Efficient means most capacity is used but there is no congestion
• Fair means every sender gets a reasonable share of the network
• Key observation:
- In an effective solution, Transport and Network layers must work together
• Network layer witnesses congestion
- Only it can provide direct feedback
• Transport layer causes congestion
- Only it can reduce offered load
• Why is it hard? (Just split equally!)
- Number of senders and their offered load is constantly changing
- Senders may lack capacity in different parts of the network
- Network is distributed; no single party has an overall picture of its state
• Solution context:
- Senders adapt concurrently based on their own view of the network
- Design this adaptation so the network usage as a whole is efficient and fair
- Adaptation is continuous since offered loads continue to change over time
37
Fairness of Bandwidth Allocation
• What’s a “fair” bandwidth allocation?
• Recall
- We want a good bandwidth allocation to be fair and efficient
▪ Now we learn what fair means
- Caveat: in practice, efficiency is more important than fairness
• Efficiency vs. Fairness
- Cannot always have both!
▪ Example network with traffic A → B, B → C and A → C (links A – B and B – C each have capacity 1)
▪ How much traffic can we carry?
- If we care about fairness:
▪ Give equal bandwidth to each flow
▪ A → B: ½ unit, B → C: ½, and A → C: ½
▪ Total traffic carried is 1 ½ units
- If we care about efficiency:
▪ Maximize total traffic in network
▪ A → B: 1 unit, B → C: 1, and A → C: 0
▪ Total traffic carried is 2 units
• The Slippery Notion of Fairness
- Is “equal per flow” fair anyway?
▪ A → C uses more network resources (two links) than A → B or B → C
▪ Host A sends two flows, B sends one
- Another idea: maximizing the minimum flow
- Not productive to seek exact fairness
▪ More important to avoid starvation; “Equal per flow” is good enough
Generalizing “Equal per Flow”
• Bottleneck for a flow of traffic is the link that limits its bandwidth
- Where congestion occurs for the flow
- For A → C, link A – B is the bottleneck (link A – B has capacity 1, link B – C has capacity 10)
• Flows may have different bottlenecks
- For A → C, link A – B is the bottleneck
- For B → C, link B – C is the bottleneck
- Can no longer divide links equally …

39
Max-Min Fairness
• Intuitively, flows bottlenecked on a link get an equal share of that link
• Max-min fair allocation is one that:
- Increasing the rate of one flow will decrease the rate of a smaller flow
- This “maximizes the minimum” flow
• To find it given a network, imagine “pouring water into the network”
1. Start with all flows at rate 0
2. Increase the flows until there is a new bottleneck in the network
3. Hold fixed the rate of the flows that are bottlenecked
4. Go to step 2 for any remaining flows
• Example: network with 4 flows, links equal bandwidth
- When rate=1/3, flows B, C, and D bottleneck R4 – R5. Fix B, C, and D, continue to increase A
- When rate=2/3, flow A bottlenecks R2 – R3. Done.
- End with A=2/3, B, C, D=1/3, and R2 – R3, R4 – R5 full. Other links have extra capacity that can’t be used
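The four "pouring water" steps can be sketched as a short Python function. The topology used below is an assumption chosen to reproduce the rates quoted in the example (flows A and B share R2 – R3; flows B, C, and D share R4 – R5; every link has capacity 1), since the original figure is not reproduced here. The sketch also assumes every flow crosses at least one listed link; `max_min_fair` is a hypothetical name.

```python
from fractions import Fraction


def max_min_fair(capacity, routes):
    """capacity: link -> capacity; routes: flow -> set of links it crosses."""
    rate = {flow: Fraction(0) for flow in routes}        # step 1: all rates 0
    active = set(routes)
    while active:
        # Step 2: find the smallest equal increase that fills some link
        step = None
        for link, cap in capacity.items():
            users = [f for f in active if link in routes[f]]
            if not users:
                continue
            used = sum(rate[f] for f in routes if link in routes[f])
            headroom = (cap - used) / len(users)
            if step is None or headroom < step:
                step = headroom
        if step is None:
            break                                        # no constrained flows left
        for flow in active:
            rate[flow] += step
        # Step 3: freeze every flow that crosses a link that is now full
        for link, cap in capacity.items():
            used = sum(rate[f] for f in routes if link in routes[f])
            if used == cap:
                active -= {f for f in routes if link in routes[f]}
        # Step 4: loop again for the remaining flows
    return rate


rates = max_min_fair(
    capacity={"R2-R3": 1, "R4-R5": 1},
    routes={
        "A": {"R2-R3"},
        "B": {"R2-R3", "R4-R5"},
        "C": {"R4-R5"},
        "D": {"R4-R5"},
    },
)
```

Exact fractions are used so that the "link is full" test is a plain equality; with floating point a tolerance would be needed.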
Adapting over Time
• Allocation changes as flows start and stop
- Flow 1 slows when Flow 2 starts
- Flow 1 speeds up when Flow 2 stops
- Flow 3 limit is elsewhere
[Figure: bandwidth of each flow over time]

41
Bandwidth Allocation Models
• Want to allocate capacity to senders, but how?
• Open loop versus closed loop
- Open: reserve bandwidth before use
- Closed: use feedback to adjust rates
• Host versus Network support
- Who sets/enforces allocations?
• Window versus Rate based
- How is allocation expressed?
• We’ll look at the closed-loop, host-driven, window-based approach that TCP adopts
• Network layer returns feedback on current allocation to senders
- At least tells if (not when) there is congestion
• Transport layer adjusts sender’s behavior via window in response
- How senders adapt is a control law
▪ Example: Additive Increase Multiplicative Decrease (AIMD), which produces a sawtooth


42
Additive Increase Multiplicative Decrease (AIMD)
• AIMD is a control law hosts can use to reach a good allocation
- Hosts additively increase rate while network is not congested
- Hosts multiplicatively decrease rate when congestion occurs
- Used by TCP
• Let’s explore the AIMD game …
- Hosts 1 and 2 share a bottleneck
▪ But do not talk to each other directly
- Router provides binary feedback
▪ Tells hosts if network is congested
• Each point is a possible allocation
- Fair line: y = x; efficient line: x + y = 1; the optimal allocation is where they cross, and points beyond the efficient line are congested
- AI and MD move the allocation
- Always converge to good allocation!
• Properties
- Produces a “sawtooth” pattern over time for rate of each host
- Converges to an allocation that is efficient and fair when hosts run it
▪ Holds for more general topologies
- Other increase/decrease control laws do not! (Try MIAD, MIMD, AIAD)
- Requires only binary feedback from the network
[Figure: Hosts 1 and 2 reach the rest of the network through a router and a shared bottleneck of capacity 1; plots of Host 1 rate vs. Host 2 rate show additive increase and multiplicative decrease steps from a starting point, and a plot of rate vs. time shows the sawtooth]
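The AIMD game is easy to simulate. In this sketch two hosts share a bottleneck of capacity 1, both see the same binary congestion signal each round, and both apply the same rule; the step size 0.01, the starting rates, and the number of rounds are arbitrary assumptions, and `aimd` is a hypothetical name.

```python
def aimd(x, y, capacity=1.0, add=0.01, mult=0.5, rounds=1000):
    """Run the two-host AIMD game and return the final rates (x, y)."""
    for _ in range(rounds):
        if x + y > capacity:      # binary feedback: network is congested
            x *= mult             # multiplicative decrease
            y *= mult
        else:
            x += add              # additive increase
            y += add
    return x, y


start_gap = abs(0.1 - 0.7)
x, y = aimd(0.1, 0.7)
end_gap = abs(x - y)
```

Additive increase leaves the difference between the two rates unchanged, while every multiplicative decrease halves it, so the allocation drifts toward the fair line while oscillating around the efficient line.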
Feedback Signals
• Several possible signals, with different pros/cons
- We’ll look at classic TCP that uses packet loss as a signal

Signal | Example Protocol | Pros / Cons
Packet loss | TCP NewReno, Cubic TCP (Linux) | Hard to get wrong; hear about congestion late
Packet delay | Compound TCP (Windows*) | Hear about congestion early; need to infer congestion
Router indication | TCPs with Explicit Congestion Notification | Hear about congestion early; require router support

* Windows uses CUBIC as default since Windows 10


44
History of TCP Congestion Control
• The story of TCP congestion control
- Collapse, control, and diversification
• Congestion Collapse in the 1980s Internet
- Early TCP used a fixed size sliding window (e.g., 8 packets)
▪ Initially fine for reliability
- But something strange happened as the ARPANET grew
▪ Links stayed busy but transfer rates fell by orders of magnitude!
- Queues became full, retransmissions clogged the network, and goodput fell
[Figure: congestion collapse]

45
Van Jacobson (1950 – )
• Widely credited with saving the Internet from congestion
collapse in the late 80s
- Introduced congestion control principles
- Practical solutions (TCP Tahoe/Reno)
- V. Jacobson and M. J. Karels, Congestion Avoidance and
Control, ACM SIGCOMM 1988
▪ https://web.stanford.edu/class/cs244/papers/CongestionControl.pdf
• Much other pioneering work:
- Tools like traceroute, tcpdump, pathchar
- IP header compression, multicast tools

46
TCP Tahoe/Reno
• Avoid congestion collapse without changing routers (or even receivers)
• Idea is to fix timeouts and introduce a congestion window (cwnd) over the sliding window to limit queues/loss
• TCP Tahoe/Reno implements AIMD by adapting cwnd using packet loss as the network feedback signal
• TCP behaviors we will study:
- ACK clocking
- Adaptive timeout (mean and variance)
- Slow-start
- Fast Retransmission
- Fast Recovery
• Together, they implement AIMD
• Timeline (1970 to 2010)
- Pre-history: Origins of “TCP” (Cerf & Kahn, ’74); 3-way handshake (Tomlinson, ’75); TCP and IP (RFC 791/793, ’81); TCP/IP “flag day” (BSD Unix 4.2, ’83)
- Congestion control: Congestion collapse observed, ’86; TCP Tahoe (Jacobson, ’88); TCP Reno (Jacobson, ’90)
- Classic congestion control: TCP Reno (Jacobson, ’90); TCP Vegas (Brakmo, ’93), delay based; Router support ECN (Floyd, ’94); TCP New Reno (Hoe, ’95); TCP with SACK (Floyd, ’96)
- Diversification: FAST TCP (Low et al., ’04); TCP BIC (Linux, ’04); TCP CUBIC (Linux, ’06); Compound TCP (Windows, ’07); Background TCP LEDBAT (IETF ’08)
47
TCP ACK Clocking
• The self-clocking behavior of sliding windows, and how it is used by TCP

• The network has smoothed out the burst of data segments


- ACK clock transfers this smooth timing back to the sender
- Subsequent data segments are not sent in bursts so do not queue up in the network
• TCP uses ACK Clocking – Sliding window controls how many segments are inside the network
- Called the congestion window, or cwnd; Rate is roughly cwnd/RTT
• State-of-Art Work: “TACK: Improving Wireless Transport Performance by Taming
Acknowledgments”, SIGCOMM’20 by Huawei
48
TCP Slow Start
• How TCP implements AIMD, part 1
- “Slow start” is a component of the AI portion of AIMD

• Recall
- We want TCP to follow an AIMD control law for a good allocation
- Sender uses a congestion window or cwnd to set its rate (≈cwnd/RTT)
- Sender uses packet loss as the network congestion signal
- Need TCP to work across a very large range of rates and RTTs
• TCP Startup Problem
- We want to quickly near the right rate, cwndIDEAL, but it varies greatly
▪ Fixed sliding window doesn’t adapt and is rough on the network (loss!)
▪ AI with small bursts adapts cwnd gently to the network, but might take a long time to become efficient

49
Slow-Start Solution
• Start by doubling cwnd every RTT
- Exponential growth (1, 2, 4, 8, 16, …)
- Start slow, quickly reach large values
• Eventually packet loss will occur when the network is congested
- Loss timeout tells us cwnd is too large
- Next time, switch to AI beforehand
- Slowly adapt cwnd near right value
• In terms of cwnd:
- Expect loss for cwnd_C ≈ 2BD + queue
- Use ssthresh = cwnd_C/2 to switch to AI
• Combined behavior, after first time
- Most time spent near right value
[Figure: window (cwnd) vs. time, comparing a fixed window, slow-start, and AI; a second plot marks cwnd_C, cwnd_IDEAL, and ssthresh, with slow-start up to ssthresh followed by the AI phase]
Comparison of Slow Start and Additive Increase

• Slow start: increment cwnd by 1 packet for each ACK
• Additive increase: increment cwnd by 1 packet every cwnd ACKs (or 1 RTT)

51
TCP Tahoe (Implementation)
• Initial slow-start (doubling) phase
- Start with cwnd = 1 (or small value)
- cwnd += 1 packet per ACK
• Later Additive Increase phase
- cwnd += 1/cwnd packets per ACK
▪ Roughly adds 1 packet per RTT
• Switching threshold
- Switch to AI when cwnd > ssthresh
- Set ssthresh = cwnd/2 after loss
- Begin with slow-start after timeout
• Timeout misfortunes
- Why do a slow-start after timeout?
▪ Instead of MD cwnd (for AIMD)
- Timeouts are sufficiently long that the ACK clock will have run down
▪ Slow-start ramps up the ACK clock
- We need to detect loss before a timeout to get to full AIMD
▪ Done in TCP Reno
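The Tahoe window rules fit in three small functions. This is a sketch under simplifying assumptions: cwnd is measured in packets, one ACK arrives per packet in flight, and one RTT delivers `int(cwnd)` ACKs; the function names are hypothetical.

```python
def on_ack(cwnd, ssthresh):
    """Grow cwnd for one ACK: slow-start up to ssthresh, then AI."""
    if cwnd <= ssthresh:
        return cwnd + 1.0            # slow-start: +1 packet per ACK
    return cwnd + 1.0 / cwnd         # additive increase: about +1 per RTT


def on_timeout(cwnd):
    """After a loss timeout, return the new (cwnd, ssthresh)."""
    return 1.0, cwnd / 2.0           # restart slow-start, remember half


def one_rtt(cwnd, ssthresh):
    """Apply one round trip's worth of ACKs."""
    for _ in range(int(cwnd)):
        cwnd = on_ack(cwnd, ssthresh)
    return cwnd


# Slow-start doubles cwnd every RTT while below ssthresh
trace = []
cwnd = 1.0
for _ in range(3):
    cwnd = one_rtt(cwnd, ssthresh=64.0)
    trace.append(cwnd)

ai_after = one_rtt(10.0, ssthresh=4.0)   # AI phase: grows by just under 1
restart = on_timeout(16.0)
```

The trace shows the 2, 4, 8 doubling, while the same number of ACKs in the AI phase adds just under one packet per RTT.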

52
TCP Fast Retransmit / Fast Recovery
• How TCP implements AIMD, part 2
- “Fast retransmit” and “fast recovery” are the MD portion of AIMD
• Recall
- We want TCP to follow an AIMD control law for a good allocation
- Sender uses a congestion window or cwnd to set its rate (≈cwnd/RTT)
- Sender uses slow-start to ramp up the ACK clock, followed by Additive Increase
- But after a timeout, sender slow-starts again with cwnd = 1 (as it has no ACK clock)
• Inferring loss from ACKs
- TCP uses a cumulative ACK
▪ Carries highest in-order seq. number
▪ Normally a steady advance
- Duplicate ACKs give us hints about what data hasn’t arrived
▪ Tell us some new data did arrive, but it was not next segment, thus the next segment may be lost
53
Fast Retransmit
• Treat three duplicate ACKs as a loss
- Retransmit next expected segment
- Some repetition allows for reordering, but still detects loss quickly Ack 1 2 3 4 5 5 5 5 5 5
Sender Receiver
... ...
Ack 10
Ack 11
Data 14 was lost
Ack 12
Ack 13
earlier, but got 15 to 20
Ack 13
Data 20
Third duplicate Ack 13
ACK, so send 14 Ack 13
Ack 13
Data 14
Retransmission fills
... ... in the hole at 14
ACK jumps after
Ack 20
loss is repaired ... ...

• It can repair single segment loss quickly, typically before a timeout


• However, we have quiet time at the sender/receiver while waiting for the ACK to jump
• And we still need to MD cwnd …
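
The duplicate-ACK rule can be sketched as a counter over the stream of cumulative ACKs. This is a minimal illustration that follows the slide's convention, where an ACK carries the highest in-order segment number, so the segment to resend is ack + 1. The function name and the list-of-integers input are assumptions for the example.

```python
DUP_ACK_THRESHOLD = 3   # three duplicates are treated as a loss


def fast_retransmit_points(acks):
    """Return (index, segment_to_resend) for each third duplicate ACK in `acks`."""
    points = []
    last_ack, dup_count = None, 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                points.append((i, ack + 1))   # resend the next expected segment
        else:
            last_ack, dup_count = ack, 0      # new data was ACKed: reset the counter
    return points


if __name__ == "__main__":
    # ACK stream from the timeline: Data 14 was lost, 15 to 20 arrived.
    print(fast_retransmit_points([10, 11, 12, 13, 13, 13, 13, 13, 13, 20]))
```

A single duplicate, as produced by mild reordering, does not trigger a retransmission in this sketch.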
Fast Recovery
• First fast retransmit, and MD cwnd
• Then pretend further duplicate ACKs are the expected ACKs
- Let new segments be sent for ACKs
- Reconcile views when the ACK jumps
• With fast retransmit, it repairs a single segment loss quickly and keeps the ACK clock running
• This allows us to realize AIMD
- No timeout or slow-start (with cwnd reset to 1) after loss; just continue with a smaller cwnd
[Figure: example cumulative ACK sequence: Ack 1 2 3 4 5 5 5 5 5 5]
[Figure: sender/receiver timeline. Data 14 was lost earlier, but the receiver got 15 to 20, so it repeats Ack 13. On the third duplicate ACK the sender sends Data 14 and sets ssthresh (and cwnd) = cwnd/2. More duplicate ACKs advance the window, so the sender may send new segments (Data 21, Data 22) before the ACK jumps. The retransmission fills in the hole at 14, the ACK jumps to Ack 20, and the sender exits Fast Recovery.]
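
To contrast the two loss reactions, here is a rough per-RTT sketch. It is a simplification, not the actual Reno algorithm: growth is applied once per RTT instead of per ACK, window inflation during fast recovery is not modeled, and the event names, the floor of 2 packets for ssthresh, and the initial ssthresh are assumptions.

```python
def simulate_reno(rtts, losses, cwnd=1.0, ssthresh=16.0):
    """Return cwnd (packets) at the start of each RTT.

    `losses` maps an RTT index to "triple_dup_ack" or "timeout".
    """
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        kind = losses.get(rtt)
        if kind == "timeout":
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = 1.0                         # ACK clock ran down: slow-start again
        elif kind == "triple_dup_ack":
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = ssthresh                    # fast recovery: MD only, no slow-start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow-start doubles each RTT
        else:
            cwnd += 1.0                        # additive increase: +1 packet per RTT
    return trace


if __name__ == "__main__":
    print(simulate_reno(12, {6: "triple_dup_ack", 9: "timeout"}))
```

The duplicate-ACK loss halves the window and growth continues from there, while the timeout sends the window back to 1.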

TCP Reno
• TCP Reno combines slow-start, fast retransmit and fast recovery
- Multiplicative Decrease is ½

[Figure: TCP sawtooth. The ACK clock keeps running; MD of ½, no slow-start.]

Try out in GENI: https://witestlab.poly.edu/blog/tcp-congestion-control-basics/


TCP Reno, NewReno, and SACK
• Reno can repair one loss per RTT
- Multiple losses cause a timeout
• NewReno further refines ACK heuristics
- Repairs multiple losses without timeout
• Selective ACK (SACK) is a better idea
- Receiver sends ACK ranges so sender can retransmit without guesswork
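
As a hedged illustration of why SACK removes the guesswork, the sketch below computes the holes from a cumulative ACK plus the received ranges. It uses segment numbers and inclusive ranges for readability; real TCP SACK options carry byte sequence ranges, and the function name is an assumption.

```python
def missing_segments(cum_ack, sack_blocks):
    """Return the segments the sender should retransmit.

    cum_ack: highest in-order segment received (slide convention).
    sack_blocks: inclusive (first, last) ranges received beyond cum_ack.
    """
    missing = []
    expected = cum_ack + 1
    for first, last in sorted(sack_blocks):
        missing.extend(range(expected, first))   # gap before this block
        expected = max(expected, last + 1)
    return missing


if __name__ == "__main__":
    # Receiver has everything up to 13, plus 15-16 and 18-20.
    print(missing_segments(13, [(15, 16), (18, 20)]))
```

With this information the sender can repair both holes in one round trip instead of discovering them one at a time.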

Congestion Avoidance vs. Control
• Classic TCP drives the network into congestion and then recovers
- Needs to see loss to slow down
• Would be better to use the network but avoid congestion altogether!
- Reduces loss and delay
• But how can we do this? – Feedback Signals
- Delay and router signals can let us avoid congestion

Signal             Example Protocol                             Pros / Cons
Packet loss        Classic TCP, Cubic TCP (Linux)               Hard to get wrong; hear about congestion late
Packet delay       Compound TCP (Windows)                       Hear about congestion early; need to infer congestion
Router indication  TCPs with Explicit Congestion Notification   Hear about congestion early; require router support

Explicit Congestion Notification (ECN)
• Router detects the onset of congestion via its queue
- When congested, it marks affected packets (IP header)
• Marked packets arrive at receiver
- TCP receiver informs TCP sender of the congestion
• Advantages:
- Routers deliver clear signal to hosts
- Congestion is detected early, no loss
- No extra packets need to be sent
• Disadvantages:
- Routers and hosts must be upgraded
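
The ECN signal path (router marks, receiver echoes, sender reacts) can be sketched as three small functions. This is a toy model: the queue threshold, the dictionary used as an ACK, and halving cwnd on every echoed mark are assumptions for the example. The field names `ce` and `ece` mirror the Congestion Experienced mark in the IP header and the ECN-Echo flag in the TCP header.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    ce: bool = False          # Congestion Experienced mark, set by a router


def router_forward(packet, queue_len, mark_threshold):
    """Mark instead of dropping when the queue is building up."""
    if queue_len > mark_threshold:
        packet.ce = True
    return packet


def receiver_ack(packet):
    """Echo the congestion mark back to the sender on the ACK."""
    return {"ack": packet.seq, "ece": packet.ce}


def sender_on_ack(cwnd, ack):
    """Treat an echoed mark like a loss signal, but nothing was lost."""
    return max(cwnd / 2, 1.0) if ack["ece"] else cwnd


if __name__ == "__main__":
    marked = router_forward(Packet(seq=1), queue_len=30, mark_threshold=20)
    print(sender_on_ack(10.0, receiver_ack(marked)))
```

No extra packets are sent in this sketch: the signal rides on the data packet and its ACK.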

A brief dive into the Linux TCP/IP stack
[Figure: Linux packet processing paths for Send and Receive; the figure also labels the Inter-Frame Gap]

More readings:
• Understanding TCP/IP Network Stack & Writing Network Apps
• The performance analysis of linux networking – Packet receiving
Optional material
Modern TCP variants
• CUBIC
- Cubic Increase
▪ Window expanded based on the time elapsed since the last congestion event
▪ More aggressive than Reno, optimized for high bandwidth-delay product (BDP)
- Multiplicative Decrease
▪ Loss-based congestion detection
▪ Can be misguided by random loss and may cause bufferbloat
- Default TCP for Linux, arguably largest worldwide deployment
[Figure: CUBIC window growth curve, plateauing to the last congestion window and then increasing to probe for the optimal rate]
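
The cubic growth curve can be written down directly. The sketch below uses the window function and default constants from RFC 8312 (C = 0.4, multiplicative decrease factor 0.7), which are not stated in the slides; the function names are assumptions.

```python
C = 0.4       # scaling constant (RFC 8312 default)
BETA = 0.7    # window is cut to BETA * w_max on loss (RFC 8312 default)


def cubic_k(w_max):
    """Time (seconds) the curve needs to climb back to w_max after a loss."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t, w_max):
    """Congestion window t seconds after the last congestion event."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


if __name__ == "__main__":
    w_max = 100.0
    for t in range(0, 10, 2):
        print(t, round(cubic_window(t, w_max), 1))
```

The curve starts at BETA * w_max, flattens as it approaches the previous maximum, and then accelerates again to probe for more bandwidth. Growth depends on elapsed time, not on RTT.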

Modern TCP variants
• BBR – Bottleneck Bandwidth and Round-trip propagation time
- Limit inflight bytes (cwnd) to RTprop ✕ BtlBw
- Send at rate BtlBw ✕ PacingGain
▪ RTprop: minimum RTT in the last 10 seconds
▪ BtlBw: maximum throughput in the last 10 RTprop
▪ PacingGain (cycling periodically through 1, 1, 1, 1, 1, 1, 1.25, 0.75) is used to probe for higher bandwidth
▪ Set cwnd to 4 if RTprop has not been updated for 10 seconds, and use the minimum RTT measured during that period as the new RTprop
- Avoid bufferbloat and ignore random loss
- Developed by Google and available since Linux kernel 4.9
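
The two BBR estimates and the quantities derived from them can be sketched as follows. This is a simplified model, not the kernel's implementation: the function names are hypothetical, the BtlBw window is approximated as the last 10 delivery-rate samples (one per round trip), and the gain cycle is taken in the order listed above.

```python
PACING_GAINS = [1, 1, 1, 1, 1, 1, 1.25, 0.75]   # cycled periodically


def rtprop(samples, now, window=10.0):
    """Minimum RTT among (time_s, rtt_s) samples from the last `window` seconds."""
    return min(rtt for t, rtt in samples if now - t <= window)


def btlbw(rates, rounds=10):
    """Maximum delivery rate (bytes/s) over the most recent `rounds` samples."""
    return max(rates[-rounds:])


def bbr_targets(rt_prop, btl_bw, round_index):
    """Inflight cap (bytes) and pacing rate (bytes/s) for one round trip."""
    gain = PACING_GAINS[round_index % len(PACING_GAINS)]
    return {"inflight_cap": rt_prop * btl_bw, "pacing_rate": btl_bw * gain}


if __name__ == "__main__":
    rtts = [(0.0, 0.050), (5.0, 0.040), (12.0, 0.060)]
    rates = [1_000_000, 1_250_000, 1_100_000]
    print(bbr_targets(rtprop(rtts, now=13.0), btlbw(rates), round_index=6))
```

Because the cap is derived from measured bandwidth and propagation delay, not from loss, random loss does not shrink the window in this model.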

Topics
• Service Models
- Socket API and ports
- Datagrams, Streams
• User Datagram Protocol (UDP)
- Example: Remote Procedure Call (RPC)
- Example: Real-time Transport Protocol (RTP)
• Transmission Control Protocol (TCP)
- Connections
- Sliding Window
- Flow control
- Retransmission timers
- Congestion control
• Quick UDP Internet Connection (QUIC)
Problems of TCP
• TCP does not understand its payload at all
- It provides nothing but a reliable byte stream
• What if we want security?
- Another layer (e.g. TLS) over TCP
- So two handshakes are needed, for TCP and TLS respectively
• What if we have multiple requests?
- … over one TCP connection: head-of-line (HOL) blocking problems
- … over multiple TCP connections: repeated slow-starts are needed, which may burden the server
• TCP is implemented in the kernel
- Difficult to evolve
- Not under the control of applications

Optional material
Quick UDP Internet Connection (QUIC)
• Based on UDP
- But provides reliability and connections
• Built inside applications
- But conceptually at transport layer
- Easy to evolve
• Security built in from the start
- No additional security layer needed
- Unified handshake, faster connection establishment
• Multiple streams
- Independent, no HOL problems
• Connection IDs
- Keep the connection alive when moving between Wi-Fi and LTE networks
• Proposed by Google in 2012, finally standardized as RFC 9000 in May 2021
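
Head-of-line blocking can be made concrete with a toy comparison. The sketch assumes several requests multiplexed as (stream, chunk) packets, with one packet lost and not yet retransmitted; the function names and packet representation are illustrative and not part of any QUIC library.

```python
def delivered_over_one_tcp_connection(packets, lost):
    """Single ordered byte stream: nothing behind the hole reaches the application."""
    out = []
    for i, pkt in enumerate(packets):
        if i in lost:
            break                       # every later packet waits for the retransmission
        out.append(pkt)
    return out


def delivered_over_quic_streams(packets, lost):
    """Independent streams: only the stream that lost a packet is held back."""
    blocked, out = set(), []
    for i, (stream, chunk) in enumerate(packets):
        if i in lost:
            blocked.add(stream)
        elif stream not in blocked:
            out.append((stream, chunk))
    return out


if __name__ == "__main__":
    packets = [("A", 0), ("B", 0), ("A", 1), ("B", 1), ("C", 0)]
    print(delivered_over_one_tcp_connection(packets, lost={0}))
    print(delivered_over_quic_streams(packets, lost={0}))
```

With the first packet lost, the single byte stream delivers nothing, while streams B and C are still delivered in the multi-stream case.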

Optional material
A quick look at QUIC protocol
• Multiple frames in one packet, multiple packets in one UDP datagram
• Some packet types:
- Initial packet: Used to initiate client/server handshake
- 0-RTT packet: Used to carry data before client/server handshake
- Handshake packet: Used to perform client/server handshake
- 1-RTT packet: Used to carry data during or after client/server handshake
- Retry packet: Used to validate client address at the beginning of client/server handshake
• Example frame types:
- CRYPTO frame: Used to transmit cryptographic handshake messages
- HANDSHAKE_DONE frame: Used to signal confirmation of the handshake to the client
- STREAM frame: Used to create a stream and/or carry stream data
- ACK frame: Used by receivers to inform senders of packets they have received and processed

Optional material
A quick look at QUIC protocol
• Example connection establishment
Client                                                  Server

Initial[0]: CRYPTO[CH]
0-RTT[0]: STREAM[0, "..."] ->

                                 Initial[0]: CRYPTO[SH] ACK[0]
                                 Handshake[0]: CRYPTO[EE, FIN]
                          <- 1-RTT[0]: STREAM[1, "..."] ACK[0]

Initial[1]: ACK[0]
Handshake[0]: CRYPTO[FIN], ACK[0]
1-RTT[1]: STREAM[0, "..."] ACK[0] ->

                                          Handshake[1]: ACK[0]
         <- 1-RTT[1]: HANDSHAKE_DONE, STREAM[3, "..."], ACK[1]

• Only a few fields in the packet header are public
- All other fields (including the Packet Number field) and the payload are encrypted and authenticated

Optional material
