Mod 4 Trans F24
Module 4
End-to-End Data Transfer
Chenren Xu(许辰人)
Fall 2024
Includes material from lectures by David Wetherall (UW) and Jim Kurose (UMASS);
Jointly prepared with Ruihan Li.
Where we are in the Course
• Starting the Transport Layer!
[Figure: protocol stack with layers labeled Application, TCP, IP, and 802.11]
Topics
• Service Models
- Socket API and ports
- Datagrams, Streams
• User Datagram Protocol (UDP)
- Example: Remote Procedure Call (RPC)
- Example: Real-time Transport Protocol (RTP)
• Transmission Control Protocol (TCP)
- Connections
- Sliding Window
- Flow control
- Retransmission timers
- Congestion control
• Quick UDP Internet Connection (QUIC)
Socket API
• Simple abstraction to use the network
- The “network” API (really transport service) used to write all Internet apps
- Part of all major OSes and languages; originally Berkeley (Unix) ~1983
• Supports both Internet transport services (Streams and Datagrams)
• Sockets let apps attach to the local network at different ports
[Figure: two sockets attached to the network, one at Port #1 and one at Port #2]
• Same API used for Streams and Datagrams

TCP (Streams) | UDP (Datagrams)
Connections | Datagrams
Bytes are delivered reliably, and in order | Messages may be lost, reordered, duplicated
Arbitrary length content | Limited message size
Flow control matches sender to receiver | Can send regardless of receiver state
Congestion control matches sender to network | Can send regardless of network state

Primitive | Meaning
SOCKET | Create a new communication endpoint
BIND | Associate a local address (port) with a socket
LISTEN | Announce willingness to accept connections (only needed for Streams)
ACCEPT | Passively establish an incoming connection (only needed for Streams)
CONNECT | Actively attempt to establish a connection (only needed for Streams)
SEND | Send some data over the socket (To/From forms for Datagrams)
RECEIVE | Receive some data over the socket (To/From forms for Datagrams)
CLOSE | Release the socket
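To make the primitives concrete, here is a minimal sketch using Python's standard `socket` module, which exposes the Berkeley calls almost one to one. The function name `tcp_echo_once`, the loopback address, and the upper-casing "service" are assumptions made for illustration; both endpoints run in one process only to keep the example short.

```python
import socket

def tcp_echo_once(message: bytes) -> bytes:
    """Run one request/reply over a loopback TCP (Streams) connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # SOCKET
    server.bind(("127.0.0.1", 0))      # BIND; port 0 lets the OS pick a free port
    server.listen(1)                   # LISTEN
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # SOCKET
    client.settimeout(2.0)
    client.connect(server.getsockname())    # CONNECT (active open)
    server.settimeout(2.0)
    conn, _peer = server.accept()           # ACCEPT (passive open)
    conn.settimeout(2.0)
    try:
        client.sendall(message)             # SEND
        request = conn.recv(4096)           # RECEIVE
        conn.sendall(request.upper())       # hypothetical service: upper-case the request
        return client.recv(4096)
    finally:
        conn.close()                        # CLOSE
        client.close()
        server.close()
```

A datagram version would create the sockets with `socket.SOCK_DGRAM`, skip LISTEN/ACCEPT/CONNECT, and use the `sendto()`/`recvfrom()` forms instead.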
Ports
• Application process is identified by the tuple IP address, protocol, and port
- Ports are 16-bit integers representing local “mailboxes” that a process leases
• Servers often bind to “well-known” ports
- <1024, require administrative privileges
- But raises security issues
• Clients often assigned “ephemeral” ports
- Chosen by OS, used temporarily
• Some well-known ports

Port | Protocol | Use
20, 21 | FTP | File transfer
22 | SSH | Remote login, replacement for Telnet
25 | SMTP | Email
80 | HTTP | World Wide Web
110 | POP-3 | Remote email access
143 | IMAP | Remote email access
443 | HTTPS | Secure Web (HTTP over SSL/TLS)
554 | RTSP | Media player control
631 | IPP | Printer sharing
Example: Real-time Transport Protocol (RTP)
• Real-time video/audio streaming
- Commonplace in many applications, so RTP was created and standardized in RFC 3550
• A transport protocol that happens to be implemented in the
application layer
• Retransmitting lost packets may not be necessary
- So based on UDP
• Some design details:
- The payload type field identifies the encoding algorithm
- The sequence number field can be used to detect loss or out-of-order packets
- The timestamp field records when the first sample in the packet was made
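As an illustration of the three fields above, the following sketch parses the 12-byte fixed RTP header defined in RFC 3550. The function name `parse_rtp_header` and the returned dictionary keys are assumptions for this example; header extensions and CSRC entries are ignored.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550) at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP packets carry at least a 12-byte header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC) identifier.
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,            # top two bits; 2 for current RTP
        "marker": byte1 >> 7,             # top bit of the second byte
        "payload_type": byte1 & 0x7F,     # identifies the encoding
        "sequence_number": seq,           # detects loss and reordering
        "timestamp": timestamp,           # sampling instant of the first sample
        "ssrc": ssrc,
    }
```

A receiver can compare consecutive `sequence_number` values to notice gaps without ever asking for a retransmission, which is why running over UDP is sufficient.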
Transmission Control Protocol (TCP)
• How TCP works!
- The transport protocol used for most content on the Internet
• TCP Features
- Based on connections
- Sliding window for reliability
▪ With adaptive timeout
- Flow control for slow receivers
- Congestion control to allocate network bandwidth
Reliable Bytestream
• Message boundaries not preserved from send() to recv()
- But reliable and ordered (receive bytes in same order as sent)
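[Figure: the sender transmits four segments, each with 512 bytes of data and carried in an IP packet; at the receiver, 2048 bytes of data are delivered to the app in a single recv() call]
[Figure: hosts A and B exchange data in both directions; labels: data A → B, data B → A, ACK A → B]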
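Because message boundaries are not preserved, an application that expects a fixed number of bytes has to loop on recv(). The sketch below shows that pattern; `recv_exact` and `demo` are hypothetical helper names, and `socket.socketpair()` is used as a local stand-in for a TCP connection (it is also a reliable byte stream).

```python
import socket

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket, however recv() slices them."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)      # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError("peer closed before all bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def demo() -> bytes:
    """Send four 512-byte writes and read them back as one 2048-byte unit."""
    a, b = socket.socketpair()
    try:
        for i in range(4):
            a.sendall(bytes([i]) * 512)   # four separate send() calls
        return recv_exact(b, 2048)        # the reader never sees the four boundaries
    finally:
        a.close()
        b.close()
```

The bytes arrive in the order they were sent, but how many each recv() call returns is up to the transport, so protocols on top of TCP add their own framing (length prefixes or delimiters).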
TCP Header
• Ports identify apps (socket API)
- 16-bit identifiers
• SEQ/ACK used for sliding window
- Selective Repeat, with byte positions
• SYN/FIN/RST flags for connections
- Flag indicates segment is a SYN etc.
• Window size for flow control
- Relative to ACK, and in bytes
Connection Establishment
• How to set up connections
- We’ll see how TCP does it
• Both sender and receiver must be ready before we start the transfer of data
- Need to agree on a set of parameters
▪ E.g., the Maximum Segment Size (MSS)
• This is signaling
- It sets up state at the endpoints
- Like “dialing” for a telephone call
Three-Way Handshake
• Used in TCP
• Opens connection for data in both directions
• Each side probes the other with a fresh Initial Sequence Number
- Sends on a SYNchronize segment
- Echo on an ACKnowledge segment
[Figure: time-sequence diagram between the active party (client) and the passive party (server), with the three segments labeled 1, 2, and 3]
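The exchange of Initial Sequence Numbers can be sketched as three dictionaries, one per segment. This is a toy model under stated assumptions: the function name `three_way_handshake` and the dictionary layout are illustrative, and real ISNs are chosen unpredictably rather than passed in.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around

def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as simple dictionaries."""
    # 1: the active party sends SYN carrying its own ISN.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # 2: the passive party replies with its own ISN and echoes the client's ISN + 1.
    syn_ack = {
        "flags": {"SYN", "ACK"},
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,
    }
    # 3: the active party echoes the server's ISN + 1; both directions are now open.
    ack = {
        "flags": {"ACK"},
        "seq": (syn["seq"] + 1) % SEQ_SPACE,
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,
    }
    return [syn, syn_ack, ack]
```

With both ISNs equal to 0, the second segment acknowledges 1, since the SYN itself consumes one sequence number.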
TCP Connection State Machine (Connection Establishment)
• Captures the states (rectangles) and transitions (arrows)
- A/B means event A (active or passive) triggers the transition with action B
[Figure: time-sequence diagram annotated with states; both parties run instances of this state machine. Labels: active party (client), passive party (server), SYN_RCVD at step 2, ESTABLISHED at step 3 and on both sides at the end]
• Finite state machines are a useful tool to specify and check the handling of all cases that may occur
• TCP allows for simultaneous open
- i.e., both sides open at once instead of the client-server pattern
Connection Release
• How to release connections
- We’ll see how TCP does it
• Orderly release by both parties when done
- Delivers all pending data and “hangs up”
- Cleans up state in sender and receiver
• Key problem is to provide reliability while releasing
- TCP uses a “symmetric” close in which both sides shut down independently
• TCP Connection Release
- Active party sends FIN(x), passive party ACKs
- Passive party sends FIN(y), active party ACKs
- FINs are retransmitted if lost
- Each FIN/ACK closes one direction of data transfer
[Figure: time-sequence diagram between the active party and the passive party, with the two FIN exchanges labeled 1 and 2]
TCP Connection State Machine (Connection Release)
[Figure: both parties run instances of this state machine. Active party: ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, (timeout), CLOSED. Passive party: ESTABLISHED, LAST_ACK, CLOSED. The two FIN exchanges are labeled 1 and 2]
• TIME_WAIT State
- We wait a long time (two times the maximum segment lifetime of 60 seconds) after sending all segments and before completing the close (from TIME_WAIT to CLOSED), but why?
▪ In case some packet sent earlier by the peer arrives later than the FIN from the peer
▪ Last ACK (y+1) might have been lost, in which case FIN will be resent
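The release half of the state machine can be written as a transition table in the A/B (event/action) notation used earlier. This is a simplified sketch: the table names and the `run` helper are assumptions, simultaneous close is omitted, and CLOSE_WAIT is the standard TCP state the passive party occupies between receiving the FIN and closing.

```python
# (state, event) -> (next state, action); None means no segment is sent.
ACTIVE_CLOSE = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "send ACK"),
    ("TIME_WAIT", "timeout"): ("CLOSED", None),   # after 2 x maximum segment lifetime
}
PASSIVE_CLOSE = {
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),
}

def run(table: dict, events: list, state: str = "ESTABLISHED"):
    """Feed events through a transition table; return the final state and actions."""
    actions = []
    for event in events:
        state, action = table[(state, event)]   # KeyError flags an unhandled case
        if action is not None:
            actions.append(action)
    return state, actions
```

Looking up a missing `(state, event)` pair raises `KeyError`, which mirrors how a state machine helps check that every case is handled.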
TCP Connection State Machine Complete
Sliding Window
• Principles of the algorithm
- Pipelining and reliability
- Building on Stop-and-Wait, i.e., ARQ with one message at a time
[Figure: Stop-and-Wait time-sequence diagram between sender and receiver showing Frame 0, ACK 0, a timeout, and Frame 1]
• Limitations of Stop-and-Wait
• Transport accepts another segment of data from the application ...
- Transport sends it (as LFS – LAR = 5)
• Next higher ACK arrives from peer…
- Window advances, buffer is freed
[Figure: sender window with W=5 over the seq. number axis at each step; segments are marked Acked, Unacked, Available, or Unavailable, with LAR and LFS markers]
Sliding Window Protocol Optimizations – Receiver Coordination
• Go-Back-N
- Receiver keeps only a single packet buffer for the next segment
▪ State variable, LAS (LAST ACK SENT)
- On receive:
▪ If seq. number is LAS+1, accept and pass it to app, update LAS, send ACK
▪ Otherwise discard (as out of order) and resend a (duplicate) ACK of LAS
- Retransmission: sender uses a single timer to detect losses
▪ On timeout or receiving a duplicate ACK of LAR, resends buffered packets starting at LAR+1
• Selective Repeat
- Receiver passes data to app in order, and buffers out-of-order segments to reduce retransmissions
- TCP uses a selective repeat design
- Buffers W segments, keeps state variable LAS
- On receive:
▪ Buffer segments [LAS+1, LAS+W]
▪ Pass up to app in-order segments from LAS+1, and update LAS
▪ Send ACK for LAS regardless
- Retransmission: sender uses a timer per unacked segment to detect losses
▪ On timeout for segment, resend it
▪ Hope to resend fewer segments
Selective Repeat spends more timer resources in exchange for better bandwidth efficiency
http://www.ccs-labs.org/teaching/rn/animations/gbn_sr/
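The two receiver rules can be compared side by side with a small simulation. This is a sketch under stated assumptions: segments are numbered from 1, sequence numbers do not wrap, and the function names are made up for this example. Each function returns the segments passed to the app and the ACK sent after every arrival.

```python
def go_back_n_receiver(arrivals):
    """Accept only the next in-order segment; discard everything else."""
    las = 0                      # LAS: last ACK sent
    delivered, acks = [], []
    for seq in arrivals:
        if seq == las + 1:       # exactly the next expected segment
            delivered.append(seq)
            las = seq
        acks.append(las)         # out-of-order arrivals trigger a duplicate ACK
    return delivered, acks

def selective_repeat_receiver(arrivals, window):
    """Buffer segments in [LAS+1, LAS+W] and deliver them in order."""
    las = 0
    buffered = set()
    delivered, acks = [], []
    for seq in arrivals:
        if las + 1 <= seq <= las + window:
            buffered.add(seq)
        while las + 1 in buffered:   # pass up any in-order run to the app
            buffered.remove(las + 1)
            las += 1
            delivered.append(las)
        acks.append(las)             # ACK for LAS is sent regardless
    return delivered, acks
```

For arrivals `[1, 2, 4, 5, 3]`, Go-Back-N discards 4 and 5 (the sender must resend them), whereas Selective Repeat with W=4 holds them and releases 3, 4, 5 together once 3 arrives.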
Sequence Numbers
• Need more than 0/1 for Stop-and-Wait …
- But how many?
• For Selective Repeat, need W numbers for packets, plus W for ACKs of earlier packets
- 2W seq. numbers
- Fewer for Go-Back-N (W+1)
• Typically implement seq. number with an N-bit counter that wraps around at 2^N – 1
- E.g., N = 8: …, 253, 254, 255, 0, 1, 2, 3, …
- TCP uses 32-bit
[Figure: plot of seq. number for transmissions and retransmissions]
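Wrap-around means window checks must be done modulo 2^N rather than with plain comparisons. A minimal sketch, with hypothetical helper names `seq_add` and `in_window`:

```python
def seq_add(seq: int, delta: int, bits: int) -> int:
    """Advance an N-bit sequence number, wrapping after 2^N - 1."""
    return (seq + delta) % (1 << bits)

def in_window(seq: int, base: int, window: int, bits: int) -> bool:
    """True if seq lies in the window [base, base + window) modulo 2^N."""
    # Distance from base measured forwards around the circle of 2^N values.
    return (seq - base) % (1 << bits) < window
```

With N = 8 and a window of 5 starting at 254, the acceptable numbers are 254, 255, 0, 1, 2, so `in_window(1, 254, 5, 8)` holds even though 1 < 254.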
TCP Sliding Window
• Receiver
- Cumulative ACK tells next expected byte sequence number (“LAS+1”)
- Optionally, selective ACKs (SACK) give hints for receiver buffer state
▪ List up to 3 ranges of received bytes
• Sender
- Uses an adaptive retransmission timeout to resend data from LAS+1
- Uses heuristics to infer loss quickly and resend to avoid timeouts
▪ “Three duplicate ACKs” treated as loss
[Figure: the receiver holds bytes up to 100 and 200-299 (ACK up to 100 and 200-299); as more data arrives beyond the hole, it sends ACK 100; then ACK 100, 200-299; then ACK 100, 200-399; then ACK 100, 200-499]
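The receiver-side bookkeeping can be sketched as follows. The assumptions are loud ones: the stream is taken to start at byte 0, ranges are half-open `(start, end)` pairs, and `ack_state` is a made-up name. Real TCP also orders SACK blocks by recency, which this sketch does not model.

```python
def ack_state(received, max_sack_blocks=3):
    """Return (cumulative ACK, SACK ranges) for a list of received byte ranges."""
    merged = []
    for start, end in sorted(received):
        if merged and start <= merged[-1][1]:        # touches or overlaps the previous range
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    cumulative = 0                                   # next byte expected in order
    if merged and merged[0][0] == 0:
        cumulative = merged[0][1]
        merged = merged[1:]
    sack = [tuple(block) for block in merged[:max_sack_blocks]]
    return cumulative, sack
```

With bytes `[0, 100)` and `[200, 300)` received, the cumulative ACK stays at 100 while the SACK range grows as later data arrives; once the hole `[100, 200)` is filled, the cumulative ACK jumps past everything buffered.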
A Simple TCP Example
Pkt 1: A sends a SYN to B with seq# 0 and ack# 0
Pkt 2: B starts its seq# of 0 and acknowledges A’s seq# by adding a 1
… bytes. Its seq# starts at 101. The next expected seq# should be 151. It acknowledges the 200 bytes sent by B by sending an ack# of 201
Pkt 7: B responds to the request with a 1000 byte packet. The starting seq is 201, so the next expected seq# is 1201. It receives 50 bytes from A, so it acks with 151.
Problem of Sliding Window
Notation: LFS = last frame sent, LAR = last ack received, LAS = last ack sent
• Sliding window uses pipelining to keep the network busy
- What if the receiver is overloaded?
[Figure: a “Big Iron” server streaming video to an overloaded mobile receiver (“Arg …”)]
• Sliding window – Receiver
- Consider receiver with W buffers
▪ LAS = last ack sent, app pulls in-order data from buffer with recv() call
- Suppose the next two segments arrive but app does not call recv()
▪ LAS rises, but we can’t slide window!
- If further segments arrive (even in order) we can fill the buffer
▪ Must drop segments until app recvs!
- App recv() takes two segments
▪ Window slides
[Figure: receiver window with W=5 over the seq. number axis at each step; segments are marked Finished, Acked, Acceptable, or Too high, with an LAS marker]
Need a mechanism to inform the sender about the available receive buffer space
Flow Control
• Solution: Adding flow control to the sliding window algorithm
- To slow the over-enthusiastic sender
• Avoid loss at receiver by telling sender the available buffer space
- WIN = #Acceptable, not W (from LAS)
• Sender uses the lower of the sliding window and the flow control window (WIN)
[Figure: receiver buffer with W=5 over the seq. number axis, marked Finished, Acked, Acceptable, and Too high from LAS; with two acked segments not yet read by the app, WIN=3]
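The arithmetic on both sides is small enough to write out. In this sketch, windows are counted in segments as on the slide (TCP itself counts bytes), and the function names are assumptions.

```python
def advertised_window(buffer_segments: int, unread_segments: int) -> int:
    """Receiver side: WIN is the buffer space not occupied by acked-but-unread data."""
    return buffer_segments - unread_segments

def sendable(sliding_window: int, advertised: int, lar: int, lfs: int) -> int:
    """Sender side: how many new segments may be sent right now."""
    effective = min(sliding_window, advertised)   # the lower of W and WIN
    in_flight = lfs - lar                         # sent but not yet acknowledged
    return max(0, effective - in_flight)
```

With W=5 and two unread segments in the receive buffer, WIN is 3; a sender with one segment in flight may then send two more, and a WIN of 0 stops it entirely until the app calls recv().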
Retransmission Timeouts
• How to set the timeout for sending a retransmission
- Adapting to the network path
• Retransmissions
- With sliding window, the strategy for detecting loss is the timeout
▪ Set timer when a segment is sent
▪ Cancel timer when ack is received
▪ If timer fires, retransmit data as lost
• Timeout Problem
- Keep smoothed estimates of the RTT (SRTT) and of the variation of RTT (Svar)
- Update estimates with a moving average
▪ SRTT_{N+1} = 0.9 × SRTT_N + 0.1 × RTT_{N+1}
- Set the timeout to SRTT + 4 × Svar
• Simple to compute, does a good job of tracking actual RTT
- Little “headroom” to lower
[Figure: two plots of RTT (ms) against time (seconds) for the path BCN → SEA → BCN; the first shows the RTT samples with SRTT and the variation (Svar), the second adds the timeout (SRTT + 4 × Svar)]
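A sketch of the estimator follows. The SRTT update and the timeout rule are the ones stated above; the Svar update (same 0.9/0.1 weights applied to the absolute deviation from the new SRTT) is an assumption of this example, and the function names are made up. Deployed TCP stacks use different weights (see RFC 6298).

```python
def update_rtt(srtt: float, svar: float, sample: float, weight: float = 0.1):
    """Fold one new RTT sample into the smoothed estimates."""
    srtt = (1 - weight) * srtt + weight * sample
    # Assumed form: moving average of the deviation of the sample from SRTT.
    svar = (1 - weight) * svar + weight * abs(sample - srtt)
    return srtt, svar

def retransmission_timeout(srtt: float, svar: float) -> float:
    """Timeout = SRTT + 4 x Svar."""
    return srtt + 4 * svar
```

Starting from SRTT = 100 ms and Svar = 10 ms, a single 200 ms sample moves SRTT only to 110 ms but raises Svar to 18 ms, so the timeout grows from 140 ms to 182 ms: the variance term is what supplies headroom on a jittery path.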
Congestion Overview
• More fun in the Transport Layer!
- The mystery of congestion control
- Depends on the Network layer too
• Understanding congestion, a “traffic jam” in the network
- Later we will learn how to control it
• Topics
- Nature of congestion
- Fairness of Bandwidth Allocation
- AIMD Control Law
- TCP Congestion Control history
- ACK Clocking
- TCP Slow-start
- TCP Fast Retransmit/Recovery
- Congestion Avoidance (ECN)
Nature of Congestion
• Routers/switches have internal buffering for contention
[Figure: router with inputs, input buffers, switching fabric, output buffers, and outputs]
• Queues help by absorbing bursts when input > output rate
• But if input > output rate persistently, queue will overflow
- http://www.ccs-labs.org/teaching/rn/animations/queue/
• Congestion is a function of the traffic patterns – can occur even if every link has the same capacity
• Effects of Congestion
[Figure: router modeled as a (FIFO) queue of queued packets]
… is no congestion
• Fair means every sender gets a reasonable share of the network
• Key observation:
- In an effective solution, Transport and Network layers must work together
• Network layer witnesses congestion
- Only it can provide direct feedback
• Transport layer causes congestion
- Only it can reduce offered load
… network
- Network is distributed; no single party has an overall picture of its state
• Solution context:
- Senders adapt concurrently based on their own view of the network
- Design this adaptation so the network usage as a whole is efficient and fair
- Adaptation is continuous since offered loads continue to change over time
Fairness of Bandwidth Allocation
• What’s a “fair” bandwidth allocation?
• Recall
- We want a good bandwidth allocation to be fair and efficient
▪ Now we learn what fair means
- Caveat: in practice, efficiency is more important than fairness
• Efficiency vs. Fairness
- Cannot always have both!
▪ Example network with traffic A → B, B → C and A → C
▪ How much traffic can we carry?
[Figure: nodes A, B, C in a line; links A – B and B – C each have capacity 1]
- If we care about fairness:
▪ Give equal bandwidth to each flow
▪ A → B: ½ unit, B → C: ½, and A → C: ½
▪ Total traffic carried is 1 ½ units
- If we care about efficiency:
▪ Maximize total traffic in network
▪ A → B: 1 unit, B → C: 1, and A → C: 0
▪ Total traffic carried is 2 units
• The Slippery Notion of Fairness
- Is “equal per flow” fair anyway?
▪ A → C uses more network resources (two links) than A → B or B → C
▪ Host A sends two flows, B sends one
- Another idea: maximizing the minimum flow
- Not productive to seek exact fairness
▪ More important to avoid starvation; “Equal per flow” is good enough
Generalizing “Equal per Flow”
• Bottleneck for a flow of traffic is the link that limits its bandwidth
- Where congestion occurs for the flow
- For A → C, link A – B is the bottleneck
[Figure: nodes A, B, C in a line; link A – B has capacity 1 and link B – C has capacity 10; A – B is marked as the bottleneck]
• Flows may have different bottlenecks
- For A → C, link A – B is the bottleneck
- For B → C, link B – C is the bottleneck
- Can no longer divide links equally …
Max-Min Fairness
• Intuitively, flows bottlenecked on a link get an equal share of that link
• Max-min fair allocation is one that:
- Increasing the rate of one flow will decrease the rate of a smaller flow
• Example: network with 4 flows, links equal bandwidth
- When rate=1/3, flows B, C, and D bottleneck R4 – R5.
[Figure: flow rates rising over time until each flow reaches its bottleneck]
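One way to compute a max-min fair allocation is progressive filling: raise all rates together and freeze the flows on whichever link saturates first. The sketch below assumes a hypothetical topology chosen to match the example (flows A and B share link R2 – R3, flows B, C, D share link R4 – R5, every link has capacity 1); the function name and data layout are also assumptions.

```python
from fractions import Fraction

def max_min_allocation(capacity: dict, flows_on_link: dict) -> dict:
    """Progressive filling: repeatedly freeze the flows on the tightest link."""
    rate = {}
    remaining = dict(capacity)
    active = {f for flows in flows_on_link.values() for f in flows}
    while active:
        # Equal share each link could still offer its unfrozen flows.
        shares = {}
        for link, flows in flows_on_link.items():
            unfrozen = [f for f in flows if f in active]
            if unfrozen:
                shares[link] = remaining[link] / len(unfrozen)
        bottleneck = min(shares, key=shares.get)
        share = shares[bottleneck]
        for f in [f for f in flows_on_link[bottleneck] if f in active]:
            rate[f] = share
            active.discard(f)
            # A frozen flow consumes its rate on every link it crosses.
            for link, flows in flows_on_link.items():
                if f in flows:
                    remaining[link] -= share
    return rate
```

On the assumed topology, R4 – R5 saturates first at rate 1/3, freezing B, C, and D; flow A then takes the remaining 2/3 of R2 – R3.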
Bandwidth Allocation Models
• Want to allocate capacity to senders, but how?
• Open loop versus closed loop
- Open: reserve bandwidth before use
- Closed: use feedback to adjust rates
• Host versus Network support
- Who sets/enforces allocations?
• Window versus Rate based
- How is allocation expressed?
• We’ll look at the closed-loop, host-driven, window-based approach that TCP adopts
• Network layer returns feedback on current allocation to senders
- At least tells if (not when) there is congestion
• Transport layer adjusts sender’s behavior via window in response
- How senders adapt is a control law
[Figure: AIMD “sawtooth”]
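To see why an Additive Increase Multiplicative Decrease control law leads to a fair split, consider this toy model of two senders sharing one link. The assumptions: both senders see the same binary congestion signal at the same time, add 1 unit per round, and halve on congestion; the function name `aimd` and its parameters are illustrative.

```python
def aimd(x: float, y: float, capacity: float, rounds: int,
         add: float = 1.0, mult: float = 0.5) -> list:
    """Simulate two AIMD senders; return the (x, y) rates after each round."""
    history = []
    for _ in range(rounds):
        if x + y > capacity:
            # Congestion signal: both senders decrease multiplicatively.
            x, y = x * mult, y * mult
        else:
            # No congestion: both senders increase additively.
            x, y = x + add, y + add
        history.append((x, y))
    return history
```

Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the rates converge towards an equal share while the total oscillates around the capacity in a sawtooth.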
Van Jacobson (1950 – )
• Widely credited with saving the Internet from congestion
collapse in the late 80s
- Introduced congestion control principles
- Practical solutions (TCP Tahoe/Reno)
- V. Jacobson and M. J. Karels, Congestion Avoidance and
Control, ACM SIGCOMM 1988
▪ https://web.stanford.edu/class/cs244/papers/CongestionControl.pdf
• Much other pioneering work:
- Tools like traceroute, tcpdump, pathchar
- IP header compression, multicast tools
TCP Tahoe/Reno
• Avoid congestion collapse without changing routers (or even receivers)
• Idea is to fix timeouts and introduce a congestion window (cwnd) over the sliding window to limit queues/loss
• TCP Tahoe/Reno implements AIMD by adapting cwnd using packet loss as the network feedback signal
• TCP behaviors we will study:
- ACK clocking
- Adaptive timeout (mean and variance)
- Slow-start
- Fast Retransmission
- Fast Recovery
• Together, they implement AIMD
[Figure: timeline of TCP development, 1970 to 2010]
• Pre-history
- Origins of “TCP” (Cerf & Kahn, ’74)
- 3-way handshake (Tomlinson, ’75)
- TCP and IP (RFC 791/793, ’81)
- TCP/IP “flag day” (BSD Unix 4.2, ’83)
• Congestion control
- Congestion collapse observed, ’86
- TCP Tahoe (Jacobson, ’88)
- TCP Reno (Jacobson, ’90)
• Classic congestion control
- Delay based: TCP Vegas (Brakmo, ’93)
- Router support: ECN (Floyd, ’94)
- TCP New Reno (Hoe, ’95)
- TCP with SACK (Floyd, ’96)
• Diversification
- FAST TCP (Low et al., ’04)
- TCP BIC (Linux, ’04)
- TCP CUBIC (Linux, ’06)
- Compound TCP (Windows, ’07)
- Background TCP: LEDBAT (IETF ’08)
TCP ACK Clocking
• The self-clocking behavior of sliding windows, and how it is used by TCP
• Recall
- We want TCP to follow an AIMD control law for a good allocation
- Sender uses a congestion window or cwnd to set its rate (≈cwnd/RTT)
- Sender uses packet loss as the network congestion signal
- Need TCP to work across a very large range of rates and RTTs
• TCP Startup Problem
- We want to quickly near the right rate, cwnd_IDEAL, but it varies greatly
▪ Fixed sliding window doesn’t adapt and is rough on the network (loss!)
▪ AI with small bursts adapts cwnd gently to the network, but might take a long time to become efficient
Slow-Start Solution
• Start by doubling cwnd every RTT
- Exponential growth (1, 2, 4, 8, 16, …)
- Start slow, quickly reach large values
• Eventually packet loss will occur when the …
[Figure: window (cwnd) over time for three strategies: Fixed, Slow-start, and AI]
TCP Tahoe (Implementation)
• Initial slow-start (doubling) phase
- Start with cwnd = 1 (or small value)
- cwnd += 1 packet per ACK
• Later Additive Increase phase
- cwnd += 1/cwnd packets per ACK
▪ Roughly adds 1 packet per RTT
• Switching threshold
- Switch to AI when cwnd > ssthresh
- Set ssthresh = cwnd/2 after loss
- Begin with slow-start after timeout
• Timeout misfortunes
- Why do a slow-start after timeout?
▪ Instead of MD cwnd (for AIMD)
- Timeouts are sufficiently long that the ACK clock will have run down
▪ Slow-start ramps up the ACK clock
- We need to detect loss before a timeout to get to full AIMD
▪ Done in TCP Reno
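The per-ACK rules above translate almost directly into code. This is a simplified sketch, not a faithful TCP implementation: cwnd is counted in packets, every packet in a window is assumed to be ACKed within one RTT, and the function names are assumptions.

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """Grow cwnd for one ACK: slow-start below the threshold, AI above it."""
    if cwnd > ssthresh:
        return cwnd + 1.0 / cwnd   # Additive Increase: roughly +1 packet per RTT
    return cwnd + 1.0              # slow-start: +1 per ACK doubles cwnd per RTT

def on_timeout(cwnd: float):
    """After a timeout, Tahoe restarts slow-start; returns (new cwnd, new ssthresh)."""
    return 1.0, cwnd / 2.0

def one_rtt(cwnd: float, ssthresh: float) -> float:
    """Apply one RTT's worth of ACKs, one per packet in the current window."""
    for _ in range(int(cwnd)):
        cwnd = on_ack(cwnd, ssthresh)
    return cwnd
```

Starting at cwnd = 1 with a high ssthresh, four RTTs take the window through 2, 4, 8, 16. Above ssthresh, a full window of ACKs adds just under one packet in total.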
TCP Fast Retransmit / Fast Recovery
• How TCP implements AIMD, part 2
- “Fast retransmit” and “fast recovery” are the MD portion of AIMD
[Figure: AIMD sawtooth]
• Recall
- We want TCP to follow an AIMD control law for a good allocation
- Sender uses a congestion window or cwnd to set its rate (≈cwnd/RTT)
- Sender uses slow-start to ramp up the ACK clock, followed by Additive Increase
- But after a timeout, sender slow-starts again with cwnd = 1 (as it has no ACK clock)
• Inferring loss from ACKs
- TCP uses a cumulative ACK
▪ Carries highest in-order seq. number
▪ Normally a steady advance
- Duplicate ACKs give us hints about what data hasn’t arrived
▪ Tell us some new data did arrive, but it was not next segment, thus the next segment may be lost
Fast Retransmit
• Treat three duplicate ACKs as a loss
- Retransmit next expected segment
- Some repetition allows for reordering, but still detects loss quickly
[Figure: ACK sequence 1 2 3 4 5 5 5 5 5 5]
[Figure: time-sequence diagram between sender and receiver. The receiver sends Ack 10, Ack 11, Ack 12, Ack 13. Data 14 was lost earlier, but the receiver got 15 to 20, so it keeps sending Ack 13. On the third duplicate ACK, the sender sends Data 14. The retransmission fills in the hole at 14, and the ACK jumps to Ack 20 after the loss is repaired]
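The sender-side rule can be sketched as a counter over the incoming ACK stream. The numbering follows the figure (Ack 13 acknowledges segment 13, so the segment to resend is 14), which is an assumption about the notation; `fast_retransmit` is a made-up name.

```python
def fast_retransmit(acks, threshold=3):
    """Return the segments a sender would retransmit for a stream of ACKs."""
    retransmits = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:          # third duplicate ACK
                retransmits.append(ack + 1)      # resend the next expected segment
        else:
            last_ack = ack                       # a new ACK resets the count
            duplicates = 0
    return retransmits
```

One or two duplicates are tolerated as possible reordering; only the third triggers the retransmission, and further duplicates of the same ACK do not trigger it again.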
TCP Reno
• TCP Reno combines slow-start, fast retransmit and fast recovery
- Multiplicative Decrease is ½
[Figure: cwnd over time showing the TCP sawtooth; with the ACK clock running, each loss causes an MD of ½ and no slow-start]
Congestion Avoidance vs. Control
• Classic TCP drives the network into congestion and then recovers
- Needs to see loss to slow down
• Would be better to use the network but avoid congestion altogether!
- Reduces loss and delay
• But how can we do this? – Feedback Signals
- Delay and router signals can let us avoid congestion
Explicit Congestion Notification (ECN)
• Router detects the onset of congestion via its queue
- When congested, it marks affected packets (IP header)
• Marked packets arrive at receiver
- TCP receiver informs TCP sender of the congestion
• Advantages:
- Routers deliver clear signal to hosts
- Congestion is detected early, no loss
- No extra packets need to be sent
• Disadvantages:
- Routers and hosts must be upgraded
A brief dive into the Linux TCP/IP stack
[Figure: the Send and Receive paths through the Linux network stack, with the Inter-Frame Gap marked]
More readings:
• Understanding TCP/IP Network Stack & Writing Network Apps
• The performance analysis of linux networking – Packet receiving
Optional material
Modern TCP variants
• CUBIC
- Cubic Increase
▪ Window expanded based on elapsed time from last congestion
▪ More aggressive than Reno, optimized for high bandwidth-delay product (BDP)
- Multiplicative Decrease
▪ Loss-based congestion detection
▪ Can be misguided by random loss and may cause bufferbloat
- Default TCP for Linux, arguably largest worldwide deployment
[Figure: CUBIC window curve, plateauing to the last congestion window, then increasing to probe for the optimal rate]
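The "plateau, then probe" shape comes from a cubic function of the time since the last congestion event. The sketch below uses the window function and default constants from the CUBIC specification (RFC 8312: C = 0.4, multiplicative decrease factor β = 0.7); none of these values appear in the text above, so treat them as assumptions, and the function names are illustrative.

```python
def cubic_k(w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """Time (seconds) the curve needs to climb back to w_max after a loss."""
    return (w_max * (1 - beta) / c) ** (1 / 3)

def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """Congestion window t seconds after the last congestion event."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max
```

At t = 0 the window equals β × w_max (the multiplicative decrease), it flattens out near w_max around t = K, and beyond K it grows ever faster to probe for more bandwidth. Growth depends on elapsed time rather than on the number of ACKs, which is what helps on paths with a high bandwidth-delay product.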
Modern TCP variants
• BBR – Bottleneck Bandwidth and Round-trip propagation time
- Limit inflight bytes by RTprop ✕ BtlBw (cwnd)
- Send at rate BtlBw ✕ PacingGain
▪ RTprop: minimum RTT in last 10 seconds
▪ BtlBw: maximum throughput in last 10 RTprop
▪ PacingGain (i.e., 1, 1, 1, 1, 1, 1, 1.25, 0.75 periodically) is used to probe higher bandwidth
▪ Set cwnd to 4 packets if RTprop has not been updated for 10 seconds, and use the minimum RTT measured then as the new RTprop
- Avoid bufferbloat and ignore random loss
- Developed by Google and available since Linux kernel 4.9
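The two BBR control outputs follow directly from the two estimates. In this sketch the caller is assumed to pass only the samples that fall inside the measurement windows described above; the function name `bbr_targets` and the units (seconds, bytes per second) are assumptions.

```python
def bbr_targets(rtt_samples_s, delivery_rates_Bps, pacing_gain=1.0):
    """Return (inflight cap in bytes, pacing rate in bytes/s) from recent samples."""
    rtprop = min(rtt_samples_s)          # RTprop: minimum RTT in the window
    btlbw = max(delivery_rates_Bps)      # BtlBw: maximum throughput in the window
    inflight_cap = rtprop * btlbw        # bandwidth-delay product limits inflight bytes
    pacing_rate = btlbw * pacing_gain    # gain > 1 probes for more bandwidth
    return inflight_cap, pacing_rate
```

Using the minimum RTT and maximum throughput means queueing delay and occasional random loss do not drag the estimates down, which is how BBR avoids filling buffers.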
Modern TCP variants
Problems of TCP
• TCP does not understand its payload at all
- It provides nothing but a reliable byte stream
• What if we want security?
- Another layer (e.g. TLS) over TCP
- So two handshakes needed, for TCP and TLS respectively
• What if we have multiple requests?
- … over one TCP connection, head-of-line (HOL) problems
- … over multiple TCP connections, each connection needs its own slow start, and the extra connections may burden the server
• TCP is implemented in the kernel
- Difficult to evolve
- Not under the control of applications
Optional material
Quick UDP Internet Connection (QUIC)
• Based on UDP
- But provides reliability and connections
• Built inside applications
- But conceptually at transport layer
- Easy to evolve
• Born with security
- No additional security layer needed
- Unified handshake, faster connection establishment
• Multiple streams
- Independent, no HOL problems
• Connection IDs
- Keep the connection alive when switching between Wi-Fi and LTE networks
• Proposed by Google in 2012, finally standardized as RFC 9000 in May, 2021
Optional material
A quick look at QUIC protocol
• Multiple frames in one packet, multiple packets in one UDP datagram
• Some packet types:
- Initial packet: Used to initiate client/server handshake
- 0-RTT packet: Used to carry data before client/server handshake
- Handshake packet: Used to perform client/server handshake
- 1-RTT packet: Used to carry data during or after client/server handshake
- Retry packet: Used to validate client address at the beginning of client/server handshake
• Example frame types:
- CRYPTO frame: Used to transmit cryptographic handshake messages
- HANDSHAKE_DONE frame: Used to signal confirmation of the handshake to the client
- STREAM frame: Used to create a stream and/or carry stream data
- ACK frame: Used to inform senders of packets they have received and processed
Optional material
A quick look at QUIC protocol
• Example connection establishment
Client                                                  Server

Initial[0]: CRYPTO[CH]
0-RTT[0]: STREAM[0, "..."] ->

Initial[1]: ACK[0]
Handshake[0]: CRYPTO[FIN], ACK[0]
1-RTT[1]: STREAM[0, "..."] ACK[0] ->

                                          Handshake[1]: ACK[0]
         <- 1-RTT[1]: HANDSHAKE_DONE, STREAM[3, "..."], ACK[1]
Optional material