NPTEL Week8 Programmable Networks
Sameer G. Kulkarni
Assistant Professor,
Department of Computer Science and Engineering,
Indian Institute of Technology, Gandhinagar
ADVANCED COMPUTER NETWORKS, OUR LEARNING JOURNEY
[Figure: timeline from 2010 (-10 years) through 2020 to 2030 (+10 years). Milestones: “Cleanslate”, “4D”, “Ethane”, “OpenFlow”; SDN, NFV, disaggregation; “RMT”, “PISA”, “P4”, “INT”; programmable forwarding, telemetry.]
Part 1
Network owners take control of their software
Part 2
Network owners take control of packet processing too
[Figure: fixed-function switch. A Switch OS and driver sit on top of a fixed parser and a fixed header processing pipeline (L2, IPv4, IPv6, and ACL tables with fixed header actions). A new feature requested by the network owner goes to the network equipment vendor's engineering division, through the software team and the ASIC team, and takes years.]
Fixed-function switch
Programmable Networks Advanced Computer Networks 8
NETWORK SYSTEMS STARTING TO BE BUILT “TOP-DOWN”
“This is precisely how you must process packets”
[Figure: Switch OS and driver on top of a programmable switch.]
DOMAIN-SPECIFIC PROCESSORS
Domain:     Computers | Graphics | Signal Processing | Machine Learning | Networking
Language:   Java      | OpenCL   | Matlab            | TensorFlow       | ? (now P4)
Toolchain:  Compiler  | Compiler | Compiler          | Compiler         | Compiler
[Figure: a programmable parser feeding a programmable match-action pipeline; each match+action stage combines memory and ALUs.]
P4 [CCR ‘14]
SDN, PART 2: PROGRAMMABLE FORWARDING
How it gets used
1. Reducing complexity
2. Adding new features to the network
3. Telemetry
P4.org
• Now part of ONF
• Lots of activities and workshops: get involved!
• P4-16 stable. Device independent: Switches, NICs, FPGAs, vSwitches
• P4Runtime part of Stratum, launched this week
A cast of many, led by:
Nate Foster (Cornell), Amin Vahdat (Google), Jennifer Rexford (Princeton), Chang Kim (Barefoot)
Phase 1
2010 Network owners take control of their software 2020 2030
Phase 2
Network owners take control of packet processing too
Ref: Scott Shenker et al., The Future of Networking, and the Past of Protocols, Open Network Summit, 2011
[Figure: fixed-function pipeline. A fixed parser feeds a fixed header processing pipeline of L2, IPv4, IPv6, and ACL tables, each with its own fixed actions (L2 Hdr Actions, v4 Hdr Actions, v6 Hdr Actions, ACL Actions).]
Changing interface:
→ C/Python APIs (auto-generated)
→ P4RunTime (led by Google)
[Figure: programmable pipeline. A programmable parser, a programmable match-action pipeline (for example L3, ACL, and MPLS tables), and a programmable deparser.]
NEED FOR FLEXIBLE PACKET FORWARDING
• Modify the set of packet fields
• Network virtualization: new tunneling formats
• Support for new flags
• Disabling existing fields
• Pipelined structure
• Multiple, identical pipeline stages
• Clocked at a high rate (1 GHz)
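A back-of-the-envelope check of why a clock rate around 1 GHz matters: if the pipeline accepts one packet header per clock cycle, it can keep up with roughly a billion packets per second. The port count, port speed, and frame sizes below are assumptions chosen for illustration; only the 1 GHz clock comes from the slide.

```python
# Assumptions for illustration (not stated on the slide):
# 64 ports at 10 Gb/s each, all carrying minimum-size Ethernet frames.
CLOCK_HZ = 1_000_000_000          # 1 GHz pipeline clock, from the slide
PORTS = 64
PORT_RATE_BPS = 10_000_000_000
MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20          # 8-byte preamble + 12-byte inter-frame gap


def worst_case_pps(ports: int, rate_bps: int, frame: int, overhead: int) -> float:
    """Packets per second when every port is saturated with minimum-size frames."""
    bits_on_wire = (frame + overhead) * 8
    return ports * rate_bps / bits_on_wire


pps = worst_case_pps(PORTS, PORT_RATE_BPS, MIN_FRAME_BYTES, WIRE_OVERHEAD_BYTES)
keeps_up = pps <= CLOCK_HZ        # one packet header accepted per clock cycle
print(f"{pps / 1e6:.1f} Mpps, pipeline keeps up: {keeps_up}")
```

Under these assumptions the worst-case packet rate stays just below one packet per cycle, which is why identical stages running at this clock can process every packet without stalling.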
Parser Program

parser parse_ethernet {
    extract(ethernet);
    return switch(ethernet.ethertype) {
        0x8100 : parse_vlan_tag;
        0x0800 : parse_ipv4;
        0x8847 : parse_mpls;
        default: ingress;
    }
}

Header and Data Declarations

header_type ethernet_t { … }
header_type l2_metadata_t { … }

header ethernet_t ethernet;
header vlan_tag_t vlan_tag[2];
metadata l2_metadata_t l2_meta;

Tables and Control Flow

table port_table { … }

control ingress {
    apply(port_table);
    if (l2_meta.vlan_tags == 0) {
        process_assign_vlan();
    }
}
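The parser program above is essentially a state machine: extract the Ethernet header, then choose the next state from the EtherType. A minimal Python sketch of that one state follows. The function and dictionary names are hypothetical; the EtherType values and state names are taken from the slide.

```python
import struct

# EtherType -> next parser state, as listed in the slide's parser program.
NEXT_STATE = {
    0x8100: "parse_vlan_tag",
    0x0800: "parse_ipv4",
    0x8847: "parse_mpls",
}


def parse_ethernet(frame: bytes):
    """Extract the 14-byte Ethernet header and select the next parser state."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    header = {"dst_addr": dst, "src_addr": src, "ethertype": ethertype}
    # Unknown EtherTypes fall through to the ingress control flow.
    return header, NEXT_STATE.get(ethertype, "ingress")


# Usage: an IPv4 frame with zeroed MAC addresses (made-up test data).
frame = bytes(6) + bytes(6) + b"\x08\x00" + b"payload"
header, state = parse_ethernet(frame)
```

A full parser would repeat this pattern for each state until it reaches `ingress`; a compiler for a programmable parser turns the same graph into hardware state-machine entries.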
BETTER ABSTRACTIONS
• Developers can write networking programs against a reasonably general “mental
model” of what the switch looks like,
• … without attending to target specifics such as which header fields a
particular table can match on,
• … much like the sequential model of computing we’re all familiar with (von
Neumann architecture).
• Flexible packet formats allow you to put useful information directly on the
original packet
• In-band Network Telemetry (INT)
How do we know if a programmable switch chip has the same power, performance
and cost as a fixed function switch chip?
• No “loop” constructs
[Figure: conventional switch pipeline from In to Out through a parser, three stages, queues, and a deparser. L2 stage: L2 table. L3 stage: L3 table, action: set L2D, dec TTL. ACL stage: ACL table, action: permit/deny.]
▪ An ideal RMT should allow a set of pipeline stages, each with a match
table of arbitrary depth and width.
• The parser specifies the header fields of interest and generates a 4 Kb packet
header vector, placing each field at a specific location in the vector.
❑ change header field, add new header
• 32 Match/action stages
• 224 action processors per stage
Area

Section                        Area (% of chip)    Extra cost
IO, buffer, queue, CPU, etc.   37.0%               0.0%
Match memory & logic           54.3%               8.0%
VLIW action engine             7.4%                5.5%
Parser + deparser              1.3%                0.7%
Total extra area cost                              14.2%

Power

Section                        Power (% of chip)   Extra cost
I/O                            26.0%               0.0%
Memory leakage                 43.7%               4.0%
Logic leakage                  7.3%                2.5%
RAM active                     2.7%                0.4%
TCAM active                    3.5%                0.0%
Logic active                   16.8%               5.5%
Total extra power cost                             12.4%
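As a quick sanity check, the per-section extra costs in the two tables can be summed to confirm the stated totals. The dictionary names are hypothetical; the numbers are copied from the tables.

```python
# Extra cost per section (percent), copied from the area and power tables.
area_extra = {
    "IO, buffer, queue, CPU, etc": 0.0,
    "Match memory & logic": 8.0,
    "VLIW action engine": 5.5,
    "Parser + deparser": 0.7,
}
power_extra = {
    "I/O": 0.0,
    "Memory leakage": 4.0,
    "Logic leakage": 2.5,
    "RAM active": 0.4,
    "TCAM active": 0.0,
    "Logic active": 5.5,
}

# Round to one decimal place to avoid floating-point noise in the sums.
total_area_extra = round(sum(area_extra.values()), 1)
total_power_extra = round(sum(power_extra.values()), 1)
print(total_area_extra, total_power_extra)
```

Most of the extra area comes from match memory and logic, while the parser and deparser contribute less than one percent.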
CONCLUSION
• How do we design a flexible chip?
• The RMT switch model
• Bring processing close to the memories:
• pipeline of many stages
• Bring the processing to the wires:
• 224 action CPUs per stage
• Many of the details of how this is designed in 28nm CMOS are in the paper
• Two steps
• Identify sequence of headers
• Content-addressable memory allows us to find matches in one clock cycle, rather than fixed-function header processor
- Match and Action units supplied with the Packet Header Vector
- Each pipeline stage accesses its own local memory
- VLIW: modify each component of PHV if needed
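The three points above can be illustrated with a small software model: the packet header vector (PHV) is a dictionary, each stage owns its own match table, and an action may rewrite several PHV fields at once. This is only a sketch under those simplifications; all names, tables, and addresses are hypothetical, and real hardware matches in TCAM/SRAM rather than Python dictionaries.

```python
def make_stage(match_field, table, default_action=None):
    """Build one match+action stage with its own local exact-match table."""
    def stage(phv):
        action = table.get(phv.get(match_field), default_action)
        return action(phv) if action else phv
    return stage


def set_fields(**updates):
    """VLIW-style action: update several PHV fields in a single step."""
    def action(phv):
        return {**phv, **updates}
    return action


def run_pipeline(stages, phv):
    """Pass the PHV through every stage in order, as the hardware pipeline does."""
    for stage in stages:
        phv = stage(phv)
    return phv


# Hypothetical L2, L3, and ACL stages.
l2 = make_stage("dst_mac", {"aa:bb": set_fields(out_port=3)})
l3 = make_stage(
    "dst_ip",
    {"10.0.0.1": lambda phv: {**phv, "next_hop": "10.0.0.254", "ttl": phv["ttl"] - 1}},
)
acl = make_stage(
    "src_ip",
    {"192.168.0.9": set_fields(drop=True)},
    default_action=set_fields(drop=False),
)

phv_in = {"dst_mac": "aa:bb", "dst_ip": "10.0.0.1", "src_ip": "192.168.0.7", "ttl": 64}
phv_out = run_pipeline([l2, l3, acl], phv_in)
```

Reprogramming the switch in this model means changing which field a stage matches on and which actions its table holds, not changing the pipeline structure itself.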
PHV
• Reconfigurability works
• Flexible logic is cheap (1 to 2% of chip area)
• Memory is more expensive, but overheads are worth the cost
WE CAN DO THIS!
• Target independence
• Program without knowledge of switch details
• Rely on compiler to configure the target switch
• Reconfigurability
• Change parsing and processing in the field
Installing and
querying rules
Target Switch
“OPENFLOW 2.0”
Target Switch
P4 LANGUAGE
ABSTRACT FORWARDING MODEL (TARGET ABSTRACTION)
❑ Parsers: A parser definition specifies how to identify headers and valid header sequences
within packets.
❑ Tables: Match+action tables are the mechanism for performing packet processing. The P4
program defines the fields on which a table may match and the actions it may execute.
❑ Control programs: The control program determines the order in which match+action tables
are applied to a packet, describing the flow of control between them.
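[Figure: mTag example topology with two ToR switches; the tag fields up1 and up2 describe the hops going up from one ToR, and down1 and down2 the hops going down to the other.]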
header ethernet {
    fields {
        dst_addr : 48;
        src_addr : 48;
        ethertype : 16;
    }
}

header vlan {
    fields {
        pcp : 3;
        cfi : 1;
        vid : 12;
        ethertype : 16;
    }
}

header mTag {
    fields {
        up1 : 8;
        up2 : 8;
        down1 : 8;
        down2 : 8;
        ethertype : 16;
    }
}
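action add_mTag(up1, up2, down1, down2, outport) {
    add_header(mTag);
    // Copy the VLAN ethertype into the mTag
    copy_field(mTag.ethertype, vlan.ethertype);
    // Set the VLAN ethertype to signal that an mTag follows
    set_field(vlan.ethertype, 0xaaaa);
    set_field(mTag.up1, up1);
    set_field(mTag.up2, up2);
    set_field(mTag.down1, down1);
    set_field(mTag.down2, down2);
    // Record the output port in metadata
    set_field(metadata.outport, outport);
}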
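To make the effect of these primitive actions concrete, the following Python sketch applies the same field copies and assignments to a packet modeled as nested dictionaries. The function name and the packet representation are hypothetical; only the field names and the 0xaaaa constant come from the slide.

```python
import copy


def apply_mtag_action(pkt, up1, up2, down1, down2, outport):
    """Mirror the slide's copy_field/set_field calls on a dictionary packet."""
    out = copy.deepcopy(pkt)  # leave the caller's packet untouched
    out["mTag"] = {
        # copy_field(mTag.ethertype, vlan.ethertype)
        "ethertype": out["vlan"]["ethertype"],
        "up1": up1,
        "up2": up2,
        "down1": down1,
        "down2": down2,
    }
    # set_field(vlan.ethertype, 0xaaaa)
    out["vlan"]["ethertype"] = 0xAAAA
    # set_field(metadata.outport, outport)
    out.setdefault("metadata", {})["outport"] = outport
    return out


# Usage with made-up values: a VLAN-tagged IPv4 packet.
pkt = {"vlan": {"pcp": 0, "cfi": 0, "vid": 10, "ethertype": 0x0800}}
tagged = apply_mtag_action(pkt, up1=1, up2=2, down1=3, down2=4, outport=7)
```

The original ethertype moves into the new header so that a later switch can remove the tag and restore the packet.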
CONTROL FLOW
• Flow of control from one table to the next
• Collection of functions, conditionals, and tables
• Simple imperative representation
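control main() {
    // Always apply source_check first
    table(source_check);
    // Skip the remaining tables if metadata.ingress_error is set
    if (!defined(metadata.ingress_error)) {
        table(local_switching);
        // Apply mTag_table only when no output port has been set
        if (!defined(metadata.outport)) {
            table(mTag_table);
        }
        // egress_check runs for every packet without an ingress error
        table(egress_check);
    }
}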
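The same control flow can be traced in Python to see which tables a packet visits. This is a sketch: tables are modeled as hypothetical callables that may set keys in a metadata dictionary, which stands in for `defined(...)` in the control program.

```python
def control_main(metadata, tables):
    """Apply tables in the order given by the control program; return the trace."""
    applied = []

    def table(name):
        applied.append(name)
        tables[name](metadata)

    table("source_check")
    if "ingress_error" not in metadata:
        table("local_switching")
        if "outport" not in metadata:
            table("mTag_table")
        table("egress_check")
    return applied


def noop(metadata):
    """Placeholder table that matches nothing."""


# Hypothetical table behaviours for three scenarios.
local_hit = {
    "source_check": noop,
    "local_switching": lambda m: m.update(outport=5),
    "mTag_table": lambda m: m.update(outport=9),
    "egress_check": noop,
}
local_miss = {**local_hit, "local_switching": noop}
bad_source = {**local_hit, "source_check": lambda m: m.update(ingress_error=True)}

trace_hit = control_main({}, local_hit)
trace_miss = control_main({}, local_miss)
trace_bad = control_main({}, bad_source)
```

Only the miss case reaches `mTag_table`, and a packet that fails `source_check` visits no further tables.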
P4 COMPILATION
P4 COMPILER
• Parser
• Programmable parser: translate to state machine
• Fixed parser: verify the description is consistent
• Control program
• Target-independent: table graph of dependencies
• Target-dependent: mapping to switch resources
• Rule translation
• Verify that rules agree with the (logical) table types
• Translate the rules to the physical tables
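One way to picture the step from the target-independent table graph to a target-dependent mapping is to place each table in the earliest pipeline stage that comes after all the tables it depends on. The sketch below does only that; the dependency sets (including the `acl` table) are made up for illustration, and a real compiler must also respect per-stage memory and other resource limits.

```python
def assign_stages(deps):
    """Map each table to the earliest stage after all of its dependencies.

    deps: dict mapping a table name to the set of tables it must follow.
    """
    stage = {}

    def place(table, visiting=()):
        if table in stage:
            return stage[table]
        if table in visiting:
            raise ValueError(f"cyclic dependency involving {table}")
        earlier = [place(d, visiting + (table,)) for d in deps.get(table, ())]
        stage[table] = 1 + max(earlier, default=-1)
        return stage[table]

    for table in deps:
        place(table)
    return stage


# Hypothetical dependency graph; "acl" is an invented independent table.
deps = {
    "source_check": set(),
    "local_switching": {"source_check"},
    "mTag_table": {"local_switching"},
    "egress_check": {"local_switching", "mTag_table"},
    "acl": {"source_check"},
}
stages = assign_stages(deps)
```

Tables with no dependency between them, such as `local_switching` and `acl` here, can share a stage and run in parallel.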
TELEMETRY
2 “Which rules did my packet follow?”
“In Switch 1, I followed rules 75 and 250. In Switch 9, I followed rules 3 and 80.”
[Figure: a switch rule table with columns # and Rule; rule 75 matches 192.168.0/24.]
3 “How long did my packet queue at each switch?” “Delay: 100ns, 200ns, 19740ns”
[Figure: queue occupancy over time at a switch; the spike is labeled “Aggressor flow!”.]
TODAY, BASIC INFORMATION IS HARD TO FIND
1 “Which path did my packet take?”
2 “Which rules did my packet follow?”
3 “How long did it queue at each switch?”
4 “Who did it share the queues with?”
With P4 + INT we can answer all four questions for the first
time. At full line rate. Without generating additional packets.
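The idea of answering these questions without extra packets can be sketched as follows: each switch appends a small record to the packet it is already forwarding, and the last hop strips the records and reports them. The class and function names are hypothetical, the middle switch (id 4, rule 12) is invented to carry the 200 ns delay, and the record layout is not the actual INT header format.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    int_stack: list = field(default_factory=list)  # per-hop telemetry records


def switch_hop(pkt, switch_id, rule_ids, queue_ns):
    """Forward the packet and append this hop's telemetry in-band."""
    pkt.int_stack.append(
        {"switch": switch_id, "rules": list(rule_ids), "queue_ns": queue_ns}
    )
    return pkt


def int_sink(pkt):
    """Strip telemetry at the last hop and hand the original packet on."""
    report, pkt.int_stack = pkt.int_stack, []
    return pkt, report


pkt = Packet(b"original packet")
switch_hop(pkt, 1, [75, 250], 100)
switch_hop(pkt, 4, [12], 200)          # hypothetical intermediate switch
switch_hop(pkt, 9, [3, 80], 19740)
pkt, report = int_sink(pkt)

path = [hop["switch"] for hop in report]              # question 1
rules = {hop["switch"]: hop["rules"] for hop in report}  # question 2
worst = max(report, key=lambda hop: hop["queue_ns"])  # question 3
```

Answering question 4 would additionally require each switch to record which other flows occupied the queue, which this sketch leaves out.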
INT: IN-BAND NETWORK TELEMETRY
Original Packet
Log, Analyze
Replay and Visualize
Software Defined Network (SDN)
[Figure: SDN as a closed control loop. Control programs run over an abstract network view (network virtualization) and a global network view (network OS) above the packet forwarding elements. Each layer is observed, measured, and validated: control code is generated and verified; state is partitioned, generated, verified, and downloaded; forwarding is observed through INT carried in packets.]
NETWORK INTERFACE CARD (NIC)
NIC: Hardware component that connects a computing
device to a network. Usually on board or connected as a
discrete card on the motherboard. Supports wired and WiFi
networks. Operates at Layer 2 of the OSI model, using
MAC addresses. [10/100/1000 Mbps]
SoC
Images taken from Internet
TECHNOLOGICAL CONSTRAINTS: MOORE’S LAW
• Processors aren’t clocked faster any more (end of Dennard scaling)
• Soon, can no longer pack more transistors in the same area (feature size limits)
• Implication (2): Need to re-design applications or the hardware from the ground
up
• Other accelerators:
• GPUs
• TPUs
• Matrix computation accelerators in the research realm
• Problems:
[Figure: SmartNIC architecture: TX/RX ports, a traffic manager, and NIC cores on the card, attached to the host cores.]
○ System-On-Chip
○ Functional Offload Coprocessors (FOCP)
○ Runs its own OS that is separate from the host
○ DPU
BLOCK DIAGRAM OF NIC | SMART NIC | DPU
NIC CPU
SmartNIC Traditional Storage and Networking | High throughput
and Low Latency
vSwitch, Security
● DPU is a system on a chip (SoC) that allows for high performance network interfaces
able to process data at much faster rates.
[Figure: application runtime (0 to 100) with a NIC versus a DPU. With a NIC, communication is part of the application runtime; with a DPU, the host only schedules and completes communication, which is offloaded.]
Verifiable Closed-Loop Control
[Figure: control apps (for example Trellis) run on the ONOS control plane and reach the data plane through the P4Runtime contract; Stratum OS runs on the P4 switches and P4-OVS on the P4 NICs. Control generation and verification use early research tools (p4v).]
Phase 1: Network owners take control of their software
Phase 2: Network owners take control of packet processing too
Phase 3: Networks managed by verifiable closed loop control
Microbursts (µbursts)
Detecting & characterizing µbursts is hard
• Commercial Solutions
• Can detect the occurrence of microbursts
• Provide no information about the cause
Solution:
• Key Insight: µbursts are localized to a switch’s egress port queue
• Key Idea: We can detect the microburst directly on the switch where it happens
[Figure: egress port queues inside the switch’s queuing engine.]
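A heavily simplified illustration of detecting a microburst where it happens: watch the queuing delay at one egress port, flag packets that exceed a threshold, and record which flows they belong to. The threshold, sample data, and function name are hypothetical, and this sketch does not reproduce the actual on-switch mechanisms.

```python
from collections import Counter


def detect_microburst(samples, threshold_ns):
    """Flag packets whose egress queuing delay exceeds the threshold.

    samples: iterable of (flow_5tuple, queue_delay_ns) observed at one egress port.
    Returns the flagged samples and a per-flow count of flagged packets.
    """
    marked = [(flow, delay) for flow, delay in samples if delay > threshold_ns]
    contributors = Counter(flow for flow, _ in marked)
    return marked, contributors


# Made-up 5-tuples: (src IP, dst IP, src port, dst port, protocol).
flow_a = ("10.0.0.1", "10.0.1.1", 40000, 80, "tcp")
flow_b = ("10.0.0.2", "10.0.1.1", 40001, 80, "tcp")
samples = [
    (flow_a, 120),
    (flow_b, 15000),
    (flow_b, 19740),
    (flow_a, 18000),
    (flow_a, 200),
]
marked, contributors = detect_microburst(samples, threshold_ns=10_000)
```

Because the flagged records carry the packet 5-tuple, the output identifies the contributing flows and not just the fact that a burst occurred.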
BurstRadar Overview
[Figure sequence: egress pipeline from the egress port queues through the egress stage and deparser to the egress ports, with a mirror port queue leading to a mirror port.]
• Queuing telemetry (metadata) and a markbit (metadata) accompany the packet through the egress pipeline
• Courier packet: carried through the mirror port queue to the mirror port
• Telemetry Info:
- Pkt 5-tuple
- Queuing telemetry data
Evaluation Setup
• Hardware Testbed
[Figure: packets processed versus latency increase tolerance threshold, comparing INT, BurstRadar, and Oracle.]
Precise Time-synchronization using
Programmable Switching ASICs
ACM SOSR 2019 (Best Paper)
Protocol   Typical accuracy
NTP        milliseconds
PTP        10s of ns to µs
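For context on where these accuracy figures come from, the classic two-way timestamp exchange used by NTP-style protocols estimates the clock offset from four timestamps and assumes the path delay is symmetric. The sketch below shows that standard calculation with made-up nanosecond values; it is not a description of how DPTP itself computes time.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Two-way exchange: estimate clock offset and round-trip delay.

    t1: request sent (client clock)    t2: request received (server clock)
    t3: reply sent (server clock)      t4: reply received (client clock)
    Assumes the forward and reverse path delays are equal.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Made-up example in nanoseconds: server clock runs 590 ns ahead of the
# client, each direction takes 10 ns, and the server holds the request 10 ns.
offset, delay = offset_and_delay(t1=1000, t2=1600, t3=1610, t4=1030)
```

Any asymmetry or variable queuing between the two directions shows up directly as offset error, which is why taking timestamps in the data plane helps precision.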
Line-rate traffic along the direction of the response packet.
Conclusion
• Two applications that exploit data plane programmability to
demonstrate the potential of modern programmable ASICs
• BurstRadar: characterize microbursts at multi-gigabit line rates
in high-speed datacenter networks.
• DPTP: precise time synchronization protocol running in the
network data-plane.
VIRTUAL MACHINE DEVICE QUEUES
• VMDq is a clean interface for kernel bypass.
• Adds classification and queueing per VM within the NIC (done through Rx and Tx
queues).
VM NETWORKING TECHNIQUES
SERVER BASED NETWORKING
THE NEED FOR DISAGGREGATION
+ SONATA [Sigcomm ‘18], Sketches [Sigcomm ‘12] …
Viewing Microbursts (to the nanosecond)
Three pieces
1. The ability to observe packets, network state and code, in real-time.
2. The ability to generate new control and forwarding behaviors, on the fly, to correct errors.
3. The ability to verify newly generated code and deploy it quickly.
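Header Space Analysis
[Figure: four boxes numbered 1 to 4, each modeled by a transfer function T1(h, p), T2(h, p), T3(h, p), T4(h, p) over header h and port p. A header set X entering at port Pin becomes T1(X, Pin) after box 1 and T4(T1(X, Pin)) after box 4.]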
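The composition T4(T1(X, Pin)) can be sketched in Python by treating each transfer function as a mapping from one (header, port) pair to a set of (header, port) pairs and chaining them. The two box behaviours below are invented for illustration, headers are plain strings rather than bit vectors, and the link between box 1 and box 4 is folded into the port names instead of a separate topology function.

```python
def compose(*transfers):
    """Chain transfer functions: feed every output of one box into the next."""
    def composed(header, port):
        frontier = {(header, port)}
        for transfer in transfers:
            frontier = {out for h, p in frontier for out in transfer(h, p)}
        return frontier
    return composed


def t1(header, port):
    """Hypothetical box 1: forward 10.x.x.x destinations toward box 4, drop the rest."""
    return {(header, "to_box4")} if header.startswith("10.") else set()


def t4(header, port):
    """Hypothetical box 4: deliver anything arriving on the link from box 1."""
    return {(header, "out")} if port == "to_box4" else set()


reach = compose(t1, t4)            # models T4(T1(h, p))
delivered = reach("10.1.2.3", "Pin")
dropped = reach("192.168.0.1", "Pin")
```

An empty result means no packet with that header can get through this path, which is the kind of reachability question the analysis is meant to answer.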
RETHINKING RELAYERING WITH NFV
Compute Domain
GBP GBP
EPG1 SFF EPG2
OVS OVS
Legend
VxLAN tunnel SF/SFF Top Of Rack Switch
OpenFlow 1.3/OVSDB
GBP creates VxLAN tunnel
Original packets, no encap
GBP EPG: Group Based Policy, End Point Group
Used as Classifier in OPNFV
OPNFV SFC Brahmaputra Target Use Case
[Figure: SDN network in which ODL SFC (1) updates/creates chains and (2) installs subscriber classification rules. A classifier and SFF steer traffic toward an HTTP server over two rendered service paths, RSP1 and RSP2, each through a firewall SF: one blocks HTTP, the other blocks SSH.]
Simple Test Cases
1) Can NOT do HTTP
2) Can do SSH
3) Can do HTTP
4) Can NOT do SSH
Legend:
SFF: Service Function Forwarder
SF: Service Function
RSP: Rendered Service Path, a Service Chain
NSH Overview
• Describes a dataplane header used to carry
information along a service path.
– Identifier for service path selection
– Opaque mandatory metadata fields
– Optional TLVs
• Creates “service plane”
– Transport independent (NSH in VXLAN, NSH in
MPLS, NSH in UDP, etc.)
– Service layer OAM
Implementation Update
• Opensource implementations
– OVS dataplane (with VXLAN)
– OpenDaylight control plane (+ LISP)
• Several vendor specific implementations
• Early deployments underway
Base Header
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|O|C|R|R|R|R|R|R| Length | MD Type | Next Protocol |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
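The 32-bit base header drawn above can be packed and unpacked with a few shifts. The field widths below are read off the diagram (Ver 2 bits, O 1, C 1, six reserved bits, Length 6, MD Type 8, Next Protocol 8); the function names and example values are hypothetical, and other revisions of the NSH specification may lay the word out differently.

```python
import struct


def pack_nsh_base(ver, o, c, length, md_type, next_proto):
    """Pack the base header word in network byte order; reserved bits stay zero."""
    word = (
        (ver & 0x3) << 30
        | (o & 0x1) << 29
        | (c & 0x1) << 28
        | (length & 0x3F) << 16
        | (md_type & 0xFF) << 8
        | (next_proto & 0xFF)
    )
    return struct.pack("!I", word)


def unpack_nsh_base(data):
    """Split the first four bytes back into the fields shown in the diagram."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "ver": word >> 30,
        "o": (word >> 29) & 0x1,
        "c": (word >> 28) & 0x1,
        "length": (word >> 16) & 0x3F,
        "md_type": (word >> 8) & 0xFF,
        "next_proto": word & 0xFF,
    }


# Example values chosen only to exercise every field.
raw = pack_nsh_base(ver=0, o=1, c=0, length=6, md_type=1, next_proto=3)
fields = unpack_nsh_base(raw)
```

Because the header is transport independent, the same four bytes would be produced whether the outer encapsulation is VXLAN, MPLS, or UDP.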