NPTEL Week 8: Programmable Networks


Advanced Computer Networks

Sameer G. Kulkarni
Assistant Professor,
Department of Computer Science and Engineering,
Indian Institute of Technology, Gandhinagar
ADVANCED COMPUTER NETWORKS, OUR LEARNING JOURNEY

• Traditional Networking: basics of communications; principles of networking, network design philosophy, the networking stack; what is E2E? Issues with the networking principles and architecture; the Internet impasse and ossification.
• Modern-day Networking: Network Virtualization, Software Defined Networking, Network Function Virtualization and use cases.
• Future Networks: Softwarized & Programmable Networks: P4 programming, Protocol-Independent Packet Processors, In-band Network Telemetry, Green & Sustainable Data Centers, Serverless Computing, Zero-trust and Blockchain.

Programmable Networks Advanced Computer Networks 2


Network Softwarization
• Learning objectives:
  • Network softwarization: the means not just to de-ossify networking, but to redefine it!
  • Four main pillars of network softwarization:
    • Network virtualization & overlay networks
    • Software Defined Networking
    • Network Function Virtualization
    • Programmable networks (data plane)
  • Background: Active Networks!
  • Introduction: SDN 2.0, in-network computing, COIN WG
  • Technology enablers: RMT, PISA, P4
  • Programmable switches, SmartNICs, DPUs
  • In-band Network Telemetry (INT)
  • Towards SDN 3.0
  • Network slicing
THE INTERNET ARCHITECTURE IS EVOLVING FASTER THAN EVER
Network ossification: “closed and proprietary”, “proliferation of standards”, “barrier to entry”, “stranglehold by vendors”

Timeline (2010 → 2020 → 2030):
• Disaggregation (SDN, NFV): “Cleanslate”, “4D”, “Ethane”, “OpenFlow”
• Programmable forwarding (telemetry): “RMT”, “PISA”, “P4”, “INT”

Part 1: Network owners take control of their software
Part 2: Network owners take control of packet processing too



SWITCH WITH FIXED FUNCTION PIPELINE

Figure: a fixed parser feeds a fixed header-processing pipeline (L2 table, ACL table, IPv4 table, IPv6 table with v6 header actions).



FIXED FUNCTION PIPELINE

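Figure: control protocols (OSPF, BGP, new protocols, etc.) run on the switch OS, which programs the fixed-function pipeline through a driver.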



PRE-SDN STATE OF THE NETWORK INDUSTRY

Figure: a network owner asks its equipment vendor for a new feature; the request passes through the vendor's engineering division (network software team and ASIC team), and each step takes years.



NETWORK SYSTEMS WERE BUILT “BOTTOM-UP”

Switch OS
Driver

“This is how I process packets …”

Fixed-function switch
NETWORK SYSTEMS STARTING TO BE BUILT “TOP-DOWN”
Switch OS
Driver
Programmable switch: “This is precisely how you must process packets”
DOMAIN-SPECIFIC PROCESSORS

• Computers: Java → Compiler → CPU
• Graphics: OpenCL → Compiler → GPU
• Signal processing: Matlab → Compiler → DSP
• Machine learning: TensorFlow → Compiler → TPU
• Networking: ? → Compiler → ?



DOMAIN-SPECIFIC PROCESSORS

• Computers: Java → Compiler → CPU
• Graphics: OpenCL → Compiler → GPU
• Signal processing: Matlab → Compiler → DSP
• Machine learning: TensorFlow → Compiler → TPU
• Networking: P4 → Compiler → PISA (Protocol-Independent Switch Architecture)
PISA: PROTOCOL INDEPENDENT SWITCH ARCHITECTURE

Figure: a programmable parser followed by a programmable match-action pipeline; each match+action stage has its own memory and ALU.

Generalization of RMT [SIGCOMM’13]



PISA: PROTOCOL INDEPENDENT SWITCH ARCHITECTURE



EXAMPLE P4 PROGRAM
Parser program:
parser parse_ethernet {
    extract(ethernet);
    return switch(ethernet.ethertype) {
        0x8100 : parse_vlan_tag;
        0x0800 : parse_ipv4;
        0x8847 : parse_mpls;
        default: ingress;
    }
}

Header and data declarations:
header_type ethernet_t { … }
header_type l2_metadata_t { … }
header ethernet_t ethernet;
header vlan_tag_t vlan_tag[2];
metadata l2_metadata_t l2_meta;

Tables and control flow:
table port_table { … }
control ingress {
    apply(port_table);
    if (l2_meta.vlan_tags == 0) {
        process_assign_vlan();
    }
}

Figure: the program maps onto the programmable parser and the programmable match-action pipeline (memory + ALU per stage).

P4 [CCR’14]
SDN, PART 2: PROGRAMMABLE FORWARDING
How it gets used
1. Reducing complexity
2. Adding new features to the network
3. Telemetry
P4.org
• Now part of ONF
• Lots of activities and workshops: get involved!
• P4-16 stable. Device independent: Switches, NICs, FPGAs, vSwitches
• P4Runtime part of Stratum, launched this week
A cast of many, led by:

Nate Foster (Cornell), Amin Vahdat (Google), Jennifer Rexford (Princeton), Chang Kim (Barefoot)



PROGRAMMABLE NETWORKING: CONTROL AND DATA PLANES

The network (switch, router, NIC, firewall, …) is now a programmable platform, programmed top-down: both the control plane and the forwarding plane.

Timeline (2010 → 2020 → 2030):
• Phase 1: Network owners take control of their software
• Phase 2: Network owners take control of packet processing too



MOTIVATION FOR SOFTWARE DEFINED NETWORKING (SDN)
• Networks:
  • Are notoriously difficult to manage
  • Evolve very slowly
• Abstraction is the key to extracting simplicity: it makes the programs that manage and control the network easier to write, maintain and reason about.

Ref: Scott Shenker et al., “The Future of Networking, and the Past of Protocols”, Open Network Summit, 2011.



IN THE PRE-SDN ERA…

• CLI/custom script interface → device/vendor specific
• “Closed” control plane
• ASIC APIs (closed)
• Match-action pipeline: Parser → VLAN → ACL → L2/MAC → L3 → Deparser
BEFORE OPENFLOW …

• Open Signaling (1990s)
  • Make network control functions more open, extensible, and programmable
  • Separation between hardware and control software
  • Access to the network hardware via open, programmable network interfaces
• Focused on connection-oriented network services in the early days
  • IETF RFC 3294 (2003): General Switch Management Protocol (GSMP)
  • IEEE P1520 (1998): standards initiative for programmable network interfaces



2008-09: SDN ERA (OPENFLOW/SDN 1.0)

Open, vendor-agnostic interface:
→ Easier network management
→ Centralized control
→ Code reuse/interoperability
• “Closed” control plane
• ASIC APIs (closed)
• Match-action pipeline: Parser → VLAN → ACL → L2/MAC → L3 → Deparser
FIXED-FUNCTION PIPELINE (MMT)

Figure: a fixed parser feeds a fixed header-processing pipeline of multiple match tables (MMT): L2 table → L2 header actions, IPv4 table → v4 header actions, IPv6 table → v6 header actions, ACL table → ACL actions.



2008-09: SDN ERA (OPENFLOW/SDN 1.0)

Open, vendor-agnostic interface:
→ Easier network management
→ Centralized control
→ Code reuse/interoperability
• “Closed” control plane
• ASIC APIs (closed)
• Match-action pipeline: Parser → VLAN → ACL → L2/MAC → L3 → Deparser

It didn’t change the core network functionality! (But the control plane became a bit more programmable.)
2013-14: PROGRAMMABLE SWITCHES (SDN 2.0)

Changing interface:
→ C/Python APIs (auto-generated)
→ P4Runtime (led by Google)
• “Open” control plane
• ASIC APIs (closed/licensed)
• Programmable match-action pipeline: programmable parser → stages such as ACL, L3, MPLS → programmable deparser
NEED FOR FLEXIBLE PACKET FORWARDING
• Modify the set of packet fields
  • Network virtualization: new tunneling formats
  • Support for new flags
  • Disabling existing fields
• Use tables more flexibly
  • Different environments require tables of different shapes and sizes
    • Enterprises: ACL-heavy
    • Core: IPv4-heavy
  • Resource wastage: a table can’t use another table’s hardware



WHY OPENFLOW ISN’T ENOUGH
• In the beginning, OpenFlow was simple: match-action
  • A single rule table on a fixed set of fields (12 fields in OF 1.0)
• Networks needed new encapsulation formats, different protocol versions, and additional measurement-related headers
  • The number of header fields ballooned to 44 in the OF 1.5 specification!
  • With multiple stages of heterogeneous tables



PROTOCOL INDEPENDENT SWITCH ARCH. (PISA)

Figure: a programmable parser followed by a programmable match-action pipeline; each match+action stage has its own memory and ALU.



BASIC FEATURES OF THE RMT DESIGN
• Programmable packet parser
• Pipelined structure
  • Multiple, identical pipeline stages
  • Clocked at a high rate (1 GHz)
• Match and action logic are separate
  • And separate from statistics (counters) memory
• Can keep state on the switch in stage-local memory



BENEFITS OF DATAPLANE PROGRAMMABILITY

• Flexible parsing and matching on non-standard fields:
  • Faster and easier network evolution: new protocols/headers
    • Traditionally, adding a new protocol takes 4-5 years!!
  • Hardware upgrades become software upgrades: protection of investment
• The hardware goes beyond this:
  • Exposes other datapath processing primitives (existing + new)
  • Accessible and programmable via P4 (a high-level DSL)
  • Realizes new (though not fully arbitrary) functions in the datapath
  • Researchers can propose an ASIC-level solution to new or existing problems and readily “realize” it in production hardware



EXTRA DATAPLANE PRIMITIVES
• Transactional Memory (SRAM) + Stateful ALUs
• Stateful operations across multiple packets
• Simple computations: add, subtract, approx. multiply/divide
• Queuing Telemetry Information
• Enqueue/dequeue queue depth
• Time spent in the queue
• High-resolution Timestamping
• nanosecond-scale time stamps
• Ingress/egress MAC timestamps, ingress/egress pipeline timestamps, etc.
• Packet cloning/replication
• Flexible mirroring or conditional multicasts (at run time)
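The stateful primitives can be pictured with a small software model: a register array indexed by a hash of the flow key, updated with a single read-modify-write per packet. This is a Python sketch under stated assumptions, not P4 and not any vendor's API; the array size, the register width and the use of CRC-32 as the index hash are illustrative choices.

```python
import zlib


class RegisterArray:
    """Software model of a per-stage register array driven by a stateful ALU.

    Each packet performs exactly one read-modify-write on one slot, mirroring
    the one-access-per-packet rule of single-ported stage SRAM.
    """

    def __init__(self, size: int, width_bits: int = 32) -> None:
        self.size = size
        self.mask = (1 << width_bits) - 1  # fixed-width registers wrap around
        self.cells = [0] * size

    def index(self, flow_key: bytes) -> int:
        # Illustrative hash: CRC-32 of the flow key, reduced modulo the array size.
        return zlib.crc32(flow_key) % self.size

    def increment(self, flow_key: bytes, amount: int = 1) -> int:
        i = self.index(flow_key)
        self.cells[i] = (self.cells[i] + amount) & self.mask  # read, add, write back
        return self.cells[i]


if __name__ == "__main__":
    counts = RegisterArray(size=1024)
    flow = b"10.0.0.1,10.0.0.2,6,1234,80"  # hypothetical 5-tuple encoding
    for _ in range(3):
        counts.increment(flow)
    print(counts.cells[counts.index(flow)])
```

Because different flows can hash to the same slot, a collision inflates a count; this is one reason data-plane measurements are often approximate.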



BETTER ABSTRACTIONS

Figure: the example P4 program (parser program, header and data declarations, tables and control flow) mapped onto the programmable parser and the programmable match-action pipeline (memory + ALU per stage).
BETTER ABSTRACTIONS
• Developers can write networking programs with a reasonably general “mental model” of what the switch looks like
• … without attending to specifics such as which header fields a particular table supports
• … similar to the sequential model of computing we are all familiar with (the von Neumann architecture)



BETTER TOOLS
• Enable compilers to translate abstractions to actual hardware
  • Algorithms to allocate resources
  • Implemented once, implemented well
• Enable formal verification of the behaviors existing in the network
  • Think of the network program as a contract between the control plane and the data plane
• Brings network software one step closer to “real” software



REDUCED COMPLEXITY
• Network operators can remove the features they don’t need
  • And the associated bugs
  • Avoid wasting time waiting for switch vendors to fix those bugs
• Network operators can add the features they need
  • New ideas and software are owned by the operators
  • … rather than by switch vendors
• More innovation in networking!



TELEMETRY
• Debugging correctness and performance issues in a live network is challenging
  • Packets not going where they’re supposed to
  • Packets experiencing significant queueing or drops
• Measuring networks effectively requires collecting a lot of data
  • Recall how many packets per second even a single 10 Gbit/s link carries
• Flexible packet formats allow you to put useful information directly on the original packet
  • In-band Network Telemetry (INT)
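To put the data volume in perspective, here is a back-of-the-envelope packet-rate calculation. It assumes standard Ethernet framing, where every frame additionally occupies a 7-byte preamble, a 1-byte start-of-frame delimiter and a 12-byte inter-frame gap on the wire; the helper name is illustrative.

```python
# Back-of-the-envelope packet rate on an Ethernet link.
# Assumption: each frame costs 20 extra bytes on the wire
# (7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum rate of back-to-back frames of a given size (FCS included)."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


if __name__ == "__main__":
    link = 10e9  # 10 Gbit/s
    for size in (64, 1518):
        pps = packets_per_second(link, size)
        print(f"{size:>5}-byte frames: {pps / 1e6:.2f} Mpps, "
              f"{pps * 86400 / 1e12:.2f} trillion packets/day")
```

With minimum-size 64-byte frames, a single 10 Gbit/s link can carry about 14.88 million packets per second, which is why exporting per-packet records to a collector quickly becomes impractical.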



COMPARISON WITH FIXED FUNCTION SWITCHING ASICS

How do we know if a
programmable switch chip has
the same power, performance
and cost as a fixed function
switch chip?



COMPARISON WITH FIXED FUNCTION SWITCHING ASICS
Feature | P4 programmable “Tofino” | Fixed function
--- | --- | ---
L2/L3 throughput | 6.4 Tb/s | 6.4 Tb/s
Number of 100G ports | 64 | 64
Availability | Yes | Yes
Max forwarding rate | 5.1B packets/s | 4.2B packets/s
Max 25G/10G ports | 256/258 | 128/130
Programmability | Yes (P4) | No
Typical system power draw | 4.2 W per port | 5.3 W per port
Large-scale NAT | Yes (100k) | No
Large-scale stateful ACL | Yes (100k) | No
Large-scale tunnels | Yes (192k) | No
Packet buffer | Unified | Segmented
Segment routing/bare metal | Yes/Yes | No/No
LAG/ECMP hash algorithm | Full entropy, programmable | Hash seed, reduced entropy
ECMP | 256-way | 128-way
Telemetry and analytics | Line-rate per-flow stats | sFlow (sampled)
Latency | Under 400 ns | 450 ns

Otherwise, both systems are identical: # of ports, CPU, power supplies.
Source: Nick McKeown, “SDN 3.0”, ONF Connect 2019
LIMITATIONS IN DATAPLANE PROGRAMMABILITY

• High-level constraint: all processing MUST maintain line rate
• No “loop” constructs
• No floating-point computations
  • Only approximate computations are possible
• Single-ported, per-stage SRAM memory
  • Only a single memory entry can be read/updated in one packet pass
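Because there is no floating point and no general division, data-plane programs typically approximate arithmetic with shifts. A common illustration (an assumption here, not taken from the slides) is an exponentially weighted moving average whose weight is a power of two, so each update needs only a subtraction, a shift and an addition on one register:

```python
def ewma_update(avg: int, sample: int, shift: int = 3) -> int:
    """Integer EWMA: avg += (sample - avg) / 2**shift, using an arithmetic shift.

    With shift=3 the new sample gets weight 1/8. No floating point or division
    is used; Python's >> on a negative number rounds toward minus infinity,
    as an arithmetic right shift does.
    """
    return avg + ((sample - avg) >> shift)


if __name__ == "__main__":
    avg = 0
    for queue_depth in [80, 80, 80, 80]:  # hypothetical queue-depth samples
        avg = ewma_update(avg, queue_depth)
    print(avg)
```

The result is an approximation: truncation in the shift means the average converges toward, but may settle slightly away from, the true mean.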



SDN FORWARDING ABSTRACTIONS – FUNDAMENTAL RESEARCH PAPERS

• Programmable data plane
  1. “Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN”, SIGCOMM 2013.
  2. “P4: Programming Protocol-Independent Packet Processors”, SIGCOMM Computer Communication Review, July 2014.



FORWARDING METAMORPHOSIS
The problem addressed in this paper:
▪ Current hardware switches are rigid
  ❑ Conventional switch chips are inflexible
  ❑ SDN demands flexibility… but it sounds expensive…
▪ How do we make them flexible?
  ❑ The Reconfigurable Match Table (RMT) switch model
▪ What does it cost to make them flexible enough to support the OpenFlow forwarding abstraction?
  ❑ Flexibility costs less than 15%



FLEXIBLE MATCH-ACTION
▪ Single Match Table (SMT)
  ❑ All headers in one table
  ❑ To support flexibility, the table needs to store all combinations of the header fields
  ❑ Wasteful
▪ Multiple Match Tables (MMT)
  ❑ Smaller tables, each matching a subset of the headers, in a pipeline of stages
  ❑ Stage j can be made to depend on stage i, for i < j
  ❑ Existing switches have a small number (4-8) of tables whose widths, depths, and execution order are fixed
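A quick count shows why a single table is wasteful. Assuming, purely for illustration, that forwarding depends independently on the destination MAC (L2) and the destination IP prefix (L3), a single table must hold every combination (a cross product), while a pipeline of two tables stores each set once. The sizes reuse the 128k-entry L2 and 16k-entry L3 figures of the fixed-function switch example; the function names are hypothetical.

```python
def smt_entries(*table_sizes: int) -> int:
    """Entries needed if independent lookups are folded into one table (cross product)."""
    total = 1
    for n in table_sizes:
        total *= n
    return total


def mmt_entries(*table_sizes: int) -> int:
    """Entries needed if each lookup gets its own table in a pipeline."""
    return sum(table_sizes)


if __name__ == "__main__":
    l2, l3 = 128 * 1024, 16 * 1024
    print(f"SMT: {smt_entries(l2, l3):,} entries")
    print(f"MMT: {mmt_entries(l2, l3):,} entries")
```

The single table needs 2^31 entries, the two-stage pipeline only 147,456.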



A FIXED FUNCTION SWITCH

Figure: packets flow In → Parser → Stage 1 (L2) → Stage 2 (L3) → Stage 3 (ACL) → Deparser → Out, with output queues; each stage's table shape, match kind and action are fixed:
• Stage 1, L2 table: 128k × 48, exact match; action: set L2D
• Stage 2, L3 table: 16k × 32, longest-prefix match; action: set L2D, decrement TTL
• Stage 3, ACL table: 4k entries, ternary match; action: permit/deny
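The three fixed stages can be sketched in Python to make the three match kinds concrete: exact match (a dictionary lookup), longest-prefix match and ternary (value/mask) match. This is an illustrative model only; the table contents are hypothetical, and “set L2D” is modeled simply as choosing an egress port.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    dst_mac: str
    src_ip: str
    dst_ip: str
    ttl: int
    egress_port: Optional[int] = None
    permitted: bool = True


# Stage 1 (L2): exact match on destination MAC.
L2_TABLE = {"aa:bb:cc:00:00:01": 1, "aa:bb:cc:00:00:02": 2}

# Stage 2 (L3): longest-prefix match on destination IPv4 address.
L3_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): 3,
    ipaddress.ip_network("10.1.0.0/16"): 4,
}

# Stage 3 (ACL): ternary match on source IPv4 address; entries in priority order.
ACL_TABLE = [
    (int(ipaddress.ip_address("192.168.0.0")), 0xFFFF0000, False),  # deny 192.168.0.0/16
    (0, 0, True),  # default: permit (mask 0 matches anything)
]


def l2_stage(pkt: Packet) -> None:
    port = L2_TABLE.get(pkt.dst_mac)
    if port is not None:
        pkt.egress_port = port  # "set L2D", modeled as choosing an egress port


def l3_stage(pkt: Packet) -> None:
    addr = ipaddress.ip_address(pkt.dst_ip)
    hits = [net for net in L3_TABLE if addr in net]
    if hits:
        best = max(hits, key=lambda net: net.prefixlen)  # longest prefix wins
        pkt.egress_port = L3_TABLE[best]
        pkt.ttl -= 1  # "dec TTL"


def acl_stage(pkt: Packet) -> None:
    key = int(ipaddress.ip_address(pkt.src_ip))
    for value, mask, permit in ACL_TABLE:
        if key & mask == value & mask:  # ternary: only bits set in mask must match
            pkt.permitted = permit
            return


def fixed_pipeline(pkt: Packet) -> Packet:
    # Stage order and the fields each stage reads are baked into the "hardware".
    for stage in (l2_stage, l3_stage, acl_stage):
        stage(pkt)
    return pkt


if __name__ == "__main__":
    print(fixed_pipeline(Packet("aa:bb:cc:00:00:09", "172.16.0.5", "10.1.2.3", ttl=64)))
```

The point of the sketch is what cannot change: adding a new header field or a fourth table means changing the code (the chip), not just the table entries.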



WHAT FLEXIBILITY DOES OPENFLOW NEED?

o Multiple stages of match-action


❑ Flexible resource allocation
❑ Flexible header fields
❑ Flexible actions



OTHER ALTERNATIVES FOR SDN TO GET THE FLEXIBILITY?

• Software? 100x slower


• NPUs? 10x slower, expensive
• FPGAs? 10x slower, expensive

• Need to keep up with 1Tb/s speed!!



KEY QUESTIONS: HOW?

▪ How to design a flexible switch chip? And

▪ What does the flexibility cost?



RECONFIGURABLE MATCH TABLES (RMT)

▪ An ideal RMT should allow a set of pipeline stages, each with a match table of arbitrary depth and width.
▪ RMT goes beyond the fixed-function MMT in four ways:
  ❑ Field definitions can be changed and new fields can be added.
  ❑ The number, topology, widths and depths of match tables can be specified, subject only to an overall resource limit on the number of matched bits.
  ❑ New actions can be defined.
  ❑ Arbitrarily modified packets can be placed in specified queue(s).
▪ The configuration should be done by an SDN controller.



RMT MODEL, AND HEADER FIELD FLEXIBILITY

• The parser is configured with the header fields of interest and generates a 4 kb header vector, placing each field at a specific location in the vector.
  ❑ This is how header fields are changed and new headers are added
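The header vector can be modeled as a fixed-size bit vector into which the parser writes each extracted field at a configured offset; later match-action stages read fields from the vector rather than from the raw packet. This is a hedged sketch: the 4 kb size comes from the slide, but the layout, field names and helper functions are hypothetical.

```python
PHV_BITS = 4096  # 4 kb header vector, as in the RMT design

# Hypothetical layout chosen by the parser configuration: field -> (bit offset, width).
LAYOUT = {
    "ethernet.dst_addr": (0, 48),
    "ethernet.src_addr": (48, 48),
    "ethernet.ethertype": (96, 16),
    "ipv4.ttl": (112, 8),
    "ipv4.dst_addr": (120, 32),
}


def phv_write(phv: int, field: str, value: int) -> int:
    """Return a copy of the header vector with `field` set to `value`."""
    offset, width = LAYOUT[field]
    if offset + width > PHV_BITS:
        raise ValueError("field does not fit in the header vector")
    mask = ((1 << width) - 1) << offset
    return (phv & ~mask) | ((value & ((1 << width) - 1)) << offset)


def phv_read(phv: int, field: str) -> int:
    offset, width = LAYOUT[field]
    return (phv >> offset) & ((1 << width) - 1)


if __name__ == "__main__":
    phv = phv_write(0, "ethernet.ethertype", 0x0800)
    phv = phv_write(phv, "ipv4.ttl", 64)
    print(hex(phv_read(phv, "ethernet.ethertype")), phv_read(phv, "ipv4.ttl"))
```

Supporting a new header then amounts to adding entries to the layout (and to the parser), without changing the stages that consume the vector.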



THE RMT MODEL: FLEXIBILITY IN THE NUMBER, TOPOLOGY, WIDTHS AND DEPTHS OF MATCH TABLES

• There are more physical stages than logical stages: logical stages are mapped onto the physical stages.
• Logical stages are specified by the table graph.



THE RMT MODEL: FLEXIBILITY IN THE NUMBER, TOPOLOGY, WIDTHS AND DEPTHS OF MATCH TABLES

• Restrictions for realizability
  • Physical pipeline stages
    • Match restriction: a fixed number of stages (32)
    • Packet header limits: the packet header vector has a fixed size (4 kb)
    • Memory restriction: every physical stage has the same amount of table memory
    • Action restrictions: the number and complexity of instructions are limited
      • Not general-purpose instructions; they just modify headers



THE RMT MODEL: FLEXIBLE ACTIONS AND HEADER UPDATE



RMT ABSTRACT MODEL
• Controlling the configuration
❑ Parse graph
❑Table Flow Graph



RCP AND ACL SUPPORT



CHIP DESIGN



CONFIGURABLE PARSER

• Input: packet data; the parse graph is stored in a TCAM
• Output: a 4 kb header vector



RMT SWITCH DESIGN
• 64 × 10 Gb ports
• 960M packets/second
• 1 GHz pipeline
• Programmable parser
• 32 match/action stages
• Huge TCAM: 10× current chips
  • 64K TCAM words × 640b
• SRAM hash tables for exact matches
  • 128K words × 640b
• 224 action processors per stage
• All OpenFlow statistics counters



COST OF CONFIGURABILITY: COMPARISON WITH CONVENTIONAL SWITCH
• Many functions are identical: I/O, data buffer, queueing…
• Make extra functions optional: statistics
• Memory dominates area
  • Compare memory area/bit and bit count
  • RMT must use memory bits efficiently to compete on cost
• Techniques for flexibility
  • Match stage unit RAM configurability
  • Ingress/egress resource sharing
  • Table predication allows multiple tables per stage
  • Match memory overhead reduction
  • Match memory multi-word packing



CHIP COMPARISON WITH FIXED FUNCTION SWITCHES
Software Defined Networking (COMS 6998-10)

Area
Section Area % of chip Extra Cost
IO, buffer, queue, CPU, etc 37% 0.0%
Match memory & logic 54.3% 8.0%
VLIW action engine 7.4% 5.5%
Parser + deparser 1.3% 0.7%
Total extra area cost 14.2%
Power
Section Power % of chip Extra Cost
I/O 26.0% 0.0%
Memory leakage 43.7% 4.0%
Logic leakage 7.3% 2.5%
RAM active 2.7% 0.4%
TCAM active 3.5% 0.0%
Logic active 16.8% 5.5%
Total extra power cost 12.4%
CONCLUSION
• How do we design a flexible chip?
  • The RMT switch model
  • Bring processing close to the memories: a pipeline of many stages
  • Bring the processing to the wires: 224 action CPUs per stage
• How much does it cost?
  • About 15% (14.2% extra area, 12.4% extra power)
• Many more details of how this is designed in 28 nm CMOS are in the paper

10/20/14 Software Defined Networking (COMS 6998-10) Source: P. Bosshart, TI


TWO KEY TECHNICAL INNOVATIONS OF RMT
• Flexible packet parser
  • Useful to introduce new fields into packets
• Ability to allocate table memories flexibly
  • Use fields of your choice
  • Free to choose the depth and width of tables
  • Run multiple tables per stage
  • Or one table across multiple stages



PARSING PACKETS

Source: Design principles for packet parsers, Gibb et al.



PARSING STATE MACHINE
• Inherently sequential process
  • The previous header determines the next header type
  • The current header length determines the start of the next header
• Parser state: tracks the current header and its length
  • Helps jump to the next state
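The sequential nature of parsing can be seen in a small Python state machine for Ethernet, an 802.1Q VLAN tag and IPv4: the EtherType just read selects the next state, and the current header's length (fixed for Ethernet and VLAN, taken from the IHL field for IPv4) gives the next offset. This is a sketch only; a real parser handles many more headers and validates lengths.

```python
import struct
from typing import List, Tuple

ETH_LEN = 14   # dst MAC (6) + src MAC (6) + EtherType (2)
VLAN_LEN = 4   # TCI (2) + inner EtherType (2)
NEXT_BY_ETHERTYPE = {0x8100: "vlan", 0x0800: "ipv4"}


def parse(pkt: bytes) -> Tuple[List[Tuple[str, int]], int]:
    """Walk the headers as a state machine.

    Returns the list of (header, start offset) and the offset where the payload begins.
    """
    headers: List[Tuple[str, int]] = []
    state, offset = "ethernet", 0
    while state != "accept":
        headers.append((state, offset))
        if state == "ethernet":
            (ethertype,) = struct.unpack_from("!H", pkt, offset + 12)
            offset += ETH_LEN                   # fixed header length
            state = NEXT_BY_ETHERTYPE.get(ethertype, "accept")
        elif state == "vlan":
            (ethertype,) = struct.unpack_from("!H", pkt, offset + 2)
            offset += VLAN_LEN
            state = NEXT_BY_ETHERTYPE.get(ethertype, "accept")
        elif state == "ipv4":
            ihl = pkt[offset] & 0x0F            # header length in 32-bit words
            offset += ihl * 4                   # variable header length
            state = "accept"                    # this sketch stops at layer 3
    return headers, offset


if __name__ == "__main__":
    frame = (bytes(12) + struct.pack("!H", 0x8100)      # Ethernet, VLAN-tagged
             + struct.pack("!HH", 100, 0x0800)          # VLAN 100, then IPv4
             + bytes([0x45]) + bytes(19) + b"payload")  # 20-byte IPv4 header
    print(parse(frame))
```

Each iteration depends on the previous one, which is exactly what makes a naive hardware parser handle one header per cycle.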



GOAL: ENCODE FIELDS FROM PARSE GRAPH

• Representation of all valid header sequences
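A parse graph can be written as an adjacency list, and every valid header sequence is then a path from the root to a leaf. The graph below is a hypothetical example, not the one in the slide's figure; it must be acyclic for this simple recursion (stacked tags would be modeled as separate nodes).

```python
from typing import Dict, List

# Hypothetical parse graph: header -> headers that may follow it (acyclic).
PARSE_GRAPH: Dict[str, List[str]] = {
    "ethernet": ["vlan", "ipv4", "ipv6"],
    "vlan": ["ipv4", "ipv6"],
    "ipv4": ["tcp", "udp"],
    "ipv6": ["tcp", "udp"],
    "tcp": [],
    "udp": [],
}


def header_sequences(graph: Dict[str, List[str]], node: str = "ethernet") -> List[List[str]]:
    """Enumerate every root-to-leaf path, i.e. every valid header sequence."""
    if not graph[node]:
        return [[node]]
    return [[node] + rest
            for nxt in graph[node]
            for rest in header_sequences(graph, nxt)]


if __name__ == "__main__":
    for seq in header_sequences(PARSE_GRAPH):
        print(" -> ".join(seq))
```

Six nodes already encode eight distinct header sequences, which is why the graph, rather than a list of sequences, is the compact thing to store in hardware.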



A HARDWARE (FIXED) PACKET PARSER

• Two steps
  • Identify the sequence of headers
  • Extract fields from the identified headers



HEADER IDENTIFICATION

• Identify headers through fixed-function header processors
• Simple design: extract one header per cycle
• Speculate to extract multiple headers per cycle
  • Sequence resolution picks the correct one
FIELD EXTRACTION

• Extract fields using fixed offsets into the packet, depending on the parser state
• Field-extract table hard-coded for specific fields



PROGRAMMABLE PARSER

• “Morph” the header processor into a TCAM
• A content-addressable memory finds matches in one clock cycle, replacing the fixed-function header processor
• Field-extraction data goes into the RAM
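A TCAM-based parser can be modeled as a priority-ordered list of (state, value, mask) rows plus a parallel RAM holding what to do on a hit. The sketch below is illustrative: the rows, the RAM word format (next state, bytes to advance) and the assumption that the caller supplies the 16-bit EtherType as lookahead are all hypothetical. Reprogramming the parser means rewriting these tables, not the hardware.

```python
from typing import List, NamedTuple, Optional, Tuple


class TcamEntry(NamedTuple):
    state: str   # current parser state (matched exactly)
    value: int   # lookahead bits to compare
    mask: int    # 1 bits must match, 0 bits are "don't care"


# Hypothetical parser program: TCAM rows in priority order (lowest index wins).
TCAM: List[TcamEntry] = [
    TcamEntry("ethernet", 0x8100, 0xFFFF),
    TcamEntry("ethernet", 0x0800, 0xFFFF),
    TcamEntry("vlan", 0x0800, 0xFFFF),
    TcamEntry("ethernet", 0x0000, 0x0000),  # catch-all for any other EtherType
]

# RAM word for each TCAM row: (next state, bytes to advance past the current header).
RAM: List[Tuple[str, int]] = [
    ("vlan", 14),
    ("ipv4", 14),
    ("ipv4", 4),
    ("accept", 14),
]


def tcam_lookup(state: str, lookahead: int) -> Optional[Tuple[str, int]]:
    """Return the RAM word of the highest-priority matching row, or None on a miss.

    Hardware compares all rows in parallel in one cycle; the loop only emulates that.
    """
    for row, entry in enumerate(TCAM):
        if entry.state == state and (lookahead & entry.mask) == (entry.value & entry.mask):
            return RAM[row]
    return None


if __name__ == "__main__":
    print(tcam_lookup("ethernet", 0x8100), tcam_lookup("ethernet", 0x86DD))
```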
OPTIMIZE PARSING BY CLUSTERING STATES

Consequence: parser throughput increases significantly (Tofino: 400 Gbit/s)


MATCH-ACTION TABLE MEMORY DESIGN

- Match and action units are supplied with the Packet Header Vector (PHV)
- Each pipeline stage accesses its own local memory
- VLIW action: each component of the PHV can be modified if needed



MATCH-ACTION TABLE MEMORY DESIGN

Hardware realization: separately configurable memory blocks ➔ key to table flexibility
• SRAM
  - Exact match
  - Action memory
  - Statistics
• TCAM
  - Wildcard match
  - Faster than a trie
• Fixed match function (e.g., L2) vs. flexible match function: 14% extra area (& power) for fatter wires
MATCH-ACTION TABLE MEMORY DESIGN

Hardware realization: separately configurable memory blocks ➔ key to table flexibility
• SRAM
  - Exact match
  - Action memory
  - Statistics
• TCAM
  - Wildcard match
  - Faster than a trie
• Match RAM blocks also contain pointers to action memory and instructions
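The indirection from a match entry to action memory and instructions can be sketched with three separate Python structures. All names and contents below are hypothetical, and the actions operate on a dictionary standing in for the PHV; the sketch only shows that a hit yields two pointers, one to the action parameters and one to the instruction to run.

```python
from typing import Callable, Dict, List, Tuple

PHV = Dict[str, int]


# Instruction memory: hypothetical action primitives that modify header fields.
def set_egress(phv: PHV, params: List[int]) -> None:
    phv["egress_port"] = params[0]


def drop(phv: PHV, params: List[int]) -> None:
    phv["drop"] = 1


INSTRUCTIONS: List[Callable[[PHV, List[int]], None]] = [set_egress, drop]

# Action memory: action parameters, stored separately from the match keys.
ACTION_MEMORY: List[List[int]] = [[7], [3], []]

# Match RAM (exact match): key -> (action-memory pointer, instruction index).
MATCH_RAM: Dict[int, Tuple[int, int]] = {
    0xAABBCC000001: (0, 0),  # set_egress(7)
    0xAABBCC000002: (1, 0),  # set_egress(3)
    0xAABBCC0000FF: (2, 1),  # drop()
}


def match_action(phv: PHV, key_field: str) -> bool:
    """Look up one field; on a hit, run the referenced instruction with its parameters."""
    hit = MATCH_RAM.get(phv[key_field])
    if hit is None:
        return False
    action_ptr, instr = hit
    INSTRUCTIONS[instr](phv, ACTION_MEMORY[action_ptr])
    return True


if __name__ == "__main__":
    phv = {"dst_mac": 0xAABBCC000002}
    match_action(phv, "dst_mac")
    print(phv)
```

Keeping parameters out of the match words means the match memory holds only keys and small pointers, which is one way such a design can use memory bits efficiently.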
SUMMARY OF RMT

• Reconfigurability works
  • Flexible logic is cheap (1-2% of chip area)
  • Memory is more expensive, but the overheads are worth the cost
• Better abstractions and tools arise due to reconfigurability
• RMT has inspired a lot of further work
  • Compilers, verification, testing
  • Architecting other hardware platforms (e.g., NICs)
  • Better monitoring of networks
  • Possibility of “zero touch” networking



SDN FORWARDING ABSTRACTIONS

• Programmable data plane
  • “Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN”, SIGCOMM 2013.
  • “P4: Programming Protocol-Independent Packet Processors”, SIGCOMM Computer Communication Review, July 2014.



IN THE BEGINNING…
• OpenFlow was simple
• A single rule table
  • Priority, pattern, actions, counters, timeouts
• Matching on any of 12 fields, e.g.,
  • MAC addresses
  • IP addresses
  • Transport protocol
  • Transport port numbers



OVER THE PAST FIVE YEARS…

Proliferation of header fields:

OpenFlow version | Released | # match fields (headers)
--- | --- | ---
1.0.0 | December 2009 | 12
1.1.0 | February 2011 | 15
1.2.0 | December 2011 | 36
1.3.0 | June 2012 | 40
1.4.0 | October 2013 | 41
1.5.1 | March 2015 | 44

Multiple stages of heterogeneous tables
Still not enough (e.g., VXLAN, NVGRE, STT, …)


FUTURE SDN SWITCHES
• Configurable packet parser
  • Not tied to a specific header format
• Flexible match+action tables
  • Multiple tables (in series and/or parallel)
  • Able to match on all defined fields
• General packet-processing primitives
  • Copy, add, remove, and modify
  • For both header fields and metadata
WE CAN DO THIS!

• New generation of switch ASICs
  • Intel FlexPipe: programmable parser
  • RMT [SIGCOMM’13]
  • Cisco Doppler
• But programming these chips is hard
  • Custom, vendor-specific interfaces
  • Low-level, akin to microcode programming



WE NEED A HIGHER-LEVEL INTERFACE

• To tell the programmable switch how we want it to behave



THREE GOALS FOR THE LANGUAGE
• Protocol independence
• Configure a packet parser
• Define a set of typed match+action tables

• Target independence
• Program without knowledge of switch details
• Rely on compiler to configure the target switch

• Reconfigurability
• Change parsing and processing in the field



“CLASSIC ” OPENFLOW (1.X)

SDN Control Plane
  ↓ installing and querying rules
Target Switch
“OPENFLOW 2.0”

SDN Control Plane
• Configuring (parser, tables, and control flow) → Parser & Table Configuration compiler → Target Switch
• Populating (installing and querying rules) → Rule Translator → Target Switch
P4 LANGUAGE

• Programming Protocol-Independent Packet Processors
  • Telling programmable switches what to do

ABSTRACT FORWARDING MODEL (TARGET ABSTRACTION)



P4 CONCEPTS
• A P4 program contains definitions of the following
❑ Headers: describe the sequence and structure of a series of fields, including field widths and constraints on field values.
❑ Parsers: a parser definition specifies how to identify headers and valid header sequences within packets.
❑ Tables: match+action tables are the mechanism for performing packet processing. A P4 program defines the fields on which a table may match and the actions it may execute.
❑ Actions: P4 supports constructing complex actions from simpler, protocol-independent primitives. Actions are available within match+action tables.
❑ Control programs: the control program determines the order in which match+action tables are applied to a packet, describing the flow of control between them.



P4 LANGUAGE BY EXAMPLE

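• Data-center routing
  • Top-of-rack (ToR) switches
  • Two tiers of core switches
  • Source routing by the ToR
• Hierarchical tag (mTag)
  • Pushed by the ToR
  • Four one-byte fields
  • Two hops up, two down

Figure: from the source ToR the packet goes up two hops (up1, up2) through the core and down two hops (down1, down2) to the destination ToR.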



HEADER FORMATS
• Header
  • Ordered list of fields
  • A field has a name and width

header ethernet {
    fields {
        dst_addr : 48;
        src_addr : 48;
        ethertype : 16;
    }
}

header vlan {
    fields {
        pcp : 3;
        cfi : 1;
        vid : 12;
        ethertype : 16;
    }
}

header mTag {
    fields {
        up1 : 8;
        up2 : 8;
        down1 : 8;
        down2 : 8;
        ethertype : 16;
    }
}
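On the wire, the mTag declared above is six bytes: four one-byte hop fields followed by a 16-bit EtherType. A Python sketch using the standard `struct` module in network byte order shows the layout; the function and class names are hypothetical.

```python
import struct
from typing import NamedTuple

MTAG_FORMAT = "!BBBBH"                   # up1, up2, down1, down2 (8 bits each), ethertype (16)
MTAG_LEN = struct.calcsize(MTAG_FORMAT)  # 6 bytes, no padding with "!"


class MTag(NamedTuple):
    up1: int
    up2: int
    down1: int
    down2: int
    ethertype: int


def pack_mtag(tag: MTag) -> bytes:
    return struct.pack(MTAG_FORMAT, *tag)


def unpack_mtag(data: bytes, offset: int = 0) -> MTag:
    return MTag(*struct.unpack_from(MTAG_FORMAT, data, offset))


if __name__ == "__main__":
    tag = MTag(up1=1, up2=2, down1=3, down2=4, ethertype=0x0800)
    wire = pack_mtag(tag)
    print(wire.hex(), unpack_mtag(wire))
```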



PARSER

• State machine traversing the packet
  • Extracting field values as it goes

parser start {
    ethernet;
}

parser ethernet {
    switch(ethertype) {
        case 0x8100 : vlan;
        case 0x9100 : vlan;
        case 0x800 : ipv4;
        . . .
    }
}

parser vlan {
    switch(ethertype) {
        case 0xaaaa : mTag;
        case 0x800 : ipv4;
        . . .
    }
}

parser mTag {
    switch(ethertype) {
        case 0x800 : ipv4;
        . . .
    }
}



MATCH-ACTION TABLES

• Describe each packet-processing stage
  • What fields are matched, and in what way
  • What action functions are performed
  • (Optionally) a hint about the maximum number of rules

table mTag_table {
    reads {
        ethernet.dst_addr : exact;
        vlan.vid : exact;
    }
    actions {
        add_mTag;
    }
    max_size : 20000;
}

table local_switching {
    // read destination and check if local
    // if a miss occurs, go to mTag table
}

table egress_check {
    // verify egress is resolved
    // do not retag packets received with tag
    // read egress and whether packet was mTagged
}
ACTION FUNCTIONS

• Custom actions built from primitives
  • Add, remove, copy, set, increment, checksum

action add_mTag(up1, up2, down1, down2, outport) {
    add_header(mTag);

    copy_field(mTag.ethertype, vlan.ethertype);
    set_field(vlan.ethertype, 0xaaaa);

    set_field(mTag.up1, up1);
    set_field(mTag.up2, up2);
    set_field(mTag.down1, down1);
    set_field(mTag.down2, down2);

    set_field(metadata.outport, outport);
}
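The add_mTag action can be mirrored in Python with headers held as dictionaries, which makes the EtherType splice explicit: the VLAN header's EtherType moves into the new mTag, and the VLAN header now announces 0xaaaa, which is exactly the value the vlan parser state uses to reach mTag. This is a sketch: a dictionary does not model header order, which a real deparser would emit following the parse graph.

```python
from typing import Dict

Headers = Dict[str, Dict[str, int]]


def add_mtag(headers: Headers, metadata: Dict[str, int],
             up1: int, up2: int, down1: int, down2: int, outport: int) -> None:
    """Python mirror of the add_mTag action on dictionary-based headers."""
    headers["mTag"] = {}                                         # add_header(mTag)
    headers["mTag"]["ethertype"] = headers["vlan"]["ethertype"]  # copy_field
    headers["vlan"]["ethertype"] = 0xAAAA                        # next header is now mTag
    headers["mTag"].update(up1=up1, up2=up2, down1=down1, down2=down2)
    metadata["outport"] = outport                                # set_field(metadata.outport, ...)


if __name__ == "__main__":
    hdrs = {"ethernet": {"ethertype": 0x8100}, "vlan": {"vid": 10, "ethertype": 0x0800}}
    md: Dict[str, int] = {}
    add_mtag(hdrs, md, 1, 2, 3, 4, outport=5)
    print(hdrs, md)
```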

CONTROL FLOW

• Flow of control from one table to the next
  • Collection of functions, conditionals, and tables
• For a ToR switch:
  Figure: packets from the core (with mTag) and from local hosts (with no mTag) enter the Source Check table, then the Local Switching table. On a miss (destination not local), the ToR mTag table is applied; the packet then goes through the Egress Check table.

CONTROL FLOW
• Flow of control from one table to the next
• Collection of functions, conditionals, and tables
• Simple imperative representation

control main() {
    table(source_check);

    if (!defined(metadata.ingress_error)) {
        table(local_switching);

        if (!defined(metadata.outport)) {
            table(mTag_table);
        }

        table(egress_check);
    }
}
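The control program reads naturally as ordinary imperative code. Below is a Python rendering of `control main()` with toy stand-ins for the four tables; the table contents, the uplink port number and the source-check rule are hypothetical, chosen only so that each branch of the control flow can be exercised.

```python
from typing import Any, Dict

Packet = Dict[str, Any]

# Hypothetical table contents for one ToR switch.
LOCAL_HOSTS = {"00:00:00:00:00:0a": 1}  # destination MAC -> host-facing port
UPLINK_PORT = 9                          # port toward the core switches


def source_check(pkt: Packet) -> None:
    # Illustrative rule: an mTag arriving from a local host is an error.
    if pkt.get("has_mtag") and pkt.get("from_host"):
        pkt["metadata"]["ingress_error"] = 1


def local_switching(pkt: Packet) -> None:
    port = LOCAL_HOSTS.get(pkt["dst_mac"])
    if port is not None:                 # hit: destination is local
        pkt["metadata"]["outport"] = port


def mtag_table(pkt: Packet) -> None:
    pkt["has_mtag"] = True               # stands in for the add_mTag action
    pkt["metadata"]["outport"] = UPLINK_PORT


def egress_check(pkt: Packet) -> None:
    pkt["metadata"]["egress_checked"] = True


def control_main(pkt: Packet) -> Packet:
    """Tables applied in order, guarded by whether metadata fields are defined."""
    md = pkt.setdefault("metadata", {})
    source_check(pkt)
    if "ingress_error" not in md:        # !defined(metadata.ingress_error)
        local_switching(pkt)
        if "outport" not in md:          # !defined(metadata.outport): not local
            mtag_table(pkt)
        egress_check(pkt)
    return pkt


if __name__ == "__main__":
    print(control_main({"dst_mac": "00:00:00:00:00:ff"}))
```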
P4 COMPILATION

P4 COMPILER
• Parser
• Programmable parser: translate to state machine
• Fixed parser: verify the description is consistent

• Control program
• Target-independent: table graph of dependencies
• Target-dependent: mapping to switch resources

• Rule translation
• Verify that rules agree with the (logical) table types
• Translate the rules to the physical tables
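The target-dependent step (mapping the table dependency graph onto switch resources) can be illustrated with a simple greedy placement: each table goes into the earliest physical stage after every table it depends on, provided enough memory blocks remain there. This is a hedged sketch, not the algorithm any particular P4 compiler uses; the block counts are hypothetical, and unlike RMT the sketch never splits one table across several stages.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def map_to_stages(deps: Dict[str, Set[str]], blocks: Dict[str, int],
                  blocks_per_stage: int, num_stages: int = 32) -> Dict[str, int]:
    """Greedy placement of logical tables onto physical stages.

    deps maps each table to the tables it depends on; blocks gives the
    memory blocks each table needs.
    """
    free: List[int] = [blocks_per_stage] * num_stages
    stage_of: Dict[str, int] = {}
    for table in TopologicalSorter(deps).static_order():  # dependencies come first
        earliest = max((stage_of[d] + 1 for d in deps.get(table, ())), default=0)
        for s in range(earliest, num_stages):
            if free[s] >= blocks[table]:
                free[s] -= blocks[table]
                stage_of[table] = s
                break
        else:
            raise ValueError(f"table {table!r} does not fit in {num_stages} stages")
    return stage_of


if __name__ == "__main__":
    tor_deps = {
        "local_switching": {"source_check"},
        "mTag_table": {"local_switching"},
        "egress_check": {"local_switching", "mTag_table"},
    }
    sizes = {"source_check": 2, "local_switching": 4, "mTag_table": 8, "egress_check": 1}
    print(map_to_stages(tor_deps, sizes, blocks_per_stage=8))
```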



COMPILING PACKET PARSER

• Compile into a state machine



COMPILING CONTROL PROGRAMS

control main() {
    table(source_check);

    if (!defined(metadata.ingress_error)) {
        table(local_switching);

        if (!defined(metadata.outport)) {
            table(mTag_table);
        }

        table(egress_check);
    }
}

• Compiled into a table dependency graph, and then to a target-specific back-end



CONCLUSION
• OpenFlow 1.x
• Vendor-agnostic API
• But, only for fixed-function switches
• An alternate future
• Protocol independence
• Target independence
• Reconfigurability in the field
• P4 language: a straw-man proposal
• To trigger discussion and debate
• Much, much more work to do!

TELEMETRY

• Debugging correctness and performance issues in a live network is challenging
– Packets not going where they’re supposed to
– Packets experiencing significant queueing or drops

• Measuring networks effectively requires collecting a lot of data
– Recall the number of packets per second on even a single 10 Gbit/s link

• Flexible packet formats allow you to put useful information directly on the original packet
– In-band Network Telemetry (INT)
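The data-volume point is easy to quantify. Assume standard Ethernet framing, where every frame also occupies 20 bytes of preamble, start delimiter and inter-frame gap on the wire. Under that assumption, a single saturated 10 Gbit/s link carries about 14.88 million minimum-size (64-byte) packets per second, a new packet roughly every 67 ns. The helper name below is illustrative.

```python
def packets_per_second(link_bps, frame_bytes):
    """Packets/s on a fully loaded Ethernet link.

    Each frame occupies 20 extra bytes on the wire: 7 B preamble,
    1 B start-of-frame delimiter and a 12 B inter-frame gap.
    """
    return link_bps / ((frame_bytes + 20) * 8)


for size in (64, 512, 1518):
    print(f"{size:>5} B frames: {packets_per_second(10e9, size):,.0f} packets/s")
```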



TODAY, BASIC INFORMATION IS HARD TO FIND

1 “Which path did my packet take?”
“I visited Switch 1 @780ns, Switch 9 @1.3µs, Switch 12 @2.4µs”
2 “Which rules did my packet follow?”
“In Switch 1, I followed rules 75 and 250. In Switch 9, I followed rules 3 and 80.”
(Figure: a switch's rule table, e.g., rule 75 matching 192.168.0/24)


3 “How long did my packet queue at each switch?”
“Delay: 100ns, 200ns, 19740ns”
4 “Who did my packet share the queue with?”
(Figure: queue occupancy over time)
(Same figure, annotated: the packet shared the queue with an aggressor flow!)
TODAY, BASIC INFORMATION IS HARD TO FIND

1 “Which path did my packet take?”
2 “Which rules did my packet follow?”
3 “How long did it queue at each switch?”
4 “Who did it share the queues with?”

With P4 + INT we can answer all four questions for the first time. At full line rate. Without generating additional packets.
INT: IN-BAND NETWORK TELEMETRY

(Figure: as the original packet crosses the network, each switch adds its SwitchID, Arrival Time, Queue Delay, Matched Rules, … to the packet; this telemetry is then used to log, analyze, replay and visualize what happened.)
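The INT idea can be modeled in a few lines of Python: each switch appends what it saw to the packet itself. This is a conceptual sketch, not the INT specification's wire format. The record fields mirror the figure. The "sink" that strips the telemetry follows common INT terminology. The example values reuse the path and delays from the earlier slides. The rules matched at Switch 12 are not given there, so they are left empty.

```python
from dataclasses import dataclass, field


@dataclass
class HopRecord:
    switch_id: int
    arrival_ns: int          # arrival timestamp at this switch
    queue_delay_ns: int      # time spent queued at this switch
    matched_rules: tuple     # IDs of the rules this packet matched


@dataclass
class Packet:
    payload: bytes
    telemetry: list = field(default_factory=list)  # carried in-band with the packet


def int_hop(pkt, switch_id, arrival_ns, queue_delay_ns, matched_rules):
    """An INT-capable switch appends its own record; no extra packets are sent."""
    pkt.telemetry.append(HopRecord(switch_id, arrival_ns, queue_delay_ns, tuple(matched_rules)))


def int_sink(pkt):
    """Last hop: strip the telemetry and return (payload, report for the collector)."""
    report, pkt.telemetry = pkt.telemetry, []
    return pkt.payload, report


pkt = Packet(b"payload")
int_hop(pkt, 1, 780, 100, [75, 250])
int_hop(pkt, 9, 1300, 200, [3, 80])
int_hop(pkt, 12, 2400, 19740, [])      # rules at Switch 12 not given on the slide
_, report = int_sink(pkt)
path = [h.switch_id for h in report]
worst = max(report, key=lambda h: h.queue_delay_ns)
print(path, worst.switch_id, worst.queue_delay_ns)  # → [1, 9, 12] 12 19740
```

A single report answers questions 1 to 3: the path, the rules followed, and which hop did the queueing.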
Software Defined Network (SDN)
Control: Generate and Verify control code
(Figure: the SDN stack, top to bottom: Control Programs on an Abstract Network View; Network Virtualization; a Global Network View maintained by the Network OS, which partitions, generates, verifies and downloads state; and Packet Forwarding elements that add INT to packets. At each level the system can Observe, Measure and Validate: the control code, the network state, and the packets themselves via INT.)
NETWORK INTERFACE CARD (NIC)
NIC: Hardware component that connects a computing device to a network. Usually onboard, or connected as a discrete card on the motherboard. Wired and WiFi networks. Sits at Layer 2 of the OSI model, working on MAC. [10/100/1000 Mbps]

Offload NIC: offloads common network traffic functions [e.g., TCP/IP stack]

SmartNIC: A foundational NIC that also implements, on the NIC, network traffic processing that would otherwise be performed by the CPU. Implemented as ASIC-, FPGA- or SoC-based designs, depending on implementation/requirements. A SmartNIC is a type of NIC card and programmable accelerator that makes data centre networking, security, and storage efficient and flexible.

DPU based SmartNIC: A DPU-based NIC that offloads processing tasks that the system CPU would normally handle. [25 Gbps to 400Gbps or more…]

(Figure: foundational NIC; ASIC-, FPGA- and SoC-based SmartNICs. Images taken from Internet)
TECHNOLOGICAL CONSTRAINTS: MOORE’S LAW
• Processors aren’t clocked faster any more (end of Dennard scaling)

• Soon, we can no longer pack more transistors into the same area (feature-size limits)

• Implication (1): Application code won’t automatically get faster

• Implication (2): Need to re-design applications or the hardware from the ground up


TREND: COMPUTE OFFLOADS TO ACCELERATORS

• Example: SmartNICs (e.g., Azure NIC)
• Hardware runs (part of) the network stack’s processing

• Other accelerators:
• GPUs
• TPUs
• Matrix computation accelerators in the research realm


DISAGGREGATION OF RESOURCES
• Typical server: compute cores, memory, storage.

• Problems:

• Memory wall: not enough bandwidth between compute & memory

• Provisioning for evolution in storage and mem technologies

• Inefficient usage of per-server statically allocated resources



ACCELERATORS: DC == DISTRIBUTED COMPUTER

• Don’t burn cores doing data movement


• Use acceleration

• Provide high performance to a single connection


• High throughput (100 Gbit/s+), low latency

• Retain host-stack programmability


• Don’t get stuck with hardware you can’t control



AZURE SMARTNICS: BUMP IN THE WIRE

Source: Firestone et al. NSDI


AZURE SMARTNICS: BUMP IN THE WIRE

Other architectures possible: multi-core and manycore


SMARTNICS
SmartNIC | Vendor | BW | Processor | Deployed SW
Netronome NFP-4000 | Netronome | 2X 40GbE | 60 ARM11 MPcore, 1GHz | Firmware
LiquidIOII CN2350 | Marvell | 2X 10GbE | 12 cnMIPS core, 1.2GHz | Firmware
LiquidIOII CN2360 | Marvell | 2X 25GbE | 16 cnMIPS core, 1.5GHz | Firmware
BlueField 1M332A | NVIDIA (Mellanox) | 2X 25GbE | 8 ARM A72 core, 0.8GHz | Full OS
Stingray PS225 | Broadcom | 2X 25GbE | 8 ARM A72 core, 3.0GHz | Full OS

• Low power processors with simple micro-architectures
• Varying level of systems support (firmware to Linux)
• Some support RDMA & DPDK interfaces
STRUCTURAL DIFFERENCES

• Classified into two types based on packet flow


– On-path SmartNICs
– Off-path SmartNICs



ON-PATH SMARTNICS

• NIC cores handle all traffic on both the send & receive paths

(Figure: a SmartNIC with TX/RX ports, a traffic manager and NIC cores, attached to the host cores)


ON-PATH SMARTNICS: RECEIVE PATH

• NIC cores handle all traffic on both the send & receive paths

(Figure: receive path, where packets arriving on the TX/RX ports pass through the traffic manager and the NIC cores before reaching the host cores)


ON-PATH SMARTNICS: SENDPATH

• NIC cores handle all traffic on both the send & receive paths

(Figure: send path, where packets from the host cores pass through the NIC cores and the traffic manager to the TX/RX ports)

• Tight integration of computing and communication
OFF-PATH SMARTNICS

• Programmable NIC switch enables targeted delivery

(Figure: a SmartNIC whose NIC switch connects the TX/RX ports to both the NIC cores and the host cores)


OFF-PATH SMARTNICS: RECEIVE PATH

• Programmable NIC switch enables targeted delivery

(Figure: receive path, where the NIC switch delivers each incoming packet either to the NIC cores or directly to the host cores)




OFF-PATH SMARTNICS: SENDPATH

• Programmable NIC switch enables targeted delivery
• Host traffic does not consume NIC cores
• Communication support is less integrated

(Figure: send path, where host traffic goes through the NIC switch straight to the TX/RX ports, bypassing the NIC cores)
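The practical difference between on-path and off-path designs is which packets consume NIC-core cycles. The toy Python model below makes that concrete. The flow names and the set of flows offloaded to the SmartNIC are hypothetical.

```python
from collections import Counter


def core_load(packets, mode, offloaded_flows):
    """Count how many packets each set of cores must touch (toy model).

    packets: sequence of flow IDs, one entry per packet
    mode: "on-path"  -> every packet traverses the NIC cores
          "off-path" -> the NIC switch sends only offloaded flows to NIC cores
    offloaded_flows: hypothetical set of flows handled on the SmartNIC
    """
    load = Counter()
    for flow in packets:
        if mode == "on-path":
            load["nic_cores"] += 1           # NIC cores see all traffic...
            if flow not in offloaded_flows:
                load["host_cores"] += 1      # ...and pass the rest up to the host
        elif mode == "off-path":
            if flow in offloaded_flows:
                load["nic_cores"] += 1       # steered to the NIC cores
            else:
                load["host_cores"] += 1      # goes straight to the host
        else:
            raise ValueError(f"unknown mode: {mode}")
    return load


pkts = ["kv", "web", "kv", "web", "web"]
print(core_load(pkts, "on-path", {"kv"}), core_load(pkts, "off-path", {"kv"}))
```

On-path, the NIC cores touch all five packets. Off-path, they touch only the two packets of the offloaded flow. This is the "host traffic does not consume NIC cores" trade-off, paid for with less integrated communication support.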


DATA PROCESSING UNIT (DPU)
● Conceptualized in 2015 by M/s Fungible
● A DPU is a programmable processor that helps move
data around Data Centres.

● A DPU offloads networking and communication tasks from the CPU. It combines processing cores with hardware accelerator blocks and a high-performance network interface to handle data-centric workloads.
○ Processing
○ Networking
○ Acceleration

● NVIDIA® BlueField®-2 DPUs are data-centre infrastructure on a chip:
○ High-speed networking connectivity | SmartNIC
○ System-On-Chip
○ Functional Offload Coprocessors (FOCP)
○ Runs its own OS that is separate from the host
○ DPU
(Images source: NVIDIA)
BLOCK DIAGRAM OF NIC | SMART NIC | DPU

• NIC: traditional storage and networking | low throughput and high latency
• SmartNIC: traditional storage and networking | high throughput and low latency | vSwitch, security
• DPU: software-defined storage and networking | data leak detection | network performance optimization and prediction
(Figure: block diagrams of a NIC with the CPU, a SmartNIC and a DPU; labels include 2x NVMe, GPU, 100Gbps Initiator, Networking SW and Storage Services/RAID)


WHY A DPU?
● A DPU is a system on a chip (SoC) that allows for high-performance network interfaces able to process data at much faster rates.
● Using hardware acceleration in a DPU to offload processing-intensive tasks can greatly reduce power use, resulting in a more efficient data centre.
● A DPU can be used as a stand-alone embedded processor, but DPUs are usually incorporated into a SmartNIC (a network interface controller). SmartNICs are ideally suited for high-traffic web servers.
● A DPU-based SmartNIC is a network interface card that offloads processing tasks that the system CPU would normally handle. Using its own on-board processor, a DPU-based SmartNIC may be able to perform any combination of encryption/decryption, firewall, TCP/IP and HTTP processing.
● A DPU can provide an isolated environment to accelerate compute-intensive security functions.
● A DPU can store, process and secure data at the highest speeds, while lowering cost and time by analysing data at the edge.


HOW A DPU OPERATES

● A DPU is a system on a chip (SoC) that allows for high-performance network interfaces able to process data at much faster rates.

● A DPU boosts the performance of complex scientific computing workloads by offloading work from the CPU; however, that comes with a cost.

(Figure: application runtime, 0 to 100%. Left, with a conventional NIC: the CPU alternates between compute and communication, so communication blocks or stalls the overall operation. Right, with a DPU: the CPU only schedules and completes communication, which is offloaded to the DPU, giving 100% non-blocking operation.)
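The figure can be made concrete with a back-of-the-envelope model. This sketch rests on three assumptions. Compute and communication times are hypothetical, in arbitrary time units. Offloaded communication fully overlaps with compute. The "cost" of offload is a fixed CPU overhead for scheduling and completing communication.

```python
def runtime_blocking(compute, comm):
    """Conventional NIC: the CPU stalls while communication completes."""
    return compute + comm


def runtime_offloaded(compute, comm, overhead):
    """DPU offload: the CPU spends `overhead` scheduling and completing
    communication, while the DPU runs the communication concurrently."""
    return max(compute + overhead, comm)


for comm in (20, 40, 80):
    print(f"comm={comm}: blocking={runtime_blocking(60, comm)}, "
          f"offloaded={runtime_offloaded(60, comm, overhead=5)}")
```

With 60 units of compute and 40 of communication, runtime drops from 100 to 65. Once communication dominates (80 units), the application is bounded by the communication time itself, so offload can only hide the compute.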


BLUEFIELD DPU | MODES OF OPERATION

● Separated (Host) | Symmetric Mode | Legacy Mode
○ The host and the ARM cores can operate simultaneously or separately
○ Equal bandwidth
○ RDMA connection between ARM and host
○ SR-IOV is only supported on the host side

● Embedded (ECPF) | Default mode for the DOCA BFB image
○ The driver on the ARM cores must load first
○ Every network packet must traverse the DPU network interface through the vSwitch, which means traffic to and from the host interface always lands on the ARM cores


SDN Phase 3: Getting the humans out of the way
Source: Nick McKeown (Stanford University)

Extrapolating to 2030
1. NICs, switches, vSwitches and stacks will have been programmable for 10 years.
2. We will think of a network as a programmable platform. Behavior described at the top, then partitioned, compiled and run across elements.
3. Every large network will work slightly differently, programmed and tailored locally.
4. We will no longer think in terms of protocols. Instead, we will think in terms of software. All functions and “protocols” will have migrated up and out of hardware into software.
5. Networking students will learn how to program a network top-down, as a distributed computing platform. Protocols will be described in quaint historical terms.
6. “Routing” and “Congestion control” will be programs, partitioned across the end-to-end system by a compiler.

Getting humans out of the way
SDN with Verifiable Closed-Loop Control
Network owners and operators will use fine-grain measurement and formal verification to automate network control at scale.
Joint work with: Nate Foster (Cornell), Guru Parulkar (ONF), Larry Peterson (ONF), Jennifer Rexford (Princeton)
ONF Open-source Software Today
(Figure, top to bottom: Trellis and other Control Apps; ONOS Control Plane; the P4Runtime Contract; Stratum OS on P4 switches, and P4-OVS over NICs at the hosts; P4 NICs at the edges. Early research tools: p4v.)
Verifiable Closed-Loop Control
(Figure: the same stack (Trellis and Control Apps, ONOS Control Plane, P4Runtime Contract, Stratum OS and P4-OVS over P4 switches and P4 NICs) closed into a loop: fine-grained per-packet measurement from the data plane feeds the generation & verification of control code.)
First production & research tools exist (INT/DeepInsight, SONATA).


THE INTERNET ARCHITECTURE IS EVOLVING FASTER THAN EVER
Network Ossification: “closed and proprietary”, “proliferation of standards”, “barrier to entry”, “stranglehold by vendors”

(Timeline: 2010 (−10 years) → 2020 → 2030 (+10 years))
• Phase 1 (around 2010): Network owners take control of their software
– “Cleanslate”, “4D”, “Ethane”, “OpenFlow” → Disaggregation, SDN, NFV
• Phase 2 (around 2020): Network owners take control of packet processing too
– “RMT”, “PISA”, “P4”, “INT” → Programmable forwarding, Telemetry
• Phase 3 (around 2030): Networks managed by verifiable closed loop control


“Making SDNs Work” ONS 2012
ONF Connect 2019

With SDN we will:
1. Formally verify that our networks are behaving correctly.
2. Identify bugs, then systematically track down their root cause.
3. Measure and validate correctness, then generate and verify a code fix. Download it to correct the bug.
4. Go to the beach…?
BurstRadar
Practical Real-time Microburst Monitoring for
Datacenter Networks (APSys 2018)
Raj Joshi¹, Ting Qu², Mun Choon Chan¹, Ben Leong¹, Boon Thau Loo³
Microbursts (µbursts)

• Events of intermittent congestion lasting tens or hundreds of µs
◦ Common causes: TCP incast, bursty UDP traffic, TCP segmentation offload
◦ Intermittent increase in latency ➔ variability
◦ Network jitter and packet loss
Detecting & characterizing µbursts is hard

• Measurement study from FB’s datacenter: µbursts
◦ Last for less than 200 µs
◦ Occur unpredictably
• Traditional sampling-based techniques
◦ Cannot even detect microbursts
• Commercial solutions
◦ Can detect the occurrence of microbursts
◦ Provide no information about the cause
Solution:
• Key Insight: µbursts are localized to a switch’s egress port queue
(Figure: the switch’s queuing engine and its egress port queues)

• Key Idea:
◦ We can detect the microburst directly on the switch where it happens
BurstRadar Overview
(Figure: the egress port queues feed queuing telemetry (metadata) and a mark bit (metadata) into the egress processing pipeline, which holds the Snapshot Algorithm, a Ring Buffer and the Courier Pkt Generator; packets then pass the egress deparser to the egress ports.)
BurstRadar Overview
(Figure, next step: the Courier Pkt Generator produces a courier packet, which is placed in the mirror port queue.)
BurstRadar Overview
(Figure, next step: the courier packet leaves through the mirror port carrying telemetry info: the packet 5-tuple and its queuing telemetry data.)
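Here is a simplified Python model of the overview above. It is a sketch of the idea, not the published BurstRadar snapshot algorithm. A packet whose queuing delay exceeds a threshold is treated as part of a microburst. Its 5-tuple and queuing telemetry are written to a bounded ring buffer. Courier packets later drain the buffer toward the monitoring server. The threshold and ring size are hypothetical parameters. The 1,250 ns threshold matches the "5% RTT ≈ 1.25µs" note in the evaluation.

```python
from collections import deque


class BurstRadarModel:
    """Simplified model of on-switch microburst capture (not the published algorithm)."""

    def __init__(self, threshold_ns, ring_size):
        self.threshold_ns = threshold_ns
        self.ring = deque(maxlen=ring_size)  # when full, the oldest record is overwritten

    def on_egress(self, five_tuple, enq_qdepth, queue_delay_ns):
        """Called per packet in the egress pipeline; returns True if it was recorded."""
        if queue_delay_ns <= self.threshold_ns:
            return False  # normal packet: no extra work
        self.ring.append((five_tuple, enq_qdepth, queue_delay_ns))
        return True

    def courier(self, max_records):
        """Build one courier packet's worth of records from the ring buffer."""
        records = []
        while self.ring and len(records) < max_records:
            records.append(self.ring.popleft())
        return records


radar = BurstRadarModel(threshold_ns=1250, ring_size=4)
flow = ("10.0.0.1", "10.0.0.2", 6, 5000, 80)   # hypothetical 5-tuple
for delay_ns in (300, 900, 2100, 4800, 600):
    radar.on_egress(flow, enq_qdepth=delay_ns // 100, queue_delay_ns=delay_ns)
print(radar.courier(max_records=8))
```

Packets that see little queuing cost nothing here. That is why a design of this kind processes far fewer packets than INT, which in its basic form adds telemetry to every packet.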
Evaluation Setup
• Hardware Testbed
(Figure: the BurstRadar prototype switch between servers that send/receive µburst traffic)
◦ About 550 lines of P4 code
• Generated µburst Traffic Traces
◦ µburst data for “web” and “cache” traffic [IMC ‘17]
• Compare BurstRadar against
◦ In-band Telemetry (INT) → dataplane-based solution
◦ “Oracle” Algorithm → ground truth (exact pkts in µbursts)
Efficiency
(Figure: packets processed (log scale, 1 to 100) vs. latency increase tolerance threshold, for INT, BurstRadar and the Oracle.)

At a 5% RTT threshold, BurstRadar processes 10 times fewer packets than INT.

Note: 5% RTT ≈ 1.25µs of queuing @10Gbps in our testbed
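The note implies the testbed RTT. If 5% of the RTT is 1.25 µs, the RTT is about 25 µs. This is derived here, not stated on the slide. The short calculation below also converts that threshold into bytes. At 10 Gbit/s, 1.25 µs of queuing corresponds to about 1,560 bytes, roughly one full-size Ethernet frame waiting ahead of the packet. The helper name is illustrative.

```python
def queuing_budget(rtt_us, fraction, link_gbps):
    """Queuing delay allowed by a threshold of `fraction` x RTT, and the number
    of bytes a link of `link_gbps` Gbit/s drains in that time."""
    delay_us = rtt_us * fraction
    drained_bytes = delay_us * 1e-6 * link_gbps * 1e9 / 8
    return delay_us, drained_bytes


implied_rtt_us = 1.25 / 0.05          # RTT implied by the note (about 25 µs)
delay_us, nbytes = queuing_budget(implied_rtt_us, 0.05, 10)
print(f"RTT ~ {implied_rtt_us:.1f} us, threshold {delay_us:.2f} us, ~{nbytes:.0f} bytes queued")
```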
Precise Time-synchronization using
Programmable Switching ASICs
ACM SOSR 2019 (Best Paper)

Pravein Govindan Kannan, Raj Joshi & Mun Choon Chan


Time Synchronization in Data Center

• NTP: milliseconds
• PTP: tens of ns to µs
(Figure: two servers, each with a CPU, NIC and PHY, connected through a switch with its own CPU, PHYs and queues)

• Network delays & jitter affect accuracy!!
• Clock drifts up to 30µs/sec [HUYGENS ’18]
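Why do network delays and jitter hurt accuracy? The classic four-timestamp exchange that NTP and PTP build on assumes the delay is the same in both directions. The Python sketch below shows that any asymmetry shifts the computed offset by half that asymmetry, for example extra queuing on the reply path. DPTP's own protocol is not reproduced here. The timestamps are hypothetical, in ns.

```python
def two_way_sync(t1, t2, t3, t4):
    """Four-timestamp exchange; assumes equal delay in both directions.

    t1: request sent (client clock)    t2: request received (server clock)
    t3: reply sent (server clock)      t4: reply received (client clock)
    Returns (server clock minus client clock, estimated one-way delay).
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay


# Server clock 500 ns ahead, 1000 ns one-way delay, server replies 200 ns later.
print(two_way_sync(0, 1500, 1700, 2200))   # → (500.0, 1000.0)
# Same exchange, but the reply waits an extra 400 ns in a switch queue:
print(two_way_sync(0, 1500, 1700, 2600))   # → (300.0, 1200.0): offset off by 200 ns
```

Hardware timestamps taken inside the switch pipeline let a protocol measure and remove such queuing, instead of folding it into the delay estimate.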
Portable Switch Architecture

High Precision Hardware Timestamps in the Processing Pipeline

Line-rate traffic along the direction of the response packet.
Conclusion
• Two applications that exploit data plane programmability to demonstrate the potential of modern programmable ASICs
• BurstRadar: characterize microbursts at multi-gigabit line rates in high-speed datacenter networks.
• DPTP: precise time synchronization protocol running in the network data-plane.

• Future Work: enable new monitoring frameworks, control paradigms, virtualization strategies and speedup of large-scale distributed computations.
VIRTUAL MACHINE DEVICE QUEUES
• VMDq adds classification and queueing per VM within the NIC (done through per-VM Rx and Tx queues), offloading packet sorting from the hypervisor.
VM NETWORKING TECHNIQUES

• Netmap is a clean interface for kernel bypass.
• Uses kernel modules to avoid copy overheads.
SERVER BASED NETWORKING

• Traditional VM networking: Simple design.
• Processing in Hypervisor and Guest kernel modules
THE NEED FOR DISAGGREGATION



DISAGGREGATION
[Network requirements for resource disaggregation, OSDI’1 ]



RESEARCH QUESTIONS
• Want to build resource blades: separate compute, memory, storage
• Can we provide a high-bandwidth, low-latency fabric to interconnect the different components?

• Should communication be reliable? Packet- or circuit-switched?

• Resource allocation for different applications?

• Application abstractions: move away from VMs?

• What should the new OS look like? Failure models? Abstractions of memory? Storage?


“A programming language is a tool that has profound influence on our thinking habits!”
Edsger Dijkstra (Eindhoven University, UT Austin), Turing Award 1972

“The only way to learn a new programming language is by writing programs in it”
Dennis Ritchie (Bell Labs, Lucent Technologies), Turing Award 1983
If we want to get the humans out of the way, what else do we need?
Three pieces:
1. The ability to observe packets, network state and code, in real-time.
2. The ability to generate new control and forwarding behaviors, on the fly, to correct errors.
3. The ability to verify newly generated code and deploy it quickly.

Observing packets: Per-packet telemetry is already starting to happen
Today, basic information is hard to find
1 “Which path did my packet take?”
“I visited Switch 1 @780ns, Switch 9 @1.3µs, Switch 12 @2.4µs”
2 “Which rules did my packet follow?”
“In Switch 1, I followed rules 75 and 250. In Switch 9, I followed rules 3 and 80.”
(Figure: a switch's rule table, e.g., rule 75 matching 192.168.0/24)
3 “How long did my packet queue at each switch?”
“Delay: 100ns, 200ns, 19740ns”
4 “Who did my packet share the queue with?”
Aggressor flow! (Figure: queue occupancy over time)
Today, basic information is hard to find
1 “Which path did my packet take?”
2 “Which rules did my packet follow?”
3 “How long did it queue at each switch?”
4 “Who did it share the queues with?”

With P4 + INT we can answer all four questions for the first time. At full line rate. Without generating additional packets.

INT: In-band Network Telemetry
(Figure: as the original packet crosses the network, each switch adds its SwitchID, Arrival Time, Queue Delay, Matched Rules, … to the packet; this telemetry is then used to log, analyze, replay and visualize what happened.)
+ SONATA [Sigcomm ‘18], Sketches [Sigcomm ‘12] …
Viewing Microbursts (to the nanosecond)

Three pieces:
1. The ability to observe packets, network state and code, in real-time.
2. The ability to generate new control and forwarding behaviors, on the fly, to correct errors.
3. The ability to verify newly generated code and deploy it quickly.
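Header Space Analysis
(Figure: a network of four boxes, 1 to 4; box i is modeled as a transfer function T_i(h, p) that maps a header h arriving on port p to the (header, port) pairs it produces.)
HSA [NSDI ‘12]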


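Example: Can A talk to B?
Start from the header space X that A sends into box 1 on port P_in, and push it through the transfer functions along each path:
• Via box 2: $T_1(X, P_{in}) \rightarrow T_2(T_1(X, P_{in})) \rightarrow T_3(T_2(T_1(X, P_{in})))$
• Via box 4: $T_1(X, P_{in}) \rightarrow T_4(T_1(X, P_{in})) \rightarrow T_3(T_4(T_1(X, P_{in})))$
The headers that reach B are the union over both paths:
$T_3(T_2(T_1(X, P_{in}))) \cup T_3(T_4(T_1(X, P_{in})))$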
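A toy version of this reachability check in Python makes the composition concrete. It is a sketch under simplifying assumptions. Headers are 4-bit wildcard strings ('0', '1', 'x'). Each box is a filter plus an optional bit rewrite. Ports are left implicit in the order the functions are composed. Real HSA tracks (header, port) pairs and also needs set difference. The box behaviours are hypothetical.

```python
def intersect(a, b):
    """Intersect two wildcard expressions ('0', '1', 'x'); None if empty."""
    out = []
    for x, y in zip(a, b):
        if x == "x":
            out.append(y)
        elif y == "x" or x == y:
            out.append(x)
        else:
            return None  # one says 0, the other says 1: no common header
    return "".join(out)


def box(match, rewrite=None):
    """Transfer function T(h): keep headers matching `match`, then overwrite
    the bits where `rewrite` has 0/1 ('x' leaves the bit unchanged)."""
    def T(space):
        result = set()
        for h in space:
            m = intersect(h, match)
            if m is None:
                continue
            if rewrite is not None:
                m = "".join(c if r == "x" else r for c, r in zip(m, rewrite))
            result.add(m)
        return result
    return T


# Hypothetical boxes for the four-box topology
T1 = box("xxxx")                   # forwards everything
T2 = box("1xxx")                   # only headers whose first bit is 1
T4 = box("0xxx", rewrite="xx1x")   # headers starting with 0; sets the third bit
T3 = box("xx1x")                   # only headers with the third bit set reach B

X = {"xxxx"}                       # everything A could send
reach_B = T3(T2(T1(X))) | T3(T4(T1(X)))
print(sorted(reach_B), "A can talk to B:", bool(reach_B))
```

The path via box 4 contributes "0x1x" only because T4 rewrites the third bit, which is exactly the kind of behaviour HSA is designed to track symbolically rather than by sending test packets.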
Three pieces:
1. The ability to observe packets, network state and code, in real-time.
2. (?) The ability to generate new control and forwarding behaviors, on the fly, to correct errors.
3. The ability to verify newly generated code and deploy it quickly.
Software Defined Network (SDN)

(Figure: Control Programs on an Abstract Network View; Network Virtualization; Global Network View; Network OS; Packet Forwarding elements)

• Phase 1 (around 2010): Network owners take control of their software
• Phase 2 (around 2020): Network owners take control of packet processing too
• Phase 3 (around 2030): Networks managed by verifiable closed loop control
RETHINKING RELAYERING WITH NFV



NFV MAJOR GAPS & CHALLENGES
(Figure: two layered stacks. Left: applications; operating systems; hypervisors; compute infrastructure; network infrastructure; switching infrastructure; rack, cable, power, cooling. Right: applications; network functions; operating systems; hypervisors; compute infrastructure; switching infrastructure; rack, cable, power, cooling.)

• Management & orchestration
– Infrastructure management standards
– Multi-level identity standard
– Resource description language
• Security
– Topology Validation & Enforcement
– Availability of Management Support Infrastructure
– Secure Boot
– Secure Crash
– Performance Isolation
– Tenant Service Accounting


DOMAIN ARCHITECTURE

(Figure: NFV domains and the container interfaces between them, top to bottom: NfV Applications Domain; NfV Container Interface; Virtual Network Container Interface and Virtual Machine Container Interface; Hypervisor Domain and Infrastructure Network Domain; Compute Container Interface; Compute Domain. Alongside them: Carrier Management, and the Orchestration and Management Domain.)


OPNFV SFC Current Network Topology
(Figure: a Compute Node whose VMs host clients, servers and the service functions SF1 and SF2, connected through OVS instances acting as SFF, with GBP endpoint groups EPG1 and EPG2; a Control Node running the VNF Mgr, ODL SFC and OpenStack; both attached to a Top Of Rack Switch.)
Legend: VxLAN tunnel SF/SFF; OpenFlow 1.3/OVSDB; GBP creates VxLAN tunnel; original packets, no encap.
GBP EPG: Group Based Policy, End Point Group. Used as Classifier in OPNFV.
OPNFV SFC Brahmaputra Target Use Case

Block SF SF Block
HTTP
Firewall Firewall SSH

ODL SFC
Simple
1. Update/create chains
Test Cases HTTP
Server
1) Can NOT do HTTP
2) Can do SSH 2. Subscriber
classification
3) Can do HTTP rules
4) Can NOT do SSH Classifier SFF

Legend:
SDN network
RSP1
SFF: Service Function Forwarder RSP2
SF: Service Function
RSP: Rendered Service Path, a Service Chain
NSH Overview
• Describes a dataplane header used to carry
information along a service path.
– Identifier for service path selection
– Opaque mandatory metadata fields
– Optional TLVs
• Creates a “service plane”
– Transport independent (NSH in VXLAN, NSH in
MPLS, NSH in UDP, etc.)
– Service layer OAM
Implementation Update
• Opensource implementations
– OVS dataplane (with VXLAN)
– OpenDaylight control plane (+ LISP)
• Several vendor specific implementations
• Early deployments underway
Base Header
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|O|C|R|R|R|R|R|R| Length | MD Type | Next Protocol |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

• 8-bit Next Protocol: supports non-EtherType protocols + reclaims space
• MD Type indicates the format of the header; NSH MD type = 0x1 (fixed-size context headers)
• C bit: critical TLV present
Service Path Header
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Service Path ID                | Service Index |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

• Represents the rendering of the chain policy
• Simple identifier: does not imply a static, explicit path
– Resolved locally
• Can be changed: branching within a service graph
– Re-classification (and therefore policy) decision
• Index conveys node within the graph
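The base header and the service path header above are each one 32-bit word. The sketch below packs them with Python's struct module, following the bit layout drawn on these slides. That layout is an early NSH draft; the published RFC 8300 layout differs. The example field values are illustrative assumptions. Length = 6 is taken as the header length in 4-byte words for MD type 0x1: two header words plus four context headers. Next Protocol = 0x3 is taken to mean Ethernet.

```python
import struct


def pack_nsh(ver, o, c, length, md_type, next_proto, spi, si):
    """Pack the base header and service path header (2 x 32 bits, network order).

    Layout as drawn on the slides (early NSH draft, not RFC 8300):
      Ver:2 | O:1 | C:1 | R:6 (reserved, sent as 0) | Length:6 | MD Type:8 | Next Protocol:8
      Service Path ID:24 | Service Index:8
    """
    if not (0 <= ver < 4 and o in (0, 1) and c in (0, 1) and 0 <= length < 64
            and 0 <= md_type < 256 and 0 <= next_proto < 256
            and 0 <= spi < 2**24 and 0 <= si < 256):
        raise ValueError("field out of range")
    base = (ver << 30) | (o << 29) | (c << 28) | (length << 16) | (md_type << 8) | next_proto
    path = (spi << 8) | si
    return struct.pack("!II", base, path)


def unpack_nsh(data):
    """Inverse of pack_nsh for the first 8 bytes of an NSH header."""
    base, path = struct.unpack("!II", data[:8])
    return {"ver": base >> 30, "o": (base >> 29) & 1, "c": (base >> 28) & 1,
            "length": (base >> 16) & 0x3F, "md_type": (base >> 8) & 0xFF,
            "next_proto": base & 0xFF, "spi": path >> 8, "si": path & 0xFF}


hdr = pack_nsh(ver=0, o=0, c=0, length=6, md_type=0x1, next_proto=0x3, spi=10, si=255)
print(hdr.hex())  # → 0006010300000aff
```

Fixed-width fields at fixed offsets are what make the header "hardware friendly": a parser can extract SPI and SI with shifts and masks, without walking variable-length structures.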
Chain and Paths
(no load distribution)

Chain1: Firewall → DPI → IPS

Chain1 is rendered as SFPID = 10

(Figure: a Classifier followed by SFF1, SFF2 and SFF3, each hosting service function instances. Packets carry SFPID 10 and the next service function to apply (SF1: FW, then SF2: DPI, …); each SFF resolves that function through its local forwarding policy and reaches the next SFF over the transport.)
• FW: Loc(SFF1, FW, FW’), Loc(SFF2, FW’’), Loc(SFF3, FW’’’’)
• DPI: Loc(SFF3, DPI), Loc(SFF2, DPI’)
• IPS: Loc(SFF1, IPS), Loc(SFF2, IPS’), Loc(SFF3, IPS’’)
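The Python sketch below shows how an SFF might use the (SFPID, index) pair to forward a packet. Several assumptions are not on the slides. The classifier sets the service index to 255. Each service function decrements it by one, as in the NSH drafts. SFF1's local forwarding policy is the hypothetical table below, built from Loc(SFF1, FW, FW’) and Loc(SFF1, IPS).

```python
# Chain1 (Firewall -> DPI -> IPS) rendered as SFPID 10, as on the slide.
CHAINS = {10: ["FW", "DPI", "IPS"]}
START_SI = 255  # assumption: the classifier sets SI to 255


def next_function(spi, si):
    """Map (SPI, SI) to the next service function, or None when the chain is done."""
    hop = START_SI - si          # each service function decrements SI by one
    chain = CHAINS[spi]
    return chain[hop] if 0 <= hop < len(chain) else None


# Hypothetical local forwarding policy of SFF1: Loc(SFF1, FW, FW'), Loc(SFF1, IPS)
SFF1_LOCAL = {"FW": ["FW", "FW'"], "IPS": ["IPS"]}


def sff_forward(local_policy, spi, si):
    """Decide where an SFF sends a packet carrying (SPI, SI)."""
    sf = next_function(spi, si)
    if sf is None:
        return ("end-of-chain", None)   # remove NSH and forward normally
    instances = local_policy.get(sf)
    if instances:
        return ("local", instances[0])  # no load distribution: always the first instance
    return ("transport", sf)            # another SFF hosts it; send over the transport


for si in (255, 254, 253, 252):
    print(si, sff_forward(SFF1_LOCAL, 10, si))
```

Because the SPI only names the chain, each SFF can resolve "the next FW" or "the next DPI" to whatever instance it knows locally, which is what "resolved locally" means on the service path header slide.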
Mandatory Context Headers
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Network Platform Context |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Network Shared Context |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Service Platform Context |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Service Shared Context |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

• Based on initial deployments: many use cases are satisfied with fixed-size context headers
• Hardware friendly: easy to parse and skip at high speed
• Opaque: significance allocated via the control plane
Optional TLV
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| TLV Class | Type |R|R|R| Len |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Variable Metadata |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

• TLV Class: describes the scope of the Type field
• Type: type of metadata carried; includes a critical indication
