Switch Abstraction Interface
Joint work from
Microsoft, Dell, Facebook,
Broadcom, Intel, Mellanox,
Lihua Yuan, Guohan Lu, Dave Maltz,
Kamala Subramaniam, Darren Loher
for: Microsoft Azure Network Team
Agenda
Go through the SAI proposal.
Discussions!
Sorted list of feedback/questions from the mailing list.
Please add/prioritize the items of your interest.
Collecting feedback and requirements.
And names of those who'd like to contribute.
Initial Driving Scenario -- Data Center ToR
Key use cases
ToR
This is NOT the only scenario.
L3 forwarding, L2 forwarding, ECMP, port-channel,
monitoring, overlay
Multiple switching chips with different
capabilities
Goal of Switch Abstraction Interface (SAI)
Define a minimum required set of APIs to control the switch chips.
Not necessarily enough to build the entire switch,
but a significant portion of it.
Quick consensus is the goal,
So we can focus on implementation and operational experiences.
Platform Abstraction Layer (LED, Fans, Power etc.) on separate discussion.
Additional features exposed via SAI-extensions.
Unique hardware features/capabilities.
Extra software features.
Graduate towards SAI over time.
Approach of SAI
Focusing on exposing hardware capabilities,
And support management capabilities.
Unified CRUD verbs (create, delete, set, get)
Switch ASIC has many types of objects in its ASIC pipeline, every one of them
performs a unique operation
SAI API is all about creating these objects, assigning values to them, and
associating them with other objects
Extensible data -- Entity/Attribute/Value data model
New capability -> new type of entity: vxlan, nvgre
New capability -> new attribute: new hash type attribute
New connection -> new attribute: acl-based ERSPAN
Open Schema allows low cost extensions.
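The entity/attribute/value idea above can be sketched in a few lines of C. This is only an illustration under stated assumptions: every `ex_`/`EX_` name is hypothetical and not part of SAI. The point it shows is that a new capability only appends an attribute id, while the set/get signatures stay unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute ids for one entity type. A new capability
 * only appends an id here; the function signatures do not change. */
typedef enum {
    EX_PORT_ATTR_SPEED,
    EX_PORT_ATTR_MTU,
    EX_PORT_ATTR_HASH_TYPE,   /* "new capability -> new attribute" */
    EX_PORT_ATTR_COUNT
} ex_port_attr_t;

/* Hypothetical value union, much smaller than sai_attribute_value_t. */
typedef union {
    uint32_t u32;
    uint64_t u64;
} ex_value_t;

/* One entity: a table of values indexed by attribute id. */
typedef struct {
    ex_value_t values[EX_PORT_ATTR_COUNT];
    uint8_t    is_set[EX_PORT_ATTR_COUNT];
} ex_port_t;

/* Returns 0 on success, -1 for an unknown attribute id. */
int ex_port_set(ex_port_t *port, ex_port_attr_t attr, ex_value_t value)
{
    assert(port != NULL);
    if ((int)attr < 0 || (int)attr >= EX_PORT_ATTR_COUNT)
        return -1;
    port->values[attr] = value;
    port->is_set[attr] = 1;
    return 0;
}

/* Returns 0 and fills *out if the attribute was set, -1 otherwise. */
int ex_port_get(const ex_port_t *port, ex_port_attr_t attr, ex_value_t *out)
{
    assert(port != NULL && out != NULL);
    if ((int)attr < 0 || (int)attr >= EX_PORT_ATTR_COUNT || !port->is_set[attr])
        return -1;
    *out = port->values[attr];
    return 0;
}
```

A caller that knows nothing about `EX_PORT_ATTR_HASH_TYPE` keeps working unchanged, which is the low-cost extension property the open schema is meant to give.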
SAI in Azure Cloud Switch
[Figure: software stack diagram. Labels recovered from the slide: Provision, Deployment, Automation; BGP, SNMP, LLDP, ACL, Other SDN Apps; Switch State Service; Sync Agent1, Sync Agent2, Sync Agent3; Switch Abstraction Layer (SAI); Switch ASIC SDK; User space; Operating System; Ethernet Interfaces; Switch ASIC Driver; Hardware; Hw Load Balancer; Switch ASIC.]
SAI Development
v0.9.0 available @github
https://github.com/opencomputeproject/OCP-Networking-ProjectCommunity-Contributions
Huge thanks to ocp!
Discussion @ [email protected]
Accepts patches via this mailing list or via github pull requests
Open question: can this mailing list also serve as the dev list?
Timeline
v0.9.1 End of the year
Fixes for current v0.9.0
v0.9.2 First quarter of the next year
LAG, STP, Packet tx/rx,
A release every 3-6 months
Until it becomes official (v1.x.x), the API is subject to change without
backward compatibility
Objects CRUD API based on attributes
Create object: create an object with all necessary
attributes. object id returned is used to reference
the object.
Certain objects such as FDB entry, L3 route and
VLAN objects do not use object ids; they have their
own keys.
typedef sai_status (*sai_create_object_name_fn) (
_Out_ sai_object_name_id_t *object_name_id,
_In_ int attr_count,
_In_ sai_object_name_attr_t *attrs,
_In_ sai_attribute_value_t *values);
Delete object: delete an object with the object id
typedef sai_status (*sai_delete_object_name_fn) (
_In_ sai_object_name_id_t object_name_id);
Set attributes: set a set of attributes for an object
typedef sai_status (*sai_set_object_name_attribute_fn) (
_In_ sai_object_name_id_t object_name_id,
_In_ int attr_count,
_In_ sai_object_name_attr_t *attrs,
_In_ sai_attribute_value_t *values);
Get attributes: get a set of attributes for an object
typedef sai_status (*sai_get_object_name_attribute_fn) (
_In_ sai_object_name_id_t object_name_id,
_In_ int attr_count,
_In_ sai_object_name_attr_t *attrs,
_Out_ sai_attribute_value_t *values);
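To make the four typedefs above concrete, here is a minimal in-memory mock that follows the same create/delete/set/get shape. It is a sketch, not the SAI implementation: the `ex_` names, the status codes, the two attributes, and the fixed-size table are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not SAI definitions. */
typedef int32_t ex_status_t;
#define EX_STATUS_SUCCESS          0
#define EX_STATUS_ITEM_NOT_FOUND (-1)
#define EX_STATUS_TABLE_FULL     (-2)
#define EX_STATUS_INVALID_ATTR   (-3)

typedef uint32_t ex_obj_id_t;
typedef enum { EX_OBJ_ATTR_MTU, EX_OBJ_ATTR_ADMIN_STATE, EX_OBJ_ATTR_MAX } ex_obj_attr_t;
typedef union { uint32_t u32; uint64_t u64; } ex_value_t;

#define EX_MAX_OBJECTS 8
static struct {
    int        in_use;
    ex_value_t v[EX_OBJ_ATTR_MAX];
} ex_table[EX_MAX_OBJECTS];

static int ex_attrs_valid(int n, const ex_obj_attr_t *attrs)
{
    for (int i = 0; i < n; i++)
        if ((int)attrs[i] < 0 || (int)attrs[i] >= EX_OBJ_ATTR_MAX)
            return 0;
    return 1;
}

static int ex_obj_exists(ex_obj_id_t id)
{
    return id < EX_MAX_OBJECTS && ex_table[id].in_use;
}

/* Create: take a free slot, apply the initial attributes, return the id. */
ex_status_t ex_create_obj(ex_obj_id_t *id, int attr_count,
                          const ex_obj_attr_t *attrs, const ex_value_t *values)
{
    assert(id != NULL);
    if (!ex_attrs_valid(attr_count, attrs))
        return EX_STATUS_INVALID_ATTR;
    for (ex_obj_id_t i = 0; i < EX_MAX_OBJECTS; i++) {
        if (ex_table[i].in_use)
            continue;
        memset(&ex_table[i], 0, sizeof ex_table[i]);
        ex_table[i].in_use = 1;
        for (int a = 0; a < attr_count; a++)
            ex_table[i].v[attrs[a]] = values[a];
        *id = i;
        return EX_STATUS_SUCCESS;
    }
    return EX_STATUS_TABLE_FULL;
}

/* Delete: release the slot referenced by the object id. */
ex_status_t ex_delete_obj(ex_obj_id_t id)
{
    if (!ex_obj_exists(id))
        return EX_STATUS_ITEM_NOT_FOUND;
    ex_table[id].in_use = 0;
    return EX_STATUS_SUCCESS;
}

/* Set: write attr_count (attribute, value) pairs, all or nothing. */
ex_status_t ex_set_obj_attribute(ex_obj_id_t id, int attr_count,
                                 const ex_obj_attr_t *attrs, const ex_value_t *values)
{
    if (!ex_obj_exists(id))
        return EX_STATUS_ITEM_NOT_FOUND;
    if (!ex_attrs_valid(attr_count, attrs))
        return EX_STATUS_INVALID_ATTR;
    for (int a = 0; a < attr_count; a++)
        ex_table[id].v[attrs[a]] = values[a];
    return EX_STATUS_SUCCESS;
}

/* Get: read attr_count attributes into the caller's values array. */
ex_status_t ex_get_obj_attribute(ex_obj_id_t id, int attr_count,
                                 const ex_obj_attr_t *attrs, ex_value_t *values)
{
    if (!ex_obj_exists(id))
        return EX_STATUS_ITEM_NOT_FOUND;
    if (!ex_attrs_valid(attr_count, attrs))
        return EX_STATUS_INVALID_ATTR;
    for (int a = 0; a < attr_count; a++)
        values[a] = ex_table[id].v[attrs[a]];
    return EX_STATUS_SUCCESS;
}
```

The parallel `attrs`/`values` arrays mirror the slide's signatures; validating the whole list before writing keeps a failed set from leaving an object half-updated, which is one possible design choice rather than stated SAI behavior.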
Attribute values and additional API
sai_attribute_value_t is
defined as a union of all basic sai
types
Current issue: additional APIs are
also provided when other verbs
are needed such as add/remove
members
typedef union {
sai_uint8_t u8;
sai_uint16_t u16;
sai_uint32_t u32;
sai_uint64_t u64;
sai_switch_id_t switch_id;
sai_port_id_t port_id;
sai_port_phy_id_t port_phy_id;
sai_vlan_id_t vlan_id;
sai_vr_id_t vr_id;
sai_router_interface_id_t rif_id;
sai_cos_t cos;
sai_mac_t mac;
sai_ip_t ip;
sai_ipv6_t ip6;
sai_mpls_label_t mplslabel;
} sai_attribute_value_t;
SAI objects
SAI provides the interface to manage different types of objects in the
ASIC forwarding pipeline.
switch, port, lag
vlan, fdb, stp
virtual router, router interface, neighbor table, next hop, next hop groups, l3
routes
ACL
QoS, buffer
Port mirroring, ERSPAN, sflow
Switch/port initialization
sai_api_initialize(0, NULL);
sai_api_query(SAI_API_SWITCH, &sai_switch_api);
sai_switch_api->initialize_switch(0, NULL, NULL,
&switch_notifications);
sai_api_query(SAI_API_PORT, &sai_port_api);
int port_count = 128;
sai_port_id_t ports[128];
sai_port_api->get_ports(&port_count, ports);
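The query-then-call pattern in the snippet above can be mocked as a lookup that hands back a table of function pointers per API id. The sketch below is an assumption-laden illustration: the `ex_` names, the four-port mock, and the convention that `count` carries capacity in and the number of ports out are hypothetical, not taken from the SAI headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical API ids and method tables, not SAI definitions. */
typedef enum { EX_API_SWITCH, EX_API_PORT, EX_API_MAX } ex_api_t;

typedef struct {
    int (*initialize_switch)(void);
} ex_switch_api_t;

typedef struct {
    /* count: in = capacity of ports[], out = number of ids written */
    int (*get_ports)(int *count, uint32_t *ports);
} ex_port_api_t;

#define EX_NUM_PORTS 4
static int ex_switch_ready = 0;

static int ex_initialize_switch(void)
{
    ex_switch_ready = 1;
    return 0;
}

static int ex_get_ports(int *count, uint32_t *ports)
{
    assert(count != NULL && ports != NULL);
    if (!ex_switch_ready)
        return -1;                      /* switch not initialized yet */
    int n = *count < EX_NUM_PORTS ? *count : EX_NUM_PORTS;
    for (int i = 0; i < n; i++)
        ports[i] = (uint32_t)i;
    *count = n;
    return 0;
}

static const ex_switch_api_t ex_switch_api = { ex_initialize_switch };
static const ex_port_api_t   ex_port_api   = { ex_get_ports };

/* Returns 0 and stores the method table for the requested API,
 * or -1 if the adapter does not implement that API. */
int ex_api_query(ex_api_t api, const void **method_table)
{
    assert(method_table != NULL);
    switch (api) {
    case EX_API_SWITCH: *method_table = &ex_switch_api; return 0;
    case EX_API_PORT:   *method_table = &ex_port_api;   return 0;
    default:            return -1;
    }
}
```

The host only ever links against the query function; everything else is reached through the returned tables, which is what lets different adapters be swapped underneath the same host code.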
VLAN
sai_api_query(SAI_API_VLAN, &sai_api_vlan);
sai_vlan_id_t vlan_id = 100;
sai_api_vlan->create_vlan(vlan_id);
int port_count = 2;
sai_vlan_port_t vlan_ports[2] = {
{ .port_id = 0, .tagging_mode =
SAI_VLAN_PORT_TAGGED },
{ .port_id = 1, .tagging_mode =
SAI_VLAN_PORT_UNTAGGED } };
sai_api_vlan->add_ports_to_vlan(vlan_id, port_count,
vlan_ports);
L3 router interface
sai_api_query(SAI_API_ROUTER_INTERFACE, &sai_rif_api);
sai_router_interface_id_t rif_id;
sai_router_interface_attr_t attrs[4];
sai_attribute_value_t values[4];
attrs[0] = SAI_ROUTER_INTERFACE_ATTR_VR_ID;
attrs[1] = SAI_ROUTER_INTERFACE_ATTR_TYPE;
attrs[2] = SAI_ROUTER_INTERFACE_ATTR_VLAN_ID;
attrs[3] = SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS;
values[0].vr_id = vr_id;
values[1].u32 = SAI_ROUTER_INTERFACE_TYPE_VLAN;
values[2].vlan_id = vlan_id;
values[3].mac = src_mac;
sai_rif_api->create_router_interface(&rif_id, 4, attrs, values);
L3 neighbor
sai_api_query(SAI_API_NEIGHBOR, &sai_nb_api);
sai_ip_t nb_ip;
sai_neighbor_attr_t attrs[3];
sai_attribute_value_t values[3];
attrs[0] = SAI_NEIGHBOR_ATTR_VR_ID;
attrs[1] = SAI_NEIGHBOR_ATTR_ROUTER_INTERFACE_ID;
attrs[2] = SAI_NEIGHBOR_ATTR_DST_MAC_ADDRESS;
values[0].vr_id = vr_id;
values[1].rif_id = rif_id;
values[2].mac = dst_mac;
sai_nb_api->create_neighbor(nb_ip, 3, attrs, values);
L3 next hop
sai_api_query(SAI_API_NEXT_HOP, &sai_nh_api);
sai_next_hop_id_t nh_id;
sai_next_hop_attr_t attrs[3];
sai_attribute_value_t values[3];
attrs[0] = SAI_NEXT_HOP_ATTR_TYPE;
attrs[1] = SAI_NEXT_HOP_ATTR_DST_IP;
attrs[2] = SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID;
values[0].u32 = SAI_NEXT_HOP_IP;
values[1].ip = dst_ip;
values[2].rif_id = rif_id;
sai_nh_api->create_next_hop(&nh_id, 3, attrs, values);
L3 next hop group
sai_api_query(SAI_API_NEXT_HOP_GROUP, &sai_nhg_api);
sai_next_hop_group_id_t nhg_id;
sai_next_hop_group_attr_t attrs[1];
sai_attribute_value_t values[1];
attrs[0] = SAI_NEXT_HOP_GROUP_ATTR_TYPE;
values[0].u32 = SAI_NEXT_HOP_GROUP_ECMP;
sai_nhg_api->create_next_hop_group(&nhg_id, 1, attrs, values);
int nexthop_count = 2;
sai_next_hop_id_t nexthops[2];
nexthops[0] = nh_id;
nexthops[1] = nh_id; /* in practice, a second, distinct next hop id */
sai_nhg_api->add_next_hop_to_group(nhg_id, nexthop_count, nexthops);
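What an ECMP next hop group does once members are added can be sketched in software: hash the flow, then pick a member by the hash. This is an illustration only. Real switch ASICs compute the hash in hardware with their own functions; the FNV-1a hash, the flow fields, and all `ex_` names here are assumptions, not SAI behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NHG_MAX_MEMBERS 16

typedef uint32_t ex_next_hop_id_t;

typedef struct {
    ex_next_hop_id_t members[EX_NHG_MAX_MEMBERS];
    int              count;
} ex_nhg_t;

/* Hypothetical flow key: the classic 5-tuple. */
typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
} ex_flow_t;

/* Append n next hops to the group; -1 if they do not fit. */
int ex_nhg_add(ex_nhg_t *g, int n, const ex_next_hop_id_t *ids)
{
    assert(g != NULL);
    if (n < 0 || g->count + n > EX_NHG_MAX_MEMBERS)
        return -1;
    for (int i = 0; i < n; i++)
        g->members[g->count++] = ids[i];
    return 0;
}

/* Fold the low nbytes bytes of word into a running FNV-1a hash. */
static uint32_t ex_fnv1a(uint32_t h, uint32_t word, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (word >> (8 * i)) & 0xffu;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

uint32_t ex_flow_hash(const ex_flow_t *f)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    h = ex_fnv1a(h, f->src_ip, 4);
    h = ex_fnv1a(h, f->dst_ip, 4);
    h = ex_fnv1a(h, f->src_port, 2);
    h = ex_fnv1a(h, f->dst_port, 2);
    h = ex_fnv1a(h, f->protocol, 1);
    return h;
}

/* Pick the member for a flow; -1 if the group is empty. */
int ex_nhg_select(const ex_nhg_t *g, const ex_flow_t *f, ex_next_hop_id_t *out)
{
    assert(g != NULL && f != NULL && out != NULL);
    if (g->count == 0)
        return -1;
    *out = g->members[ex_flow_hash(f) % (uint32_t)g->count];
    return 0;
}
```

Because the choice depends only on the flow key, packets of one flow always take the same next hop, while different flows spread across the members.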
L3 route
sai_api_query(SAI_API_ROUTE, &sai_route_api);
sai_route_attr_t attrs[2];
sai_attribute_value_t values[2];
attrs[0] = SAI_ROUTE_ATTR_NEXT_HOP_TYPE;
attrs[1] = SAI_ROUTE_ATTR_NEXT_HOP_GROUP_ID;
values[0].u32 = SAI_ROUTE_NEXT_HOP_GROUP;
values[1].nhg_id = nhg_id;
sai_ip_prefix_t ip_prefix;
sai_route_api->create_route(ip_prefix, 2, attrs,
values);
ACL
sai_api_query(SAI_API_ACL, &sai_acl_api);
sai_acl_table_id_t acl_table_id;
sai_acl_table_attr_t attrs[1];
sai_attribute_value_t values[1];
sai_acl_field_t fields[2] = { SAI_ACL_FIELD_SRC_IP, SAI_ACL_FIELD_DST_IP };
attrs[0] = SAI_ACL_ATTR_STAGE;
values[0].u32 = SAI_ACL_STAGE_INGRESS;
sai_acl_api->create_acl_table(&acl_table_id, 1, attrs, values, 2, 2,
fields);
sai_acl_id_t acl_id;
sai_acl_attr_t acl_attrs[2] = { SAI_ACL_ATTR_TABLE_ID, SAI_ACL_ATTR_PRIORITY
};
sai_attribute_value_t acl_values[2] = { { .u32 = acl_table_id }, { .u32 = 100 } };
/* filters and actions are assumed to be populated elsewhere */
sai_acl_api->create_acl(&acl_id, 2, acl_attrs, acl_values, 2, filters, 1, actions);
Discussions
Issue types
API design
BULK API, Async API support
API extensibility
SAI API to support
Packet tx/rx, fault management, trunk, OAM,
Multi-asic support
L2/L3
Port functionality
Misc
API Design
BULK APIs (Cisco, Kamala)
Example: program 100K routes in a second
Synchronous Vs. Asynchronous calls (Cisco, Kamala)
Example: would a notification for a queue exceeding its watermark be
sync/async?
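One shape a bulk API could take is a single call that programs many routes and reports a status per entry, so one bad route does not hide the result of the others. The sketch below assumes that shape purely for discussion: the `ex_` names, status codes, and the tiny route table are hypothetical, and nothing here is part of the SAI proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_ROUTES        4
#define EX_STATUS_SUCCESS    0
#define EX_STATUS_INVALID  (-1)
#define EX_STATUS_FULL     (-2)

typedef struct { uint32_t addr; uint8_t len; } ex_prefix_t;
typedef struct { ex_prefix_t prefix; uint32_t nhg_id; } ex_route_t;

static ex_route_t ex_routes[EX_MAX_ROUTES];
static int        ex_route_count = 0;

/* Single-entry create, the building block of the bulk call. */
int ex_create_route(ex_prefix_t prefix, uint32_t nhg_id)
{
    if (prefix.len > 32)
        return EX_STATUS_INVALID;       /* IPv4 prefix length out of range */
    if (ex_route_count >= EX_MAX_ROUTES)
        return EX_STATUS_FULL;
    ex_routes[ex_route_count].prefix = prefix;
    ex_routes[ex_route_count].nhg_id = nhg_id;
    ex_route_count++;
    return EX_STATUS_SUCCESS;
}

/* Bulk create: attempts every entry, records one status per entry,
 * and returns the number of entries that failed. */
int ex_bulk_create_routes(int count, const ex_prefix_t *prefixes,
                          const uint32_t *nhg_ids, int *statuses)
{
    assert(prefixes != NULL && nhg_ids != NULL && statuses != NULL);
    int failures = 0;
    for (int i = 0; i < count; i++) {
        statuses[i] = ex_create_route(prefixes[i], nhg_ids[i]);
        if (statuses[i] != EX_STATUS_SUCCESS)
            failures++;
    }
    return failures;
}

int ex_route_table_size(void)
{
    return ex_route_count;
}
```

A real adapter would presumably batch the hardware writes instead of looping over the single-entry call; the loop here only pins down the calling convention. Whether a bulk call should stop at the first failure or continue, as this sketch does, is exactly the kind of question the discussion item raises.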
SAI API support
Packet Receive/Send APIs (Tina, Bruce)
Request to define:
How the Control Plane Stack (CPS) receives/transmits data plane packets
Packet types and formats, in TLV style.
Fault Management APIs (Tina)
Interrupts and event notifications to the CPS
Define failure level (serious, general, reminder)
Define corresponding process actions (Alarm, reset chip, reset device, etc)
Trunk API (Tina, Furst)
Link bonding
OAM API (Tina)
BFD, Eth-OAM, and Y.1731 have become basic switch functions
Support L2 tunneling (L2 over NVGRE/VXLAN) (Tao Gu, Phanidhar)
Add provisions for L2 tunneling into fdb entry, i.e. to add next_hop_group_id to sai_fdb_entry_t
SPAN/RSPAN and policer support is crucial and would be good to have in the first cut.
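The request for packet metadata "in TLV style" can be illustrated with a small encoder and lookup. The wire layout chosen here (1-byte type, 2-byte big-endian length, then the value) and the type codes are assumptions for the sake of the example; the proposal does not define a format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV type codes for packet rx/tx metadata. */
#define EX_TLV_INGRESS_PORT 0x01
#define EX_TLV_PAYLOAD      0x02
#define EX_TLV_HDR          3   /* 1 byte type + 2 bytes length */

/* Appends one TLV at offset off. Returns the new offset,
 * or 0 if the record does not fit into buf[cap]. */
size_t ex_tlv_put(uint8_t *buf, size_t cap, size_t off,
                  uint8_t type, const void *val, uint16_t len)
{
    assert(buf != NULL);
    if (off > cap || cap - off < (size_t)EX_TLV_HDR + len)
        return 0;
    buf[off]     = type;
    buf[off + 1] = (uint8_t)(len >> 8);
    buf[off + 2] = (uint8_t)(len & 0xff);
    if (len > 0)
        memcpy(buf + off + EX_TLV_HDR, val, len);
    return off + EX_TLV_HDR + len;
}

/* Finds the first TLV of the given type in buf[0..total).
 * Returns 0 and sets *val and *len, or -1 if the type is absent or a
 * truncated record is encountered before it. Unknown types are
 * skipped, which is what makes the format extensible. */
int ex_tlv_find(const uint8_t *buf, size_t total, uint8_t type,
                const uint8_t **val, uint16_t *len)
{
    assert(buf != NULL && val != NULL && len != NULL);
    size_t off = 0;
    while (total - off >= EX_TLV_HDR) {
        uint16_t l = (uint16_t)((buf[off + 1] << 8) | buf[off + 2]);
        if (total - off - EX_TLV_HDR < l)
            return -1;                  /* record runs past the buffer */
        if (buf[off] == type) {
            *val = buf + off + EX_TLV_HDR;
            *len = l;
            return 0;
        }
        off += (size_t)EX_TLV_HDR + l;
    }
    return -1;
}
```

A receiver built this way can ignore metadata types it does not understand, so new packet attributes can be added without breaking older control plane code.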
Host interface tx/rx
Still need to support a standard tx/rx API.
However, most user applications use sockets to send/receive packets.
[Figure: host interface diagram. Labels recovered from the slide: TCP/IP; Vlan 4, Vlan 2, LAG 1, LAG 2, p1, p2, p6, p7, p8; SAI driver; Router (Vlan 2, Vlan 4); Switch (p1, p2, LAG 1, p3, p4, p5, LAG 2, p8).]
API Extensibility
Versioning APIs: TLV-based structures for backward
compatibility/extensibility (Kamala)
APIs: create/delete/set/get; Input: Attribute list
New requirements? New APIs!
New functionality? New attribute id!
The adapter implementation decides which versions (or all of them) to support.
Lacking statements around extensibility framework, specifically
vendor extensions
Lacking details on how to create custom attributes
Vendor extensions are a must have
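One way the "new functionality, new attribute id" rule could coexist with vendor extensions is to reserve a numeric range for vendor-defined ids and have the adapter report unsupported ids explicitly, so the host can probe and fall back. The sketch below is a discussion aid only: the range base, status code, attribute names, and the single vendor attribute are all hypothetical and not defined by the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_STATUS_SUCCESS              0
#define EX_STATUS_ATTR_NOT_SUPPORTED (-1)

/* Hypothetical split: ids below the base are standard,
 * ids at or above it belong to vendor extensions. */
#define EX_ATTR_VENDOR_BASE 0x10000000u

#define EX_SWITCH_ATTR_ECMP_HASH   0u
#define EX_SWITCH_ATTR_AGING_TIME  1u
#define EX_VENDOR_ATTR_FAST_REROUTE (EX_ATTR_VENDOR_BASE + 0u)

int ex_attr_is_vendor(uint32_t attr_id)
{
    return attr_id >= EX_ATTR_VENDOR_BASE;
}

/* State of a mock adapter that implements two standard attributes
 * and one vendor attribute. */
static uint32_t ex_ecmp_hash, ex_aging_time, ex_fast_reroute;

static uint32_t *ex_slot(uint32_t attr_id)
{
    switch (attr_id) {
    case EX_SWITCH_ATTR_ECMP_HASH:    return &ex_ecmp_hash;
    case EX_SWITCH_ATTR_AGING_TIME:   return &ex_aging_time;
    case EX_VENDOR_ATTR_FAST_REROUTE: return &ex_fast_reroute;
    default:                          return NULL;
    }
}

/* Unknown ids, standard or vendor, are rejected rather than ignored. */
int ex_adapter_set(uint32_t attr_id, uint32_t value)
{
    uint32_t *slot = ex_slot(attr_id);
    if (slot == NULL)
        return EX_STATUS_ATTR_NOT_SUPPORTED;
    *slot = value;
    return EX_STATUS_SUCCESS;
}

int ex_adapter_get(uint32_t attr_id, uint32_t *value)
{
    assert(value != NULL);
    const uint32_t *slot = ex_slot(attr_id);
    if (slot == NULL)
        return EX_STATUS_ATTR_NOT_SUPPORTED;
    *value = *slot;
    return EX_STATUS_SUCCESS;
}
```

Because the function signatures never change, a host built against an older attribute list still links and runs; it simply never sends the newer ids.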
Multi-chip support
Different ASICs in the same box (Furst)
How will SAI handle 2 ASICs in the same box?
What happens if they are the same version of silicon? How does SAI distinguish between them?
Handling multiple chips with same roles and different roles will also need to be
accommodated. This probably can be a future item. (Phanidhar)
L2/L3
Routing protocol support mandatory (Furst)
Define mandatory in this context
What happens if a host adapter/underlying ASIC does not support routing?
SAI should be agnostic to the Spanning Tree Protocol variant (Tao)
STP modes differ only in the control plane protocol
Merge nbr table and nhop table in Adapter (Tao)
Separate tables are oriented towards software routing; ASICs have a consolidated table. Merge them in
the Adapter and have the Adapter Host maintain separate tables and their mapping.
Router function is to manage virtual routers, which is more of a higher level task
(Tao)
Move such high level task from SAI (Adapter) into Adapter Host
L2/L3
Complete L2 Multicast Switch functionality (Tina)
MAC limit, multicast action combo (drop, forward, send to CPU)
SAI only supports L2 unicast.
The L2 multicast miss is supported in the SAI via sai_switch_fdb_miss_action_t to
define the behavior, which contains drop, forward, trap (send to cpu) and log.
The fdb_entry assumes the mac_address is the dest MAC (Tao)
Some use cases need action on src MAC (such as drop action)
Add flags to fdb_entry in order to cover these use cases.
Need SAI APIs for enabling/disabling L2/L3 control protocols on a per-port basis.
Not all chips capture control protocol PDUs using generic
ACL TCAMs (Phanidhar)
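The suggestion to add flags to the fdb entry so that an action can apply to the source MAC can be sketched as follows. The struct layout, the flag name, and the lookup order (source-flagged entries first, then the destination lookup, then the miss action) are assumptions for illustration; `sai_fdb_entry_t` itself is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag: match this entry against the source MAC
 * instead of the destination MAC. */
#define EX_FDB_FLAG_MATCH_SRC 0x1u

typedef enum { EX_ACTION_FORWARD, EX_ACTION_DROP, EX_ACTION_TRAP } ex_action_t;

typedef struct {
    uint8_t     mac[6];
    uint16_t    vlan_id;
    uint32_t    flags;
    ex_action_t action;
} ex_fdb_entry_t;

static int ex_fdb_match(const ex_fdb_entry_t *e, uint16_t vlan,
                        const uint8_t mac[6], int want_src)
{
    int is_src = (e->flags & EX_FDB_FLAG_MATCH_SRC) != 0;
    return is_src == want_src && e->vlan_id == vlan &&
           memcmp(e->mac, mac, 6) == 0;
}

/* Decide what to do with a frame: source-flagged entries are checked
 * first, then the usual destination lookup, else the miss action. */
ex_action_t ex_fdb_classify(const ex_fdb_entry_t *tbl, int n, uint16_t vlan,
                            const uint8_t src[6], const uint8_t dst[6],
                            ex_action_t miss_action)
{
    assert(tbl != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if (ex_fdb_match(&tbl[i], vlan, src, 1))
            return tbl[i].action;
    for (int i = 0; i < n; i++)
        if (ex_fdb_match(&tbl[i], vlan, dst, 0))
            return tbl[i].action;
    return miss_action;
}
```

With a flag, the entry key stays the same (MAC plus VLAN), and only the interpretation of the MAC changes, which keeps the existing destination-MAC behavior as the default.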
Port functionality
Request: an interface to query SFP information. Some vendors would
like to control the ports based on SFP data. (Ramachandran)
Enhancement for additional port parameters is being worked on.
Do we need SAI APIs for handling pluggable port modules, meaning
SAI APIs to dynamically add ports? (Phanidhar)
Miscellaneous
State the intended audience; add page numbers (Ramachandran)
Add figures for interaction between adapter module and adapter host.
(Ramachandran)
Warm Restart (Phanidhar)
Warm restart not well defined at this point
Maybe best to remove from v1.0 scope until better defined
Ordering (Furst)
During adapter start/restart, what is the guaranteed order of object creation?
Currently SAI states VLANs first; the whole ordering should be enumerated, with dependencies
outlined.
Process for other vendors to be added as a reference platform (Omar Sultan)
SAI/HAL should not need to reference HW, but instead have a consistent and predictable behavior
against the SAI/HAL, regardless of the HW
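The ordering question raised under Miscellaneous (enumerate the creation order with dependencies outlined) amounts to a topological sort over object types. The sketch below derives an order from a dependency matrix. The dependencies listed are an assumption inferred from the attribute lists in the earlier examples (for instance, a router interface referencing a virtual router and a VLAN); they are not an official SAI ordering, and all `ex_`/`EX_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object types, loosely following the earlier slides. */
enum { EX_VLAN, EX_VR, EX_RIF, EX_NEIGHBOR, EX_NEXT_HOP, EX_NHG, EX_ROUTE, EX_NTYPES };

/* dep[a][b] != 0 means: type a can only be created after type b. */
void ex_default_deps(uint8_t dep[EX_NTYPES][EX_NTYPES])
{
    for (int a = 0; a < EX_NTYPES; a++)
        for (int b = 0; b < EX_NTYPES; b++)
            dep[a][b] = 0;
    dep[EX_RIF][EX_VR]        = 1;  /* router interface needs a virtual router */
    dep[EX_RIF][EX_VLAN]      = 1;  /* ... and, for VLAN interfaces, a VLAN */
    dep[EX_NEIGHBOR][EX_RIF]  = 1;
    dep[EX_NEXT_HOP][EX_RIF]  = 1;
    dep[EX_NHG][EX_NEXT_HOP]  = 1;
    dep[EX_ROUTE][EX_NHG]     = 1;
}

/* Fills order[] with a valid creation order (lowest type index first
 * among the ready ones). Returns 0, or -1 if the dependencies are cyclic. */
int ex_creation_order(uint8_t dep[EX_NTYPES][EX_NTYPES], int order[EX_NTYPES])
{
    assert(dep != NULL && order != NULL);
    uint8_t placed[EX_NTYPES] = {0};
    for (int pos = 0; pos < EX_NTYPES; pos++) {
        int pick = -1;
        for (int t = 0; t < EX_NTYPES && pick < 0; t++) {
            if (placed[t])
                continue;
            int ready = 1;
            for (int d = 0; d < EX_NTYPES; d++)
                if (dep[t][d] && !placed[d])
                    ready = 0;
            if (ready)
                pick = t;
        }
        if (pick < 0)
            return -1;              /* nothing is ready: cyclic dependency */
        placed[pick] = 1;
        order[pos] = pick;
    }
    return 0;
}
```

Writing the dependencies down as data, rather than as prose such as "VLANs first", lets both adapter and host check a restart sequence mechanically.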