Hardware accelerating Linux network functions
Roopa Prabhu, Wilson Kok

Proceedings of netdev 0.1, Feb 14-17, 2015, Ottawa, On, Canada


Agenda
● Recap: offload models, offload drivers
● Introduction to switch ASIC hardware
● L2 offload to switch ASIC
○ MAC learning, ageing
○ STP handling
○ IGMP snooping
○ VXLAN
● L3 offload to switch ASIC



Offload models ...
● Single consistent netlink based UAPI
● Single kernel offload API to offload to variety of hardware (NICs, switch ASICs, ..)

rtnetlink API:
    bridge vlan add
    bridge fdb add

[Diagram: the rtnetlink API path and the offload API path. A kernel bridge with ports port1 .. portn holds an FDB that is kept in sync with hardware. The hardware side is either a switch ASIC (FDB, CPU, MEM) or NICs (NIC1, NIC2) with their own FDB.]
The bigger picture...

[Diagram: three layers.
user: iproute2, bridge, brctl, tc, nftables, OVSdb, mstpd, snmpd, lldpd, quagga, bird
kernel: swp1 .. swpN, bonds, bridges, VXLAN; routing tables, ARP tables, bridge FDB/MDB, netfilter tables, tc; hw driver
HW: routing tables, ARP tables, bridge FDB/MDB, ACLs; CPU, MEM]


HW offload driver (kernel)

[Diagram: in user space, the mstp daemon and the routing daemon use the rtnetlink API on br0 and on the switch ports swp1, swp2 .. swpN. In the kernel, the FIB and the bridge br0 FDB/MDB reach the hw driver through the switchdev offload API:

    netdev_ops {
        .ndo_fdb_add/del
        .ndo_fib_add/del
    }

The hw driver programs the hardware tables (routing tables, ARP tables, bridge FDB/MDB, ACLs) in the ASIC, shown next to CPU and MEM.]


HW offload driver (user space)

[Diagram: in user space, the mstp daemon and the routing daemon use the rtnetlink API on br0 and on the switch ports swp1, swp2 .. swpN. The hw driver also runs in user space as an rtnetlink notifications listener: it follows the kernel FIB and the bridge br0 FDB/MDB and programs the hardware tables (routing tables, ARP tables, bridge FDB/MDB, ACLs) in the ASIC, shown next to CPU and MEM.]


switch hardware

switch driver:
● Creates netdevs for front panel ports
● Port netdevs only see traffic forwarded to the CPU port
● Sets hardware offload flag NETIF_F_HW_SWITCH_OFFLOAD on netdevs

[Diagram: kernel netdevs swp1, swp2, swp3 .. swpn (one netdev for each front panel port) sit above the switch driver, which connects through the cpu port to the switch hardware and its front panel ports 1, 2, 3 .. n.]


ip link show
(eth0 is the management port; the swp* interfaces are switch ports)

# ip link show
1: lo: <LOOPBACK> mtu 16436 qdisc noqueue state DOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:e0:ec:27:4e:b6 brd ff:ff:ff:ff:ff:ff
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 500
    link/ether 44:38:39:00:27:ac brd ff:ff:ff:ff:ff:ff
4: swp2: <BROADCAST,MULTICAST> mtu 9000 qdisc pfifo_fast state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:b8 brd ff:ff:ff:ff:ff:ff
[snip]
55: swp53: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:f7 brd ff:ff:ff:ff:ff:ff
56: swp54s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:fb brd ff:ff:ff:ff:ff:ff
57: swp54s1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:fc brd ff:ff:ff:ff:ff:ff
58: swp54s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:fd brd ff:ff:ff:ff:ff:ff
59: swp54s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:fe brd ff:ff:ff:ff:ff:ff
ethtool on switch port

$ ethtool swp1
Settings for swp1:
    Supported ports: [ FIBRE ]
    Supported link modes:   1000baseT/Full
                            10000baseT/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Speed: 10000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: off
    Current message level: 0x00000000 (0)
    Link detected: yes
Creating a hardware accelerated Linux bridge device
# ip link add br0 type bridge

# ip link set dev swp1 master br0

# ip link set dev swp2 master br0

# bridge vlan add vid 10-20 dev swp1

# bridge vlan add vid 20-30 dev swp2
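
A small sketch of what the two `bridge vlan add` commands above imply for VLAN membership. It only models the VID ranges in Python; the helper name `parse_vids` is made up for this illustration and is not part of iproute2 or the kernel.

```python
def parse_vids(spec: str) -> set[int]:
    """Expand a VID spec such as "10-20" or "5" into a set of VLAN ids."""
    lo, _, hi = spec.partition("-")
    first, last = int(lo), int(hi or lo)
    if not (1 <= first <= last <= 4094):
        raise ValueError(f"invalid VLAN range: {spec!r}")
    return set(range(first, last + 1))


# Port membership as configured by the two commands above.
port_vlans = {
    "swp1": parse_vids("10-20"),
    "swp2": parse_vids("20-30"),
}

# VLANs in which a frame can be bridged between swp1 and swp2.
shared = port_vlans["swp1"] & port_vlans["swp2"]
print(sorted(shared))
```

With these ranges the two ports share only VLAN 20, so that is the only VLAN in which traffic can be bridged between swp1 and swp2.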



Bonds as bridge ports
● Switch ASICs support link aggregation
● bonding driver LAG config is offloaded to the switch ASIC
● fdb and vlan offloads go through the bonding driver

rtnetlink API:
    bridge vlan add
    bridge fdb add

[Diagram: the rtnetlink API path and the switchdev offload API path. The kernel bridge (FDB in sync with hw) has ports port1, port2 and bond0; the bonding driver aggregates portn-1 and portn into bond0. In the switch ASIC (FDB, CPU, MEM), portn-1 and portn form the matching LAG.]
Bridging hardware offload: packet path

[Diagram: a kernel bridge over swp1 and swp2 above the switch ASIC, where swp1 and swp2 share a VLAN. Three paths are shown: known unicast (transit), BUM*, and system generated / destined to system.]
* BUM: broadcast, unknown unicast and multicast

● Known unicast traffic not destined to the system is forwarded only in hardware
● BUM traffic is forwarded in hardware, plus a copy MAY be sent to the kernel
● BUM traffic that reaches the kernel should not be forwarded again, otherwise receivers get duplicate copies (one from hardware, one from software)
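
The three packet-path rules can be sketched as a toy model. Everything here is an assumption made for illustration: the function names, the `copy_bum_to_cpu` switch and the `already_forwarded_by_hw` flag are hypothetical and do not correspond to a kernel or ASIC API.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    dst_mac: str
    in_port: str


def is_group_address(mac: str) -> bool:
    # The least significant bit of the first octet marks broadcast/multicast.
    return int(mac.split(":")[0], 16) & 1 == 1


def hw_forward(frame, fdb, ports, system_macs, copy_bum_to_cpu=True):
    """Return (ports the ASIC sends the frame to, whether the CPU gets a copy)."""
    if frame.dst_mac in system_macs:
        return [], True  # destined to the system: CPU only
    if not is_group_address(frame.dst_mac) and frame.dst_mac in fdb:
        out = fdb[frame.dst_mac]
        # Known unicast: forwarded only in hardware, no CPU copy.
        return ([out] if out != frame.in_port else []), False
    # BUM: flood in hardware, optionally copy to the CPU.
    flood = [p for p in ports if p != frame.in_port]
    return flood, copy_bum_to_cpu


def kernel_forward(frame, ports, already_forwarded_by_hw):
    """Software flood, skipped when hardware has already forwarded the frame."""
    if already_forwarded_by_hw:
        return []  # avoid duplicate copies
    return [p for p in ports if p != frame.in_port]


ports = ["swp1", "swp2", "swp3"]
fdb = {"00:00:00:00:00:0b": "swp2"}
system_macs = {"00:00:00:00:00:01"}

unicast = hw_forward(Frame("00:00:00:00:00:0b", "swp1"), fdb, ports, system_macs)
bum = hw_forward(Frame("ff:ff:ff:ff:ff:ff", "swp1"), fdb, ports, system_macs)
local = hw_forward(Frame("00:00:00:00:00:01", "swp1"), fdb, ports, system_macs)
resend = kernel_forward(Frame("ff:ff:ff:ff:ff:ff", "swp1"), ports, True)
```

The point of the last call is that the copy delivered to the kernel carries enough information to suppress a second, software flood.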



Bridging hardware offload: fdb learn

[Diagram: the ASIC raises hw events (learn/move), for example 00:11:22:33:44:55 vlan 10 intf_id 9876. The switch driver turns the event into an fdb add/update (00:11:22:33:44:55, br0, swp2) on the kernel bridge br0 FDB/MDB, and a notification goes to user space over rtnetlink. br0 has ports swp1, swp2 .. swpN.]


Bridging hardware offload: learning in HW
● Turn off learning in the bridge driver
● The switch driver listens to learn notifications from hardware
● It converts the hardware interface id and vlan to the kernel ifindex of the bridge port (and vlan) and of the bridge
● It sends a netlink fdb update to the kernel (userspace driver) or calls the bridge driver's learn sync switchdev API (kernel driver)
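
A minimal sketch of the translation step, assuming the driver keeps lookup tables from hardware interface ids to bridge ports. The table contents, class names and `translate` function are hypothetical, and interface names stand in for kernel ifindexes to keep the example readable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HwLearnEvent:
    mac: str
    vlan: int
    intf_id: int  # ASIC-specific interface id


@dataclass(frozen=True)
class FdbUpdate:
    mac: str
    vlan: int
    port: str
    bridge: str


# Hypothetical driver state: hardware interface id -> bridge port -> bridge.
HW_INTF_TO_PORT = {9876: "swp2"}
PORT_TO_BRIDGE = {"swp1": "br0", "swp2": "br0"}


def translate(event: HwLearnEvent) -> Optional[FdbUpdate]:
    """Map a hardware learn/move event to the fdb update sent to the bridge."""
    port = HW_INTF_TO_PORT.get(event.intf_id)
    if port is None:
        return None  # unknown hardware interface: nothing to sync
    bridge = PORT_TO_BRIDGE.get(port)
    if bridge is None:
        return None  # port is not a bridge member
    return FdbUpdate(event.mac, event.vlan, port, bridge)


update = translate(HwLearnEvent("00:11:22:33:44:55", 10, 9876))
```

A userspace driver would then send `update` as a netlink fdb message, while a kernel driver would hand it to the bridge through the learn sync call; neither delivery path is modelled here.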



Bridging hardware offload: kernel ageing

[Diagram: the switch driver gets the fdb hit status from the ASIC and passes it to the kernel bridge br0 FDB/MDB as an fdb update. When the bridge ages out an entry, the fdb delete goes to the switch driver and on to the ASIC. User space sees br0 and swp1, swp2 .. swpN over rtnetlink.]


Bridging hardware offload: hardware ageing

[Diagram: the ASIC ages out entries itself. The fdb delete is passed through the switch driver to the kernel bridge br0 FDB/MDB. User space sees br0 and swp1, swp2 .. swpN over rtnetlink.]


Bridging hardware offload: ageing
With hardware offload the bridge driver very seldom sees packets, so the FDB age in the kernel is not up to date.
Hardware ageing
● bridge driver should not do ageing if hardware is doing it
● fdb show will need to get the age from hardware during 'show', or needs a periodic age update from the switch driver
Kernel ageing
● definitely needs a periodic age update from the switch driver
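
The kernel-ageing case can be sketched as follows, assuming the switch driver periodically reports which entries the hardware saw traffic for. The class and method names are hypothetical; the 300 second value is the Linux bridge default ageing time.

```python
AGEING_TIME = 300.0  # seconds, Linux bridge default


class KernelFdb:
    """Toy FDB whose entries are aged by the kernel, refreshed from hardware."""

    def __init__(self):
        self.last_used = {}  # (mac, vlan) -> time of last refresh

    def learn(self, key, now):
        self.last_used[key] = now

    def refresh_from_hw(self, hit_keys, now):
        # Periodic update from the switch driver: entries with the hardware
        # hit status set count as recently used.
        for key in hit_keys:
            if key in self.last_used:
                self.last_used[key] = now

    def age_out(self, now):
        expired = [k for k, t in self.last_used.items() if now - t >= AGEING_TIME]
        for key in expired:
            del self.last_used[key]  # the driver would also delete it in hardware
        return expired


fdb = KernelFdb()
active = ("00:11:22:33:44:55", 10)
idle = ("00:11:22:33:44:66", 10)
fdb.learn(active, now=0.0)
fdb.learn(idle, now=0.0)
fdb.refresh_from_hw([active], now=200.0)  # hardware still forwards to `active`
expired = fdb.age_out(now=300.0)
```

Without the `refresh_from_hw` call both entries would expire at 300 seconds, even though the hardware is still forwarding traffic for one of them.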



STP offload
● bridge driver maintains STP states (either kernel STP or userspace STP)
● bridge driver communicates STP states to switch driver using switchdev offload API
● OR a switch driver in userspace can listen to STP state notifications to update HW state
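
A sketch of what a switch driver might derive from a bridge port's STP state. The state values match the `BR_STATE_*` constants in `linux/if_bridge.h`; the `(learn, forward)` pair is an assumed, simplified view of per-port hardware settings, and `hw_port_flags` is a hypothetical helper.

```python
from enum import IntEnum


class BrState(IntEnum):
    # Values as defined for BR_STATE_* in linux/if_bridge.h.
    DISABLED = 0
    LISTENING = 1
    LEARNING = 2
    FORWARDING = 3
    BLOCKING = 4


def hw_port_flags(state: BrState) -> tuple[bool, bool]:
    """Return (learn, forward) for a hypothetical ASIC port."""
    learn = state in (BrState.LEARNING, BrState.FORWARDING)
    forward = state == BrState.FORWARDING
    return learn, forward


for state in BrState:
    print(state.name, hw_port_flags(state))
```

Whether the state arrives through the kernel offload API or through a netlink notification seen by a userspace driver, the same mapping applies.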



IGMP snooping offload

[Diagram: a host behind swp1 sends "Join 224.1.2.3" and a querier behind swp2 sends "224.1.2.3 Query". The switch ASIC passes the report and the query up to the kernel bridge and carries the data traffic between swp1 and swp2. Kernel bridge state:
    dev bridge port swp1 grp 224.1.2.3 temp
    router ports on bridge: swp2]

● switch driver configures hardware to send IGMP reports and queries to software
● bridge driver maintains IGMP group membership
● in some cases the reports or queries need to be re-forwarded in the kernel
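
A toy model of the group membership the bridge driver maintains from the reports and queries it receives. The class is hypothetical and deliberately simplified: it has no timers and does not model flooding of unregistered groups.

```python
from collections import defaultdict


class SnoopingBridge:
    def __init__(self, ports):
        self.ports = set(ports)
        self.router_ports = set()
        self.mdb = defaultdict(set)  # group -> member ports

    def on_query(self, port):
        # A port that receives queries leads towards a multicast router.
        self.router_ports.add(port)

    def on_report(self, port, group):
        self.mdb[group].add(port)

    def on_leave(self, port, group):
        self.mdb[group].discard(port)

    def egress_ports(self, group, in_port):
        """Ports that should receive data for `group` arriving on `in_port`."""
        members = self.mdb.get(group, set())
        return sorted((members | self.router_ports) - {in_port})


br = SnoopingBridge(["swp1", "swp2", "swp3"])
br.on_query("swp2")                # querier behind swp2
br.on_report("swp1", "224.1.2.3")  # host behind swp1 joins
from_swp3 = br.egress_ports("224.1.2.3", "swp3")
from_swp2 = br.egress_ports("224.1.2.3", "swp2")
```

The resulting egress sets are what a switch driver would program into the hardware multicast table so that data is forwarded without reaching the CPU.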



VXLAN offload - hardware vtep

[Diagram: the local VTEP (lo: 172.16.20.103) bridges swp1, swp2 and the VXLAN link vxlan100; swp3 faces the remote VTEP 172.16.21.150. Hosts: macA (20.0.0.3) on swp1, macB (20.0.0.5) on swp2, macC (20.0.0.2) behind the remote VTEP.]

bridge FDB:
    MAC        Interface
    macA       swp1
    macB       swp2
    macC       vxlan100

vxlan100 FDB:
    MAC        Destination
    macC       172.16.21.150
    unknown    172.16.22.125
VXLAN offload - hardware vtep
Model
● VXLAN link as bridge port
○ bridging between local ports
○ VXLAN tunneling for remote MACs
● BUM traffic handling
○ multicast
○ using off-system replicator
■ could have a list of redundant replicators, need to choose ONE out of the list of remote dests (per flow or per vni etc.)
○ self replication
■ vtep sends to a list of remote vteps, need to choose ALL of the list of remote dests
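
The difference between the two unicast BUM strategies can be sketched as ONE-of-N versus ALL-of-N selection. The hash-based choice below is only one possible per-flow policy, picked for illustration; the function names and the addresses in the example are made up.

```python
import zlib


def pick_replicator(replicators, flow_key):
    """Off-system replication: choose exactly ONE replicator for a flow."""
    if not replicators:
        raise ValueError("no replicator configured")
    index = zlib.crc32(flow_key.encode()) % len(replicators)
    return replicators[index]


def self_replicate(remote_vteps):
    """Self replication: one copy of the frame for EVERY remote VTEP."""
    return list(remote_vteps)


# Hypothetical addresses, not taken from the slides.
replicators = ["192.0.2.1", "192.0.2.2"]
remote_vteps = ["198.51.100.1", "198.51.100.2", "198.51.100.3"]

chosen = pick_replicator(replicators, "vni100:00:11:22:33:44:55")
copies = self_replicate(remote_vteps)
```

Hashing on a flow key keeps each flow on one replicator while spreading different flows across the redundant set; hashing on the VNI alone would give the per-VNI variant.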



VXLAN offload - ovsdb integration
Agent to translate ovsdb schema objects to kernel constructs.

OVSDB                                                  Linux kernel
logical switch                                         vxlan link + bridge
physical switch tunnel_ip                              vxlan link local ip
logical port binding                                   bridge member port, vlan
unicast remote mac + physical locator                  bridge fdb (mac, vlan, dst <remote ip>)
mcast remote mac "unknown" + physical locator list     vxlan link default dest
unicast local mac + physical locator                   bridge fdb (mac, vlan, local dev)
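
Two rows of this mapping, sketched as plain functions. The argument and key names are invented for the example and are not the OVSDB schema's column names; the output dictionaries only mirror the fields listed in the right-hand column.

```python
def ucast_remote_to_fdb(mac, locator_ip, vxlan_dev, vlan=None):
    """unicast remote mac + physical locator -> bridge fdb (mac, vlan, dst)."""
    entry = {"mac": mac, "dev": vxlan_dev, "dst": locator_ip}
    if vlan is not None:
        entry["vlan"] = vlan
    return entry


def mcast_unknown_to_default_dests(locator_ips, vxlan_dev):
    """mcast remote mac "unknown" + locator list -> vxlan link default dests."""
    return {"dev": vxlan_dev, "default_dsts": list(locator_ips)}


# Values as they appear on the hardware vtep slide.
remote = ucast_remote_to_fdb("macC", "172.16.21.150", "vxlan100")
default = mcast_unknown_to_default_dests(["172.16.22.125"], "vxlan100")
```

An agent would run such translations whenever the corresponding OVSDB rows change and then apply the result over rtnetlink.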



l3 offloads

[Diagram: the rtnetlink API path and the offload API path. In user space, iproute, Quagga/Bird and the network manager install routes over rtnetlink, for example:

    ip route add 1.1.1.1/32 nexthop via 192.168.200.3 nexthop via 192.168.200.4

The kernel FIB and neigh table are passed to the switch driver, which programs the routing tables and neigh tables in the ASIC (CPU, MEM) and arpings for unresolved nexthops. Ports: swp1, swp2 .. swpN.]


l3 hardware offload
● Routes from routing daemons go to the kernel
● Unresolved next hops point to the CPU in hardware
● The switch driver tries to resolve them with probes (arping)
● Neigh entries are refreshed for packets routed through hardware (hit bit provided by hardware)
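
A toy model of the next-hop handling, assuming the driver tracks resolved neighbours itself. The class, its methods and the `CPU` marker are hypothetical, the `probes` list stands in for sending arping, and the hit-bit refresh is left out.

```python
CPU = "cpu"


class L3Offload:
    def __init__(self):
        self.neigh = {}    # next-hop IP -> (mac, port) once resolved
        self.routes = {}   # prefix -> list of next-hop IPs
        self.probes = []   # next hops the driver has probed

    def add_route(self, prefix, nexthops):
        self.routes[prefix] = list(nexthops)
        for nh in nexthops:
            if nh not in self.neigh:
                self.probes.append(nh)  # try to resolve the neighbour

    def neigh_resolved(self, nh, mac, port):
        self.neigh[nh] = (mac, port)

    def hw_nexthops(self, prefix):
        """Hardware view: a resolved (mac, port), or the CPU if unresolved."""
        return [self.neigh.get(nh, CPU) for nh in self.routes[prefix]]


l3 = L3Offload()
l3.add_route("1.1.1.1/32", ["192.168.200.3", "192.168.200.4"])
before = l3.hw_nexthops("1.1.1.1/32")
l3.neigh_resolved("192.168.200.3", "00:11:22:33:44:55", "swp1")
after = l3.hw_nexthops("1.1.1.1/32")
```

Until a neighbour resolves, packets for that next hop are trapped to the CPU; once it resolves, the hardware entry is rewritten to the real port and MAC.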