2022 Nicmem Slides
The Benefits of General-Purpose On-NIC Memory
Boris Pismenny †, Liran Liss §, Adam Morrison ‡, Dan Tsafrir †^
1
Data movers – definition
Apps that
1. Are network intensive
2. Process message metadata
3. Do not process message data
[Figure: a message consisting of metadata (processed) and data (unprocessed)]
2
Data movers – types
1. Apps that process headers but not payload
− Examples: SW routers, NAT, load balancers, multicast, …
3
Data movers – cost
Example: software router
Unnecessary wasteful data movement!
[Figure: NIC and CPU connected over PCIe; the whole packet travels to the CPU although only the header is rewritten]
Routing table
Dst IP           Output port
172.16.1.0/24    0
172.16.2.0/24    1
172.16.3.0/24    0
…                …
Packet
− Header (64B): srcMAC rewritten to newSrcMAC, dstMAC rewritten to newDstMAC, srcIP=172.16.1.100, dstIP=172.16.2.1, …
− Data (1400B)
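The routing decision in this example reads only the destination IP from the header. A minimal sketch of such a lookup, assuming a linear longest-prefix match over the three table rows shown on the slide (the names `struct route`, `route_lookup`, and `example_lookup` are illustrative, not taken from the talk, and a real router would use a faster structure such as a trie):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One routing-table entry: an IPv4 prefix and the port it maps to. */
struct route {
    uint32_t prefix;     /* network address, host byte order */
    uint8_t  prefix_len; /* number of leading bits that must match */
    int      port;       /* output port */
};

/* Pack four dotted-quad octets into one 32-bit address. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Build a netmask with prefix_len leading one bits (0..32). */
static uint32_t netmask(uint8_t prefix_len)
{
    assert(prefix_len <= 32);
    return prefix_len == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
}

/* Linear longest-prefix match; returns -1 when no route matches. */
static int route_lookup(const struct route *table, size_t n, uint32_t dst_ip)
{
    int best_port = -1;
    int best_len = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = netmask(table[i].prefix_len);
        if ((dst_ip & mask) == (table[i].prefix & mask) &&
            table[i].prefix_len > best_len) {
            best_len = table[i].prefix_len;
            best_port = table[i].port;
        }
    }
    return best_port;
}

/* Look up dst_ip in the three rows shown on the slide. */
static int example_lookup(uint32_t dst_ip)
{
    const struct route table[] = {
        { ipv4(172, 16, 1, 0), 24, 0 },
        { ipv4(172, 16, 2, 0), 24, 1 },
        { ipv4(172, 16, 3, 0), 24, 0 },
    };
    return route_lookup(table, sizeof(table) / sizeof(table[0]), dst_ip);
}
```

For the slide's packet, `example_lookup(ipv4(172, 16, 2, 1))` selects output port 1. The lookup never touches the 1400B of data, which is the point of the example.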
5
Data movers – cost
Waste
• PCIe bandwidth
• Memory bandwidth
• CPU cycles (if mover isn’t zero-copy)
• LLC space & bandwidth
− DDIO allows the NIC to directly access the LLC
6
What we do in a nutshell
• Leave data on nicmem
• Copy only metadata
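A rough back-of-the-envelope sketch of what this saves on PCIe, using the 64B header and 1400B data sizes from the software-router example. It assumes the baseline moves the whole packet across PCIe twice (once on receive, once on transmit) and ignores descriptors and other control traffic; the function names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes crossing PCIe per forwarded packet when the whole packet is
 * DMA-written to host memory on receive and DMA-read again on transmit. */
static size_t pcie_bytes_hostmem(size_t header, size_t payload)
{
    return 2 * (header + payload);
}

/* Same round trip when the payload stays on nicmem and only the
 * header (the metadata) moves. */
static size_t pcie_bytes_nicmem(size_t header, size_t payload)
{
    (void)payload; /* the payload never leaves the NIC */
    return 2 * header;
}

/* PCIe bytes avoided per packet under the assumptions above. */
static size_t pcie_bytes_saved(size_t header, size_t payload)
{
    size_t before = pcie_bytes_hostmem(header, payload);
    size_t after = pcie_bytes_nicmem(header, payload);
    assert(before >= after);
    return before - after;
}
```

With a 64B header and 1400B of data, this model gives 2928 bytes per packet for the baseline and 128 bytes when the data stays on nicmem.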
7
NIC memory (nicmem) today
• Most NICs have internal SRAM
− For stateful offloading
▪ RDMA, steering, SR-IOV, …
− Size: a few MBs
• Nicmem is underutilized
− Only 15% used by default in recent NVIDIA (Mellanox) NICs
• Nicmem is cheap & can easily be enlarged
− About $0.2 per MB at 7nm
− 3D stacking further reduces area + cost
8
Nicmem is like regular memory
• Expose nicmem as regular memory
− MMIO (like GPU frame buffers)
− Map into process virtual address space
− Dereference via regular pointers
− NIC queues can point to nicmem
struct packet {
    char *header; /* points into hostmem */
    char *data;   /* points into nicmem */
};
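A minimal sketch of how a data mover might use such a packet, under loud assumptions: the two regions below are ordinary arrays standing in for host memory and for an MMIO mapping of NIC SRAM, and `make_packet` and `rewrite_dst_mac` are hypothetical helpers, not an API from the talk. The CPU touches only the header, while the data pointer is carried along without being dereferenced:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct packet {
    char *header; /* in host memory: the CPU reads and rewrites it */
    char *data;   /* in nicmem: the CPU never dereferences it here */
};

/* Stand-ins for the two regions. On real hardware nicmem_region would be
 * an MMIO mapping of NIC SRAM; here it is an ordinary array. */
static char hostmem_region[64];
static char nicmem_region[1400];

/* Build a packet whose header lives in hostmem and data in nicmem. */
static struct packet make_packet(void)
{
    struct packet p = { hostmem_region, nicmem_region };
    return p;
}

/* Rewrite the destination MAC, which occupies the first 6 bytes of an
 * Ethernet header. Only host memory is written. */
static void rewrite_dst_mac(struct packet *p, const uint8_t mac[6])
{
    assert(p != NULL && p->header != NULL);
    memcpy(p->header, mac, 6);
}
```

Because both fields are plain pointers, the forwarding code looks the same as it would with the whole packet in host memory; only the location behind `data` changes.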
9
Leveraging Nicmem for NFV
• Baseline: host memory stores header and payload
1. NIC DMA writes packet
• Nicmem
− Splits header and payload
− Stores payload on NIC memory
• Header inlining
− Write header inside descriptor
− Back to one descriptor per packet
[Figure: three configurations, (a) host mem, (b) nicmem, (c) nicmem + inline, each showing steps 1 and 3 between the Rx ring and Tx ring, with header and payload placement]
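The difference between configurations (b) and (c) can be sketched with a toy transmit descriptor. Everything here is an assumption for illustration: `struct tx_desc` is a hypothetical layout rather than any real NIC's descriptor format, and the sketch presumes that the split case posts one descriptor for the header buffer and one for the payload buffer, which is what "back to one descriptor per packet" suggests:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INLINE_HDR_MAX 64

/* Hypothetical Tx descriptor layout, not any real NIC's format. */
struct tx_desc {
    uint8_t  inline_hdr[INLINE_HDR_MAX]; /* header bytes carried in the descriptor */
    uint16_t inline_len;                 /* 0 means no inline header */
    uint64_t buf_addr;                   /* external buffer (host mem or nicmem) */
    uint32_t buf_len;                    /* length of the external buffer */
};

/* (b) nicmem without inlining: one descriptor points at the header in
 * host memory, a second one at the payload in nicmem. Returns the
 * number of descriptors used. */
static size_t post_split(struct tx_desc *ring,
                         uint64_t hdr_addr, uint32_t hdr_len,
                         uint64_t payload_addr, uint32_t payload_len)
{
    memset(ring, 0, 2 * sizeof(*ring));
    ring[0].buf_addr = hdr_addr;
    ring[0].buf_len = hdr_len;
    ring[1].buf_addr = payload_addr;
    ring[1].buf_len = payload_len;
    return 2;
}

/* (c) nicmem + inline: the header is copied into the descriptor itself,
 * so a single descriptor carries the header and points at the payload. */
static size_t post_inline(struct tx_desc *ring,
                          const void *hdr, uint16_t hdr_len,
                          uint64_t payload_addr, uint32_t payload_len)
{
    assert(hdr_len <= INLINE_HDR_MAX);
    memset(ring, 0, sizeof(*ring));
    memcpy(ring[0].inline_hdr, hdr, hdr_len);
    ring[0].inline_len = hdr_len;
    ring[0].buf_addr = payload_addr;
    ring[0].buf_len = payload_len;
    return 1;
}
```

In this sketch the inline variant trades a small copy of the header into the descriptor for halving the number of descriptors per packet.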
12
Bottlenecks
• NIC
• PCIe
• Memory bandwidth
13
Bottleneck: inside the NIC
• NIC Tx queue overflows
14
Bottleneck: PCIe
• PCIe links towards the host are full
− Increasing latency by 3x
15
Bottleneck: memory bandwidth
• Memory bandwidth is 2.5x
− 15% lower throughput
− 10x higher latency
16
Additional experimental results
• Nicmem improves scalability
• Nicmem is better than DDIO
• Nicmem outperforms NFV hardware acceleration
18
Nicmem improves scalability
19
Nicmem reduces DDIO use
20
Nicmem is preferable to NIC acceleration
• NIC memory can be used by
− Software as nicmem; or
− Hardware for per-flow acceleration state
• NIC acceleration eliminates CPU overhead
− But it doesn’t scale
22
Conclusion
• Nicmem benefits data-mover applications
23
Non-data mover applications (1)
25
Non-data mover applications (2)
26
Practical considerations
• Today’s nicmem is small
− Each core’s queue is 1.5MB
27