2022 Nicmem Slides

The Benefits of General-Purpose On-NIC Memory

Boris Pismenny, Liran Liss, Adam Morrison, Dan Tsafrir
1
Data movers – definition
Apps that:
1. Are network-intensive
2. Process message metadata
3. Do not process message data

(Figure: a message split into metadata, which is processed, and data, which is not)
2
Data movers – types
1. Apps that process headers but not payload
− Examples: SW routers, NAT, load balancers, multicast, …

2. Apps that associate item key with item data


− Examples: key-value stores (Memcached, …), static webservers (Apache, …)

This talk is about the first type; the second is covered in the paper

4
Data movers – cost
Example: software router

Routing table:
Dst IP          Output port
172.16.1.0/24   0
172.16.2.0/24   1
172.16.3.0/24   0

Unnecessary, wasteful data movement!

(Figure: the NIC DMAs the whole packet over PCIe to the CPU and back; only the 64B header is rewritten (srcMAC/dstMAC replaced with newSrcMAC/newDstMAC; srcIP=172.16.1.100, dstIP=172.16.2.1), while the 1400B payload crosses PCIe unmodified)
5
Data movers – cost
Waste
• PCIe bandwidth
• Memory bandwidth
• CPU cycles (if mover isn’t zero-copy)
• LLC space & bandwidth
− DDIO allows the NIC to directly access the LLC

6
What we do in a nutshell
• Leave data on nicmem
• Copy only metadata

(Figure: the payload stays in nicmem; only the metadata is copied to the host)

7
NIC memory (nicmem) today
• Most NICs have internal SRAM memory
− For stateful offloading
▪ RDMA, steering, SR-IOV, …
− Size: a few MBs

• Nicmem is underutilized
− Only 15% is used by default in recent NVIDIA (Mellanox) NICs

• Nicmem is cheap & can easily be enlarged
− About $0.2 per MB at 7nm
− 3D stacking further reduces area and cost
8
Nicmem is like regular memory
• Expose nicmem as regular memory
− MMIO (like GPU frame buffers)
− Map into process virtual address space
− Dereference via regular pointers
− NIC queues can point to nicmem

struct packet {
    char *header; /* points into hostmem */
    char *data;   /* points into nicmem */
};
9
Leveraging Nicmem for NFV
• Baseline: host memory stores header and payload
1. NIC DMA writes the packet
2. NF processes the packet header
3. NIC DMA reads the packet

• Nicmem
− Splits header and payload
− Stores the payload in NIC memory

• Header inlining
− Writes the header inside the descriptor
− Back to one descriptor per packet

(Figure: three variants, (a) host mem, (b) nicmem, (c) nicmem + inline, each showing steps 1–3 across the Rx/Tx rings and PCIe; in (b) and (c) the payload stays in nicmem)
12
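The header-inlining variant can be sketched with a hypothetical descriptor layout (not any actual NIC's format): the header travels inline in the descriptor the NIC writes to host memory, the payload is referenced only by its nicmem address, and the NF's rewrite (as in the software-router example) touches just the inlined bytes:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical Rx descriptor for variant (c): the header bytes are
 * inlined in the descriptor, while the payload stays in nicmem and
 * is referenced by address only. */
struct rx_desc {
    uint16_t hdr_len;      /* bytes of inlined header */
    uint8_t  hdr[64];      /* inlined Ethernet/IP header */
    uint64_t payload_addr; /* payload location inside nicmem */
    uint16_t payload_len;  /* payload size (e.g. 1400B) */
};

/* NF step: rewrite the MAC addresses in the inlined header, as in
 * the software-router example; the payload is never touched. */
static void rewrite_macs(struct rx_desc *d,
                         const uint8_t new_dst[6], const uint8_t new_src[6])
{
    memcpy(d->hdr + 0, new_dst, 6); /* Ethernet dst MAC */
    memcpy(d->hdr + 6, new_src, 6); /* Ethernet src MAC */
}
```

On transmit, the NF hands the same descriptor back, so the NIC reads the payload from nicmem and only the small header recrosses PCIe, restoring one descriptor per packet.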
Bottlenecks
• NIC
• PCIe
• Memory bandwidth

13
Bottleneck: inside the NIC
• NIC Tx queue overflows

• Nicmem avoids the issue

(DPDK l3fwd running on a single core)

14
Bottleneck: PCIe
• PCIe links towards the host are full
− Increasing latency by 3x

• Nicmem avoids the issue

(DPDK l3fwd running on two cores)

15
Bottleneck: memory bandwidth
• Memory bandwidth use is 2.5x higher
− 15% lower throughput
− 10x higher latency

• Nicmem avoids the issue

(DPDK l3fwd running on eight cores)

16
Additional experimental results
• Nicmem improves scalability
• Nicmem is better than DDIO
• Nicmem outperforms NFV hardware acceleration

18
Nicmem improves scalability

(FastClick NAT loaded with 200Gbps)

19
Nicmem reduces DDIO use

(FastClick NAT running on 14 cores and loaded with 200Gbps)

20
Nicmem is preferable to NIC acceleration
• NIC memory can be used by
− Software as nicmem; or
− Hardware for per-flow acceleration state
• NIC acceleration eliminates CPU overhead
− But it doesn’t scale

(DPDK per-flow packet and byte counters running on 2 queues)


21
Conclusion
• Nicmem benefits data-mover applications

• Nicmem eliminates NIC, PCIe, and memory bandwidth bottlenecks

• Nicmem complements DDIO and outperforms hardware NFV acceleration

Have questions? Send me an email

Boris Pismenny: [email protected]

23
Non-data mover applications (1)

25
Non-data mover applications (2)

We find that header-data split is not free, because it requires both the CPU and the NIC to process two buffers per packet.

26
Practical considerations
• Today’s nicmem is small
− Each core’s queue is 1.5MB

• A single nicmem queue eliminates the PCIe bottleneck

• Using nicmem for all queues reduces memory bandwidth use
(FastClick NAT running on 14 cores with 200Gbps)

27