Light: A Scalable, High-Performance and Fully-Compatible User-Level TCP Stack

Light is a user-level TCP stack that aims to provide high performance and full compatibility without requiring application code modifications. It runs the Light stack and the applications on separate CPU cores. Light takes over network-related APIs through dynamic linking and distinguishes the file descriptor spaces maintained by the kernel and by Light. It also implements user-level blocking APIs such as epoll that monitor both network and non-network file descriptors.

Light: A Scalable, High-performance and Fully-compatible User-level TCP Stack

Dan Li (李丹)
Tsinghua University
Data Center Network Performance
Hardware Capability of Modern Servers

• Multi-core CPU
• PCIe 3.0, 4.0, 5.0
• 100G~400Gbps NIC

The Linux kernel stack becomes the performance bottleneck!
Limitation of Linux Kernel

• Interrupt-based I/O under high-speed traffic
• Coupling sockets with VFS
• Lack of connection locality
• Shared accept queue across cores


CPU Usage Breakdown of Web Server
(Web server (Lighttpd) serving a 64-byte file)

[Figure: pie chart of CPU usage. Slices: Application…, Kernel (without TCP/IP)…, TCP/IP 34%, Packet I/O 4%.]

83% of CPU usage is spent inside the kernel!
Prior Works

• Improvement to Linux kernel
– Latest Linux 4.14, Fastsocket, MegaPipe, Affinity-Accept, IsoStack, StackMap
– Apart from adding a per-core accept queue, the problems of the kernel stack remain
• User-level I/O
– DPDK, PF_RING, netmap, PSIO
• User-level TCP stack
– mTCP, IX, mOS, Seastar, F-Stack
– Problem: the application source code must be modified
Light Design Goal

• User-level TCP stack


• High performance
– High throughput
– Low (tail) latency
• Full compatibility
– Do not need to touch the application code at all
Challenge Caused by Full Compatibility

• Performance interference between application and stack
– Polling-mode I/O
• Taking over network-related API
• Distinguishing FD spaces
– read(), write()
• User-level blocking API
– send(), recv(), epoll()
• Fault detection and resource recycle
Architecture Overview (1)

Three Components of Light:

• FM (Frontend Module)
Provides POSIX API for apps.

• BM (Backend Module)
Polls the Command Queue and processes the commands sequentially.

• PPM (Protocol Process Module)
Undertakes the major process logic of the TCP/IP/Ethernet protocols.

[Figure: App process 0 (core 2) and App process 1 (core 3) each contain Program Logic, the POSIX API and a Frontend Module. Light process 0 (core 0) and Light process 1 (core 1) each contain a Backend Module and a Protocol Process Module. Between them sits Shared Hugepage Memory holding the Command Queue, Light Epoll, Light Socket, and the Accept Ready, Close Ready, RX Ready and TX Ready Queues. Light runs in user space on DPDK; below it the figure shows kernel space and a NIC with RSS.]
Architecture Overview (2)

Light-App Separation:

• Run the Light stack and apps on separate cores;
• One-to-many and many-to-one match between the stack and apps.

[Figure: applications (APP) run on their own cores above the Light stack, whose instances run on Core 0 and Core 1; the NIC distributes packets to the stack cores with RSS.]

Eliminate the performance interference between application and stack.


Design for Full Compatibility (1)

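Taking over Network-related APIs:

• LD_PRELOAD
• dlsym

[Figure: the application's calls pass through the dynamic linker. Network-related APIs are hijacked by LD_PRELOAD and bound to the Light FM Lib; other APIs go to the GNU C Lib, which the Light FM Lib can also reach through dlsym.]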
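The LD_PRELOAD and dlsym takeover can be illustrated with a minimal sketch. It is not Light's code: it assumes a glibc system, and the counter `hijacked_socket_calls` and the forwarding behavior are hypothetical, chosen only to make the interception visible. A preloaded library defines `socket()` with the libc signature, so the dynamic linker binds the application's calls to it, and `dlsym(RTLD_NEXT, ...)` retrieves the original libc function.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* RTLD_NEXT is a GNU extension in glibc */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical counter, present only so the interception is observable. */
int hijacked_socket_calls = 0;

/* Same signature as libc's socket(). When this library is preloaded,
 * the dynamic linker resolves the application's socket() calls here. */
int socket(int domain, int type, int protocol)
{
    typedef int (*socket_fn)(int, int, int);
    static socket_fn real_socket = NULL;

    if (real_socket == NULL) {
        /* Find the next definition of "socket" in library load order,
         * which is normally the one in the C library. */
        real_socket = (socket_fn)dlsym(RTLD_NEXT, "socket");
        if (real_socket == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }

    hijacked_socket_calls++;
    /* Assumption: a user-level stack would create its own socket object
     * here for TCP. This sketch simply forwards every call to libc. */
    return real_socket(domain, type, protocol);
}
```

Built as a shared object, for example with `gcc -shared -fPIC -o libhijack.so hijack.c -ldl` (file names are placeholders), it would be activated with `LD_PRELOAD=./libhijack.so ./app`, leaving the application binary untouched.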
Design for Full Compatibility (2)

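Distinguishing FD Spaces:

ssize_t read(int fd, void *buf, size_t count)

[Figure: the FD space starts at 0. Other FDs are allocated bottom-up and maintained by the kernel; network-related FDs are allocated top-down and maintained by Light. Depending on the FD, read() is served by the glibc implementation or by the Light implementation.]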
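A minimal sketch of how a shared call such as read() could be dispatched between the two FD spaces. The constant `LIGHT_FD_TOP`, the helper names and the stubbed `light_read()` are assumptions made for illustration; the slides do not state the actual boundary or the internal function names.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define LIGHT_FD_TOP 1048575 /* hypothetical top of the FD space */

/* Lowest FD handed out to the user-level stack so far. */
static int light_fd_floor = LIGHT_FD_TOP + 1;

/* Allocate network FDs top-down, far away from kernel FDs, which the
 * kernel hands out bottom-up starting at 0. */
int light_fd_alloc(void)
{
    return --light_fd_floor;
}

int is_light_fd(int fd)
{
    return fd >= light_fd_floor && fd <= LIGHT_FD_TOP;
}

/* Stand-in for the user-level stack's receive path (hypothetical). */
static ssize_t light_read(int fd, void *buf, size_t count)
{
    (void)fd;
    (void)buf;
    (void)count;
    return 0; /* pretend there is no data */
}

/* Route the call by FD range: the stack for network FDs, libc otherwise. */
ssize_t dispatch_read(int fd, void *buf, size_t count)
{
    if (is_light_fd(fd))
        return light_read(fd, buf, count);
    return read(fd, buf, count);
}
```

The sketch omits the check a real implementation would need to keep the two ranges from meeting, for example by bounding the number of kernel FDs per process.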
Design for Full Compatibility (3)

User-Level Blocking APIs:

• epoll_wait():
Can monitor both network-related FDs and non-network FDs with blocking semantics.

• Other Blocking APIs:
Leverage epoll_wait() to realize the blocking semantics.

[Figure: the application calls epoll_create(), epoll_ctl() and epoll_wait() on Light epoll (steps 1.1, 2.1, 3.1), which in turn calls the kernel's epoll_create(), epoll_ctl() and epoll_wait() (steps 1.2, 2.2, 3.2). Light epoll keeps its listened FDs (sockets and non-network FDs) and an event collection holding non-network and network-related events. Kernel epoll keeps listened FDs (non-network FDs and a FIFO FD) and a kernel event collection holding non-network events and FIFO-readable events. The remaining numbered steps (4 to 6) link the Light FD, the FIFO and the two event collections.]
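The idea of using a kernel-visible FIFO FD to wake a blocked epoll_wait() for user-level network events can be sketched as follows. This is an illustration under assumptions, not Light's implementation: it uses a pipe in place of a named FIFO, and the `light_epoll` structure and function names are hypothetical. Only standard Linux calls (`epoll_create1`, `epoll_ctl`, `epoll_wait`, `pipe`) are used.

```c
#include <sys/epoll.h>
#include <unistd.h>

enum { LIGHT_EV_ERROR = -1, LIGHT_EV_NONE = 0,
       LIGHT_EV_NETWORK = 1, LIGHT_EV_KERNEL = 2 };

struct light_epoll {
    int kernel_epfd; /* kernel epoll instance for non-network FDs */
    int notify_rd;   /* read end, registered in the kernel epoll */
    int notify_wr;   /* write end, used by the stack side */
};

int light_epoll_init(struct light_epoll *ep)
{
    int p[2];
    struct epoll_event ev = { .events = EPOLLIN };

    ep->kernel_epfd = epoll_create1(0);
    if (ep->kernel_epfd < 0)
        return -1;
    if (pipe(p) < 0) {
        close(ep->kernel_epfd);
        return -1;
    }
    ep->notify_rd = p[0];
    ep->notify_wr = p[1];
    ev.data.fd = ep->notify_rd;
    if (epoll_ctl(ep->kernel_epfd, EPOLL_CTL_ADD, ep->notify_rd, &ev) < 0) {
        close(p[0]);
        close(p[1]);
        close(ep->kernel_epfd);
        return -1;
    }
    return 0;
}

/* Stack side: make the notification FD readable when a network event
 * is ready, so that a blocked application wakes up. */
int light_epoll_notify(struct light_epoll *ep)
{
    char one = 1;
    return write(ep->notify_wr, &one, 1) == 1 ? 0 : -1;
}

/* Application side: block in the kernel until either kind of event
 * arrives, then report which kind it was. */
int light_epoll_wait_one(struct light_epoll *ep, int timeout_ms, int *ready_fd)
{
    struct epoll_event ev;
    int n = epoll_wait(ep->kernel_epfd, &ev, 1, timeout_ms);

    if (n < 0)
        return LIGHT_EV_ERROR;
    if (n == 0)
        return LIGHT_EV_NONE;
    if (ev.data.fd == ep->notify_rd) {
        char drain;
        if (read(ep->notify_rd, &drain, 1) != 1)
            return LIGHT_EV_ERROR;
        return LIGHT_EV_NETWORK; /* caller now checks the stack's queues */
    }
    *ready_fd = ev.data.fd;
    return LIGHT_EV_KERNEL;
}
```

Because the application sleeps inside the kernel's epoll_wait(), it keeps blocking semantics without spinning, while non-network FDs registered on the same kernel epoll instance are reported as usual.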
Design for Full Compatibility (4)

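Fault Detection and Resource Recycle:

[Figure: App 1 and App 2 are attached through IPC socket 1 and IPC socket 2, which an Epoll Monitor watches via the kernel. Numbered steps 1 to 3 show an "IPC Socket 2 Event" reaching the monitor, followed by Fault Detection and then Resource Recycle.]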
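One way to detect a failed application through an IPC socket and an epoll monitor is sketched below, as an assumption about how such a monitor might work rather than Light's actual code. It relies on a documented kernel behavior: when a process exits or crashes, the kernel closes its sockets, and the peer end then reports a hang-up to epoll. The function names and the use of an application id in the event data are hypothetical.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register the stack-side end of an application's IPC socket. */
int monitor_add_app(int epfd, int ipc_fd, int app_id)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP };

    ev.data.u32 = (unsigned)app_id;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, ipc_fd, &ev);
}

/* Return the id of an application whose IPC socket hung up, or -1 if
 * no hang-up is reported within the timeout. */
int monitor_poll_dead_app(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    if (n == 1 && (ev.events & (EPOLLHUP | EPOLLRDHUP | EPOLLERR)))
        return (int)ev.data.u32;
    return -1;
}
```

Once an id is returned, the stack can recycle the sockets, queues and shared memory that belonged to that application.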
Design for High Performance (1)

(1) Benefits from DPDK:


• General Techniques
PMD, Zero-copy, Hugepage, etc.

• Lockless Shared-Queue Based IPC

(2) TCB Management


• Local Listen Table and Established Table

• Dedicated Accept Queues
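The lockless shared-queue IPC listed under (1) can be illustrated with a single-producer, single-consumer ring built on C11 atomics. This is a generic sketch, not DPDK's `rte_ring` and not Light's queue; the names and `RING_SIZE` are assumptions. With one writer (for example a Frontend Module) and one reader (a Backend Module) per queue, each index is modified by only one side, so no lock is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024u /* hypothetical size; must be a power of two */

struct spsc_ring {
    _Atomic uint32_t head; /* next slot to write; advanced by the producer */
    _Atomic uint32_t tail; /* next slot to read; advanced by the consumer */
    void *slots[RING_SIZE];
};

void ring_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false when the ring is full. */
bool ring_enqueue(struct spsc_ring *r, void *item)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;
    r->slots[head & (RING_SIZE - 1)] = item;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool ring_dequeue(struct spsc_ring *r, void **item)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *item = r->slots[tail & (RING_SIZE - 1)];
    /* Release: the slot is free for reuse only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Placed in shared hugepage memory, such a ring lets two processes on different cores exchange commands without system calls; for cross-process use the slots would carry offsets or indices instead of raw pointers.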


Design for High Performance (2)

(3) Full Connection Locality

• Core Locality for Passive Connections

• Core Locality for Active Connections:


Use soft-RSS to compute and record the stack core index in the socket
object. In this way, the reply packets can be steered to the same core as
the original packets.
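A soft-RSS computation can be sketched as a software Toeplitz hash over the connection 4-tuple, which is the hash family NIC RSS commonly uses. Several things here are assumptions: the 16-byte `example_key` is arbitrary and would have to match the key programmed into the NIC, the `flow4` layout is hypothetical, and a plain modulo stands in for the NIC's indirection table.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary example key (an assumption). It must be replaced by the key
 * programmed into the NIC and be at least len + 4 bytes long. */
static const uint8_t example_key[16] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
};

/* Toeplitz hash: for every set input bit, XOR in the 32 key bits that
 * start at the same bit offset. */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t len)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit to the right along the key. */
            window <<= 1;
            if (i + 4 < key_len && (key[i + 4] & (1u << bit)))
                window |= 1u;
        }
    }
    return hash;
}

/* Hypothetical 4-tuple in host byte order, as seen on an incoming packet. */
struct flow4 {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* Core index to record in the socket object. Simplification: real NICs
 * index an indirection table with the low hash bits. */
unsigned stack_core_for_flow(const struct flow4 *f, unsigned ncores)
{
    uint8_t buf[12];

    put_be32(buf, f->src_ip);
    put_be32(buf + 4, f->dst_ip);
    put_be16(buf + 8, f->src_port);
    put_be16(buf + 10, f->dst_port);
    return toeplitz_hash(example_key, sizeof example_key, buf, sizeof buf) % ncores;
}
```

For an active connection, the tuple to hash is the one the reply packets will carry (remote address and port as source), so that the recorded index matches the core the NIC will pick for them.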
Implementation

• System Configuration
– Ubuntu 18.04 (kernel version 4.15.0-13-generic)
– DPDK 17.02

• Code
– 18263 lines of C code (excluding DPDK Library and the protocol stack ported from the kernel)

• APIs
– Most TCP related APIs have been realized.
Evaluation (1)

• Network Throughput and Multi-core Scalability


We use two powerful machines:
1) One runs wrk to generate a high workload of HTTP requests;
2) Another runs Nginx on kernel stack or Light stack.

[Figure: wrk sends requests to the Nginx server and receives responses.]


Evaluation (2)

• Network Throughput and Multi-core Scalability

• Nginx on Light gets 56% higher throughput on 8 CPU cores and achieves a linear speedup ratio of 0.89 in terms of network throughput.

[Figure: The RPS of Nginx running on Light and Linux kernel stack against the number of CPU cores used. The message size is set as 64 Bytes.]
Evaluation (3)

• Network Throughput and Multi-core Scalability

• Nginx on Light can consistently achieve more than 50% higher RPS than the kernel stack.

[Figure: The RPS of Nginx running on Light and Linux kernel stack against the message size. The number of CPU cores used is 8.]
Evaluation (4)

• Network Latency (1)


Two machines:
1) One runs wrk to generate a high workload of HTTP requests;
2) Another runs Nginx on kernel stack or Light stack.

[Figure: wrk sends requests to the Nginx server and receives responses.]


Evaluation (5)

• Network Latency (1)

• Light can reduce the tail latency by two orders of magnitude compared to kernel stack.

[Figure: CDF of round-trip latency for Nginx on Light and kernel stack.]
Evaluation (6)

• Network Latency (2)


We use two machines as the NetPIPE server and the NetPIPE client respectively, both running on either the Light stack or the kernel stack.

[Figure: the NetPIPE client and the NetPIPE server exchange requests and responses.]


Evaluation (7)

• Network Latency (2)

• Compared with the Linux kernel stack, Light can reduce the average latency by more than 40%, with a maximum of 52%.

[Figure: One-way latency for NetPIPE on Light and kernel stack.]
Light in DMM (1)
• Light should develop an adapter library (Light-adapter) for DMM so that DMM can communicate with Light. Light-adapter must implement the interfaces defined by DMM, including the socket APIs, epoll APIs, fork APIs and the resource recycle APIs.

• Light should integrate the DMM adapter library (nstack adapter), developed by DMM. The library utilizes the plug-in interface to provide rich features, such as resource (shared memory) and event management.

[Figure: layered diagram with the labels nSocket API, Kernel Adapt, Light-adapter, nRD, nstack adapter, Light stack, HAL and NIC.]
Light in DMM (2)

• Key Techniques
– Distributed and Centralized nRD deployment (LRD & CRD) provide end-to-end protocol orchestration
– Stack-transparent "Protocol Routing" (Stack orchestrator)
– POSIX compatible socket APIs
– Flexible socket API redirection and mapping (SBR)
– Flexible APIs for integration of third party stacks (EAL)
– Multiple stack instances support
– Multiple I/O engines support

[Figure: DMM architecture. Labels recoverable from the figure: applications (Web APP, Video streaming, Online gaming); POSIX Socket-compatible API (LD_PRELOAD); Socket Layer with Socket Bridge (SBR) and Socket MUX; Protocol Orchestrator (nRD); stacks (Host stack, VPP, TLDK, Light, …, 3rd Party stack); L2~L4 data-plane with IPv4 and IPv6 input/output and DPDK input; EAL; REST; Honeycomb; kernel stack; User Space and Kernel Space; NIC.]
Future Work

• Network operating system out of kernel


• Redesign PPM module
– New transport protocol
– New congestion control mechanism
• Virtualization / container environment
• Integrating Light into DMM framework
Thanks!
