
Demystify eBPF JIT Compiler

Jiong Wang, Netronome


September 11, 2018

© 2018 NETRONOME
Agenda
➢ What is a JIT Compiler?
➢ What is an eBPF JIT Compiler?
➢ eBPF JIT Compiler – Verification Stage
➢ eBPF JIT Compiler – Code Gen Stage
➢ Netronome Flow Processor (NFP) Code Gen Back-end
➢ eBPF JIT Compiler Emerging Features

Types of Compilers
A Traditional Static Compiler
➢ Translates the source language to machine instructions before execution
➢ A good fit for statically typed languages
➢ When people mention a compiler they often actually mean the whole toolchain:
  Compilation → Assembling → Linking → Loading → Execution

Before Execution (C source and the assembly it becomes):

	int a = 0x1;            .section data
	                        .4byte 0x1

	int foo()               .section text
	{                       sub   sp, sp, 4
	  int b = 0x20;         mov   r0, 0x20
	  return a + b;         sw    sp, r0
	}                       ld_pc r0, a_addr
	                        ld    r1, sp
	                        add   r0, r1, r0
	                        ret

[Figure: the assembled ELF sections (static data, code, stack) go through
linking and relocations into ELF segments; at runtime the segments are
allocated, then control jumps to the entry point.]

gcc test.c -v will show you more details!

Types of Compilers (Continued)
JIT Compiler
➢ Just In Time: compilation happens during execution, at runtime
➢ Machine instruction generation depends on runtime information
  • Dynamically typed languages
  • Portable without re-compilation
  • A low-level IR exposes low-level information for unified checking

Before Execution: Source Language → IR (Intermediate Representation)
Runtime: IR Optimization / Verification → x86 / Arm / MIPS / PPC

What is a JIT Compiler?
➢ What does a JIT compiler really look like? Here is a very simple example.
  Any program can allocate a block of memory, write instructions into it,
  then jump there and execute them:

	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>

	int main(int argc, char *argv[]) {
	    // mov eax, <imm32>
	    unsigned char mov[] = {0xb8, 0x00, 0x00, 0x00, 0x00};
	    // ret
	    unsigned char ret[] = {0xc3};

	    /* patch the immediate operand with runtime information */
	    int num = atoi(argv[1]);
	    memcpy(&mov[1], &num, 4);

	    /* allocate a block of writable + executable memory */
	    void *mem = mmap(NULL, sizeof(mov) + sizeof(ret),
	                     PROT_WRITE | PROT_EXEC,
	                     MAP_ANON | MAP_PRIVATE, -1, 0);

	    /* write the instruction bytes into it */
	    memcpy(mem, mov, sizeof(mov));
	    memcpy(mem + sizeof(mov), ret, sizeof(ret));

	    /* jump there and execute */
	    int (*func)(void) = mem;
	    return func();
	}

Modified from http://blog.reverberate.org/2012/12/hello-jit-world-joy-of-simple-jits.html

What is a JIT Compiler - Summary
➢ Generate machine instructions at runtime, write them to executable memory and
  execute them there directly
  • No assembling stage: instruction encodings are generated directly
  • No linking stage: absolute addresses are generated directly
➢ The allocated memory region should be protected
  • Very risky to leave it both executable and writable
  • A classic buffer overflow writing shellcode to the stack can be seen as a
    form of JIT compilation
➢ Runtime overhead
  • Mitigations: only JIT-compile hot code, trace compilation, etc.
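
To illustrate the protection point, here is a minimal sketch (assuming POSIX
mmap/mprotect; this is not from the slides) of a W^X-friendly variant: map the
region writable first, emit the code, then flip it to read + execute before
jumping to it.

	#include <string.h>
	#include <sys/mman.h>

	/* Sketch: emit `len` bytes of code, then seal the region so it is
	 * never writable and executable at the same time. */
	void *emit_and_seal(const unsigned char *code, size_t len)
	{
		void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_ANON | MAP_PRIVATE, -1, 0);
		if (mem == MAP_FAILED)
			return NULL;

		memcpy(mem, code, len);			/* write stage */

		if (mprotect(mem, len, PROT_READ | PROT_EXEC)) {
			munmap(mem, len);		/* sealing failed */
			return NULL;
		}
		return mem;				/* safe to jump to */
	}
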
What is an eBPF JIT Compiler?
➢ The whole idea is to run user-supplied programs inside the kernel
➢ Why not a kernel module (.ko)?
  • C is permissive; pointers are exploitable
➢ What's wrong with checking the .ko?
  • It is already compiled, and architectures differ in ISA (instruction set
    architecture); we would end up with a bunch of .ko verifiers doing similar
    things which can't be merged due to the ISA differences
➢ We want to check a unified representation. It was BPF (Berkeley Packet
  Filter), now enhanced as eBPF (extended BPF)

[Figure: in kernel space, a .ko should only touch its supposed resource and
shouldn't access other resources]

What is an eBPF JIT Compiler?
eBPF representation
➢ Designed as a set of instructions close to x86-64, AArch64, etc.
➢ 64-bit instruction encoding and 64-bit registers with 32-bit sub-registers
➢ The usual data manipulation instructions and control transfer instructions
➢ Supports both interpreted execution and JIT execution
➢ KERNEL/Documentation/bpf/bpf_design_QA.rst

	35: 57 02 00 00 3f ff 00 00   r2 &= 65343               (AND)
	36: 55 02 c9 02 00 00 00 00   if r2 != 0 goto +713      (JMP)
	37: 71 82 17 00 00 00 00 00   r2 = *(u8 *)(r8 + 23)     (LOAD)
	38: 7b 3a a8 ff 00 00 00 00   *(u64 *)(r10 - 88) = r3   (STORE)
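
The 64-bit encoding above follows the fixed instruction layout declared in the
kernel's UAPI header (include/uapi/linux/bpf.h):

	#include <linux/types.h>

	/* Fixed 8-byte eBPF instruction layout, as in include/uapi/linux/bpf.h */
	struct bpf_insn {
		__u8	code;		/* opcode */
		__u8	dst_reg:4;	/* destination register */
		__u8	src_reg:4;	/* source register */
		__s16	off;		/* signed offset */
		__s32	imm;		/* signed immediate constant */
	};
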
What is an eBPF JIT Compiler?
eBPF software stack for JIT execution
➢ Compile C into an eBPF sequence (e.g. clang -O2 -target bpf -c prog.c -o prog.o)
➢ Check the eBPF sequence
➢ JIT compile and execute the sequence
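
As a concrete input to this stack, here is a minimal (illustrative) XDP program
that passes every packet; compiled with the clang invocation above, it yields
the eBPF sequence that the verifier checks and the JIT then translates:

	#include <linux/bpf.h>

	/* Minimal XDP program: let every packet through. */
	__attribute__((section("xdp"), used))
	int xdp_pass(struct xdp_md *ctx)
	{
		return XDP_PASS;
	}
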
What is an eBPF JIT Compiler?
➢ The components in yellow sit inside kernel space
➢ The verifier needs to walk the instructions; quite a lot of information is
  collected and shared with the architecture code generation back-ends
➢ They work closely together, and form an eBPF JIT compiler as a whole

[Figure: verifier plus architecture back-ends boxed together as the eBPF JIT
Compiler]

eBPF JIT Compiler - Verification Stage
➢ Control flow check
  • No function call to an unknown function
  • No fall through from one function to the next one
  • No jump destination out of range
  • No unreachable instruction
  • No loop

eBPF JIT Compiler - Verification Stage
➢ Individual instruction checks based on static information
  • Divide by zero
  • Shifts with an invalid shifting amount
  • Invalid stack access (unaligned, out of range, etc.)

	if ((opcode == BPF_LSH || opcode == BPF_RSH ||
	     opcode == BPF_ARSH) && BPF_SRC(insn->code) == BPF_K) {
		int size = BPF_CLASS(insn->code) == BPF_ALU64 ? 64 : 32;

		if (insn->imm < 0 || insn->imm >= size) {
			verbose(env, "invalid shift %d\n", insn->imm);
			return -EINVAL;
		}
	}

	/* BPF program can access up to 512 bytes of stack space. */
	#define MAX_BPF_STACK 512

	if (off >= 0 || off < -MAX_BPF_STACK) {
		verbose(env, "invalid stack off=%d size=%d\n", off, size);
		return -EACCES;
	}
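
The divide-by-zero case follows the same pattern; a sketch in the spirit of
the verifier's ALU checks (not the exact kernel code):

	/* Sketch: reject a division or modulo by a zero immediate outright. */
	if ((BPF_OP(insn->code) == BPF_DIV || BPF_OP(insn->code) == BPF_MOD) &&
	    BPF_SRC(insn->code) == BPF_K && insn->imm == 0) {
		verbose(env, "div by zero\n");
		return -EINVAL;
	}
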
eBPF JIT Compiler - Verification Stage
➢ Individual instruction checks based on dynamic information
  • Value range based checks. For example, out of range access to packet data
  • Register status based checks. For example, a read from an uninitialized
    register

	mov r0, r2
	exit

  • Stack status based checks. For example, a corrupted pointer spilled on
    the stack

	*(u64 *)(r10, -8) = r1
	/* mess up with the R1 pointer on the stack */
	*(u8 *)(r10, -7) = 0x23
	/* filling back into R0 should fail */
	r0 = *(u64 *)(r10, -8)
	exit

➢ These checks require tracking data flow dynamically at the instruction level

eBPF JIT Compiler - Verification Stage
➢ Kernel space pointer leak checks under unprivileged mode
  • Such information is highly useful to an attacker once leaked to userspace
  • No pointer arithmetic and no pointer comparison
  • No store back to user space accessible storage, like maps, packets, etc.
  • https://lwn.net/Articles/660331/ and https://lkml.org/lkml/2015/10/5/687

	"unpriv: return pointer",
	"unpriv: add const to pointer",
	"unpriv: add pointer to pointer",
	"unpriv: neg pointer",
	"unpriv: cmp pointer with const",
	"unpriv: cmp pointer with pointer",
	"unpriv: check that printk is disallowed",
	"unpriv: pass pointer to helper function",
	"unpriv: indirectly pass pointer on stack to helper function",
	"unpriv: mangle pointer on stack 1",
	"unpriv: mangle pointer on stack 2",
	"unpriv: read pointer from stack in small chunks",
	"unpriv: write pointer into ctx",
	"unpriv: spill/fill of ctx",
	"unpriv: spill/fill of ctx 2",
	"unpriv: spill/fill of ctx 3",
	"unpriv: spill/fill of ctx 4",
	"unpriv: spill/fill of different pointers stx",
	"unpriv: spill/fill of different pointers ldx",
	"unpriv: write pointer into map elem value",
	"unpriv: partial copy of pointer",
	"unpriv: pass pointer to tail_call",
	"unpriv: cmp map pointer with zero",
	"unpriv: write into frame pointer",
	"unpriv: spill/fill frame pointer",
	"unpriv: cmp of frame pointer",
	"unpriv: adding of fp",
	"unpriv: cmp of stack pointer",

eBPF JIT Compiler - Verification Stage
➢ Is there a list of all supported verifications?
  • KERNEL/tools/testing/selftests/bpf/test_verifier.c is a good reference
  • ~900 tests
  • Tests are categorized to some extent by prefix; for example, verifications
    related to another major feature, bpf-to-bpf function calls, can be listed
    using

	cat tools/testing/selftests/bpf/test_verifier.c | grep "\"calls:"

	"calls: basic sanity",
	"calls: not on unpriviledged",
	"calls: div by 0 in subprog",
	"calls: multiple ret types in subprog 1",
	"calls: multiple ret types in subprog 2",
	"calls: overlapping caller/callee",

  • “unpriv:”, “calls:”, “XDP” are interesting categories worth having a look

Verification Stage - Algorithms
➢ Control flow checks
  • Function partition
  • Depth-first walk to detect loops and unreachable instructions
➢ Data flow tracking
  • An all-code-paths walker collects information instruction by instruction
    and path by path
  • Code path pruning avoids walking paths already proven to be safe
  • Path-sensitive register liveness tracking exposes more pruning
    opportunities

Algorithm - Function Partition
➢ The input eBPF sequence is a concatenation of functions
➢ Partition the sequence to make the function boundaries explicit

eBPF CALL instruction encoding:

	op | dst_reg | src_reg | off | imm      (BPF_CALL)

	call destination = insn_index + imm + 1

	 0: r3 += 1              <- func 0 starts
	 1: if r0 < r1 goto +1
	 2: call +5
	 3: goto L2
	 4: r4 += 2
	 5: r4 += r3
	 6: r0 = r4
	 7: exit
	 8: r2 += r3             <- func 1 starts (2 + 5 + 1 = 8)
	 9: call +3
	10: r4 += 2
	11: r4 += r3
	12: exit
	13: r0 += 1              <- func 2 starts (9 + 3 + 1 = 13)
	...
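
A hypothetical sketch of the partition pass (using the struct bpf_insn layout
shown earlier; insn_is_pseudo_call() and the fixed-size array are illustrative,
not the kernel's actual code): scan the sequence once and record a new function
start at each call target.

	#define MAX_FUNCS 256

	/* Hypothetical helper: is this insn a BPF-to-BPF (pseudo) call? */
	int insn_is_pseudo_call(const struct bpf_insn *insn);

	int partition_functions(const struct bpf_insn *insns, int insn_cnt,
				int starts[MAX_FUNCS])
	{
		int n = 0;

		starts[n++] = 0;		/* func 0 starts at insn 0 */

		for (int i = 0; i < insn_cnt; i++) {
			if (!insn_is_pseudo_call(&insns[i]))
				continue;
			int target = i + insns[i].imm + 1;
			if (target <= 0 || target >= insn_cnt || n >= MAX_FUNCS)
				return -1;	/* call to an unknown function */
			starts[n++] = target;	/* callee begins here */
		}
		return n;			/* functions found (dedup omitted) */
	}
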
Algorithm - Depth First Walk
➢ A depth-first walk is used to detect loops and unreachable instructions
➢ Two auxiliary arrays help the walk:

	insn_stack [MAX_INSN_NUM]
	insn_status[MAX_INSN_NUM]

Acyclic example: both branch directions reach the exit.

	0: r3 += 1
	1: if r0 < r1 goto L1
	2: r1 += r2
	3: goto L2
	L1:
	4: r4 += r3
	L2:
	5: r0 = r4
	6: exit

	Branch not taken: 0, 1, 2, 3, then L2 (5, 6)
	Branch taken:     0, 1, then L1 (4), L2 (5, 6)

Loop example: the taken branch at 5 jumps back to L1.

	0: r3 += 1
	1: if r0 < r1 goto L1
	2: r1 += r2
	3: goto L2
	L1:
	4: r4 += r3
	L2:
	5: if r4 < 10 goto L1
	6: exit

	Taking the branch at 5 revisits instruction 4 while it is still on
	insn_stack: a back edge, i.e. a loop, and the program is rejected.
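
A hypothetical sketch of the depth-first walk (the kernel's check_cfg() is
similar in spirit; successors() and the bounds here are illustrative):

	#include <errno.h>
	#include <stdbool.h>

	#define MAX_INSN_NUM 4096		/* illustrative bound */

	enum { UNVISITED = 0, DISCOVERED, EXPLORED };

	static int insn_status[MAX_INSN_NUM];
	static int insn_stack[MAX_INSN_NUM];

	/* Hypothetical helper: the 1 or 2 possible next instructions
	 * (fall-through and/or branch target); 0 for exit. */
	int successors(int insn, int succ[2]);

	int check_cfg_sketch(int insn_cnt)
	{
		int top = 0;

		insn_stack[top++] = 0;
		insn_status[0] = DISCOVERED;

		while (top > 0) {
			int insn = insn_stack[top - 1];
			int succ[2];
			int n = successors(insn, succ);
			bool descended = false;

			for (int i = 0; i < n; i++) {
				if (succ[i] < 0 || succ[i] >= insn_cnt)
					return -EINVAL;	/* jump out of range */
				if (insn_status[succ[i]] == DISCOVERED)
					return -EINVAL;	/* back edge: loop */
				if (insn_status[succ[i]] == UNVISITED) {
					insn_status[succ[i]] = DISCOVERED;
					insn_stack[top++] = succ[i];
					descended = true;
					break;		/* go deeper first */
				}
			}
			if (!descended) {
				insn_status[insn] = EXPLORED;
				top--;			/* all successors done */
			}
		}

		for (int i = 0; i < insn_cnt; i++)
			if (insn_status[i] == UNVISITED)
				return -EINVAL;		/* unreachable insn */
		return 0;
	}
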
Algorithm - Code Path Walker
➢ Executes all code paths
  • Accurately knows the program status (registers, memory, etc.) at any point
  • Tracks and propagates scalar value ranges, pointer types, etc.
  • The code path walker is time intensive
  • A program is rejected if it is too complex (BPF_COMPLEXITY_LIMIT_INSNS)

Algorithm - Code Path Walker

The walker executes the input BPF sequence once per code path (branch taken
and branch not taken), collecting ALU, JMP and MEM info at each instruction:

	r3 += 1                 ; collect ALU info
	if r0 < r1 goto L1      ; collect JMP info
	r4 = *(u8 *)(r2 - 8)    ; collect MEM info
	goto L2
	L1:
	r4 += 2
	L2:
	r4 += r3
	r0 = r4
	exit

The sub-sequence starting from L2 towards the exit is executed on both paths.
Can we avoid walking through it again?

Algorithm - Code Path Prune
➢ There is no need to walk the sequence again if we have a SAFER status when
  reaching the starting point of the sequence. Those points are where paths
  merge.

	r3 += 1
	if r0 < r1 goto L1
	r4 = *(u8 *)(r2 - 8)
	goto L2                 <- B: path state B (reg, mem)
	L1:
	r4 += 2                 <- A: path state A (reg, mem)
	L2:
	r4 += r3                states at the merge point eq?
	r0 = r4                   Y: skip the walk
	exit                      N: walk again
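
A hypothetical sketch of the subsumption test behind pruning (the kernel's
is_state_visited()/states_equal() play this role; the types below are
illustrative): the walk can be skipped when a previously verified state at
this merge point covers everything the current state can do.

	#include <stdbool.h>

	#define MAX_BPF_REG 11				/* r0 .. r10 */

	struct reg_state {
		int  type;				/* scalar, pointer kind, ... */
		long min, max;				/* tracked value range */
	};

	struct path_state {
		struct reg_state regs[MAX_BPF_REG];	/* + stack state, omitted */
	};

	bool state_subsumes(const struct path_state *old,
			    const struct path_state *cur)
	{
		for (int i = 0; i < MAX_BPF_REG; i++) {
			if (old->regs[i].type != cur->regs[i].type)
				return false;
			/* old must allow at least every value cur can take */
			if (old->regs[i].min > cur->regs[i].min ||
			    old->regs[i].max < cur->regs[i].max)
				return false;
		}
		return true;
	}
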
Algorithm - Register Liveness Tracking
➢ The purpose of tracking register liveness is to make more path pruning happen

	r3 += 1
	if r0 < r1 goto L1
	r4 = *(u8 *)(r2 - 8)
	goto L2                 <- B: 0 < r0 < 20
	L1:
	r4 += 2                 <- A: 0 < r0 < 10
	L2:
	r4 += r3                r0's ranges differ at the merge point, eq?
	r0 = r4
	r0 += 1
	exit

An unsafe value in the path state doesn't matter as long as that value is NOT
used in the to-be-pruned path: here r0 is overwritten (r0 = r4) before any use
after L2, so it is dead at the merge point and its differing range does not
block the prune.
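
Extending the previous sketch with liveness (again illustrative; the kernel
tracks this via REG_LIVE_READ/REG_LIVE_WRITTEN marks propagated during the
walk): a register that is dead at the merge point can simply be ignored by the
comparison, which makes many more states compare as equal.

	/* Builds on struct path_state from the previous sketch. live_mask has
	 * bit i set iff register i may still be read after this merge point. */
	bool state_subsumes_live(const struct path_state *old,
				 const struct path_state *cur,
				 unsigned int live_mask)
	{
		for (int i = 0; i < MAX_BPF_REG; i++) {
			if (!(live_mask & (1u << i)))
				continue;	/* dead here: any value is fine */
			if (old->regs[i].type != cur->regs[i].type ||
			    old->regs[i].min > cur->regs[i].min ||
			    old->regs[i].max < cur->regs[i].max)
				return false;
		}
		return true;
	}
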
Verification Stage - Challenges
➢ Complex programs require too many verification resources
➢ Using classic static flow analysis techniques might lower resource
  consumption and make the verifier more scalable; however, it would be
  challenging to guarantee the same level of security

Code Gen Stage - Overview
➢ After the eBPF sequence has passed verification, the Code Gen stage starts
  and translates eBPF into native machine instructions
➢ This is fairly straightforward, as eBPF instructions map one-to-one to
  native machine instructions on most modern architectures
➢ Architecture Code Gen back-ends, for example x86-64/AArch64 etc., typically
  share the same Code Gen flow

Code Gen Stage - General Flow

	estimate JIT image size (dry run)
	        ↓
	allocate memory
	        ↓
	build_prologue   : prepare the stack pointer, save callee-saved registers, etc.
	        ↓
	build_body       : one-to-one instruction mapping; the JIT image contains
	        ↓          native machine instructions
	build_epilogue   : restore the stack pointer, callee-saved registers, etc.

One-to-one instruction mapping, AArch64 example:

	case BPF_ALU | BPF_SUB | BPF_K:
	case BPF_ALU64 | BPF_SUB | BPF_K:
		emit_a64_mov_i(is64, tmp, imm, ctx);
		emit(A64_SUB(is64, dst, dst, tmp), ctx);
		break;

arch/{x86,arm64}/net/bpf_jit_comp.c are quite self-explanatory
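
A hypothetical skeleton of the build_body step, modeled on the shape of the
in-kernel JITs (struct jit_ctx and the emit_*() helpers are illustrative):

	/* One native emission per verified eBPF instruction. */
	static int build_body(struct jit_ctx *ctx, const struct bpf_prog *prog)
	{
		for (int i = 0; i < prog->len; i++) {
			const struct bpf_insn *insn = &prog->insnsi[i];

			switch (insn->code) {
			case BPF_ALU64 | BPF_ADD | BPF_X:	/* dst += src */
				emit_add64(ctx, insn->dst_reg, insn->src_reg);
				break;
			case BPF_JMP | BPF_EXIT:		/* return r0 */
				emit_jump_to_epilogue(ctx);
				break;
			/* ... one case per eBPF opcode ... */
			default:
				return -EINVAL;		/* unsupported opcode */
			}
		}
		return 0;
	}
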
Code Gen Stage - Linking
➢ eBPF does not allow global variables or arbitrary function calls, and in
  general no external symbol access, which saves linking work
➢ eBPF does allow calls to special runtime helper functions (e.g. for maps);
  the helper address needs a fixup
  • The function ID is encoded in the "imm" field of the eBPF call instruction
  • The ID needs to be mapped to an address, and the call instruction relocated

The verifier prepares the absolute address (kept as an offset):

	case BPF_JMP | BPF_CALL:
	{
		switch (insn->imm) {
		case BPF_FUNC_map_lookup_elem:
			insn->imm = BPF_CAST_CALL(ops->map_lookup_elem) -
				    __bpf_call_base;
			continue;
		case BPF_FUNC_map_update_elem:
		...

The back-end relocates the call instruction (AArch64):

	case BPF_JMP | BPF_CALL:
	{
		const u8 r0 = bpf2a64[BPF_REG_0];
		const u64 func = (u64)__bpf_call_base + imm;

		emit_a64_mov_i64(tmp, func, ctx);
		emit(A64_BLR(tmp), ctx);

A 64-bit address is too long to encode in the 32-bit imm field, so an offset
from __bpf_call_base is kept instead.

Code Gen Stage - Linking for BPF-to-BPF Call
➢ A bpf-to-bpf function call requires relocating the function call address as
  well
➢ It requires JIT compiling all eBPF functions first, so that the start
  address of each function is known, and then a second round of compilation to
  relocate all call instructions
➢ The architecture back-end doesn't know the function partition, so the
  verifier drives the whole Code Gen:

	for (i = 0; i < func_cnt; i++)
		arch_jit_hook_for_each_func();

	/* scan all insns, rewrite the imm field of each call insn to
	 * callee_start_address - __bpf_call_base */

	/* redo the JIT */
	for (i = 0; i < func_cnt; i++)
		arch_jit_hook_for_each_func();

Code Gen Stage - Summary
➢ The architecture Code Gen hooks do the instruction mapping
➢ The verifier does linking and all the other jobs that are generic across
  architectures
➢ The verifier and the architecture back-ends work closely together to form
  the whole eBPF JIT compiler

NFP Code Gen Back-end - NFP Introduction
➢ Netronome Agilio® SmartNICs contain a powerful Network Flow Processor (NFP)
➢ eBPF aims to offer programmability to the kernel network stack
➢ NFP and eBPF match each other, so it is perfect to offload eBPF to the NFP.
  Yes we can!
➢ The NFP Code Gen back-end translates eBPF instructions into NFP
  instructions, writes them to the Agilio SmartNIC, and runs them there on
  the fly
➢ For JIT compiler Code Gen, the instruction set architecture is the most
  relevant part, so we will focus on it in the following slides
➢ The NFP itself is a very powerful chip offering many other capabilities;
  please see our website (Document Library) for details

NFP Instruction Set Architecture (ISA)
➢ 32 x 32-bit general purpose registers for each eBPF context
➢ 8K instructions for each eBPF context
➢ Rich arithmetic and logic instructions
➢ A powerful memory unit allowing unaligned access and bulk access
➢ Global absolute jumps within the whole instruction buffer
➢ Clustered register sets offer generous registers beyond the general purpose
  registers

NFP Code Gen Back-end - Overview
➢ Generally works the same way as the other eBPF arch back-ends:
  • Takes the verified eBPF sequence as input
  • Allocates the JIT image on the Agilio SmartNIC
  • Maps eBPF instructions one-to-one and writes them to the JIT image
➢ However, there are some differences:
  • NFP has a unique and powerful memory unit which needs special support
  • NFP has other registers besides the general purpose registers, and needs
    to utilize them
  • NFP has 32-bit general purpose registers instead of 64-bit ones
  • NFP wants to do all the linking jobs by itself

NFP Code Gen Back-end - Overview
➢ Differences (continued):
  • Stricter verification
  • Maps are now located on the SmartNIC, so all map accesses need to be
    redirected: call firmware functions and fetch the results. This means
    extra Code Gen for the function call and a couple of instruction
    relocations.

	eBPF                          NFP
	r3 += 1                       alu[*l$index0[2], --, B, gprB_5]
	if r0 < r1 goto L1            immed[gprA_4, 0x99], gpr_wrboth
	r4 = *(u8 *)(r2 - 8)          ld_field[*l$index0, 1000, gprB_4, <<24]
	goto L2                       alu[gprA_6, gprA_6, +, 0xc], gpr_wrboth

NFP Code Gen Back-end - Memory Support
➢ Memcpy Optimization
  • Classic RISC architectures use a load/store model and can access at most a
    register width per instruction
  • NFP has a special internal bus allowing bulk memory access, up to 128
    bytes per instruction
  • The eBPF ISA follows the RISC load/store model, and memcpy calls are
    expanded into load/store pairs by the LLVM compiler, since inlining memcpy
    for small copy sizes is a common technique and eBPF does not allow calling
    external functions
  • This is not good for NFP, and might actually not be good for other
    architectures either. It is better to let the architecture back-ends
    decide how to expand memcpy

NFP Code Gen Back-end - Memory Support
➢ Memcpy Optimization

C source:

	void cal_align4(int *a, int *b)
	{
		memcpy(a, b, 32);
	}

LLVM expands the memcpy into an eBPF load/store sequence:

	cal_align4:
	r3 = *(u32 *)(r2 + 28)
	*(u32 *)(r1 + 28) = r3
	r3 = *(u32 *)(r2 + 24)
	*(u32 *)(r1 + 24) = r3
	r3 = *(u32 *)(r2 + 20)
	*(u32 *)(r1 + 20) = r3
	r3 = *(u32 *)(r2 + 16)
	*(u32 *)(r1 + 16) = r3
	r3 = *(u32 *)(r2 + 12)
	*(u32 *)(r1 + 12) = r3
	r3 = *(u32 *)(r2 + 8)
	*(u32 *)(r1 + 8) = r3
	r3 = *(u32 *)(r2 + 4)
	*(u32 *)(r1 + 4) = r3
	r3 = *(u32 *)(r2 + 0)
	*(u32 *)(r1 + 0) = r3

The NFP JIT compiler reconstructs the memcpy(a, b, 32) semantics from this
sequence and emits NFP memory bulk access instead:

	mem[read32_swap, $xfer_0, gprA_2, 0xc, 7]
	alu[$xfer_0, --, B, $xfer_0]
	...
	mem[write32_swap, $xfer_0, gprA_5, 0xc, 7]

NFP Code Gen Back-end - Memory Support
➢ Memcpy Optimization: Instruction Scheduling

Instruction scheduling done by the compiler makes it hard for eBPF JIT
compilers to reconstruct memcpy semantics, so Netronome contributed a new
LLVM option

	llc -bpf-expand-memcpy-in-order

to force LLVM to generate the unscheduled, in-order memcpy sequence.

	in-order eBPF sequence              a scheduled sequence

	cal_align4:                         cal_align4:
	r3 = *(u32 *)(r2 + 28)              r3 = *(u32 *)(r2 + 28)
	*(u32 *)(r1 + 28) = r3              *(u32 *)(r1 + 28) = r3
	r3 = *(u32 *)(r2 + 24)              r3 = *(u32 *)(r2 + 24)
	*(u32 *)(r1 + 24) = r3              *(u32 *)(r1 + 24) = r3
	r3 = *(u32 *)(r2 + 20)              r3 = *(u32 *)(r2 + 16)
	*(u32 *)(r1 + 20) = r3              *(u32 *)(r1 + 16) = r3
	r3 = *(u32 *)(r2 + 16)              r3 = *(u32 *)(r2 + 12)
	*(u32 *)(r1 + 16) = r3              *(u32 *)(r1 + 12) = r3
	r3 = *(u32 *)(r2 + 12)              r3 = *(u32 *)(r2 + 20)
	*(u32 *)(r1 + 12) = r3              *(u32 *)(r1 + 20) = r3
	r3 = *(u32 *)(r2 + 8)               r3 = *(u32 *)(r2 + 4)
	*(u32 *)(r1 + 8) = r3               *(u32 *)(r1 + 4) = r3
	r3 = *(u32 *)(r2 + 4)               r3 = *(u32 *)(r2 + 0)
	*(u32 *)(r1 + 4) = r3               *(u32 *)(r1 + 0) = r3
	r3 = *(u32 *)(r2 + 0)               r3 = *(u32 *)(r2 + 8)
	*(u32 *)(r1 + 0) = r3               *(u32 *)(r1 + 8) = r3

There is no guarantee r3 is dead after the final load, so any rewrite of the
sequence must make sure r3 ends up with the same value as in the old sequence.

NFP Code Gen Back-end - Memory Support
➢ Memcpy Optimization: NFP memory bulk access

	mem[read32_swap, $xfer_0, gprA_2, 0xc, 7]
	alu[$xfer_0, --, B, $xfer_0]
	...
	mem[write32_swap, $xfer_0, gprA_5, 0xc, 7]

What are these $xfer registers?

[Figure: general purpose registers (gpr0..gpr3), transfer-in registers
(tr_in0..), transfer-out registers (tr_out0..), and memory]

➢ NFP has clustered register sets, with registers other than the general
  purpose registers
➢ Memory reads go into transfer-in registers first, then move to general
  purpose registers. The contents are not clobbered as long as there is no
  other memory read
➢ The NFP eBPF JIT compiler can use 32 transfer-in registers, meaning it can
  cache 128 bytes of memory contents there
➢ Packet data is frequently visited, so we use transfer-in registers to cache
  packet data
➢ A reasonable size is read from memory into transfer-in registers during the
  first memory read; follow-up memory reads use the cached contents in the
  transfer-in registers directly, with no need for another memory read

NFP Code Gen Back-end - 32-bit Optimization
➢ The eBPF ISA defines 32-bit sub-registers and associated instructions
  • An eBPF register is 64-bit; the lower half can be used as a 32-bit
    sub-register
  • Any write to the lower half must zero the higher half. This matches the
    x86-64 and AArch64 ISA behavior
  • A set of ALU32 instructions operates on the 32-bit sub-registers

Program operating on a 32-bit type:

	void cal(unsigned int *a, unsigned int *b, unsigned int *c)
	{
		unsigned int sum = *a + *b;
		*c = sum;
	}

	default eBPF code gen from LLVM     JITed AArch64 sequence

	cal:                                cal:
	r1 = *(u32 *)(r1 + 0)               ld4u r1, [r1, 0]
	r2 = *(u32 *)(r2 + 0)               ld4u r2, [r2, 0]
	r2 += r1                            addu r2, r2, r1
	*(u32 *)(r3 + 0) = r2               st4  [r3, 0], r2
	exit                                ret
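
The zero-extension rule, expressed as a worked C illustration (not from the
slides): the result of any ALU32 operation is the 32-bit value zero-extended
to 64 bits.

	#include <stdint.h>

	/* eBPF ALU32 semantics: write the low 32 bits, zero the high 32 bits.
	 * x86-64 and AArch64 32-bit ops behave this way for free. */
	uint64_t alu32_add(uint64_t dst, uint64_t src)
	{
		uint32_t lo = (uint32_t)dst + (uint32_t)src;	/* 32-bit add */
		return (uint64_t)lo;				/* high half zeroed */
	}
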
NFP Code Gen Back-end - 32-bit Optimization
➢ 32-bit optimization
  • NFP (and 32-bit Arm, etc.) has 32-bit registers, so a register pair must
    be used to model each eBPF register
  • NFP data manipulation instructions operate on 32-bit data, so operating on
    a register pair needs two instructions

	default eBPF code gen from LLVM     JITed NFP sequence (pseudo code)

	cal:                                cal:
	r1 = *(u32 *)(r1 + 0)               ld4  r2, [r2, 0]
	r2 = *(u32 *)(r2 + 0)               mov  r3, 0    ; extra insn to zero the high half
	r2 += r1                            ld4  r4, [r4, 0]
	*(u32 *)(r3 + 0) = r2               mov  r5, 0    ; extra insn to zero the high half
	exit                                add  r4, r4, r2
	                                    addc r5, r5, r3  ; extra insn to operate on the high half
	                                    st4  [r6, 0], r4
	                                    ret

Are the extra instructions really necessary?

NFP Code Gen Back-end - 32-bit Optimization
➢ 32-bit optimization - Solution 1
  • 32-bit sub-registers and ALU32 instructions carry width semantics, so it
    is safe to generate native instructions operating on sub-registers only
  • Netronome has contributed the 32-bit sub-register and ALU32 enablement to
    LLVM

	-mattr=+alu32 code gen from LLVM    JITed NFP sequence (pseudo code)

	cal:                                cal:
	w1 = *(u32 *)(r1 + 0)               ld4 r2, [r2, 0]
	w2 = *(u32 *)(r2 + 0)               ld4 r4, [r4, 0]
	w2 += w1                            add r4, r4, r2
	*(u32 *)(r3 + 0) = w2               st4 [r6, 0], r4
	exit                                ret

	(previous 64-bit code gen)

	cal:
	r1 = *(u32 *)(r1 + 0)
	r2 = *(u32 *)(r2 + 0)
	r2 += r1
	*(u32 *)(r3 + 0) = r2
	exit

NFP Code Gen Back-end - 32-bit Optimization
➢ Can we rely on the semantics carried by registers and instructions? NO
  • An eBPF sequence can come from hand-written assembly which doesn't conform
    to such semantics
  • LLVM could set an ELF flag to tell the consumer the sequence came from
    LLVM and therefore must conform to the semantics. But an ELF flag could be
    faked, even though we don't know whether a faked 32-bit flag makes a
    program easier or harder to exploit
  • Also, for some instructions, operations on the high half can't be safely
    omitted without accurate usage information

NFP Code Gen Back-end - 32-bit Optimization
➢ 32-bit optimization - Solution 2
  • The JIT compiler figures out the data flow from scratch, with no reliance
    on external information
  • Netronome has done initial support for this: a Define-Use chain is built
    for the input eBPF sequence, and a series of analyses can be done along
    the chain

	default eBPF code gen from LLVM     define    use0    use1

	r1 = *(u32 *)(r1 + 0)               r1        r1
	r2 = *(u32 *)(r2 + 0)               r2        r2
	r2 += r1                            r2        r2      r1
	*(u32 *)(r3 + 0) = r2                         r2      r3
	exit                                          r0

Seed the width information at the uses whose width is known, then backward
propagate it along the define-use chain.
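
A hypothetical sketch of that backward propagation (all types and names are
illustrative, not Netronome's actual implementation): mark a definition as
32-bit-only when every reachable use of its result needs only the low 32 bits.
A real pass would also iterate to a fixpoint so that 32-bit-only definitions
relax the width needed by their own operands.

	#include <stdbool.h>

	struct du_insn {
		int  def_reg;		/* register defined, -1 if none */
		int  use_reg[2];	/* registers used, -1 if unused */
		bool use_is_32bit[2];	/* does this use need only the low 32 bits? */
		bool def_only_32bit;	/* result: the high half can be skipped */
	};

	void propagate_32bit(struct du_insn *insns, int n)
	{
		for (int i = n - 1; i >= 0; i--) {	/* walk backward */
			if (insns[i].def_reg < 0)
				continue;
			bool all_32 = true;
			/* scan every use of this def until it is redefined */
			for (int j = i + 1; j < n && all_32; j++) {
				for (int u = 0; u < 2; u++)
					if (insns[j].use_reg[u] == insns[i].def_reg &&
					    !insns[j].use_is_32bit[u])
						all_32 = false;
				if (insns[j].def_reg == insns[i].def_reg)
					break;	/* redefined: no further uses */
			}
			insns[i].def_only_32bit = all_32;
		}
	}
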
NFP Code Gen Back-end - Summary
➢ Shares the same flow as the other host back-ends
➢ Has special optimizations for NFP architecture features
➢ Targets an offload execution environment

eBPF JIT Compiler - Emerging Features
➢ Verification Stage
  • A strong requirement for a more scalable verification analysis
    infrastructure
  • Support for bounded loops. Netronome did some proof-of-concept work with
    the community to bring modern Control Flow Graph (CFG) infrastructure to
    the eBPF verifier; a bounded loop detector could then be built on top of
    it
➢ Programming Model
  • Shared library support (dynamic/runtime linking)
➢ Debuggability
  • Debugging the JITed image through DWARF annotations. Debug information
    could be saved separately from the executable image

Thank You!

[email protected]
