[RFC][BPF] Support Jump Table #133856
@aspsk As we discussed in LSFMMBPF, here is the implementation for llvm jump table support. Please take a look and try libbpf/kernel implementations. Let me know if you hit any issues. |
Don't bother. x86 is doing it to save a byte in encoding. This technique doesn't apply to bpf isa. |
let isIndirectBranch = 1 in {
  def JX : JMP_IND<BPF_JA, "gotox", [(brind i64:$dst)]>;
}
nice to see how it should be done, I just had hardcoded it in my test branch: aspsk@98773c6
Thanks @yonghong-song! I will test this, match with the verification part, and post my results in this PR
@@ -65,10 +65,11 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,

  setOperationAction(ISD::BR_CC, MVT::i64, Custom);
  setOperationAction(ISD::BR_JT, MVT::Other, Expand);
  setOperationAction(ISD::BRIND, MVT::Other, Expand);
So, this does remove restriction to not produce indirect jumps?
Is there a way to control if we want to generate indirect jumps "in general" vs., say, "only for large switches"? (Or even only for a particular switch?)
> So, this does remove restriction to not produce indirect jumps?
Yes, we do not want to expand 'brind'; rather, we will do pattern matching with 'brind'.
> Is there a way to control if we want to generate indirect jumps "in general" vs., say, "only for large switches"? (Or even only for a particular switch?)
Good point. Let me do some experiments with a flag for this. I am not sure whether I could do 'only for a particular switch', but I will do some investigation. Hopefully I can find a solution for that.
I added an option to control the minimum number of cases a switch statement must have before a jump table is used. The default is 4 cases, but you can change it with an additional clang option. For example, to require at least 6 cases:
clang ... -mllvm -bpf-min-jump-table-entries=6
I checked other targets; none of them has a control for a specific switch, so I think we do not need one for now.
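To try the threshold out, a switch like the one below can be compiled with and without the option. This is only a sketch: the function name `classify`, the global `ret_user_demo`, and the case values are made up for illustration, and the optimizer may still pick a different lowering depending on case density and optimization level.

```c
/* Hypothetical test input: five cases, so the default threshold (4)
 * permits a jump table while -bpf-min-jump-table-entries=6 should not. */
int ret_user_demo;

void classify(int op)
{
    switch (op) {
    case 1:
        ret_user_demo = 8;
        break;
    case 2:
        ret_user_demo = 4;
        break;
    case 3:
        ret_user_demo = 7;
        break;
    case 6:
        ret_user_demo = 3;
        break;
    case 31:
        ret_user_demo = 5;
        break;
    default:
        ret_user_demo = 19;
        break;
    }
}
```

Building it with `clang --target=bpf -O2 -c` and then again with `-mllvm -bpf-min-jump-table-entries=6` added, and comparing the `llvm-objdump -Sr` output for a `gotox` instruction, should show whether the table was used.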
Awesome, thanks!
@yonghong-song could you please elaborate on this? How exactly would one classify those per table?
The below is an example for test_tc_tunnel.bpf.o with
The above .rodata is what you really care about. You can also see that all .rodata relocations happen in the decap and .text sections.
You then need to go through sections 'decap' and '.text' for their .rodata relocations.
It corresponds to insn 7 (0x38/8 = 7).
In the above, 'r3 = 0x80' means the relocation starts at offset 0x80 in the .rodata section. You need to scan ALL such relocations in the .text and decap sections, and then you can sort them by the start of each relocation. After that, you will be able to calculate the size of each relocation. Once you have calculated each relocation size (for the .rodata section), you need to check whether a particular relocation is for gotox or for something else. So you need to scan backwards. For example,
You find a gotox insn with target r2, then you need to go back and find 'r2 = *(u64 *)(r2 + 0x0)', then 'r2 += r3', and then 'r2 = 0x140 ll'. The above code pattern is generated by llvm and should generally hold for the jump table implementation. You can then be certain that the table for this particular gotox is at offset 0x140 of the .rodata section. The size of the table has already been calculated by the previous mechanism of scanning all .rodata relocations in the .text and decap sections.
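The sort-then-subtract step described above can be sketched as follows. This is a minimal illustration, not libbpf code: `struct rodata_ref`, `compute_ref_sizes` and `insn_index` are hypothetical names, and the sketch assumes every reference into .rodata extends up to the next distinct reference (or to the end of the section).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one .rodata reference found in a code section. */
struct rodata_ref {
    uint64_t start; /* offset into .rodata, e.g. 0x80 or 0x140 */
    uint64_t size;  /* filled in by compute_ref_sizes() */
};

static int cmp_ref(const void *a, const void *b)
{
    const struct rodata_ref *x = a, *y = b;

    return (x->start > y->start) - (x->start < y->start);
}

/* Sort references by start offset. Each one is assumed to run up to the
 * next distinct start, and the last one up to the end of the section. */
void compute_ref_sizes(struct rodata_ref *refs, size_t n, uint64_t rodata_size)
{
    qsort(refs, n, sizeof(*refs), cmp_ref);
    for (size_t i = 0; i < n; i++) {
        size_t j = i + 1;

        /* Skip repeated references to the same offset. */
        while (j < n && refs[j].start == refs[i].start)
            j++;
        refs[i].size = (j < n ? refs[j].start : rodata_size) - refs[i].start;
    }
}

/* A relocation's byte offset maps to an instruction index by dividing
 * by 8, the size of one BPF instruction slot (0x38 / 8 = 7). */
uint64_t insn_index(uint64_t byte_off)
{
    return byte_off / 8;
}
```

With references at 0x80 and 0x140 plus an assumed reference at 0x0 and an assumed section size of 0x200, the computed sizes are 0x80, 0xc0 and 0xc0. Deciding which of those ranges is a jump table still needs the backward scan from each gotox.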
I am looking into how to automate this properly (I have a really hacky PoC test working with this version of llvm and my custom test). It looks simpler with explicit jump tables (when I take an address of a label and store in an array), because then I can just push values to a custom section. Will post updates here. |
I find a llvm option
This way, you just need to scan related code section. As long as it |
This is one test failure like below:
The reason should be due to my unconditional enabling |
Thanks @yonghong-song, that size/offset section is really useful! This looks sufficient for me to continue with a PoC.
Unfortunately, I do, this is required for verification. For indirect jumps to work, two things should be verified:
The So, in order to construct a verifiable program, libbpf should:
(Haven't checked yet for real, but this looks to be enough for "custom", e.g., user-defined, jump tables to work. Just declare it as |
You are right. Verification does need to connect jump table map and gotox insn.
Backtracking certainly works. But maybe there is an alternative that avoids backtracking.
Your user-defined jump table may work. But it would be great if we could simply allow the current common switch statements, for the sake of code cleanliness and developer productivity. |
Right, this is exactly what I've meant by "backtrack". Looks like for |
Yes, libbpf does not need to do verifier work. The range analysis should be done in verifier. |
Hi @yonghong-song! I was trying different switch variants, and the simple ones work like magic, so we're definitely going in the right direction. One simple case fails for me though. Namely, in the example below LLVM generates an unreachable instruction. Could you take a look please? An example source program is
Then the object file looks like
Now, the jump table is
And the check
makes sure that And this makes the instruction
unreachable. |
I suspect it won't be easy to avoid this on llvm side. Probably better to teach verifier to ignore those. |
Ok, thanks, will do this for now |
Update. I have a patch for kernel + libbpf which uses this LLVM and which passes all my new selftests + all (but one) standard bpf selftests which are compiled to use So far only one selftest fails ( |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Thanks for the update. When trying your above example
I found a problem and just added another commit to fix the problem. The issue is due to llvm machine-sink pass. The implementation is similar to X86 (X86InstrInfo::getJumpTableIndex()). See the top commit (commit 4) for more details. |
Thanks @yonghong-song! I will test your latest changes over this weekend. (The |
Do we need to modify the ASMParser also?
llvm-project/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
Lines 228 to 233 in f2e62cf
static bool isValidIdAtStart(StringRef Name) {
  return StringSwitch<bool>(Name.lower())
      .Case("if", true)
      .Case("call", true)
      .Case("callx", true)
      .Case("goto", true)
Right, need to add gotox as well. Will fix. Thanks! |
Here's the kernel side which works with this LLVM: https://lore.kernel.org/bpf/[email protected]/ The following selftests contain indirect jumps (and pass):
A new selftest |
Thanks @aspsk I will also take a look at the kernel patch. Also, the current patch has some conflicts with latest 'main' branch. I will rebase and repost the new llvm patch after doing some testing. |
Rebased on top of the current main branch. No functional change compared to the previous version (from more than a month ago). |
Currently llvm has an option EmitJumpTableSizesSection which enables unique jmptable size sections. This patch adds an option EmitUniqueJumpTableSection which enables unique jmptable sections, and turns EmitUniqueJumpTableSection on by default for BPF programs. Without this, the jmptable will be in '.rodata' sections, which may include a lot of other data, e.g. const strings. With EmitUniqueJumpTableSection, llvm will generate a unique jump table section per function based on llvm internal conventions, and it will support ELF, XCOFF and COFF.

The following is an example with bpf selftest user_ringbuf_success.bpf.c (also in description in llvm#133856):

  $ llvm-readelf -S user_ringbuf_success.bpf.o
  ...
  [ 6] .rodata.read_protocol_msg         PROGBITS      0000000000000000 000740 000020 00   A  0   0  8
  [ 7] .rel.rodata.read_protocol_msg     REL           0000000000000000 0038e8 000040 10   I 42   6  8
  [ 8] .llvm_jump_table_sizes            LLVM_JT_SIZES 0000000000000000 000760 000010 00      0   0  1
  [ 9] .rel.llvm_jump_table_sizes        REL           0000000000000000 003928 000010 10   I 42   8  8
  ...
  [14] .rodata.publish_next_kern_msg     PROGBITS      0000000000000000 0008a0 000020 00   A  0   0  8
  [15] .rel.rodata.publish_next_kern_msg REL           0000000000000000 0039b8 000040 10   I 42  14  8
  [16] .llvm_jump_table_sizes            LLVM_JT_SIZES 0000000000000000 0008c0 000010 00      0   0  1
  [17] .rel.llvm_jump_table_sizes        REL           0000000000000000 0039f8 000010 10   I 42  16  8
  ...

  $ llvm-readelf -r user_ringbuf_success.bpf.o
  ...
  Relocation section '.rel.rodata.read_protocol_msg' at offset 0x38e8 contains 4 entries:
      Offset             Info             Type            Symbol's Value   Symbol's Name
  0000000000000000  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text
  0000000000000008  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text
  0000000000000010  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text
  0000000000000018  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text

  Relocation section '.rel.llvm_jump_table_sizes' at offset 0x3928 contains 1 entries:
      Offset             Info             Type            Symbol's Value   Symbol's Name
  0000000000000000  0000000a00000002 R_BPF_64_ABS64  0000000000000000 .rodata.read_protocol_msg
  ...
  Relocation section '.rel.rodata.publish_next_kern_msg' at offset 0x39b8 contains 4 entries:
      Offset             Info             Type            Symbol's Value   Symbol's Name
  0000000000000000  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text
  0000000000000008  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text
  0000000000000010  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text
  0000000000000018  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text

  Relocation section '.rel.llvm_jump_table_sizes' at offset 0x39f8 contains 1 entries:
      Offset             Info             Type            Symbol's Value   Symbol's Name
  0000000000000000  0000001200000002 R_BPF_64_ABS64  0000000000000000 .rodata.publish_next_kern_msg
  ...

  $ llvm-readelf -x '.rodata.read_protocol_msg' user_ringbuf_success.bpf.o
  Hex dump of section '.rodata.read_protocol_msg':
  0x00000000 a8000000 00000000 10010000 00000000 ................
  0x00000010 b8000000 00000000 c8000000 00000000 ................

  $ llvm-readelf -x '.rodata.publish_next_kern_msg' user_ringbuf_success.bpf.o
  Hex dump of section '.rodata.publish_next_kern_msg':
  0x00000000 28040000 00000000 00050000 00000000 (...............
  0x00000010 70040000 00000000 b8040000 00000000 p...............

  $ llvm-objdump -Sr user_ringbuf_success.bpf.o
  ...
  0000000000000000 <read_protocol_msg>:
  ...
  ;       switch (msg->msg_op) {
        13:       61 03 00 00 00 00 00 00 w3 = *(u32 *)(r0 + 0x0)
        14:       26 03 1c 00 03 00 00 00 if w3 > 0x3 goto +0x1c <read_protocol_msg+0x158>
        15:       67 03 00 00 03 00 00 00 r3 <<= 0x3
        16:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
                  0000000000000080:  R_BPF_64_64  .rodata.read_protocol_msg
        18:       0f 31 00 00 00 00 00 00 r1 += r3
        19:       79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0x0)
        20:       0d 01 00 00 00 00 00 00 gotox r1
  ...

What if a single function has two switch statements? The following is an example:

  $ cat test.c
  struct simple_ctx {
          int x;
          int y;
          int z;
  };

  int ret_user, ret_user2;
  void bar(void);
  int foo(struct simple_ctx *ctx, struct simple_ctx *ctx2)
  {
          switch (ctx->x) {
          case 1: ret_user = 8; break;
          case 6: ret_user = 3; break;
          case 2: ret_user = 4; break;
          case 31: ret_user = 5; break;
          default: ret_user = 19; break;
          }

          bar();

          switch (ctx2->x) {
          case 0: ret_user2 = 8; break;
          case 7: ret_user2 = 3; break;
          case 9: ret_user2 = 4; break;
          case 31: ret_user2 = 5; break;
          default: ret_user2 = 29; break;
          }

          return 0;
  }
  $ clang --target=bpf -O2 -c test.c
  $ llvm-readelf -S test.o
  ...
  [ 4] .rodata.foo                PROGBITS      0000000000000000 0001b8 0001f8 00   A  0   0  8
  [ 5] .rel.rodata.foo            REL           0000000000000000 0004e0 0003f0 10   I 10   4  8
  [ 6] .llvm_jump_table_sizes     LLVM_JT_SIZES 0000000000000000 0003b0 000020 00      0   0  1
  [ 7] .rel.llvm_jump_table_sizes REL           0000000000000000 0008d0 000020 10   I 10   6  8
  ...

Note that the same '.llvm_jump_table_sizes' section has information for two switch tables since they are in the same function.

  $ llvm-readelf -x '.llvm_jump_table_sizes' test.o
  Hex dump of section '.llvm_jump_table_sizes':
  0x00000000 00000000 00000000 1f000000 00000000 ................
  0x00000010 f8000000 00000000 20000000 00000000 ........ .......

From the above, the two switch tables have 0x3f entries in total:

  $ llvm-readelf -x '.rodata.foo' test.o
  Hex dump of section '.rodata.foo':
  0x00000000 58000000 00000000 78000000 00000000 X.......x.......
  0x00000010 98000000 00000000 98000000 00000000 ................
  0x00000020 98000000 00000000 68000000 00000000 ........h.......
  0x00000030 98000000 00000000 98000000 00000000 ................
  0x00000040 98000000 00000000 98000000 00000000 ................
  0x00000050 98000000 00000000 98000000 00000000 ................
  0x00000060 98000000 00000000 98000000 00000000 ................
  0x00000070 98000000 00000000 98000000 00000000 ................
  0x00000080 98000000 00000000 98000000 00000000 ................
  0x00000090 98000000 00000000 98000000 00000000 ................
  0x000000a0 98000000 00000000 98000000 00000000 ................
  0x000000b0 98000000 00000000 98000000 00000000 ................
  0x000000c0 98000000 00000000 98000000 00000000 ................
  0x000000d0 98000000 00000000 98000000 00000000 ................
  0x000000e0 98000000 00000000 98000000 00000000 ................
  0x000000f0 88000000 00000000 08010000 00000000 ................
  0x00000100 48010000 00000000 48010000 00000000 H.......H.......
  0x00000110 48010000 00000000 48010000 00000000 H.......H.......
  0x00000120 48010000 00000000 48010000 00000000 H.......H.......
  0x00000130 28010000 00000000 48010000 00000000 (.......H.......
  0x00000140 18010000 00000000 48010000 00000000 ........H.......
  0x00000150 48010000 00000000 48010000 00000000 H.......H.......
  0x00000160 48010000 00000000 48010000 00000000 H.......H.......
  0x00000170 48010000 00000000 48010000 00000000 H.......H.......
  0x00000180 48010000 00000000 48010000 00000000 H.......H.......
  0x00000190 48010000 00000000 48010000 00000000 H.......H.......
  0x000001a0 48010000 00000000 48010000 00000000 H.......H.......
  0x000001b0 48010000 00000000 48010000 00000000 H.......H.......
  0x000001c0 48010000 00000000 48010000 00000000 H.......H.......
  0x000001d0 48010000 00000000 48010000 00000000 H.......H.......
  0x000001e0 48010000 00000000 48010000 00000000 H.......H.......
  0x000001f0 38010000 00000000

Related relocations:

  $ llvm-readelf -r test.o
  ...
  Relocation section '.rel.rodata.foo' at offset 0x4e0 contains 63 entries:
      Offset             Info             Type            Symbol's Value   Symbol's Name
  0000000000000000  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text
  0000000000000008  0000000200000002 R_BPF_64_ABS64  0000000000000000 .text
  ...
  Relocation section '.rel.llvm_jump_table_sizes' at offset 0x8d0 contains 2 entries:
      Offset             Info             Type            Symbol's Value   Symbol's Name
  0000000000000000  0000000300000002 R_BPF_64_ABS64  0000000000000000 .rodata.foo
  0000000000000010  0000000300000002 R_BPF_64_ABS64  0000000000000000 .rodata.foo
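The '.llvm_jump_table_sizes' dump for test.o can be decoded along the following lines. This is a sketch under assumptions read off the dump itself rather than taken from a specification: each record is taken to be two little-endian 64-bit values (table address before relocation, then entry count), and the names `jt_size_entry`, `read_le64` and `decode_jt_sizes` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed record layout, inferred from the hex dump of test.o. */
struct jt_size_entry {
    uint64_t addr;    /* table offset; the relocation adds the section base */
    uint64_t entries; /* number of jump table entries */
};

/* Read a 64-bit little-endian value byte by byte (host-endian neutral). */
static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode up to max records from the raw section bytes; returns the
 * number of records written to out. */
size_t decode_jt_sizes(const unsigned char *data, size_t len,
                       struct jt_size_entry *out, size_t max)
{
    size_t n = 0;

    for (size_t off = 0; off + 16 <= len && n < max; off += 16, n++) {
        out[n].addr = read_le64(data + off);
        out[n].entries = read_le64(data + off + 8);
    }
    return n;
}
```

Fed the 32 bytes shown for test.o, this yields two records: offset 0x0 with 0x1f entries and offset 0xf8 with 0x20 entries. The second offset matches 0x1f entries of 8 bytes each, and the counts add up to the 0x3f (63) relocations listed for '.rel.rodata.foo'.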
be893e9 to c5b53c2
Just uploaded a version on top of latest llvm-project and merged with additional changes from @eddyz87. Note that there is a test failure
which also failed with latest llvm-project. I will update later once this test failure is gone from upstream. |
NOTE: We probably need cpu v5 or other flags to enable this feature. We can add it later when necessary.

- Generate all jump tables in a single section named .jumptables.
- Represent each jump table as a symbol:
  - value points to an offset within .jumptables;
  - size encodes jump table size in bytes.
- Indirect jump is a gotox instruction:
  - dst register is an index within the table;
  - accompanied by a R_BPF_64_64 relocation pointing to a jump table symbol.

clang -S:

  .LJTI0_0:
          .reloc 0, FK_SecRel_8, .BPF.JT.0.0
          gotox r1
          goto LBB0_2
  LBB0_4:
          ...

          .section .jumptables,"",@progbits
  .L0_0_set_4 = ((LBB0_4-.LBPF.JX.0.0)>>3)-1
  .L0_0_set_2 = ((LBB0_2-.LBPF.JX.0.0)>>3)-1
  ...
  .BPF.JT.0.0:
          .long .L0_0_set_4
          .long .L0_0_set_2
          ...

llvm-readelf -r --sections --symbols:

  Section Headers:
    [Nr] Name         Type      Address          Off    Size   ES Flg Lk Inf Al
    ...
    [ 4] .jumptables  PROGBITS  0000000000000000 000118 000100 00      0   0  1
    ...

  Relocation section '.rel.text' at offset 0x2a8 contains 2 entries:
      Offset             Info             Type       Symbol's Value   Symbol's Name
  0000000000000010  0000000300000001 R_BPF_64_64  0000000000000000 .BPF.JT.0.0
  ...

  Symbol table '.symtab' contains 6 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
     ...
       2: 0000000000000000   112 FUNC    GLOBAL DEFAULT    2 foo
       3: 0000000000000000   128 NOTYPE  GLOBAL DEFAULT    4 .BPF.JT.0.0
     ...

llvm-objdump -Sdr:

  0000000000000000 <foo>:
  ...
        2:       gotox r1
                 0000000000000010:  R_BPF_64_64  .BPF.JT.0.0

An option -bpf-min-jump-table-entries is implemented to control the minimum number of entries to use a jump table on BPF. The default value is 4, but it can be changed with the following clang option

  clang ... -mllvm -bpf-min-jump-table-entries=6

where the number of jump table cases needs to be >= 6 in order to use a jump table.
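A loader consuming this layout would presumably locate a table from the symbol's value and size. The helper below is a hedged sketch of that lookup with a bounds check; `jt_slice` is a hypothetical name and is not part of libbpf or LLVM.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to the jump table described by a symbol (value =
 * offset within .jumptables, size = table size in bytes), or NULL if
 * the symbol does not fit inside the section. */
const unsigned char *jt_slice(const unsigned char *sec, size_t sec_size,
                              uint64_t sym_value, uint64_t sym_size)
{
    if (sym_value > sec_size || sym_size > sec_size - sym_value)
        return NULL;
    return sec + sym_value;
}
```

For the listing above, the section is 0x100 bytes and .BPF.JT.0.0 has value 0 and size 128, so the lookup succeeds and the table occupies the first 128 bytes of .jumptables.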
Update BPFInstrInfo::analyzeBranch() to comply with TargetInstrInfo::analyzeBranch() requirements for the JX instruction: if a branch instruction can't be categorized as a conditional with true/false branches -- return true. Because of this bug, the MachineBlockPlacement transformation inserted an additional unreachable jump after JX, e.g.:

  bb.1.entry:
    ...
    JX killed $r1, %jump-table.0
    JMP %bb.2

Additionally, the isNotDuplicable annotation is necessary to avoid machine-level transformations creating several copies of a JX instruction. Such copies would refer to the same jump table and would make it impossible to calculate jump offsets inside the table. Files triggering such duplication are present in kernel selftests.
- one testing a general structure of the generated code;
- another testing that several jump tables within the same function are generated independently.
Coincidentally this fixes two test failures:
- LLVM :: CodeGen/BPF/CORE/offset-reloc-fieldinfo-2-bpfeb.ll
- LLVM :: CodeGen/BPF/CORE/offset-reloc-fieldinfo-2.ll

These tests invoke llc with -mcpuv1 and have a switch statement in the IR. Both tests failed with an assertion in SelectionDAGLegalize::LegalizeOp():

  for (const SDValue &Op : Node->op_values())
    assert((TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) ==
              TargetLowering::TypeLegal ||
            Op.getOpcode() == ISD::TargetConstant ||
            Op.getOpcode() == ISD::Register) &&
            "Unexpected illegal type!");

At the moment of the failure: Op.getOpcode() == BPFISD::BPF_BR_JT

The error happened because one of the BPFBrJt parameters has i32 type:

  def SDT_BPFBrJt : SDTypeProfile<0, 2, [SDTCisVT<0, i32>,   // jump table
                                         SDTCisVT<1, i64>]>; // index
  def BPFBrJt : SDNode<"BPFISD::BPF_BR_JT", SDT_BPFBrJt, [SDNPHasChain]>;
The requirement to emit jump table entries as offsets measured in instructions, e.g. as follows:

  .L0_0_set_7 = ((LBB0_7-.LBPF.JX.0.0)>>3)-1

makes it impossible to use the generic AsmPrinter::emitJumpTableInfo() function. The merge request used this generic function before (and incorrect offsets were generated). The generic function required two overloads:
- AsmPrinter::GetJTISymbol()
- TargetLowering::getPICJumpTableRelocBaseExpr()

Now all jump table emission logic is located in BPFAsmPrinter::emitJumpTableInfo(), which does not require the above overloads. Hence, remove the overloads and move the corresponding code to BPFAsmPrinter to keep it in one place.
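The entry expression can be modelled in a few lines. This sketch assumes that the .LBPF.JX label marks the byte offset of the gotox instruction itself, so the trailing -1 accounts for BPF jump offsets being relative to the following instruction; `jt_entry` and `jt_target` are hypothetical helper names.

```c
#include <stdint.h>

/* Entry value for a target block: ((target - jx) >> 3) - 1.
 * Offsets are byte offsets within the code section. Division by 8
 * stands in for the shift, which is equivalent here because BPF
 * instruction offsets are multiples of 8. */
int64_t jt_entry(uint64_t target_off, uint64_t jx_off)
{
    int64_t delta = (int64_t)target_off - (int64_t)jx_off;

    return delta / 8 - 1;
}

/* Inverse mapping: byte offset of the block an entry jumps to. */
uint64_t jt_target(uint64_t jx_off, int64_t entry)
{
    return jx_off + (uint64_t)((entry + 1) * 8);
}
```

As a made-up example, a gotox at byte offset 0x10 and a target block at 0x30 give an entry of 3, that is, three instructions past the one following the gotox.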
Added additional changes from @eddyz87 which includes some BPF backend changes and tests. |
FYI, all my tests from |
Make the libbpf parse and pass proper offsets for "new" gotox instructions (generated by llvm/llvm-project#133856). Hack fast, so there are leftovers from the old patch. (And the blindness which was presumably fixed, breaks again in bpf_goto_x tests.) Signed-off-by: Anton Protopopov <[email protected]>