[RFC] Implementing asm-goto support in Clang/LLVM

There have been quite a few discussions around asm-goto support in Clang and LLVM.

After working with several members of our community, this is a proposal that, in our opinion, strikes a reasonable balance and finally addresses the lack of implementation.

Justification

I’m not sure I follow what the issue is here; why can’t a block appear multiple times in the successor list? It happens routinely for other kinds of branches. It’s fine if you don’t want to actually implement this now, but we probably need a rough idea for how it will be implemented, so we’re reasonably confident we won’t have to change the semantics of callbr. Specifically, do we need to impose any restrictions on the successors of a callbr to allow reasonable code generation? Without any rules, PHI nodes involving the outputs of callbr instructions could be very awkward to lower. -Eli

Awesome! Big +1 for this.

Our proposed approach is to introduce a new IR instruction named callbr with the following syntax:

callbr <return_type> <callee> (<argtype1> <arg1>, …) to label %normal or jump [label %transfer1, label %transfer2…]

The invoke is currently defined as:

<result> = invoke [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
[operand bundles] to label <normal label> unwind label <exception label>

Will callbr support the same attributes (cconv, ret attrs, addrspace, fn attrs, operand bundles)? Were they just excluded from the above syntax for brevity, or are they actually unsupported?

The labels from the label list of an asm-goto statement are used by the inline asm as data arguments. To avoid errors in asm parsing and CFG recognition, the labels are passed as arguments to the inline asm using additional “X” input constraints and blockaddress statements while also being used directly as elements of the jump list.

While it may seem weird to specify the label twice, once as an argument to the call, and once as a successor, I’d also note that doing so increases the generality of the instruction, since the call may have access to the blockaddress via some indirect means and not require it to be passed to the call in order to jump to it.
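For readers less familiar with the source-level feature, here is a minimal sketch of an asm goto statement in GNU C. The function names are hypothetical, the x86 assembly is only an illustration, and the non-x86 fallback is an ordinary goto so that the example builds elsewhere. Under the scheme described above, the label `taken` would presumably be passed to the inline asm as a blockaddress argument and also appear in the jump list of the callbr.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the asm jumped to `taken`, 0 if it fell through. */
int take_branch(int cond)
{
#if defined(__x86_64__) || defined(__i386__)
    /* The label is named once in the goto-label list (last section) and
     * referenced from the template as %l[taken]. */
    __asm__ goto("testl %0, %0\n\t"
                 "jnz %l[taken]"
                 : /* no outputs */
                 : "r"(cond)
                 : "cc"
                 : taken);
#else
    /* Assumption: plain C fallback with the same control flow on other targets. */
    if (cond)
        goto taken;
#endif
    return 0;   /* the "normal" (fallthrough) destination */
taken:
    return 1;   /* the destination listed in the goto-label list */
}

/* Usage: both destinations are reachable, so both must stay in the CFG. */
void take_branch_examples(void)
{
    assert(take_branch(1) == 1);
    assert(take_branch(0) == 0);
}
```

The compiler cannot see inside the template, so it has to treat every label in the list as a possible successor of the statement, in addition to the fallthrough.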

Implementing the callbr instruction and asm-goto requires some adaptation of the existing passes:

  • All passes that deal with the CFG must consider all potential successors of the callbr instruction to be possible. This means that passes that simplify the CFG based on assumptions about which successors are taken cannot work with callbr.

  • Due to the way successor and predecessor detection works, some CFG simplifications such as trivial block elimination may be blocked if they would result in duplicate successors for the callbr instruction, as such duplicate successors are incorrectly processed in the IR and cannot be removed due to being used by the callee.

I don’t understand what prevents them from being removed. The callee can’t see what’s in the successor list (only its blockaddress arguments), so if there are any duplicate successors after coalescing blocks, what would be broken by removing them?

  • The indirectbr expansion pass may destroy blockaddress expressions if the basic blocks they reference are possible successors of an indirectbr. It may have to be reworked to support this new usage of the blockaddress expression.

Yes, the indirectbr pass replaces all blockaddresses with a small integer instead of the actual address, on the assumption that the only use of a blockaddress is in an indirectbr instruction (which itself gets transformed into a ‘switch’ to a branch). Note that this pass is only used by retpoline, but is required for retpoline to work at the moment. I started making a proof-of-concept for fixing that – allowing indirectbr to translate into a jmp to/through the retpoline stub instead; it seemed easy enough, although I’m not certain if it’s right or not. :)
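A source-level sketch of the rewrite described above may make the problem easier to see. The first function uses the GNU C "labels as values" extension, which Clang lowers to blockaddress constants and an indirectbr. The second shows roughly what the code looks like once the addresses are replaced by small integers and the dispatch becomes a switch. Function and label names are hypothetical, and this is only an analogy in C, not the pass's actual output.

```c
#include <assert.h>

/* Before: real label addresses, dispatched with a computed goto. */
int dispatch_indirect(int op, int x)
{
    static void *const targets[] = { &&do_inc, &&do_dbl, &&do_neg };
    assert(op >= 0 && op < 3);
    goto *targets[op];
do_inc:
    return x + 1;
do_dbl:
    return x * 2;
do_neg:
    return -x;
}

/* After (analogy): each address becomes a small integer and the
 * indirect jump becomes a switch over those integers. */
int dispatch_switch(int op, int x)
{
    enum { DO_INC, DO_DBL, DO_NEG };
    static const int targets[] = { DO_INC, DO_DBL, DO_NEG };
    assert(op >= 0 && op < 3);
    switch (targets[op]) {
    case DO_INC:
        return x + 1;
    case DO_DBL:
        return x * 2;
    default: /* DO_NEG */
        return -x;
    }
}
```

The two functions behave identically as long as the table entries are only ever consumed by the dispatch. If one of those entries were instead handed to inline asm that jumps to it directly, the small integer would no longer be a valid jump target, which is the conflict with the blockaddress arguments of callbr.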

Some other notes on the instruction and asm-goto implementation:

  • The special status of the “normal” destination label makes it possible to adjust its transition probability specifically, so that it is likely to become a fallthrough successor

  • While the initial implementation of asm-goto won’t allow outputs, the instruction’s syntax supports them in principle, so the support for this can be added at a later date

And, besides just asm-goto, I think (separate, future) work based on this could also be useful to support intrinsics with multiple successors.

For example, a (strong) atomic cmpxchg primitive for an ll/sc platform can be most efficiently modeled as an instruction with both an output value and two successor blocks (for success and failure). If we can extend callbr to support an output value, it can represent that. (The alternatives at the moment are to return only oldval (and recompare to expected to get the success flag), or to return both oldval and a flag (and do an extraneous conditional branch).)
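The two current alternatives mentioned above can be sketched with C11 atomics. The wrapper names and the struct are hypothetical, and `atomic_compare_exchange_strong` merely stands in for whatever primitive a target would provide; the point is only where the extra compare or extra branch ends up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Alternative 1: return only the old value; the caller has to compare
 * it against `expected` again to learn whether the exchange happened. */
int cas_return_old(atomic_int *p, int expected, int desired)
{
    int old = expected;
    assert(p != 0);
    /* On failure, `old` is overwritten with the value currently stored. */
    atomic_compare_exchange_strong(p, &old, desired);
    return old;
}

/* Alternative 2: return the old value and a success flag; the caller
 * then branches on the flag. */
struct cas_result {
    int old;
    bool ok;
};

struct cas_result cas_return_both(atomic_int *p, int expected, int desired)
{
    struct cas_result r = { expected, false };
    assert(p != 0);
    r.ok = atomic_compare_exchange_strong(p, &r.old, desired);
    return r;
}

/* Caller of alternative 2: this `if` is the extra conditional branch. */
int try_increment(atomic_int *p, int expected)
{
    struct cas_result r = cas_return_both(p, expected, expected + 1);
    if (r.ok)
        return 1;
    return 0;
}
```

An instruction with an output value and two successors would let the success and failure paths be distinct blocks directly, without the re-compare in the first form or the flag test in the second.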

  • The general syntax of the callbr instruction allows it to be used to implement the control flow tracking for setjmp/longjmp, but that is beyond the scope of this RFC

Yes, with this functionality (…and some more work), it seems like it may be possible to correctly represent the CFG for setjmp/longjmp. I had started a discussion on fixing that with a couple of folks a while ago; I’ll try to resurrect that as a separate RFC.
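As a reminder of why the CFG is hard to represent here, a small C sketch (all names hypothetical): the setjmp call returns a second time when longjmp is invoked from a callee, so there is a control-flow edge back to the setjmp site that is not visible at the call that triggers it.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Never returns to its caller: control resumes at the setjmp site. */
void bail_out(void)
{
    longjmp(env, 1);
}

/* Returns 2 on the normal path, or 100 + stage when re-entered via longjmp. */
int run_guarded(int should_fail)
{
    /* volatile so the value is well defined after the longjmp */
    volatile int stage = 0;

    if (setjmp(env) != 0)
        return 100 + stage;   /* second return of setjmp */

    stage = 1;
    if (should_fail)
        bail_out();           /* hidden edge back to the setjmp above */
    stage = 2;
    return stage;
}

/* Usage: the same call site is left in two different ways. */
void run_guarded_examples(void)
{
    assert(run_guarded(0) == 2);
    assert(run_guarded(1) == 101);
}
```

In the failing case the function observes `stage == 1`, which shows that execution really did continue from the setjmp site after the call to `bail_out`, not after that call returned.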

Eli, James, thanks for your comments! Let me address them in a separate email.

I have now submitted the prototype patch implementing the RFC here: https://reviews.llvm.org/D53765

Alexander

FWIW, I’m generally supportive of this direction, and would love to see asm goto support.

Could you compare and contrast asmbr to a couple other options?

- There is an effort to eliminate “terminators as a thing”, which would allow merging call and invoke. How does asmbr intersect with that work, in the short and long term? If asmbr is a short term thing, is there a clean migration path?

- Have you thought about or scoped the idea of having an asm instruction that is not a call? In the shortest term, it could always be a terminator (similar to invoke) but as the call/invoke work comes together it could be relaxed to being a non-terminator when not an “asm goto”. The rationale for this approach (which I’m not particularly attached to, just want to see what others think about it) is that asms are pretty different in some ways from other instructions, and if you’re introducing a new instruction (asmbr) anyway, we might as well consider moving them away from call completely.

-Chris

I’ve been out of the loop for a while. Is there an email thread about the “removing terminators as a thing” concept?

http://lists.llvm.org/pipermail/llvm-dev/2018-May/123407.html

TL;DR: CallInst & InvokeInst should share lots of code, and it would be useful for them to share a base type, whereas terminators don’t really share much code.

(and FWIW, I’m currently trying to finish the patch that makes this a reality… mostly hard because it has to unwind a loooot of complexity we’ve built up due to not having this)

I wanted to check and see what the status of this is.

Here’s a quick update for those not watching the patch on llvm-commits. I’ve picked up the patch from Alexander Ivchenko.

LLVM patch here (same link as previously): https://reviews.llvm.org/D53765
Clang side patch has been posted to phab: https://reviews.llvm.org/D56571

LLVM patch has been rebased after the TerminatorInst removal and the removal of CRTP from CallBase. This reduced some code in the CallBrInst as expected and simplified a few other places.

A few bugs found during testing have been fixed.

Nick Desaulniers from Google has reported some success compiling, linking, and booting Linux using clang with these patches and a special kernel patch.

I welcome any feedback on the patches. We’re hoping to get this landed in trunk soon to make testing easier even if it’s not completely bug-free.

~Craig

Hi,
Really cool! I’m happy to help with the testing whenever useful.

Is the kernel patch available? I’d like to try to compile the kernel as well.

Thanks