[LLVM] Introduce an ABI lowering library

Description: Currently, every LLVM-based frontend that wants to support calling into C code (FFI) needs to re-implement a substantial amount of complex call ABI handling. The goal of this project is to introduce an LLVM ABI library, which can be reused across different frontends, including Clang. More details on the motivation and a broad outline of the design are available in this RFC:

The initial phase of the project will be to implement a prototype that can handle at least the x86_64 System V ABI. This will involve implementing the ABI type system, mapping of Clang types to ABI types and moving at least part of the X86 ABIInfo implementation from Clang to the new ABI library. This is to demonstrate general feasibility, figure out design questions and analyze compilation-time impact.

Assuming the results from the prototype are positive, the next step would be to upstream the implementation by splitting it into smaller PRs. Finally, the implementation can be expanded to cover additional targets, ultimately removing Clang’s ABI handling code entirely.

Expected results: The minimum result is a prototype for the x86_64 ABI. The maximum result is fully upstreamed support for all targets. The expected result is somewhere in the middle between those two.

Desirable skills: Intermediate C++. Some familiarity with LLVM is a plus, but not required.

Project size: Large

Difficulty: Hard

Confirmed Mentors: @nikic @makslevental

I am really interested in this, especially since my work has involved a lot of delving into assembly and ABIs, mostly on ARM.
Are there things that you recommend me to read up or experiment with to prepare for this?

Hi @nikic,
I am interested in this project. I have good experience with LLVM, as you know: I added the nsw and nuw flags to the trunc instruction, added the samesign flag to the icmp instruction, and have had some PRs accepted in the InstCombine pass. I have started researching this, and I hope to spend this summer working on it.

Hello @nikic! I’m interested in this project and have a few questions to better prepare my design proposal:

  • The existing (but incomplete and apparently abandoned) llvm-abi project has an implementation for x86_64, but it maps to LLVM IR types directly. Would this be a good starting point to get a feel for what the expected prototype should look like?
  • I’ve skimmed over Clang and can see that a lot of the work, such as the ABI implementation details, has been done already, and that much of it could simply be lifted and put into the new library. Is this assumption true, or at least sensible? I would also like to ask for a pointer to where exactly all of this gets used when Clang makes decisions while emitting, e.g., a function call, so I can study that further.

I’m looking forward to working with you to iron out more details during the application period, and hope to be selected for the summer!

Woah! I am really interested in this project. I was recently reading up about Swift’s C interop and came across this very problem. The success of this project would indeed offload much of the FFI duties to LLVM. About me: I’m a computer science major who mostly writes low-level/hardware-level projects. I’m an occasional contributor to LLVM and have read most of the Transforms codebase. I’ve also contributed to InstCombine and ValueTracking in the past. Looking forward to interacting with you all!

If I’m reading things right, the llvm-abi library mostly operates on its own type system (llvm-abi/include/llvm-abi/Type.hpp at master · scrossuk/llvm-abi · GitHub) and only does lowering to LLVM IR types in llvm-abi/lib/FunctionIRMapping.cpp at master · scrossuk/llvm-abi · GitHub.

The FunctionIRMapping part lowers from the ABI information to LLVM IR. Having something like this is not strictly required, though it’s probably a nice utility for simpler users. (I expect that the major consumers will want to do the LLVM IR lowering themselves; they just need the ABI information.)

I haven’t looked particularly deeply into this library, but yes, I expect it’s close to the goal here, at least in terms of high level concepts, if not implementation details.

Yes, Clang already implements everything that is necessary. This project is basically “just” extracting the current Clang code into a separate library. Of course, this is a lot harder than it sounds, because the implementation in Clang is tightly coupled to other parts of Clang, most importantly the Clang AST type system.

To provide some code references for how Clang currently handles this:

  • Clang AST Types: We’ll need our own type system representation, which will hopefully be much simpler than this.
  • ABIInfo is the interface implemented by individual targets and ABIArgInfo + CGFunctionInfo is the result returned by it. The ABI library will want to return something similar.
  • CodeGen/Targets includes the implementations for the different targets. Key methods to look at are computeInfo, classifyArgumentType and classifyReturnType. For example, X86_64ABIInfo::computeInfo is the main entry point for the x86_64 SysV ABI.
  • CGCall implements the actual lowering to LLVM IR based on the ABI information. See for example EmitCall.
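To make the shape of these interfaces a bit more concrete, here is a minimal, self-contained sketch of what a per-target classification interface and its result might look like. Everything in it (`sketch::ArgKind`, `TypeDesc`, `ArgInfo`, `FunctionInfo`, `TargetABI`, `ToyABI`) is a hypothetical name, only loosely modelled on Clang's ABIArgInfo, CGFunctionInfo and ABIInfo, and the rule in `ToyABI` is made up for illustration rather than taken from any real target.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

// How a single value crosses the call boundary (hypothetical, simplified).
enum class ArgKind {
  Direct,   // passed as-is, typically in registers
  Extend,   // small integer that is extended to register width
  Indirect, // passed via a pointer to memory
  Ignore    // nothing is passed, e.g. a void return
};

// Minimal type description: only what this toy classifier looks at.
struct TypeDesc {
  uint64_t SizeInBytes;
  bool IsAggregate;
};

struct ArgInfo {
  TypeDesc Ty;
  ArgKind Kind = ArgKind::Direct;
};

struct FunctionInfo {
  ArgInfo Ret;
  std::vector<ArgInfo> Args;
};

// Interface each target would implement (compare Clang's ABIInfo).
class TargetABI {
public:
  virtual ~TargetABI() = default;
  virtual ArgKind classifyReturnType(const TypeDesc &Ty) const = 0;
  virtual ArgKind classifyArgumentType(const TypeDesc &Ty) const = 0;

  // Fills in the Kind of the return value and of every argument.
  void computeInfo(FunctionInfo &FI) const {
    FI.Ret.Kind = classifyReturnType(FI.Ret.Ty);
    for (ArgInfo &A : FI.Args)
      A.Kind = classifyArgumentType(A.Ty);
  }
};

// Made-up target: big aggregates go to memory, sub-word scalars are extended.
class ToyABI : public TargetABI {
  ArgKind classify(const TypeDesc &Ty) const {
    if (Ty.SizeInBytes == 0)
      return ArgKind::Ignore;
    if (Ty.IsAggregate)
      return Ty.SizeInBytes > 16 ? ArgKind::Indirect : ArgKind::Direct;
    return Ty.SizeInBytes < 4 ? ArgKind::Extend : ArgKind::Direct;
  }

public:
  ArgKind classifyReturnType(const TypeDesc &Ty) const override {
    return classify(Ty);
  }
  ArgKind classifyArgumentType(const TypeDesc &Ty) const override {
    return classify(Ty);
  }
};

} // namespace sketch
```

A frontend would build a `FunctionInfo` from its own types, call `computeInfo`, and then do the actual IR lowering itself based on the resulting kinds.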

It may also be interesting to look at some ABI documents to get a picture of the kind of rules involved here. For example:

Hi, I’m also looking forward to participating in this project.

Generally speaking, and from the parts of the codebase I’ve read so far, this really feels like decoupling and refactoring a lot of code. I have two questions:

  1. Should this library also provide C-FFI-compatible functions and types, so that it is truly frontend- and language-agnostic and frontends only need to write bindings?

  2. This may be naive, since this is my first time touching the LLVM codebase: could a universal library hinder codegen optimizations that rely on the language standard? If so, is that tolerable, or should we provide optimization hints, options or plugins so that the frontend consumer can guide the library towards generating better LLVM IR?

It is!

At least in the long term, definitely. We generally like to have C bindings for everything that frontends need to generate LLVM IR, and this would be part of it.

However, as C bindings are semi-stable (in the sense that we can only ever remove functions, not modify them), we’ll want to make sure the design of the C++ library is mostly stable before we start thinking about exposing C bindings.

I don’t think it should cause any optimization regressions. If necessary, languages that want to use the C ABI but add additional requirements on top should be able to do so by emitting additional argument attributes.

Hi @nikic

I’m interested in this project. I previously worked on implementing a toy backend. I have a few questions about the ABI lowering library:

From what I understand, the same C function prototype can be lowered into LLVM IR in multiple ways depending on the ABI, platform, and architecture. For example, given:

#include <stdint.h>

typedef struct {
    uint32_t a;
    char b;
    uint32_t d;
    char c[3];
} sometype_t;

void some_func(sometype_t type);

The function can be lowered into different LLVM IR representations, such as:

define dso_local void @some_func([2 x i64] %0)      ...
define dso_local void @some_func(i64 %0, i64 %1)    ...

  • If I understand correctly, the purpose of this library is to allow the frontend to provide basic type information (e.g., that a struct is being passed as a function argument), and in turn, the library determines the correct ABI representation for the frontend to use in LLVM IR. This would simplify the frontend developer’s work. Is that the intended goal?

  • Additionally, in a post, you mentioned that you want this library to be used in Clang. There are several ways to approach this. Are you considering modifying the X86_64ABIInfo class to integrate it? More specifically, somewhere around here. You linked EmitCall in this post, but it seems like an ABI lowering library operating at that level might not work seamlessly with it, especially if we were to have two different type systems.

  • Finally, how is this library going to handle differences in hardware (e.g. having or not having an FPU)? Part of the calling convention depends on that. Are we just going to duplicate code?

Thanks,

Vincent

Yes, that’s right.
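As a rough illustration of the kind of question such a library answers, the sketch below lays out a struct of integer-like fields and applies one heavily simplified piece of the x86-64 System V rule: an aggregate of at most 16 bytes consisting only of integer fields is passed in up to two general-purpose registers, and anything larger goes to memory. All names (`sketch::Field`, `Layout`, `layoutStruct`, `integerEightbytes`) are hypothetical, and the sketch ignores floating-point fields, bit-fields, vectors, packed or unaligned fields, and running out of registers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

struct Field {
  uint64_t Size;  // in bytes
  uint64_t Align; // in bytes, a power of two
};

struct Layout {
  uint64_t Size = 0;
  uint64_t Align = 1;
};

inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Natural C struct layout, without packing or bit-fields.
inline Layout layoutStruct(const std::vector<Field> &Fields) {
  Layout L;
  for (const Field &F : Fields) {
    L.Size = alignTo(L.Size, F.Align) + F.Size; // pad, then place the field
    if (F.Align > L.Align)
      L.Align = F.Align;
  }
  L.Size = alignTo(L.Size, L.Align); // tail padding
  return L;
}

// Number of 64-bit integer registers used for an integer-only aggregate.
// 0 means "not passed in integer registers" (too large, or empty).
inline unsigned integerEightbytes(const Layout &L) {
  if (L.Size == 0 || L.Size > 16)
    return 0;
  return static_cast<unsigned>((L.Size + 7) / 8);
}

} // namespace sketch
```

For the `sometype_t` example above (fields of size/alignment 4/4, 1/1, 4/4 and 3/1), this layout comes out at 16 bytes, i.e. two eightbytes, which matches the `(i64 %0, i64 %1)` form.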

Not quite: The end goal would be to eliminate X86_64ABIInfo entirely, and have Clang consume the result. So the end result would go something like this:

clang::QualType → abi::Type → abi::X86_64ABIInfo → abi::FunctionInfo → clang::CodeGen

Of course, it is best to do this kind of transition incrementally, so that some targets can use the old implementation and some the new implementation while migrating. This could be done in a few ways, e.g. we could have some temporary mapping code from abi::FunctionInfo to clang::CGFunctionInfo.

In short: The same way that Clang currently does. We’d want to pass down any relevant options there currently are, such as target-abi, FloatABI::Hard/Soft/SoftFP, CallingConv, etc., and adjust the ABI based on that.

Generally, the handling of soft-float/hard-float ABIs (and more generally, feature-dependent ABIs) in LLVM is currently quite lacking for most older targets and could use a lot of improvements, but that’s beyond the scope of this project. We just want to replicate existing behavior.
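One way to picture "passing down options" without duplicating code is a single ABI class that is configured with an options struct and branches on it where the ABI actually differs. The sketch below is an assumption-laden illustration: `sketch::FloatABI`, `ABIOptions`, `RegClass` and `ToyABIInfo` are hypothetical names, and the rule is only loosely modelled on 32-bit ARM, where the hard-float variant passes floating-point arguments in FP registers while soft and softfp pass them in integer registers.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical mirror of the float ABI choices mentioned above.
enum class FloatABI { Hard, Soft, SoftFP };

enum class RegClass { Integer, FloatingPoint };

// Options the frontend passes down once, when creating the ABI object.
struct ABIOptions {
  FloatABI FloatMode = FloatABI::Hard;
};

class ToyABIInfo {
  ABIOptions Opts;

public:
  explicit ToyABIInfo(ABIOptions O) : Opts(O) {}

  // Only the hard-float variant uses FP registers for float arguments;
  // Soft and SoftFP share the integer-register calling convention.
  RegClass classifyFloatArgument() const {
    return Opts.FloatMode == FloatABI::Hard ? RegClass::FloatingPoint
                                            : RegClass::Integer;
  }
};

} // namespace sketch
```

The same classification code serves all three modes; only the one decision that depends on the option is conditional.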

So the end result would go something like this:

clang::QualType → abi::Type → abi::X86_64ABIInfo → abi::FunctionInfo → clang::CodeGen

If I’m not mistaken, in arrangeLLVMFunctionInfo() the computeInfo() method currently maps from clang::QualType → abi::ABIArgInfo. This is then lowered by EmitCall().
So is the library supposed to expose an API like computeInfo() that maps from AST types to ABI types, or should it expose the entire call lowering logic (the logic implemented in CGCall.cpp, specifically EmitCall())?

clang::QualType → abi::Type → abi::X86_64ABIInfo → abi::FunctionInfo → clang::CodeGen

Also, currently the ABI Info is injected in the AST as such, right?
clang::QualType → abi::ABIArgInfo → LLVM-IR (RValue).
So could you please shed some light on the additional type conversions you mentioned and why they are needed, specifically abi::Type and abi::FunctionInfo?

Hi @nikic, I would like some basic feedback before I submit my final proposal. Based on my understanding, here is my attempt at creating the prototype you described.

Full code is listed here

Example Usage

Consider the following C code for which the frontend needs ABI lowering information:

#include <stdint.h>

typedef struct {
    uint64_t a, b, c, d, e;
} some_type_t;

uint64_t some_func(some_type_t type);

Using the current prototype, the frontend would generate:

  std::shared_ptr<ABI::Type> arg = std::make_shared<ABI::Integer>(/*size*/ 8);
  std::vector<std::shared_ptr<ABI::Type>> args{arg, arg, arg, arg, arg};
  std::shared_ptr<ABI::StructType> arg_one =
      std::make_shared<ABI::StructType>(args);

  std::shared_ptr<ABI::Type> returnType =
      std::make_shared<ABI::Integer>(/*size*/ 8);
  ABI::FunctionInfo FI({arg_one}, returnType, ABI::CallingConvention::C);

To obtain ABI lowering information, we do:

ABI::X86_64ABIInfo abiLowering;
abiLowering.ComputeInfo(FI);

assert(FI.getReturnInfo().Info.GetKind() == ABI::Direct);

// A struct larger than four eightbytes (32 bytes) should be passed in memory
auto ArgIterator = FI.GetArgBegin();
ABI::ABIArgInfo abiInfo = ArgIterator->Info;
assert(abiInfo.GetKind() == ABI::Indirect);

If we can successfully convert types from Clang’s type system to the types used in this ABI lowering library, we can correctly determine ABI lowering. In theory, this could eliminate the need for X86.cpp entirely and move the ABI logic into this library. Is this what you are looking for?

Basically doing this:

clang::QualType → abi::Type → abi::X86_64ABIInfo → abi::FunctionInfo → clang::CodeGen

Some Questions:

  1. Should this library be independent of Clang’s type system, meaning it can be built without linking against libclang?
  2. Where should the type conversion happen? Will Clang itself provide bindings to convert its types to this library’s type system?

Any feedback would be greatly appreciated before I finalize my proposal for Google Summer of Code.

Vincent

Note:

  • The prototype is incomplete in many ways.
  • Bitfields are not yet represented in the ABI lowering.
  • Vector types are currently unsupported.
  • StructType lowering limitation: currently, StructType can only be lowered when passed on the stack, meaning register-passing cases are not handled.

I think it shouldn’t use Clang libraries, since other language frontends will want to use it; IIRC Rust is one of those.
I think it should be a library that Clang uses; essentially, the library sits between Clang and LLVM.

As part of making it easier to use from other language frontends, I think it should have a C API in addition to the C++ API. That said, just C++ is fine for prototyping with Clang.

Why all the shared_ptr everywhere? Is it necessary, or can we just use value types and unique_ptr as much as possible?

That makes a lot of sense. At some point, someone will have to take on the responsibility of creating bindings between the two type systems. Keep in mind that for this library to determine the correct ABI lowering, it essentially needs to replicate part of the C type system.

The use of std::shared_ptr is completely unnecessary. I only used it for convenience, to avoid dealing with raw pointers for dynamic casting. In fact, in the example above, it wouldn’t take much effort to allocate everything on the stack instead.

The interface for this type system is not really thought through yet; it needs a lot of polishing and is by no means final.

(I’m out of office until April 2, so it may take me a while to respond to questions/drafts for this project.)

From a cursory look, this looks like the right direction at a very high level. The details are not really idiomatic for how LLVM does things. Mehdi already pointed out the use of shared_ptr (which we just categorically don’t use). The other thing is that LLVM typically does not use virtual methods when modelling these kinds of class hierarchies and instead has its own dynamic cast mechanism; see “How to set up LLVM-style RTTI for your class hierarchy” in the LLVM documentation.
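For readers unfamiliar with that mechanism, here is a minimal sketch of the pattern applied to a small ABI type hierarchy. The hierarchy itself (`sketch::Type`, `IntegerType`, `FloatType`) is hypothetical. To keep the example free of LLVM dependencies, the `isa` and `dyn_cast` templates below are simplified stand-ins for `llvm::isa<>` and `llvm::dyn_cast<>` from llvm/Support/Casting.h; real code would use those instead.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Base class carries a kind tag instead of relying on virtual dispatch.
class Type {
public:
  enum TypeKind { TK_Integer, TK_Float };
  TypeKind getKind() const { return Kind; }

protected:
  explicit Type(TypeKind K) : Kind(K) {}

private:
  const TypeKind Kind;
};

class IntegerType : public Type {
  uint64_t SizeInBytes;

public:
  explicit IntegerType(uint64_t Size) : Type(TK_Integer), SizeInBytes(Size) {}
  uint64_t getSizeInBytes() const { return SizeInBytes; }
  // classof is the hook the casting templates query.
  static bool classof(const Type *T) { return T->getKind() == TK_Integer; }
};

class FloatType : public Type {
public:
  FloatType() : Type(TK_Float) {}
  static bool classof(const Type *T) { return T->getKind() == TK_Float; }
};

// Simplified stand-ins for llvm::isa<> and llvm::dyn_cast<>.
template <typename To> bool isa(const Type *T) { return To::classof(T); }

template <typename To> const To *dyn_cast(const Type *T) {
  return isa<To>(T) ? static_cast<const To *>(T) : nullptr;
}

} // namespace sketch
```

With this in place, objects can live on the stack or in an arena and be passed around as `const Type *`, with no need for shared_ptr or `dynamic_cast` to recover the concrete type.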

Yes. I expect that the only libraries we’ll depend on are Support, TargetParser and possibly IR from LLVM. No dependencies on Clang at all.

Yes, it should happen in Clang.

I don’t think you’re going to be able to come up with a portable concept of “ABI type” that is both (1) significantly less complicated than the Clang AST and (2) captures everything from C that any possible platform might be interested in. Platform ABIs can and often do vary according to all sorts of minor details of what’s been written in the source. Please look at the actual record layout code in Clang for an idea of what has to be supported here. I am very concerned about this approach, because I don’t think there’s a viable path for Clang to adopt it, and I’m afraid that that means it is doomed to never be more than a buggy re-implementation.

I think this is a correctable problem. You’re imagining a single, portable library that takes in high-level information and a target specification and spits out ABI details. I would suggest instead breaking it down more like this:

First, the low-level lowering decisions are made by target-specific libraries. These libraries consume low-level information — basically, this is implementing the algorithms described in the target psABI, strictly in terms of the cases that the psABI distinguishes. For example:

  • If the psABI says that e.g. _Complex T is treated exactly like struct { T x,y; }, then the API shouldn’t have a case for _Complex T. But if the psABI says that _Complex T is treated specially, it needs to be a case you can represent.
  • The argument layout code shouldn’t get passed a high-level struct type, it should get whatever details of the aggregate layout that argument lowering cares about. If aggregates are always passed on the stack, this is probably just the size and alignment. If they can be broken up into registers, you might also need to take the result of the aggregate classification algorithm (which would be available as a separate function in the library).

Don’t be afraid of writing these libraries in creative ways that only work because of the details of the target. Like, in the abstract, argument layout needs to get passed the complete argument list ahead of time because it might pass the float in void (float, double) differently from the float in void (float, int). In practice, it’s an online algorithm on every single target I know of: you consider the return type, then each argument in order. And that means you can just have the argument layout algorithm be a class type that you call methods on to add specific kinds of argument. And that might make a lot of things easier and more performant around things like aggregate layout.
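A rough sketch of that "online" style, assuming an x86-64 System V-like target with six integer argument registers and eight SSE argument registers: the class name `SysVArgAssigner` and its methods are hypothetical, and the sketch only handles scalar integer/pointer and float/double arguments, giving each spilled argument an 8-byte stack slot. Aggregates, long double, varargs and everything else are left out.

```cpp
#include <cassert>

namespace sketch {

enum class LocKind { IntReg, SSEReg, Stack };

struct ArgLoc {
  LocKind Kind;
  unsigned Index; // register number, or byte offset when Kind is Stack
};

// Arguments are added one at a time, in order; no full signature is needed.
class SysVArgAssigner {
  static constexpr unsigned NumIntRegs = 6;
  static constexpr unsigned NumSSERegs = 8;
  unsigned NextInt = 0;
  unsigned NextSSE = 0;
  unsigned StackBytes = 0;

  // Assigns the next 8-byte stack slot.
  ArgLoc spill() {
    ArgLoc Loc{LocKind::Stack, StackBytes};
    StackBytes += 8;
    return Loc;
  }

public:
  // Integer or pointer argument of at most 8 bytes.
  ArgLoc addInteger() {
    if (NextInt < NumIntRegs)
      return ArgLoc{LocKind::IntReg, NextInt++};
    return spill();
  }

  // float or double argument.
  ArgLoc addFloatingPoint() {
    if (NextSSE < NumSSERegs)
      return ArgLoc{LocKind::SSEReg, NextSSE++};
    return spill();
  }

  unsigned getStackBytes() const { return StackBytes; }
};

} // namespace sketch
```

Because each call only depends on the state left by earlier arguments, a frontend can feed arguments in as it walks its own types, and a separate aggregate classification function could decide which `add*` methods to call for a struct.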

Once those target-specific libraries are written, you can build portable libraries on top of them. Each library would consume a specific kind of high-level input; for example:

  • You could have a portable library that expects Clang ASTs.
  • You could also have a portable library that expects some intentionally-simplified type system. Since you’re not trying to handle all of C in this, you can just leave out difficult cases, like bit-fields.

The latter would be enough to get simple cases working, which is probably enough for Rust and other frontends. Someone trying to match the C ABI for a really complex C use case should probably just be using Clang as a library, though.

Who says we’re shooting for “less complicated than the Clang AST” here?

Is there an implicit assumption/suggestion here that the only use-case for targeting a really complex C ABI is the C language itself? Otherwise I’m not understanding. But if that is the implication, then I beg to differ; I would certainly like to be able to emit C and C++ ABI compatible IR from my wholly different/distinct language without first building the corresponding C++ AST. Hell, actually I wouldn’t even be mad having to link libclang if I could actually do that using libclang (emit compatible ABI using public APIs). But even that isn’t possible because all the APIs in CodeGen/* are private (not in headers). Indeed, maybe a first, very dirty, experiment here could be to see how many APIs in CodeGen/* need to be exposed in headers in order to enable emitting ABI compatible IR using just libclang via direct AST node construction and lowering.

I took a quick look and I don’t see much that’s very frightening; skimming the “queries” on ASTRecordLayout objects I see primarily getSize, getNonVirtualSize, getFieldOffset (but yes I did see bit fields are handled carefully).

So, with respect to the first snippet I quoted, I don’t think the goal of this library/project/work-stream is to necessarily simplify the ABI as represented by Clang AST types or the new type system. There’s a “conservation of information” law at play here and we a-priori shouldn’t expect there to be any bloat given that the CodeGen subtree is well-worn by now. In my mind, the goal here is more mundane: just a better factorization of the current emission code so that another frontend language can emit ABI compatible IR without first translating to textual C/C++.