[RFC][tblgen] Generate linker code with TableGen

Problem statement

The main job of the linker is to process relocations, that is, to calculate addresses of global variables and functions and insert these addresses into instructions like CALL or MOVI32.
Both GNU ld and LLVM were initially developed for the x86 architecture, where instruction encoding is rather simple, so the address-inserting code was written by hand.

However, in modern targets instruction encoding is much more complex, which leads to code like this:

// From llvm-project/lld/ELF/Arch/RISCV.cpp
static uint32_t itype(uint32_t op, uint32_t rd, uint32_t rs1, uint32_t imm) {
  return op | (rd << 7) | (rs1 << 15) | (imm << 20);
}

// From llvm-project/lld/ELF/Arch/LoongArch.cpp
static uint32_t setD5k16(uint32_t insn, uint32_t imm) {
  uint32_t immLo = extractBits(imm, 15, 0);
  uint32_t immHi = extractBits(imm, 20, 16);
  return (insn & 0xfc0003e0) | (immLo << 10) | immHi;
}

// From llvm-project/lld/ELF/Arch/Hexagon.cpp
static uint32_t findMaskR16(Ctx &ctx, uint32_t insn) {
  if (isDuplex(insn))
    return 0x03f00000;

  // Clear the end-packet-parse bits:
  insn = insn & ~instParsePacketEnd;

  if ((0xff000000 & insn) == 0x48000000)
    return 0x061f20ff;
  if ((0xff000000 & insn) == 0x49000000)
    return 0x061f3fe0;
  if ((0xff000000 & insn) == 0x78000000)
    return 0x00df3fe0;
  if ((0xff000000 & insn) == 0xb0000000)
    return 0x0fe03fe0;

  if ((0xff802000 & insn) == 0x74000000)
    return 0x00001fe0;
  if ((0xff802000 & insn) == 0x74002000)
    return 0x00001fe0;
  if ((0xff802000 & insn) == 0x74800000)
    return 0x00001fe0;
  if ((0xff802000 & insn) == 0x74802000)
    return 0x00001fe0;

  for (InstructionMask i : r6)
    if ((0xff000000 & insn) == i.cmpMask)
      return i.relocMask;

  Err(ctx) << "unrecognized instruction for 16_X type: 0x" << utohexstr(insn);
  return 0;
}

Such code is full of magic numbers, which makes it error-prone and hard to maintain.
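To make the contrast concrete, here is a minimal sketch of the LoongArch helper above with the bit positions stated once as data instead of being spread over masks and shifts. `BitField`, `D5K16` and `insertFields` are hypothetical names made up for this illustration; they are not lld or TableGen APIs, and this is not the code the proposed generator emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of one contiguous piece of an immediate:
// `Width` bits of the value, starting at value bit `ValueLo`, are placed
// at instruction bit `InstLo`.
struct BitField {
  unsigned ValueLo;
  unsigned InstLo;
  unsigned Width;
};

// Same layout as setD5k16: imm[15:0] -> insn[25:10], imm[20:16] -> insn[4:0].
static const BitField D5K16[] = {{0, 10, 16}, {16, 0, 5}};

// Clear every described field in `Insn` and insert the matching bits of `Imm`.
template <std::size_t N>
uint32_t insertFields(uint32_t Insn, uint32_t Imm, const BitField (&Fields)[N]) {
  for (const BitField &F : Fields) {
    // This sketch only handles fields that are narrower than the instruction.
    assert(F.Width > 0 && F.Width < 32 && F.InstLo + F.Width <= 32);
    const uint32_t Mask = ((uint32_t{1} << F.Width) - 1) << F.InstLo;
    Insn = (Insn & ~Mask) | (((Imm >> F.ValueLo) << F.InstLo) & Mask);
  }
  return Insn;
}

// Drop-in equivalent of the hand-written helper, without the 0xfc0003e0 constant.
inline uint32_t setD5k16(uint32_t Insn, uint32_t Imm) {
  return insertFields(Insn, Imm, D5K16);
}
```

The preserved-bits mask 0xfc0003e0 is now derived from the field table, so a change in the encoding only touches one place.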

Proposed solution

Actually, we do not need to write the value-inserting code ourselves, because the MC-Layer already has this information.

Instruction definitions have the following structure:

// From llvm/lib/Target/RISCV/RISCVInstrFormats.td
class RVInstRFrm<bits<7> funct7, RISCVOpcode opcode, dag outs, dag ins,
                 string opcodestr, string argstr>
    : RVInst<outs, ins, opcodestr, argstr, [], InstFormatR> {
  bits<5> rs2;
  bits<5> rs1;
  bits<3> frm;
  bits<5> rd;

  // The following lines describe instruction encoding:
  let Inst{31-25} = funct7;
  let Inst{24-20} = rs2;
  let Inst{19-15} = rs1;
  let Inst{14-12} = frm;
  let Inst{11-7} = rd;
  let Inst{6-0} = opcode.Value;
}

I propose adding a module to TableGen that generates the code the linker uses to insert relocated values.

Proposed API

A new tblgen class will be added to llvm/include/llvm/Target/Target.td:

class FixupKind {

  list<Instruction> Instructions = [];
  string OpName = "";
  int ValueShift;

  bit isPCRel;
  bit isGPRel;
  bit SignedOperand;
}

which will define each TargetFixupKind.

  • Instructions - list of all instructions that need to be relocated with this fixup.
  • OpName - name of the operand that needs to be relocated (we assume that all instructions for a given fixup use the same operand name; this could easily be changed).
  • ValueShift - number of bits by which the value is arithmetically shifted right before relocation (useful if the target has special alignment requirements for jump targets).
  • isPCRel, isGPRel - whether the value should be PC-relative or GP-relative.
  • SignedOperand - whether a decoded value is sign-extended rather than zero-extended.

In addition to autogenerating the value-inserting code, this API will guarantee that all instructions for a given FixupKind have the same operand encoding.
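As a rough illustration of what ValueShift and SignedOperand mean, here is a hand-written sketch for the simplest case, an operand stored in one contiguous field. `SimpleFixup`, `encodeOperand`, `decodeOperand` and `ExampleBranch` are hypothetical names, not part of the proposal. The RFC does not say whether the decoder undoes ValueShift; this sketch assumes that it does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one FixupKind record whose operand occupies a
// single contiguous field of the instruction.
struct SimpleFixup {
  unsigned InstLo;     // lowest instruction bit of the operand field
  unsigned Width;      // field width in bits (1..63 in this sketch)
  unsigned ValueShift; // low bits dropped from the value before encoding
  bool SignedOperand;  // sign-extend (true) or zero-extend (false) on decode
};

// Shift the value right, then place its low `Width` bits into the field.
// Right-shifting a negative value is arithmetic on mainstream compilers
// (and guaranteed to be since C++20).
inline uint64_t encodeOperand(const SimpleFixup &F, uint64_t Instr, int64_t Value) {
  assert(F.Width > 0 && F.Width < 64);
  const uint64_t FieldMask = ((uint64_t{1} << F.Width) - 1) << F.InstLo;
  const uint64_t Shifted = static_cast<uint64_t>(Value >> F.ValueShift);
  return (Instr & ~FieldMask) | ((Shifted << F.InstLo) & FieldMask);
}

// Extract the field, extend it according to SignedOperand, undo ValueShift.
inline int64_t decodeOperand(const SimpleFixup &F, uint64_t Instr) {
  assert(F.Width > 0 && F.Width < 64);
  const uint64_t Raw = (Instr >> F.InstLo) & ((uint64_t{1} << F.Width) - 1);
  int64_t Value = static_cast<int64_t>(Raw);
  if (F.SignedOperand && (Raw >> (F.Width - 1)) != 0)
    Value -= int64_t{1} << F.Width; // sign bit set: subtract 2^Width
  return Value * (int64_t{1} << F.ValueShift);
}

// Made-up example: signed 8-bit branch offset in Inst[7:0], targets 4-byte aligned.
constexpr SimpleFixup ExampleBranch = {0, 8, 2, true};
```

With `ExampleBranch`, an offset of -8 is stored as the 8-bit pattern 0xFE, and decoding that pattern with sign extension gives -8 back.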

All target-specific classes derived from FixupKind will be used to generate the following functions and to extend the Fixups enum:

// Will return true for fixups with isPCRel = 1;
static bool isPCRel(<TARGET>::Fixups Kind);

// Will return true for fixups with isGPRel = 1;
static bool isGPRel(<TARGET>::Fixups Kind);

// Generates a bitstring in which all Value bits are placed in the right positions.
// If CheckImmediate is true, the immediate is checked to make sure it fits.
template <bool CheckImmediate>
static uint64_t encodeFixup(unsigned Kind, uint32_t Value);

// extract immediate value from instruction bitstring
static uint64_t decodeFixup(unsigned Kind, uint64_t Instr);

The proposed usage of this API is the following:

// Let's assume that:
// - `Instr` contains the instruction bitstring before relocation
// - `Kind` is the proper fixup kind for this instruction
// - `Value` is the immediate value that we want to insert into this instruction

// First, generate a bitmask with 1 in every bit position occupied by the immediate
uint64_t ImmediateMask = encodeFixup<false>(Kind, (int64_t)-1);
uint64_t EncodedValue = encodeFixup<true>(Kind, Value);

// Clear the immediate bits (bitwise NOT, not logical NOT) and insert the new value
uint64_t RelocatedInstr = (Instr & ~ImmediateMask) | EncodedValue;

// Decoding (can be used in objdump and other tools):
// - `Instr` is a bitstring of the instruction
uint64_t Value = decodeFixup(Kind, Instr);
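The mask trick in the usage above works because encoding an all-ones value lights up exactly the bit positions that the immediate occupies. A minimal sketch with a hand-written stand-in for the generated encoder follows. The fixup kinds, their bit layouts and the `applyFixup` wrapper are made up for illustration, the sketch takes a 64-bit `Value`, and it assumes that a failed immediate check is reported with `assert`, which the RFC does not specify.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixup kinds for a made-up 32-bit target.
enum Fixups : unsigned { fixup_abs12, fixup_split20 };

// Stand-in for the generated encoder: returns only the immediate bits,
// already moved to their positions inside the instruction word.
template <bool CheckImmediate>
uint64_t encodeFixup(unsigned Kind, uint64_t Value) {
  switch (Kind) {
  case fixup_abs12: // Value[11:0] -> Inst[31:20]
    if (CheckImmediate) {
      assert(Value < (uint64_t{1} << 12) && "immediate does not fit");
    }
    return (Value & 0xfff) << 20;
  case fixup_split20: // Value[15:0] -> Inst[25:10], Value[19:16] -> Inst[3:0]
    if (CheckImmediate) {
      assert(Value < (uint64_t{1} << 20) && "immediate does not fit");
    }
    return ((Value & 0xffff) << 10) | ((Value >> 16) & 0xf);
  default:
    assert(false && "unknown fixup kind");
    return 0;
  }
}

// Hypothetical wrapper around the three steps from the usage example.
inline uint64_t applyFixup(unsigned Kind, uint64_t Instr, uint64_t Value) {
  // Encoding all-ones without checking yields the immediate's bit mask.
  const uint64_t ImmediateMask = encodeFixup<false>(Kind, ~uint64_t{0});
  const uint64_t EncodedValue = encodeFixup<true>(Kind, Value);
  // Bitwise NOT clears the old immediate; logical NOT would produce 0 or 1.
  return (Instr & ~ImmediateMask) | EncodedValue;
}
```

For `fixup_split20` the derived mask is 0x03fffc0f, so all other instruction bits survive the relocation untouched.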

Implementation Plan

This tblgen extension has been fully implemented by Arseny Tikhonov and has passed internal review.
We are preparing patches that could be open-sourced.

Request for comments

We are looking forward to your feedback. After discussion, if you like the general approach, we plan to submit patches to llvm-project.

Evgeny Lomov
Compiler Lab - LRC, Huawei

cc @MaskRay

You are not the first to suggest we could be better at sharing linker code across the several implementations we have. Thank you for working on this.

IIRC, there are three places we end up applying fixups/relocations:

  • The assembler (LLVM’s MC layer), when we can resolve the fixup there directly.
  • The linker (LLD), when a relocation needs to be applied
  • LLVM’s JITLink infrastructure (which I am not very familiar with)

I hope this can work with all three.

I also have some more specific feedback for making this target-independent:

Most targets don’t have something like GP-relative. You might want to have a TSFlags field which targets can use in their own way.

Another note is that you may need a way to associate relocation codes with fixups - given that linkers usually only see relocation codes, but the assembler (and I think the JIT) both use fixup kinds. This may be difficult, as e.g. on RISC-V the mapping isn’t 1:1 due to vendor relocations. This could also help to generate some of MCELFObjectTargetWriter::getRelocType - though I note this now becomes object-file specific which is another issue.

It would be good if this had a clearer interface that left raising errors to the callers, maybe something like the following:

// For checking if a value fits in this fixup kind - leaving errors to the caller
static bool checkFixup(unsigned Kind, uint64_t Value);

// Write Immediate `Value` into `Instruction` according to `Kind` (combines your encodeFixup<true> and encodeFixup<false> calls, as well as the masking etc)
static uint64_t encodeFixup(unsigned Kind, uint64_t Instruction, uint64_t Value);

// Read Immediate `Value` from `Instruction` (just like your `decodeFixup`)
static uint64_t readWithFixup(unsigned Kind, uint64_t Instruction);

I think this would also make it easier to integrate with e.g. MCAsmBackend::fixupNeedsRelaxationAdvanced, which is another nasty place that fixups need to be checked (without being applied).
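A small sketch of how a caller might use such an interface, with the error policy kept outside the encoder. The three functions below are hand-written stand-ins with the suggested signatures for a single made-up fixup kind; `fixup_branch10`, its bit layout, `ApplyResult` and `tryApply` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixup kind: unsigned 10-bit immediate stored in Inst[14:5].
enum Fixups : unsigned { fixup_branch10 };

static bool checkFixup(unsigned Kind, uint64_t Value) {
  assert(Kind == fixup_branch10 && "only one kind in this sketch");
  (void)Kind;
  return Value < (uint64_t{1} << 10);
}

static uint64_t encodeFixup(unsigned Kind, uint64_t Instruction, uint64_t Value) {
  assert(Kind == fixup_branch10 && "only one kind in this sketch");
  (void)Kind;
  const uint64_t Mask = uint64_t{0x3ff} << 5;
  return (Instruction & ~Mask) | ((Value << 5) & Mask);
}

static uint64_t readWithFixup(unsigned Kind, uint64_t Instruction) {
  assert(Kind == fixup_branch10 && "only one kind in this sketch");
  (void)Kind;
  return (Instruction >> 5) & 0x3ff;
}

// The caller decides what "does not fit" means: a linker would report an
// error, while an assembler could relax the instruction instead.
struct ApplyResult {
  bool Ok;
  uint64_t Instruction;
};

static ApplyResult tryApply(unsigned Kind, uint64_t Instruction, uint64_t Value) {
  if (!checkFixup(Kind, Value))
    return {false, Instruction}; // leave the instruction untouched
  return {true, encodeFixup(Kind, Instruction, Value)};
}
```

Because `checkFixup` has no side effects, the same call can serve a relaxation query that only needs to know whether a value fits.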

IIRC, there are three places we end up applying fixups/relocations

Yes, you are right. We are already using table-generated relocations in both RuntimeDyld (the JIT linker used in BOLT) and the integrated assembler.

Right now we are not using lld, but as far as I can tell, it will work there without problems too.

That’s a good point. When we add TSFlags, I’ll update this RFC.

We’ve already considered this, but it turns out that the Fixup <-> Relocation correspondence can sometimes be too complex to implement in a target-independent way. That’s why I explicitly work with fixups only.

Thanks! I agree that such an interface is clearer and easier to integrate into other parts of the MC-Layer.

Again, I’ll update this RFC when we have a new implementation.

Thanks for your feedback!

Some thoughts from the Arm/AArch64 linker perspective. I can’t comment too much about the other use cases, although assembler fixups are similar. Apologies if this is a bit rambling, let me know if I’ve not been clear enough and I’ll try and expand.

I’ve seen attempts like this (internally to the company) fail before, I think the main reasons behind the failure were:

  • There are a lot of relocations to add when bringing up a new architecture, but once the core instruction set is done the pace of new relocations drops significantly.
  • Relocations and Instruction Sets are fixed so what has been implemented is stable (modulo tool refactoring).
  • The tools often have different data structures for relocations that suit their own situation. The linker in particular has to be careful to keep the size down as there can be millions of relocations. There is sometimes an impedance mismatch between the tools’ data structures and the shared API. As an example, would a linker have to translate the ELF/COFF/MachO fixup into an MC layer fixup to use the API?
  • Linkers often abstract the relocation calculation into several standard forms, but these are not just used for resolving the relocation; they often carry additional information that is only used by the linker, such as: does a GOT entry need creating? Can this relocation go via the PLT? This may not correspond to MC fixups.
  • A lot of the implementation effort is spent on special cases [*] that are difficult to handle generically as they often need additional context to resolve.
  • Unless the shared API is there at the start, it is way more effort to change the existing tool than it is worth.

What I think would be most helpful would be a low-level API that didn’t try and do too much in one go. For example just extracting the addend (for SHT_REL) or writing the addend would be very helpful for many cases. If the API can do resolution of the relocation given certain inputs make sure that’s an optional step so that the tool can do its own thing if necessary.

Looking at the proposed API itself it looks like instead of encoding/decoding based on instruction, it would be based on a MC fixup type. The advantage of that is the encoding process might be able to handle some special cases per fixup. However it does mean that a linker would need to translate a raw ELF/COFF/MachO relocation type to the MC equivalent which is not always obvious. There’s also a bit of waste as many different fixups will share the same immediate fields.

Overall it is difficult to tell from the API alone whether it will be a net win; it probably needs an attempt to implement a subset of the relocations for a target in all the tools, to compare it to the existing code.

Good luck, it will be great if you can make this work successfully.

[*] Special cases, particularly common on REL platforms.

  • Magic addend values to encode constants that can’t be represented in the instruction encoding. For example, see aaelf32/aaelf32.rst in the ARM-software/abi-aa repository on GitHub.
  • Complicated calculations such as the Arm Group Relocations.
  • Handling Arm/Thumb interworking that needs additional information about the state of the target to resolve the relocation.
  • Thumb Branch relocations in Arm that have extra range using additional bits, but only permitted on some architectures.
  • AArch64 where instruction and data endianness is different.
  • Rewriting instructions by changing the bit-pattern outside of the addend field. For example changing an add into a sub when the result of the calculation is negative, or changing to a different branch to change state from Arm to Thumb.

That’s right! The reason I started writing this RFC is that our team is working on downstream VLIW targets, where instruction encodings and/or relocations can change every generation, so manual relocation support becomes a burden.
This RFC would probably not be very useful for mainstream targets such as x86 or ARM, as they already have a fixed set of relocations implemented and tested. However, I hope that such a tblgen module will help in the development of new targets.

Short answer: yes, it would have to.
Longer answer: this API uses only the (autogenerated) MCFixupKind enum, so in the worst case an enum-to-enum transformation will need to be implemented.

In our API the encoder and decoder functions are declared static, so they can be (and likely will be) inlined. After inlining, the enum-to-enum transformation can be optimized away.
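A sketch of the enum-to-enum step under discussion, using made-up relocation types, fixup kinds and bit layouts (none of these names come from the patch). Both functions are small switches over constants, so after inlining a compiler may be able to merge them into a single dispatch on the relocation type; that is an expectation to be measured, not a guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical relocation types (what the linker sees) and fixup kinds
// (what the generated encoder understands).
enum RelType : uint32_t { R_DEMO_NONE = 0, R_DEMO_ABS16 = 1, R_DEMO_CALL = 2 };
enum Fixups : unsigned { fixup_abs16, fixup_call, NotHandled };

// The enum-to-enum transformation the linker would have to provide.
constexpr Fixups relTypeToFixup(RelType Type) {
  switch (Type) {
  case R_DEMO_ABS16:
    return fixup_abs16;
  case R_DEMO_CALL:
    return fixup_call;
  default:
    return NotHandled; // caller falls back to custom handling
  }
}

// Stand-in for the generated encoder.
static inline uint64_t encodeFixup(unsigned Kind, uint64_t Instruction, uint64_t Value) {
  switch (Kind) {
  case fixup_abs16: // Value[15:0] -> Inst[15:0]
    return (Instruction & ~uint64_t{0xffff}) | (Value & 0xffff);
  case fixup_call: // Value[25:2] -> Inst[23:0], targets are 4-byte aligned
    return (Instruction & ~uint64_t{0xffffff}) | ((Value >> 2) & 0xffffff);
  default:
    assert(Kind == NotHandled && "unknown fixup kind");
    return Instruction; // nothing to insert
  }
}

// Composition of the two switches as it would appear in a linker.
static inline uint64_t relocate(RelType Type, uint64_t Instruction, uint64_t Value) {
  return encodeFixup(relTypeToFixup(Type), Instruction, Value);
}

static_assert(relTypeToFixup(R_DEMO_CALL) == fixup_call, "mapping is a compile-time constant");
```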

That’s another reason why we explicitly deal with fixups rather than relocations. This API focuses only on value-insertion code, so all the address-calculation logic and the PLT/GOT handling still have to be implemented manually.
Indeed, the set of target-specific features seems too big to handle in a generic way.

Such cases should be removed by SimplifyCFGPass or other optimizations.

This case is already handled by this API, as the whole instruction bit-pattern is passed to the encoder. As for the other special cases mentioned, they will be handled before the calls to this API.

Thanks for your feedback! I will try to check the performance impact of this patch.

Thanks for the reply. I thought a bit more over lunch to see about how the API might be used within LLD.

I’m happy to hear that the scope of the API is limited to encoding and decoding.

A small disadvantage that I hadn’t thought of before is that if a user limited the MC backends that they were building then the relocation code in lld would have to be conditionally compiled in to match. That likely applies to the other binary utilities too. I expect that’s largely plumbing though.

I think it would be useful for a lot of common relocations. We’d end up with something like

int64_t TARGET::getImplicitAddend(const uint8_t *buf, RelType type) {
  // Map the relocation type to a fixup kind; NotHandled means "no simple mapping".
  auto fixup = relTypeToFixupAddend(type);
  if (fixup == NotHandled)
    return getImplicitAddendCustom(buf, type);
  return decodeFixup(fixup, read32(buf));
}

void TARGET::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) {
  auto fixup = relTypeToFixupRelocate(rel.type);
  if (fixup == NotHandled) {
    relocateCustom(loc, rel, val);
    return;
  }
  uint64_t Instr = read32(loc);
  uint64_t ImmediateMask = encodeFixup<false>(fixup, (int64_t)-1);
  uint64_t EncodedValue = encodeFixup<true>(fixup, val);

  uint64_t RelocatedInstr = (Instr & ~ImmediateMask) | EncodedValue;
  // Thumb would be a pain and need to be split into two separate 16-bit
  // words so that endianness would get handled.
  write32(loc, RelocatedInstr);
}

I guess the usefulness will be proportional to how many relocations can be handled by the simple path, and how many need custom handling. The other part of the trade-off that I can’t speak to is how difficult it is to describe the fixups in tablegen.

The other potential drawback is that it does require more familiarity with MC than is currently necessary, although I expect the majority of LLVM developers contributing to lld, or the binutils equivalents to be in that camp.

As an aside, our proprietary linker at Arm had an instruction encoding/decoding library. As Arm was SHT_REL we needed to read the instruction anyway, so relocation often looked like this, in pseudocode:

generic_instr = lib.decode(&place);
addend = generic_instr.getImmediate();
// Do relocation calculation, let's assume absolute
value = S + A;
generic_instr.setImmediate(value);
err = generic_instr.encode(&place);

This had the nice property that the library deduced what the instruction was and could extract its immediate without special processing. The downside is that the library didn’t know about any relocation specific processing so the linker would have to handle any magic addend values.

It would be a bit wasteful for RELA, as in that case the instruction only needs to be written and not read. It was possible to construct a generic instruction from scratch, but this was a far more complex task than I’d have liked, requiring a lot of parameters. The alternative would be to use the instruction generated from Tablegen, but that is not easy to reverse engineer from the relocation.

Yes, but we can achieve this (relatively) easily via the LLVM_TARGETS_TO_BUILD CMake variable and related machinery.

Right now, the description of a single fixup looks like this:

// Some made-up target
def Abs32: FixupKind {
    let ValueShift = 0;
    let Instructions = [CALL, MOVI32];
    let OpName = "imm32";
    let SignedOperand = 0;
}

So I guess it won’t require familiarity with the MC-Layer.

I’ve also considered the “full decode, change immediate, full encode” approach, but (besides requiring a full MCAsmBackend for relocation processing) it is computationally very inefficient:
the LLVM integrated disassembler is a huge FSA that analyzes instructions almost bit by bit.

That is fine for full disassembly, because it allows dealing with complex instruction encodings, but for relocations it is overkill.

It would be nice to see whether we can also have an API to replace the current instruction with a new instruction, to support use cases around linker relaxation.

An API to disassemble would also be useful for the linker to emit the disassembled instruction after relocation in some form of trace output.

That’s true, but it is out of scope of this RFC.

I apologize for the brevity of this comment. I may provide more details later.

I believe the drawbacks of this approach outweigh its benefits. @smithp35’s initial comment effectively summarizes the key issues.

Drawing from my recent experience refactoring the LLVM integrated assembler, I’ve outlined some thoughts in my blog posts:

In summary, while deduplicating code for instruction formats and related relocation types is appealing, the interface may introduce inefficiencies due to differing data structures and performance considerations.
The complexity arising from the linker reusing TableGen files and the LLVM_TARGETS_TO_BUILD issue should not be underestimated.

The current relocation generation scheme in the assembler involves multiple representation changes—from relocation specifier to fixup kind, and then to relocation type. I believe we can eliminate most fixups, as demonstrated in my changes:

  • 609586f7f61abf170425883fd8ae390b4a69cc0c (“LoongArch: Remove fixup kinds that map to relocation types”)
  • 89687e6f383b742a3c6542dc673a84d9f82d02de (“LoongArch: Remove TLS fixup kinds that map to relocation types”)

We could potentially eliminate additional fixup kinds for RISC-V, but I haven’t had the time to tackle this yet. In my view, RISC-V’s relocation generation scheme needs significant refactoring.

As I’ve already stated in the reply to @smithp35, we hope that inlining these autogenerated functions will eliminate the performance inefficiencies. However, we will make performance measurements to test this hypothesis.

This is true for lld, but the two other parts, the integrated assembler and RuntimeDyld, are already built with LLVM_TARGETS_TO_BUILD and have the tblgen outputs available.

That’s true for targets that have a one-to-one correspondence between relocations and fixups, but it is not clear to me how to do it effectively in the opposite case.
Usually, it is done via a “decode → change immediate → encode” workflow, but that is very inefficient. I’ve also seen a partial hand-written decoding approach, but it seems much more fragile.

It would be a fair point to say that a relocation which requires multiple fixups is bad ABI design; however, it is not always possible to change a published ABI specification…

Following @lenary’s suggestion, we have chosen to redesign our API.
The new API looks like this:

// build-dir/lib/Target/TARGET/TARGETGenFixups.inc

static bool isPCRel(<TARGET>::Fixups Kind);
static bool hasSignedOperand(unsigned Kind);
static uint64_t getTSFlags(<TARGET>::Fixups Kind);

// Check whether the value will fit into the fixup
static bool checkFixup(unsigned Kind, uint64_t Value);

// Encode the value into Instruction according to this fixup Kind
static uint64_t encodeFixup(unsigned Kind, uint64_t Instruction, uint64_t Value);

// Extract the fixup value from the instruction bits, assuming that it is encoded as a particular fixup Kind
static uint64_t readWithFixup(unsigned Kind, uint64_t Instruction);

Again, thanks to @lenary for this suggestion.

@MaskRay @smithp35 what do you think?
Please note that this RFC focuses only on encoding and decoding values.
My goal is only to decouple the actual instruction encodings from the relocation-related code.

Without seeing how the proposed API is used, it’s very difficult to draw a conclusion. While I still believe the TableGen linker code is likely a dead end, I would certainly welcome exploration on the assembler side. Are you trying to improve the backend MCAsmBackend::applyFixup functions?

You might want to take a look at llvm/lib/Target/Sparc/MCTargetDesc/SparcAsmBackend.cpp. It defines very few fixup kinds. Its adjustFixupValue function primarily dispatches on ELF relocation types.