Problem statement
One of the main jobs of the linker is to process relocations, that is, to calculate the addresses of global variables and functions and insert these addresses into instructions like `CALL` or `MOVI32`.
Both `gnu-ld` and `llvm` were initially developed for the x86 architecture, where a relocated field is always a whole, byte-aligned 8-, 16-, 32- or 64-bit value, so the address-inserting code was written by hand.
However, on many modern targets, instruction encoding is much more complex: an immediate is often split across several non-contiguous bit ranges of the instruction word. This leads to code like this:
```cpp
// From llvm-project/lld/ELF/Arch/RISCV.cpp
static uint32_t itype(uint32_t op, uint32_t rd, uint32_t rs1, uint32_t imm) {
  return op | (rd << 7) | (rs1 << 15) | (imm << 20);
}

// From llvm-project/lld/ELF/Arch/LoongArch.cpp
static uint32_t setD5k16(uint32_t insn, uint32_t imm) {
  uint32_t immLo = extractBits(imm, 15, 0);
  uint32_t immHi = extractBits(imm, 20, 16);
  return (insn & 0xfc0003e0) | (immLo << 10) | immHi;
}
```
```cpp
// From llvm-project/lld/ELF/Arch/Hexagon.cpp
static uint32_t findMaskR16(Ctx &ctx, uint32_t insn) {
  if (isDuplex(insn))
    return 0x03f00000;

  // Clear the end-packet-parse bits:
  insn = insn & ~instParsePacketEnd;

  if ((0xff000000 & insn) == 0x48000000)
    return 0x061f20ff;
  if ((0xff000000 & insn) == 0x49000000)
    return 0x061f3fe0;
  if ((0xff000000 & insn) == 0x78000000)
    return 0x00df3fe0;
  if ((0xff000000 & insn) == 0xb0000000)
    return 0x0fe03fe0;

  if ((0xff802000 & insn) == 0x74000000)
    return 0x00001fe0;
  if ((0xff802000 & insn) == 0x74002000)
    return 0x00001fe0;
  if ((0xff802000 & insn) == 0x74800000)
    return 0x00001fe0;
  if ((0xff802000 & insn) == 0x74802000)
    return 0x00001fe0;

  for (InstructionMask i : r6)
    if ((0xff000000 & insn) == i.cmpMask)
      return i.relocMask;

  Err(ctx) << "unrecognized instruction for 16_X type: 0x" << utohexstr(insn);
  return 0;
}
```
Such code is full of magic numbers, which makes it error-prone and hard to maintain.
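To make the problem concrete, here is a small hand-written encoder in the same style for the RISC-V B-type (conditional branch) offset, whose 13-bit, 2-byte-aligned immediate is scattered over four bit ranges. It is an illustrative sketch, not code taken from lld; the name `setBType` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written insertion of a branch offset into a RISC-V B-type
// instruction. The 13-bit offset (bit 0 is always zero) is scattered as:
//   Inst[31]    = imm[12]
//   Inst[30:25] = imm[10:5]
//   Inst[11:8]  = imm[4:1]
//   Inst[7]     = imm[11]
static uint32_t setBType(uint32_t insn, uint32_t imm) {
  assert((imm & 1) == 0 && "branch offsets are 2-byte aligned");
  uint32_t bit12 = (imm >> 12) & 0x1;
  uint32_t bits10_5 = (imm >> 5) & 0x3f;
  uint32_t bits4_1 = (imm >> 1) & 0xf;
  uint32_t bit11 = (imm >> 11) & 0x1;
  // Keep opcode, funct3, rs1 and rs2; clear the old immediate bits.
  insn &= 0x01fff07f;
  return insn | (bit12 << 31) | (bits10_5 << 25) | (bits4_1 << 8) |
         (bit11 << 7);
}
```

For instance, `setBType(0x00000063, 8)` (a `beq x0, x0` with an empty offset) returns `0x00000463`. Every constant here has to be derived by hand from the ISA manual and kept in sync with it.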
Proposed solution
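In fact, we do not need to write value-inserting code ourselves, because the MC layer already has this information: the TableGen instruction definitions specify exactly which instruction bits each operand occupies.
Instruction definitions have the following structure: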
```
// From llvm/lib/Target/RISCV/RISCVInstrFormats.td
class RVInstRFrm<bits<7> funct7, RISCVOpcode opcode, dag outs, dag ins,
                 string opcodestr, string argstr>
    : RVInst<outs, ins, opcodestr, argstr, [], InstFormatR> {
  bits<5> rs2;
  bits<5> rs1;
  bits<3> frm;
  bits<5> rd;

  // The following lines describe instruction encoding:
  let Inst{31-25} = funct7;
  let Inst{24-20} = rs2;
  let Inst{19-15} = rs1;
  let Inst{14-12} = frm;
  let Inst{11-7} = rd;
  let Inst{6-0} = opcode.Value;
}
```
I propose adding a module to TableGen that generates, from these encoding descriptions, the code for inserting relocated values in the linker.
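Since every `let Inst{...} = operand{...}` line maps a range of operand bits onto a range of instruction bits, a generator can reduce any operand encoding to a short table of bit-range copies plus one generic loop. The sketch below assumes such a representation; `BitField`, `scatter`, `gather` and the RISC-V B-type table (written in the ISA manual's bit numbering) are hypothetical names chosen for illustration, not the actual output of the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One contiguous copy between operand and instruction bits:
//   Inst[InstLo + Width - 1 : InstLo] = Op[OpLo + Width - 1 : OpLo]
struct BitField {
  unsigned InstLo;
  unsigned OpLo;
  unsigned Width;
};

// Hypothetical table for the RISC-V B-type branch offset (ISA bit layout):
//   Inst[31] = imm[12], Inst[30:25] = imm[10:5],
//   Inst[11:8] = imm[4:1], Inst[7] = imm[11]
constexpr BitField BTypeFields[] = {
    {31, 12, 1}, {25, 5, 6}, {8, 1, 4}, {7, 11, 1}};

// Places the operand bits of Value at their instruction positions.
template <std::size_t N>
uint64_t scatter(const BitField (&Fields)[N], uint64_t Value) {
  uint64_t Encoded = 0;
  for (const BitField &F : Fields) {
    assert(F.Width < 64 && "width must leave room for the mask shift");
    uint64_t Mask = (uint64_t(1) << F.Width) - 1;
    Encoded |= ((Value >> F.OpLo) & Mask) << F.InstLo;
  }
  return Encoded;
}

// Inverse of scatter: collects the raw operand bits from an instruction.
template <std::size_t N>
uint64_t gather(const BitField (&Fields)[N], uint64_t Instr) {
  uint64_t Value = 0;
  for (const BitField &F : Fields) {
    assert(F.Width < 64 && "width must leave room for the mask shift");
    uint64_t Mask = (uint64_t(1) << F.Width) - 1;
    Value |= ((Instr >> F.InstLo) & Mask) << F.OpLo;
  }
  return Value;
}
```

With this table, `scatter(BTypeFields, 8)` returns `0x400`, and `gather` on `0xFE000EE3` (a branch back by 4 bytes) returns the raw 13-bit pattern `0x1FFC`; turning that into `-4` is a matter of sign extension. The only target-specific part is the table, which the generator would read from the instruction definitions rather than a person writing it.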
Proposed API
A new `tblgen` class will be added to `llvm/include/llvm/Target/Target.td`:
```
class FixupKind {
  list<Instruction> Instructions = [];
  string OpName = "";
  int ValueShift;
  bit isPCRel;
  bit isGPRel;
  bit SignedOperand;
}
```
Target-specific subclasses of this class will define each target fixup kind.
- `Instructions`: a list of all instructions that need to be relocated with this fixup.
- `OpName`: the name of the operand that needs to be relocated (we assume that, for a given fixup, all instructions use the same operand name; this could easily be changed).
- `ValueShift`: the number of bits by which the value is arithmetically shifted right before relocation (useful if the target has special alignment requirements on jump targets).
- `isPCRel`, `isGPRel`: whether the value should be PC-relative or GP-relative.
- `SignedOperand`: when decoding a value, whether it should be sign-extended or zero-extended.
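The following sketch shows one plausible reading of how `ValueShift` and `SignedOperand` could affect the range check and the decoding of a value, assuming the field width is known from the operand encoding. `FixupValueInfo`, `fits` and `decodeValue` are hypothetical; in particular, rejecting values whose low `ValueShift` bits are non-zero is an assumption about how alignment requirements would be enforced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of a fixup description: only the fields that affect
// the value itself (bit placement is left out of this sketch).
struct FixupValueInfo {
  unsigned Bits;       // width of the encoded field, 1..62 in this sketch
  unsigned ValueShift; // low bits dropped before encoding
  bool SignedOperand;  // sign-extend (true) or zero-extend (false)
};

// Assumed range check: the low ValueShift bits must be zero and the
// shifted value must fit into Bits bits.
bool fits(const FixupValueInfo &I, int64_t Value) {
  assert(I.Bits >= 1 && I.Bits <= 62);
  if (Value & ((int64_t(1) << I.ValueShift) - 1))
    return false; // misaligned, e.g. an odd branch target
  // Arithmetic shift right (guaranteed since C++20, universal in practice).
  int64_t Shifted = Value >> I.ValueShift;
  if (I.SignedOperand)
    return Shifted >= -(int64_t(1) << (I.Bits - 1)) &&
           Shifted < (int64_t(1) << (I.Bits - 1));
  return Shifted >= 0 && Shifted < (int64_t(1) << I.Bits);
}

// Decoding: extend the raw field bits, then undo ValueShift.
int64_t decodeValue(const FixupValueInfo &I, uint64_t Field) {
  assert(I.Bits >= 1 && I.Bits <= 62);
  uint64_t Mask = (uint64_t(1) << I.Bits) - 1;
  Field &= Mask;
  bool Negative = I.SignedOperand && ((Field >> (I.Bits - 1)) & 1);
  int64_t V = Negative ? int64_t(Field | ~Mask) : int64_t(Field);
  // Multiplying instead of shifting avoids left-shifting a negative value.
  return V * (int64_t(1) << I.ValueShift);
}
```

For a 12-bit signed field with `ValueShift = 1` (a branch offset counted in 2-byte units), `fits` accepts even values in [-4096, 4094], and `decodeValue` turns the raw field `0xFFE` back into `-4`.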
In addition to auto-generating value-inserting code, this API will guarantee that all instructions for a given `FixupKind` have the same operand encoding.
All target-specific classes derived from `FixupKind` will be used to generate the following functions and to extend the `Fixups` enum:
```cpp
// Returns true for fixups with isPCRel = 1.
static bool isPCRel(<TARGET>::Fixups Kind);

// Returns true for fixups with isGPRel = 1.
static bool isGPRel(<TARGET>::Fixups Kind);

// Returns a bitstring with all bits of Value placed at their positions in
// the instruction. If CheckImmediate is true, Value is checked to fit into
// the operand field.
template <bool CheckImmediate>
static uint64_t encodeFixup(unsigned Kind, uint32_t Value);

// Extracts the immediate value from the instruction bitstring.
static uint64_t decodeFixup(unsigned Kind, uint64_t Instr);
```
The proposed usage of this API is as follows:
```cpp
// Let's assume that:
// - `Instr` contains the instruction bitstring before relocation
// - `Kind` is the proper fixup kind for this instruction
// - `Value` is the immediate value we want to insert into this instruction

// First, generate a mask with 1s in all bits of the immediate to be
// relocated (all-ones input, no range check):
uint64_t ImmediateMask = encodeFixup<false>(Kind, static_cast<uint32_t>(-1));
uint64_t EncodedValue = encodeFixup<true>(Kind, Value);
// Clear the old immediate bits (bitwise NOT), then insert the new value:
uint64_t RelocatedInstr = (Instr & ~ImmediateMask) | EncodedValue;

// Decoding (can be used in objdump and other tools):
// - `Instr` is the instruction bitstring
uint64_t DecodedValue = decodeFixup(Kind, Instr);
```
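The sketch below makes this usage pattern executable for a single hypothetical fixup kind: a 12-bit signed immediate in `Inst{31-20}`, as in RISC-V I-type instructions. The hand-written `encodeFixup`/`decodeFixup` stand in for the generated functions; `fixup_itype_imm12`, `relocate` and the use of `std::out_of_range` for a failed range check are assumptions for illustration (a linker would report a diagnostic instead of throwing).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical fixup kind for this sketch only.
enum Fixups : unsigned { fixup_itype_imm12 };

// Stand-in for the generated encoder: 12-bit signed field at Inst{31-20}.
template <bool CheckImmediate>
uint64_t encodeFixup(unsigned Kind, uint32_t Value) {
  assert(Kind == fixup_itype_imm12 && "only one kind in this sketch");
  if (CheckImmediate) {
    int32_t Signed = static_cast<int32_t>(Value);
    if (Signed < -2048 || Signed > 2047)
      throw std::out_of_range("immediate does not fit in 12 bits");
  }
  return uint64_t(Value & 0xfff) << 20;
}

// Stand-in for the generated decoder (SignedOperand = 1).
uint64_t decodeFixup(unsigned Kind, uint64_t Instr) {
  assert(Kind == fixup_itype_imm12 && "only one kind in this sketch");
  uint64_t Field = (Instr >> 20) & 0xfff;
  return (Field ^ 0x800) - 0x800; // sign-extend 12 bits (wraps modulo 2^64)
}

// The relocation step from the usage example above.
uint64_t relocate(unsigned Kind, uint64_t Instr, uint32_t Value) {
  uint64_t ImmediateMask = encodeFixup<false>(Kind, UINT32_MAX);
  uint64_t EncodedValue = encodeFixup<true>(Kind, Value);
  return (Instr & ~ImmediateMask) | EncodedValue;
}
```

For example, relocating `addi x1, x0, 0` (`0x00000093`) with `Value = 5` yields `0x00500093`, and `decodeFixup` recovers `5` from the result. The mask has to be inverted with the bitwise `~`; a logical `!` would evaluate to `0` for any non-zero mask and wipe out the opcode and register bits.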
Implementation Plan
This `tblgen` extension has been fully implemented by Arseny Tikhonov and has passed internal review.
We are preparing patches that can be open-sourced.
Request for comments
We look forward to your feedback. After the discussion, if you like the general approach, we plan to submit patches to llvm-project.
Evgeny Lomov
Compiler Lab - LRC, Huawei