CREATING THE BOLT COMPILER: PART 8

# A Complete Guide to LLVM for Programming Language Creators
https://mukulrathi.com/create-your-own-programming-language/llvm-ir-cpp-api-tutorial/ 1/22
15/01/2024, 00:14 A Complete Guide to LLVM for Programming Language Creators
Update: this post has now taken off on Hacker News and Reddit. Thank you all!
If you’ve just joined the series at this stage, here’s a quick recap. We’re designing a Java-esque concurrent object-oriented programming language called Bolt. We’ve gone through the compiler frontend, where we’ve done the parsing, type-checking and dataflow analysis. We’ve desugared our language to get it ready for LLVM - the main takeaway is that objects have been desugared to structs, and their methods to functions.
Learn about LLVM and you’ll be the envy of your friends. Rust uses LLVM for its
backend, so it must be cool. You’ll beat them on all those performance
benchmarks, without having to hand-optimise your code or write machine
assembly code. Shhhh, I won’t tell them.
The C++ class definitions for our desugared representation (we call this Bolt IR) can be found in the deserialise_ir folder. The code for this post (the LLVM IR generation) can be found in the llvm_ir_codegen folder. The repo uses the Visitor design pattern and makes ample use of std::unique_ptr to ease memory management.
To cut through the boilerplate: to find out how to generate LLVM IR for a particular language expression, search for the IRCodegenVisitor::codegen method that takes in the corresponding ExprIR object, e.g. the overload for if-else statements.
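The method body itself isn’t reproduced here, but its shape is roughly the following sketch (IRCodegenVisitor is from the repo; the ExprIfElseIR name and the body are my paraphrase, not verbatim code):

```
Value *IRCodegenVisitor::codegen(const ExprIfElseIR &expr) {
  // 1. codegen the condition to get an i1 value
  // 2. create "then", "else" and "ifcont" basic blocks
  // 3. emit a conditional branch: builder->CreateCondBr(condValue, thenBB, elseBB)
  // 4. codegen each branch, ending each with a branch to "ifcont"
  // 5. merge the two branch results with a phi node in "ifcont"
}
```

We’ll see the IR this produces in the factorial example below.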
### Understanding LLVM IR
LLVM sits in the middle-end of our compiler, after we’ve desugared our language features, but before the backends that target specific machine architectures (x86, ARM etc.).
The upshot is that LLVM IR looks like a more readable form of assembly. As LLVM IR is machine-independent, we don’t need to worry about the number of registers, the size of datatypes, calling conventions or other machine-specific details.

And rather than allocating specific sizes of datatypes, we retain types in LLVM IR. Again, the backend will take this type information and map it to the size of the datatype. LLVM has types for the different sizes of ints and floats, e.g. i32, i8, i1 etc. It also has derived types: pointer types, array types, struct types, function types. To find out more, check out the Type documentation.
Now, built into LLVM are a set of optimisations we can run over the LLVM IR e.g.
dead-code elimination, function inlining, common subexpression elimination
etc. The details of these algorithms are irrelevant: LLVM implements them for
us.
Our side of the bargain is that we write LLVM IR in Static Single Assignment (SSA) form, as SSA form makes life easier for optimisation writers. SSA form sounds fancy, but it just means we define variables before use and assign to variables only once. In SSA form, we cannot reassign to a variable (e.g. x = x + 1); instead we assign to a fresh variable each time (x2 = x1 + 1).
So in short: LLVM IR looks like assembly with types, minus the messy machine-
specific details. LLVM IR must be in SSA form, which makes it easier to optimise.
Let’s look at an example!
### An example: Factorial
Let’s look at a simple factorial function in our language Bolt:
factorial.bolt

```
function int factorial(int n) {
  if (n == 0) {
    1
  } else {
    n * factorial(n - 1)
  }
}
```
factorial.ll

```llvm
define i32 @factorial(i32) {
entry:
  %1 = icmp eq i32 %0, 0               ; n == 0
  br i1 %1, label %then, label %else

then:                                  ; preds = %entry
  br label %ifcont

else:                                  ; preds = %entry
  %sub = sub i32 %0, 1                 ; n - 1
  %2 = call i32 @factorial(i32 %sub)   ; factorial(n - 1)
  %mult = mul i32 %0, %2               ; n * factorial(n - 1)
  br label %ifcont

ifcont:                                ; preds = %then, %else
  %iftmp = phi i32 [ 1, %then ], [ %mult, %else ]
  ret i32 %iftmp
}
```
At this point, I’m going to assume you have come across Control Flow
Graphs and basic blocks. We introduced Control Flow Graphs in a
previous post in the series, where we used them to perform different
dataflow analyses on the program. I’d recommend you go and check the
CFG section of that dataflow analysis post now. I’ll wait here :)
(The original post includes diagrams here showing how an LLVM module contains functions, each of which contains basic blocks of instructions.)
example_program.c

```c
// a C main function
int factorial(int n); // declaration of the Bolt-generated function

int main() {
  factorial(10);
  return 0;
}
```
We specify that we want to compile for an Intel Macbook Pro in the module
target info:
example_module.ll

```llvm
source_filename = "Module"
target triple = "x86_64-apple-darwin18.7.0"
...
define i32 @factorial(i32) {
  ...
}
define i32 @main() {
entry:
  %0 = call i32 @factorial(i32 10)
  ret i32 0
}
```
LLVM defines a whole host of classes that map to the concepts we’ve talked about:

- Value
- Module
- Type
- Function
- BasicBlock
- BranchInst
- ...
These are all in the namespace llvm. In the Bolt repo, I chose to make this namespacing explicit by referring to them as llvm::Value, llvm::Module etc.
Most of the LLVM API is quite mechanical. Now that you’ve seen the diagrams that define modules, functions and basic blocks, the relationships between their corresponding classes in the API fall out nicely. You can query a Module object to get a list of its Function objects, query a Function to get the list of its BasicBlock s, and go the other way: you can query a BasicBlock to get its parent Function object.
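For example, iterating over a module’s functions, and over each function’s basic blocks, is just nested range-based for loops (a sketch, not code from the repo):

```
for (Function &function : *module) {
  for (BasicBlock &block : function) {
    // ... and block.getParent() recovers the enclosing function
  }
}
```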
Value is the base class for any value computed by the program. This could be
a function ( Function subclasses Value ), a basic block ( BasicBlock also
subclasses Value ), an instruction, or the result of an intermediate
computation.
Our IRCodegenVisitor holds the key state for IR generation: the LLVMContext, an IRBuilder (think of it as a file pointer into the IR we’re writing: it tracks where the next instruction will be inserted) and the Module.

ir_codegen_visitor.h

```cpp
std::unique_ptr<LLVMContext> context;
std::unique_ptr<IRBuilder<>> builder;
std::unique_ptr<Module> module;
```

We’ll use the context to create just one module, which we’ll imaginatively name "Module".

ir_codegen_visitor.cc

```cpp
context = make_unique<LLVMContext>();
builder = std::unique_ptr<IRBuilder<>>(new IRBuilder<>(*context));
module = make_unique<Module>("Module", *context);
```
### IRBuilder
The builder object has Create___() methods for each of the IR instructions, e.g. CreateLoad for a load instruction, and CreateSub / CreateFSub for the integer and floating-point sub instructions respectively. Some Create___() methods take an optional Twine argument: this is used to give the result’s register a custom name, e.g. iftmp is the twine for the following instruction:
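That instruction is the phi from the factorial example earlier:

```llvm
%iftmp = phi i32 [ 1, %then ], [ %mult, %else ]
```

In the API, the twine is the trailing argument, e.g. builder->CreatePHI(type, 2, "iftmp").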
Use the IRBuilder docs to find the method corresponding to your instruction.
### Type declarations
We can declare our own custom struct types.
e.g. a Tree with an int value, and pointers to left and right subtrees. In LLVM IR, this type looks like:

```llvm
%Tree = type { i32, %Tree*, %Tree* }
```
First we create the type with that name. This adds it to the module’s symbol table. The type starts out opaque: we can reference it in other type declarations (e.g. function types, or other struct types), but we can’t create structs of that type yet, as we don’t know what’s in it.

```cpp
StructType *treeType = StructType::create(*context, StringRef("Tree"));
```
The second step is to specify an array of types that go in the struct body. Note that since we’ve declared the opaque Tree type, we can get a Tree * type using the Tree type’s getPointerTo() method.

```cpp
treeType->setBody(ArrayRef<Type *>({Type::getInt32Ty(*context),
                                    treeType->getPointerTo(),
                                    treeType->getPointerTo()}));
```
So if you have custom struct types referring to other custom struct types in their
bodies, the best approach is to declare all of the opaque custom struct types,
then fill in each of the structs’ bodies.
class_codegen.cc

```cpp
void IRCodegenVisitor::codegenClasses(
    const std::vector<std::unique_ptr<ClassIR>> &classes) {
  // create (opaque) struct types for each of the classes
  for (auto &currClass : classes) {
    StructType::create(*context, StringRef(currClass->className));
  }
  // fill in struct bodies
  for (auto &currClass : classes) {
    std::vector<Type *> bodyTypes;
    for (auto &field : currClass->fields) {
      // add field type
      bodyTypes.push_back(field->codegen(*this));
    }
    // get opaque class struct type from module symbol table
    StructType *classType =
        module->getTypeByName(StringRef(currClass->className));
    classType->setBody(ArrayRef<Type *>(bodyTypes));
  }
}
```
### Functions
The function prototype consists of the function name, the function type, the “linkage” information and the module whose symbol table we want to add the function to. We choose External linkage: the function prototype is viewable externally, so we can link in an external function definition (e.g. if using a library function), or expose our function definition to other modules. You can see the full enum of linkage options here.
function_codegen.cc

```cpp
Function::Create(functionType, Function::ExternalLinkage,
                 function->functionName, module.get());
```
To generate the function definition, we just use the API to construct the control flow graph we discussed in our factorial example:

function_codegen.cc

```cpp
BasicBlock *entryBasicBlock =
    BasicBlock::Create(*context, "entry", function);
builder->SetInsertPoint(entryBasicBlock);
...
```
### Stack allocation
There are two ways we can store values in local variables in LLVM IR. We’ve seen the first: assignment to virtual registers. The second is dynamic memory allocation on the stack using the alloca instruction. Whilst we can store ints, floats and pointers in either the stack or virtual registers, aggregate datatypes, like structs and arrays, don’t fit in registers, so they have to be stored on the stack.
Yes, you read that right. Unlike most programming language memory models,
where we use the heap for dynamic memory allocation, in LLVM we just have a
stack.
Heaps are not provided by LLVM - they are a library feature. For single-
threaded applications, stack allocation is sufficient. We’ll talk about the
need for a global heap in multi-threaded programs in the next post
(where we extend Bolt to support concurrency).
We’ve seen struct types, e.g. {i32, i1, i32}. Array types are of the form [num_elems x elem_type]. Note num_elems is a constant: you need to provide it when generating the IR, not at runtime. So [3 x i32] is valid but [n x i32] is not.
We give alloca a type and it allocates a block of memory on the stack and
returns a pointer to it, which we can store in a register. We can use this pointer
to load and store values from/onto the stack.
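A minimal illustration (the register names and values are mine):

```llvm
%x_ptr = alloca i32               ; %x_ptr has type i32*
store i32 42, i32* %x_ptr         ; write 42 into the stack slot
%x_val = load i32, i32* %x_ptr    ; read it back into a register
%arr_ptr = alloca [3 x i32]       ; %arr_ptr has type [3 x i32]*
```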
### Global variables
Just as we alloca local variables on a stack, we can create global variables
and load from them and store to them.
Global variables are declared at the start of a module, and are part of the
module symbol table.
We can use the module object to create named global variables, and to query
them.
```cpp
module->getOrInsertGlobal(globalVarName, globalVarType);
...
GlobalVariable *globalVar = module->getNamedGlobal(globalVarName);
```
We give the global an initial value with setInitializer:

```cpp
globalVar->setInitializer(initValue);
```

and we load from and store to it like any other pointer:

```cpp
builder->CreateLoad(globalVar);
builder->CreateStore(someVal, globalVar); // not for constant globals
```
### GEPs
We get a base pointer to the aggregate type (array / struct) on the stack or in
global memory, but what if we want a pointer to a specific element? We’d need
to find the pointer offset of that element within the aggregate, and then add this
to the base pointer to get the address of that element. Calculating the pointer
offset is machine-specific e.g. depends on the size of the datatypes, the struct
padding etc.
The Get Element Pointer (GEP) instruction applies the pointer offset to the base pointer and returns the resultant pointer.
Below is the GEP instruction that calculates the pointer p+1 for an array of i8 s and for an array of i32 s:

```llvm
%elt_ptr  = getelementptr i8, i8* %p, i64 1    ; p + 1 for an i8 array
%elt_ptr2 = getelementptr i32, i32* %q, i64 1  ; p + 1 for an i32 array
```

This i64 1 index adds multiples of the base type to the base pointer: p+1 for i8 adds 1 byte, whereas p+1 for i32 adds 4 bytes to p. If the index was i64 0, we’d return p itself.
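To make that scaling concrete, here’s a standalone C++ sketch (plain C++, not the LLVM API; the helper name is mine) that measures the byte step p + 1 corresponds to for each element type:

```cpp
#include <cstdint>
#include <cstddef>

// Byte distance covered by "p + 1" for a given element type:
// exactly the scaling a GEP index of 1 performs.
template <typename T>
std::ptrdiff_t stepBytes() {
  T arr[2];
  return reinterpret_cast<char *>(&arr[1]) - reinterpret_cast<char *>(&arr[0]);
}
```

stepBytes<int8_t>() gives 1 and stepBytes<int32_t>() gives 4, matching the i8 and i32 offsets described above.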
Wait, an array of indices? Yes: the GEP instruction can have multiple indices passed to it (in the API, CreateGEP takes a list of indices). We’ve looked at a simple example where we only needed one index.
Before we look at the case where we pass multiple indices, I want to reiterate
the purpose of this first index:
A pointer of type Foo * can represent (as in C) the base pointer of an array of type Foo. The first index adds multiples of this base type Foo to traverse this array.
Within a struct, we want to index specific fields. The natural way is to label them field 0, 1 and 2. We can access field 2 by passing it to the GEP instruction as another index:

```llvm
%field2_ptr = getelementptr %Foo, %Foo* %f, i64 0, i32 2
```
For structs, you’ll likely always pass the first index as 0 . The biggest confusion
with GEPs is that this 0 can seem redundant, as we want the field 2 , so why
are we passing a 0 index first? Hopefully you can see from the first example
why we need that 0 . Think of it as passing to GEP the base pointer of an
implicit Foo array of size 1.
The more nested our aggregate structure, the more indices we can provide, e.g. for element index 2 of Foo’s second field (the 4-element int array):

```llvm
%elt_ptr = getelementptr %Foo, %Foo* %f, i64 0, i32 1, i64 2
```
(In terms of the corresponding API, we’d use CreateGEP and pass the array
{0,1,2} .)
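As a sanity check on what GEP computes, here’s a hypothetical C++ mirror of Foo (assuming the layout implied above: an i32, then a 4-element i32 array, then an i32). The indices {0, 1, 2} pick out the byte offset of b[2]:

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical C++ stand-in for the Foo struct described in the text.
struct Foo {
  int32_t a;     // field 0
  int32_t b[4];  // field 1: the 4-element int array
  int32_t c;     // field 2
};

// GEP indices {0, 1, 2} address &fooPtr[0].b[2]; the byte offset is:
// 0 * sizeof(Foo) + offsetof(Foo, b) + 2 * sizeof(int32_t)
constexpr std::size_t gepByteOffset = offsetof(Foo, b) + 2 * sizeof(int32_t);
```

With 4-byte fields and no padding, that works out to 4 + 8 = 12 bytes past the base pointer.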
### mem2reg
If you remember, LLVM IR must be written in SSA form. But what happens if the
Bolt source program we’re trying to map to LLVM IR is not in SSA form?
reassign_var.bolt

```
let x = 1
x = x + 1
```
One option would be for us to rewrite the program in SSA form in an earlier
compiler stage. Every time we reassign a variable, we’d have to create a fresh
variable. We’d also have to introduce phi nodes for conditional statements.
For our example, this is straightforward, but in general this extra rewrite is a
pain we would rather not deal with.
assign_fresh_vars.bolt

```
let x1 = 1
let x2 = x1 + 1
```
We can use pointers to avoid assigning fresh variables. Note here we aren’t
reassigning the pointer x , just updating the value it pointed to. So this is valid
SSA.
```
let x = &1;
*x = *x + 1
```
reassign_var.ll

```llvm
%x = alloca i32           ; allocate a stack slot for x
store i32 1, i32* %x      ; let x = 1
%1 = load i32, i32* %x    ; read x
%2 = add i32 %1, 1        ; x + 1
store i32 %2, i32* %x     ; x = x + 1
```
Let’s revisit the LLVM IR if we were to rewrite the Bolt program to use fresh
variables. It’s only 2 instructions, compared to the 5 instructions needed if using
the stack. Moreover, we avoid the expensive load and store memory
access instructions.
assign_fresh_vars.ll

```llvm
%x1 = add i32 0, 1     ; let x1 = 1
%x2 = add i32 %x1, 1   ; let x2 = x1 + 1
```
So while we’ve made our lives easier as compiler writers by avoiding a rewrite-to-SSA pass, this has come at the expense of performance. This is where LLVM’s mem2reg pass comes in: it promotes allocas back into registers where possible, rewriting our IR into the fresh-variable form (inserting phi nodes where needed). The catch is that mem2reg only considers allocas in the entry basic block of the function. How do we arrange this if the local variable declaration occurs midway through the function, in another block? Let’s look at an example:
```llvm
; translated to LLVM IR
%x = alloca i32
store i32 someVal, i32* %x
```
We can actually move the alloca: it doesn’t matter where we allocate the stack space, so long as it is allocated before use. So we write the alloca at the very start of the parent function in which this local variable declaration occurs.
How do we do this in the API? Well, remember the analogy of the builder being
like a file pointer? We can have multiple file pointers pointing to different places
in the file. Likewise, we instantiate a new IRBuilder to point to the start of
the entry basic block of the parent function, and insert the alloca
instructions using that builder.
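A sketch of what that looks like (the variable names are mine; assume parentFunction, boundType and varName are in scope):

```
// temporary builder pointing at the start of the function's entry block
IRBuilder<> tmpBuilder(&(parentFunction->getEntryBlock()),
                       parentFunction->getEntryBlock().begin());
// the alloca lands at the top of the entry block,
// where the mem2reg pass can see it
AllocaInst *stackSlot = tmpBuilder.CreateAlloca(boundType, nullptr, varName);
```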
### LLVM Optimisations
The API makes it really easy to add passes. We create a
functionPassManager , add the optimisation passes we’d like, and then
initialise the manager.
ir_codegen_visitor.cc

```cpp
std::unique_ptr<legacy::FunctionPassManager> functionPassManager =
    make_unique<legacy::FunctionPassManager>(module.get());
// Promote allocas to registers.
functionPassManager->add(createPromoteMemoryToRegisterPass());
// Do simple "peephole" optimizations
functionPassManager->add(createInstructionCombiningPass());
// Reassociate expressions.
functionPassManager->add(createReassociatePass());
// Eliminate Common SubExpressions.
functionPassManager->add(createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks etc).
functionPassManager->add(createCFGSimplificationPass());
functionPassManager->doInitialization();
```
Then, after generating each function’s IR, we run the functionPassManager over that function to apply the passes.
In particular, let’s look at the factorial LLVM IR output by our Bolt compiler before and after optimisation. You can find both files in the repo:
factorial-unoptimised.ll

```llvm
...
then:                                  ; preds = %entry
  br label %ifcont
...
```
factorial-optimised.ll

```llvm
...
else:                                  ; preds = %entry
  %sub = add i32 %0, -1
  %1 = call i32 @factorial(i32 %sub)
  %mult = mul i32 %1, %0
  br label %ifcont

ifcont:                                ; preds = %else, %entry
  %iftmp = phi i32 [ %mult, %else ], [ 1, %entry ]
  ret i32 %iftmp
}
```
Notice how we’ve actually got rid of the alloca and the associated load
and store instructions, and also removed the then basic block!
### Wrap up
This last example shows you the power of LLVM and its optimisations. You can
find the top-level code that runs the LLVM code generation and optimisation in
the main.cc file in the Bolt repository.
In the next few posts we’ll be looking at some more advanced language features: generics, inheritance and method overriding, and concurrency! Stay tuned for when they come out!