
ASPLOS19-LLVM-Tutorial

Meliora!

LLVM Tutorial

John Criswell
University of Rochester

!1
Introduction

!2
History of LLVM
❖ Developed by Chris Lattner and Vikram Adve at the
University of Illinois at Urbana-Champaign (UIUC)
❖ Released open-source in October 2003
❖ Default compiler for Mac OS X, iOS, and FreeBSD
❖ Used by many companies and research groups
❖ Contributions by many people!

!3
LLVM is a compiler infrastructure!

!4
Tools Built Using LLVM
❖ Compilers!
❖ JITs!
❖ Formal Verification!
❖ Security Hardening Tools!
❖ Bug Finding Tools!
❖ Profiling Tools!

!5
Things to Do in the Compiler Zoo
❖ Add a security check to every load and store
❖ Create a memory access trace
❖ Check pointer arithmetic on certain types of variables
❖ Trace atomic modifications to a memory location
❖ Change order of local variables in stack frame

!6
What do you want to do with
LLVM?

!7
LLVM Source Code Structure
❖ LLVM is primarily a set of libraries
❖ We use the libraries to create LLVM-based tools

!8
Programming Background
❖ C++
❖ Other language bindings exist, but C++ is “native”
❖ Know how to use classes, pointers, and references
❖ Know how to use C++ iterators
❖ Know how to use Standard Template Library (STL)

!9
Helpful Documents
❖ LLVM Language Reference Manual
❖ LLVM Programmer’s Manual
❖ How to Write an LLVM Pass
❖ Online LLVM Doxygen documents

!10
Getting Involved with LLVM
❖ Research on program analysis (NSF REUs)
❖ Google Summer of Code projects
❖ Apple, Samsung, Google, Facebook build LLVM tools
❖ LLVM Developer’s Meeting
❖ One in California; one in Europe
❖ Can present talks, posters, BoFs, etc.

!11
Please fill out feedback form:

https://forms.gle/ib3Ng6osSFqNoQGD7

!12
LLVM Compiler Structure

!13
Ahead of Time (AOT) Compiler

Front End

Optimizer

Code Generator

!14
Front End Structure
Source Code → Clang Parser → Clang AST → Clang Optimizer → Clang AST → Clang Optimizer → Clang AST → Clang CodeGen → LLVM IR

!15
Optimizer Structure
LLVM IR → Opt 1 → LLVM IR → Opt 2 → LLVM IR → Code Gen → Machine IR

!16
Code Generator Structure
Machine IR → Register Allocation → Machine IR → Instruction Scheduling → Machine IR → Code Emitter → Native Code

!17
LLVM Toolchain Overview
Intermediate Representation    Description
Clang AST                      Describes structure of source code (if-statements, while-loops)
LLVM IR                        Architecture independent code in SSA form
Machine IR                     Native code (machine registers; native code instructions)
!18
LLVM
Intermediate Representation

!19
LLVM IR is a language into which programs
are translated for analysis and transformation
(optimization)

!20
LLVM IR Forms
❖ LLVM Assembly Language
❖ Text form saved on disk for humans to read
❖ LLVM Bitcode
❖ Binary form saved on disk for programs to read
❖ LLVM In-Memory IR
❖ Data structures used for analysis and optimization

!21
LLVM Assembly Language
define i32 @foo(i32, i32) local_unnamed_addr #0 {
  %3 = tail call i32 @bar(i32 %0) #2
  %4 = add nsw i32 %1, %0
  %5 = sub i32 %4, %3
  ret i32 %5
}

declare i32 @bar(i32) local_unnamed_addr #1

!22
Overview of LLVM IR
❖ Each assembly/bitcode file is a Module
❖ Each Module is composed of
❖ Global variables
❖ A set of Functions, each composed of
❖ A set of basic blocks, each composed of
❖ A set of instructions

!23
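LLVM Bitcode File
[Figure: a Module containing a global (int[20]) and two functions, foo() and bar(). Each function is drawn as a graph of basic blocks; each basic block holds a few instructions (add, sub, mult, div) and ends with a br or ret.]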

!24
LLVM Module with One Function
define i32 @foo(i32, i32) local_unnamed_addr #0 {
  %3 = icmp ult i32 %0, %1
  br i1 %3, label %4, label %6

; <label>:4:
  %5 = tail call i32 @bar(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i64 0, i64 0)) #2
  br label %8

; <label>:6:
  %7 = tail call i32 @bar(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str.1, i64 0, i64 0)) #2
  br label %8

; <label>:8:
  %9 = add i32 %1, %0
  ret i32 %9
}

!25
LLVM Instruction Set
❖ RISC-like architecture
❖ Virtual registers in SSA form
❖ Load/store instructions to read/write memory
objects
❖ All other instructions read or write virtual registers
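
The load/store discipline described above can be sketched with a toy interpreter. This is only an illustration: the `toy::Machine`, `toy::Inst`, and `toy::Op` names are hypothetical and are not LLVM classes. The point is that only `Load` and `Store` touch memory, while `Add` works purely on virtual registers, each of which is defined once.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace toy {

// Hypothetical mini instruction set; not LLVM's real classes.
enum class Op { Load, Store, Add };

struct Inst {
  Op op;
  std::string dst; // register defined by Load/Add; unused for Store
  std::string a;   // Load: memory object; Store: value register; Add: lhs
  std::string b;   // Store: memory object; Add: rhs register
};

struct Machine {
  std::map<std::string, int32_t> regs;   // virtual registers
  std::map<std::string, int32_t> memory; // named memory objects

  void run(const std::vector<Inst> &code) {
    for (const Inst &I : code) {
      switch (I.op) {
      case Op::Load: // memory -> register
        define(I.dst, memory.at(I.a));
        break;
      case Op::Store: // register -> memory
        memory[I.b] = regs.at(I.a);
        break;
      case Op::Add: // registers only
        define(I.dst, regs.at(I.a) + regs.at(I.b));
        break;
      }
    }
  }

private:
  // Each virtual register may be defined exactly once.
  void define(const std::string &reg, int32_t value) {
    assert(regs.count(reg) == 0 && "register defined twice");
    regs[reg] = value;
  }
};

// Adds the memory object "step" to the memory object "counter".
inline int32_t incrementExample() {
  Machine m;
  m.memory["counter"] = 40;
  m.memory["step"] = 2;
  m.run({{Op::Load, "%1", "counter", ""},
         {Op::Load, "%2", "step", ""},
         {Op::Add, "%3", "%1", "%2"},
         {Op::Store, "", "%3", "counter"}});
  return m.memory.at("counter");
}

} // namespace toy
```

Calling `toy::incrementExample()` runs a four-instruction program: two loads, one add, one store.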

!26
LLVM Memory Objects
❖ Global Variables
❖ Memory allocated on the stack
❖ Memory allocated on the heap

!27
Instructions for Computation
❖ Arithmetic and binary operators
❖ Two’s complement arithmetic (add, sub, multiply, etc)
❖ Bit-shifting and bit-masking
❖ Pointer arithmetic (getelementptr or “GEP”)
❖ Comparison instructions (icmp, fcmp)
❖ Generates a boolean result
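
A rough sketch of what these instructions compute, written as ordinary C++ functions. The function names are hypothetical, and the GEP helper covers only the simple case of indexing an array of fixed-size elements (as in `getelementptr [7 x i8], ..., i64 0, i64 0`); real GEPs also handle structs and depend on the target's data layout.

```cpp
#include <cassert>
#include <cstdint>

namespace toy {

// add i32: wraps modulo 2^32; the same bits serve signed and unsigned values.
inline uint32_t add32(uint32_t a, uint32_t b) { return a + b; }

// icmp ult: compare the bit patterns as unsigned integers.
inline bool icmpUlt(uint32_t a, uint32_t b) { return a < b; }

// icmp slt: compare the same bit patterns as two's complement integers.
// Flipping the sign bit maps signed order onto unsigned order.
inline bool icmpSlt(uint32_t a, uint32_t b) {
  return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

// Byte offset for a GEP with two indices into an array type
// [numElems x T], where T occupies elemSize bytes: the first index
// steps over whole arrays, the second over elements.
inline uint64_t gepArrayOffset(uint64_t numElems, uint64_t elemSize,
                               uint64_t firstIndex, uint64_t secondIndex) {
  assert(elemSize > 0);
  return firstIndex * numElems * elemSize + secondIndex * elemSize;
}

} // namespace toy
```

The bit pattern `0xFFFFFFFF` shows why there are separate signed and unsigned predicates: it is -1 under `icmpSlt` but the largest value under `icmpUlt`.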

!28
Memory Access Instructions
❖ Load instruction reads memory
❖ Store instruction writes to memory
❖ Atomic compare and exchange
❖ Atomic read/modify/write
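
At the source level, the two atomic instructions correspond roughly to C++ `std::atomic` operations. The sketch below assumes a compiler such as Clang lowers these calls to LLVM's `cmpxchg` and `atomicrmw` instructions, which is typical but depends on the target and compiler version. The helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

namespace toy {

// Atomic read/modify/write: add `delta` and return the previous value.
inline int fetchAndAdd(std::atomic<int> &cell, int delta) {
  return cell.fetch_add(delta);
}

// Atomic compare and exchange: store `desired` only if the cell still
// holds `expected`. Returns true when the store happened.
inline bool compareAndSwap(std::atomic<int> &cell, int expected, int desired) {
  return cell.compare_exchange_strong(expected, desired);
}

// A "store maximum" built from a compare-and-exchange retry loop.
inline void storeMax(std::atomic<int> &cell, int candidate) {
  int seen = cell.load();
  while (seen < candidate && !cell.compare_exchange_weak(seen, candidate)) {
    // On failure, compare_exchange_weak refreshes `seen`; the loop
    // condition then re-checks whether a store is still needed.
  }
}

} // namespace toy
```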

!29
Control Flow Instructions
❖ Terminator instructions
❖ Indicate which basic block to jump to next
❖ conditional branch, unconditional branch, switch
❖ Return instruction to return to caller
❖ Resume instruction for exception handling (it replaced the older unwind instruction)
❖ Call instruction calls a function
❖ It can occur in the middle of a basic block

!30
Memory Allocation Instructions
❖ Stack allocation (alloca)
❖ Allocates memory on the stack
❖ Calls to heap-allocation functions (e.g., malloc())
❖ Not a special instruction; just uses a call instruction
❖ Global variable declarations
❖ Not really instructions, but allocate memory
❖ All globals are pointers to memory objects
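
As a small source-level illustration, the function below touches all three kinds of memory object. The comments describe what an unoptimizing compiler would typically emit; the names `callCount` and `sumFirst` are hypothetical, and the exact IR depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <cstdlib>

namespace toy {

// Global variable: a single memory object; in IR its name denotes
// the address of that object.
int callCount = 0;

// Sums 0..n-1 through a heap buffer. Returns -1 if allocation fails.
inline int sumFirst(int n) {
  assert(n >= 0);
  ++callCount;   // load from and store to the global
  int total = 0; // local variable: typically an alloca before optimization
  // Heap allocation is an ordinary call, not a special instruction.
  int *buf = static_cast<int *>(std::malloc(sizeof(int) * (n > 0 ? n : 1)));
  if (!buf)
    return -1;
  for (int i = 0; i < n; ++i)
    buf[i] = i; // stores through a heap pointer
  for (int i = 0; i < n; ++i)
    total += buf[i]; // loads through a heap pointer
  std::free(buf);
  return total;
}

} // namespace toy
```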

!31
Static Single Assignment (SSA)
• Each function has an infinite set of virtual registers
• Only one instruction assigns a value to a virtual register
(called the definition of the register)
• An instruction and the register it defines are
synonymous

%z = add i32 %x, %y

!32
The Almighty Phi Node!
Original code:
y=5;
x=y+1; (one branch)    x=y+2; (other branch)
z=x+3;

SSA form:
y=5;
x1=y+1; (one branch)   x2=y+2; (other branch)
x3=phi(x1,x2);
z=x3+3;
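
The phi example can be mimicked in plain C++ to see what value z ends up with. This is a simplification: the hypothetical `phi` helper takes a flag saying which branch was taken, and both x1 and x2 are computed eagerly, whereas in real SSA code only the branch that executes defines its register.

```cpp
#include <cassert>

namespace toy {

// A phi node selects the value belonging to the predecessor block
// that control actually arrived from.
inline int phi(bool cameFromFirstBranch, int firstValue, int secondValue) {
  return cameFromFirstBranch ? firstValue : secondValue;
}

// SSA version of the slide's example: every name is assigned once.
inline int computeZ(bool takeFirstBranch) {
  const int y = 5;
  const int x1 = y + 1; // defined on the first branch
  const int x2 = y + 2; // defined on the second branch
  const int x3 = phi(takeFirstBranch, x1, x2);
  const int z = x3 + 3;
  return z;
}

} // namespace toy
```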

!33
Domination
❖ The definition of a virtual register must dominate all of its uses
❖ Except uses by phi-nodes
❖ A dominates B, C, and D
[Figure: control-flow graph with blocks A, B, C, and D]
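
Dominance can be computed with a simple iterative algorithm: a block's dominators are itself plus whatever dominates all of its predecessors. The sketch below is a toy version using bit sets, not LLVM's DominatorTree implementation. It assumes the figure is a diamond (A branches to B and C, which both reach D); that shape is an assumption, since only the block names are given.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace toy {

// preds[b] lists the predecessors of block b; block 0 is the entry.
// Returns dom, where bit d of dom[b] is set when block d dominates b.
inline std::vector<uint32_t>
computeDominators(const std::vector<std::vector<int>> &preds) {
  const std::size_t n = preds.size();
  assert(n > 0 && n <= 32);
  const uint32_t all = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
  std::vector<uint32_t> dom(n, all); // start from "everything dominates"
  dom[0] = 1u;                       // the entry is dominated only by itself
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 1; b < n; ++b) {
      uint32_t meet = all;
      for (int p : preds[b])
        meet &= dom[static_cast<std::size_t>(p)]; // intersect over predecessors
      const uint32_t next = meet | (1u << b);     // a block dominates itself
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

inline bool dominates(const std::vector<uint32_t> &dom, int d, int b) {
  return ((dom[static_cast<std::size_t>(b)] >> d) & 1u) != 0;
}

// Assumed diamond CFG: A(0) -> B(1), A -> C(2), B -> D(3), C -> D.
inline std::vector<std::vector<int>> diamondPreds() {
  return {{}, {0}, {0}, {1, 2}};
}

} // namespace toy
```

In this diamond, A dominates every block, while neither B nor C dominates D because D can be reached without passing through either one. A value defined in B therefore cannot be used directly in D, which is where a phi node comes in.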

!34
Writing an LLVM Pass

!35
LLVM Passes: Separation of Concerns
❖ Break optimizer into passes
❖ Each pass performs one analysis or one transformation

!36
LLVM Passes
Optimizer: LLVM IR → Pass 1 → LLVM IR → Pass 2 → LLVM IR

!37
Two Types of Passes
❖ Passes that analyze code
❖ Do not modify the program
❖ Provide information “out of band” to other passes
❖ Passes that transform code
❖ Make modifications to the code

!38
LLVM Passes
LLVM IR → Dom Tree → LLVM IR → DGE → LLVM IR → Mem2Reg → LLVM IR
Dominator Tree Data is passed out of band, alongside the IR

!39
LLVM IR Pass Types
❖ ModulePass
❖ FunctionPass
❖ BasicBlockPass
❖ I recommend ignoring “funny” passes (LoopPass, RegionPass)

!40
Rules for LLVM Passes
❖ Only modify values and instructions at scope of pass
❖ ModulePass can modify anything
❖ FunctionPass should not modify anything outside of
the function
❖ BasicBlockPass should not modify anything outside
of the basic block

!41
Important Pass Methods: getAnalysisUsage()

❖ Tells PassManager which analysis passes you need


❖ PassManager will schedule analysis passes for you
❖ Cannot schedule transform passes this way
❖ Tells PassManager which analysis results are valid after
a transformation
❖ Avoids re-running expensive analysis passes
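
The scheduling behaviour described above can be sketched with a toy pass manager. Everything here is hypothetical: `toy::Pass` and `toy::PassManager` are not LLVM's classes, and the pass names and their required/preserved sets are chosen only for illustration. The sketch runs required analyses on demand and, after a transform, discards every analysis result the transform did not declare as preserved.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

namespace toy {

// Hypothetical stand-in for a pass plus its declared analysis usage.
struct Pass {
  std::string name;
  bool isAnalysis;
  std::vector<std::string> required;  // analyses that must be valid first
  std::vector<std::string> preserved; // analyses still valid afterwards
};

struct PassManager {
  std::vector<Pass> analyses;  // analysis passes the manager can schedule
  std::set<std::string> valid; // analyses whose results are current
  std::vector<std::string> log; // order in which passes actually ran

  void run(const Pass &P) {
    // Schedule any required analysis whose result is missing or stale.
    for (const std::string &need : P.required)
      if (!valid.count(need))
        run(lookup(need));
    log.push_back(P.name);
    if (P.isAnalysis) {
      valid.insert(P.name);
      return;
    }
    // A transform invalidates every analysis it did not preserve.
    std::set<std::string> keep;
    for (const std::string &k : P.preserved)
      if (valid.count(k))
        keep.insert(k);
    valid = keep;
  }

private:
  const Pass &lookup(const std::string &name) const {
    for (const Pass &A : analyses)
      if (A.name == name)
        return A;
    throw std::runtime_error("unknown analysis: " + name);
  }
};

// Illustrative schedule; the usage sets below are assumptions.
inline std::vector<std::string> scheduleExample() {
  PassManager PM;
  PM.analyses.push_back(Pass{"domtree", true, {}, {}});
  const Pass mem2reg{"mem2reg", false, {"domtree"}, {"domtree"}};
  const Pass dge{"dge", false, {}, {}};
  PM.run(mem2reg); // runs domtree first and keeps it valid
  PM.run(mem2reg); // domtree is still valid, so it is not re-run
  PM.run(dge);     // preserves nothing, so domtree becomes stale
  PM.run(mem2reg); // domtree has to run again
  return PM.log;
}

} // namespace toy
```

The log returned by `toy::scheduleExample()` shows the analysis running only twice across four transform runs, which is the saving that declaring preserved analyses buys.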

!42
runOnModule()
❖ Entry point for ModulePass
❖ Passes a reference to the Module
❖ Can locate functions, basic blocks, globals from Module
❖ Return true if the pass modifies the program
❖ An analysis pass always returns false.
❖ A transform pass can return either true or false.

!43
runOnFunction()
❖ Called for each function in the Module
❖ Passed reference to Function
❖ Return false for no modifications; true for modifications

!44
runOnBasicBlock()
❖ You get the idea…

!45
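MyPass.h Example
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"

using namespace llvm;

class MyPass : public ModulePass {
private:
  unsigned int analyzeThis (Instruction *I);

public:
  static char ID;
  MyPass() : ModulePass(ID) {}
  // Recent LLVM releases return StringRef from getPassName()
  StringRef getPassName() const override { return "My LLVM Pass"; }
  bool runOnModule (Module & M) override;
  void getAnalysisUsage(AnalysisUsage &AU) const override {
    // We require Dominator information; the legacy pass manager
    // exposes it through DominatorTreeWrapperPass
    AU.addRequired<DominatorTreeWrapperPass>();
  }
  unsigned int getAnalysisResultFor (Instruction *I);
};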

!46
MyPass.cpp Example
#include "MyPass.h"

char MyPass::ID = 0;
static RegisterPass<MyPass> P ("mypass", "My First LLVM Analysis");

bool
MyPass::runOnModule (Module & M) {
  //
  // Iterate over all instructions within a Module
  //
  for (Module::iterator fi = M.begin(); fi != M.end(); ++fi) {
    for (Function::iterator bi = fi->begin(); bi != fi->end(); ++bi) {
      for (BasicBlock::iterator it = bi->begin(); it != bi->end(); ++it) {
        // Dereferencing the iterator yields an Instruction &
        Instruction * I = &*it;
      }
    }
  }

  // This pass only inspects the program, so report no modifications
  return false;
}

!47
In-Memory LLVM IR

!48
LLVM Classes
❖ There is a class for each type of IR object
❖ Module class
❖ Function class
❖ BasicBlock class
❖ Instruction class
❖ Classes provide iterators for objects within them

!49
LLVM In-Memory IR
Module
Global int[20];
Global char[16];

Function
Function BasicBlock add sub br
Function BasicBlock add mult br
BasicBlock
add ret

!50
Class Iterators
❖ Each class provides iterators for items it contains
❖ Module::iterator iterates over functions
❖ Function::iterator iterates over basic blocks
❖ BasicBlock::iterator iterates over instructions
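
The nesting of containers and iterators can be modelled with a few toy classes. These are simplified stand-ins in a hypothetical `toy` namespace, not LLVM's real `Module`, `Function`, and `BasicBlock`; they only reproduce the idea that each class exposes an `iterator` over the items it contains. The example module assumes one function with three basic blocks.

```cpp
#include <cassert>
#include <initializer_list>
#include <list>
#include <string>

namespace toy {

struct Instruction {
  std::string opcode;
};

struct BasicBlock {
  using iterator = std::list<Instruction>::iterator;
  std::list<Instruction> insts;
  iterator begin() { return insts.begin(); }
  iterator end() { return insts.end(); }
};

struct Function {
  using iterator = std::list<BasicBlock>::iterator;
  std::list<BasicBlock> blocks;
  iterator begin() { return blocks.begin(); }
  iterator end() { return blocks.end(); }
};

struct Module {
  using iterator = std::list<Function>::iterator;
  std::list<Function> functions;
  iterator begin() { return functions.begin(); }
  iterator end() { return functions.end(); }
};

// Walks Module -> Function -> BasicBlock -> Instruction and counts
// the instructions with the given opcode.
inline unsigned countOpcode(Module &M, const std::string &opcode) {
  unsigned count = 0;
  for (Module::iterator fi = M.begin(); fi != M.end(); ++fi)
    for (Function::iterator bi = fi->begin(); bi != fi->end(); ++bi)
      for (BasicBlock::iterator it = bi->begin(); it != bi->end(); ++it) {
        Instruction *I = &*it; // iterator -> reference -> pointer
        if (I->opcode == opcode)
          ++count;
      }
  return count;
}

inline BasicBlock makeBlock(std::initializer_list<const char *> opcodes) {
  BasicBlock BB;
  for (const char *op : opcodes)
    BB.insts.push_back(Instruction{op});
  return BB;
}

// One function with three basic blocks (an assumed example).
inline Module exampleModule() {
  Function F;
  F.blocks.push_back(makeBlock({"add", "sub", "br"}));
  F.blocks.push_back(makeBlock({"add", "mult", "br"}));
  F.blocks.push_back(makeBlock({"add", "ret"}));
  Module M;
  M.functions.push_back(F);
  return M;
}

} // namespace toy
```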

!51
Iterator Example
//
// Iterate over all instructions within a BasicBlock
//
BasicBlock * BB = …;
BasicBlock::iterator it;
BasicBlock::iterator ie;

for (it = BB->begin(), ie = BB->end(); it != ie; ++it) {
  // Dereferencing the iterator yields an Instruction &
  Instruction * I = &*it;
}

!52
MyPass.cpp Example (Reprise)
#include "MyPass.h"

char MyPass::ID = 0;
static RegisterPass<MyPass> P ("mypass", "My First LLVM Analysis");

bool
MyPass::runOnModule (Module & M) {
  //
  // Iterate over all instructions within a Module
  //
  for (Module::iterator fi = M.begin(); fi != M.end(); ++fi) {
    for (Function::iterator bi = fi->begin(); bi != fi->end(); ++bi) {
      for (BasicBlock::iterator it = bi->begin(); it != bi->end(); ++it) {
        // Dereferencing the iterator yields an Instruction &
        Instruction * I = &*it;
      }
    }
  }

  // This pass only inspects the program, so report no modifications
  return false;
}

!53
LLVM Class Hierarchy
❖ Anything that is an SSA value is a subclass of Value
❖ All Instruction classes are subclasses of Instruction
❖ Similar instructions share a common superclass

!54
Simplified LLVM Class Hierarchy
Value
  Instruction
    TerminatorInst
      BranchInst
      SwitchInst
      ReturnInst

!55
Locating Branch Instructions
//
// Iterate over all instructions within a BasicBlock
//
BasicBlock::iterator it;
BasicBlock::iterator ie;

for (it = BB->begin(), ie = BB->end(); it != ie; ++it) {
  Instruction * I = &*it;
  if (BranchInst * BI = dyn_cast<BranchInst>(I)) {
    // Do something with branch instruction BI
  }
}

!56
Casting to Subclass in LLVM

Casting Function     Description                                  Example
isa<Class>()         Returns true if the value is of that         isa<BranchInst>(V)
                     class, false otherwise
dyn_cast<Class>()    Returns pointer to object of type Class,     dyn_cast<BranchInst>(V)
                     or NULL if the value is not of that class
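
To show how such casts can work without C++ RTTI, here is a toy re-implementation. LLVM's own `isa<>` and `dyn_cast<>` rely on each class providing a static `classof()` method; the version below follows that pattern in a much simplified form, inside a hypothetical `toy` namespace with made-up `Value`, `BranchInst`, and `ReturnInst` classes.

```cpp
#include <cassert>

namespace toy {

// Base class carries a kind tag that identifies the dynamic type.
class Value {
public:
  enum Kind { BranchKind, ReturnKind };
  explicit Value(Kind K) : kind(K) {}
  Kind getKind() const { return kind; }

private:
  Kind kind;
};

class BranchInst : public Value {
public:
  BranchInst() : Value(BranchKind) {}
  static bool classof(const Value *V) { return V->getKind() == BranchKind; }
};

class ReturnInst : public Value {
public:
  ReturnInst() : Value(ReturnKind) {}
  static bool classof(const Value *V) { return V->getKind() == ReturnKind; }
};

// isa<To>(V): true when V's dynamic type is To.
template <typename To> bool isa(const Value *V) {
  assert(V && "isa<> used on a null pointer");
  return To::classof(V);
}

// dyn_cast<To>(V): V as a To*, or nullptr when V is not a To.
template <typename To> To *dyn_cast(Value *V) {
  return isa<To>(V) ? static_cast<To *>(V) : nullptr;
}

} // namespace toy
```

The `if (BranchInst *BI = dyn_cast<BranchInst>(I))` idiom works because a failed cast yields a null pointer, which the `if` treats as false.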

!57
LLVM Instruction Classes
❖ BinaryOperator - add, sub, mul, shift, and, or, etc.
❖ GetElementPtrInst
❖ LoadInst, StoreInst
❖ BranchInst, SwitchInst, ReturnInst
❖ CallInst
❖ CastInst

!59
LLVM Class Methods
❖ Each class has methods to get information on value
❖ BranchInst - Iterate over successor basic blocks
❖ StoreInst - Get pointer operand of store instruction
❖ GetElementPtrInst - Get indices used as operands
❖ Instruction - Get containing basic block
❖ Method might belong to a superclass

!60
Beyond the Tutorial

!61
Data Flow Analysis
❖ The Dragon Book, Fourth Edition
❖ Papers on SSA-based algorithms
❖ Kam-Ullman paper on iterative data-flow analysis

!62
