ASPLOS19-LLVM-Tutorial
ASPLOS19-LLVM-Tutorial
!1
Introduction
!2
History of LLVM
❖ Developed by Chris Lattner and Vikram Adve at the
University of Illinois at Urbana-Champaign (UIUC)
❖ Released open-source in October 2003
❖ Default compiler for Mac OS X, iOS, and FreeBSD
❖ Used by many companies and research groups
❖ Contributions by many people!
!3
LLVM is a compiler infrastructure!
!4
Tools Built Using LLVM
!5
Tools Built Using LLVM
Co m p i l e r s !
!5
Tools Built Using LLVM
Co m p i l e r s !
JITs!
!5
Tools Built Using LLVM
Co m p i l e r s !
JITs!
i fic a t i o n !
m a l Ve r
For
!5
Tools Built Using LLVM
Co m p i l e r s !
JITs!
Secur
ity Ha
rdenin
g Too c a t i o n !
ls! m a l Ve r i fi
For
!5
Tools Built Using LLVM
Co m p i l e r s !
JITs!
Secur
ity Ha
rdenin
g Too c a t i o n !
ls! m a l Ve r i fi
For
B u g Fin d in g To o ls!
!5
Tools Built Using LLVM
Co m p i l e r s !
JITs!
Secur
ity Ha
rdenin
g Too c a t i o n !
ls! m a l Ve r i fi
For
Profiling Tools!
B u g Fin d in g To o ls!
!5
Things to Do in the Compiler Zoo
❖ Add a security check to every load and store
❖ Create a memory access trace
❖ Check pointer arithmetic on certain types of variables
❖ Trace atomic modifications to a memory location
❖ Change order of local variables in stack frame
!6
What do you want to do with
LLVM?
!7
LLVM Source Code Structure
❖ LLVM is primarily a set of libraries
❖ We use the libraries to create LLVM-based tools
!8
Programming Background
❖ C++
❖ Other language bindings exist, but C++ is “native”
❖ Know how to use classes, pointers, and references
❖ Know how to use C++ iterators
❖ Know how to use Standard Template Library (STL)
!9
Helpful Documents
❖ LLVM Language Reference
Manual
❖ LLVM Programmer’s Manual
❖ How to Write an LLVM Pass
❖ Online LLVM Doxygen
documents
!10
Getting Involved with LLVM
❖ Research on program analysis (NSF REUs)
❖ Google Summer of Code projects
❖ Apple, Samsung, Google, Facebook build LLVM tools
❖ LLVM Developer’s Meeting
❖ One in California; one in Europe
❖ Can present talks, posters, BoFs, etc.
!11
Please fill out feedback form:
https://forms.gle/ib3Ng6osSFqNoQGD7
!12
LLVM Compiler Structure
!13
Ahead of Time (AOT) Compiler
Front End
Optimizer
Code Generator
!14
Front End Structure
Source Clang Clang
Code Parser AST
!15
Optimizer Structure
LLVM LLVM LLVM
Opt 1 Opt 2
IR IR IR
Code Machine
Gen IR
!16
Code Generator Structure
Machine Register Machine Instruction Machine
IR Allocation IR Scheduling IR
Code Native
Emitter Code
!17
LLVM Toolchain Overview
Intermediate Representation Description
!18
LLVM Toolchain Overview
Intermediate Representation Description
!18
LLVM
Intermediate Representation
!19
LLVM IR is a language into which programs
are translated for analysis and transformation
(optimization)
!20
LLVM IR Forms
❖ LLVM Assembly Language
❖ Text form saved on disk for humans to read
❖ LLVM Bitcode
❖ Binary form saved on disk for programs to read
❖ LLVM In-Memory IR
❖ Data structures used for analysis and optimization
!21
LLVM Assembly Language
define i32 @foo(i32, i32) local_unnamed_addr #0 {
%3 = tail call i32 @bar(i32 %0) #2
%4 = add nsw i32 %1, %0
%5 = sub i32 %4, %3
ret i32 %5
}
!22
Overview of LLVM IR
❖ Each assembly/bitcode file is a Module
❖ Each Module is comprised of
❖ Global variables
❖ A set of Functions which are comprised of
❖ A set of basic blocks which are comprised of
❖ A set of instructions
!23
LLVM Bitcode File
Module
Global int[20];
add add
ret ret
!24
LLVM Module with One Function
define i32 @foo(i32, i32) local_unnamed_addr #0 {
%5 = tail call i32 @bar(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i64 0, i64 0)) #2
br label %8
%7 = tail call i32 @bar(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str.1, i64 0, i64 0)) #2
br label %8
!25
LLVM Instruction Set
❖ RISC-like architecture
❖ Virtual registers in SSA form
❖ Load/store instructions to read/write memory
objects
❖ All other instructions read or write virtual registers
!26
LLVM Memory Objects
❖ Global Variables
❖ Memory allocated on the stack
❖ Memory allocated on the heap
!27
Instructions for Computation
❖ Arithmetic and binary operators
❖ Two’s complement arithmetic (add, sub, multiply, etc)
❖ Bit-shifting and bit-masking
❖ Pointer arithmetic (getelementptr or “GEP”)
❖ Comparison instructions (icmp, fcmp)
❖ Generates a boolean result
!28
Memory Access Instructions
❖ Load instruction reads memory
❖ Store instruction writes to memory
❖ Atomic compare and exchange
❖ Atomic read/modify/write
!29
Control Flow Instructions
❖ Terminator instructions
❖ Indicate which basic block to jump to next
❖ conditional branch, unconditional branch, switch
❖ Return instruction to return to caller
❖ Unwind instruction for exception handling
❖ Call instruction calls a function
❖ It can occur in the middle of a basic block
!30
Memory Allocation Instructions
❖ Stack allocation (alloca)
❖ Allocates memory on the stack
❖ Calls to heap-allocation functions (e.g., malloc())
❖ Not a special instruction; just uses a call instruction
❖ Global variable declarations
❖ Not really instructions, but allocate memory
❖ All globals are pointers to memory objects
!31
Single Static Assignment (SSA)
• Each function has infinite set of virtual registers
• Only one instruction assigns a value to a virtual register
(called the definition of the register)
• An instruction and the register it defines are
synonymous
%z = add %x, %y
32
The Almighty Phi Node!
y=5;
x=y+1; x=y+2;
z z==x+3;
x;
!33
The Almighty Phi Node!
y=5; y=5;
x3=phi(x1,x2);
z z==x+3;
x;
z=x3+3;
!33
Domination
❖ The definition of a virtual A
register must dominate all of
its uses
❖ Except uses by phi-nodes
B C
❖ A dominates B, C, and D
!34
Writing an LLVM Pass
!35
LLVM Passes: Separation of Concerns
❖ Break optimizer into passes
❖ Each pass performs one analysis or one transformation
!36
LLVM Passes
Optimizer
!37
Two Types of Passes
❖ Passes that analyze code
❖ Does not modify the program
❖ Provides information “out of band” to other passes
❖ Passes that transform code
❖ Make modifications to the code
!38
LLVM Passes
Dominator Tree Data
!39
LLVM IR Pass Types
❖ ModulePass
❖ FunctionPass
❖ BasicBlockPass
❖ I recommend ignoring “funny” passes
❖ LoopPass
❖ RegionPass
!40
Rules for LLVM Passes
❖ Only modify values and instructions at scope of pass
❖ ModulePass can modify anything
❖ FunctionPass should not modify anything outside of
the function
❖ BasicBlockPass should not modify anything outside
of the basic block
!41
Important Pass Methods: getAnalysisUsage()
!42
runOnModule()
❖ Entry point for ModulePass
❖ Passes a reference to the Module
❖ Can locate functions, basic blocks, globals from Module
❖ Return true if the pass modifies the program
❖ An analysis pass always returns false.
❖ A transform pass can return either true or false.
!43
runOnFunction()
❖ Called for each function in the Module
❖ Passed reference to Function
❖ Return false for no modifications; true for modifications
!44
runOnBasicBlock()
❖ You get the idea…
!45
MyPass.h Example
class MyPass : public ModulePass {
private:
unsigned int analyzeThis (Instruction *I);
public:
static char ID;
MyPass() : ModulePass(ID) {}
const char *getPassName() const { return “My LLVM Pass"; }
virtual bool runOnModule (Module & M);
virtual void getAnalysisUsage(AnalysisUsage &AU) const {
// We require Dominator information
AU.addRequired<DominatorTree>();
}
unsigned int getAnalysisResultFor (Instruction *I);
};
!46
MyPass.cpp Example
static RegisterPass<MyPass> P (“mypass”, “My First LLVM Analysis”);
bool
MyPass::runOnModule (Module & M) {
//
// Iterate over all instructions within a Module
//
for (Module::iterator fi = M.begin(); fi != M.end(); ++fi) {
for (Function::iterator bi = fi->begin(); bi != fi->end(); ++bi) {
for (BasicBlock::iterator it = bi->begin(); it != bi->end; ++it) {
Instruction * I = *it;
}
}
}
}
!47
In-Memory LLVM IR
!48
LLVM Classes
❖ There is a class for each type of IR object
❖ Module class
❖ Function class
❖ BasicBlock class
❖ Instruction class
❖ Classes provide iterators for objects within them
!49
LLVM In-Memory IR
Module
Global int[20];
Global char[16];
Function
Function BasicBlock add sub br
Function BasicBlock add mult br
BasicBlock
add ret
!50
Class Iterators
❖ Each class provides iterators for items it contains
❖ Module::iterator iterates over functions
❖ Function::iterator iterates over basic blocks
❖ BasicBlock::iterator iterates over instructions
!51
Iterator Example
//
// Iterate over all instructions within a BasicBlock
//
BasicBlock * BB = …;
BasicBlock::iterator it;
BasicBlock::iterator ie;
!52
MyPass.cpp Example (Reprise)
static RegisterPass<MyPass> P (“mypass”, “My First LLVM Analysis”);
bool
MyPass::runOnModule (Module & M) {
//
// Iterate over all instructions within a Module
//
for (Module::iterator fi = M.begin(); fi != M.end(); ++fi) {
for (Function::iterator bi = fi->begin(); bi != fi->end(); ++bi) {
for (BasicBlock::iterator it = bi->begin(); it != bi->end; ++it) {
Instruction * I = *it;
}
}
}
}
!53
LLVM Class Hierarchy
❖ Anything that is an SSA value is a subclass of Value
❖ All Instruction classes are a subclass of Instruction
❖ Similar instructions share a common superclass
!54
Simplified LLVM Class Hierarchy
Value
Instruction
TerminatorInst
!55
Locating Branch Instructions
//
// Iterate over all instructions within a BasicBlock
//
BasicBlock::iterator it;
BasicBlock::iterator ie;
!56
Casting to Subclass in LLVM
!57
Locating Branch Instructions
//
// Iterate over all instructions within a BasicBlock
//
BasicBlock::iterator it;
BasicBlock::iterator ie;
if (BranchInst * BI = dyn_cast<BranchInst>(I)) {
!58
LLVM Instruction Classes
❖ BinaryOperator - add, sub, mult, shift, and, or, etc.
❖ GetElementPointerInst
❖ LoadInst, Storeinst
❖ BranchInst, SwitchInst, RetInst
❖ CallInst
❖ CastInst
!59
LLVM Class Methods
❖ Each class has methods to get information on value
❖ BranchInst - Iterator over successor basic blocks
❖ StoreInst - Get pointer operands of store instruction
❖ GetElementPtrInst - Get indices used as operands
❖ Instruction - Get containing basic block
❖ Method might belong to a superclass
!60
Beyond the Tutorial
!61
Data Flow Analysis
❖ The Dragon Book, Fourth Edition
❖ Papers on SSA-based algorithms
❖ Kam-Ullman paper on iterative data-flow analysis
!62
Getting Involved with LLVM
❖ Research on program analysis (NSF REUs)
❖ Google Summer of Code projects
❖ Apple, Samsung, Google, Facebook build LLVM tools
❖ LLVM Developer’s Meeting
❖ One in California; one in Europe
❖ Can present talks, posters, BoFs, etc.
!63