llvm-demo
to LLVM
Nick Sumner
[email protected]
What is LLVM?
● A compiler? (clang)
● A set of formats, libraries, and tools.
– A simple, typed IR (bitcode)
– Program analysis / optimization libraries
– Machine code generation libraries
– Tools that compose the libraries to perform tasks
● Easy to add / remove / change functionality
How will you be using it?
● Compiling programs to bitcode:
clang -g -c -emit-llvm <sourcefile> -o <bitcode>.bc
● Analyzing the bitcode:
opt -load <plugin>.so --<plugin> -analyze <bitcode>.bc
● Writing your own tools:
./callcounter -static test.bc
● Reporting properties of the program:
Function Counts
===============
b : 2
a : 1
printf : 3
What is LLVM Bitcode?
● A (relatively) simple
intermediate representation (IR)
– It captures the
program dependence graph
Code:
#include<stdio.h>

void
foo(unsigned e) {
  for (unsigned i = 0; i < e; ++i) {
    printf("Hello\n");
  }
}

int
main(int argc, char **argv) {
  foo(argc);
  return 0;
}

clang -c -S -emit-llvm -O1 -g0

IR:
@str = private constant [6 x i8] c"Hello\00"

define void @foo(i32) {
  %2 = icmp eq i32 %0, 0
  br i1 %2, label %3, label %4

; <label>:3:                          ; preds = %4, %1
  ret void

; <label>:4:                          ; preds = %1, %4
  %5 = phi i32 [ %7, %4 ], [ 0, %1 ]
  %6 = tail call i32 @puts(i8* getelementptr
         ([6 x i8], [6 x i8]* @str, i64 0, i64 0))
  %7 = add nuw i32 %5, 1
  %8 = icmp eq i32 %7, %0
  br i1 %8, label %3, label %4
}

define i32 @main(i32, i8** nocapture readnone) {
  tail call void @foo(i32 %0)
  ret i32 0
}

Highlighted in the IR:
– Functions
– Basic Blocks (labels & predecessors, branches & successors)
– Instructions
Inspecting Bitcode
● LLVM libraries help examine the bitcode
– Easy to examine and/or manipulate
– Many helpers (e.g. CallBase, outs(), dyn_cast)
Module& module = ...;
for (Function& fun : module) {
  for (BasicBlock& bb : fun) {      // BasicBlocks in a Function
    for (Instruction& i : bb) {     // Instructions in a BasicBlock
      // dyn_cast() efficiently checks the runtime types
      // of LLVM IR components.
      CallBase* cb = dyn_cast<CallBase>(&i);
      if (!cb) {
        continue;
      }
      ...
    }
  }
}
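To make the dyn_cast idea concrete without pulling in LLVM, here is a minimal stand-alone sketch of kind-tag based casting. Every name in it (ToyInstruction, ToyAdd, ToyCall, toyDynCast, countCalls) is hypothetical and invented for illustration. It only approximates the idea behind LLVM's helper: each class answers "is this object one of mine?" from a stored tag, and the cast returns nullptr on a mismatch.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for IR instruction classes (hypothetical, not LLVM types).
struct ToyInstruction {
  enum class Kind { Add, Call };
  explicit ToyInstruction(Kind k) : kind(k) {}
  virtual ~ToyInstruction() = default;
  Kind getKind() const { return kind; }
private:
  Kind kind;
};

struct ToyAdd : ToyInstruction {
  ToyAdd() : ToyInstruction(Kind::Add) {}
  // Each subclass reports whether a given instruction carries its tag.
  static bool classof(const ToyInstruction* i) {
    return i->getKind() == Kind::Add;
  }
};

struct ToyCall : ToyInstruction {
  explicit ToyCall(std::string name)
    : ToyInstruction(Kind::Call), callee(std::move(name)) {}
  static bool classof(const ToyInstruction* i) {
    return i->getKind() == Kind::Call;
  }
  std::string callee;
};

// Returns a To* when the kind tag matches, nullptr otherwise.
template <typename To>
To* toyDynCast(ToyInstruction* i) {
  return To::classof(i) ? static_cast<To*>(i) : nullptr;
}

// Same shape as the loop on the slide: skip everything that is not a call.
unsigned countCalls(const std::vector<ToyInstruction*>& block) {
  unsigned calls = 0;
  for (ToyInstruction* i : block) {
    ToyCall* call = toyDynCast<ToyCall>(i);
    if (!call) {
      continue;
    }
    ++calls;
  }
  return calls;
}
```

The check is a single tag comparison, which is one reason this style of cast can be cheap compared with a general-purpose runtime type lookup.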
Static Single Assignment (SSA)
● Program dependence graphs help answer questions like:
– Where was a variable defined?
– Where is a particular value used?
● Compilers today help provide this using SSA form
– Each variable has a single definition,
  so resolving dependencies is easier
void foo() {
  unsigned i = 0;
  while (i < 10) {
    i = i + 1;      // What is the single definition of i at this point?
  }
}
● Phi instructions select which incoming value to use among options
– Phi nodes must occur at the beginning of a basic block
define void @foo() {
  br label %1

; <label>:1                           ; preds = %1, %0
  %i.phi = phi i32 [ 0, %0 ], [ %2, %1 ]
  %2 = add i32 %i.phi, 1
  %exitcond = icmp eq i32 %2, 10
  br i1 %exitcond, label %3, label %1

; <label>:3                           ; preds = %1
  ret void
}
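One way to read a phi instruction is as a selection keyed on the block that control arrived from. The sketch below hand-simulates the loop from the IR above in plain C++. The names Block, selectPhi and runLoop are hypothetical, and the loop bound is turned into a parameter purely for illustration.

```cpp
// Hypothetical names for the two predecessors of the loop block (%0 and %1).
enum class Block { Entry, Loop };

// Models: %i.phi = phi i32 [ 0, %0 ], [ %2, %1 ]
unsigned selectPhi(Block predecessor, unsigned fromLoop) {
  return predecessor == Block::Entry ? 0u : fromLoop;
}

// Executes the loop block until %exitcond holds and returns how many
// times the block ran. Like the IR, the body runs before the exit test.
unsigned runLoop(unsigned bound) {
  Block predecessor = Block::Entry;
  unsigned next = 0;          // %2, only meaningful after the first pass
  unsigned iterations = 0;
  while (true) {
    unsigned iPhi = selectPhi(predecessor, next);  // %i.phi
    next = iPhi + 1;                               // %2 = add i32 %i.phi, 1
    ++iterations;
    if (next == bound) {                           // %exitcond
      return iterations;
    }
    predecessor = Block::Loop;                     // back edge to %1
  }
}
```

Each SSA name (%i.phi, %2) is assigned in exactly one place. The phi is what reconciles "i starts at 0" with "i comes from the previous iteration".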
● You can loop over the instructions that use a particular value
Instruction* inst = ...;
for (User* user : inst->users()) {
  if (auto* i = dyn_cast<Instruction>(user)) {
    // inst is used by Instruction i
  }
}
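The users() loop works because every value keeps track of who reads it. A toy version of that bookkeeping, with hypothetical names (ToyValue, addOperand, userNames) and no claim to match LLVM's internal layout, might look like this:

```cpp
#include <string>
#include <vector>

// Toy value that records both directions of a dependence edge.
struct ToyValue {
  std::string name;
  std::vector<ToyValue*> operands;  // values this one reads
  std::vector<ToyValue*> users;     // values that read this one
};

// Register `operand` as an input of `user`, keeping both lists in sync.
void addOperand(ToyValue& user, ToyValue& operand) {
  user.operands.push_back(&operand);
  operand.users.push_back(&user);
}

// Analogue of looping over inst->users(): collect the names of all users.
std::vector<std::string> userNames(const ToyValue& v) {
  std::vector<std::string> names;
  for (const ToyValue* user : v.users) {
    names.push_back(user->name);
  }
  return names;
}
```

Wiring up the SSA loop from the previous slide (%2 reads %i.phi, %i.phi and %exitcond read %2) and asking for the users of %2 yields %i.phi and %exitcond, in the order the edges were added.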
Dealing with Types
● LLVM IR is strongly typed
– Every value has a type → getType()
● A value must be explicitly cast to a new type
define i64 @trunc(i16 zeroext %a) {
%1 = zext i16 %a to i64
ret i64 %1
}
● Also types for pointers, arrays, structs, etc.
– Strong typing means they take a bit more work
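For readers more used to C++ than IR, the explicit widening above corresponds to a cast from an unsigned 16-bit value to a 64-bit one. The sketch below shows that analogue next to the signed counterpart (sext in LLVM IR, which the slide does not show). The function names zeroExtend and signExtend are hypothetical.

```cpp
#include <cstdint>

// Analogue of: %1 = zext i16 %a to i64
// The upper 48 bits are filled with zeros.
std::uint64_t zeroExtend(std::uint16_t a) {
  return static_cast<std::uint64_t>(a);
}

// Analogue of: %1 = sext i16 %a to i64
// The upper 48 bits are filled with copies of the sign bit.
std::int64_t signExtend(std::int16_t a) {
  return static_cast<std::int64_t>(a);
}
```

In C++ the choice between the two is implied by the signedness of the source type. In the IR, integer types carry no signedness, so the instruction itself has to say which extension is meant.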
Dealing with Types: GEP
● We sometimes need to extract elements/fields from arrays/structs
– Pointer arithmetic
– Done using GetElementPtr (GEP)
struct rec {
  int x;
  int y;
};
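A GEP computes an address and performs no memory access. As a rough sketch of the arithmetic for the struct above, the helper offsetOfY (a hypothetical name) computes the byte offset that indexing to element `element` and then to field y would produce, and addressOfY computes the same address with ordinary C++ member access.

```cpp
#include <cstddef>

struct rec {
  int x;
  int y;
};

// Byte offset of field y in the element-th record of an array of rec:
// the first index steps over whole records, the second selects the field.
std::size_t offsetOfY(std::size_t element) {
  return element * sizeof(rec) + offsetof(rec, y);
}

// The same address, written the way C++ source would express it.
int* addressOfY(rec* base, std::size_t element) {
  return &base[element].y;
}
```

Neither function reads or writes the record. Loading the field is a separate step, just as a load is a separate instruction from the GEP in the IR.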
BasicBlock.h InstrTypes.h
DerivedTypes.h IRBuilder.h
Function.h Support/InstVisitor.h
Instructions.h Type.h
Creating a Static Analysis
Making a new analysis
● Analyses are organized into individual passes
– ModulePass
– FunctionPass
– LoopPass
– …
Derive from the appropriate base class to make a Pass
3 Steps
1) Declare your pass
2) Register your pass
3) Define your pass
Let's count the number of static direct calls to each function.
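The "register" step relies on a common C++ idiom: a global object whose constructor records the pass under a name before main() runs. The sketch below shows the idiom only. ToyPass, registry, ToyRegisterPass and ToyCallCounter are hypothetical names, and LLVM's actual registry is more involved.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class standing in for a pass interface.
struct ToyPass {
  virtual ~ToyPass() = default;
  virtual std::string name() const = 0;
};

using ToyFactory = std::function<std::unique_ptr<ToyPass>()>;

// Function-local static avoids initialization-order problems between globals.
std::map<std::string, ToyFactory>& registry() {
  static std::map<std::string, ToyFactory> passes;
  return passes;
}

// Constructing one of these registers PassT under the given flag.
template <typename PassT>
struct ToyRegisterPass {
  explicit ToyRegisterPass(const std::string& flag) {
    registry()[flag] = [] { return std::unique_ptr<ToyPass>(new PassT()); };
  }
};

// 1) Declare and 3) define a pass ...
struct ToyCallCounter : ToyPass {
  std::string name() const override { return "ToyCallCounter"; }
};

// 2) ... and register it with a global object.
static ToyRegisterPass<ToyCallCounter> toyReg("callcounter");
```

A driver can then look up "callcounter" in the registry and construct the pass without knowing its concrete type, which is the property that lets a tool load passes from plugins.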
Making a ModulePass (1)
● Declare your ModulePass
struct StaticCallCounter : public llvm::ModulePass {
  static char ID;
  StaticCallCounter()
    : ModulePass(ID)
      { }
  ...
};
Making a ModulePass (2)
● Register your ModulePass
RegisterPass<StaticCallCounter> SCCReg("callcounter",
    "Print the static count of direct calls");
Making a ModulePass (3)
● Define your ModulePass
– Need to override runOnModule() and print()
bool
StaticCallCounter::runOnModule(Module& m) {
  for (auto& f : m)
    for (auto& bb : f)
      for (auto& i : bb)
        if (CallBase *cb = dyn_cast<CallBase>(&i)) {
          handleInstruction(cb);
        }
  return false; // False because we didn't change the Module
}
Making a ModulePass (3)
● Analysis continued...
void
StaticCallCounter::handleInstruction(CallBase* cb) {
  // Check whether the called function is directly invoked
  auto called = cb->getCalledOperand()->stripPointerCasts();
  auto fun = dyn_cast<Function>(called);
  if (!fun) { return; }
  // Update the count for the called function
  ++counts[fun];
}
void
StaticCallCounter::print(raw_ostream& out, const Module* m) const {
  out << "Function Counts\n"
      << "===============\n";
  for (auto& kvPair : counts) {
    auto* function = kvPair.first;
    uint64_t count = kvPair.second;
    out << function->getName() << " : " << count << "\n";
  }
}
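The whole static analysis can be tried out on a toy program representation without LLVM. In the sketch below, ToyCallSite, ToyFunction, ToyModule, countDirectCalls and report are hypothetical names, and an empty callee string is an assumed convention that stands in for an indirect call. The structure mirrors the pass: walk every call, skip indirect ones, bump a per-callee counter, then print.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy IR: a call site names its callee, or is empty if indirect.
struct ToyCallSite { std::string callee; };
struct ToyFunction {
  std::string name;
  std::vector<ToyCallSite> calls;
};
using ToyModule = std::vector<ToyFunction>;

// Counts the static direct calls to each callee in the module.
std::map<std::string, std::uint64_t> countDirectCalls(const ToyModule& m) {
  std::map<std::string, std::uint64_t> counts;
  for (const ToyFunction& f : m) {
    for (const ToyCallSite& c : f.calls) {
      if (c.callee.empty()) {
        continue;  // indirect call: no statically known target
      }
      ++counts[c.callee];
    }
  }
  return counts;
}

// Formats the counts in the same layout as the pass's print().
std::string report(const std::map<std::string, std::uint64_t>& counts) {
  std::ostringstream out;
  out << "Function Counts\n"
      << "===============\n";
  for (const auto& kv : counts) {
    out << kv.first << " : " << kv.second << "\n";
  }
  return out.str();
}
```

Because this sketch uses std::map keyed by name, the report comes out in alphabetical order. The ordering in the real tool depends on whatever container it uses for counts.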
Creating a Dynamic Analysis
Making a Dynamic Analysis
● We have counted the static direct calls to each function.
● How might we count all dynamic calls to each function?
● Need to modify the original program!
● Steps:
1) Modify the program using passes
2) Compile the modified version
3) Run the new program
Modifying the Original Program
● Goal: Count the dynamic calls to each function in an execution.
– So how do we want to modify the program?
void foo() {
  bar();
}
void
DynamicCallCounter::handleInstruction(CallBase& cb, Value* counter) {
  // Check whether the called function is directly invoked
  auto calledValue = cb.getCalledOperand()->stripPointerCasts();
  auto calledFunction = dyn_cast<Function>(calledValue);
  if (!calledFunction) {
    return;
  }
  ...
}
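One plausible shape for the modified program, written as source code rather than IR, is sketched below. It assumes the instrumentation inserts a call to a runtime-library hook just before each direct call. The hook name countCalled and the table dynamicCounts are hypothetical, and a real instrumentation pass would emit the equivalent IR instead of editing source.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical runtime library: a table of counts plus a hook that
// the instrumented program calls at run time.
static std::map<std::string, std::uint64_t> dynamicCounts;

void countCalled(const std::string& callee) {
  ++dynamicCounts[callee];
}

void bar() {}

// What foo() from the slide might look like after instrumentation.
void foo() {
  countCalled("bar");  // inserted just before the original call
  bar();               // original call, unchanged
}
```

Unlike the static count, the result now depends on the execution: calling foo() three times records three calls to bar, even though bar is called from only one place in the source.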
[Figure: Program/Module, Analysis Tool, Instrumentation Pass, Runtime Library, Compilation]