SVFIR and Graph Representation of Code
Yulei Sui
University of Technology Sydney, Australia
Software Verification https://github.com/SVF-tools/Teaching-Software-Verification
SVF: Static Value-Flow Analysis Framework for Source Code
A scalable, precise and on-demand interprocedural program dependence analysis framework for
both sequential and multithreaded programs.
• The SVF project
  • Publicly available since early 2015 and actively maintained: http://svf-tools.github.io/SVF
  • Implemented on top of the LLVM compiler (latest version: 12.0.0), with over 100 KLOC of C/C++ code,
  700+ stars, 40+ contributors and over 1K commits on GitHub.
  • Invited to give a plenary talk at EuroLLVM 2016; awarded an ICSE 2018 Distinguished Paper, a
  SAS 2019 Best Paper and an OOPSLA 2020 Distinguished Paper.
• Value-Flow Analysis: resolves both control and data dependence.
  • Does the information generated at program point A flow to another program point B along some
  execution paths?
  • Can function F be called either directly or indirectly from some other function F′?
  • Is there an unsafe memory access that may trigger a bug or security risk?
• Key features of SVF
  • Sparse: compute and maintain data-flow facts only where necessary.
  • Selective: support mixed analyses for precision and efficiency trade-offs.
  • On-demand: reason about program parts based on user queries.
SVF: Design Principle
[Figure: SVF architecture. Programs (e.g., C/C++) enter the SVF-Frontend (Memory Model), which generates graphs (PAG, ICFG, Call Graph, VFG, Constraint Graph). Analysis developers write analysis instances by selecting graph(s) and solver(s); the solvers (DFS, IncDFS, LCD, HCD) are instantiated from graph solver templates.]
• Serve as an open-source foundation for building practical static source code analyses
  • Bridge the gap between research and engineering
  • Minimize the effort of implementing sophisticated analyses (extensible, reusable, and robust via layers of abstraction)
  • Support developing different analysis variants (flow-, context-, heap-, field-sensitive analysis) in a sparse and on-demand manner.
• Client applications:
  • Static bug detection (e.g., memory leaks, null dereferences, use-after-frees and data races)
  • Accelerating dynamic analysis (e.g., Google’s Sanitizers and AFL fuzzing)
SVF IR and Why?
• SVFIR is a much-simplified representation of LLVM IR (or of other SSA-based
programming languages) for static analysis purposes.
• It is lightweight: it has fewer types of program variables and statements.
• SVFVar: program variables
    SVF::SVFVar
      SVF::ValVar
      SVF::ObjVar
• SVFStmt: program statements
    SVF::SVFStmt
      SVF::AssignStmt: AddrStmt, CopyStmt, StoreStmt, LoadStmt, GepStmt, CallPE, RetPE
      SVF::MultiOpndStmt: PhiStmt, SelectStmt, CmpStmt, BinaryOPStmt
      SVF::UnaryOPStmt
      SVF::BranchStmt
SVF Program Variables (SVFVar)
• An SVFVar represents either a top-level variable (P) or a memory object
variable (O)
• Each SVFVar has a unique identifier (ID)
• SVFVar IDs 0-3 are reserved
Program Variables    Domain                     Meaning
SVFVar               V = P ∪ O                  Program variables
ValVar               P                          Top-level variables (scalars and pointers)
ObjVar               O = S ∪ G ∪ H ∪ C          Memory objects (stack, global¹, heap and constant data)
FIObjVar             o ∈ (S ∪ G ∪ H)            A single (base) memory object
GepObjVar            o_i ∈ (S ∪ G ∪ H) × P      i-th subfield/element of an (aggregate) object
ConstantData         C                          Constant data (e.g., numbers and strings)
Program Statement    l ∈ L                      Statement labels

¹ Function objects are considered global objects.
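To make the variable domains concrete, here is a minimal C++ sketch of a variable table. It is not SVF's real class hierarchy: the names VarKind, Var and VarTable are hypothetical, and the only properties modelled are the ones listed above (every variable has a unique ID, a variable is either top-level or a memory object, and a field object refers back to its base object).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of the SVFVar domains (not SVF's real classes).
enum class VarKind { ValVar, FIObjVar, GepObjVar };

struct Var {
    std::size_t id;     // unique identifier, equal to the index in the table
    VarKind kind;
    std::string name;   // human-readable label
    std::size_t base;   // GepObjVar: ID of the base object; otherwise == id
    std::size_t field;  // GepObjVar: field/element index; otherwise 0
};

class VarTable {
public:
    // Adds a top-level variable (domain P).
    std::size_t addVal(const std::string& name) {
        return add(VarKind::ValVar, name, vars_.size(), 0);
    }
    // Adds a base memory object (stack, global or heap).
    std::size_t addObj(const std::string& name) {
        return add(VarKind::FIObjVar, name, vars_.size(), 0);
    }
    // Adds the field-th subfield/element of an existing base object.
    std::size_t addField(std::size_t baseId, std::size_t field) {
        assert(baseId < vars_.size() && vars_[baseId].kind == VarKind::FIObjVar);
        return add(VarKind::GepObjVar,
                   vars_[baseId].name + "." + std::to_string(field), baseId, field);
    }
    const Var& get(std::size_t id) const { return vars_.at(id); }
    // True for memory objects (domain O), false for top-level variables (P).
    bool isObj(std::size_t id) const { return get(id).kind != VarKind::ValVar; }

private:
    std::size_t add(VarKind kind, const std::string& name,
                    std::size_t base, std::size_t field) {
        std::size_t id = vars_.size();
        vars_.push_back(Var{id, kind, name, base, field});
        return id;
    }
    std::vector<Var> vars_;
};
```

For instance, adding a pointer p, an object a and field 1 of a yields three distinct IDs, and only the last two are reported as memory objects by isObj.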
SVF Program Statements (SVFStmt)
An SVFStmt is one of the following program statements representing the relations
between SVFVars.
SVFStmt                 LLVM-like form                        C-like form                        Operand types
AddrStmt                %ptr = alloca or constantData         p = alloc or p = c                 P × (O ∪ C)
CopyStmt                %p = bitcast %q                       p = q                              P × P
LoadStmt                %p = load %q                          p = *q                             P × P
StoreStmt               store %q, %p                          *p = q                             P × P
GepStmt                 %p = getelementptr %q, %i             p = &(q → i) or p = &q[i]          P × P × P
PhiStmt                 %p = phi [ l1, %q1 ], [ l2, %q2 ]     p = phi(l1 : q1, l2 : q2)          P × (L → P²)
BranchStmt              br i1 %p, label %l1, label %l2        if (p) l1 else l2                  P × L²
UnaryOPStmt             %p = ¬ %q                             p = ¬q                             P × P
BinaryOPStmt/CmpStmt    %r = ⊗ %p, %q                         r = p ⊗ q                          P × P × P
(call site)             %r = call f(..., %qi, ...)            r = f(..., qi, ...)
(callee)                f(..., %pi, ...) { ... ret %z }       f(..., pi, ...) { ... return z }
CallPE                  %pi = %qi (1 ≤ i ≤ n)                 pi = qi (1 ≤ i ≤ n)                (P × P)ⁿ
RetPE                   %r = %z                               r = z                              P × P

⊗ ∈ {+, -, *, /, %, <<, >>, <, >, &, &&, <=, >=, ≡, ∼, |, ∧}
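The C-like forms of four of these statements can be read as a tiny instruction set. The following sketch interprets AddrStmt, CopyStmt, LoadStmt and StoreStmt over a concrete state. It is an illustration only: Kind, Stmt, State and run are hypothetical names, and values are kept as strings that hold either an object name (an address) or a constant.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical mini-IR covering four of the statement kinds in the table.
enum class Kind { Addr, Copy, Load, Store };

struct Stmt {
    Kind kind;
    std::string lhs;  // p in the C-like form
    std::string rhs;  // an object name or constant (Addr), otherwise q
};

struct State {
    std::map<std::string, std::string> top;  // top-level variable -> value
    std::map<std::string, std::string> mem;  // memory object -> stored value
};

// Executes the statements in order, following the C-like forms of the table.
void run(const std::vector<Stmt>& prog, State& s) {
    for (const Stmt& st : prog) {
        switch (st.kind) {
        case Kind::Addr:   // p = alloc or p = c
            s.top[st.lhs] = st.rhs;
            break;
        case Kind::Copy:   // p = q
            s.top[st.lhs] = s.top.at(st.rhs);
            break;
        case Kind::Load:   // p = *q
            s.top[st.lhs] = s.mem.at(s.top.at(st.rhs));
            break;
        case Kind::Store:  // *p = q
            s.mem[s.top.at(st.lhs)] = s.top.at(st.rhs);
            break;
        }
    }
}
```

Running the sequence p = &a; r = 3; *p = r; x = *p leaves the constant 3 in object a and in the top-level variable x; the object a is never touched except through the pointer p.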
SVF Program Statements (SVFStmt)
• SVFStmt follows LLVM’s SSA form for top-level variables
  • A top-level variable (P) can be defined only once
• Memory objects (i.e., S ∪ G ∪ H, excluding constant data) can only be
modified/read through top-level pointers at StoreStmt and LoadStmt.
  • For example, in p = &a; *p = r; the value of a can only be modified/read by
  dereferencing p.
• A ConstantData (C) object must first be assigned to a temporary top-level
variable and can only be read through that top-level variable in any SVFStmt.
  • For example, *p = 3; ⇒ t = 3; *p = t;
• CallPE represents parameter passing from an actual parameter at a
callsite to a formal parameter of the callee function.
• RetPE represents passing the callee’s return value to the variable that receives it at the
callsite.
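The rule for constant data can be sketched as a small rewriting step. The function below is a hypothetical illustration, not SVF code: it takes the C-like store *ptr = value and, when the value is a constant, first introduces a fresh temporary. The helper names isConstant and lowerStore and the temporary names t0, t1, ... are assumptions made for this example, and only decimal literals are treated as constants.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Stand-in for ConstantData: a non-empty string of decimal digits.
bool isConstant(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s)
        if (!std::isdigit(c)) return false;
    return true;
}

// Lowers the C-like store `*ptr = value` into C-like SVFStmt forms.
// `nextTemp` numbers the fresh temporaries t0, t1, ... (hypothetical naming).
std::vector<std::string> lowerStore(const std::string& ptr,
                                    const std::string& value,
                                    unsigned& nextTemp) {
    std::vector<std::string> out;
    std::string operand = value;
    if (isConstant(value)) {
        // The constant is first assigned to a temporary top-level variable.
        operand = "t" + std::to_string(nextTemp++);
        out.push_back(operand + " = " + value);
    }
    // The store itself only ever reads a top-level variable.
    out.push_back("*" + ptr + " = " + operand);
    return out;
}
```

Lowering *p = 3 produces the two statements t0 = 3 and *p = t0, while *p = r is returned unchanged because r is already a top-level variable.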
Graph Representation of Code
• What is a graph representation of code (code graph)?
  • LLVM IR or SVF IR mapped onto a graph.
  • It represents a program’s control flow (i.e., execution order) and/or data flow
  (variable definition and use relations) using the nodes and edges of a graph.
• Why a graph representation?
• Abstracting code from low-level complicated instructions
• Applying general graph algorithms
• Easy to maintain and extend
Call Graph
• Program calling relations between methods
• Whether a method A can call method B directly or transitively.
    define i32 @main() #0 {
    entry:
      %a1 = alloca i8, align 1
      %b1 = alloca i8, align 1
      %a = alloca i8*, align 8
      %b = alloca i8*, align 8
      store i8* %a1, i8** %a, align 8
      store i8* %b1, i8** %b, align 8
      call void @swap(i8** %a, i8** %b)
      ret i32 0
    }

    define void @swap(i8** %p, i8** %q) #0 {
    entry:
      %0 = load i8*, i8** %p, align 8
      %1 = load i8*, i8** %q, align 8
      store i8* %1, i8** %p, align 8
      store i8* %0, i8** %q, align 8
      ret void
    }

[Figure: call graph with two nodes and one edge, main → swap.]
- Each node represents a program method.
- Each edge represents a calling relation between two program methods (caller → callee); here, main → swap.
https://github.com/svf-tools/SVF/wiki/Analyze-a-Simple-C-Program#3-call-graph
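The question "can method A call method B directly or transitively?" is a reachability query on the call graph. Below is a minimal sketch under stated assumptions: the CallGraph alias and canCall are hypothetical names rather than SVF's API, and the graph is stored as a map from a caller to the set of its direct callees.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller name -> names of its direct callees.
using CallGraph = std::map<std::string, std::set<std::string>>;

// Returns true if `caller` can reach `callee` through one or more call edges.
bool canCall(const CallGraph& cg, const std::string& caller,
             const std::string& callee) {
    std::set<std::string> visited;         // functions already scheduled
    std::vector<std::string> stack{caller};
    while (!stack.empty()) {
        std::string f = stack.back();
        stack.pop_back();
        auto it = cg.find(f);
        if (it == cg.end()) continue;      // no outgoing call edges recorded
        for (const std::string& g : it->second) {
            if (g == callee) return true;  // direct or transitive call found
            if (visited.insert(g).second) stack.push_back(g);
        }
    }
    return false;
}
```

With the single edge main → swap from the example, canCall(cg, "main", "swap") holds and canCall(cg, "swap", "main") does not. The visited set keeps the search finite when the program is recursive.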
Control Flow Graph
Program execution order between two LLVM instructions (SVFStmts).
• Intra-procedural control-flow graph: control-flow within a program method.
• Inter-procedural control-flow graph: control-flow across program methods.
Intra-procedural Control Flow Graph
Program execution order between instructions
- Each node represents an instruction or a statement
- Each edge represents a control-flow dependence between two nodes
Node kinds shown in the figure: IntraICFGNode, FunEntryICFGNode, FunExitICFGNode, CallICFGNode, RetICFGNode

[Figure: control-flow graphs of @main and @swap; nodes listed in execution order]
@main:
  entry
  1   %a1 = alloca i8
  2   %b1 = alloca i8
  3   %a = alloca i8*
  4   %b = alloca i8*
  5   store i8* %a1, i8** %a
  6   store i8* %b1, i8** %b
  7   call void @swap(i8** %a, i8** %b)
      return from void @swap(i8** %a, i8** %b)
  8   ret i32 0
  exit
@swap:
  entry
  9   %0 = load i8*, i8** %p
  10  %1 = load i8*, i8** %q
  11  store i8* %1, i8** %p
  12  store i8* %0, i8** %q
  13  ret void
  exit
https://github.com/svf-tools/SVF/wiki/Analyze-a-Simple-C-Program#4-interprocedural-control-flow-graph
Inter-procedural Control Flow Graph (ICFG)
Program execution order between instructions
- Each node represents an instruction or a statement
- Each edge represents a control-flow dependence between two nodes
Node kinds shown in the figure: IntraICFGNode, FunEntryICFGNode, FunExitICFGNode, CallICFGNode, RetICFGNode

[Figure: the control-flow graphs of @main and @swap joined into one graph. Within each function the nodes are those of the intra-procedural graph (nodes 1-8 in @main, nodes 9-13 in @swap). In addition, a call edge links CallICFGNode 7 (call void @swap(i8** %a, i8** %b)) to the entry of @swap, and a return edge links the exit of @swap to the RetICFGNode of that call, which precedes node 8 (ret i32 0).]
https://github.com/svf-tools/SVF/wiki/Analyze-a-Simple-C-Program#4-interprocedural-control-flow-graph
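The difference between the two views is the pair of call and return edges. The sketch below builds a small ICFG for the main/swap example and walks it from the entry of main. It is a simplified model, not SVF's ICFG classes: NodeKind, EdgeKind, ICFG, buildSwapExample and executionOrder are hypothetical names, the allocas and stores of main are collapsed into one node, and the walk assumes straight-line code in which every node has at most one successor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class NodeKind { FunEntry, Intra, Call, Ret, FunExit };
enum class EdgeKind { Intra, Call, Ret };

struct Edge { std::size_t dst; EdgeKind kind; };
struct Node { NodeKind kind; std::string label; std::vector<Edge> out; };

// Hypothetical, simplified ICFG (not SVF's real classes).
struct ICFG {
    std::vector<Node> nodes;

    std::size_t add(NodeKind kind, const std::string& label) {
        nodes.push_back(Node{kind, label, {}});
        return nodes.size() - 1;
    }
    void connect(std::size_t from, std::size_t to, EdgeKind kind) {
        assert(from < nodes.size() && to < nodes.size());
        nodes[from].out.push_back(Edge{to, kind});
    }
    // Adds a node and an intra-procedural edge from `prev` to it.
    std::size_t append(std::size_t prev, NodeKind kind, const std::string& label) {
        std::size_t id = add(kind, label);
        connect(prev, id, EdgeKind::Intra);
        return id;
    }
};

// Builds the main/swap example and reports the entry node of main.
ICFG buildSwapExample(std::size_t& mainEntry) {
    ICFG g;
    std::size_t swapEntry = g.add(NodeKind::FunEntry, "swap entry");
    std::size_t n = g.append(swapEntry, NodeKind::Intra, "%0 = load %p");
    n = g.append(n, NodeKind::Intra, "%1 = load %q");
    n = g.append(n, NodeKind::Intra, "store %1, %p");
    n = g.append(n, NodeKind::Intra, "store %0, %q");
    n = g.append(n, NodeKind::Intra, "ret void");
    std::size_t swapExit = g.append(n, NodeKind::FunExit, "swap exit");

    mainEntry = g.add(NodeKind::FunEntry, "main entry");
    n = g.append(mainEntry, NodeKind::Intra, "allocas and stores");
    std::size_t call = g.append(n, NodeKind::Call, "call swap");
    std::size_t ret = g.add(NodeKind::Ret, "return from swap");
    n = g.append(ret, NodeKind::Intra, "ret i32 0");
    g.append(n, NodeKind::FunExit, "main exit");

    // Inter-procedural edges: call site -> callee entry, callee exit -> return site.
    g.connect(call, swapEntry, EdgeKind::Call);
    g.connect(swapExit, ret, EdgeKind::Ret);
    return g;
}

// Follows the first successor of each node; bounded by the number of nodes.
std::vector<std::string> executionOrder(const ICFG& g, std::size_t start) {
    std::vector<std::string> order;
    std::size_t cur = start;
    for (std::size_t steps = 0; steps < g.nodes.size(); ++steps) {
        order.push_back(g.nodes[cur].label);
        if (g.nodes[cur].out.empty()) break;
        cur = g.nodes[cur].out.front().dst;
    }
    return order;
}
```

Starting at the entry of main, the walk enters swap through the call edge, leaves it through the return edge, and finishes at the exit of main. Without the two inter-procedural edges the walk would stop at the call node, which is what separates the intra-procedural graphs from the ICFG in this model.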
SVF IR Example²
C source:

    int foo(int b) {
        return b;
    }
    int main() {
        int a = foo(0);
    }

² https://github.com/SVF-tools/Teaching-Software-Verification/wiki/svfir
LLVM IR of the example:

    define i32 @foo(i32 %b) {
      %b.addr = alloca i32
      store i32 %b, i32* %b.addr
      %0 = load i32, i32* %b.addr
      ret i32 %0
    }

    define i32 @main() {
      %a = alloca i32
      %call = call i32 @foo(i32 0)
      store i32 %call, i32* %a
      ret i32 0
    }
Variables introduced by SVF (created internally):

SVFVar               Meaning
DummyValVar ID: 0    reserved
DummyValVar ID: 1    reserved
DummyObjVar ID: 2    reserved
DummyObjVar ID: 3    reserved
ValVar ID: 4         foo
FIObjVar ID: 5       foo
RetPN ID: 6          ret of foo
ValVar ID: 13        main
FIObjVar ID: 14      main
RetPN ID: 15         ret of main
Variables introduced by LLVM (created from LLVM Values):

SVFVar               LLVM Value
ValVar ID: 7         i32 %b { 0th arg foo }
ValVar ID: 8         %b.addr = alloca i32
FIObjVar ID: 9       %b.addr = alloca i32
ValVar ID: 11        %0 = load i32, i32* %b.addr
ValVar ID: 16        %a = alloca i32
FIObjVar ID: 17      %a = alloca i32
ValVar ID: 18        %call = call i32 @foo(i32 0)
ValVar ID: 19        i32 0 { constant data }
FIObjVar ID: 20      i32 0 { constant data }
ValVar ID: 21        store i32 %call, i32* %a
ValVar ID: 22        ret i32 0
ICFG and SVFStmt Example³
The ICFG nodes of the foo/main example, the SVFStmts attached to each node, and the LLVM Values they come from:

ICFGNode             SVFStmt                                       LLVM Value
GlobalICFGNode0      CopyStmt: Var1 ← Var0                         i8* null (constant data)
                     AddrStmt: Var19 ← Var20                       i32 0 (constant data)
                     AddrStmt: Var4 ← Var5                         foo
                     AddrStmt: Var13 ← Var14                       main
FunEntryICFGNode1    fun: foo
IntraICFGNode2       AddrStmt: Var8 ← Var9                         %b.addr = alloca i32
IntraICFGNode3       StoreStmt: Var8 ← Var7                        store i32 %b, i32* %b.addr
IntraICFGNode4       LoadStmt: Var11 ← Var8                        %0 = load i32, i32* %b.addr
IntraICFGNode5       fun: foo                                      ret i32 %0
FunExitICFGNode6     PhiStmt: Var6 ← ([Var11, ICFGNode5],)         ret i32 %0
FunEntryICFGNode7    fun: main
IntraICFGNode8       AddrStmt: Var16 ← Var17                       %a = alloca i32
CallICFGNode9        CallPE: Var7 ← Var19                          %call = call i32 @foo(i32 0)
RetICFGNode10        RetPE: Var18 ← Var6                           %call = call i32 @foo(i32 0)
IntraICFGNode11      StoreStmt: Var16 ← Var18                      store i32 %call, i32* %a
IntraICFGNode12      fun: main                                     ret i32 0
FunExitICFGNode13    PhiStmt: Var15 ← ([Var19, ICFGNode12],)       ret i32 0

³ https://github.com/SVF-tools/Teaching-Software-Verification/wiki/svfir
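Reading the statements of this example in execution order shows how the argument of foo ends up in the memory object behind %a. The sketch below replays them over a concrete state. It is an illustration rather than SVF code: Kind, Stmt and runFooExample are hypothetical names, the variable IDs are taken from the tables of this example, the AddrStmts for the function objects and the CopyStmt of null are left out, and the parameter arg generalizes the constant 0 so that the flow stays visible.

```cpp
#include <map>
#include <vector>

enum class Kind { Addr, Store, Load, Phi, CallPE, RetPE };
struct Stmt { Kind kind; int dst; int src; };  // read as: Var<dst> <- Var<src>

// Replays the statements of `int a = foo(arg)` in ICFG order and returns the
// value finally stored in memory object Var17 (the object behind %a).
int runFooExample(int arg) {
    const std::map<int, int> constants = {{20, arg}};  // Var20: constant-data object
    const std::vector<Stmt> prog = {
        {Kind::Addr, 19, 20},   // GlobalICFGNode0:  Var19 <- Var20 (constant)
        {Kind::Addr, 16, 17},   // IntraICFGNode8:   %a = alloca i32
        {Kind::CallPE, 7, 19},  // CallICFGNode9:    actual argument -> formal %b
        {Kind::Addr, 8, 9},     // IntraICFGNode2:   %b.addr = alloca i32
        {Kind::Store, 8, 7},    // IntraICFGNode3:   *Var8 = Var7
        {Kind::Load, 11, 8},    // IntraICFGNode4:   Var11 = *Var8
        {Kind::Phi, 6, 11},     // FunExitICFGNode6: return value of foo
        {Kind::RetPE, 18, 6},   // RetICFGNode10:    %call receives the result
        {Kind::Store, 16, 18},  // IntraICFGNode11:  *Var16 = Var18
    };
    std::map<int, int> top;  // top-level variable -> value (constant or object ID)
    std::map<int, int> mem;  // memory object -> stored value
    for (const Stmt& s : prog) {
        switch (s.kind) {
        case Kind::Addr: {
            // A constant-data object yields its value, any other object its ID.
            auto c = constants.find(s.src);
            top[s.dst] = (c != constants.end()) ? c->second : s.src;
            break;
        }
        case Kind::Store: mem[top.at(s.dst)] = top.at(s.src); break;
        case Kind::Load:  top[s.dst] = mem.at(top.at(s.src)); break;
        case Kind::Phi:
        case Kind::CallPE:
        case Kind::RetPE: top[s.dst] = top.at(s.src); break;
        }
    }
    return mem.at(17);
}
```

The value travels Var20 → Var19 → Var7 (CallPE) → object Var9 → Var11 → Var6 (PhiStmt) → Var18 (RetPE) → object Var17, so runFooExample(0) models a receiving the 0 passed to foo.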
What’s next?
• (1) Compile two C programs (example.c and swap.c) into their LLVM IR.
  • A guide can be found at https://github.com/SVF-tools/Teaching-Software-Verification/wiki/SVFIR#2-llvm-ir-generation
  • Understand the mapping from a C program to its corresponding LLVM IR.
• (2) Generate and visualize the graph representation of the LLVM IR (example.ll and swap.ll).
  • https://github.com/SVF-tools/Teaching-Software-Verification/wiki/SVFIR#3-run-and-debug-your-svfir
• (3) Write code to iterate over the SVFVars and over the nodes and edges of the ICFG, and
print their contents.
  • https://github.com/SVF-tools/Teaching-Software-Verification/blob/main/SVFIR/SVFIR.cpp#L67-L93
• (4) More about LLVM IR and SVF’s graph representation
  • LLVM language reference manual: https://llvm.org/docs/LangRef.html
  • SVF website: https://github.com/SVF-tools/SVF