SVFIR and Graph Representation of Code

Yulei Sui
University of Technology Sydney, Australia

Software Verification https://github.com/SVF-tools/Teaching-Software-Verification
SVF: Static Value-Flow Analysis Framework for Source Code
A scalable, precise and on-demand interprocedural program dependence analysis framework for both sequential and multithreaded programs.
• The SVF project
  • Publicly available since early 2015 and actively maintained: http://svf-tools.github.io/SVF.
  • Implemented on top of the LLVM compiler (latest version: 12.0.0), with over 100 KLOC of C/C++ code, 700+ stars, 40+ contributors and over 1K commits on GitHub.
  • Invited for a plenary talk at EuroLLVM 2016, and awarded an ICSE 2018 Distinguished Paper, a SAS 2019 Best Paper and an OOPSLA 2020 Distinguished Paper.
• Value-flow analysis: resolves both control and data dependence.
  • Does the information generated at program point A flow to another program point B along some execution path?
  • Can function F be called either directly or indirectly from some other function F′?
  • Is there an unsafe memory access that may trigger a bug or security risk?
• Key features of SVF
  • Sparse: compute and maintain data-flow facts only where necessary.
  • Selective: support mixed analyses for precision and efficiency trade-offs.
  • On-demand: reason about program parts based on user queries.
SVF: Design Principle
[Figure: SVF design. Programs (e.g., C/C++) are fed to the SVF-Frontend (Memory Model), which generates graphs (PAG, ICFG, Call Graph, VFG, Constraint Graph). Analysis developers write analysis instances by selecting graph(s) and solver(s); the solvers (DFS, IncDFS, LCD, HCD) are instantiated from graph solver templates.]

• Serves as an open-source foundation for building practical static source-code analyses
  • Bridges the gap between research and engineering
  • Minimizes the effort of implementing sophisticated analyses (extensible, reusable and robust via layers of abstraction)
  • Supports developing different analysis variants (flow-, context-, heap- and field-sensitive analyses) in a sparse and on-demand manner

• Client applications:
  • Static bug detection (e.g., memory leaks, null dereferences, use-after-frees and data races)
  • Accelerating dynamic analysis (e.g., Google's Sanitizers and AFL fuzzing)

SVF IR and Why?
• SVFIR is a much simplified representation of LLVM IR (or SSA-based programming languages) for static analysis purposes.
• Lightweight: fewer types of program variables and statements.
• SVFVar: program variables (indentation denotes inheritance)

  SVF::SVFVar
    SVF::ValVar
    SVF::ObjVar

• SVFStmt: program statements (indentation denotes inheritance)

  SVF::SVFStmt
    SVF::AssignStmt
      SVF::AddrStmt
      SVF::CopyStmt
      SVF::StoreStmt
      SVF::LoadStmt
      SVF::GepStmt
      SVF::CallPE
      SVF::RetPE
    SVF::MultiOpndStmt
      SVF::PhiStmt
      SVF::SelectStmt
      SVF::CmpStmt
      SVF::BinaryOPStmt
    SVF::UnaryOPStmt
    SVF::BranchStmt
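To make the statement hierarchy more concrete, here is a minimal C++ sketch of how such a class family can be queried by its parent type. The classes live in a hypothetical `toy` namespace and only borrow a few names from the slide; they are not SVF's real classes, and the helpers `countAssignStmts` and `sampleStmts` are invented for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy model only: these classes mirror a few names from the hierarchy above
// but are NOT SVF's real classes or API.
namespace toy {

struct SVFStmt {
    virtual ~SVFStmt() = default;
    virtual std::string kind() const = 0;
};

// Family of statements of the form "lhs = rhs".
struct AssignStmt : SVFStmt {};
struct LoadStmt : AssignStmt {
    std::string kind() const override { return "LoadStmt"; }
};
struct StoreStmt : AssignStmt {
    std::string kind() const override { return "StoreStmt"; }
};

// Family of statements with several source operands.
struct MultiOpndStmt : SVFStmt {};
struct PhiStmt : MultiOpndStmt {
    std::string kind() const override { return "PhiStmt"; }
};

// Count the statements that belong to the AssignStmt family.
inline int countAssignStmts(const std::vector<std::unique_ptr<SVFStmt>>& stmts) {
    int n = 0;
    for (const auto& s : stmts)
        if (dynamic_cast<const AssignStmt*>(s.get()) != nullptr) ++n;
    return n;
}

// Build a small statement list (hypothetical helper for the usage example).
inline std::vector<std::unique_ptr<SVFStmt>> sampleStmts() {
    std::vector<std::unique_ptr<SVFStmt>> v;
    v.push_back(std::make_unique<LoadStmt>());
    v.push_back(std::make_unique<PhiStmt>());
    v.push_back(std::make_unique<StoreStmt>());
    return v;
}

} // namespace toy
```

Calling `toy::countAssignStmts(toy::sampleStmts())` counts the load and the store but not the phi, which is the kind of family-level query a small hierarchy makes cheap.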

SVF Program Variables (SVFVar)
• An SVFVar represents either a top-level variable (P) or a memory object variable (O).
• Each SVFVar has a unique identifier (ID).
• SVFVar IDs 0-3 are reserved.

Program Variables   Domain                    Meaning
SVFVar              V = P ∪ O                 Program variables
ValVar              P                         Top-level variables (scalars and pointers)
ObjVar              O = S ∪ G ∪ H ∪ C         Memory objects (stack, global [1], heap and constant data)
FIObjVar            o ∈ (S ∪ G ∪ H)           A single (base) memory object
GepObjVar           o_i ∈ (S ∪ G ∪ H) × P     i-th subfield/element of an (aggregate) object
ConstantData        C                         Constant data (e.g., numbers and strings)
Program Statement   l ∈ L                     Statement labels

[1] Function objects are considered global objects.
SVF Program Statements (SVFStmt)
An SVFStmt is one of the following program statements, each representing a relation between SVFVars.

SVFStmt               LLVM-like form                      C-like form                        Operand types
AddrStmt              %ptr = alloca or constantData       p = alloc or p = c                 P × (O ∪ C)
CopyStmt              %p = bitcast %q                     p = q                              P × P
LoadStmt              %p = load %q                        p = *q                             P × P
StoreStmt             store %q, %p                        *p = q                             P × P
GepStmt               %p = getelementptr %q, %i           p = &(q→i) or p = &q[i]            P × P × P
PhiStmt               %p = phi [l1, %q1], [l2, %q2]       p = phi(l1: q1, l2: q2)            P × (L → P)^2
BranchStmt            br i1 %p, label %l1, label %l2      if (p) l1 else l2                  P × L^2
UnaryOPStmt           p = ¬q                              p = ¬q                             P × P
BinaryOPStmt/CmpStmt  r = ⊗ p, q                          r = p ⊗ q                          P × P × P
(callsite)            %r = call f(... %qi ...)            r = f(..., qi, ...)
(callee)              f(... %pi ...) { ... ret %z }       f(..., pi, ...) { ... return z }
CallPE                %pi = %qi (1 ≤ i ≤ n)               pi = qi (1 ≤ i ≤ n)                (P × P)^n
RetPE                 %r = %z                             r = z                              P × P

⊗ ∈ {+, -, ∗, /, %, <<, >>, <, >, &, &&, <=, >=, ≡, ∼, |, ∧}
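As a rough illustration of the "C-like form" column, the sketch below models a handful of statement kinds and renders each as C-like text. The `Kind` and `Stmt` types and the `toCLike` function are hypothetical names for this example and are not part of SVF; an address-taking statement is printed as `p = &o` so that the object stays visible.

```cpp
#include <cassert>
#include <string>

// Toy model (not SVF's API): a handful of statement kinds from the table.
enum class Kind { Addr, Copy, Load, Store, Gep };

struct Stmt {
    Kind kind;
    std::string lhs;  // destination operand (the pointer for Store)
    std::string rhs;  // source operand
    std::string idx;  // field/element index, used by Gep only
};

// Render a statement in a C-like form.
std::string toCLike(const Stmt& s) {
    switch (s.kind) {
        case Kind::Addr:  return s.lhs + " = &" + s.rhs;   // p = &o
        case Kind::Copy:  return s.lhs + " = " + s.rhs;    // p = q
        case Kind::Load:  return s.lhs + " = *" + s.rhs;   // p = *q
        case Kind::Store: return "*" + s.lhs + " = " + s.rhs;  // *p = q
        case Kind::Gep:   return s.lhs + " = &(" + s.rhs + "->" + s.idx + ")";
    }
    return "";  // unreachable for valid kinds
}
```

For instance, the two statements `{Kind::Addr, "p", "a", ""}` and `{Kind::Store, "p", "r", ""}` render as `p = &a` and `*p = r`.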
SVF Program Statements (SVFStmt)
• SVFStmt follows LLVM's SSA form for top-level variables.
  • Each top-level variable (P) can be defined only once.
• Memory objects (i.e., S ∪ G ∪ H, excluding constant data) can only be modified or read through top-level pointers at StoreStmt and LoadStmt.
  • For example, in p = &a; *p = r; the value of a can only be modified or read by dereferencing p.
• A ConstantData (C) object must first be assigned to a temporary top-level variable, and can only be read through that top-level variable in any SVFStmt.
  • For example, *p = 3; ⇒ t = 3; *p = t;
• CallPE represents parameter passing from an actual parameter at a callsite to a formal parameter of the callee function.
• RetPE represents passing the value returned by a callee to the return variable at the callsite.
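The two rules above (single definition of top-level variables, and constants reaching memory only through a temporary) can be sketched in a few lines of C++. This is a toy model under stated assumptions: `Kind`, `Stmt`, `topLevelDefinedOnce` and `lowerConstantStore` are hypothetical names, and the temporary's name is supplied by the caller rather than generated.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model (not SVF's API). A statement either defines a top-level
// variable (Addr, Copy, Load) or writes memory through a pointer (Store).
enum class Kind { Addr, Copy, Load, Store };

struct Stmt {
    Kind kind;
    std::string lhs;  // defined variable, or the pointer for Store
    std::string rhs;  // source operand
};

// SSA check: every top-level variable is defined at most once.
// A Store writes to a memory object, so it defines no top-level variable.
bool topLevelDefinedOnce(const std::vector<Stmt>& stmts) {
    std::set<std::string> defined;
    for (const Stmt& s : stmts) {
        if (s.kind == Kind::Store) continue;
        if (!defined.insert(s.lhs).second) return false;  // second definition
    }
    return true;
}

// Lower "*p = <constant>" into "t = <constant>; *p = t", where t is a
// temporary whose name is chosen by the caller in this sketch.
std::vector<Stmt> lowerConstantStore(const std::string& p,
                                     const std::string& constant,
                                     const std::string& temp) {
    return {{Kind::Addr, temp, constant}, {Kind::Store, p, temp}};
}
```

`lowerConstantStore("p", "3", "t")` yields the pair `t = 3; *p = t` from the slide, and the result passes `topLevelDefinedOnce` because `t` is defined exactly once.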
Graph Representation of Code

• What is a graph representation of code (code graph)?
  • Put the LLVM IR or SVFIR into a graph representation.
  • Represent a program's control flow (i.e., execution order) and/or data flow (variable definition and use relations) using the nodes and edges of a graph.
• Why a graph representation?
  • Abstracts code away from complicated low-level instructions
  • Allows general graph algorithms to be applied
  • Is easy to maintain and extend
Call Graph
• Program calling relations between methods
• Whether a method A can call method B directly or transitively.
  - Each node represents a program method
  - Each edge represents a calling relation between two program methods

define i32 @main() #0 {
entry:
  %a1 = alloca i8, align 1
  %b1 = alloca i8, align 1
  %a = alloca i8*, align 8
  %b = alloca i8*, align 8
  store i8* %a1, i8** %a, align 8
  store i8* %b1, i8** %b, align 8
  call void @swap(i8** %a, i8** %b)
  ret i32 0
}
define void @swap(i8** %p, i8** %q) #0 {
entry:
  %0 = load i8*, i8** %p, align 8
  %1 = load i8*, i8** %q, align 8
  store i8* %1, i8** %p, align 8
  store i8* %0, i8** %q, align 8
  ret void
}

[Figure: Call Graph with two nodes, main (caller) and swap (callee), and one edge from main to swap.]

https://github.com/svf-tools/SVF/wiki/Analyze-a-Simple-C-Program#3-call-graph
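The question "can A call B directly or transitively?" is plain graph reachability. The sketch below answers it with a worklist over a toy call graph; `CallGraph`, `canCall` and `exampleCallGraph` are hypothetical names for illustration and do not reflect SVF's call graph classes.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph (not SVF's API): caller name -> directly called functions.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Return true if `from` can call `to` directly or transitively.
bool canCall(const CallGraph& cg, const std::string& from, const std::string& to) {
    std::set<std::string> visited;
    std::vector<std::string> worklist;
    worklist.push_back(from);
    while (!worklist.empty()) {
        std::string f = worklist.back();
        worklist.pop_back();
        if (!visited.insert(f).second) continue;  // already explored
        auto it = cg.find(f);
        if (it == cg.end()) continue;             // f calls nothing
        for (const std::string& callee : it->second) {
            if (callee == to) return true;
            worklist.push_back(callee);
        }
    }
    return false;
}

// The call graph of the example: main calls swap, swap calls nothing.
CallGraph exampleCallGraph() {
    CallGraph cg;
    cg["main"].push_back("swap");
    cg["swap"];  // node with no outgoing edges
    return cg;
}
```

On the example, `canCall(exampleCallGraph(), "main", "swap")` holds while the reverse query does not, matching the single caller-to-callee edge in the figure.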
Control Flow Graph

Program execution order between two LLVM instructions (SVFStmts).


• Intra-procedural control-flow graph: control-flow within a program method.
• Inter-procedural control-flow graph: control-flow across program methods.

Intra-procedural Control Flow Graph
Program execution order between instructions
- Each node represents an instruction or a statement
- Each edge represents a control-flow dependence between two nodes

[Figure: control-flow graphs of @main and @swap. Node kinds in the legend: FunEntryICFGNode, IntraICFGNode, CallICFGNode, RetICFGNode, FunExitICFGNode. Nodes in execution order:]

@main entry()
1 %a1 = alloca i8
2 %b1 = alloca i8
3 %a = alloca i8*
4 %b = alloca i8*
5 store i8* %a1, i8** %a
6 store i8* %b1, i8** %b
7 call void @swap(i8** %a, i8** %b)
  return void @swap(i8** %a, i8** %b)
8 ret i32 0
@main Exit

@swap entry()
9 %0 = load i8*, i8** %p
10 %1 = load i8*, i8** %q
11 store i8* %1, i8** %p
12 store i8* %0, i8** %q
13 @swap: ret void
@swap Exit

https://github.com/svf-tools/SVF/wiki/Analyze-a-Simple-C-Program#4-interprocedural-control-flow-graph

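Inter-procedural Control Flow Graph (ICFG)
Program execution order between instructions
- Each node represents an instruction or a statement
- Each edge represents a control-flow dependence between two nodes

[Figure: ICFG of @main and @swap. The per-method graphs are connected by a call edge from the CallICFGNode to the callee's FunEntryICFGNode and a return edge from the callee's FunExitICFGNode to the RetICFGNode. Nodes:]

@main entry()                            (FunEntryICFGNode)
1 %a1 = alloca i8                        (IntraICFGNode)
2 %b1 = alloca i8                        (IntraICFGNode)
3 %a = alloca i8*                        (IntraICFGNode)
4 %b = alloca i8*                        (IntraICFGNode)
5 store i8* %a1, i8** %a                 (IntraICFGNode)
6 store i8* %b1, i8** %b                 (IntraICFGNode)
7 call void @swap(i8** %a, i8** %b)      (CallICFGNode)
  return void @swap(i8** %a, i8** %b)    (RetICFGNode)
8 ret i32 0                              (IntraICFGNode)
@main Exit                               (FunExitICFGNode)

@swap entry()                            (FunEntryICFGNode)
9 %0 = load i8*, i8** %p                 (IntraICFGNode)
10 %1 = load i8*, i8** %q                (IntraICFGNode)
11 store i8* %1, i8** %p                 (IntraICFGNode)
12 store i8* %0, i8** %q                 (IntraICFGNode)
13 @swap: ret void                       (IntraICFGNode)
@swap Exit                               (FunExitICFGNode)

https://github.com/svf-tools/SVF/wiki/Analyze-a-Simple-C-Program#4-interprocedural-control-flow-graph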
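To show how call and return edges stitch the per-method graphs together, here is a small C++ sketch of a shortened version of the main/swap ICFG. It is a toy under stated assumptions: `Node`, `buildExampleICFG` and `walk` are hypothetical names, node ids are arbitrary, only a few instructions are kept, and each node has a single successor because the example has no branches.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy ICFG (not SVF's API). Each node has at most one successor here because
// the example has no branches; real ICFG nodes may have several.
struct Node {
    std::string kind;   // e.g. "IntraICFGNode", "CallICFGNode"
    std::string label;  // instruction text or function name
    int succ;           // successor node id, or -1 for none
};

// Build a shortened ICFG of the main/swap example. Node ids are arbitrary.
std::map<int, Node> buildExampleICFG() {
    std::map<int, Node> g;
    g[0] = {"FunEntryICFGNode", "main", 1};
    g[1] = {"IntraICFGNode", "store i8* %b1, i8** %b", 2};
    g[2] = {"CallICFGNode", "call void @swap(i8** %a, i8** %b)", 3};  // call edge
    g[3] = {"FunEntryICFGNode", "swap", 4};
    g[4] = {"IntraICFGNode", "store i8* %0, i8** %q", 5};
    g[5] = {"IntraICFGNode", "ret void", 6};
    g[6] = {"FunExitICFGNode", "swap", 7};                            // return edge
    g[7] = {"RetICFGNode", "call void @swap(i8** %a, i8** %b)", 8};
    g[8] = {"IntraICFGNode", "ret i32 0", 9};
    g[9] = {"FunExitICFGNode", "main", -1};
    return g;
}

// Follow successor edges from `start` and collect the node kinds in order.
std::vector<std::string> walk(const std::map<int, Node>& g, int start) {
    std::vector<std::string> kinds;
    for (int id = start; id != -1;) {
        const Node& n = g.at(id);
        kinds.push_back(n.kind);
        id = n.succ;
    }
    return kinds;
}
```

Walking from node 0 visits main up to the call, then all of swap, then the return node and the rest of main, which is the execution order the inter-procedural edges are meant to capture.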

SVF IR Example [2]

int foo(int b){
  return b;
}
int main(){
  int a = foo(0);
}

define i32 @foo(i32 %b) {
  %b.addr = alloca i32
  store i32 %b, i32* %b.addr
  %0 = load i32, i32* %b.addr
  ret i32 %0
}

define i32 @main() {
  %a = alloca i32
  %call = call i32 @foo(i32 0)
  store i32 %call, i32* %a
  ret i32 0
}

Variables introduced by SVF (created internally)

SVFVar               Meaning
DummyValVar ID: 0    reserved
DummyValVar ID: 1    reserved
DummyObjVar ID: 2    reserved
DummyObjVar ID: 3    reserved
ValVar ID: 4         foo
FIObjVar ID: 5       foo
RetPN ID: 6          ret of foo
ValVar ID: 13        main
FIObjVar ID: 14      main
RetPN ID: 15         ret of main

Variables introduced by LLVM (created by LLVM Values)

SVFVar               LLVM Value
ValVar ID: 7         i32 %b { 0th arg foo }
ValVar ID: 8         %b.addr = alloca i32
FIObjVar ID: 9       %b.addr = alloca i32
ValVar ID: 11        %0 = load i32, i32* %b.addr
ValVar ID: 16        %a = alloca i32
FIObjVar ID: 17      %a = alloca i32
ValVar ID: 18        %call = call i32 @foo(i32 0)
ValVar ID: 19        i32 0 { constant data }
FIObjVar ID: 20      i32 0 { constant data }
ValVar ID: 21        store i32 %call, i32* %a
ValVar ID: 22        ret i32 0

[2] https://github.com/SVF-tools/Teaching-Software-Verification/wiki/svfir
ICFG and SVFStmt Example [3]

define i32 @foo(i32 %b) {
  %b.addr = alloca i32
  store i32 %b, i32* %b.addr
  %0 = load i32, i32* %b.addr
  ret i32 %0
}

define i32 @main() {
  %a = alloca i32
  %call = call i32 @foo(i32 0)
  store i32 %call, i32* %a
  ret i32 0
}

ICFGNode             SVFStmt                                       LLVM Value
GlobalICFGNode0      CopyStmt: Var1 ← Var0                         i8* null (constant data)
                     AddrStmt: Var19 ← Var20                       i32 0 (constant data)
                     AddrStmt: Var4 ← Var5                         foo
                     AddrStmt: Var13 ← Var14                       main
FunEntryICFGNode1    fun: foo
IntraICFGNode2       AddrStmt: Var8 ← Var9                         %b.addr = alloca i32
IntraICFGNode3       StoreStmt: Var8 ← Var7                        store i32 %b, i32* %b.addr
IntraICFGNode4       LoadStmt: Var11 ← Var8                        %0 = load i32, i32* %b.addr
IntraICFGNode5       fun: foo                                      ret i32 %0
FunExitICFGNode6     PhiStmt: [Var6 ← ([Var11, ICFGNode5],)]       ret i32 %0
FunEntryICFGNode7    fun: main
IntraICFGNode8       AddrStmt: [Var16 ← Var17]                     %a = alloca i32
CallICFGNode9        CallPE: [Var7 ← Var19]                        %call = call i32 @foo(i32 0)
RetICFGNode10        RetPE: [Var18 ← Var6]                         %call = call i32 @foo(i32 0)
IntraICFGNode11      StoreStmt: [Var16 ← Var18]                    store i32 %call, i32* %a
IntraICFGNode12      fun: main                                     ret i32 0
FunExitICFGNode13    PhiStmt: [Var15 ← ([Var19, ICFGNode12],)]     ret i32 0

[3] https://github.com/SVF-tools/Teaching-Software-Verification/wiki/svfir
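One way to see what these statements say about value flow is to propagate abstract value sets along them until nothing changes. The sketch below is a simple inclusion-based propagation written for this example only; it is not SVF's solver or API. `Kind`, `Stmt`, `ValueSets`, `solve` and `exampleStmts` are hypothetical names, PhiStmt, CallPE and RetPE are all treated as plain copies, and the constant `i32 0` (Var20) is treated like any other object.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy inclusion-based propagation (not SVF's API or solver).
// Addr:  dst = &src  -> vals(dst) contains src
// Copy:  dst = src   -> vals(dst) includes vals(src)  (also Phi, CallPE, RetPE)
// Load:  dst = *src  -> vals(dst) includes vals(o) for each o in vals(src)
// Store: *dst = src  -> vals(o) includes vals(src) for each o in vals(dst)
enum class Kind { Addr, Copy, Load, Store };

struct Stmt {
    Kind kind;
    int dst;
    int src;
};

using ValueSets = std::map<int, std::set<int>>;

// Add every element of `from` to `to`; report whether `to` grew.
// `from` is taken by value so that it never aliases `to`.
static bool merge(std::set<int>& to, std::set<int> from) {
    std::size_t before = to.size();
    to.insert(from.begin(), from.end());
    return to.size() != before;
}

ValueSets solve(const std::vector<Stmt>& stmts) {
    ValueSets vals;
    bool changed = true;
    while (changed) {  // iterate until a fixed point is reached
        changed = false;
        for (const Stmt& s : stmts) {
            switch (s.kind) {
                case Kind::Addr:
                    changed |= vals[s.dst].insert(s.src).second;
                    break;
                case Kind::Copy:
                    changed |= merge(vals[s.dst], vals[s.src]);
                    break;
                case Kind::Load: {
                    const std::set<int> objs = vals[s.src];  // copy: safe to iterate
                    for (int o : objs) changed |= merge(vals[s.dst], vals[o]);
                    break;
                }
                case Kind::Store: {
                    const std::set<int> objs = vals[s.dst];  // copy: safe to iterate
                    for (int o : objs) changed |= merge(vals[o], vals[s.src]);
                    break;
                }
            }
        }
    }
    return vals;
}

// Statements from the table, using the slide's SVFVar IDs.
std::vector<Stmt> exampleStmts() {
    return {
        {Kind::Addr, 19, 20},   // Var19 <- Var20 (i32 0, constant data)
        {Kind::Addr, 8, 9},     // %b.addr = alloca i32
        {Kind::Store, 8, 7},    // store i32 %b, i32* %b.addr
        {Kind::Load, 11, 8},    // %0 = load i32, i32* %b.addr
        {Kind::Copy, 6, 11},    // PhiStmt: Var6 <- Var11
        {Kind::Addr, 16, 17},   // %a = alloca i32
        {Kind::Copy, 7, 19},    // CallPE: Var7 <- Var19
        {Kind::Copy, 18, 6},    // RetPE: Var18 <- Var6
        {Kind::Store, 16, 18},  // store i32 %call, i32* %a
    };
}
```

After `solve(exampleStmts())`, the set for Var17 (the object behind `%a`) contains Var20, reflecting that the constant passed to `foo` flows through the parameter, the return value and the final store into `a`.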
What’s next?
• (1) Compile two C programs (example.c and swap.c) into LLVM IR.
  • A guide can be found at https://github.com/SVF-tools/Teaching-Software-Verification/wiki/SVFIR#2-llvm-ir-generation
  • Understand the mapping from a C program to its corresponding LLVM IR.
• (2) Generate and visualize the graph representations of the LLVM IR (example.ll and swap.ll).
  • https://github.com/SVF-tools/Teaching-Software-Verification/wiki/SVFIR#3-run-and-debug-your-svfir
• (3) Write code to iterate over the SVFVars and over the nodes and edges of the ICFG, and print their contents.
  • https://github.com/SVF-tools/Teaching-Software-Verification/blob/main/SVFIR/SVFIR.cpp#L67-L93
• (4) More about LLVM IR and SVF's graph representation
  • LLVM language reference manual: https://llvm.org/docs/LangRef.html
  • SVF website: https://github.com/SVF-tools/SVF