Compiler2018 Big Picture
Lequn Chen
March 16, 2017
Why a Compilers Course?
Compiler        Source Language   Target Language
gcc             C                 Linux x86-64 ELF
javac           Java              JVM Bytecode
your compiler   M*                Linux x86-64 Assembly
What Are Virtual Machines?
Scala  →  JVM Bytecode  →  JVM on x86-64 macOS
• JVM: interpreter; JIT optimization focused on code generation
• Scala compiler: language features; optimization based on high-level semantics
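A minimal sketch of what the "interpreter" half of a VM does: a loop that fetches one bytecode instruction at a time and executes it on an operand stack. The opcode names and the (opcode, argument) encoding are hypothetical, only loosely in the style of JVM bytecode. A JIT would instead compile frequently executed instruction sequences into native code. The program encoded here is the while loop used in the parsing example.

```python
def run(code, env):
    """Execute a list of (opcode, arg) pairs using an operand stack."""
    stack = []
    pc = 0                                   # program counter
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "PUSH":                     # push a constant
            stack.append(arg)
        elif op == "LOAD":                   # push a variable's value
            stack.append(env[arg])
        elif op == "STORE":                  # pop into a variable
            env[arg] = stack.pop()
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "LT":
            b, a = stack.pop(), stack.pop()
            stack.append(a < b)
        elif op == "JUMP_IF_FALSE":
            if not stack.pop():
                pc = arg
        elif op == "JUMP":
            pc = arg
        else:
            raise ValueError(f"unknown opcode {op}")
    return env

# while f3 < 100 { f3 = f1 + f2; f1, f2 = f2, f3; }
loop = [
    ("LOAD", "f3"), ("PUSH", 100), ("LT", None),    # 0-2: f3 < 100
    ("JUMP_IF_FALSE", 13),                          # 3: leave the loop
    ("LOAD", "f1"), ("LOAD", "f2"), ("ADD", None),  # 4-6: f1 + f2
    ("STORE", "f3"),                                # 7: f3 = ...
    ("LOAD", "f2"), ("LOAD", "f3"),                 # 8-9: push f2, then f3
    ("STORE", "f2"), ("STORE", "f1"),               # 10-11: f2 = f3, f1 = old f2
    ("JUMP", 0),                                    # 12: back to the test
]
```

Starting from f1 = 1, f2 = 1, f3 = 0, `run(loop, {"f1": 1, "f2": 1, "f3": 0})` ends with f1 = 89, f2 = 144, f3 = 144.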
LLVM (the name no longer stands for "Low-Level Virtual Machine")
Front-end                      →  LLVM IR  →  Back-end
C/C++, Rust, Swift, Haskell                   x86-64, RISC, ARM, MIPS
• Front-ends: language features; optimization based on high-level semantics
• LLVM: transforms and optimizations; code generation
Compilers
• Overview
• Front-end → Intermediate Representation → Back-end
• More detail?
• Lexing
• Parsing
• Semantic Analysis
• IR Generation
• IR Optimization
• Code Generation
• Target-dependent Optimizations
About the Course
• Implementation language: whatever you want
Lexing
KEYWORD while
IDENTIFIER f3
SYMBOL <
LITERAL 100
SYMBOL {
IDENTIFIER f3
SYMBOL =
IDENTIFIER f1
SYMBOL +
IDENTIFIER f2
SYMBOL ;
IDENTIFIER f1
SYMBOL ,
IDENTIFIER f2
SYMBOL =
IDENTIFIER f2
SYMBOL ,
IDENTIFIER f3
SYMBOL ;
SYMBOL }
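The token stream above can be produced by a small regular-expression lexer. A minimal sketch, assuming a hypothetical keyword set and only the symbols that occur in the example; a real M* lexer needs more token kinds (string literals, comments, multi-character operators).

```python
import re

KEYWORDS = {"while", "if", "else"}   # hypothetical keyword set

TOKEN_RE = re.compile(r"""
    (?P<SPACE>\s+)
  | (?P<LITERAL>\d+)
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<SYMBOL>[{}();,=+<])
""", re.VERBOSE)

def lex(src):
    """Return a list of (kind, text) tokens; whitespace is dropped."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SPACE":
            continue
        if kind == "NAME":           # keywords are names with reserved spellings
            kind = "KEYWORD" if text in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, text))
    return tokens
```

Lexing the loop `while f3 < 100 { f3 = f1 + f2; f1, f2 = f2, f3; }` yields the 20 tokens listed above, starting with `("KEYWORD", "while")` and ending with `("SYMBOL", "}")`.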
Parsing
while f3 < 100 {
    f3 = f1 + f2;
    f1, f2 = f2, f3;
}
[Figure: parse tree built on top of the token stream]
Syntax Error
while f3 < 100 {
    f3 = f1 + f2;
    f1, f2 = f2, f3;
}
[Figure: the same parse tree with the final } token taken away]
Syntax Error: Missing }
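A minimal recursive-descent sketch that builds an AST (plain tuples) for the loop and reports "Missing }" when the closing brace is absent. The grammar in the docstring is a hypothetical toy grammar covering only this example, not the M* grammar; the tokenizer is deliberately tiny so the block stands alone.

```python
import re

def tokenize(src):
    # Identifiers/keywords, integers, single-character symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|[{}();,=+<]", src)

class Parser:
    """Toy grammar (hypothetical):
         stmt -> 'while' expr '{' stmt* '}'
               | ID (',' ID)* '=' expr (',' expr)* ';'
         expr -> add ('<' add)?
         add  -> atom ('+' atom)*
         atom -> ID | INT
    """

    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"Missing {tok} (found {self.peek() or 'end of input'})")
        self.pos += 1

    def program(self):
        stmts = []
        while self.peek() is not None:
            stmts.append(self.stmt())
        return stmts

    def stmt(self):
        if self.peek() == "while":
            self.pos += 1
            cond = self.expr()
            self.expect("{")
            body = []
            while self.peek() not in ("}", None):
                body.append(self.stmt())
            self.expect("}")               # at end of input this reports "Missing }"
            return ("while", cond, body)
        targets = self.comma_list(self.name)
        self.expect("=")
        values = self.comma_list(self.expr)
        self.expect(";")
        return ("assign", targets, values)

    def comma_list(self, item):
        items = [item()]
        while self.peek() == ",":
            self.pos += 1
            items.append(item())
        return items

    def name(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected identifier, found {tok or 'end of input'}")
        self.pos += 1
        return tok

    def expr(self):
        left = self.add()
        if self.peek() == "<":
            self.pos += 1
            left = ("<", left, self.add())
        return left

    def add(self):
        left = self.atom()
        while self.peek() == "+":
            self.pos += 1
            left = ("+", left, self.atom())
        return left

    def atom(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        return self.name()                 # raises on anything else
```

For the loop above, `Parser(tokenize(src)).program()` returns `[("while", ("<", "f3", 100), [("assign", ["f3"], [("+", "f1", "f2")]), ("assign", ["f1", "f2"], ["f2", "f3"])])]`. Keywords are not excluded from identifiers here, which a real parser would need to handle.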
Parsing: Grammars
stmt: expr NEWLINE
    | ID '=' expr NEWLINE
    | NEWLINE
    ;

Factor → ( Expr )
       | Integer

def Expr():
    Expr()        # Infinite Recursion!
    match('+')
    Term()
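The usual fix for this left recursion is to rewrite the rule behind the pseudocode (presumably Expr → Expr '+' Term | Term) as Expr → Term ('+' Term)*, and implement the repetition as a loop that consumes a Term before looking for '+'. A sketch that evaluates while parsing; the Term rule is not shown on the slide, so Term → Factor ('*' Factor)* is an assumption here.

```python
import re

class ExprParser:
    """Recursive descent for:
         Expr   -> Term ('+' Term)*       (left recursion replaced by a loop)
         Term   -> Factor ('*' Factor)*   (assumed; not on the slide)
         Factor -> '(' Expr ')' | Integer
    Characters other than digits, parentheses, '+' and '*' are ignored."""

    def __init__(self, text):
        self.toks = re.findall(r"\d+|[()+*]", text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def match(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return value

    def expr(self):
        value = self.term()          # a Term is consumed first: no self-call without progress
        while self.peek() == "+":
            self.match("+")
            value += self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.match("*")
            value *= self.factor()
        return value

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.match("(")
            value = self.expr()
            self.match(")")
            return value
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"expected '(' or integer, found {tok!r}")
```

For example, `ExprParser("2 * (3 + 4) + 1").parse()` returns 15. Parser generators such as ANTLR 4 perform a similar rewrite for directly left-recursive rules automatically.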
Lexer & Parser?
• However, the boundary between them is not always clean
• vector<pair<int, int>>: is >> one right-shift token or two closing >?
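The lexer alone cannot decide this: a lexer that prefers the longer operator emits `>>` as one right-shift token, which is why C++ before C++11 required writing `> >`. A sketch of one workaround, in which the parser splits the token when it expects a closing `>`. The toy type grammar and the function names are hypothetical.

```python
import re

# Alternatives are tried left to right, so ">>" is preferred over ">".
TOKEN_RE = re.compile(r">>|[<>,]|\w+")

def lex(src):
    return TOKEN_RE.findall(src)

def peek(toks, pos):
    return toks[pos] if pos < len(toks) else None

def parse_type(toks, pos=0):
    """Parse  Type -> Name ('<' Type (',' Type)* '>')?  and return (tree, next_pos).
    A '>>' token is split in place (toks is mutated) when a closing '>' is expected."""
    name = peek(toks, pos)
    if name is None or not name.isidentifier():
        raise SyntaxError(f"expected a type name, found {name!r}")
    pos += 1
    if peek(toks, pos) != "<":
        return name, pos
    pos += 1
    args = []
    while True:
        arg, pos = parse_type(toks, pos)
        args.append(arg)
        if peek(toks, pos) != ",":
            break
        pos += 1
    if peek(toks, pos) == ">>":
        toks[pos:pos + 1] = [">", ">"]      # one shift token becomes two closers
    if peek(toks, pos) != ">":
        raise SyntaxError(f"expected '>', found {peek(toks, pos)!r}")
    return (name, args), pos + 1
```

`lex("vector<pair<int, int>>")` ends with the single token `">>"`, yet `parse_type` on that list returns `("vector", [("pair", ["int", "int"])])`.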
Pragmatic Solution
• What to do
  • Build the AST
  • Check for syntax errors
  • Use a parser generator, especially ANTLR 4
  • Example grammars: https://github.com/antlr/grammars-v4
• Read if you want
  • https://abcdabcd987.com/using-antlr4/
  • https://abcdabcd987.com/notes-on-antlr4/
Challenge Yourself
Semantic Analysis
[Figure: the same parse tree, with one use of f2 replaced by the undeclared name f4]
Semantic Error: f4 used before declaration
Language Features
• x, y = y, x
• a.sort(key=lambda x: x[0])
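Features like `x, y = y, x` must be lowered to something the target can execute. Naively emitting `x = y; y = x` clobbers x, so a simple sketch first copies every right-hand side into a fresh temporary. The function names and the `tN` temporary naming are hypothetical.

```python
from itertools import count

def lower_parallel_assign(targets, values, fresh):
    """Lower `a, b, ... = x, y, ...` into single assignments.
    All right-hand sides are copied to fresh temporaries before any target
    is written, so `x, y = y, x` swaps instead of clobbering x."""
    if len(targets) != len(values):
        raise ValueError("semantic error: tuple sizes differ")
    temps = [f"t{next(fresh)}" for _ in values]
    code = [f"{t} = {v}" for t, v in zip(temps, values)]
    code += [f"{x} = {t}" for x, t in zip(targets, temps)]
    return code

def run(code, env):
    """Execute the lowered name-to-name assignments, to check the result."""
    for line in code:
        dst, src = line.split(" = ")
        env[dst] = env[src]
    return env
```

`lower_parallel_assign(["x", "y"], ["y", "x"], count())` gives `["t0 = y", "t1 = x", "x = t0", "y = t1"]`. A smarter lowering would add temporaries only when a target is read by a later right-hand side: for `f1, f2 = f2, f3`, the order `f1 = f2; f2 = f3` already works.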
Pragmatic Solution
• What to do
  • Walk the AST
  • Build a symbol table
  • Check all kinds of semantic errors
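A sketch of the steps above: walk the AST in program order, maintain a symbol table with nested scopes, and record errors such as "f4 used before declaration". The tuple AST shape and the `("decl", name, init)` statement are assumptions for illustration; the example code on the slides has no declarations.

```python
class SymbolTable:
    """Nested scopes, innermost last; each maps a name to its type."""

    def __init__(self):
        self.scopes = [{}]                      # global scope

    def enter(self):
        self.scopes.append({})

    def exit(self):
        self.scopes.pop()

    def declare(self, name, ty):
        if name in self.scopes[-1]:
            return False                        # redeclared in the same scope
        self.scopes[-1][name] = ty
        return True

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


def check_expr(e, table, errors):
    if isinstance(e, str):                      # variable use
        if table.lookup(e) is None:
            errors.append(f"Semantic Error: {e} used before declaration")
    elif isinstance(e, tuple):                  # (op, lhs, rhs)
        check_expr(e[1], table, errors)
        check_expr(e[2], table, errors)
    # ints are literals: nothing to check


def check(stmts, table, errors):
    """Walk statements in order, so a use before its declaration is caught."""
    for s in stmts:
        if s[0] == "decl":                      # ("decl", name, initializer)
            check_expr(s[2], table, errors)
            if not table.declare(s[1], "int"):
                errors.append(f"Semantic Error: {s[1]} declared twice")
        elif s[0] == "assign":                  # ("assign", targets, values)
            for e in s[2] + s[1]:
                check_expr(e, table, errors)
        elif s[0] == "while":                   # ("while", cond, body)
            check_expr(s[1], table, errors)
            table.enter()                       # the body gets its own scope
            check(s[2], table, errors)
            table.exit()
    return errors
```

With f1, f2 and f3 declared and `f1, f2 = f2, f4` inside the loop body, `check(program, SymbolTable(), [])` returns `["Semantic Error: f4 used before declaration"]`.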
Challenge Yourself
• Intermediate Representation
C:
struct RT {
    char A;
    int B[10][20];
    char C;
};
struct ST {
    int X;
    double Y;
    struct RT Z;
};

int *foo(struct ST *s) {
    return &s[1].Z.B[5][13];
}

LLVM IR:
%struct.RT = type { i8, [10 x [20 x i32]], i8 }
%struct.ST = type { i32, double, %struct.RT }

define i32* @foo(%struct.ST* %s) {
entry:
    %arrayidx = getelementptr inbounds %struct.ST, %struct.ST* %s, i64 1, i32 2, i32 1, i64 5, i64 13
    ret i32* %arrayidx
}
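To see what the `getelementptr` indices mean, the byte offset can be computed one index at a time. A sketch using Python's `ctypes`, which lays structures out with the platform's C rules; the value 1296 assumes an x86-64 (or other LP64) ABI where `double` is 8-byte aligned, giving sizeof(RT) = 808 and sizeof(ST) = 824.

```python
import ctypes

class RT(ctypes.Structure):
    _fields_ = [("A", ctypes.c_char),
                ("B", (ctypes.c_int * 20) * 10),   # int B[10][20]
                ("C", ctypes.c_char)]

class ST(ctypes.Structure):
    _fields_ = [("X", ctypes.c_int),
                ("Y", ctypes.c_double),
                ("Z", RT)]

def gep_offset():
    """Offset of &s[1].Z.B[5][13] from s, following the GEP indices."""
    offset = 1 * ctypes.sizeof(ST)                   # i64 1: skip one whole ST
    offset += ST.Z.offset                            # i32 2: field Z of ST
    offset += RT.B.offset                            # i32 1: field B of RT
    offset += 5 * ctypes.sizeof(ctypes.c_int * 20)   # i64 5: row 5 of B
    offset += 13 * ctypes.sizeof(ctypes.c_int)       # i64 13: element 13 of that row
    return offset

def offset_via_pointers():
    """Cross-check using real addresses inside a ctypes array of two STs."""
    s = (ST * 2)()
    row = s[1].Z.B[5]                                # view into s's buffer
    elem = ctypes.addressof(row) + 13 * ctypes.sizeof(ctypes.c_int)
    return elem - ctypes.addressof(s)
```

On x86-64 the sum is 824 + 16 + 4 + 400 + 52 = 1296. The first index steps over whole `ST` objects (pointer arithmetic), while the remaining indices select fields and array elements inside them.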
IR Design: Structure
• Tree (the Tiger Book)
• ✘ I cannot understand it
• ✘ Hard to analyze and transform
if x > y:
    z = x
    foo()
else:
    z = y
    bar()
print(z)
[Figure: control flow graph of the code above: block {z = x; foo()} and block {z = y; bar()} both flow into block {print(z)}]
• Register-to-Register:
  • Unlimited virtual registers
  • Register allocation then decides: what should be spilled to memory?
• ✔ Easy to understand
• ✔ Similar to the target platform
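A register-to-register IR for the if/else example might look like this sketch: basic blocks that each end in an explicit branch, with as many virtual registers (`%x`, `%c`, `%z`, ...) as needed. The instruction names and the tuple encoding are hypothetical. A tiny interpreter is included so the control flow can be checked.

```python
# Each block is a list of instructions ending in a terminator (br, jmp, ret).
cfg = {
    "entry": [("gt", "%c", "%x", "%y"),        # %c = %x > %y
              ("br", "%c", "then", "else")],
    "then":  [("mov", "%z", "%x"),
              ("call", "foo"),
              ("jmp", "join")],
    "else":  [("mov", "%z", "%y"),
              ("call", "bar"),
              ("jmp", "join")],
    "join":  [("call", "print", "%z"),
              ("ret",)],
}

def run(cfg, regs):
    """Interpret the CFG from block 'entry'; return the list of calls made."""
    calls = []
    block = "entry"
    while True:
        for ins in cfg[block]:
            op = ins[0]
            if op == "gt":                      # dst = a > b
                regs[ins[1]] = regs[ins[2]] > regs[ins[3]]
            elif op == "mov":                   # dst = src
                regs[ins[1]] = regs[ins[2]]
            elif op == "call":                  # record f(args...)
                calls.append((ins[1], *(regs[a] for a in ins[2:])))
            elif op == "br":                    # conditional branch
                block = ins[2] if regs[ins[1]] else ins[3]
                break
            elif op == "jmp":                   # unconditional branch
                block = ins[1]
                break
            elif op == "ret":
                return calls
            else:
                raise ValueError(f"unknown instruction {op}")
        else:
            raise ValueError(f"block {block} has no terminator")
```

With x = 3 and y = 2, `run(cfg, {"%x": 3, "%y": 2})` records `[("foo",), ("print", 3)]`. Because every value sits in a virtual register, mapping this IR to a real machine is mostly a matter of register allocation and instruction selection.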
Design: Function?
• Should the concepts of “function” and “function call” be present in the IR?
• What to do
  • Analyze and transform IR
  • Graph coloring register allocation
  • Inlining
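A sketch of graph coloring register allocation in the Chaitin/Briggs style: repeatedly remove a node with fewer than k neighbors (it can always be colored later), fall back to an optimistic spill candidate when none exists, then pop nodes and assign the lowest free color. Assumptions: the interference graph is already built and symmetric, and spilling just reports the node; a real allocator also coalesces moves and, after a spill, rewrites the code with loads/stores and runs again.

```python
def color_graph(interference, k):
    """Return (colors, spilled). `interference` maps each virtual register
    to the set of registers live at the same time (must be symmetric);
    k is the number of physical registers, colors are 0..k-1."""
    graph = {v: set(ns) for v, ns in interference.items()}
    stack = []
    while graph:
        # Simplify: a node with degree < k can always be colored.
        node = next((v for v in graph if len(graph[v]) < k), None)
        if node is None:
            # No such node: push the highest-degree node optimistically.
            node = max(graph, key=lambda v: len(graph[v]))
        stack.append(node)
        for n in graph.pop(node):
            graph[n].discard(node)
    colors, spilled = {}, []
    while stack:                                # Select: reverse removal order
        node = stack.pop()
        used = {colors[n] for n in interference[node] if n in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spilled.append(node)                # must live in a stack slot
    return colors, spilled
```

Three mutually interfering registers cannot fit in k = 2 physical registers, so one of them is spilled; with k = 3 all three get distinct colors.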
Challenge Yourself
• What to do
  • Transform IR to target machine assembly
  • Do it in a naïve way
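One naïve strategy is to give every virtual register its own 8-byte stack slot, so each IR instruction loads its operands into fixed scratch registers, computes, and stores the result back. A sketch emitting NASM-style x86-64 assembly; the three-address IR format (`li`, `add`, `sub`, `ret`) is hypothetical, immediates are assumed to fit in a signed 32-bit field, and the return value goes in `rax` per the System V calling convention.

```python
def naive_codegen(name, ir):
    """Translate three-address IR into NASM-style x86-64 assembly.
    Every virtual register lives in its own stack slot below rbp."""
    slots = {}

    def slot(reg):
        if reg not in slots:
            slots[reg] = 8 * (len(slots) + 1)
        return f"qword [rbp-{slots[reg]}]"

    body = []
    for op, *args in ir:
        if op == "li":                          # dst = immediate
            dst, imm = args
            body.append(f"    mov {slot(dst)}, {imm}")
        elif op in ("add", "sub"):              # dst = a op b
            dst, a, b = args
            body += [f"    mov rax, {slot(a)}",
                     f"    mov rcx, {slot(b)}",
                     f"    {op} rax, rcx",
                     f"    mov {slot(dst)}, rax"]
        elif op == "ret":                       # return value in rax
            (src,) = args
            body += [f"    mov rax, {slot(src)}",
                     "    leave",               # mov rsp, rbp; pop rbp
                     "    ret"]
        else:
            raise ValueError(f"unsupported op {op}")

    frame = (len(slots) * 8 + 15) // 16 * 16    # keep rsp 16-byte aligned
    prologue = ["section .text", f"global {name}", f"{name}:",
                "    push rbp", "    mov rbp, rsp", f"    sub rsp, {frame}"]
    return "\n".join(prologue + body)
```

For `[("li", "t0", 1), ("li", "t1", 2), ("add", "t2", "t0", "t1"), ("ret", "t2")]` the sketch reserves a 32-byte frame and computes 1 + 2 through memory. The many redundant loads and stores are exactly what the register allocator above is meant to remove.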
Challenge Yourself
• Use libc
Challenge Yourself