Compiler Construction: M Ikram Ul Haq


Roadmap

> Overview
> Front end
> Back end
> Multi-pass compilers

See Modern compiler implementation in Java (Second edition), chapter 1.

© Oscar Nierstrasz

Textbook

> Andrew W. Appel, Modern compiler implementation in Java (Second edition),
  Cambridge University Press, New York, NY, USA, 2002, with Jens Palsberg.


Other recommended sources

> Compilers: Principles, Techniques, and Tools, Aho, Sethi and Ullman
— http://dragonbook.stanford.edu/

> Parsing Techniques, Grune and Jacobs
— http://www.cs.vu.nl/~dick/PT2Ed.html

> Advanced Compiler Design and Implementation, Muchnick

Compilers, Interpreters …

What is a compiler?

a program that translates an executable program in one language
into an executable program in another language


What is an interpreter?

a program that reads an executable program
and produces the results of running that program


Why do we care?

Compiler construction is a microcosm of computer science:

• artificial intelligence: greedy algorithms, learning algorithms
• algorithms: graph algorithms, union-find, dynamic programming
• theory: DFAs for scanning, parser generators, lattice theory for analysis
• systems: allocation and naming, locality, synchronization
• architecture: pipeline management, hierarchy management, instruction set use

Inside a compiler, all these things come together



Isn’t it a solved problem?

> Machines are constantly changing
— Changes in architecture ⇒ changes in compilers
— new features pose new problems
— changing costs lead to different concerns
— old solutions need re-engineering

> Innovations in compilers should prompt changes in architecture
— New languages and features


What qualities are important in a compiler?

1. Correct code
2. Output runs fast
3. Compiler runs fast
4. Compile time proportional to program size
5. Support for separate compilation
6. Good diagnostics for syntax errors
7. Works well with the debugger
8. Good diagnostics for flow anomalies
9. Cross language calls
10. Consistent, predictable optimization


A bit of history

> 1952: First compiler (linker/loader) written by Grace Hopper for A-0 programming language

> 1957: First complete compiler for FORTRAN by John Backus and team

> 1960: COBOL compilers for multiple architectures

> 1962: First self-hosting compiler for LISP


A compiler was originally a program that “compiled” subroutines [a link-loader].
When in 1954 the combination “algebraic compiler” came into use, or rather into
misuse, the meaning of the term had already shifted into the present one.
— Bauer and Eickel [1975]


Abstract view

• recognize legal (and illegal) programs
• generate correct code
• manage storage of all variables and code
• agree on format for object (or assembly) code

Big step up from assembler — higher level notations



Traditional two pass compiler

• intermediate representation (IR)
• front end maps legal code into IR
• back end maps IR onto target machine
• simplify retargeting
• allows multiple front ends
• multiple passes ⇒ better code


A fallacy!

Front-end, IR and back-end must encode knowledge
needed for all n×m combinations!


Front end

• recognize legal code
• report errors
• produce IR
• preliminary storage map
• shape code for the back end

Much of front end construction can be automated



Scanner

• map characters to tokens
• character string value for a token is a lexeme
• eliminate white space

x = x + y  →  <id,x> = <id,x> + <id,y>
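
The mapping from characters to tokens can be sketched with a small hand-written scanner. This is only an illustration under stated assumptions: the class name TinyScanner, the method tokenize, and the <num,...> token spelling are not from the slides, and a real scanner would normally be generated from regular expressions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner; all names here are illustrative.
class TinyScanner {
    // Turns a character string into token strings such as "<id,x>" or "+".
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {              // eliminate white space
                i++;
            } else if (Character.isLetter(c)) {           // identifier: letter (letter | digit)*
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add("<id," + src.substring(start, i) + ">");   // the lexeme is kept in the token
            } else if (Character.isDigit(c)) {            // number: digit+
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add("<num," + src.substring(start, i) + ">");  // assumed token spelling
            } else if ("=+-".indexOf(c) >= 0) {           // single-character operators
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints: <id,x> = <id,x> + <id,y>
        System.out.println(String.join(" ", tokenize("x=x+y")));
    }
}
```

Because white space is dropped, "x=x+y" and "x = x + y" yield the same token sequence.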


Parser

• recognize context-free syntax
• guide context-sensitive analysis
• construct IR(s)
• produce meaningful error messages
• attempt error correction

Parser generators mechanize much of the work



Context-free grammars

Context-free syntax is specified with a grammar, usually in Backus-Naur form (BNF).

1. <goal> := <expr>
2. <expr> := <expr> <op> <term>
3.         | <term>
4. <term> := number
5.         | id
6. <op>   := +
7.         | -

A grammar G = (S,N,T,P)
• S is the start-symbol
• N is a set of non-terminal symbols
• T is a set of terminal symbols
• P is a set of productions, P: N → (N ∪ T)*

Deriving valid sentences

Given a grammar, valid sentences can be derived by repeated substitution.

Production   Result
             <goal>
1            <expr>
2            <expr> <op> <term>
5            <expr> <op> y
7            <expr> - y
2            <expr> <op> <term> - y
4            <expr> <op> 2 - y
6            <expr> + 2 - y
3            <term> + 2 - y
5            x + 2 - y

To recognize a valid sentence in some CFG, we reverse this process and build up a parse.
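
The derivation can be replayed mechanically. The sketch below is illustrative only: the class Derivation and its methods are hypothetical names, non-terminals are assumed to be written <name>, and each step rewrites the rightmost non-terminal with the production numbered as in the expression grammar (1 to 7). Where production 4 or 5 is used, a concrete lexeme (2, x, y) is substituted for number or id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the original slides.
class Derivation {
    // Productions 1..7 of the expression grammar (index 0 is unused).
    static final String[] LHS = {null, "<goal>", "<expr>", "<expr>", "<term>", "<term>", "<op>", "<op>"};
    static final String[][] RHS = {
        null, {"<expr>"}, {"<expr>", "<op>", "<term>"}, {"<term>"}, {"number"}, {"id"}, {"+"}, {"-"}
    };

    // Rewrites the rightmost non-terminal of form with production p.
    // A non-null lexeme stands in for the terminal class (number or id).
    static List<String> step(List<String> form, int p, String lexeme) {
        for (int i = form.size() - 1; i >= 0; i--) {
            if (form.get(i).startsWith("<")) {            // non-terminals are written <name>
                if (!form.get(i).equals(LHS[p])) {
                    throw new IllegalArgumentException("production " + p + " does not apply to " + form.get(i));
                }
                List<String> next = new ArrayList<>(form.subList(0, i));
                if (lexeme != null) {
                    next.add(lexeme);
                } else {
                    next.addAll(Arrays.asList(RHS[p]));
                }
                next.addAll(form.subList(i + 1, form.size()));
                return next;
            }
        }
        throw new IllegalStateException("no non-terminal left to rewrite");
    }

    // Replays the production sequence 1, 2, 5, 7, 2, 4, 6, 3, 5 and records every sentential form.
    static List<String> derive() {
        int[] productions = {1, 2, 5, 7, 2, 4, 6, 3, 5};
        String[] lexemes = {null, null, "y", null, null, "2", null, null, "x"};
        List<String> form = new ArrayList<>(Arrays.asList("<goal>"));
        List<String> trace = new ArrayList<>();
        trace.add(String.join(" ", form));
        for (int i = 0; i < productions.length; i++) {
            form = step(form, productions[i], lexemes[i]);
            trace.add(String.join(" ", form));
        }
        return trace;
    }

    public static void main(String[] args) {
        for (String line : derive()) System.out.println(line);   // last line: x + 2 - y
    }
}
```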


Parse trees

A parse can be represented by a tree called a parse or syntax tree.

Obviously, this contains a lot of unnecessary information


Abstract syntax trees

So, compilers often use an abstract syntax tree (AST).

ASTs are often used as an IR.
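
As a rough sketch of how a parser might build an AST directly, the following recursive-descent parser handles sentences such as x + 2 - y. The names TinyParser and Node are hypothetical, tokens are assumed to be separated by white space, and the left-recursive rule for <expr> is replaced by a loop (a common workaround, since plain recursive descent cannot handle left recursion). The tree keeps only operators and operands; the <goal>, <expr>, <term> and <op> nodes of a full parse tree are not stored.

```java
// Hypothetical parser for illustration; assumes white-space-separated tokens.
class TinyParser {
    // AST node: a leaf (left == right == null) or a binary operation.
    static class Node {
        final String label;
        final Node left, right;
        Node(String label, Node left, Node right) { this.label = label; this.left = left; this.right = right; }
        @Override public String toString() {
            return left == null ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    private final String[] tokens;
    private int pos = 0;

    TinyParser(String src) { tokens = src.trim().split("\\s+"); }

    // <expr> := <expr> <op> <term> | <term>, with the left recursion turned into a loop.
    Node parseExpr() {
        Node tree = parseTerm();
        while (pos < tokens.length && (tokens[pos].equals("+") || tokens[pos].equals("-"))) {
            String op = tokens[pos++];
            tree = new Node(op, tree, parseTerm());   // left-associative: earlier operands nest deeper
        }
        if (pos != tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + tokens[pos]);
        }
        return tree;
    }

    // <term> := number | id
    Node parseTerm() {
        if (pos >= tokens.length) throw new IllegalArgumentException("unexpected end of input");
        String t = tokens[pos++];
        if (!t.matches("[A-Za-z][A-Za-z0-9]*|[0-9]+")) {
            throw new IllegalArgumentException("expected number or id, found: " + t);
        }
        return new Node(t, null, null);
    }

    public static void main(String[] args) {
        // prints: ((x + 2) - y)
        System.out.println(new TinyParser("x + 2 - y").parseExpr());
    }
}
```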


Back end

• translate IR into target machine code
• choose instructions for each IR operation
• decide what to keep in registers at each point
• ensure conformance with system interfaces

Automation has been less successful here



Instruction selection

• produce compact, fast code
• use available addressing modes
• pattern matching problem
— ad hoc techniques
— tree pattern matching
— string pattern matching
— dynamic programming


Register allocation

• have value in a register when used
• limited resources
• changes instruction choices
• can move loads and stores
• optimal allocation is difficult

Modern allocators often use an analogy to graph coloring
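
To make the graph-coloring analogy concrete, here is a minimal sketch under stated assumptions. Variables are nodes, an edge joins two variables that are live at the same time, and registers are colors. The class GreedyColoring and its methods are hypothetical, and the strategy shown is a plain greedy heuristic in insertion order, not the simplify/select scheme that production allocators typically use.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical greedy allocator for illustration only.
class GreedyColoring {
    // Records that a and b are live at the same time and so need different registers.
    static void addEdge(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, key -> new LinkedHashSet<>()).add(b);
        graph.computeIfAbsent(b, key -> new LinkedHashSet<>()).add(a);
    }

    // Assigns each variable the lowest register (0..k-1) not used by an already
    // colored neighbour; -1 marks a variable that would have to be spilled.
    static Map<String, Integer> color(Map<String, Set<String>> graph, int k) {
        Map<String, Integer> register = new LinkedHashMap<>();
        for (String v : graph.keySet()) {
            boolean[] used = new boolean[k];
            for (String n : graph.get(v)) {
                Integer r = register.get(n);
                if (r != null && r >= 0) used[r] = true;
            }
            int choice = -1;
            for (int r = 0; r < k; r++) {
                if (!used[r]) { choice = r; break; }
            }
            register.put(v, choice);
        }
        return register;
    }

    // Example interference graph: a-b and b-c interfere, a and c do not.
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        addEdge(graph, "a", "b");
        addEdge(graph, "b", "c");
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(color(example(), 2));   // prints: {a=0, b=1, c=0}
    }
}
```

In the example, a and c are never live at the same time, so they share register 0. With only one register available, b cannot be colored and is marked as a spill candidate.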



Traditional three-pass compiler

• analyzes and changes IR
• goal is to reduce runtime
• must preserve values


Optimizer (middle end)

Modern optimizers are usually built as a set of passes

• constant propagation and folding
• code motion
• reduction of operator strength
• common sub-expression elimination
• redundant store elimination
• dead code elimination
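
One of these passes, constant folding, can be sketched on a small expression tree. The classes below (ConstantFolding, Num, Var, Bin) are hypothetical and exist only for this illustration; the pass rewrites the IR while preserving the values it computes. Division is deliberately left unfolded here so that a division by zero is not evaluated at compile time.

```java
// Hypothetical expression IR and folding pass, for illustration only.
class ConstantFolding {
    static abstract class Exp {}
    static class Num extends Exp {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }
    static class Var extends Exp {
        final String name;
        Var(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }
    static class Bin extends Exp {
        final char op; final Exp left, right;
        Bin(char op, Exp left, Exp right) { this.op = op; this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    // Returns an equivalent tree in which operations on two constants are precomputed.
    static Exp fold(Exp e) {
        if (!(e instanceof Bin)) return e;        // constants and variables are already minimal
        Bin b = (Bin) e;
        Exp l = fold(b.left), r = fold(b.right);  // fold bottom-up
        if (l instanceof Num && r instanceof Num) {
            int x = ((Num) l).value, y = ((Num) r).value;
            switch (b.op) {
                case '+': return new Num(x + y);
                case '-': return new Num(x - y);
                case '*': return new Num(x * y);
                default: break;                   // '/' and unknown operators stay as they are
            }
        }
        return new Bin(b.op, l, r);
    }

    // (2 * 3) + x
    static Exp example() {
        return new Bin('+', new Bin('*', new Num(2), new Num(3)), new Var("x"));
    }

    public static void main(String[] args) {
        System.out.println(example() + "  =>  " + fold(example()));   // ((2 * 3) + x)  =>  (6 + x)
    }
}
```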

The MiniJava compiler


Compiler phases
Lex: Break source file into individual words, or tokens
Parse: Analyse the phrase structure of program
Parsing Actions: Build a piece of abstract syntax tree for each phrase
Semantic Analysis: Determine what each phrase means, relate uses of variables to their definitions, check types of expressions, request translation of each phrase
Frame Layout: Place variables, function parameters, etc., into activation records (stack frames) in a machine-dependent way
Translate: Produce intermediate representation trees (IR trees), a notation that is not tied to any particular source language or target machine
Canonicalize: Hoist side effects out of expressions, and clean up conditional branches, for convenience of later phases
Instruction Selection: Group IR-tree nodes into clumps that correspond to actions of target-machine instructions
Control Flow Analysis: Analyse sequence of instructions into control flow graph showing all possible flows of control program might follow when it runs
Data Flow Analysis: Gather information about flow of data through variables of program; e.g., liveness analysis calculates places where each variable holds a still-needed (live) value
Register Allocation: Choose registers for variables and temporary values; variables not simultaneously live can share same register
Code Emission: Replace temporary names in each machine instruction with registers


A straight-line programming language (no loops or conditionals):

Stm → Stm ; Stm             CompoundStm
Stm → id := Exp             AssignStm
Stm → print ( ExpList )     PrintStm
Exp → id                    IdExp
Exp → num                   NumExp
Exp → Exp Binop Exp         OpExp
Exp → ( Stm , Exp )         EseqExp
ExpList → Exp , ExpList     PairExpList
ExpList → Exp               LastExpList
Binop → +                   Plus
Binop → -                   Minus
Binop → *                   Times
Binop → /                   Div

a := 5 + 3; b := (print(a, a-1), 10*a); print(b)

prints
8 7
80

Tree representation
a := 5 + 3; b := (print(a, a-1), 10*a); print(b)


Java classes for trees

abstract class Stm {}
class CompoundStm extends Stm {
  Stm stm1, stm2;
  CompoundStm(Stm s1, Stm s2)
    {stm1=s1; stm2=s2;}
}
class AssignStm extends Stm {
  String id; Exp exp;
  AssignStm(String i, Exp e)
    {id=i; exp=e;}
}
class PrintStm extends Stm {
  ExpList exps;
  PrintStm(ExpList e) {exps=e;}
}

abstract class Exp {}
class IdExp extends Exp {
  String id;
  IdExp(String i) {id=i;}
}
class NumExp extends Exp {
  int num;
  NumExp(int n) {num=n;}
}
class OpExp extends Exp {
  Exp left, right; int oper;
  final static int Plus=1,Minus=2,Times=3,Div=4;
  OpExp(Exp l, int o, Exp r)
    {left=l; oper=o; right=r;}
}
class EseqExp extends Exp {
  Stm stm; Exp exp;
  EseqExp(Stm s, Exp e) {stm=s; exp=e;}
}

abstract class ExpList {}
class PairExpList extends ExpList {
  Exp head; ExpList tail;
  public PairExpList(Exp h, ExpList t)
    {head=h; tail=t;}
}
class LastExpList extends ExpList {
  Exp head;
  public LastExpList(Exp h) {head=h;}
}
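
To show how these tree classes might be used, the sketch below builds the tree for a := 5 + 3; b := (print(a, a-1), 10*a); print(b) and runs a small interpreter over it. It is an illustration under stated assumptions: the wrapper class StraightLine, the methods exec, eval, sample and run, and the mutable variable table are all additions for this example, and the tree classes are repeated in compact form as nested classes so that the block is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interpreter for the straight-line language; names are illustrative.
class StraightLine {
    // Compact copies of the tree classes (fields and constructors only).
    static abstract class Stm {}
    static class CompoundStm extends Stm { Stm stm1, stm2; CompoundStm(Stm a, Stm b) {stm1=a; stm2=b;} }
    static class AssignStm extends Stm { String id; Exp exp; AssignStm(String i, Exp e) {id=i; exp=e;} }
    static class PrintStm extends Stm { ExpList exps; PrintStm(ExpList e) {exps=e;} }
    static abstract class Exp {}
    static class IdExp extends Exp { String id; IdExp(String i) {id=i;} }
    static class NumExp extends Exp { int num; NumExp(int n) {num=n;} }
    static class OpExp extends Exp {
        static final int Plus=1, Minus=2, Times=3, Div=4;
        Exp left, right; int oper;
        OpExp(Exp l, int o, Exp r) {left=l; oper=o; right=r;}
    }
    static class EseqExp extends Exp { Stm stm; Exp exp; EseqExp(Stm s, Exp e) {stm=s; exp=e;} }
    static abstract class ExpList {}
    static class PairExpList extends ExpList { Exp head; ExpList tail; PairExpList(Exp h, ExpList t) {head=h; tail=t;} }
    static class LastExpList extends ExpList { Exp head; LastExpList(Exp h) {head=h;} }

    final Map<String, Integer> env = new HashMap<>();   // current value of each variable
    final StringBuilder out = new StringBuilder();       // collected output of print statements

    void exec(Stm s) {
        if (s instanceof CompoundStm) {
            exec(((CompoundStm) s).stm1);
            exec(((CompoundStm) s).stm2);
        } else if (s instanceof AssignStm) {
            AssignStm a = (AssignStm) s;
            env.put(a.id, eval(a.exp));
        } else {                                          // PrintStm: values separated by spaces
            ExpList l = ((PrintStm) s).exps;
            while (l instanceof PairExpList) {
                PairExpList p = (PairExpList) l;
                out.append(eval(p.head)).append(' ');
                l = p.tail;
            }
            out.append(eval(((LastExpList) l).head)).append('\n');
        }
    }

    int eval(Exp e) {
        if (e instanceof IdExp) return env.get(((IdExp) e).id);   // fails on an undefined variable
        if (e instanceof NumExp) return ((NumExp) e).num;
        if (e instanceof EseqExp) {                               // run the statement, then the expression
            exec(((EseqExp) e).stm);
            return eval(((EseqExp) e).exp);
        }
        OpExp o = (OpExp) e;
        int l = eval(o.left), r = eval(o.right);
        switch (o.oper) {
            case OpExp.Plus:  return l + r;
            case OpExp.Minus: return l - r;
            case OpExp.Times: return l * r;
            default:          return l / r;
        }
    }

    // a := 5 + 3; b := (print(a, a-1), 10*a); print(b)
    static Stm sample() {
        return new CompoundStm(
            new AssignStm("a", new OpExp(new NumExp(5), OpExp.Plus, new NumExp(3))),
            new CompoundStm(
                new AssignStm("b", new EseqExp(
                    new PrintStm(new PairExpList(new IdExp("a"),
                        new LastExpList(new OpExp(new IdExp("a"), OpExp.Minus, new NumExp(1))))),
                    new OpExp(new NumExp(10), OpExp.Times, new IdExp("a")))),
                new PrintStm(new LastExpList(new IdExp("b")))));
    }

    static String run(Stm s) {
        StraightLine interpreter = new StraightLine();
        interpreter.exec(s);
        return interpreter.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(sample()));   // prints "8 7" and then "80"
    }
}
```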


What you should know!

• What is the difference between a compiler and an interpreter?
• What are important qualities of compilers?
• Why are compilers commonly split into multiple passes?
• What are the typical responsibilities of the different parts of a modern compiler?
• How are context-free grammars specified?
• What is “abstract” about an abstract syntax tree?
• What is intermediate representation and what is it for?
• Why is optimization a separate activity?


Can you answer these questions?

• Is Java compiled or interpreted? What about Smalltalk? Ruby? PHP? Are you sure?
• What are the key differences between modern compilers and compilers written in the 1970s?
• Why is it hard for compilers to generate good error messages?
• What is “context-free” about a context-free grammar?
