Debre Markos University Burie Campus Department of Computer Science
Compiler Design
Banchalem A(MSC)
Outline
Introduction
Programs related to compiler
The translation process
Analysis
• Lexical analysis
• Syntax analysis
• Semantic analysis
Synthesis
• IC generator
• IC optimizer
• Code generator
• Code optimizer
Phases of a compiler
Major data structures in a compiler
Compiler construction tools
Introduction
What is a compiler?
A compiler is a program that reads a program written in one language (the
source language) and translates it into an equivalent
program in another language (the target language).
Why do we design compilers?
Why do we study compiler construction techniques?
• Compilers provide an essential interface between
applications and architectures
• Compilers embody a wide range of theoretical techniques
[Figure: a source program in a high-level language is fed to the compiler, which produces error messages and a target program in assembly or machine language; the target program (exe) then turns input into output]
Introduction…
Using a high-level language for programming has a large
impact on how fast programs can be developed.
[Figure: compilers targeting Unix, Win, and Mac]
Programs related to compilers
Interpreter
Is a program that reads a source program and executes it
Works by analyzing and executing the source program
commands one at a time
Does not translate the whole source program into object
code
Interpretation is important when:
Programmer is working in interactive mode and needs to view
and update variables
Running speed is not important
Commands have simple formats, and thus can be quickly
analyzed and executed
Modification or addition to user programs is required as
execution proceeds
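To make the "one command at a time" idea concrete, here is a minimal sketch of an interpreter. The three-command language (`set`, `add`, `print`) is hypothetical and invented only for this illustration; a real interpreter would handle a far richer language.

```python
def interpret(source: str) -> dict:
    """Analyze and execute a tiny, made-up command language line by line.

    Assumed commands: `set NAME INT`, `add NAME INT`, `print NAME`.
    No object code is produced; each command runs as soon as it is read.
    """
    env = {}  # variable name -> current value
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op, args = parts[0], parts[1:]
        if op == "set":
            env[args[0]] = int(args[1])
        elif op == "add":
            env[args[0]] += int(args[1])
        elif op == "print":
            print(env[args[0]])
        else:
            raise ValueError(f"unknown command: {op}")
    return env


program = "set x 4\nadd x 2\nprint x"
final_env = interpret(program)  # prints 6
```

Because the variables live in `env` while the program runs, they can be viewed and updated between commands, which is the interactive use case listed above.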
Programs related to compilers
Interpreter and compiler
[Figure: source code is compiled into intermediate code, which an interpreter then processes]
Programs related to compilers
Interpreter and compiler differences
E.g., Compiling Java Programs
The Java compiler produces bytecode, not machine code
Bytecode is converted into machine code at run time by a Java
interpreter
You can run bytecode on any computer that has a Java
interpreter installed
[Figure: a Java program is translated by the compiler into Java bytecode, which is run by an interpreter on Win, Mac, or Unix]
Android and Java
Programs related to compiler…
Assemblers
Translator for the assembly language.
Assembly code is translated into machine code
Output is relocatable machine code.
Linker
Links object files separately compiled or
assembled
Links object files to standard library functions
Generates a file that can be loaded and executed
Programs related to compiler…
Loader
Loads the executable code, which is the output
of the linker, into main memory.
Pre-processors
A pre-processor is a separate program that is called by
the compiler before actual translation begins.
Such a pre-processor:
• produces input to a compiler
• can delete comments
• performs macro processing (substitutions)
• can include other files
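Two of these tasks, comment deletion and macro substitution, can be sketched in a few lines. This is a deliberately naive illustration that assumes C-style comments and simple object-like macros; the function name `preprocess` is hypothetical, and the sketch ignores complications such as comment markers inside string literals.

```python
import re


def preprocess(source: str, macros: dict[str, str]) -> str:
    """Return `source` with comments deleted and macros substituted."""
    # Delete /* ... */ block comments, then // line comments.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)
    # Replace each macro name wherever it appears as a whole word.
    for name, replacement in macros.items():
        pattern = rf"\b{re.escape(name)}\b"
        # A function is used so the replacement text is inserted literally.
        source = re.sub(pattern, lambda _m, r=replacement: r, source)
    return source


cleaned = preprocess("int a[SIZE]; // the array\n", {"SIZE": "10"})
```

The result is ordinary source text, which is what the compiler proper then receives as its input.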
Programs related to compiler
[Figure: C or C++ program → Preprocessor → Compiler → Assembly code → Assembler → Relocatable object module → Linker (together with other relocatable object modules or library modules) → Executable code → Loader → Absolute machine code]
The translation process
A compiler consists internally of a number of steps,
or phases, that perform distinct logical operations.
The phases of a compiler are shown in the next slide,
together with three auxiliary components that interact
with some or all of the phases:
The symbol table,
the literal table,
and error handler.
The translation process…
[Figure: phases of a compiler. Source code → Scanner → Tokens → Parser → Syntax tree → Semantic analyzer → Annotated tree → Intermediate code generator → Intermediate code → Intermediate code optimizer → Intermediate code → Target code generator → Target code → Target code optimizer → Target code. The literal table, symbol table, and error handler interact with the phases.]
Analysis and Synthesis
Analysis (front end)
Breaks up the source program into constituent pieces and
creates an intermediate representation of the source
program.
During analysis, the operations implied by the source
program are determined and recorded in a hierarchical
structure called a tree.
Synthesis (back end)
The synthesis part constructs the desired program from the
intermediate representation.
Analysis of the source program
1. Lexical analysis or Scanning
The stream of characters making up the source program is
read from left to right and is grouped into tokens.
A token is a sequence of characters having a collective
meaning.
A lexical analyzer, also called a lexer or a scanner,
receives a stream of characters from the source program and
groups them into tokens.
Examples of tokens:
• Identifiers
• Keywords
• Symbols (+, -, …)
• Numbers …
[Figure: source program → lexical analyzer → stream of tokens]
Blanks, newlines, and tabs are removed during lexical analysis.
Lexical analysis or Scanning…
Example
a[index] = 4 + 2;
a        identifier
[        left bracket
index    identifier
]        right bracket
=        assignment operator
4        number
+        plus operator
2        number
;        semicolon
A scanner may perform other operations along with the
recognition of tokens.
• It may enter identifiers into the symbol table, and
• It may enter literals into the literal table.
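The token list for a[index] = 4 + 2; can be reproduced with a small scanner. The sketch below is one possible approach, not the method of any particular compiler: it assumes just the seven token kinds that appear in the example, and the names `TOKEN_SPEC` and `scan` are made up for the illustration.

```python
import re

# (group name, regular expression, description used in the example above)
TOKEN_SPEC = [
    ("NUMBER", r"\d+", "number"),
    ("ID", r"[A-Za-z_]\w*", "identifier"),
    ("LBRACKET", r"\[", "left bracket"),
    ("RBRACKET", r"\]", "right bracket"),
    ("ASSIGN", r"=", "assignment operator"),
    ("PLUS", r"\+", "plus operator"),
    ("SEMI", r";", "semicolon"),
    ("SKIP", r"\s+", None),  # blanks, newlines, tabs: removed
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p, _ in TOKEN_SPEC))
KIND = {name: description for name, _, description in TOKEN_SPEC}


def scan(text):
    """Group the characters of `text` into (lexeme, kind) tokens, left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.group(), KIND[match.lastgroup]))
        pos = match.end()
    return tokens


tokens = scan("a[index] = 4 + 2;")
```

A fuller scanner would also enter each identifier into the symbol table and each literal into the literal table at the point where the token is appended.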
Lexical Analysis Tools
2. Syntax analysis or Parsing
Syntax analysis or Parsing…
Ex. Consider again the line of C code: a[index] = 4 + 2
Syntax analysis or Parsing…
Sometimes syntax trees are called abstract syntax trees, since
they represent a further abstraction from parse trees. An example is
shown in the following figure.
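As a rough illustration of how a parser turns tokens into an abstract syntax tree, the sketch below parses a[index] = 4 + 2 by recursive descent. The grammar (an assignment whose operands are numbers, identifiers, or subscripted identifiers joined by +) and the nested-tuple tree format are assumptions made for this example only.

```python
import re


def tokenize(text):
    # Simplified scanner: characters outside these patterns are silently skipped.
    return re.findall(r"\d+|[A-Za-z_]\w*|[\[\]=+;]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def statement(self):
        # statement -> operand '=' expr [';']
        target = self.operand()
        if target[0] == "num":
            raise SyntaxError("cannot assign to a number")
        self.expect("=")
        value = self.expr()
        if self.peek() == ";":
            self.pos += 1
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return ("assign", target, value)

    def expr(self):
        # expr -> operand ('+' operand)*
        node = self.operand()
        while self.peek() == "+":
            self.pos += 1
            node = ("add", node, self.operand())
        return node

    def operand(self):
        # operand -> NUMBER | ID | ID '[' expr ']'
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if token.isdigit():
            return ("num", int(token))
        if token[0].isalpha() or token[0] == "_":
            node = ("id", token)
            if self.peek() == "[":
                self.pos += 1
                index = self.expr()
                self.expect("]")
                node = ("subscript", node, index)
            return node
        raise SyntaxError(f"unexpected token {token!r}")


tree = Parser(tokenize("a[index] = 4 + 2;")).statement()
```

The resulting tree keeps only the assignment, the subscript, and the addition; the brackets and the semicolon guided the parse but do not appear as nodes, which is the sense in which the tree is abstract.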
Syntax Analysis Tools
3. Semantic analysis
The semantics of a program are its meaning as opposed
to syntax or structure
The semantics consist of:
Runtime semantics – behavior of program at runtime
Static semantics – checked by the compiler
Static semantics include:
Declarations of variables and constants before use
Calling functions that exist (predefined in a library or defined by
the user)
Passing parameters properly
Type checking.
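Two of these static checks, declaration before use and type checking, can be sketched as a walk over a syntax tree. The tuple-based tree format, the type names ("integer", "array of integer"), and the function name `check` are assumptions made for this illustration.

```python
def check(node, symbols):
    """Return the type of `node`; raise NameError or TypeError on a static error.

    `symbols` maps each declared name to its type.
    """
    kind = node[0]
    if kind == "num":
        return "integer"
    if kind == "id":
        if node[1] not in symbols:
            raise NameError(f"{node[1]} is used before it is declared")
        return symbols[node[1]]
    if kind == "subscript":
        if check(node[1], symbols) != "array of integer":
            raise TypeError("subscripted value is not an array")
        if check(node[2], symbols) != "integer":
            raise TypeError("array index must be an integer")
        return "integer"
    if kind == "add":
        if check(node[1], symbols) != "integer" or check(node[2], symbols) != "integer":
            raise TypeError("operands of + must be integers")
        return "integer"
    if kind == "assign":
        target_type = check(node[1], symbols)
        if check(node[2], symbols) != target_type:
            raise TypeError("assigned value does not match the target type")
        return target_type
    raise ValueError(f"unknown node kind: {kind}")


# Tree for a[index] = 4 + 2
tree = ("assign",
        ("subscript", ("id", "a"), ("id", "index")),
        ("add", ("num", 4), ("num", 2)))
declared = {"a": "array of integer", "index": "integer"}
result_type = check(tree, declared)
```

The type returned for each node is the kind of information that a semantic analyzer attaches to the tree as an annotation.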
[Figure: annotated syntax tree, with nodes annotated by type (integer)]
Synthesis of the target program
Code Improvement
Code improvement techniques can be applied to:
Intermediate code – independent of the target machine
Target code – dependent on the target machine
[Figure: abstract syntax → intermediate code generator → intermediate code]
IC optimizer…
There are several techniques for optimizing code;
they will be discussed in the forthcoming
chapters.
x = y+1;
for(i=1; i<10; i++)
z = x+i;
IC optimizer…
In our previous example, we have included an opportunity
for source-level optimization; namely, the expression 4 + 2
can be precomputed by the compiler to the result 6 (this
particular optimization is called constant folding).
This optimization can be performed directly on the syntax
tree as shown below.
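Constant folding on a tree can be sketched as a bottom-up rewrite. The nested-tuple tree format and the function name `fold` are assumptions for this illustration, and only addition is handled.

```python
def fold(node):
    """Return a copy of `node` in which additions of two constants are precomputed."""
    kind = node[0]
    if kind in ("num", "id"):
        return node  # leaves are already as simple as they can be
    # Fold the children first, so nested constant expressions collapse too.
    children = tuple(fold(child) for child in node[1:])
    if kind == "add" and children[0][0] == "num" and children[1][0] == "num":
        return ("num", children[0][1] + children[1][1])
    return (kind,) + children


# Tree for a[index] = 4 + 2
tree = ("assign",
        ("subscript", ("id", "a"), ("id", "index")),
        ("add", ("num", 4), ("num", 2)))
folded = fold(tree)
```

After folding, the right-hand side of the assignment is the single node ("num", 6), and the subscript on the left is unchanged.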
IC optimizer…
Many optimizations can be performed directly on the tree.
However, in a number of cases, it is easier to optimize a linearized
form of the tree that is closer to assembly code.
A standard choice is three-address code, so called because it
contains the addresses of up to three locations in memory.
In our example, three-address code for the original C expression
might look like this:
t1 = 2
t2 = 4 + t1
a[index] = t2
Now the optimizer would improve this code in two steps, first
computing the result of the addition
t2 = 6
a[index] = t2
and then replacing t2 by its value to get the three-address statement
a[index] = 6
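The linearization and the two improvement steps can be sketched as follows. This is an illustration under stated assumptions: the tree is a nested tuple, temporaries are named t1, t2, and so on, only + is supported, and the generator emits a slightly more compact sequence (t1 = 4 + 2) than the three lines shown above. The function names `to_tac` and `optimize` are hypothetical.

```python
from itertools import count


def to_tac(tree):
    """Flatten an ("assign", target, value) tree into three-address statements."""
    code = []
    temps = count(1)

    def operand(node):
        kind = node[0]
        if kind == "num":
            return str(node[1])
        if kind == "id":
            return node[1]
        if kind == "subscript":
            return f"{operand(node[1])}[{operand(node[2])}]"
        if kind == "add":
            left, right = operand(node[1]), operand(node[2])
            temp = f"t{next(temps)}"
            code.append(f"{temp} = {left} + {right}")
            return temp
        raise ValueError(f"unknown node kind: {kind}")

    target = operand(tree[1])
    value = operand(tree[2])
    code.append(f"{target} = {value}")
    return code


def optimize(code):
    """Compute constant additions, then replace temporaries by their values."""
    values = {}  # temporary name -> known constant
    out = []
    for line in code:
        dest, expr = line.split(" = ")
        parts = [values.get(p, p) for p in expr.split(" + ")]
        if all(p.isdigit() for p in parts):
            result = str(sum(int(p) for p in parts))
            if dest.startswith("t") and dest[1:].isdigit():
                values[dest] = result  # remember it and drop the statement
                continue
            out.append(f"{dest} = {result}")
        else:
            out.append(f"{dest} = {' + '.join(parts)}")
    return out


tree = ("assign",
        ("subscript", ("id", "a"), ("id", "index")),
        ("add", ("num", 4), ("num", 2)))
tac = to_tac(tree)         # ['t1 = 4 + 2', 'a[index] = t1']
improved = optimize(tac)   # ['a[index] = 6']
```

Working on a flat list of statements rather than on the tree makes it easy to remember what each temporary holds and to delete statements that are no longer needed.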
Code generator
The machine code generator receives the (optimized) intermediate
code, and then it produces either:
Machine code for a specific machine, or
Assembly code for a specific machine and assembler.
The code generator:
• Selects appropriate machine instructions
• Allocates memory locations for variables
• Allocates registers for intermediate computations
Code generator…
The code generator takes the IR code and generates code for the
target machine.
Here we will write target code in assembly language: a[index]=6
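A code generator for this one statement might be sketched as below. Everything about the target is an assumption chosen for illustration rather than a real instruction set: the mnemonics MOV, MUL, and ADD, the registers R0 and R1, the notation &a for "address of a" and *R1 for "the location R1 points to", and an element size of 2 bytes.

```python
def gen_array_store(array, index, value, elem_size=2):
    """Emit pseudo-assembly for `array[index] = value` on a hypothetical machine."""
    return [
        f"MOV R0, {index}",      # value of index -> R0
        f"MUL R0, {elem_size}",  # scale the index by the element size
        f"MOV R1, &{array}",     # address of the array -> R1
        "ADD R1, R0",            # address of array[index] -> R1
        f"MOV *R1, {value}",     # store the constant at that address
    ]


target_code = gen_array_store("a", "index", 6)
for instruction in target_code:
    print(instruction)
```

The sketch shows the decisions listed earlier in miniature: which instructions to select, where the variable lives (&a), and which registers hold the intermediate address computation.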
The target code optimizer
In this phase, the compiler attempts to improve the
target code generated by the code generator.
Such improvement includes:
• Choosing addressing modes to improve performance
• Replacing slow instruction by faster ones
• Eliminating redundant or unnecessary operations
In the sample target code, one improvement is to use a shift instruction to
replace the multiplication in the second instruction.
Another is to use a more powerful addressing mode, such as
indexed addressing, to perform the array store.
With these two optimizations, our target code becomes:
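Both improvements can be sketched as small peephole passes over a list of instructions. The five-instruction input and its pseudo-assembly syntax (MOV, MUL, ADD, SHL, &a, *R1, and the indexed form &a[R0]) are assumptions made for this illustration and do not correspond to a real machine.

```python
import re


def strength_reduce(code):
    """Replace a multiplication by a power of two with a left shift."""
    out = []
    for instruction in code:
        match = re.fullmatch(r"MUL (\w+), (\d+)", instruction)
        n = int(match.group(2)) if match else 0
        if n > 0 and n & (n - 1) == 0:  # n is a power of two
            out.append(f"SHL {match.group(1)}, {n.bit_length() - 1}")
        else:
            out.append(instruction)
    return out


def use_indexed_addressing(code):
    """Fuse 'load base address; add offset; store' into one indexed store.

    Assumes the base register is not needed after the store.
    """
    out = list(code)
    i = 0
    while i + 2 < len(out):  # slide a three-instruction window
        window = "\n".join(out[i:i + 3])
        match = re.fullmatch(
            r"MOV (\w+), &(\w+)\nADD \1, (\w+)\nMOV \*\1, (\w+)", window
        )
        if match:
            array, offset, value = match.group(2), match.group(3), match.group(4)
            out[i:i + 3] = [f"MOV &{array}[{offset}], {value}"]
        else:
            i += 1
    return out


sample = [
    "MOV R0, index",
    "MUL R0, 2",
    "MOV R1, &a",
    "ADD R1, R0",
    "MOV *R1, 6",
]
optimized = use_indexed_addressing(strength_reduce(sample))
```

On this input the five instructions shrink to three: MOV R0, index, then SHL R0, 1, then MOV &a[R0], 6.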
Grouping of phases
Compiler passes:
A pass consists of reading an input file and writing an output file.
Several phases may be grouped in one pass.
For example, the front-end phases of lexical analysis, syntax
analysis, semantic analysis, and intermediate code generation
might be grouped together into one pass.
Grouping of phases…
Single pass
A single-pass compiler passes through the source code of
each compilation unit only once.
A one-pass compiler does not "look back" at code it
previously processed.
One-pass compilers are generally faster than multi-pass
compilers, but they often cannot generate efficient programs,
due to the limited scope available.
Multi pass
A multi-pass compiler processes the source code or
abstract syntax tree of a program several times.
A collection of phases is done multiple times.
Major Data Structures in a Compiler
Token
Represented by an integer value or an enumeration
literal
Sometimes, it is necessary to preserve the string of
characters that was scanned
For example, the name of an identifier or the value of a
literal
Syntax Tree
Constructed as a pointer-based structure
Symbol Table
Keeps information associated with identifiers: functions, variables, constants, and data types.
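A symbol table is commonly built on hash tables so that insertion and lookup are fast. The sketch below is one possible layout, not a prescribed design: a stack of dictionaries, one per open scope. The class name, the method names, and the attribute names (kind, type) are all hypothetical.

```python
class SymbolTable:
    """A stack of dictionaries, one per open scope (innermost last)."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def insert(self, name, **attributes):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        # Search from the innermost scope outward; None means "not declared".
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("a", kind="variable", type="array of integer")
table.insert("index", kind="variable", type="integer")
table.enter_scope()
table.insert("index", kind="variable", type="float")  # shadows the outer index
inner = table.lookup("index")
table.exit_scope()
outer = table.lookup("index")
```

The scanner, parser, and semantic analyzer can all insert into or query such a table, and the later phases read the stored attributes when choosing instructions and storage.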
Major Data Structures in a Compiler…
Literal Table
Stores constant values and string literals in a
program.
One literal table applies globally to the entire
program.
Used by the code generator to:
• Assign addresses for literals.
Avoids the replication of constants and strings.
Quick insertion and lookup are essential.
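These properties can be sketched with a dictionary plus a list: the dictionary gives quick lookup and prevents duplicates, and the list position serves as a stand-in for the address the code generator would assign. The class and method names are hypothetical.

```python
class LiteralTable:
    """Stores each distinct constant or string literal exactly once."""

    def __init__(self):
        self._index = {}   # (type name, value) -> position
        self._values = []  # position -> value

    def intern(self, value):
        """Return the position of `value`, adding it only if it is new."""
        # The type name keeps 1, 1.0 and "1" apart (1 == 1.0 in Python).
        key = (type(value).__name__, value)
        if key not in self._index:
            self._index[key] = len(self._values)
            self._values.append(value)
        return self._index[key]

    def __len__(self):
        return len(self._values)


literals = LiteralTable()
first = literals.intern("hello")
second = literals.intern(6)
again = literals.intern("hello")  # already present, so no new entry
```

Because the second request for "hello" returns the position of the first, the string is stored once no matter how often it appears in the program.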
Compiler construction tools
Various tools are used in the construction of the
various parts of a compiler.
Scanner generators
Ex. Lex, flex, JLex
These tools generate a scanner (lexical analyzer)
from a set of regular expressions.
Parser Generators
Ex. Yacc, Bison, CUP
These tools produce a parser (syntax analyzer) when
given a context-free grammar (CFG) that
describes the syntax of the source language.
Compiler construction tools…
Syntax directed translation engines
Ex. Cornell Synthesizer Generator
It produces a collection of routines that walk
the parse tree and execute some tasks.
Automatic code generators
Take a collection of rules that define the
translation of the IC to target code and
produce a code generator.
This completes our brief description of the
phases of a compiler.