Compilation Techniques
1. Introduction
Compilers and translators
Translate from a source language to
a destination language. The
translation must preserve the
semantics (meaning) of the application.
The source language is typically a
high-level language (Java, C#, C/C++, …)
The destination language is a low-level
language (machine code for a CPU or a
virtual machine). Sometimes the
destination is also a high-level
language (C++ -> C)
Other applications
Loading data from custom
configuration files
Data validation
Data mining – extract useful info
from documents, web pages, …
Data indexing for classification and
search engines
Compiler phases
Lexical analysis
Syntactic analysis
Semantic analysis
Intermediate code generation
Intermediate code optimization
Code generation
Code optimization
FRONT END: the analysis phases (dependent on the
source language)
BACK END: code generation and code optimization
(dependent on the destination language)
(for operations such as optimization or
analysis on the intermediate code,
the term MIDDLE END is also used)
Lexical analysis
Splits the input text into lexical
tokens
Lexical token – an indivisible
(atomic) unit of information for
the next phases
Information which is not used in
the later phases is discarded
(spaces, comments)
len=2*PI*r; //circumference
ID:len ASSIGN INT:2 MUL ID:PI MUL ID:r SEMICOLON
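The splitting described above can be sketched with a small regular-expression tokenizer. This is only an illustration: the token names ID, ASSIGN, INT and MUL follow the example, while SEMICOLON, the regular expressions and the helper names `tokenize` and `show` are assumptions made for this sketch.

```python
import re

# ID, ASSIGN, INT and MUL follow the slide example.
# SEMICOLON and all regular expressions are assumptions of this sketch.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),    # discarded: not used by later phases
    ("SPACE", r"\s+"),           # discarded: not used by later phases
    ("INT", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("MUL", r"\*"),
    ("SEMICOLON", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, value) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = m.lastgroup
        if kind not in ("COMMENT", "SPACE"):
            # Only identifiers and numbers carry a value.
            value = m.group() if kind in ("ID", "INT") else None
            tokens.append((kind, value))
        pos = m.end()
    return tokens


def show(tokens):
    """Format tokens in the KIND:value notation used on the slide."""
    return " ".join(k if v is None else f"{k}:{v}" for k, v in tokens)


print(show(tokenize("len=2*PI*r; //circumference")))
```

Note that the comment and the space disappear from the output, since later phases do not need them.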
Syntactic analysis
Combines tokens into syntactic constructions
(expressions, statements)
Usually the result can be represented as a
tree (Abstract Syntax Tree – AST) or in a
LISP-like form (first the parent and after
that its children)
Assign
    len
    Multiply
        Multiply
            2
            PI
        r
(Assign len (Multiply (Multiply 2 PI) r))
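A minimal sketch of how such a tree could be built from the token stream is given below. It assumes a tiny grammar that is not stated in the slides (one assignment whose right side is a chain of left-associative multiplications, ended by a SEMICOLON token), and the names `parse_assignment` and `to_lisp` are hypothetical.

```python
def parse_assignment(tokens):
    """Parse: ID ASSIGN operand (MUL operand)* SEMICOLON (assumed grammar)."""
    pos = 0

    def expect(kind):
        # Consume one token of the given kind or report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        value = tokens[pos][1]
        pos += 1
        return value

    def operand():
        if pos < len(tokens) and tokens[pos][0] == "INT":
            return int(expect("INT"))
        return expect("ID")

    target = expect("ID")
    expect("ASSIGN")
    node = operand()
    # Left-associative: a*b*c becomes (Multiply (Multiply a b) c).
    while pos < len(tokens) and tokens[pos][0] == "MUL":
        expect("MUL")
        node = ("Multiply", node, operand())
    expect("SEMICOLON")
    return ("Assign", target, node)


def to_lisp(node):
    """Print the parent first, then its children."""
    if isinstance(node, tuple):
        return "(" + " ".join([node[0]] + [to_lisp(c) for c in node[1:]]) + ")"
    return str(node)


tokens = [("ID", "len"), ("ASSIGN", None), ("INT", "2"), ("MUL", None),
          ("ID", "PI"), ("MUL", None), ("ID", "r"), ("SEMICOLON", None)]
ast = parse_assignment(tokens)
print(to_lisp(ast))  # (Assign len (Multiply (Multiply 2 PI) r))
```

The AST is stored here as nested tuples, which keeps the sketch short; a real compiler would more likely use dedicated node classes that also record source positions.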
Semantic analysis
Checks the proper usage of the
identifiers: if they are declared,
multiple declarations, scope, visibility –
domain analysis
Determines the type and the correctness
of the expressions based on their
input/output types – type analysis
All symbols are stored in the symbol table
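Both analyses can be sketched over the tuple-based tree of the running example. Everything in this sketch is an assumption for illustration: the class and function names, the two type names "int" and "double", and the rule that assigning a double to an int is rejected.

```python
class SemanticError(Exception):
    """Raised for domain or type errors (hypothetical name)."""


class SymbolTable:
    """Stack of scopes; the innermost scope is the last element."""

    def __init__(self):
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        # Domain analysis: no multiple declarations in the same scope.
        if name in self.scopes[-1]:
            raise SemanticError(f"'{name}' is already declared in this scope")
        self.scopes[-1][name] = type_name

    def lookup(self, name):
        # Domain analysis: an identifier must be declared in a visible scope.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise SemanticError(f"'{name}' is not declared")


def type_of(node, symbols):
    """Type analysis: compute the type of a node from its children's types."""
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "double"
    if isinstance(node, str):
        return symbols.lookup(node)
    op, left, right = node
    left_type = type_of(left, symbols)
    right_type = type_of(right, symbols)
    if op == "Multiply":
        return "double" if "double" in (left_type, right_type) else "int"
    if op == "Assign":
        # Assumed rule: narrowing double -> int is an error.
        if left_type == "int" and right_type == "double":
            raise SemanticError("cannot assign double to int without a cast")
        return left_type
    raise SemanticError(f"unknown operation '{op}'")


symbols = SymbolTable()
for name in ("len", "PI", "r"):
    symbols.declare(name, "double")
ast = ("Assign", "len", ("Multiply", ("Multiply", 2, "PI"), "r"))
print(type_of(ast, symbols))  # double
```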
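Intermediate code for len=2*PI*r, before optimization:
PUSH.CT.D 2
PUSH.CT.D 3.14159
MUL.D
PUSH.VAR.D r
MUL.D
POP.D len
After optimization (2*PI computed at compile time):
PUSH.CT.D 6.28318
PUSH.VAR.D r
MUL.D
POP.D len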
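The instruction sequences above can be reproduced with a short sketch of stack-code generation followed by constant folding. The instruction names are taken from the listing; the tuple-based tree, the `CONSTANTS` table that treats PI as a compile-time constant, and the function names are assumptions of this sketch.

```python
# Assumption: PI is known at compile time, as the listing suggests.
CONSTANTS = {"PI": 3.14159}


def fold(node):
    """Constant folding: evaluate Multiply nodes whose operands are constants."""
    if isinstance(node, str):
        return CONSTANTS.get(node, node)
    if isinstance(node, tuple) and node[0] == "Multiply":
        left, right = fold(node[1]), fold(node[2])
        if isinstance(left, (int, float)) and isinstance(right, (int, float)):
            return left * right
        return ("Multiply", left, right)
    return node


def gen_expr(node, out):
    """Emit stack-machine code: operands first, then the operation."""
    if isinstance(node, (int, float)):
        out.append(f"PUSH.CT.D {node}")
    elif isinstance(node, str):
        if node in CONSTANTS:
            out.append(f"PUSH.CT.D {CONSTANTS[node]}")
        else:
            out.append(f"PUSH.VAR.D {node}")
    else:
        gen_expr(node[1], out)
        gen_expr(node[2], out)
        out.append("MUL.D")


def gen_assign(node, optimize=False):
    _, target, expr = node
    if optimize:
        expr = fold(expr)
    out = []
    gen_expr(expr, out)
    out.append(f"POP.D {target}")  # store the result into the target
    return out


ast = ("Assign", "len", ("Multiply", ("Multiply", 2, "PI"), "r"))
print("\n".join(gen_assign(ast)))
print("--- optimized ---")
print("\n".join(gen_assign(ast, optimize=True)))
```

Folding is done on the tree before any code is emitted, so the generator itself stays unchanged; an optimizer could equally work on the emitted instruction list.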
Error handling
At each phase errors can appear, due
to invalid source code or from internal
conditions such as insufficient memory
A compiler must give descriptive
messages, and for this additional information
must be preserved: file name, line, …
error: syntax error
vs
error in “1.c”, line 59: missing ( after for
In case of errors the compiler can stop
or it can try to recover from that error
in order to also detect other
possible errors in the same compilation
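The two ideas above, keeping the file name and line with every message and continuing after an error, can be sketched with a toy checker. It is not a real parser: it only looks for the keyword `for` not followed by `(`, it would also match `for` inside comments or strings, and the names `Diagnostic` and `check_for_headers` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Diagnostic:
    file: str
    line: int
    message: str

    def __str__(self):
        return f'error in "{self.file}", line {self.line}: {self.message}'


def check_for_headers(file_name, source):
    """Toy check: every 'for' keyword must be followed by '('.

    The checker does not stop at the first problem; it records it and
    goes on, so all errors are reported in one run.
    """
    diagnostics = []
    for line_no, text in enumerate(source.splitlines(), start=1):
        for m in re.finditer(r"\bfor\b", text):
            rest = text[m.end():].lstrip()
            if not rest.startswith("("):
                diagnostics.append(
                    Diagnostic(file_name, line_no, "missing ( after for"))
    return diagnostics


source = "for i = 0;\nx = 1;\nfor (;;) {}\nfor j"
for d in check_for_headers("1.c", source):
    print(d)
```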
Warnings
A compiler can analyse the
semantics of the program and it can
give warning messages which are
not errors but usually denote flaws:
unused variables, dead code, loss
of precision from implicit
conversions, use of
deprecated/unsafe functions, …
Static code analysers – a special
class of software which tries to
evaluate the code in all possible
Tools
There are many tools available to aid in
compiler construction:
Lexical analyzer generators – generate the lexical
analyzer from a given lexical grammar:
(f)lex, antlr, JavaCC
Parser generators – generate the
syntactic analyzer from a given syntactic
grammar: yacc, bison, antlr.
Some tools can also generate an AST
builder from the syntactic grammar: antlr
Compiler backends – receive from the
compiler front end an intermediate code,
optimize it, generate the output code from it
and possibly execute the generated code: