Compiler Construction Notes
Tsan-sheng Hsu
[email protected]
http://www.iis.sinica.edu.tw/~tshsu
What is a compiler?
Definitions:
a recognizer;
a translator.
Compiler construction draws on:
programming languages; machine architecture; language theory; algorithms and data structures; software engineering.
History:
1950: the first FORTRAN compiler took 18 man-years;
now: using software tools, can be done in a few months as a student's project.
Compiler notes #1, 20060224, Tsan-sheng Hsu 2
Applications
Computer language compilers.
Translator: from one format to another.
query interpreter
text formatter
silicon compiler
infix notation to postfix notation: 3 + 5 - 6 * 6 becomes 3 5 + 6 6 * -
pretty printers
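The infix-to-postfix translation can be sketched with an operator stack. This is a minimal illustration, not a method prescribed by these notes: the function name `infix_to_postfix`, the precedence table, and the sample expression are assumptions, and only four left-associative binary operators plus parentheses are handled.

```python
# Assumed precedence table: higher number binds tighter; all operators left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(tokens):
    """Convert a list of infix tokens to a list of postfix tokens."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly as the incoming one.
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operands go straight to the output
    while stack:
        output.append(stack.pop())
    return output

print(" ".join(infix_to_postfix("3 + 5 - 6 * 6".split())))  # 3 5 + 6 6 * -
```

The stack holds operators whose right operand has not been seen yet, which is why a higher-precedence operator is emitted before a pending lower-precedence one.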
Computational theory:
a set of grammar rules is equivalent to the definition of a particular machine;
also equivalent to a set of languages recognized by this machine.
a type of machines: a family of machines with a given set of operations, or capabilities;
power of a type of machines: the set of languages that can be recognized by this type of machines.
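To make the link between a machine and the language it recognizes concrete, here is a small sketch of a finite-state machine written as a transition table. The state names, the character classes, and the choice of identifiers as the recognized language are all assumptions made for illustration.

```python
# Hypothetical machine: accepts strings that start with a letter or "_"
# and continue with letters, "_" or digits (identifier-like strings).
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}
ACCEPTING = {"ident"}

def classify(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

def recognizes(text):
    """Run the machine on text; True if it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING

print(recognizes("y2"), recognizes("2abc"))  # True False
```

The set of all strings for which `recognizes` returns True is the language of this particular machine; changing the table changes the language.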
Scanner
Actions:
Reads characters from the source program;
Groups characters into lexemes, each of which corresponds to a token;
the scanner returns the next token, plus maybe some additional information, to the parser;
The scanner may also discover lexical errors, i.e., erroneous characters.
The definitions of what a lexeme, token or bad character is depend on the definition of the source language.
An arbitrary number of blanks may appear between lexemes.
Erroneous sequences of characters, that are not parts of comments, for the C language:
control characters;
@;
2abc.
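The scanner actions above can be sketched as a loop that tries a list of patterns at the current position. This is a minimal sketch under assumptions: the pattern list covers only a tiny subset of C, and the names `TOKEN_SPEC`, `scan`, and the `ERROR` token are hypothetical.

```python
import re

# Patterns are tried in order; ERROR must come before INT so that 2abc
# is reported as one bad lexeme instead of INT followed by ID.
TOKEN_SPEC = [(name, re.compile(pattern)) for name, pattern in [
    ("SKIP",     r"[ \t\n]+"),          # blanks between lexemes are dropped
    ("ERROR",    r"\d+[A-Za-z_]\w*"),   # digits running into letters, e.g. 2abc
    ("INT",      r"\d+"),
    ("ID",       r"[A-Za-z_]\w*"),
    ("ASSIGN",   r"="),
    ("PLUS",     r"\+"),
    ("TIMES",    r"\*"),
    ("SEMI-COL", r";"),
]]

def scan(source):
    """Return a list of (token, lexeme) pairs for the whole source string."""
    tokens, pos = [], 0
    while pos < len(source):
        for name, regex in TOKEN_SPEC:
            match = regex.match(source, pos)
            if match:
                if name != "SKIP":
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No pattern matched: a bad character such as @ or a control character.
            tokens.append(("ERROR", source[pos]))
            pos += 1
    return tokens

print(scan("index   = 12 * ;"))
print(scan("x = 2abc @"))
```

A real scanner would hand tokens to the parser one at a time; returning the whole list here only keeps the sketch short.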
Parser
Actions:
Group tokens into grammatical phrases, to discover the underlying structure of the source program;
Find syntax errors, e.g., the following C source line:
(Lexeme)  index  =       12   *      ;
(Token)   ID     ASSIGN  INT  TIMES  SEMI-COL
Each token is legal, but * is missing its right operand.
May find some static semantic errors, e.g., use of undeclared variables or multiple declared variables.
May generate code, or build some intermediate representation of the source program, such as an abstract-syntax tree.
interior nodes of the tree are OPERATORS;
a node's children are its OPERANDS;
each subtree forms a logical unit.
the subtree with * at its root shows that * has higher precedence than +: the operation rate * 60 must be performed as a unit, not initial + rate.
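How a parser can build such a tree, and why the multiplication ends up as its own subtree, can be sketched with a small recursive-descent parser. The two-level grammar, the tuple representation of tree nodes, and the name `parse_expression` are assumptions for illustration only.

```python
def parse_expression(tokens):
    """Parse with the assumed grammar
         expr -> term ('+' term)*
         term -> factor ('*' factor)*
    and return a tree of nested (operator, left, right) tuples."""
    pos = 0

    def factor():
        nonlocal pos
        # An operand must be an identifier or a number.
        if pos >= len(tokens) or not tokens[pos].isalnum():
            raise SyntaxError(f"operand expected at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def term():
        nonlocal pos
        node = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_expression("initial + rate * 60".split()))
# ('+', 'initial', ('*', 'rate', '60'))
```

Because `term` is called from inside `expr`, every run of multiplications is grouped before the surrounding addition sees it. A token sequence such as 12 * ; raises `SyntaxError`, since no operand follows the operator.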
Semantic analyzer
Actions:
Check for more static semantic errors, e.g., type errors.
assembly languages.
Example:
[Figure: syntax tree with nodes =, position, +, initial, and a (float) conversion]
temp1 := int-to-float(60)
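A sketch of how a semantic analyzer might insert an int-to-float conversion into the tree and how three-address statements could then be emitted from it. The symbol table, the tuple tree format, and the names `annotate`, `emit`, and `translate` are assumptions; treating a float value assigned to an int variable as a type error is also a simplification made here, not a rule of C.

```python
# Hypothetical symbol table filled in by earlier declarations.
SYMBOLS = {"position": "float", "initial": "float", "rate": "float", "count": "int"}

def annotate(node):
    """Return (typed_tree, type); wrap the int side of a mixed operation."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, str):
        if node not in SYMBOLS:
            raise NameError(f"undeclared variable {node}")
        return node, SYMBOLS[node]
    op, left, right = node
    left, ltype = annotate(left)
    right, rtype = annotate(right)
    if ltype != rtype:
        if ltype == "int":
            left = ("int-to-float", left)
        else:
            right = ("int-to-float", right)
    return (op, left, right), ("float" if "float" in (ltype, rtype) else "int")

def emit(node, code):
    """Append three-address statements to code; return the name holding the value."""
    if not isinstance(node, tuple):
        return str(node)
    if node[0] == "int-to-float":
        text = f"int-to-float({emit(node[1], code)})"
    else:
        op, left, right = node
        left_name = emit(left, code)
        right_name = emit(right, code)
        text = f"{left_name} {op} {right_name}"
    temp = f"temp{len(code) + 1}"  # one new temporary per emitted statement
    code.append(f"{temp} := {text}")
    return temp

def translate(target, expr):
    """Type-check target = expr and return its three-address code."""
    typed, expr_type = annotate(expr)
    target_type = annotate(target)[1]
    if target_type == "float" and expr_type == "int":
        typed = ("int-to-float", typed)
    elif target_type != expr_type:
        raise TypeError(f"cannot assign {expr_type} to {target_type} {target}")
    code = []
    result = emit(typed, code)
    code.append(f"{target} := {result}")
    return code

for line in translate("position", ("+", "initial", ("*", "rate", 60))):
    print(line)
```

For position = initial + rate * 60 this prints temp1 := int-to-float(60), temp2 := rate * temp1, temp3 := initial + temp2, position := temp3, one statement per line.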
Optimizer
Improve the efficiency of intermediate code.
Goal may be to make code run faster, and/or to use the least number of registers.
Example:
Before optimization:
    temp1 := int-to-float(60)
    temp2 := rate * temp1
    temp3 := initial + temp2
    position := temp3
After optimization:
    temp2 := rate * 60.0
    position := initial + temp2
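Two simple improvements, folding a constant conversion at compile time and removing a trailing copy, can be sketched over three-address statements held as tuples. The tuple layout `(dest, op, args)`, the `copy` pseudo-operator, and the assumption that each temporary is used exactly once are choices made for this sketch, not part of the notes.

```python
def optimize(code):
    """code is a list of (dest, op, args) three-address statements."""
    constants = {}  # temporary name -> constant text that replaces it
    result = []
    for dest, op, args in code:
        # Substitute operands whose value is already known.
        args = tuple(constants.get(arg, arg) for arg in args)
        if op == "int-to-float" and args[0].isdigit():
            constants[dest] = str(float(args[0]))  # do the conversion now
            continue
        if (op == "copy" and result and result[-1][0] == args[0]
                and args[0].startswith("temp")):
            # dest := tempN directly after tempN := ...: compute into dest instead.
            # Safe only because temporaries are assumed to be single-use.
            _, prev_op, prev_args = result.pop()
            result.append((dest, prev_op, prev_args))
            continue
        result.append((dest, op, args))
    return result

def show(statement):
    """Format a binary or copy statement as text."""
    dest, op, args = statement
    if op == "copy":
        return f"{dest} := {args[0]}"
    return f"{dest} := {args[0]} {op} {args[1]}"

unoptimized = [
    ("temp1", "int-to-float", ("60",)),
    ("temp2", "*", ("rate", "temp1")),
    ("temp3", "+", ("initial", "temp2")),
    ("position", "copy", ("temp3",)),
]
for statement in optimize(unoptimized):
    print(show(statement))
```

On this input the four statements shrink to temp2 := rate * 60.0 and position := initial + temp2.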
Current trends:
to obtain smaller, but maybe slower, equivalent code for embedded
Code generation
A compiler may generate pure machine codes (machine dependent assembly language) directly, or codes for a virtual machine that are then run by an interpreter.
Example: PASCAL compiled to P-codes.
Advantages:
simplify the job of a compiler;
decrease the size of the generated code;
can be run easily on a variety of platforms: P-machine is an ideal general machine whose interpreter can be written easily;
divide and conquer;
recent example: JAVA and Byte-code.
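Why an interpreter for such an idealized machine is easy to write can be seen from a sketch of a tiny stack machine. The instruction set below (PUSH, LOAD, STORE, ADD, MUL) is invented for illustration and is not the actual P-code or Java byte-code instruction set.

```python
def run(program, memory):
    """Interpret a list of (opcode, operand) pairs using an operand stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "PUSH":        # push a constant
            stack.append(operand)
        elif opcode == "LOAD":      # push the value of a variable
            stack.append(memory[operand])
        elif opcode == "STORE":     # pop the top of the stack into a variable
            memory[operand] = stack.pop()
        elif opcode in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return memory

# position = initial + rate * 60, written for the hypothetical machine.
program = [
    ("LOAD", "initial"),
    ("LOAD", "rate"),
    ("PUSH", 60.0),
    ("MUL", None),
    ("ADD", None),
    ("STORE", "position"),
]
print(run(program, {"initial": 10.0, "rate": 2.0})["position"])  # 130.0
```

Porting this machine to a new platform means rewriting only `run`, while the compiled program stays the same.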
forward addressed objects, i.e., anything that is used before its declaration.
Example: C language
    goto error_handling;
    error_handling:
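One common way to handle objects that are used before they are declared is to make two passes: collect all declarations first, then resolve the uses. The sketch below applies this to labels and gotos; the statement format `(kind, name)`, the function name `resolve_labels`, and the sample label name are assumptions.

```python
def resolve_labels(lines):
    """lines is a list of (kind, name) statements; return them with gotos resolved."""
    # Pass 1: record the position of every label definition.
    labels = {name: index
              for index, (kind, name) in enumerate(lines) if kind == "label"}
    # Pass 2: every goto can now be resolved, even if its label comes later.
    resolved = []
    for kind, name in lines:
        if kind == "goto":
            if name not in labels:
                raise NameError(f"undefined label {name}")
            resolved.append(("goto", labels[name]))
        else:
            resolved.append((kind, name))
    return resolved

source = [
    ("goto", "error_handling"),   # used before it is defined
    ("stmt", "x = 1"),
    ("label", "error_handling"),
]
print(resolve_labels(source))
```

A one-pass compiler would instead have to remember the unresolved goto and patch it once the label is finally seen.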
the trace