Lecture 1 - Compiler Design
Contents
• Introduction
• A Simple Syntax-Directed Translator
• Lexical Analysis
• Syntax Analysis
• Syntax-Directed Translation
• Intermediate-Code Generation
• Type Checking
• Run-time Environments
• Code Generation
• Code Optimization
Introduction
• Programming languages are notations for describing
computations to people and to machines.
• The computing world depends on programming languages:
before a program can run, it must be translated into a form
a machine can execute, and compilers do that translation.
• Compiler construction touches several areas:
– Programming language design
– Machine architecture
– Language theory
– Algorithms
– Software engineering
Language Processors
• Compiler: a program that reads a program in one
language (the source language) and translates it into
an equivalent program in another language (the target
language).
– The essential interface between applications & architectures
• Schematically: source program → Compiler → target program
Requirements
• Basic Requirements
– Work on your homework individually.
• Discussions are encouraged, but don't copy others' work.
– Get your hands dirty!
• Experiment with ideas presented in class and gain first-hand
knowledge!
– Come to class and DON'T hesitate to speak up if you have any
questions/comments/suggestions!
– Student participation is important!
Compiler vs. Interpreter (1/5)
• Compilers: translate a source (human-writable)
program to an executable (machine-readable) program.
Compiler vs. Interpreter (2/5)
• Interpreters, ideal concept:
source code + input data → Interpreter → output data
Compiler vs. Interpreter (3/5)
• Most languages are usually thought of as using
either one or the other:
– Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
– Interpreters: Lisp, Scheme, BASIC, APL, Perl,
Python, Smalltalk
• BUT: not always implemented this way
– Virtual machines (e.g., Java)
– Linking of executables at runtime
– JIT (just-in-time) compilation
Compiler vs. Interpreter (4/5)
• Actually, there is no sharp boundary between them;
the general situation is a combination of both.
• Compiler cons:
– Slow processing
• Partly solved (separate compilation)
– Debugging
• Improved through IDEs
• Interpreter cons:
– Not for large projects
• Exceptions: Perl, Python
– Requires more space
– Slower execution
• The interpreter is in memory all the time
A Language Processing System
• The task of collecting the source program is sometimes
entrusted to a separate program, the preprocessor
(e.g., for macro expansion).
• Analysis (Front End):
– Breaks the source program into its constituent parts
– Imposes a grammatical structure on them (lexical,
syntax, semantic)
– Produces an intermediate representation of the source code
– Performs error checks
– Stores information in the symbol table
• Synthesis (Back End):
– Intermediate code + symbol table = target code
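To make the analysis/synthesis split concrete, here is a minimal
structural sketch in Python; every name is illustrative and the phase
bodies are stubs, not a real implementation:

    # A minimal structural sketch of the front end / back end split.
    # The phase functions are stubs; a real compiler does far more.

    def lex(source):
        return source.split()       # analysis: break into "tokens"

    def parse(tokens):
        return tokens               # analysis: would build a syntax tree

    def analyze(tree, symtab):
        return tree                 # analysis: semantic checks, fills symtab

    def gen_ir(tree):
        return tree                 # analysis: intermediate representation

    def gen_target(ir, symtab):
        return " ".join(ir)         # synthesis: intermediate code +
                                    # symbol table = target code

    def translate(source):
        symtab = {}                 # shared by front end and back end
        ir = gen_ir(analyze(parse(lex(source)), symtab))   # front end
        return gen_target(ir, symtab)                      # back end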
Phases of Compilation
Scanning/Lexical analysis
• Breaks the program down into its smallest meaningful
symbols (tokens, atoms, lexemes).
• Tools for this include lex and flex.
• Tokens include, e.g.:
– "Reserved words": do if float while
– Special characters: ( { , + - = ! /
– Names & numbers: myValue 3.07e02
• Starts the symbol table with new symbols found.
Scanning/Lexical analysis
• For each lexeme, the lexical analyzer produces as
output a token: <token-name, attribute-value>
– token-name: an abstract symbol that is used during
syntax analysis
– attribute-value: points to an entry in the symbol table
for this token.
Scanning/Lexical analysis
• Assignment statement (source program):
position = initial + rate * 60
• Lexemes:
1. position is a lexeme that would be mapped into the
token <id, 1>, where id is an abstract symbol standing
for identifier and 1 points to the symbol-table entry for
position.
The symbol-table entry for an identifier holds
information about the identifier, such as its name and
type.
Scanning/Lexical analysis
• 2. The assignment symbol = is a lexeme that is
mapped into the token <=>.
• 3. initial is a lexeme that is mapped into the token
<id, 2>, where 2 points to the symbol-table entry
for initial.
• 4. + is a lexeme that is mapped into the token <+>.
• 5. rate is a lexeme that is mapped into the token
<id, 3>, where 3 points to the symbol-table entry
for rate.
Scanning/Lexical analysis
• 6. * is a lexeme that is mapped into the token <*>.
• 7. 60 is a lexeme that is mapped into the token <60>.
• Blanks (white space) separating the lexemes would be
discarded by the lexical analyzer.
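A minimal sketch of such a lexer in Python for the statement above;
the regular expressions and table layout are illustrative assumptions,
not a prescribed implementation:

    import re

    # Token categories from the slides, as illustrative regular expressions.
    TOKEN_SPEC = [
        ("id",  r"[A-Za-z_][A-Za-z0-9_]*"),          # names: position, rate
        ("num", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),  # numbers: 60, 3.07e02
        ("op",  r"[=+\-*/(){},!]"),                  # special characters
        ("ws",  r"\s+"),                             # white space (discarded)
    ]
    MASTER = re.compile("|".join(f"(?P<{name}>{pat})"
                                 for name, pat in TOKEN_SPEC))

    def lex(source):
        symtab, tokens = [], []
        for m in MASTER.finditer(source):
            kind, lexeme = m.lastgroup, m.group()
            if kind == "ws":
                continue                             # blanks are discarded
            elif kind == "id":
                if lexeme not in symtab:
                    symtab.append(lexeme)            # new symbol-table entry
                tokens.append(("id", symtab.index(lexeme) + 1))
            else:
                tokens.append((lexeme,))             # e.g. <=>, <+>, <60>
        return tokens, symtab

    # lex("position = initial + rate * 60") returns
    #   [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)]
    # and the symbol table ['position', 'initial', 'rate'].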
Translation of an assignment statement
Parsing/Syntax Analysis
• The parser creates a tree-like intermediate
representation that depicts the grammatical
structure of the token stream.
• A typical representation is a syntax/parse tree in
which each interior node represents an operation
and the children of the node represent the
arguments of the operation.
Parsing/Syntax Analysis
• This tree shows the order in which the operations in
the assignment are to be performed:
position = initial + rate * 60
• The tree has an interior node labeled * with <id, 3>
as its left child and the integer 60 as its right child.
The node <id, 3> represents the identifier rate.
• The node labeled * makes it explicit that we must
first multiply the value of rate by 60.
• The node labeled + indicates that we must add the
result of this multiplication to the value of initial.
Parsing/Syntax Analysis
• The root of the tree, labeled =, indicates that we
must store the result of this addition into the
location for the identifier position.
• This ordering of operations is consistent with the
usual conventions of arithmetic, which tell us that
multiplication has higher precedence than addition,
and hence that the multiplication is to be performed
before the addition.
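A minimal sketch, in Python, of how this syntax tree could be
represented; the Node class is an illustrative assumption, and the
tree is built by hand here rather than by a real parser:

    class Node:
        """Interior nodes carry an operator; leaves carry values."""
        def __init__(self, label, *children):
            self.label = label
            self.children = children

    # Hand-built syntax tree for: position = initial + rate * 60
    tree = Node("=",
                Node(("id", 1)),              # position
                Node("+",
                     Node(("id", 2)),         # initial
                     Node("*",
                          Node(("id", 3)),    # rate
                          Node(60))))         # integer constant 60

    def show(node, depth=0):
        # Print one label per line, indented by depth in the tree.
        print("  " * depth + str(node.label))
        for child in node.children:
            show(child, depth + 1)

    show(tree)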
Semantic Analysis
• The semantic analyzer uses the syntax tree and the
information in the symbol table to check the source
program for semantic consistency with the language
definition.
• It also gathers type information and saves it in either
the syntax tree or the symbol table, for subsequent
use during intermediate-code generation.
Semantic Analysis
• An important part is type checking: the compiler
checks that each operator has matching operands.
• Ex: many programming language definitions require
an array index to be an integer; the compiler must
report an error if a floating-point number is used to
index an array.
Semantic Analysis
• The language specification may permit some
type conversions called coercions.
• Suppose that position, initial, and rate have
been declared to be floating-point numbers,
and that the lexeme 60 by itself forms an
integer.
• The type checker discovers that the operator *
is applied to a floating-point number rate & an
integer 60.
Semantic Analysis
• In this case, the integer may be converted into a
floating-point number.
• The output of the semantic analyzer has an extra
node for the operator inttofloat, which explicitly
converts its integer argument into a floating-point
number.
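A minimal sketch of how a type checker could insert that node,
reusing the illustrative Node class and tree from the parsing sketch;
the type rules here cover only the cases in the running example:

    def typecheck(node, symtab_types):
        """Return the node's type, inserting inttofloat where needed."""
        if isinstance(node.label, tuple):         # an <id, n> leaf
            return symtab_types[node.label[1]]    # declared type of entry n
        if isinstance(node.label, int):           # an integer-constant leaf
            return "int"
        if node.label in ("+", "*", "="):
            left, right = node.children
            lt = typecheck(left, symtab_types)
            rt = typecheck(right, symtab_types)
            if lt == "float" and rt == "int":
                # Coerce the integer operand: wrap it in an inttofloat node.
                node.children = (left, Node("inttofloat", right))
            return lt

    # position, initial, rate declared as floats (entries 1-3).
    typecheck(tree, {1: "float", 2: "float", 3: "float"})
    # The * node's right child is now inttofloat(60).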
Intermediate Code Generation
• In translating a source program into target code, a
compiler may construct one or more intermediate
representations, which can have a variety of forms.
• Syntax trees are a form of intermediate representation;
they are commonly used during syntax and semantic
analysis.
• Two important properties:
– it should be easy to produce
– it should be easy to translate into the target machine
Intermediate Code Generation
• Three-address code: a sequence of assembly-like
instructions with three operands per instruction.
• Each operand can act like a register.
Intermediate Code Generation
• Properties:
– Each three-address assignment instruction has at
most one operator on the right side.
– The compiler must generate a temporary name to
hold the value computed by a three-address
instruction.
– Some "three-address instructions" have fewer than
three operands.
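For the running assignment position = initial + rate * 60, the
intermediate-code generator might therefore emit a sequence like the
following, where t1, t2, t3 are compiler-generated temporaries:

    t1 = inttofloat(60)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3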
Code Optimization
• The code-optimization phase attempts to improve the
intermediate code so that better target code will result.
• Better usually means faster, but it may also mean
shorter code, or target code that consumes less power.
• Ex: a straightforward algorithm generates the intermediate
code, using an instruction for each operator in the tree
representation that comes from the semantic analyzer.
Code Optimization
• The optimizer can deduce that the conversion of 60
from integer to floating point can be done once and
for all at compile time, so the inttofloat operation
can be eliminated by replacing the integer 60 by the
floating-point number 60.0.
• Moreover, t3 is used only once, to transmit its value
to id1, so the optimizer can transform the code into
the shorter sequence shown below.
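Applying both optimizations to the earlier three-address code yields:

    t1 = id3 * 60.0
    id1 = id2 + t1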
Code Generation
• The code generator takes as input an
intermediate representation of the source
program and maps it into the target language.
• If the target language is machine code, registers
or memory locations are selected for each of
the variables used by the program.
• Then, the intermediate instructions are
translated into sequences of machine
instructions that perform the same task.
Code Generation
• Ex: using registers R1 and R2, the intermediate code
above might get translated into machine code along
the following lines.
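A plausible sequence, reconstructed to match the description on the
next slide; the mnemonics LDF, MULF, ADDF, and STF stand for
floating-point load, multiply, add, and store:

    LDF  R2, id3          // load rate (id3) into register R2
    MULF R2, R2, #60.0    // multiply by the immediate constant 60.0
    LDF  R1, id2          // load initial (id2) into register R1
    ADDF R1, R1, R2       // add the product to initial
    STF  id1, R1          // store the result into position (id1)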
Code Generation
• The code loads the contents of address id3 into register
R2, then multiplies it by the floating-point constant 60.0.
• The # signifies that 60.0 is to be treated as an immediate
constant.
Commonly used Compiler Construction Tools
• Parser generators (e.g., yacc, bison)
• Scanner generators (e.g., lex, flex)
• Syntax-directed translation engines
• Code-generator generators
• Data-flow analysis engines
• Compiler-construction toolkits
The Move to Higher-level Language
• Early 1950s: assembly languages (mnemonic)
• Later, macro instructions were added to assembly
languages so that a programmer could define
parameterized shorthands for frequently used
sequences of machine instructions.
The Move to Higher-level Language
• Latter half of the 1950s: a major step towards
higher-level languages was made with
– Fortran for scientific computation,
– Cobol for business data processing,
– Lisp for symbolic computation.
Classification
• 2. Imperative vs. declarative: a program in an
imperative language specifies how a computation is
to be done; a program in a declarative language
specifies what computation is to be done.
• Languages such as C, C++, C#, and Java are
imperative languages.
• In imperative languages there is a notion of program
state and statements that change the state.
• Functional languages such as ML and Haskell and
constraint logic languages such as Prolog are often
considered to be declarative languages.
Classification
• 3. A von Neumann language is one whose
computational model is based on the von Neumann
computer architecture.
• Fortran and C are von Neumann languages.
Classification
• 4. An object-oriented language supports
object-oriented programming, a programming style in
which a program consists of a collection of objects
that interact with one another.
– Simula 67 and Smalltalk are the earliest major
object-oriented languages.
– C++, C#, Java, and Ruby are more recent
object-oriented languages.
Classification
• 5. Scripting languages are interpreted languages with
high-level operators designed for "gluing together"
computations. These computations were originally
called "scripts."
– Awk, JavaScript, Perl, PHP, Python, Ruby, and Tcl
are popular examples of scripting languages.
– Programs written in scripting languages are often
much shorter than equivalent programs written in
languages like C.
Application of Compiler Technology
• Implementation of high-level programming languages
• Optimizations for computer architecture: parallelism,
memory hierarchies
• Design of new computer architectures: RISC,
specialized architectures
• Debugging
• Fault location
• Model checking in formal analysis
• Model-driven development
• Optimization techniques in software engineering
• Program translation: binary translation, hardware
synthesis, database query interpreters
• Software productivity tools: type checking, bounds
checking, memory management, software maintenance
• Visualizations of analysis results
Compiler Scientists
• The first self-hosting compiler – capable of compiling
its own source code in a high-level language – was
created in 1962 for Lisp by Tim Hart and Mike Levin
at MIT.
• John McCarthy, creator of Lisp
Dennis Ritchie (1941 – 2011)
http://en.wikipedia.org/wiki/List_of_compilers