What Are Compilers?
Compiler as a Translator
Goals of translation
• Good compile-time performance
• Good performance for the generated code
• Correctness
– A very important issue.
– Can compilers be proven to be correct?
  • Tedious even for toy compilers! Undecidable in general.
– However, correctness has implications for the development cost
How to translate?
• Direct translation is difficult. Why?
The first few steps
• How to recognize words in a
programming language?
– a dictionary (of keywords etc.)
– rules for constructing words (identifiers,
numbers etc.)
• This is called lexical analysis
• Recognizing words is not completely
trivial. For example:
w hat ist his se nte nce?
Lexical Analysis: Challenges
• We must know what the word
separators are
Lexical Analysis: Challenges
• In programming languages, a character from a different class may also be
treated as a word separator.
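The two ingredients named earlier (a dictionary of keywords and rules for constructing words) can be sketched as a small regular-expression tokenizer. This is only an illustration: the token classes, the keyword set and the function name tokenize are assumptions made for the example, not part of any particular compiler.

```python
import re

# Hypothetical token classes for a toy language (illustrative names only).
KEYWORDS = {"if", "then", "else"}            # the "dictionary"
TOKEN_SPEC = [                               # the "rules for constructing words"
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"==|=|\+|-|\*|/"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; whitespace is discarded."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        if kind != "SKIP":
            tokens.append((kind, lexeme))
        pos = match.end()
    return tokens


# No blanks around == or =: the change of character class ends each word.
print(tokenize("if x==y then z=1"))
```

Note that x==y is split into three tokens even though it contains no blank: the switch from a letter to an operator character acts as the separator.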
The next step
• Once the words are understood, the next
step is to understand the structure of the
sentence
[Figure: parse tree of an English sentence, rooted at Sentence]
Parsing
• Parsing a program is exactly the same process as the one shown in the
previous slide.
• Consider the statement
if x == y then z = 1 else z = 2
              if stmt
         /       |       \
       ==        =        =
      /  \      /  \     /  \
     x    y    z    1   z    2
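A minimal recursive-descent sketch that builds this tree from the statement is given below. It assumes the words have already been separated (here simply by splitting on blanks), and the function name parse_if and the tuple layout of the tree are choices made for this illustration.

```python
def parse_if(tokens):
    """Parse: if <operand> == <operand> then <assign> else <assign>."""
    pos = 0

    def expect(word):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != word:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {word!r}, found {found!r}")
        pos += 1

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def comparison():
        left = operand()
        expect("==")
        return ("==", left, operand())

    def assignment():
        target = operand()
        expect("=")
        return ("=", target, operand())

    expect("if")
    condition = comparison()
    expect("then")
    then_branch = assignment()
    expect("else")
    else_branch = assignment()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected trailing token {tokens[pos]!r}")
    return ("if stmt", condition, then_branch, else_branch)


tree = parse_if("if x == y then z = 1 else z = 2".split())
print(tree)
# ('if stmt', ('==', 'x', 'y'), ('=', 'z', '1'), ('=', 'z', '2'))
```

Each nested tuple corresponds to one interior node of the tree: the operator first, then its children.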
Understanding the meaning
• Once the sentence structure is
understood we try to understand the
meaning of the sentence (semantic
analysis)
• A challenging task
• Example:
Prateek said Nitin left his assignment at
home
• What does his refer to? Prateek or Nitin?
Understanding the meaning
• Worse case
Amit said Amit left his assignment at
home
• Even worse
Amit said Amit left Amit’s assignment
at home
• How many Amits are there? Which
one left the assignment? Whose
assignment got left?
Semantic Analysis
• Too hard for compilers. They do not have
capabilities similar to human understanding
• However, compilers do perform analysis to
understand the meaning and catch
inconsistencies
• Programming languages define strict rules to
avoid such ambiguities
{ int Amit = 3;
  { int Amit = 4;      // inner declaration hides the outer Amit
    cout << Amit;      // prints 4: the innermost declaration is used
  }
}
More on Semantic Analysis
• Compilers perform many other checks
besides variable bindings
• Type checking
Amit left her work at home
• There is a type mismatch between her and Amit. Presumably Amit is
male, so her and Amit cannot refer to the same person.
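The programming-language analogue of this mismatch can be sketched as a tiny type checker. The expression encoding (tagged tuples), the type names and the function check_types are assumptions made for this example only.

```python
def check_types(expr, env):
    """Return the type of expr, raising TypeError on a mismatch.

    expr is one of:
      ("int", value), ("string", value), ("var", name),
      ("+", left, right), ("=", name, value_expr)
    env maps variable names to their declared types.
    """
    kind = expr[0]
    if kind in ("int", "string"):
        return kind
    if kind == "var":
        name = expr[1]
        if name not in env:
            raise NameError(f"{name!r} is not declared")
        return env[name]
    if kind == "+":
        left = check_types(expr[1], env)
        right = check_types(expr[2], env)
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    if kind == "=":
        declared = check_types(("var", expr[1]), env)
        value = check_types(expr[2], env)
        if declared != value:
            raise TypeError(f"cannot assign {value} to {expr[1]!r} of type {declared}")
        return declared
    raise ValueError(f"unknown expression kind {kind!r}")


env = {"count": "int", "name": "string"}
print(check_types(("=", "count", ("+", ("var", "count"), ("int", 1))), env))  # int
```

Assigning ("var", "name") to count with the same env raises TypeError, which is the compiler's version of noticing that her cannot refer to Amit.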
Example from Mahabharat
[Figure: the compiler so far, as three phases in sequence: Lexical Analysis → Syntax Analysis → Semantic Analysis]
Code Optimization
• No strong counterpart in English, but it is similar to
editing/précis writing
Code Optimization
• Some common optimizations
–Common sub-expression elimination
–Copy propagation
–Dead code elimination
–Code motion
–Strength reduction
–Constant folding
• Example: x = 15 * 3 is transformed
to x = 45
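Constant folding can be sketched on Python's own syntax trees using the standard ast module (ast.unparse needs Python 3.9 or later). The class name ConstantFolder and the helper fold_constants are made up for this illustration, and only +, - and * on numeric literals are folded.

```python
import ast
import operator

# Operators this sketch is willing to evaluate at compile time.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        fold = _FOLDABLE.get(type(node.op))
        if (
            fold is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source):
    """Return the source text with constant sub-expressions evaluated."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(fold_constants("x = 15 * 3"))     # x = 45
print(fold_constants("y = 2 * 3 + z"))  # y = 6 + z
```

Division is left out on purpose so that the sketch never has to decide what to do with a literal division by zero.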
Example of Optimizations
A: assignment   M: multiplication   D: division   E: exponent

PI = 3.14159
Area = 4 * PI * R^2
Volume = (4/3) * PI * R^3
Cost: 3A + 4M + 1D + 2E
--------------------------------
X = 3.14159 * R * R
Area = 4 * X
Volume = 1.33 * X * R
Cost: 3A + 5M
--------------------------------
Area = 4 * 3.14159 * R * R
Volume = ( Area / 3 ) * R
Cost: 2A + 4M + 1D
--------------------------------
Area = 12.56636 * R * R
Volume = ( Area / 3 ) * R
Cost: 2A + 3M + 1D
--------------------------------
X = R * R
Area = 12.56636 * X
Volume = 4.18879 * X * R
Cost: 3A + 4M
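As a sanity check, the variants can be run side by side. The sketch below transcribes four of them into Python functions (the function names are invented here) and compares their results. One thing it makes visible: replacing 4/3 by 1.33 changes the computed volume by about 0.25%, so that variant is only approximately equivalent.

```python
import math


def original(r):
    pi = 3.14159
    return 4 * pi * r**2, (4 / 3) * pi * r**3


def shared_x(r):
    x = 3.14159 * r * r
    return 4 * x, 1.33 * x * r          # 1.33 is a rounded 4/3


def reuse_area(r):
    area = 12.56636 * r * r             # 4 * 3.14159 folded at compile time
    return area, (area / 3) * r


def fully_folded(r):
    x = r * r
    return 12.56636 * x, 4.18879 * x * r


r = 2.0
for variant in (original, shared_x, reuse_area, fully_folded):
    area, volume = variant(r)
    print(f"{variant.__name__:>12}: area={area:.5f} volume={volume:.5f}")

ref_area, ref_volume = original(r)
# The folded variants agree with the original to within rounding of the constants.
assert math.isclose(reuse_area(r)[1], ref_volume, rel_tol=1e-5)
assert math.isclose(fully_folded(r)[1], ref_volume, rel_tol=1e-5)
# The 1.33 variant agrees only loosely.
assert math.isclose(shared_x(r)[1], ref_volume, rel_tol=1e-2)
```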
Code Generation
• Usually a two-step process
– Generate intermediate code from the
semantic representation of the program
– Generate machine code from the
intermediate code
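The first of these two steps can be sketched by lowering an expression tree to three-address code, one common style of intermediate code. The example below borrows Python's ast module as a stand-in for the semantic representation; the function name to_three_address and the temporary names t1, t2, ... are assumptions of this illustration.

```python
import ast
import itertools

_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_three_address(source):
    """Translate one assignment such as 'a = b + c * d' into three-address code."""
    stmt = ast.parse(source).body[0]
    if not (
        isinstance(stmt, ast.Assign)
        and len(stmt.targets) == 1
        and isinstance(stmt.targets[0], ast.Name)
    ):
        raise ValueError("expected a single assignment to a name")

    code = []
    temps = (f"t{i}" for i in itertools.count(1))

    def emit(node):
        """Emit code for node and return the name holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            left = emit(node.left)
            right = emit(node.right)
            temp = next(temps)
            code.append(f"{temp} = {left} {_OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    result = emit(stmt.value)
    code.append(f"{stmt.targets[0].id} = {result}")
    return code


for line in to_three_address("a = b + c * d"):
    print(line)
# t1 = c * d
# t2 = b + t1
# a = t2
```

Each emitted line has at most one operator, which is what makes the later mapping to machine instructions straightforward.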
Code Generation
• Abstractions at the source level
identifiers, operators, expressions, statements,
conditionals, iteration, functions (user defined,
system defined or libraries)
Post translation Optimizations
Instruction selection
– Addressing mode selection
– Opcode selection
– Peephole optimization
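Peephole optimization looks at a short window of generated instructions and removes or replaces patterns that are obviously wasteful. The sketch below uses a made-up instruction format, (opcode, destination, source) tuples, and a handful of patterns chosen for illustration. It also assumes that no later instruction depends on condition flags set by the removed instructions, which a real code generator would have to verify.

```python
def peephole(instructions):
    """One pass over (opcode, dest, src) tuples, dropping redundant instructions."""
    optimized = []
    for instruction in instructions:
        opcode, dest, src = instruction
        if opcode == "MOV" and dest == src:
            continue                      # MOV r, r does nothing
        if opcode in ("ADD", "SUB") and src == 0:
            continue                      # r = r + 0 or r = r - 0
        if opcode == "MUL" and src == 1:
            continue                      # r = r * 1
        if opcode == "MOV" and optimized and optimized[-1] == ("MOV", src, dest):
            continue                      # MOV a, b followed by MOV b, a
        optimized.append(instruction)
    return optimized


program = [
    ("MOV", "Ax", "Bx"),
    ("MOV", "Bx", "Ax"),   # redundant: Bx already equals Ax
    ("ADD", "Ax", 0),      # no effect on Ax
    ("MUL", "Cx", 1),      # no effect on Cx
    ("ADD", "Ax", "Cx"),
]
print(peephole(program))
# [('MOV', 'Ax', 'Bx'), ('ADD', 'Ax', 'Cx')]
```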
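[Figure: type-annotated syntax tree for the statement if (b == 0) a = b; the comparison b == 0 has type boolean; a, b, 0 and the assignment a = b have type int]
Generated code:
CMP Cx, 0
CMOVZ Dx, Cx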
Compiler structure
[Figure: the compiler as a pipeline of phases, with the representation each phase produces]
Source Program
→ Lexical Analysis → Token stream
→ Syntax Analysis → Abstract Syntax tree
→ Semantic Analysis → Unambiguous Program representation
→ Optimizer → Optimized code
→ IL code generator → IL code
→ Code generator → Target Program
Something is missing
• Information required about the program variables during
compilation
– Class of variable: keyword, identifier etc.
– Type of variable: integer, float, array, function etc.
– Amount of storage required
– Address in the memory
– Scope information
• Location to store this information
– Attributes with the variable (has obvious problems)
– At a central repository and every phase refers to the repository
whenever information is required
• Normally the second approach is preferred
– Use a data structure called symbol table
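A central repository of this kind can be sketched as a chain of per-scope dictionaries. The class and field names below (Symbol, SymbolTable, declare, lookup) are illustrative choices, and the 4-byte size for int is an assumption of the example.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    kind: str      # class of the name, e.g. "variable" or "function"
    type: str      # e.g. "int", "float"
    size: int      # amount of storage required, in bytes
    offset: int    # address relative to the start of the scope's storage


class SymbolTable:
    """One table per scope; lookups fall back to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}
        self.next_offset = 0

    def declare(self, name, kind, type_, size):
        if name in self.symbols:
            raise NameError(f"{name!r} is already declared in this scope")
        symbol = Symbol(name, kind, type_, size, self.next_offset)
        self.symbols[name] = symbol
        self.next_offset += size
        return symbol

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"{name!r} is not declared")


outer = SymbolTable()
outer.declare("Amit", "variable", "int", 4)
outer.declare("total", "variable", "int", 4)
inner = SymbolTable(parent=outer)
inner.declare("Amit", "variable", "int", 4)

print(inner.lookup("Amit") is inner.symbols["Amit"])  # True: inner Amit hides outer
print(inner.lookup("total").offset)                   # 4: found in the outer scope
```

Every phase can call lookup on the current scope instead of carrying the attributes around with each occurrence of the variable.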
Final Compiler structure
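[Figure: the compiler as a pipeline of phases, with the representation each phase produces; a Symbol Table is shared by all phases]
Source Program
→ Lexical Analysis → Token stream
→ Syntax Analysis → Abstract Syntax tree
→ Semantic Analysis → Unambiguous Program representation
→ Optimizer → Optimized code
→ IL code generator → IL code
→ Code generator → Target Program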
Advantages of the model …
• Compiler is re-targetable
Issues in Compiler Design
• Compilation appears to be very simple, but there are
many pitfalls
M*N vs M+N Problem
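[Figure: two diagrams. Without an intermediate language, each of the M front ends F1 ... FM is connected directly to each of the N back ends B1 ... BN, requiring M*N compilers. With a common Intermediate Language (IL), every front end targets the IL and every back end starts from it, requiring only M front ends plus N back ends.]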
Universal Intermediate Language
How do we know compilers generate
correct code?
• GENERATE compilers
Tool based Compiler Development
[Figure: Source Program → Lexical Analyzer → Parser → Semantic Analyzer → Optimizer → IL code generator → Code generator → Target Program, where each phase is produced by a tool from a specification:]
– Lexical Analyzer Generator, from lexeme specs
– Parser Generator, from parser specs
– Other phase Generators, from phase specifications
– Code Generator generator, from machine specifications
Bootstrapping
• A compiler is a complex program and should not be
written in assembly language
• How do we write a compiler for a language in the same
language (the first time!)?
• First time this experiment was done for Lisp
• Initially, Lisp was used as a notation for writing
functions.
• Functions were then hand translated into
assembly language and executed
• McCarthy wrote a function eval[e] in Lisp that
took a Lisp expression e as an argument
• The function was later hand translated and it
became an interpreter for Lisp
Bootstrap
Bootstrapping: Example
• How to develop cc-x335?
• Write a C compiler in C that
emits x335 code
• Compile it using gcc-x86 on an x86
machine
• We now have a C compiler that
emits x335 code
– But it runs on x86, not x335
Bootstrapping: Example
• We have cc-x86-x335
• The compiler runs on x86; the generated code runs
on x335
• Compile the source code of the C compiler
with cc-x86-x335
• There it is
– The output is a binary that runs on x335
– This binary is the desired compiler:
cc-x335
Bootstrapping …
• A compiler can be characterized by three languages: the
source language (S), the target language (T), and the
implementation language (I)
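[Figure: T-diagrams. A compiler from L to N written in S, compiled by a compiler from S to M that runs on M, yields a compiler from L to N that runs on M. Example: an EQN-to-TROFF compiler written in C, compiled by a C compiler for the PDP11 running on the PDP11, yields an EQN-to-TROFF compiler that runs on the PDP11.]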
Bootstrapping a Compiler
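• Suppose LNN (a compiler for L that runs on N and generates code for N)
is to be developed on a machine M where LMM (a compiler for L that runs
on M and generates code for M) is available
[Figure: two T-diagram steps. Step 1: the compiler from L to N written in L is compiled with LMM, giving a compiler from L to N that runs on M. Step 2: the same source is compiled with that cross compiler, giving a compiler from L to N that runs on N.]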
Bootstrapping a Compiler:
the Complete picture
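[Figure: the two compilation steps combined into a single chain of T-diagrams, from the L-to-N compiler written in L, through the cross compiler running on M, to the L-to-N compiler running on N.]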
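The T-diagram bookkeeping can be modelled in a few lines: a compiler is a triple (source, target, implementation), and compiling one compiler with another changes only its implementation language. The names Compiler and compile_with are invented for this sketch, and machine names double as the names of their machine languages.

```python
from typing import NamedTuple


class Compiler(NamedTuple):
    source: str          # language it accepts (S)
    target: str          # language it emits (T)
    implementation: str  # language it is written in, or machine it runs on (I)


def compile_with(program: Compiler, using: Compiler) -> Compiler:
    """Compile the compiler `program` with the compiler `using`."""
    if program.implementation != using.source:
        raise ValueError(
            f"{using} accepts {using.source!r}, "
            f"but the program is written in {program.implementation!r}"
        )
    # Only the implementation changes: it becomes whatever `using` emits.
    return program._replace(implementation=using.target)


# Bootstrapping L on machine N, developing on machine M.
written_in_l = Compiler(source="L", target="N", implementation="L")
existing_on_m = Compiler(source="L", target="M", implementation="M")

cross = compile_with(written_in_l, existing_on_m)   # runs on M, emits N
native = compile_with(written_in_l, cross)          # runs on N, emits N
print(cross)
print(native)
```

The same two calls reproduce the cc-x335 example: start from Compiler("C", "x335", "C") and Compiler("C", "x86", "x86"), and the second result is a C compiler that both runs on and emits code for x335.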
Compilers of the 21st Century
• Overall structure of almost all the compilers is similar to
the structure we have discussed