Structure and phases of a compiler
Introduction to Compilers
Introduction
Theory of Computer Languages
Natural Language vs. Formal Language
For the user, the following two sentences:
THE MOUSE CAUGHT THE CAT and
THE CAT CAUGHT THE MOUSE
are both syntactically correct, but they differ semantically (in meaning):
the first describes an implausible situation, while the second is sensible.
Language and Grammar
Grammar means a set of rules governing a language.
Components of the language:
• Character Set
• Words (strings or tokens)
• Sentence
The sentence has different parts such as noun, verb, etc.
Consider a sentence:
The boy throws a ball
Parts of a sentence
Grammar Notations and Conventions
A generative grammar G can be formally defined as G = (VN, VT, R, S) such that:
• VN and VT are finite sets of symbols (non-terminals and terminals), where VN ∩ VT = ∅
• R is a set of pairs (P, Q) such that
a) P ∈ (VN ∪ VT)+
b) Q ∈ (VN ∪ VT)*
• S ∈ VN is the start symbol
Each pair in R is a rewriting rule, written P ::= Q or P → Q. We say P produces Q.
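The definition above can be checked mechanically. The following is a minimal sketch, assuming symbols are plain strings and each rule is a pair of tuples; the class name `Grammar`, its field names, and the toy grammar are hypothetical and only illustrate the conditions on VN, VT, R, and S.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # VN
    terminals: frozenset     # VT
    rules: tuple             # R: pairs (P, Q), each a tuple of symbols
    start: str               # S

    def is_well_formed(self) -> bool:
        vocabulary = self.nonterminals | self.terminals
        if self.nonterminals & self.terminals:   # VN and VT must be disjoint
            return False
        if self.start not in self.nonterminals:  # S must belong to VN
            return False
        for left, right in self.rules:
            if len(left) == 0:                   # P is a non-empty string of symbols
                return False
            if not all(symbol in vocabulary for symbol in left + right):
                return False                     # P and Q use only known symbols
        return True


# Hypothetical toy grammar: S -> a S b | (empty string)
toy = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    rules=((("S",), ("a", "S", "b")), (("S",), ())),
    start="S",
)
print(toy.is_well_formed())
```

The empty right-hand side in the second rule is allowed because Q ranges over (VN ∪ VT)*, which includes the empty string.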
Hierarchy of Formal Languages
Comparison of the grammar and its generated languages
Features of a Good Language
• Easy to understand
• Expressive power
• Interoperability
• Good turnaround time
• Portability
• Automatic error recovery
• Good error reporting
Representation of Languages
Grammar of a Language
Se → id = expr
expr → expr + term
expr → term
term → term * fact
term → fact
fact → (expr)
fact → id
fact → const
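To see how such productions generate a sentence, here is a small sketch of a leftmost derivation. It assumes the grammar is stored as a dictionary from each non-terminal to its list of alternatives; the function name `leftmost_derivation` and the list of choices are illustrative, not part of the slides.

```python
# Productions of the grammar above, keyed by non-terminal.
GRAMMAR = {
    "Se":   [["id", "=", "expr"]],
    "expr": [["expr", "+", "term"], ["term"]],
    "term": [["term", "*", "fact"], ["fact"]],
    "fact": [["(", "expr", ")"], ["id"], ["const"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal with the chosen alternative at each step."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost symbol that still has productions.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


# Each number selects an alternative (0-based) for the leftmost non-terminal.
for step in leftmost_derivation("Se", [0, 0, 1, 1, 1, 0, 1, 1, 2]):
    print(step)
```

The last line printed is the sentence `id = id + id * const`; every earlier line is a sentential form that still contains at least one non-terminal.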
History of Compilers
Language          Year            Introduced by                         Place
lambda calculus   1930            Alonzo Church, Stephen Cole Kleene    Princeton University
FORTRAN           1954-1957       John Backus                           IBM
Simula            1960            Ole-Johan Dahl and Kristen Nygaard    Norwegian Computing Center
Smalltalk         1970            Alan Kay                              Xerox PARC
LISP              1958            John McCarthy                         Massachusetts Institute of Technology
Prolog            1970            Alain Colmerauer                      Marseille, France
Eiffel            1980s           Bertrand Meyer
Ada               1977 to 1983    Jean David Ichbiah                    CII Honeywell Bull
Development of Compilers
Typical issues from the language developer's perspective are:
• Where does the source code come from (keyboard, file, socket, etc.)?
• What types of data are supported?
  • Basic data types (Boolean, character, integer, real, etc.)
  • Qualified data types (short, long, signed, unsigned, etc.)
  • Derived data types (record, 1-D array, 2-D array, file, pointers, etc.)
• Types of constants (Boolean, char, string, etc.)
• How are variables represented?
• Size of each data type supported
• Scope of variables (static, dynamic)
Typical issues from the compiler's perspective are:
• How to read the source code?
• How to represent the source code?
• How to separate the tokens?
• What are the data structures for storing variable information?
• How to store them in memory (code, stack, or heap area)?
• How to manage the storage during run time?
• How to report errors linked with multiple lines?
Compiler—At a glance
Structure of a Compiler
Phases of a Compiler
Lexical Analysis
Scans the source code character by character and separates it into tokens,
which are delimited by white-space characters, operators, and punctuators.
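A minimal sketch of such a scanner is shown below. The token classes (CONST, ID, OP, PUNCT, KEYWORD), the keyword set, and the function name `tokenize` are assumptions made for illustration; a real language would define its own token categories.

```python
import re

# Token classes tried in order; SKIP matches the white space between tokens.
TOKEN_SPEC = [
    ("CONST", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>]"),
    ("PUNCT", r"[();{}]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"if", "else", "while", "for", "do", "switch", "case", "goto"}


def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at {position}")
        kind, text = match.lastgroup, match.group()
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"          # reserved words look like identifiers
        if kind != "SKIP":            # white space only separates tokens
            tokens.append((kind, text))
        position = match.end()
    return tokens


print(tokenize("c = a + 10"))
```

An unrecognised character raises `SyntaxError`, which corresponds to a lexical error being reported by this phase.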
Lexical Analysis: Token Representation
Syntactic Analysis
Types of Statements
• Declarative statement
• Assignment statement (Sa)
  Lval = Expression
• Control statement
  • Selective
    • If statement (Sif)
    • If-then-else statement (Sie)
    • Switch.. Case (Ssc)
  • Iterative statement
    • For statement (Sfor)
    • While statement (Swhile)
    • Repeat while or Do while statement (Sdw)
  • Goto statement (Sgo)
• IO statement (Sio)
Syntactic Analysis: Syntax for Arithmetic Statement
Sa → Id = Expr
Sif → if Se
Sie → if Se else Se
Ssc → switch Expr Scase
Sfor → for Exprinit Exprcheck Exprincrdcr Se
Swhile → while Expr Se
Sdw → do Se while Expr
Sgo → goto L
Scase → Scase case Expr Se | ε
Top-down Parser
Begins with the start symbol and expands it (applies productions)
until the given sentence is produced.
Example:
Given Sentence: if (a<10) c=a+b else c=a-b
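One common way to build a top-down parser is recursive descent, where each non-terminal becomes a function that expands itself. The sketch below parses the given sentence under an assumed, simplified grammar (a parenthesised condition after `if`, an optional `else`, and flat expressions over `+`, `-`, and `<` without precedence). The class and function names are hypothetical.

```python
import re


def tokenize(text):
    # Identifiers/keywords, integer constants, or single-character symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def operand(self):
        token = self.peek()
        if token is None or token in ("if", "else") or not re.fullmatch(r"\w+", token):
            raise SyntaxError(f"expected operand, found {token!r}")
        self.pos += 1
        return token

    def expr(self):
        # expr -> operand { (+ | - | <) operand }   (assumed: flat, left to right)
        node = self.operand()
        while self.peek() in ("+", "-", "<"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.operand())
        return node

    def statement(self):
        # statement -> if ( expr ) statement [ else statement ] | id = expr
        if self.peek() == "if":
            self.expect("if")
            self.expect("(")
            condition = self.expr()
            self.expect(")")
            then_branch = self.statement()
            else_branch = None
            if self.peek() == "else":
                self.expect("else")
                else_branch = self.statement()
            return ("if", condition, then_branch, else_branch)
        target = self.operand()
        self.expect("=")
        return ("=", target, self.expr())


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("if (a<10) c=a+b else c=a-b"))
```

The parser starts at `statement` (the start symbol of this sketch) and only consumes input when a terminal in the chosen production matches, which is the top-down strategy described above.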
Bottom-up Parser
Bottom-up Parsing — An Example
Grammar for arithmetic expression
Expr → Expr + Term
Expr → Term
Term → Term * Factor
Term → Factor
Factor → (Expr)
Factor → Id
Factor → Const
Given sentence to be parsed: A+B
Reduction sequence:
A + B          (input)
Id + Id        (after lexical analysis)
Factor + Id
Term + Id
Expr + Id
Expr + Factor
Expr + Term
Expr           (the start symbol for the expression, so the sentence is accepted)
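The reduction sequence can be reproduced with a small shift-reduce loop. This is only a sketch: instead of a generated parse table it uses a hand-written lookahead rule (never reduce to Expr while a `*` is the next token), which is an assumption that happens to suffice for this particular grammar. The names `PRODUCTIONS`, `find_reduction`, and `shift_reduce` are hypothetical.

```python
# Productions as (left-hand side, right-hand side).
# Longer right-hand sides come first so that they are tried first.
PRODUCTIONS = [
    ("Expr", ["Expr", "+", "Term"]),
    ("Term", ["Term", "*", "Factor"]),
    ("Factor", ["(", "Expr", ")"]),
    ("Factor", ["Id"]),
    ("Factor", ["Const"]),
    ("Term", ["Factor"]),
    ("Expr", ["Term"]),
]


def find_reduction(stack, lookahead):
    """Return the first production whose right-hand side is on top of the stack."""
    for lhs, rhs in PRODUCTIONS:
        if stack[-len(rhs):] != rhs:
            continue
        # Assumed lookahead rule: a Term followed by '*' is not finished yet.
        if lhs == "Expr" and lookahead == "*":
            continue
        return lhs, rhs
    return None


def shift_reduce(tokens):
    """Return (accepted, trace), where trace lists the form after each reduction."""
    stack, remaining = [], list(tokens)
    trace = [" ".join(remaining)]
    while True:
        lookahead = remaining[0] if remaining else None
        reduction = find_reduction(stack, lookahead)
        if reduction:
            lhs, rhs = reduction
            del stack[-len(rhs):]          # pop the handle
            stack.append(lhs)              # push the non-terminal it reduces to
            trace.append(" ".join(stack + remaining))
        elif remaining:
            stack.append(remaining.pop(0))  # shift the next token
        else:
            break
    return stack == ["Expr"], trace


accepted, trace = shift_reduce(["Id", "+", "Id"])
print(accepted)
for form in trace:
    print(form)
```

For the token sequence `Id + Id` the trace lists the same forms as the reduction sequence above, from `Id + Id` down to `Expr`.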
Ambiguous Grammar
Semantic Analysis
Intermediate Code Generation (IC)
IC-Compilation process without IC
IC-Compilation process with IC
IC-Advantages
IC-Types
• Syntax trees
• Three-address code
  • Quadruple
  • Triple
  • Indirect triple
• Any valid and usable IC
IC-Example with Syntax Tree
Example: A + B * C
        +
       / \
     Id   *
     |   / \
     A  Id  Id
        |   |
        B   C
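The syntax tree for A + B * C can be turned into three-address code by visiting the operands before the operator. The sketch below assumes the tree is stored as nested tuples of the form (operator, left, right) and stores each instruction as a quadruple (operator, argument 1, argument 2, result); the temporary names T1, T2 and the function name are illustrative.

```python
from itertools import count

# Syntax tree for A + B * C: (operator, left, right); leaves are identifier names.
tree = ("+", "A", ("*", "B", "C"))


def to_three_address(node, code, temps):
    """Post-order walk: emit code for both children, then for the operator."""
    if isinstance(node, str):
        return node                       # a leaf is already a usable name
    op, left, right = node
    left_name = to_three_address(left, code, temps)
    right_name = to_three_address(right, code, temps)
    result = f"T{next(temps)}"            # fresh temporary for this operator
    code.append((op, left_name, right_name, result))  # quadruple form
    return result


quadruples = []
to_three_address(tree, quadruples, count(1))
for op, arg1, arg2, result in quadruples:
    print(f"{result} = {arg1} {op} {arg2}")
```

Because `*` sits below `+` in the tree, the multiplication is emitted first, so the operator precedence is already encoded in the tree shape.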
Code Optimization
Code Generation
Code Generation: Example
Given sentence: D = A + B * C
Intermediate code:
T1 = B * C
T2 = A + T1
D = T2
Generated code:
T1 = B * C      MOV R1, B
                MOV R2, C
                MUL R1, R2
                MOV T1, R1
T2 = A + T1     MOV R1, A
                MOV R2, T1
                ADD R1, R2
                MOV T2, R1
D = T2          MOV R1, T2
                MOV D, R1
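A naive code generator for this example can be sketched as a per-statement template expansion. The mnemonics follow the pseudo-assembly of the example (MOV, ADD) with MUL, SUB, and DIV assumed for the other operators; they do not correspond to a specific real instruction set, and the function name `generate` is hypothetical.

```python
# Assumed mapping from operators to pseudo-assembly mnemonics.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(three_address_code):
    """Translate lines such as 'T1 = B * C' or 'D = T2' into pseudo-assembly."""
    assembly = []
    for line in three_address_code:
        target, expression = [part.strip() for part in line.split("=")]
        operands = expression.split()
        if len(operands) == 1:
            # Copy statement: load the source, store it in the target.
            assembly += [f"MOV R1, {operands[0]}", f"MOV {target}, R1"]
        else:
            # Binary statement: load both operands, operate, store the result.
            left, op, right = operands
            assembly += [
                f"MOV R1, {left}",
                f"MOV R2, {right}",
                f"{OPCODES[op]} R1, R2",
                f"MOV {target}, R1",
            ]
    return assembly


for instruction in generate(["T1 = B * C", "T2 = A + T1", "D = T2"]):
    print(instruction)
```

The output stores T1 and immediately reloads it, and does the same for T2. Removing such redundant moves is the kind of work left to register allocation and machine-level optimization.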
Code Generation: Issues
• Register Allocation
• Register Scheduling
• Code selection
• Addressing modes
• Instruction format
• Power of Instructions
• Optimization at the machine code level
• Back patching
Symbol Table: Management
Symbol Table: Attributes associated with symbols
• Name
• Type
• Size
• Location
Data structures used to implement the symbol table:
• Array of records
• Linked list of records
• Tree of records
• Hash data structure, etc.
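As an illustration of the hash-based option, the sketch below keeps one dictionary per scope and records the four attributes listed above for each symbol. The class names, the byte sizes, and the use of a running offset as the location are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    type: str
    size: int       # in bytes (assumed unit)
    location: int   # offset in the data area (assumed meaning of "location")


class SymbolTable:
    def __init__(self):
        self.scopes = [{}]      # one hash table per scope; innermost is last
        self.next_offset = 0

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_, size):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} is already declared in this scope")
        symbol = Symbol(name, type_, size, self.next_offset)
        self.next_offset += size
        scope[name] = symbol
        return symbol

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None             # the caller can report an undeclared variable


table = SymbolTable()
table.declare("count", "integer", 4)
table.declare("ratio", "real", 8)
print(table.lookup("ratio"))
print(table.lookup("missing"))
```

A dictionary gives average constant-time insertion and lookup, which is the usual reason for preferring a hash structure over an array or linked list of records.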
Error Management
Lexical errors
• Caused by misspelling
• Juxtaposition of characters
Syntax errors (context-free grammar errors)
• Unbalanced parentheses
• Missing punctuation operators
Semantic errors (context-sensitive grammar errors)
• Undeclared variables
• Assignment of incompatible data types
• Truncation of results
• Unreachable code
Key Terms