CH 02 - PL
and Semantics
Introduction
> Providing a concise yet understandable description of
a programming language is difficult but essential to
the language’s success
Preprocessor
Assembly code
Assembler
Relocatable object module
Linker (also takes other relocatable object modules or library modules)
Executable code
Loader
Absolute machine code
The translation process
❒ Internally, a compiler consists of a number of steps,
or phases, that perform distinct logical operations.
❒ The phases of a compiler are shown in the next slide,
together with three auxiliary components that interact
with some or all of the phases:
❍ The symbol table,
❍ the literal table,
❍ and the error handler.
[Figure: the phases of a compiler. Labels recovered: syntax tree, semantic analyzer, annotated tree, intermediate code, target code generator, target code, target code optimizer, target code, error handler.]
Analysis and Synthesis
Analysis (front end)
❒ Breaks up the source program into constituent pieces and
❒ Creates an intermediate representation of the source
program.
❒ During analysis, the operations implied by the source
program are determined and recorded in a hierarchical
structure called a tree.
Synthesis (back end)
❒ The synthesis part constructs the desired program from the
intermediate representation.
[Figure: analysis of the source program and synthesis of the target program. Labels recovered: source program, abstract syntax, annotated syntax tree, intermediate code generator, intermediate code.]
Symbol table (contains a record for each identifier)
❒ Regular expressions
❒ Regular expressions for tokens
Regular expression: Definitions
❒ Union of L and M
❍ L ∪ M = {s | s ∈ L or s ∈ M}
❒ Concatenation of L and M
❍ LM = {xy | x ∈ L and y ∈ M}
❒ Exponentiation of L
❍ L^0 = {ε}; L^i = L^(i-1) L
❒ Kleene closure of L
❍ L* = ∪ i=0,…,∞ L^i
❒ Positive closure of L
❍ L+ = ∪ i=1,…,∞ L^i
❒ The following shorthands are often used:
❍ r+ = rr*
❍ r* = r+ | ε
❍ r? = r | ε
Examples
L1={a,b,c,d} L2={1,2}
L1 ∪ L2={a,b,c,d,1,2}
L1L2={a1,a2,b1,b2,c1,c2,d1,d2}
L1* = the set of all strings over the letters a, b, c, d, including the empty string.
L1+ = the set of all strings of one or more of the letters a, b, c, d;
the empty string is not included.
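The definitions above can be tried out on the example languages L1 and L2. The following is a minimal Python sketch, not part of the original slides. The helper names (concat, power, kleene_upto, positive_upto) are made up for illustration, and because L* and L+ are infinite, the closures are cut off at a chosen maximum exponent n.

```python
from itertools import product

def concat(L, M):
    """LM = {xy | x in L and y in M}."""
    return {x + y for x, y in product(L, M)}

def power(L, i):
    """L^0 = {""} (the empty string stands for ε); L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def kleene_upto(L, n):
    """Finite approximation of L*: union of L^0 .. L^n."""
    out = set()
    for i in range(n + 1):
        out |= power(L, i)
    return out

def positive_upto(L, n):
    """Finite approximation of L+: union of L^1 .. L^n."""
    out = set()
    for i in range(1, n + 1):
        out |= power(L, i)
    return out

L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

print(sorted(L1 | L2))         # union
print(sorted(concat(L1, L2)))  # concatenation
```

With n = 2, kleene_upto(L1, 2) contains the empty string, the 4 one-letter strings and the 16 two-letter strings, while positive_upto(L1, 2) leaves out the empty string.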
Regular expressions…
❒ Examples (more):
1- a | b = {a,b}
2- (a|b)a = {aa,ba}
3- (ab) | ε ={ab, ε}
4- ((a|b)a)* = {ε, aa,ba,aaaa,baba,....}
❒ Reverse (from a description of the language to a regular expression):
1 – Even binary numbers: (0|1)*0
2 – An alphabet consisting of just three alphabetic
characters: Σ = {a, b, c}. Consider the set of all strings
over this alphabet that contain exactly one b:
(a|c)*b(a|c)*, which includes, e.g., b, abc, abaca, baaaac, ccbaca, cccccb
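These expressions can be checked mechanically. The sketch below uses Python's standard re module, whose syntax for |, * and parentheses coincides with the notation used here. The helper name matches is an assumption made for this example. re.fullmatch is used so that the whole string, not just a prefix, must belong to the language.

```python
import re

def matches(pattern, s):
    """True if the entire string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

EVEN_BINARY = r"(0|1)*0"          # binary numbers ending in 0
EXACTLY_ONE_B = r"(a|c)*b(a|c)*"  # strings over {a, b, c} with exactly one b
PAIRS = r"((a|b)a)*"              # example 4: ε, aa, ba, aaaa, baba, ...

for s in ["b", "abc", "abaca", "baaaac", "ccbaca", "cccccb", "abba", "ac"]:
    print(s, matches(EXACTLY_ONE_B, s))
```

The first six strings are the ones listed above. The last two are included as counterexamples: abba has two b's and ac has none.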
Regular expressions for tokens
[Figure: NFA with transitions start -> 0 -a-> 1 -b-> 2 -b-> 3, plus a loop on state 0]
S = {0, 1, 2, 3}
Σ = {a, b}
S0 = 0
F = {3}
Transition Table
❒ The mapping T of an NFA can be represented
in a transition table
State   Input a   Input b   Input ε
0       {0,1}     {0}       ø
1       ø         {2}       ø
2       ø         {3}       ø
3       ø         ø         ø
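Two possible paths for the input aabb:
0 -a-> 0 -a-> 1 -b-> 2 -b-> 3   YES (ends in the accepting state 3)
0 -a-> 0 -a-> 0 -b-> 0 -b-> 0   NO (ends in the non-accepting state 0)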
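An NFA accepts a string if at least one path ends in an accepting state. Instead of trying paths one by one, a simulator can track the set of all states reachable so far. The sketch below is one way to do that in Python. It assumes the transitions read off the diagram (0 -a-> 1, 1 -b-> 2, 2 -b-> 3, and the loop on state 0 for a and b), and the names T, START, FINAL and accepts are made up for this example.

```python
# Transition table of the NFA: (state, symbol) -> set of next states.
# Missing entries stand for the empty set ø. This NFA has no ε-moves.
T = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
FINAL = {3}

def accepts(text):
    """Follow all possible paths at once by tracking a set of states."""
    current = {START}
    for ch in text:
        nxt = set()
        for s in current:
            nxt |= T.get((s, ch), set())
        current = nxt
    # Accept if any reachable state is an accepting state.
    return bool(current & FINAL)

print(accepts("aabb"))
print(accepts("aab"))
```

For the input aabb, the set of current states after the last symbol contains state 3, which corresponds to the YES path above.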
Another NFA
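[Figure: NFA for aa*|bb*. From the start state, one ε-transition leads to a branch that reads a and then loops on a; a second ε-transition leads to a branch that reads b and then loops on b.]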
Deterministic Finite Automata (DFA)
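Two algorithms:
1- Translate a regular expression into an NFA
(Thompson’s construction)
2- Translate the NFA into a DFA (subset construction)
Rules:
1- For ε, or for a single symbol a, each of which is a regular expression, construct:
start -> 0 -(a or ε)-> 1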
From regular expression to an NFA…
2- For a composite regular expression:
❒ Case 1: Alternation: regular expression (s|r). Assume
that NFAs equivalent to r and s have been
constructed.
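❒ Case 2: Concatenation: regular expression sr. The NFA for s is
connected to the NFA for r, so that the accepting state of s leads
into the start state of r.
❒ Case 3: Repetition: regular expression r*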
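The three cases can be sketched in code. The following Python class is an illustrative version of Thompson's construction, not the exact construction drawn on the slides: it joins fragments with explicit ε-edges, so its state numbering will differ from the figures. The class name Builder, its method names, and the use of None as the ε label are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import count

EPS = None  # label used for ε-transitions in this sketch

class Builder:
    """Builds NFA fragments; each fragment is a (start, accept) pair."""

    def __init__(self):
        self.ids = count()
        self.edges = defaultdict(set)  # (state, label) -> next states

    def symbol(self, a):
        # Basic rule: start -a-> accept
        s, f = next(self.ids), next(self.ids)
        self.edges[(s, a)].add(f)
        return s, f

    def alt(self, x, y):
        # Case 1: new start and accept states, joined to both fragments by ε
        s, f = next(self.ids), next(self.ids)
        for start, accept in (x, y):
            self.edges[(s, EPS)].add(start)
            self.edges[(accept, EPS)].add(f)
        return s, f

    def concat(self, x, y):
        # Case 2: accepting state of x leads into the start state of y
        self.edges[(x[1], EPS)].add(y[0])
        return x[0], y[1]

    def star(self, x):
        # Case 3: ε-edges allow skipping x or repeating it
        s, f = next(self.ids), next(self.ids)
        self.edges[(s, EPS)] |= {x[0], f}
        self.edges[(x[1], EPS)] |= {x[0], f}
        return s, f

    def closure(self, states):
        seen, stack = set(states), list(states)
        while stack:
            t = stack.pop()
            for v in self.edges.get((t, EPS), ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    def accepts(self, frag, text):
        current = self.closure({frag[0]})
        for ch in text:
            moved = set()
            for s in current:
                moved |= self.edges.get((s, ch), set())
            current = self.closure(moved)
        return frag[1] in current

# NFA for (a|b)*abb, built bottom-up from the cases above
b = Builder()
ab_star = b.star(b.alt(b.symbol("a"), b.symbol("b")))
nfa = b.concat(b.concat(b.concat(ab_star, b.symbol("a")), b.symbol("b")), b.symbol("b"))
print(b.accepts(nfa, "aabb"), b.accepts(nfa, "abab"))
```

The accepts method is only there to check the result; it simulates the NFA by alternating symbol moves with ε-closures.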
From RE to NFA: Exercises
Rules (D is the DFA being constructed, N is the given NFA):
❒ The start state of D is ε-closure(S0),
where S0 is the start state of N.
❒ The start state of D is initially unmarked.
NFA to a DFA…
ε-closure
ε-closure(S’) is a set of states with the following
characteristics:
1- S’ ∈ ε-closure(S’) itself
2- if t ∈ ε-closure(S’) and there is an edge labeled
ε from t to v, then v ∈ ε-closure(S’)
3- Repeat step 2 until no more states can be added
to ε-closure(S’).
E.g: for NFA of (a|b)*abb
ε-closure (0)= {0, 1, 2, 4, 7}
ε-closure (1)= {1, 2, 4}
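The three steps of the definition map directly onto a small worklist loop. The sketch below is a minimal Python version. The ε-edges in EPS_EDGES are an assumption: they follow the usual textbook NFA for (a|b)*abb with states 0 to 10, since the figure itself is not reproduced here. They were chosen to be consistent with the two closures listed above.

```python
# Assumed ε-edges of the NFA for (a|b)*abb (state -> states reachable by one ε).
EPS_EDGES = {
    0: {1, 7},
    1: {2, 4},
    3: {6},
    5: {6},
    6: {1, 7},
}

def eps_closure(states):
    """Set of states reachable from `states` using ε-edges only."""
    closure = set(states)      # step 1: every state is in its own closure
    stack = list(states)
    while stack:               # step 3: repeat until nothing new is added
        t = stack.pop()
        for v in EPS_EDGES.get(t, ()):
            if v not in closure:   # step 2: follow an ε-edge from t to v
                closure.add(v)
                stack.append(v)
    return closure

print(sorted(eps_closure({0})))
print(sorted(eps_closure({1})))
```

The function takes a set of states, so the same code also covers the ε-closure(T) step of the NFA-to-DFA conversion.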
Algorithm
While there is an unmarked state
X = {s0, s1, s2, ..., sn} of D do
Begin
  Mark X
  For each input symbol ‘a’ do
  Begin
    Let T be the set of states to which there is a transition on ‘a’ from some state si
    in X.
    Y = ε-closure(T)
    If Y has not been added to the set of states of D then {
      Add Y as an “unmarked” state of D
    }
    Add a transition from X to Y labeled ‘a’, if not already present
  End
End
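NFA for identifier: letter(letter|digit)*
[Figure: NFA with states 0 to 8 and start state 0.
Symbol transitions: 0 -letter-> 1, 3 -letter-> 4, 5 -digit-> 6.
ε-transitions: 1 -> 2, 2 -> 3, 2 -> 5, 4 -> 7, 6 -> 7, 7 -> 8, 7 -> 2 (repeat), 1 -> 8 (skip).]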
Example: Convert the following NFA into the corresponding
DFA: letter(letter|digit)*
A = {0}
B = {1, 2, 3, 5, 8}
C = {4, 7, 2, 3, 5, 8}
D = {6, 7, 8, 2, 3, 5}
DFA transitions (start state A):
A -letter-> B
B -letter-> C, B -digit-> D
C -letter-> C, C -digit-> D
D -letter-> C, D -digit-> D
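The marking algorithm can be sketched in Python and checked against this example. The edge tables EPS and MOVE below are assumptions reconstructed from the identifier NFA so that they reproduce the sets A to D. The names subset_construction, eps_closure and SYMBOLS are made up for this sketch. One deliberate difference from the pseudocode: when T is empty, the sketch skips the symbol instead of adding an empty (dead) state to D.

```python
from collections import deque

# Assumed edges of the NFA for letter(letter|digit)*, states 0..8.
EPS = {1: {2, 8}, 2: {3, 5}, 4: {7}, 6: {7}, 7: {2, 8}}
MOVE = {(0, "letter"): {1}, (3, "letter"): {4}, (5, "digit"): {6}}
SYMBOLS = ("letter", "digit")

def eps_closure(states):
    seen, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for v in EPS.get(t, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return frozenset(seen)

def subset_construction(start):
    d_start = eps_closure({start})
    d_states = {d_start}
    d_trans = {}                   # (DFA state, symbol) -> DFA state
    unmarked = deque([d_start])
    while unmarked:                # while there is an unmarked state X
        X = unmarked.popleft()     # taking X off the queue marks it
        for a in SYMBOLS:
            T = set()
            for s in X:
                T |= MOVE.get((s, a), set())
            if not T:
                continue           # no move on a: no dead state is added here
            Y = eps_closure(T)
            if Y not in d_states:  # new state: add it as unmarked
                d_states.add(Y)
                unmarked.append(Y)
            d_trans[(X, a)] = Y
    return d_start, d_states, d_trans

d_start, d_states, d_trans = subset_construction(0)
for (X, a), Y in d_trans.items():
    print(sorted(X), a, sorted(Y))
```

Any DFA state that contains NFA state 8 is accepting; in this example those are B, C and D.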
Exercise: Convert the NFA of (a|b)*abb into a DFA.
Other Algorithms
[Figure: compiler structure. Labels recovered: analysis/transformations/optimizations, symbol and literal tables, IR (intermediate representation), code generator, assembly code.]
Generating a Lexical Analyzer using Lex
Lex is a scanner generator: it takes a lexical specification as
input and produces a lexical analyzer written in C.
Lex source program (lex.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out (the lexical analyzer)
Pattern matching examples
Assignment 2
❒ Specifications:
There are no procedures and no declarations.
All variables are integer variables, and variables are
declared simply by assigning values to them.
There are only two control statements:
An if – statement and
A repeat statement
Both the control statements may themselves
contain statement sequences.
The MINI Language Introduction...
An if – statement has an optional else part and must
be terminated by the key word end.
There are also read and write statements that
perform input/output.
Comments are allowed with curly brackets,
comments cannot be nested.
Expressions in MINI are limited to Boolean and
integer arithmetic expressions.
A Boolean expression consists of a comparison of
two arithmetic expressions using either of the two
comparison operators < and =.
An arithmetic expression may involve integer constants,
variables, parentheses, and any of the four integer
operators +, -, *, and / (integer division).
Boolean expressions may appear only as tests in
control statements, i.e., there are no Boolean
variables, assignments, or I/O.
Here is a sample program in this language for factorial
function.
{ sample program
in MINI language – computes factorials
}
read x; { input an integer }
if 0 < x then { don’t compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact { output factorial of x }
end
❒ In addition, MINI has the following
lexical conventions:
Comments : are enclosed in curly brackets {...} and
cannot be nested.
White space : consists of blanks, tabs, and
newlines.
Design a scanner for the MINI language.
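As a starting point, here is one possible shape for such a scanner: a minimal regex-based sketch in Python, not a reference solution. Several details are assumptions, because the specification above does not state them: identifiers are taken to be letter(letter|digit)*, numbers to be one or more digits, and the keyword set is collected from the specification and the sample program. The token names (NUM, ID, ASSIGN, SYMBOL) and the function name scan are also made up for this example.

```python
import re

# Assumed keyword set, collected from the specification and the sample program.
KEYWORDS = {"if", "then", "else", "end", "repeat", "until", "read", "write"}

# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("COMMENT", r"\{[^}]*\}"),           # { ... }, cannot be nested
    ("SKIP",    r"[ \t\r\n]+"),          # blanks, tabs, newlines
    ("NUM",     r"[0-9]+"),              # assumption: digit+
    ("ID",      r"[A-Za-z][A-Za-z0-9]*"),  # assumption: letter(letter|digit)*
    ("ASSIGN",  r":="),
    ("SYMBOL",  r"[+\-*/<=();]"),
    ("ERROR",   r"."),                   # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Return a list of (token kind, lexeme) pairs for a MINI program."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind in ("COMMENT", "SKIP"):
            continue                     # comments and white space are dropped
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {m.start()}")
        if kind == "ID" and text in KEYWORDS:
            kind = text.upper()          # reserved words get their own token kind
        tokens.append((kind, text))
    return tokens

print(scan("read x; { input an integer }\nif 0 < x then fact := 1 end"))
```

Combining all patterns into one alternation is a design shortcut: the regex engine plays the role of the DFA that a generator such as Lex would build. An unterminated comment is reported as an error at its opening bracket.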