CH 02 - PL

Chapter 2: Describing Syntax and Semantics
Introduction
❒ Providing a concise yet understandable description of a programming language is difficult but essential to the language’s success.
❒ One of the problems in describing a language is the diversity of the people who must understand the description.
❒ The study of programming languages, like the study of natural languages, can be divided into examinations of syntax and semantics.
Introduction…
❒ Using a high-level language for programming has a large
impact on how fast programs can be developed.

❒ The main reasons for this are:


❍ Compared to machine language, the notation used by
programming languages is closer to the way humans think
about problems.
❍ The compiler can spot some obvious programming
mistakes.
❍ Programs written in a high-level language tend to be
shorter than equivalent programs written in machine
language.
❍ The same program can be compiled to many different
machine languages and, hence, be brought to run on
many different machines.
Programs related to compilers
 Interpreter
❒ Is a program that reads a source program and executes it
❒ Works by analyzing and executing the source program
commands one at a time
❒ Does not translate the whole source program into object
code
❒ Interpretation is important when:
❍ Programmer is working in interactive mode and needs to view
and update variables
❍ Running speed is not important
❍ Commands have simple formats, and thus can be quickly
analyzed and executed
❍ Modification or addition to user programs is required as
execution proceeds
Programs related to compilers…
 Interpreter…

❒ Well-known examples of interpreters:


❍ Basic interpreter, Lisp interpreter, UNIX shell command
interpreter, SQL interpreter, Java interpreter…
❒ In principle, any programming language can be either
interpreted or compiled:
❍ Some languages are designed to be interpreted, others are
designed to be compiled
❒ Interpreters involve large overheads:
❍ Execution speed degradation can vary from 10:1 to 100:1
❍ Substantial space overhead may be involved
Programs related to compiler…
 Assemblers
❒ Translator for the assembly language.
❒ Assembly code is translated into machine code
❒ Output is relocatable machine code.
 Linker
❍ Links object files separately compiled or
assembled
❍ Links object files to standard library functions
❍ Generates a file that can be loaded and executed
Programs related to compiler…
 Loader
❒ Loading of the executable codes, which are the outputs
of linker, into main memory.
 Pre-processors
❒ A pre-processor is a separate program that is called by
the compiler before actual translation begins.
❒ Such a pre-processor:
• produces input to a compiler,
• deletes comments,
• performs macro processing (substitutions),
• includes other files...
Programs related to compiler

C or C++ program
→ Preprocessor
→ C or C++ program with macro substitutions and file inclusions
→ Compiler
→ Assembly code
→ Assembler
→ Relocatable object module
→ Linker (together with other relocatable object modules or library modules)
→ Executable code
→ Loader
→ Absolute machine code
The translation process
❒ A compiler consists internally of a number of steps,
or phases, that perform distinct logical operations.
❒ The phases of a compiler are shown in the next slide,
together with three auxiliary components that interact
with some or all of the phases:
❍ The symbol table,
❍ the literal table,
❍ and error handler.

❒ There are two important parts in compilation process:


❍ Analysis and
❍ Synthesis.
The translation process…

Source code → Scanner → Tokens → Parser → Syntax tree → Semantic analyzer → Annotated tree → Intermediate code generator → Intermediate code → Intermediate code optimizer → Intermediate code → Target code generator → Target code → Target code optimizer → Target code

The literal table, the symbol table, and the error handler interact with all of these phases.
Analysis and Synthesis
 Analysis (front end)
❒ Breaks up the source program into constituent pieces and
❒ Creates an intermediate representation of the source
program.
❒ During analysis, the operations implied by the source
program are determined and recorded in hierarchical
structure called a tree.
 Synthesis (back end)
❒ The synthesis part constructs the desired program from the
intermediate representation.
Analysis of the source program

 Analysis consists of three phases:


❒ Linear/Lexical analysis
❒ Hierarchical/Syntax analysis
❒ Semantic analysis
1. Lexical analysis or Scanning
❒ The stream of characters making up the source program is
read from left to right and is grouped into tokens.
❒ A token is a sequence of characters having a collective
meaning.
❒ A lexical analyzer, also called a lexer or a scanner,
receives a stream of characters from the source program and
groups them into tokens.
❒ Examples of tokens:
• Identifiers
• Keywords
• Symbols (+, -, …)
• Numbers …
❒ Source program → Lexical analyzer → Stream of tokens
❒ Blanks, new lines, tabulation marks will be removed during
lexical analysis.
Lexical analysis or Scanning…
❒ Example
a[index] = 4 + 2;
a identifier
[ left bracket
index identifier
] right bracket
= assignment operator
4 number
+ plus operator
2 number
; semicolon
❒ A scanner may perform other operations along with the
recognition of tokens.
• It may enter identifiers into the symbol table, and
• It may enter literals into the literal table.
Lexical Analysis Tools

❒ There are tools available to assist in the writing of


lexical analyzers.
❍ lex - produces C source code (UNIX/Linux).
❍ flex - produces C source code (GNU).
❍ JLex - produces Java source code.
❒ We will use Lex.
2. Syntax analysis or Parsing

❒ The parser receives the source code in the form of tokens


from the scanner and performs syntax analysis.
❒ The results of syntax analysis are usually represented by a
parse tree or a syntax tree.
❒ Syntax tree: each interior node represents an operation
and the children of the node represent the arguments of the
operation.
❒ The syntactic structure of a programming language is
determined by context free grammar (CFG).

Stream of tokens → Syntax analyzer → Abstract syntax tree
Syntax analysis or Parsing…
❒ Ex. Consider again the line of C code: a[index] = 4 + 2
Syntax analysis or Parsing…

❒ Sometimes syntax trees are called abstract syntax trees, since
they represent a further abstraction from parse trees. An example
is shown in the following figure.
Syntax Analysis Tools

❒ There are tools available to assist in the writing


of parsers.
❍ yacc - produces C source code (UNIX/Linux).
❍ bison - produces C source code (gnu).
❍ CUP - produces Java source code.

❒ We will use yacc.


3. Semantic analysis
❒ The semantics of a program are its meaning as opposed
to syntax or structure
❒ The semantics consist of:
❍ Runtime semantics – behavior of program at runtime
❍ Static semantics – checked by the compiler
❒ Static semantics include:
❍ Declarations of variables and constants before use
❍ Calling functions that exist (predefined in a library or defined by
the user)
❍ Passing parameters properly
❍ Type checking.

❒ The semantic analyzer does the following:


❍ Checks the static semantics of the language
❍ Annotates the syntax tree with type information
Semantic analysis…
❒ Ex. Consider again the line of C code: a[index] = 4 + 2

(figure: the annotated syntax tree, with the expression nodes annotated as integer)
Synthesis of the target program

❒ Intermediate code generator
❒ Intermediate code optimizer
❒ The target code generator
❒ The target code optimizer
Code Improvement
❒ Code improvement techniques can be applied to:
❍ Intermediate code – independent of the target machine
❍ Target code – dependent on the target machine

❒ Intermediate code improvement include:


❍ Constant folding
❍ Elimination of common sub-expressions
❍ Improving loops
❍ Improving function calls
❒ Target code improvement include:
❍ Allocation and use of registers
❍ Selection of better (faster) instructions and addressing
modes
Intermediate code generator
❒ Comes after syntax and semantic analysis
❒ Separates the compiler front end from its backend
❒ Intermediate representation should have 2 important
properties:
❍ Should be easy to produce
❍ Should be easy to translate into the target program
❒ Intermediate representation can have a variety of forms:
❍ Three-address code, P-code for an abstract machine, Tree or
DAG representation

Abstract syntax tree → Intermediate code generator → Intermediate code
 Three address code for the original C expression a[index]=4+2 is:


t1=2
t2 = 4 + t1
a[index] = t2
IC optimizer
❒ An IC optimizer reviews the code, looking for ways to reduce:
❍ the number of operations and
❍ the memory requirements.
❒ A program may be optimized for speed or for size.
❒ This phase changes the IC so that the code generator produces
a faster and less memory consuming program.
❒ The optimized code does the same thing as the original (non-
optimized) code but with less cost in terms of CPU time and
memory space.

Intermediate code → IC Optimizer → Intermediate code
IC optimizer…
❒ There are several techniques of optimizing code

❒ Ex. Unnecessary lines of code in loops (i.e. code
that could be executed outside of the loop) are
moved out of the loop.
for (i=1; i<10; i++) {
x = y+1;
z = x+i; }
becomes
x = y+1;
for (i=1; i<10; i++)
z = x+i;
IC optimizer…
❒ In our previous example, we have included an opportunity
for source level optimization; namely, the expression 4 + 2
can be precomputed by the compiler to the result 6 (this
particular optimization is called constant folding).
❒ This optimization can be performed directly on the syntax
tree as shown below.
IC optimizer…
❒ Many optimizations can be performed directly on the tree.
❒ However, in a number of cases, it is easier to optimize a linearized
form of the tree that is closer to assembly code.
❒ A standard choice is Three-address code, so called because it
contains the addresses of up to three locations in memory.
❒ In our example, three address code for the original C expression
might look like this:
t1=2
t2 = 4 + t1
a[index] = t2
 Now the optimizer would improve this code in two steps, first
computing the result of the addition
t = 4+2
a[index] = t
❒ And then replacing t by its value to get the three-address statement
a[index] = 6
Code generator
❒ The machine code generator receives the (optimized)
intermediate code, and then it produces either:
❍ Machine code for a specific machine, or
❍ Assembly code for a specific machine and assembler.
❒ Code generator
❍ Selects appropriate machine instructions
❍ Allocates memory locations for variables
❍ Allocates registers for intermediate computations
Code generator…
❒ The code generator takes the IR code and generates code for the
target machine.
❒ Here we will write target code in assembly language: a[index]=6

MOV R0, index ;; value of index -> R0
MUL R0, 2 ;; double value in R0
MOV R1, &a ;; address of a -> R1
ADD R1, R0 ;; add R0 to R1
MOV *R1, 6 ;; constant 6 -> address in R1

❒ &a – the address of a (the base address of the array)
❒ *R1 – indirect register addressing (the last instruction stores the
value 6 to the address contained in R1)
The target code optimizer
❒ In this phase, the compiler attempts to improve the
target code generated by the code generator.
❒ Such improvement includes:
• Choosing addressing modes to improve performance
• Replacing slow instruction by faster ones
• Eliminating redundant or unnecessary operations
❒ In the sample target code given, one improvement is to use a
shift instruction to replace the multiplication in the second instruction.
❒ Another is to use a more powerful addressing mode, such as
indexed addressing to perform the array store.
❒ With these two optimizations, our target code becomes:
MOV R0, index ;; value of index -> R0
SHL R0 ;; double value in R0
MOV &a [R0], 6 ;; constant 6 -> address a + R0
Lexical analysis
Introduction
 The role of the lexical analyzer is:
• to read a sequence of characters from the source
program
• group them into lexemes and
• produce as output a sequence of tokens for each
lexeme in the source program.
❒ The scanner can also perform the following
secondary tasks:
❍ stripping out blanks, tabs, new lines
❍ stripping out comments
❍ keep track of line numbers (for error reporting)
Interaction of the Lexical Analyzer
with the Parser

Source program → (get next char / next char) → lexical analyzer → (get next token / next token) → Syntax analyzer

Both the lexical analyzer and the syntax analyzer consult the symbol table (which contains a record for each identifier).

token: smallest meaningful sequence of characters
of interest in source program
Token, pattern, lexeme
❒ A token is a sequence of characters from the source
program having a collective meaning.
❒ A token is a classification of lexical units.
- For example: id and num
❒ Lexemes are the specific character strings that make
up a token.
– For example: abc and 123A
❒ Patterns are rules describing the set of lexemes
belonging to a token.
– For example: “letter followed by letters and digits”
❒ Patterns are usually specified using regular expressions.
[a-zA-Z]*
Example: printf("Total = %d\n", score);
Token, pattern, lexeme…
❒ Example: The following table shows some tokens and
their lexemes in Pascal (a high level, case insensitive
programming language)
Token | Some lexemes | Pattern
begin | begin, Begin, BEGIN, beGin… | begin in small or capital letters
if | if, IF, iF, If | if in small or capital letters
ident | Distance, F1, x, Dist1,… | letters followed by zero or more letters and/or digits

• In general, in programming languages, the following are


tokens:
keywords, operators, identifiers, constants, literals,
punctuation symbols…
Attributes of tokens
❒ When more than one pattern matches a lexeme, the
scanner must provide additional information about the
particular lexeme to the subsequent phases of the
compiler.
❒ For example, both 0 and 1 match the pattern for the
token num.
❒ But the code generator needs to know which number is
recognized.
❒ The lexical analyzer collects information about tokens
into their associated attributes.
• Tokens influence parsing decisions;
• Attributes influence the translation of tokens after
parse
Attributes of tokens…
❒ Practically, a token has one attribute:
❍ a pointer to the symbol table entry in which the
information about the token is kept.
❒ The symbol table entry contains various
information about the token
❍ such as its lexeme, type, the line number in which
it was first seen …

Ex. y = 31 + 28 * x, The tokens and their


attributes are written as:
Attributes of tokens…
Errors
❒ Very few errors are detected by the lexical
analyzer.
❒ For example, if the programmer mistakes
ebgin for begin, the lexical analyzer cannot
detect the error since it will consider ebgin as
an identifier.
❒ Nonetheless, if a certain sequence of
characters follows none of the specified
patterns, the lexical analyzer can detect the
error.
Errors…
❒ When an error occurs, the lexical analyzer
recovers by:
❍ skipping (deleting) successive characters from the
remaining input until the lexical analyzer can find a
well-formed token (panic mode recover)
❍ deleting one character from the remaining input
❍ inserting missing characters into the remaining input
❍ replacing an incorrect character by a correct
character
❍ transposing two adjacent characters
Specification of patterns using
regular expressions

❒ Regular expressions
❒ Regular expressions for tokens
Regular expression: Definitions

❒ Represents patterns of strings of characters.


❒ An alphabet Σ is a finite set of symbols
(characters)
❒ A string s is a finite sequence of symbols
from Σ
❍ |s| denotes the length of string s
❍ ε denotes the empty string, thus |ε| = 0
❒ A language L is a specific set of strings over
some fixed alphabet Σ
Regular expressions…
❒ A regular expression is one of the following:
Symbol: a basic regular expression consisting of a single
character a, where a is from:
 an alphabet Σ of legal characters;
 the metacharacter ε; or
 the metacharacter ø.
 In the first case, L(a) = {a};
 in the second case, L(ε) = {ε};
 in the third case, L(ø) = { }.
 { } – contains no string at all.
 {ε} – contains only the empty string, which consists of no characters.
Regular expressions…
❒ Alternation: an expression of the form r|s, where r
and s are regular expressions.
❍ In this case, L(r|s) = L(r) ∪ L(s)
❒ Concatenation: an expression of the form rs, where r
and s are regular expressions.
❍ In this case, L(rs) = L(r)L(s)
❒ Repetition: an expression of the form r*, where r is a
regular expression.
❍ In this case, L(r*) = L(r)* = {ε} ∪ L(r) ∪ L(r)L(r) ∪ …
Regular expression: Language Operations

❒ Union of L and M
❍ L ∪ M = {s | s ∈ L or s ∈ M}
❒ Concatenation of L and M
❍ LM = {xy | x ∈ L and y ∈ M}
❒ Exponentiation of L
❍ L0 = {ε}; Li = Li-1L
❒ Kleene closure of L
❍ L* = ∪i=0,…,∞ Li
❒ Positive closure of L
❍ L+ = ∪i=1,…,∞ Li

The following shorthands are often used:
r+ = rr*
r* = r+ | ε
r? = r | ε
Examples

L1={a,b,c,d} L2={1,2}
L1 ∪ L2={a,b,c,d,1,2}
L1L2={a1,a2,b1,b2,c1,c2,d1,d2}
L1* = all strings over the letters a,b,c,d, including the empty string.
L1+ = the set of all strings of one or more of the letters a,b,c,d;
the empty string is not included.
Regular expressions…
❒ Examples (more):
1- a | b = {a,b}
2- (a|b)a = {aa,ba}
3- (ab) | ε ={ab, ε}
4- ((a|b)a)* = {ε, aa,ba,aaaa,baba,....}
❒ Reverse (from a description to a regular expression):
1 – Even binary numbers (0|1)*0
2 – An alphabet consisting of just three alphabetic
characters: Σ = {a, b, c}. Consider the set of all strings
over this alphabet that contains exactly one b.
(a|c)*b(a|c)*   e.g. {b, abc, abaca, baaaac, ccbaca, cccccb}
Regular expressions for tokens

❒ Regular expressions are used to specify the


patterns of tokens.
❒ Each pattern matches a set of strings. It falls into
different categories:
❒ Reserved (Key) words: They are represented by
their fixed sequence of characters,
❒ Ex. if, while and do....
❒ If we want to collect all the reserved words into
one definition, we could write it as follows:
Reserved = if | while | do |...
Regular expressions for tokens…
❒ Special symbols: including arithmetic operators,
assignment and equality such as =, :=, +, -, *
❒ Identifiers: which are defined to be a sequence of
letters and digits beginning with letter,
❒ we can express this in terms of regular definitions as
follows:
letter = A|B|…|Z|a|b|…|z
digit = 0|1|…|9
or
letter= [a-zA-Z]
digit = [0-9]
identifiers = letter(letter|digit)*
Regular expressions for tokens…
❒ Numbers: Numbers can be:
❍ sequence of digits (natural numbers), or
❍ decimal numbers, or
❍ numbers with exponent (indicated by an e or E).
❒ Example: 2.71E-2 represents the number 0.0271.
❒ We can write regular definitions for these numbers as
follows:
nat = [0-9]+
signedNat = (+|-)? nat
number = signedNat(“.” nat)?(E signedNat)?
❒ Literals or constants: which can include:
❍ numeric constants such as 42, and
❍ string literals such as “ hello, world”.
Regular expressions for tokens…

❒ relop → < | <= | = | <> | > | >=
❒ Comments: Ex. /* this is a C comment */
❒ Delimiter → newline | blank | tab | comment
❒ White space = (delimiter)+
Recognition of tokens

A grammar for branching statements and conditional expressions:
stmt → if expr then stmt
| if expr then stmt else stmt
| ε
expr → term relop term | term
term → id | number

Patterns for tokens using regular expressions:
digit → [0-9]
nat → digit+
signednat → (+|-)?nat
number → signednat(“.”nat)?(E signednat)?
letter → [A-Za-z]
id → letter(letter|digit)*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>
ws → (blank | tab | newline)+

 For this language, the lexical analyzer will recognize:
the keywords if, then, else
Lexemes that match the patterns for relop, id, number
Recognition of tokens…
Tokens, their patterns, and attribute values
Transition diagram that recognizes the lexemes
matching the token relop and id.
Coding…
token nexttoken()
{ while (1) {
    switch (state) {
    case 0: c = nextchar();
      if (c==blank || c==tab || c==newline) {
        state = 0;
        lexeme_beginning++;
      }
      else if (c=='<') state = 1;
      else if (c=='=') state = 5;
      else if (c=='>') state = 6;
      else state = fail();
      break;
    case 1: c = nextchar();
      ...
    case 9: c = nextchar();
      if (isletter(c)) state = 10;
      else state = fail();
      break;
    case 10: c = nextchar();
      if (isletter(c)) state = 10;
      else if (isdigit(c)) state = 10;
      else state = 11;
      break;

Design of a Lexical Analyzer/Scanner
Finite Automata
 Lex – turns its input program into lexical analyzer.
 At the heart of the transition is the formalism known as
finite automata.
 Finite automata are graphs, like transition diagrams, with a
few differences:
1. Finite automata are recognizers; they simply say "yes" or
"no" about each possible input string.
2. Finite automata come in two flavors:
a) Nondeterministic finite automata (NFA) have no restrictions
on the labels of their edges.
ε, the empty string, is a possible label.
b) Deterministic finite automata (DFA) have, for each state,
and for each symbol of its input alphabet exactly one edge
with that symbol leaving that state.
The Whole Scanner Generator Process
Overview
 Direct construction of a Nondeterministic Finite
Automaton (NFA) to recognize a given regular
expression.
 Easy to build in an algorithmic way
 Requires ε-transitions to combine regular sub expressions
 Construct a Deterministic Finite Automaton
(DFA) to simulate the NFA (optional)
 Use a set-of-state construction
 Minimize the number of states in the DFA
 Generate the scanner code.
Design of a Lexical Analyzer …
❒ Token → Pattern
❒ Pattern → Regular Expression
❒ Regular Expression → NFA
❒ NFA → DFA
❒ DFA’s or NFA’s for all tokens → Lexical Analyzer
Non-Deterministic Finite Automata
(NFA)
Definition
❒ An NFA M consists of five tuples: ( Σ,S, T, s0, F)
❍ A set of input symbols Σ, the input alphabet
❍ a finite set of states S,
❍ a transition function T: S × (Σ U {ε}) -> a set of next states,
❍ a start state s0 from S, and
❍ a set of accepting/final states F from S.
❒ The language accepted by M, written L(M), is defined as:
The set of strings of characters c1c2...cn with each ci from
Σ U { ε} such that there exist states s1 in T(s0,c1), s2 in
T(s1,c2), ... , sn in T(sn-1,cn) with sn an element of F.
NFA…
❒ It is a finite automata which has choice of
edges
• The same symbol can label edges from one state to
several different states.
❒ An edge may be labeled by ε, the empty
string
• We can have transitions without any input
character consumption.
Transition Graph
❒ The transition graph for an NFA recognizing the
language of regular expression (a|b)*abb
(all strings of a's and b's ending in the
particular string abb)

start → 0 –a→ 1 –b→ 2 –b→ 3
(state 0 also has self-loops on both a and b)

S = {0,1,2,3}
Σ = {a,b}
S0 = 0
F = {3}
Transition Table
❒ The mapping T of an NFA can be represented
in a transition table

State | a | b | ε
0 | {0,1} | {0} | ø
1 | ø | {2} | ø
2 | ø | {3} | ø
3 | ø | ø | ø

T(0,a) = {0,1}
T(0,b) = {0}
T(1,b) = {2}
T(2,b) = {3}

The language defined by an NFA is the set of input


strings it accepts, such as (a|b)*abb for the example
NFA
Acceptance of input strings by NFA
❒ An NFA accepts input string x if and only if there is
some path in the transition graph from the start
state to one of the accepting states
❒ The string aabb is accepted by the NFA:

0 –a→ 0 –a→ 1 –b→ 2 –b→ 3   YES
0 –a→ 0 –a→ 0 –b→ 0 –b→ 0   NO (this particular path fails, but the
string is still accepted because some path succeeds)
Another NFA

(figure: an NFA whose start state has two ε-edges: one to a branch that
reads an a and then loops on a, the other to a branch that reads a b and
then loops on b)

 An ε-transition is taken without consuming any character from
the input.
 What does the NFA above accept?
aa*|bb*
Deterministic Finite Automata (DFA)

❒ A deterministic finite automaton is a special


case of an NFA
❍ No state has an ε-transition
❍ For each state S and input symbol a there is at
most one edge labeled a leaving S
❒ Each entry in the transition table is a single state
❍ At most one path exists to accept a string
❍ Simulation algorithm is simple
DFA example
A DFA that accepts (a|b)*abb
Simulating a DFA: Algorithm
How to apply a DFA to a string.
INPUT:
❒ An input string x terminated by an end-of-file character
eof.
❒ A DFA D with start state So, accepting states F, and
transition function move.
OUTPUT: Answer ''yes" if D accepts x; "no" otherwise
METHOD
❒ Apply the algorithm in (next slide) to the input string x.
❒ The function move(s, c) gives the state to which there is
an edge from state s on input c.
❒ The function nextChar() returns the next character of
the input string x.
Simulating a DFA
s = s0;
c = nextchar();
while ( c != eof ) {
  s = move(s, c);
  c = nextchar();
}
if ( s is in F ) return "yes";
else return "no";

(figure: DFA accepting (a|b)*abb)

Given the input string ababb, this DFA enters the


sequence of states 0,1,2,1,2,3 and returns "yes"
DFA: Exercise

❒ Draw DFAs for the strings matched by the
following definitions:
digit = [0-9]
nat = digit+
signednat = (+|-)?nat
number = signednat(“.”nat)?(E signedNat)?
Design of a Lexical Analyzer Generator

Regular Expression → DFA

Two algorithms:
1- Translate a regular expression into an NFA
(Thompson’s construction)

2- Translate NFA into DFA


(Subset construction)
From regular expression to an NFA
❒ It is known as Thompson’s construction.

Rules:
1- For ε or a single-symbol regular expression a, construct:

start → 0 –a (or ε)→ 1   (state 1 accepting)
From regular expression to an NFA…
2- For a composition of regular expression:
❒ Case 1: Alternation: regular expression(s|r), assume
that NFAs equivalent to r and s have been
constructed.
From regular expression to an NFA…
❒ Case 2: Concatenation: regular expression sr

…r …s

Case 3: Repetition r*
From RE to NFA:Exercises

❒ Construct NFA for token identifier.


letter(letter|digit)*
❒ Construct NFA for the following regular
expression:
(a|b)*abb
From an NFA to a DFA
(subset construction algorithm)

❒ Input: NFA N; Output: DFA D.
Both accept the same language as the original regular expression.

Rules:
❒ The start state of D is ε-closure(S0),
where S0 is the start state of N.
❒ The start state of D is initially unmarked.
NFA to a DFA…
ε-closure
ε-closure(S’) – is a set of states with the following
characteristics:
1- S’ ∈ ε-closure(S’) itself
2- if t ∈ ε-closure(S’) and if there is an edge labeled
ε from t to v, then v ∈ ε-closure(S’)
3- Repeat step 2 until no more states can be added
to ε-closure(S’).
E.g: for the NFA of (a|b)*abb
ε-closure(0) = {0, 1, 2, 4, 7}
ε-closure(1) = {1, 2, 4}
NFA to a DFA…
Algorithm
While there is an unmarked state X = {s0, s1, s2, ..., sn} of D do
Begin
  Mark X
  For each input symbol ‘a’ do
  Begin
    Let T be the set of states to which there is a transition on ‘a’
    from some state si in X.
    Y = ε-closure(T)
    If Y has not been added to the set of states of D then
      mark Y as an “unmarked” state of D;
    add a transition from X to Y labeled a if not already present
  End
End
NFA for identifier: letter(letter|digit)*

(figure, as an edge list:)
0 –letter→ 1
1 –ε→ 2
2 –ε→ 3, 2 –ε→ 5, 2 –ε→ 8
3 –letter→ 4
5 –digit→ 6
4 –ε→ 7, 6 –ε→ 7
7 –ε→ 2
(state 8 accepting)
NFA to a DFA…
Example: Convert the preceding NFA for letter (letter|digit)* into the
corresponding DFA.

A = {0}
B = {1,2,3,5,8}
C = {4,7,2,3,5,8}
D = {6,7,8,2,3,5}

Transitions:
start → A; A –letter→ B
B –letter→ C; B –digit→ D
C –letter→ C; C –digit→ D
D –letter→ C; D –digit→ D

Exercise: convert the NFA of (a|b)*abb into a DFA.
Other Algorithms

❒ How to minimize a DFA ? (see Dragon Book


3.9, pp.173)
❒ How to convert RE to DFA directly ? (see
Dragon Book 3.9.5 pp.179)
General Compiler Infra-structure

Program source (stream of characters) → Scanner (tokenizer) → Tokens → Parser → Parse tree → Semantic Routines → Annotated/decorated tree → Analysis/Transformations/optimizations → IR: Intermediate Representation → Code Generator → Assembly code

Symbol and literal Tables serve all of these phases.
Generating a Lexical Analyzer using Lex
Lex is a scanner generator ----- it takes lexical specification as
input, and produces a lexical analyzer written in C.

Lex source program (lex.l) → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
Input stream → a.out (the lexical analyzer) → Sequence of tokens
Pattern matching examples
Assignment 2

Assignment on Lexical Analyzer


The MINI Language Introduction
❒ Assumptions:
❍ Source code – MINI language
❍ Target code – Assembly language

❒ Specifications:
 There are no procedures and declarations.
 All variables are integer variables, and variables are
declared simply by assigning values to them.
 There are only two control statements:
 An if – statement and
 A repeat statement
 Both the control statements may themselves
contain statement sequences.
The MINI Language Introduction...
 An if – statement has an optional else part and must
be terminated by the keyword end.
 There are also read and write statements that
perform input/output.
 Comments are allowed with curly brackets,
comments cannot be nested.
 Expression in MINI are also limited to Boolean and
integer arithmetic expressions.
 A Boolean expression consists of a comparison of
two arithmetic expressions using either of the two
comparison operators < and =.
The MINI Language...
 An arithmetic expression may involve integer constants,
variables, parenthesis, and any of the four integer
operators +, -, *, and / (integer division).
 Boolean expressions may appear only as tests in
control statements – i.e. There are no Boolean
variables, assignment, or I/O.
 Here is a sample program in this language for factorial
function.
{ sample program
in MINI language – computes factorials
}
read x; { input an integer }
if 0 < x then { don’t compute if x <= 0 }
fact := 1;
repeat
fact := fact * x;
x := x - 1
until x = 0;
write fact { output factorial of x }
end
The MINI Language...
❒ In addition, MINI has the following
lexical conventions:
 Comments : are enclosed in curly brackets {...} and
cannot be nested.
 White space : consists of blanks, tabs, and
newlines.
Design a scanner for MINI language

❒ In designing a scanner for this language:


1. Start with regular expressions
2. Identify Tokens...
3. Develop and simulate NFA
4. Construct and simulate DFA
