
Compiler Design (CD)

GTU # 3170701

Unit – 1
Overview of the Compiler
&
its Structure

Computer Engineering Department


Darshan Institute of Engineering & Technology, Rajkot
[email protected]
+91 - 97277 47317 (CE Department)
Topics to be covered
• Language Processor
• Translator
• Analysis synthesis model of compilation
• Phases of compiler
• Grouping of the Phases
• Difference between compiler & interpreter
• Context of compiler (Cousins of compiler)
• Pass structure
• Types of compiler
• The Science of building a compiler
What do you see?
10101001 00000000 10101001 00000000
10101001 00000000 10101001 00000000
10000101 00000001 10101001 00000000      Binary
10101001 00000010 10101001 00000010      program
10000101 00000010 10101001 00000010
10100000 00000000 10101001 00000010      What does it
10101001 00000001 10101001 00000010      mean?
10010001 00000001 10000101 00000000
10000101 00000001 10101001 00000010
10101001 00000010 10101001 00000010
10101001 00000010 10101001 00000010
Prof. Dixita B Kagathara #3170701 (CD)  Unit 1 – Overview of the Compiler & its Structure
Semantic gap

Actual Data (low level):
a block of raw binary machine code
        <-- Semantic Gap -->
Human perception (high level):
the meaning a human reads into the program

Language processor
A language processor is software that bridges the semantic gap.
It is a program designed to perform tasks such as translating program code into machine code.

Translator
A translator is a program that takes one form of program as input and converts it into another
form.
Types of translators are:
1. Compiler
2. Interpreter
3. Assembler

Source Program --> Translator --> Target Program
                       |
                       v
              Error Messages (if any)
Compiler
A compiler is a program that reads a program written in source language and translates it into
an equivalent program in target language.

Source Program:                           Target Program:
void main()                               0000 1100 0010 0100
{                                         0111 1000 0001
    int a=1,b=2,c;     --Compiler-->      1111 0101 1110
    c=a+b;                                1100 0000 1000 1011
    printf("%d",c);
}
                 |
                 v
        Error Messages (if any)

Interpreter
An interpreter is also a program that reads a program written in the source language and translates it into an equivalent program in the target language, line by line.

Source Program:                           Target Program:
void main()                               0000 1100 0010 0000
{                                         1111 1100 0010
    int a=1,b=2,c;    --Interpreter-->    1010 1100 0010
    c=a+b;                                0011 1100 0010 1111
    printf("%d",c);
}
                 |
                 v
        Error Messages (if any)

Assembler
An assembler is a translator that takes assembly code as input and generates machine code as output.

Assembly Code:                            Machine Code:
MOV id3, R1                               0000 1100 0010 0100
MUL #2.0, R1                              0111 1000 0001
MOV id2, R2        --Assembler-->         1111 0101 1110
MUL R2, R1                                1100 0000 1000 1011
MOV id1, R2                               1100 0000 1000
ADD R2, R1
MOV R1, id1
                 |
                 v
        Error Messages (if any)

Analysis synthesis model of compilation
There are two parts of compilation:

1. Analysis Phase
2. Synthesis Phase

Source Code --> Analysis Phase --> Intermediate Representation --> Synthesis Phase --> Target Code

Analysis phase & Synthesis phase
Analysis Phase:
The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
The analysis phase consists of three sub-phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis

Synthesis Phase:
The synthesis part constructs the desired target program from the intermediate representation.
The synthesis phase consists of the following sub-phases:
1. Code optimization
2. Code generation

Phases of compiler

Compiler
    Analysis phase:
        1. Lexical analysis
        2. Syntax analysis
        3. Semantic analysis
        4. Intermediate code generation
    Synthesis phase:
        1. Code optimization
        2. Code generation

Lexical analysis
Lexical analysis is also called linear analysis or scanning.
The lexical analyzer divides the given source statement into tokens.
Ex: Position = initial + rate * 60 would be grouped into the following tokens:
    Position  (identifier)
    =         (assignment symbol)
    initial   (identifier)
    +         (plus symbol)
    rate      (identifier)
    *         (multiplication symbol)
    60        (number)
After lexical analysis the statement becomes: id1 = id2 + id3 * 60
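The grouping above can be sketched with a tiny scanner. This is an illustrative sketch only; the token names and the use of Python's re module as pattern notation are assumptions, not the course's implementation:

```python
import re

# Token patterns, tried in order (illustrative names).
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("number",     r"\d+"),
    ("operator",   r"[=+\-*/]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(stmt):
    """Split a source statement into (kind, lexeme) pairs; blanks are skipped."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(stmt)]

print(tokenize("Position = initial + rate * 60"))
```

Running it on the statement above yields exactly the seven tokens listed.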

Syntax analysis
Syntax analysis is also called parsing or hierarchical analysis.
The syntax analyzer checks whether each statement follows the grammar of the language and reports any syntax errors.
If the code is error free, the syntax analyzer generates a syntax tree:

Position = initial + rate*60
    | Lexical analysis
    v
id1 = id2 + id3 * 60
    | Syntax analysis
    v
        =
       / \
    id1    +
          / \
       id2    *
             / \
          id3    60

Semantic analysis
The semantic analyzer determines the meaning of a source string.
It performs operations such as:
1. Matching of parentheses in an expression.
2. Matching of if..else statements.
3. Checking that arithmetic operations are type compatible.
4. Checking the scope of operations.

*Note: id1, id2 and id3 are declared real, so the integer 60 must be converted (int to real):

        =                                    =
       / \         Semantic                 / \
    id1    +       analysis              id1    +
          / \        -->                       / \
       id2    *                             id2    *
             / \                                  / \
          id3    60                            id3    inttoreal
                                                         |
                                                         60
Intermediate code generator
Two important properties of intermediate code:
1. It should be easy to produce.
2. It should be easy to translate into the target program.

The intermediate form can be represented using "three address code".
Three address code consists of a sequence of instructions, each of which has at most three operands.

For the tree produced by semantic analysis, the intermediate code is:
t1 = inttoreal(60)
t2 = id3 * t1
t3 = t2 + id2
id1 = t3
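The way the tree is flattened into three-address instructions can be sketched with a post-order walk. The tuple-based tree shape and the temporary-naming scheme below are assumptions made for illustration:

```python
def gen_tac(node, code, counter):
    """Post-order walk: emit one three-address instruction per interior node
    and return the name (identifier, constant or temporary) holding its value."""
    if isinstance(node, str):                     # leaf: identifier or constant
        return node
    op, *kids = node
    args = [gen_tac(k, code, counter) for k in kids]
    counter[0] += 1
    t = f"t{counter[0]}"                          # fresh temporary name
    if op == "inttoreal":
        code.append(f"{t} = inttoreal({args[0]})")
    else:
        code.append(f"{t} = {args[0]} {op} {args[1]}")
    return t

# Tree for id2 + id3 * inttoreal(60), assigned to id1.
tree = ("+", "id2", ("*", "id3", ("inttoreal", "60")))
code = []
code.append(f"id1 = {gen_tac(tree, code, [0])}")
```

Up to the commutativity of + (the walk emits t3 = id2 + t2), this reproduces the instruction sequence above.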

Code optimization
It improves the intermediate code.
This is necessary for faster execution of the code or lower memory consumption.

Intermediate code:
t1 = inttoreal(60)
t2 = id3 * t1
t3 = t2 + id2
id1 = t3

After code optimization:
t1 = id3 * 60.0
id1 = id2 + t1
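The first of the two improvements shown (evaluating inttoreal(60) at compile time) can be sketched as a small constant-folding pass over the three-address code. The string-based instruction format and the naive textual substitution are assumptions for illustration:

```python
def fold_constants(code):
    """Fold inttoreal(integer constant) at compile time and substitute the
    resulting real constant into later instructions (naive textual version)."""
    out, env = [], {}
    for line in code:
        dest, expr = [part.strip() for part in line.split("=", 1)]
        for name, val in env.items():        # substitute folded temporaries
            expr = expr.replace(name, val)
        if expr.startswith("inttoreal(") and expr[10:-1].isdigit():
            env[dest] = expr[10:-1] + ".0"   # inttoreal(60) -> 60.0
        else:
            out.append(f"{dest} = {expr}")
    return out

optimized = fold_constants([
    "t1 = inttoreal(60)",
    "t2 = id3 * t1",
    "t3 = t2 + id2",
    "id1 = t3",
])
# t1 disappears: ['t2 = id3 * 60.0', 't3 = t2 + id2', 'id1 = t3']
```

Eliminating the remaining copy through t3 (so that id1 = id2 + t1, as above) would be a separate copy-propagation pass.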

Code generation
The intermediate code instructions are translated into a sequence of machine instructions.

Optimized code:
t1 = id3 * 60.0
id1 = id2 + t1

Code generation (id3 is kept in R2, id2 in R1):
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
Phases of compiler
Source program
    |
    v  Analysis Phase
Lexical analysis --> Syntax analysis --> Semantic analysis --> Intermediate code generation
    |
    v  Synthesis Phase
Code optimization --> Code generation
    |
    v
Target program

The symbol table and the error detection & recovery routines interact with every phase.

Symbol table:
Variable Name | Type  | Address
Position      | Float | 0001
Initial       | Float | 0005
Rate          | Float | 0009
Exercise
Write output of all the phases of compiler for following statements:
1. x = b-c*2
2. I=p*n*r/100

Front end & back end (Grouping of phases)
Front end
The front end depends primarily on the source language and is largely independent of the target machine.
It includes following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Creation of symbol table & Error handling
Back end
The back end depends on the target machine and does not depend on the source language.
It includes following phases:
1. Code optimization
2. Code generation phase
3. Error handling and symbol table operation
Difference between compiler & interpreter

Compiler: Scans the entire program and translates it as a whole into machine code.
Interpreter: Translates the program one statement at a time.

Compiler: It generates intermediate code.
Interpreter: It does not generate intermediate code.

Compiler: Errors are displayed after the entire program is checked.
Interpreter: An error is displayed as soon as it is encountered during interpretation.

Compiler: Memory requirement is more.
Interpreter: Memory requirement is less.

Compiler example: C compiler.
Interpreter examples: BASIC, Python, Ruby.

Context of compiler (Cousins of compiler)
In addition to the compiler, many other system programs are required to generate absolute machine code. These system programs are:
    Preprocessor
    Assembler
    Linker
    Loader

Skeletal Source Program
    | Preprocessor
    v
Source Program
    | Compiler
    v
Target Assembly Program
    | Assembler
    v
Relocatable Object Code
    | Linker / Loader  <-- Libraries & Object Files
    v
Absolute Machine Code
Context of compiler (Cousins of compiler)
Preprocessor
Some of the tasks performed by a preprocessor:
1. Macro processing: allows the user to define macros. Ex: #define PI 3.14159265358979323846
2. File inclusion: a preprocessor may include header files into the program. Ex: #include<stdio.h>
3. Rational preprocessor: provides built-in macros for constructs like while statements or if statements.
4. Language extensions: adds capabilities to the language by using built-in macros. Ex: the language Equel is a database query language embedded in C. Statements beginning with ## are taken by the preprocessor to be database access statements unrelated to C and are translated into procedure calls on routines that perform the database access.
Context of compiler (Cousins of compiler)
Assembler
An assembler is a translator that takes an assembly program (mnemonics) as input and generates machine code as output.
Context of compiler (Cousins of compiler)
Linker
A linker makes a single program from several files of relocatable machine code.
These files may have been the result of several different compilations, and one or more may be library files.

Loader
The process of loading consists of:
    taking relocatable machine code,
    altering the relocatable addresses, and
    placing the altered instructions and data in memory at the proper locations.
Pass structure
One complete scan of a source program is called a pass.
A pass includes reading an input file and writing to an output file.
In a single-pass compiler, analysis of a source statement is immediately followed by synthesis of the equivalent target statement.
In a two-pass compiler, intermediate code is generated between the analysis and synthesis phases.
It is difficult to compile a source program in a single pass because of forward references.

Pass structure
Forward reference: a forward reference to a program entity is a reference that precedes the entity's definition in the program.
This problem can be solved by postponing the generation of target code until more information concerning the entity becomes available.
It leads to a multi-pass model of compilation.

Pass I: Perform analysis of the source program and note relevant information.
Pass II: Generate target code using the information noted in Pass I.
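The two passes can be illustrated with forward branch labels in a toy assembly language; the instruction and label syntax here is invented for the example:

```python
def assemble_two_pass(lines):
    """Pass I: collect the address of every label (even ones defined later).
    Pass II: emit instructions with label operands replaced by addresses."""
    table, instrs = {}, []
    for line in lines:                          # Pass I: analyze, note info
        if line.endswith(":"):
            table[line[:-1]] = len(instrs)      # address = next instruction
        else:
            instrs.append(line)
    resolved = []
    for ins in instrs:                          # Pass II: generate target code
        op, _, arg = ins.partition(" ")
        resolved.append(f"{op} {table[arg]}" if arg in table else ins)
    return resolved

program = ["JMP end", "ADD", "end:", "HALT"]    # "end" is a forward reference
print(assemble_two_pass(program))
```

The JMP refers to a label defined later; Pass I notes that "end" is instruction 2, and Pass II patches the jump.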

Effect of reducing the number of passes
It is desirable to have as few passes as possible, because it takes time to read and write intermediate files.
However, if we group several phases into one pass, the memory requirement may be large.

Types of compiler
1. One-pass compiler
A compiler that performs the whole compilation process in a single pass.
2. Two-pass compiler
A compiler that performs the compilation process in two passes.
It generates intermediate code.
3. Incremental compiler
A compiler that recompiles only the changed lines of the source code and updates the object code.
4. Native code compiler
A compiler that compiles source code for the same type of platform it runs on.
5. Cross compiler
A compiler that compiles source code for a different kind of platform.

Science of building Compilers
The main job of a compiler is to accept a source program and convert it into a suitable target program.
A compiler must accept all source programs that conform to the specification of the language; the set of source programs is infinite, and any program can be very large, consisting of possibly millions of lines of code.
Compiler study is mainly focused on how to design the right mathematical models and choose the right algorithms.
In compiler design, the term "code optimization" denotes the attempts made by a compiler to produce code that is more efficient than the unoptimized code.
The optimized code should be faster than other code that performs the same task.
The objectives to be fulfilled by compiler optimization include:
1. The meaning of the compiled program must be preserved.
2. Optimization should improve the program's performance.
3. The time required for compilation should be reasonable.

Science of building Compilers
Theory alone is not sufficient to build a compiler; people involved in the design of a compiler should be able to formulate the right problem to solve.
In order to do this, the first step is a thorough understanding of the behavior of programs.

References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

Compiler Design (CD)
GTU # 3170701

Unit – 2
Lexical Analyzer

Topics to be covered
• Interaction of scanner & parser
• Token, Pattern & Lexemes
• Input buffering
• Specification of tokens
• Regular expression & Regular definition
• Transition diagram
• Hard coding & automatic generation lexical analyzers
• Finite automata
• Regular expression to NFA using Thompson's rule
• Conversion from NFA to DFA using subset construction method
• DFA optimization
• Conversion from regular expression to DFA
• An Elementary Scanner Design & It’s Implementation
Interaction of scanner & parser
                      token
Source  -->  Lexical  ------------>  Parser
Program      Analyzer <------------
                   get next token
                 |                      |
                 v                      v
                    Symbol Table

Upon receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.
The lexical analyzer also strips out comments and white space (blanks, tabs, and newline characters) from the source program.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 4


Why to separate lexical analysis & parsing?
1. Simplicity in design.
2. Improves compiler efficiency.
3. Enhance compiler portability.



Token, Pattern & Lexemes
Token
A sequence of characters having a collective meaning is known as a token.
Categories of tokens:
1. Identifier
2. Keyword
3. Operator
4. Special symbol
5. Constant

Pattern
The set of rules associated with a token is called its pattern.
Example: "non-empty sequence of digits", "letter followed by letters and digits"

Lexeme
A sequence of characters in the source program that matches the pattern for a token is called a lexeme.
Example: Rate, DIET, count, Flag



Example: Token, Pattern & Lexemes
Example: total = sum + 45
Tokens:
    total  ->  Identifier1
    =      ->  Operator1
    sum    ->  Identifier2
    +      ->  Operator2
    45     ->  Constant1
Lexemes:
    Lexemes of identifiers: total, sum
    Lexemes of operators: =, +
    Lexeme of the constant: 45
Input buffering
There are mainly two techniques for input buffering:
1. Buffer pairs
2. Sentinels

Buffer Pair

The lexical analyzer scans the input string from left to right, one character at a time.
The buffer is divided into two N-character halves, where N is the number of characters on one disk block.

: : E : : = : : M : * : : C : * : * : 2 : eof : : :



Buffer pairs
: : E : : = : : M : * : : C : * : * : 2 : eof : : :
      ^                       ^
lexeme_beginning           forward

The pointer lexeme_beginning marks the beginning of the current lexeme.
The pointer forward scans ahead until a pattern match is found.
Once the next lexeme is determined, forward is set to the character at its right end.
lexeme_beginning is then set to the character immediately after the lexeme just found.
If the forward pointer is at the end of the first buffer half, the second half is filled with N input characters.
If the forward pointer is at the end of the second buffer half, the first half is filled with N input characters.



Buffer pairs
: : E : : = : : M : * : : C : * : * : 2 : eof : : :
      ^                   ^
lexeme_beginning       forward (advances)
Code to advance forward pointer
if forward at end of first half then begin
reload second half;
forward := forward + 1;
end
else if forward at end of second half then begin
reload first half;
move forward to beginning of first half;
end
else forward := forward + 1;
Sentinels

: : E : : = : : M : * : eof : C : * : * : 2 : eof : : eof
      ^                       ^
lexeme_beginning           forward

In buffer pairs we must check, each time we move the forward pointer that we have not moved
off one of the buffers.
Thus, for each character read, we make two tests.
We can combine the buffer-end test with the test for the current character.
We can reduce the two tests to one if we extend each buffer to hold a sentinel character at the
end.
The sentinel is a special character that cannot be part of the source program, and a natural
choice is the character EOF.



Sentinels
: : E : : = : : M : * : eof : C : * : * : 2 : eof : : eof
      ^                   ^
lexeme_beginning       forward (advances)
forward := forward + 1;
if forward = eof then begin
if forward at end of first half then begin
reload second half;
forward := forward + 1;
end
else if forward at end of second half then begin
reload first half;
move forward to beginning of first half;
end
else terminate lexical analysis;
end
Strings and languages
Term                                  Definition
Prefix of S                           A string obtained by removing zero or more trailing symbols of string S.
                                      e.g., ban is a prefix of banana.
Suffix of S                           A string obtained by removing zero or more leading symbols of string S.
                                      e.g., nana is a suffix of banana.
Substring of S                        A string obtained by removing a prefix and a suffix from S.
                                      e.g., nan is a substring of banana.
Proper prefix, suffix                 Any nonempty string x that is respectively a prefix, suffix or
and substring of S                    substring of S, such that S ≠ x.
Subsequence of S                      A string obtained by removing zero or more not necessarily
                                      contiguous symbols from S.
                                      e.g., baaa is a subsequence of banana.
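The definitions above can be checked mechanically; a small sketch:

```python
def prefixes(s):
    """All prefixes of s (including the empty string and s itself)."""
    return {s[:i] for i in range(len(s) + 1)}

def suffixes(s):
    return {s[i:] for i in range(len(s) + 1)}

def substrings(s):
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

def is_subsequence(x, s):
    """True if x is obtained by deleting (not necessarily contiguous) symbols of s."""
    it = iter(s)
    return all(ch in it for ch in x)            # 'in' advances the iterator

assert "ban" in prefixes("banana")
assert "nana" in suffixes("banana")
assert "nan" in substrings("banana")
assert is_subsequence("baaa", "banana")
```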



Exercise
Write prefix, suffix, substring, proper prefix, proper suffix and subsequence of following string:
String: Compiler



Operations on languages
Operation                         Definition
Union of L and M                  L ∪ M = { s | s is in L or s is in M }
(written L ∪ M)
Concatenation of L and M          LM = { st | s is in L and t is in M }
(written LM)
Kleene closure of L               L* denotes "zero or more concatenations of" L.
(written L*)
Positive closure of L             L+ denotes "one or more concatenations of" L.
(written L+)



Regular expression
A regular expression is a sequence of characters that define a pattern.
Notational shorthand's
1. One or more instances: +
2. Zero or more instances: *
3. Zero or one instances: ?
4. Alphabets: Σ



Rules to define regular expression
1. ε is a regular expression that denotes {ε}, the set containing the empty string.
2. If a is a symbol in Σ, then a is a regular expression denoting {a}.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,
   a. (r) | (s) is a regular expression denoting L(r) ∪ L(s)
   b. (r)(s) is a regular expression denoting L(r)L(s)
   c. (r)* is a regular expression denoting (L(r))*
   d. (r) is a regular expression denoting L(r)

The language denoted by a regular expression is said to be a regular set.



Regular expression
L = zero or more occurrences of a = a*
  = { ε, a, aa, aaa, aaaa, aaaaa, ... }   (infinite)



Regular expression
L = one or more occurrences of a = a+
  = { a, aa, aaa, aaaa, aaaaa, ... }   (infinite)



Precedence and associativity of operators

Operator            Precedence     Associativity
Kleene closure *    1 (highest)    left
Concatenation       2              left
Union |             3 (lowest)     left



Regular expression examples
1. 0 or 1
   Language: 0, 1   i.e. 0 | 1

2. 0 or 11 or 111
   Language: 0, 11, 111   i.e. 0 | 11 | 111

3. String having zero or more a.
   Language: ε, a, aa, aaa, aaaa, ...   i.e. a*

4. String having one or more a.
   Language: a, aa, aaa, aaaa, ...   i.e. a+

5. Regular expression over Σ = {a, b, c} that represents all strings of length 3.
   Language: abc, bca, bbb, cab, aba, ...   i.e. (a | b | c)(a | b | c)(a | b | c)

6. All binary strings
   Language: 0, 11, 010, 10101, 1111, ...   i.e. (0 | 1)+


Regular expression examples
7. 0 or more occurrences of either a or b, or both
   Language: ε, a, aa, abab, bab, ...   i.e. (a | b)*

8. 1 or more occurrences of either a or b, or both
   Language: a, aa, abab, bab, bbbaaa, ...   i.e. (a | b)+

9. Binary number ending with 0
   Language: 0, 10, 100, 1010, 11110, ...   i.e. (0 | 1)* 0

10. Binary number ending with 1
   Language: 1, 101, 1001, 10101, ...   i.e. (0 | 1)* 1

11. Binary number starting and ending with 1
   Language: 11, 101, 1001, 10101, ...   i.e. 1 (0 | 1)* 1

12. String starting and ending with the same character
   Language: 00, 101, aba, baab, ...   i.e. 1 (0 | 1)* 1 or 0 (0 | 1)* 0;
   a (a | b)* a or b (a | b)* b


Regular expression examples
13. All strings of a and b starting with a
   Language: a, ab, aab, abb, ...   i.e. a (a | b)*

14. Strings of 0 and 1 ending with 00
   Language: 00, 100, 000, 1000, 1100, ...   i.e. (0 | 1)* 00

15. Strings ending with abb
   Language: abb, babb, ababb, ...   i.e. (a | b)* abb

16. Strings starting with 1 and ending with 0
   Language: 10, 100, 110, 1000, 1100, ...   i.e. 1 (0 | 1)* 0

17. All binary strings with at least 3 characters whose 3rd character is zero
   Language: 000, 100, 1100, 1001, ...   i.e. (0 | 1)(0 | 1) 0 (0 | 1)*

18. Language consisting of exactly two b's over the set Σ = {a, b}
   Language: bb, bab, aabb, abba, ...   i.e. a* b a* b a*


Regular expression examples
19. The language over Σ = {a, b} such that the 3rd character from the right end of the string is always a.
   Language: aaa, aba, aaba, abb, ...   i.e. (a | b)* a (a | b)(a | b)

20. Any number of a followed by any number of b followed by any number of c
   Language: ε, abc, aabbcc, aabc, abb, ...   i.e. a* b* c*

21. Strings containing at least three 1s
   Language: 111, 01101, 0101110, ...   i.e. (0 | 1)* 1 (0 | 1)* 1 (0 | 1)* 1 (0 | 1)*

22. Strings containing exactly two 1s
   Language: 11, 0101, 1100, 010010, 100100, ...   i.e. 0* 1 0* 1 0*

23. Strings of length at least 1 and at most 3
   Language: 0, 1, 11, 01, 111, 010, 100, ...   i.e. (0 | 1) | (0 | 1)(0 | 1) | (0 | 1)(0 | 1)(0 | 1)

24. Number of zeros should be a multiple of 3
   Language: 000, 010101, 110100, 000000, 100010010, ...   i.e. (1* 0 1* 0 1* 0 1*)*


Regular expression examples
25. The language over Σ = {a, b, c} where the number of a's is a multiple of 3
   Language: aaa, baaa, bacaba, aaaaaa, ...   i.e. ((b | c)* a (b | c)* a (b | c)* a (b | c)*)*

26. Even number of 0s
   Language: 00, 0101, 0000, 100100, ...   i.e. (1* 0 1* 0 1*)*

27. Strings of odd length
   Language: 0, 010, 110, 000, 10010, ...   i.e. (0 | 1)((0 | 1)(0 | 1))*

28. Strings of even length
   Language: 00, 0101, 0000, 100100, ...   i.e. ((0 | 1)(0 | 1))*

29. Strings starting with 0 and of odd length
   Language: 0, 010, 000, 00010, ...   i.e. 0 ((0 | 1)(0 | 1))*

30. Strings starting with 1 and of even length
   Language: 10, 1100, 1000, 100100, ...   i.e. 1 (0 | 1)((0 | 1)(0 | 1))*

31. All strings beginning or ending with 00 or 11
   Language: 00101, 10100, 110, 01011, ...   i.e. (00 | 11)(0 | 1)* | (0 | 1)*(00 | 11)
Regular expression examples
32. Language of all strings containing both 11 and 00 as substrings
   Language: 0011, 1100, 100110, 010011, ...
   i.e. ((0 | 1)* 00 (0 | 1)* 11 (0 | 1)*) | ((0 | 1)* 11 (0 | 1)* 00 (0 | 1)*)

33. Strings ending with 1 and not containing 00
   Language: 011, 1101, 1011, ...   i.e. (1 | 01)+

34. Language of identifiers
   Language: area, i, count1, flag, ...   i.e. (_ | l)(_ | l | d)*
   where l is a letter and d is a digit


Regular definition
A regular definition gives names to certain regular expressions and uses those names in other regular expressions.
A regular definition is a sequence of definitions of the form:
    d1 → r1
    d2 → r2
    ......
    dn → rn
where each di is a distinct name and each ri is a regular expression.
• Example: Regular definition for identifiers
    letter → A | B | C | ... | Z | a | b | ... | z
    digit  → 0 | 1 | ... | 9
    id     → letter (letter | digit)*



Regular definition example
Example: Unsigned Pascal numbers
    3
    5280
    39.37
    6.336E4
    1.894E-4
    2.56E+7
Regular definition:
    digit  → 0 | 1 | ... | 9
    digits → digit digit*
    optional_fraction → . digits | ε
    optional_exponent → (E (+ | - | ε) digits) | ε
    num → digits optional_fraction optional_exponent
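The same definition can be collapsed into one library regular expression (Python re is used as notation; a sketch):

```python
import re

# num -> digits optional_fraction optional_exponent
NUM = re.compile(r"\d+(\.\d+)?(E[+-]?\d+)?\Z")

for s in ["3", "5280", "39.37", "6.336E4", "1.894E-4", "2.56E+7"]:
    assert NUM.match(s)
assert not NUM.match(".5")       # the fraction alone has no leading digits
```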



Transition Diagram
A stylized flowchart is called a transition diagram.
    A circle denotes a state.
    An arrow denotes a transition.
    An arrow labeled "start" marks the start state.
    A double circle denotes a final (accepting) state.



Transition Diagram : Relational operator

state 0: on '<' go to state 1; on '=' go to state 5; on '>' go to state 6
state 1: on '=' go to state 2 → return (relop, LE)
         on '>' go to state 3 → return (relop, NE)
         on other go to state 4 → return (relop, LT)   (retract one character)
state 5: return (relop, EQ)
state 6: on '=' go to state 7 → return (relop, GE)
         on other go to state 8 → return (relop, GT)   (retract one character)
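The relational-operator diagram can be hard-coded directly; a sketch, where the return convention (a (token, attribute) pair plus the number of characters consumed) is an assumption:

```python
def relop(text):
    """Recognize a relational operator at the start of text.
    Returns ((token, attribute), chars_consumed); the 'other' branches
    retract by consuming only one character."""
    if text[:1] == "<":
        if text[1:2] == "=": return ("relop", "LE"), 2
        if text[1:2] == ">": return ("relop", "NE"), 2
        return ("relop", "LT"), 1
    if text[:1] == "=":
        return ("relop", "EQ"), 1
    if text[:1] == ">":
        if text[1:2] == "=": return ("relop", "GE"), 2
        return ("relop", "GT"), 1
    return None, 0

assert relop("<= b") == (("relop", "LE"), 2)
assert relop("<b") == (("relop", "LT"), 1)    # 'other' branch: retract
```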



Transition diagram : Unsigned number

digit digit digit

start digit . digit E +or - digit other


8
1 2 3 4 5 6 7

E digit
3
5280
39.37
1.894 E - 4
2.56 E + 7
45 E + 6
96 E 2
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 36
Hard coding and automatic generation lexical analyzers
Lexical analysis is about identifying patterns in the input.
To recognize a pattern, a transition diagram is constructed.
This is known as a hard-coded lexical analyzer.
Example: to represent an identifier in ‘C’, the first character must be a letter and the other characters are
either letters or digits.
To recognize this pattern, a hard-coded lexical analyzer works through a transition diagram.
An automatically generated lexical analyzer takes a special notation as input.
For example, the lex compiler tool takes regular expressions as input and produces a recognizer for
the patterns matching those regular expressions.

Letter or digit

Start Letter
1 2 3

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 38


Finite Automata
Finite Automata are recognizers.
FA simply say “Yes” or “No” about each possible input string.
Finite Automata is a mathematical model consist of:
1. A set of states Q
2. A set of input symbols Σ
3. A transition function move
4. An initial state q0
5. A set of final states or accepting states F

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 40


Types of finite automata
Types of finite automata are:

Deterministic finite automata (DFA): have for each state exactly one edge leaving out for
each symbol.

Nondeterministic finite automata (NFA): there are no restrictions on the edges
leaving a state. There can be several with the same symbol as label, and some edges can
be labeled with ε.

[Transition diagrams: a DFA and an NFA over inputs a, b for the same language, states 1–4]
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 41
Regular expression to NFA using Thompson's rule
1. For ε, construct the NFA:  start → (i) —ε→ ((f))

2. For a in Σ, construct the NFA:  start → (i) —a→ ((f))

3. For the regular expression st, construct N(s) followed by N(t), joining the accepting
state of N(s) to the start state of N(t):  start → N(s) → N(t)

Ex: ab  start → 1 —a→ 2 —b→ 3
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 43


Regular expression to NFA using Thompson's rule
4. For the regular expression s|t: add a new start state with ε-edges to N(s) and N(t),
and ε-edges from the accepting states of N(s) and N(t) to a new accepting state.

Ex: (a|b)  1 —ε→ 2 —a→ 3 —ε→ 6, and 1 —ε→ 4 —b→ 5 —ε→ 6

5. For the regular expression s*: add a new start state with ε-edges to N(s) and to a new
accepting state; the accepting state of N(s) gets ε-edges back to the start of N(s) and to
the new accepting state.

Ex: a*  1 —ε→ 2 —a→ 3 —ε→ 4, plus ε-edges 1→4 and 3→2
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 44
Regular expression to NFA using Thompson's rule
a*b  1 —ε→ 2 —a→ 3 —ε→ 4 —b→ 5, plus ε-edges 1→4 and 3→2

b*ab  1 —ε→ 2 —b→ 3 —ε→ 4 —a→ 5 —b→ 6, plus ε-edges 1→4 and 3→2

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 45


Exercise
Convert following regular expression to NFA:
1. abba
2. bb(a)*
3. (a|b)*
4. a* | b*
5. a(a)*ab
6. aa*+ bb*
7. (a+b)*abb
8. 10(0+1)*1
9. (a+b)*a(a+b)
10. (0+1)*010(0+1)*
11. (010+00)*(10)*
12. 100(1)*00(0+1)*
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 46
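Thompson's rules compose mechanically, which is easy to see in code. A minimal sketch with one function per rule and a small ε-closure-based simulator to check the result (all names are mine):

```python
# One function per Thompson rule; an NFA fragment is (start, accept)
# over a shared adjacency list. EPS (None) labels epsilon-edges.
EPS = None

def new_state(trans):
    trans.append([])                 # state -> list of (symbol, target)
    return len(trans) - 1

def symbol(trans, a):                # rule 2: a single symbol a
    s, f = new_state(trans), new_state(trans)
    trans[s].append((a, f))
    return s, f

def concat(trans, n1, n2):           # rule 3: N(s) N(t)
    trans[n1[1]].append((EPS, n2[0]))
    return n1[0], n2[1]

def union(trans, n1, n2):            # rule 4: N(s) | N(t)
    s, f = new_state(trans), new_state(trans)
    trans[s] += [(EPS, n1[0]), (EPS, n2[0])]
    trans[n1[1]].append((EPS, f)); trans[n2[1]].append((EPS, f))
    return s, f

def star(trans, n1):                 # rule 5: N(s)*
    s, f = new_state(trans), new_state(trans)
    trans[s] += [(EPS, n1[0]), (EPS, f)]
    trans[n1[1]] += [(EPS, n1[0]), (EPS, f)]
    return s, f

def accepts(trans, frag, text):
    def eclosure(states):
        stack, seen = list(states), set(states)
        while stack:
            for sym, t in trans[stack.pop()]:
                if sym is EPS and t not in seen:
                    seen.add(t); stack.append(t)
        return seen
    cur = eclosure({frag[0]})
    for ch in text:
        cur = eclosure({t for q in cur for sym, t in trans[q] if sym == ch})
    return frag[1] in cur

# Example from the slides: a*b
trans = []
frag = concat(trans, star(trans, symbol(trans, 'a')), symbol(trans, 'b'))
```

The simulator doubles as a reference answer when you convert the exercise expressions by hand.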
Subset construction algorithm
Input: An NFA N.
Output: A DFA D accepting the same language.
Method: The algorithm constructs a transition table Dtran for D. We use the following operations:

OPERATION	DESCRIPTION
ε-closure(s)	Set of NFA states reachable from NFA state s on ε-transitions alone.
ε-closure(T)	Set of NFA states reachable from some NFA state s in T on ε-transitions alone.
move(T, a)	Set of NFA states to which there is a transition on input symbol a from some NFA state s in T.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 48


Subset construction algorithm
initially, ε-closure(s0) is the only state in Dstates, and it is unmarked;
while there is an unmarked state T in Dstates do begin
	mark T;
	for each input symbol a do begin
		U := ε-closure(move(T, a));
		if U is not in Dstates then
			add U as an unmarked state to Dstates;
		Dtran[T, a] := U
	end
end

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 49
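The algorithm above can be sketched directly in Python; the ε-NFA below is the one for (a|b)*abb worked through in the following slides (representation and names are my own):

```python
# Subset construction over an epsilon-NFA stored as
#   nfa[state] = list of (symbol, target); EPS (None) marks epsilon-edges.
EPS = None

def e_closure(nfa, T):
    stack, closure = list(T), set(T)
    while stack:
        for sym, t in nfa[stack.pop()]:
            if sym is EPS and t not in closure:
                closure.add(t); stack.append(t)
    return frozenset(closure)

def move(nfa, T, a):
    return {t for s in T for sym, t in nfa[s] if sym == a}

def subset_construction(nfa, start, alphabet):
    dstart = e_closure(nfa, {start})
    dstates, unmarked, dtran = {dstart}, [dstart], {}
    while unmarked:
        T = unmarked.pop()                         # mark T
        for a in alphabet:
            U = e_closure(nfa, move(nfa, T, a))
            if U and U not in dstates:
                dstates.add(U); unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran, dstart

# The epsilon-NFA for (a|b)*abb (states 0..10, accepting state 10).
nfa = {0: [(EPS, 1), (EPS, 7)], 1: [(EPS, 2), (EPS, 4)], 2: [('a', 3)],
       3: [(EPS, 6)], 4: [('b', 5)], 5: [(EPS, 6)], 6: [(EPS, 1), (EPS, 7)],
       7: [('a', 8)], 8: [('b', 9)], 9: [('b', 10)], 10: []}
dstates, dtran, A = subset_construction(nfa, 0, "ab")
```

Running it reproduces the five DFA states A–E derived step by step on the next slides.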


Conversion from NFA to DFA

(a|b)*abb

a
2 3
( (
( ( a b b
0 1 6 7 8 9 10

( (
4 5
b

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 50


Conversion from NFA to DFA

a
2 3
( (
( ( a b b
0 1 6 7 8 9 10

( (
4 5
b

ε-closure(0) = {0, 1, 7, 2, 4}

= {0,1,2,4,7} ---- A

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 51


Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8}

( (
4 5
b

A = {0, 1, 2, 4, 7}
Move(A,a) = {3,8}
ε-closure(Move(A,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 52
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
( (
4 5
b

A = {0, 1, 2, 4, 7}
Move(A,b) = {5}
ε-closure(Move(A,b)) = {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 53
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B
C = {1,2,4,5,6,7}
( (
4 5
b

B = {1, 2, 3, 4, 6, 7, 8}
Move(B,a) = {3,8}
ε-closure(Move(B,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 54
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7}
( (
4 5 D = {1,2,4,5,6,7,9}
b

B= {1, 2, 3, 4, 6, 7, 8}
Move(B,b) = {5,9}
ε-closure(Move(B,b)) = {5, 6, 7, 1, 2, 4, 9}
= {1,2,4,5,6,7,9} ---- D
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 55
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B
( (
4 5 D = {1,2,4,5,6,7,9}
b

C = {1, 2, 4, 5, 6, 7}
Move(C,a) = {3,8}
ε-closure(Move(C,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 56
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
( (
4 5 D = {1,2,4,5,6,7,9}
b

C = {1, 2, 4, 5, 6, 7}
Move(C,b) = {5}
ε-closure(Move(C,b)) = {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 57
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
( (
4 5 D = {1,2,4,5,6,7,9} B
b

D= {1, 2, 4, 5, 6, 7, 9}
Move(D,a) = {3,8}
ε-closure(Move(D,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 58
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
( (
4 5 D = {1,2,4,5,6,7,9} B E
b
E = {1,2,4,5,6,7,10}

D= {1, 2, 4, 5, 6, 7, 9}
Move(D,b) = {5,10}
ε-closure(Move(D,b)) = {5, 6, 7, 1, 2, 4, 10}
= {1,2,4,5,6,7,10} ---- E
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 59
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
( (
4 5 D = {1,2,4,5,6,7,9} B E
b
E = {1,2,4,5,6,7,10} B
E= {1, 2, 4, 5, 6, 7, 10}
Move(E,a) = {3,8}
ε-closure(Move(E,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 60
Conversion from NFA to DFA

a
2 3 States a b
( (
A = {0,1,2,4,7} B C
( ( a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
( (
4 5 D = {1,2,4,5,6,7,9} B E
b
E = {1,2,4,5,6,7,10} B C
E= {1, 2, 4, 5, 6, 7, 10}
Move(E,b)= {5}
ε-closure(Move(E,b)) = {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 61
Conversion from NFA to DFA

b
States a b B D
a
A = {0,1,2,4,7} B C a
B = {1,2,3,4,6,7,8} B D
A a a b
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E b
C E
E = {1,2,4,5,6,7,10} B C b

Transition Table
b
Note:
• Accepting state in NFA is 10 DFA
• 10 is element of E
• So, E is acceptance state in DFA

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 62


Exercise
Convert following regular expression to DFA using subset construction method:
1. (a+b)*a(a+b)
2. (a+b)*ab*a

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 63


DFA optimization
1. Construct an initial partition Π of the set of states with two groups: the accepting states F
and the non-accepting states S − F.
2. Apply the repartition procedure below to Π to construct a new partition Πnew.
3. If Πnew = Π, let Πfinal = Π and continue with step (4). Otherwise, repeat step (2) with
Π = Πnew.
for each group G of Π do begin
	partition G into subgroups such that two states s and t
	of G are in the same subgroup if and only if for all
	input symbols a, states s and t have transitions on a
	to states in the same group of Π.
	replace G in Πnew by the set of all subgroups formed.
end

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 65


DFA optimization
4. Choose one state in each group of the partition Πfinal as the representative for that group.
The representatives will be the states of the reduced DFA D′. Let s be a representative state, and suppose on
input a there is a transition of D from s to t. Let r be the representative of t′s group. Then D′
has a transition from s to r on a. Let the start state of D′ be the representative of the group
containing the start state s0 of D, and let the accepting states of D′ be the representatives that
are in F.
5. If D′ has a dead state d, then remove d from D′. Also remove any state not reachable from
the start state.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 66


DFA optimization

States a b
A B C
B B D
C B C
D B E
E B C

Initial partition: { A, B, C, D, E }
Nonaccepting states { A, B, C, D }	Accepting states { E }
{ A, B, C, D } splits into { A, B, C } and { D }
{ A, B, C } splits into { A, C } and { B }

Now no more splitting is possible.
If we choose A as the representative for group (A C), then we obtain the reduced transition table:

States a b
A B A
B B D
D B E
E B A
(Optimized Transition Table)

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 67
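The repartition procedure can be sketched as a fixed-point loop; run on the slides' five-state DFA it reproduces the four groups derived above (function names are mine):

```python
# Partition refinement: repeatedly split groups whose members disagree,
# for some input symbol, on the group their transition leads to.
def minimize(states, alphabet, delta, accepting):
    partition = [set(accepting), set(states) - set(accepting)]
    while True:
        def group_of(s):
            return next(i for i, g in enumerate(partition) if s in g)
        new_partition = []
        for g in partition:
            buckets = {}
            for s in g:
                key = tuple(group_of(delta[(s, a)]) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition += buckets.values()
        if len(new_partition) == len(partition):
            return new_partition           # no group split: fixed point
        partition = new_partition

# The slides' DFA (states A..E over {a, b}; E is the accepting state).
delta = {('A', 'a'): 'B', ('A', 'b'): 'C', ('B', 'a'): 'B', ('B', 'b'): 'D',
         ('C', 'a'): 'B', ('C', 'b'): 'C', ('D', 'a'): 'B', ('D', 'b'): 'E',
         ('E', 'a'): 'B', ('E', 'b'): 'C'}
groups = minimize("ABCDE", "ab", delta, {"E"})
```

Each returned group corresponds to one state of the reduced DFA; (A C) is the only merged pair.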


Rules to compute nullable, firstpos, lastpos
nullable(n)
The subtree at node n generates a language including the empty string.

firstpos(n)
The set of positions that can match the first symbol of a string generated by the subtree at node n.

lastpos(n)
The set of positions that can match the last symbol of a string generated by the subtree at node n.

followpos(i)
The set of positions that can follow position i in the tree.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 69


Rules to compute nullable, firstpos, lastpos

Node n	nullable(n)	firstpos(n)	lastpos(n)

A leaf labeled by ε: nullable = true; firstpos = ∅; lastpos = ∅

A leaf with position i: nullable = false; firstpos = {i}; lastpos = {i}

n = c1 | c2 (or): nullable = nullable(c1) or nullable(c2);
firstpos = firstpos(c1) ∪ firstpos(c2); lastpos = lastpos(c1) ∪ lastpos(c2)

n = c1 . c2 (cat): nullable = nullable(c1) and nullable(c2);
firstpos = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1);
lastpos = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)

n = c1*: nullable = true; firstpos = firstpos(c1); lastpos = lastpos(c1)

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 70


Rules to compute followpos
1. If n is a concatenation node with left child c1 and right child c2, and i is a position in lastpos(c1),
then all positions in firstpos(c2) are in followpos(i)

2. If n is a * node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i)

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 71


Conversion from regular expression to DFA
Step 1: Construct Syntax Tree
(a|b) * abb #
Step 2: Nullable node
False .
False . A leaf labeled by True
#
‚ A leaf with position ? false
False .
* False
n
False . • | nullable(c1)
* False or
• c1 c2 nullable(c2)
True ∗ False
€ n ∗
False true
False | c1

n nullable(c1)
* . and
4 } nullable(c2)
False False Here, * is only nullable node c1 c2
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 72
Conversion from regular expression to DFA

Step 3: Calculate firstpos


Firstpos
1,2,3 .
1,2,3 . A leaf with position ? ?
6 #
1,2,3 . ‚
5 * n
|
1,2,3 . • firstpos(c1) ∪ firstpos(c2)
4 *
• c1 c2
1,2 ∗ 3
n ∗
€ firstpos(c1)
c1
1,2 |
n if (nullable(c1))
.
* thenfirstpos(c1) ∪ firstpos(c2)
1 4 2} else firstpos(c1)
c1 c2

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 73


Conversion from regular expression to DFA

Step 3: Calculate lastpos


Lastpos
1,2,3 . 6

1,2,3 . 5 A leaf with position ? ?


6 # 6
1,2,3 . 4

n
5 * 5 |
1,2,3 . • lastpos(c1) ∪ lastpos(c2)
3 4 * 4
c1 c2

1,2 ∗ 1,2 3 3 n ∗
€ lastpos(c1)
c1
1,2 | 1,2
n if (nullable(c2)) then
.
* lastpos(c1) ∪ lastpos(c2)
1 1 2 2 else lastpos(c2)
4 } c1 c2

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 74


Conversion from regular expression to DFA

Step 4: Calculate followpos Position followpos Rule:


5 If n is * node and i is position in
Firstpos 1,2,3 . 6 lastpos(n), then all position in
4 firstpos(n) are in followpos(i)
Lastpos
1,2,3 . 5 3
6 # 6
2 1,2,
1,2,3 . 4

5 * 5 1 1,2,
1,2,3 . 3 4 * 4

• 1,2 * 1,2
1,2 ∗ 1,2 3 3 @

1,2 | 1,2
b ƒ # $ 1,2
ƒ 1,2
* bb tƒ 1 1,2
1 1 2 2
4 } bb tƒ 2 1,2

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 75


Conversion from regular expression to DFA

Step 4: Calculate followpos Rule:


Position followpos
If n is concatenation node
5 with left child c1 and right
Firstpos 1,2,3 . 6 child c2 and i is a position in
4
Lastpos lastpos(c1), then all position
1,2,3 . 5 3 in firstpos(c2) are in
6 # 6
2 1,2, 3 followpos(i)
1,2,3 . 4

5 * 5 1 1,2, 3
1,2,3 . 3 4 * 4
• .

1,2 ∗ 1,2 3 3 1,2 H4 1,2 3 H} 3

1,2 | 1,2
b ƒ # 1$ 1,2
ƒ 2 3
* bb tƒ 1 3
1 1 2 2
4 } bb tƒ 2 3

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 76


Conversion from regular expression to DFA

Step 4: Calculate followpos Position followpos Rule:


5 If n is concatenation node
with left child c1 and right
Firstpos 1,2,3 . 6 4 child c2 and i is a position in
Lastpos lastpos(c1), then all position
. 3 4
1,2,3 5 in firstpos(c2) are in
6 # 6
2 1,2, 3 followpos(i)
1,2,3 . 4

5 * 5 1 1,2, 3
1,2,3 . 3 • .
4 * 4

1,2 ∗ 1,2 3 3 1,2,3 H 3 4 H} 4
4

1,2 | 1,2
b ƒ # 1$ 3
ƒ 2 4
* bb tƒ 3 4
1 1 2 2
4 }

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 77


Conversion from regular expression to DFA

Step 4: Calculate followpos Position followpos Rule:


5 If n is concatenation node
with left child c1 and right
Firstpos 1,2,3 . 6 4 5 child c2 and i is a position in
Lastpos lastpos(c1), then all position
. 3 4
1,2,3 5 in firstpos(c2) are in
6 # 6
2 1,2, 3 followpos(i)
1,2,3 . 4

5 * 5 1 1,2, 3
1,2,3 . 3 4 * 4
• .

1,2 ∗ 1,2 3 3 1,2,3 H 4 5 H} 5
4

1,2 | 1,2
b ƒ # 1$ 4
ƒ 2 5
* bb tƒ 4 5
1 1 2 2
4 }

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 78


Conversion from regular expression to DFA

Step 4: Calculate followpos Position followpos Rule:


5 6 If n is concatenation node
with left child c1 and right
Firstpos 1,2,3 . 6 4 5 child c2 and i is a position in
Lastpos lastpos(c1), then all position
. 3 4
1,2,3 5 in firstpos(c2) are in
6 # 6
2 1,2, 3 followpos(i)
1,2,3 . 4

5 * 5 1 1,2, 3
1,2,3 . 3 4 * 4
• .

1,2 ∗ 1,2 3 3 1,2,3 H 5 6 H} 6
4

1,2 | 1,2
b ƒ # 1$ 5
ƒ 2 6
* bb tƒ 5 6
1 1 2 2
4 }

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 79


Conversion from regular expression to DFA
Initial state = ƒ of root = {1,2,3} ----- A
Position followpos
State A
5 6
δ( (1,2,3),a) = followpos(1) U followpos(3) 4 5
=(1,2,3) U (4) = {1,2,3,4} ----- B 3 4
2 1,2,3

δ( (1,2,3),b) = followpos(2) 1 1,2,3

=(1,2,3) ----- A
States a b
A={1,2,3} B A
B={1,2,3,4}

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 80


Conversion from regular expression to DFA
State B
Position followpos
δ( (1,2,3,4),a) = followpos(1) U followpos(3)
5 6
=(1,2,3) U (4) = {1,2,3,4} ----- B 4 5
3 4
δ( (1,2,3,4),b) = followpos(2) U followpos(4) 2 1,2,3

=(1,2,3) U (5) = {1,2,3,5} ----- C 1 1,2,3

State C
δ( (1,2,3,5),a) = followpos(1) U followpos(3) States a b
A={1,2,3} B A
=(1,2,3) U (4) = {1,2,3,4} ----- B
B={1,2,3,4} B C
C={1,2,3,5} B D
δ( (1,2,3,5),b) = followpos(2) U followpos(5) D={1,2,3,6}

=(1,2,3) U (6) = {1,2,3,6} ----- D


Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 81
Conversion from regular expression to DFA
State D
Position followpos
δ( (1,2,3,6),a) = followpos(1) U followpos(3)
5 6
=(1,2,3) U (4) = {1,2,3,4} ----- B 4 5
3 4
δ( (1,2,3,6),b) = followpos(2) 2 1,2,3

=(1,2,3) ----- A 1 1,2,3

b
a States a b
A={1,2,3} B A
a b b B={1,2,3,4} B C
A B C D
C={1,2,3,5} B D
a
a D={1,2,3,6} B A
b

DFA
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 82
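The whole followpos construction for (a|b)*abb# can be checked in code: compute nullable/firstpos/lastpos bottom-up, fill followpos with the two rules, then build DFA states as sets of positions. A sketch (tree encoding and names are mine):

```python
# Computing nullable / firstpos / lastpos bottom-up and followpos with the
# two rules above, then building the DFA whose states are sets of positions.
def visit(node, followpos, syms):
    """node: ('leaf', pos, sym) | ('|', l, r) | ('.', l, r) | ('*', c)."""
    kind = node[0]
    if kind == 'leaf':
        _, pos, sym = node
        syms[pos] = sym
        followpos.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == '|':
        n1, f1, l1 = visit(node[1], followpos, syms)
        n2, f2, l2 = visit(node[2], followpos, syms)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == '.':
        n1, f1, l1 = visit(node[1], followpos, syms)
        n2, f2, l2 = visit(node[2], followpos, syms)
        for i in l1:                       # rule 1: concatenation node
            followpos[i] |= f2
        return n1 and n2, f1 | f2 if n1 else f1, l1 | l2 if n2 else l2
    n1, f1, l1 = visit(node[1], followpos, syms)   # kind == '*'
    for i in l1:                           # rule 2: star node
        followpos[i] |= f1
    return True, f1, l1

def re_to_dfa(tree, alphabet):
    followpos, syms = {}, {}
    _, root_first, _ = visit(tree, followpos, syms)
    start = frozenset(root_first)          # initial state = firstpos of root
    dstates, unmarked, dtran = {start}, [start], {}
    while unmarked:
        S = unmarked.pop()
        for a in alphabet:
            U = frozenset(q for p in S if syms[p] == a for q in followpos[p])
            if U and U not in dstates:
                dstates.add(U); unmarked.append(U)
            dtran[(S, a)] = U
    return start, dstates, dtran

# Syntax tree for (a|b)*abb#, positions 1..6 as in the slides.
tree = ('.', ('.', ('.', ('.', ('*', ('|', ('leaf', 1, 'a'), ('leaf', 2, 'b'))),
                          ('leaf', 3, 'a')), ('leaf', 4, 'b')), ('leaf', 5, 'b')),
        ('leaf', 6, '#'))
start, dstates, dtran = re_to_dfa(tree, "ab")
```

The resulting states {1,2,3}, {1,2,3,4}, {1,2,3,5}, {1,2,3,6} match A–D in the slides, with D accepting because it contains position 6 (the # marker).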
An Elementary Scanner Design & It’s Implementation
Tasks of Scanner
1. The main purpose of the scanner is to return the next input token to the parser.
2. The scanner must identify the complete token and sometimes differentiate between keywords
and identifiers.
3. The scanner may perform symbol-table maintenance, inserting identifiers, literals, and constants
into the tables.
4. The scanner also eliminates white space.
Regular Expression: Tokens can be specified using regular expressions.
Example: id letter(letter | digit)*

Transition Diagram: Finite-state diagrams or transition diagrams are often used to recognize a
token
digit digit

1
. 2
digit
3
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 84
Implementation of Lexical Analyzer (Lex)
Lex is a tool (a language) for generating a lexical analyzer from a specification given as
regular expressions.
A regular expression is used to represent the pattern for a token.
Creating Lexical Analyzer with LEX
Source
Lex Compiler lex.yy.c
Program
(lex.l)

lex.yy.c C Compiler a.out Lexical Analyzer

Input Sequence
a.out
Stream of token

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 85
Structure of Lex Program
Any lex program contains mainly three sections
1. Declaration
2. Translation rules
3. Auxiliary Procedures

Structure of Program
Declaration: used to declare variables, constants & regular definitions.
Example:
%{
int x,y;
float rate;
%}
Digit [0-9]
Letter [A-Za-z]

Translation rules. Syntax: Pattern {Action}
Example:
%%
pattern1 {Action1}
pattern2 {Action2}
pattern3 {Action3}
%%

Auxiliary Procedures: all the functions needed are specified over here.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 86
Example: Lex Program
Program: Write Lex program to recognize identifier, keywords, relational operator and numbers
/* Declaration */ /* Translation rule */
%{ %%
/* Lex program for recognizing tokens */ {Id} {printf(“%s is an identifier”,yytext);}
%} If {printf(“%s is a keyword”,yytext);}
Letter [a-zA-Z]
Digit [0-9] “<” {printf(“%s is a less then operator”,yytext);}
Id {Letter}({Letter}|{Digit})* “>=” {printf(“%s is a greater then equal to operator”,yytext);}
Numbers {Digit}+(\.{Digit}+)?(E[+-]?{Digit}+)? {Numbers} {printf(“%s is a number”,yytext);}
%%
/* Auxiliary Procedures */
install_id() Input string: If year < 2021
{
/* procedure to install the lexeme into the symbol table and return a pointer */
}
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 87
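The same token rules the lex program above expresses can be sketched in Python with one master regular expression, keywords being filtered out of identifier matches (names and the exact token set are illustrative):

```python
# A Python sketch of the token rules from the lex example:
# one master regex; identifier matches are checked against a keyword set.
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("GE",     r">="),
    ("LT",     r"<"),
    ("SKIP",   r"\s+"),
]
KEYWORDS = {"If", "else"}          # mirroring the slide's patterns
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                      # white space is eliminated
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens
```

On the slide's input string "If year < 2021" this produces a keyword, an identifier, a relational operator, and a number, matching the lex actions.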
References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 88
Compiler Design (CD)
GTU # 3170701

Unit – 3
Syntax Analysis (I)

Computer Engineering Department


Darshan Institute of Engineering & Technology, Rajkot
[email protected]
+91 - 97277 47317 (CE Department)
Topics to be covered
Looping
• Role of parser
• Context free grammar
• Derivation & Ambiguity
• Left recursion & Left factoring
• Classification of parsing
• Backtracking
• LL(1) parsing
• Recursive descent paring
• Shift reduce parsing
• Operator precedence parsing
• LR parsing
• Parser generator
Role of parser
Token Parse
Source Lexical tree Rest of front IR
Parser
program analyzer end

Get next token

Symbol table

The parser obtains a string of tokens from the lexical analyzer and reports a syntax error if any;
otherwise it generates a parse tree.
There are two types of parser:
1. Top-down parser
2. Bottom-up parser

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 4
Context free grammar
A context free grammar (CFG) is a 4-tuple (V, Σ, S, P) where,
V is a finite set of non terminals,
Σ is a disjoint finite set of terminals,
S is an element of V and it is the start symbol,

P is a finite set of productions of the form A → α where A ∈ V and α ∈ (V ∪ Σ)*

Nonterminal symbol:
The name of a syntax category of a language, e.g., noun, verb, etc.
It is written as a single capital letter, or as a name enclosed between < … >, e.g., A or
<Noun>
<Noun Phrase> → <Article><Noun>
<Article> → a | an | the
<Noun> → boy | apple

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 6
Context free grammar
A context free grammar (CFG) is a 4-tuple (V, Σ, S, P) where,
V is a finite set of non terminals,
Σ is a disjoint finite set of terminals,
S is an element of V and it is the start symbol,

P is a finite set of productions of the form A → α where A ∈ V and α ∈ (V ∪ Σ)*

Terminal symbol:
A symbol in the alphabet.
It is denoted by lower case letter and punctuation marks used in language.

<Noun Phrase> → <Article><Noun>


<Article> → a | an | the
<Noun> → boy | apple

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 7
Context free grammar
A context free grammar (CFG) is a 4-tuple (V, Σ, S, P) where,
V is a finite set of non terminals,
Σ is a disjoint finite set of terminals,
S is an element of V and it is the start symbol,

P is a finite set of productions of the form A → α where A ∈ V and α ∈ (V ∪ Σ)*

Start symbol:
First nonterminal symbol of the grammar is called start symbol.

<Noun Phrase> → <Article><Noun>


<Article> → a | an | the
<Noun> → boy | apple

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 8
Context free grammar
A context free grammar (CFG) is a 4-tuple (V, Σ, S, P) where,
V is a finite set of non terminals,
Σ is a disjoint finite set of terminals,
S is an element of V and it is the start symbol,

P is a finite set of productions of the form A → α where A ∈ V and α ∈ (V ∪ Σ)*

Production:
A production, also called a rewriting rule, is a rule of grammar. It has the form of
A nonterminal symbol → String of terminal and nonterminal symbols

<Noun Phrase> → <Article><Noun>


<Article> → a | an | the
<Noun> → boy | apple

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 9
Example: Context Free Grammar
Write non terminals, terminals, start symbol, and productions for following grammar.
E  E O E | (E) | id
O+|-|*|/ |↑

Non terminals: E, O
Terminals: id + - * / ↑ ( )
Start symbol: E
Productions: E  E O E | (E) | id
O+|-|*|/ |↑

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 10
Derivation
A derivation is basically a sequence of production rules, in order to get the input string.
To decide which non-terminal to be replaced with production rule, we can have two options:
1. Leftmost derivation
2. Rightmost derivation

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 12
Leftmost derivation
A derivation of a string in a grammar is a left most derivation if at every step the left most
non terminal is replaced.
Grammar: SS+S | S-S | S*S | S/S | a Output string: a*a-a

S S
Parse tree represents the
S-S structure of derivation
S - S
S*S-S
a*S-S S S
* a
a*a-S
a*a-a a a
Leftmost Derivation Parse tree

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 13
Rightmost derivation
A derivation of a string in a grammar is a right most derivation if at every step the right
most non terminal is replaced.
It is also called canonical derivation.
Grammar: SS+S | S-S | S*S | S/S | a Output string: a*a-a

S
S
S*S
S * S
S*S-S
S*S-a a S S
-
S*a-a
a*a-a a a
Rightmost Derivation Parse Tree
Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 14
Ambiguity
Ambiguity, is a word, phrase, or statement which contains more than one meaning.

A long thin piece of potato

Chip

A small piece of silicon

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 16
Ambiguity
In formal language grammar, ambiguity would arise if identical string can occur on the RHS of
two or more productions.
Grammar:
N1 → α
N2 → α
Should α be replaced by N1 or N2?
α can be derived from either N1 or N2

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 17
Ambiguous grammar
Ambiguous grammar is one that produces more than one leftmost or more than one rightmost
derivation for the same sentence.
Grammar: SS+S | S*S | (S) | a Output string: a+a*a

S S S S
S*S S+S
S * S S + S
S+S*S a+S
a+S*S S S a+S*S a
+ a S * S
a+a*S a+a*S
a+a*a a a a+a*a a a
Here, two leftmost derivations for the string a+a*a are possible; hence, the above grammar is ambiguous.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 18
Parsing
Parsing is a technique that takes an input string and produces either a parse tree, if the string is a
valid sentence of the grammar, or an error message indicating that the string is not valid.
Types of Parsing

Top down parsing: In top down parsing Bottom up parsing: Bottom up parser starts
parser build parse tree from top to bottom. from leaves and work up to the root.
Grammar: String: abbcde
S
SaABe S
AAbc | b
A
Bd a A B e
A B
A b c d
a b b c d e
b
Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 20
Classification of parsing
Parsing

Top down parsing Bottom up parsing (Shift reduce)

Back tracking Operator precedence

Parsing without
backtracking (predictive LR parsing
parsing)
SLR
LL(1)
CLR
Recursive
descent LALR

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 22
Classification of parsing
Parsing

Top down parsing Bottom up parsing (Shift reduce)

Back tracking Operator precedence

Parsing without
backtracking (predictive LR parsing
parsing)
SLR
LL(1)
CLR
Recursive
descent LALR

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 23
Backtracking
In backtracking, for the expansion of a nonterminal symbol we choose one alternative; if any
mismatch occurs, we try another alternative.
Grammar: S cAd Input string: cad
A ab | a

S S S

c A d c A d c A d
Make prediction Make prediction

a b Backtrack a Parsing done

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 25
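Backtracking for S → cAd, A → ab | a can be sketched by letting each nonterminal yield every end position its alternatives can reach; a failed continuation simply falls through to the next alternative (names are mine):

```python
# Backtracking for  S -> c A d,  A -> ab | a:
# alts_A yields every way A can match starting at position i; parse_S
# tries each in turn and backtracks when 'd' does not follow.
def alts_A(s, i):
    if s[i:i+2] == "ab":             # first alternative: A -> ab
        yield i + 2
    if s[i:i+1] == "a":              # second alternative: A -> a
        yield i + 1

def parse_S(s):
    if s[:1] != "c":
        return False
    # try each alternative for A; move to the next yield on mismatch
    return any(s[j:] == "d" for j in alts_A(s, 1))
```

On the slide's input "cad", the first alternative ab fails, the parser backtracks, and the alternative a succeeds.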
Problems in Top-down Parsing
Left recursion
A grammar is said to be left recursive if it has a non terminal A such that there is a derivation
A ⇒+ Aα for some string α.
Grammar: A  Aα | β
A top-down parser that keeps choosing A  Aα expands A forever without consuming any input.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 27
Left recursion elimination

A  Aα | β	⟹	A  βA’

				A’  αA’ | ε

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 28
Examples: Left recursion elimination
EE+T | T
ETE’
E’+TE’ | ε
TT*F | F
TFT’
T’*FT’ | ε
XX%Y | Z
XZX’
X’%YX’ | ε

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 29
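The transformation A → Aα | β ⟹ A → βA', A' → αA' | ε can be sketched generically (grammar encoding and names are mine):

```python
# A -> A a1 | ... | A am | b1 | ... | bn   becomes
# A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | eps.
# Productions are lists of symbol lists; "eps" stands for epsilon.
def eliminate_left_recursion(head, prods):
    alphas = [p[1:] for p in prods if p and p[0] == head]
    betas = [p for p in prods if not p or p[0] != head]
    if not alphas:
        return {head: prods}            # nothing to do
    new = head + "'"
    return {head: [b + [new] for b in betas],
            new: [a + [new] for a in alphas] + [["eps"]]}
```

Applied to E → E+T | T it yields exactly the slides' answer E → TE', E' → +TE' | ε.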
Problems in Top-down Parsing
Left factoring

A 1| 2| 3

Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing.
It is used to remove nondeterminism from the grammar.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 31
Left factoring

A  αβ1 | αβ2	⟹	A  αA’

				A’  β1 | β2

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 32
Example: Left factoring
SaAB | aCD
SaS’
S’AB | CD
A xByA | xByAzA | a

A xByAA’ | a
A’ Є | zA

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 33
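Left factoring can be sketched for the simple case where every alternative shares the common prefix, as in S → aAB | aCD (encoding and names are mine):

```python
# Pull the longest common prefix alpha out of A -> alpha b1 | alpha b2 | ...
# Simplified sketch: assumes every alternative shares the prefix.
def common_prefix(seqs):
    prefix = []
    for column in zip(*seqs):              # compare alternatives position by position
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(head, prods):
    prefix = common_prefix(prods)
    if not prefix:
        return {head: prods}               # nothing to factor
    new = head + "'"
    return {head: [prefix + [new]],
            new: [p[len(prefix):] or ["eps"] for p in prods]}
```

On S → aAB | aCD this reproduces the slides' answer S → aS', S' → AB | CD.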
Rules to compute first of non terminal
1. If X → a and a is a terminal, add a to FIRST(X).
2. If X → ε, add ε to FIRST(X).
3. If X is a nonterminal and X → Y1 Y2 …. Yk is a production, then place a in FIRST(X) if for some
i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), ………, FIRST(Yi-1); that is, Y1 … Yi-1 ⇒* ε.
If ε is in FIRST(Yj) for all j = 1, 2, ….., k then add ε to FIRST(X).
Everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we do nothing
more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 35
Rules to compute first of non terminal
Simplification of Rule 3
If X → Y1 Y2 …… Yk,
• If Y1 does not derive ε then FIRST(X) = FIRST(Y1)
• If Y1 derives ε then
FIRST(X) = FIRST(Y1) − {ε} ∪ FIRST(Y2)
• If Y1 and Y2 derive ε then
FIRST(X) = FIRST(Y1) − {ε} ∪ FIRST(Y2) − {ε} ∪ FIRST(Y3)
• If Y1, Y2 and Y3 derive ε then
FIRST(X) = FIRST(Y1) − {ε} ∪ FIRST(Y2) − {ε} ∪ FIRST(Y3) − {ε} ∪ FIRST(Y4)
• If Y1, Y2, Y3 ….. Yk all derive ε then
FIRST(X) = FIRST(Y1) − {ε} ∪ FIRST(Y2) − {ε} ∪ FIRST(Y3) − {ε} ∪ FIRST(Y4) − {ε}
∪ ………… ∪ FIRST(Yk) (note: if all the Yi derive ε then add ε to FIRST(X))

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 36
Rules to compute FOLLOW of non terminal
1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then
everything in FOLLOW(A) is in FOLLOW(B).

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 37
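The FIRST and FOLLOW rules above are naturally computed as a fixed point: keep applying the rules until no set changes. A sketch on the slides' expression grammar, with "eps" standing for ε (encoding and names are mine):

```python
# Fixed-point computation of FIRST and FOLLOW; "eps" stands for epsilon
# and terminals are any symbols that are not keys of the grammar.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["eps"]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["eps"]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(seq, FIRST):
    """FIRST of a symbol string; FIRST(terminal) = {terminal}."""
    out = set()
    for X in seq:
        fx = FIRST.get(X, {X})
        out |= fx - {"eps"}
        if "eps" not in fx:
            return out
    out.add("eps")                    # every symbol can vanish
    return out

def compute_first_follow(grammar, start):
    FIRST = {A: set() for A in grammar}
    FOLLOW = {A: set() for A in grammar}
    FOLLOW[start].add("$")            # rule 1
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                rhs = [] if alt == ["eps"] else alt
                f = first_of(rhs, FIRST)
                if not f <= FIRST[A]:
                    FIRST[A] |= f; changed = True
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue
                    tail = first_of(rhs[i+1:], FIRST)
                    add = tail - {"eps"}                 # rule 2
                    if "eps" in tail:
                        add |= FOLLOW[A]                 # rule 3
                    if not add <= FOLLOW[B]:
                        FOLLOW[B] |= add; changed = True
    return FIRST, FOLLOW

FIRST, FOLLOW = compute_first_follow(GRAMMAR, "E")
```

Running it reproduces the tables built step by step in Example-1 below.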
Example-1: First & Follow
Compute FIRST ETE’
First(E) E’+TE’ | ϵ
E  T E’ Rule 3 TFT’
ETE’ A  Y1 Y2 First(A)=First(Y1) T’*FT’ | ϵ
F(E) | id
FIRST(E)=FIRST(T) = {(, id }

NT First
First(T) E { (,id }
T  F T’ Rule 3 E’
TFT’
A  Y1 Y2 First(A)=First(Y1)
T { (,id }
FIRST(T)=FIRST(F)= {(, id } T’
First(F) F { (,id }
F(E) Fid
F  ( E ) F  id
A  Rule 1 A  Rule 1
add to "#$ % add to "#$ %
FIRST(F)={ ( , id }
Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 38
Example-1: First & Follow
Compute FIRST ETE’
First(E’) E’+TE’ | ϵ
TFT’
E’+TE’ T’*FT’ | ϵ
F(E) | id
E’  + T E’ Rule 1
add + to FIRST(E’)
NT First
E { (,id }

E’ E’ { +, }
T { (,id }
T’
E’  Rule 2
F { (,id }
A  add to "#$ %

FIRST(E’) = { +, ε }
Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 39
Example-1: First & Follow
Compute FIRST ETE’
First(T’) E’+TE’ | ϵ
TFT’
T’*FT’ T’*FT’ | ϵ
F(E) | id
T’  * F T’ Rule 1
add * to FIRST(T’)
NT First
E { (,id }

T’ E’ { +, }
T { (,id }
T’ { *, }
T’  Rule 2
F { (,id }
A  add to "#$ %

FIRST(T’) = { *, ε }
Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (I) 40
Example-1: First & Follow
Compute FOLLOW ETE’
FOLLOW(E) E’+TE’ | ϵ
TFT’
Rule 1: Place $ in FOLLOW(E) T’*FT’ | ϵ
F(E) | id
F(E)
NT First Follow
E { (,id } { $,) }
E’ { +, }
F  ( E ) Rule 2 T { (,id }
A  Q B R
T’ { *, }
F { (,id }

FOLLOW(E)={ $, ) }

Example-1: First & Follow
ETE’
Compute FOLLOW E’+TE’ | ϵ
FOLLOW(E’) TFT’
T’*FT’ | ϵ
ETE’ F(E) | id
NT First Follow
E  T E’ Rule 3 E { (,id } { $,) }
A  Q B
E’ { +, } { $,) }
T { (,id }
E’+TE’ T’ { *, }
F { (,id }
E’  +T E’ Rule 3
A  Q B

FOLLOW(E’)={ $,) }

Example-1: First & Follow
Compute FOLLOW ETE’
FOLLOW(T) E’+TE’ | ϵ
TFT’
ETE’ T’*FT’ | ϵ
F(E) | id
NT First Follow
E  T E’ Rule 2 E { (,id } { $,) }
A  B R
E’ { +, } { $,) }
T { (,id }
T’ { *, }
F { (,id }
E  T E’ Rule 3
A  B R

FOLLOW(T)={ +, $, ) }
Example-1: First & Follow
Compute FOLLOW ETE’
FOLLOW(T) E’+TE’ | ϵ
TFT’
E’+TE’ T’*FT’ | ϵ
F(E) | id
NT First Follow
E’  + T E’ Rule 2 E { (,id } { $,) }
A  Q B R
E’ { +, } { $,) }
T { (,id } { +,$,) }
T’ { *, }
F { (,id }
E’  + T E’ Rule 3
A  Q B R

FOLLOW(T)={ +, $, ) }
Example-1: First & Follow
Compute FOLLOW ETE’
FOLLOW(T’) E’+TE’ | ϵ
TFT’
TFT’ T’*FT’ | ϵ
F(E) | id
NT First Follow
T  F T’ Rule 3 E { (,id } { $,) }
A  Q B
E’ { +, } { $,) }
T’*FT’ T { (,id } { +,$,) }
T’ { *, } { +,$,) }
F { (,id }
T’  *F T’ Rule 3
A  Q B

FOLLOW(T’)={ +, $, ) }
Example-1: First & Follow
Compute FOLLOW ETE’
FOLLOW(F) E’+TE’ | ϵ
TFT’
TFT’ T’*FT’ | ϵ
F(E) | id
NT First Follow
T  F T’ Rule 2 E { (,id } { $,) }
A  Q B R
E’ { +, } { $,) }
T { (,id } { +,$,) }
T’ { *, } { +,$,) }
F { (,id }
T  F T’ Rule 3
A  Q B R

FOLLOW(F)={ *, +, $, ) }
Example-1: First & Follow
Compute FOLLOW ETE’
FOLLOW(F) E’+TE’ | ϵ
TFT’
T’*FT’ T’*FT’ | ϵ
F(E) | id
NT First Follow
T’  * F T’ Rule 2 E { (,id } { $,) }
A  Q B R
E’ { +, } { $,) }
T { (,id } { +,$,) }
T’ { *, } { +,$,) }
F { (,id } {*,+,$,)}
T’  * F T’ Rule 3
A  Q B R

FOLLOW(F)={ *,+, $, ) }
Example-2: First & Follow
SABCDE
A a |
B b |
C c NT First Follow
D d | S {a,b,c} {$}
E e | A {a, } {b, c}
B {b, } {c}
C {c} {d, e, $}
D {d, } {e, $}
E {e, } {$}

Parsing Methods
Parsing
- Top down parsing
  - Back tracking
  - Parsing without backtracking (predictive parsing)
    - LL(1)
    - Recursive descent
- Bottom up parsing (Shift reduce)
  - Operator precedence
  - LR parsing
    - SLR
    - CLR
    - LALR

LL(1) parser (Predictive parser or Non recursive descent parser)
LL(1) is a non recursive top down parser.
1. The first L indicates the input is scanned from left to right.
2. The second L means it uses a leftmost derivation for the input string.
3. The 1 means it uses only one input symbol of lookahead to predict the parsing process.

Model of LL(1) Parser:
- INPUT buffer: a + b $ (the input string, followed by the end marker $)
- Stack: X Y Z $ (grammar symbols, with $ at the bottom)
- Predictive parsing program: compares the stack top with the current input symbol, consults parsing table M, and produces the OUTPUT


LL(1) parsing (predictive parsing)
Steps to construct LL(1) parser
1. Remove left recursion / Perform left factoring (if any).
2. Compute FIRST and FOLLOW of non terminals.
3. Construct predictive parsing table.
4. Parse the input string using parsing table.

Rules to construct predictive parsing table
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ϵ is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ϵ is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
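The table-construction rules can be sketched directly in Python for the Example-1 grammar SaBa, BbB | ϵ; the per-production FIRST sets are taken from the worked example, and the M[(A, a)] dictionary encoding is an assumption of this sketch:

```python
# Predictive parsing table construction (rules 2 and 3 above) for
# S -> aBa, B -> bB | eps. Undefined entries are simply absent from
# the dictionary, which plays the role of "error" (rule 4).
EPS = 'eps'

grammar = {
    'S': [['a', 'B', 'a']],
    'B': [['b', 'B'], [EPS]],
}
first_seq = {              # FIRST of each production body, from the slides
    ('S', 0): {'a'},
    ('B', 0): {'b'},
    ('B', 1): {EPS},
}
follow = {'S': {'$'}, 'B': {'a'}}

def build_table(grammar, first_seq, follow):
    table = {}
    for head, bodies in grammar.items():
        for i, body in enumerate(bodies):
            f = first_seq[(head, i)]
            for a in f - {EPS}:            # rule 2: terminals of FIRST(body)
                table[(head, a)] = body
            if EPS in f:                   # rule 3: use FOLLOW(head)
                for b in follow[head]:
                    table[(head, b)] = body
    return table
```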

Example-1: LL(1) parsing
SaBa
BbB | ϵ
NT First
Step 1: Not required S {a}
Step 2: Compute FIRST B {b, }

First(S) S  a B a Rule 1
SaBa A  add to "#$ %
FIRST(S)={ a }

First(B)
BbB B

B  b B B 
Rule 1
A  A 
add to "#$ % Rule 2
add to "#$ %
FIRST(B)={ b , }
Example-1: LL(1) parsing
SaBa
BbB | ϵ NT First Follow
S {a} {$}
Step 2: Compute FOLLOW B {b, } {a}
Follow(S)
Rule 1: Place $ in FOLLOW(S)
Follow(S)={ $ }

Follow(B)
SaBa BbB

S  a B a Rule 2 B  b B Rule 3
A  Q B R First(β 8 A  Q B Follow(A)=follow(B)

Follow(B)={ a }
Example-1: LL(1) parsing
SaBa
BbB | ϵ NT First Follow
S {a} {$}
Step 3: Prepare predictive parsing table B {b, } {a}

NT Input Symbol
a b $
S SaBa
B

SaBa
Rule: 2
a=FIRST(aBa)={ a } A
a = first( )
M[S,a]=SaBa M[A,a] = A

Example-1: LL(1) parsing
SaBa
BbB | ϵ NT First Follow
S {a} {$}
Step 3: Prepare predictive parsing table B {b, } {a}

NT Input Symbol
a b $
S SaBa
B BbB

BbB
Rule: 2
a=FIRST(bB)={ b } A
a = first( )
M[B,b]=BbB M[A,a] = A

Example-1: LL(1) parsing
SaBa
BbB | ϵ NT First Follow
S {a} {$}
Step 3: Prepare predictive parsing table B {b, } {a}

NT Input Symbol
a b $
S SaBa Error Error
B Bϵ BbB Error

Bϵ
Rule: 3
b=FOLLOW(B)={ a } A
b = follow(A)
M[B,a]=B M[A,b] = A

Example-2: LL(1) parsing
SaB | ϵ
BbC | ϵ
CcS | ϵ
Step 1: Not required
NT First
Step 2: Compute FIRST S { a, }
First(S) B {b, }
SaB S C {c, }

S  a B S 
Rule 1 Rule 2
A  add to "#$ % A  add to "#$ %

FIRST(S)={ a , }

Example-2: LL(1) parsing
SaB | ϵ
BbC | ϵ
CcS | ϵ
Step 1: Not required
NT First
Step 2: Compute FIRST S { a, }
First(B) B {b, }
BbC B C {c, }

B  b C B 
Rule 1 Rule 2
A  add to "#$ % A  add to "#$ %

FIRST(B)={ b , }

Example-2: LL(1) parsing
SaB | ϵ
BbC | ϵ
CcS | ϵ
Step 1: Not required
NT First
Step 2: Compute FIRST S { a, }
First(C) B {b, }
CcS C C {c, }

C  c S C 
Rule 1 Rule 2
A  add to "#$ % A  add to "#$ %

FIRST(B)={ c , }

Example-2: LL(1) parsing
Step 2: Compute FOLLOW
Follow(S) Rule 1: Place $ in FOLLOW(S)
Follow(S)={ $ }
CcS SaB | ϵ
BbC | ϵ
C  c S Rule 3 CcS | ϵ
A  Q B Follow(A)=follow(B)
Follow(S)=Follow(C) ={$}
NT First Follow
S {a, } {$}
BbC SaB B {b, } {$}
C {c, } {$}
B  b C Rule 3 S  a B Rule 3
A  Q B Follow(A)=follow(B) A  Q B Follow(A)=follow(B)
Follow(C)=Follow(B) ={$} Follow(B)=Follow(S) ={$}

Example-2: LL(1) parsing
SaB | ϵ
NT First Follow
BbC | ϵ
S {a, } {$}
CcS | ϵ
B {b, } {$}
Step 3: Prepare predictive parsing table C {c, } {$}

N Input Symbol
T a b c $
S SaB
B
C

SaB Rule: 2
A
a=FIRST(aB)={ a } a = first( )
M[S,a]=SaB M[A,a] = A

Example-2: LL(1) parsing
SaB | ϵ
NT First Follow
BbC | ϵ
S {a} {$}
CcS | ϵ
B {b, } {$}
Step 3: Prepare predictive parsing table C {c, } {$}

N Input Symbol
T a b c $
S SaB S
B
C

S Rule: 3
A
b=FOLLOW(S)={ $ } b = follow(A)
M[S,$]=S M[A,b] = A

Example-2: LL(1) parsing
SaB | ϵ
NT First Follow
BbC | ϵ
S {a} {$}
CcS | ϵ
B {b, } {$}
Step 3: Prepare predictive parsing table C {c, } {$}

N Input Symbol
T a b c $
S SaB S
B BbC
C

BbC Rule: 2
A
a=FIRST(bC)={ b } a = first( )
M[B,b]=BbC M[A,a] = A

Example-2: LL(1) parsing
SaB | ϵ
NT First Follow
BbC | ϵ
S {a} {$}
CcS | ϵ
B {b, } {$}
Step 3: Prepare predictive parsing table C {c, } {$}

N Input Symbol
T a b c $
S SaB S
B BbC B
C

B Rule: 3
A
b=FOLLOW(B)={ $ } b = follow(A)
M[B,$]=B M[A,b] = A

Example-2: LL(1) parsing
SaB | ϵ
NT First Follow
BbC | ϵ
S {a} {$}
CcS | ϵ
B {b, } {$}
Step 3: Prepare predictive parsing table C {c, } {$}

N Input Symbol
T a b c $
S SaB S
B BbC B
C CcS

CcS Rule: 2
A
a=FIRST(cS)={ c } a = first( )
M[C,c]=CcS M[A,a] = A

Example-2: LL(1) parsing
SaB | ϵ
NT First Follow
BbC | ϵ
S {a} {$}
CcS | ϵ
B {b, } {$}
Step 3: Prepare predictive parsing table C {c, } {$}

N Input Symbol
T a b c $
S SaB Error Error S
B Error BbB Error B
C Error Error CcS C

C Rule: 3
A
b=FOLLOW(C)={ $ } b = follow(A)
M[C,$]=C M[A,b] = A

Example-3: LL(1) parsing
EE+T | T
TT*F | F
F(E) | id
Step 1: Remove left recursion
ETE’
E’+TE’ | ϵ
TFT’
T’*FT’ | ϵ
F(E) | id

Example-3: LL(1) parsing
Step 2: Compute FIRST ETE’
First(E) E’+TE’ | ϵ
E  T E’ Rule 3 TFT’
ETE’ A  Y1 Y2 First(A)=First(Y1) T’*FT’ | ϵ
F(E) | id
FIRST(E)=FIRST(T) = {(, id }

NT First
First(T) E { (,id }
T  F T’ Rule 3 E’
TFT’
A  Y1 Y2 First(A)=First(Y1)
T { (,id }
FIRST(T)=FIRST(F)= {(, id } T’
First(F) F { (,id }
F(E) Fid
F  ( E ) F  id
A  Rule 1 A  Rule 1
add to "#$ % add to "#$ %
FIRST(F)={ ( , id }
Example-3: LL(1) parsing
Step 2: Compute FIRST ETE’
First(E’) E’+TE’ | ϵ
TFT’
E’+TE’ T’*FT’ | ϵ
F(E) | id
E’  + T E’ Rule 1
add to "#$ % NT First
A 
E { (,id }

E’ E’ { +, }
T { (,id }
T’
E’  Rule 2
F { (,id }
A  add to "#$ %

FIRST(E’)={ + , }
Example-3: LL(1) parsing
Step 2: Compute FIRST ETE’
First(T’) E’+TE’ | ϵ
TFT’
T’*FT’ T’*FT’ | ϵ
F(E) | id
T’  * F T’ Rule 1
add to "#$ % NT First
A 
E { (,id }

T’ E’ { +, }
T { (,id }
T’ { *, }
T’  Rule 2
F { (,id }
A  add to "#$ %

FIRST(T’)={ * , }
Example-3: LL(1) parsing
Step 2: Compute FOLLOW ETE’
FOLLOW(E) E’+TE’ | ϵ
TFT’
Rule 1: Place $ in FOLLOW(E) T’*FT’ | ϵ
F(E) | id
F(E)
NT First Follow
E { (,id } { $,) }
E’ { +, }
F  ( E ) Rule 2 T { (,id }
A  Q B R
T’ { *, }
F { (,id }

FOLLOW(E)={ $, ) }

Example-3: LL(1) parsing
ETE’
Step 2: Compute FOLLOW E’+TE’ | ϵ
FOLLOW(E’) TFT’
T’*FT’ | ϵ
ETE’ F(E) | id
NT First Follow
E  T E’ Rule 3 E { (,id } { $,) }
A  Q B
E’ { +, } { $,) }
T { (,id }
E’+TE’ T’ { *, }
F { (,id }
E’  +T E’ Rule 3
A  Q B

FOLLOW(E’)={ $,) }

Example-3: LL(1) parsing
Step 2: Compute FOLLOW ETE’
FOLLOW(T) E’+TE’ | ϵ
TFT’
ETE’ T’*FT’ | ϵ
F(E) | id
NT First Follow
E  T E’ Rule 2 E { (,id } { $,) }
A  B R
E’ { +, } { $,) }
T { (,id }
T’ { *, }
F { (,id }
E  T E’ Rule 3
A  B R

FOLLOW(T)={ +, $, ) }
Example-3: LL(1) parsing
Step 2: Compute FOLLOW ETE’
FOLLOW(T) E’+TE’ | ϵ
TFT’
E’+TE’ T’*FT’ | ϵ
F(E) | id
NT First Follow
E’  + T E’ Rule 2 E { (,id } { $,) }
A  Q B R
E’ { +, } { $,) }
T { (,id } { +,$,) }
T’ { *, }
F { (,id }
E’  + T E’ Rule 3
A  Q B R

FOLLOW(T)={ +, $, ) }
Example-3: LL(1) parsing
Step 2: Compute FOLLOW ETE’
FOLLOW(T’) E’+TE’ | ϵ
TFT’
TFT’ T’*FT’ | ϵ
F(E) | id
NT First Follow
T  F T’ Rule 3 E { (,id } { $,) }
A  Q B
E’ { +, } { $,) }
T’*FT’ T { (,id } { +,$,) }
T’ { *, } { +,$,) }
F { (,id }
T’  *F T’ Rule 3
A  Q B

FOLLOW(T’)={ +, $, ) }
Example-3: LL(1) parsing
Step 2: Compute FOLLOW ETE’
FOLLOW(F) E’+TE’ | ϵ
TFT’
TFT’ T’*FT’ | ϵ
F(E) | id
NT First Follow
T  F T’ Rule 2 E { (,id } { $,) }
A  Q B R
E’ { +, } { $,) }
T { (,id } { +,$,) }
T’ { *, } { +,$,) }
F { (,id }
T  F T’ Rule 3
A  Q B R

FOLLOW(F)={ *, +, $, ) }
Example-3: LL(1) parsing
Step 2: Compute FOLLOW ETE’
FOLLOW(F) E’+TE’ | ϵ
TFT’
T’*FT’ T’*FT’ | ϵ
F(E) | id
NT First Follow
T’  * F T’ Rule 2 E { (,id } { $,) }
A  Q B R
E’ { +, } { $,) }
T { (,id } { +,$,) }
T’ { *, } { +,$,) }
F { (,id } {*,+,$,)}
T’  * F T’ Rule 3
A  Q B R

FOLLOW(F)={ *,+, $, ) }
Example-3: LL(1) parsing
Step 3: Construct predictive parsing table ETE’
E’+TE’ | ϵ
TFT’
NT Input Symbol
T’*FT’ | ϵ
id + * ( ) $ F(E) | id
E ETE’ ETE’
E’ NT First Follow
T E { (,id } { $,) }
T’ E’ { +, } { $,) }
F T { (,id } { +,$,) }
T’ { *, } { +,$,) }
ETE’ F { (,id } {*,+,$,)}
Rule: 2
a=FIRST(TE’)={ (,id } A
a = first( )
M[E,(]=ETE’ M[A,a] = A
M[E,id]=ETE’
Example-3: LL(1) parsing
Step 3: Construct predictive parsing table ETE’
E’+TE’ | ϵ
TFT’
NT Input Symbol
T’*FT’ | ϵ
id + * ( ) $ F(E) | id
E ETE’ ETE’
E’ E’+TE’ NT First Follow
T E { (,id } { $,) }
T’ E’ { +, } { $,) }
F T { (,id } { +,$,) }
T’ { *, } { +,$,) }
E’+TE’ F { (,id } {*,+,$,)}
Rule: 2
a=FIRST(+TE’)={ + } A
a = first( )
M[E’,+]=E’+TE’ M[A,a] = A

Example-3: LL(1) parsing
Step 3: Construct predictive parsing table ETE’
E’+TE’ | ϵ
TFT’
NT Input Symbol
T’*FT’ | ϵ
id + * ( ) $ F(E) | id
E ETE’ ETE’
E’ E’+TE’ E’ E’ NT First Follow
T E { (,id } { $,) }
T’ E’ { +, } { $,) }
F T { (,id } { +,$,) }
T’ { *, } { +,$,) }
E’ F { (,id } {*,+,$,)}
Rule: 3
b=FOLLOW(E’)={ $,) } A
b = follow(A)
M[E’,$]=E’ M[A,b] = A
M[E’,)]=E’
Example-3: LL(1) parsing
Step 3: Construct predictive parsing table ETE’
E’+TE’ | ϵ
TFT’
NT Input Symbol
T’*FT’ | ϵ
id + * ( ) $ F(E) | id
E ETE’ ETE’
E’ E’+TE’ E’ E’ NT First Follow
T TFT’ TFT’ E { (,id } { $,) }
T’ E’ { +, } { $,) }
F T { (,id } { +,$,) }
T’ { *, } { +,$,) }
TFT’ F { (,id } {*,+,$,)}
Rule: 2
a=FIRST(FT’)={ (,id } A
a = first( )
M[T,(]=TFT’ M[A,a] = A
M[T,id]=TFT’
Example-3: LL(1) parsing
Step 3: Construct predictive parsing table ETE’
E’+TE’ | ϵ
TFT’
NT Input Symbol
T’*FT’ | ϵ
id + * ( ) $ F(E) | id
E ETE’ ETE’
E’ E’+TE’ E’ E’ NT First Follow
T TFT’ TFT’ E { (,id } { $,) }
T’ T’*FT’ E’ { +, } { $,) }
F T { (,id } { +,$,) }
T’ { *, } { +,$,) }
T’*FT’ F { (,id } {*,+,$,)}
Rule: 2
a=FIRST(*FT’)={ * } A
a = first( )
M[T’,*]=T’*FT’ M[A,a] = A

Example-3: LL(1) parsing
Step 3: Construct predictive parsing table ETE’
E’+TE’ | ϵ
NT Input Symbol TFT’
id + * ( ) $ T’*FT’ | ϵ
E ETE’ ETE’ F(E) | id

E’ E’+TE’ E’ E’


NT First Follow
T TFT’ TFT’
E { (,id } { $,) }
T’ T’ T’*FT’ T’ T’
E’ { +, } { $,) }
F
T { (,id } { +,$,) }
T’ T’ { *, } { +,$,) }
b=FOLLOW(T’)={ +,$,) } F { (,id } {*,+,$,)}
Rule: 3
M[T’,+]=T’ A
b = follow(A)
M[T’,$]=T’ M[A,b] = A
M[T’,)]=T’
Example-3: LL(1) parsing
Step 3: Construct predictive parsing table ETE’
E’+TE’ | ϵ
TFT’
NT Input Symbol
T’*FT’ | ϵ
id + * ( ) $ F(E) | id
E ETE’ ETE’
E’ E’+TE’ E’ E’ NT First Follow
T TFT’ TFT’ E { (,id } { $,) }
T’ T’ T’*FT’ T’ T’ E’ { +, } { $,) }
F F(E) T { (,id } { +,$,) }
T’ { *, } { +,$,) }
F { (,id } {*,+,$,)}
Rule: 2
F(E) A
a = first( )
a=FIRST((E))={ ( } M[A,a] = A
M[F,(]=F(E)
Example-3: LL(1) parsing
Step 3: Construct predictive parsing table ETE’
E’+TE’ | ϵ
TFT’
NT Input Symbol
T’*FT’ | ϵ
id + * ( ) $ F(E) | id
E ETE’ ETE’
E’ E’+TE’ E’ E’ NT First Follow
T TFT’ TFT’ E { (,id } { $,) }
T’ T’ T’*FT’ T’ T’ E’ { +, } { $,) }
F Fid F(E) T { (,id } { +,$,) }
T’ { *, } { +,$,) }
F { (,id } {*,+,$,)}
Rule: 2
Fid A
a = first( )
a=FIRST(id)={ id } M[A,a] = A
M[F,id]=Fid
Example-3: LL(1) parsing
Step 4: Make each undefined entry of table be Error
NT Input Symbol
id + * ( ) $
E ETE’ Error Error ETE’ Error Error
E’ Error E’+TE’ Error Error E’ E’
T TFT’ Error Error TFT’ Error Error
T’ Error T’ T’*FT’ Error T’ T’
F Fid Error Error F(E) Error Error

Example-3: LL(1) parsing
Step 4: Parse the string: id + id * id $ (using parsing table M above)

STACK     INPUT       OUTPUT
E$        id+id*id$
TE’$      id+id*id$   ETE’
FT’E’$    id+id*id$   TFT’
idT’E’$   id+id*id$   Fid
T’E’$     +id*id$
E’$       +id*id$     T’ϵ
+TE’$     +id*id$     E’+TE’
TE’$      id*id$
FT’E’$    id*id$      TFT’
idT’E’$   id*id$      Fid
T’E’$     *id$
*FT’E’$   *id$        T’*FT’
FT’E’$    id$
idT’E’$   id$         Fid
T’E’$     $
E’$       $           T’ϵ
$         $           E’ϵ
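The parsing loop traced above can be sketched as a small table-driven driver in Python; the table M is transcribed from the slide, and treating 'id' as a single token is an assumption of this sketch:

```python
# Table-driven LL(1) driver for the expression grammar
# E -> TE', E' -> +TE' | eps, T -> FT', T' -> *FT' | eps, F -> (E) | id.
EPS = 'eps'

M = {
    ('E', 'id'): ['T', "E'"], ('E', '('): ['T', "E'"],
    ("E'", '+'): ['+', 'T', "E'"], ("E'", ')'): [EPS], ("E'", '$'): [EPS],
    ('T', 'id'): ['F', "T'"], ('T', '('): ['F', "T'"],
    ("T'", '*'): ['*', 'F', "T'"], ("T'", '+'): [EPS],
    ("T'", ')'): [EPS], ("T'", '$'): [EPS],
    ('F', 'id'): ['id'], ('F', '('): ['(', 'E', ')'],
}
nonterminals = {'E', "E'", 'T', "T'", 'F'}

def ll1_parse(tokens, start='E'):
    """Return the list of productions applied, or raise on a syntax error."""
    stack = ['$', start]
    tokens = tokens + ['$']
    i, output = 0, []
    while stack:
        top = stack.pop()
        a = tokens[i]
        if top == a:                      # match a terminal (or the $ marker)
            i += 1
        elif top in nonterminals:
            body = M.get((top, a))
            if body is None:              # undefined entry = error action
                raise SyntaxError(f'no entry M[{top}, {a}]')
            output.append((top, body))
            for sym in reversed(body):    # push RHS, leftmost symbol on top
                if sym != EPS:
                    stack.append(sym)
        else:
            raise SyntaxError(f'expected {top}, got {a}')
    return output
```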
Parsing methods
Parsing
- Top down parsing
  - Back tracking
  - Parsing without backtracking (predictive parsing)
    - LL(1)
    - Recursive descent
- Bottom up parsing (Shift reduce)
  - Operator precedence
  - LR parsing
    - SLR
    - CLR
    - LALR

Recursive descent parsing
A top down parser that executes a set of recursive procedures to process the input without backtracking is called a recursive descent parser.
There is a procedure for each non terminal in the grammar.
Consider RHS of any production rule as definition of the procedure.
As it reads expected input symbol, it advances input pointer to next position.
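A Python sketch of such a parser for the grammar E  num T, T  * num T | ϵ used in the following example; keeping the lookahead in a parser object instead of a global variable is a choice of this sketch:

```python
# Recursive descent parser for E -> num T, T -> * num T | eps.
# One procedure (method) per non terminal; Match advances the input pointer.
class RDParser:
    def __init__(self, tokens):
        self.tokens = tokens + ['$']
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead == t:
            self.pos += 1                 # advance on the expected symbol
        else:
            raise SyntaxError(f'expected {t}, got {self.lookahead}')

    def E(self):                          # E -> num T
        self.match('num')
        self.T()
        self.match('$')                   # success only at end of input

    def T(self):                          # T -> * num T | eps
        if self.lookahead == '*':
            self.match('*')
            self.match('num')
            self.T()
        # otherwise T -> eps: consume nothing

def accepts(tokens):
    try:
        RDParser(tokens).E()
        return True
    except SyntaxError:
        return False
```

As on the slides, "3 * 4 $" (num * num $) is accepted and "3 4 * $" is rejected.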

Example: Recursive descent parsing
Procedure E
{
  If lookahead=num
  {
    Match(num);
    T();
  }
  Else
    Error();
  If lookahead=$
  {
    Declare success;
  }
  Else
    Error();
}

Procedure T
{
  If lookahead=’*’
  {
    Match(‘*’);
    If lookahead=num
    {
      Match(num);
      T();
    }
    Else
      Error();
  }
  Else
    NULL
}

Procedure Match(token t)
{
  If lookahead=t
    lookahead=next_token;
  Else
    Error();
}

Procedure Error
{
  Print(“Error”);
}

E  num T
T  * num T | ϵ
3 * 4 $  Success

Example: Recursive descent parsing
The same procedures E, T, Match and Error as above, traced on two inputs:
E  num T
T  * num T | ϵ
3 * 4 $  Success
3 4 * $  Error
Handle & Handle pruning
Handle: A “handle” of a string is a substring of the string that matches the right side of a
production, and whose reduction to the non terminal of the production is one step along the
reverse of rightmost derivation.
Handle pruning: The process of discovering a handle and reducing it to appropriate left hand
side non terminal is known as handle pruning.
EE+E
EE*E String: id1+id2*id3
Eid
Rightmost derivation: E ⇒ E+E ⇒ E+E*E ⇒ E+E*id3 ⇒ E+id2*id3 ⇒ id1+id2*id3

Right sentential form   Handle   Production
id1+id2*id3             id1      Eid
E+id2*id3               id2      Eid
E+E*id3                 id3      Eid
E+E*E                   E*E      EE*E
E+E                     E+E      EE+E
E
Shift reduce parser
The shift reduce parser performs following basic operations:
1. Shift: Moving of the symbols from input buffer onto the stack, this action is called shift.
2. Reduce: If handle appears on the top of the stack then reduction of it by appropriate rule is
done. This action is called reduce action.
3. Accept: If stack contains start symbol only and input buffer is empty at the same time then
that action is called accept.
4. Error: A situation in which parser cannot either shift or reduce the symbols, it cannot even
perform accept action then it is called error action.
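The four actions can be sketched in Python for the grammar EE+T | T, TT*F | F, Fid of the next example; the hand-coded one-token-lookahead reduce decisions stand in for a real parsing table and are an assumption of this sketch:

```python
# Shift-reduce parser for E -> E+T | T, T -> T*F | F, F -> id.
def reduce_action(stack, lookahead):
    """Return (head, rhs_length) for the handle on top of the stack, or None."""
    if stack[-1:] == ['id']:
        return ('F', 1)                          # F -> id
    if stack[-3:] == ['T', '*', 'F']:
        return ('T', 3)                          # T -> T*F
    if stack[-1:] == ['F']:
        return ('T', 1)                          # T -> F
    if stack[-3:] == ['E', '+', 'T'] and lookahead != '*':
        return ('E', 3)                          # E -> E+T (but shift on *)
    if stack[-1:] == ['T'] and lookahead != '*':
        return ('E', 1)                          # E -> T   (but shift on *)
    return None

def shift_reduce_parse(tokens):
    stack, tokens = ['$'], tokens + ['$']
    i, trace = 0, []
    while True:
        r = reduce_action(stack, tokens[i])
        if r:                                    # reduce the handle
            head, n = r
            stack[-n:] = [head]
            trace.append('reduce ' + head)
        elif stack == ['$', 'E'] and tokens[i] == '$':
            trace.append('accept')               # accept action
            return trace
        elif tokens[i] != '$':
            stack.append(tokens[i])              # shift action
            i += 1
            trace.append('shift')
        else:
            raise SyntaxError('error action')    # neither shift nor reduce
```

On id+id*id this performs 5 shifts and 8 reductions, matching the trace on the next slide.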

Example: Shift reduce parser
Grammar: Stack Input Buffer Action
EE+T | T $ id+id*id$ Shift
TT*F | F $id +id*id$ Reduce Fid
Fid $F +id*id$ Reduce TF
String: id+id*id $T +id*id$ Reduce ET
$E +id*id$ Shift
$E+ id*id$ Shift
$E+id *id$ Reduce Fid
$E+F *id$ Reduce TF
$E+T *id$ Shift
$E+T* id$ Shift
$E+T*id $ Reduce Fid
$E+T*F $ Reduce TT*F
$E+T $ Reduce EE+T
$E $ Accept
Viable Prefix
The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce
parser are called viable prefixes.

Parsing Methods

Parsing
- Top down parsing
  - Back tracking
  - Parsing without backtracking (predictive parsing)
    - LL(1)
    - Recursive descent
- Bottom up parsing (Shift reduce)
  - Operator precedence
  - LR parsing
    - SLR
    - CLR
    - LALR

Operator precedence parsing
Operator Grammar: A grammar in which no production has Є on the RHS and no two non terminals are adjacent in any production is called an operator grammar.
Example: E  EAE | (E) | id
A  + | * | -
Above grammar is not operator grammar because right side EAE has consecutive non
terminals.
In operator precedence parsing we define following disjoint relations:

Relation Meaning
a<.b a “yields precedence to” b
a=b a “has the same precedence as” b
a.>b a “takes precedence over” b

Precedence & associativity of operators

Operator Precedence Associative


↑ 1 right
*, / 2 left
+, - 3 left

Steps of operator precedence parsing
1. Find Leading and trailing of non terminal
2. Establish relation
3. Creation of table
4. Parse the string

Leading & Trailing
Leading:- Leading of a non terminal is the first terminal or operator in production of that non
terminal.
Trailing:- Trailing of a non terminal is the last terminal or operator in production of that non
terminal.
Example: EE+T | T
TT*F | F
Fid

Non terminal Leading Trailing


E {+,*,id} {+,*,id}
T {*,id} {*,id}
F {id} {id}
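Leading and Trailing can be computed by iterating to a fixed point over the productions. A Python sketch for the grammar above; the grammar encoding is a choice of this sketch:

```python
# Fixed-point computation of LEADING/TRAILING for E -> E+T | T,
# T -> T*F | F, F -> id (an operator grammar, so no two adjacent
# non terminals appear in any production body).
grammar = {
    'E': [['E', '+', 'T'], ['T']],
    'T': [['T', '*', 'F'], ['F']],
    'F': [['id']],
}

def leading_trailing(grammar):
    nts = set(grammar)
    lead = {a: set() for a in nts}
    trail = {a: set() for a in nts}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new_l, new_t = set(lead[head]), set(trail[head])
                t = next((s for s in body if s not in nts), None)
                if t is not None:
                    new_l.add(t)               # first terminal of the body
                if body[0] in nts:
                    new_l |= lead[body[0]]     # LEADING of a leading NT
                t = next((s for s in reversed(body) if s not in nts), None)
                if t is not None:
                    new_t.add(t)               # last terminal of the body
                if body[-1] in nts:
                    new_t |= trail[body[-1]]   # TRAILING of a trailing NT
                if new_l != lead[head] or new_t != trail[head]:
                    lead[head], trail[head] = new_l, new_t
                    changed = True
    return lead, trail
```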

Rules to establish a relation
1. a = b if there is a production A → αaβbγ, where β is ϵ or a single non terminal [e.g.: ( E )]
2. a <. b if there is a production A → αaBβ and b is in Leading(B) [e.g.: +T]
3. a .> b if there is a production A → αBbβ and a is in Trailing(B) [e.g.: E+]
4. $ <. Leading(start symbol)
5. Trailing(start symbol) .> $

Example: Operator precedence parsing
Step 1: Find Leading & Trailing of NT
E E +T| T
Nonterminal Leading Trailing T T *F| F
E {+,*,id} {+,*,id} F id
T {*,id} {*,id}
F {id} {id}

Step 2: Establish Relation (a <. b)
a <. b: for A → αaBβ, a <. Leading(B)
From E  E + T: + <. Leading(T) = { *, id }
From T  T * F: * <. Leading(F) = { id }

Step 3: Creation of Table
     +    *    id   $
+    .>   <.   <.   .>
*    .>   .>   <.   .>
id   .>   .>        .>
$    <.   <.   <.
Example: Operator precedence parsing
Step 1: Find Leading & Trailing of NT
E E+ T| T
Nonterminal Leading Trailing T T* F| F
E {+,*,id} {+,*,id} F id
T {*,id} {*,id}
F {id} {id}

Step 2: Establish Relation (a .> b)
a .> b: for A → αBbβ, Trailing(B) .> b
From E  E + T: Trailing(E) = { +, *, id } .> +
From T  T * F: Trailing(T) = { *, id } .> *

Step 3: Creation of Table
     +    *    id   $
+    .>   <.   <.   .>
*    .>   .>   <.   .>
id   .>   .>        .>
$    <.   <.   <.
Example: Operator precedence parsing
Step 1: Find Leading & Trailing of NT
E E+ T| T
Nonterminal Leading Trailing T T* F| F
E {+,*,id} {+,*,id} F id
T {*,id} {*,id}
F {id} {id}

Step 2: Establish Relation ($ relations)
$ <. Leading(start symbol): $ <. { +, *, id }
Trailing(start symbol) .> $: { +, *, id } .> $

Step 3: Creation of Table
     +    *    id   $
+    .>   <.   <.   .>
*    .>   .>   <.   .>
id   .>   .>        .>
$    <.   <.   <.

Example: Operator precedence parsing
Step 4: Parse the string using precedence table
Assign precedence relations between consecutive terminals. String: id+id*id
$ id+id*id $
$ <. id+id*id$
$ <. id .> +id*id$
$ <. id .> + <. id*id$
$ <. id .> + <. id .> *id$
$ <. id .> + <. id .> * <. id$
$ <. id .> + <. id .> * <. id .> $

Example: Operator precedence parsing
Step 4: Parse the string using precedence table (EE+T | T, TT*F | F, Fid)
1. Scan the input string until the first .> is encountered.
2. Scan backward until <. is encountered.
3. The handle is the string between <. and .>
$ <. id .> + <. id .> * <. id .> $   Handle id is obtained between <. and .>; reduce it by Fid
$ F + <. id .> * <. id .> $          Handle id is obtained between <. and .>; reduce it by Fid
$ F + F * <. id .> $                 Handle id is obtained between <. and .>; reduce it by Fid
$ F + F * F $                        Perform appropriate reductions of all non terminals.
$ E + T * F $                        Remove all non terminals.
$ + * $                              Place relations between the operators
$ <. + <. * .> $                     The * operator is surrounded by <. and .>; * becomes the handle, so reduce by TT*F.
$ <. + .> $                          + becomes the handle; hence reduce by EE+T.
$ $                                  Parsing done

Operator precedence function
Algorithm for constructing precedence functions
1. Create functions fa and ga for each a that is a terminal or $.
2. Partition the symbols into as many groups as possible, in such a way that fa and gb are in the same group if a = b.
3. Create a directed graph whose nodes are the groups; next, for each pair of symbols a and b do:
a) if a <. b, place an edge from the group of gb to the group of fa
b) if a .> b, place an edge from the group of fa to the group of gb
4. If the constructed graph has a cycle then no precedence functions exist. When there are no cycles, take fa and gb to be the lengths of the longest paths from the groups of fa and gb respectively.
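The four steps can be sketched in Python for the expression-grammar table that follows; encoding <. as '<' and .> as '>' is a choice of this sketch, and it assumes the graph is acyclic (the cycle check of step 4 is omitted):

```python
# Precedence functions from the <. / .> relations of the expression grammar.
# The table has no = relations, so every fa / ga node is its own group.
from functools import lru_cache

rel = {                      # rel[(a, b)]: '<' means a <. b, '>' means a .> b
    ('+', '+'): '>', ('+', '*'): '<', ('+', 'id'): '<', ('+', '$'): '>',
    ('*', '+'): '>', ('*', '*'): '>', ('*', 'id'): '<', ('*', '$'): '>',
    ('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
    ('$', '+'): '<', ('$', '*'): '<', ('$', 'id'): '<',
}

edges = {}
for (a, b), r in rel.items():
    if r == '<':                                  # gb -> fa
        edges.setdefault(('g', b), []).append(('f', a))
    else:                                         # fa -> gb
        edges.setdefault(('f', a), []).append(('g', b))

@lru_cache(maxsize=None)
def longest(node):
    """Length of the longest path leaving node (graph assumed acyclic)."""
    return max((1 + longest(s) for s in edges.get(node, [])), default=0)

f = {a: longest(('f', a)) for a in ['+', '*', 'id', '$']}
g = {a: longest(('g', a)) for a in ['+', '*', 'id', '$']}
```

The resulting values satisfy f(a) < g(b) whenever a <. b and f(a) > g(b) whenever a .> b.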

Operator precedence function
1. Create functions fa and ga for each a that is terminal or $. E E+T | T
T T*F | F
F id
a ∈ { +, *, id } or $

f+ f* fid f$

g+ g* gid g$

Operator precedence function
2. Partition the symbols into as many groups as possible, in such a way that fa and gb are in the same group if a = b.

+ * id $
+ .> <. <. .>
gid fid .> .>
* <. .>

id .> .> .>

$ <. <. <.


f* g*

g+ f+

f$ g$

Operator precedence function
3. if a <· b, place an edge from the group of gb to the group of fa
if a ·> b, place an edge from the group of fa to the group of gb

g
+ * id $
+ .> <. <. .>
gid fid
f * .> .> <. .>

id .> .> .>

f* g* $ <. <. <.

f+ .> g+ f+  g+
g+ f+
f* .> g+ f*  g+
fid .> g+ fid  g+
f$ g$ f$ <. g+ f$  g+

Operator precedence function
Comparing each symbol with * (the * column of the table):
f+  <· g*   ⟹  edge g* → f+
f*  ·> g*   ⟹  edge f* → g*
fid ·> g*   ⟹  edge fid → g*
f$  <· g*   ⟹  edge g* → f$

Operator precedence function
Comparing each symbol with id (the id column of the table):
f+  <· gid  ⟹  edge gid → f+
f*  <· gid  ⟹  edge gid → f*
f$  <· gid  ⟹  edge gid → f$

Operator precedence function
Comparing each symbol with $ (the $ column of the table):
f+  ·> g$   ⟹  edge f+ → g$
f*  ·> g$   ⟹  edge f* → g$
fid ·> g$   ⟹  edge fid → g$

Operator precedence function
4. If the constructed graph has a cycle then no precedence functions exist. When there are no
cycles, take f(a) and g(a) to be the lengths of the longest paths starting from the groups of
fa and ga respectively.  For example, the longest path from f+ has length 2, so f(+) = 2.

Operator precedence function
The completed precedence functions:

        +   *   id   $
   f    2   4   4    0
   g    1   3   5    0

Parsing Methods
Parsing
• Top down parsing
  • Back tracking
  • Parsing without backtracking (predictive parsing)
    • LL(1)
    • Recursive descent
• Bottom up parsing (shift reduce)
  • Operator precedence
  • LR parsing
    • SLR
    • CLR
    • LALR

LR parser
LR parsing is most efficient method of bottom up parsing which can be used to parse large
class of context free grammar.
The technique is called LR(k) parsing:
1. The “L” is for left to right scanning of input symbol,
2. The “R” for constructing right most derivation in reverse,
3. The “k” for the number of input symbols of look ahead that are used in making parsing
decisions.

Model of an LR parser: an input buffer (a + b … $), a stack (X Y Z … $), the LR parsing
program that drives the parse and produces the output, and a parsing table with Action and
Goto parts.
Computation of closure & goto function
Grammar:  S → AS | b
          A → SA | a

Closure(I), for I = { S' → ·S }:
  S' → ·S, S → ·AS, S → ·b, A → ·SA, A → ·a

Goto(I, S): S' → S·, A → S·A, plus closure items A → ·SA, A → ·a, S → ·AS, S → ·b
Goto(I, A): S → A·S, plus closure items S → ·AS, S → ·b, A → ·SA, A → ·a
Goto(I, b): S → b·
Goto(I, a): A → a·

Example: SLR(1)- simple LR
Grammar: S → AA, A → aA | b        Augmented grammar: S' → S

LR(0) item sets:
I0: S' → ·S, S → ·AA, A → ·aA, A → ·b
I1 = goto(I0, S): S' → S·
I2 = goto(I0, A): S → A·A, A → ·aA, A → ·b
I3 = goto(I0, a): A → a·A, A → ·aA, A → ·b   (= goto(I2, a) = goto(I3, a))
I4 = goto(I0, b): A → b·                     (= goto(I2, b) = goto(I3, b))
I5 = goto(I2, A): S → AA·
I6 = goto(I3, A): A → aA·

Rules to construct SLR parsing table
1. Construct C = {I0, I1, ……, In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
a) If [A → α·aβ] is in Ii and goto(Ii, a) = Ij, then set Action[i, a] to “shift j”. Here a must be a
terminal.
b) If [A → α·] is in Ii, then set Action[i, a] to “reduce A → α” for all a in FOLLOW(A); here A may not
be S’.
c) If [S’ → S·] is in Ii, then set Action[i, $] to “accept”.
3. The goto transitions for state i are constructed for all non terminals A using the rule:
if goto(Ii, A) = Ij, then Goto[i, A] = j.
4. All entries not defined by rules 2 and 3 are made “error”.

Example: SLR(1)- simple LR
FOLLOW(S) = {$}
FOLLOW(A) = {a, b, $}

Grammar: S → AA, A → aA | b          SLR parsing table:

Item           Action           Go to
set        a      b      $       S    A
0          S3     S4             1    2
1                        Accept
2          S3     S4                  5
3          S3     S4                  6
4          R3     R3     R3
5                        R1
6          R2     R2     R2

How to calculate look ahead?
Grammar:  S → CC
          C → cC | d

For an item [A → α·Xβ, a], every item added for X in the closure gets lookahead FIRST(βa).

Closure(I), for I = { [S' → ·S, $] }:
  [S' → ·S, $]
  [S → ·CC, $]       from [S' → ·S, $]:  lookahead = FIRST($) = {$}
  [C → ·cC, c|d]     from [S → ·CC, $]:  lookahead = FIRST(C$) = {c, d}
  [C → ·d,  c|d]

Example: CLR(1)- canonical LR
Grammar: S → AA, A → aA | b        Augmented grammar: S' → S

LR(1) item sets:
I0: [S' → ·S, $], [S → ·AA, $], [A → ·aA, a|b], [A → ·b, a|b]
I1 = goto(I0, S): [S' → S·, $]
I2 = goto(I0, A): [S → A·A, $], [A → ·aA, $], [A → ·b, $]
I3 = goto(I0, a): [A → a·A, a|b], [A → ·aA, a|b], [A → ·b, a|b]   (= goto(I3, a))
I4 = goto(I0, b): [A → b·, a|b]                                   (= goto(I3, b))
I5 = goto(I2, A): [S → AA·, $]
I6 = goto(I2, a): [A → a·A, $], [A → ·aA, $], [A → ·b, $]         (= goto(I6, a))
I7 = goto(I2, b): [A → b·, $]                                     (= goto(I6, b))
I8 = goto(I3, A): [A → aA·, a|b]
I9 = goto(I6, A): [A → aA·, $]

Example: CLR(1)- canonical LR

CLR parsing table (from the LR(1) item sets above):

Item           Action           Go to
set        a      b      $       S    A
0          S3     S4             1    2
1                        Accept
2          S6     S7                  5
3          S3     S4                  8
4          R3     R3
5                        R1
6          S6     S7                  9
7                        R3
8          R2     R2
9                        R2
Example: LALR(1)- look ahead LR
Starting from the CLR(1) item sets, states whose items differ only in their lookaheads are
merged:

I3 and I6 → I36: [A → a·A, a|b|$], [A → ·aA, a|b|$], [A → ·b, a|b|$]
I4 and I7 → I47: [A → b·, a|b|$]
I8 and I9 → I89: [A → aA·, a|b|$]

Example: LALR(1)- look ahead LR

CLR Parsing Table:
Item           Action           Go to
set        a      b      $       S    A
0          S3     S4             1    2
1                        Accept
2          S6     S7                  5
3          S3     S4                  8
4          R3     R3
5                        R1
6          S6     S7                  9
7                        R3
8          R2     R2
9                        R2

LALR Parsing Table:
Item           Action           Go to
set        a      b      $       S    A
0          S36    S47            1    2
1                        Accept
2          S36    S47                 5
36         S36    S47                 89
47         R3     R3     R3
5                        R1
89         R2     R2     R2

YACC tool or YACC Parser Generator
YACC is a tool which generates the parser.
It takes input from the lexical analyzer (tokens) and produces a parse tree as an output.

Yacc specification (translate.y)  →  Yacc Compiler  →  y.tab.c
y.tab.c                           →  C Compiler     →  a.out
Input                             →  a.out          →  output

Structure of Yacc Program
Any Yacc program contains mainly three sections:
1. Declaration
2. Translation rules
3. Supporting C-routines

Structure of Program:
Declaration            – used to declare variables, constants & header files
%%
Translation rules      – of the form <left side> : <alt 1> {semantic action 1}
                                                 | <alt 2> {semantic action 2}
                                                 | <alt n> {semantic action n}
%%
Supporting C routines  – all the functions needed are specified over here

Example declaration section:
%{
int x,y;
const int digit=50;
#include <ctype.h>
%}

Example: Yacc Program
Program: Write a Yacc program for a simple desk calculator
(Grammar: E → E+T | T,  T → T*F | F,  F → (E) | id)

%{                                     /* Declaration */
#include <ctype.h>
%}
%token DIGIT
%%
line   : expr '\n'       {printf("%d\n",$1);}   /* Translation rules */
expr   : expr '+' term   {$$=$1 + $3;}
       | term;
term   : term '*' factor {$$=$1 * $3;}
       | factor;
factor : '(' expr ')'    {$$=$2;}
       | DIGIT;
%%
yylex()                                /* Supporting C routines */
{
    int c;
    c=getchar();
    if(isdigit(c))
    {
        yylval = c-'0';
        return DIGIT;
    }
    return c;
}
References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

Compiler Design (CD)
GTU # 3170701

Unit – 3
Syntax Analysis (II)

Computer Engineering Department


Darshan Institute of Engineering & Technology, Rajkot
[email protected]
+91 - 97277 47317 (CE Department)
Topics to be covered
• Syntax directed definitions
• Synthesized attributes
• Inherited attribute
• Dependency graph
• Evaluation order
• Construction of syntax tree
• Bottom up evaluation of S-attributed definitions
• L-Attributed definitions
• Translation scheme
Syntax directed definitions
Syntax directed definition is a generalization of context free grammar in which each grammar
symbol has an associated set of attributes.
The attributes can be a number, type, memory location, return type etc….
Types of attributes are:
1. Synthesized attribute
2. Inherited attribute

Example: attributes attached to a symbol E can include its value, type, memory location, and
return type.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 3 – Syntax Analysis (II)
Synthesized attributes
Value of synthesized attribute at a node can be computed from the value of attributes at the children of that
node in the parse tree.
A syntax directed definition that uses synthesized attribute exclusively is said to be S-attribute definition.
Example: Syntax directed definition of simple desk calculator
Production Semantic rules

L  En Print (E.val)

E  E1+T E.val = E1.val + T.val

ET E.val = T.val

T  T1*F T.val = T1.val * F.val

TF T.val = F.val

F  (E) F.val = E.val

F  digit F.val = digit.lexval

Example: Synthesized attributes
String: 3*5+4n

Annotated parse tree (values of val computed bottom-up):
  digit.lexval=3  →  F.val=3  →  T.val=3
  digit.lexval=5  →  F.val=5
  T.val = T1.val * F.val = 3*5 = 15
  digit.lexval=4  →  F.val=4  →  T.val=4
  E.val = E1.val + T.val = 15+4 = 19, and L prints 19

The process of computing the attribute values at the nodes is called annotating or decorating
the parse tree.  A parse tree showing the values of the attributes at each node is called an
annotated parse tree.

Exercise
Draw Annotated Parse tree for following:
1. 7+3*2n
2. (3+4)*(5+6)n

Syntax directed definition to translate arithmetic expressions from infix to prefix notation

Production      Semantic rules
L → E           print(E.val)
E → E + T       E.val = '+' E1.val T.val
E → E - T       E.val = '-' E1.val T.val
E → T           E.val = T.val
T → T * F       T.val = '*' T1.val F.val
T → T / F       T.val = '/' T1.val F.val
T → F           T.val = F.val
F → F ^ P       F.val = '^' F1.val P.val
F → P           F.val = P.val
P → ( E )       P.val = E.val
P → digit       P.val = digit.lexval

Inherited attribute
An inherited value at a node in a parse tree is computed from the value of attributes at the
parent and/or siblings of the node.

Production Semantic rules


D→TL L.in = T.type
T → int T.type = integer
T → real T.type = real
L → L1 , id L1.in = L.in, addtype(id.entry,L.in)
L → id addtype(id.entry,L.in)

Syntax directed definition with inherited attribute L.in

Symbol T is associated with a synthesized attribute type.


Symbol L is associated with an inherited attribute in.

Example: Inherited attribute
Example: pass the data type to all identifiers in the declaration  real id1, id2, id3

Annotated parse tree (the inherited attribute in flows down the list of identifiers):
  D → T L           T.type = real, L.in = T.type = real
  L → L1 , id3      L1.in = L.in = real, addtype(id3.entry, real)
  L1 → L2 , id2     L2.in = real, addtype(id2.entry, real)
  L2 → id1          addtype(id1.entry, real)

Dependency graph
The directed graph that represents the interdependencies between synthesized and inherited
attribute at nodes in the parse tree is called dependency graph.
For the rule XYZ the semantic action is given by X.x=f(Y.y, Z.z) then synthesized attribute X.x
depends on attributes Y.y and Z.z.
The basic idea behind dependency graphs is for a compiler to look for various kinds of
dependency among statements to prevent their execution in wrong order.

Algorithm : Dependency graph
for each node n in the parse tree do
for each attribute a of the grammar symbol at n do
Construct a node in the dependency graph for a;
for each node n in the parse tree do
for each semantic rule b=f(c1,c2,…..,ck)
associated with the production used at n do
for i=1 to k do
construct an edge from the node for Ci to the node for b;

Example: Dependency graph
Production: E → E1 + E2        Semantic rule: E.val = E1.val + E2.val

In the dependency graph the node for E.val has incoming edges from the nodes for E1.val and
E2.val: E.val is synthesized from E1.val and E2.val, and the edges show that E.val depends on
E1.val and E2.val.

Evaluation order
A topological sort of a directed acyclic graph is any ordering m1, m2, ………, mk of the nodes
of the graph such that edges go from nodes earlier in the ordering to later nodes.
If mi → mj is an edge from mi to mj, then mi appears before mj in the ordering.

For the declaration real id1, id2, id3 a valid topological order of the dependency-graph nodes
is: (1) T.type=real, (2) L.in=real, (3) L.in=real, (4) addtype(id3), (5) L.in=real,
(6) addtype(id2), (7) addtype(id1).

Construction of syntax tree
Following functions are used to create the nodes of the syntax tree.
1. Mknode (op,left,right): creates an operator node with label op and two fields containing
pointers to left and right.
2. Mkleaf (id, entry): creates an identifier node with label id and a field containing entry, a
pointer to the symbol table entry for the identifier.
3. Mkleaf (num, val): creates a number node with label num and a field containing val, the
value of the number.

Construction of syntax tree for expressions
Example: construct syntax tree for a-4+c
P1: mkleaf(id, entry for a);
P2: mkleaf(num, 4);
P3: mknode('-', p1, p2);
P4: mkleaf(id, entry for c);
P5: mknode('+', p3, p4);

The resulting tree has '+' (P5) at the root, with left child '-' (P3) and right child the leaf
for c (P4); the '-' node has left child the leaf for a (P1) and right child the leaf num 4 (P2).

Bottom up evaluation of S-attributed definitions
S-attributed definition is one such class of syntax directed definition with synthesized
attributes only.
Synthesized attributes can be evaluated using bottom up parser only.
Synthesized attributes on the parser stack
Consider the production A → XYZ with the associated semantic action A.a = f(X.x, Y.y, Z.z).

Before reduction:                 After reduction:
  State     Value                   State   Value
  top-2: X  X.x                     top: A  A.a
  top-1: Y  Y.y
  top:   Z  Z.z

Bottom up evaluation of S-attributed definitions
Production      Semantic rules
L → En          print(val[top])
E → E1 + T      val[top] = val[top-2] + val[top]
E → T
T → T1 * F      val[top] = val[top-2] * val[top]
T → F
F → (E)         val[top] = val[top-1]
F → digit

Implementation of a desk calculator with a bottom-up parser; moves made by the translator on
input 3*5n:

Input   State   Val     Production used
3*5n    -       -
*5n     3       3
*5n     F       3       F → digit
*5n     T       3       T → F
5n      T*      3
n       T*5     3,5
n       T*F     3,5     F → digit
n       T       15      T → T1*F
n       E       15      E → T
        En      15
        L       15      L → En

L-Attributed definitions
A syntax directed definition is L-attributed if each inherited attribute of Xj, 1 ≤ j ≤ n, on
the right side of A → X1 X2 … Xn depends only on:
1. The attributes of the symbols X1, X2, …, Xj-1 to the left of Xj in the production and
2. The inherited attributes of A.

Example:
Production   Semantic Rules            Production   Semantic Rules
A → LM       L.i = l(A.i)              A → QR       R.i = r(A.i)
             M.i = m(L.s)                           Q.i = q(R.s)
             A.s = f(M.s)                           A.s = f(Q.s)
(L-Attributed)                         (Not L-Attributed)

The second syntax directed definition is not L-attributed because the inherited attribute Q.i of
the grammar symbol Q depends on the attribute R.s of the grammar symbol to its right.
Translation Scheme
Translation scheme is a context free grammar in which attributes are associated with the
grammar symbols and semantic actions enclosed between braces { } are inserted within the
right sides of productions.
Attributes are used to evaluate the expression along the process of parsing.
During the process of parsing the evaluation of attribute takes place by consulting the semantic
action enclosed in { }.
A translation scheme generates the output by executing the semantic actions in an ordered
manner.
This process uses the depth first traversal.

Example: Translation scheme (Infix to postfix notation)
String: 9-5+2

E → T R
R → addop T {print(addop.lexeme)} R1 | ε
T → num {print(num.val)}

Parse tree with embedded semantic actions, evaluated in depth-first order:
  E → T R
    T → 9 {print(9)}
    R → - T {print(-)} R1
      T → 5 {print(5)}
      R1 → + T {print(+)} R2
        T → 2 {print(2)}
        R2 → ε

Now perform the depth-first traversal: Postfix = 95-2+

Compiler Design (CD)
GTU # 3170701

Unit – 4
Error Recovery

Computer Engineering Department


Darshan Institute of Engineering & Technology, Rajkot
[email protected]
+91 - 97277 47317 (CE Department)
Topics to be covered
• Types of errors
• Error recovery strategies
Types of Errors

Errors
• Compile time errors
  • Lexical phase errors
  • Syntactic phase errors
  • Semantic phase errors
• Run time errors

Prof. Dixita B Kagathara #3170701 (CD)  Unit 4 – Error Recovery
Lexical error
Lexical errors can be detected during lexical analysis phase.
Typical lexical phase errors are:
1. Spelling errors
2. Exceeding length of identifier or numeric constants
3. Appearance of illegal characters
Example:
fi ( )
{
}
In the above code 'fi' cannot be recognized as a misspelling of the keyword if; rather, the
lexical analyzer will treat it as an identifier and return it as a valid identifier.
Thus a misspelling causes errors in token formation.

Syntax error
Syntax error appear during syntax analysis phase of compiler.
Typical syntax phase errors are:
1. Errors in structure
2. Missing operators
3. Unbalanced parenthesis
The parser demands for tokens from lexical analyzer and if the tokens do not satisfies the
grammatical rules of programming language then the syntactical errors get raised.
Example:

printf(“Hello World !!!”) Error: Semicolon missing

Semantic error
Semantic error detected during semantic analysis phase.
Typical semantic phase errors are:
1. Incompatible types of operands
2. Undeclared variable
3. Not matching of actual argument with formal argument
Example:
id1 = id2 + id3 * 60 (Note: id1, id2, id3 are real)
(The multiplication cannot be performed directly because the operand types are incompatible:
the integer constant 60 must first be converted to real.)

Error recovery strategies (Ad-Hoc & systematic methods)
There are mainly four error recovery strategies:
1. Panic mode
2. Phrase level recovery
3. Error production
4. Global generation

Panic mode
In this method on discovering error, the parser discards input symbol one at a time. This
process is continued until one of a designated set of synchronizing tokens is found.
Synchronizing tokens are delimiters such as semicolon or end.
These tokens indicate an end of the statement.
If there is less number of errors in the same statement then this strategy is best choice.

fi ( ) Scan entire line otherwise scanner will return fi as valid identifier


{
}

Phrase level recovery
In this method, on discovering an error parser performs local correction on remaining input.
The local correction can be:
1. Replacing comma by semicolon
2. Deletion of semicolons
3. Inserting missing semicolon
This type of local correction is decided by compiler designer.
This method is used in many error-repairing compilers.

Error production
If we have good knowledge of common errors that might be encountered, then we can augment
the grammar for the corresponding language with error productions that generate the
erroneous constructs.
Then we use the grammar augmented by these error production to construct a parser.
If error production is used then, during parsing we can generate appropriate error message and
parsing can be continued.

Global correction
Given an incorrect input string x and grammar G, the algorithm will find a parse tree for a
related string y, such that number of insertions, deletions and changes of token require to
transform x into y is as small as possible.
Such methods increase time and space requirements at parsing time.
Global correction is thus simply a theoretical concept.

Compiler Design (CD)
GTU # 3170701

Unit – 5
Intermediate Code Generation

Computer Engineering Department


Darshan Institute of Engineering & Technology, Rajkot
[email protected]
+91 - 97277 47317 (CE Department)
Topics to be covered
• Different intermediate forms
• Different representation of Three Address code
Different intermediate forms
Different forms of intermediate code are:

1. Abstract syntax tree


2. Postfix notation
3. Three address code

Prof. Dixita B Kagathara #3170701 (CD)  Unit 5 – Intermediate Code Generation
Abstract syntax tree & DAG
A syntax tree depicts the natural hierarchical structure of a source program.
A DAG (Directed Acyclic Graph) gives the same information but in a more compact way
because common sub-expressions are identified.
Ex: a = b * -c + b * -c

Syntax tree: the root '=' has children a and '+'; the '+' node has two separate subtrees, each
computing b * (uminus c).
DAG: the common sub-expression b * (uminus c) appears only once, and both operands of '+' point
to it.

Postfix Notation
Postfix notation is a linearization of a syntax tree.
In postfix notation the operands occurs first and then operators are arranged.
Ex: (A + B) * (C + D)

Postfix notation: A B + C D + *

Ex: (A + B) * C

Postfix notation: A B + C *

Ex: (A * B) + (C * D)
Postfix notation: A B * C D * +
Three address code
Three address code is a sequence of statements of the general form,
a:= b op c
Where a, b or c are the operands that can be names or constants and op stands for any
operator.
Example: a = b + c + d
t1=b+c
t2=t1+d
a= t2
Here t1 and t2 are the temporary names generated by the compiler.
There are at most three addresses allowed (two for operands and one for result). Hence, this
representation is called three-address code.

Different Representation of Three Address Code
There are three types of representation used for three address code:
1. Quadruples
2. Triples
3. Indirect triples
Ex: x= -a*b + -a*b
t 1= - a
t2 = t1 * b
t 3= - a
Three Address Code
t4 = t3 * b
t5 = t2 + t4
x= t5

Quadruple
The quadruple is a structure with at the most four fields such as op, arg1, arg2 and result.
The op field is used to represent the internal code for operator.
The arg1 and arg2 represent the two operands.
And result field is used to store the result of an expression.

Quadruple
No. Operator Arg1 Arg2 Result
x= -a*b + -a*b
t1= - a (0) uminus a t1
t2 = t1 * b (1) * t1 b t2
t3= - a (2) uminus a t3
t4 = t3 * b (3) * t3 b t4
t5 = t2 + t4
(4) + t2 t4 t5
x= t5
(5) = t5 x

Triple
To avoid entering temporary names into the symbol table, we might refer a temporary value by
the position of the statement that computes it.
If we do so, three address statements can be represented by records with only three fields: op,
arg1 and arg2.

Quadruple Triple
No. Operator Arg1 Arg2 Result No. Operator Arg1 Arg2
(0) uminus a t1 (0) uminus a
(1) * t1 b t2 (1) * (0) b
(2) uminus a t3 (2) uminus a
(3) * t3 b t4 (3) * (2) b
(4) + t2 t4 t5 (4) + (1) (3)
(5) = t5 x (5) = x (4)

Indirect Triple
In the indirect triple representation the listing of triples has been done. And listing pointers are
used instead of using statement.
This implementation is called indirect triples.

Triple Indirect Triple

No. Operator Arg1 Arg2 Statement No. Operator Arg1 Arg2


(0) uminus a (0) (14) (0) uminus a
(1) * (0) b (1) (15) (1) * (14) b
(2) uminus a (2) (16) (2) uminus a
(3) * (2) b (3) (17) (3) * (16) b
(4) + (1) (3) (4) (18) (4) + (15) (17)
(5) = x (4) (5) (19) (5) = x (18)

Exercise
Write quadruple, triple and indirect triple for following:
1. -(a*b)+(c+d)
2. a*-(b+c)
3. x=(a+b*c)^(d*e)+f*g^h
4. g+a*(b-c)+(x-y)*d

Compiler Design (CD)
GTU # 3170701

Unit – 6
Run-Time Environments

Computer Engineering Department


Darshan Institute of Engineering & Technology, Rajkot
[email protected]
+91 - 97277 47317 (CE Department)
Topics to be covered
• Source language issues
• Storage organization
• Storage allocation strategies
Run-Time Environments
As execution proceeds, the same name in the source text can denote different data objects in
the target machine.
Each execution of a procedure is referred to as an activation of the procedure.
If the procedure is recursive, several of its activation may be alive at same time.
Source language issues:
1. Procedures
2. Activation tree
3. Control stack
4. The Scope of a declaration
5. Binding of names

Prof. Dixita B Kagathara #3170701 (CD)  Unit 6 – Run-Time Environments
Procedures
A program is made up of procedures.
Procedure: declaration that associate an identifier with a statement.
Identifier: procedure name
Statement: procedure body
Procedure call: procedure name appears within an executable statement.
Example:
main()
{
    int n;
    readarray();
    quicksort(1, n);
}

void quicksort(int m, int n)
{
    int i = partition(m, n);
    quicksort(m, i-1);
    quicksort(i+1, n);
}

void readarray()
{
    ………
}

int partition(int y, int z)
{
    ……
}
Activation Tree
If ‘a’ and ‘b’ are two procedures, their activations will be:
Non-overlapping: when one is called after the other.
Nested: when one activation is nested inside the other.
Recursive: when a new activation of a procedure begins before an earlier activation of the same procedure has ended.
An activation tree shows the way control enters and leaves activations.
Properties of activation trees are:
1. Each node represents an activation of a procedure.
2. The root shows the activation of the main function.
3. The node for procedure ‘a’ is the parent of the node for procedure ‘b’ if and only if the control flows from procedure a to procedure b.
4. Node ‘a’ is to the left of node ‘b’ if and only if the lifetime of a occurs before the lifetime of b.

Activation tree for the quicksort program:

main
    readarray()
    quicksort(1,10)
        Partition(1,10)
        quicksort(1,4)
            Partition(1,4)
            quicksort(1,2)
            quicksort(4,4)
        quicksort(6,10)
Control stack
Control stack or runtime stack is used to keep track of the live procedure activations, i.e., the procedures whose execution has not yet been completed.
A procedure name is pushed on to the stack when it is called (activation begins) and it is popped when it
returns (activation ends).
Information needed by a single execution of a procedure is managed using an activation record or
frame. When a procedure is called, an activation record is pushed into the stack and as soon as the
control returns to the caller function the activation record is popped.
Example: when control is inside quicksort(4,4), the control stack contains, from top to bottom:

qs(4,4)
qs(1,4)
qs(1,10)
main
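The push/pop discipline above can be simulated with an explicit stack in C. This is an illustrative sketch, not from the slides: the names `enter`, `leave` and the hard-coded call pattern are assumptions; activations are pushed as calls begin, popped as they end, and the maximum depth matches the longest root-to-leaf path in the activation tree.

```c
#include <assert.h>

/* A toy control stack: a procedure name is pushed when its
 * activation begins and popped when the activation ends. */
static const char *ctl_stack[16];
static int top = 0, max_depth = 0;

static void enter(const char *name) {
    ctl_stack[top++] = name;              /* activation begins */
    if (top > max_depth) max_depth = top;
}

static void leave(void) { top--; }        /* activation ends */

/* Replay the calls down to quicksort(4,4). */
static void replay(void) {
    enter("main");
      enter("readarray"); leave();
      enter("qs(1,10)");
        enter("qs(1,4)");
          enter("qs(4,4)");   /* stack: main, qs(1,10), qs(1,4), qs(4,4) */
          leave();
        leave();
      leave();
    leave();
}
```

Running replay() leaves the stack empty again; the deepest configuration is exactly the four-entry snapshot shown above.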
Scope of a declaration
A declaration in a language is a syntactic construct that associate information with a name.
Var i: integer;
There may be declaration of the same name in different parts of a program.
The scope rules of a language determine which declaration of a name applies when the name
appears in the text of a program.
The portion of the program, to which declaration applies is called the scope of the declaration.
An occurrence of a name in a procedure is said to be local to the procedure if it is in the scope
of a declaration within the procedure; otherwise the occurrence is said to be nonlocal.
The distinction between local and nonlocal names carries over to any syntactic construct that can have declarations within it.

Binding of names
Environment: function that maps a name to a storage location.
State: function that maps a storage location to the value held there.

Name —(environment)→ Storage location (l-value) —(state)→ Value (r-value)

When an environment associates a storage location s with a name x, we say that x is bound to s.

Storage organization
The executing target program runs in its own logical address space in which each program
value has a location.
The management and organization of this logical address space is shared between the
compiler, operating system and target machine.
The operating system maps the logical address into the physical address, which are usually
spread throughout memory.

Subdivision of Runtime Memory
The compiler requests a block of memory from the operating system.
The compiler uses this block of memory for executing the compiled program. This block of memory is called run-time storage.
The run time storage is subdivided to hold code and data such as, the generated target code
and data objects.
The size of the generated code is fixed. Hence the target code occupies a statically determined area of the memory.

Code area
Static data area
Stack
Free
Heap

Subdivision of Runtime Memory
The code area consists of the memory locations that hold the generated code.
The amount of memory required by the data objects is known at the compiled time and hence
data objects also can be placed at the statically determined area of the memory.
Stack is used to manage the active procedure.
Managing active procedures means that when a call occurs, execution of the current activation is interrupted and status information is saved on the stack.
Heap area is the area of run time storage in which the other information is stored.

Code area – memory locations for code are determined at compile time.
Static data area – locations of static data can also be determined at compile time.
Stack – data objects allocated at run time (activation records).
Heap – other dynamically allocated data objects at run time (e.g., the malloc area in C).

Activation Record
Control stack is a run-time stack which is used to keep track of the live procedure activations, i.e., the procedures whose execution has not yet been completed.
When a procedure is called (activation begins), its name is pushed onto the stack, and when it returns (activation ends), it is popped.
Activation record is used to manage the information needed by a single execution of a
procedure.
An activation record is pushed into the stack when a procedure is called and it is popped when
the control returns to the caller function.

Activation Record
The execution of a procedure is called its activation.
An activation record contains all the necessary information required to call a procedure.
Return value: used by the called procedure to return a value to the calling procedure.
Actual parameters: this field holds the information about the actual parameters.
Control link (optional): points to the activation record of the caller.
Access link (optional): refers to non-local data held in other activation records.
Machine status: holds the information about the status of the machine just before the procedure call.
Local variables: hold the data that is local to the execution of the procedure.
Temporary values: store the values that arise in the evaluation of an expression.

Activation record layout (top to bottom): Return value | Actual parameters | Control link | Access link | Machine status | Local variables | Temporary values
Compile-Time Layout of Local Data
The amount of storage needed for a name is determined from its type. (e.g.: int, char, float…)
Storage for an aggregate, such as an array or record, must be large enough to hold all its components.
The field of local data is laid out as the declarations in a procedure are examined at compile
time.
We keep a count of the memory locations that have been allocated for previous declarations.
From the count we determine a relative address of the storage for a local with respect to some
position such as the beginning of the activation record.
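The count-based layout described above can be sketched in a few lines of C. This is a minimal illustration, not from the slides; the sizes and the decision to ignore alignment padding are simplifying assumptions.

```c
#include <assert.h>

/* Keep a running count of the bytes already allocated; each new
 * local gets the current count as its relative address within the
 * activation record (alignment/padding ignored for simplicity). */
static int next_offset = 0;

static int alloc_local(int size) {
    int addr = next_offset;   /* relative address of this local */
    next_offset += size;      /* bump the count */
    return addr;
}
```

For example, declaring an int (4 bytes), a char (1 byte) and a double (8 bytes) in that order yields relative addresses 0, 4 and 5, and a total local-data size of 13 bytes.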

Storage allocation strategies
The different storage allocation strategies are:
Static allocation: lays out storage for all data objects at compile time.
Stack allocation: manages the run-time storage as a stack.
Heap allocation: allocates and de-allocates storage as needed at run time from a data area known as the heap.

Static allocation
In static allocation, names are bound to storage as the program is compiled, so there is no need for a run-time support package.
Since the bindings do not change at run time, every time a procedure is activated, its names are bound to the same storage locations.
Limitations:
1. The size of a data object must be known at compile time.
2. Recursive procedures are restricted.
3. Data structures cannot be created dynamically.
Example:

int main()
{
    int num1=100, num2=200, res;
    res = sum(num1, num2);
    printf("\n Addition is %d : ", res);
    return (0);
}

int sum(int num1, int num2)
{
    int num3;
    num3 = num1 + num2;
    return (num3);
}

Here the static data area holds one activation record for main (int num1, int num2, int res) and one for sum (int num1, int num2, int num3), laid out after the code for main and sum.

Stack allocation
All compilers for languages that use procedures, functions or methods as units of user-defined actions manage at least part of their run-time memory as a stack.
Each time a procedure is called, space for its local variables is pushed onto a stack, and when
the procedure terminates, the space is popped off the stack.
Locals are bound to fresh storage in each activations.
Locals are deleted when the activation ends.
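A one-function illustration of why stack allocation enables recursion (a sketch, not from the slides): each activation of fact gets a fresh copy of its local r, so the pending multiplications of different activations do not overwrite one another.

```c
#include <assert.h>

/* Each recursive activation gets fresh storage for the local r
 * on the run-time stack; the frames are popped in LIFO order. */
static int fact(int n) {
    int r;                       /* local: fresh per activation */
    if (n <= 1)
        r = 1;
    else
        r = n * fact(n - 1);     /* a new activation is pushed here */
    return r;                    /* this activation's frame is popped */
}
```

Under static allocation all activations of fact would share one r, and this computation would break.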

Stack allocation
At run time, an activation record can be allocated by incrementing ‘top’ by the size of the record.
Deallocated by decrementing ‘top’ by the size of record.
Position in activation tree → activation records on the stack (bottom to top):

1. Control in main: main (int n) — frame for main.
2. Control in ra(): main (int n), ra (int i) — ra is activated.
3. Control in qs(1,10): main (int n), qs(1,10) (int i) — the frame for ra has been popped and the frame for qs(1,10) pushed.

Stack allocation: Calling Sequences
Procedures calls are implemented by generating what are known as calling sequences in the
target code.
A call sequence allocates an activation record and enters the information into its fields.
A Return sequence restore the state of machine so the calling procedure can continue its
execution.
The code in a calling sequence is often divided between the calling procedure (the caller) and the procedure it calls (the callee).
Stack layout (the caller's activation record followed by the callee's):

Caller's activation record:
    Parameters and return value
    Control link
    Links and saved status
    Temporaries and local data
Callee's activation record:
    Parameters and returned value   (caller's responsibility)
    Control link                    (caller's responsibility)
    Links and saved status          (callee's responsibility)
    Temporaries and local data      (callee's responsibility; top_sp points into this record)
Stack allocation: Calling Sequences
The calling sequence and its division between caller and callee are as follows:
1. The caller (Calling procedure) evaluates the actual parameters.
2. The caller stores a return address and the old value of top_sp into the callee’s activation
record. The caller then increments the top_sp to the respective positions.
3. The callee (Called procedure) saves the register values and other status information.
4. The callee initializes its local data and begins execution.

Stack allocation: Calling Sequences
A suitable, corresponding return sequence is:
1. The callee places the return value next to the parameters.
2. Using the information in the machine status field, the callee restores top_sp and other
registers, and then branches to the return address that the caller placed in the status field.
3. Although top_sp has been decremented, the caller knows where the return value is, relative
to the current value of top_sp ; the caller therefore may use that value.

Stack allocation: Variable length data on stack
The run time memory management system must deal frequently with the allocation of objects,
the sizes of which are not known at the compile time, but which are local to a procedure and
thus may be allocated on the stack.
The same scheme works for objects of any type if they are local to the called procedure and have a size that depends on the parameters of the call.
Activation record for p:
    Control link
    Pointer to A
    Pointer to B
    Pointer to C
Arrays of p:
    Array A
    Array B
    Array C
Activation record for q:
    Control link   (top_sp is used to find the local data)
Arrays of q
    (top marks the position of the next activation record)
Stack allocation: Dangling Reference
A dangling reference occurs when there is a reference to storage that has been de-allocated.
It is a logical error to use dangling reference, since, the value of de-allocated storage is
undefined according to the semantics of most languages.
main()
{
    int *p;
    p = dangle();
}

int *dangle()
{
    int i = 10;
    return &i;   /* dangling: i's storage is de-allocated when the activation ends */
}
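One hedged way to repair the dangling reference, assuming heap allocation is acceptable: allocate the value with malloc so its storage outlives the activation. The function name make_value is illustrative, not from the slides.

```c
#include <assert.h>
#include <stdlib.h>

/* Returning &i from dangle() dangles because i lives in the
 * activation record, which is popped on return.  Heap storage
 * allocated with malloc survives until free() is called. */
int *make_value(void) {
    int *p = malloc(sizeof *p);   /* heap, not the stack frame */
    if (p) *p = 10;
    return p;                     /* still valid after return */
}
```

The caller now owns the storage and must eventually call free(p).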

Heap Allocation
The stack allocation strategy cannot be used in the following situations:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
Position in activation tree: control has returned from ra() and is in qs(1,10).
Activation records in the heap:
    main (control link)
    ra (control link) — retained even though its activation has ended
    qs(1,10) (control link)
Records for the live activations need not be adjacent in a heap.


References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

Compiler Design (CD)
GTU # 3170701

Unit – 7
Code Generation & Optimization

Computer Engineering Department


Darshan Institute of Engineering & Technology, Rajkot
[email protected]
+91 - 97277 47317 (CE Department)
Topics to be covered
• Issues in the design of a code generator
• The Target machine
• Basic block and flow-graph
• Transformation on basic block
• A simple code generator
• Code optimization
Issues in design of Code Generator
Issues in Code Generation are:
1. Input to code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Choice of evaluation
7. Approaches to code generation

Input to code generator
Input to the code generator consists of the intermediate representation of the source program.
Types of intermediate language are:
1. Postfix notation
2. Three address code
3. Syntax trees or DAGs
Semantic errors should be detected before the input is submitted to the code generator.
The code generation phase requires completely error-free intermediate code as its input.

Target program
The output may be in form of:
1. Absolute machine language: Absolute machine language program can be placed in a
memory location and immediately execute.
2. Relocatable machine language: The subroutine can be compiled separately. A set of
relocatable object modules can be linked together and loaded for execution.
3. Assembly language: Producing an assembly language program as output makes the process of code generation easier; an assembler is then required to convert the code into binary form.

Memory management
Mapping names in the source program to addresses of data objects in run time memory is done
cooperatively by the front end and the code generator.
We assume that a name in a three-address statement refers to a symbol table entry for the
name.
From the symbol table information, a relative address can be determined for the name in a data
area.

Instruction selection
Example: the sequence of statements
    a := b + c
    d := a + e
would be translated into:
    MOV b, R0
    ADD c, R0
    MOV R0, a
    MOV a, R0
    ADD e, R0
    MOV R0, d
The fourth instruction is redundant (a is already in R0), so we can eliminate the redundant statements:
    MOV b, R0
    ADD c, R0
    ADD e, R0
    MOV R0, d

Register allocation
The use of registers is often subdivided into two subproblems:
During register allocation, we select the set of variables that will reside in registers at a point in
the program.
During a subsequent register assignment phase, we pick the specific register that a variable will
reside in.
Finding an optimal assignment of registers to variables is difficult, even with single register
value.
Mathematically the problem is NP-complete.

Choice of evaluation
The order in which computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.
Picking a best order is another difficult, NP-complete problem.

Approaches to code generation
The most important criterion for a code generator is that it produces correct code.
The design of code generator should be in such a way so it can be implemented, tested, and
maintained easily.

Target machine
We will assume our target computer models a three-address machine with:
1. load and store operations
2. computation operations
3. jump operations
4. conditional jumps
The underlying computer is a byte-addressable machine with n general-purpose registers R0, R1, . . . , Rn−1.

Addressing Modes

Mode                 Form    Address                       Extra cost
Absolute             M       M                             1
Register             R       R                             0
Indexed              k(R)    k + contents(R)               1
Indirect register    *R      contents(R)                   0
Indirect indexed     *k(R)   contents(k + contents(R))     1

Instruction Cost
Calculate the cost of the following sequence (absolute operand: +1; register operand: +0):

MOV B, R0    cost = 1 + 1 + 0 = 2
ADD C, R0    cost = 1 + 1 + 0 = 2
MOV R0, A    cost = 1 + 0 + 1 = 2

Total cost = 6

Instruction Cost
Calculate the cost of the following sequence (indirect register operand: +0):

MOV *R1, *R0    cost = 1 + 0 + 0 = 1
MOV *R1, *R0    cost = 1 + 0 + 0 = 1

Total cost = 2
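The cost rule can be written down directly from the addressing-mode table: one for the instruction itself plus the extra cost of each operand's mode. A small sketch (the enum names are illustrative assumptions):

```c
#include <assert.h>

/* Extra cost per addressing mode, as in the table: absolute,
 * indexed and indirect-indexed operands add 1; register and
 * indirect-register operands add 0. */
enum mode { ABSOLUTE, REGISTER, INDEXED, IND_REGISTER, IND_INDEXED };

static int extra(enum mode m) {
    return (m == ABSOLUTE || m == INDEXED || m == IND_INDEXED) ? 1 : 0;
}

/* Instruction cost = 1 + extra(source) + extra(destination). */
static int cost(enum mode src, enum mode dst) {
    return 1 + extra(src) + extra(dst);
}
```

This reproduces the two worked examples: MOV B,R0 and MOV R0,A each cost 2, while MOV *R1,*R0 costs 1.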

Basic Blocks
A basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halt or possibility of branching except at the end.
The following sequence of three-address statements forms a basic block:
t1 := a*a
t2 := a*b
t3 := 2*t2
t4 := t1+t3
t5 := b*b
t6 := t4+t5

Algorithm: Partition into basic blocks
Input: A sequence of three-address statements.
Output: A list of basic blocks with each three-address statement in exactly one block.
Method:
1. We first determine the set of leaders, for that we use the following rules:
I. The first statement is a leader.
II. Any statement that is the target of a conditional or unconditional goto is a leader.
III. Any statement that immediately follows a goto or conditional goto statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
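The three leader rules translate almost line-for-line into code. A sketch under these assumptions (not from the slides): statements are numbered 1..n, and jump_target[s] holds the target of a conditional or unconditional goto at statement s, or 0 if s is not a jump.

```c
#include <assert.h>

/* Mark the leaders of a three-address program:
 * leader[s] is set to 1 if statement s starts a basic block. */
static void find_leaders(const int jump_target[], int n, int leader[]) {
    for (int s = 1; s <= n; s++) leader[s] = 0;
    leader[1] = 1;                                /* rule I: first statement  */
    for (int s = 1; s <= n; s++) {
        if (jump_target[s] != 0) {
            leader[jump_target[s]] = 1;           /* rule II: jump target     */
            if (s + 1 <= n) leader[s + 1] = 1;    /* rule III: after a jump   */
        }
    }
}
```

On a 12-statement program whose only jump is (12) if i<=20 goto (3), the leaders come out as statements 1 and 3, giving two basic blocks.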

Example: Partition into basic blocks
Source program:

begin
    prod := 0;
    i := 1;
    do
        prod := prod + a[i] * b[i];
        i := i + 1;
    while i <= 20
end

Three-address code:

(1)  prod := 0          ← leader, Block B1
(2)  i := 1
(3)  t1 := 4*i          ← leader, Block B2
(4)  t2 := a[t1]
(5)  t3 := 4*i
(6)  t4 := b[t3]
(7)  t5 := t2*t4
(8)  t6 := prod + t5
(9)  prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 20 goto (3)

Block B1 consists of statements (1)–(2); Block B2 consists of statements (3)–(12).
Optimization of Basic block
Transformation on Basic Blocks
A number of transformations can be applied to a basic block without changing the set of
expressions computed by the block.
Many of these transformations are useful for improving the quality of the code.
Types of transformations are:
1. Structure preserving transformation
2. Algebraic transformation

Structure Preserving Transformations
Structure-preserving transformations on basic blocks are:
1. Common sub-expression elimination
2. Dead-code elimination
3. Renaming of temporary variables
4. Interchange of two independent adjacent statements

Common sub-expression elimination
Consider the basic block,
a:= b+c
b:= a-d
c:= b+c
d:= a-d
The second and fourth statements compute the same expression, hence this basic block may
be transformed into the equivalent block:
a:= b+c
b:= a-d
c:= b+c
d:= b
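That this transformation is safe can be checked by executing both blocks and comparing results. A toy sketch (the initial values are arbitrary assumptions): statement 2 stores a-d in b before d is modified, so the fourth statement may simply copy b.

```c
#include <assert.h>

/* The basic block before and after eliminating the common
 * sub-expression a-d. */
static void block_before(int *a, int *b, int *c, int *d) {
    *a = *b + *c;
    *b = *a - *d;    /* computes a-d            */
    *c = *b + *c;
    *d = *a - *d;    /* recomputes the same a-d */
}

static void block_after(int *a, int *b, int *c, int *d) {
    *a = *b + *c;
    *b = *a - *d;
    *c = *b + *c;
    *d = *b;         /* reuse the value already held in b */
}
```

Starting both blocks from the same state yields identical final values of a, b, c and d.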

Dead-code elimination
Suppose x is dead, that is, never subsequently used, at the point where the statement x := y + z appears in a basic block.
Then this statement may be safely removed without changing the value of the basic block.

Renaming of temporary variables
Suppose we have a statement
t:=b+c, where t is a temporary variable.
If we change this statement to
u:= b+c, where u is a new temporary variable,
Change all uses of this instance of t to u, then the value of the basic block is not changed.
In fact, we can always transform a basic block into an equivalent block in which each statement
that defines a temporary defines a new temporary.
We call such a basic block a normal-form block.

Interchange of two independent adjacent statements
Suppose we have a block with the two adjacent statements,
t1:= b+c
t2:= x+y
Then we can interchange the two statements without affecting the value of the block if and only if neither x nor y is t1 and neither b nor c is t2.
A normal-form basic block permits all statement interchanges that are possible.

Algebraic Transformation
Countless algebraic transformation can be used to change the set of expressions computed by
the basic block into an algebraically equivalent set.
The useful ones are those that simplify expressions or replace expensive operations by cheaper
one.
Example: x=x+0 or x=x*1 can be eliminated.

Flow Graph
We can add flow-of-control information to the set of basic blocks making up a program by
constructing a direct graph called a flow graph.
Nodes in the flow graph represent computations, and the edges represent the flow of control.
Example of flow graph for following three address code:

Block B1:
    prod = 0
    i = 1
      |
      v
Block B2:
    t1 := 4*i
    t2 := a[t1]
    t3 := 4*i
    t4 := b[t3]
    t5 := t2*t4
    t6 := prod + t5
    prod := t6
    t7 := i + 1
    i := t7
    if i <= 20 goto B2   (back edge from B2 to itself)
A simple code generator
The code generation strategy generates target code for a sequence of three address statement.
It uses function getReg() to assign register to variable.
The code generator algorithm uses descriptors to keep track of register contents and
addresses for names.
Address descriptor stores the location where the current value of the name can be found at run
time. The information about locations can be stored in the symbol table and is used to access
the variables.
Register descriptor is used to keep track of what is currently in each register. The register
descriptor shows that initially all the registers are empty. As the generation for the block
progresses the registers will hold the values of computation.

A Code Generation Algorithm
The algorithm takes a sequence of three-address statements as input. For each three address
statement of the form x:= y op z perform the various actions. Assume L is the location where
the output of operation y op z is stored.
1. Invoke a function getReg() to find out the location L where the result of computation y op z
should be stored.
2. Determine the present location of y by consulting the address descriptor for y; if y is not present in location L, then generate the instruction MOV y', L to place a copy of y in L.
3. Present location of z is determined using step 2 and the instruction is generated as OP z' , L
4. If L is a register then update its descriptor to indicate that it contains the value of x. Update the address descriptor of x to indicate that it is in L.
5. If the current value of y or z have no next uses or not live on exit from the block or in register
then alter the register descriptor to indicate that after execution of x : = y op z those register
will no longer contain y or z.

Generating a code for assignment statement
The assignment statement d := (a-b) + (a-c) + (a-c) can be translated into the following sequence of three-address code:

t := a - b
u := a - c
v := t + u
d := v + u

Statement   Code generated        Register descriptor             Address descriptor
t := a - b  MOV a,R0; SUB b,R0    R0 contains t                   t in R0
u := a - c  MOV a,R1; SUB c,R1    R0 contains t; R1 contains u    t in R0; u in R1
v := t + u  ADD R1,R0             R0 contains v; R1 contains u    u in R1; v in R0
d := v + u  ADD R1,R0; MOV R0,d   R0 contains d                   d in R0 and memory

Machine independent optimization
Code Optimization
Code Optimization is a program transformation technique which, tries to improve the code by
eliminating unnecessary code lines and arranging the statements in such a sequence that
speed up the execution without wasting the resources.

Advantages
1. Faster execution
2. Better performance
3. Improves the efficiency

Code Optimization techniques (Machine independent techniques)
Techniques

1. Compile time evaluation


2. Common sub expressions elimination
3. Code Movement or Code Motion
4. Reduction in Strength
5. Dead code elimination

Compile time evaluation
Compile time evaluation means shifting of computations from run time to compile time.
There are two methods used to obtain the compile time evaluation.
Folding
In the folding technique the computation of constant is done at compile time instead of run
time.
Example : length = (22/7)*d
Here folding is implied by performing the computation of 22/7 at compile time.
Constant propagation
In this technique the value of variable is replaced and computation of an expression is done at
compilation time.
Example : pi = 3.14; r = 5;
Area = pi * r * r;
Here at the compilation time the value of pi is replaced by 3.14 and r by 5 then computation of
3.14 * 5 * 5 is done during compilation.
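The effect of propagation plus folding can be checked by comparing the run-time computation with its pre-computed result; a toy sketch (the function names are illustrative, not from the slides):

```c
#include <assert.h>
#include <math.h>

/* Without optimization: pi and r are looked up and the
 * multiplications are performed at run time. */
static double area_unoptimized(void) {
    double pi = 3.14, r = 5;
    return pi * r * r;
}

/* With constant propagation + folding: the compiler substitutes
 * pi = 3.14 and r = 5, evaluating 3.14*5*5 = 78.5 at compile time. */
static double area_folded(void) {
    return 78.5;
}
```

Both versions agree to within floating-point rounding; the folded one simply does no arithmetic at run time.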
Common sub expressions elimination
A common sub-expression is an expression that appears repeatedly in the program and whose value has already been computed.
If the operands of this sub-expression are not changed in between, the previously computed result is reused instead of re-computing it each time.
Example:

Before Optimization:
t1 := 4 * i
t2 := a + 2
t3 := 4 * j
t4 := 4 * i
t5 := n
t6 := b[t4] + t5

After Optimization:
t1 := 4 * i
t2 := a + 2
t3 := 4 * j
t5 := n
t6 := b[t1] + t5

Code Movement or Code Motion
Optimization can be obtained by moving loop-invariant code outside the loop and placing it just before the loop entry.
It makes no difference to the result whether such code executes inside or outside the loop.
This method is also called loop-invariant computation.
Example:
Before Optimization:
while(i<=max-1)
{
    sum=sum+a[i];
}

After Optimization:
N=max-1;
while(i<=N)
{
    sum=sum+a[i];
}

Reduction in Strength
The strength (cost) of certain operators is higher than that of others.
For instance, the strength of * is higher than that of +.
In this technique, higher-strength operators are replaced by equivalent lower-strength operators.
Example:

Before Optimization:  A = A * 2
After Optimization:   A = A + A

Dead code elimination
A variable is said to be dead at a point in a program if the value it contains is never used afterwards.
Code that computes only such values is said to be dead code.
Example:
i=0;
if(i==1)      /* dead code */
{
    a=x+5;
}

The if statement is dead code, as its condition can never be satisfied; hence the statement can be eliminated and the optimization applied.

Machine dependent optimization
Machine dependent optimization may vary from machine to machine.
Machine-dependent optimization is done after the target code has been generated and when
the code is transformed according to the target machine architecture.
Machine-dependent optimizers try to take maximum advantage of the memory hierarchy.
Techniques
1. Register allocation – the number of registers may vary from machine to machine, and the registers used may be 32-bit or 64-bit.
2. Use of addressing modes – addressing modes also vary from machine to machine.
3. Peephole optimization

Peephole optimization
Peephole optimization is a simple and effective technique for locally improving target code.
It improves the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence whenever possible.
Peephole is a small, moving window on the target program.

Redundant Loads & Stores
Redundant loads and stores can be eliminated in transformations of the following type.
Example:
MOV R0,x
MOV x,R0
We can eliminate the second instruction, since x is already in R0.

Flow of Control Optimization
The unnecessary jumps can be eliminated in either the intermediate code or the target code by
the following types of peephole optimizations.
We can replace the jump sequence
    goto L1
    ……
L1: goto L2
by
    goto L2
    ……
L1: goto L2
It may then be possible to eliminate the statement L1: goto L2, provided it is preceded by an unconditional jump.
Similarly, the sequence
    if a<b goto L1
    ……
L1: goto L2
can be replaced by
    if a<b goto L2
    ……
L1: goto L2
Algebraic simplification
Peephole optimization is an effective technique for algebraic simplification.
Statements such as x := x + 0 or x := x * 1 can be eliminated by peephole optimization.

Reduction in strength
Certain machine instructions are cheaper than others.
In order to improve performance of the intermediate code, we can replace an expensive instruction by an equivalent cheaper one.
For example, x^2 is cheaper to implement as x * x than as a call to an exponentiation routine.
Similarly, addition and subtraction are cheaper than multiplication and division, so where an equivalent exists we can substitute addition, subtraction, or shift operations for multiplication and division.

Machine idioms
The target machine may have special machine instructions (idioms) for performing certain operations.
Replacing general instruction sequences by these equivalent machine idioms improves efficiency.
Example: Some machines have auto-increment or auto-decrement addressing modes.
(Example : INC i)
These modes can be used in code for statement like i=i+1.

References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

Compiler Design (CD)
GTU # 3170701

Unit – 8
Instruction-Level Parallelism

Computer Engineering Department


Darshan Institute of Engineering & Technology, Rajkot
[email protected]
+91 - 97277 47317 (CE Department)
Topics to be covered
• Processor Architectures
• Code scheduling constrains
• Basic block Scheduling
• Pass structure of Assembler
Processor Architecture
When we think of instruction-level parallelism, we usually imagine a processor issuing several
operations in a single clock cycle. In fact, it is possible for a machine to issue just one
operation per clock and yet achieve instruction-level parallelism using the concept of pipelining.

Prof. Dixita B Kagathara, #3170701 (CD) – Unit 8: Instruction-Level Parallelism
Instruction Pipelines and Branch Delays
Practically every processor, be it a high-performance supercomputer or a standard machine,
uses an instruction pipeline. With an instruction pipeline, a new instruction can be fetched every
clock while preceding instructions are still going through the pipeline.
The figure below shows a simple 5-stage instruction pipeline: it first fetches the instruction (IF), decodes it (ID), executes the operation (EX), accesses the memory (MEM), and writes back the result (WB).

The figure shows how instructions i, i + 1, i + 2, i + 3, and i + 4 can execute at the same time.
Each row corresponds to a clock tick, and each column in the figure specifies the stage each
instruction occupies at each clock tick.
Instruction Pipelines and Branch Delays
If the result from an instruction is available by the time the succeeding instruction needs the
data, the processor can issue an instruction every clock.
Branch instructions are especially problematic because until they are fetched, decoded and
executed, the processor does not know which instruction will execute next. Many processors
speculatively fetch and decode the immediately succeeding instructions in case a branch is not
taken. When a branch is found to be taken, the instruction pipeline is emptied and the branch
target is fetched.
Thus, taken branches introduce a delay in the fetch of the branch target and introduce "hiccups" in the instruction pipeline. Advanced processors use hardware to predict the outcomes of branches based on their execution history and to prefetch from the predicted target locations. Branch delays are nonetheless observed if branches are mispredicted.

Pipelined Execution
Some instructions take several clocks to execute.
One common example is the memory-load operation. Even when a memory access hits in the cache, it
usually takes several clocks for the cache to return the data.
We say that the execution of an instruction is pipelined if succeeding instructions not dependent on the
result are allowed to proceed.
Thus, even if a processor can issue only one operation per clock, several operations might be in their
execution stages at the same time.
If the deepest execution pipeline has n stages, potentially n operations can be "in flight" at the same time.
Note that not all instructions are fully pipelined. While floating-point adds and multiplies often are fully
pipelined, floating-point divides, being more complex and less frequently executed, often are not.
Most general-purpose processors dynamically detect dependences between consecutive instructions and
automatically stall the execution of instructions if their operands are not available. Some processors,
especially those embedded in hand-held devices, leave the dependence checking to the software in order
to keep the hardware simple and power consumption low. In this case, the compiler is responsible for
inserting "noop" instructions in the code if necessary to assure that the results are available when needed.

Multiple Instruction Issue
By issuing several operations per clock, processors can keep even more operations in flight.
The largest number of operations that can be executed simultaneously can be computed by
multiplying the instruction issue width by the average number of stages in the execution
pipeline.
Like pipelining, parallelism on multiple-issue machines can be managed either by software or
hardware.
Machines that rely on software to manage their parallelism are known as VLIW (Very Long Instruction Word) machines, while those that manage their parallelism with hardware are known as superscalar machines.
Simple hardware schedulers execute instructions in the order in which they are fetched.
If a scheduler comes across a dependent instruction, it and all instructions that follow must
wait until the dependences are resolved (i.e., the needed results are available).
Such machines obviously can benefit from having a static scheduler that places independent
operations next to each other in the order of execution.

Multiple Instruction Issue
More sophisticated schedulers can execute instructions "out of order."
Operations are independently stalled and not allowed to execute until all the values they depend
on have been produced.
Even these schedulers benefit from static scheduling, because hardware schedulers have only
a limited space in which to buffer operations that must be stalled. Static scheduling can place
independent operations close together to allow better hardware utilization.
More importantly, regardless how sophisticated a dynamic scheduler is, it cannot execute
instructions it has not fetched. When the processor has to take an unexpected branch, it can
only find parallelism among the newly fetched instructions.
The compiler can enhance the performance of the dynamic scheduler by ensuring that these
newly fetched instructions can execute in parallel.

Code-Scheduling Constraints
Code scheduling is a form of program optimization that applies to the machine code produced by the code generator.
Code scheduling is subject to three kinds of constraints:
1. Control dependence constraints: All the operations executed in the original program must
be executed in the optimized one.
2. Data dependence constraints: The operations in the optimized program must produce the
same result as the corresponding ones in the original program.
3. Resource constraints: The schedule must not oversubscribe the resources of the machine.

Code-Scheduling Constraints
These scheduling constraints guarantee that the optimized program produces the same result
as the original.
However, because code scheduling changes the order in which the operations execute, the state of the memory at any one point may not match any of the memory states in a sequential execution.
This situation is a problem if a program's execution is interrupted by, for example, a thrown exception or a user-inserted breakpoint.
Optimized programs are therefore harder to debug.

Basic-Block Scheduling
1. Data-Dependence Graphs
2. List Scheduling of Basic Blocks
3. Prioritized Topological Orders

Data-Dependence Graphs
We represent each basic block of machine instructions by a data-dependence graph, G =
(N,E), having a set of nodes N representing the operations in the machine instructions in the
block and a set of directed edges E representing the data-dependence constraints among the
operations. The nodes and edges of G are constructed as follows:
1. Each operation n in N has a resource-reservation table RTn, whose value is simply the resource-reservation table associated with the operation type of n.
2. Each edge e in E is labeled with delay de, indicating that the destination node must be issued no earlier than de clocks after the source node is issued. Suppose operation n1 is followed by operation n2, and the same location is accessed by both, with latencies l1 and l2 respectively. That is, the location's value is produced l1 clocks after the first instruction begins, and the value is needed by the second instruction l2 clocks after that instruction begins. Then there is an edge n1 → n2 in E labeled with delay l1 − l2.

List Scheduling of Basic Blocks
The simplest approach to scheduling basic blocks involves visiting each node of the data-dependence graph in "prioritized topological order".
Since there can be no cycles in a data-dependence graph, there is always at least one topological order for the nodes. However, among the possible topological orders, some may be preferable to others.
There are algorithms, discussed next, for picking a preferred order.

Prioritized Topological Orders
List scheduling does not backtrack; it schedules each node once and only once. It uses a
heuristic priority function to choose among the nodes that are ready to be scheduled next. Here
are some observations about possible prioritized orderings of the nodes:
Without resource constraints, the shortest schedule is given by the critical path, the longest
path through the data-dependence graph. A metric useful as a priority function is the height of
the node, which is the length of a longest path in the graph originating from the node.
On the other hand, if all operations are independent, then the length of the schedule is
constrained by the resources available. The critical resource is the one with the largest ratio of
uses to the number of units of that resource available. Operations using more critical resources
may be given higher priority.
Finally, we can use the source ordering to break ties between operations; the operation that shows up earlier in the source program should be scheduled first.

Pass structure of assembler
A complete scan of the program is called pass.
Types of assembler are:
1. Two pass assembler (Two pass translation)
2. Single pass assembler (Single pass translation)

Two pass assembler (Two pass translation)

Source Program → [ Pass I ] → Intermediate code → [ Pass II ] → Target Program

Both passes perform data access on shared data structures; control transfers from Pass I to Pass II.

Two pass assembler (Two pass translation)


The first pass performs analysis of the source program.


The first pass performs Location Counter processing and records the addresses of symbols in
the symbol table.
It constructs intermediate representation of the source program.
Intermediate representation consists of following two components:
1. Intermediate code
2. Data structures
Two pass assembler (Two pass translation)


The second pass synthesizes the target program by using address information recorded in the
symbol table.
Two pass translation handles a forward reference to a symbol naturally because the address of
each symbol would be known before program synthesis begins.

(A forward reference is the use of a symbol that precedes its definition in a program.)

One pass assembler (One pass translation)
A one pass assembler requires one scan of the source program to generate machine code.
Location counter processing, symbol table construction and target code generation proceed in
single pass.
The issue of forward references can be solved using a process called backpatching.

References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

