GTU # 3170701
Unit – 1
Overview of the Compiler & its Structure
Prof. Dixita B Kagathara #3170701 (CD) Unit 1 – Overview of the Compiler & its Structure
What do you see?
Semantic gap
Language processor
A language processor is software that bridges the semantic gap.
A language processor is a software program designed to perform tasks such as translating program code into machine code.
Translator
A translator is a program that takes one form of program as input and converts it into another
form.
Types of translators are:
1. Compiler
2. Interpreter
3. Assembler
Compiler
A compiler is a program that reads a program written in source language and translates it into
an equivalent program in target language.
Interpreter
An interpreter is also a program that reads a program written in a source language and translates it into an equivalent program in a target language, line by line.
Assembler
An assembler is a translator that takes assembly code as input and generates machine code as output.
Analysis-synthesis model of compilation
Compilation has two parts:
1. Analysis phase
2. Synthesis phase
Analysis phase & Synthesis phase

Analysis phase:
The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
It consists of three sub-phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis

Synthesis phase:
The synthesis part constructs the desired target program from the intermediate representation.
It consists of the following sub-phases:
1. Code optimization
2. Code generation
Phases of compiler
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Code optimization
6. Code generation
Lexical analysis
Lexical analysis is also called linear analysis or scanning.
The lexical analyzer divides the given source statement into tokens.
Ex: Position = initial + rate * 60 would be grouped into the following tokens:
Position (identifier)
= (assignment symbol)
initial (identifier)
+ (plus symbol)
rate (identifier)
* (multiplication symbol)
60 (number)
After lexical analysis the statement becomes: id1 = id2 + id3 * 60
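The grouping above can be sketched with a small regex-based tokenizer in Python (a minimal illustration, not any real compiler's scanner; the token names and the TOKEN_SPEC table are invented for this example):

```python
import re

# Invented token names and patterns for this example only.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("STAR",       r"\*"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Divide a source statement into (token type, lexeme) pairs."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(source)
            if m.lastgroup != "SKIP"]          # drop whitespace

tokens = tokenize("Position = initial + rate * 60")
```

Running it on the slide's statement yields exactly the seven tokens listed above, in order.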
Syntax analysis
Syntax analysis is also called parsing or hierarchical analysis.
The syntax analyzer checks each line of the code and reports every syntax error.
If the code is error free, the syntax analyzer generates a parse tree.
For id1 = id2 + id3 * 60, the tree has = at the root with children id1 and +; the + node has children id2 and *; and the * node has children id3 and 60.
Semantic analysis
The semantic analyzer determines the meaning of a source string.
It performs operations such as:
1. Matching of parentheses in an expression.
2. Matching of if..else statements.
3. Checking that arithmetic operations are type compatible.
4. Checking the scope of operations.
Note: considering id1, id2 and id3 to be real, semantic analysis inserts an int-to-real conversion on the integer 60, so the tree becomes = (id1, + (id2, * (id3, inttoreal(60)))).
Intermediate code generator
Two important properties of intermediate code:
1. It should be easy to produce.
2. It should be easy to translate into the target program.
The intermediate form can be represented using "three address code".
Three address code consists of a sequence of instructions, each of which has at most three operands.
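The post-order tree walk that emits three-address code can be sketched as follows (a minimal illustration; the tuple-based tree and the gen_tac helper are invented for this example, and inttoreal(60) is kept as an opaque leaf rather than emitted as its own instruction):

```python
# gen_tac and the tuple-based tree are invented names for this sketch.

def gen_tac(node, code):
    """Post-order walk: emit one three-address instruction per operator node."""
    if isinstance(node, str):                  # leaf: identifier or constant
        return node
    op, left, right = node
    l = gen_tac(left, code)
    r = gen_tac(right, code)
    temp = f"t{len(code) + 1}"                 # fresh temporary name
    code.append(f"{temp} = {l} {op} {r}")
    return temp

code = []
root = ("+", "id2", ("*", "id3", "inttoreal(60)"))   # tree for id2 + id3 * 60
code.append(f"id1 = {gen_tac(root, code)}")
```

Because the walk visits operands before operators, the deepest operator (*) gets the first temporary, matching the order of the slide's three-address code.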
Code optimization
It improves the intermediate code.
This is necessary for faster execution of the code or less consumption of memory.
Intermediate code:
t1 = inttoreal(60)
t2 = id3 * t1
t3 = t2 + id2
id1 = t3
After optimization (the standard result for this example):
t1 = id3 * 60.0
id1 = id2 + t1
Code generation
The intermediate code instructions are translated into a sequence of machine instructions:
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
Phases of compiler
Source program → Analysis phase (lexical analysis → syntax analysis → semantic analysis) → Intermediate code
The symbol table and the error detection and recovery routines interact with all phases.
Exercise
Write the output of all the phases of the compiler for the following statements:
1. x = b-c*2
2. I=p*n*r/100
Front end & back end (Grouping of phases)
Front end
Depends primarily on the source language and is largely independent of the target machine.
It includes following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Creation of symbol table & Error handling
Back end
Depends on the target machine and does not depend on the source program.
It includes following phases:
1. Code optimization
2. Code generation phase
3. Error handling and symbol table operation
Difference between compiler & interpreter

Compiler:
- Scans the entire program and translates it as a whole into machine code.
- It generates intermediate code.
- Errors are displayed after the entire program is checked.
- Memory requirement is more.
- Example: C compiler

Interpreter:
- It translates the program one statement at a time.
- It does not generate intermediate code.
- An error is displayed for every instruction interpreted, if any.
- Memory requirement is less.
- Example: BASIC, Python, Ruby
Context of compiler (Cousins of compiler)
In addition to the compiler, many other system programs are required to generate absolute machine code:
Skeletal Source Program → Preprocessor → Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Object Code → Linker / Loader (with Libraries & Object Files) → Absolute Machine Code
Context of compiler (Cousins of compiler)
Preprocessor
Some of the tasks performed by a preprocessor:
1. Macro processing: allows the user to define macros. Ex: #define PI 3.14159265358979323846
2. File inclusion: a preprocessor may include header files into the program. Ex: #include<stdio.h>
3. Rational preprocessor: provides built-in macros for constructs like while statements or if statements.
4. Language extensions: add capabilities to the language by using built-in macros. Ex: the language Equel is a database query language embedded in C; statements beginning with ## are taken by the preprocessor to be database access statements unrelated to C and are translated into procedure calls on routines that perform the database access.
Context of compiler (Cousins of compiler)
Compiler
A compiler is a program that reads a program written in a source language and translates it into an equivalent program in a target language.
Context of compiler (Cousins of compiler)
Assembler
An assembler is a translator that takes an assembly program (mnemonics) as input and generates machine code as output.
Context of compiler (Cousins of compiler)
Linker
A linker makes a single program from several files of relocatable machine code.
These files may have been the result of several different compilations, and one or more may be library files.
Loader
The process of loading consists of:
- Taking relocatable machine code
- Altering the relocatable addresses
- Placing the altered instructions and data in memory at the proper locations.
Pass structure
One complete scan of a source program is called a pass.
A pass includes reading an input file and writing to an output file.
In a single-pass compiler, analysis of a source statement is immediately followed by synthesis of the equivalent target statement.
In a two-pass compiler, intermediate code is generated between the analysis and synthesis phases.
It is difficult to compile a source program in a single pass due to forward references.
Pass structure
Forward reference: a forward reference of a program entity is a reference to the entity that precedes its definition in the program.
This problem can be solved by postponing the generation of target code until more information concerning the entity becomes available.
This leads to a multi-pass model of compilation.
Effect of reducing the number of passes
It is desirable to have few passes, because it takes time to read and write intermediate files.
However, if we group several phases into one pass, the memory requirement may be large.
Types of compiler
1. One-pass compiler
A compiler that performs the whole compilation in a single pass.
2. Two-pass compiler
A compiler that performs the whole compilation in two passes.
It generates intermediate code.
3. Incremental compiler
A compiler that compiles only the changed lines of the source code and updates the object code.
4. Native code compiler
A compiler used to compile source code for the same type of platform only.
5. Cross compiler
A compiler used to compile source code for a different kind of platform.
Science of building compilers
The main job of a compiler is to accept a source program and convert it into a suitable target program.
A compiler must accept all source programs that conform to the specification of the language; the set of source programs is infinite, and any program can be very large, possibly consisting of millions of lines of code.
Compiler study is mainly focused on how to design the correct mathematical model and choose the correct algorithm.
In compiler design, the term "code optimization" indicates the attempts made by a compiler to produce code that is more efficient than the unoptimized code.
Ideally, the generated code should be as fast as any other code that performs the same task.
The objectives to be fulfilled by compiler optimization include:
1. The meaning of the compiled program must be preserved.
2. Optimization should improve the program's performance.
3. The time required for compilation should be reasonable.
Science of building compilers
Theory alone is not sufficient to build a compiler; people involved in compiler design should be able to formulate the right problem to solve.
To do this, the first step is a thorough understanding of the behavior of programs.
References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman
Compiler Design (CD)
GTU # 3170701
Unit – 2
Lexical Analyzer
Role of the lexical analyzer
Upon receiving a "Get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.
The lexical analyzer also strips out comments and white space (blanks, tabs, and newline characters) from the source program.
Ex: total = sum + 45
Tokens: total (identifier), = (operator), sum (identifier), + (operator), 45 (constant)
Lexemes:
Lexemes of identifiers: total, sum
Lexemes of operators: =, +
Lexemes of constants: 45
Prof. Dixita B Kagathara #3170701 (CD) Unit 2 – Lexical Analysis 8
Input buffering
There are mainly two techniques for input buffering:
1. Buffer pairs
2. Sentinels

Buffer pairs
The lexical analyzer scans the input string from left to right, one character at a time.
The buffer is divided into two N-character halves, where N is the number of characters on one disk block.
Example buffer contents: E = M * C * * 2 eof
Two pointers are maintained: lexeme_beginning marks the start of the current lexeme, and forward scans ahead until a token is found.
In buffer pairs we must check, each time we move the forward pointer, that we have not moved off one of the halves; if we have, the other half must be reloaded.
Thus, for each character read, we make two tests: one for the end of the buffer, and one to determine what character is read.

Sentinels
We can reduce the two tests to one if we extend each buffer half to hold a sentinel character at the end, combining the buffer-end test with the test for the current character.
The sentinel is a special character that cannot be part of the source program, and a natural choice is the character EOF.
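The sentinel idea can be sketched in Python (an illustration only: a real scanner reloads fixed disk blocks in place, while this sketch slices a string; N, SENTINEL, and read_with_sentinels are invented names):

```python
SENTINEL = "\0"    # stands in for the EOF sentinel
N = 4              # characters per buffer half ("one disk block" in this toy)

def read_with_sentinels(source):
    """Yield source characters from sentinel-terminated buffer halves."""
    halves = [source[i:i + N] + SENTINEL for i in range(0, len(source), N)]
    for half in halves:                    # "reloading" the next half
        i = 0
        while True:
            ch = half[i]
            i += 1
            if ch == SENTINEL:             # the single per-character test
                break                      # end of this half (or end of input)
            yield ch

chars = "".join(read_with_sentinels("E = M * C"))
```

The inner loop makes exactly one comparison per character; reaching the sentinel triggers the rare reload path instead of testing the buffer boundary every time.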
Exercise: Regular expressions
Ex: a* generates the infinite language { a, aa, aaa, aaaa, … }.
2. 0 or 11 or 111
Strings: 0, 11, 111; R.E.: 0 | 11 | 111
17. All binary strings with at least 3 characters whose 3rd character is zero
Strings: 000, 100, 1100, 1001, …; R.E.: (0|1)(0|1)0(0|1)*
18. Language which consists of exactly two b's over the set Σ = {a, b}
Strings: bb, bab, aabb, abba, …; R.E.: a*ba*ba*
Transition diagram notation: a circle is a state, an arrow is a transition, "start" marks the start state, and a double circle is a final (accepting) state.

Transition diagram for relational operators:
From state 0, '<' goes to state 1; from state 1, '=' goes to state 2, return (relop, LE); '>' goes to state 3, return (relop, NE); any other character goes to state 4, return (relop, LT).
From state 0, '=' goes to state 5, return (relop, EQ).
From state 0, '>' goes to state 6; from state 6, '=' goes to state 7, return (relop, GE); any other character goes to state 8, return (relop, GT).
Transition diagram for unsigned numbers
Pattern: digit+ (. digit+)? (E (+ | -)? digit+)?
Examples of unsigned numbers: 3, 5280, 39.37, 1.894E-4, 2.56E+7, 45E+6, 96E2
Hard coding and automatic generation of lexical analyzers
Lexical analysis is about identifying patterns in the input.
To recognize a pattern, a transition diagram is constructed.
This approach is known as a hard-coded lexical analyzer.
Example: to recognize identifiers in C, the first character must be a letter and the other characters are either letters or digits.
To recognize this pattern, a hard-coded lexical analyzer works with a transition diagram: from start state 1, a letter leads to state 2, which loops on letters or digits and accepts.
An automatically generated lexical analyzer takes a special notation as input.
For example, the Lex compiler tool takes a regular expression as input and finds the pattern matching that regular expression.
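The hard-coded identifier recognizer can be written directly from the transition diagram (a sketch; states 1 and 2 follow the diagram, and underscore is deliberately omitted because the slide's pattern allows only letters and digits):

```python
# State 1 is the start; a letter moves to state 2, which loops on
# letters or digits; any other character rejects immediately.

def is_identifier(s):
    state = 1
    for ch in s:
        if state == 1:
            state = 2 if ch.isalpha() else 0
        else:
            state = 2 if ch.isalnum() else 0
        if state == 0:
            return False                   # no transition: reject
    return state == 2                      # accept only after at least one letter
```

This is exactly what a hard-coded analyzer does: the diagram's states become program states, with one branch per transition.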
Thompson's construction rules:
- For ε or a single input symbol a: a new start state with one transition to a new final state.
- For s | t: a new start state with ε-transitions into N(s) and N(t), and ε-transitions from their final states to a new final state.
- For s t: N(s) followed by N(t), with the final state of N(s) joined to the start state of N(t).
- For s*: new start and final states with ε-transitions that allow zero or more passes through N(s).
Regular expression to NFA using Thompson's rules
Example: a*b — states 1 to 5: 1 ε→ 2, 2 a→ 3, 3 ε→ 2, 1 ε→ 4, 3 ε→ 4, then 4 b→ 5.
Example: b*ab — states 1 to 6: 1 ε→ 2, 2 b→ 3, 3 ε→ 2, 1 ε→ 4, 3 ε→ 4, then 4 a→ 5, 5 b→ 6.
OPERATION — DESCRIPTION
ε-closure(s): set of NFA states reachable from NFA state s on ε-transitions alone.
ε-closure(T): set of NFA states reachable from some NFA state s in T on ε-transitions alone.
move(T, a): set of NFA states to which there is a transition on input symbol a from some NFA state s in T.
Conversion from NFA to DFA
Example: (a|b)*abb
The Thompson NFA has states 0–10 with transitions: 0 ε→ {1, 7}; 1 ε→ {2, 4}; 2 a→ 3; 4 b→ 5; 3 ε→ 6; 5 ε→ 6; 6 ε→ {1, 7}; 7 a→ 8; 8 b→ 9; 9 b→ 10 (accepting).
ε-closure(0) = {0, 1, 7, 2, 4} = {0, 1, 2, 4, 7} ---- A
Subset construction steps:
A = {0, 1, 2, 4, 7}
Move(A, a) = {3, 8}; ε-closure(Move(A, a)) = {1, 2, 3, 4, 6, 7, 8} ---- B
Move(A, b) = {5}; ε-closure(Move(A, b)) = {1, 2, 4, 5, 6, 7} ---- C
Move(B, a) = {3, 8}; ε-closure(Move(B, a)) = {1, 2, 3, 4, 6, 7, 8} = B
Move(B, b) = {5, 9}; ε-closure(Move(B, b)) = {1, 2, 4, 5, 6, 7, 9} ---- D
Move(C, a) = {3, 8}; ε-closure(Move(C, a)) = B
Move(C, b) = {5}; ε-closure(Move(C, b)) = C
Move(D, a) = {3, 8}; ε-closure(Move(D, a)) = B
Move(D, b) = {5, 10}; ε-closure(Move(D, b)) = {1, 2, 4, 5, 6, 7, 10} ---- E
Move(E, a) = {3, 8}; ε-closure(Move(E, a)) = B
Move(E, b) = {5}; ε-closure(Move(E, b)) = C
Conversion from NFA to DFA
The resulting transition table:
States                a  b
A = {0,1,2,4,7}       B  C
B = {1,2,3,4,6,7,8}   B  D
C = {1,2,4,5,6,7}     B  C
D = {1,2,4,5,6,7,9}   B  E
E = {1,2,4,5,6,7,10}  B  C
DFA transitions: A a→ B, A b→ C; B a→ B, B b→ D; C a→ B, C b→ C; D a→ B, D b→ E; E a→ B, E b→ C.
Note: the accepting state of the NFA is 10; since 10 is an element of E, E is the accepting state of the DFA.
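The subset construction above can be checked with a short Python sketch (the NFA transition table is transcribed from the diagram; eclosure and move implement the two operations defined earlier):

```python
# Thompson NFA for (a|b)*abb, transcribed from the diagram (state 10 accepts).
EPS = "eps"
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}

def eclosure(states):
    """Set of NFA states reachable on epsilon-transitions alone."""
    stack, result = list(states), set(states)
    while stack:
        for t in NFA.get((stack.pop(), EPS), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return frozenset(result)

def move(states, sym):
    """Set of NFA states reachable on one `sym` transition."""
    return {t for s in states for t in NFA.get((s, sym), ())}

A = eclosure({0})
B = eclosure(move(A, "a"))
C = eclosure(move(A, "b"))
D = eclosure(move(B, "b"))
E = eclosure(move(D, "b"))   # contains 10, so E is the accepting DFA state
```

Running it reproduces every row of the transition table above.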
Minimization of the DFA
Partition the states into nonaccepting states {A, B, C, D} and accepting states {E}.
Splitting on b separates D (which goes to E) from {A, B, C}; splitting again separates B (which goes to D) from {A, C}.
Now no more splitting is possible, so A and C are equivalent. If we choose A as the representative for group (A, C), we obtain the reduced (optimized) transition table:
States  a  b
A       B  A
B       B  D
D       B  E
E       B  A
nullable(n)
True if the subtree at node n can generate the empty string.
firstpos(n)
The set of positions that can match the first symbol of a string generated by the subtree at node n.
lastpos(n)
The set of positions that can match the last symbol of a string generated by the subtree at node n.
followpos(i)
The set of positions that can follow position i in the tree.
Rules for followpos:
1. If n is a cat-node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a *-node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
In the example tree, the * node is the only nullable node.
Conversion from regular expression to DFA
For (a|b)*abb#, number the positions of the leaves: a→1, b→2, a→3, b→4, b→5, #→6.
firstpos of the root = {1, 2, 3}, and the followpos table is:
Position  followpos
1         {1, 2, 3}
2         {1, 2, 3}
3         {4}
4         {5}
5         {6}
Start state: A = firstpos(root) = {1, 2, 3}
δ(A, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ---- B
δ(A, b) = followpos(2) = {1,2,3} = A
δ(B, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
δ(B, b) = followpos(2) ∪ followpos(4) = {1,2,3,5} ---- C
δ(C, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
δ(C, b) = followpos(2) ∪ followpos(5) = {1,2,3,6} ---- D
δ(D, a) = B; δ(D, b) = followpos(2) = A

Transition table:
States        a  b
A = {1,2,3}   B  A
B = {1,2,3,4} B  C
C = {1,2,3,5} B  D
D = {1,2,3,6} B  A
D contains position 6 (#), so D is the accepting state of the DFA.
An Elementary Scanner Design & Its Implementation
Tasks of the scanner:
1. The main purpose of the scanner is to return the next input token to the parser.
2. The scanner must identify the complete token and sometimes differentiate between keywords and identifiers.
3. The scanner may perform symbol-table maintenance, inserting identifiers, literals, and constants into the tables.
4. The scanner also eliminates white space.
Regular expression: tokens can be specified using regular expressions. Example: id → letter (letter | digit)*
Transition diagram: finite-state diagrams or transition diagrams are often used to recognize a token; e.g., for numbers, state 1 loops on digits, '.' moves to state 2, and digits then lead to accepting state 3.
Implementation of Lexical Analyzer (Lex)
Lex is a tool (or language) for generating a lexical analyzer; its specification is written using regular expressions.
A regular expression is used to represent the pattern for a token.
Creating a lexical analyzer with Lex:
Source program (lex.l) → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
Input stream → a.out → stream of tokens
Structure of Lex Program
Any Lex program contains mainly three sections:
1. Declarations
2. Translation rules
3. Auxiliary procedures

Declarations: used to declare variables, constants & regular definitions.
Example:
%{
int x,y;
float rate;
%}
Digit [0-9]
Letter [A-Za-z]

Translation rules: Syntax: Pattern {Action}
Example:
%%
pattern1 {Action1}
pattern2 {Action2}
pattern3 {Action3}
%%

Auxiliary procedures: all the functions needed are specified here.
Example: Lex Program
Program: Write a Lex program to recognize identifiers, keywords, relational operators and numbers.

/* Declaration */
%{
/* Lex program for recognizing tokens */
%}
Letter [a-zA-Z]
Digit [0-9]
Id {Letter}({Letter}|{Digit})*
Numbers {Digit}+(\.{Digit}+)?(E[+-]?{Digit}+)?

/* Translation rules */
%%
If {printf("%s is a keyword",yytext);}
else {printf("%s is a keyword",yytext);}
{Id} {printf("%s is an identifier",yytext);}
"<" {printf("%s is a less than operator",yytext);}
">=" {printf("%s is a greater than or equal to operator",yytext);}
{Numbers} {printf("%s is a number",yytext);}
%%
(Note: the keyword rules are placed before {Id} because Lex breaks ties between equal-length matches by rule order; otherwise keywords would be reported as identifiers.)

/* Auxiliary procedures */
install_id()
{
/* procedure to install the lexeme into the symbol table and return a pointer */
}

Input string: If year < 2021
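Since Lex generates C, the behavior of these translation rules can be emulated in Python for illustration (this is not Lex: the RULES list and classify helper are invented, and keyword patterns are tried before the identifier pattern to mirror Lex's rule-order tie-breaking):

```python
import re

# Invented rule table; order matters, as in a Lex specification.
RULES = [
    (r"if|else", "a keyword"),
    (r"[A-Za-z][A-Za-z0-9]*", "an identifier"),
    (r">=", "a greater than or equal to operator"),
    (r"<", "a less than operator"),
    (r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?", "a number"),
]

def classify(lexeme):
    """Report the first rule whose pattern matches the whole lexeme."""
    for pattern, kind in RULES:
        if re.fullmatch(pattern, lexeme):
            return f"{lexeme} is {kind}"
    return f"{lexeme} is unrecognized"

out = [classify(tok) for tok in "If year < 2021".lower().split()]
```

Each whitespace-separated lexeme of the input string is classified exactly as the Lex actions above would print it.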
Compiler Design (CD)
GTU # 3170701
Unit – 3
Syntax Analysis (I)
Role of the parser
The parser obtains a string of tokens from the lexical analyzer and reports syntax errors, if any; otherwise it generates a parse tree.
There are two types of parsers:
1. Top-down parser
2. Bottom-up parser
Context free grammar
A context free grammar (CFG) is a 4-tuple G = (V, Σ, S, P) where:
V is a finite set of nonterminals,
Σ is a disjoint finite set of terminals,
S is an element of V and is the start symbol,
P is a finite set of productions of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*

Nonterminal symbol:
The name of a syntax category of a language, e.g., noun, verb, etc.
It is written as a single capital letter, or as a name enclosed between < … >, e.g., A or <Noun>
<Noun Phrase> → <Article><Noun>
<Article> → a | an | the
<Noun> → boy | apple
Terminal symbol:
A symbol in the alphabet Σ.
It is denoted by a lowercase letter or a punctuation mark used in the language.
Start symbol:
The first nonterminal symbol of the grammar is called the start symbol.
Production:
A production, also called a rewriting rule, is a rule of the grammar. It has the form:
A nonterminal symbol → string of terminal and nonterminal symbols
Example: Context Free Grammar
Write the nonterminals, terminals, start symbol, and productions for the following grammar:
E → E O E | (E) | id
O → + | - | * | / | ↑
Nonterminals: E, O
Terminals: id + - * / ↑ ( )
Start symbol: E
Productions: E → E O E | (E) | id and O → + | - | * | / | ↑
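The 4-tuple view of this grammar can be written down directly (a sketch; the dictionary encoding is invented for the example, and ↑ is written as ^ to stay in ASCII):

```python
# CFG 4-tuple (V, Sigma, S, P) for E -> E O E | (E) | id, O -> + | - | * | / | ^
GRAMMAR = {
    "E": [["E", "O", "E"], ["(", "E", ")"], ["id"]],
    "O": [["+"], ["-"], ["*"], ["/"], ["^"]],
}
nonterminals = set(GRAMMAR)                        # V: symbols with productions
terminals = {sym                                   # Sigma: every other symbol
             for bodies in GRAMMAR.values()
             for body in bodies
             for sym in body} - nonterminals
start_symbol = "E"                                 # S
```

Encoding the productions as a mapping makes the answer to the exercise mechanical: the nonterminals are the keys, and the terminals are all remaining symbols on the right-hand sides.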
Derivation
A derivation is a sequence of production-rule applications used to obtain the input string.
To decide which nonterminal to replace with a production rule, we have two options:
1. Leftmost derivation
2. Rightmost derivation
Leftmost derivation
A derivation of a string in a grammar is a leftmost derivation if at every step the leftmost nonterminal is replaced.
Grammar: S → S+S | S-S | S*S | S/S | a    Output string: a*a-a
S
⇒ S-S
⇒ S*S-S
⇒ a*S-S
⇒ a*a-S
⇒ a*a-a
The parse tree represents the structure of the derivation: the root S has children S - S; the left S expands to S * S with leaves a and a; the right S is a.
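The leftmost-derivation steps can be reproduced mechanically (a sketch: leftmost_derive is an invented helper that replaces the leftmost S with each chosen production body in turn):

```python
# Leftmost derivation for S -> S+S | S-S | S*S | S/S | a.

def leftmost_derive(start, bodies):
    """Return every sentential form, replacing the leftmost 'S' each step."""
    form = start
    steps = [form]
    for body in bodies:
        i = form.index("S")               # position of the leftmost S
        form = form[:i] + body + form[i + 1:]
        steps.append(form)
    return steps

# Derive a*a-a exactly as on the slide:
steps = leftmost_derive("S", ["S-S", "S*S", "a", "a", "a"])
```

The resulting sentential forms match the slide's derivation step for step.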
Rightmost derivation
A derivation of a string in a grammar is a rightmost derivation if at every step the rightmost nonterminal is replaced.
It is also called canonical derivation.
Grammar: S → S+S | S-S | S*S | S/S | a    Output string: a*a-a
S
⇒ S*S
⇒ S*S-S
⇒ S*S-a
⇒ S*a-a
⇒ a*a-a
Parse tree: the root S has children S * S; the right S expands to S - S with leaves a and a; the left S is a.
Ambiguity
Ambiguity is a word, phrase, or statement that has more than one meaning.
Ambiguity
In a formal language grammar, ambiguity arises if an identical string can occur on the RHS of two or more productions.
Grammar:
N1 → α
N2 → α
Here α can be derived from either N1 or N2, so it is not clear which nonterminal should replace α.
Ambiguous grammar
An ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence.
Grammar: S → S+S | S*S | (S) | a    Output string: a+a*a
Leftmost derivation 1:    Leftmost derivation 2:
S ⇒ S*S                   S ⇒ S+S
  ⇒ S+S*S                   ⇒ a+S
  ⇒ a+S*S                   ⇒ a+S*S
  ⇒ a+a*S                   ⇒ a+a*S
  ⇒ a+a*a                   ⇒ a+a*a
Here, two leftmost derivations are possible for the string a+a*a; hence the above grammar is ambiguous.
Parsing
Parsing is a technique that takes an input string and produces as output either a parse tree, if the string is a valid sentence of the grammar, or an error message indicating that the string is not valid.
Types of Parsing
1. Top down parsing: the parser builds the parse tree from the top (root) down to the leaves.
2. Bottom up parsing: the parser starts from the leaves and works up to the root.
Grammar: S → aABe    String: abbcde
         A → Abc | b
         B → d
Parse tree for abbcde: S has children a, A, B, e; A has children A, b, c (with the inner A deriving b); B derives d.
Classification of parsing
Parsing
1. Parsing without backtracking (predictive parsing)
   - LL(1)
   - Recursive descent
2. LR parsing
   - SLR
   - CLR
   - LALR
Backtracking
In backtracking, for the expansion of a non-terminal symbol we choose one alternative and, if any mismatch occurs, we go back and try another alternative.
Grammar: S → cAd    Input string: cad
         A → ab | a
First predict A → ab: the tree c (a b) d fails to match cad, so backtrack and predict A → a, which matches the input.
Problems in Top-down Parsing
Left recursion
A grammar is said to be left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α.
Grammar: A → Aα | β
A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ … : the leftmost symbol of each step is again A, so a top-down parser expanding A can recurse forever without consuming any input.
Left recursion elimination
A → Aα | β    is rewritten as    A → βA'
                                 A' → αA' | ϵ
Examples: Left recursion elimination
E → E+T | T    becomes    E → TE'
                          E' → +TE' | ε
T → T*F | F    becomes    T → FT'
                          T' → *FT' | ε
X → X%Y | Z    becomes    X → ZX'
                          X' → %YX' | ε
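The transformation above can be sketched in code. This is a minimal illustration (the function name and the tuple-based grammar encoding are assumptions, not from the slides), handling only immediate left recursion A → Aα | β:

```python
EPS = "eps"  # marker for the empty string

def eliminate_left_recursion(nt, alts):
    # Split A's alternatives into left-recursive (A alpha) and base (beta).
    rec = [a[1:] for a in alts if a and a[0] == nt]
    base = [a for a in alts if not a or a[0] != nt]
    if not rec:
        return {nt: alts}                            # nothing to do
    new = nt + "'"
    return {
        nt: [b + (new,) for b in base],              # A  -> beta A'
        new: [r + (new,) for r in rec] + [(EPS,)],   # A' -> alpha A' | eps
    }

# E -> E+T | T  becomes  E -> TE',  E' -> +TE' | eps
print(eliminate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```

Running it on the first example above reproduces E → TE', E' → +TE' | ε.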
Problems in Top-down Parsing
Left factoring
A → αβ1 | αβ2 | αβ3
Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing.
It is used to remove nondeterminism from the grammar.
Left factoring
A → αβ | αδ    is rewritten as    A → αA'
                                  A' → β | δ
Example: Left factoring
S → aAB | aCD    becomes    S → aS'
                            S' → AB | CD
A → xByA | xByAzA | a    becomes    A → xByAA' | a
                                    A' → Є | zA
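One left-factoring step can be sketched similarly. This is a minimal illustration (names and the tuple encoding are assumptions); it factors only a common first symbol, whereas a full implementation would factor the longest common prefix and generate fresh names for each factored group:

```python
from collections import defaultdict

EPS = "eps"  # marker for the empty string

def left_factor(nt, alts):
    groups = defaultdict(list)
    for a in alts:
        groups[a[0]].append(a)              # group alternatives by first symbol
    out = {nt: []}
    for head, grp in groups.items():
        if len(grp) == 1:
            out[nt].append(grp[0])          # unique prefix: keep as-is
        else:
            new = nt + "'"
            out[nt].append((head, new))     # A  -> alpha A'
            out[new] = [a[1:] or (EPS,) for a in grp]  # A' -> beta1 | beta2
    return out

# S -> aAB | aCD  becomes  S -> aS',  S' -> AB | CD
print(left_factor("S", [("a", "A", "B"), ("a", "C", "D")]))
```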
Rules to compute first of non terminal
1. If X → aα and a is a terminal, then add a to FIRST(X).
2. If X → ϵ, then add ϵ to FIRST(X).
3. If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ϵ is in all of FIRST(Y1), …, FIRST(Yi−1); that is, Y1 … Yi−1 ⇒* ϵ. If ϵ is in FIRST(Yi) for all i = 1, 2, …, k, then add ϵ to FIRST(X).
Everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ϵ, then we add nothing more to FIRST(X), but if Y1 ⇒* ϵ, then we add FIRST(Y2), and so on.
Rules to compute first of non terminal
Simplification of Rule 3
If X → Y1 Y2 … Yk, then:
• If Y1 does not derive ϵ, then FIRST(X) = FIRST(Y1)
• If Y1 derives ϵ, then FIRST(X) = (FIRST(Y1) − {ϵ}) ∪ FIRST(Y2)
• If Y1 and Y2 derive ϵ, then FIRST(X) = (FIRST(Y1) − {ϵ}) ∪ (FIRST(Y2) − {ϵ}) ∪ FIRST(Y3)
• If Y1, Y2 and Y3 derive ϵ, then FIRST(X) = (FIRST(Y1) − {ϵ}) ∪ (FIRST(Y2) − {ϵ}) ∪ (FIRST(Y3) − {ϵ}) ∪ FIRST(Y4)
• If Y1, Y2, Y3, …, Yk all derive ϵ, then FIRST(X) = (FIRST(Y1) − {ϵ}) ∪ (FIRST(Y2) − {ϵ}) ∪ … ∪ (FIRST(Yk) − {ϵ})
(note: if all the Yi derive ϵ, then also add ϵ to FIRST(X))
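The rules above amount to a fixed-point computation, which can be sketched as follows. The dict-of-tuples grammar encoding and the EPS marker are illustrative assumptions; the grammar used is the expression grammar from the worked examples that follow:

```python
EPS = "eps"  # marker for the empty string

def compute_first(grammar):
    # Iterate the three FIRST rules to a fixed point.
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        return first[sym] if sym in grammar else {sym}  # terminal: itself

    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                all_eps = True
                for sym in alt:
                    f = first_of(sym)
                    new = (f - {EPS}) - first[nt]
                    if new:
                        first[nt] |= new
                        changed = True
                    if EPS not in f:        # this symbol is not nullable
                        all_eps = False
                        break
                if all_eps and EPS not in first[nt]:
                    first[nt].add(EPS)      # every Yi derives eps
                    changed = True
    return first

G = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), (EPS,)],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), (EPS,)],
    "F":  [("(", "E", ")"), ("id",)],
}
print(compute_first(G))
```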
Rules to compute FOLLOW of non terminal
1. Place $ in FOLLOW(S), where S is the start symbol.
2. If A → αBβ, then everything in FIRST(β) except ϵ is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ϵ, then everything in FOLLOW(A) is in FOLLOW(B).
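The three FOLLOW rules can be sketched the same way. The FIRST sets are hard-coded from the worked example so the sketch stays self-contained; the names and tuple encoding are illustrative assumptions:

```python
EPS = "eps"  # marker for the empty string

# FIRST sets from the worked example; terminals map to themselves below.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

G = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), (EPS,)],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), (EPS,)],
    "F":  [("(", "E", ")"), ("id",)],
}

def first_of_seq(seq):
    # FIRST of a symbol sequence; contains eps only if all symbols are nullable.
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                    # Rule 1
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in grammar:
                        continue              # FOLLOW only for non-terminals
                    fb = first_of_seq(alt[i + 1:])
                    add = fb - {EPS}          # Rule 2
                    if EPS in fb:             # Rule 3
                        add = add | follow[a]
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow

print(compute_follow(G, "E"))
```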
Example-1: First & Follow
Grammar:
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id

Compute FIRST:
FIRST(E): E → TE', Rule 3 (A → Y1Y2 gives FIRST(A) = FIRST(Y1) when Y1 does not derive ϵ), so FIRST(E) = FIRST(T) = { (, id }
FIRST(T): T → FT', Rule 3, so FIRST(T) = FIRST(F) = { (, id }
FIRST(F): F → (E) and F → id, Rule 1 (add the leading terminal), so FIRST(F) = { (, id }
FIRST(E'): E' → +TE', Rule 1 (add +); E' → ϵ, Rule 2 (add ϵ), so FIRST(E') = { +, ϵ }
FIRST(T'): T' → *FT', Rule 1 (add *); T' → ϵ, Rule 2 (add ϵ), so FIRST(T') = { *, ϵ }

Compute FOLLOW:
FOLLOW(E): Rule 1 places $ in FOLLOW(E); F → (E), Rule 2, adds ). So FOLLOW(E) = { $, ) }
FOLLOW(E'): E → TE' and E' → +TE', Rule 3 (FOLLOW(E') gets FOLLOW(E) and FOLLOW(E')). So FOLLOW(E') = { $, ) }
FOLLOW(T): E → TE' and E' → +TE', Rule 2 adds FIRST(E') − {ϵ} = { + }; since ϵ ∈ FIRST(E'), Rule 3 adds FOLLOW(E) and FOLLOW(E'). So FOLLOW(T) = { +, $, ) }
FOLLOW(T'): T → FT' and T' → *FT', Rule 3 (FOLLOW(T') gets FOLLOW(T) and FOLLOW(T')). So FOLLOW(T') = { +, $, ) }
FOLLOW(F): T → FT' and T' → *FT', Rule 2 adds FIRST(T') − {ϵ} = { * }; since ϵ ∈ FIRST(T'), Rule 3 adds FOLLOW(T) and FOLLOW(T'). So FOLLOW(F) = { *, +, $, ) }

NT  | First     | Follow
E   | { (, id } | { $, ) }
E'  | { +, ϵ }  | { $, ) }
T   | { (, id } | { +, $, ) }
T'  | { *, ϵ }  | { +, $, ) }
F   | { (, id } | { *, +, $, ) }
Example-2: First & Follow
Grammar: S → ABCDE
         A → a | ϵ
         B → b | ϵ
         C → c
         D → d | ϵ
         E → e | ϵ
NT | First       | Follow
S  | { a, b, c } | { $ }
A  | { a, ϵ }    | { b, c }
B  | { b, ϵ }    | { c }
C  | { c }       | { d, e, $ }
D  | { d, ϵ }    | { e, $ }
E  | { e, ϵ }    | { $ }
Parsing Methods
LL(1) parser (Predictive parser or Non recursive descent parser)
LL(1) is a non-recursive top-down parser.
1. The first L indicates that the input is scanned from left to right.
2. The second L means it uses a leftmost derivation for the input string.
3. The 1 means it uses one input symbol of lookahead at each step to predict the parsing action.
The parser consists of an input buffer (e.g. a + b $), a stack (e.g. X Y Z $), a parsing table M, and a predictive parsing program that produces the output.
Rules to construct predictive parsing table
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ϵ is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ϵ is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
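The rules above can be sketched as a table-building loop. FIRST and FOLLOW are hard-coded from the worked expression-grammar example so the sketch is self-contained; the tuple encoding is an illustrative assumption:

```python
EPS = "eps"  # marker for the empty string

FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"*", "+", "$", ")"}}

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), (EPS,)],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), (EPS,)],
    "F":  [("(", "E", ")"), ("id",)],
}

def first_of_seq(seq):
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})   # a terminal is its own FIRST
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                    # every symbol was nullable
    return out

def build_table(grammar):
    table = {}
    for a, alts in grammar.items():
        for alt in alts:
            f = first_of_seq(alt)
            for t in f - {EPS}:     # Rule 2: one entry per FIRST terminal
                table[a, t] = alt
            if EPS in f:            # Rule 3: fill the FOLLOW(A) columns
                for t in FOLLOW[a]:
                    table[a, t] = alt
    return table

M = build_table(GRAMMAR)
print(M["E", "id"], M["E'", "$"])
```

Undefined entries of M are simply absent from the dict, which plays the role of "error" in Rule 4.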
Example-1: LL(1) parsing
Grammar: S → aBa
         B → bB | ϵ
Step 1: Not required (no left recursion or common prefixes).
Step 2: Compute FIRST & FOLLOW:
FIRST(S): S → aBa, Rule 1, so FIRST(S) = { a }
FIRST(B): B → bB, Rule 1 (add b); B → ϵ, Rule 2 (add ϵ), so FIRST(B) = { b, ϵ }
FOLLOW(S): Rule 1 places $ in FOLLOW(S), so FOLLOW(S) = { $ }
FOLLOW(B): S → aBa, Rule 2, adds FIRST(a) = { a }, so FOLLOW(B) = { a }
NT | First    | Follow
S  | { a }    | { $ }
B  | { b, ϵ } | { a }
Step 3: Prepare the predictive parsing table:
S → aBa: FIRST(aBa) = { a }, so M[S, a] = S → aBa (table Rule 2)
B → bB: FIRST(bB) = { b }, so M[B, b] = B → bB (table Rule 2)
B → ϵ: FOLLOW(B) = { a }, so M[B, a] = B → ϵ (table Rule 3)
NT | a       | b      | $
S  | S → aBa | Error  | Error
B  | B → ϵ   | B → bB | Error
Example-2: LL(1) parsing
Grammar: S → aB | ϵ
         B → bC | ϵ
         C → cS | ϵ
Step 1: Not required.
Step 2: Compute FIRST & FOLLOW:
FIRST(S) = { a, ϵ } (Rule 1 from S → aB; Rule 2 from S → ϵ)
FIRST(B) = { b, ϵ } (Rule 1 from B → bC; Rule 2 from B → ϵ)
FIRST(C) = { c, ϵ } (Rule 1 from C → cS; Rule 2 from C → ϵ)
FOLLOW(S): Rule 1 places $; C → cS, Rule 3, gives FOLLOW(S) ⊇ FOLLOW(C). FOLLOW(S) = { $ }
FOLLOW(B): S → aB, Rule 3, gives FOLLOW(B) ⊇ FOLLOW(S). FOLLOW(B) = { $ }
FOLLOW(C): B → bC, Rule 3, gives FOLLOW(C) ⊇ FOLLOW(B). FOLLOW(C) = { $ }
NT | First    | Follow
S  | { a, ϵ } | { $ }
B  | { b, ϵ } | { $ }
C  | { c, ϵ } | { $ }
Step 3: Prepare the predictive parsing table:
S → aB: FIRST(aB) = { a }, so M[S, a] = S → aB (table Rule 2)
S → ϵ: FOLLOW(S) = { $ }, so M[S, $] = S → ϵ (table Rule 3)
B → bC: FIRST(bC) = { b }, so M[B, b] = B → bC (table Rule 2)
B → ϵ: FOLLOW(B) = { $ }, so M[B, $] = B → ϵ (table Rule 3)
C → cS: FIRST(cS) = { c }, so M[C, c] = C → cS (table Rule 2)
C → ϵ: FOLLOW(C) = { $ }, so M[C, $] = C → ϵ (table Rule 3)
NT | a      | b      | c      | $
S  | S → aB | Error  | Error  | S → ϵ
B  | Error  | B → bC | Error  | B → ϵ
C  | Error  | Error  | C → cS | C → ϵ
Example-3: LL(1) parsing
E → E+T | T
T → T*F | F
F → (E) | id
Step 1: Remove left recursion:
E → TE'
E' → +TE' | ϵ
T → FT'
T' → *FT' | ϵ
F → (E) | id
Example-3: LL(1) parsing
Step 2: Compute FIRST & FOLLOW (the computation is identical to Example-1 of First & Follow above):
NT  | First     | Follow
E   | { (, id } | { $, ) }
E'  | { +, ϵ }  | { $, ) }
T   | { (, id } | { +, $, ) }
T'  | { *, ϵ }  | { +, $, ) }
F   | { (, id } | { *, +, $, ) }
Example-3: LL(1) parsing
Step 3: Construct the predictive parsing table:
E → TE': FIRST(TE') = { (, id }, so M[E, (] = M[E, id] = E → TE' (table Rule 2)
E' → +TE': FIRST(+TE') = { + }, so M[E', +] = E' → +TE' (table Rule 2)
E' → ϵ: FOLLOW(E') = { $, ) }, so M[E', $] = M[E', )] = E' → ϵ (table Rule 3)
T → FT': FIRST(FT') = { (, id }, so M[T, (] = M[T, id] = T → FT' (table Rule 2)
T' → *FT': FIRST(*FT') = { * }, so M[T', *] = T' → *FT' (table Rule 2)
T' → ϵ: FOLLOW(T') = { +, $, ) }, so M[T', +] = M[T', $] = M[T', )] = T' → ϵ (table Rule 3)
F → (E): FIRST((E)) = { ( }, so M[F, (] = F → (E); F → id: FIRST(id) = { id }, so M[F, id] = F → id
NT | id      | +         | *         | (       | )      | $
E  | E → TE' | Error     | Error     | E → TE' | Error  | Error
E' | Error   | E' → +TE' | Error     | Error   | E' → ϵ | E' → ϵ
T  | T → FT' | Error     | Error     | T → FT' | Error  | Error
T' | Error   | T' → ϵ    | T' → *FT' | Error   | T' → ϵ | T' → ϵ
F  | F → id  | Error     | Error     | F → (E) | Error  | Error
Example-3: LL(1) parsing
Step 4: Parse the string id + id * id $ using the table:
STACK   | INPUT     | OUTPUT
E$      | id+id*id$ |
TE'$    | id+id*id$ | E → TE'
FT'E'$  | id+id*id$ | T → FT'
idT'E'$ | id+id*id$ | F → id
T'E'$   | +id*id$   |
E'$     | +id*id$   | T' → ϵ
+TE'$   | +id*id$   | E' → +TE'
TE'$    | id*id$    |
FT'E'$  | id*id$    | T → FT'
idT'E'$ | id*id$    | F → id
T'E'$   | *id$      |
*FT'E'$ | *id$      | T' → *FT'
FT'E'$  | id$       |
idT'E'$ | id$       | F → id
T'E'$   | $         |
E'$     | $         | T' → ϵ
$       | $         | E' → ϵ
The string is accepted, since the stack and the input are both reduced to $.
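The trace above is produced by the standard table-driven loop, which can be sketched as follows (the table is hard-coded from Step 3; names are illustrative):

```python
EPS = "eps"  # marker for the empty string

M = {
    ("E", "id"): ("T", "E'"),  ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),
    ("E'", ")"): (EPS,),       ("E'", "$"): (EPS,),
    ("T", "id"): ("F", "T'"),  ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): (EPS,), ("T'", ")"): (EPS,), ("T'", "$"): (EPS,),
    ("F", "id"): ("id",),      ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    stack = ["$", "E"]                 # start symbol on top
    tokens = tokens + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top == tokens[i]:           # match a terminal (or $)
            i += 1
        elif top in NONTERMINALS:
            alt = M.get((top, tokens[i]))
            if alt is None:
                return False           # undefined table entry: error
            if alt != (EPS,):          # push the RHS in reverse
                stack.extend(reversed(alt))
        else:
            return False               # terminal mismatch
    return i == len(tokens)

print(ll1_parse(["id", "+", "id", "*", "id"]))   # True
```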
Parsing methods
Recursive descent parsing
A top-down parser that executes a set of recursive procedures to process the input without backtracking is called a recursive descent parser.
There is a procedure for each non-terminal in the grammar.
The RHS of each production rule serves as the definition of the corresponding procedure.
As the parser reads each expected input symbol, it advances the input pointer to the next position.
Example: Recursive descent parsing
Grammar: E → num T
         T → * num T | ϵ

Procedure E()
{
  If lookahead = num
  {
    Match(num);
    T();
  }
  Else
    Error();
  If lookahead = $
    Declare success;
  Else
    Error();
}

Procedure T()
{
  If lookahead = '*'
  {
    Match('*');
    If lookahead = num
    {
      Match(num);
      T();
    }
    Else
      Error();
  }
  Else
    NULL;   /* T → ϵ */
}

Procedure Match(token t)
{
  If lookahead = t
    lookahead = next_token;
  Else
    Error();
}

Procedure Error()
{
  Print("Error");
}

Input 3 * 4 $ gives Success; input 3 4 * $ gives Error.
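A runnable Python version of the procedures above may look like this (the token handling is an illustrative sketch; the slide's pseudocode is the reference):

```python
# Recursive descent for E -> num T, T -> * num T | eps.
def parse(tokens):
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else "$"

    def match(t):
        nonlocal pos
        if lookahead() == t:
            pos += 1               # advance the input pointer
        else:
            raise SyntaxError("expected " + t)

    def E():                       # E -> num T
        match("num")
        T()

    def T():                       # T -> * num T | eps
        if lookahead() == "*":
            match("*")
            match("num")
            T()
        # else: eps -- consume nothing

    try:
        E()
        return lookahead() == "$"  # success only if all input is used
    except SyntaxError:
        return False

print(parse(["num", "*", "num"]))   # True, like 3 * 4 $
print(parse(["num", "num", "*"]))   # False, like 3 4 * $
```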
Handle & Handle pruning
Handle: a “handle” of a string is a substring of the string that matches the right side of a production, and whose reduction to the non-terminal of the production is one step along the reverse of a rightmost derivation.
Handle pruning: the process of discovering a handle and reducing it to the appropriate left-hand-side non-terminal is known as handle pruning.
Grammar: E → E+E    String: id1+id2*id3
         E → E*E
         E → id
Rightmost derivation: E ⇒ E+E ⇒ E+E*E ⇒ E+E*id3 ⇒ E+id2*id3 ⇒ id1+id2*id3
Right sentential form | Handle | Production
id1+id2*id3           | id1    | E → id
E+id2*id3             | id2    | E → id
E+E*id3               | id3    | E → id
E+E*E                 | E*E    | E → E*E
E+E                   | E+E    | E → E+E
E                     |        |
Shift reduce parser
The shift-reduce parser performs the following basic operations:
1. Shift: moving symbols from the input buffer onto the stack is called a shift action.
2. Reduce: if a handle appears on top of the stack, it is reduced by the appropriate production rule; this is called a reduce action.
3. Accept: if the stack contains only the start symbol and the input buffer is empty at the same time, the action is called accept.
4. Error: a situation in which the parser can neither shift nor reduce the symbols, and cannot perform the accept action, is called an error action.
Example: Shift reduce parser
Grammar: E → E+T | T    String: id+id*id
         T → T*F | F
         F → id
Stack   | Input Buffer | Action
$       | id+id*id$    | Shift
$id     | +id*id$      | Reduce F → id
$F      | +id*id$      | Reduce T → F
$T      | +id*id$      | Reduce E → T
$E      | +id*id$      | Shift
$E+     | id*id$       | Shift
$E+id   | *id$         | Reduce F → id
$E+F    | *id$         | Reduce T → F
$E+T    | *id$         | Shift
$E+T*   | id$          | Shift
$E+T*id | $            | Reduce F → id
$E+T*F  | $            | Reduce T → T*F
$E+T    | $            | Reduce E → E+T
$E      | $            | Accept
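The trace above can be reproduced with a small shift-reduce loop. Deciding when to reduce normally requires LR tables; the lookahead checks below ("don't reduce past a pending *") are an illustrative simplification that happens to work for this grammar, not a general parser:

```python
# Shift-reduce sketch for E -> E+T | T, T -> T*F | F, F -> id.
def shift_reduce(tokens):
    stack, buf = [], tokens + ["$"]
    while True:
        la = buf[0]                                    # lookahead
        if stack[-1:] == ["id"]:
            stack[-1:] = ["F"]                         # reduce F -> id
        elif stack[-3:] == ["T", "*", "F"]:
            stack[-3:] = ["T"]                         # reduce T -> T*F
        elif stack[-1:] == ["F"]:
            stack[-1:] = ["T"]                         # reduce T -> F
        elif stack[-3:] == ["E", "+", "T"] and la != "*":
            stack[-3:] = ["E"]                         # reduce E -> E+T
        elif stack == ["T"] and la != "*":
            stack = ["E"]                              # reduce E -> T
        elif la != "$":
            stack.append(buf.pop(0))                   # shift
        else:
            return stack == ["E"]                      # accept or error

print(shift_reduce(["id", "+", "id", "*", "id"]))   # True
```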
Viable Prefix
The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce
parser are called viable prefixes.
Parsing Methods
Operator precedence parsing
Operator Grammar: a grammar in which no production has ϵ on the RHS and no two adjacent non-terminals appear in any RHS is called an operator grammar.
Example: E → EAE | (E) | id
         A → + | * | -
The above grammar is not an operator grammar because the right side EAE has consecutive non-terminals.
In operator precedence parsing we define following disjoint relations:
Relation Meaning
a<.b a “yields precedence to” b
a=b a “has the same precedence as” b
a.>b a “takes precedence over” b
Precedence & associativity of operators
Steps of operator precedence parsing
1. Find Leading and trailing of non terminal
2. Establish relation
3. Creation of table
4. Parse the string
Leading & Trailing
Leading: the Leading of a non-terminal is the set of first terminals (operators) that can appear in the strings derived from that non-terminal.
Trailing: the Trailing of a non-terminal is the set of last terminals (operators) that can appear in the strings derived from that non-terminal.
Example: E → E+T | T
         T → T*F | F
         F → id
Rules to establish a relation
1. a ≐ b if there is a production containing aβb, where β is ϵ or a single non-terminal [e.g.: (E) gives ( ≐ )]
2. a <· b if there is a production containing aN, then a <· Leading(N) [e.g.: +T gives + <· Leading(T)]
3. a ·> b if there is a production containing Nb, then Trailing(N) ·> b [e.g.: E+ gives Trailing(E) ·> +]
4. $ <· Leading(start symbol)
5. Trailing(start symbol) ·> $
Example: Operator precedence parsing
Step 1: Find Leading & Trailing of each non-terminal
Grammar: E → E+T | T
         T → T*F | F
         F → id
Nonterminal | Leading     | Trailing
E           | {+, *, id}  | {+, *, id}
T           | {*, id}     | {*, id}
F           | {id}        | {id}
Example: Operator precedence parsing
Steps 2 & 3: Establish the relations and create the precedence table:
     +    *    id   $
+    ·>   <·   <·   ·>
*    ·>   ·>   <·   ·>
id   ·>   ·>        ·>
$    <·   <·   <·
Step 4: Parse the string id+id*id using the precedence table. Insert the precedence relations between the terminals:
$ <· id ·> + <· id ·> * <· id ·> $
Example: Operator precedence parsing
Step 4: Parse the string using the precedence table.
Grammar: E → E+T | T, T → T*F | F, F → id
1. Scan the input string until the first ·> is encountered.
2. Scan backward until <· is encountered.
3. The handle is the string between <· and ·>.
$ <· id ·> + <· id ·> * <· id ·> $   each id is a handle between <· and ·>; reduce each by F → id
$ F + F * F $   between the remaining terminals: $ <· + <· * ·> $, so F*F is the handle; reduce by T → T*F
$ F + T $       now $ <· + ·> $, so F+T is the handle; reduce by E → E+T
$ E $           parsing done
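The scan-and-reduce procedure can be sketched with the relation table above. Non-terminals never take part in the comparisons, so this sketch tracks terminals only, and it omits matching each handle against a production (a real operator-precedence parser would check that), which is an illustrative simplification:

```python
# Operator-precedence parse sketch over terminals only; PREC encodes
# the relation table from the slides ("<" for <. and ">" for .>).
PREC = {("+", "+"): ">", ("+", "*"): "<", ("+", "id"): "<", ("+", "$"): ">",
        ("*", "+"): ">", ("*", "*"): ">", ("*", "id"): "<", ("*", "$"): ">",
        ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
        ("$", "+"): "<", ("$", "*"): "<", ("$", "id"): "<"}

def op_precedence_parse(tokens):
    stack = ["$"]                      # terminal stack
    buf = tokens + ["$"]
    while not (stack == ["$"] and buf == ["$"]):
        rel = PREC.get((stack[-1], buf[0]))
        if rel == "<":
            stack.append(buf.pop(0))   # shift: start of a handle
        elif rel == ">":
            # Reduce: pop until the new top is <.-related to the last pop.
            while True:
                top = stack.pop()
                if PREC.get((stack[-1], top)) == "<":
                    break
        else:
            return False               # no relation defined: syntax error
    return True

print(op_precedence_parse(["id", "+", "id", "*", "id"]))   # True
```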
Operator precedence function
Algorithm for constructing precedence functions
1. Create functions fa and ga for each a that is a terminal or $.
2. Partition the symbols into as many groups as possible, in such a way that fa and gb are in the same group if a ≐ b.
3. Create a directed graph whose nodes are the groups; then for each pair of symbols a and b do:
   a) if a <· b, place an edge from the group of gb to the group of fa
   b) if a ·> b, place an edge from the group of fa to the group of gb
4. If the constructed graph has a cycle, then no precedence functions exist. When there are no cycles, let fa and ga be the lengths of the longest paths from the groups of fa and ga respectively.
Operator precedence function
1. Create functions fa and ga for each a that is a terminal or $.

Grammar:
E → E+T | T
T → T*F | F
F → id

For a ∈ {+, *, id, $} the functions are:
f+  f*  fid  f$
g+  g*  gid  g$
Operator precedence function
2. Partition the symbols into as many groups as possible, in such a way that fa and gb are in the same group if a =. b.

      +    *    id   $
+     .>   <.   <.   .>
*     .>   .>   <.   .>
id    .>   .>        .>
$     <.   <.   <.

There are no =. relations in this table, so each function forms its own group:
gid  fid
g+   f+
g*   f*
f$   g$
Operator precedence function
3. If a <. b, place an edge from the group of gb to the group of fa;
   if a .> b, place an edge from the group of fa to the group of gb.

Edges for the + column of the table:
+ .> +   : edge f+ → g+
* .> +   : edge f* → g+
id .> +  : edge fid → g+
$ <. +   : edge g+ → f$
Operator precedence function
3. If a <. b, place an edge from the group of gb to the group of fa;
   if a .> b, place an edge from the group of fa to the group of gb.

Edges for the * column of the table:
+ <. *   : edge g* → f+
* .> *   : edge f* → g*
id .> *  : edge fid → g*
$ <. *   : edge g* → f$
Operator precedence function
3. If a <. b, place an edge from the group of gb to the group of fa;
   if a .> b, place an edge from the group of fa to the group of gb.

Edges for the id column of the table:
+ <. id  : edge gid → f+
* <. id  : edge gid → f*
$ <. id  : edge gid → f$
Operator precedence function
3. If a <. b, place an edge from the group of gb to the group of fa;
   if a .> b, place an edge from the group of fa to the group of gb.

Edges for the $ column of the table:
+ .> $   : edge f+ → g$
* .> $   : edge f* → g$
id .> $  : edge fid → g$
Operator precedence function
4. If the constructed graph has a cycle then no precedence functions exist. When there are no cycles, take f(a) and g(a) to be the lengths of the longest paths starting from the groups of fa and ga respectively.

Graph nodes: gid, fid, f*, g*, g+, f+, f$, g$, with the edges added in step 3.
Operator precedence function
The longest paths give the precedence function table:

      +   *   id   $
f     2   4   4    0
g     1   3   5    0
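The four steps above can be mechanized. The sketch below is illustrative (merging of nodes for =. relations is omitted, since this table has none): it builds the f/g graph and takes the longest path out of each node, reproducing the table f(+)=2, f(*)=f(id)=4, f($)=0 and g(+)=1, g(*)=3, g(id)=5, g($)=0.

```python
# Precedence relations for E->E+T|T, T->T*F|F, F->id ('<' stands for <., '>' for .>).
REL = {
    ('+', '+'): '>', ('+', '*'): '<', ('+', 'id'): '<', ('+', '$'): '>',
    ('*', '+'): '>', ('*', '*'): '>', ('*', 'id'): '<', ('*', '$'): '>',
    ('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
    ('$', '+'): '<', ('$', '*'): '<', ('$', 'id'): '<',
}

def precedence_functions(rel, terms):
    """Longest-path construction of f and g; raises ValueError on a cycle."""
    edges = {}
    for (a, b), r in rel.items():
        if r == '<':                 # a <. b : edge from g_b to f_a
            edges.setdefault(('g', b), []).append(('f', a))
        elif r == '>':               # a .> b : edge from f_a to g_b
            edges.setdefault(('f', a), []).append(('g', b))

    def longest(node, seen=()):
        if node in seen:
            raise ValueError('cycle: no precedence functions exist')
        return max((1 + longest(m, seen + (node,))
                    for m in edges.get(node, [])), default=0)

    return ({a: longest(('f', a)) for a in terms},
            {a: longest(('g', a)) for a in terms})

f, g = precedence_functions(REL, ['+', '*', 'id', '$'])
print(f, g)
```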
Parsing Methods

Parsing
  Parsing without backtracking (predictive parsing)
    LL(1)
    Recursive descent
  LR parsing
    SLR
    CLR
    LALR
LR parser
LR parsing is the most efficient method of bottom-up parsing, and it can be used to parse a large class of context free grammars.
The technique is called LR(k) parsing:
1. The "L" is for left to right scanning of the input symbols,
2. The "R" is for constructing a rightmost derivation in reverse,
3. The "k" is for the number of input symbols of look ahead that are used in making parsing decisions.

An LR parser consists of an input buffer (e.g. a + b $), a stack of states (X Y Z ... $), the LR parsing program, and a parsing table with Action and Goto parts; the parsing program reads the input, consults the table, and produces the output.
Computation of closure & goto function

Grammar:
S → AS | b
A → SA | a

Closure(I) for I = {S' → .S}:
S' → .S
S → .AS
S → .b
A → .SA
A → .a

Goto(I, S): S' → S. , A → S.A , plus the closure items S → .AS, S → .b, A → .SA, A → .a
Goto(I, A): S → A.S , plus the closure items S → .AS, S → .b, A → .SA, A → .a
Goto(I, b): S → b.
Goto(I, a): A → a.
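The closure and goto computations can be written directly. The sketch below uses the grammar above (S → AS | b, A → SA | a, augmented with S' → S); representing an item as a (head, body, dot-position) triple is an implementation choice for illustration.

```python
# Productions of the augmented grammar; bodies are tuples of symbols.
GRAMMAR = {
    "S'": [('S',)],
    'S': [('A', 'S'), ('b',)],
    'A': [('S', 'A'), ('a',)],
}

def closure(items):
    """Add A -> .gamma for every nonterminal A that appears right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], prod, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(items, symbol):
    """Move the dot over `symbol` in every applicable item, then take the closure."""
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})

I0 = closure({("S'", ('S',), 0)})
print(len(I0))             # the five items of Closure(I) listed above
print(goto(I0, 'b'))
```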
Example: SLR(1) - simple LR

Augmented grammar:
S' → S
S → AA
A → aA | b

LR(0) item sets:
I0: S' → .S, S → .AA, A → .aA, A → .b
I1 = goto(I0, S): S' → S.
I2 = goto(I0, A): S → A.A, A → .aA, A → .b
I3 = goto(I0, a) = goto(I2, a) = goto(I3, a): A → a.A, A → .aA, A → .b
I4 = goto(I0, b) = goto(I2, b) = goto(I3, b): A → b.
I5 = goto(I2, A): S → AA.
I6 = goto(I3, A): A → aA.
Rules to construct SLR parsing table
1. Construct C = {I0, I1, ......, In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
   a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set Action[i, a] to "shift j". Here a must be a terminal.
   b) If [A → α.] is in Ii, then set Action[i, a] to "reduce A → α" for all a in Follow(A); here A may not be S'.
   c) If [S' → S.] is in Ii, then set Action[i, $] to "accept".
3. The goto transitions for state i are constructed for all nonterminals A using the rule: if goto(Ii, A) = Ij, then Goto[i, A] = j.
4. All entries not defined by rules 2 and 3 are made "error".
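The table produced by these rules drives a generic shift-reduce loop. The sketch below hard-codes the SLR table constructed in this unit for the grammar S → AA, A → aA | b (productions numbered 1: S → AA, 2: A → aA, 3: A → b) and returns the sequence of reductions performed.

```python
# SLR table for S -> AA, A -> aA | b ('s' = shift, 'r' = reduce, 'acc' = accept).
ACTION = {
    (0, 'a'): ('s', 3), (0, 'b'): ('s', 4), (1, '$'): ('acc', 0),
    (2, 'a'): ('s', 3), (2, 'b'): ('s', 4),
    (3, 'a'): ('s', 3), (3, 'b'): ('s', 4),
    (4, 'a'): ('r', 3), (4, 'b'): ('r', 3), (4, '$'): ('r', 3),
    (5, '$'): ('r', 1),
    (6, 'a'): ('r', 2), (6, 'b'): ('r', 2), (6, '$'): ('r', 2),
}
GOTO = {(0, 'S'): 1, (0, 'A'): 2, (2, 'A'): 5, (3, 'A'): 6}
PRODS = {1: ('S', 2), 2: ('A', 2), 3: ('A', 1)}   # production: head, body length

def lr_parse(tokens):
    """Run the shift-reduce driver; return the production numbers used."""
    tokens = tokens + ['$']
    stack, i, used = [0], 0, []
    while True:
        kind, arg = ACTION.get((stack[-1], tokens[i]), (None, None))
        if kind == 's':                    # shift: push state, advance input
            stack.append(arg)
            i += 1
        elif kind == 'r':                  # reduce: pop |body| states, then goto
            head, length = PRODS[arg]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])
            used.append(arg)
        elif kind == 'acc':
            return used
        else:
            raise SyntaxError('error at token %d' % i)

print(lr_parse(['a', 'b', 'b']))
```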
Example: SLR(1) - simple LR

Grammar:
S → AA
A → aA | b

Follow(S) = {$}
Follow(A) = {a, b, $}

SLR parsing table (R1: S → AA, R2: A → aA, R3: A → b):

Item        Action              Go to
set     a       b       $       S   A
0       S3      S4              1   2
1                       Accept
2       S3      S4                  5
3       S3      S4                  6
4       R3      R3      R3
5                       R1
6       R2      R2      R2
How to calculate look ahead?

Grammar:
S → CC
C → cC | d

For an item [A → α.Bβ, a], each new item [B → .γ] added by the closure gets the lookahead First(βa).

Closure(I) for I = {S' → .S, $}:
S' → .S, $
S → .CC, $          from [S' → .S, $]: lookahead = First($) = {$}
C → .cC, c|d        from [S → .CC, $]: lookahead = First(C$) = {c, d}
C → .d,  c|d
Example: CLR(1) - canonical LR

Augmented grammar:
S' → S
S → AA
A → aA | b

LR(1) item sets:
I0: S' → .S, $ ; S → .AA, $ ; A → .aA, a|b ; A → .b, a|b
I1 = goto(I0, S): S' → S., $
I2 = goto(I0, A): S → A.A, $ ; A → .aA, $ ; A → .b, $
I3 = goto(I0, a) = goto(I3, a): A → a.A, a|b ; A → .aA, a|b ; A → .b, a|b
I4 = goto(I0, b) = goto(I3, b): A → b., a|b
I5 = goto(I2, A): S → AA., $
I6 = goto(I2, a) = goto(I6, a): A → a.A, $ ; A → .aA, $ ; A → .b, $
I7 = goto(I2, b) = goto(I6, b): A → b., $
I8 = goto(I3, A): A → aA., a|b
I9 = goto(I6, A): A → aA., $
Example: CLR(1) - canonical LR

CLR parsing table (R1: S → AA, R2: A → aA, R3: A → b):

Item        Action              Go to
set     a       b       $       S   A
0       S3      S4              1   2
1                       Accept
2       S6      S7                  5
3       S3      S4                  8
4       R3      R3
5                       R1
6       S6      S7                  9
7                       R3
8       R2      R2
9                       R2
Example: LALR(1) - look ahead LR

States of the CLR automaton that have the same LR(0) core are merged, combining their lookaheads:
I3 and I6 merge into I36: A → a.A, a|b|$ ; A → .aA, a|b|$ ; A → .b, a|b|$
I4 and I7 merge into I47: A → b., a|b|$
I8 and I9 merge into I89: A → aA., a|b|$
Example: LALR(1) - look ahead LR

Merging the CLR states gives the LALR parsing table:

Item        Action              Go to
set     a       b       $       S   A
0       S36     S47             1   2
1                       Accept
2       S36     S47                 5
36      S36     S47                 89
47      R3      R3      R3
5                       R1
89      R2      R2      R2
YACC tool or YACC Parser Generator
YACC is a tool which generates a parser.
It takes input from the lexical analyzer (tokens) and produces a parse tree as its output.

Yacc specification (translate.y) → Yacc compiler → y.tab.c
Structure of Yacc Program
Any Yacc program contains mainly three sections:
1. Declarations
2. Translation rules
3. Supporting C-routines

Structure of the program:

Declarations            used to declare variables, constants & header files, e.g.:
                        %{
                        int x,y;
                        const int digit=50;
                        #include <ctype.h>
                        %}
%%
Translation rules       of the form
                        <left side> : <alt 1>  {semantic action 1}
                                    | <alt 2>  {semantic action 2}
                                    ........
                                    | <alt n>  {semantic action n}
%%
Supporting C routines   all the functions needed are specified here
Example: Yacc Program
Program: Write a Yacc program for a simple desk calculator (grammar: E → E+T | T, T → T*F | F, F → (E) | id)

/* Declarations */
%{
#include <ctype.h>
%}
%token DIGIT
%%
/* Translation rules */
line   : expr '\n'         { printf("%d\n", $1); }
       ;
expr   : expr '+' term     { $$ = $1 + $3; }
       | term
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'      { $$ = $2; }
       | DIGIT
       ;
%%
/* Supporting C routines */
yylex()
{
    int c;
    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;
    }
    return c;
}
References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman
Compiler Design (CD)
GTU # 3170701
Unit – 3
Syntax Analysis (II)
An attribute of a grammar symbol (e.g. E.value) may represent a value, a memory location, a type, or a return location.
Synthesized attributes
The value of a synthesized attribute at a node can be computed from the values of attributes at the children of that node in the parse tree.
A syntax directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition.
Example: Syntax directed definition of a simple desk calculator

Production   Semantic rules
L → En       Print(E.val)
Example: Synthesized attributes
String: 3*5+4n

Production   Semantic rules
L → En       Print(E.val)
E → E1+T     E.val = E1.val + T.val

In the annotated parse tree for 3*5+4n the values digit.lexval = 3, 5 and 4 propagate upward (F.val = 3, T.val = 15, and E.val = 19 at the root).
A parse tree showing the values of the attributes at each node is called an annotated parse tree.
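A bottom-up pass over the parse tree computes these synthesized attributes; the sketch below mirrors the annotated parse tree for 3*5+4n, with one helper per semantic rule (the dict representation of a node is an illustrative choice).

```python
def f_digit(lexval):          # F -> digit : F.val = digit.lexval
    return {'val': lexval}

def t_mul(t1, f):             # T -> T1 * F : T.val = T1.val * F.val
    return {'val': t1['val'] * f['val']}

def e_add(e1, t):             # E -> E1 + T : E.val = E1.val + T.val
    return {'val': e1['val'] + t['val']}

# annotated parse tree for 3*5+4n, evaluated bottom-up
root = e_add(t_mul(f_digit(3), f_digit(5)), f_digit(4))
print(root['val'])            # E.val at the root
```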
Exercise
Draw Annotated Parse tree for following:
1. 7+3*2n
2. (3+4)*(5+6)n
Syntax directed definition to translate arithmetic expressions from infix to prefix notation
Inherited attribute
An inherited value at a node in a parse tree is computed from the value of attributes at the
parent and/or siblings of the node.
Example: Inherited attribute
Example: pass the data type to all identifiers in the declaration real id1, id2, id3

Production    Semantic rules
D → TL        L.in = T.type
T → int       T.type = integer
T → real      T.type = real
L → L1 , id   L1.in = L.in, addtype(id.entry, L.in)
L → id        addtype(id.entry, L.in)

In the parse tree for D → TL, T.type = real is computed first; the inherited attribute L.in = real is then passed down through each L → L1 , id node, so addtype is called for id3, id2 and id1 in turn.
Dependency graph
The directed graph that represents the interdependencies between synthesized and inherited
attribute at nodes in the parse tree is called dependency graph.
For a rule X → YZ with semantic action X.x = f(Y.y, Z.z), the synthesized attribute X.x depends on the attributes Y.y and Z.z.
The basic idea behind dependency graphs is for the compiler to find an evaluation order for the semantic rules that respects these dependencies.
Algorithm: Dependency graph

for each node n in the parse tree do
    for each attribute a of the grammar symbol at n do
        construct a node in the dependency graph for a;
for each node n in the parse tree do
    for each semantic rule b = f(c1, c2, ....., ck)
            associated with the production used at n do
        for i = 1 to k do
            construct an edge from the node for ci to the node for b;
Example: Dependency graph

Production    Semantic rules
E → E1+E2     E.val = E1.val + E2.val

The edges to E.val from E1.val and E2.val show that E.val depends on E1.val and E2.val.
Evaluation order
A topological sort of a directed acyclic graph is any ordering m1, m2, ........., mk of the nodes of the graph such that edges go from nodes earlier in the ordering to later nodes.
If mi → mj is an edge from mi to mj, then mi appears before mj in the ordering.

For the declaration real id1, id2, id3, one topological order evaluates T.type = real first and then the chain of inherited attributes L.in = real from the root L down to the last L, interleaved with the addtype calls for id3, id2 and id1.
Construction of syntax tree
The following functions are used to create the nodes of the syntax tree:
1. mknode(op, left, right): creates an operator node with label op and two fields containing pointers to left and right.
2. mkleaf(id, entry): creates an identifier node with label id and a field containing entry, a pointer to the symbol table entry for the identifier.
3. mkleaf(num, val): creates a number node with label num and a field containing val, the value of the number.
Construction of syntax tree for expressions
Example: construct the syntax tree for a-4+c

P1: mkleaf(id, entry for a)
P2: mkleaf(num, 4)
P3: mknode('-', P1, P2)
P4: mkleaf(id, entry for c)
P5: mknode('+', P3, P4)

The root P5 is a + node; its left child is the - node P3 (whose children are the id leaf for a and the num leaf 4) and its right child is the id leaf P4 for c.
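A minimal sketch of the three constructors (representing a node as a dict is an implementation choice, not from the text) reproduces steps P1 through P5 for a-4+c:

```python
def mknode(op, left, right):          # interior operator node
    return {'label': op, 'left': left, 'right': right}

def mkleaf_id(entry):                 # identifier leaf with its symbol-table entry
    return {'label': 'id', 'entry': entry}

def mkleaf_num(val):                  # number leaf with its value
    return {'label': 'num', 'val': val}

p1 = mkleaf_id('entry for a')
p2 = mkleaf_num(4)
p3 = mknode('-', p1, p2)
p4 = mkleaf_id('entry for c')
p5 = mknode('+', p3, p4)
print(p5['label'], p5['left']['label'])
```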
Bottom up evaluation of S-attributed definitions
An S-attributed definition is a syntax directed definition with synthesized attributes only.
Synthesized attributes can be evaluated by a bottom-up parser as the input is being parsed.
Synthesized attributes on the parser stack:
Consider the production A → XYZ with the associated semantic action A.a = f(X.x, Y.y, Z.z). Just before the reduction, the values X.x, Y.y and Z.z are on top of the value stack; after the reduction they are replaced by A.a.
Bottom up evaluation of S-attributed definitions
Production Semantic rules Input State Val Production Used
L En Print (val[top]) 3*5n - -
E E1+T val[top]=val[top-2] + val[top] *5n 3 3
*5n F 3 Fdigit
ET
*5n T 3 TF
T T1*F val[top]=val[top-2] * val[top]
5n T* 3
TF n T*5 3,5
F (E) val[top]=val[top-2] - val[top] n T*F 3,5 Fdigit
F digit n T 15 TT1*F
n E 15 ET
Implementation of a desk calculator En 15
with bottom up parser L 15 L En
Move made by translator
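The Val column in the moves table is a value stack kept in parallel with the parser stack. A sketch of the reductions used on the input 3*5n (simplified: operator symbols are not kept on this stack, only attribute values):

```python
val = []                      # parallel value stack

def shift_digit(d):           # F -> digit : push digit.lexval
    val.append(d)

def reduce_mul():             # T -> T1 * F : val[top] = val[top-2] * val[top]
    f = val.pop()
    t1 = val.pop()
    val.append(t1 * f)

def reduce_add():             # E -> E1 + T : val[top] = val[top-2] + val[top]
    t = val.pop()
    e1 = val.pop()
    val.append(e1 + t)

# moves for the input 3*5n
shift_digit(3)
shift_digit(5)
reduce_mul()
print(val)                    # the value printed when L -> En is reduced
```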
L-Attributed definitions
A syntax directed definition is L-attributed if each inherited attribute of Xj, 1 ≤ j ≤ n, on the right side of A → X1 X2 ..... Xn depends only on:
1. The attributes of the symbols X1, X2, ....., Xj-1 to the left of Xj in the production, and
2. The inherited attributes of A.

Example:
Production    Semantic rules
A → LM        L.i = l(A.i)
              M.i = m(L.s)
              A.s = f(M.s)        (L-attributed)
A → QR        R.i = r(A.i)
              Q.i = q(R.s)
              A.s = f(Q.s)        (not L-attributed)

The second syntax directed definition is not L-attributed because the inherited attribute Q.i of the grammar symbol Q depends on the attribute R.s of the grammar symbol to its right.
Translation Scheme
Translation scheme is a context free grammar in which attributes are associated with the
grammar symbols and semantic actions enclosed between braces { } are inserted within the
right sides of productions.
Attributes are used to evaluate the expression along the process of parsing.
During the process of parsing the evaluation of attribute takes place by consulting the semantic
action enclosed in { }.
A translation scheme generates the output by executing the semantic actions in an ordered manner, using a depth-first traversal of the parse tree.
Example: Translation scheme (Infix to postfix notation)
String: 9-5+2

E → T R
R → addop T {print(addop.lexeme)} R1 | ε
T → num {print(num.val)}

Walking the parse tree depth-first: T prints 9; R matches -, its T prints 5, then the action prints -; the nested R matches +, its T prints 2, then the action prints +; the innermost R derives ε.
Output: 95-2+
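A recursive-descent routine per nonterminal, executing each action exactly where it appears in the production, realizes this translation scheme (an illustrative sketch; tokens are simplified to single characters and output is collected rather than printed):

```python
def infix_to_postfix(s):
    out = []
    pos = 0

    def T():                       # T -> num {print(num.val)}
        nonlocal pos
        out.append(s[pos])
        pos += 1

    def R():                       # R -> addop T {print(addop.lexeme)} R1 | epsilon
        nonlocal pos
        if pos < len(s) and s[pos] in '+-':
            op = s[pos]
            pos += 1
            T()
            out.append(op)         # action runs after the T to its left
            R()

    T()                            # E -> T R
    R()
    return ''.join(out)

print(infix_to_postfix('9-5+2'))
```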
Compiler Design (CD)
GTU # 3170701
Unit – 4
Error Recovery
Errors
Lexical error
Lexical errors can be detected during lexical analysis phase.
Typical lexical phase errors are:
1. Spelling errors
2. Exceeding length of identifier or numeric constants
3. Appearance of illegal characters
Example:
fi ( )
{
}
In the above code 'fi' cannot be recognized as a misspelling of the keyword if; the lexical analyzer will treat it as an identifier and return it as a valid identifier.
Thus a misspelling causes errors in token formation.
Syntax error
Syntax error appear during syntax analysis phase of compiler.
Typical syntax phase errors are:
1. Errors in structure
2. Missing operators
3. Unbalanced parenthesis
The parser demands tokens from the lexical analyzer, and if the tokens do not satisfy the grammatical rules of the programming language then syntactical errors are raised.
Semantic error
Semantic errors are detected during the semantic analysis phase.
Typical semantic phase errors are:
1. Incompatible types of operands
2. Undeclared variable
3. Not matching of actual argument with formal argument
Example:
id1 = id2 + id3 * 60   (Note: id1, id2, id3 are real)
(The multiplication cannot be performed directly because of the incompatible types of the operands: 60 is an integer.)
Error recovery strategies (Ad-Hoc & systematic methods)
There are mainly four error recovery strategies:
1. Panic mode
2. Phrase level recovery
3. Error production
4. Global correction
Panic mode
In this method on discovering error, the parser discards input symbol one at a time. This
process is continued until one of a designated set of synchronizing tokens is found.
Synchronizing tokens are delimiters such as semicolon or end.
These tokens indicate an end of the statement.
If there are only a few errors in the same statement, then this strategy is the best choice.
Phrase level recovery
In this method, on discovering an error parser performs local correction on remaining input.
The local correction can be:
1. Replacing a comma by a semicolon
2. Deleting an extraneous semicolon
3. Inserting a missing semicolon
This type of local correction is decided by compiler designer.
This method is used in many error-repairing compilers.
Error production
If we have good knowledge of common errors that might be encountered, then we can augment
the grammar for the corresponding language with error productions that generate the
erroneous constructs.
Then we use the grammar augmented by these error production to construct a parser.
If error production is used then, during parsing we can generate appropriate error message and
parsing can be continued.
Global correction
Given an incorrect input string x and grammar G, the algorithm will find a parse tree for a
related string y, such that number of insertions, deletions and changes of token require to
transform x into y is as small as possible.
Such methods increase time and space requirements at parsing time.
Global correction is thus simply a theoretical concept.
Compiler Design (CD)
GTU # 3170701
Unit – 5
Intermediate Code Generation
Abstract syntax tree & DAG
A syntax tree depicts the natural hierarchical structure of a source program.
A DAG (Directed Acyclic Graph) gives the same information but in a more compact way
because common sub-expressions are identified.
Ex: a = b * -c + b * -c

The syntax tree for this assignment contains two separate subtrees for the sub-expression b * -c; the DAG represents the common sub-expression by a single shared subtree under the + node.
Postfix Notation
Postfix notation is a linearization of a syntax tree.
In postfix notation the operands occurs first and then operators are arranged.
Ex: (A + B) * (C + D)
Postfix notation: A B + C D + *
Ex: (A + B) * C
Postfix notation: A B + C *
Ex: (A * B) + (C * D)
Postfix notation: A B * C D * +
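Postfix notation is exactly a postorder traversal of the syntax tree: operands first, then the operator. A short sketch using nested tuples for the tree (the tuple encoding is illustrative):

```python
def postfix(node):
    """Postorder walk: left subtree, right subtree, then the operator."""
    if isinstance(node, str):                 # leaf operand
        return node
    op, left, right = node                    # interior operator node
    return postfix(left) + ' ' + postfix(right) + ' ' + op

# (A + B) * (C + D)
print(postfix(('*', ('+', 'A', 'B'), ('+', 'C', 'D'))))
```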
Three address code
Three address code is a sequence of statements of the general form,
a:= b op c
Where a, b or c are the operands that can be names or constants and op stands for any
operator.
Example: a = b + c + d
t1=b+c
t2=t1+d
a= t2
Here t1 and t2 are the temporary names generated by the compiler.
There are at most three addresses allowed (two for operands and one for result). Hence, this
representation is called three-address code.
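A sketch of how a compiler linearizes a = b + c + d into three-address statements, generating a fresh temporary per operator (the helper names and tuple encoding of expressions are illustrative, not from the text):

```python
code = []
temp_count = 0

def new_temp():
    """Fresh compiler-generated temporary name t1, t2, ..."""
    global temp_count
    temp_count += 1
    return 't%d' % temp_count

def gen(expr):
    """expr is a name or a nested tuple (op, left, right); returns an address."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    a, b = gen(left), gen(right)
    t = new_temp()
    code.append('%s = %s %s %s' % (t, a, op, b))
    return t

# a = b + c + d, left-associative: (b + c) + d
code.append('a = %s' % gen(('+', ('+', 'b', 'c'), 'd')))
print(code)
```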
Different Representation of Three Address Code
There are three types of representation used for three address code:
1. Quadruples
2. Triples
3. Indirect triples
Ex: x = -a*b + -a*b

Three address code:
t1 = -a
t2 = t1 * b
t3 = -a
t4 = t3 * b
t5 = t2 + t4
x = t5
Quadruple
The quadruple is a structure with at most four fields: op, arg1, arg2 and result.
The op field is used to represent the internal code for the operator; arg1 and arg2 represent the two operands; and the result field is used to store the result of the expression.

Quadruples for x = -a*b + -a*b:

No.   Operator   Arg1   Arg2   Result
(0)   uminus     a             t1
(1)   *          t1     b      t2
(2)   uminus     a             t3
(3)   *          t3     b      t4
(4)   +          t2     t4     t5
(5)   =          t5            x
Triple
To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it.
If we do so, three address statements can be represented by records with only three fields: op, arg1 and arg2.

Triples for x = -a*b + -a*b:

No.   Operator   Arg1   Arg2
(0)   uminus     a
(1)   *          (0)    b
(2)   uminus     a
(3)   *          (2)    b
(4)   +          (1)    (3)
(5)   =          x      (4)
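Converting the quadruple list to triples is mechanical: every reference to a temporary is replaced by the index of the statement that computes it. A sketch over the quadruples for x = -a*b + -a*b (the tuple encoding is illustrative):

```python
quads = [
    ('uminus', 'a', None, 't1'),
    ('*', 't1', 'b', 't2'),
    ('uminus', 'a', None, 't3'),
    ('*', 't3', 'b', 't4'),
    ('+', 't2', 't4', 't5'),
    ('=', 't5', None, 'x'),
]

# index of the statement that computes each temporary
index_of = {res: i for i, (_, _, _, res) in enumerate(quads)
            if res is not None and res.startswith('t')}

def ref(arg):
    """Replace a temporary operand by the position that computes it."""
    if arg is None:
        return None
    return index_of.get(arg, arg)

triples = []
for op, a1, a2, res in quads:
    if op == '=':                      # assignment keeps the target name in arg1
        triples.append((op, res, ref(a1)))
    else:
        triples.append((op, ref(a1), ref(a2)))
print(triples)
```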
Indirect Triple
In the indirect triple representation, a separate list of pointers to triples is maintained, and these pointers (rather than the triple statements themselves) are listed in execution order.
This implementation is called indirect triples.
Exercise
Write quadruple, triple and indirect triple for following:
1. -(a*b)+(c+d)
2. a*-(b+c)
3. x=(a+b*c)^(d*e)+f*g^h
4. g+a*(b-c)+(x-y)*d
Compiler Design (CD)
GTU # 3170701
Unit – 6
Run-Time Environments
Procedures
A program is made up of procedures.
Procedure: declaration that associate an identifier with a statement.
Identifier: procedure name
Statement: procedure body
Procedure call: procedure name appears within an executable statement.
Example:

main()
{
    int n;
    readarray();
    quicksort(1, n);
}

void quicksort(int m, int n)
{
    int i = partition(m, n);
    quicksort(m, i-1);
    quicksort(i+1, n);
}

void readarray()
{
    .........
}

int partition(int y, int z)
{
    ......
}
Activation Tree
If 'a' and 'b' are two procedures, their activations will be:
Non-overlapping: when one is called after the other.
Nested: nested procedure calls.
Recursive procedure: a new activation begins before an earlier activation of the same procedure has ended.
An activation tree shows the way control enters and leaves activations.

Properties of activation trees:
1. Each node represents an activation of a procedure.
2. The root shows the activation of the main function.
3. The node for procedure 'a' is the parent of the node for procedure 'b' if and only if control flows from procedure a to procedure b.
4. The node for 'a' is to the left of the node for 'b' if and only if the lifetime of a occurs before the lifetime of b.

Activation tree for the quicksort program:
main
  readarray()
  quicksort(1,10)
    Partition(1,10)
    quicksort(1,4)
      Partition(1,4)
      quicksort(1,2)
      quicksort(4,4)
    quicksort(6,10)
Control stack
Control stack or runtime stack is used to keep track of the live procedure activations i.e the procedures
whose execution have not been completed.
A procedure name is pushed on to the stack when it is called (activation begins) and it is popped when it
returns (activation ends).
Information needed by a single execution of a procedure is managed using an activation record or
frame. When a procedure is called, an activation record is pushed into the stack and as soon as the
control returns to the caller function the activation record is popped.
Scope of a declaration
A declaration in a language is a syntactic construct that associate information with a name.
Var i: integer;
There may be declaration of the same name in different parts of a program.
The scope rules of a language determine which declaration of a name applies when the name
appears in the text of a program.
The portion of the program to which a declaration applies is called the scope of the declaration.
An occurrence of a name in a procedure is said to be local to the procedure if it is in the scope
of a declaration within the procedure; otherwise the occurrence is said to be nonlocal.
The distinction between local and nonlocal names carries over to any syntactic construct that can have declarations within it.
Binding of names
Environment: a function that maps a name to a storage location.
State: a function that maps a storage location to the value held there.

name → (environment) → storage location → (state) → value
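The two mappings can be sketched as two dictionaries (the addresses and values below are illustrative):

```python
# Binding of names as two separate mappings:
environment = {"i": 0x1000}      # name -> storage location
state = {0x1000: 42}             # storage location -> value held there

def value_of(name):
    return state[environment[name]]

assert value_of("i") == 42
state[environment["i"]] = 7      # assignment changes the state...
assert value_of("i") == 7        # ...while the environment is unchanged
```

Assignment updates the state; re-declaring or re-binding a name would update the environment.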
Storage organization
The executing target program runs in it’s own logical address space in which each program
value has a location.
The management and organization of this logical address space is shared between the
compiler, operating system and target machine.
The operating system maps logical addresses into physical addresses, which are usually spread throughout memory.
Subdivision of Runtime Memory
The compiler demands a block of memory from the operating system.
The compiler utilizes this block of memory for executing the compiled program. This block of memory is called run-time storage.
The run-time storage is subdivided to hold code and data: the generated target code and data objects.
The size of the generated code is fixed, hence the target code occupies a statically determined area of the memory.

Typical layout:
Code area
Static data area
Stack
Free memory
Heap
Subdivision of Runtime Memory
The code area consists of memory locations for the generated code.
The amount of memory required by the data objects is known at compile time, hence data objects can also be placed in a statically determined area of the memory.
The stack is used to manage the active procedures: when a call occurs, execution of the current activation is interrupted and status information is saved on the stack.
The heap is the area of run-time storage in which other information is allocated as needed.

Code area         Memory locations for code are determined at compile time
Static data area  Locations of static data can also be determined at compile time
Stack             Data objects allocated at run time (activation records)
Activation Record
Control stack is a run-time stack which is used to keep track of the live procedure activations, i.e., the procedures whose execution has not yet completed.
When a procedure is called (activation begins), its name is pushed onto the stack, and when it returns (activation ends), it is popped.
An activation record is used to manage the information needed by a single execution of a procedure.
An activation record is pushed onto the stack when a procedure is called, and it is popped when control returns to the caller.
Activation Record
The execution of a procedure is called its activation.
An activation record contains all the information necessary for a single execution of a procedure. Its fields, from top to bottom, are:

Return value: used by the called procedure to return a value to the calling procedure.
Actual parameters: holds the information about the actual parameters.
Control link (optional): points to the activation record of the caller.
Access link (optional): refers to non-local data held in other activation records.
Machine status: holds the information about the status of the machine just before the procedure call.
Local variables: hold the data that is local to the execution of the procedure.
Temporary values: store the values that arise in the evaluation of an expression.
Compile-Time Layout of Local Data
The amount of storage needed for a name is determined from its type. (e.g.: int, char, float…)
Storage for an aggregate, such as an array or record, must be large enough to hold all of its components.
The field of local data is laid out as the declarations in a procedure are examined at compile
time.
We keep a count of the memory locations that have been allocated for previous declarations.
From the count we determine a relative address of the storage for a local with respect to some
position such as the beginning of the activation record.
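The count-and-offset layout described above can be sketched as follows (the type sizes are illustrative):

```python
# Lay out local declarations at compile time by keeping a running count
# of bytes allocated so far; each local gets a relative offset from the
# start of the local-data field of the activation record.

TYPE_SIZE = {"int": 4, "char": 1, "float": 8}   # illustrative sizes

def layout_locals(declarations):
    offsets, count = {}, 0
    for name, typ in declarations:
        offsets[name] = count          # relative address for this local
        count += TYPE_SIZE[typ]        # reserve storage for it
    return offsets, count              # count = total local-data size

offsets, total = layout_locals([("i", "int"), ("c", "char"), ("x", "float")])
assert offsets == {"i": 0, "c": 4, "x": 5}
assert total == 13
```

A real compiler would also insert padding for alignment; this sketch packs the fields contiguously.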
Storage allocation strategies
The different storage allocation strategies are:
1. Static allocation: lays out storage for all data objects at compile time.
2. Stack allocation: manages the run-time storage as a stack.
3. Heap allocation: allocates and de-allocates storage as needed at run time from a data area known as the heap.
Static allocation
In static allocation, names are bound to storage as the program is compiled, so there is no need for a run-time support package.
Since the bindings do not change at run time, every time a procedure is activated, its names are bound to the same storage locations.

Limitations
1. Size of data objects must be known at compile time.
2. Recursive procedures are restricted.
3. Data structures cannot be created dynamically.

Example:
int main()
{
    int num1=100, num2=200, res;
    res = sum(num1, num2);
    printf("\n Addition is %d : ", res);
    return (0);
}
int sum(int num1, int num2)
{
    int num3;
    num3 = num1 + num2;
    return (num3);
}

Static memory layout: code for main, code for sum; then the static data area holding the activation record for main (num1, num2, res) and the activation record for sum (num1, num2, num3).
Stack allocation
All compilers for languages that use procedures, functions or methods as units of user-defined actions manage at least part of their run-time memory as a stack.
Each time a procedure is called, space for its local variables is pushed onto a stack, and when the procedure terminates, the space is popped off the stack.
Locals are bound to fresh storage in each activation.
Locals are deleted when the activation ends.
Stack allocation
At run time, an activation record can be allocated by incrementing ‘top’ by the size of the record.
Deallocated by decrementing ‘top’ by the size of record.
Position in activation tree    Activation records on the stack    Remarks
main                           main [int n]                       Frame for main
main → ra                      main [int n], ra [int i]           ra is activated
main → qs(1,10)                main [int n], qs(1,10) [int i]     Frame for ra has been popped
                                                                  and qs(1,10) is pushed

[Figure: activation tree: main calls ra() and qs(1,10); qs(1,10) calls pa(1,10) and qs(1,4); qs(1,4) calls pa(1,4) and qs(1,2)]
Stack allocation: Calling Sequences
Procedure calls are implemented by generating what are known as calling sequences in the target code.
A calling sequence allocates an activation record and enters the information into its fields.
A return sequence restores the state of the machine so the calling procedure can continue its execution.
The code in a calling sequence is often divided between the calling procedure (the caller) and the procedure it calls (the callee).

Division of the activation record between caller and callee:

Caller's activation record:
    Parameters and return value
    Control link
    Links and saved status
    Temporaries and local data
Callee's activation record:
    Parameters and return value     ← caller's responsibility
    Control link
    Links and saved status
    Temporaries and local data      ← top_sp points here; callee's responsibility
Stack allocation: Calling Sequences
The calling sequence and its division between caller and callee are as follows:
1. The caller (Calling procedure) evaluates the actual parameters.
2. The caller stores a return address and the old value of top_sp into the callee’s activation
record. The caller then increments the top_sp to the respective positions.
3. The callee (Called procedure) saves the register values and other status information.
4. The callee initializes its local data and begins execution.
[Figure: variable-length data on the stack: arrays A, B and C of procedure p; the activation record for q (with its control link) at top_sp; the arrays of q at top]
Stack allocation: Dangling Reference
A dangling reference occurs when there is a reference to storage that has been deallocated.
It is a logical error to use a dangling reference, since the value of de-allocated storage is undefined according to the semantics of most languages.

int *dangle()
{
    int i = 10;
    return &i;     /* i's storage is deallocated when dangle returns */
}
main()
{
    int *p;
    p = dangle();  /* p now refers to deallocated storage */
}
Heap Allocation
The stack allocation strategy cannot be used in the following situations:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.

In such cases, activation records are allocated on the heap and linked by control links, so a record can be retained after its activation ends:

Position in activation tree: main calls ra() and qs(1,10)
Activation records on the heap:
    main       (control link)
    ra         (control link)    ← retained even after ra's activation ends
    qs(1,10)   (control link)
Compiler Design (CD)
GTU # 3170701
Unit – 7
Code Generation & Optimization
Input to code generator
Input to the code generator consists of the intermediate representation of the source program.
Types of intermediate language are:
1. Postfix notation
2. Three address code
3. Syntax trees or DAGs
The detection of semantic error should be done before submitting the input to the code
generator.
The code generation phase requires complete error free intermediate code as an input.
Target program
The output may be in form of:
1. Absolute machine language: an absolute machine language program can be placed in a fixed memory location and immediately executed.
2. Relocatable machine language: subroutines can be compiled separately; a set of relocatable object modules can be linked together and loaded for execution.
3. Assembly language: producing an assembly language program as output makes the process of code generation easier, but an assembler is then required to convert the code into binary form.
Memory management
Mapping names in the source program to addresses of data objects in run time memory is done
cooperatively by the front end and the code generator.
We assume that a name in a three-address statement refers to a symbol table entry for the
name.
From the symbol table information, a relative address can be determined for the name in a data
area.
Instruction selection
Example: the sequence of statements
a := b + c
d := a + e
would be translated into:

MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d

The fourth instruction is redundant (the value of a is already in R0), so the sequence can be improved to:

MOV b, R0
ADD c, R0
ADD e, R0
MOV R0, d

So, we can eliminate redundant statements.
Register allocation
The use of registers is often subdivided into two sub problems:
During register allocation, we select the set of variables that will reside in registers at a point in
the program.
During a subsequent register assignment phase, we pick the specific register that a variable will
reside in.
Finding an optimal assignment of registers to variables is difficult, even with a single register.
Mathematically, the problem is NP-complete.
Choice of evaluation
The order in which computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.
Picking a best order is another difficult, NP-complete problem.
Approaches to code generation
The most important criterion for a code generator is that it produces correct code.
The design of code generator should be in such a way so it can be implemented, tested, and
maintained easily.
Target machine
We will assume our target computer models a three-address machine with:
1. load and store operations
2. computation operations
3. jump operations
4. conditional jumps
The underlying computer is a byte-addressable machine with n general-purpose registers R0, R1, . . . , Rn-1.
Addressing Modes
Instruction Cost
Mode Form Address Extra cost
Absolute M M 1
Register R R 0
Indexed k(R) k +contents(R) 1
Indirect register *R contents(R) 0
Indirect indexed *k(R) contents(k + contents(R)) 1
Total Cost=6
Instruction Cost
Mode Form Address Extra cost
Absolute M M 1
Register R R 0
Indexed k(R) k +contents(R) 1
Indirect register *R contents(R) 0
Indirect indexed *k(R) contents(k + contents(R)) 1
Total Cost=2
Basic Blocks
A basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halt or possibility of branching except at the end.
The following sequence of three-address statements forms a basic block:
t1 := a*a
t2 := a*b
t3 := 2*t2
t4 := t1+t3
t5 := b*b
t6 := t4+t5
Algorithm: Partition into basic blocks
Input: A sequence of three-address statements.
Output: A list of basic blocks with each three-address statement in exactly one block.
Method:
1. We first determine the set of leaders, for that we use the following rules:
I. The first statement is a leader.
II. Any statement that is the target of a conditional or unconditional goto is a leader.
III. Any statement that immediately follows a goto or conditional goto statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
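The three leader rules can be sketched directly (a minimal sketch: a statement is an (op, target) pair, where target is a 1-based statement index for jumps and None otherwise):

```python
# Partition a list of three-address statements into basic blocks using
# the leader rules: (I) the first statement, (II) any jump target,
# (III) any statement immediately following a jump.

def basic_blocks(stmts):
    leaders = {1}                                   # rule I
    for i, (op, target) in enumerate(stmts, start=1):
        if op in ("goto", "if-goto"):
            leaders.add(target)                     # rule II
            if i + 1 <= len(stmts):
                leaders.add(i + 1)                  # rule III
    order = sorted(leaders)
    blocks = []
    for j, start in enumerate(order):
        end = order[j + 1] - 1 if j + 1 < len(order) else len(stmts)
        blocks.append(list(range(start, end + 1)))  # statement indices
    return blocks

# 12-statement shape of the slide example: (12) if i<=20 goto (3)
stmts = [("assign", None)] * 11 + [("if-goto", 3)]
assert basic_blocks(stmts) == [[1, 2], [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
```

This reproduces the B1/B2 split of the example on the next slide: leaders are statements (1) and (3).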
Example: Partition into basic blocks
begin
    prod := 0;
    i := 1;
    do
        prod := prod + a[i] * b[i];
        i := i + 1;
    while i <= 20
end

Three-address code, with leaders and blocks:

(1)  prod := 0        ← leader (rule I)     Block B1: statements (1)-(2)
(2)  i := 1
(3)  t1 := 4*i        ← leader (rule II)    Block B2: statements (3)-(12)
(4)  t2 := a[t1]
(5)  t3 := 4*i
(6)  t4 := b[t3]
(7)  t5 := t2*t4
(8)  t6 := prod + t5
(9)  prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 20 goto (3)
Optimization of Basic block
Transformation on Basic Blocks
A number of transformations can be applied to a basic block without changing the set of
expressions computed by the block.
Many of these transformations are useful for improving the quality of the code.
Types of transformations are:
1. Structure preserving transformation
2. Algebraic transformation
Structure Preserving Transformations
Structure-preserving transformations on basic blocks are:
1. Common sub-expression elimination
2. Dead-code elimination
3. Renaming of temporary variables
4. Interchange of two independent adjacent statements
Common sub-expression elimination
Consider the basic block,
a:= b+c
b:= a-d
c:= b+c
d:= a-d
The second and fourth statements compute the same expression, hence this basic block may
be transformed into the equivalent block:
a:= b+c
b:= a-d
c:= b+c
d:= b
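A local common-sub-expression pass of this kind can be sketched over (dest, op, arg1, arg2) tuples (the tuple representation is illustrative):

```python
# Local common-subexpression elimination in a basic block: reuse the
# result of an earlier identical (op, arg1, arg2) computation, and
# forget remembered expressions when one of their operands (or their
# holder) is redefined.

def local_cse(block):
    available = {}      # (op, arg1, arg2) -> variable holding that value
    out = []
    for dest, op, a1, a2 in block:
        key = (op, a1, a2)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, a1, a2))
        # dest is redefined: drop expressions that used dest as an
        # operand, and expressions whose remembered holder was dest
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        if dest not in (a1, a2):
            available[key] = dest
    return out

block = [("a", "+", "b", "c"),
         ("b", "-", "a", "d"),
         ("c", "+", "b", "c"),
         ("d", "-", "a", "d")]
assert local_cse(block) == [("a", "+", "b", "c"),
                            ("b", "-", "a", "d"),
                            ("c", "+", "b", "c"),
                            ("d", "copy", "b", None)]
```

Note that c := b+c is not replaced, because b was redefined in between, while d := a-d becomes a copy of b, exactly as in the slide's example.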
Dead-code elimination
Suppose x is dead, that is, never subsequently used, at the point where the statement x := y + z appears in a basic block.
Then this statement may be safely removed without changing the value of the basic block.
Renaming of temporary variables
Suppose we have a statement t := b+c, where t is a temporary variable.
If we change this statement to u := b+c, where u is a new temporary variable, and change all uses of this instance of t to u, then the value of the basic block is not changed.
In fact, we can always transform a basic block into an equivalent block in which each statement
that defines a temporary defines a new temporary.
We call such a basic block a normal-form block.
Interchange of two independent adjacent statements
Suppose we have a block with the two adjacent statements,
t1:= b+c
t2:= x+y
Then we can interchange the two statements without affecting the value of the block if and only if neither x nor y is t1 and neither b nor c is t2.
A normal-form basic block permits all statement interchanges that are possible.
Algebraic Transformation
Countless algebraic transformation can be used to change the set of expressions computed by
the basic block into an algebraically equivalent set.
The useful ones are those that simplify expressions or replace expensive operations by cheaper ones.
Example: x=x+0 or x=x*1 can be eliminated.
Flow Graph
We can add flow-of-control information to the set of basic blocks making up a program by constructing a directed graph called a flow graph.
Nodes in the flow graph represent computations (basic blocks), and the edges represent the flow of control.
Example flow graph for the three-address code above:

Block B1:
    prod := 0
    i := 1

Block B2:
    t1 := 4*i
    t2 := a[t1]
    t3 := 4*i
    t4 := b[t3]
    t5 := t2*t4
    t6 := prod + t5
    prod := t6
    t7 := i + 1
    i := t7
    if i <= 20 goto B2

Edges: B1 → B2 (fall-through), B2 → B2 (the conditional jump).
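Successor edges for a flow graph can be computed from each block's last statement (a minimal sketch; the jump-target map is supplied by hand here):

```python
# Build flow-graph edges: a block falls through to the next block unless
# it ends in an unconditional goto; a (conditional) goto also adds an
# edge to its target block. Blocks are lists of statement strings.

def flow_edges(blocks, targets):
    """targets maps a block index to the index its final jump goes to."""
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1]
        if i in targets:
            edges.add((i, targets[i]))          # jump edge
        if not last.startswith("goto") and i + 1 < len(blocks):
            edges.add((i, i + 1))               # fall-through edge
    return edges

B1 = ["prod := 0", "i := 1"]
B2 = ["t1 := 4*i", "...", "if i <= 20 goto B2"]
assert flow_edges([B1, B2], {1: 1}) == {(0, 1), (1, 1)}
```

The result reproduces the two edges of the example: B1 → B2 and the loop edge B2 → B2.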
A simple code generator
The code generation strategy generates target code for a sequence of three-address statements.
It uses the function getReg() to assign registers to variables.
The code generator algorithm uses descriptors to keep track of register contents and
addresses for names.
Address descriptor stores the location where the current value of the name can be found at run
time. The information about locations can be stored in the symbol table and is used to access
the variables.
Register descriptor is used to keep track of what is currently in each register. The register
descriptor shows that initially all the registers are empty. As the generation for the block
progresses the registers will hold the values of computation.
A Code Generation Algorithm
The algorithm takes a sequence of three-address statements as input. For each three address
statement of the form x:= y op z perform the various actions. Assume L is the location where
the output of operation y op z is stored.
1. Invoke a function getReg() to find out the location L where the result of computation y op z
should be stored.
2. Determine the present location of y by consulting the address descriptor for y; if y is not already in location L, generate the instruction MOV y′, L to place a copy of y in L.
3. The present location of z is determined as in step 2, and the instruction OP z′, L is generated.
4. If L is a register, update its register descriptor to indicate that it contains the value of x, and update the address descriptor of x to indicate that x is in L.
5. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptors to indicate that, after execution of x := y op z, those registers will no longer contain y or z.
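The actions above can be sketched for statements of the form x := y op z; here getReg() is simplified to "next free register" and spilling is not modelled (all names are illustrative):

```python
# Sketch of the simple code generator: emit MOV/OP for x := y op z while
# maintaining a register descriptor (reg -> name) and an address
# descriptor (name -> location).

OPCODES = {"+": "ADD", "-": "SUB"}

def gen(block, nregs=2):
    regs = {f"R{i}": None for i in range(nregs)}   # register descriptor
    addr = {}                                      # address descriptor
    code = []

    def get_reg():                                 # simplified getReg()
        for r, name in regs.items():
            if name is None:
                return r
        raise RuntimeError("register spilling not modelled in this sketch")

    for x, y, op, z in block:
        loc_y = addr.get(y)
        if loc_y in regs:                          # y already in a register
            L = loc_y
        else:                                      # load y from memory
            L = get_reg()
            code.append(f"MOV {y}, {L}")
        code.append(f"{OPCODES[op]} {addr.get(z, z)}, {L}")
        regs[L] = x                                # L now holds x
        addr[x] = L

    x = block[-1][0]                               # store the final result
    code.append(f"MOV {addr[x]}, {x}")
    return code

code = gen([("t", "a", "-", "b"), ("u", "a", "-", "c"),
            ("v", "t", "+", "u"), ("d", "v", "+", "u")])
```

Running this on the four statements of the next slide yields exactly the instruction sequence shown there: MOV a,R0; SUB b,R0; MOV a,R1; SUB c,R1; ADD R1,R0; ADD R1,R0; MOV R0,d.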
Generating a code for assignment statement
The assignment statement d := (a-b) + (a-c) + (a-c) can be translated into the following sequence of three-address code:

t := a - b
u := a - c
v := t + u
d := v + u

Statement    Code Generated    Register descriptor             Address descriptor
t := a - b   MOV a, R0         R0 contains t                   t in R0
             SUB b, R0
u := a - c   MOV a, R1         R0 contains t, R1 contains u    t in R0, u in R1
             SUB c, R1
v := t + u   ADD R1, R0        R0 contains v, R1 contains u    u in R1, v in R0
d := v + u   ADD R1, R0        R0 contains d                   d in R0
             MOV R0, d                                         d in R0 and memory
Machine independent optimization
Code Optimization
Code Optimization is a program transformation technique which tries to improve the code by eliminating unnecessary code lines and arranging the statements in a sequence that speeds up execution without wasting resources.
Advantages
1. Faster execution
2. Better performance
3. Improves the efficiency
Code Optimization techniques (Machine independent techniques)
Techniques:
1. Compile time evaluation
2. Common sub expressions elimination
3. Code movement (code motion)
4. Reduction in strength
5. Dead code elimination
Compile time evaluation
Compile time evaluation means shifting of computations from run time to compile time.
There are two methods used to obtain the compile time evaluation.
Folding
In the folding technique, the computation of constant expressions is done at compile time instead of run time.
Example : length = (22/7)*d
Here folding is implied by performing the computation of 22/7 at compile time.
Constant propagation
In this technique the value of variable is replaced and computation of an expression is done at
compilation time.
Example : pi = 3.14; r = 5;
Area = pi * r * r;
Here at the compilation time the value of pi is replaced by 3.14 and r by 5 then computation of
3.14 * 5 * 5 is done during compilation.
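Both folding and constant propagation can be sketched over (variable, expression-string) pairs (a toy sketch: variable substitution is done by naive string replacement, and evaluation via eval):

```python
# Toy constant folding + propagation over (var, expr) assignment pairs.
# Propagation: substitute known constant values of variables first.
# Folding: if the expression is then all constants, evaluate it at
# "compile time". The substring-based replacement is naive and only
# safe for the distinct names used in this illustration.

def fold_and_propagate(assignments):
    consts = {}
    out = []
    for var, expr in assignments:
        for name, val in consts.items():
            expr = expr.replace(name, repr(val))   # propagation
        try:
            value = eval(expr, {"__builtins__": {}})   # folding
            consts[var] = value
            out.append((var, repr(value)))
        except NameError:                          # still has unknown names
            out.append((var, expr))
    return out

result = fold_and_propagate([("pi", "3.14"), ("r", "5"),
                             ("area", "pi * r * r")])
assert result == [("pi", "3.14"), ("r", "5"), ("area", "78.5")]
```

This reproduces the slide's example: pi and r are propagated into the third assignment, and 3.14 * 5 * 5 folds to 78.5 at compile time.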
Common sub expressions elimination
A common sub-expression is an expression that appears repeatedly in the program and whose value has already been computed.
If the operands of this sub-expression do not change, the result of the earlier computation is used instead of re-computing it each time.
Example:

Before Optimization:          After Optimization:
t1 := 4 * i                   t1 := 4 * i
t2 := a + 2                   t2 := a + 2
t3 := 4 * j                   t3 := 4 * j
t4 := 4 * i                   t5 := n
t5 := n                       t6 := b[t1] + t5
t6 := b[t4] + t5
Code Movement or Code Motion
Optimization can be obtained by moving some amount of code outside the loop and placing it
just before entering in the loop.
It won't have any difference if it executes inside or outside the loop.
This method is also called loop invariant computation.
Example:

Before Optimization:
while(i<=max-1)
{
    sum=sum+a[i];
}

After Optimization:
N=max-1;
while(i<=N)
{
    sum=sum+a[i];
}
Reduction in Strength
The strength (cost) of certain operators is higher than that of others; for instance, the strength of * is higher than that of +.
In this technique, higher-strength operators are replaced by lower-strength operators.

Example:
Before Optimization: A = A * 2
After Optimization:  A = A + A
Dead code elimination
A variable is said to be dead at a point in a program if the value contained in it is never used afterwards.
The code containing such a variable is considered dead code.

Example:
i=0;
if(i==1)        /* dead code: this condition is never satisfied */
{
    a=x+5;
}

The if statement is dead code, as its condition will never be satisfied; hence the statement can be eliminated and the code optimized.
Machine dependent optimization
Machine dependent optimization
Machine dependent optimization may vary from machine to machine.
Machine-dependent optimization is done after the target code has been generated and when
the code is transformed according to the target machine architecture.
Machine-dependent optimizers put efforts to take maximum advantage of the memory
hierarchy.
Techniques:
Peephole optimization
Peephole optimization
Peephole optimization is a simple and effective technique for locally improving target code.
This technique is applied to improve the performance of the target program by examining the
short sequence of target instructions (called the peephole) and replacing these instructions by
shorter or faster sequence whenever possible.
Peephole is a small, moving window on the target program.
Redundant Loads & Stores
Redundant loads and stores can be eliminated with the following type of transformation.
Example:
MOV R0, x
MOV x, R0
We can eliminate the second instruction, since the value of x is already in R0.
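This pattern can be implemented as a tiny peephole pass over instruction strings (a sketch; labels and branch targets are ignored here):

```python
# Peephole pass: slide a two-instruction window over the code and drop
# "MOV x, R" when it immediately follows "MOV R, x". In real code the
# load must also not be a branch target; labels are ignored in this sketch.

def eliminate_redundant_loads(code):
    out = []
    for instr in code:
        if out and instr.startswith("MOV "):
            src, dst = [s.strip() for s in instr[4:].split(",")]
            if out[-1] == f"MOV {dst}, {src}":
                continue                 # redundant load: skip it
        out.append(instr)
    return out

code = ["MOV R0, x", "MOV x, R0", "ADD y, R0"]
assert eliminate_redundant_loads(code) == ["MOV R0, x", "ADD y, R0"]
```

Because the window only spans adjacent instructions, the pass is cheap and can simply be re-run until no more matches are found.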
Flow of Control Optimization
The unnecessary jumps can be eliminated in either the intermediate code or the target code by
the following types of peephole optimizations.
We can replace the jump sequence:
    goto L1
    ……
L1: goto L2
by the sequence:
    goto L2
    ……
L1: goto L2

Similarly, the sequence:
    if a<b goto L1
    ……
L1: goto L2
can be replaced by:
    if a<b goto L2
    ……
L1: goto L2
Algebraic simplification
Peephole optimization is an effective technique for algebraic simplification.
The statements such as x = x + 0 or x := x* 1 can be eliminated by peephole optimization.
Reduction in strength
Certain machine instructions are cheaper than others.
In order to improve performance of the intermediate code, we can replace expensive instructions by equivalent cheaper ones.
For example, x * x is cheaper than computing x² by calling an exponentiation routine.
Similarly, addition and subtraction are cheaper than multiplication and division, so where possible a multiplication or division can be replaced by equivalent additions and subtractions.
Machine idioms
Some target machines have specific hardware instructions for performing certain operations, so we can replace general instruction sequences by these equivalent machine instructions in order to improve efficiency.
Example: Some machines have auto-increment or auto-decrement addressing modes.
(Example : INC i)
These modes can be used in code for statement like i=i+1.
References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman
Compiler Design (CD)
GTU # 3170701
Unit – 8
Instruction-Level Parallelism
Instruction Pipelines and Branch Delays
Practically every processor, be it a high-performance supercomputer or a standard machine,
uses an instruction pipeline. With an instruction pipeline, a new instruction can be fetched every
clock while preceding instructions are still going through the pipeline.
Shown in the figure below is a simple 5-stage instruction pipeline: it first fetches the instruction (IF), decodes it (ID), executes the operation (EX), accesses the memory (MEM), and writes back the result (WB).
The figure shows how instructions i, i + 1, i + 2, i + 3, and i + 4 can execute at the same time.
Each row corresponds to a clock tick, and each column in the figure specifies the stage each
instruction occupies at each clock tick.
Instruction Pipelines and Branch Delays
If the result from an instruction is available by the time the succeeding instruction needs the
data, the processor can issue an instruction every clock.
Branch instructions are especially problematic because until they are fetched, decoded and
executed, the processor does not know which instruction will execute next. Many processors
speculatively fetch and decode the immediately succeeding instructions in case a branch is not
taken. When a branch is found to be taken, the instruction pipeline is emptied and the branch
target is fetched.
Thus, taken branches introduce a delay in the fetch of the branch target and introduce
"hiccups" in the instruction pipeline. Advanced processors use hardware to predict the
outcomes of branches based on their execution history and to prefetch from the predicted
target locations. Branch delays are nonetheless observed when branches are mispredicted.
Pipelined Execution
Some instructions take several clocks to execute.
One common example is the memory-load operation. Even when a memory access hits in the cache, it
usually takes several clocks for the cache to return the data.
We say that the execution of an instruction is pipelined if succeeding instructions not dependent on the
result are allowed to proceed.
Thus, even if a processor can issue only one operation per clock, several operations might be in their
execution stages at the same time.
If the deepest execution pipeline has n stages, potentially n operations can be "in flight" at the same time.
Note that not all instructions are fully pipelined. While floating-point adds and multiplies often are fully
pipelined, floating-point divides, being more complex and less frequently executed, often are not.
Most general-purpose processors dynamically detect dependences between consecutive instructions and
automatically stall the execution of instructions if their operands are not available. Some processors,
especially those embedded in hand-held devices, leave the dependence checking to the software in order
to keep the hardware simple and power consumption low. In this case, the compiler is responsible for
inserting "noop" instructions in the code where necessary to ensure that the results are available when needed.
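The noop-insertion idea can be sketched in a few lines. This is a simplified model of my own (the instruction tuples, register names, and latencies are made up): each instruction names its destination, its source registers, and the number of clocks until its result is available, and the scheduler emits "noop"s until every operand is ready.

```python
# Sketch (hypothetical mini instruction format): insert "noop"s so that every
# operand is produced before the instruction that reads it issues.
def insert_noops(instrs):
    """instrs: list of (name, dest, sources, latency); one issue per clock."""
    ready = {}          # register -> earliest clock at which its value is usable
    scheduled = []
    clock = 0
    for name, dest, srcs, latency in instrs:
        # Stall (emit noops) until all source registers are ready.
        while any(ready.get(r, 0) > clock for r in srcs):
            scheduled.append("noop")
            clock += 1
        scheduled.append(name)
        clock += 1
        if dest is not None:
            ready[dest] = clock - 1 + latency  # issue clock + latency
    return scheduled

prog = [("LD r1,a", "r1", [], 2),        # load: result usable 2 clocks later
        ("ADD r2,r1,r1", "r2", ["r1"], 1),
        ("ST b,r2", None, ["r2"], 1)]
print(insert_noops(prog))
```

Here the load has a 2-clock latency, so one noop separates it from the dependent add, while the store needs no padding.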
Multiple Instruction Issue
By issuing several operations per clock, processors can keep even more operations in flight.
The largest number of operations that can be executed simultaneously can be computed by
multiplying the instruction issue width by the average number of stages in the execution
pipeline.
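As a worked instance of that bound (the machine parameters here are hypothetical): a 4-issue machine whose execution pipelines average 8 stages can have up to 4 × 8 = 32 operations in flight.

```python
# The bound from the text: operations in flight <= issue width x pipeline depth.
def max_in_flight(issue_width, avg_pipeline_depth):
    return issue_width * avg_pipeline_depth

print(max_in_flight(4, 8))  # hypothetical 4-issue machine, 8-stage pipelines
```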
Like pipelining, parallelism on multiple-issue machines can be managed either by software or
hardware.
Machines that rely on software to manage their parallelism are known as VLIW (Very-Long-
Instruction-Word) machines, while those that manage their parallelism with hardware are known
as superscalar machines.
Simple hardware schedulers execute instructions in the order in which they are fetched.
If a scheduler comes across a dependent instruction, it and all instructions that follow must
wait until the dependences are resolved (i.e., the needed results are available).
Such machines obviously can benefit from having a static scheduler that places independent
operations next to each other in the order of execution.
More sophisticated schedulers can execute instructions "out of order."
Operations are independently stalled and not allowed to execute until all the values they depend
on have been produced.
Even these schedulers benefit from static scheduling, because hardware schedulers have only
a limited space in which to buffer operations that must be stalled. Static scheduling can place
independent operations close together to allow better hardware utilization.
More importantly, regardless of how sophisticated a dynamic scheduler is, it cannot execute
instructions it has not fetched. When the processor has to take an unexpected branch, it can
only find parallelism among the newly fetched instructions.
The compiler can enhance the performance of the dynamic scheduler by ensuring that these
newly fetched instructions can execute in parallel.
Code-Scheduling Constraints
Code scheduling is a form of program optimization that applies to the machine code
produced by the code generator.
Code scheduling is subject to three kinds of constraints:
1. Control-dependence constraints: All the operations executed in the original program must
be executed in the optimized one.
2. Data-dependence constraints: The operations in the optimized program must produce the
same results as the corresponding ones in the original program.
3. Resource constraints: The schedule must not oversubscribe the resources of the
machine.
These scheduling constraints guarantee that the optimized program produces the same result
as the original.
However, because code scheduling changes the order in which the operations execute, the state
of the memory at any one point may not match any of the memory states in a sequential
execution.
This situation is a problem if a program's execution is interrupted by, for example, a thrown
exception or a user-inserted breakpoint.
Optimized programs are therefore harder to debug.
Basic-Block Scheduling
1. Data-Dependence Graphs
2. List Scheduling of Basic Blocks
3. Prioritized Topological Orders
Data-Dependence Graphs
We represent each basic block of machine instructions by a data-dependence graph, G =
(N,E), having a set of nodes N representing the operations in the machine instructions in the
block and a set of directed edges E representing the data-dependence constraints among the
operations. The nodes and edges of G are constructed as follows:
1. Each operation n in N has a resource-reservation table RTn, whose value is simply the
resource-reservation table associated with the operation type of n.
2. Each edge e in E is labeled with delay de indicating that the destination node must be issued
no earlier than de clocks after the source node is issued. Suppose operation n± is followed by
operation n2, and the same location is accessed by both, with latencies l1 and l2 respectively.
That is, the location's value is produced l1 clocks after the first instruction begins, and the
value is needed by the second instruction l2 clocks after that instruction begins Then, there is
an edge n1 n2 in E labeled with delay l1 — l2
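The edge-construction rule above can be sketched for the common case of true (write-then-read) dependences. This is a simplified illustration of my own: operations are hypothetical tuples listing the locations they write and read plus their latency, and the consumer is assumed to read its operands in its first clock (l2 = 1), so each edge gets delay l1 − l2.

```python
# Sketch (true dependences only, hypothetical operation tuples): build
# data-dependence edges n1 -> n2 labeled with delay l1 - l2.
def dependence_edges(ops):
    """ops: list of (name, writes, reads, latency). Returns (src, dst, delay)."""
    edges = []
    last_writer = {}                 # location -> (producer name, its latency l1)
    for name, writes, reads, latency in ops:
        for loc in reads:
            if loc in last_writer:
                src, l1 = last_writer[loc]
                l2 = 1               # assume operands are read in the first clock
                edges.append((src, name, l1 - l2))
        for loc in writes:
            last_writer[loc] = (name, latency)
    return edges

ops = [("LD R2,0(R1)", ["R2"], ["R1"], 2),
       ("ADD R3,R2,R4", ["R3"], ["R2", "R4"], 1)]
print(dependence_edges(ops))
```

A full implementation would also add anti- and output-dependence edges for writes that follow reads or writes of the same location.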
List Scheduling of Basic Blocks
The simplest approach to scheduling basic blocks involves visiting each node of the data-
dependence graph in "prioritized topological order."
Since there can be no cycles in a data-dependence graph, there is always at least one
topological order for the nodes. However, among the possible topological orders, some may be
preferable to others.
Heuristics are used to pick a preferred order.
Prioritized Topological Orders
List scheduling does not backtrack; it schedules each node once and only once. It uses a
heuristic priority function to choose among the nodes that are ready to be scheduled next. Here
are some observations about possible prioritized orderings of the nodes:
Without resource constraints, the shortest schedule is given by the critical path, the longest
path through the data-dependence graph. A metric useful as a priority function is the height of
the node, which is the length of a longest path in the graph originating from the node.
On the other hand, if all operations are independent, then the length of the schedule is
constrained by the resources available. The critical resource is the one with the largest ratio of
uses to the number of units of that resource available. Operations using more critical resources
may be given higher priority.
Finally, we can use the source ordering to break ties between operations; the operation that
appears earlier in the source program should be scheduled first.
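A minimal sketch of list scheduling with the height priority described above (the graph, edge delays, and the simplification to one issue per clock are my own assumptions, not the book's full algorithm): ready nodes are picked in prioritized topological order, tallest first, while edge delays set each node's earliest issue clock.

```python
# Sketch of list scheduling in prioritized topological order. Priority is the
# node's height (longest path from the node); one operation issues per clock.
def list_schedule(succ):
    """succ: node -> list of (successor, delay). Returns nodes in issue order."""
    height = {}
    def h(n):  # longest-path length from n, the priority from the text
        if n not in height:
            height[n] = max((d + h(s) for s, d in succ[n]), default=0)
        return height[n]

    indeg = {n: 0 for n in succ}
    for n in succ:
        for s, _ in succ[n]:
            indeg[s] += 1

    earliest = {n: 0 for n in succ}   # earliest clock each node may issue
    ready = [n for n in succ if indeg[n] == 0]
    order, clock = [], 0
    while ready:
        # Among ready nodes whose delays are satisfied, pick the tallest
        # (falling back to any ready node rather than modeling idle cycles).
        candidates = [n for n in ready if earliest[n] <= clock] or ready
        n = max(candidates, key=h)
        ready.remove(n)
        order.append(n)
        clock += 1
        for s, d in succ[n]:
            earliest[s] = max(earliest[s], clock - 1 + d)
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

g = {"a": [("c", 2)], "b": [("c", 1)], "c": []}
print(list_schedule(g))
```

In this toy graph "a" is scheduled before "b" because its height (2) is larger, and "c" waits until both predecessors' delays are satisfied.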
Pass structure of assembler
A complete scan of the source program is called a pass.
Types of assemblers are:
1. Two-pass assembler (two-pass translation)
2. One-pass assembler (single-pass translation)
Two pass assembler (Two pass translation)
[Figure: Two-pass assembler. Pass I reads the source program, builds data structures (e.g., the symbol table), and produces intermediate code; Pass II uses those data structures and the intermediate code to synthesize the target program. Arrows in the figure denote data access and control transfer.]
The second pass synthesizes the target program by using address information recorded in the
symbol table.
Two-pass translation handles a forward reference to a symbol naturally because the address of
each symbol is known before program synthesis begins.
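The two passes can be sketched on a made-up mini instruction set (the `JMP`/label syntax here is my own illustration, not a real assembler): pass I only advances the location counter and records label addresses, and pass II generates code with every forward reference already resolved from the symbol table.

```python
# Sketch of two-pass translation (hypothetical mini instruction set).
def assemble_two_pass(lines):
    symtab, address = {}, 0
    for line in lines:                 # Pass I: location counter + symbol table
        if line.endswith(":"):
            symtab[line[:-1]] = address
        else:
            address += 1
    code = []
    for line in lines:                 # Pass II: synthesis using the table
        if line.endswith(":"):
            continue
        if line.startswith("JMP "):
            code.append(("JMP", symtab[line[4:]]))  # forward refs already known
        else:
            code.append((line, None))
    return code

print(assemble_two_pass(["JMP done", "ADD", "done:", "HLT"]))
```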
One pass assembler (One pass translation)
A one-pass assembler requires a single scan of the source program to generate machine code.
Location-counter processing, symbol-table construction, and target-code generation proceed in a
single pass.
The problem of forward references is solved using a technique called backpatching.
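Backpatching can be sketched on the same kind of made-up mini instruction set (the `JMP`/label syntax is my own illustration): a forward branch is emitted with a placeholder target and remembered; when the label's address finally becomes known, the placeholder is patched in place.

```python
# Sketch of backpatching in a one-pass assembler (hypothetical mini syntax):
# forward branches get a placeholder target that is fixed up later.
def assemble(lines):
    code, symtab, patches = [], {}, {}   # patches: label -> code indices to fix
    for line in lines:
        if line.endswith(":"):                    # label definition
            label = line[:-1]
            symtab[label] = len(code)
            for index in patches.pop(label, []):  # backpatch pending uses
                code[index] = ("JMP", symtab[label])
        elif line.startswith("JMP "):
            label = line[4:]
            if label in symtab:                   # backward reference: resolved
                code.append(("JMP", symtab[label]))
            else:                                 # forward reference: remember it
                patches.setdefault(label, []).append(len(code))
                code.append(("JMP", None))        # placeholder target
        else:
            code.append((line, None))
    return code

print(assemble(["JMP done", "ADD", "done:", "HLT"]))
```

Here the `JMP done` is first emitted with target `None`, and the patch list fills in address 2 once `done:` is seen, all in one scan of the source.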
References
Books:
1. Compilers: Principles, Techniques, and Tools, Pearson Education (Second Edition)
   Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, Pearson (for Gujarat Technological University)
   Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman