Compiler Design Full PDF
UNIT 1
Introduction to Compiler
CONTENTS
Part-1: Introduction to Compiler: Phases and Passes
Part-7: The Syntactic Specification of Programming Languages: Context Free Grammar (CFG), Derivation and Parse Trees, Capabilities of CFG
1-1 C (CS/IT-Sem-5)
PART-1
Introduction to Compiler, Phases and Passes.
Questions-Answers
called token.
c. These tokens may be keywords, identifiers, operator symbols and punctuation symbols.
ii. Phase 2 (Syntax analyzer):
a. The syntax analyzer phase is also called the parsing phase.
b. The syntax analyzer groups tokens together into syntactic structures.
c. The output of this phase is a parse tree.
generation phase.
b. It uses the parse tree and symbol table to check whether the given program is semantically consistent with the language definition.
Fig. 1.1.1. Phases of a compiler: Source program → Lexical analyzer → Syntax analyzer → Semantic analyzer → Code optimizer → Code generator → Target program.
program is detected.
Fig. Translation of an assignment statement through the phases: lexical analyzer (token stream of ids and operators), syntax analyzer (parse tree), semantic analyzer (inserts the conversion int_to_real (2)), intermediate code generation, code optimization (optimized code) and code generation (machine code using MOV, ADD, MUL and ST instructions).
Answer
Types of passes:
1. Single-pass compiler:
a. In a single-pass compiler, when a line of source is processed, it is scanned and the tokens are extracted.
b. Then the syntax of the line is analyzed, and the tree structure and some tables containing information about each token are built.
2. Multi-pass compiler: A multi-pass compiler scans the input source once and produces a first modified form, then scans the modified form and produces a second modified form, and so on, until the object form is produced.
Answer
Role of compiler writing tools:
1. Compiler writing tools are used for automatic design of compiler components.
2. Every tool uses a specialized language.
3. Writing tools are used as debuggers and version managers.
1. Parser generator: This procedure normally produces a syntax analyzer from input that is based on a context free grammar.
PART-2
Bootstrapping
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
Cross compiler: A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running.
Bootstrapping:
1. Bootstrapping is the process of writing a compiler (or assembler) in the
source programming language that it intends to compile.
2. Bootstrapping leads to a self-hosting compiler.
3. An initial minimal core version of the compiler is generated in a different
language.
Fig. 1.4.1.
6. To create a new language, L, for machine A:
a. Create a compiler for a subset, S, of the desired language, L, using language A, which runs on machine A. (Language A may be assembly language.)
Fig. 1.4.2.
b. Create a compiler for language L written in a subset of L.
Fig. 1.4.3.
Fig. 1.4.4.
PART-3
Finite State Machines and Regular Expressions and their Application to Lexical Analysis, Optimization of DFA Based Pattern Matchers
Questions-Answers
6. The expressions obtained by repeated application of the rules from (1) to (5) are also regular expressions.
example.
Answer
DFA:
1. A finite automaton is said to be deterministic if it has only one transition on the same input symbol from some state.
2. A DFA is a set of five tuples and is represented as
$M = (Q, \Sigma, \delta, q_0, F)$
where, Q = A set of non-empty finite states
Σ = A set of non-empty finite input symbols
q0 = Initial state of DFA
F = A non-empty finite set of final states
δ = Transition function, $\delta : Q \times \Sigma \rightarrow Q$.
NFA:
1. A finite automaton is said to be non-deterministic if it has more than one possible transition on the same input symbol from some state.
2. A non-deterministic finite automaton is a set of five tuples and is represented as
$M = (Q, \Sigma, \delta, q_0, F)$
where, Q = A set of non-empty finite states
Σ = A set of non-empty finite input symbols
q0 = Initial state of NFA and a member of Q
F = A non-empty finite set of final states and a subset of Q
δ = Transition function, $\delta : Q \times \Sigma \rightarrow 2^{Q}$, so $\delta(q, a)$ is a set of states.
Fig. 1.6.1.
Example: DFA for the language that contains the strings ending with 0 over Σ = {0, 1}.
Fig. 1.6.2.
NFA for the language L which accepts all the strings in which the third symbol from the right end is always a, over Σ = {a, b}.
Fig. 1.6.3.
Answer
Thompson's construction:
1. It is an algorithm for transforming a regular expression into an equivalent NFA.
2. The following rules are defined for a regular expression as a basis for the construction:
i. The NFA representing the empty string is
v. The Kleene closure must allow for taking zero or more instances of the letter from the input; thus a* looks like:
For example:
Construct NFA for r = (a|b)*a
For r = a:
For r = b:
For r = a|b:
Que 1.8. 1 Construct the NFA for the regular expression a labbla'b
by using Thompson's construction methodology.
AKTU 2016-17, Marks 10
Answer
Step 1: a
Step 2: b*
Step 3: b
Step 4: ab*
Step 5: ab
Step 6: ab*|ab
Step 4: The final states of the DFA will be all the states which contain F (the final states of the NFA).
The ε-transitions can be neglected, and the resulting sets of NFA states are named A, B, C and D.
Transition table of the DFA over the states A, B, C and D.
No two rows are similar.
Que 1.12. How are finite automata useful for lexical analysis?
Answer
1. Lexical analysis is the process of reading the source text of a program and converting it into a sequence of tokens.
2. The lexical structure of every programming language can be specified by a regular language. A common way to implement a lexical analyzer is to:
a. Specify regular expressions for all of the kinds of tokens in the language.
b. The disjunction of all of the regular expressions thus describes any possible token in the language.
c. Convert the overall regular expression specifying all possible tokens into a Deterministic Finite Automaton (DFA).
d. Translate the DFA into a program that simulates the DFA. This program is the lexical analyzer.
AKTU 2017-18, Marks 10
Answer
1. DFA for all strings over {a, b} such that the fifth symbol from the right is a:
2. Regular expression:
Fig. 1.14.1.
Transition table for NFA (the sets of states are named A, B and C; undefined moves on a, b, c go to a dead state).
Fig. 1.14.2.
PART-4
Implementation of Lexical Analyzers, Lexical Analyzer Generator, LEX Compiler.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
A lexical analyzer can be implemented in the following steps:
1. Input to the lexical analyzer is a source program.
2. By using an input buffering scheme, it scans the source program.
3. Regular expressions are used to represent the input patterns.
4. Now this input pattern is converted into an NFA by using finite automata.
Fig. Input program → Lexical analyzer (regular expression → finite automata) → tokenized output file, with the symbol table.
Answer
For efficient design of a compiler, various tools are used to automate the phases of the compiler. The lexical analysis phase can be automated using a tool called LEX.
Answer
1. Automatic generation of a lexical analyzer is done using the LEX programming language.
2. The LEX specification file can be denoted using the extension .l (often pronounced as dot L).
3. For example, let us consider the specification file as x.l.
4. This x.l file is then given to the LEX compiler to produce lex.yy.c as shown in Fig. 1.17.1. This lex.yy.c is a C program which is actually a lexical analyzer program.
Fig. 1.17.1. Lex specification x.l → LEX compiler → lex.yy.c (lexical analyzer program).
5. The LEX specification file stores the regular expressions for the tokens, and the lex.yy.c file consists of the tabular representation of the transition diagrams constructed for the regular expressions.
6. In the specification file, LEX actions are associated with every regular expression.
7. These actions are simply pieces of C code that are directly carried
Answer
The LEX program consists of three parts:
1. Declaration section
2. Rule section
3. Auxiliary procedure section
1. Declaration section:
a. In the declaration section, declaration of variables and constants can be done.
expressions.
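2. Rule section:
R1 {action1}
R2 {action2}
...
Rn {actionn}
where each Ri is a regular expression and each actioni is a program fragment describing what action is to be taken when the pattern Ri matches a lexeme.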
%{
#include <stdio.h>
int count = 0;   /* number of keywords found so far */
%}
%%
[ \t\n]+   ;   /* one or more blanks, tabs or newlines: ignored */
auto|double|if|static|break|else|int|struct|case|enum|long|switch|char|extern|near|typedef|const|float|register|union|unsigned|void|while|default   { count++; printf("C keyword(%d):\t%s\n", count, yytext); }
[a-zA-Z]+   { printf("%s: is not the keyword\n", yytext); }
.          ;   /* any other character: ignored */
%%
int yywrap(void) { return 1; }
int main(void) { yylex(); return 0; }
Que 1.20. What are the various LEX actions that are used in LEX programming?
Answer
The following LEX actions can be used for ease of programming with the LEX tool:
1. BEGIN: It indicates the start state. The lexical analyzer starts at state 0.
file.
b. If yywrap() returns 0, the scanner continues scanning.
c. If yywrap() returns 1, the end of file has been encountered.
6. yyin: It is the input file from which the scanner reads the source program (standard input by default).
7. yyleng: It stores the length, or number of characters, of the lexeme currently matched in yytext.
Answer
Token:
A token is a pair consisting of a token name and an optional attribute value.
Pattern:
1. A pattern is a description of the form that the lexemes of a token may take.
2. Regular expressions play an important role in specifying patterns.
3. If a keyword is considered as a token, the pattern is just the sequence of characters.
PART-5
Formal Grammars and their Application to Syntax Analysis, BNF Notation.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
A grammar or phrase structured grammar is a combination of four tuples and can be represented as G = (V, T, P, S), where,
1. V is a finite non-empty set of variables/non-terminals. Generally, non-terminals are represented by capital letters like A, B, C, ..., X, Y, Z.
2. T is a finite non-empty set of terminals, sometimes also represented by Σ or V_T. Generally, terminals are represented by a, b, c, x, y, z etc.
3. P is a finite set whose elements are in the form α → β, where α and β are strings made up by a combination of V and T, i.e., (V ∪ T)*, and α has at least one symbol from V. Elements of P are called productions or production rules or rewriting rules.
Answer
Context free grammar:
1. A CFG describes a language by recursive rules called productions.
2. A CFG can be described as a combination of four tuples and represented by G = (V, T, P, S), where,
V = set of variables or non-terminals, represented by A, B, ..., Y, Z.
T = set of terminals, represented by a, b, c, ..., x, y, z, +, *, ( ) etc.
S = starting symbol.
P = set of productions.
3. The productions used in a CFG must be in the form A → α, where A is a variable and α is a string of symbols in (V ∪ T)*.
4. An example of a CFG is:
G = (V, T, P, S)
E → E * E
E → (E)
E → id
Answer
BNF notation:
1. The BNF (Backus-Naur Form) is a notation technique for context free grammars. This notation is useful for specifying the syntax of a language.
2. The BNF specification is as follows:
<symbol> ::= Exp1 | Exp2 | Exp3 ...
where <symbol> is a non-terminal, and Exp1, Exp2, ... are sequences of symbols. These symbols can be a combination of terminals or non-terminals.
3. For example:
<Address> ::= <fullname> "," <street> "," <zip code>
<fullname> ::= <firstname> " " <middle name> " " <surname>
We can specify first name, middle name, surname, street name, city and zip code by valid strings.
4. The BNF notation is often informal and in human readable form. Commonly used notations in BNF are:
a. Optional symbols are written within square brackets.
b. For repeating a symbol 0 or more times, an asterisk can be used.
PART-6
Ambiguity, YACC
Questions-Answers
Long Answer Type and Medium Answer Type Questions
E ECE)
Eid
Parse tree for id(idjid + is
follows
Que 1.28. Consider the grammar G given as:
S → AB | aaB
A → a | Aa
B → b
Determine whether the grammar G is ambiguous or not. If G is ambiguous then construct an unambiguous grammar equivalent to G.
Answer
Given:
S → AB | aaB
A → a | Aa
B → b
Let us generate the string aab from the given grammar. The parse trees for generating the string aab are as follows (one uses S → aaB, the other uses S → AB with A → Aa → aa):
Fig. 1.28.1.
Here, for the same string, we are getting more than one parse tree. Hence, the grammar is an ambiguous grammar.
The grammar
S → AB
A → Aa | a
B → b
is an unambiguous grammar equivalent to G. Now this grammar has only one parse tree for the string aab.
Fig. 1.28.2.
PART-7
The Syntactic Specification of Programming Languages: Context Free Grammar (CFG), Derivation and Parse Trees, Capabilities of CFG.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Que 1.29. Define parse tree. What are the conditions for constructing a parse tree from a CFG?
Answer
Parse tree:
1. A parse tree is an ordered tree in which the left hand side of a production represents a parent node and the children nodes are represented by the production's right hand side.
2. A parse tree is the tree representation of deriving a Context Free Language (CFL) from a given Context Free Grammar (CFG). These types of trees are sometimes called derivation trees.
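Answer
5. We say that the production $\alpha \rightarrow \beta$ is applied to the string $\gamma\alpha\delta$ to obtain $\gamma\beta\delta$, or we say that $\gamma\alpha\delta$ directly derives $\gamma\beta\delta$.
6. Now suppose $\alpha_1, \alpha_2, \ldots, \alpha_m$ are strings in $(V_N \cup \Sigma)^*$, $m \geq 1$, and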
Que 1.31. What do you mean by left most derivation and right most derivation? Explain with example.
Answer
Left most derivation: The derivation S ⇒* s is called a left most derivation if the production is applied only to the left most variable (non-terminal) at every step.
Example: Let us consider a grammar G that consists of the production rules
E → E + E | E * E | id.
Firstly take the production E → E + E:
E ⇒ E + E
⇒ E * E + E (Replace E → E * E)
⇒ id * E + E (Replace E → id)
⇒ id * id + E (Replace E → id)
⇒ id * id + id (Replace E → id)
Right most derivation: The derivation S ⇒* s is called a right most derivation if the production is applied only to the right most variable (non-terminal) at every step.
Example: Let us consider a grammar G having the productions
E → E + E | E * E | id.
Start with the production E → E * E:
E ⇒ E * E
⇒ E * E + E (Replace E → E + E)
⇒ E * E + id (Replace E → id)
⇒ E * id + id (Replace E → id)
⇒ id * id + id (Replace E → id)
Que 1.32. Describe the capabilities of CFG.
Answer
Various capabilities of CFG are:
1. Context free grammar is useful to describe most of the programming languages.
2. If the grammar is properly designed, then an efficient parser can be constructed automatically.
3. Using the features of associativity and precedence information, grammars for expressions can be constructed.
4. Context free grammar is capable of describing nested structures like balanced parentheses, matching begin-end, corresponding if-then-else's and so on.
CONTENTS
Part-1: Basic Parsing Techniques: Parsers, Shift Reduce Parsing
Part-5: Constructing SLR Parsing Tables
PART-1
Basic Parsing Techniques: Parsers, Shift Reduce Parsing.
Questions-Answers
Que 2.1. What is a parser? Write the role of a parser. What are the most popular parsing techniques?
OR
Explain basic parsing techniques. What is top-down parsing? Explain in detail.
Answer
A parser for any grammar is a program that takes as input a string w and produces as output a parse tree for w.
Role of parser:
1. The role of parsing is to determine the syntactic validity of a source string.
2. The parser helps to report any syntax errors and recover from those errors.
3. The parser helps to construct the parse tree and passes it to the rest of the phases of the compiler.
There are basically two types of parsing techniques:
1. Top-down parsing:
a. Top-down parsing attempts to find the left-most derivation for an input string w, starting from the root (or start symbol) and creating the nodes of the parse tree in preorder.
b. In top-down parsing, the input string w is scanned by the parser from left to right, one symbol/token at a time.
c. The left-most derivation generates the leaves of the parse tree in left to right order, which matches the input scan order.
d. In top-down parsing, parsing decisions are based on the lookahead symbol (or sequence of symbols).
2. Bottom-up parsing:
a. Bottom-up parsing can be defined as an attempt to reduce the input string w to the start symbol of a grammar by finding out the right most derivation of w in reverse.
b. Parsing involves searching for the substring that matches the right side of any of the productions of the grammar.
c. This substring is replaced by the left hand side non-terminal of the production.
d. The process of replacing the right side of the production by the left side non-terminal is called "reduction".
Answer
There are two most common conflicts encountered in a shift-reduce parser:
1. Shift-reduce conflict:
a. The shift-reduce conflict is the most common type of conflict found in grammars.
PART-2
Operator Precedence Parsing.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
A+|-I'1/1
1. Now consider the string id + id * id.
2. We will insert $ symbols at the start and end of the input string. We will also insert precedence relations by referring to the precedence relation table.
$ <· id ·> + <· id ·> * <· id ·> $
$ $   Parsing is done.
Fig. 2.6.1.
else
begin
ii. If a <· b or a ≐ b then shift b onto the stack and increment p to the next input symbol.
iii. else if a ·> b then /* reduce */
iv. repeat
pop the stack
v. until the top stack terminal is related by <· to the terminal most recently popped.
else
vi. call the error correcting routine
end
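Operator precedence table:
 | + | * | ( | ) | id | $
+ | ·> | <· | <· | ·> | <· | ·>
* | ·> | ·> | <· | ·> | <· | ·>
( | <· | <· | <· | ≐ | <· |
) | ·> | ·> | | ·> | | ·>
id | ·> | ·> | | ·> | | ·>
$ | <· | <· | <· | | <· |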
Parsing:
$ <· id ·> + ( <· id ·> * <· id ·> ) $   Handle id is obtained between <· ·>. Reduce this by F → id.
$ F + ( <· id ·> * <· id ·> ) $   Handle id is obtained between <· ·>. Reduce this by F → id.
$ F + ( F * <· id ·> ) $   Handle id is obtained between <· ·>. Reduce this by F → id.
$ F + ( F * F ) $   Remove all the non-terminals.
$ + ( * ) $   Insert $ at the beginning and at the end. Also insert the precedence operators.
$ <· + <· ( <· * ·> ) ·> $   The * operator is surrounded by <· ·>. This indicates that * becomes the handle.
evaluate E + T.
Parsing is done.
PART-3
Top-down Parsing, Predictive Parsers
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Que 2.7. What are the problems with top-down parsing?
Answer
Problems with top-down parsing are:
1. Backtracking:
a. Backtracking is a technique in which, for the expansion of a non-terminal symbol, we choose one alternative, and if some mismatch occurs then we try another alternative.
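Answer
Left factoring and left recursion: Refer Q. 2.7, Page 2-8C, Unit-2.
Left factoring can be eliminated by the following scheme:
a. In general, if A → αβ1 | αβ2 | ... | αβn | γ1 | ... | γm, then
A → αA' | γ1 | ... | γm
A' → β1 | β2 | ... | βn
Left recursion can be eliminated by the following scheme:
a. In general, if A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn, then
A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε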
SAB
S BSB |bB
SSASBl aSB | bB
SaSBS |bBS"
S ASBS|E
BABA |a
BBABBA | bBA| g
BbBA B' |aB'
BABBA B|E
A BS |a
A SAS|aS |a
A A BAABIaAB|a
A aABA' |aA'
A'BAAB A' |e
The productions after eliminating left recursion are:
S aSB S |bBS'
SASB S'|e
A aABA' |aA
A' BAABA'|E
BbBAB |aß'
B'ABBA B'|E
Que 2.10. Write short notes on top-down parsing. What are top-down parsing techniques?
Answer
Top-down parsing: Refer Q. 2.1, Page 2-2C, Unit-2.
Top-down parsing techniques are:
1. Recursive-descent parsing:
i. A top-down parser that executes a set of recursive procedures to process the input without backtracking is called a recursive-descent parser.
2. Predictive parsing:
i. Predictive parsing is an efficient way of implementing recursive-descent parsing by handling the stack of activation records explicitly.
ii. The predictive parser has an input, a stack, a parsing table, and an output. The input contains the string to be parsed, followed by $, the right end-marker.
Fig. Model of a predictive parser: input (a + b $), stack (X Y ... $), predictive parsing program, parsing table and output.
iii. The program determines X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser.
b. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
Answer
1. First and follow are defined in top-down parser. | First and follow are used in bottom-up parser.
2. Predictive parser and recursive descent parser are top-down parsers. | Shift-reduce parser, operator precedence parser, and LR parser are bottom-up parsers.
Que 2.12. What are the problems with top-down parsing? Write
S.No. | Recursive descent parsing | Predictive parsing
1. | CFG is used to build recursive routines. | Recursive routines are not built.
2. | RHS of production rule is converted into program. | Production rule is not converted into program.
3. | Parsing table is not constructed. | Parsing table is constructed.
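E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → F* | a | b
First we remove left recursion:
F → aF' | bF'
F' → *F' | ε
FIRST(E) = FIRST(T) = FIRST(F) = {a, b}
FIRST(T') = {*, ε}
FOLLOW(E) = {$}
FOLLOW(E') = {$}
FOLLOW(T) = {+, $}
FOLLOW(T') = {+, $}
FOLLOW(F) = {*, +, $}
Predictive parsing table:
Non-terminal | a | b | + | * | $
E | E → TE' | E → TE' | | |
E' | | | E' → +TE' | | E' → ε
T | T → FT' | T → FT' | | |
T' | | | T' → ε | T' → *FT' | T' → ε
F | F → aF' | F → bF' | | |
F' | | | F' → ε | F' → *F', F' → ε | F' → ε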
PART-4
Automatic Generation of Efficient Parsers: LR Parsers, The Canonical Collections of LR(0) Items
Questions-Answers
Answer
Working of LR parser:
1. The working of an LR parser can be understood by using the block diagram shown in Fig. 2.15.1.
Fig. 2.15.1. Model of an LR parser: input tape (a1 ... ai ... an $), stack (S0 X1 S1 ... Xm Sm), LR parsing program, output, and a parsing table with action and goto parts.
2. An LR parser has an input buffer for storing the input string, a stack for storing the grammar symbols, an output, and a parsing table comprised of two parts, namely action and goto.
3. There is a driver program that reads the input symbols one at a time from the input buffer. This program is the same for all LR parsers.
4. It reads the input string one symbol at a time and maintains a stack.
5. The stack always maintains the following form:
S0 X1 S1 X2 S2 ... Xm Sm
where each Xi is a grammar symbol, each Si is a state, and Sm is the state on top of the stack.
6. The action of the driver program depends on action[Sm, ai], where ai is the current input symbol.
7. The following actions are possible for the input ai ai+1 ... an $:
a. Shift: If action[Sm, ai] = shift S, the parser shifts the input symbol ai onto the stack and then stacks state S. Now the current input symbol becomes ai+1.
Stack: S0 X1 S1 ... Xm Sm ai S   Input: ai+1 ... an $
b. Reduce: If action[Sm, ai] = reduce A → β, the parser executes a reduce move using the production A → β of the grammar. If A → β has r grammar symbols, first 2r symbols are popped off the stack (r state symbols and r grammar symbols). So, the top of the stack now becomes Sm-r; then A is pushed on the stack, and then state goto[Sm-r, A] is pushed on the stack. The current input symbol is still ai.
Stack: S0 X1 S1 ... Xm-r Sm-r A S   Input: ai ai+1 ... an $
where, S = goto[Sm-r, A].
Answer
1. LR(0) items: An LR(0) item for grammar G is a production rule in which the symbol • is inserted at some position in the R.H.S. of the rule. For example,
S → •ABC
S → A•BC
S → AB•C
S → ABC•
The production S → ε generates only one item, S → •.
PART-5
Constructing SLR Parsing Tables
Questions-Answers
Fig. 2.17.1. Working of SLR parser (input string → parsing of input string → output).
Method:
Let C = {I0, I1, ..., In}. The states of the parser are 0, 1, ..., n, state i being constructed from Ii.
The parsing action for state i is determined as follows:
1. If [A → α•aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j". Here a is a terminal.
2. If [A → α•] is in Ii, then set ACTION[i, a] to "reduce A → α" for all a in FOLLOW(A); here A must not be S'.
3. If [S' → S•] is in Ii, then set ACTION[i, $] to "accept".
The goto transitions for state i are constructed using the rule:
4. If GOTO(Ii, A) = Ij, then GOTO[i, A] = j.
5. All entries not defined by rules (1) through (4) are made "error".
6. The initial state of the parser is the one constructed from the set of items containing [S' → •S].
The parsing table consisting of the parsing ACTION and GOTO functions determined by this algorithm is called the SLR parsing table for G. An LR parser using the SLR parsing table for G is called the SLR parser for G, and a grammar having an SLR parsing table is said to be SLR(1).
Answer
The augmented grammar G' for the above grammar G is:
S' → S
S → A)
S → A, P
S → (P, P
P → {num, num}
The canonical collection of sets of LR(0) items for the grammar is as follows:
I0: S' → •S
S → •A)
S → •A, P
S → •(P, P
P → •{num, num}
I1 = GOTO(I0, S): S' → S•
I2 = GOTO(I0, A): S → A•)
S → A•, P
I3 = GOTO(I0, (): S → (•P, P
P → •{num, num}
I4 = GOTO(I0, {): P → {•num, num}
I5 = GOTO(I2, )): S → A)•
I6 = GOTO(I2, ,): S → A,•P
P → •{num, num}
I7 = GOTO(I3, P): S → (P•, P
I8 = GOTO(I4, num): P → {num•, num}
I9 = GOTO(I6, P): S → A, P•
I10 = GOTO(I7, ,): S → (P,•P
P → •{num, num}
I11 = GOTO(I8, ,): P → {num,•num}
I12 = GOTO(I10, P): S → (P, P•
I13 = GOTO(I11, num): P → {num, num•}
I14 = GOTO(I13, }): P → {num, num}•
SLR parsing table for the above grammar (ACTION and GOTO entries for the states I0 to I14).
I, = GOTO , A)
:SA
S
s
AS|°b
ASA|a
GOTO b)
= ,,
:Sb
I,= GOTO , a )
: A >a
1, = GOTO (4, A)
:A>SA
I = GOTO 4, S) = 7,
I, = GOTO U, a) = I,
1, = GOTO , S)
:S AS
I, = GOTO U, A) =1,
1 = GOTO, b) =1,
Let us number the production rules in the grammar as:
1. S → AS
2. S → b
3. A → SA
4. A → a
FIRST(S) = FIRST(A) = {a, b}
FOLLOW(S) = {$, a, b}
FOLLOW(A) = {a, b}
SLR parsing table (Action and Goto parts) for the item sets above.
The augmented grammar is:
E' → E
E → E + E
E → E * E
E → (E)
E → id
The set of LR(0) items is as follows:
I0: E' → •E
E → •E + E
E → •E * E
E → •(E)
E → •id
I1 = GOTO(I0, E): E' → E•
E → E• + E
E → E• * E
I2 = GOTO(I0, (): E → (•E)
E → •E + E
E → •E * E
E → •(E)
E → •id
I3 = GOTO(I0, id): E → id•
I4 = GOTO(I1, +): E → E + •E
E → •E + E
E → •E * E
E → •(E)
E → •id
I5 = GOTO(I1, *): E → E * •E
E → •E + E
E → •E * E
E → •(E)
E → •id
I6 = GOTO(I2, E): E → (E•)
E → E• + E
E → E• * E
I7 = GOTO(I4, E): E → E + E•
E → E• + E
E → E• * E
I8 = GOTO(I5, E): E → E * E•
E → E• + E
E → E• * E
I9 = GOTO(I6, )): E → (E)•
SLR parsing table (ACTION on id, +, *, (, ), $ and GOTO on E) for the states I0 to I9.
Que 2.21. Perform shift reduce parsing for the given input strings using the grammar S → (L) | a, L → L, S | S
i. (a, (a, a))
ii. (a, a)   AKTU 2018-19, Marks 07
Answer
S → cB | ccA
A → cA | a
B → ccB | b   AKTU 2018-19, Marks 07
Answer
The augmented grammar is:
S' → S
S → cB | ccA
A → cA | a
B → ccB | b
The canonical collection of LR (0) items are
:S S
S cB|° ccA
A cA|°a
B»° ccB|° b
1, = GOTO , S)
:SS
= GOTO, c)
1,Sc*
Blc cA
Ac°A
Bc°cB
A> °cA|°a
B°ccB|*b
1, = GOTO ( , a)
Aa
GOTO a,,b)
:Bb
I, = GOTO 1, B)
,:ScB3.
, GOTO (U, A)
AcA
1, = GOTO, c)
I,:S cc A
B>cc B
A c°A
B c cB
A cA/°a
B ccB/ b
I, = GOTO (u, A)
S ccA
A cA
1, = GOTO4, B)
:BccB
GOTO(,e)
10B ce B
A c * A
B>c°cB
BecB| b
A cA| ° a
GOTO 0.A)
AcA
DFA for set of items:
Fig. 2.22.1.
Let us number the production rules in the grammar as:
1. S → cB
2. S → ccA
3. A → cA
4. A → a
5. B → ccB
6. B → b
Parsing table (ACTION on a, b, c, $ and GOTO on S, A, B) for the states above.
PART-6
Constructing Canonical LR Parsing Tables.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
Algorithm for construction of canonical LR parsing table:
4. The initial state of the parser is the one constructed from the set containing the item [S' → •S, $].
If the parsing action function has no multiple entries, then the grammar is said to be LR(1) or LR.
PART-7
Constructing LALR Parsing Tables, Using Ambiguous Grammars, An Automatic Parser Generator, Implementation of LR Parsing Tables.
Questions-Answers
Answer
The algorithm for construction of the LALR parsing table is as follows:
Input: An augmented grammar G'.
Output: The LALR parsing table functions ACTION and GOTO for G'.
Method:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.
2. For each core present among the sets of LR(1) items, find all sets having that core, and replace these sets by their union.
3. Let C' = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. The parsing actions for state i are constructed from Ji. If there is a parsing action conflict, the algorithm fails to produce a parser and the grammar is said not to be LALR(1).
4. The goto table is constructed as follows. If J is the union of one or more sets of LR(1) items, i.e., J = I1 ∪ I2 ∪ ... ∪ Ik, then the cores of GOTO(I1, X), GOTO(I2, X), ..., GOTO(Ik, X) are the same, since I1, I2, ..., Ik all have the same core. Let K be the union of all sets of items having the same core as GOTO(I1, X). Then GOTO(J, X) = K.
The table produced by this algorithm is called the LALR parsing table for grammar G. If there are no parsing action conflicts, then the given grammar is said to be an LALR(1) grammar.
The collection of sets of items constructed in step 3 of this algorithm is called the LALR(1) collection.
Augmented grammar:
S' → S
S → aAd | bBd | aBe | bAe
A → f
B → f
Canonical collection of sets of LR(1) items:
I0: S' → •S, $
S → •aAd, $
S → •bBd, $
S → •aBe, $
S → •bAe, $
I1 = GOTO(I0, S): S' → S•, $
I2 = GOTO(I0, a): S → a•Ad, $
S → a•Be, $
A → •f, d
B → •f, e
I3 = GOTO(I0, b): S → b•Bd, $
S → b•Ae, $
A → •f, e
B → •f, d
I4 = GOTO(I2, A): S → aA•d, $
I5 = GOTO(I2, B): S → aB•e, $
I6 = GOTO(I2, f): A → f•, d
B → f•, e
I7 = GOTO(I3, B): S → bB•d, $
I8 = GOTO(I3, A): S → bA•e, $
I9 = GOTO(I3, f): A → f•, e
B → f•, d
I10 = GOTO(I4, d): S → aAd•, $
I11 = GOTO(I5, e): S → aBe•, $
I12 = GOTO(I7, d): S → bBd•, $
I13 = GOTO(I8, e): S → bAe•, $
LR(1) parsing table (ACTION on a, b, d, e, f, $ and GOTO on S, A, B) for the states I0 to I13.
S → Aa
S → bAc
S → Bc
S → bBa
A → d
B → d
Canonical collection of sets of LR(1) items for the grammar is as follows:
I0: S' → •S, $
S → •Aa, $
S → •bAc, $
S → •Bc, $
S → •bBa, $
A → •d, a
B → •d, c
I1 = GOTO(I0, S): S' → S•, $
I2 = GOTO(I0, A): S → A•a, $
I3 = GOTO(I0, b): S → b•Ac, $
S → b•Ba, $
A → •d, c
B → •d, a
I4 = GOTO(I0, B): S → B•c, $
I5 = GOTO(I0, d): A → d•, a
B → d•, c
I6 = GOTO(I2, a): S → Aa•, $
I7 = GOTO(I3, A): S → bA•c, $
I8 = GOTO(I3, B): S → bB•a, $
I9 = GOTO(I3, d): A → d•, c
B → d•, a
I10 = GOTO(I4, c): S → Bc•, $
I11 = GOTO(I7, c): S → bAc•, $
Table 2.26.1. LR(1) parsing table.
Since the table does not have any conflict, it is LR(1).
For the LALR(1) table, item set 5 and item set 9 have the same core. Thus we merge both item sets (I5, I9) into item set I59. Now, the resultant parsing table becomes:
Table 2.26.2. LALR(1) parsing table.
S → AA
A → aA | b
The LR(1) items will be:
I0: S' → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1 = GOTO(I0, S): S' → S•, $
I2 = GOTO(I0, A): S → A•A, $
A → •aA, $
A → •b, $
I3 = GOTO(I0, a): A → a•A, a/b
A → •aA, a/b
A → •b, a/b
I4 = GOTO(I0, b): A → b•, a/b
I5 = GOTO(I2, A): S → AA•, $
I6 = GOTO(I2, a): A → a•A, $
A → •aA, $
A → •b, $
I7 = GOTO(I2, b): A → b•, $
I8 = GOTO(I3, A): A → aA•, a/b
I9 = GOTO(I6, A): A → aA•, $
Table 2.27.1. (Productions numbered 1. S → AA, 2. A → aA, 3. A → b.)
State | Action a | Action b | Action $ | Goto S | Goto A
0 | s3 | s4 | | 1 | 2
1 | | | accept | |
2 | s6 | s7 | | | 5
3 | s3 | s4 | | | 8
4 | r3 | r3 | | |
5 | | | r1 | |
6 | s6 | s7 | | | 9
7 | | | r3 | |
8 | r2 | r2 | | |
9 | | | r2 | |
Since the table does not contain any conflict, it is LR(1).
For the LALR table, I3 and I6 will be unioned, I4 and I7 will be unioned, and I8 and I9 will be unioned.
Table 2.27.2.
State | Action a | Action b | Action $ | Goto S | Goto A
0 | s36 | s47 | | 1 | 2
1 | | | accept | |
2 | s36 | s47 | | | 5
36 | s36 | s47 | | | 89
47 | r3 | r3 | r3 | |
5 | | | r1 | |
89 | r2 | r2 | r2 | |
Since the LALR table does not contain any conflict, it is also LALR(1).
DFA:
Fig. 2.27.1.
Q.2. Explain operator precedence parsing with example.
Ans. Refer Q. 2.4.
Q.3. What are the problems with top-down parsing?
Ans. Refer Q. 2.7.
Q.4. What do you understand by left factoring and left recursion, and how are they eliminated?
Ans. Refer Q. 2.8.
Ans. Refer Q. 2.12.
UNIT 3
Syntax-Directed Translations
CONTENTS
Part-1: Syntax-Directed Translation, Syntax-Directed Translation Schemes, Implementation of Syntax-Directed Translators
Expressions
PART-1
Syntax-Directed Translation, Syntax-Directed Translation Schemes, Implementation of Syntax-Directed Translators.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Que 3.1. Define syntax directed translation. Construct an annotated parse tree for the expression (4 * 7 + 1) * 2, using the
1. Syntax directed translation is a grammar with a set of semantic rules of the form $a := f(b_1, b_2, \ldots, b_k)$, where a is an attribute obtained from the function f.
2. Syntax directed translation is a kind of abstract specification.
3. It is done for static analysis of the language.
grammar.
5. The syntax directed translation is partitioned into two subsets called the synthesized and inherited attributes of the grammar.
Lexical analysis → token stream → syntax analysis → parse tree → semantic analysis → dependency graph → evaluation order for semantic rules → translation of constructs
Fig. 3.1.1. Syntax directed translation.
Fig. 3.1.2. Annotated parse tree for (4 * 7 + 1) * 2 (leaves id.lexval = 4, 7, 1 and 2; inside the parentheses T.val = 4 * 7 = 28 and E.val = 28 + 1 = 29).
Answer
Syntax directed translation : Refer Q. 3.1, Page 3-2C, Unit-3.
Semantic actions are attached with every node of the annotated parse tree.
Example: A parse tree along with the values of the attributes at nodes (called an "annotated parse tree") for the expression 2 + 3 * 5 with synthesized attributes is shown in Fig. 3.2.1.
Fig. 3.2.1. Annotated parse tree for 2 + 3 * 5 (root E.val = 17; left operand E.val = 2 from digit.lexval = 2; right operand T.val = 15 from 3 * 5).
Answer
Attributes:
1. Attributes associate information with a language construct; they are attached to the grammar symbols that represent the construct.
Inherited attribute
1. An inherited attribute is one whose value at a node in a parse tree is defined in terms of attributes at the parent and/or siblings of that node.
2. Inherited attributes are convenient for expressing the dependence of a programming language construct on the context in which it appears.
For example: A syntax directed definition that uses inherited attributes is given as:
D → T L          L.type := T.type
T → int          T.type := integer
T → real         T.type := real
L → L1 , id      L1.type := L.type
                 enter(id.ptr, L.type)
L → id           enter(id.ptr, L.type)
The parse tree, along with the attribute values at the parse tree nodes, for the input string int id1, id2, id3 is shown in Fig. 3.3.2.
Fig. 3.3.2. Parse tree for int id1, id2, id3 (T.type = int is passed down as L.type = int to every L node).
PART-2
Intermediate Code, Postfix Notation, Parse Trees and Syntax Trees.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
Intermediate code generation is the fourth phase of compiler which takes
parse tree as an input from semantic phase and generates an intermediate
code as output.
The benefits of intermediate code are:
1. Intermediate code is machine independent, which makes it easy to retarget the compiler to generate code for newer and different processors.
2. Intermediate code is nearer to the target machine as compared to the
source language so it is easier to generate the object code.
3. The intermediate code allows the machine independent optimization of
the code by using specialized techniques.
4. Syntax directed translation implements the intermediate code generation; thus, by augmenting the parser, it can be folded into the parsing.
Que 3.6. What is postfix translation ? Explain it with suitable
example.
Answer
Postfix (reverse polish) translation: It is the type of translation in which
the operator symbol is placed after its two operands.
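For example:
Consider the expression: (20 + (-5) * 6 + 12)
Postfix for the above expression can be calculated as:
t1 = 5 -        (unary minus applied to 5)
(20 + t1 * 6 + 12)
t2 = t1 6 *
20 + t2 + 12
t3 = 20 t2 +
t3 + 12 = t3 12 +
Substituting t3, t2 and t1 back gives: 20 5 - 6 * + 12 +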
Que 3.7. Define parse tree. Why is parse tree construction only possible for CFG?
Answer
Parse tree : A parse tree is an ordered tree in which left hand side of a
Answer
1. A syntax tree is a tree that shows the syntactic structure of a program while omitting irrelevant details present in a parse tree.
2. A syntax tree is a condensed form of the parse tree.
3. Each node of a syntax tree is implemented as a record with several fields. In the node for an operator, one field identifies the operator and the remaining fields contain pointers to the nodes for the operands.
the identifier.
c. Mkleaf(num, val): It creates a number node with label num and a field containing val, the value of the number.
For example: Construct a syntax tree for the expression a - 4 + c. In this sequence, p1, p2, ..., p5 are pointers to nodes, and entry-a and entry-c are pointers to the symbol table entries for identifiers 'a' and 'c' respectively.
p1 := mkleaf(id, entry-a);
p2 := mkleaf(num, 4);
p3 := mknode('-', p1, p2);
p4 := mkleaf(id, entry-c);
p5 := mknode('+', p3, p4);
The tree is constructed in bottom-up fashion. The function calls mkleaf(id, entry-a) and mkleaf(num, 4) construct the leaves for a and 4. The pointers to these nodes are saved using p1 and p2. The call mknode('-', p1, p2) then constructs the interior node with the leaves for a and 4 as children. The syntax tree will be:
Fig. 3.8.1. The syntax tree for a - 4 + c (the root + has children - and id c; the node - has children id a and num 4; each id leaf points to its symbol table entry).
Answer
Syntax tree for the given expression a * (b + c) - d / 2:
Fig. 3.9.1.
Postfix for the given expression:
a * (b + c) - d / 2
= a * t1 - d / 2, where t1 = b c +
= a * t1 - t2, where t2 = d 2 /
= a t1 * t2 -
Putting the values of t1 and t2:
= a b c + * d 2 / -
PART-3
Three Address Code, Quadruples and Triples.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
1. Three address code is an abstract form of intermediate code that can be implemented as a record with the address fields.
2. The general form of three address code representation is
a := b op c
where a, b and c are operands that can be names or constants, and op represents the operator.
3. The operator can be a fixed or floating point arithmetic operator, or a logical operator on boolean valued data. Only a single operation is allowed on the right side of the expression at a time.
4. At most three addresses are allowed (two for operands and one for the result). Hence, the name of this representation is three address code.
For example: The three address code for the expression a = b + c + d will be:
t1 := b + c
t2 := t1 + d
a := t2
Here t1 and t2 are the temporary names generated by the compiler.
Answer
Different ways to write three address code are:
1. Quadruple representation:
a. The quadruple is a structure with at most four fields: op, arg1, arg2 and result.
b. The op field is used to represent the internal code for the operator, arg1 and arg2 represent the two operands used, and the result field is used to store the result of an expression.
For example: Consider the input statement x = -a * b + -a * b
The three address code is:
t1 := uminus a
t2 := t1 * b
t3 := uminus a
t4 := t3 * b
t5 := t2 + t4
x := t5
       op       arg1   arg2   result
(0)    uminus   a             t1
(1)    *        t1     b      t2
(2)    uminus   a             t3
(3)    *        t3     b      t4
(4)    +        t2     t4     t5
(5)    :=       t5            x
3. if A <= D goto 6
4. t1 = A + 2
5. A = t1
6. t2 = C + 1
7. C = t2
Que 3.12. Write the quadruples, triples and indirect triples for the following expression:
(x + y) * (y + z) + (x + y + z)
case 1: x = x + 1;
case 2: y = y + 2;
case 3: z = z + 3;
default: c = c - 1;
101:, a
102 goto ll5
+b goto 103
103: t= 1 goto 105
104 goto 107
105:12=x+l
106: x l2 =
= base -84
12 xt2
PART-4
Translation of Assignment Statements.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Que 3.16. How would you convert the following into intermediate
code ? Give a suitable example.
i. Assignment statements
E → (E1)        {E.place := E1.place}
S → id := E     {id_entry := look_up(id.name);
                 if id_entry ≠ nil then
                     append(id_entry ':=' E.place)
                 else error; /* id not declared */}
1. The look_up returns the entry for id.name in the symbol table if it exists there.
goto
last
switch expression
Example:
switch (ch)
{
case 1: c = a + b;
        break;
case 2: c = a - b;
        break;
}
The three address code will be:
if ch = 1 goto L1
if ch = 2 goto L2
L1: c := a + b
    goto last
L2: c := a - b
    goto last
last:
Que 3.16. Write down the translation procedure for control
statement and switch statement. AKTU 2018-19, Marks 07
Answer
1. Boolean expressions are used along with if-then, if-then-else, while-do and do-while statement constructs.
2. S → if E then S1 | if E then S1 else S2 | while E do S1 | do S1 while E.
3. In all these statements, E corresponds to a boolean expression evaluation.
4. This expression E should be converted to three address code.
5. This is then integrated in the context of the control statement.
Translation procedure for if-then and if-then-else statement:
1. Consider a grammar for if-else:
S → if E then S1 | if E then S1 else S2
2. Syntax directed translation scheme for if-then is given as follows:
S → if E then S1
E.true := new_label()
E.false := S.next
S1.next := S.next
5. The S.code is the important rule which ultimately generates the three
address code.
S → if E then S1 else S2
E.true := new_label()
E.false := new_label()
S1.next := S.next
S2.next := S.next
if a < b goto E.true
E.true: S1
E.false: a := a + 7
S → while E do S1
S1.next := S.begin
S.code := gen(S.begin ':') || E.code ||
          gen(E.true ':') || S1.code ||
          gen('goto' S.begin)
PART-5
Boolean Expressions, Statements that alter the Flow of Control.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
1. Backpatching is the activity of filling up unspecified information of labels
using appropriate semantic actions during the code generation process.
2. Backpatching refers to the process of resolving forward branches that
have been used in the code, when the value of the target becomes
known.
3. Backpatching is done to overcome the problem of processing the
incomplete information in one pass.
4. Backpatching can be used to generate code for boolean expressions and flow of control statements in one pass.
To generate code using backpatching, the following functions are used:
1. Makelist(i): Makelist is a function which creates a new list containing only i, where i is an index into the array of instructions.
2. Merge(p1, p2): Merge is a function which concatenates the lists pointed to by p1 and p2, and returns a pointer to the concatenated list.
3. Backpatch(p, i): Inserts i as the target label for each of the instructions on the list pointed to by p.
Backpatching in boolean expressions:
1. The solution is to generate a sequence of branching statements where the addresses of the jumps are temporarily left unspecified.
2. For each boolean expression E we maintain two lists:
6. nextinstr is a global variable that stores the number of the next statement to be generated.
i. B → B1 OR M B2    {backpatch(B1.falselist, M.instr);
                      B.truelist := merge(B1.truelist, B2.truelist);
                      B.falselist := B2.falselist;}
ii. B → B1 AND M B2  {backpatch(B1.truelist, M.instr);
                      B.truelist := B2.truelist;
                      B.falselist := merge(B1.falselist, B2.falselist);}
iii. B → NOT B1      {B.truelist := B1.falselist;
                      B.falselist := B1.truelist;}
iv. B → (B1)         {B.truelist := B1.truelist;
                      B.falselist := B1.falselist;}
v. B → true          {B.truelist := makelist(nextinstr);
                      append('goto _');}
vi. B → false        {B.falselist := makelist(nextinstr);
                      append('goto _');}
vii. M → ε           {M.instr := nextinstr;}
Three address code:
100: if P < Q goto _
101: goto 102
102: if R < S goto 104
103: goto _
104: if T < U goto _
105: goto _
Answer
The translation scheme for boolean expressions can be understood by the following example.
Consider the boolean expressions generated by the following grammar:
E → E OR E
E → E AND E
E → NOT E
E → (E)
E → id relop id
E → TRUE
E → FALSE
Here relop denotes one of the relational operators <, ≤, =, ≠, >, ≥. The OR and AND operators are left associative.
E → E1 OR E2        {E.place := newtemp();
                     append(E.place ':=' E1.place 'OR' E2.place)}
E → E1 AND E2       {E.place := newtemp();
                     append(E.place ':=' E1.place 'AND' E2.place)}
E → NOT E1          {E.place := newtemp();
                     append(E.place ':=' 'NOT' E1.place)}
E → (E1)            {E.place := E1.place}
E → id1 relop id2   {E.place := newtemp();
                     append('if' id1.place relop.op id2.place 'goto' nextstate + 3);
                     append(E.place ':=' '0');
                     append('goto' nextstate + 2);
                     append(E.place ':=' '1')}
E → TRUE            {E.place := newtemp();
                     append(E.place ':=' '1')}
E → FALSE           {E.place := newtemp();
                     append(E.place ':=' '0')}
PART-6
Postfix Translation: Array References in Arithmetic Expressions.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
1. In a production A → α, the translation rule of A.CODE consists of the concatenation of the CODE translations of the non-terminals in α, in the same order as the non-terminals appear in α.
2. Productions can be factored to achieve postfix form.
Postfix translation of while statement:
Production: S → while M1 E do M2 S1
can be factored as:
1. S → C S1
2. C → W E do
3. W → while
A suitable translation scheme is given as:
C → W E do
S → C S1       {BACKPATCH(S1.NEXT, C.QUAD);
                S.NEXT := C.FALSE;
                GEN(goto C.QUAD)}
E.code := T.code
F.code := id.code
where the '||' sign is used for concatenation.
PART-7
Procedures Call.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
1. If the width of each array element is w, then the ith element of array A begins in location
base + (i - low) * w
where low is the lower bound on the subscript and base is the relative address of the storage allocated for the array, i.e., base is the relative address of A[low].
3. A two dimensional array is normally stored in one of two forms, either row-major (row by row) or column-major (column by column).
The row-major and column-major forms are shown in Fig. 3.22.1.
where low1 and low2 are the lower bounds on the values of i1 and i2, and n2 is the number of values that i2 can take.
6. That is, if high2 is the upper bound on the value of i2, then n2 = high2 - low2 + 1.
7. Assuming that i1 and i2 are the only values that are not known at compile time, we can rewrite the above expression as:
PART-8
Declarations Statements.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
In the declarative statements the data items along with their data types are
declared.
For example:
S → D            {offset := 0}
D → id : T       {enter_table(id.name, T.type, offset);
                  offset := offset + T.width}
T → integer      {T.type := integer;
                  T.width := 8}
1. Initially, the value of offset is set to zero. The computation of offset can be done by using the formula offset := offset + width.
Q.4. What is syntax tree? What are the rules to construct syntax
tree for an expression ?
Ans. Refer Q. 3.8.
CONTENTS
Part-1 : Symbol Tables: Data Structure for Symbol Tables 4-2C to 4-7C
Part-5 : Error Detection and Recovery: Lexical Phase Errors, Syntactic Phase Errors, Semantic Errors 4-16C to 4-21C
PART-1
Symbol Tables: Data Structure for Symbol Tables.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
1. A symbol table is a data structure used by a compiler to keep track of
scope, life and binding information about names.
2. This information is used in the source program to identify the various program elements, like variables, constants, procedures, and the labels of statements.
3. A symbol table must have the following capabilities:
a. Lookup: To determine whether a given name is in the table.
b. Insert: To add a new name (a new entry) to the table.
c. Access: To access the information related with the given name.
d. Modify: To add new information about a known name.
e. Delete: To delete a name or group of names from the table.
Que 4.2.What are the symbol table requirements ? What are the
demerits in the uniform structure of symbol table ?
Answer
The basic requirements of a symbol table are as follows:
1. Structural flexibility: Based on the usage of identifier, the symbol table entries must contain all the necessary information.
2. Fast lookup/search: The table lookup/search depends on the implementation of the symbol table, and the search should be as fast as possible.
3. Efficient utilization of space: The symbol table must be able to grow or shrink dynamically for an efficient usage of space.
4. Ability to handle language characteristics: The characteristics of a language such as scoping and implicit declaration need to be handled.
int a, b, c;      /* level 1 */
int a, b;         /* level 2 */
float c, d;       /* level 3 */
                  /* level 4 */
ii. After the first three declarations, the symbol table will be (top of the stack first):
b int, a int, c int, b int, a int
iv. As the control comes out from Level 2:
c int, b int, a int
v. When control enters into Level 3:
d float, c float, c int, b int, a int
vi. On leaving the control from Level 4:
d float, c float, c int, b int, a int
vii. On leaving the control from Level 3:
c int, b int, a int
ix. On leaving the function entirely, the symbol table will be empty again.
Que 4.4. What is the role of symbol table? Discuss different data structures used for symbol table.
OR
Discuss the various data structures used for symbol table with suitable example.
Answer
Role of symbol table:
1. It keeps track of the semantics of variables.
2. It stores information about scope.
3. It helps to achieve compile time efficiency.
Different data structures used in implementing symbol table are:
1. Unordered list:
a. Simple to implement symbol table.
b. It is implemented as an array or a linked list.
d. Insertion of a variable takes O(1) time, but lookup is slow for large tables, i.e., O(n).
2. Ordered list:
a. If an array is sorted, it can be searched using binary search in O(log2 n).
(Figure: 1. Unordered list and 2. Ordered list, with entries such as Id1 int, Id2 int, Id3 msg, function under the columns Id, Name, Type.)
3. Search tree
4. Hash table:
(Figure: hash table slots pointing to records in a storage table, each record holding Name, Data and Link fields.)
b. Unlike variables or procedures, no runtime location needs to be stored for constants.
c. These are typically placed right into the code stream by the compiler at compilation time.
3. Types (user defined):
a. A user defined type is a combination of one or more existing types.
b. Types are accessed by name and reference a type definition structure.
4. Classes:
a. Classes are abstract data types which restrict access to their members and provide convenient language level polymorphism.
b. This includes the location of the default constructor and destructor, and the address of the virtual function table.
5. Records:
a. Records represent a collection of possibly heterogeneous members which can be accessed by name.
b. The symbol table probably needs to record each of the record's members.
Various data structures used for symbol table: Refer Q. 4.4, Page 4-4C, Unit-4.
PART-2
Representing Scope Information.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
symbol table.
Answer
1. Scope information characterizes the declaration of identifiers and the
portions of the program where it is allowed to use each identifier.
2. Different languages have different scopes for declarations. For example,
in FORTRAN, the scope of a name is a single subroutine, whereas in
int y;      /* scope of variable y */
            /* scope of argument n */
Scope of labels:
void jumper ( )
goto sim;
goto sim;
Answer
1. Scoping is a method of keeping variables in different parts of a program distinct from one another.
2. Scoping is generally divided into two classes:
a. Static scoping: Static scoping is also called lexical scoping. In this scoping, a variable always refers to its top level environment.
b. Dynamic scoping: In dynamic scoping, a global identifier refers to the identifier associated with the most recent environment.
Answer
(Figure: activation records on the stack, each with a static link.)
Calls:
A calls E
E calls B
B calls D
D calls C
PART-3
Run-Time Administration: Implementation of Simple Stack
Allocation Scheme
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Actual parameters
Control link
Access link
Saved machine status
Local data
Temporaries
Fields of activation record are:
1. Return value: It is used by the called procedure to return a value to the calling procedure.
2. Actual parameters: It is used by the calling procedure to supply parameters to the called procedure.
3. Control link: It points to the activation record of the caller.
4. Access link: It is used to refer to non-local data held in other activation records.
Code
Static
Heap
Free memory
Stack
Fig. 4.11.1.
1. Code: It stores the executable target code, which is of fixed size and does not change during compilation.
2. Static allocation:
a. The static allocation is for all the data objects at compile time.
b. The size of the data objects is known at compile time.
c. The names of these objects are bound to storage at compile time only, and such an allocation of data objects is done by static allocation.
d. In static allocation, the compiler can determine the amount of storage required by each data object. Therefore, it becomes easy for a compiler to find the addresses of these data in the activation record.
e. At compile time, the compiler can fill in the addresses at which the target code can find the data on which it operates.
3. Heap allocation: There are two methods used for heap management:
a. Garbage collection method:
When all access paths to an object are destroyed but the data object continues to exist, such objects are said to be garbage.
i The garbage
that object space.
ii. In garbage collection, all the elements whose garbage collection bit is 'on' are collected and returned to the free space list.
b. Reference counter:
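For example:
#include <stdio.h>
void swap(int c, int d)
{
    int t;
    t = c;
    c = d;
    d = t;
    printf("c: %d, d: %d\n", c, d);
}
int main(void)
{
    int n1 = 10, n2 = 20;
    printf("n1: %d, n2: %d\n", n1, n2);
    swap(n1, n2);
    printf("n1: %d, n2: %d\n", n1, n2);
    return 0;
}
Output:
n1: 10, n2: 20
c: 20, d: 10
n1: 10, n2: 20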
ii. Call by reference:
1. In call by reference, the location (address) of actual arguments is passed to formal arguments of the called function. This means that by accessing the addresses of actual arguments we can alter them within the called function.
2. In call by reference, alteration to actual arguments is possible within the called function; therefore the code must handle arguments carefully, else we get unexpected results.
For example:
#include <stdio.h>
void swapByReference(int *a, int *b)
{
    int t;
    t = *a;
    *a = *b;
    *b = t;
}
int main(void)
{
    int n1 = 10, n2 = 20;
    swapByReference(&n1, &n2);
    printf("n1: %d, n2: %d\n", n1, n2);
    return 0;
}
Output: n1: 20, n2: 10
PART-4
Storage Allocation in Block Structured Language.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
1. Hashing is an important technique used to search the records of symbol table. This method is superior to list organization.
2. In hashing scheme, a hash table and a symbol table are maintained.
3. The hash table consists of k entries from 0, 1 to k - 1. These entries are basically pointers into the symbol table, pointing to the names of the symbol table.
4. To determine whether 'Name' is in the symbol table, we use a hash function h such that h(name) results in an integer between 0 and k - 1. We can search any name by position = h(name).
5. Using this position, we can obtain the exact location of the name in the symbol table.
6. The hash table and symbol table are shown in Fig. 4.14.1.
Fig. 4.14.1. Hash table entries pointing to symbol table records with the fields Name, Info and hash link (for example, the names sum and avg).
PART-5
Error Detection and Recovery: Lexical Phase Errors, Syntactic
Phase Errors, Semantic Errors.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
Error recovery: Error recovery is an important feature of any compiler, through which the compiler can continue to read and analyze the complete program even if it has some errors.
Properties of error messages are as follows:
4. There should be no duplication of error messages, i.e., the same error should not be reported again and again.
Goals of error handling are as follows:
1. Detect the presence of errors and produce "meaningful" diagnostics.
2. Recover quickly enough to be able to detect subsequent errors.
3. Error handling components should not significantly slow down the compilation of syntactically correct programs.
Que 4.16. What are lexical phase errors, syntactic phase errors and semantic phase errors? Explain with suitable example.
AKTU 2015-16, Marks 10
Answer
1. Lexical phase error:
a. A lexical phase error is a sequence of characters that does not match the pattern of any token, i.e., while scanning the source program, the compiler may not generate a valid token from the source program.
b. Reasons due to which errors are found in lexical phase are:
For example:
i. In Fortran, an identifier more than 7 characters long is a lexical error.
ii. In a Pascal program, the characters ~, & and @, if they occur, are a lexical error.
2. Syntactic phase errors (syntax errors):
a Syntactic errors are those errors which occur due to the mistake
done by the programmer during coding process.
b. Reasons due to which errors are found in syntactic phase are:
i. Missing semicolon
int x;
4. Logical errors:
a. Logical errors are the logical mistakes found in the program which are not handled by the compiler.
b. In these types of errors, the program is syntactically correct but does not operate as desired.
For example:
Let us consider the following piece of code:
x = 4;
y = 5;
average = x + y / 2;
The given code does not give the average of x and y because the BODMAS (operator precedence) rule is not applied properly: y / 2 is evaluated first. It should be written as average = (x + y) / 2;
OR
Explain lexical phase error and syntactic phase error. Also suggest methods for recovery of error.
AKTU 2017-18, Marks 10
Answer
Lexical and syntactic error: Refer Q. 4.16, Page 4-17C, Unit-4.
Various error recovery methods are:
1. Panic mode recovery:
a. This is the simplest method to implement and is used by most of the parsing methods.
For example:
Let us consider a piece of code
a = b + c;
d = e + f
By using panic mode, it skips a = b + c without checking the error in the code.
2. Phrase-level recovery:
For example:
Let us consider a piece of code
E → E - E | *A | /A
When the error production encounters *A, it sends an error message to the user asking whether to use * as a unary operator or not.
Global correction:
a. Global correction is a theoretical concept.
Que 4.18. Explain in detail the error recovery process in operator precedence parsing method.
AKTU 2018-19, Marks 07
Answer
Error recovery in operator precedence parsing:
1. There are two points in the parsing process at which an operator precedence parser can discover syntactic errors:
a. If no precedence relation holds between the terminal on top of the stack and the current input.
b. If a handle has been found, but there is no production with this handle as a right side.
a. Missing operand
b. Missing operator
c. No expression between parentheses
d. These error diagnostics are issued at handling of errors during reduction.
a. Missing operand
b. Unbalanced right parenthesis
c. Missing right parenthesis
d. Missing operators
Q.1. What are the symbol table requirements? What are the demerits in the uniform structure of symbol table?
Ans. Refer Q. 4.2.
Q.3. Describe symbol table and its entries. Also, discuss various data structures used for symbol table.
Ans. Refer Q. 4.5.
Q.8. What are lexical phase errors, syntactic phase errors and semantic phase errors? Explain with suitable example.
Ans. Refer Q. 4.16.
5 UNIT
Code Generation
CONTENTS
Part-1 : Code Generation: Design Issues 5-2C to 5-3C
PART-1
Code Generation: Design Issues.
Questions-Answers
Answer
1. Code generation is the final phase of compiler.
2. It takes as input the Intermediate Representation (IR) produced by the front end of the compiler, along with relevant symbol table information, and produces as output a semantically equivalent target program, as shown in Fig. 5.1.1.
representations.
3. Instruction selection:
a. The code generator must map the IR program into a code sequence that can be executed by the target machine.
b. If the IR is high level, the code generator may translate each IR statement into a sequence of machine instructions using code templates.
4. Register allocation:
a. A key problem in code generation is deciding what values to hold in which registers, since the target machine does not have enough registers to hold all values.
b. Values that are not held in registers need to reside in memory. Instructions involving register operands are invariably shorter and faster than those involving operands in memory, so efficient utilization of registers is particularly important.
c. The use of registers is often subdivided into two subproblems:
i. Register allocation, during which we select the set of variables that will reside in registers at each point in the program.
ii. Register assignment, during which we pick the specific register that a variable will reside in.
5. Evaluation order:
a. The order in which computations are performed can affect the efficiency of the target code.
b. Some computation orders require fewer registers to hold intermediate results than others.
PART-2
The Target Language, Address in Target Code.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
PART-3
Basic Blocks and Flow Graphs, Optimization of Basic
Blocks, Code Generator.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Output: A list of basic blocks with each three address statement in exactly one block.
Method:
1. We first determine the set of leaders, the first statements of basic blocks. The rules we use are given as:
a. The first statement is a leader.
b. Any statement which is the target of a conditional or unconditional goto is a leader.
c. Any statement which immediately follows a conditional or unconditional goto is a leader.
2. For each leader, construct its basic block, which consists of the leader and all statements up to but not including the next leader or the end of the program.
3. Any statement not placed in a block can never be executed and may now be removed, if desired.
Answer
1. A flow graph is a directed graph in which the flow of control information is added to the basic blocks.
2. The nodes of the flow graph are the basic blocks.
3. The block whose leader is the first statement is called the initial block.
4. There is a directed edge from block Bi-1 to block Bi if Bi can immediately follow Bi-1 in some execution, i.e., if the last statement of Bi-1 jumps to the leader of Bi, or if Bi immediately follows Bi-1 in the given sequence and Bi-1 does not end in an unconditional jump. We can say that Bi-1 is a predecessor of Bi.
For example: Consider the three address code as:
1. prod := 0
2. i := 1
3. t1 := 4 * i
4. t2 := a[t1]      /* computation of a[i] */
5. t3 := 4 * i
6. t4 := b[t3]      /* computation of b[i] */
7. t5 := t2 * t4
8. t6 := prod + t5
9. prod := t6
10. t7 := i + 1
11. i := t7
12. if i <= 10 goto (3)
The flow graph for the given code can be drawn as follows:
B1: statements (1) to (2)
B2: statements (3) to (12)
Edges: B1 → B2, and B2 → B2 (from if i <= 10 goto (3) back to statement (3)).
graph.
Answer
1. Dominators:
a. In control flow graphs, a node d dominates a node n if every path from the entry node to n must go through d. This is denoted as d dom n.
b. By definition, every node dominates itself.
c. A node d strictly dominates a node n if d dominates n and d is not equal to n.
d. The immediate dominator (or idom) of a node n is the unique node that strictly dominates n but does not strictly dominate any other node that strictly dominates n. Every node, except the entry node, has an immediate dominator.
e. A dominator tree is a tree where each node's children are those nodes it immediately dominates.
2. Natural loops:
a. A natural loop can be defined by a back edge n → d, where d dominates n; the loop consists of d together with all nodes that can reach n without going through d.
b. If there is an edge p → q, then q is the head and p is the tail; for a back edge, the head dominates the tail.
c. These edges are called back edges, and a loop can have more than one back edge.
3. Pre-header:
a. The pre-header is a new block created such that the successor of this block is the header block.
b. All the computations that can be made before the header block can be placed in the pre-header block.
(Figure: loop L with its header; a pre-header block is inserted before the header.)
Answer
Different issues in code optimization are:
1. Function preserving transformation: The function preserving transformations are basically divided into the following types:
a. Common sub-expression elimination:
i. A common sub-expression is an expression which is already computed, and the same expression is used again and again in the program.
ii. If the result of the expression has not changed, then we eliminate the repeated computation of the same expression.
For example:
Before common sub-expression elimination:
a = t * 4 - b + c;
...
m = t * 4 - b + c;
...
n = t * 4 - b + c;
After common sub-expression elimination:
temp = t * 4 - b + c;
a = temp;
...
m = temp;
...
n = temp;
iii. In the given example, the expression t * 4 - b + c occurs repeatedly, so it is computed once into temp and reused.
c. Copy propagation:
i. Copy propagation is the concept where we can copy the result of a common sub-expression and use it in the program.
ii. In this technique, the value of the variable is replaced and the computation of an expression is done at compilation time.
For example:
pi = 3.14;
r=
Answer
Transformation
1 A number of transformations can be applied to basic block without
changing set of expression computed by the block.
Transformation helps us in improving quality of code and act as optimizer.
There are two important classes as local transformation that can be
applied to the basic block:
a. Structure preserving transformation: They are as follows:
i. Common sub-expression elimination: Refer Q. 5.6,
Page 5-7C, Unit-5.
ii. Dead code elimination: ReferQ. 5.6, Page 5-7C, Unit-5.
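iii. Interchange of statements: Suppose we have a block with the two adjacent statements,
temp1 = a + b
temp2 = m + n
We can interchange these two statements without affecting the value of the block if and only if neither a nor b is temp2 and neither m nor n is temp1.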
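The interchange condition can be checked mechanically. A minimal sketch, assuming each statement is a hypothetical `(target, operand1, operand2)` tuple; the extra `t1 != t2` test covers the case that the textbook condition excludes implicitly by using two distinct temporaries.

```python
def can_interchange(first, second):
    """Each statement is (target, operand1, operand2), e.g. ("temp1", "a", "b").
    Two adjacent statements may be swapped when neither one reads the
    variable the other writes and they do not write the same variable."""
    t1, *uses1 = first
    t2, *uses2 = second
    return t1 not in uses2 and t2 not in uses1 and t1 != t2
```

For temp1 = a + b and temp2 = m + n the check succeeds; if the second statement were temp2 = temp1 + n, swapping would make it read a stale temp1, and the check fails.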
PART-4
Machine Independent Optimizations, Loop Optimization.
Questions-Answers
Answer
Code optimization:
c. The use of intermixed instructions along with the data increases the speed of execution.
2. Machine independent: The machine independent optimization can be achieved using the following criteria:
a. The code should be analyzed completely, and alternative equivalent sequences of source code that produce a minimum amount of target code should be used.
b. Use appropriate program structure in order to improve the efficiency of target code.
c. Eliminate unreachable code from the source program.
d. Move two or more identical computations to one place and make use of the result instead of computing the expression each time.
A := B + C + D
E := B + C + F
might be evaluated as
T1 := B + C
A := T1 + D
E := T1 + F
In the given example, B + C is stored in T1, which acts as a local optimization of the common sub-expression.
Que 5.10. Explain what constitutes a loop in a flow graph and how ...
Answer
The following terms constitute a loop in a flow graph: Refer Q. 5.5, Page 5-5C, Unit-5.
Loop optimization is a process of increasing execution speed and reducing the overhead associated with loops.
The loop optimization is carried out by the following methods:
1. Code motion:
a. Code motion is a technique which moves code outside the loop.
b. If the result of some expression in the loop remains unchanged even after executing the loop several times, then such an expression should be placed just before the loop (i.e., outside the loop).
c. Code motion is done to reduce the execution time of the program.
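A small before/after sketch of code motion; the function names are hypothetical. The product `x * y` is loop-invariant because nothing in the loop assigns `x` or `y`, so it can be computed once just before the loop.

```python
def scale_all_naive(values, x, y):
    result = []
    for v in values:
        factor = x * y        # loop-invariant: recomputed on every iteration
        result.append(v * factor)
    return result


def scale_all_hoisted(values, x, y):
    factor = x * y            # moved just before the loop (code motion)
    result = []
    for v in values:
        result.append(v * factor)
    return result
```

Note that the hoisted version evaluates `x * y` even when `values` is empty; that is harmless here, but a compiler must check this if the moved expression could fail or have side effects.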
Induction variables:
5. Loop unrolling:
For example:
int i = 1;
while (i <= 100) {
    a[i] = b[i];
    i++;
}
can be written as
int i = 1;
while (i <= 100) {
    a[i] = b[i]; i++;
    a[i] = b[i]; i++;
}
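The same unrolling, sketched in Python to check that both forms copy the same elements. The lists are assumed to have 101 slots so that indices 1 to 100 match the C example (slot 0 is unused). Unrolling by two is valid here only because the trip count, 100, is even.

```python
def copy_rolled(a, b):
    i = 1
    while i <= 100:           # 100 body executions, 101 loop tests
        a[i] = b[i]
        i += 1


def copy_unrolled(a, b):
    i = 1
    while i <= 100:           # 50 body executions, 51 loop tests
        a[i] = b[i]
        i += 1
        a[i] = b[i]
        i += 1
```

The unrolled loop does the same copying with about half as many tests and jumps, at the cost of a larger loop body.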
6. Loop fusion or loop jamming: In the loop fusion method, several loops are merged into one loop.
For example:
for i := 1 to n do
    for j := 1 to m do
        a[i, j] := 10
can be written as
for i := 1 to n * m do
    a[i] := 10
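The rewrite above only works because the n x m elements of `a` occupy one contiguous block and every element receives the same value (this form is also known as loop collapsing). A minimal sketch, assuming row-major storage in a flat Python list where `a[i, j]` lives at 0-based index `(i - 1) * m + (j - 1)`; the function names are hypothetical.

```python
def fill_nested(n, m):
    a = [0] * (n * m)                      # a[i, j] stored row by row
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[(i - 1) * m + (j - 1)] = 10
    return a


def fill_merged(n, m):
    a = [0] * (n * m)
    for k in range(1, n * m + 1):          # one loop over the same storage
        a[k - 1] = 10
    return a
```

If the loop body depended on i or j separately, the merged loop would have to recover them from k, which removes much of the benefit.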
3. T1 := 4 * I
4. T2 := addr(A) - 4
5. T3 := T2[T1]
6. T4 := addr(B) - 4
7. T5 := T4[T1]
8. T6 := T3 * T5
9. PROD := PROD + T6
10. I := I + 1
11. if I <= 20 goto (3)
a. Find the basic blocks and flow graph of the above sequence.
b. Optimize the code sequence by applying function preserving transformation optimization techniques.
B1:
PROD = 0
I = 1
B2:
T1 = 4 * I
T2 = addr(A) - 4
T3 = T2[T1]
T4 = addr(B) - 4
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
I = I + 1
if I <= 20 goto B2
Fig. 5.11.1.
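The partition in Fig. 5.11.1 follows the usual leader rules: the first statement is a leader, the target of a jump is a leader, and the statement after a jump is a leader. A minimal sketch, assuming statements 1 and 2 are PROD := 0 and I := 1 (as in block B1) and that jump targets are written as `goto (n)`; `basic_blocks` is a hypothetical helper.

```python
import re


def basic_blocks(code):
    """code[k] is statement k + 1. Returns each block as a list of
    statement numbers, using the leader rules."""
    leaders = {1}                                  # rule 1: first statement
    for n, stmt in enumerate(code, start=1):
        jump = re.search(r"goto \((\d+)\)", stmt)
        if jump:
            leaders.add(int(jump.group(1)))        # rule 2: target of a jump
            if n < len(code):
                leaders.add(n + 1)                 # rule 3: statement after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code) + 1]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


PROGRAM = [
    "PROD := 0", "I := 1", "T1 := 4 * I", "T2 := addr(A) - 4",
    "T3 := T2[T1]", "T4 := addr(B) - 4", "T5 := T4[T1]", "T6 := T3 * T5",
    "PROD := PROD + T6", "I := I + 1", "if I <= 20 goto (3)",
]
```

`basic_blocks(PROGRAM)` gives statements 1 and 2 as B1 and statements 3 to 11 as B2; the flow graph edges are B1 to B2 (fall-through) and B2 to B2 (the conditional jump back).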
b. Function preserving transformation:
1. Common sub-expression elimination: No block contains a common sub-expression.
Loop optimization:
1. Code motion: In block B2, we can see that the values of T2 and T4 are calculated every time the loop is executed. So, we can move these two instructions outside the loop and put them in block B1, as shown in Fig. 5.11.2.
B1:
PROD = 0
I = 1
T2 = addr(A) - 4
T4 = addr(B) - 4
B2:
T1 = 4 * I
T3 = T2[T1]
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
I = I + 1
if I <= 20 goto B2
Fig. 5.11.2.
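2. Induction variable: The variables I and T1 are called induction variables of loop L because every time the variable I changes, the value of T1 also changes (T1 = 4 * I). To remove these variables we use another method, called reduction in strength: the multiplication 4 * I inside the loop is replaced by the addition T1 = T1 + 4, and the test I <= 20 is replaced by the equivalent test T1 <= 80.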
B1:
PROD = 0
I = 1
T1 = 4 * I
T2 = addr(A) - 4
T4 = addr(B) - 4
B2:
T3 = T2[T1]
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
T1 = T1 + 4
if T1 <= 80 goto B2
Fig. 5.11.3.
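To check that the optimized loop computes the same PROD as the original sequence, both can be simulated. This sketch assumes `T2[T1]` means the word at address T2 + T1 with 4-byte elements (the usual reading of this exercise, but an assumption here); the base addresses 1000 and 2000 are arbitrary.

```python
WORD = 4  # element size in bytes, as implied by T1 = 4 * I


def build_memory(a, b, addr_a=1000, addr_b=2000):
    """Lay out arrays A and B word by word in a dict used as memory."""
    mem = {}
    for k, (x, y) in enumerate(zip(a, b)):
        mem[addr_a + WORD * k] = x
        mem[addr_b + WORD * k] = y
    return mem


def original(mem, addr_a=1000, addr_b=2000):
    """Fig. 5.11.1: every value is recomputed inside the loop."""
    prod, i = 0, 1
    while True:
        t1 = 4 * i
        t2 = addr_a - 4
        t3 = mem[t2 + t1]            # T3 = T2[T1]
        t4 = addr_b - 4
        t5 = mem[t4 + t1]            # T5 = T4[T1]
        t6 = t3 * t5
        prod = prod + t6
        i = i + 1
        if not i <= 20:
            return prod


def optimized(mem, addr_a=1000, addr_b=2000):
    """Fig. 5.11.3: T2 and T4 hoisted, 4 * I replaced by T1 = T1 + 4."""
    prod, i = 0, 1
    t1 = 4 * i
    t2 = addr_a - 4
    t4 = addr_b - 4
    while True:
        t3 = mem[t2 + t1]
        t5 = mem[t4 + t1]
        t6 = t3 * t5
        prod = prod + t6
        t1 = t1 + 4                  # strength-reduced induction variable
        if not t1 <= 80:
            return prod
```

Both functions visit T1 = 4, 8, ..., 80, i.e., elements 1 to 20 of A and B, so both return the dot product of the two arrays.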
Que 5.12. Write short notes on the following with the help of example:
i. Loop unrolling
ii. Loop jamming
iii. Dominators
iv. Viable prefix
AKTU 2018-19, Marks 07
Answer
i. Loop unrolling: Refer Q. 5.10, Page 5-11C, Unit-5.
ii. Loop jamming: Refer Q. 5.10, Page 5-11C, Unit-5.
iii. Dominators: Refer Q. 5.5, Page 5-5C, Unit-5.
For example: In the flow graph,
(Table: STACK and INPUT contents during the parse.)
As we see, this string will never appear on the stack. So, it is not a viable prefix.
PART-5
DAG Representation of Basic Blocks.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
DAG: Refer Q. 5.13, Page 5-17C, Unit-5.
Algorithm
Input: A basic block.
Output: A DAG with label for each node (identifier).
Method:
1. Create nodes with one or two left and right children.
2. Create linked list of attached identifiers for each node.
3. Maintain all identifiers for which a node is associated.
4. A node (identifier) represents the value that the identifier has at the current point in the DAG construction process. The symbol table stores the node (value) of each identifier.
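The method above can be sketched with a lookup table keyed on (operator, left child, right child), so that a second occurrence of b - c finds the existing node instead of creating a new one (the value-number idea mentioned in PART-6). This is a minimal sketch, assuming single-operator statements given as hypothetical `(target, op, left, right)` tuples; the `current` dictionary plays the role of the symbol table, and each node's list of attached identifiers can be read off by inverting it.

```python
def build_dag(block):
    """block: list of (target, op, left, right) statements.
    Returns (nodes, current): nodes[n] is ("leaf", name, None) or
    (op, left_node, right_node); current maps each identifier to the
    node holding its value at the end of the block."""
    nodes = []    # node number -> label and children
    lookup = {}   # (op, left, right) -> existing node number
    current = {}  # identifier -> node number (the symbol table)

    def get_node(key):
        if key not in lookup:           # create a node only if none matches
            lookup[key] = len(nodes)
            nodes.append(key)
        return lookup[key]

    def operand(name):
        if name not in current:         # first use: leaf for the initial value
            current[name] = get_node(("leaf", name, None))
        return current[name]

    for target, op, left, right in block:
        node = get_node((op, operand(left), operand(right)))
        current[target] = node          # attach target to this node
    return nodes, current
```

For a * (b - c) + (b - c) * d split into t1 = b - c, t2 = a * t1, t3 = b - c, t4 = t3 * d, t5 = t2 + t4, identifiers t1 and t3 end up attached to the same node, and the DAG has 8 nodes instead of 9.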
For example:
Given expression: a * (b - c) + (b - c) * d
Step 1: t1 = b - c
Step 2: t2 = (b - c) * d
Step 3: t3 = a * (b - c)
Step 4: t4 = a * (b - c) + (b - c) * d
e =b+c
Fig.5.15.1.
1. The two occurrences of the sub-expression b + c compute the same value.
2. The values computed by a and e are the same.
Applications of DAG :
1. Scheduling: Directed acyclic graph representations of partial orderings have many applications in scheduling for systems of tasks.
2. Data processing networks: A directed acyclic graph may be used to represent a network of processing elements.
3. Data compression: Directed acyclic graphs may also be used as a compact representation of a collection of sequences. In this type of application, one finds a DAG in which the paths form the sequences.
4. It helps in finding statements that can be reordered.
Answer
Directed acyclic graph: Refer Q.5.13, Page 5-17C, Unit-5.
Numerical:
Given expression: a + a * (b - c) + (b - c) * d
The construction of the DAG with three address code will be as follows:
Step 1: t1 = b - c
Step 2: t2 = (b - c) * d
Step 3: t3 = a * (b - c)
Que 5.17. How would you represent the following equation using DAG?
a = b * -c + b * -c
AKTU 2018-19, Marks 07
Answer
Code representation using DAG of equation: a = b * -c + b * -c
Step 1: t1 = -c
Step 2: t2 = b * t1
Step 3: t3 = t2 + t2
Step 4: a = t3
Que 5.18. Give the algorithm for the elimination of local and global common sub-expressions with the help of example.
a. Consider a statement s: a = b op c in block B such that b op c is available at the entry to B and neither b nor c is redefined in B prior to s.
b. Follow the flow of control backwards in the graph, passing back to, but not through, each block that defines b op c. The last computation of b op c in such a block reaches s.
c. For each such computation d = b op c, add a new statement t = d to that block immediately after it (where t is a new temp).
d. Replace s by a = t.
PART-6
Global Data Flow Analysis.
Value Numbers and Algebraic Laws.
Questions-Answers
OR
What is data flow analysis? How is it used in code optimization?
Answer
1. Data flow analysis is a process in which values are computed using data flow properties.
2. In this analysis, the flow of data through the program is examined.
3. A program's control flow graph (CFG) is used to determine those parts of a program to which a particular value assigned to a variable might propagate.
4. A simple way to perform data flow analysis of programs is to set up data flow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes, i.e., it reaches a fixed point.
5. Reaching definitions are used by data flow analysis in code optimization.
Reaching definitions:
1. A definition D reaches a point p if there is a path from D to p along which D is not killed.
d1: y = 2          B1
d2: x = y + 2      B2
2. A definition D of variable x is killed when there is a redefinition of x.
d1: y = 2          B1
d2: y = y + 2      B2
d3: x = y + 2      B3
3. The definition d1 is said to be a reaching definition for block B2. But the definition d1 is not a reaching definition in block B3, because it is killed by definition d2 in block B2.
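The fixed-point iteration described above can be sketched for reaching definitions with the standard equations IN[B] = union of OUT[P] over all predecessors P of B, and OUT[B] = GEN[B] ∪ (IN[B] − KILL[B]). This is a minimal sketch; the input format (block names, predecessor lists, and `(label, variable)` pairs per block) is hypothetical.

```python
def reaching_definitions(blocks, preds, defs):
    """blocks: block names; preds[b]: predecessors of b;
    defs[b]: (label, variable) pairs in the order they occur in b."""
    all_defs = {d: v for b in blocks for d, v in defs[b]}
    gen, kill = {}, {}
    for b in blocks:
        last = {}
        for d, v in defs[b]:
            last[v] = d                     # a later definition hides an earlier one
        gen[b] = set(last.values())
        written = set(last)                 # variables defined in b
        kill[b] = {d for d, v in all_defs.items() if v in written} - gen[b]
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:                          # repeat until nothing changes
        changed = False
        for b in blocks:
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT
```

For the three-block example (B1 to B2 to B3), the result has IN[B2] = {d1} and IN[B3] = {d2}: d1 reaches B2 but is killed by d2 before B3, as stated above.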
Answer
UNIVERSITY EXAMINATION.